---
title: Inequalities in the World of Work
subtitle: What do we know?
author:
    - name: Miguel Niño-Zarazúa
      email: mn39@soas.ac.uk
      affiliations:
        - id: soas
          name: SOAS University of London
          department: Department of Economics
      attributes:
        corresponding: true
    - name: Marty Oehme
      email: mail@martyoeh.me
date: last-modified
abstract: |
    We are researching the effectiveness of policies which target inequalities on the labour market.
keywords:
    - labour markets
    - world of work
    - inequality
    - policy
    - systematic scoping review
lang: en
crossref: # to fix the appendix crossrefs being separate from main
  custom:
    - kind: float
      key: appatbl
      latex-env: appatbl
      reference-prefix: Table A
      space-before-numbering: false
      latex-list-of-description: Appendix A Table
    - kind: float
      key: appbtbl
      latex-env: appbtbl
      reference-prefix: Table B
      space-before-numbering: false
      latex-list-of-description: Appendix B Table
---

```{python}
#| label: standard-imports
#| echo: false
#| output: false
import src.globals as g
from IPython.display import display, Markdown, HTML
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
from tabulate import tabulate
import seaborn as sns
sns.set_style("whitegrid")
```

```{python}
#| label: load-dataframes
#| echo: false
#| output: false
from src import df, df_by_intervention, validities
```

{{< portrait >}}

# Introduction

* Context and statement of the problem
* Aims and rationale of the systematic scoping review
* Summary of the main findings
* Description of the structure of the paper

# Conceptual framework

* Theories, policies, mechanisms and outcomes

# Review methodology

This section discusses the methodology used to review the literature on policy interventions that are expected to address inequalities in forms of work and labour market outcomes,
and gives an overview of the collected data.
This study follows the principles of a systematic review framework, to systematically assess the impact of an array of policies on inequalities in the world of work.
It strives to follow the clear and reproducible method of identification prior to synthesis of relevant research,
while limiting "bias by the systematic assembly, critical appraisal and synthesis" through applying scientific strategies to the review itself [@Cook1995].
It thereby attempts to provide an improved basis for comparative analysis between studies through the rigorous application of systematic criteria and thus to avoid the potential bias of narrative reviews.

Unlike purely systematic reviews which typically focus on specific policy questions and interventions, systematic scoping reviews focus on a wider spectrum of policies, where different study designs and research questions can be investigated.
Since scoping reviews allow both broad and in-depth analyses, they are the most appropriate rigorous method to make a synthesis of the current evidence in this area [@Arksey2005].

The scoping review allows broad focus to be given to a subject for which no unified path with clear edges has been laid out yet by prior reviews, as remains the case with policies targeting inequalities in the world of work.
It does so with a breadth-first search protocol which first works through a large body of literature and subsequently moves toward a depth-favouring approach once the literature has been sufficiently delimited.
Its purpose, clearly mapping a body of literature on a (broad) topic area, is thereby useful on its own or in combination with a systematic approach [@Arksey2005].
Increasingly adopted in recent years, the approach uses a rigorous dichotomy of inclusion and exclusion criteria to chart the relevance of literature in relation to its overall body, striving to remain free of biases which could skew the resulting literature sample [@Pham2014].

## Inclusion criteria

Concise narrowing criteria are applied to restrict the sample to studies that i) look at the effects of individual evidence-based policy measures or intervention initiatives, ii) attempt to address one or more of the defined inequalities in the world of work,
and iii) use appropriate quantitative methods to examine the links between intervention and impact on the given inequalities.
The narrowing process makes use of the typology of inequalities, of forms of work, and of policy areas introduced above as its criteria.

An overview of the respective criteria used for inclusion or exclusion can be found in @tbl-inclusion-criteria.
It restricts studies to those that comprise primary research published after 2000,
with a focus on the narrowing criteria specified in @tbl-inclusion-criteria.

::: {#tbl-inclusion-criteria}

```{python}
inclusion_criteria = pd.read_csv(f"{g.SUPPLEMENTARY_DATA}/inclusion-criteria.tsv", sep="\t")
Markdown(tabulate(inclusion_criteria, showindex=False, headers="keys", tablefmt="grid"))
```

Source: Author's elaboration

Study inclusion and exclusion scoping criteria

:::

## Search protocol

The search protocol follows a three-staged process of execution: identification, screening and extraction.
First, in identification, the relevant policy, inequality and world of work related dimensions are combined through Boolean operators to conduct a search through the database repository Web of Science and supplemental searches via Google Scholar to supply potential grey literature.
While the resulting study pools could be screened for in multiple languages, the search queries themselves are passed to the databases in English-language only.
Relevant results are then complemented through the adoption of a 'snowballing' technique,
in which an array of identified adjacent published reviews is analysed for their reference lists to find cross-references of potentially missing literature and in turn add those to the pool of studies.

To identify potential studies and create an initial sample, relevant terms for the clusters of world of work, inequality and policy interventions have been extracted from the existing reviews as well as the ILO definitions.[^existingreviews]

[^existingreviews]: TODO: citation of existing reviews used; ILO definitions if mentioned

Identified terms comprising the world of work can be found in the Appendix tables @appatbl-wow-terms, @appatbl-intervention-terms, and @appatbl-inequality-terms,
with the search query requiring a term from the general column and one other column of each table respectively.
Each cluster is made up of a general signifier (such as “work”, “inequality” or “intervention”) which has to be labelled in a study to form part of the sample,
as well as any additional terms looking into one or multiple specific dimensions or categories of these signifiers (such as “domestic” work, “gender” inequality, “maternity leave” intervention).
For the database query, a single term from the respective general category is required to be included in addition to one term from any of the remaining categories.
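
As a rough illustration of this query logic, the following sketch assembles a Boolean search string in which every cluster contributes one general term combined with at least one specific term.
The shortened term lists and the helper names `any_of`, `cluster_clause` and `build_query` are hypothetical placeholders, joining the clusters with `AND` is an assumption about how the dimensions are combined, and the exact syntax accepted by a given database may differ.

```python
# Hypothetical, heavily shortened term clusters for illustration only;
# the actual terms are listed in the Appendix tables.
clusters = {
    "world of work": {"general": ["work"], "specific": ["domestic", "informal"]},
    "inequality": {"general": ["inequality"], "specific": ["gender", "income"]},
    "intervention": {"general": ["intervention"], "specific": ["maternity leave"]},
}


def any_of(terms):
    """Join terms with OR, quoting each one so multi-word terms stay phrases."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"


def cluster_clause(cluster):
    """Require one general term AND one term from the remaining categories."""
    return f"({any_of(cluster['general'])} AND {any_of(cluster['specific'])})"


def build_query(clusters):
    """Combine all clusters with AND (assumed) so each one is represented."""
    return " AND ".join(cluster_clause(c) for c in clusters.values())


query = build_query(clusters)
print(query)
```

Keeping the clusters in a plain mapping means that terms can be added to a category without touching the query logic itself.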

Second, in screening, duplicate results are removed and the resulting literature sample is screened on a series of exclusion characteristics:
language, title, abstract, full text and supersession by newer publications.
Properties in these characteristics are used to assess an individual study on its suitability for further review in concert with the inclusion criteria mentioned in @tbl-inclusion-criteria.

To facilitate the screening process, a system of keywords in the 'Zotero' reference manager is used to tag individual studies in the sample with their reason for exclusion,
such as 'excluded::language', 'excluded::title', 'excluded::abstract', and 'excluded::superseded'.
This keyword-based system is equally used to further categorize the sample studies that do not fall into exclusion criteria, based on primary country of analysis, world region, as well as income level classification.
To that end, the prefixes 'country::', 'region::' and 'income::' are used to distinguish between the respective characteristics, such as 'region::LAC' for Latin America and the Caribbean or 'region::SSA' for Sub-Saharan Africa, as well as, for example, 'income::low-middle', 'income::upper-middle' or 'income::high'.
These two delineations follow the ILO categorizations on world regions and the country income classifications based on World Bank income groupings [@ILO2022].

Similarly, if a specific type of inequality, or a specific intervention, represents the focus of a study, these will be reflected in the same keyword system (such as 'inequality::income' or 'inequality::gender').
The complete process of identification and screening is undertaken with the help of the Zotero reference manager.
Last, for extraction, studies are screened for their full-texts, irrelevant studies excluded with 'excluded::full-text' as explained above and relevant studies then ingested into the final sample pool.
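
The keyword scheme lends itself to a simple tabular representation.
The sketch below is a minimal example under assumed inputs: the three studies and their tags are made up, the helper `split_tags` is hypothetical, and the project's own loading code may work differently.
It splits each `prefix::value` keyword into a column and drops every study carrying an `excluded::` keyword.

```python
import pandas as pd

# Hypothetical tag export: one list of Zotero-style keywords per study.
studies = pd.DataFrame(
    {
        "study": ["A", "B", "C"],
        "tags": [
            ["country::Brazil", "region::LAC", "income::upper-middle", "inequality::income"],
            ["excluded::abstract"],
            ["region::SSA", "income::low-middle", "inequality::gender"],
        ],
    }
)


def split_tags(tags):
    """Turn 'prefix::value' keywords into a {prefix: value} mapping.

    If a prefix occurs more than once, only its last value is kept.
    """
    return dict(tag.split("::", 1) for tag in tags if "::" in tag)


parsed = pd.DataFrame(studies["tags"].map(split_tags).tolist(), index=studies.index)
tagged = studies[["study"]].join(parsed)

# Studies carrying any 'excluded::' keyword drop out of the sample.
if "excluded" in tagged.columns:
    included = tagged[tagged["excluded"].isna()].drop(columns="excluded")
else:
    included = tagged
print(included)
```

A study with several tags of the same prefix (for example two types of inequality) would need a list-valued column instead of the single value kept here.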

Should any literature reviews be identified as relevant during this screening process,
they will in turn be crawled for cited sources in a 'snowballing' process.
The sources will be added to the sample to undergo the same screening process explained above,
ultimately resulting in the process represented in the PRISMA chart in @fig-prisma.

```{mermaid}
%%| label: fig-prisma
%%| fig-cap: PRISMA flowchart for scoping process
%%| file: ../data/processed/prisma.mmd
```

All relevant data concerning both their major findings and statistical significance are then extracted from the individual studies into a collective results matrix.
The results to be identified in the matrix include a study's: i) key outcome measures (dependent variables), ii) main findings, iii) main policy interventions (independent variables), iv) study design and sample size, v) dataset and methods of evaluation, vi) direction of relation and level of representativeness, vii) level of statistical significance, viii) main limitations.

```{python}
from src.model import prisma
p = prisma.PrismaNumbers()
```

The query execution results in an initial sample of
`{python} p.raw_db`
potential studies identified from the database search as well as
`{python} p.raw_snowball`
potential studies from other sources,
leading to a total initial number of
`{python} p.raw_full`.
This accounts for all identified studies without duplicate removal, without controlling for literature that has been superseded or applying any other screening criteria.
Of these,
`{python} p.dedup_full - p.out_title - p.out_abstract - p.out_language`
have been identified as potentially relevant studies for the purposes of this scoping review and selected for a full text review,
from which in turn
`{python} p.final_extracted`
have ultimately been extracted.
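
The arithmetic behind these counts can be sketched as follows.
The class `PrismaSketch` is a hypothetical stand-in rather than the `PrismaNumbers` model used above, the `duplicates` field is an assumption about how the deduplicated total is derived, and all numbers are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class PrismaSketch:
    """Stand-in for the counts tracked during screening (made-up numbers)."""

    raw_db: int        # records returned by the database search
    raw_snowball: int  # records from snowballing and other sources
    duplicates: int    # records removed as duplicates (assumed field)
    out_language: int  # excluded on language
    out_title: int     # excluded on title
    out_abstract: int  # excluded on abstract

    @property
    def raw_full(self) -> int:
        # Total initial sample before any screening.
        return self.raw_db + self.raw_snowball

    @property
    def dedup_full(self) -> int:
        # Sample after duplicate removal.
        return self.raw_full - self.duplicates

    @property
    def full_text_review(self) -> int:
        # Studies remaining for full-text review after the three exclusions.
        return self.dedup_full - self.out_title - self.out_abstract - self.out_language


p_sketch = PrismaSketch(raw_db=1000, raw_snowball=200, duplicates=100,
                        out_language=10, out_title=500, out_abstract=300)
print(p_sketch.raw_full, p_sketch.dedup_full, p_sketch.full_text_review)
```

Deriving the later stages from the earlier ones keeps the reported numbers consistent with each other whenever a single count changes.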

@fig-intervention-types shows the predominant interventions contained in the reviewed literature.
Overall, there is a focus on measures of minimum wage, subsidisation, considerations of trade liberalisation and collective bargaining, education and training.
The entire spread of policies captures interventions aimed primarily at institutional and structural mechanisms, but also mechanisms focused on individual agency.

```{python}
#| label: fig-intervention-types
#| fig-cap: Available studies by primary type of intervention

sort_order = df_by_intervention["intervention"].value_counts().index

fig = plt.figure()
fig.set_size_inches(6, 3)
ax = sns.countplot(df_by_intervention, x="intervention", order=sort_order)
plt.setp(ax.get_xticklabels(), rotation=45, ha="right",
         rotation_mode="anchor")
plt.show()
del sort_order, fig, ax
```

# Synthesis of evidence

Since policies employed in the pursuit of increased equality can involve a wide range of actors, strategic approaches and implementation details,
the following synthesis will first categorise studies by main thematic area and its associated interventions.
Individual observations are then descriptively distinguished by the primary outcome variables (inequalities) of interest.
Thus, in the following synthesis each reviewed study will be analysed through the primary policies or mechanisms it uses as independent variables to analyse the effects on a variety of inequalities.

One of the primary lenses of inequality in viewing policy interventions to reduce inequalities in the world of work is that of income,
often measured for all people throughout a country (vertical inequality) or subsets thereof (horizontal inequality).
At the same time, the primacy of income should not be overstated as disregarding the intersectional nature of inequalities could lead to diminished intervention outcomes through adverse targeting.

Each main thematic area will be preceded by a table presenting a summary of findings for the respective policies,
their identified channels and an estimation of their strength of evidence base.
Afterwards, the analytical lens will be inverted for the discussion and the reviewed studies discussed from a perspective of their analysed inequalities and limitations,
to better identify areas of strong analytical lenses or areas of more limited analyses.

## Institutional factors

{{< portrait >}}

::: {#tbl-findings-institutional}

```{python}
#| label: tbl-findings-institutional
from src.model import validity

study_strength_bins = {
    0.0: r"\-",
    5.0: r"\+",
    10.0: r"\++",
}


def strength_for(val):
    return list(study_strength_bins.keys())[
        list(study_strength_bins.values()).index(val)
    ]


findings_institutional = pd.read_csv(f"{g.SUPPLEMENTARY_DATA}/findings-institutional.csv")

outp = Markdown(
    tabulate(
        validity.add_to_findings(
            findings_institutional, df_by_intervention, study_strength_bins
        )[
            [
                "area of policy",
                "internal_validity",
                "external_validity",
                "findings",
                "channels",
            ]
        ].fillna(""),
        showindex=False,
        headers=[
            "area of policy",
            "internal strength",
            "external strength",
            "main findings",
            "channels",
        ],
        tablefmt="grid",
    )
)
del findings_institutional
outp
```

Note: Each main finding is presented with an internal strength of evidence and an external strength of evidence which describe the combined validities of the evidence base for the respective finding.
Validities are binned into a weak (-) evidence base below a validity ranking of `{python} strength_for(r"\+")`,
an evidential (+) evidence base from `{python} strength_for(r"\+")` to below `{python} strength_for(r"\++")`, and a strong (++) evidence base from `{python} strength_for(r"\++")` upwards.

Summary of main findings for institutional policies

:::
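
The binning described in the table note can be illustrated with a short sketch.
It reuses the thresholds of `study_strength_bins` with unescaped labels, while the helper `label_for` is hypothetical and not necessarily how `validity.add_to_findings` assigns the labels.

```python
from bisect import bisect_right

# Same thresholds as in the table note, with plain (unescaped) labels.
strength_bins = {0.0: "-", 5.0: "+", 10.0: "++"}


def label_for(score, bins=strength_bins):
    """Return the label of the highest threshold that `score` reaches."""
    thresholds = sorted(bins)
    position = bisect_right(thresholds, score) - 1
    # Scores below the lowest threshold fall back to the weakest label.
    return bins[thresholds[max(position, 0)]]


for score in (2.5, 5.0, 9.9, 10.0, 14.0):
    print(score, label_for(score))
```

Each lower bound is inclusive in this sketch, so a ranking of exactly 5.0 counts as evidential and one of exactly 10.0 as strong.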

{{< landscape >}}

### Labour regulation and paid leave

<!-- maternity leave and benefits -->
@Dustmann2012 analyse the long-run effects of a series of increases in the period of paid leave for mothers in Germany,
first up to 18 months and then extending unpaid leave up to 36 months.
Though primarily focused on children's outcomes,
the study also analyses the policy's effects on the return-to-work rates and cumulative incomes of the mothers.[^dustmann-childoutcomes]
While increases of paid leave periods (up to 6 months) significantly increased incomes,
longer periods (up to 10 months) saw a decrease with marginal significance for low-income mothers.
Further increases, including the unpaid but job-protected increase to 36 months,
significantly decreased cumulative incomes across income brackets.[^cumulative]
For those returning to work, there is a significant increase in the months away from work among all wage segments for all paid leave period increases,
roughly corresponding to the respective provided leave length.
Still, similar numbers of mothers return once the leave period ends,
with significant decreases for the longer leave periods from 18 to 36 months.
Some limitations of the study include its sample being restricted to mothers who go on maternity leave and some control group identification restrictions possibly introducing some sampling bias.

[^cumulative]: Cumulative income being defined as the sum of mother's income until the child is 40 months old, combined from monthly earnings if working or monthly child benefit if not working but eligible for paid leave.

[^dustmann-childoutcomes]: In its analysis of long-term educational outcomes for children, however, the study does not find any evidence for the expansions improving children's outcomes, even suggesting a possible decrease of educational attainment for the leave extension to 36 months. The authors suggest that the negative effect for children under the long-term leave programme of 36 months may stem from the fact that children require more external stimuli (aside from the mother) before this period ends, as well as from the negative long-term effects of the mother's significantly reduced income for the long-term leave periods.

@Mun2018, looking at hiring discrimination following the introduction of maternity leave laws in Japan,
find similar results:[^laws-japan]
no increase in discrimination in hiring or job promotions was visible and the laws in fact had a partly positive impact on job promotions.
They argue these positive impacts may predominantly be due to voluntary firm compliance to maintain positive reputations,
which favours incentive-based approaches over mandated ones, though no causal analysis was undertaken.[^welfare-paradox]
Their analysis focused on women in managerial positions which may bias findings away from lower income brackets.

[^laws-japan]: The study focuses on the Childcare Leave Act introduced in 1992 which, as the first major childcare policy, mandated one year of childcare leave per child for both men and women,
and the Act on Advancement of Measures to Support Raising Next-Generation Children introduced in 2005, which focused on providing incentives for companies to provide paid leave to at least 70 percent of their female employees and to have at least one male employee taking paid leave.

[^welfare-paradox]: These results run contrary to notions of demand-side mechanisms of the welfare state paradox, with women being less represented in high-authority employment positions due to hiring or workplace discrimination against them with increased maternity benefits. The authors suggest that the welfare paradox may rather be due to supply-side mechanisms, based on individual career planning, as well as reinforced along existing gender divisions of household labour which may increase alongside the laws.

@Broadway2020 study the introduction of universal paid maternal leave in Australia,
analysing the impacts on mothers' return to work as well as the conditions they return under.
They also find a short-term decrease of mothers returning to work since they make use of the introduced leave period,
but a long-term (after six to nine months) significant positive impact on return to work.
Furthermore, there is a positive impact on returning to the same job and under the same conditions,
the effects of which are stronger for more disadvantaged mothers.[^aus-disadvantaged]
This suggests that the intervention reduced the opportunity costs for delaying the return to work,
especially for those women that did not have employer-funded leave options.
The study cannot account for child-care costs or completely exclude selection bias into motherhood or through exogenous shocks.

[^aus-disadvantaged]: Disadvantages measured as a combination of income, education and access to employer-funded leave.

@Davies2022 focus on the difference in return to work ratios between working under fixed-term and open-ended contracts for high-skill women working in UK public universities.
The return to work probability is significantly lower for those with fixed-term contracts,
and most universities provide policies with more limited access to maternity payment for fixed-term contracted staff.
The results suggest that strict payment and repayment policies for early contract termination and the requirement of long-term service to qualify for enhanced maternity benefits may deter utilisation under fixed-term contracts.
Additionally, maternity policy documents differ significantly across the sample,
with few offering favourable conditions within fixed-term contracts.

@Adams2015 examine the macro-level relationships between business and credit regulations, labour laws and income inequality in developing countries from 1970 to 2012.
In MENA, SSA, LAC and to some extent AP, they find stricter labour and business regulations actually negatively related to equitable income distribution,
with market regulation having no significant impacts.
They identify a lack of institutional capability to implement regulatory policies optimised for benefits in developing countries and see the need for policies aimed at more specific targeting of inequality reduction.[^adams-targeting]
The study also analyses the effects of FDI and school enrolment which are reviewed in their respective sections,
though its focus remains primarily on regional trends rather than individual factors as causes for inequality.

[^adams-targeting]: The authors furthermore suggest that regulatory policy in developing countries thus needs to be built specifically for their individual contexts and cannot be exported in unaltered form from developed countries due to differences in structural make-up and institutional capabilities.

### Minimum wage laws

Studies focusing on minimum wage effects further delineate themselves into ones that look at the effects on a national level such as @Wong2019, @Alinaghi2020 and @Chao2022,
and studies which specifically take sub-national spatial effects into account.
@Wong2019 specifically focuses analysis on the impacts on income and hours worked of low-wage earners,
finding that, generally, there was a significant positive effect on income and on waged workers' hours worked,
which can in turn reflect positively on the country's equitable income distribution.
At the same time, potential negative effects on the income of high earners are identified,
suggesting an income-compression effect as employers freeze or reduce high earners' wages to offset the raised wage floors of low earners.
The findings hide internal heterogeneity, however:
For hours worked there is a significant negative impact on women,
potentially pointing to a decreased intensive margin for female workers.[^wong-limits]
For income the effect is largest for agricultural workers,
while for women the effect is significantly smaller than the overall sample,
possibly also affected by the decrease in hours worked.
Thus, while the intervention seems to target overall income inequality well,
it may at the same time exacerbate the already existing gender gap.

[^wong-limits]: The study can only analyse effects during a period of economic growth for the country, which, combined with some sort-dependency in the panel data, may introduce a form of unobservable exogenous bias into this finding.

@Chao2022, looking at the effects in a sample of 43 countries including LMIC and HIC,
find strong short-term and long-term differences in outcomes:
In the short term, minimum wage introductions lead to a reduction of the skilled-unskilled wage gap
but an increase in unemployment and welfare,
while in the long term the results are an overall decrease in wage inequality as well as improved social welfare.[^chao-indicator]
They find those results primarily stem from LMIC which experience significant effects driven by a long-term firm exit from urban manufacturing sectors,
thereby increasing available capital for the rural agricultural sector,
while in HIC the results largely remain insignificant.
Some limitations of the study include the necessity to omit short-term urban firm exit for the effects to remain significant,
as well as requiring the prior assumption of decreased inequality through increased rural agricultural capital.

[^chao-indicator]: To identify the overall income inequality within the countries, the study primarily utilizes the Gini coefficient.
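
Since several of the reviewed studies report changes in the Gini coefficient, a minimal sketch of its computation may be helpful.
The function `gini` uses the standard rank-based formula for a list of incomes, while the income values and the wage floor are invented and the comparison assumes no employment effects.
It is a purely mechanical illustration and does not reproduce the estimation strategy of any reviewed study.

```python
import numpy as np


def gini(incomes):
    """Gini coefficient from the rank-based formula on sorted incomes."""
    x = np.sort(np.asarray(incomes, dtype=float))
    n = x.size
    ranks = np.arange(1, n + 1)
    return (2 * np.sum(ranks * x)) / (n * x.sum()) - (n + 1) / n


# Hypothetical incomes before a wage floor is introduced.
before = np.array([1.0, 2.0, 3.0, 4.0, 10.0])

# Mechanical effect of a floor of 3, assuming nobody loses employment.
after = np.maximum(before, 3.0)

print(round(gini(before), 3), round(gini(after), 3))
```

Any job losses following the introduction would enter such a calculation as reduced or zero incomes and could offset the mechanical reduction shown here.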

@Alinaghi2020 conduct a microsimulation to estimate the effects of a minimum wage increase in New Zealand on overall income inequality and further disaggregate along gender and poverty lines.
They find limited redistributional effects for the policy, with negligible impact on overall income inequality and the possibility of actually increasing inequalities among lower percentile income households.
The authors caution against overestimation of the results' generalisability due to large sample weights possibly biasing results towards sole parent outcomes.
While the effects on poverty measures overall also remain insignificant for sole parents,
they do find significant poverty reduction for sole parents who are in employment.
The authors suggest these findings point to poor programme targeting,
which at best has negligible positive impact on income equality and at worst may worsen income inequality for lower income households,
as low-wage earners are often the secondary earners in higher-income households while low-income households often have no wage earners at all.

Looking at the effect of increases in Romania,
@Militaru2019 find that minimum wage increases generally correlate with a small wage inequality decrease,
and also carry a larger positive impact for women.
They identify a two-fold mechanism which increases the share of waged workers in the total number of employees and mainly concentrates benefits on workers at the minimum income level.[^militaru-limits]
They also suggest this is the probable channel for larger impacts on female workers, since they make up larger parts of low-income and minimum wage households in Romania.

[^militaru-limits]: One limitation of the study may be the over-representation of employees in the sample, as well as not being able to account for tax evasion or other behavioural changes in the model.

<!-- non-spatial policy but spatial effects -->
Turning to studies which take into account spatial effects between different regions,
@Gilbert2001 similarly find insignificant effects on income inequality in the UK,
agreeing with the results of @Chao2022.
However, the effects for rural areas differ depending on their proximity to urban areas.
While rural areas which are accessible to urban markets are less affected resulting in similarly insignificant impacts,
more remote rural households experience almost double the reduction in inequality,
which the authors argue points to effective targeting of the policy.
For the results to hold, the study has to assume no significant effects on employment after the enactment of the minimum wage.

Analysing both the effects of minimum wage and direct cash transfers in Brazil,
@SilveiraNeto2011 also focus on the spatial impacts within the country.
Incomes between regions have converged during the time frame and overall the cash transfers under the 'Bolsa Familia' programme and minimum wage are identified as accounting for 26.2% of the effect.
Minimum wage contributed 16.6% of the effect to overall Gini reduction between the regions while cash transfers accounted for 9.6% of the effect.
The authors argue that this highlights the way even ostensibly non-spatial policies can have a positive
(or, with worse targeting or selection, negative) influence on spatial inequalities,
as transfers occurring predominantly to poorer regions and minimum wages having larger impacts in those regions created quasi-regional effects without forming explicit part of the policies.[^silveiraneto-limits]

[^silveiraneto-limits]: For the analysis, minimum wage effects had to be constructed from the effects that wages equal to the minimum wage had, and cash transfer impacts could only be estimated for the end-line analysis.

On the other hand also in Brazil, @Sotomayor2021,
looking at the poverty and inequality outcomes of subsequent minimum wage floor increases,
finds a poverty reduction by 2.8% and income inequality reduction by 2.4% in the short term (3 months).
In the long term the results largely agree with @SilveiraNeto2011,
finding that minimum wage increases show diminishing returns where the legal minimum is already high in relation to median earnings.
Overall the study finds additional unemployment costs --
created by new job losses following the introduction --
are offset by the increased benefits of higher wages for workers.
The author suggests an inelastic relationship between increases and poverty incidence,
with the limitation that the data can only track individual dwellings (instead of households connected to their inhabitants) and thus both resembles repeat cross-sectional data more than panel data
and is not able to account for people or households moving to new dwellings.

### Collective bargaining

@Alexiou2023 take a macro-level perspective and investigate the impact of governmental party political orientation and trade unionisation levels on income inequality across countries.
The findings indicate a negative correlation between strong unionisation and income inequality,
attributed to enhanced political power redistribution via collective action in national contexts of powerful unions.
Regions with weak unionisation have higher income inequality post-redistribution,
also generally indicating a propensity towards uneven redistributive policies.[^alexious-rightwing]

[^alexious-rightwing]: The study observes a positive association between right-leaning governments and income inequality, whereas centrist governments exhibit varied outcomes, hinting at possible inconsistencies in their redistributive strategies. However, the study cannot directly identify the causal factors within these relationships.

@Ahumada2023, taking the opposite approach,
explore how imbalanced political power distributions affect the availability and strength of collective labour rights.[^ahumada-approach]
Generally, they concur that contexts characterized by significant power disparities weaken opportunities for collective bargaining,
primarily due to either more restricted or disregarded labour rights coupled with less deeply rooted trade unionism.
In contrast, well-established unionism curtails employers' lobbying efforts and makes them susceptible to governments' divide-and-conquer strategies,
as employers are then more separated and less coordinated.

[^ahumada-approach]: The study employs a mix of quantitative global comparisons and qualitative analyses more specifically focused on Argentina and Chile. The strong institutional context of the two countries thus provides an analytical background which makes the qualitative analysis more difficult to generalise than the quantitative findings.

Focusing on the intersection between collective organisation and gender more specifically,
@Dieckhoff2015 examine the influence of trade unionisation on gender inequalities within European labour markets.
The study establishes a positive link between unionisation rates and the likelihood of standard employment contracts for both genders.
While it finds no direct advantage for men solely through increased unionisation,
analysis in combination with temporary contracts and family policy reforms sees men experiencing greater benefits than women.
There is no absolute detrimental effect for either gender as women's employment in standard contracts remains stable.
However, it may be one factor towards an increase in relative inequality for women, which would agree with the findings of @Davies2022.[^dieckhoff-limit]

[^dieckhoff-limit]: The study's causal explanatory power is limited somewhat by its aggregate approach across countries precluding analysis for nation-specific labour market contexts or to disaggregate the gender findings.

@Cardinaleschi2019 investigate the effects of collective organisation on the gender wage gap in Italy.
They identify occupational segregation as the principal cause of wage disparity as opposed to educational inequalities,
with women predominantly working in more 'feminized' industries.
While collective bargaining practices specifically targeting managerial representation and wages show some reduction in the wage gap,
the impact is only marginally significant.[^cardinaleschi-msg]
The authors suggest a stronger mix of policy approaches, including human capital development through well-targeted active labour market policies.

[^cardinaleschi-msg]: The marginal significance primarily stems from internal heterogeneity which only significantly affects the median part of wage distributions while the rest remains insignificant.

@Ferguson2015 specifically examines the relationship between unionisation and the representation of women and minority groups in management positions within U.S. companies.
It finds that while stronger unionisation is associated with higher representation of both in management and in the overall workforce,
the effects are only marginally significant.
Further, the study acknowledges potential confounding factors, such as selection biases,
should more union-friendly enterprises attract individuals who support diversity.[^ferguson-limit]

[^ferguson-limit]: The study bases its analysis on union elections, and thus cannot exclude self-selection effects of people joining more heavily unionised enterprises rather than unionisation increasing representation.

### Workfare programmes

@Whitworth2021 analyse the repercussions of a UK work programme on spatial factors of job deprivation or opportunity increases.
Despite adopting a quasi-market model rewarding positive employment outcomes,
the study contends that the policy's non-spatial execution inadvertently exacerbates existing spatial disparities.
Applying concepts of "social creaming" and "parking" to spatial analysis,
the study shows that areas already suffering from job deprivation experience further deterioration under the programme.
Meanwhile, wealthier regions may receive beneficial impacts in an attempt to enhance programme performance metrics,
leading to the conclusion of bad targeting through neglecting spatial components.

@Li2022 conduct a study on the effects of existing inequalities on the outcomes of a work programme in India intended to provide equal job opportunities for an already disadvantaged population.[^li-nrega]
Using land ownership inequality as a proxy for initial inequality levels,
it finds a significant negative relationship to the provision of jobs through the programme.[^li-indicator]
Primarily the authors identify resistance from landlords against programme expansion as the underlying mechanism ---
its expansion often precedes wage hikes in the districts ---
as they leverage their disproportionate power to influence politics or diminish collective bargaining possibilities.

[^li-nrega]: The National Rural Employment Guarantee Scheme (NREGA) is a workfare programme implemented in India, the largest of its kind, which seeks to provide 100 days of employment for each household per year. It was rolled out from 2005 over several phases until it reached all districts in India in 2008.
[^li-indicator]: The study uses the Gini coefficient as an indicator for these initial conditions of ownership inequalities and thus concludes that the programme is significantly compromised by higher pre-existing capital inequality. The findings also hold true when measuring land inequality as the share of land owned by the top 10 percent of holders.

### Social protection

<!-- TODO Should we include Pi2016 on social security? -->

<!-- social assistance benefits and wages -->
@Wang2016 undertake an observational study on the levels of social assistance benefits and wages in a cross-national comparison of 26 OECD countries.
It finds that real minimum income benefit levels generally increased in most countries from 1990 to 2009, with only a few countries, mostly Eastern European welfare states, showing decreases during the time frame.
The majority of changes in real benefit levels stem from deliberate policy changes; the study calculates them by comparing the changes in benefit levels to the changes in consumer prices.
Secondly, it finds that changes in income replacement rates are more mixed, with rates decreasing even in some countries which have increasing real benefit levels.
The study suggests this is because benefit levels are in most cases not linked to wages, and policy changes also do not take changes in wages into account. This results in diverging benefit levels and wages, which may exacerbate inequality gaps between income groups.
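
The divergence described above can be illustrated with a small calculation.
The sketch below uses made-up figures, not values from @Wang2016, and assumes that the replacement rate is defined as the benefit divided by a reference wage.
The real benefit change is obtained by deflating the nominal benefit with a consumer price index; all function names and numbers are hypothetical.

```python
def real_change(nominal_start, nominal_end, cpi_start, cpi_end):
    """Relative change of a nominal amount after deflating by consumer prices."""
    real_start = nominal_start / cpi_start
    real_end = nominal_end / cpi_end
    return real_end / real_start - 1


def replacement_rate(benefit, wage):
    """Benefit as a share of a reference wage (assumed definition)."""
    return benefit / wage


# Hypothetical country: benefits outpace prices but lag behind wages
benefit_1990, benefit_2009 = 500.0, 900.0
cpi_1990, cpi_2009 = 100.0, 160.0
wage_1990, wage_2009 = 1500.0, 3000.0

change = real_change(benefit_1990, benefit_2009, cpi_1990, cpi_2009)
rate_1990 = replacement_rate(benefit_1990, wage_1990)
rate_2009 = replacement_rate(benefit_2009, wage_2009)
print(f"real benefit change: {change:+.1%}")
print(f"replacement rate: {rate_1990:.1%} -> {rate_2009:.1%}")
```

In this hypothetical case the real benefit level rises while the replacement rate falls, because wages grow faster than benefits.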

<!-- conditional cash transfer -->
@Debowicz2014 conduct a study looking at the impact of the cash transfer programme Oportunidades in Mexico, conditioned on the school attendance of a household's children, on income inequality among other outcomes.
It finds that a combination of effects raises the average income of the poorest households by 23 percent.
The authors argue in the short run this benefits households through the direct cash influx itself, as well as generating a positive wage effect benefitting those who keep their children at work.
For the estimation of income inequality it uses the Gini coefficient.
Additionally, over the long term, for the children in the model there is a direct benefit for those whose human capital is increased due to the programme, but also an indirect benefit for those who did not increase their human capital, because of the increased scarcity of unskilled labour as a secondary effect.
Due to the relatively low cost of the programme if correctly targeted, it seems to have a significantly positive effect on the Mexican economy and its income equality.
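
As a reference for the inequality measure used here, the Gini coefficient can be computed directly from a list of incomes.
The following is a minimal sketch and not the estimation procedure of @Debowicz2014; the function name `gini` and the income values are hypothetical and chosen purely for illustration.

```python
import numpy as np


def gini(incomes):
    """Gini coefficient of non-negative incomes (0 = perfectly equal)."""
    x = np.sort(np.asarray(incomes, dtype=float))
    n = x.size
    if n == 0 or x.sum() == 0:
        raise ValueError("need at least one positive income")
    # Rank-weighted form of the mean absolute difference definition
    ranks = np.arange(1, n + 1)
    return float((2 * ranks - n - 1) @ x / (n * x.sum()))


# Hypothetical household incomes before and after a transfer to the poorest
before = [10, 20, 30, 40, 100]
after = [15, 25, 30, 40, 100]
print(round(gini(before), 3), round(gini(after), 3))
```

With these illustrative values the coefficient is lower after the transfer, reflecting a more equal distribution.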

In a study on the labour force impacts for women @Hardoy2015 look at the effects of reducing overall child care costs in Norway through subsidies.
It finds that overall the reductions in child care cost increased the female labour supply in the country (by about 5 per cent),
while there were no significant impacts on mothers who already participated in the labour market.
It also finds some internal heterogeneity, with the impact being strongest for low-education mothers and low-income households,
a finding the authors expected due to day care expenditure representing a larger part of those households' budgets thus creating a larger impact,
though it may alternatively be generated by the lower average pre-intervention employment rate for those households.
Interestingly, when disaggregating by native and immigrant mothers there is only a significant impact on native mothers,
though the authors do not offer an explanation for this difference.
A limitation of the study is that there was a simultaneous child care capacity increase in the country,
which may bias the labour market results due to being affected by both the cost reduction and the capacity increase.

<!-- health care -->
@Carstens2018 conduct an analysis of the potential factors influencing mentally ill individuals in the United States to participate in the labour force, using correlation between different programmes of Medicaid and labour force status.
In trying to find labour force participation predictors, it finds reduced depression and anxiety, increased responsibility and problem-solving, and stress management to be employment-motivating factors and positive predictors.
In turn, increased stress, discrimination based on their mental illness, loss of free time, loss of government benefits and tests for illegal drugs were listed as barriers negatively associated with labour force participation.
For the government benefits, it finds significant variations between the different varieties of Medicaid programmes, with the strongest negative correlation to labour force participation for Medicaid ABD, a programme for which it has to be demonstrated that an individual cannot work due to their disability.
The authors suggest this shows the primary channel through which the programme becomes a benefit trap, with disability being determined by not working and benefits disappearing when participants enter the labour force, creating dependency on the programme as a primary barrier.
Two limitations of the study are its small sample size due to a low response rate, and an over-representation of racial minorities, women and older persons in the sample mentioned as introducing possible downward bias for measured labour force participation rates.

<!-- UBI -->
<!-- TODO Potentially mention single sentence of Standing also looking into UBI -->
@Cieplinski2021 undertake a simulation study on the income inequality effects of both a policy targeting a reduction in working time and the introduction of a UBI in Italy.
It finds that while both decrease overall income inequality, measured through the Gini coefficient, they do so through different channels.
While provision of a UBI sustains aggregate demand, thereby spreading income in a more equitable manner,
working time reductions significantly decrease aggregate demand through lower individual income but significantly increase labour force participation and thus employment.
It also finds that through these channels of changing aggregate demand, the environmental outcomes diverge, with work time reduction decreasing and UBI increasing the overall ecological footprint.
One limitation of the study is the modelling assumption that workers will have to accept both lower income and lower consumption levels under a policy of work time reduction through stable labour market entry for the results to hold.

## Structural factors

## Agency factors

# Robustness of evidence

## Output chronology

The identified literature rises in volume over time between 2000 and 2023,
with first larger outputs identified from 2014 onwards,
as can be seen in @fig-publications-per-year.
While fluctuating overall, with significantly smaller output in 2017 and in turn significantly higher output in 2021,
the overall output volume strongly increased during this period.

```{python}
#| label: fig-publications-per-year
#| fig-cap: Publications per year

df_study_years = (
    df.groupby(["author", "year", "title"])
    .first()
    .reset_index()
    .drop_duplicates()
    ["year"].value_counts()
    .sort_index()
)
# use order to ensure all years are displayed, even ones without values
years_range = list(range(df_study_years.index.min(), df_study_years.index.max()+1))
ax = sns.barplot(df_study_years, order=years_range)

ax.set_ylabel("Count")
ax.set_xlabel("Year")
ax.tick_params(axis='x', rotation=90)
plt.tight_layout()
plt.show()
del df_study_years
```

Such anomalies can point to a dispersed or different focus during the time span,
newly arising alternative term clusters which have not been captured by the search query
or a diversion of efforts towards different interventions or policies.
Their temporary nature, however, makes non-permanent causes more likely than fundamental changes to approaches or terms which could signal more biased results for this review.

The literature is predominantly based on white literature, with only a marginal amount solely published as grey literature.
Such a gap in volume seems expected with the database query efforts primarily aimed at finding the most current versions of white literature.
It also points to a well targeted identification procedure, with more up-to-date white literature correctly superseding potential previous grey publications.
@fig-citations-per-year-avg shows the average number of citations for all studies published within an individual year.

```{python}
#| label: fig-citations-per-year-avg
#| fig-cap: Average citations per year
# average citations per publication year; rows without a citation count are skipped
df_avg_citations = (
    df.dropna(subset=["zot_cited"])
    .astype({"zot_cited": "int"})
    .groupby(["year"], as_index=False)["zot_cited"]
    .mean()
)
fig, ax = plt.subplots()
ax.bar(df_avg_citations["year"], df_avg_citations["zot_cited"])
sns.regplot(x=df_avg_citations["year"], y=df_avg_citations["zot_cited"], ax=ax)
#ax = sns.lmplot(data=df_avg_citations, x="year", y="zot_cited", fit_reg=True)

ax.set_ylabel("Citations")
ax.set_xlabel("Year")
plt.tight_layout()
years_range = list(range(df_avg_citations["year"].min(), df_avg_citations["year"].max()+1))
ax.set_xticks(years_range)
ax.tick_params(axis='x', rotation=90)
plt.show()
del df_avg_citations
```

From the literature sample, several patterns emerge:
First, in general, citation counts are slightly decreasing over time ---
a trend which should generally be expected as less time has passed to allow newer studies' contents to be distributed and fewer repeat citations to have occurred.
Second, larger changes between individual years appear more erratic.
Taken together, this suggests no overall decrease in academic interest in the topic over time,
though it may point to the volume of relevant output not necessarily rising as steadily as overall output.

Early outliers also suggest clearly influential individual studies having been produced during those years,
a possibility which may be more relevant during years of more singular releases (such as 2011 and 2013).
This is because, as @fig-publications-per-year showed, the overall output was nowhere near as rich as in the following years, allowing single influential works to skew the visible means for those years.

## Validity ranking

Finally, following @Maitrot2017, the relevant studies are ranked for their validity.
Here, a 2-dimensional approach is taken to separate the external validity from the internal validity of the studies.
The ranking process then uses the representativeness of a study's underlying dataset,
from a non-representative survey sample, through a sub-nationally representative sample and a nationally representative sample, to the use of census data,
to arrive at a ranking between 2.0 and 5.0 respectively.
Similarly, the studies are ranked for internal validity using the study design,
with only quasi-experimental and experimental studies receiving similar rankings between 2.0 and 5.0 depending on the individually applied methods due to their quantifiability,
while observational and qualitative studies go without an internal validity rank (0.0) due to the more contextual nature of their analyses.
For a full list of validity ranks, see @appbtbl-validity-external and @appbtbl-validity-internal.
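
The two-dimensional ranking can be expressed as a pair of lookup tables.
The following sketch is an illustration only and not the code used for this review: the dictionary and function names are hypothetical, while the rank values are those listed in @appbtbl-validity-external and @appbtbl-validity-internal.

```python
# Rank values follow the Appendix B tables; all names are illustrative.
EXTERNAL_RANKS = {
    "non-representative": 2.0,
    "subnational": 3.0,
    "national": 4.0,
    "census": 5.0,
}
INTERNAL_RANKS = {
    "ordinary least squares": 2.0,
    "fixed-effects": 2.0,
    "discontinuity matching": 3.0,
    "difference in difference": 3.0,
    "propensity score matching": 3.5,
    "instrumental variable": 4.0,
    "generalised method of moments": 4.0,
    "regression discontinuity": 4.5,
    "randomised control trial": 5.0,
}
RANKED_DESIGNS = {"quasi-experimental", "experimental"}


def validity(design, method, representativeness):
    """Return (internal, external) validity ranks for a single study."""
    # Observational and qualitative designs receive no internal rank (0.0)
    internal = INTERNAL_RANKS[method] if design in RANKED_DESIGNS else 0.0
    external = EXTERNAL_RANKS[representativeness]
    return internal, external


print(validity("experimental", "randomised control trial", "subnational"))
print(validity("qualitative", None, "non-representative"))
```

Keeping the two dimensions as separate lookups mirrors the separation of internal and external validity in the ranking itself.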

Using the validity ranking separated into internal and external validity for each study,
it is possible to identify the general make-up of the overall sample,
the relationship between both dimensions and the distribution of studies within.

@fig-validity-relation shows the relation between each study's validity on the internal dimension and the external dimension,
with experimental studies additionally distinguished.
Generally, studies that have a lower internal validity, between 2.0 and 3.5, rank higher on their external validity,
while studies with higher internal validity in turn do not reach as high on the external validity ranking.

```{python}
#| label: fig-validity-relation
#| fig-cap: "Relation between internal and external validity"

plt.figure().set_figheight(5)
sns.violinplot(
    data=validities,
    x="Internal Validity", y="External Validity", hue="design",
    cut=0, bw_method="scott",
    orient="x"
)
sns.swarmplot(
    data=validities,
    x="Internal Validity", y="External Validity", legend=False,
    color="darkmagenta",
    s=4
)
```

Studies with an internal validity ranking of 3.0 (primarily made up of difference-in-difference approaches) and an internal ranking of 5.0 (randomised control trials) have the same tight clustering around an external validity between 4.0 (national) and 5.0 (census-based), and 2.0 (local) and 3.0 (subnational), respectively.
This clearly shows the expected overall relationship of studies with high internal validity generally ranking lower on their external validity.

The situation is less clear-cut with the internal rankings of 2.0 (primarily ordinary least squares) and 4.0 (primarily instrumental variable),
which show a larger external validity spread.
For 2.0-ranked studies, there is an overall larger spread with most using nationally representative data,
while a significant number make use of census-based data and others in turn are only subnationally representative.
Studies ranked 4.0 internally have a higher heterogeneity with the significant outlier of @Thoresen2021,
which had the limitation of its underlying data being non-representative.

Looking at the overall density of studies along their external validity dimension,
@fig-validity-distribution reiterates this overall relationship with internal validity.
It additionally shows that studies with low internal validity make up the dominant number of nationally representative analyses and the slight majority of census-based analyses,
while locally or non-representative samples are almost solely made up of internally highly valid (ranking 4.0 or above) analyses,
again with the exception of @Thoresen2021 already mentioned.

```{python}
#| label: fig-validity-distribution
#| fig-cap: "Distribution of internal validities"

sns.displot(
    data=validities,
    x="External Validity", hue="Internal Validity",
    kind="kde",
    multiple="fill", clip=(0, None),
    palette="ch:rot=-0.5,hue=1.5,light=0.9",
    bw_adjust=.65, cut=0,
    warn_singular = False
)
```

Looking at the data per region, census-based studies are primarily spread between Latin America and the Caribbean, as well as Europe and Central Asia.
Meanwhile, studies using nationally, subnationally or non-representative data tend to have a larger focus on North America, as well as East Asia and the Pacific.
A slight trend towards studies focusing on evidence-based research in developing countries is visible,
though with an overall rising output, as could be seen in @fig-publications-per-year, and possibly a reliance on more recent datasets, this would be expected.

## Regional spread

As can be seen in @fig-region-counts, taken by region for the overall study sample,
the evidence base receives a relatively even split between the World Bank regional country groupings with the exception of the Middle East and North Africa (MENA) region,
in which fewer studies have been identified.

```{python}
#| label: fig-region-counts
#| fig-cap: Studies by regions analysed

by_region = (
    df[["region"]]
    .assign(
        region = lambda _df: (_df["region"]
            .str.replace(r" ?; ?", ";", regex=True)
            .str.strip()
            .str.split(";")
        )
    )
    .explode("region")
    .reset_index(drop=True)
)
ax = sns.countplot(by_region, x="region", order=by_region["region"].value_counts().index)
plt.setp(ax.get_xticklabels(), rotation=45, ha="right",
         rotation_mode="anchor")
plt.show()
del by_region

def regions_for_inequality(df, inequality: str):
    """Plot the number of studies per region for a single inequality type."""
    df_temp = df.loc[(df["inequality"] == inequality)]
    return sns.countplot(df_temp, x="region", order=df_temp["region"].value_counts().index)
```

Most studies come from a context of East Asia and the Pacific, though with an almost equal number analysing Europe and Central Asia.
With slightly fewer studies, the contexts of North America and Sub-Saharan Africa follow in number of analyses,
and in turn Latin America and the Caribbean and South Asia, with an equal number of studies for each region.

The lower number of studies stemming from a MENA context can point to a variety of underlying causes:
First, it is possible that there is simply not as much evidence-based analysis undertaken for countries in the region as for other national or subnational contexts,
with research either following a more theoretical trajectory, or missing the underlying data collection that is available for other regional contexts.

However, it cannot be ruled out that the search protocol itself did not capture the same depth of analytical material as for other contexts,
with each region often having a specific focus both in policy orientation and academically,
and in some cases also differing underlying term bases.
Such contextual term differences may then not be captured adequately by the existing query terms and would point to a necessity to re-align the query to the required specifics.

One reason for such a differentiation could be a larger amount of grey literature captured compared to other regions,
which may be utilising less established terms than the majority of captured literature for policy implementations.
Another reason could be the actual implementation of different policy programmes which are then equally not captured by existing term clusters.

# Discussion

# Conclusions

# Bibliography

::: {#refs}

:::

# Appendices {.appendix .unnumbered}

## Appendix A - Term clusters {.unnumbered}

::: {#appatbl-wow-terms}

```{python}
terms_wow = pd.read_csv(f"{g.SUPPLEMENTARY_DATA}/terms_wow.csv")
Markdown(tabulate(terms_wow.fillna(""), showindex=False, headers="keys", tablefmt="grid"))
```

World of work term cluster

:::

::: {#appatbl-intervention-terms}

```{python}
terms_policy = pd.read_csv(f"{g.SUPPLEMENTARY_DATA}/terms_policy.csv")
# different headers to include 'social norms'
headers = ["General", "Institutional", "Structural", "Agency & social norms"]
Markdown(tabulate(terms_policy.fillna(""), showindex=False, headers=headers, tablefmt="grid"))
```

Policy intervention term cluster

:::

::: {#appatbl-inequality-terms}

```{python}
terms_inequality = pd.read_csv(f"{g.SUPPLEMENTARY_DATA}/terms_inequality.csv")
Markdown(tabulate(terms_inequality.fillna(""), showindex=False, headers="keys", tablefmt="grid"))
```

Inequality term cluster

:::

## Appendix B - Validity rankings {#sec-appendix-validity-rankings .unnumbered}

::: {#appbtbl-validity-external}

| Representativeness                          | Ranking |
| ---                                         | ---     |
| non-representative survey/dataset           | 2.0     |
| subnationally representative survey/dataset | 3.0     |
| nationally representative survey/dataset    | 4.0     |
| census-based dataset                        | 5.0     |

External validity ranking. Adapted from @Maitrot2017.

:::

::: {#appbtbl-validity-internal}

| Method                                         | Ranking |
| ---                                            | ---     |
| ordinary least squares & fixed-effects         | 2.0     |
| discontinuity matching                         | 3.0     |
| difference in difference (& triple difference) | 3.0     |
| propensity score matching                      | 3.5     |
| instrumental variable                          | 4.0     |
| generalised method of moments                  | 4.0     |
| regression discontinuity                       | 4.5     |
| randomised control trial                       | 5.0     |

Internal validity ranking. Adapted from @Maitrot2017.

:::

## Appendix C - Boolean search query {.unnumbered}

```{python}
#| label: full-search-query
#| echo: false
#| output: asis
with open(f"{g.SUPPLEMENTARY_DATA}/query.txt") as f:
    query = f.read()

t3 = "`" * 3
print(f"""
```sql
{query}
{t3}
""")
```