---
title: Inequalities in the World of Work
subtitle: What do we know?
author:
    - name: Miguel Niño-Zarazúa
      email: mn39@soas.ac.uk
      affiliations:
        - id: soas
          name: SOAS University of London
          department: Department of Economics
      attributes:
        corresponding: true
    - name: Marty Oehme
      email: mail@martyoeh.me
date: last-modified
abstract: |
    We are researching the effectiveness of policies which target inequalities on the labour market.
keywords:
    - labour markets
    - world of work
    - inequality
    - policy
    - systematic scoping review
lang: en
crossref: # to fix the appendix crossrefs being separate from main
  custom:
    - kind: float
      key: appatbl
      latex-env: appatbl
      reference-prefix: Table A
      space-before-numbering: false
      latex-list-of-description: Appendix A Table
    - kind: float
      key: appbtbl
      latex-env: appbtbl
      reference-prefix: Table B
      space-before-numbering: false
      latex-list-of-description: Appendix B Table
---

{{< portrait >}}

```{python}
#| label: load-data
#| echo: false
#| output: false
from pathlib import Path
import re
## standard imports
from IPython.core.display import Markdown as md
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns
from tabulate import tabulate
import bibtexparser

sns.set_style("whitegrid")

DATA_DIR=Path("./02-data")
RAW_DATA=DATA_DIR.joinpath("raw")
WORKING_DATA=DATA_DIR.joinpath("intermediate")
PROCESSED_DATA=DATA_DIR.joinpath("processed")
SUPPLEMENTARY_DATA=DATA_DIR.joinpath("supplementary")

from src import prep_data

# raw database-search results
bib_sample_raw_db = prep_data.bib_library_from_dir(RAW_DATA)
# the complete library of sampled (and working) literature
bib_sample = prep_data.bib_library_from_dir(WORKING_DATA)

# load relevant studies
from src import load_data

bib_df = prep_data.observations_with_metadata_df(
    raw_observations = load_data.from_yml(PROCESSED_DATA),
    study_metadata = prep_data.bib_metadata_df(bib_sample),
    country_groups = prep_data.country_groups_df(Path(f"{SUPPLEMENTARY_DATA}/wb-country-groupings.xlsx")),
)
raw_observations = None
zot_df = None
df_country_groups = None
```

# Introduction

* Context and statement of the problem
* Aims and rationale of the systematic scoping review
* Summary of the main findings
* Description of the structure of the paper

# Conceptual framework

* Theories, policies, mechanisms and outcomes

# Review methodology

The following section will discuss the methodology that is proposed to conduct the review of the literature on policy interventions that are expected to address inequalities in forms of work and labour market outcomes,
as well as give an overview of the collected data.
This study follows the principles of a systematic review framework, to systematically assess the impact of an array of policies on inequalities in the world of work.
It strives to follow the clear and reproducible method of identification prior to synthesis of relevant research,
while limiting "bias by the systematic assembly, critical appraisal and synthesis" through applying scientific strategies to the review itself [@Cook1995].
It thereby attempts to provide an improved basis for comparative analysis between studies through the rigorous application of systematic criteria and thus to avoid the potential bias of narrative reviews.

Unlike purely systematic reviews, which typically focus on specific policy questions and interventions, systematic scoping reviews focus on a wider spectrum of policies, where different study designs and research questions can be investigated.
Since scoping reviews allow both broad and in-depth analyses, they are the most appropriate rigorous method to make a synthesis of the current evidence in this area [@Arksey2005].

The scoping review allows broad focus to be given to a subject for which no unified path with clear edges has been laid out yet by prior reviews, as remains the case with policies targeting inequalities in the world of work.
It does so by taking a breadth-first approach, with a search protocol that first works through a large body of literature and then moves toward a depth-favouring approach once the literature has been sufficiently delimited.
Its purpose, clearly mapping a body of literature on a (broad) topic area, is thereby useful on its own or in combination with a systematic approach [@Arksey2005].
Increasingly adopted in recent years, the approach applies a rigorous dichotomy of inclusion and exclusion criteria to chart the relevance of individual studies against the overall body of literature, while striving to avoid biases which could skew the resulting literature sample [@Pham2014].

## Inclusion criteria

Concise narrowing criteria are applied to restrict the sample to studies i) looking at the effects of individual evidence-based policy measures or intervention initiatives, ii) attempting to address one or more of the defined inequalities in the world of work,
and iii) using appropriate quantitative methods to examine the links between intervention and impact on the given inequalities.
The narrowing process makes use of the typology of inequalities, of forms of work, and of policy areas introduced above as its criteria.

An overview of the respective criteria used for inclusion or exclusion can be found in @tbl-inclusion-criteria.
It restricts studies to those that comprise primary research published after 2000,
with a focus on the narrowing criteria specified in @tbl-inclusion-criteria.

::: {#tbl-inclusion-criteria}

```{python}
#| label: tbl-inclusion-criteria

inclusion_criteria = pd.read_csv("02-data/supplementary/inclusion-criteria.tsv", sep="\t")
md(tabulate(inclusion_criteria, showindex=False, headers="keys", tablefmt="grid"))
```

Source: Author's elaboration

Study inclusion and exclusion scoping criteria

:::

## Search protocol

The search protocol follows a three-staged process of execution: identification, screening and extraction.
First, in identification, the relevant policy, inequality and world of work related dimensions are combined through Boolean operators to conduct a search through the database repository Web of Science and supplemental searches via Google Scholar to supply potential grey literature.
While the resulting study pools could be screened in multiple languages, the search queries themselves are passed to the databases in English only.
Relevant results are then complemented through the adoption of a 'snowballing' technique,
in which an array of identified adjacent published reviews is analysed for their reference lists to find cross-references of potentially missing literature and in turn add those to the pool of studies.

To identify potential studies and create an initial sample, relevant terms for the clusters of world of work, inequality and policy interventions have been extracted from the existing reviews as well as the ILO definitions.[^existingreviews]

[^existingreviews]: TODO: citation of existing reviews used; ILO definitions if mentioned

Identified terms for the world of work, intervention and inequality clusters can be found in the Appendix tables @appatbl-wow-terms, @appatbl-intervention-terms, and @appatbl-inequality-terms,
with the search query requiring a term from the general column and one other column of each table respectively.
Each cluster is made up of a general signifier (such as “work”, “inequality” or “intervention”) which has to be labelled in a study to form part of the sample,
as well as any additional terms looking into one or multiple specific dimensions or categories of these signifiers (such as “domestic” work, “gender” inequality, “maternity leave” intervention).
For the database query, a single term from the respective general category is required to be included in addition to one term from any of the remaining categories.
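
The query construction described above can be sketched as follows.
This is a minimal illustration only, not the query actually submitted to the databases: the term lists are abbreviated, the entries marked as hypothetical do not come from the appendix tables, and the function names `any_of`, `cluster_query` and `build_query` are assumptions made for this example.

```python
# Abbreviated term lists per cluster. "general" holds the general signifier,
# "specific" stands in for all remaining columns of the respective table.
CLUSTERS = {
    "world of work": {
        "general": ["work"],
        "specific": ["domestic", "informal"],  # "informal" is hypothetical
    },
    "inequality": {
        "general": ["inequality"],
        "specific": ["gender", "income"],
    },
    "intervention": {
        "general": ["intervention"],
        "specific": ["maternity leave", "minimum wage"],
    },
}


def any_of(terms):
    """Join terms with OR, quoting multi-word phrases."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"


def cluster_query(cluster):
    """Require one general term AND one term from any other column."""
    return f"({any_of(cluster['general'])} AND {any_of(cluster['specific'])})"


def build_query(clusters):
    """Combine the clusters with AND so that every cluster has to match."""
    return " AND ".join(cluster_query(c) for c in clusters.values())


query = build_query(CLUSTERS)
print(query)
```

Keeping the clusters in a single mapping means that adding a term to one of the appendix tables only requires changing one list, while the Boolean structure of the query stays the same.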

Second, in screening, duplicate results are removed and the resulting literature sample is sorted according to a set of exclusion characteristics:
language, title, abstract, full text and literature supersession through newer publications.
Properties in these characteristics are used to assess an individual study on its suitability for further review in concert with the inclusion criteria mentioned in @tbl-inclusion-criteria.

To facilitate the screening process, with the help of 'Zotero' reference manager a system of keywords is used to tag individual studies in the sample with their reason for exclusion,
such as 'excluded::language', 'excluded::title', 'excluded::abstract', and 'excluded::superseded'.
This keyword-based system is equally used to further categorize the sample studies that do not fall into exclusion criteria, based on primary country of analysis, world region, as well as income level classification.
To that end, the prefixes 'country::', 'region::' and 'income::' are used to disambiguate between the respective characteristics, such as 'region::LAC' for Latin America and the Caribbean or 'region::SSA' for Sub-Saharan Africa, as well as, for example, 'income::low-middle', 'income::upper-middle' or 'income::high'.
These two delineations follow the ILO categorizations on world regions and the country income classifications based on World Bank income groupings [@ILO2022].
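
As a rough sketch of how such prefixed keywords can be read back from a reference library, the following splits a keyword string into its prefixes and values.
It assumes that keywords are stored as a single comma-separated string, which depends on how the reference manager exports them; the function name `parse_tags` and the example string are hypothetical.

```python
from collections import defaultdict


def parse_tags(keywords: str, sep: str = ",") -> dict[str, list[str]]:
    """Group 'prefix::value' keywords by prefix, ignoring untagged keywords."""
    tags = defaultdict(list)
    for raw in keywords.split(sep):
        tag = raw.strip()
        if "::" not in tag:
            continue  # a plain keyword without a prefix
        prefix, value = tag.split("::", 1)
        tags[prefix].append(value)
    return dict(tags)


# Hypothetical keyword string of a single study
example = "region::SSA, income::low-middle, labour markets"
parsed = parse_tags(example)
print(parsed)
```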

Similarly, if a specific type of inequality, or a specific intervention, represents the focus of a study, these will be reflected in the same keyword system (such as 'inequality::income' or 'inequality::gender').
The complete process of identification and screening is undertaken with the help of the Zotero reference manager.
Last, for extraction, studies are screened for their full-texts, irrelevant studies excluded with 'excluded::full-text' as explained above and relevant studies then ingested into the final sample pool.

Should any literature reviews be identified as relevant during this screening process,
they will in turn be crawled for cited sources in a 'snowballing' process.
The sources will be added to the sample to undergo the same screening process explained above,
ultimately resulting in the process represented in the PRISMA chart in @fig-prisma.

```{python}
#| label: calculate-scoping-flowchart
#| echo: false
#| output: asis

nr_database_query_raw = len(bib_sample_raw_db.entries)
nr_snowballing_raw = 2240

all_keywords = [entry["keywords"] for entry in bib_sample.entries if "keywords" in entry.fields_dict.keys()]
nr_database_deduplicated = len([1 for kw in all_keywords if "sample::database" in kw])
nr_snowballing_deduplicated = len([1 for kw in all_keywords if "sample::snowballing" in kw])
nr_out_superseded = len([1 for kw in all_keywords if "out::superseded" in kw])

FULL_RAW_SAMPLE_NOTHING_REMOVED = nr_database_query_raw + nr_snowballing_raw
FULL_SAMPLE_DUPLICATES_REMOVED = nr_database_deduplicated + nr_snowballing_deduplicated + nr_out_superseded

NON_ZOTERO_CAPTURE_TITLE_REMOVAL = 1150
NON_ZOTERO_CAPTURE_ABSTRACT_REMOVAL = 727
NON_ZOTERO_CAPTURE_FULLTEXT_REMOVAL = 348

nr_out_duplicates = FULL_RAW_SAMPLE_NOTHING_REMOVED - FULL_SAMPLE_DUPLICATES_REMOVED
nr_out_title = len([1 for kw in all_keywords if "out::title" in kw]) + NON_ZOTERO_CAPTURE_TITLE_REMOVAL
nr_out_abstract = len([1 for kw in all_keywords if "out::abstract" in kw]) + NON_ZOTERO_CAPTURE_ABSTRACT_REMOVAL
nr_out_fulltext = len([1 for kw in all_keywords if "out::full-text" in kw]) + NON_ZOTERO_CAPTURE_FULLTEXT_REMOVAL
nr_out_language = len([1 for kw in all_keywords if "out::language" in kw])
nr_extraction_done = len([1 for kw in all_keywords if "done::extracted" in kw])

t3 = "`" * 3
# FIXME use 02-data/supplementary non-deduplicated counts to get database starting and snowballing counts
# from: https://github.com/quarto-dev/quarto-cli/discussions/6508
print(f"""
```{{mermaid}}
%%| label: fig-prisma
%%| fig-cap: "Sample sorting process through identification and screening"
%%| fig-width: 6
flowchart TD;
    search_db["Records identified through database searching (n={nr_database_query_raw})"] --> starting_sample;
    search_prev["Records identified through other sources (n={nr_snowballing_raw})"] --> starting_sample["Starting sample (n={FULL_RAW_SAMPLE_NOTHING_REMOVED})"];

    starting_sample -- "Duplicate removal ({nr_out_duplicates+nr_out_superseded} removed) "--> dedup["Records after duplicates removed (n={FULL_SAMPLE_DUPLICATES_REMOVED})"];

    dedup -- "Title screening ({nr_out_title} excluded)" --> title_screened["Records after titles screened (n={FULL_SAMPLE_DUPLICATES_REMOVED - nr_out_title})"];

    title_screened -- "Abstract screening ({nr_out_abstract} excluded)"--> abstract_screened["Records after abstracts screened (n={FULL_SAMPLE_DUPLICATES_REMOVED-nr_out_title-nr_out_abstract})"];

    abstract_screened -- "  Language screening ({nr_out_language} excluded)  "--> language_screened["Records after language screened (n={FULL_SAMPLE_DUPLICATES_REMOVED-nr_out_title-nr_out_abstract-nr_out_language})"];

    language_screened -- "  Full-text screening ({nr_out_fulltext} excluded)  "--> full-text_screened["Full-text articles assessed for eligibility (n={nr_extraction_done})"];
{t3}
""")
```

All relevant data concerning both their major findings and statistical significance are then extracted from the individual studies into a collective results matrix.
The results to be identified in the matrix include a study's: i) key outcome measures (dependent variables), ii) main findings, iii) main policy interventions (independent variables), iv) study design and sample size, v) dataset and methods of evaluation, vi) direction of relation and level of representativeness, vii) level of statistical significance, viii) main limitations.

The query execution results in an initial sample of `{python} nr_database_query_raw` potential studies identified from the database search as well as `{python} nr_snowballing_raw` potential studies from other sources, leading to a total initial number of `{python} FULL_RAW_SAMPLE_NOTHING_REMOVED`.
This accounts for all identified studies without duplicate removal, without controlling for literature that has been superseded or applying any other screening criteria.
Of these, `{python} FULL_SAMPLE_DUPLICATES_REMOVED-nr_out_title-nr_out_abstract-nr_out_language` have been identified as potentially relevant studies for the purposes of this scoping review and selected for a full text review,
from which in turn `{python} nr_extraction_done` have ultimately been extracted.

@fig-intervention-types shows the predominant interventions contained in the reviewed literature.
Overall, there is a focus on measures of minimum wage, subsidisation, considerations of trade liberalisation and collective bargaining, education and training.
The entire spread of policies captures interventions aimed primarily at institutional and structural mechanisms, but also mechanisms focused on individual agency.

```{python}
#| label: fig-intervention-types
#| fig-cap: Available studies by primary type of intervention

by_intervention = (
    bib_df
    .fillna("")
    .groupby(["author", "year", "title", "design", "method", "representativeness", "citation"])
    .agg(
        {
            "intervention": lambda _col: "; ".join(_col),
        }
    )
    .reset_index()
    .drop_duplicates()
    .assign(
        intervention=lambda _df: _df["intervention"].apply(
            lambda _cell: set([x.strip() for x in re.sub(r"\(.*\)", "", _cell).split(";")])
        ),
    )
    .explode("intervention")
)
sort_order = by_intervention["intervention"].value_counts().index

fig = plt.figure()
fig.set_size_inches(6, 3)
ax = sns.countplot(by_intervention, x="intervention", order=sort_order)
plt.setp(ax.get_xticklabels(), rotation=45, ha="right",
         rotation_mode="anchor")
plt.show()
```

# Synthesis of evidence

Since policies employed in the pursuit of increased equality can involve a wide range of actors, strategic approaches and implementation details,
the following synthesis will first categorise studies by main thematic area and its associated interventions.
Individual observations are then distinguished descriptively by the primary outcome variables (inequalities) of interest.
Thus, in the following synthesis each reviewed study will be analysed through the primary policies or mechanisms they use as independent variables to analyse the effects on a variety of inequalities.

One of the primary lenses of inequality in viewing policy interventions to reduce inequalities in the world of work is that of income,
often measured for all people throughout a country (vertical inequality) or subsets thereof (horizontal inequality).
At the same time, the primacy of income should not be overstated as disregarding the intersectional nature of inequalities could lead to diminished intervention outcomes through adverse targeting.

Each main thematic area will be preceded by a table presenting a summary of findings for the respective policies,
their identified channels and an estimation of their strength of evidence base.
Afterwards, the analytical lens will be inverted for the discussion and the reviewed studies discussed from a perspective of their analysed inequalities and limitations,
to better identify areas of strong analytical lenses or areas of more limited analyses.

## Institutional factors

{{< portrait >}}

::: {#tbl-findings-institutional}

```{python}
#| label: tbl-findings-institutional
from src.model import validity

study_strength_bins = {
    0.0: r"\-",
    5.0: r"\+",
    10.0: r"\++",
}
def strength_for(val):
    return list(study_strength_bins.keys())[list(study_strength_bins.values()).index(val)]

findings_institutional = pd.read_csv("02-data/supplementary/findings-institutional.csv")
fd_df = validity.add_to_findings(findings_institutional, by_intervention, study_strength_bins)

md(tabulate(fd_df[["area of policy",  "internal_validity", "external_validity", "findings", "channels"]].fillna(""), showindex=False, headers=["area of policy", "internal strength", "external strength", "main findings", "channels"], tablefmt="grid"))
```

Note: Each main finding is presented with an internal strength of evidence and an external strength of evidence which describe the combined validities of the evidence base for the respective finding.
Validities are segmented to a weak (-) evidence base under a validity ranking of `{python} strength_for(r"\+")`,
evidential (+) from `{python} strength_for(r"\+")` and under `{python} strength_for(r"\++")` and strong evidence base (++) for `{python} strength_for(r"\++")` and above.

Summary of main findings for institutional policies

:::
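
The segmentation described in the table note can be illustrated with a short sketch that maps a numeric validity ranking to its symbol.
It only mirrors the thresholds stated in the note (0.0, 5.0 and 10.0) and is not the implementation used in `src.model.validity`; the function name `strength_label` is an assumption made for this example.

```python
import bisect

# Lower bounds of each bin and the symbol shown in the table
THRESHOLDS = [0.0, 5.0, 10.0]
LABELS = ["-", "+", "++"]


def strength_label(score: float) -> str:
    """Return the evidence-strength symbol for a validity ranking."""
    # bisect_right counts how many lower bounds are <= score
    idx = bisect.bisect_right(THRESHOLDS, score) - 1
    # Rankings below the lowest bound are treated as weak as well
    return LABELS[max(idx, 0)]


for score in (4.9, 5.0, 10.0):
    print(score, strength_label(score))
```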

{{< landscape >}}

### Labour laws and regulatory systems

@Adams2015 study the effects of labour, business and credit regulations and look at their long-term correlations with income inequality in developing countries from 1970 to 2012.
Additionally, the study looks at the effects of FDI and school enrolment, which will be reviewed in their respective policy sections.
They find that in MENA, SSA, LAC and to some extent AP, increased labour and business regulations are actually negatively related to equitable income distribution, with market regulation not having significant effects.
The authors identify developing countries lacking in institutional capability to accomplish regulatory policies optimized for benefits and see the need for policies requiring more specific targeting of inequality reduction as their agenda.
Overall, the authors suggest that regulatory policy in developing countries needs to be built for their specific contexts and not exported from developed countries due to their different institutional capabilities and structural make-up.
The study is limited in its design focus relying purely on the macro-level regional analyses and can thus,
when finding correlations towards income inequality, not necessarily drill down into their qualitative root causes.

<!-- maternity leave and benefits -->
@Broadway2020 study the introduction of universal paid maternal leave in Australia, looking at its impacts on mothers returning to work and the conditions they return under.
The study finds that, while there is a short-term decrease of mothers returning to work since they make use of the introduced leave period, over the long-term (after six to nine months) there is a significant positive impact on return to work.
Furthermore, there is a positive impact on returning to work in the same job and under the same conditions,
the effects of which are stronger for more disadvantaged mothers (measured through income, education and access to employer-funded leave).
This suggests that the intervention reduced the opportunity costs for delaying the return to work, and especially for those women that did not have employer-funded leave options, directly benefiting more disadvantaged mothers.
Some potential biases of the study are its inability to account for child-care costs, as well as not being able to fully exclude selection bias into motherhood.
There also remains the potential of results being biased through pre-birth labour supply effects or the results of the financial crisis, which may create a downward bias for either the short- or long-term effects.

@Dustmann2012 analyse the long-run effects on children's outcomes of increasing the period of paid leave for mothers in Germany.
While the study focuses on the children's outcomes, it also analyses the effects on the return to work rates and cumulative incomes of the policies within the first 40 months after childbirth.
It finds that, while short-term increases of paid leave periods (up to 6 months) significantly increased incomes, over longer periods (10-36 months) the cumulative incomes in fact decreased significantly,
marginally for low-wage mothers for 10 month periods, and across all wage segments for 36 month periods.
For the share of mothers returning to work, it finds that there is a significant increase in the months away from work among all wage segments for all paid leave period increases, positively correlated with their length.
Still, similar numbers of mothers return once the leave period ends, though with significant decreases for leave periods from 18 to 36 months.
For its analysis of long-term educational outcomes on children, however, it does not find any evidence for the expansions improving children's outcomes, even suggesting a possible decrease of educational attainment for the paid leave extension to 36 months.[^dustmann-childoutcomes]
Some limitations of the study include its sample being restricted to mothers who go on maternity leave and some control group identification restrictions possibly introducing some sampling bias.

[^dustmann-childoutcomes]: The authors suggest that the negative effect for children under the long-term paid leave program of 36 months may stem from the fact that children require more external stimuli (aside from the mother) before this period ends, as well as the negative long-term effects of the mother's significantly reduced income for the long-term leave periods.

In a study on the effects of introductions of a variety of maternity leave laws in Japan, @Mun2018 look at the effects on employment numbers and job quality in managerial positions of women.
Contrary to notions of demand-side mechanisms of the welfare state paradox, with women being less represented in high-authority employment positions due to hiring or workplace discrimination against them with increased maternity benefits,
it finds that this is not the case for the Japanese labour market between 1992 and 2009.
There were no increases in hiring discrimination against women, and either no significant change in promotions for firms not providing paid leave before the laws or instead a positive impact on promotions for firms that already provided paid leave.
The authors suggest the additional promotions were primarily based on voluntary compliance of firms in order to maintain positive reputations,
signalled through a larger positive response to incentive-based laws than for mandate-based ones.
Additionally, the authors suggest that the welfare paradox may rather be due to supply-side mechanisms, based on individual career planning, as well as reinforced along existing gender divisions of household labour which may increase alongside the laws.
Limitations of the study include foremost its limited generalizability due to the unique Japanese institutional labour market structure (with many employments, for example, being within a single firm until retirement), as well as its inability so far to determine whether firms' adherence to the voluntary incentive-based labour policies has lasting effects or amounts to symbolic compliance and mere impression management.

@Davies2022 conduct a study on the return to work ratios for high-skill women workers in public academic universities in the United Kingdom, comparing the results for those in fixed-term contract work versus those in open-ended contracts.
It finds that there is a significantly decreased return to work probability for those working under fixed-term contracts, with most universities providing policies that offer more limited access to maternity payment for fixed-contract staff.
This is possibly due to provisions in the policies implicitly working against utilization under fixed-terms:
there are strict policies on payments if a contract ends before the maternity leave period is over, and obligations on repayments if not staying in the position long enough after the return to work.
Additionally, most policies require long-term continuous service before qualifying for enhanced payments in the maternity policies.
There is high internal heterogeneity between the universities, primarily due to the diverging maternity policy documents, with only a small number in the overall dataset providing favourable conditions for fixed-term work.

### Minimum wage laws

@Chao2022, in a study looking at the effects of minimum wage increases on a country's income inequality, analyse the impacts in a sample of 43 countries, both LMIC and HIC.
Using a general-equilibrium model, it finds that there are differences between the short-term and long-term effects of the increase:
In the short term it leads to a reduction of the skilled-unskilled wage gap, however an increase in unemployment and welfare,
while in the long term the results are an overall decrease in wage inequality as well as improved social welfare.
It finds those results primarily stem from LMIC which experience significant effects driven by a long-term firm exit from the urban manufacturing sector thereby increasing available capital for the rural agricultural sector, while in HIC the results remain insignificant.
The study uses the Gini coefficient for identifying a country's inequality.
Some limitations of the study include the necessity to omit short-term urban firm exit for the impact to be significant, as well as requiring the, reasonable but necessary, prior assumption of decreased inequality through increased rural agricultural capital.

@Alinaghi2020 conduct a study using a microsimulation to estimate the effects of a minimum wage increase in New Zealand on overall income inequality and further disaggregation along gender and poverty lines.
It finds limited redistributional effects for the policy, with negligible impact on overall income inequality and the possibility of actually increasing inequalities among lower percentile income households.
Additionally, while it finds a significant reduction in some poverty measures for sole parents that are in employment, when looking at sole parents overall the effects become insignificant again.
The authors suggest this points to bad programme targeting, which at best has negligible positive impact on income equality and at worst worsens income inequality in lower income households, due to many low-wage earners being the secondary earners of higher-income households but low-wage households often having no wage earners at all.
A pertinent limitation of the study is its large sample weights, which may bias the estimated impacts on specific groups such as sole parents, so their significance should not be overestimated.

In a study on the impacts of minimum wage increases in Ecuador, @Wong2019 specifically looks at the income and hours worked of low-wage earners to analyse the policy's effectiveness.
The study finds that, generally, there was a significant increase in the income of low-wage earners and also a significant increase in wage workers' hours worked, which would reflect positively on a decrease in the country's income inequality.
At the same time, it finds some potential negative effects on the income of high earners, suggesting an income-compression effect as employers freeze or reduce high earners' wages to offset low earners' new floors.
The findings hide internal heterogeneity, however:
For income the effect is largest for agricultural workers, while for women the effect is significantly smaller than for affected workers overall.
For hours worked there is a significant negative impact on women's hours worked, a fact which may point to a decreased intensive margin for female workers and thus also affect their lower income increases.
Limitations of the study include some sort-dependency in their panel data and only being able to account for effects during a period of economic growth.
Thus, while overall income inequality seems well targeted in the intervention, it may exacerbate the gender gap that already existed at the same time.

<!-- non-spatial policy but spatial effects -->
@Gilbert2001 undertake a study looking at the distributional effects of introducing a minimum wage in Britain, with a specific spatial component.
Overall it finds little effect on income inequality in the country.
It finds that the effects on rural areas differ depending on their proximity to urban areas.
While overall income inequality only decreases a small amount, the intervention results in effective targeting with remote rural households having around twice the reduction in inequality compared to others.
Rural areas that are accessible to urban markets are less affected, with insignificant impacts to overall income inequality reduction.
One limitation of the study is that it has to assume no effects on employment after the enactment of the minimum wage for its results to hold.

In a study on the impacts of minimum wage and direct cash transfers in Brazil on the country's income inequality,
@SilveiraNeto2011 especially analyse the way the policies interact with spatial inequalities.
It finds that incomes between regions have converged during the time frame and that, overall, the cash transfers under the 'Bolsa Familia' programme and the minimum wage accounted for 26.2% of the effect.
Minimum wage contributed 16.6% of the effect to overall Gini reduction between the regions while cash transfers accounted for 9.6% of the effect.
The authors argue that this highlights the way even non-spatial policies can have a positive (or, with worse targeting or selection, negative) influence on spatial inequalities,
as transfers occurring predominantly to poorer regions and minimum wages having larger impacts in those regions created quasi-regional effects without being explicitly addressed in the policies.
Some limitations include limited underlying data only making it possible to estimate the cash transfer impacts for the analysis end-line,
and minimum wage effects having to be constructed from the effects of wages equal to the minimum wage.

@Militaru2019 conduct an analysis of the effects of minimum wage increases on income inequality in Romania.
They find that, generally, minimum wage increases correlate with small wage inequality decreases, but carry a larger impact for women.
The channels for the policy's effects are two-fold in that there is an inequality decrease as the number of wage earners in the total number of employees increases,
as well as the concentration of workers at the minimum level mattering --- the probable channel for a larger impact on women since they make up larger parts of low-income and minimum wage households in Romania.
Limitations to the study are some remaining unobservables for the final inequality outcomes (such as other wages or incomes), the sample over-representing employees and not being able to account for any tax evasion or behavioural changes in the model.

@Sotomayor2021 conducts a study on the impact of subsequent minimum wage floor introductions on poverty and income inequality in Brazil.
He finds that in the short-term (3 months) wage floor increases reduced poverty by 2.8% and reduced income inequality by 2.4%.
Over the longer term, though, these impacts decrease,
and the minimum wage increases show diminishing returns only when the legal minimum is already high in relation to median earnings.
It suggests that additional unemployment costs, created through new job losses through the introduction, are offset by the increased benefits --- the higher wages for workers.
The author also suggests an inelastic relationship between minimum wage increases and poverty incidence.
One limitation of the study is that individuals cannot be tracked in the underlying data, which cannot account for people moving household to new locations.
The data can only track individual dwellings --- instead of the households and inhabitants within --- and thus resembles repeated cross-sectional data more than actual panel data.

### Collective bargaining

@Alexiou2023 study the effects of both political orientation of governments' parties and a country's trade unionisation on its income inequality.
They find that, generally, strong unionisation is strongly related to decreasing income inequality, most likely through a redistribution of political power through collective mobilization in national contexts of stronger unions.
It also suggests that in contexts of weaker unionisation, post-redistribution income inequality is higher, thus also fostering unequal redistributive policies.
Lastly, it finds positive relations between right-wing orientation of a country's government and its income inequality, with more mixed results for centrist governments pointing to potential fragmentations in their redistributive policy approaches.
The study is mostly limited in not being able to account for individual drivers (or barriers) and can thus not disaggregate the effects of, for example, arbitration or collective bargaining.

@Dieckhoff2015 undertake a study on the effect of trade unionisation in European labour markets, with a specific emphasis on its effects on gender inequalities.
It finds, first of all, that increased unionisation is related to the probability of being employed on a standard employment contract for both men and women.
It also finds no evidence that men seem to carry increased benefits from increased unionisation alone,
although in combination with temporary contract and family policy re-regulations, men can experience greater benefits than women.
At the same time women's employment under standard contracts does not decrease, such that there is no absolute detrimental effect for either gender.
It does, however, leave open the question of the allocation of relative benefits between the genders through unionisation efforts.
The study is limited in that, by averaging outcomes across European nations, it can not account for nation-specific labour market contexts or gender disaggregations.

@Cardinaleschi2019 study the wage gap in the Italian labour market, looking especially at the effects of collective negotiation practices.
It finds that the Italian labour market's wage gap exists primarily due to occupational segregation between the genders, with women often working in more 'feminized' industries, and not due to educational lag by women in Italy.
It also finds that collective negotiation practices targeting especially managerial representation and wages do address the gender pay gap, but only marginally significantly.
The primary channel for only marginal significance stems from internal heterogeneity in that only the median part of wage distributions is significantly affected by the measures.
Instead, the authors recommend a stronger mix of policy approaches, also considering the human-capital aspects with for example active labour-market policies targeting it.

@Ferguson2015 conducts a study on the effects of a more unionised workforce in the United States, on the representation of women and minorities in the management of enterprises.
It finds that while stronger unionisation is associated both with more women and more minorities represented in the overall workforce and in management, this effect is only marginally significant.
Additionally, there are drivers which may be based on unobservables and not a direct effect ---
it may be a selection effect of more unionised enterprises.
It uses union elections as its base of analysis, and thus can not exclude self-selection effects of people joining more heavily unionised enterprises rather than unionisation increasing representation in its conclusions.

@Ahumada2023, on the other hand, conduct a study on the effects of unequal distributions of political power on the extent and provision of collective labour rights.
It is a combination of quantitative global comparison with qualitative case studies for Argentina and Chile.
It finds that, for societies in which power is more unequally distributed, collective bargaining possibilities are more limited and weaker.
It suggests that, aside from a less entrenched trade unionisation in the country, the primary channel for its weakening is that existing collective labour rights are often either restricted or disregarded outright.
Employers were restricted in their ability to effectively conduct lobbying, and made more vulnerable to what the authors suggest are 'divide-and-conquer' strategies by government with a strongly entrenched trade unionisation, due to being more separate and uncoordinated.
A limit is the strong institutional context of the two countries which makes generalizable application of its underlying channels more difficult to the overarching quantitative analysis of inequality outcomes.

### Workfare programmes

@Whitworth2021 analyses the spatial consequences of a UK work programme on spatial factors of job deprivation or opportunity increases.
The programme follows a quasi-marketized approach of rewarding employment-favourable results of transitions into employment and further sustained months in employment.
The author argues, however, that the non-spatial implementation of the policy leads to spatial outcomes.
Founded on the approach of social 'creaming' and 'parking' and applied to the spatial dimension,
the study shows that already job-deprived areas indeed experience further deprivations under the programme,
while non-deprived areas are correlated with positive impacts, thereby further deteriorating spatial inequality outcomes.
This occurs because providers in the programme de-prioritize the already deprived areas ('parking') in favour of prioritizing wealthier areas for improved within-programme results.

@Li2022 conduct a study on the effects of previous inequalities on the outcomes of a work programme in India intended to provide job opportunity equality for already disadvantaged populations.
It specifically looks at the NREGA programme in India, and takes the land-ownership inequality measured through the Gini coefficient as its preceding inequality.[^nrega]
The study finds that there is a significantly negative relationship between the Gini coefficient and the provision of jobs through the work programme.
In other words, the workfare policy implemented at least in part to reduce a district's inequality seems to be less effective if there is a larger prior capital inequality.
The authors see the primary channel to be the landlords' opposition to broad workfare programme introduction since they are often followed by overall wage increases in the districts.
They suggest that in more unequally distributed districts the landlords can use a more unequal power structure to lobby and exert political power, decreasing the effectiveness of the programmes,
in addition to often reduced collective bargaining power on the side of labour in these districts.
The results show the same trends for measurement of land inequality using the share of land owned by the top 10 per cent largest holdings instead.
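
The two land inequality measures mentioned here can be illustrated with a minimal sketch.
It assumes a plain list of holding sizes per district; the function names and the example holdings are hypothetical and are not taken from @Li2022.

```python
import numpy as np


def gini(values):
    """Gini coefficient of non-negative values (0 means perfectly equal)."""
    x = np.sort(np.asarray(values, dtype=float))
    n = x.size
    if n == 0 or x.sum() == 0:
        return 0.0
    ranks = np.arange(1, n + 1)
    # rank-based formula applied to the ascending-sorted values
    return float((2 * (ranks * x).sum()) / (n * x.sum()) - (n + 1) / n)


def top_share(values, top=0.1):
    """Share of the total held by the largest `top` fraction of holdings."""
    x = np.sort(np.asarray(values, dtype=float))[::-1]
    k = max(1, int(np.ceil(top * x.size)))
    return float(x[:k].sum() / x.sum())


# hypothetical land holdings (in hectares) for two illustrative districts
equal_district = [2, 2, 2, 2, 2, 2, 2, 2, 2, 2]
unequal_district = [0.5, 0.5, 0.5, 0.5, 1, 1, 1, 2, 3, 10]

for name, holdings in [("equal", equal_district), ("unequal", unequal_district)]:
    print(name, round(gini(holdings), 3), round(top_share(holdings), 3))
```

Both measures rise with the concentration of holdings, which is why the second can plausibly serve as a robustness check for the first.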

[^nrega]: The National Rural Employment Guarantee Act (NREGA) established a workfare programme in India, the largest of its kind, which seeks to provide 100 days of employment for each household per year. It was rolled out from 2005 over several phases until it reached all districts in India in 2008.

### Social protection

<!-- TODO Should we include Pi2016 on social security? -->

<!-- social assistance benefits and wages -->
@Wang2016 undertake an observational study on the levels of social assistance benefits and wages in a national comparative study within 26 OECD countries.
It finds that real minimum income benefit levels generally increased in most countries from 1990 to 2009, with only a few countries, mostly in Eastern European welfare states, showing decreases during the time frame.
The majority of changes in real benefit levels are from deliberate policy changes and the study calculates them by a comparison of the changes in benefit levels to the changes in consumer prices.
Secondly, it finds that changes for income replacement rates are more mixed, with rates decreasing even in some countries which have increasing real benefits levels.
The study suggests this is because benefit levels are in most cases not linked to wages and policy changes also do not take changes in wages into account resulting in diverging benefit levels and wages, which may lead to exacerbating inequality gaps between income groups.
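
The divergence between real benefit levels and replacement rates can be made concrete with a small sketch.
All figures below are hypothetical and chosen only for illustration; the function names are likewise assumptions and do not reproduce the calculations of @Wang2016.

```python
def real_change(nominal_start, nominal_end, cpi_start, cpi_end):
    """Relative change in a benefit after deflating by consumer prices."""
    real_start = nominal_start / cpi_start
    real_end = nominal_end / cpi_end
    return real_end / real_start - 1


def replacement_rate(benefit, wage):
    """Benefit expressed as a share of the wage."""
    return benefit / wage


# hypothetical monthly benefit, consumer price index and wage for one country
benefit_1990, benefit_2009 = 500.0, 900.0
cpi_1990, cpi_2009 = 100.0, 150.0
wage_1990, wage_2009 = 1000.0, 2000.0

benefit_growth = real_change(benefit_1990, benefit_2009, cpi_1990, cpi_2009)
rate_1990 = replacement_rate(benefit_1990, wage_1990)
rate_2009 = replacement_rate(benefit_2009, wage_2009)
print(f"real benefit change: {benefit_growth:+.1%}")
print(f"replacement rate: {rate_1990:.1%} -> {rate_2009:.1%}")
```

In this constructed case the benefit grows faster than prices but slower than wages, so the real benefit level rises while the replacement rate falls.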

<!-- conditional cash transfer -->
@Debowicz2014 conduct a study looking at the impact of the cash transfer programme Oportunidades in Mexico, conditioned on a household's children's school attendance, on income inequality among others.
It finds that a combination of effects raises the average income of the poorest households by 23 percent.
The authors argue in the short run this benefits households through the direct cash influx itself, as well as generating a positive wage effect benefitting those who keep their children at work.
For the estimation of income inequality it uses the Gini coefficient.
Additionally, over the long-term for the children in the model there is a direct benefit for those whose human capital is increased due to the programme, but also an indirect benefit for those who did not increase their human capital, because of the increased scarcity of unskilled labor as a secondary effect.
Due to the relatively low cost of the programme if correctly targeted, it seems to have a significantly positive effect on the Mexican economy and its income equality.

In a study on the labour force impacts for women @Hardoy2015 look at the effects of reducing overall child care costs in Norway through subsidies.
It finds that overall the reductions in child care cost increased the female labour supply in the country (by about 5 per cent),
while there were no significant impacts on mothers who already participated in the labour market.
It also finds some internal heterogeneity, with the impact being strongest for low-education mothers and low-income households,
a finding the authors expected due to day care expenditure representing a larger part of those households' budgets thus creating a larger impact.
Though it may alternatively also be generated by the lower average pre-intervention employment rate for those households.
Interestingly when disaggregating by native and immigrant mothers there is only a significant impact on native mothers,
though the authors do not form an inference on why this difference would be.
A limitation of the study is that there was a simultaneous child care capacity increase in the country,
which may bias the labour market results due to being affected by both the cost reduction and the capacity increase.

<!-- health care -->
@Carstens2018 conduct an analysis of the potential factors influencing mentally ill individuals in the United States to participate in the labour force, using correlation between different programmes of Medicaid and labour force status.
In trying to find labour force participation predictors it finds employment motivating factors in reduced depression and anxiety, increased responsibility and problem-solving and stress management being positive predictors.
In turn, increased stress, discrimination based on their mental illness, loss of free time, loss of government benefits and tests for illegal drugs were listed as barriers negatively associated with labour force participation.
For the government benefits, it finds significant variations for the different varieties of Medicaid programmes, with the strongest negative labour force participation correlated to Medicaid ABD, a programme for which it has to be demonstrated that an individual cannot work due to their disability.
The authors suggest this shows the primary channel of the programme becoming a benefit trap, with disability being determined by not working and benefits disappearing when participants enter the labour force, creating dependency to the programme as a primary barrier.
Two limitations of the study are its small sample size due to a low response rate, and an over-representation of racial minorities, women and older persons in the sample mentioned as introducing possible downward bias for measured labour force participation rates.

<!-- UBI -->
<!-- TODO Potentially mention single sentence of Standing also looking into UBI -->
@Cieplinski2021 undertake a simulation study on the income inequality effects of both a policy targeting a reduction in working time and the introduction of a UBI in Italy.
It finds that while both decrease overall income inequality, measured through Gini coefficient, they do so through different channels.
While provision of a UBI sustains aggregate demand, thereby spreading income in a more equitable manner,
working time reductions significantly decrease aggregate demand through lower individual income but significantly increase labour force participation and thus employment.
It also finds that through these channels of changing aggregate demand, the environmental outcomes are oppositional, with work time reduction decreasing and UBI increasing the overall ecological footprint.
One limitation of the study is the modelling assumption that workers will have to accept both lower income and lower consumption levels under a policy of work time reduction through stable labour market entry for the results to hold.

## Structural factors

## Agency factors

# Robustness of evidence

## Output chronology

The identified literature rises in volume over time between 2000 and 2023,
with first larger outputs identified from 2014 onwards,
as can be seen in @fig-publications-per-year.
While fluctuating overall, with significantly smaller output in 2017 and in turn significantly higher output in 2021,
the overall output volume strongly increased during this period.

```{python}
#| label: fig-publications-per-year
#| fig-cap: Publications per year

df_study_years = (
    bib_df.groupby(["author", "year", "title"])
    .first()
    .reset_index()
    .drop_duplicates()
    ["year"].value_counts()
    .sort_index()
)
# use order to ensure all years are displayed, even ones without values
years_range = list(range(df_study_years.index.min(), df_study_years.index.max()+1))
ax = sns.barplot(df_study_years, order=years_range)

# label the axes once; the bars show publication counts, not citations
ax.set_ylabel("Count")
ax.set_xlabel("Year")
ax.tick_params(axis='x', rotation=90)
# lay out after rotating the tick labels so they are not cut off
plt.tight_layout()
plt.show()
df_study_years = None
```

Such anomalies can point to a dispersed or different focus during the time span,
newly arising alternative term clusters which have not been captured by the search query
or a diversion of efforts towards different interventions or policies.
Their temporary nature, however, makes non-permanent causes more likely than fundamental changes to approaches or terms which could signal more biased results for this review.

The literature is predominantly based on white literature, with only a marginal amount solely published as grey literature.
Such a gap in volume seems expected with the database query efforts primarily aimed at finding the most current versions of white literature.
It also points to a well targeted identification procedure, with more up-to-date white literature correctly superseding potential previous grey publications.
@fig-citations-per-year-avg shows the average number of citations for all studies published within an individual year.

```{python}
#| label: fig-citations-per-year-avg
#| fig-cap: Average citations per year
bib_df["zot_cited"] = bib_df["zot_cited"].dropna().astype("int")
grpd = bib_df.groupby(["year"], as_index=False)["zot_cited"].mean()
fig, ax = plt.subplots()
ax.bar(grpd["year"], grpd["zot_cited"])
sns.regplot(x=grpd["year"], y=grpd["zot_cited"], ax=ax)
#ax = sns.lmplot(data=grpd, x="year", y="zot_cited", fit_reg=True)

years_list = np.arange(2000, 2024).tolist()
ax.set_xticks(years_list)
ax.tick_params(axis='x', rotation=90)
ax.set_ylabel("Citations")
ax.set_xlabel("Year")
plt.tight_layout()
plt.show()
```

From the literature sample, several patterns emerge:
First, in general, citation counts are slightly decreasing over time ---
a trend which should generally be expected as less time has passed to allow newer studies' contents to be distributed and fewer repeat citations to have occurred.
Second, larger changes between individual years appear more erratically.
Taken together, this suggests that, though no overall decrease in academic interest in the topic occurred over time,
the volume of relevant output is not necessarily rising as steadily as overall output.

Early outliers also suggest clearly influential individual studies having been produced during those years,
a possibility which may be more relevant during years of more singular releases (such as 2011 and 2013).
This is because, as @fig-publications-per-year showed, the overall output was nowhere near as rich as in the following years, allowing single influential works to skew the visible means for those years.
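
A minimal sketch of this skew, using hypothetical citation counts rather than values from the review sample, compares the yearly mean with the median.
The column name `zot_cited` follows the figure code above; everything else is an assumption for illustration.

```python
import pandas as pd

# hypothetical citation counts: a sparse early year and a richer later year
toy = pd.DataFrame({
    "year": [2013, 2013, 2013, 2020, 2020, 2020, 2020, 2020],
    "zot_cited": [12, 18, 300, 20, 35, 15, 28, 22],
})

# the mean reacts strongly to one highly cited study, the median much less so
summary = toy.groupby("year")["zot_cited"].agg(["count", "mean", "median"])
print(summary)
```

With only three studies in the early year, a single highly cited work pulls the mean far above the median, while the later year with more studies remains close to its typical value.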

## Validity ranking

Finally, following @Maitrot2017, the relevant studies are ranked for their validity.
Here, a 2-dimensional approach is taken to separate the external validity from the internal validity of the studies.
The ranking process then uses the representativeness of a study's underlying dataset,
from a non-representative survey sample, through a sub-nationally representative sample and a nationally representative sample, to the use of census data,
to arrive at a ranking between 2.0 and 5.0 respectively.
Similarly, the studies are ranked for internal validity using the study design,
with only quasi-experimental and experimental studies receiving similar rankings between 2.0 and 5.0 depending on the individually applied methods due to their quantifiability,
while observational and qualitative studies go without an internal validity rank (0.0) due to the more contextual nature of their analyses.
For a full list of validity ranks, see @appbtbl-validity-external and @appbtbl-validity-internal.
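
The two-dimensional ranking can be sketched as a pair of lookup tables.
This is only an illustration based on a subset of the rankings listed in the appendix: the dictionary keys, the function name `rank_study` and the single-method simplification are assumptions, and it is not the implementation used for the figures in this section.

```python
# external validity: representativeness of the underlying dataset
EXTERNAL_RANKS = {
    "non-representative": 2.0,
    "subnational": 3.0,
    "national": 4.0,
    "census": 5.0,
}

# internal validity: method applied (subset of the appendix table)
INTERNAL_RANKS = {
    "ordinary least squares": 2.0,
    "fixed-effects": 2.0,
    "difference in difference": 3.0,
    "propensity score matching": 3.5,
    "instrumental variable": 4.0,
    "regression discontinuity": 4.5,
    "randomised control trial": 5.0,
}

# only these designs receive an internal validity rank
RANKED_DESIGNS = {"quasi-experimental", "experimental"}


def rank_study(representativeness, design, method):
    """Return (external, internal) validity ranks for a single study."""
    external = EXTERNAL_RANKS[representativeness]
    if design in RANKED_DESIGNS:
        internal = INTERNAL_RANKS.get(method, 0.0)
    else:
        # observational and qualitative studies go without an internal rank
        internal = 0.0
    return external, internal


print(rank_study("national", "quasi-experimental", "difference in difference"))
print(rank_study("census", "observational", "ordinary least squares"))
```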

Using the validity ranking separated into internal and external validity for each study,
it is possible to identify the general make-up of the overall sample,
the relationship between both dimensions and the distribution of studies within.

@fig-validity-relation shows the relation between each study's validity on the internal dimension and the external dimension,
with experimental studies additionally distinguished.
Generally, studies that have a lower internal validity, between 2.0 and 3.5, rank higher on their external validity,
while studies with higher internal validity in turn do not reach as high on the external validity ranking.

```{python}
#| label: fig-validity-relation
#| fig-cap: "Relation between internal and external validity"

from src.model import validity

validities = validity.calculate(by_intervention)
validities["identifier"] = validities["author"].str.replace(r',.*$', '', regex=True) + " (" + validities["year"].astype(str) + ")"
validities = validities.loc[(validities["design"] == "quasi-experimental") | (validities["design"] == "experimental")]
#validities["external_validity"] = validities["external_validity"].astype('category')
validities["internal_validity"] = validities["internal_validity"].astype('category')
validities["External Validity"] = validities["external_validity"]
validities["Internal Validity"] = validities["internal_validity"]

plt.figure().set_figheight(5)
sns.violinplot(
    data=validities,
    x="Internal Validity", y="External Validity", hue="design",
    cut=0, bw_method="scott",
    orient="x"
)
sns.swarmplot(
    data=validities,
    x="Internal Validity", y="External Validity", legend=False,
    color="darkmagenta",
    s=4
)
```

Studies with an internal validity ranking of 3.0 (primarily made up of difference-in-difference approaches) and an internal ranking of 5.0 (randomized control trials) have the same tight clustering around an external validity between 4.0 (national) and 5.0 (census-based), and 2.0 (local) and 3.0 (subnational), respectively.
This clearly shows the expected overall relationship of studies with high internal validity generally ranking lower on their external validity.

The situation is less clear-cut with the internal rankings of 2.0 (primarily ordinary least squares) and 4.0 (primarily instrumental variable),
which show a larger external validity spread.
For 2.0-ranked studies, there is an overall larger spread with most using nationally representative data,
while a significant number make use of census-based data and others in turn are only subnationally representative.
Studies ranked 4.0 internally have a higher heterogeneity with the significant outlier of @Thoresen2021,
which had the limitation of its underlying data being non-representative.

Looking at the overall density of studies along their external validity dimension,
@fig-validity-distribution reiterates this overall relationship with internal validity.
It additionally shows that studies with low internal validity make up the dominant number of nationally representative analyses and the slight majority of census-based analyses,
while locally or non-representative samples are almost solely made up of internally highly valid (ranking 4.0 or above) analyses,
again with the exception of @Thoresen2021 already mentioned.

```{python}
#| label: fig-validity-distribution
#| fig-cap: "Distribution of internal validities"

sns.displot(
    data=validities,
    x="External Validity", hue="Internal Validity",
    kind="kde",
    multiple="fill", clip=(0, None),
    palette="ch:rot=-0.5,hue=1.5,light=0.9",
    bw_adjust=.65, cut=0,
    warn_singular = False
)
```

Looking at the data per region, census-based studies are primarily spread between Latin America and the Caribbean, as well as Europe and Central Asia.
Meanwhile, studies using nationally, subnationally or non-representative data tend to have a larger focus on North America, as well as East Asia and the Pacific.
A slight trend towards studies focusing on evidence-based research in developing countries is visible,
though with an overall rising output, as could be seen in @fig-publications-per-year, and possibly a reliance on more recent datasets, this would be expected.

## Regional spread

As can be seen in @fig-region-counts, taken by region for the overall study sample,
the evidence base receives a relatively even split between the World Bank regional country groupings with the exception of the Middle East and North Africa (MENA) region,
in which fewer studies have been identified.

```{python}
#| label: fig-region-counts
#| fig-cap: Studies by regions analysed

by_region = (
    bib_df[["region"]]
    .assign(
        region = lambda _df: (_df["region"]
            .str.replace(r" ?; ?", ";", regex=True)
            .str.strip()
            .str.split(";")
        )
    )
    .explode("region")
    .reset_index(drop=True)
)
ax = sns.countplot(by_region, x="region", order=by_region["region"].value_counts().index)
plt.setp(ax.get_xticklabels(), rotation=45, ha="right",
         rotation_mode="anchor")
plt.show()

def regions_for_inequality(df, inequality:str):
    df_temp = df.loc[(df["inequality"] == inequality)]
    return sns.countplot(df_temp, x="region", order=df_temp["region"].value_counts().index)
```
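
The counting logic in the cell above can be illustrated on toy data.
The sketch assumes, as the cell does, that a study covering several regions stores them in a single semicolon-separated string; the example rows are hypothetical.

```python
import pandas as pd

# hypothetical studies, some covering more than one region
toy = pd.DataFrame({"region": [
    "Europe and Central Asia",
    "South Asia; East Asia and the Pacific",
    "Europe and Central Asia ;Sub-Saharan Africa",
]})

counts = (
    toy["region"]
    .str.replace(r" ?; ?", ";", regex=True)  # normalise spacing around separators
    .str.strip()
    .str.split(";")
    .explode()  # one row per study-region pair
    .value_counts()
)
print(counts)
```

A study analysing two regions is therefore counted once for each region, so the bars can sum to more than the number of studies.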

Most studies come from a context of East Asia and the Pacific, though with an almost equal amount analysing Europe and Central Asia.
With slightly fewer studies, the contexts of North America and Sub-Saharan Africa follow in number of analyses,
and in turn Latin America and the Caribbean and South Asia with an equal amount of studies for each region.

The lower number of studies stemming from a MENA context can point to a variety of underlying causes:
First, it is possible that there is simply not as much evidence-based analysis undertaken for countries in the region as for other national or subnational contexts,
with research either following a more theoretical trajectory, or missing the underlying data collection that is available for other regional contexts.

However, it cannot be ruled out that the search protocol itself did not capture the same depth of analytical material as for other contexts,
with each region often having both a specific focus in policy-orientations and academically,
and in some cases also differing underlying term bases.
Such contextual term differences may then not be captured adequately by the existing query terms and would point to a necessity to re-align the query to the required specifics.

One reason for such a differentiation could be a larger amount of grey literature captured compared to other regions,
which may be utilising less established terms than the majority of captured literature for policy implementations.
Another reason could be the actual implementation of different policy programmes which are then equally not captured by existing term clusters.

# Discussion

# Conclusions

# Bibliography

::: {#refs}

:::

# Appendices {.appendix .unnumbered}

## Appendix A {.unnumbered}

::: {#appatbl-wow-terms}

```{python}
terms_wow = pd.read_csv("02-data/supplementary/terms_wow.csv")
md(tabulate(terms_wow.fillna(""), showindex=False, headers="keys", tablefmt="grid"))
```

World of work term cluster

:::

::: {#appatbl-intervention-terms}

```{python}
terms_policy = pd.read_csv("02-data/supplementary/terms_policy.csv")
# different headers to include 'social norms'
headers = ["General", "Institutional", "Structural", "Agency & social norms"]
md(tabulate(terms_policy.fillna(""), showindex=False, headers=headers, tablefmt="grid"))
```

Policy intervention term cluster

:::

::: {#appatbl-inequality-terms}

```{python}
terms_inequality = pd.read_csv("02-data/supplementary/terms_inequality.csv")
md(tabulate(terms_inequality.fillna(""), showindex=False, headers="keys", tablefmt="grid"))
```

Inequality term cluster

:::

## Appendix B - Validity rankings {#sec-appendix-validity-rankings .unnumbered}

::: {#appbtbl-validity-external}

| Representativeness                          | Ranking |
| ---                                         | ---     |
| non-representative survey/dataset           | 2.0     |
| subnationally representative survey/dataset | 3.0     |
| nationally representative survey/dataset    | 4.0     |
| census-based dataset                        | 5.0     |

External validity ranking. Adapted from @Maitrot2017.

:::

::: {#appbtbl-validity-internal}

| Method                                         | Ranking |
| ---                                            | ---     |
| ordinary least squares & fixed-effects         | 2.0     |
| discontinuity matching                         | 3.0     |
| difference in difference (& triple difference) | 3.0     |
| propensity score matching                      | 3.5     |
| instrumental variable                          | 4.0     |
| generalized method of moments                  | 4.0     |
| regression discontinuity                       | 4.5     |
| randomised control trial                       | 5.0     |

Internal validity ranking. Adapted from @Maitrot2017.

:::

## Appendix C - Boolean search query {.unnumbered}

```{python}
#| label: full-search-query
#| echo: false
#| output: asis
with open(f"{SUPPLEMENTARY_DATA}/query.txt") as f:
    query = f.read()

t3 = "`" * 3
print(f"""
```sql
{query}
{t3}
""")
```