diff --git a/scoping_review.qmd b/scoping_review.qmd
index ec0918a..abd2cc1 100644
--- a/scoping_review.qmd
+++ b/scoping_review.qmd
@@ -28,14 +28,18 @@ zotero:
 ```{python}
 #| echo: false
 from pathlib import Path
-data_dir=Path("./02-data")
+DATA_DIR=Path("./02-data")
+BIB_PATH = DATA_DIR.joinpath("raw/01_wos-sample_2023-11-02")
 
 ## standard imports
 from IPython.core.display import Markdown as md
 import numpy as np
 import pandas as pd
+from matplotlib import pyplot as plt
 import seaborn as sns
 from tabulate import tabulate
+
+sns.set_style("whitegrid")
 ```
 
 # Introduction
@@ -280,7 +284,7 @@ Since scoping reviews allow both broad and in-depth analyses, they are the most
 import bibtexparser
 
 bib_string=""
-for partial_bib in data_dir.joinpath("raw/wos").glob("*.bib"):
+for partial_bib in BIB_PATH.glob("*.bib"):
     with open(partial_bib) as f:
         bib_string+="\n".join(f.readlines())
 sample = bibtexparser.parse_string(bib_string)
@@ -542,11 +546,81 @@ The results to be identified in the matrix include a study’s: i) key outcome m
 
 sample_size = len(sample.entries)
 md(f"""
-The exploratory execution of queries results in an initial sample of {sample_size} studies after the identification process.
-The majority of studies result from the ‘income’ inequality cluster of the Boolean search, with horizontal cluster terms used often but rarely on their own.
+The exploratory execution of queries results in an initial sample of {sample_size} potential studies after the identification process.
+This contains all identified studies without duplicate removal, controlling for literature that has been superseded or any other screening criteria.
 """)
 ```
 
+The currently identified literature rises almost continuously in volume,
+with small decreases between 2001 and 2008, as well as more significant ones in 2012 and 2016,
+as can be seen in @fig-publications-per-year.
+Keeping in mind that these results are not yet screened for their full relevance to the topic at hand, so far only being *potentially* relevant in falling into the requirements of the search pattern, an increased results output does not necessarily mean a clearly rising amount of relevant literature.
+
+```{python}
+#| label: fig-publications-per-year
+#| fig-cap: Publications per year
+reformatted = []
+for e in sample.entries:
+    reformatted.append([e["Year"], e["Author"], e["Title"], e["Type"], e["Times-Cited"], e["Usage-Count-Since-2013"]])
+bib_df = pd.DataFrame(reformatted, columns = ["Year", "Author", "Title", "Type", "Cited", "Usage"])
+bib_df["Date"] = pd.to_datetime(bib_df["Year"], format="%Y")
+bib_df["Year"] = bib_df["Date"].dt.year
+
+# only keep newer entries
+bib_df = bib_df[bib_df["Year"] >= 2000]
+
+# create dummy category for white or gray lit type (based on 'article' appearing in type)
+bib_df["Type"].value_counts()
+bib_df["Literature"] = np.where(bib_df["Type"].str.contains("article", case=False, regex=False), "white", "gray")
+bib_df["Literature"] = bib_df["Literature"].astype("category")
+
+# plot by year, distinguished by literature type
+ax = sns.countplot(bib_df, x="Year", hue="Literature")
+ax.tick_params(axis='x', rotation=45)
+# ax.set_xlabel("")
+plt.tight_layout()
+plt.show()
+```
+
+Anomalies such as the relatively significant dips in output in 2016 and 2012 become especially interesting against the strong later increase of output.
+While this can mean a decreased interest or different focus points within academia during those time spans,
+it may also point towards missing alternative term clusters that are newly arising, or a re-focus towards different interventions, and should thus be kept in mind for future scoping efforts.
+
+Looking at the distribution between white and gray literature, a strong difference can be seen, with white literature clearly overtaking gray literature; this gap should not be surprising since our database query efforts are primarily aimed at finding the most current versions of white literature.
+The gap will perhaps shrink once the snowballing process is fully completed,
+though it should remain clearly visible during the entire scoping process as a sign of a well targeted identification step.
+
+@fig-citations-per-year-avg shows the average number of citations for all studies published within an individual year.
+From the current un-screened literature sample, several patterns become visible:
+First, in general, citation counts are slightly decreasing, as should be expected with newer publications, since less time has passed for their contents to be dissected and distributed or for repeat citations to take place.
+
+```{python}
+#| label: fig-citations-per-year-avg
+#| fig-cap: Average citations per year
+bib_df["Cited"] = bib_df["Cited"].astype("int")
+grpd = bib_df.groupby(["Year"], as_index=False)["Cited"].mean()
+ax = sns.barplot(grpd, x="Year", y="Cited")
+ax.tick_params(axis='x', rotation=45)
+plt.tight_layout()
+plt.show()
+```
+
+Second, while such a decrease is visible in relatively recent years (especially 2019--2023), it is not a linear decrease throughout but rather a more erratically stable citation output.
+This points to, first, no decrease in academic interest in the topic over this period of time,
+second, no linearly developing concentration or centralization of knowledge output and dissemination,
+and third, potentially no clear-cut increase of *relevant* output over time either.
+
+Lastly, several years such as 2001, 2002, 2005 and 2008 are clear outliers in their high average citation counts, which can point to one of several things:
+
+It can point to clusters of relevant literature feeding wider dissemination or cross-disciplinary interest, a possible sign of still somewhat unfocused research production which does not approach from a single coherent perspective yet.
+It can also point to a centralization of knowledge production, with studies feeding more intensely off each other during the review process, a possible sign of more focused knowledge production and thus valuable to more closely review during the screening process.
+
+Or it may mean that clearly influential studies have been produced during those years, a possibility which may be more relevant during the early years (2000-2008).
+This is because, as @fig-publications-per-year showed, the overall output was nowhere near as rich as in the following years, allowing single influential works to skew the visible means for those years.
+
+In all of these cases, such outliers should provide clear points of interest during the screening process for possible re-evaluation of current term clusters for scoping.
+Should they point towards gaps (or over-optimization) in specific areas of interest during those time-frames or more generally, they may provide an impetus for tweaking the identification query terms to better align with the prevailing literature output.
+
 # Synthesis of Evidence
 
 This section will present a synthesis of evidence from the scoping review.
@@ -644,7 +718,7 @@ TS=
 universal basic income OR
 provision of living wage OR
 maternity leave
- )
+ ) OR
 (
 cash benefits OR