chore(script): Update data description

Marty Oehme 2024-01-06 10:05:59 +01:00
parent d5c5cfe6d3
commit 64daf369e5
Signed by: Marty
GPG key ID: EDBF2ED917B2EF6A


@@ -490,10 +490,9 @@ Of these, {nr_relevant} have been identified as potentially relevant studies for
<!-- {{++ FIXME: Update description for changing study pool ++}} -->
The currently identified literature rises almost continuously in volume,
with small decreases between 2001 and 2008, as well as more significant ones in 2012 and 2016,
The currently identified literature rises somewhat in volume over time,
with first larger outputs identified from 2014,
as can be seen in @fig-publications-per-year.
Since these results are not yet screened for their full relevance to the topic at hand, and are so far only *potentially* relevant in meeting the requirements of the search pattern, an increased results output does not necessarily mean a clearly rising amount of relevant literature.
<!-- {{++ FIXME: give full year scale ++}} -->
@@ -520,14 +519,15 @@ df_study_years = None
Anomalies such as the relatively significant dips in output in 2016 and 2012 become especially interesting against the strong later increase of output.
While this can mean a decreased interest or different focus points within academia during those time spans,
it may also point towards missing alternative term clusters that are newly arising, or a re-focus towards different interventions, and should thus be kept in mind for future scoping efforts.
it may also point towards alternative term clusters that are newly arising, or a re-focus towards different interventions,
and should thus be kept in mind for future scoping efforts.
Looking at the distribution between white and gray literature a strong difference with white literature clearly overtaking gray literature can be seen, a gap which should not be surprising since our database query efforts are primarily aimed at finding the most current versions of white literature.
The gap will perhaps shrink once the snowballing process is fully completed,
though it should remain clearly visible during the entire scoping process as a sign of a well targeted identification step.
The identified literature consists predominantly of white literature, with only a marginal amount published solely as gray literature.
This represents a gap which seems reasonable and not surprising since the database query efforts were primarily aimed at finding the most current versions of white literature.
Such a stark gap speaks to a well-targeted identification procedure, with more up-to-date white literature correctly superseding potential previous publications.
@fig-citations-per-year-avg shows the average number of citations for all studies published within an individual year.
From the current un-screened literature sample, several patterns become visible:
From the literature sample, several patterns emerge:
First, in general, citation counts are slightly decreasing - as should generally be expected with newer publications, since less time has passed for their contents to be dissected and distributed or for repeat citations to take place.
```{python}
@@ -535,25 +535,28 @@ First, in general, citation counts are slightly decreasing - as should generally
#| fig-cap: Average citations per year
bib_df["zot_cited"] = bib_df["zot_cited"].dropna().astype("int")
grpd = bib_df.groupby(["year"], as_index=False)["zot_cited"].mean()
ax = sns.barplot(grpd, x="year", y="zot_cited")
fig, ax = plt.subplots()
ax.bar(grpd["year"], grpd["zot_cited"])
sns.regplot(x=grpd["year"], y=grpd["zot_cited"], ax=ax)
#ax = sns.lmplot(data=grpd, x="year", y="zot_cited", fit_reg=True)
ax.tick_params(axis='x', rotation=45)
plt.tight_layout()
plt.show()
```
Second, while such a decrease is visible in relatively recent years (especially 2019--2023), it is not a linear decrease throughout but rather a more erratically stable citation output.
This points to, first, no decrease in academic interest in the topic over this period of time,
second, no linearly developing concentration or centralization of knowledge output and dissemination,
and third potentially no clear-cut increase of *relevant* output over time either.
Second, while such a decrease is visible, the development is erratic, with strong changes between individual years.
This suggests, first, no overall decrease in academic interest in the topic over this period of time,
and second, no linearly developing concentration or centralization of knowledge output and dissemination,
though it also calls into question a clear-cut increase of *relevant* output over time.
Positive outlier years in citation amount can point to clusters of relevant literature feeding wider dissemination or cross-disciplinary interest, a possible sign of still somewhat unfocused research production which does not approach from a single coherent perspective yet.
It can also point to a centralization of knowledge production, with studies feeding more intensely off each other during the review process, a possible sign of more focused knowledge production and thus valuable to more closely review during the screening process.
Or it may mean that clearly influential studies have been produced during those years, a possibility which may be more relevant during the early years (2000-2008).
This is because, as @fig-publications-per-year showed, the overall output was nowhere near rich as in the following years, allowing single influential works to skew the visible means for those years.
It may also suggest that clearly influential studies have been produced during those years, a possibility which may be more relevant during years of more singular releases (such as 2011 and 2013).
This is because, as @fig-publications-per-year showed, the overall output was nowhere near as rich as in the following years, allowing single influential works to skew the visible means for those years.
In all of these cases, such outliers should provide clear points of interest during the screening process for possible re-evaluation of current term clusters for scoping.
Should they point towards gaps (or over-optimization) of sepcific areas of interest during those time-frames or more generally, they may provide an impetus for tweaking the identification query terms to better align with the prevailing literature output.
In all of these cases, such outliers should provide clear points of interest during the screening process for eventual re-evaluation of utilized scoping term clusters and for future research focus.
Should they point towards gaps (or over-optimization) of specific areas of interest during those time-frames or more generally, they may provide an impetus for tweaking future identification queries to better align with the prevailing literature output.
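The sensitivity of yearly means to single influential works, as described above, can be checked by placing the median and the number of studies next to the mean for each year. The following is only a minimal sketch under stated assumptions: the sample values are made up, `MIN_STUDIES` is a hypothetical threshold, and only the column names `year` and `zot_cited` are taken from the plotting snippet.

```{python}
import pandas as pd

# Hypothetical sample data; only the column names follow the manuscript.
bib_df = pd.DataFrame({
    "year": [2011, 2013, 2013, 2013, 2020, 2020, 2020, 2020],
    "zot_cited": [400, 10, 20, 350, 12, 8, 15, 9],
})

# Assumed cut-off below which a yearly mean is treated as fragile.
MIN_STUDIES = 4

# Per-year study count, mean and median of citations.
summary = (
    bib_df.groupby("year")["zot_cited"]
    .agg(n_studies="count", mean_cited="mean", median_cited="median")
    .reset_index()
)

# Flag years where a single study can dominate the mean.
summary["few_studies"] = summary["n_studies"] < MIN_STUDIES
print(summary)
```

In years where the mean lies far above the median and the study count is low, the visible bar is likely driven by one or two highly cited works rather than by a broad pattern.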
<!-- {{++ TODO: Add breakdown by thematic area++}} -->
@@ -588,6 +591,10 @@ plt.show()
by_intervention = None
```
@fig-intervention-types shows the most often analysed interventions for the literature reviewed.
Overall, there is a focus on measures of minimum wage and education interventions,
as well as collective action, subsidies, trade liberalization changes and training.
This points to a spread capturing institutional as well as structural and agency-driven programmes.
<!-- {{++ TODO: describe intervention types with complete dataset ++}} -->