From 2c5cf37b2c98084524303d4151728b574ed271ea Mon Sep 17 00:00:00 2001 From: Marty Oehme Date: Wed, 8 Oct 2025 20:30:57 +0200 Subject: [PATCH] Add appendix text --- popcorn.qmd | 78 ++++++++++++++++++++++++++++++++++++++++++----------- 1 file changed, 62 insertions(+), 16 deletions(-) diff --git a/popcorn.qmd b/popcorn.qmd index bede54c..30aa8cf 100644 --- a/popcorn.qmd +++ b/popcorn.qmd @@ -523,6 +523,57 @@ provides a more macro-level view on how big the statistics have grown to be over We can see that, as each individually reported day adds up to 400KB nowadays, the cumulative size is up to almost 700MB currently. +### Packages monthwise and per weekday + +Let's also look at the packages installed on systems for different time slices. +We'll start with a look at the packages per weekday. + +```{python} +from notebooks.popcorn import plt_weekday_packages +pplot(plt_weekday_packages) +``` + +There is no significant difference between the individual weekdays, as we would +expect. It seems strange to have a specific day on which everybody decides to +install or uninstall new packages. + +That said, there is some slight variation, with Wednesdays generally having a +few fewer total packages to boast than other days, especially Tuesdays which +are slightly above the curve. + +Let's just imagine everybody gets bored on Tuesday, installs a new package and +drops it again by Wednesday, along with a slew of other packages. Try-out +Tuesdays and Wastebin Wednesdays if you will. + +Alright, but let's also take a look at the package numbers per month instead. + +```{python} +from notebooks.popcorn import plt_month_packages +pplot(plt_month_packages) +``` + +Here we can see a bit more variation. 
First it is important to note that I have +removed the first months of 2018 prior to October from the analysis and cut off any +days after September 2025, to have only full years represented and avoid any +months being present more often than others.[^months-removed] + +[^months-removed]: I chose the first couple of months in the data, rather than + the most recent months as fewer people were collecting data, thus we have less + of a loss. Additionally, I presume people are more interested in current + statistics than older ones, just generally. + +It is quite surprising to me just how much variation is visible in the results: +months from October to February have markedly fewer packages than the spring +and summer months. Are people generally more willing to use and try out new +packages in the summer? Alternatively, were any of the major usage dips taking +place during winter, while the increases in usage occurred more toward summer? + +I have not delved deep into the interpretation of these questions, but it may +be interesting to do so. The last option, of course, is that the data itself, +the data collection or analysis contains an error that I am not aware of. + +### Missing days and dates + There are some missing days in the statistics. ```{python} @@ -532,17 +583,19 @@ outp, defs = tab_missing_days.run() outp ``` -### Packages monthwise and per weekday +These missing days are primarily occurring at the end of January 2019, and +throughout 2025. However, with over 2600 days where the statistics _are_ +available, these rows represent an insignificant issue for the overall data. -```{python} -from notebooks.popcorn import plt_weekday_packages -pplot(plt_weekday_packages) -``` +It would seem there was some kind of issue collecting or storing the collected +data at that point in 2019, which means a few days in a row are missing. This +skews absolute numbers for that week downwards, as well as any weekly averages +relying on this date-range. 
-```{python} -from notebooks.popcorn import plt_month_packages -pplot(plt_month_packages) -``` +However, no significant visual differences stem from this fact, which is why it +is not called out in the main article. As it is --- an interesting fact, and, +were this a more rigorous investigation, perhaps worthy of taking into account +as biasing the result, but for our purposes not too bad. ## Outline @@ -570,10 +623,3 @@ pplot(plt_month_packages) - things we can't see (limitations) - packages on offer in the repositories - this could shed light on the bumps of users and relative package ownership - -Modified date != descriptive (named) date - -```{python} -from notebooks.popcorn import plt_modified_times -pplot(plt_modified_times) -```