From 9e3726402de3d3d9b5a8d3db3606019c5f3a1ae9 Mon Sep 17 00:00:00 2001 From: Marty Oehme Date: Wed, 8 Oct 2025 16:03:27 +0200 Subject: [PATCH] Add kernel text --- popcorn.qmd | 151 +++++++++++++++++++++++++++++++++++++++------------- 1 file changed, 113 insertions(+), 38 deletions(-) diff --git a/popcorn.qmd b/popcorn.qmd index 67dd709..bede54c 100644 --- a/popcorn.qmd +++ b/popcorn.qmd @@ -109,6 +109,7 @@ or there was some kind of issue on the backend so the stats for those days are lost. + We take a look at the missing days among other things at the end of this article. @@ -161,12 +162,13 @@ something happened to data collection or everybody collectively decided to leave their PC offline just for that day, but the numbers are back to normal the day after.[^independence-day] -[^independence-day]: I suppose one interpretation would be people taking their - 4th of July celebrations very seriously, and thus not being present in the - statistics for the day after. However, I am not sure if this would reflect so - strongly in data collection, and it additionally pre-supposes the data - collected predominantly stemming from the United States. Lastly, one would - suppose this having a similar effect every year if that was the case. +[^independence-day]: + I suppose one interpretation would be people taking their + 4th of July celebrations very seriously, and thus not being present in the + statistics for the day after. However, I am not sure if this would reflect so + strongly in data collection, and it additionally pre-supposes the data + collected predominantly stemming from the United States. Lastly, one would + suppose this having a similar effect every year if that was the case. This curve also goes some way to explaining the dip in overall package installations previously. When there are fewer people uploading their daily @@ -209,9 +211,14 @@ may signify new users checking out Void Linux and downloading a large variety of packages in the process. 
+ For a breakdown of the absolute numbers of packages on systems by weekday and month of the year instead of over time, see the Appendix below. +An interesting trend is visible toward the end of the timeline window, with a +rapid decline in package numbers per user. It is too early to clearly see +if this is just variability or an actual trend in the data. + Beyond pure installation numbers, let's take a look at the actual top-installed packages on users' systems. @@ -226,15 +233,16 @@ The top packages are unsurprisingly the `base-system` and `xtools` packages, followed by `wget`, `htop` and `rsync`.[^popcorn-removal] -[^popcorn-removal]: I have removed the PopCorn package itself from the data. - Funnily enough, since _everybody_ who is represented in the data has to have - PopCorn installed or the data wouldn't be collected in the first place, if we - extrapolate from the collected data naively this means more people have PopCorn - installed than the base-system. Of course, viewed over the majority of Void - Linux installations this is hogwash. We have the absolute numbers and only - around 150 people ever have PopCorn installed. But it nicely represents some of - the danger of over-interpreting the results before us without also reflecting - on sample bias. +[^popcorn-removal]: + I have removed the PopCorn package itself from the data. + Funnily enough, since _everybody_ who is represented in the data has to have + PopCorn installed or the data wouldn't be collected in the first place, if we + extrapolate from the collected data naively this means more people have PopCorn + installed than the base-system. Of course, viewed over the majority of Void + Linux installations this is hogwash. We have the absolute numbers and only + around 150 people ever have PopCorn installed. But it nicely represents some of + the danger of over-interpreting the results before us without also reflecting + on sample bias. 
In my opinion the list of top packages reflect the technical audience of Void Linux and does not hold too many surprises. Almost everyone uses `socklog` and @@ -278,13 +286,14 @@ On the Y-axis we see the amount of packages while on the X-axis we see the amoun What this means is that we see _how often_ packages tend to be installed, and where the majority of packages is grouped.[^density-approximation] -[^density-approximation]: In the package density count above, since we are - accumulating over the absolute numbers of all installations of all users, the - overall high numbers are really _high_, i.e. above 150,000. Since we are - sorting the package counts into a finite number of bins to make visualizing it - possible, the lowest bin overshoots the 0-mark and we get an estimation of - minus-installation counts. Of course, this is not possible, no package in the - data has been installed negative amount of times --- to my knowledge! +[^density-approximation]: + In the package density count above, since we are + accumulating over the absolute numbers of all installations of all users, the + overall high numbers are really _high_, i.e. above 150,000. Since we are + sorting the package counts into a finite number of bins to make visualizing it + possible, the lowest bin overshoots the 0-mark and we get an estimation of + minus-installation counts. Of course, this is not possible, no package in the + data has been installed negative amount of times --- to my knowledge! _Many_ packages are installed 0 to 10 times. Some packages are installed above 10 times, @@ -318,21 +327,72 @@ packages between eleven and 20 installations, and `python f"{get_num(twenty_thirty):,}"` packages between 21 and 30 installations. `python f"{get_num(thirty_plus):,}"` packages have over 30 installations. +For now, these are the explorations I have done for the package data collected. 
+I think it is interesting to see, especially the evolution of package installations over time, +and per user, +as well as getting a glimpse of the most used packages in the sample. + +But there are yet more things to explore in the statistics overall. + ## Kernel Analysis +Beyond package numbers, the data also encapsulate information about the Linux +kernels used by Void Linux users. +The files report the exact kernel version users are running, including the major version, +minor versions, and any suffixes as well. + +For example, there are many reports containing the `4.19.0-9-amd64` kernel, or +some containing the `6.1.53-1-lts` kernel, or `6.11.2-asahi-6.11.2-1_4`. These +are 'extraordinary' kernels in my opinion, and they do not follow clear naming +patterns. For the purposes of the following visualizations any such suffixes +have been cut off, looking only at the versioning of the main kernels +themselves. + +Let's start by looking at the prevalence of the different major versions. + ```{python} from notebooks.popcorn import plt_kernel_versions pplot(plt_kernel_versions) ``` -When looking at the kernel versions used, we see a very strong jump between major kernel version -4 and major kernel version 5. +This is an accumulation of the three major versions used during the collected timeline, +over the _whole_ time as absolute numbers. -For this analysis we had to exclude {kernel_df_v99.select(pl.len()).item()} rows which were -apparently from the future, as they were running variations of major kernel version 99. In all -likelihood there is a custom kernel version out there which reports its own major version as 99. -The strange version starts appearing on {kernel_df_v99.select("date").row(0)0} and shows up -all the way until {kernel_df_v99.select("date").row(-1)[0]}. +When looking at the kernel versions used, we see a very strong jump between major kernel version +4 and major kernel version 5, with version 4 being significantly less prevalent in the data. 
+ +Of course, this makes sense from a release standpoint: kernel version 5.0 was +released in March 2019, just a single year after the start of data collection.[^kernel-releases] +Additionally, as we established above, this was also the time of the fewest +unique data reports, so the absolute amount of kernel 4 reports is even +smaller. + +[^kernel-releases]: + Data collection began in May 2018. + All information on the kernel release timelines is taken + from the nicely comprehensive _Linux Kernel Version History_ Wikipedia page: + . + +Kernel version 5 still provides the dominant amount of reported kernel versions, +but just barely. This makes sense since major version 6.0 was released in October 2022. +It has thus been just over three years of version 5 being the latest kernel, +and almost exactly three years of version 6 being the latest kernel. + +Again, we have to keep the curve of unique installations in mind for absolute numbers like these: +Kernel 5 was released right as the massive increase in unique Void Linux installation reports happened, +and kernel 6 right after the report slump happened. +This, in all likelihood, accounts for the slight imbalance between the numbers, +and will shift over the coming months. + +Just like with kernel suffixes, for this analysis we also had to exclude +{kernel_df_v99.select(pl.len()).item()} rows which were apparently from the +future --- as they were running variations of major kernel version 99. In all +likelihood there is a custom compiled kernel version out there which reports its own +major version as 99. The strange version starts appearing on +{kernel_df_v99.select("date").row(0)[0]} and shows up all the way until +{kernel_df_v99.select("date").row(-1)[0]}. + +Let's turn to the actual adoption of kernels over time in the next visualization. 
```{python} from notebooks.popcorn import plt_kernel_timeline @@ -356,16 +416,31 @@ last_kernel5: date = weekly_kernel_df.filter(pl.col("major_ver") == "5")[-1][ ].item() ``` -A timeline analysis of the kernels used to report daily downloads shows that people generally -adopt new major kernel versons at roughly the same time. This change is especially stark between -major kernel versions 5 and 6, which seem to have traded place in usage almost over night. -The first time that major version 5 of the kernel shows up is on {first_kernel5}. From here, it -took a long time for the last of the version 4 kernels to disappear, coinciding with the big -switch between major version 5 and 6. The last time a major version 4 is seen is on -{last_kernel4}, while the last major version 5 kernels still pop up. -It would seem, then, that the people still running kernel version 4 used the opportunity of -everybody switching to the stable version of 6 to also upgrade their machines. +A timeline analysis of the prevalent kernels in the data shows that new major +kernel versions are adopted relatively rapidly and with the majority of switches +occurring at roughly the same time. + +This change is especially stark between major kernel versions 5 and 6, which +seem to have traded places in usage almost overnight. A reasonable speculation +for this rapid switch is that the `linux` kernel meta-package was pointed at +the new version at that time, so each update pulled the new kernel. + +The first time that major version 5 of the kernel shows up is on +{first_kernel5}. From here, it took a long time for the last of the version 4 +kernels to disappear. Interestingly, this roughly coincides with the big switch +between major versions 5 and 6. The last time a major version 4 is seen is on +{last_kernel4}, while the last major version 5 kernels still pop up. 
It would +seem, then, that the people still running kernel version 4 used the opportunity +of everybody switching to the stable version of 6 to also upgrade their +machines. + +If we cautiously extrapolate a little from the data we have, it would seem +reasonable that the last remnants of kernel version 5 may be disappearing +around May or June 2026. A lot of course depends on the upstream kernel release +windows and the stability of the releases themselves. But barring any major +upheavals in the kernel releases (of a magnitude like the removal of +[bcachefs](https://en.wikipedia.org/wiki/Bcachefs)) or major stability issues, +this seems a reasonable assumption to me. ## Appendix: Odds and Ends