Add kernel text

Marty Oehme 2025-10-08 16:03:27 +02:00
parent 9687eb662b
commit 9e3726402d
Signed by: Marty
GPG key ID: 4E535BC19C61886E


@ -109,6 +109,7 @@ or there was some kind of issue on the backend so the stats for those days are
lost. lost.
<!-- TODO: is this still true? --> <!-- TODO: is this still true? -->
We take a look at the missing days We take a look at the missing days
among other things at the end of this article. among other things at the end of this article.
@ -161,12 +162,13 @@ something happened to data collection or everybody collectively decided to
leave their PC offline just for that day, but the numbers are back to normal leave their PC offline just for that day, but the numbers are back to normal
the day after.[^independence-day] the day after.[^independence-day]
[^independence-day]: I suppose one interpretation would be people taking their [^independence-day]:
4th of July celebrations very seriously, and thus not being present in the I suppose one interpretation would be people taking their
statistics for the day after. However, I am not sure if this would reflect so 4th of July celebrations very seriously, and thus not being present in the
strongly in data collection, and it additionally pre-supposes the data statistics for the day after. However, I am not sure if this would reflect so
collected predominantly stemming from the United States. Lastly, one would strongly in data collection, and it additionally pre-supposes the data
suppose this having a similar effect every year if that was the case. collected predominantly stemming from the United States. Lastly, one would
suppose this having a similar effect every year if that was the case.
This curve also goes some way to explaining the dip in overall package This curve also goes some way to explaining the dip in overall package
installations previously. When there are fewer people uploading their daily installations previously. When there are fewer people uploading their daily
@ -209,9 +211,14 @@ may signify new users checking out Void Linux and downloading a large variety
of packages in the process. of packages in the process.
<!-- TODO: still accurate? --> <!-- TODO: still accurate? -->
For a breakdown of the absolute numbers of packages on systems by weekday and For a breakdown of the absolute numbers of packages on systems by weekday and
month of the year instead of over time, see the Appendix below. month of the year instead of over time, see the Appendix below.
An interesting trend is visible toward the end of the timeline window, with a
rapid decline in package numbers per user. It is too early to see clearly
whether this is just variability or an actual trend in the data.
Beyond pure installation numbers, let's take a look at the actual top-installed Beyond pure installation numbers, let's take a look at the actual top-installed
packages on users' systems. packages on users' systems.
@ -226,15 +233,16 @@ The top packages are unsurprisingly
the `base-system` and `xtools` packages, followed by `wget`, `htop` and the `base-system` and `xtools` packages, followed by `wget`, `htop` and
`rsync`.[^popcorn-removal] `rsync`.[^popcorn-removal]
[^popcorn-removal]: I have removed the PopCorn package itself from the data. [^popcorn-removal]:
Funnily enough, since _everybody_ who is represented in the data has to have I have removed the PopCorn package itself from the data.
PopCorn installed or the data wouldn't be collected in the first place, if we Funnily enough, since _everybody_ who is represented in the data has to have
extrapolate from the collected data naively this means more people have PopCorn PopCorn installed or the data wouldn't be collected in the first place, if we
installed than the base-system. Of course, viewed over the majority of Void extrapolate from the collected data naively this means more people have PopCorn
Linux installations this is hogwash. We have the absolute numbers and only installed than the base-system. Of course, viewed over the majority of Void
around 150 people ever have PopCorn installed. But it nicely represents some of Linux installations this is hogwash. We have the absolute numbers and only
the danger of over-interpreting the results before us without also reflecting around 150 people ever have PopCorn installed. But it nicely represents some of
on sample bias. the danger of over-interpreting the results before us without also reflecting
on sample bias.
In my opinion the list of top packages reflects the technical audience of Void In my opinion the list of top packages reflects the technical audience of Void
Linux and does not hold too many surprises. Almost everyone uses `socklog` and Linux and does not hold too many surprises. Almost everyone uses `socklog` and
@ -278,13 +286,14 @@ On the Y-axis we see the amount of packages while on the X-axis we see the amoun
What this means is that we see _how often_ packages tend to be installed, What this means is that we see _how often_ packages tend to be installed,
and where the majority of packages is grouped.[^density-approximation] and where the majority of packages is grouped.[^density-approximation]
[^density-approximation]: In the package density count above, since we are [^density-approximation]:
accumulating over the absolute numbers of all installations of all users, the In the package density count above, since we are
overall high numbers are really _high_, i.e. above 150,000. Since we are accumulating over the absolute numbers of all installations of all users, the
sorting the package counts into a finite number of bins to make visualizing it overall high numbers are really _high_, i.e. above 150,000. Since we are
possible, the lowest bin overshoots the 0-mark and we get an estimation of sorting the package counts into a finite number of bins to make visualizing it
minus-installation counts. Of course, this is not possible, no package in the possible, the lowest bin overshoots the 0-mark and we get an estimation of
data has been installed negative amount of times --- to my knowledge! minus-installation counts. Of course, this is not possible, no package in the
data has been installed negative amount of times --- to my knowledge!
_Many_ packages are installed 0 to 10 times. _Many_ packages are installed 0 to 10 times.
Some packages are installed above 10 times, Some packages are installed above 10 times,
@ -318,21 +327,72 @@ packages between eleven and 20 installations, and
`python f"{get_num(twenty_thirty):,}"` packages between 21 and 30 installations. `python f"{get_num(twenty_thirty):,}"` packages between 21 and 30 installations.
`python f"{get_num(thirty_plus):,}"` packages have over 30 installations. `python f"{get_num(thirty_plus):,}"` packages have over 30 installations.
For now, these are the explorations I have done on the collected package data.
I find them interesting, especially the evolution of package installations over time
and per user,
as well as the glimpse of the most used packages in the sample.
But there is more to explore in the statistics overall.
## Kernel Analysis ## Kernel Analysis
Beyond package numbers, the data also encapsulate information about the Linux
kernels used by Void Linux users.
The files report the exact kernel version users are running, including the major version,
minor version, and any suffixes.
For example, there are many reports containing the `4.19.0-9-amd64` kernel, or
some containing the `6.1.53-1-lts` kernel, or `6.11.2-asahi-6.11.2-1_4`. These
are 'extraordinary' kernels in my opinion, and they do not follow clear naming
patterns. For the purposes of the following visualizations any such suffixes
have been cut off, looking only at the versioning of the main kernels
themselves.
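
As a rough illustration of what cutting off such suffixes can look like, here is a minimal sketch using only the standard library. The `split_kernel` helper and its regular expression are hypothetical and not taken from the notebooks behind this article; the sketch assumes every usable version string starts with `major.minor` and an optional patch number.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern: leading "major.minor" with an optional ".patch".
# Everything after that (e.g. "-9-amd64", "-1-lts") is treated as a suffix.
KERNEL_RE = re.compile(r"^(\d+)\.(\d+)(?:\.(\d+))?")


def split_kernel(version: str) -> Optional[Tuple[int, int, int]]:
    """Return (major, minor, patch) without any suffix, or None if unparseable."""
    match = KERNEL_RE.match(version)
    if match is None:
        return None
    major, minor, patch = match.groups()
    # A missing patch component is treated as 0.
    return int(major), int(minor), int(patch or 0)


for raw in ("4.19.0-9-amd64", "6.1.53-1-lts", "6.11.2-asahi-6.11.2-1_4"):
    print(raw, "->", split_kernel(raw))
```

Anchoring the pattern at the start of the string means a version repeated inside the suffix, as in the `asahi` example, is ignored.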
Let's start by looking at the prevalence of the different major versions.
```{python} ```{python}
from notebooks.popcorn import plt_kernel_versions from notebooks.popcorn import plt_kernel_versions
pplot(plt_kernel_versions) pplot(plt_kernel_versions)
``` ```
When looking at the kernel versions used, we see a very strong jump between major kernel version This is an accumulation of the three major versions used during the collected timeline,
4 and major kernel version 5. over the _whole_ time as absolute numbers.
For this analysis we had to exclude {kernel_df_v99.select(pl.len()).item()} rows which were When looking at the kernel versions used, we see a very strong jump between major kernel version
apparently from the future, as they were running variations of major kernel version 99. In all 4 and major kernel version 5, with version 4 being significantly less prevalent in the data.
likelihood there is a custom kernel version out there which reports its own major version as 99.
The strange version starts appearing on {kernel_df_v99.select("date").row(0)0} and shows up Of course, this makes sense from a release standpoint: kernel version 5.0 was
all the way until {kernel_df_v99.select("date").row(-1)[0]}. released in March 2019, just a single year after the start of data collection.[^kernel-releases]
Additionally, as we established above, this was also the time of the fewest
unique data reports, so the absolute amount of kernel 4 reports is even
smaller.
[^kernel-releases]:
Data collection began in May 2018.
All information on the kernel release timelines is taken
from the nicely comprehensive _Linux Kernel Version History_ Wikipedia page:
<https://en.wikipedia.org/wiki/Linux_kernel_version_history>.
Kernel version 5 still accounts for the largest number of reported kernel versions,
but just barely. This makes sense since major version 6.0 was released in October 2022.
Version 5 was thus the latest kernel for roughly three and a half years,
and version 6 has been the latest kernel for almost exactly three years.
Again, we have to keep the curve of unique installations in mind for absolute numbers like these:
kernel 5 was released right as the massive increase in unique Void Linux installation reports happened,
and kernel 6 right after the report slump.
This, in all likelihood, accounts for the slight imbalance between the numbers,
which will shift over the coming months.
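
To make the caveat about absolute numbers concrete, the following sketch contrasts raw counts with per-period shares. The `reports` list is made-up illustration data, not the PopCorn data, and `shares_by_period` is a hypothetical helper, not part of the article's notebooks.

```python
from collections import Counter, defaultdict

# Made-up (period, major version) reports, purely for illustration.
reports = [
    ("2019", 4), ("2019", 5),
    ("2021", 5), ("2021", 5), ("2021", 5), ("2021", 4),
    ("2024", 6), ("2024", 6), ("2024", 5),
]


def shares_by_period(rows):
    """Map each period to {major_version: fraction of that period's reports}."""
    counts = defaultdict(Counter)
    for period, major in rows:
        counts[period][major] += 1
    return {
        period: {major: n / sum(c.values()) for major, n in c.items()}
        for period, c in counts.items()
    }


# Absolute counts favour versions that were current while many reports came in.
absolute = Counter(major for _, major in reports)
# Shares within each period are independent of how many reports that period had.
relative = shares_by_period(reports)

print(absolute)
print(relative)
```

In this toy data, version 5 dominates the absolute counts mostly because the busiest period fell within its lifetime, which is the same effect described above.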
Just like with kernel suffixes, for this analysis we also had to exclude
{kernel_df_v99.select(pl.len()).item()} rows which were apparently from the
future, as they were running variations of major kernel version 99. In all
likelihood there is a custom-compiled kernel out there which reports its own
major version as 99. The strange version first appears on
{kernel_df_v99.select("date").row(0)[0]} and shows up all the way until
{kernel_df_v99.select("date").row(-1)[0]}.
Let's turn to the actual adoption of kernels over time in the next visualization.
```{python} ```{python}
from notebooks.popcorn import plt_kernel_timeline from notebooks.popcorn import plt_kernel_timeline
@ -356,16 +416,31 @@ last_kernel5: date = weekly_kernel_df.filter(pl.col("major_ver") == "5")[-1][
].item() ].item()
``` ```
A timeline analysis of the kernels used to report daily downloads shows that people generally A timeline analysis of the prevalent kernels in the data shows that new major
adopt new major kernel versions at roughly the same time. This change is especially stark between kernel versions are adopted relatively rapidly and with the majority of switches
major kernel versions 5 and 6, which seem to have traded places in usage almost overnight. occurring at roughly the same time.
The first time that major version 5 of the kernel shows up is on {first_kernel5}. From here, it This change is especially stark between major kernel versions 5 and 6, which
took a long time for the last of the version 4 kernels to disappear, coinciding with the big seem to have traded places in usage almost overnight. A reasonable speculation
switch between major version 5 and 6. The last time a major version 4 is seen is on for this rapid switch is that the `linux` kernel meta-package was pointed at
{last_kernel4}, while the last major version 5 kernels still pop up. the new version at that time, so each update pulled the new kernel.
It would seem, then, that the people still running kernel version 4 used the opportunity of
everybody switching to the stable version of 6 to also upgrade their machines. The first time that major version 5 of the kernel shows up is on
{first_kernel5}. From here, it took a long time for the last of the version 4
kernels to disappear. Interestingly, this roughly coincides with the big switch
between major version 5 and 6. The last time a major version 4 is seen is on
{last_kernel4}, while the last major version 5 kernels still pop up. It would
seem, then, that the people still running kernel version 4 used the opportunity
of everybody switching to the stable version of 6 to also upgrade their
machines.
If we cautiously extrapolate a little from the data we have, it would seem
reasonable that the last remnants of kernel version 5 may be disappearing
around May or June 2026. A lot, of course, depends on the upstream kernel release
windows and the stability of the releases themselves. But barring any major
upheavals in the kernel releases (of a magnitude like the removal of
[bcachefs](https://en.wikipedia.org/wiki/Bcachefs)) or major stability issues,
this seems a reasonable assumption to me.
## Appendix: Odds and Ends ## Appendix: Odds and Ends