Add kernel text

parent 9687eb662b, commit 9e3726402d

1 changed file with 113 additions and 38 deletions: popcorn.qmd

@@ -109,6 +109,7 @@ or there was some kind of issue on the backend so the stats for those days are
lost.

<!-- TODO: is this still true? -->

We take a look at the missing days,
among other things, at the end of this article.

@@ -161,12 +162,13 @@ something happened to data collection or everybody collectively decided to
leave their PC offline just for that day, but the numbers are back to normal
the day after.[^independence-day]

[^independence-day]:
    I suppose one interpretation would be people taking their
    4th of July celebrations very seriously, and thus not being present in the
    statistics for the day after. However, I am not sure this would be reflected
    so strongly in data collection, and it additionally presupposes that the
    collected data stem predominantly from the United States. Lastly, one would
    expect this to have a similar effect every year if that were the case.

This curve also goes some way to explaining the dip in overall package
installations previously. When there are fewer people uploading their daily

@@ -209,9 +211,14 @@ may signify new users checking out Void Linux and downloading a large variety
of packages in the process.

<!-- TODO: still accurate? -->

For a breakdown of the absolute numbers of packages on systems by weekday and
month of the year instead of over time, see the Appendix below.

An interesting trend is visible toward the end of the timeline window, with a
rapid decline in package numbers per user. It is still too early to tell whether
this is just variability or an actual trend in the data.

Beyond pure installation numbers, let's take a look at the actual top-installed
packages on users' systems.

@@ -226,15 +233,16 @@ The top packages are unsurprisingly
the `base-system` and `xtools` packages, followed by `wget`, `htop` and
`rsync`.[^popcorn-removal]

[^popcorn-removal]:
    I have removed the PopCorn package itself from the data.
    Funnily enough, since _everybody_ who is represented in the data has to have
    PopCorn installed or the data wouldn't be collected in the first place, a
    naive extrapolation from the collected data would suggest that more people
    have PopCorn installed than `base-system`. Of course, viewed over the majority
    of Void Linux installations this is hogwash. We have the absolute numbers and
    only around 150 people have ever had PopCorn installed. But it nicely
    illustrates the danger of over-interpreting the results before us without
    also reflecting on sample bias.

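To make the exclusion concrete, here is a minimal sketch of ranking packages by how many reports contain them, with PopCorn dropped beforehand. It assumes each report can be reduced to a set of installed package names per system. The helper `top_packages`, the excluded name `"PopCorn"` and the sample reports are made up for illustration and are not the code behind the plots in this article.

```python
from collections import Counter


def top_packages(reports, n=3, exclude=("PopCorn",)):
    """Rank packages by the number of reports that contain them.

    Hypothetical helper: `reports` is assumed to be an iterable of sets of
    package names, one set per reporting system. Names in `exclude` are
    skipped, because every reporting system has the reporting tool
    installed and it would otherwise trivially top the list.
    """
    counts = Counter()
    for packages in reports:
        counts.update(p for p in packages if p not in exclude)
    return counts.most_common(n)


# Made-up sample reports, for illustration only.
reports = [
    {"PopCorn", "base-system", "xtools", "wget"},
    {"PopCorn", "base-system", "htop"},
    {"PopCorn", "base-system", "xtools"},
]
print(top_packages(reports))
```

Calling `top_packages(reports, exclude=())` instead counts PopCorn in every single sample report, which is exactly the sampling artifact the footnote describes.
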
In my opinion the list of top packages reflects the technical audience of Void
Linux and does not hold too many surprises. Almost everyone uses `socklog` and

@@ -278,13 +286,14 @@ On the Y-axis we see the amount of packages while on the X-axis we see the amoun
What this means is that we see _how often_ packages tend to be installed,
and where the majority of packages are grouped.[^density-approximation]

[^density-approximation]:
    In the package density count above, since we are
    accumulating over the absolute numbers of all installations of all users, the
    overall high numbers are really _high_, i.e. above 150,000. Since we are
    sorting the package counts into a finite number of bins to make visualizing them
    possible, the lowest bin overshoots the 0-mark and we get an estimate of
    negative installation counts. Of course, this is not possible: no package in the
    data has been installed a negative number of times --- to my knowledge!

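The overshoot below zero can be reproduced with a small sketch. It assumes, as one possible explanation, that the plotting tool centers its bins on round values instead of starting them there, so the first bin reaches half a bin width below the smallest count. The plotting code is not shown in this section, and `centered_bin_edges`, `histogram` and the installation counts below are hypothetical.

```python
import math
from bisect import bisect_right


def centered_bin_edges(values, width):
    """Edges for equal-width bins whose first *center* sits on min(values).

    Hypothetical helper mimicking a plotting tool that centers bins on
    round values: the first bin then starts half a width below the
    smallest value, i.e. below 0 for non-negative counts.
    """
    lo, hi = min(values), max(values)
    # Enough bins that the last upper edge lies strictly above `hi`.
    n_bins = math.floor((hi - lo) / width + 0.5) + 1
    first_edge = lo - width / 2
    return [first_edge + i * width for i in range(n_bins + 1)]


def histogram(values, edges):
    """Count values per half-open bin [edges[i], edges[i + 1])."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        counts[bisect_right(edges, v) - 1] += 1
    return counts


# Made-up per-package installation counts; none of them is negative.
installs = [0, 1, 1, 2, 3, 5, 8, 12, 25, 40]
edges = centered_bin_edges(installs, width=10)
print(edges[0], histogram(installs, edges))  # -5.0 [5, 3, 0, 1, 1]
```

No count is negative, yet the first bin spans -5 to 5, so any curve drawn through the bin centers and edges starts below zero.
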
_Many_ packages are installed 0 to 10 times.
Some packages are installed above 10 times,

@@ -318,21 +327,72 @@ packages between eleven and 20 installations, and
`python f"{get_num(twenty_thirty):,}"` packages between 21 and 30 installations.
`python f"{get_num(thirty_plus):,}"` packages have over 30 installations.

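The counts in the paragraph above come from the article's own `get_num` helper and range variables, which are not shown in this section. A rough plain-Python equivalent, assuming inclusive ranges of 0 to 10, 11 to 20, 21 to 30 and over 30 installations, and a made-up mapping from package name to installation count, could look like this:

```python
def bucket_installs(installs_per_package):
    """Count packages per installation-count range.

    Hypothetical helper; the boundaries are assumed to be inclusive:
    0-10, 11-20, 21-30, and over 30 installations.
    """
    buckets = {"0-10": 0, "11-20": 0, "21-30": 0, "30+": 0}
    for n in installs_per_package.values():
        if n <= 10:
            buckets["0-10"] += 1
        elif n <= 20:
            buckets["11-20"] += 1
        elif n <= 30:
            buckets["21-30"] += 1
        else:
            buckets["30+"] += 1
    return buckets


# Made-up package names and installation counts, for illustration only.
counts = bucket_installs(
    {"pkg-a": 120, "pkg-b": 95, "pkg-c": 3, "pkg-d": 15, "pkg-e": 30, "pkg-f": 0}
)
# The `:,` format spec adds thousands separators, as in the inline code above.
print(f"{counts['0-10']:,} packages are installed 0 to 10 times")
```
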
For now, these are all the explorations I have done on the collected package data.
I find them interesting, especially the evolution of package installations over time
and per user, as well as the glimpse of the most-used packages in the sample.

But there are yet more things to explore in the statistics overall.

## Kernel Analysis

Beyond package numbers, the data also contain information about the Linux
kernels used by Void Linux users.
The files report the exact kernel versions users are running, including the
major and minor versions and any suffixes.

For example, there are many reports containing the `4.19.0-9-amd64` kernel, or
some containing the `6.1.53-1-lts` kernel, or `6.11.2-asahi-6.11.2-1_4`. These
are 'extraordinary' kernels in my opinion, and they do not follow clear naming
patterns. For the purposes of the following visualizations I have cut off any
such suffixes, looking only at the versioning of the main kernels themselves.

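One way to do this suffix stripping is a regular expression that keeps only the leading numeric version. This is a sketch under that assumption, not the cleaning code actually used for the plots; `strip_kernel_suffix` is a hypothetical name.

```python
import re
from collections import Counter

# Leading "major.minor" with an optional ".patch"; everything after it
# (distribution revision, flavor or vendor tags) counts as a suffix.
KERNEL_VERSION = re.compile(r"^(\d+)\.(\d+)(?:\.(\d+))?")


def strip_kernel_suffix(release):
    """Return (major, minor, patch) for a kernel release string, or None."""
    match = KERNEL_VERSION.match(release)
    if match is None:
        return None
    major, minor, patch = match.groups()
    # Releases like "5.4" carry no patch level; treat it as 0.
    return int(major), int(minor), int(patch or 0)


releases = ["4.19.0-9-amd64", "6.1.53-1-lts", "6.11.2-asahi-6.11.2-1_4"]
versions = [strip_kernel_suffix(r) for r in releases]
print(versions)  # [(4, 19, 0), (6, 1, 53), (6, 11, 2)]

# Tally reports per major version, skipping unparseable strings.
majors = Counter(v[0] for v in versions if v is not None)
```

Tallying the major versions like this gives absolute per-version totals of the kind shown in the next plot.
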
Let's start by looking at the prevalence of the different major versions.

```{python}
from notebooks.popcorn import plt_kernel_versions
pplot(plt_kernel_versions)
```

This plot accumulates the three major versions used during the collected timeline
as absolute numbers over the _whole_ time.

When looking at the kernel versions used, we see a very strong jump between major
kernel version 4 and major kernel version 5, with version 4 being significantly
less prevalent in the data.

Of course, this makes sense from a release standpoint: kernel version 5.0 was
released in March 2019, less than a year after the start of data collection.[^kernel-releases]
Additionally, as we established above, this was also the time of the fewest
unique data reports, so the absolute number of kernel 4 reports is smaller
still.

[^kernel-releases]:
    Data collection began in May 2018.
    All information on the kernel release timelines is taken
    from the nicely comprehensive _Linux Kernel Version History_ Wikipedia page:
    <https://en.wikipedia.org/wiki/Linux_kernel_version_history>.

Kernel version 5 still accounts for the largest share of reported kernel versions,
but just barely. This makes sense since major version 6.0 was released in October 2022.
Version 5 was thus the latest major kernel for roughly three and a half years
(March 2019 to October 2022), while version 6 has now been the latest for almost
exactly three years.

Again, we have to keep the curve of unique installations in mind for absolute numbers like these:
kernel 5 was released right as the massive increase in unique Void Linux installation reports happened,
and kernel 6 right after the report slump.
This, in all likelihood, accounts for the slight imbalance between the numbers,
which will shift over the coming months.

Just as with the kernel suffixes, for this analysis we also had to exclude
`python f"{kernel_df_v99.select(pl.len()).item():,}"` rows which were apparently from the
future --- as they were running variations of major kernel version 99. In all
likelihood there is a custom-compiled kernel out there which reports its own
major version as 99. The strange version starts appearing on
`python f"{kernel_df_v99.select('date').row(0)[0]}"` and shows up all the way until
`python f"{kernel_df_v99.select('date').row(-1)[0]}"`.

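In plain Python, the exclusion and the first and last sighting of the bogus version could be computed as below. This is a hypothetical sketch: it assumes rows of `(report_date, major_version)` pairs instead of the article's polars frame, and it sorts the dates explicitly rather than relying on row order. The rows are made up.

```python
from datetime import date


def split_future_kernels(rows, bogus_major=99):
    """Drop rows reporting kernel major version `bogus_major`.

    Hypothetical helper: `rows` is assumed to be a list of
    (report_date, major_version) pairs. Returns the kept rows, the
    number of excluded rows, and the first and last date the bogus
    version was seen (both None if it never shows up).
    """
    kept = [row for row in rows if row[1] != bogus_major]
    bogus_dates = sorted(day for day, major in rows if major == bogus_major)
    if not bogus_dates:
        return kept, 0, None, None
    return kept, len(bogus_dates), bogus_dates[0], bogus_dates[-1]


# Made-up rows, deliberately out of order, for illustration only.
rows = [
    (date(2023, 3, 6), 99),
    (date(2023, 1, 2), 6),
    (date(2023, 1, 9), 99),
    (date(2023, 1, 9), 5),
]
kept, n_excluded, first_seen, last_seen = split_future_kernels(rows)
print(f"{n_excluded:,} rows excluded, seen from {first_seen} to {last_seen}")
```
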
Let's turn to the actual adoption of kernels over time in the next visualization.

```{python}
from notebooks.popcorn import plt_kernel_timeline

@@ -356,16 +416,31 @@ last_kernel5: date = weekly_kernel_df.filter(pl.col("major_ver") == "5")[-1][
].item()
```

A timeline analysis of the prevalent kernels in the data shows that new major
kernel versions are adopted relatively rapidly, with the majority of switches
occurring at roughly the same time.

This change is especially stark between major kernel versions 5 and 6, which
seem to have traded places in usage almost overnight. A reasonable speculation
for this rapid switch is that the `linux` kernel meta-package was pointed at
the new version at that time, so each system update pulled in the new kernel.

The first time that major version 5 of the kernel shows up is on
`python f"{first_kernel5}"`. From here, it took a long time for the last of the
version 4 kernels to disappear. Interestingly, this roughly coincides with the
big switch between major versions 5 and 6. The last time a major version 4 kernel
is seen is on `python f"{last_kernel4}"`, while the last major version 5 kernels
still pop up. It would seem, then, that the people still running kernel version 4
used the opportunity of everybody switching to the stable version of 6 to also
upgrade their machines.

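In the article, `first_kernel5` and `last_kernel4` are presumably computed from the polars `weekly_kernel_df` frame in the notebook code above. A plain-Python sketch of the same idea only needs to track the earliest and latest sighting per major version; it assumes `(when, major_version)` pairs and uses made-up week numbers, and `first_last_seen` is a hypothetical name.

```python
def first_last_seen(rows):
    """Map each major version to the (first, last) time it was reported.

    `rows` is assumed to be an iterable of (when, major_version) pairs in
    any order; `when` can be anything comparable, such as a date or a
    week number.
    """
    seen = {}
    for when, major in rows:
        if major in seen:
            first, last = seen[major]
            seen[major] = (min(first, when), max(last, when))
        else:
            seen[major] = (when, when)
    return seen


# Made-up (week number, major version) pairs, for illustration only.
rows = [(3, 4), (1, 4), (2, 5), (9, 4), (8, 6), (12, 5), (12, 6)]
spans = first_last_seen(rows)
first_kernel5 = spans[5][0]
last_kernel4 = spans[4][1]
print(first_kernel5, last_kernel4)  # 2 9
```
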
If we cautiously extrapolate a little from the data we have, it seems
plausible that the last remnants of kernel version 5 will disappear
around May or June 2026. Much of course depends on the upstream kernel release
windows and the stability of the releases themselves. But barring any major
upheavals in the kernel releases (of a magnitude like the removal of
[bcachefs](https://en.wikipedia.org/wiki/Bcachefs)) or major stability issues,
this seems a reasonable assumption to me.

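One simple way to make an extrapolation like this concrete is a straight-line least-squares fit of the kernel 5 share over time, solved for the point where it reaches zero. This is only an illustration under an explicitly assumed linear model; it is not necessarily how the May or June 2026 estimate was reached, and `zero_crossing` and the shares below are made up.

```python
def zero_crossing(xs, ys):
    """Fit y = a + b * x by least squares and return the x where y reaches 0.

    Returns None when the fitted line is not decreasing, since it then
    never reaches zero going forward. Needs at least two distinct xs.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    if slope >= 0:
        return None
    intercept = mean_y - slope * mean_x
    return -intercept / slope


# Made-up shares of kernel 5 among weekly reports, by week index.
weeks = [0, 1, 2, 3]
share_v5 = [0.40, 0.35, 0.30, 0.25]
print(zero_crossing(weeks, share_v5))
```

With these made-up numbers the line reaches zero around week 8. Adoption curves often flatten out near zero as a few stragglers remain, so a straight-line fit is, if anything, likely to be on the optimistic side.
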
## Appendix: Odds and Ends
