Add kernel longevity section

Marty Oehme 2025-11-20 17:12:01 +01:00
parent 6005f140f1
commit 2f6a7c9af6
Signed by: Marty
GPG key ID: 4E535BC19C61886E
2 changed files with 111 additions and 9 deletions


@ -10,29 +10,29 @@
Some interesting questions to pose
1. [ ] Long-term growth
   How many unique machines download packages per day, and is the growth linear, exponential, or flattening?
2. [x] Weekly rhythm
   Does the number of unique downloaders follow a weekly cycle (week-day peaks vs. weekend dips)?
3. [ ] Kernel lag
   On average, how many days elapse between a new kernel being published upstream and the first time it appears in the logs?
   *(Group kernels by major.minor, compute min(date) per kernel, compare with its official release date.)*
4. [x] Kernel longevity
   Which kernel versions have the longest total lifespan (first → last appearance) and which ones disappear fastest?
5. [ ] Top packages
   Which five packages have the highest median daily download count across the whole period?
6. [ ] Version stickiness
   For packages with ≥10 versions, what fraction of users stay on the older version at least one week after a newer version becomes available?
7. [ ] Big-bang updates
   Are there days when the total number of package downloads is >3σ above the 30-day rolling mean (indicating a bulk-update campaign)?
8. [ ] File-size vs. activity
   Is there a correlation between the size of the daily JSON snapshot and the number of unique downloaders?
   *(Large files might mirror repository-wide rebuilds.)*

index.md

@ -220,7 +220,7 @@ Indeed, there is very little variation between the week days (Mon-Fri, 1-5) and
In fact, the only day on which repository interactions rise a little seems to be Tuesday,
which is surprising.
Well, let's corroborate this with my own statistics!
I use [`atuin`](https://atuin.sh/) to track my shell history,
which can be queried with `atuin history list`.
@ -312,4 +312,106 @@ Curiously, I can also glean from the list above that I have indeed _never_ updat
## Kernel longevity
Another question that I find quite interesting is this:
How long were the various kernel versions in use?
Or, more precisely, which kernel versions have the longest 'life-spans' in the repository data, and which have the shortest?
But first, let's investigate the overall download numbers per kernel.
For this we'll use the `kernels.csv` file, so let's take a look.
| date | kernel | downloads |
| --- | --- | --- |
| 2025-11-20 | 6.17.7_1 | 6 |
| 2025-11-20 | 6.17.8-tkg-bore-alderlake_1 | 1 |
| 2025-11-20 | 6.17.8-tkg-bore-zen_1 | 1 |
| 2025-11-20 | 6.17.8_1 | 12 |
| 2025-11-20 | 6.6.111_1 | 1 |
| 2025-11-20 | 6.6.116_1 | 3 |
| 2025-11-20 | 6.6.65_1 | 1 |
| 2025-11-20 | 6.6.87.2-microsoft-standard-WSL2 | 1 |
This file is almost perfectly usable as-is, but I am only interested in the actual kernel versions,
so the first three version components (e.g. `6.17.7`).
I don't care about the Void-internal package revision (the `_1`),
nor about the suffixes of the weird custom-compiled kernels people are using (e.g. `-tkg-bore-alderlake_1`).
But since I also don't want to drop those entries outright,
we'll do a little regex string substitution:
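```nu
mkdir outputs
# Reduce every kernel string to its major.minor.patch version
# (dropping the `_1` revision and any custom suffix),
# then group all rows by that standardized version.
open input/popcorn/output/kernels.csv |
update kernel { str replace --regex '^(\d+\.\d+\.\d+).*' '$1' } |
group-by --to-table kernel |
save outputs/kernels_standardized.json
```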
Here we strip anything that is not part of the version by replacing the whole kernel string with just its captured version number,
and then group all rows by that version.
This process takes a while for the more than 57,000 lines contained in the file,
so I am saving an intermediate output that I'll use for the next steps.
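
To double-check what the substitution does to the more unusual entries, here is a small sketch that applies the pattern `^(\d+\.\d+\.\d+).*` (dots escaped so they only match literal dots) to the eight kernel strings from the excerpt above.
The list is typed in by hand, and the `raw`/`kernel` column names are just for display:

```nu
# The eight kernel strings from the kernels.csv excerpt above
let raw_kernels = [
    "6.17.7_1"
    "6.17.8-tkg-bore-alderlake_1"
    "6.17.8-tkg-bore-zen_1"
    "6.17.8_1"
    "6.6.111_1"
    "6.6.116_1"
    "6.6.65_1"
    "6.6.87.2-microsoft-standard-WSL2"
]

# Capture major.minor.patch; `.*` swallows whatever follows
# (the `_1` revision, a custom build name, WSL's fourth component).
let mapping = $raw_kernels | each { |k|
    {raw: $k, kernel: ($k | str replace --regex '^(\d+\.\d+\.\d+).*' '$1')}
}

$mapping
```

All three 6.17.8 variants collapse into a plain `6.17.8`, and the WSL kernel loses its fourth version component (`.2`) along with its suffix.
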
We'll start by summing up the absolute number of kernel uses per version,
of which we keep the top 5:
```nu
# Sum the downloads of each kernel version and keep the five highest totals
open outputs/kernels_standardized.json | update items { $in.downloads | math sum } | sort-by items | last 5
```
This shows us:
| kernel | items |
| --- | --- |
| 6.1.31 | 1340 |
| 5.8.18 | 1674 |
| 6.12.41 | 1744 |
| 5.13.19 | 2500 |
| 6.3.13 | 2624 |
The kernel that was run the most in terms of _absolute numbers_ was version 6.3.13,
with 5.13.19 following relatively closely behind.
The other kernels trail further behind, with the next one (6.12.41) having roughly 750 fewer uses.
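
To make the shape of this aggregation a bit more tangible, here is a sketch that runs the same standardize, group and sum steps on just the eight rows from the `kernels.csv` excerpt above, built in memory instead of read from disk.
It assumes a recent Nushell release in which `group-by --to-table` names the key column after the grouping column (here `kernel`), as in the output above:

```nu
# The eight rows from the kernels.csv excerpt, typed in as an in-memory table
let sample = [
    [date, kernel, downloads];
    ["2025-11-20", "6.17.7_1", 6]
    ["2025-11-20", "6.17.8-tkg-bore-alderlake_1", 1]
    ["2025-11-20", "6.17.8-tkg-bore-zen_1", 1]
    ["2025-11-20", "6.17.8_1", 12]
    ["2025-11-20", "6.6.111_1", 1]
    ["2025-11-20", "6.6.116_1", 3]
    ["2025-11-20", "6.6.65_1", 1]
    ["2025-11-20", "6.6.87.2-microsoft-standard-WSL2", 1]
]

# Standardize the version, group the rows per version,
# replace each group's rows by their summed downloads and sort ascending
let totals = (
    $sample |
    update kernel { str replace --regex '^(\d+\.\d+\.\d+).*' '$1' } |
    group-by --to-table kernel |
    update items { $in.downloads | math sum } |
    sort-by items
)

$totals
```

The three 6.17.8 variants (plain, `tkg-bore-alderlake`, `tkg-bore-zen`) end up in a single group totalling 14 downloads, which is exactly why standardizing before grouping matters.
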
But I originally wanted to know about the _longest-lived_ kernel in these data,
so how do we extract that?
We'll take the grouped `json` file and do a similar aggregation as above,
except that we create new columns for the first (`math min`) and last (`math max`) appearance of each kernel version.
Then we convert those two columns with `into datetime`,
so that we can simply subtract one from the other to get the total `duration` over which the respective kernel appeared in the data.
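```nu
# For each kernel version: first and last day it shows up in the logs,
# converted to datetimes so that their difference is a duration.
open outputs/kernels_standardized.json |
insert first { $in.items.date | math min } |
insert last { $in.items.date | math max } |
reject items |
into datetime first last |
insert delta { $in.last - $in.first } |
sort-by delta |
last 10
```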
By sorting on the delta value and keeping the last ten rows, we have essentially filtered for the longest-lived kernel versions,
leaving us with the following:
| kernel | first | last | delta |
| --- | --- | --- | --- |
| 6.1.6 | Mon, 16 Jan 2023 00:00:00 +0100 (2 years ago) | Sat, 5 Apr 2025 00:00:00 +0200 (7 months ago) | 115wk 4day 23hr |
| 4.19.59 | Wed, 17 Jul 2019 00:00:00 +0200 (6 years ago) | Fri, 28 Jan 2022 00:00:00 +0100 (3 years ago) | 132wk 2day 1hr |
| 5.10.9 | Fri, 22 Jan 2021 00:00:00 +0100 (4 years ago) | Tue, 12 Sep 2023 00:00:00 +0200 (2 years ago) | 137wk 3day 23hr |
| 5.19.14 | Thu, 13 Oct 2022 00:00:00 +0200 (3 years ago) | Tue, 5 Aug 2025 00:00:00 +0200 (3 months ago) | 146wk 5day |
| 5.15.36 | Fri, 29 Apr 2022 00:00:00 +0200 (3 years ago) | Mon, 3 Mar 2025 00:00:00 +0100 (8 months ago) | 148wk 3day 1hr |
| 5.13.8 | Fri, 6 Aug 2021 00:00:00 +0200 (4 years ago) | Fri, 2 Aug 2024 00:00:00 +0200 (a year ago) | 156wk |
| 5.13.10 | Sat, 14 Aug 2021 00:00:00 +0200 (4 years ago) | Sun, 15 Sep 2024 00:00:00 +0200 (a year ago) | 161wk 1day |
| 5.12.13 | Sat, 26 Jun 2021 00:00:00 +0200 (4 years ago) | Mon, 23 Sep 2024 00:00:00 +0200 (a year ago) | 169wk 2day |
| 5.11.22 | Fri, 21 May 2021 00:00:00 +0200 (4 years ago) | Sun, 22 Sep 2024 00:00:00 +0200 (a year ago) | 174wk 2day |
| 5.2.13 | Sat, 7 Sep 2019 00:00:00 +0200 (6 years ago) | Sun, 7 Sep 2025 00:00:00 +0200 (2 months ago) | 313wk 1day |
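
The odd hour remainders in the `delta` column (`23hr`, `1hr`) are a daylight saving time artifact: the dates were read as local midnight, so a winter date carries a +0100 offset and a summer date +0200, and their difference is one hour off from whole days.
A quick check with two rows from the table, using explicit offsets so that the result does not depend on the time zone of the machine running it:

```nu
# 6.1.6: first seen in winter (+01:00), last seen in summer (+02:00)
let winter_to_summer = ('2025-04-05T00:00:00+02:00' | into datetime) - ('2023-01-16T00:00:00+01:00' | into datetime)

# 5.2.13: both dates in summer (+02:00), so only whole days remain
let summer_to_summer = ('2025-09-07T00:00:00+02:00' | into datetime) - ('2019-09-07T00:00:00+02:00' | into datetime)

print $winter_to_summer   # 115wk 4day 23hr (810 days minus one hour)
print $summer_to_summer   # 313wk 1day (2192 days)
```
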
We can see that especially the 5.x series was long-lived, accounting for eight of the ten longest-lived versions,
with version 5.2.13 appearing in the data for exactly six _years_.
The suspiciously exact time frame (7 September 2019 to 7 September 2025) makes me think this may be some sort of automated installation.
Without skipping ahead too much, this makes sense to me looking at the wider picture,
as the `popcorn` statistics gathering was introduced in the middle of the 4.x series' existence,
and we are not yet anywhere near the end of the 6.x series' life-span,
so the 5.x series probably had the most opportunity to accumulate long-running installations.
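
To back up the observation about the 5.x series, here is a small tally of the major series among the ten longest-lived versions listed above.
The version strings are typed in by hand from the table, not re-derived from the data:

```nu
# The ten longest-lived kernel versions from the table above
let longest = [
    "6.1.6" "4.19.59" "5.10.9" "5.19.14" "5.15.36"
    "5.13.8" "5.13.10" "5.12.13" "5.11.22" "5.2.13"
]

# Take the major version (the part before the first dot) and count occurrences
let per_series = $longest | each { |k| $k | split row '.' | first } | uniq --count

$per_series
```

Eight of the ten are 5.x kernels, with a single 4.x and a single 6.x entry.
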