From 2f6a7c9af62a10c13f0e7c8b81988816b577af75 Mon Sep 17 00:00:00 2001
From: Marty Oehme
Date: Thu, 20 Nov 2025 17:12:01 +0100
Subject: [PATCH] Add kernel longevity section

---
 README.md |  16 ++++-----
 index.md  | 104 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 111 insertions(+), 9 deletions(-)

diff --git a/README.md b/README.md
index 8049430..4d75efc 100644
--- a/README.md
+++ b/README.md
@@ -10,29 +10,29 @@ Some interesting questions to pose
 
-1. Long-term growth
+1. [ ] Long-term growth
 How many unique machines download packages per day, and is the growth linear, exponential, or flattening?
 
-2. Weekly rhythm
+2. [x] Weekly rhythm
 Does the number of unique downloaders follow a weekly cycle (week-day peaks vs. weekend dips)?
 
-3. Kernel lag
+3. [ ] Kernel lag
 On average, how many days elapse between a new kernel being published upstream and the first time it appears in the logs?
 *(Group kernels by major.minor, compute min(date) per kernel, compare with its official release date.)*
 
-4. Kernel longevity
+4. [x] Kernel longevity
 Which kernel versions have the longest total lifespan (first → last appearance) and which ones disappear fastest?
 
-5. Top packages
+5. [ ] Top packages
 Which five packages have the highest median daily download count across the whole period?
 
-6. Version stickiness
+6. [ ] Version stickiness
 For packages with ≥10 versions, what fraction of users stay on the older version at least one week after a newer version becomes available?
 
-7. Big-bang updates
+7. [ ] Big-bang updates
 Are there days when the total number of package downloads is >3σ above the 30-day rolling mean (indicating a bulk-update campaign)?
 
-8. File-size vs. activity
+8. [ ] File-size vs. activity
 Is there a correlation between the size of the daily JSON snapshot and the number of unique downloaders?
 *(Large files might mirror repository-wide rebuilds.)*
diff --git a/index.md b/index.md
index ccafb0e..aba21b4 100644
--- a/index.md
+++ b/index.md
@@ -220,7 +220,7 @@ Indeed, there is very little variation between the week days (Mon-Fri, 1-5) and
 In fact, the only day on which repository interactions rise a little seems to be Tuesday, which is surprising.
 
-Well, corroborate this with my own statistics!
+Well, let's corroborate this with my own statistics!
 
 I use [`atuin`](https://atuin.sh/) to track my shell history, which can be queried with `atuin history list`.
 
@@ -312,4 +312,106 @@ Curiously, I can also glean from the list above that I have indeed _never_ updat
 
 ## Kernel longevity
 
+Another question that I find quite interesting is this:
+How long were the various kernel versions in use?
+Or, more precisely, which versions have the longest 'life-spans' in the repository, and which the shortest?
+But first, let's investigate the overall download numbers per kernel.
+
+For this we'll use the `kernels.csv` file, so let's take a look.
+
+| date | kernel | downloads |
+| --- | --- | --- |
+| 2025-11-20 | 6.17.7_1 | 6 |
+| 2025-11-20 | 6.17.8-tkg-bore-alderlake_1 | 1 |
+| 2025-11-20 | 6.17.8-tkg-bore-zen_1 | 1 |
+| 2025-11-20 | 6.17.8_1 | 12 |
+| 2025-11-20 | 6.6.111_1 | 1 |
+| 2025-11-20 | 6.6.116_1 | 3 |
+| 2025-11-20 | 6.6.65_1 | 1 |
+| 2025-11-20 | 6.6.87.2-microsoft-standard-WSL2 | 1 |
+
+This file is almost perfectly usable as-is, but I am only interested in the actual kernel versions,
+that is, the first three dot-separated version components (e.g. `6.17.7`).
+I don't care about the Void-internal release version (the `_1`),
+nor about the weird custom-compiled kernels people are using (e.g. `tkg-bore-alderlake_1`).
+But since I also don't want to drop them from the data outright,
+we'll do a little regex string substitution:
+
+```nu
+mkdir outputs
+open input/popcorn/output/kernels.csv |
+    update kernel { str replace --regex '^(\d+\.\d+\.\d+).*' "$1"} |
+    group-by --to-table kernel |
+    save outputs/kernels_standardized.json
+```
+
+Here we remove anything that is not part of the version string by essentially replacing the whole string with just the version itself.
+This process takes a while for the more than 57,000 lines contained in the file,
+so I am saving an intermediate output version that I'll use for the next steps.
+
+We'll start by summing up the absolute numbers of kernel uses per version,
+of which we can keep the top 5:
+
+```nu
+open outputs/kernels_standardized.json | update items { $in.downloads | math sum } | sort-by items | last 5
+```
+
+This shows us that:
+
+| kernel | items |
+| --- | --- |
+| 6.1.31 | 1340 |
+| 5.8.18 | 1674 |
+| 6.12.41 | 1744 |
+| 5.13.19 | 2500 |
+| 6.3.13 | 2624 |
+
+The kernel that was run the most in terms of _absolute numbers_ was kernel version 6.3.13,
+with 5.13.19 following relatively closely behind.
+The other kernels trail further behind, with the next one having roughly 750 fewer uses.
+
+But I originally wanted to know about the _longest lived_ kernel in these data,
+so how do we extract that?
+
+We'll take the grouped JSON file and do an aggregation similar to the one above,
+except that we create new columns for the first (`math min`) and last (`math max`) appearance of each kernel version.
+Then we can take those two and,
+once they are converted to type `datetime`,
+simply subtract one from the other to get the total `duration` that the respective kernel appeared in the data.
+
+```nu
+open outputs/kernels_standardized.json |
+    insert first { $in.items.date | math min } |
+    insert last { $in.items.date | math max } |
+    reject items |
+    into datetime first last |
+    insert delta {$in.last - $in.first } |
+    sort-by delta |
+    last 10
+```
+
+By sorting on the delta value and keeping the last ones we have essentially filtered for the 'longest'-lived kernel versions,
+leaving us with the following:
+
+| kernel | first | last | delta |
+| --- | --- | --- | --- |
+| 6.1.6 | Mon, 16 Jan 2023 00:00:00 +0100 (2 years ago) | Sat, 5 Apr 2025 00:00:00 +0200 (7 months ago) | 115wk 4day 23hr |
+| 4.19.59 | Wed, 17 Jul 2019 00:00:00 +0200 (6 years ago) | Fri, 28 Jan 2022 00:00:00 +0100 (3 years ago) | 132wk 2day 1hr |
+| 5.10.9 | Fri, 22 Jan 2021 00:00:00 +0100 (4 years ago) | Tue, 12 Sep 2023 00:00:00 +0200 (2 years ago) | 137wk 3day 23hr |
+| 5.19.14 | Thu, 13 Oct 2022 00:00:00 +0200 (3 years ago) | Tue, 5 Aug 2025 00:00:00 +0200 (3 months ago) | 146wk 5day |
+| 5.15.36 | Fri, 29 Apr 2022 00:00:00 +0200 (3 years ago) | Mon, 3 Mar 2025 00:00:00 +0100 (8 months ago) | 148wk 3day 1hr |
+| 5.13.8 | Fri, 6 Aug 2021 00:00:00 +0200 (4 years ago) | Fri, 2 Aug 2024 00:00:00 +0200 (a year ago) | 156wk |
+| 5.13.10 | Sat, 14 Aug 2021 00:00:00 +0200 (4 years ago) | Sun, 15 Sep 2024 00:00:00 +0200 (a year ago) | 161wk 1day |
+| 5.12.13 | Sat, 26 Jun 2021 00:00:00 +0200 (4 years ago) | Mon, 23 Sep 2024 00:00:00 +0200 (a year ago) | 169wk 2day |
+| 5.11.22 | Fri, 21 May 2021 00:00:00 +0200 (4 years ago) | Sun, 22 Sep 2024 00:00:00 +0200 (a year ago) | 174wk 2day |
+| 5.2.13 | Sat, 7 Sep 2019 00:00:00 +0200 (6 years ago) | Sun, 7 Sep 2025 00:00:00 +0200 (2 months ago) | 313wk 1day |
+
+We can see that kernel version 5 in particular was long-lived,
+with version 5.2.13 being in use for 6 _years_ to the day.
+The exact nature of the time frame (September 7 to September 7) makes me think this may be some sort of automated installation.
+
+Without skipping ahead too much, this makes sense to me looking at the wider picture,
+as the `popcorn` statistics gathering was introduced in the middle of kernel 4's existence,
+and we are not yet anywhere near the end of the kernel 6 life-span,
+so version 5 probably had the most opportunity to have long-running installations.