An approach using nushell to analyse the Void Linux PopCorn statistics. A full write-up is available at https://martyoeh.me/blog/2025-11-20-nushell-popcorn-analysis.

Project

Dataset structure

  • All inputs (i.e. building blocks from other sources) are located in input/ (see the loading sketch below).
  • All custom code is located in code/.
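
The analysis sketches further down assume the raw snapshots have been flattened into a single table named $data. A minimal loading sketch, assuming one JSON snapshot per day under input/ named like 2025-11-20.json, each mapping package names to per-version unique-machine counts (the real PopCorn schema may differ):

```nushell
# Hypothetical loader: flatten daily PopCorn snapshots into one table with
# one row per (date, pkgname, version) and a `machines` count.
# The file layout and JSON shape are assumptions, not the verified schema.
let data = (
    ls input/*.json
    | each {|f|
        let day = ($f.name | path parse | get stem | into datetime)
        open $f.name                      # assumed: { pkgname: { version: machines } }
        | transpose pkgname versions
        | each {|p|
            $p.versions
            | transpose version machines
            | insert pkgname $p.pkgname
            | insert date $day
        }
        | flatten
    }
    | flatten
)
```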

Questions

Some interesting questions to pose about the data; a rough nushell sketch for each is given under Analysis sketches below.

  1. Long-term growth: How many unique machines download packages per day, and is the growth linear, exponential, or flattening?

  2. Weekly rhythm: Does the number of unique downloaders follow a weekly cycle (weekday peaks vs. weekend dips)?

  3. Kernel lag: On average, how many days elapse between a new kernel being published upstream and the first time it appears in the logs? (Group kernels by major.minor, compute min(date) per kernel, and compare with the official release date.)

  4. Kernel longevity: Which kernel versions have the longest total lifespan (first → last appearance), and which ones disappear fastest?

  5. Top packages: Which five packages have the highest median daily download count across the whole period?

  6. Version stickiness: For packages with ≥10 versions, what fraction of users stay on the older version at least one week after a newer version becomes available?

  7. Big-bang updates: Are there days when the total number of package downloads is >3σ above the 30-day rolling mean (indicating a bulk-update campaign)?

  8. File-size vs. activity: Is there a correlation between the size of the daily JSON snapshot and the number of unique downloaders? (Large files might mirror repository-wide rebuilds.)
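
Analysis sketches

The snippets below are rough nushell drafts, not tested against the real PopCorn data. They assume the $data table produced by the loading sketch above (one row per date, pkgname, version, with a machines count); every file name, column name, and package-name pattern in them is an assumption rather than a verified part of the dataset.

Q1, long-term growth: derive a per-day unique-machine count and average it per month to eyeball the growth curve. Using the per-day maximum across packages as a stand-in for "unique machines" is an assumption, since the snapshots report aggregate counts rather than machine IDs.

```nushell
# Q1: unique machines per day, averaged per month to inspect the growth curve.
# Proxy assumption: the per-day maximum machine count across all packages
# approximates the total number of reporting machines that day.
let daily = (
    $data
    | group-by {|r| $r.date | format date '%F'}
    | transpose day rows
    | each {|g| {date: $g.rows.0.date, machines: ($g.rows.machines | math max)} }
    | sort-by date
)
$daily
| insert month {|r| $r.date | format date '%Y-%m'}
| group-by month
| transpose month rows
| each {|g| {month: $g.month, avg_machines: ($g.rows.machines | math avg)} }
| sort-by month
```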
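
Q2, weekly rhythm: reuse the $daily table from the Q1 sketch and bucket it by weekday.

```nushell
# Q2: average unique machines per weekday.
$daily
| insert weekday {|r| $r.date | format date '%u %a'}   # 1 Mon .. 7 Sun
| group-by weekday
| transpose weekday rows
| each {|g| {weekday: $g.weekday, avg_machines: ($g.rows.machines | math avg)} }
| sort-by weekday
```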
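
Q3, kernel lag: this assumes Void kernel packages are named like linux6.6 and that upstream release dates live in a hand-maintained CSV (code/kernel-release-dates.csv is a hypothetical file, not part of the dataset).

```nushell
# Q3: lag between upstream kernel release and first appearance in the logs.
# Assumes a hypothetical CSV with columns `kernel` and `released`.
let releases = open code/kernel-release-dates.csv
$data
| where pkgname =~ '^linux\d+\.\d+$'
| group-by pkgname
| transpose kernel rows
| each {|g| {kernel: $g.kernel, first_seen: ($g.rows.date | math min)} }
| join $releases kernel
| insert lag_days {|r| ($r.first_seen - ($r.released | into datetime)) / 1day}
| sort-by lag_days
```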
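
Q4, kernel longevity: same kernel-name assumption as Q3; lifespan is simply last sighting minus first.

```nushell
# Q4: lifespan of each kernel version, first to last sighting.
$data
| where pkgname =~ '^linux\d+\.\d+$'
| group-by pkgname
| transpose kernel rows
| each {|g|
    let first = ($g.rows.date | math min)
    let last = ($g.rows.date | math max)
    {kernel: $g.kernel, first: $first, last: $last, lifespan_days: (($last - $first) / 1day)}
}
| sort-by --reverse lifespan_days
```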
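
Q5, top packages: sum machine counts across versions per package-day before taking the median, on the assumption that per-version rows should be combined into one daily figure per package.

```nushell
# Q5: five packages with the highest median daily download count.
$data
| group-by {|r| $"($r.pkgname)|($r.date | format date '%F')"}   # one bucket per package-day
| transpose key rows
| each {|g| {pkgname: $g.rows.0.pkgname, machines: ($g.rows.machines | math sum)} }
| group-by pkgname
| transpose pkgname rows
| each {|g| {pkgname: $g.pkgname, median_daily: ($g.rows.machines | math median)} }
| sort-by --reverse median_daily
| first 5
```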
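
Q6, version stickiness: the roughest of the sketches. "Older version" is approximated as "any version other than the newest one", and the probe day is matched by exact date equality; both are simplifications.

```nushell
# Q6: rough stickiness measure. For each package with >= 10 observed versions,
# probe the day exactly one week after each new version first appears and
# compute the share of machines still reporting any other version.
$data
| group-by pkgname
| transpose pkgname rows
| where {|g| ($g.rows.version | uniq | length) >= 10}
| each {|pkg|
    let firsts = (
        $pkg.rows
        | group-by version
        | transpose version rows
        | each {|v| {version: $v.version, first_seen: ($v.rows.date | math min)} }
        | sort-by first_seen
    )
    let shares = ($firsts | skip 1 | each {|new|
        let probe = ($pkg.rows | where date == ($new.first_seen + 1wk))
        if ($probe | is-empty) { null } else {
            let total = ($probe.machines | math sum)
            let on_other = ($probe | where version != $new.version | get machines | append 0 | math sum)
            if $total == 0 { null } else { $on_other / $total }
        }
    })
    {
        pkgname: $pkg.pkgname
        mean_stickiness: (if ($shares | is-empty) { null } else { $shares | math avg })
    }
}
| compact mean_stickiness
| sort-by --reverse mean_stickiness
```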
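
Q7, big-bang updates: a 31-day sliding window, where the first 30 days form the rolling baseline and the last day is tested against mean plus three standard deviations.

```nushell
# Q7: flag days whose total downloads exceed the rolling 30-day mean by 3 sigma.
let daily_total = (
    $data
    | group-by {|r| $r.date | format date '%F'}
    | transpose day rows
    | each {|g| {date: $g.rows.0.date, downloads: ($g.rows.machines | math sum)} }
    | sort-by date
)
$daily_total
| window 31    # 30 days of history plus the day under test
| each {|w|
    let hist = ($w | first 30 | get downloads)
    let today = ($w | last)
    if $today.downloads > (($hist | math avg) + 3 * ($hist | math stddev)) { $today }
}
```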
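
Q8, file-size vs. activity: nushell has no built-in correlation command that I know of, so this defines a small Pearson helper by hand and joins the snapshot file sizes against $daily from the Q1 sketch.

```nushell
# Q8: Pearson correlation between daily snapshot size and unique machines.
def correlate [xs: list<number>, ys: list<number>] {
    let mx = ($xs | math avg)
    let my = ($ys | math avg)
    let dx = ($xs | each {|x| $x - $mx})
    let dy = ($ys | each {|y| $y - $my})
    let num = ($dx | zip $dy | each {|p| $p.0 * $p.1} | math sum)
    let den = (
        (($dx | each {|v| $v * $v} | math sum) * ($dy | each {|v| $v * $v} | math sum))
        | math sqrt
    )
    $num / $den
}
let sizes = (
    ls input/*.json
    | each {|f| {date: ($f.name | path parse | get stem | into datetime), bytes: ($f.size | into int)} }
)
let joined = ($daily | join $sizes date)
correlate $joined.bytes $joined.machines
```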