Project
Dataset structure
- All inputs (i.e. building blocks from other sources) are located in inputs/.
- All custom code is located in code/.
Questions
Some interesting questions to pose
- Long-term growth: How many unique machines download packages per day, and is the growth linear, exponential, or flattening?
- Weekly rhythm: Does the number of unique downloaders follow a weekly cycle (week-day peaks vs. weekend dips)?
- Kernel lag: On average, how many days elapse between a new kernel being published upstream and the first time it appears in the logs? (Group kernels by major.minor, compute min(date) per kernel, compare with its official release date.)
- Kernel longevity: Which kernel versions have the longest total lifespan (first → last appearance) and which ones disappear fastest?
- Top packages: Which five packages have the highest median daily download count across the whole period?
- Version stickiness: For packages with ≥10 versions, what fraction of users stay on the older version at least one week after a newer version becomes available?
- Big-bang updates: Are there days when the total number of package downloads is >3σ above the 30-day rolling mean (indicating a bulk-update campaign)?
- File-size vs. activity: Is there a correlation between the size of the daily JSON snapshot and the number of unique downloaders? (Large files might mirror repository-wide rebuilds.)
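
As a starting point, three of the questions above (Weekly rhythm, Kernel lag, Big-bang updates) could be approached roughly as sketched below. This is a minimal, standard-library-only sketch, not the project's own analysis code. The layout of the daily JSON snapshots is not described here, so the functions take plain Python structures (a list of daily totals, a `{date: count}` mapping, `(date, kernel)` pairs) that would first have to be extracted from the snapshots. All function and parameter names are hypothetical, and the choice to compare each day against the preceding 30 days (excluding the day itself) is an assumption about how the rolling baseline should be defined.

```python
from datetime import date, timedelta
from statistics import mean, pstdev


def flag_spikes(counts, window=30, n_sigma=3.0):
    """Return indices of days whose total exceeds the mean of the preceding
    `window` days by more than `n_sigma` standard deviations.

    Assumption: the window is trailing and excludes the current day, so a
    spike does not inflate its own baseline.
    """
    spikes = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        mu = mean(history)
        sigma = pstdev(history)  # population std of the window
        if counts[i] > mu + n_sigma * sigma:
            spikes.append(i)
    return spikes


def mean_by_weekday(daily):
    """Average a {date: count} mapping per weekday (0 = Monday, 6 = Sunday)."""
    buckets = {d: [] for d in range(7)}
    for day, count in daily.items():
        buckets[day.weekday()].append(count)
    # Skip weekdays that never occur in the data.
    return {d: mean(values) for d, values in buckets.items() if values}


def kernel_lag_days(sightings, releases):
    """Days between official release and first appearance in the logs.

    sightings: iterable of (date, kernel) pairs, kernel given as "major.minor"
    releases:  {kernel: official release date}, supplied by the caller
    Kernels without a known release date are left out of the result.
    """
    first_seen = {}
    for day, kernel in sightings:
        if kernel not in first_seen or day < first_seen[kernel]:
            first_seen[kernel] = day  # keep min(date) per kernel
    return {k: (first_seen[k] - releases[k]).days
            for k in first_seen if k in releases}
```

With such helpers, a clear gap between the weekday and weekend averages from `mean_by_weekday` would point to a weekly cycle, and the indices returned by `flag_spikes` are the candidate bulk-update days. Note that a perfectly flat 30-day history has zero standard deviation, so any increase after it is flagged; real data may call for a minimum-variance guard.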