# Project

## Dataset structure

- All inputs (i.e. building blocks from other sources) are located in `inputs/`.
- All custom code is located in `code/`.

## Questions

Some interesting questions to pose

1. [ ] Long-term growth

   How many unique machines download packages per day, and is the growth linear, exponential, or flattening?

2. [x] Weekly rhythm

   Does the number of unique downloaders follow a weekly cycle (week-day peaks vs. weekend dips)?

3. [ ] Kernel lag

   On average, how many days elapse between a new kernel being published upstream and the first time it appears in the logs?
   *(Group kernels by major.minor, compute min(date) per kernel, compare with its official release date.)*

4. [x] Kernel longevity

   Which kernel versions have the longest total lifespan (first → last appearance) and which ones disappear fastest?

5. [ ] Top packages

   Which five packages have the highest median daily download count across the whole period?

6. [ ] Version stickiness

   For packages with ≥10 versions, what fraction of users stay on the older version at least one week after a newer version becomes available?

7. [ ] Big-bang updates

   Are there days when the total number of package downloads is >3σ above the 30-day rolling mean (indicating a bulk-update campaign)?

8. [ ] File-size vs. activity

   Is there a correlation between the size of the daily JSON snapshot and the number of unique downloaders?
   *(Large files might mirror repository-wide rebuilds.)*
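
The >3σ test in question 7 could be sketched as below. This is only an illustration, not code from `code/`: the function name `find_spike_days`, the plain list of daily totals as input, the use of a trailing window that excludes the day being tested, and the choice of population standard deviation are all assumptions.

```python
from statistics import mean, pstdev


def find_spike_days(daily_totals, window=30, n_sigma=3.0):
    """Return indices of days whose total exceeds the rolling mean by n_sigma sigmas.

    daily_totals: total package downloads per day, in date order (assumed input).
    The baseline for day i is the `window` days before it, so a spike does not
    inflate its own mean and standard deviation (an assumption of this sketch).
    """
    spikes = []
    for i in range(window, len(daily_totals)):
        history = daily_totals[i - window:i]
        mu = mean(history)
        sigma = pstdev(history)  # population std dev; sample std dev is equally defensible
        # Skip perfectly flat windows, where any change would count as a spike.
        if sigma > 0 and daily_totals[i] > mu + n_sigma * sigma:
            spikes.append(i)
    return spikes
```

The first `window` days are never flagged because they have no full baseline. Mapping the returned indices back to dates depends on how the daily snapshots are loaded.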