# Project <insert name>
## Dataset structure
- All inputs (i.e. building blocks from other sources) are located in `inputs/`.
- All custom code is located in `code/`.
## Questions
Some interesting questions to pose:
1. [ ] Long-term growth
   How many unique machines download packages per day, and is the growth linear, exponential, or flattening?
2. [x] Weekly rhythm
   Does the number of unique downloaders follow a weekly cycle (weekday peaks vs. weekend dips)?
3. [ ] Kernel lag
   On average, how many days elapse between a new kernel being published upstream and the first time it appears in the logs?
   *(Group kernels by `major.minor`, compute `min(date)` per group, and compare it with that kernel's official release date.)*
4. [x] Kernel longevity
   Which kernel versions have the longest total lifespan (first → last appearance), and which ones disappear fastest?
5. [ ] Top packages
   Which five packages have the highest median daily download count across the whole period?
6. [ ] Version stickiness
   For packages with ≥10 versions, what fraction of users stay on an older version for at least one week after a newer version becomes available?
7. [ ] Big-bang updates
   Are there days when the total number of package downloads is >3σ above the 30-day rolling mean (indicating a bulk-update campaign)?
8. [ ] File-size vs. activity
   Is there a correlation between the size of the daily JSON snapshot and the number of unique downloaders?
   *(Large files might mirror repository-wide rebuilds.)*
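
For question 3 (kernel lag), the hint in parentheses could be sketched in plain Python roughly as follows. This is only a minimal sketch under stated assumptions, not the project's actual code: it assumes the logs have already been reduced to `(date, kernel_version)` pairs, and the helper names `major_minor`, `first_seen`, and `kernel_lag_days`, as well as the release dates in the usage lines, are hypothetical placeholders.

```python
import re
from datetime import date
from statistics import fmean

# Leading "major.minor" part of a version string such as "6.1.4-arch1".
_MAJOR_MINOR = re.compile(r"(\d+)\.(\d+)")


def major_minor(version):
    """Reduce a kernel version string to its 'major.minor' prefix."""
    match = _MAJOR_MINOR.match(version)
    if match is None:
        raise ValueError(f"unrecognised kernel version: {version!r}")
    return f"{match.group(1)}.{match.group(2)}"


def first_seen(observations):
    """Map each major.minor series to the earliest date it was observed.

    `observations` is an iterable of (date, kernel_version) pairs
    (an assumed, pre-parsed form of the logs).
    """
    earliest = {}
    for day, version in observations:
        key = major_minor(version)
        if key not in earliest or day < earliest[key]:
            earliest[key] = day
    return earliest


def kernel_lag_days(observations, release_dates):
    """Days between official release and first appearance, per series.

    Series without a known release date are skipped.
    """
    earliest = first_seen(observations)
    return {
        key: (seen - release_dates[key]).days
        for key, seen in earliest.items()
        if key in release_dates
    }


# Usage with made-up observations and placeholder release dates.
observations = [
    (date(2023, 1, 10), "6.1.4-arch1"),
    (date(2023, 1, 5), "6.1.3"),
    (date(2023, 3, 1), "6.2.1"),
]
release_dates = {"6.1": date(2023, 1, 1), "6.2": date(2023, 2, 20)}

lags = kernel_lag_days(observations, release_dates)
print(lags)                  # {'6.1': 4, '6.2': 9}
print(fmean(lags.values()))  # 6.5
```

Real release dates would have to come from an upstream source; the values above exist only to make the sketch runnable.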
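
For question 7 (big-bang updates), the ">3σ above the 30-day rolling mean" test could look roughly like the sketch below. It assumes the daily totals are already available as a plain list of numbers in chronological order; the function name `flag_bulk_update_days` and the sample data are hypothetical. Two interpretation choices are assumptions as well: the rolling window covers the 30 days *before* the day being tested, and σ is the population standard deviation of that window.

```python
from statistics import fmean, pstdev


def flag_bulk_update_days(daily_totals, window=30, n_sigma=3.0):
    """Return indices of days that exceed mean + n_sigma * std.

    Mean and standard deviation are taken over the `window` days
    preceding each tested day, so the first `window` days are never
    flagged. If the window is perfectly constant (std == 0), any day
    above the mean is flagged.
    """
    flagged = []
    for i in range(window, len(daily_totals)):
        history = daily_totals[i - window:i]
        threshold = fmean(history) + n_sigma * pstdev(history)
        if daily_totals[i] > threshold:
            flagged.append(i)
    return flagged


# Usage with made-up totals: 30 quiet days, then a spike on day 31.
totals = [100, 102, 98, 101, 99] * 6 + [100, 500, 101]
print(flag_bulk_update_days(totals))  # [31]
```

Excluding the tested day from its own window keeps a spike from inflating the baseline it is compared against; a window that includes the current day would be a more literal reading of "rolling mean" but is less sensitive.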