# Project Popcorn Voidlinux
Contains the data gathered from the <https://popcorn.voidlinux.org> statistics collection for all
available dates (2025-09-24 at the time of writing), in an easier-to-work-with CSV form.
Data can be cleaned and processed with the available code.
Any action can easily be started using [`just`](https://github.com/casey/just) with the available `justfile`.
## Dataset structure
- All inputs (i.e. building blocks from other sources) are located in `input/`.
- All custom code is located in `code/`.
- All final output data is located in `output/`.
## Output data structure
### Files
Represents information about the individual JSON files available in the raw dataset.
Contained in `files.csv`, 4 columns:
- `date`: the date a specific file is relevant for
- `filename`: the full filename as it exists in the `input/` directory
- `mtime`: the last modification time of the file on the system
- `filesize`: the size of the file, in bytes
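
As a rough illustration of working with this table, the sketch below reads `files.csv`-shaped data with Python's standard `csv` module and sums the `filesize` column. It assumes a comma-separated file with a header row matching the column names above; the helper name `load_files`, the sample filenames and the `mtime` format are made up for the example and may not match the real data.

```python
import csv
import io


def load_files(handle):
    """Read files.csv-style rows and convert `filesize` to an integer."""
    rows = []
    for row in csv.DictReader(handle):
        row["filesize"] = int(row["filesize"])
        rows.append(row)
    return rows


# Made-up sample rows; filenames and the mtime format are placeholders.
sample = io.StringIO(
    "date,filename,mtime,filesize\n"
    "2025-01-01,2025-01-01.json,2025-01-02T00:00:00,1200\n"
    "2025-01-02,2025-01-02.json,2025-01-03T00:00:00,1300\n"
)

files = load_files(sample)
total_bytes = sum(row["filesize"] for row in files)
print(total_bytes)  # 2500
```

For the real data, the `io.StringIO` object would be replaced by something like `open("output/files.csv", newline="")`.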
### Kernels
Represents information about the kernel versions represented in the raw dataset.
Contained in `kernels.csv`, 3 columns:
- `date`: the observation date, i.e. the date the source file is relevant for
- `kernel`: the full kernel name that is available in the raw data, including major version, minor
version and suffix
- `downloads`: the number of times the kernel has been seen on the observation date
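
Since `kernel` holds the full name, it can be useful to group rows by kernel series. The following is a minimal sketch that assumes the name starts with `major.minor` (for example `6.12.10_1`, a made-up value) and that the CSV has a header row; `kernel_series` and `series_counts` are hypothetical helpers, not part of the code in `code/`.

```python
import csv
import io
import re
from collections import Counter

# Assumption: kernel names begin with "<major>.<minor>".
SERIES_RE = re.compile(r"^(\d+)\.(\d+)")


def kernel_series(kernel):
    """Return 'major.minor' for a kernel name, or 'unknown' if it does not match."""
    match = SERIES_RE.match(kernel)
    return f"{match.group(1)}.{match.group(2)}" if match else "unknown"


def series_counts(handle, date):
    """Sum `downloads` per kernel series for one observation date."""
    counts = Counter()
    for row in csv.DictReader(handle):
        if row["date"] == date:
            counts[kernel_series(row["kernel"])] += int(row["downloads"])
    return counts


# Made-up sample rows for illustration only.
sample = io.StringIO(
    "date,kernel,downloads\n"
    "2025-01-01,6.12.10_1,40\n"
    "2025-01-01,6.12.11_1,10\n"
    "2025-01-01,6.6.70_1,25\n"
    "2025-01-02,6.12.11_1,5\n"
)

counts = series_counts(sample, "2025-01-01")
print(dict(counts))  # {'6.12': 50, '6.6': 25}
```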
### Packages
Represents information about the package versions represented in the raw dataset.
Contained in `packages.csv`, 4 columns:
- `date`: the observation date, i.e. the date the source file is relevant for
- `package`: the full package name as it is available in the raw data
- `version`: the full package version as it is available in the raw data
- `count`: the number of times the package and version combination has been seen on the observation date
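
Because each row is a package and version combination, per-package totals require summing over versions. A minimal sketch, assuming a comma-separated file with a header row; `package_totals` is a hypothetical helper and the sample rows are invented:

```python
import csv
import io
from collections import Counter


def package_totals(handle, date):
    """Sum `count` over all versions of each package for one observation date."""
    totals = Counter()
    for row in csv.DictReader(handle):
        if row["date"] == date:
            totals[row["package"]] += int(row["count"])
    return totals


# Made-up sample rows for illustration only.
sample = io.StringIO(
    "date,package,version,count\n"
    "2025-01-01,bash,5.2.37_1,90\n"
    "2025-01-01,bash,5.2.32_1,10\n"
    "2025-01-01,vim,9.1.0_1,60\n"
    "2025-01-02,vim,9.1.0_1,61\n"
)

totals = package_totals(sample, "2025-01-01")
print(totals.most_common(1))  # [('bash', 100)]
```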
### Unique installs
Represents information about the unique system installations represented in the raw dataset.
Contained in `unique_installs.csv`, 2 columns:
- `date`: the observation date, i.e. the date the source file is relevant for
- `unique`: the number of unique installations counted on the observation date
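
One simple use of this table is tracking how the install count changes between observations. The sketch below assumes a header row and ISO-formatted dates (`YYYY-MM-DD`), so that sorting the date strings also sorts them chronologically; `daily_change` is a hypothetical helper and the numbers are invented.

```python
import csv
import io


def daily_change(handle):
    """Return (date, delta) pairs: change in `unique` versus the previous observation."""
    # Assumption: dates are ISO strings, so lexical order equals chronological order.
    rows = sorted(csv.DictReader(handle), key=lambda row: row["date"])
    changes = []
    previous = None
    for row in rows:
        current = int(row["unique"])
        if previous is not None:
            changes.append((row["date"], current - previous))
        previous = current
    return changes


# Made-up sample rows for illustration only.
sample = io.StringIO(
    "date,unique\n"
    "2025-01-01,100\n"
    "2025-01-02,104\n"
    "2025-01-03,101\n"
)

changes = daily_change(sample)
print(changes)  # [('2025-01-02', 4), ('2025-01-03', -3)]
```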