A full CSV representation of the Voidlinux popcorn data from https://popcorn.voidlinux.org


Marty Oehme 2faeda87c3 Validate CSV output schemas Also moved code dir to src. There are reasons to do standard things in standard ways. While it is possible to get the `code/` directory to work, and recognize it as a package path, this requires wrangling the pyproject.toml file. Additionally, any import from the `code.something` path automatically shadows the python stdlib `code` module. While it may not be necessary, it still is good to not shadow standard library modules.		2025-10-01 10:23:10 +02:00
.datalad	[DATALAD] new dataset	2025-09-30 16:54:46 +02:00
input	Remove empty raw 0byte files	2025-09-30 20:45:29 +02:00
output	[DATALAD RUNCMD] Create updated output data	2025-09-30 20:53:50 +02:00
src	Validate CSV output schemas	2025-10-01 10:23:10 +02:00
.gitattributes	Add uv skeleton	2025-09-30 21:40:18 +02:00
.gitignore	Add validation dependencies to venv	2025-10-01 10:22:54 +02:00
.python-version	Add uv skeleton	2025-09-30 21:40:18 +02:00
CHANGELOG.md	Update README.md and CHANGELOG.md	2025-09-30 21:13:19 +02:00
justfile	Validate CSV output schemas	2025-10-01 10:23:10 +02:00
pyproject.toml	Add validation dependencies to venv	2025-10-01 10:22:54 +02:00
README.md	Validate CSV output schemas	2025-10-01 10:23:10 +02:00
uv.lock	Add validation dependencies to venv	2025-10-01 10:22:54 +02:00

README.md

Project Popcorn Voidlinux

Contains the data gathered by the https://popcorn.voidlinux.org statistics collection for all available dates (up to 2025-09-24 at the time of writing), converted into an easier-to-use CSV form.

The data can be cleaned and processed with the code in src/. Every step can be started through just, using the recipes defined in the provided justfile.

Dataset structure

All inputs (i.e. building blocks from other sources) are located in input/.
All custom code is located in src/.
All final output data is located in output/.

Output data structure

Files

Describes the individual JSON files available in the raw dataset.

Stored in files.csv, with 4 columns:

date: the date a specific file is relevant for
filename: the full filename as it exists in the input/ directory
mtime: the last modification time of the file on the system
filesize: the size of the file, in bytes
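As an illustration of working with files.csv, here is a minimal Python sketch that checks the header against the four documented columns and sums the file sizes per date. It assumes the header row uses exactly these column names in this order. The names FILES_COLUMNS, check_header and bytes_per_date are hypothetical and not part of the code in src/.

```python
import csv
from collections import defaultdict
from typing import Iterable

# Assumption: the header of files.csv lists the documented columns in this order.
FILES_COLUMNS = ["date", "filename", "mtime", "filesize"]


def check_header(reader: csv.DictReader, expected: list[str]) -> None:
    """Raise ValueError if the CSV header differs from the expected columns."""
    if reader.fieldnames != expected:
        raise ValueError(
            f"unexpected columns {reader.fieldnames!r}, expected {expected!r}"
        )


def bytes_per_date(lines: Iterable[str]) -> dict[str, int]:
    """Sum the `filesize` column (in bytes) for each `date`."""
    reader = csv.DictReader(lines)
    check_header(reader, FILES_COLUMNS)
    totals: dict[str, int] = defaultdict(int)
    for row in reader:
        totals[row["date"]] += int(row["filesize"])
    return dict(totals)
```

Any iterable of text lines works as input, so an open file can be passed directly, e.g. `with open("output/files.csv", newline="") as f: print(bytes_per_date(f))`, assuming the script runs from the dataset root.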

Kernels

Describes the kernel versions present in the raw dataset.

Stored in kernels.csv, with 3 columns:

date: the date a specific file is relevant for
kernel: the full kernel name that is available in the raw data, including major version, minor version and suffix
downloads: the number of times the kernel was seen on the observation date
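A similar sketch for kernels.csv groups the downloads of one observation date by kernel series (major.minor). It assumes kernel names begin with a numeric major.minor prefix, such as the hypothetical 6.6.52_1; names that do not match are counted under "other". KERNELS_COLUMNS, SERIES_RE and downloads_by_series are hypothetical names.

```python
import csv
import re
from collections import defaultdict
from typing import Iterable

# Assumption: the header of kernels.csv lists the documented columns in this order.
KERNELS_COLUMNS = ["date", "kernel", "downloads"]

# Assumption: kernel names start with "<major>.<minor>"; the rest
# (patch version and suffix) is ignored when grouping.
SERIES_RE = re.compile(r"^(\d+)\.(\d+)")


def downloads_by_series(lines: Iterable[str], date: str) -> dict[str, int]:
    """Sum `downloads` per major.minor kernel series for one observation date."""
    reader = csv.DictReader(lines)
    if reader.fieldnames != KERNELS_COLUMNS:
        raise ValueError(f"unexpected columns: {reader.fieldnames!r}")
    totals: dict[str, int] = defaultdict(int)
    for row in reader:
        if row["date"] != date:
            continue  # only the requested observation date
        match = SERIES_RE.match(row["kernel"])
        # Names without a numeric major.minor prefix are grouped as "other".
        series = f"{match.group(1)}.{match.group(2)}" if match else "other"
        totals[series] += int(row["downloads"])
    return dict(totals)
```

Grouping by series rather than by full name collapses patch releases and suffixes, which makes adoption of a kernel line easier to follow across dates.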

Packages

Describes the package versions present in the raw dataset.

Stored in packages.csv, with 4 columns:

date: the date a specific file is relevant for
package: the full package name as it is available in the raw data
version: the full package version as it is available in the raw data
count: the number of times the package and version combination was seen on the observation date
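For packages.csv, the following sketch picks the most frequently seen version of each package on a given date. It compares only the count column, never the version strings, so it does not depend on any version-ordering rules; on ties, the first version encountered is kept. PACKAGES_COLUMNS and top_version_per_package are hypothetical names, not part of src/.

```python
import csv
from typing import Iterable

# Assumption: the header of packages.csv lists the documented columns in this order.
PACKAGES_COLUMNS = ["date", "package", "version", "count"]


def top_version_per_package(
    lines: Iterable[str], date: str
) -> dict[str, tuple[str, int]]:
    """Return {package: (version, count)} for the most-seen version on `date`."""
    reader = csv.DictReader(lines)
    if reader.fieldnames != PACKAGES_COLUMNS:
        raise ValueError(f"unexpected columns: {reader.fieldnames!r}")
    best: dict[str, tuple[str, int]] = {}
    for row in reader:
        if row["date"] != date:
            continue  # only the requested observation date
        count = int(row["count"])
        current = best.get(row["package"])
        # Strictly greater, so the first version seen wins on ties.
        if current is None or count > current[1]:
            best[row["package"]] = (row["version"], count)
    return best
```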

Unique installs

Describes the unique system installations present in the raw dataset.

Stored in unique_installs.csv, with 2 columns:

date: the date a specific file is relevant for
unique: the number of unique installations counted on the observation date
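unique_installs.csv lends itself to tracking how the install base changes over time. This sketch sorts the rows by date and computes the difference from the previous listed date. It assumes ISO 8601 dates (YYYY-MM-DD), as used elsewhere in this README. The difference is relative to the previous row, which is not necessarily the previous calendar day if dates are missing. UNIQUE_COLUMNS and daily_changes are hypothetical names.

```python
import csv
from datetime import date
from typing import Iterable, Optional

# Assumption: the header of unique_installs.csv lists the documented columns in this order.
UNIQUE_COLUMNS = ["date", "unique"]


def daily_changes(lines: Iterable[str]) -> list[tuple[str, int, Optional[int]]]:
    """Return (date, unique, change vs. previous listed date), sorted by date."""
    reader = csv.DictReader(lines)
    if reader.fieldnames != UNIQUE_COLUMNS:
        raise ValueError(f"unexpected columns: {reader.fieldnames!r}")
    # Assumption: dates are ISO 8601, so date.fromisoformat can parse them.
    rows = sorted((date.fromisoformat(r["date"]), int(r["unique"])) for r in reader)
    result: list[tuple[str, int, Optional[int]]] = []
    previous: Optional[int] = None
    for day, unique in rows:
        # The first row has no predecessor, so its change is None.
        change = None if previous is None else unique - previous
        result.append((day.isoformat(), unique, change))
        previous = unique
    return result
```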