A full CSV representation of the Voidlinux popcorn data from https://popcorn.voidlinux.org
Marty Oehme 2faeda87c3
Validate CSV output schemas
Also moved code dir to src.
There are reasons to do standard things in standard ways. While it is
possible to get the `code/` directory to work, and recognize it as a
package path, this requires wrangling the pyproject.toml file.
Additionally, any import from the `code.something` path automatically
shadows the python stdlib `code` module. While it may not be necessary,
it still is good to not shadow standard library modules.
2025-10-01 10:23:10 +02:00
.datalad [DATALAD] new dataset 2025-09-30 16:54:46 +02:00
input Remove empty raw 0byte files 2025-09-30 20:45:29 +02:00
output [DATALAD RUNCMD] Create updated output data 2025-09-30 20:53:50 +02:00
src Validate CSV output schemas 2025-10-01 10:23:10 +02:00
.gitattributes Add uv skeleton 2025-09-30 21:40:18 +02:00
.gitignore Add validation dependencies to venv 2025-10-01 10:22:54 +02:00
.python-version Add uv skeleton 2025-09-30 21:40:18 +02:00
CHANGELOG.md Update README.md and CHANGELOG.md 2025-09-30 21:13:19 +02:00
justfile Validate CSV output schemas 2025-10-01 10:23:10 +02:00
pyproject.toml Add validation dependencies to venv 2025-10-01 10:22:54 +02:00
README.md Validate CSV output schemas 2025-10-01 10:23:10 +02:00
uv.lock Add validation dependencies to venv 2025-10-01 10:22:54 +02:00

Project Popcorn Voidlinux

Contains the gathered data of the https://popcorn.voidlinux.org statistics collection for all available dates (2025-09-24 as of today), in an easier-to-work-with CSV form.

The data can be cleaned and processed with the code in src/. Each action can be started with the just command runner, using the provided justfile.

Dataset structure

  • All inputs (i.e. building blocks from other sources) are located in input/.
  • All custom code is located in src/.
  • All final output data is located in output/.

Output data structure

Files

Represents information about the individual JSON files available in the raw dataset.

Contained in files.csv, 4 columns:

  • date: the date a specific file is relevant for
  • filename: the full filename as it exists in the input/ directory
  • mtime: the last modification time of the file on the system
  • filesize: the size of the file, in bytes
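A minimal sketch of how files.csv could be read and checked against the four columns listed above, using only the Python standard library. The sample rows, the filename pattern and the mtime format are invented for illustration, and read_files is a hypothetical helper, not part of the code in src/.

```python
import csv
import io

# Column names taken from the list above.
FILES_COLUMNS = ["date", "filename", "mtime", "filesize"]

# Invented sample rows for illustration only; not real dataset content.
SAMPLE = """date,filename,mtime,filesize
2025-09-23,2025-09-23.json,2025-09-24T00:00:00,2048
2025-09-24,2025-09-24.json,2025-09-25T00:00:00,1024
"""


def read_files(handle):
    """Read files.csv rows, rejecting an unexpected header."""
    reader = csv.DictReader(handle)
    if reader.fieldnames != FILES_COLUMNS:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    rows = []
    for row in reader:
        # filesize is documented as a byte count, so parse it as an integer.
        row["filesize"] = int(row["filesize"])
        rows.append(row)
    return rows


rows = read_files(io.StringIO(SAMPLE))
total_bytes = sum(row["filesize"] for row in rows)
print(total_bytes)
```

For the real file, the StringIO object would be replaced by something like open("output/files.csv", newline="").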

Kernels

Represents information about the kernel versions represented in the raw dataset.

Contained in kernels.csv, 3 columns:

  • date: the date a specific file is relevant for
  • kernel: the full kernel name that is available in the raw data, including major version, minor version and suffix
  • downloads: the number of times the kernel has been seen on the observation date
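Because the kernel column holds the full name (major version, minor version and suffix), it can be useful to group the counts by release series. The following is a sketch only: the kernel strings in the sample are invented, and the assumption that the first two dot-separated fields form the series may not hold for every value in the raw data. downloads_by_series is a hypothetical helper.

```python
import csv
import io
from collections import Counter

# Invented sample rows for illustration only; not real dataset content.
SAMPLE = """date,kernel,downloads
2025-09-24,6.12.48_1,10
2025-09-24,6.12.47_1,5
2025-09-24,6.16.8_1,7
"""


def downloads_by_series(handle):
    """Sum the downloads column per (date, 'major.minor') pair."""
    totals = Counter()
    for row in csv.DictReader(handle):
        # Assumption: the series is the first two dot-separated fields.
        series = ".".join(row["kernel"].split(".")[:2])
        totals[(row["date"], series)] += int(row["downloads"])
    return totals


totals = downloads_by_series(io.StringIO(SAMPLE))
for (date, series), downloads in sorted(totals.items()):
    print(date, series, downloads)
```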

Packages

Represents information about the package versions represented in the raw dataset.

Contained in packages.csv, 4 columns:

  • date: the date a specific file is relevant for
  • package: the full package name as it is available in the raw data
  • version: the full package version as it is available in the raw data
  • count: the number of times the package and version combination has been seen on the observation date
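Since packages.csv has one row per package and version combination, a per-package total needs a sum over versions. The sketch below shows one way to do that and to pick the most frequently seen version; the package names, versions and counts are invented, and summarize_packages is a hypothetical helper.

```python
import csv
import io
from collections import defaultdict

# Invented sample rows for illustration only; not real dataset content.
SAMPLE = """date,package,version,count
2025-09-24,bash,5.3_1,40
2025-09-24,bash,5.2_1,3
2025-09-24,vim,9.1_1,12
"""


def summarize_packages(handle, date):
    """Return ({package: total count}, {package: most seen version}) for one date."""
    per_version = defaultdict(dict)
    for row in csv.DictReader(handle):
        if row["date"] == date:
            per_version[row["package"]][row["version"]] = int(row["count"])
    totals = {pkg: sum(versions.values()) for pkg, versions in per_version.items()}
    # max() with key=versions.get picks the version with the highest count.
    top = {pkg: max(versions, key=versions.get) for pkg, versions in per_version.items()}
    return totals, top


totals, top = summarize_packages(io.StringIO(SAMPLE), "2025-09-24")
print(totals)
print(top)
```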

Unique installs

Represents information about the unique system installations represented in the raw dataset.

Contained in unique_installs.csv, 2 columns:

  • date: the date a specific file is relevant for
  • unique: the number of unique installations counted on the observation date
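With one row per date, unique_installs.csv lends itself to simple time series work, such as the change from one observation to the next. This sketch assumes the date column uses ISO 8601 (YYYY-MM-DD) strings, so that sorting the strings sorts the dates; the numbers are invented and daily_changes is a hypothetical helper.

```python
import csv
import io

# Invented sample rows for illustration only; not real dataset content.
SAMPLE = """date,unique
2025-09-22,100
2025-09-23,104
2025-09-24,101
"""


def daily_changes(handle):
    """Return (date, change versus the previous observation) pairs."""
    rows = sorted(
        ((row["date"], int(row["unique"])) for row in csv.DictReader(handle)),
        key=lambda pair: pair[0],  # assumption: ISO dates sort as plain strings
    )
    # Pair every observation with its successor and take the difference.
    return [
        (current_date, current - previous)
        for (_, previous), (current_date, current) in zip(rows, rows[1:])
    ]


changes = daily_changes(io.StringIO(SAMPLE))
print(changes)
```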