Validate CSV output schemas

Also moved the code directory to `src/`, since there are good reasons to do standard things in standard ways. While it is possible to get the `code/` directory recognized as a package path, doing so requires wrangling the `pyproject.toml` file. Additionally, any import from a `code.something` path shadows the Python stdlib `code` module. Avoiding that may not be strictly necessary, but it is still good practice not to shadow standard library modules.
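
The shadowing concern can be illustrated with a minimal sketch. It is not taken from this repository: the temporary directory stands in for a hypothetical project root, and `resolve_code_module` is a helper invented for the example. It asks Python's path-based finder which file `import code` would load, first with the normal search path and then with a directory containing a top-level `code/` package placed ahead of it.

```python
import importlib.machinery
import sys
import tempfile
from pathlib import Path


def resolve_code_module(search_path):
    """Return the file `import code` would load for this search path, or None."""
    spec = importlib.machinery.PathFinder.find_spec("code", path=search_path)
    if spec is None or spec.origin is None:
        return None
    return Path(spec.origin).resolve()


# Where the standard library's own `code` module is found on the normal path.
stdlib_origin = resolve_code_module(sys.path)

with tempfile.TemporaryDirectory() as project_root:
    # Hypothetical project layout with a top-level `code/` package.
    package_dir = Path(project_root) / "code"
    package_dir.mkdir()
    (package_dir / "__init__.py").write_text("")

    # With the project root ahead of the stdlib, the local package is found first.
    local_origin = resolve_code_module([project_root, *sys.path])
    shadowed = local_origin == (package_dir / "__init__.py").resolve()

print("stdlib code module:", stdlib_origin)
print("local package shadows it:", shadowed)
```

Whenever the project root precedes the standard library on the import path, the local package is the one that gets imported, which is the situation the rename to `src/` avoids.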
2025-09-30 22:14:30 +02:00
commit 2faeda87c3
parent de96b67fac
14 changed files with 111 additions and 7 deletions
--- a/README.md
+++ b/README.md
@@ -9,7 +9,7 @@ Any action can easily be started using [`just`](https://github.com/casey/just) w
 ## Dataset structure

 - All inputs (i.e. building blocks from other sources) are located in `input/`.
-- All custom code is located in `code/`.
+- All custom code is located in `src/`.
 - All final output data is located in `output/`

 ## Output data structure
@@ -51,7 +51,7 @@ Contained in `packages.csv`, 4 columns:

 Represents information about the unique system installations represented in the raw dataset.

-Contained in `packages.csv`, 2 columns:
+Contained in `unique_installs.csv`, 2 columns:

 - `date`: the date a specific file is relevant for
 - `unique`: the amount of unique installations counted on the observation date