Validate CSV output schemas
Also moved the code directory to `src/`. There are reasons to do standard things in standard ways. While it is possible to get the `code/` directory to work and be recognized as a package path, this requires wrangling the `pyproject.toml` file. Additionally, any import from the `code.something` path automatically shadows the Python stdlib `code` module. While avoiding this may not be strictly necessary, it is still good practice not to shadow standard library modules.
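The shadowing concern can be reproduced in isolation. The following is a minimal sketch, not taken from this repository: it builds a throwaway project containing a hypothetical `code/` package, puts that project first on `sys.path` (an assumption about how the project would be run), and checks which file `import code` resolves to. The helper name `resolve_code_module` is made up for illustration.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def resolve_code_module(search_dir: Path) -> str:
    """Import `code` with search_dir first on sys.path; return the resolved file."""
    saved_path = list(sys.path)
    saved_module = sys.modules.pop("code", None)  # forget any cached import
    try:
        sys.path.insert(0, str(search_dir))
        importlib.invalidate_caches()
        module = importlib.import_module("code")
        return str(Path(module.__file__).resolve())
    finally:
        # Restore interpreter state so the demo has no lasting side effects.
        sys.path[:] = saved_path
        sys.modules.pop("code", None)
        if saved_module is not None:
            sys.modules["code"] = saved_module


with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp)
    # Hypothetical project layout: a local package named `code/`.
    (project / "code").mkdir()
    (project / "code" / "__init__.py").write_text("LOCAL = True\n")

    resolved = resolve_code_module(project)
    # True when the local package won over the standard library module.
    shadowed = resolved.startswith(str(project.resolve()))
    print(f"`import code` resolved to: {resolved}")
    print(f"stdlib module shadowed: {shadowed}")
```

With the project directory ahead of the standard library on the search path, the local package is found first, which is why a directory named `src/` avoids the clash.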
parent de96b67fac
commit 2faeda87c3
14 changed files with 111 additions and 7 deletions
@@ -9,7 +9,7 @@ Any action can easily be started using [`just`](https://github.com/casey/just) w
 ## Dataset structure
 
 - All inputs (i.e. building blocks from other sources) are located in `input/`.
-- All custom code is located in `code/`.
+- All custom code is located in `src/`.
 - All final output data is located in `output/`
 
 ## Output data structure
@@ -51,7 +51,7 @@ Contained in `packages.csv`, 4 columns:
 
 Represents information about the unique system installations represented in the raw dataset.
 
-Contained in `packages.csv`, 2 columns:
+Contained in `unique_installs.csv`, 2 columns:
 
 - `date`: the date a specific file is relevant for
 - `unique`: the amount of unique installations counted on the observation date
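A schema check for the two columns documented above could look roughly like the sketch below. It is not the validation code added in this commit, which is not shown here. The column names come from the README; the value formats (ISO `YYYY-MM-DD` dates, non-negative integer counts), the function name `validate_unique_installs`, and the sample rows are all assumptions made for illustration.

```python
import csv
import io
from datetime import date

# Column names come from the README; the value types are assumptions.
EXPECTED_COLUMNS = ["date", "unique"]


def validate_unique_installs(handle) -> list[str]:
    """Return human-readable schema errors (an empty list means valid)."""
    reader = csv.DictReader(handle)
    if reader.fieldnames != EXPECTED_COLUMNS:
        return [f"expected header {EXPECTED_COLUMNS}, got {reader.fieldnames}"]

    errors = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        try:
            date.fromisoformat(row["date"])  # assumed format: YYYY-MM-DD
        except (TypeError, ValueError):
            errors.append(f"line {line_no}: invalid date {row['date']!r}")
        try:
            if int(row["unique"]) < 0:  # assumed: counts are never negative
                raise ValueError
        except (TypeError, ValueError):
            errors.append(f"line {line_no}: invalid count {row['unique']!r}")
    return errors


# Made-up sample rows, used only to exercise the check.
good = io.StringIO("date,unique\n2024-01-01,42\n2024-01-02,40\n")
bad = io.StringIO("date,unique\nyesterday,42\n2024-01-02,-3\n")
print(validate_unique_installs(good))
print(validate_unique_installs(bad))
```

Returning a list of errors instead of raising on the first one lets a single run report every bad row in a file.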