# ROADMAP to first release
Known issues to be fixed:
- [x] if both content and text are empty, do not extract an annotation
- [x] Speed?
  - should be fine: on my machine (old i5 laptop) it takes around 90s for ~1000 documents with ~4000 annotations
- [x] ensure all cmdline options do what they should
- [x] annotations carry over the color object from fitz; it should just be a Color object or a simple tuple of RGB values
- [x] docstrings, docstrings!
- [ ] testing testing testing!!
- [ ] refactor into some better abstractions (e.g. Exporter Protocol -> stdout/markdown implementations; Extractor Protocol -> PDF implementation)
- [ ] dependency injection for extractor/exporter/formatter/annotation modules
- [ ] any call to papis.config should start from init and be injected?
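
A minimal sketch of what the Protocol refactor and dependency injection items above could look like. All names here (`Annotation`, `Extractor`, `Exporter`, `FakePdfExtractor`, `MarkdownExporter`, `extract`) are hypothetical placeholders, not the current implementation, and the extractor returns canned data instead of reading a PDF:

```python
from dataclasses import dataclass
from typing import List, Protocol


@dataclass
class Annotation:
    """Hypothetical stand-in for the project's annotation type."""
    quote: str = ""
    note: str = ""
    page: int = 0


class Extractor(Protocol):
    """Anything that can turn a file into annotations."""
    def run(self, filename: str) -> List[Annotation]: ...


class Exporter(Protocol):
    """Anything that can turn annotations into output text."""
    def run(self, annotations: List[Annotation]) -> str: ...


class FakePdfExtractor:
    """Placeholder: a real implementation would read the PDF."""
    def run(self, filename: str) -> List[Annotation]:
        return [Annotation(quote="a highlight", note="my note", page=3)]


class MarkdownExporter:
    """Renders each annotation as a markdown blockquote line."""
    def run(self, annotations: List[Annotation]) -> str:
        return "\n".join(f"> {a.quote} (p. {a.page})" for a in annotations)


def extract(filename: str, extractor: Extractor, exporter: Exporter) -> str:
    # Both collaborators are injected, so tests can pass in fakes.
    return exporter.run(extractor.run(filename))


print(extract("paper.pdf", FakePdfExtractor(), MarkdownExporter()))
```

Because the protocols are structural, the concrete classes do not need to inherit from them, which keeps test doubles cheap to write.
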
Features to be implemented:
- [ ] CI/CD
  - [x] static analysis (lint, typecheck etc) on pushes
  - [x] test pipeline on master pushes
  - [ ] release pipeline to PyPI on tags
- [x] add page number if available
  - exists in Annotation, just needs to be placed in the output
- [ ] show the overall number of extractions at the end
  - implemented for writing to notes (notes exporter)
  - KNOWN ISSUE: currently returns the number of annotation rows (may be multiple per annotation)
- [ ] custom formatting decided by user
  - in config as `{"myformatter": ">{tag}\n{quote}\n{note}\n{page}"}` etc.
- [ ] improved default exporters
  - [x] markdown into notes
  - [ ] pretty display on stdout (rich?)
  - [x] csv/tsv to stdout
  - [ ] table format on stdout?
- [ ] allow custom color -> tag name settings that do not depend on a color name existing (e.g. `{"important": (1.0, 0.0, 0.0)}`)
- [ ] `--overwrite` mode where existing annotations are not dropped but overwritten on same line of note
- [x] `--force` mode where we simply do not drop anything
  - called `--duplicates` in current implementation
- [x] `--format` option to choose from default or set up a custom formatter
  - called `--output` in current implementation
- [ ] on_add hook to extract annotations as files are added
  - needs upstream help: an 'on_add' hook and pass-through of affected documents
- [ ] target same minimum Python version as papis upstream (3.8 as of papis 0.14, 3.10 for upcoming papis ~0.15)
- [ ] change detection:
  - how does it handle updated citations? updated colors? should it be configurable?
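
A rough sketch of how the custom formatter and custom color -> tag items above might fit together. The names `FORMATTERS`, `TAGS`, `nearest_tag` and `render` are hypothetical, and matching a color to the tag with the nearest RGB value is an assumption, not a decided design:

```python
from typing import Dict, Tuple

RGB = Tuple[float, float, float]

# Hypothetical user configuration; neither name exists in the current code.
FORMATTERS: Dict[str, str] = {"myformatter": ">{tag}\n{quote}\n{note}\n{page}"}
TAGS: Dict[str, RGB] = {"important": (1.0, 0.0, 0.0), "todo": (0.0, 0.0, 1.0)}


def nearest_tag(color: RGB, tags: Dict[str, RGB]) -> str:
    """Return the tag whose color has the smallest squared RGB distance."""
    return min(
        tags,
        key=lambda name: sum((a - b) ** 2 for a, b in zip(color, tags[name])),
    )


def render(template: str, color: RGB, quote: str, note: str, page: int) -> str:
    """Fill a user template with the annotation fields and the derived tag."""
    return template.format(
        tag=nearest_tag(color, TAGS), quote=quote, note=note, page=page
    )


print(render(FORMATTERS["myformatter"], (0.9, 0.1, 0.1), "a quote", "a note", 12))
```

Plain `str.format` keeps the config readable, but it raises `KeyError` on unknown placeholders, so a real formatter would probably want to validate templates when the config is loaded.
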
Upstream changes:
- [ ] need a hook for adding a document/file
- [ ] need hooks to actually pass through information on the thing they worked on (i.e. their document)
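
To illustrate the kind of hook being asked for, here is a self-contained sketch of a hook registry whose callbacks receive the affected document. This is not the papis hook API: `register`, `fire`, the `Document` alias and the `"on_add"` name are all hypothetical and only show the desired pass-through behaviour:

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Document = Dict[str, str]  # hypothetical stand-in for a papis document
Callback = Callable[[Document], None]

_hooks: DefaultDict[str, List[Callback]] = defaultdict(list)


def register(name: str, callback: Callback) -> None:
    """Attach a callback to a named hook."""
    _hooks[name].append(callback)


def fire(name: str, document: Document) -> None:
    """Call every callback for `name`, handing over the affected document."""
    for callback in _hooks[name]:
        callback(document)


# An extraction plugin could then react to exactly the document that was added.
extracted: List[str] = []
register("on_add", lambda doc: extracted.append(doc["ref"]))
fire("on_add", {"ref": "smith2020", "files": "smith2020.pdf"})
print(extracted)
```
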