Commit graph

13 commits

Author SHA1 Message Date
Marty Oehme 629932a5e8
feat: Loop through all chosen extractors 2024-01-23 09:10:42 +01:00
Marty Oehme f477deea7c
feat: Add extractor cli choice
Can only choose pdf for the time being, but allows additional
extractors to be added in the future.
2024-01-23 08:58:32 +01:00
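A minimal sketch of how such an extractor choice could look on the command line. It uses the standard library's argparse purely for illustration; the option name `--extractor`, the `EXTRACTORS` registry, and the helpers `parse_args` and `run_chosen` are assumptions, not the plugin's actual interface.

```python
import argparse

# Hypothetical registry mapping a CLI name to an extraction callable.
# Only "pdf" exists for now; new extractors would be added as new keys.
EXTRACTORS = {
    "pdf": lambda filename: [f"pdf annotation from {filename}"],
}


def parse_args(argv):
    """Parse the (assumed) command line; --extractor may be repeated."""
    parser = argparse.ArgumentParser(prog="extract")
    parser.add_argument(
        "--extractor",
        action="append",
        choices=sorted(EXTRACTORS),
        help="extractor to use; can be given more than once",
    )
    parser.add_argument("filename")
    args = parser.parse_args(argv)
    # Fall back to pdf when the option was not given at all.
    if not args.extractor:
        args.extractor = ["pdf"]
    return args


def run_chosen(argv):
    """Run every chosen extractor in order and collect what they find."""
    args = parse_args(argv)
    found = []
    for name in args.extractor:
        found.extend(EXTRACTORS[name](args.filename))
    return found
```

Restricting the option to the registry's keys means an unknown extractor name is rejected at parse time, and adding a new extractor later only requires a new registry entry.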
Marty Oehme 3b4db7b6b8
refactor: Extract PDF extractor into class
Extractor is a general protocol with the PDF extraction routine now being
one implementation of the protocol. Preparation for adding multiple
extractors (epub, djvu, or specific programmes) in the future.
2024-01-20 18:02:51 +01:00
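The protocol idea described in this commit can be sketched roughly as follows, using Python's `typing.Protocol`. The method names `can_process` and `run`, the `Annotation` stand-in, and the `extract` helper are assumptions made for illustration, not the project's actual definitions.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Protocol


@dataclass
class Annotation:
    """Hypothetical stand-in for the project's annotation type."""

    file: str
    text: str


class Extractor(Protocol):
    """Anything that can claim a file and pull annotations out of it."""

    def can_process(self, filename: Path) -> bool: ...

    def run(self, filename: Path) -> list[Annotation]: ...


class PdfExtractor:
    """One implementation; it satisfies Extractor structurally, without inheriting."""

    def can_process(self, filename: Path) -> bool:
        return filename.suffix.lower() == ".pdf"

    def run(self, filename: Path) -> list[Annotation]:
        # Real PDF parsing is omitted; a placeholder keeps the sketch runnable.
        return [Annotation(file=str(filename), text="placeholder highlight")]


def extract(extractor: Extractor, filename: Path) -> list[Annotation]:
    """Run an extractor only on files it claims to support."""
    if not extractor.can_process(filename):
        return []
    return extractor.run(filename)
```

Because the protocol is structural, a later epub or djvu extractor would only need to provide the same two methods to be usable wherever an `Extractor` is expected.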
Marty Oehme 765de505bb
refactor: Remove AnnotatedDocument class
The AnnotatedDocument class was, essentially, a simple tuple of a document
and a list of annotations. While not bad in a vacuum, it is unwieldy and
passing this around instead of a document, annotations, or both where
necessary is more restrictive and frankly unnecessary.

This commit removes the data class and any instances of its use. Instead,
we now pass the individual components around to anything that needs them.
This also frees us up to pass only annotations around for example.

We also no longer iterate through the selected papis documents in each
exporter (since we only pass a single document), but in the main
function itself. This leads to less duplication and makes the run
function the single source of iteration through selected documents.
Everything else only knows about a single document - the one it is
operating on - which seems much neater.

For now, it does not change much, but should make later work on extra
exporters or extractors easier.
2024-01-20 16:36:24 +01:00
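A rough sketch of the resulting shape, assuming a document is a plain dict and annotations are plain strings. All function names here (`extract_annotations`, `format_annotations`, `export_to_text`, `run`) are hypothetical and only illustrate passing the individual components instead of a wrapper class.

```python
def extract_annotations(doc):
    """Hypothetical extraction step: one note per attached file."""
    return [f"note on {name}" for name in doc.get("files", [])]


def format_annotations(annotations):
    """Needs only the annotations, so it is passed only those."""
    return "\n".join(f"- {a}" for a in annotations)


def export_to_text(doc, annotations):
    """An exporter that knows about exactly one document and its annotations."""
    return f"{doc['title']}\n{format_annotations(annotations)}"


def run(documents, exporter):
    """The single place that iterates over the selected documents."""
    results = []
    for doc in documents:
        annotations = extract_annotations(doc)
        if not annotations:
            # Nothing to export for this document.
            continue
        results.append(exporter(doc, annotations))
    return results
```

With the loop living only in `run`, each exporter stays a simple function of one document and its annotations, and helpers such as `format_annotations` can take the annotations alone.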
Marty Oehme cd5f787220
chore: Update dependencies to fix single-thread warning
All checks were successful
ci/woodpecker/push/lint Pipeline was successful
ci/woodpecker/push/static_analysis Pipeline was successful
ci/woodpecker/push/test Pipeline was successful
Fixed the single-threaded warning emitted by the fitz (PyMuPDF) library,
since the issue no longer exists in the new version.
Bump version along the way.
2024-01-18 18:26:00 +01:00
Marty Oehme 31b878c9eb
refactor: Move Annotations into annotation module 2023-09-20 17:22:29 +02:00
Marty Oehme 07d4de9a46
docs: Add docstrings 2023-09-20 09:13:04 +02:00
Marty Oehme 20873e6ef8
Change annotation color to simple rgb tuple
Some checks failed
ci/woodpecker/push/test unknown status
ci/woodpecker/push/lint Pipeline failed
ci/woodpecker/push/static_analysis Pipeline was successful
2023-08-29 22:23:52 +02:00
Marty Oehme e325b89c9b
Move all extraction logic into extractor module
The publicly accessible default interface only contains
the command line interface and a single run function.
2023-08-29 12:40:36 +02:00
Marty Oehme b564ab4792
Add continuous integration pipeline
Some checks failed
ci/woodpecker/push/test unknown status
ci/woodpecker/push/lint Pipeline was successful
ci/woodpecker/push/static_analysis Pipeline was successful
Added static analysis (lint, type checking) to be done on each push, and
testing to be done on each master branch commit.
2023-08-29 12:15:10 +02:00
Marty Oehme e68b801ca1
Fix color mapping to tag
By reading values from the options file the way papis does, we should
now correctly get the values for mapping colors to tags.
Why did they not just implement e.g. a toml reader I wonder?
2023-08-28 16:41:18 +02:00
Marty Oehme 1bb1b80620
Add debug logging for extractor 2023-08-28 12:53:17 +02:00
Marty Oehme a22cc635b2
initial commit 2023-08-28 10:28:06 +02:00