papis-extract

Author	SHA1	Message	Date
Marty Oehme	3b4db7b6b8	refactor: Extract PDF extractor into class. Extractor is a general protocol, with the PDF extraction routine now being one implementation of the protocol. Preparation for adding multiple extractors (epub, djvu, or specific programmes) in the future.	2024-01-20 18:02:51 +01:00
Marty Oehme	765de505bb	refactor: Remove AnnotatedDocument class. The AnnotatedDocument class was, essentially, a simple tuple of a document and a list of annotations. While not bad in a vacuum, it is unwieldy, and passing it around instead of a document, annotations, or both where necessary is more restrictive and frankly unnecessary. This commit removes the data class and any instances of its use. Instead, we now pass the individual components around to anything that needs them. This also frees us up to pass only annotations around, for example. We also no longer iterate through the selected papis documents in each exporter (since we only pass a single document), but in the main function itself. This leads to less duplication and makes the run function the single source of iteration through selected documents. Everything else only knows about a single document, the one it is operating on, which seems much neater. For now, it does not change much, but it should make later work on extra exporters or extractors easier.	2024-01-20 16:36:24 +01:00
Marty Oehme	cd5f787220	chore: Update dependencies to fix single-thread warning. Fixed the single-threaded warning emitted by the fitz (pymupdf) library, since the issue no longer exists in the new version. Bump version along the way.	2024-01-18 18:26:00 +01:00
Marty Oehme	31b878c9eb	refactor: Move Annotations into annotation module	2023-09-20 17:22:29 +02:00
Marty Oehme	07d4de9a46	docs: Add docstrings	2023-09-20 09:13:04 +02:00
Marty Oehme	20873e6ef8	Change annotation color to simple rgb tuple	2023-08-29 22:23:52 +02:00
Marty Oehme	e325b89c9b	Move all extraction logic into extractor module. The publicly accessible default interface only contains the command-line command and a single run function.	2023-08-29 12:40:36 +02:00
Marty Oehme	b564ab4792	Add continuous integration pipeline. Added static analysis (lint, type checking) to be done on each push, and testing to be done on each master branch commit.	2023-08-29 12:15:10 +02:00
Marty Oehme	e68b801ca1	Fix color mapping to tag. Using the papis-like way of getting values from the options file, we should now correctly get the values for mapping colors to tags. Why did they not just implement e.g. a toml reader, I wonder?	2023-08-28 16:41:18 +02:00
Marty Oehme	1bb1b80620	Add debug logging for extractor	2023-08-28 12:53:17 +02:00
Marty Oehme	a22cc635b2	initial commit	2023-08-28 10:28:06 +02:00

11 commits
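
The two most recent refactors (3b4db7b6b8 and 765de505bb) describe a general Extractor protocol with PDF extraction as one implementation, and a run function that is the only place iterating over the selected documents. A minimal sketch of that shape might look as follows. All names here (Annotation, can_process, PdfExtractor, extract, the dict of documents) are assumptions made for illustration and are not taken from the repository; the PDF parsing itself is stubbed out rather than calling pymupdf.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Protocol


@dataclass
class Annotation:
    """Hypothetical stand-in for the project's annotation type."""

    file: str
    text: str
    # Color as a simple RGB tuple, as in commit 20873e6ef8 (value range assumed).
    colors: tuple[float, float, float] = (0.0, 0.0, 0.0)


class Extractor(Protocol):
    """Structural interface: any class with these two methods qualifies."""

    def can_process(self, filename: Path) -> bool: ...

    def run(self, filename: Path) -> list[Annotation]: ...


class PdfExtractor:
    """One implementation of the protocol; real PDF parsing is stubbed out."""

    def can_process(self, filename: Path) -> bool:
        return filename.suffix.lower() == ".pdf"

    def run(self, filename: Path) -> list[Annotation]:
        # Placeholder: a real implementation would read highlights from the file.
        return [Annotation(file=str(filename), text="example highlight")]


def extract(files: list[Path], extractors: list[Extractor]) -> list[Annotation]:
    """Collect annotations for the files of a single document."""
    annotations: list[Annotation] = []
    for file in files:
        for extractor in extractors:
            if extractor.can_process(file):
                annotations.extend(extractor.run(file))
                break  # first matching extractor handles the file
    return annotations


def run(
    documents: dict[str, list[Path]], extractors: list[Extractor]
) -> dict[str, list[Annotation]]:
    """The single place that iterates over the selected documents."""
    return {name: extract(files, extractors) for name, files in documents.items()}
```

Because Extractor is a typing.Protocol, a later epub or djvu extractor would only need to provide the same two methods, without inheriting from anything, and everything below run sees one document at a time.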