Commit graph

43 commits

Author SHA1 Message Date
Marty Oehme 629932a5e8
feat: Loop through all chosen extractors 2024-01-23 09:10:42 +01:00
Marty Oehme f477deea7c
feat: Add extractor cli choice
Can only choose pdf for the time being, but allows additional
extractors to be added in the future.
2024-01-23 08:58:32 +01:00
Marty Oehme 3b4db7b6b8
refactor: Extract PDF extractor into class
Extractor is a general protocol with the PDF extraction routine now being
one implementation of the protocol. Preparation for adding multiple
extractors (epub, djvu, or specific programmes) in the future.
2024-01-20 18:02:51 +01:00
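A minimal sketch of what "Extractor as a general protocol" can look like in Python, also covering the later "loop through all chosen extractors" commit. All names here (`Extractor`, `can_process`, `run`, `PdfExtractor`, `extract`, the `Annotation` fields) are hypothetical illustrations, not the project's actual API, and the PDF reading itself is stubbed out so the example runs without PyMuPDF installed.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Protocol


@dataclass
class Annotation:
    """Hypothetical minimal annotation record."""

    file: str
    content: str


class Extractor(Protocol):
    """Anything that can pull annotations out of a file."""

    def can_process(self, filename: Path) -> bool: ...

    def run(self, filename: Path) -> list[Annotation]: ...


class PdfExtractor:
    """One implementation of the protocol; real PDF parsing is stubbed out."""

    def can_process(self, filename: Path) -> bool:
        return filename.suffix.lower() == ".pdf"

    def run(self, filename: Path) -> list[Annotation]:
        # A real implementation would open the file (e.g. with PyMuPDF) here.
        return [Annotation(file=str(filename), content="stub")]


def extract(files: list[Path], extractors: list[Extractor]) -> list[Annotation]:
    """Loop through all chosen extractors and apply each one that fits a file."""
    annotations: list[Annotation] = []
    for f in files:
        for extractor in extractors:
            if extractor.can_process(f):
                annotations.extend(extractor.run(f))
    return annotations
```

Because `Protocol` uses structural typing, a future epub or djvu extractor only needs matching method signatures; it does not have to inherit from anything.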
Marty Oehme 765de505bb
refactor: Remove AnnotatedDocument class
The AnnotatedDocument class was, essentially, a simple tuple of a document
and a list of annotations. While not bad in a vacuum, it is unwieldy and
passing this around instead of a document, annotations, or both where
necessary is more restrictive and frankly unnecessary.

This commit removes the data class and any instances of its use. Instead,
we now pass the individual components around to anything that needs them.
This also frees us up to pass only annotations around for example.

We also no longer iterate through the selected papis documents in each
exporter (since we only pass a single document), but in the main
function itself. This leads to less duplication and makes the run
function the single source of iteration through selected documents.
Everything else only knows about a single document, the one it is
operating on, which seems much neater.

For now, it does not change much, but should make later work on extra
exporters or extractors easier.
2024-01-20 16:36:24 +01:00
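A rough sketch of the resulting control flow, assuming simplified stand-ins: `Document`, `extract_annotations`, `export_stdout` and `run` are hypothetical names, and annotations are plain strings here. The point is that only `run` loops over the selection, while the extractor and exporter each receive one document (and its annotations) as separate arguments instead of a bundled tuple-like class.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical stand-in for a papis document."""

    title: str
    files: list[str] = field(default_factory=list)


def extract_annotations(doc: Document) -> list[str]:
    # Hypothetical extractor: one fake annotation per attached file.
    return [f"note from {f}" for f in doc.files]


def export_stdout(doc: Document, annotations: list[str]) -> str:
    # The exporter only knows about the single document it was handed.
    return f"{doc.title}: {len(annotations)} annotation(s)"


def run(documents: list[Document]) -> list[str]:
    """The only place that iterates over the selected documents."""
    output = []
    for doc in documents:
        annotations = extract_annotations(doc)
        output.append(export_stdout(doc, annotations))
    return output
```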
Marty Oehme cd5f787220
chore: Update dependencies to fix single-thread warning
Fixed the single-threaded warning emitted by the fitz (PyMuPDF) library,
since the issue no longer exists in the new version.
Bump version along the way.
2024-01-18 18:26:00 +01:00
Marty Oehme 1ef9a91e55
test: Remove deprecated pipeline instruction
2024-01-06 11:15:51 +01:00
Marty Oehme 376282eaaa
test: Fix test running on main branch
2023-10-17 22:09:54 +02:00
Marty Oehme 5cd5a05062
chore: Fix black fmt
2023-10-17 22:07:09 +02:00
Marty Oehme aeb18ae358
feat: Add option to force-add annotations
Turns off the search for duplicate annotations and simply adds all of them.
2023-10-17 22:05:11 +02:00
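A minimal sketch of how a force-add switch can short-circuit the duplicate search. The function name and the `force` parameter are hypothetical, and the duplicate check is assumed to be an exact string match; the project's actual search for existing annotations may work differently.

```python
def annotations_to_add(new: list[str], existing: list[str], force: bool = False) -> list[str]:
    """Return the annotations that should be written to the notes."""
    if force:
        # Force mode: skip the duplicate search entirely and add everything.
        return list(new)
    # Default mode: drop anything that already appears in the existing notes
    # (exact-match comparison is an assumption made for this sketch).
    seen = set(existing)
    return [a for a in new if a not in seen]
```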
Marty Oehme 14f1b9e75c
test: Add poetry-cov library 2023-10-17 21:16:40 +02:00
Marty Oehme c9736a5f32
test: Add tests for formatter sad paths 2023-10-12 19:27:16 +02:00
Marty Oehme f67ac8cdb3
chore: Fix markdown lint issues 2023-10-12 19:26:41 +02:00
Marty Oehme 2700e4adc3
test: Add code coverage dev dependency 2023-09-22 21:53:55 +02:00
Marty Oehme 1e29642cba
test: Fix formatting and annotation tests 2023-09-22 21:49:52 +02:00
Marty Oehme ee4690f52b
feat: Add atx-style markdown
Added a markdown template with atx-style headers, which can be chosen
as an alternative markdown template on the cli.
The existing 'markdown' template still defaults to
setext-style headers.
2023-09-21 22:05:39 +02:00
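For reference, a small sketch of the difference between the two header styles as defined in CommonMark: setext headings underline the title with `=` (level 1) or `-` (level 2) and only support those two levels, while atx headings prefix the title with one to six `#` characters. The helper names are hypothetical and are not the project's template code.

```python
def setext_header(title: str, level: int = 1) -> str:
    """Underline-style header; Markdown only defines levels 1 and 2."""
    if level not in (1, 2):
        raise ValueError("setext headers only support levels 1 and 2")
    underline = "=" if level == 1 else "-"
    return f"{title}\n{underline * len(title)}"


def atx_header(title: str, level: int = 1) -> str:
    """Hash-prefixed header; Markdown allows levels 1 to 6."""
    if not 1 <= level <= 6:
        raise ValueError("atx headers support levels 1 to 6")
    return f"{'#' * level} {title}"
```

The level limit is one practical reason to offer atx as an alternative: it can express deeper nesting than setext.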
Marty Oehme 7ee8d4911e
refactor: Make formatters functions
Formatters have so far been classes which contained some data (the
template to use for formatting and the annotations and documents to
format) and the actual formatting logic (an execute function).

However, the annotations to be formatted can be injected, and the
templates are so far static only, so they can be simple variables (we
can think about how to inject them at another point should it come up,
no bikeshedding now).

This way, we can simply pass around one function per formatter, which
should make the code much lighter, easier to add to and especially less
stateful, which means fewer areas of broken interactions to worry about.
2023-09-21 21:54:24 +02:00
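A minimal sketch of "one function per formatter" with a static template held in a plain variable. The names (`Formatter`, `MARKDOWN_TEMPLATE`, `format_markdown`, `format_count`, `FORMATTERS`) are hypothetical, and Python's `str.format` stands in for the mustache templating the project actually uses, purely to keep the example self-contained.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Annotation:
    """Hypothetical minimal annotation record."""

    content: str
    page: int


# A formatter is just a function: annotations in, formatted string out.
Formatter = Callable[[list[Annotation]], str]

# Static template kept as a plain module-level variable.
MARKDOWN_TEMPLATE = "> {content}\n\n(p. {page})"


def format_markdown(annotations: list[Annotation]) -> str:
    return "\n\n".join(
        MARKDOWN_TEMPLATE.format(content=a.content, page=a.page) for a in annotations
    )


def format_count(annotations: list[Annotation]) -> str:
    return str(len(annotations))


# Choosing a formatter on the cli could then be a simple dictionary lookup.
FORMATTERS: dict[str, Formatter] = {
    "markdown": format_markdown,
    "count": format_count,
}
```

With no instance state, adding a formatter means writing one function and registering it in the mapping.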
Marty Oehme 929e70d7ac
chore: Update poetry.lock 2023-09-21 19:36:00 +02:00
Marty Oehme 31b878c9eb
refactor: Move Annotations into annotation module 2023-09-20 17:22:29 +02:00
Marty Oehme 3670f70319
docs: Add formatting documentation
Added documentation on using output templates and that they will
invalidate the 'existing' annotation search.
2023-09-20 09:15:00 +02:00
Marty Oehme e511ffa48d
feat: Add CSV formatter
Added a formatter for csv-compatible syntax. The formatting is quite basic,
with no escaping performed should that be necessary. However, for an
initial csv output it suffices for me.
2023-09-20 09:15:00 +02:00
Marty Oehme 5f0bc2ffad
feat: Add count formatter
Added formatter which counts and outputs the number of
annotations in each document.
2023-09-20 09:14:59 +02:00
Marty Oehme 5a6d672c76
refactor: Move formatting logic to formatters
Formatters (previously templates) were pure data containers before,
containing the 'template' for how things should be formatted using
mustache. The formatting would be done a) in the exporters and b) in the
annotations.

This spread of formatting has now been consolidated into the Formatter,
which fixes the overall spread of formatting code and now can coherently
format a whole output instead of just individual annotations.

A formatter contains references to all documents and contained
annotations and will format everything at once by default, but the
formatting function can be invoked with reference to a specific
annotated document to only format that.

This commit should put more separation between the concerns of exporter
and formatter and make formatting a concern purely of the formatters and
annotation objects.
2023-09-20 09:14:58 +02:00
Marty Oehme 66f937e2a8
test: Add local papis settings for testing 2023-09-20 09:14:55 +02:00
Marty Oehme cbe2e7cb03
feat: Allow cli option for template choice 2023-09-20 09:14:54 +02:00
Marty Oehme 9674592a9f
docs: Add developer notes to README 2023-09-20 09:14:43 +02:00
Marty Oehme 07d4de9a46
docs: Add docstrings 2023-09-20 09:13:04 +02:00
Marty Oehme 4eb983d9e3
refactor: Move templating to separate file 2023-09-20 09:12:59 +02:00
Marty Oehme e633c0335e
chore: Make whoosh database optional dependency 2023-09-20 09:12:54 +02:00
Marty Oehme 5450776eb2
refactor: Extract templating to model module 2023-09-20 09:12:45 +02:00
Marty Oehme e56f014136
Add formatting style Markdown 2023-08-31 21:40:17 +02:00
Marty Oehme 20873e6ef8
Change annotation color to simple rgb tuple
2023-08-29 22:23:52 +02:00
Marty Oehme 256117d451
Add mustache templating
Added mustache templating engine to be able to provide custom
formatting strings.
2023-08-29 13:49:22 +02:00
Marty Oehme e325b89c9b
Move all extraction logic into extractor module
The publicly accessible default interface only contains
the command line command interface and a single run function.
2023-08-29 12:40:36 +02:00
Marty Oehme b564ab4792
Add continuous integration pipeline
Added static analysis (lint, type checking) to be done on each push, and
testing to be done on each master branch commit.
2023-08-29 12:15:10 +02:00
Marty Oehme c6b95a4742
Fix spacing in print output 2023-08-28 18:05:52 +02:00
Marty Oehme 2109c6535d
Add decoration to pretty printed author 2023-08-28 17:04:15 +02:00
Marty Oehme fab58a9fc5
Add preliminary README 2023-08-28 16:42:16 +02:00
Marty Oehme 1af0f8f7bc
Remove default tag mappings
Since meanings assigned to highlight colors are often very personal
I do not want to make any assumptions about their use. Remove any
default associations.
2023-08-28 16:41:59 +02:00
Marty Oehme e68b801ca1
Fix color mapping to tag
Using papis-style value retrieval from the options file, we should
now correctly get the values for mapping colors to tags.
Why did they not just implement e.g. a toml reader, I wonder?
2023-08-28 16:41:18 +02:00
Marty Oehme ff84a28c4a
Format code 2023-08-28 12:55:01 +02:00
Marty Oehme 1bb1b80620
Add debug logging for extractor 2023-08-28 12:53:17 +02:00
Marty Oehme df235caf8f
Fix logging formatting error 2023-08-28 12:52:51 +02:00
Marty Oehme a22cc635b2
initial commit 2023-08-28 10:28:06 +02:00