Commit graph

10 commits

Author SHA1 Message Date
9c80281220
fix: Respect minimum color similarity option
Previously we would always assign a minimum color similarity of 1.0,
regardless of the option set. Now we use the minimum similarity set in
the configuration; if it is not set, we use the default for that option,
and finally fall back to a simple default value declared at the top
of the file.
2024-11-30 21:45:29 +01:00
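
The fallback order described in this commit message could look roughly like the following. This is a minimal sketch, not the actual implementation: the option key `minimum_similarity_color`, the constant `DEFAULT_MINIMUM_SIMILARITY`, its value, and the two plain dictionaries standing in for the configuration layers are all assumptions made for illustration.

```python
from typing import Any

# Hypothetical stand-in for the "simple default value declared at the top
# of the file"; the real name and value are not given in the commit message.
DEFAULT_MINIMUM_SIMILARITY = 0.75


def resolve_minimum_similarity(
    user_config: dict[str, Any],
    option_defaults: dict[str, Any],
    key: str = "minimum_similarity_color",  # hypothetical option name
) -> float:
    """Return the first value found: user setting, option default, constant."""
    for source in (user_config, option_defaults):
        value = source.get(key)
        if value is not None:
            return float(value)
    # Neither layer provided a value, so use the module-level fallback.
    return DEFAULT_MINIMUM_SIMILARITY
```

The point of the sketch is only the lookup order: a hard-coded 1.0 never enters the picture unless one of the layers explicitly provides it.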
6b35b2f918
chore: Fix strict pyright analysis errors 2024-06-14 15:13:24 +02:00
86d53a19d4
chore: Fix import lint error 2024-01-24 13:39:01 +01:00
f6c0189529
fix: Tag automation on tag creation
Tagging by color only worked when the `annotation.color = ()` setter
was invoked manually. Now it also works when an instance is first
created.
2024-01-24 11:20:00 +01:00
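
One common way to get this behavior is to assign through the property inside `__init__`, so that the setter runs on creation as well. The sketch below illustrates that pattern under stated assumptions: the `COLOR_TAGS` mapping, the `tag_for_color` helper, and the nearest-color rule are hypothetical and are not taken from the repository.

```python
# Hypothetical mapping from reference RGB colors to tag names.
COLOR_TAGS = {
    (1.0, 0.0, 0.0): "important",
    (0.0, 1.0, 0.0): "agree",
}


def tag_for_color(color: tuple[float, float, float]) -> str:
    """Return the tag of the reference color closest to `color`."""
    nearest = min(
        COLOR_TAGS,
        key=lambda known: sum((a - b) ** 2 for a, b in zip(known, color)),
    )
    return COLOR_TAGS[nearest]


class Annotation:
    """Simplified stand-in for the real Annotation class."""

    def __init__(
        self,
        content: str = "",
        note: str = "",
        color: tuple[float, float, float] = (0.0, 0.0, 0.0),
    ) -> None:
        self.content = content
        self.note = note
        # Assigning via the property triggers the setter below, so the
        # tag is derived at creation time too.
        self.color = color

    @property
    def color(self) -> tuple[float, float, float]:
        return self._color

    @color.setter
    def color(self, value: tuple[float, float, float]) -> None:
        self._color = value
        self.tag = tag_for_color(value)
```

With this shape, `Annotation(color=(0.9, 0.1, 0.0)).tag` is already set after construction, and reassigning `color` later updates the tag again.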
67bfc30396
refactor: Switch annotation away from dataclass
To make getters and setters easier to use, we switch the dataclass
to a plain, undecorated Python class.
2024-01-24 11:15:10 +01:00
c53cd563b7
feat: Add pocketbook extraction 2024-01-24 08:56:21 +01:00
ddb34fca7b
refactor: Move tagging by color to Annotation 2024-01-24 08:53:54 +01:00
11d570f9d8
refactor: Rename annotation content variables
Renamed the two variables describing an annotation's highlighted PDF text and
its appended note, if one exists. They were previously called 'text' (for the
highlighted in-PDF content) and 'content' (for the additionally supplied content).

Now they are called 'content' for the highlighted words in the PDF
and 'note' for the appended note given (or not) in an annotation.
2024-01-23 09:54:36 +01:00
765de505bb
refactor: Remove AnnotatedDocument class
The AnnotatedDocument class was, essentially, a simple tuple of a document
and a list of annotations. While not bad in a vacuum, it is unwieldy:
passing it around instead of a document, annotations, or both as
needed is more restrictive and, frankly, unnecessary.

This commit removes the data class and any instances of its use. Instead,
we now pass the individual components around to anything that needs them.
This also frees us to pass only the annotations around, for example.

We also no longer iterate through the selected papis documents in each
exporter (since we only pass a single document), but in the main
function itself. This leads to less duplication and makes the run
function the single place where we iterate through the selected
documents. Everything else only knows about a single document -
the one it is operating on - which seems much neater.

For now, it does not change much, but should make later work on extra
exporters or extractors easier.
2024-01-20 16:36:24 +01:00
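
The structure described in this commit message can be sketched as follows. This is an illustration only: `Document`, `extract`, `export_to_lines`, and the signature of `run` are hypothetical stand-ins, and annotations are reduced to plain strings to keep the example short.

```python
from typing import Callable

# Stand-in types: a papis document is reduced to a dict, an annotation
# to a plain string.
Document = dict[str, str]
Exporter = Callable[[Document, list[str]], list[str]]


def extract(document: Document) -> list[str]:
    """Hypothetical extractor: returns the annotations of one document."""
    return [f"highlight from {document['title']}"]


def export_to_lines(document: Document, annotations: list[str]) -> list[str]:
    """Hypothetical exporter: receives one document and its annotations."""
    return [f"{document['title']}: {text}" for text in annotations]


def run(documents: list[Document], exporter: Exporter) -> list[str]:
    """Single place that iterates over the selected documents."""
    output: list[str] = []
    for document in documents:
        annotations = extract(document)
        # Document and annotations are passed separately, not wrapped
        # in a combined container object.
        output.extend(exporter(document, annotations))
    return output
```

Because the exporter never sees more than one document, adding another exporter or extractor does not require duplicating the loop.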
31b878c9eb
refactor: Move Annotations into annotation module 2023-09-20 17:22:29 +02:00
Renamed from papis_extract/annotation_data.py