Commit graph

105 commits

Author SHA1 Message Date
24a4812051
chore: Update beautifulsoup4 dependency
Updated dependency and, with its newly provided type hints, removed some
pyright overrides.

Added a cast where there was still not enough type hinting.
2025-09-12 14:53:03 +02:00
ff6cdf3cca
ci: Update woodpecker ci to use uv images 2025-09-12 14:53:02 +02:00
ecb999a49e
chore: Formatting for ruff check 2025-09-12 14:53:01 +02:00
1a4b5e3a70
chore: Remove python-magic dependency
It relies on the libmagic module which is not necessarily installed
everywhere. Most of the functionality that we need for our purposes can
be recreated with lighter-weight methods.
2025-09-12 14:53:01 +02:00
17c6fefd89
chore: Remove unnecessary imports 2025-09-12 14:53:00 +02:00
3eb7f3f1c7
feat: Add Readest extractor 2025-09-12 14:53:00 +02:00
fd71482526
chore: Log found files for extractors to debug logger 2025-09-12 14:52:59 +02:00
a9ff4152af
fix: Do not parse the last ReadEra section 2025-09-12 14:52:59 +02:00
db47ad686d
chore: Implement Annotation sort and equality dunders 2025-09-12 14:52:58 +02:00
d840609ecb
fix: Fix annotation value comparison 2025-09-12 14:44:04 +02:00
3344147f1f
test: Add tests for readera extractor 2025-09-12 14:44:04 +02:00
e46219151b
refactor: Use generator for PDF extractor 2025-09-12 10:55:25 +02:00
ff36d30f91
docs: Add CHANGELOG 2025-09-12 10:55:24 +02:00
f5455b6946
chore: Fix for additional linting rules 2025-09-12 10:55:23 +02:00
96cd4929c9
chore: Format files with ruff 2025-09-12 10:55:23 +02:00
a854ef00d6
fix: Write annotations if duplicate detection is off
Previously we would never add annotations if the detection was off,
because we only added an empty list instead of the actual annotations
and would thus break out of writing early.
2025-09-12 10:55:22 +02:00
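The bug described in this commit can be illustrated with a minimal sketch. It is not the project's actual code: the helper name `select_annotations` and its parameters are assumptions made for illustration. The point is only that the "detection off" branch has to return the real annotations rather than an empty list.

```python
from typing import Iterable


def select_annotations(
    new: Iterable[str], existing: set[str], detect_duplicates: bool
) -> list[str]:
    """Pick the annotations that should be written (hypothetical helper)."""
    if not detect_duplicates:
        # Detection is off: pass every annotation through unchanged.
        # Returning [] here would make a caller that skips empty results
        # stop before writing anything, which is the bug described above.
        return list(new)
    # Detection is on: drop annotations that are already present.
    return [text for text in new if text not in existing]
```

With detection off, `select_annotations(["a", "b"], {"a"}, False)` keeps both entries; with detection on, only `"b"` remains.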
f7801365f0
chore: Update uv lock 2025-09-12 10:55:22 +02:00
e90a123f88
chore!: Rename force option to duplicates
BREAKING CHANGE: Change the `--force/--no-force` cli option to
`--duplicates/--no-duplicates` since it describes a little more clearly
what using it actually achieves (adding duplicate quotes to the output
or not).
2025-09-12 10:55:21 +02:00
5f01aa1f2b
feat: Add eof heuristic for readera extractor
Every exported ReadEra annotation file also _ends_ with the ubiquitous
`*****` pattern, so we look for that to detect the file.
2025-09-12 10:55:21 +02:00
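A minimal sketch of such an end-of-file check, assuming the marker sits on the last non-blank line. The function name `looks_like_readera` and the tolerance for trailing blank lines are assumptions, not taken from the extractor itself.

```python
from pathlib import Path

# Pattern that, per the commit message, closes every exported file.
READERA_END_MARKER = "*****"


def looks_like_readera(path: Path) -> bool:
    """Return True if the last non-blank line is the marker (hypothetical helper)."""
    try:
        text = path.read_text(encoding="utf-8")
    except (OSError, UnicodeDecodeError):
        # Unreadable or non-text files cannot be ReadEra exports.
        return False
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    return bool(lines) and lines[-1] == READERA_END_MARKER
```

Reading the whole file is acceptable for small annotation exports; for large inputs one might seek to the end and inspect only the final bytes instead.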
3ef45e24f7
feat: Add ReadEra extractor
For the readera epub/pdf reader application for android and ios.
2025-09-12 10:55:20 +02:00
5350b9215e
docs: Update README 2025-09-12 10:55:19 +02:00
1f65317d65
chore: Update to papis 0.14 2025-09-12 10:55:16 +02:00
7a69bd509f
chore: Remove redundant cast
2024-11-30 21:48:59 +01:00
ddaa75f44b
chore: Change woodpecker ci to use uv 2024-11-30 21:45:59 +01:00
9c80281220
fix: Respect minimum color similarity option
Previously we would always assign a minimum color similarity of 1.0,
regardless of the option set. Now we use the value set in the
configuration, otherwise the default registered for that option, and
finally fall back to a simple default value declared at the top of the
file.
2024-11-30 21:45:29 +01:00
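The lookup order described in this commit can be sketched as a simple fallback chain. This is an illustration under stated assumptions: the option key `minimum_similarity_color`, the constant's value, and the use of plain mappings in place of the real configuration object are all hypothetical.

```python
from typing import Mapping

# Last-resort value declared at module level (hypothetical number).
DEFAULT_MINIMUM_SIMILARITY = 0.75


def resolve_minimum_similarity(
    user_config: Mapping[str, float],
    option_defaults: Mapping[str, float],
    key: str = "minimum_similarity_color",  # hypothetical option name
) -> float:
    """Return the first value found: user setting, option default, constant."""
    for source in (user_config, option_defaults):
        if key in source:
            return float(source[key])
    return DEFAULT_MINIMUM_SIMILARITY
```

Checking for key presence rather than truthiness means an explicit setting of `0.0` is respected instead of being treated as "unset".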
424ad34c68
refactor: Rename cli options for extractor and template
Renamed the extractor selection from the cli to '--input' since it
selects the input formats from which annotations are gathered.
Renamed the template selection from the cli to '--output' since it
controls the output format that annotations are displayed/written in.

This also somewhat more closely mirrors pandoc cli options, which are
generally a good guide to follow.
2024-11-30 12:14:55 +01:00
f690c5db51
chore: Simplify and update test dependencies 2024-11-30 11:48:00 +01:00
103c2ea2fc
chore: Switch to uv packaging and hatch backend
Switching this project over to the uv package manager as a pilot project
for my personal use.
Since this project is not yet widely used I can use it as an
experimental playground for discovering uv further without interrupting
anybody's workflow.
2024-11-15 11:28:50 +01:00
779519f580
fix: Only inform if no extractor finds valid files
Until now whenever an extractor could not find any valid files for a
document it would inform the user of this case. However, this is not
very useful: if you have a pdf and an epub extractor running, it would
inform you for each document which only had one of the two formats as
well as those which actually did not have any valid files for *any* of
the extractors running.

This commit changes the behavior to only inform the user when none of
the running extractors find a valid file, since that is the actual case
a user might want to be informed about.
2024-06-14 21:50:55 +02:00
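A minimal sketch of the changed behavior, assuming each extractor can be reduced to a predicate over file names. The function `collect_valid_files`, the `inform` callback and the message text are hypothetical and only illustrate the "inform once, and only if nothing matched" logic.

```python
from typing import Callable, Mapping, Sequence


def collect_valid_files(
    files: Sequence[str],
    extractors: Mapping[str, Callable[[str], bool]],
    inform: Callable[[str], None] = print,
) -> dict[str, list[str]]:
    """Map each extractor name to the files it accepts (hypothetical helper)."""
    found: dict[str, list[str]] = {}
    for name, accepts in extractors.items():
        valid = [f for f in files if accepts(f)]
        if valid:
            found[name] = valid
    # Inform once per document, and only if *no* extractor matched,
    # instead of once per extractor that came up empty.
    if not found:
        inform("No extractor found a valid file for this document.")
    return found
```

A document with only a pdf attached now passes silently when both a pdf and an epub extractor are running, because at least one of them found a file.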
97b7ec0dc9
chore: Update dependencies 2024-06-14 15:19:57 +02:00
9e713193a8
refactor: Fix circular exception import 2024-06-14 15:18:22 +02:00
6b35b2f918
chore: Fix strict pyright analysis errors 2024-06-14 15:13:24 +02:00
8093259551
refactor: Remove pymupdf coupling in extraction
The library is only needed for pdf extraction which is taken care of
in its own extractor plugin. In the overall extraction routine we do not
need any knowledge of the existence of pymupdf.
2024-06-14 14:59:39 +02:00
7261e7d80c
chore: Refactor for strict pyright analysis 2024-06-13 21:20:53 +02:00
19599a66d7
chore: Black formatting 2024-06-12 11:46:39 +02:00
c2aec7add6
feat: Notify formatters if formatting first entry
This allows headers to be created by a formatter, which will
*only* be added to the very first entry created and not to
each entry. For example, this is currently used to create
a csv header once instead of once per document.
2024-06-12 11:45:35 +02:00
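The idea can be sketched with a formatter that receives a `first` flag. This is not the project's formatter interface: the function names, the `first` parameter and the column names are assumptions chosen for illustration.

```python
import csv
import io
from typing import Iterable, Sequence

# Hypothetical column names for the illustration.
HEADER = ("type", "text")


def format_entry_csv(
    annotations: Iterable[tuple[str, str]], first: bool = False
) -> str:
    """Format one entry as csv rows, emitting the header only when `first` is set."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    if first:
        writer.writerow(HEADER)
    writer.writerows(annotations)
    return buffer.getvalue()


def format_all(entries: Sequence[Iterable[tuple[str, str]]]) -> str:
    """Tell the formatter which entry is the first one."""
    return "".join(
        format_entry_csv(annotations, first=(index == 0))
        for index, annotations in enumerate(entries)
    )
```

Formatting two entries this way yields a single header line followed by the rows of both entries, so the combined output stays a valid csv file.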
9eb7399536
chore: Set strict typing mode for pyright lsp 2024-06-12 11:16:32 +02:00
b5c081fbf3
feat: Change count display to lead with count
The actual count is now the first item on each line,
to make it easier to sort, strip, delete and compare
afterwards.
2024-06-12 11:16:13 +02:00
d087c366c3
chore: Refactor markdown format string handling 2024-06-12 11:05:13 +02:00
c21ab4a76c
chore: Update dependencies
Pin new versions of levenshtein and pymupdf to fix build process.

This also means the source will soon need to switch from importing fitz
to importing pymupdf.
2024-05-07 10:55:14 +02:00
5526b3d2c5
docs: Update limitation information 2024-05-07 10:54:11 +02:00
905b20a79c
fix: Default markdown atx formatter for note exporter
2024-01-25 22:46:38 +01:00
163fd63038
fix: Fixed pocketbook extractor trying to read all files
The complete read routine would run before checking whether the file
has an xml mimetype. This means that it would first try to read any
file into memory, including pdfs and even binaries. Of course doing
so crashed the program.
2024-01-25 21:42:34 +01:00
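One way to avoid loading arbitrary files is to inspect only the first bytes before deciding to parse. The sketch below uses a simple byte-prefix probe from the standard library rather than the mimetype check mentioned in the commit; the function name, the probe size and the accepted prefixes are assumptions.

```python
from pathlib import Path


def looks_like_xml(path: Path, probe_size: int = 512) -> bool:
    """Cheaply guess whether a file is XML/XHTML (hypothetical helper)."""
    try:
        with path.open("rb") as handle:
            # Read a small prefix only, never the whole file.
            head = handle.read(probe_size)
    except OSError:
        return False
    # Ignore a UTF-8 byte order mark and leading whitespace.
    head = head.removeprefix(b"\xef\xbb\xbf").lstrip()
    return head.startswith((b"<?xml", b"<!DOCTYPE", b"<html"))
```

Only files that pass this probe would then be handed to the full read routine, so a large pdf or binary is rejected after at most `probe_size` bytes.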
72ddaaf1bc
refactor: Extract exporters to separate module 2024-01-25 21:42:33 +01:00
c8e8453b68
feat: Add advanced pocketbook detection heuristic
Added heuristic which checks for the existence of a specific
meta tag written to the pocketbook XHTML file.
2024-01-24 14:57:10 +01:00
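A meta-tag check of this kind can be sketched with the standard library's `html.parser`. The commit does not name the tag, so the helper takes the meta name as a parameter; the class, the function and any name passed to them are hypothetical, and the real extractor may use a different parser.

```python
from html.parser import HTMLParser


class _MetaFinder(HTMLParser):
    """Record whether a <meta name="..."> tag with the given name occurs."""

    def __init__(self, name: str) -> None:
        super().__init__()
        self.name = name
        self.found = False

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <meta ... />.
        if tag == "meta" and dict(attrs).get("name") == self.name:
            self.found = True


def has_meta_tag(markup: str, name: str) -> bool:
    """Return True if the markup contains a meta tag with that name."""
    finder = _MetaFinder(name)
    finder.feed(markup)
    return finder.found
```

A caller would pass the file's text together with whatever meta name identifies the export format, for example `has_meta_tag(text, "some-marker")`, where `"some-marker"` is a placeholder.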
6a8f8a03bc
refactor: Extract pocketbook file opening method 2024-01-24 14:55:28 +01:00
86d53a19d4
chore: Fix import lint error 2024-01-24 13:39:01 +01:00
c2a5190237
refactor: Improve module availability checks
Followed the ruff (Pyflakes) suggestion to use importlib utils directly
instead of attempting imports and catching the resulting errors.
2024-01-24 12:27:21 +01:00
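The pattern this suggestion refers to can be sketched as follows. `importlib.util.find_spec` is part of the standard library; the wrapper name `module_available` is an assumption for illustration.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Check whether a top-level module can be imported, without importing it."""
    # find_spec returns None when no top-level module of that name is found.
    return importlib.util.find_spec(name) is not None
```

This replaces a `try: import x` / `except ImportError:` block whose only purpose is to set a flag. Note that for dotted names `find_spec` imports the parent package first, so the sketch is meant for top-level module names.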
2f41906e6a
docs: Add extractor and install info
Added extractor info for the two currently existing extractors.
Added install recommendation for pipx.
2024-01-24 12:21:51 +01:00
e7e5258b34
feat: Only activate pocketbook extractor optionally
Since we make the dependencies for pocketbook html extraction optional
as an extra, this commit ensures the extractor (and cli option) only
gets loaded when they exist.
2024-01-24 12:07:04 +01:00