papis-extract

Author	SHA1	Message	Date
Marty Oehme	424ad34c68	refactor: Rename cli options for extractor and template Renamed the extractor selection on the cli to '--input' since it decides which input formats annotations are gathered from. Renamed the template selection on the cli to '--output' since it controls the output format that annotations are displayed/written in. This also somewhat more closely mirrors pandoc cli options, which are generally a good guide to follow.	2024-11-30 12:14:55 +01:00
Marty Oehme	103c2ea2fc	chore: Switch to uv packaging and hatch backend Switching this project over to the uv package manager as a pilot project for my personal use. Since this project is not yet widely used I can use it as an experimental playground for discovering uv further without interrupting anybody's workflow.	2024-11-15 11:28:50 +01:00
Marty Oehme	779519f580	fix: Only inform if no extractor finds valid files Until now whenever an extractor could not find any valid files for a document it would inform the user of this case. However, this is not very useful: if you have a pdf and an epub extractor running, it would inform you for each document which only had one of the two formats as well as those which actually did not have any valid files for any of the extractors running. This commit changes the behavior to only inform the user when none of the running extractors find a valid file, since that is the actual case a user might want to be informed about.	2024-06-14 21:50:55 +02:00
Marty Oehme	7261e7d80c	chore: Refactor for strict pyright analysis	2024-06-13 21:20:53 +02:00
Marty Oehme	d087c366c3	chore: Refactor markdown format string handling	2024-06-12 11:05:13 +02:00
Marty Oehme	905b20a79c	fix: Default markdown atx formatter for note exporter	2024-01-25 22:46:38 +01:00
Marty Oehme	72ddaaf1bc	refactor: Extract exporters to separate module	2024-01-25 21:42:33 +01:00
Marty Oehme	2880c06f53	chore: Improve cli option help texts Fixed the write option help text to remove an incorrect negation. Show default settings for flag options.	2024-01-23 09:27:26 +01:00
Marty Oehme	a51205954c	refactor: Extract extractor list to extractor module	2024-01-23 09:21:46 +01:00
Marty Oehme	629932a5e8	feat: Loop through all chosen extractors	2024-01-23 09:10:42 +01:00
Marty Oehme	f477deea7c	feat: Add extractor cli choice Can only choose pdf for the time being, but allows additional extractors to be added in the future.	2024-01-23 08:58:32 +01:00
Marty Oehme	765de505bb	refactor: Remove AnnotatedDocument class The AnnotatedDocument class was, essentially, a simple tuple of a document and a list of annotations. While not bad in a vacuum, it is unwieldy, and passing this around instead of a document, annotations, or both where necessary is more restrictive and frankly unnecessary. This commit removes the data class and any instances of its use. Instead, we now pass the individual components around to anything that needs them. This also frees us up to pass only annotations around, for example. We also no longer iterate through the selected papis documents in each exporter (since we only pass a single document), but in the main function itself. This leads to less duplication and makes the run function the single source of iteration through selected documents. Everything else only knows about a single document - the one it is operating on - which seems much neater. For now, it does not change much, but should make later work on extra exporters or extractors easier.	2024-01-20 16:36:24 +01:00
Marty Oehme	5cd5a05062	chore: Fix black fmt	2023-10-17 22:07:09 +02:00
Marty Oehme	aeb18ae358	feat: Add option to force-add annotations Will turn off looking for duplicate annotations and simply add all.	2023-10-17 22:05:11 +02:00
Marty Oehme	ee4690f52b	feat: Add atx-style markdown Added markdown with atx style headers, can be chosen as alternative markdown template on the cli. The existing 'markdown' template will still default to setext-style headers.	2023-09-21 22:05:39 +02:00
Marty Oehme	7ee8d4911e	refactor: Make formatters functions Formatters have been classes so far which contained some data (the template to use for formatting and the annotations and documents to format) and the actual formatting logic (an execute function). However, we can inject the annotations to be formatted and the templates so far are static only, so they can be simple variables (we can think about how to inject them at another point should it come up, no bikeshedding now). This way, we can simply pass around one function per formatter, which should make the code much lighter, easier to add to and especially less stateful, which means fewer areas of broken interactions to worry about.	2023-09-21 21:54:24 +02:00
Marty Oehme	3670f70319	docs: Add formatting documentation Added documentation on using output templates and that they will invalidate the 'existing' annotation search.	2023-09-20 09:15:00 +02:00
Marty Oehme	e511ffa48d	feat: Add CSV formatter Added formatter for csv-compatible syntax. The formatting is quite basic with no escaping happening should that be necessary. However, for an initial csv output it suffices for me.	2023-09-20 09:15:00 +02:00
Marty Oehme	5f0bc2ffad	feat: Add count formatter Added formatter which counts and outputs the number of annotations in each document.	2023-09-20 09:14:59 +02:00
Marty Oehme	5a6d672c76	refactor: Move formatting logic to formatters Formatters (previously templates) were pure data containers before, containing the 'template' for how things should be formatted using mustache. The formatting would be done a) in the exporters and b) in the annotations. This spread of formatting has now been consolidated into the Formatter, which fixes the overall spread of formatting code and now can coherently format a whole output instead of just individual annotations. A formatter contains references to all documents and contained annotations and will format everything at once by default, but the formatting function can be invoked with reference to a specific annotated document to only format that. This commit should put more separation into the concerns of exporter and formatter and makes formatting a concern purely of the formatters and annotation objects.	2023-09-20 09:14:58 +02:00
Marty Oehme	cbe2e7cb03	feat: Allow cli option for template choice	2023-09-20 09:14:54 +02:00
Marty Oehme	4eb983d9e3	refactor: Move templating to separate file	2023-09-20 09:12:59 +02:00
Marty Oehme	5450776eb2	refactor: Extract templating to model module	2023-09-20 09:12:45 +02:00
Marty Oehme	e56f014136	Add formatting style Markdown	2023-08-31 21:40:17 +02:00
Marty Oehme	e325b89c9b	Move all extraction logic into extractor module The publicly accessible default interface only contains the command line command interface and a single run function.	2023-08-29 12:40:36 +02:00
Marty Oehme	1af0f8f7bc	Remove default tag mappings Since meanings assigned to highlight colors are often very personal I do not want to make any assumptions about their use. Remove any default associations.	2023-08-28 16:41:59 +02:00
Marty Oehme	ff84a28c4a	Format code	2023-08-28 12:55:01 +02:00
Marty Oehme	df235caf8f	Fix logging formatting error	2023-08-28 12:52:51 +02:00
Marty Oehme	a22cc635b2	initial commit	2023-08-28 10:28:06 +02:00

29 commits