papis-extract/papis_extract/formatter.py

from collections.abc import Callable

from papis_extract.annotation import AnnotatedDocument

Formatter = Callable[[list[AnnotatedDocument]], str]


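def format_markdown(
    docs: list[AnnotatedDocument] = [], atx_headings: bool = False
) -> str:
    """Format all annotated documents as markdown.

    Documents without annotations are skipped. Headings are setext-style
    by default, or atx-style if atx_headings is set.
    """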
    template = (
        "{{#tag}}#{{tag}}\n{{/tag}}"
        "{{#quote}}> {{quote}}{{/quote}}{{#page}} [p. {{page}}]{{/page}}"
        "{{#note}}\n  NOTE: {{note}}{{/note}}"
    )
    output = ""
    for entry in docs:
        if not entry.annotations:
            continue

        # use .get() so a missing title or author does not raise a KeyError
        d = entry.document
        heading = f"{d.get('title', '')} - {d.get('author', '')}\n"
        if atx_headings:
            output += f"# {heading}\n"
        else:
            title_decoration = (
                f"{'=' * len(entry.document.get('title', ''))}   "
                f"{'-' * len(entry.document.get('author', ''))}"
            )
            output += f"{title_decoration}\n" f"{heading}" f"{title_decoration}\n\n"

        for a in entry.annotations:
            output += a.format(template)
            output += "\n\n"

        output += "\n\n\n"

    return output.rstrip()


def format_markdown_atx(docs: list[AnnotatedDocument] = []) -> str:
    return format_markdown(docs, atx_headings=True)


def format_markdown_setext(docs: list[AnnotatedDocument] = []) -> str:
    return format_markdown(docs, atx_headings=False)


def format_count(docs: list[AnnotatedDocument] = []) -> str:
    """Format one line per document with its number of annotations."""
    output = ""
    for entry in docs:
        if not entry.annotations:
            continue

        count = len(entry.annotations)

        d = entry.document
        # only put the separator if there is an author
        author = f"{d['author']} - " if "author" in d else ""
        title = d["title"] if "title" in d else ""
        output += f"{author}{title}: {count}\n"

    return output.rstrip()


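def format_csv(docs: list[AnnotatedDocument] = []) -> str:
    """Format all annotations as CSV, one row per annotation.

    The formatting is basic: field values are not escaped.
    """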
    header: str = "type,tag,page,quote,note,author,title,ref,file"
    template: str = (
        '{{type}},{{tag}},{{page}},"{{quote}}","{{note}}",'
        '"{{doc.author}}","{{doc.title}}","{{doc.ref}}","{{file}}"'
    )
    output = f"{header}\n"
    for entry in docs:
        if not entry.annotations:
            continue

        d = entry.document
        for a in entry.annotations:
            output += a.format(template, doc=d)
            output += "\n"

    return output.rstrip()


formatters: dict[str, Formatter] = {
    "count": format_count,
    "csv": format_csv,
    "markdown": format_markdown,
    "markdown_atx": format_markdown_atx,
    "markdown_setext": format_markdown_setext,
}
refactor: Move formatting logic to formatters (2023-09-19 19:43:19 +00:00)
Formatters (previously templates) were pure data containers before, containing the 'template' for how things should be formatted using mustache. The formatting would be done a) in the exporters and b) in the annotations. This spread of formatting has now been consolidated into the Formatter, which fixes the overall spread of formatting code and can now coherently format a whole output instead of just individual annotations. A formatter contains references to all documents and contained annotations and will format everything at once by default, but the formatting function can be invoked with reference to a specific annotated document to only format that. This commit should put more separation into the concerns of exporter and formatter and makes formatting a concern purely of the formatters and annotation objects.

feat: Add CSV formatter (2023-09-20 06:49:55 +00:00)
Added formatter for csv-compatible syntax. The formatting is quite basic with no escaping happening should that be necessary. However, for an initial csv output it suffices for me.

refactor: Move Annotations into annotation module (2023-09-20 15:22:29 +00:00)

refactor: Make formatters functions (2023-09-21 19:54:24 +00:00)
Formatters have been classes so far which contained some data (the template to use for formatting and the annotations and documents to format) and the actual formatting logic (an execute function). However, we can inject the annotations to be formatted, and the templates so far are static only, so they can be simple variables (we can think about how to inject them at another point should it come up, no bikeshedding now). This way, we can simply pass around one function per formatter, which should make the code much lighter, easier to add to, and especially less stateful, which means fewer areas of broken interactions to worry about.

feat: Add atx-style markdown (2023-09-21 20:01:51 +00:00)
Added markdown with atx-style headers, which can be chosen as an alternative markdown template on the cli. The existing 'markdown' template will still default to setext-style headers.

test: Fix formatting and annotation tests (2023-09-22 18:04:39 +00:00)