2hoarder/wallabag2hoarder
Marty Oehme 1017b876e9
Attempt to calculate annotation offsets
There are clear issues remaining with this approach.

The wallabag-given 'start' and 'end' fields do _not_ always point to the
n-th paragraph (as I thought) but actually represent a
beautifulsoup4-like tree descent.

So `p_start_match = re.match(r"/p\[(\d+)\]", annot["ranges"][0]["start"])`
will fail on any annotation whose start is not simply the n-th paragraph.

Instead we should see whether we can feed this tree path into the beautifulsoup4
parser and make use of wallabag already having done the work for us.
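
A rough sketch of what such a descent could look like, as an illustration and not the code in convert_api.py. It assumes the 'start' and 'end' values are chains of `/tag[n]` steps (for example `/div[1]/p[2]`) and that the article content is well-formed enough for the stdlib XML parser. Real wallabag HTML would more likely need a lenient parser such as beautifulsoup4. The helper names `parse_range_path` and `descend` and the sample data are made up for this example.

```python
import re
import xml.etree.ElementTree as ET

# One step of the path, e.g. "/p[2]" -> ("p", "2")
STEP = re.compile(r"/([A-Za-z][A-Za-z0-9]*)\[(\d+)\]")


def parse_range_path(path):
    """Split '/div[1]/p[2]' into [('div', 1), ('p', 2)]."""
    steps = STEP.findall(path)
    # Refuse anything the pattern does not cover completely
    if "".join(f"/{tag}[{idx}]" for tag, idx in steps) != path:
        raise ValueError(f"unsupported range path: {path!r}")
    return [(tag, int(idx)) for tag, idx in steps]


def descend(root, steps):
    """Follow the steps down from root; return None if a step does not exist."""
    node = root
    for tag, idx in steps:
        # Indices are 1-based and count only children with the same tag
        same_tag = [child for child in node if child.tag == tag]
        if idx < 1 or idx > len(same_tag):
            return None
        node = same_tag[idx - 1]
    return node


# Hypothetical article content and annotation, for illustration only
content = "<body><p>first</p><div><p>a</p><p>b <em>c</em></p></div></body>"
root = ET.fromstring(content)
annot = {"ranges": [{"start": "/div[1]/p[2]", "end": "/div[1]/p[2]"}]}

start_node = descend(root, parse_range_path(annot["ranges"][0]["start"]))
```

With this, an annotation starting inside a nested element resolves to the right node, where the plain `/p[n]` regex would not have matched at all.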
2025-03-12 20:29:11 +01:00
__init__.py Refactor wallabag conversion to have simple cli 2025-03-12 20:24:47 +01:00
base.py Refactor wallabag conversion to have simple cli 2025-03-12 20:24:47 +01:00
convert.py Add API converter 2025-03-12 20:24:48 +01:00
convert_api.py Attempt to calculate annotation offsets 2025-03-12 20:29:11 +01:00
convert_native_json.py Add API converter 2025-03-12 20:24:48 +01:00
convert_netscape.py Refactor wallabag conversion to have simple cli 2025-03-12 20:24:47 +01:00
README.md Refactor wallabag conversion to have simple cli 2025-03-12 20:24:47 +01:00

wallabag2hoarder

Currently supports 3 conversions:

  • ./convert_netscape.py: Converts into the 'netscape bookmark' format which hoarder should understand as 'html' import. It's a very lossy conversion, essentially only retaining url, title and creation time. Not tested well.

  • ./convert_native_json.py: Uses the fact that wallabag outputs json and hoarder supports a native json export/import to transform the json into one that hoarder understands well. Better tested and works without a hitch, but does not properly transfer annotations made in wallabag: they are only added as a simple json object to the 'note' field in hoarder.

  • ./convert_api.py: WIP: Uses the public hoarder API to move the wallabag articles over, including annotations on a best-effort basis. Annotation support is a little behind the curve in hoarder -- we can only have highlights, not 'notes' attached to a highlight.
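
To give an idea of how lossy the netscape route is, here is a minimal sketch of such a conversion. It is not the code from convert_netscape.py. It assumes each wallabag entry is a dict with `url`, `title` and `created_at` keys, and that `created_at` looks like `2025-03-12T20:29:11+0100`. The function name `to_netscape` is made up for this example.

```python
from datetime import datetime
from html import escape


def to_netscape(entries):
    """Render entries as a netscape bookmark file, keeping url, title and creation time."""
    lines = [
        "<!DOCTYPE NETSCAPE-Bookmark-file-1>",
        '<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8">',
        "<TITLE>Bookmarks</TITLE>",
        "<H1>Bookmarks</H1>",
        "<DL><p>",
    ]
    for entry in entries:
        # Assumed timestamp layout; ADD_DATE expects seconds since the epoch
        created = datetime.strptime(entry["created_at"], "%Y-%m-%dT%H:%M:%S%z")
        lines.append(
            f'    <DT><A HREF="{escape(entry["url"], quote=True)}" '
            f'ADD_DATE="{int(created.timestamp())}">{escape(entry["title"])}</A>'
        )
    lines.append("</DL><p>")
    return "\n".join(lines) + "\n"
```

Everything else in the entry (tags, content, annotations) is dropped here, which is why the json and API converters exist.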