There are clear issues remaining with this approach. The wallabag-given 'start' and 'end' fields do _not_ always point at the n-th top-level paragraph (as I had assumed) but actually describe a tree descent, much like a beautifulsoup4 traversal. So `p_start_match = re.match(r"/p\[(\d+)\]", annot["ranges"][0]["start"])` will fail on any annotation that is not anchored at a top-level paragraph. Instead, we should see whether we can walk this tree with the beautifulsoup4 parser and make use of the work wallabag has already done for us.
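A minimal sketch of that idea, assuming the range paths are annotator.js-style XPath fragments (e.g. `/div[1]/p[3]`) rooted at the article content; `resolve_range_path` is a hypothetical helper, and only `annot["ranges"][0]["start"]` comes from the existing code:

```python
import re

from bs4 import BeautifulSoup


def resolve_range_path(soup, path):
    """Walk an XPath-like path such as '/div[1]/p[3]/em[1]' down the
    parsed tree, returning the node it points at (or None if absent)."""
    node = soup
    for tag, index in re.findall(r"/(\w+)\[(\d+)\]", path):
        # XPath indices are 1-based and count only children of this tag name
        children = node.find_all(tag, recursive=False)
        i = int(index) - 1
        if i >= len(children):
            return None
        node = children[i]
    return node


# e.g., for one annotation on one entry:
# soup = BeautifulSoup(entry["content"], "html.parser")
# start_node = resolve_range_path(soup, annot["ranges"][0]["start"])
```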
# wallabag2hoarder
Currently supports 2 conversions, with a third (API-based) in progress:
- `./convert_netscape.py`: Converts into the 'Netscape bookmark' format, which hoarder should understand as an 'html' import. It's a very lossy conversion, essentially retaining only url, title and creation time. Not well tested. (See the first sketch after this list.)
- `./convert_native_json.py`: Uses the fact that wallabag exports JSON and hoarder supports a native JSON export/import, transforming the wallabag JSON into one that hoarder understands well. Better tested and works without a hitch; however, it does not correctly transfer any annotations made in wallabag. Annotations are added as a plain JSON object to the 'note' field in hoarder. (See the second sketch below.)
- `./convert_api.py`: WIP: Uses the public hoarder API to move the wallabag articles over, including annotations on a best-effort basis. Annotation support is a little behind the curve in hoarder -- we can only have highlights, not 'notes' attached to a highlight. (See the third sketch below.)
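For `./convert_netscape.py`, the mapping boils down to emitting the well-known Netscape bookmark skeleton; the wallabag field names (`url`, `title`, `created_at`) are assumptions about the export schema:

```python
import html
from datetime import datetime


def to_netscape(entries):
    """Render wallabag entries as a Netscape bookmark file, keeping
    only url, title and creation time (the conversion is lossy)."""
    out = [
        "<!DOCTYPE NETSCAPE-Bookmark-file-1>",
        '<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8">',
        "<TITLE>Bookmarks</TITLE>",
        "<H1>Bookmarks</H1>",
        "<DL><p>",
    ]
    for e in entries:
        # assumed wallabag export keys -- adjust to your export file
        added = int(datetime.fromisoformat(e["created_at"]).timestamp())
        out.append(
            f'    <DT><A HREF="{html.escape(e["url"])}" ADD_DATE="{added}">'
            f'{html.escape(e["title"])}</A>'
        )
    out.append("</DL><p>")
    return "\n".join(out)
```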
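For `./convert_native_json.py`, the transform is in spirit just a field-by-field remap. The hoarder-side keys below are illustrative guesses, not a verified schema -- take the authoritative shape from a JSON export of your own hoarder instance:

```python
import json


def to_hoarder_json(entries):
    """Remap wallabag entries onto a hoarder-style import document.
    Annotations survive only as an opaque JSON blob in the note field."""
    bookmarks = []
    for e in entries:
        bookmarks.append({
            # illustrative field names, not a verified hoarder schema
            "title": e["title"],
            "createdAt": e["created_at"],
            "content": {"type": "link", "url": e["url"]},
            "note": json.dumps(e.get("annotations", [])),
        })
    return {"bookmarks": bookmarks}
```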
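And for `./convert_api.py`, a sketch under the assumption of a bearer-token REST API and annotator.js-style annotations whose highlighted text sits in `quote`. The endpoint paths and payload shapes here are guesses -- check them against the hoarder API docs before relying on them:

```python
import requests

HOARDER_URL = "https://hoarder.example.com"  # hypothetical instance
HEADERS = {"Authorization": "Bearer <api-key>"}


def push_entry(entry):
    """Create a bookmark over the hoarder API, then attach each wallabag
    annotation as a bare highlight. Any note attached to a highlight in
    wallabag is dropped, since hoarder only models the highlight itself."""
    resp = requests.post(
        f"{HOARDER_URL}/api/v1/bookmarks",  # assumed endpoint
        headers=HEADERS,
        json={"type": "link", "url": entry["url"], "title": entry["title"]},
    )
    resp.raise_for_status()
    bookmark_id = resp.json()["id"]
    for annot in entry.get("annotations", []):
        requests.post(
            f"{HOARDER_URL}/api/v1/highlights",  # assumed endpoint
            headers=HEADERS,
            json={"bookmarkId": bookmark_id, "text": annot.get("quote", "")},
        ).raise_for_status()
```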