The read routine ran in full before checking whether the file has an
XML mimetype. This means it first tried to read any file into memory,
including PDFs and even binaries. Of course, doing so crashed the
program.
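A minimal sketch of checking the type before reading, not the actual
code of this commit. The names `looks_like_xml` and `read_if_xml` are
hypothetical, and the check guesses from the file name via the standard
library `mimetypes` module, which is an assumption; the real routine may
inspect the file contents instead.

```python
import mimetypes
from pathlib import Path
from typing import Optional

# Both names are in common use for XML, depending on the platform's mime table.
XML_MIMETYPES = {"text/xml", "application/xml"}


def looks_like_xml(path: Path) -> bool:
    """Guess the mimetype from the file name alone; nothing is read from disk."""
    guessed, _encoding = mimetypes.guess_type(path.name)
    return guessed in XML_MIMETYPES


def read_if_xml(path: Path) -> Optional[str]:
    """Read the file into memory only after the cheap type check has passed."""
    if not looks_like_xml(path):
        return None  # PDFs, binaries and unknown types are never opened
    return path.read_text(encoding="utf-8")
```

With this ordering a PDF or binary is rejected before any attempt to
load it into memory.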
Since the dependencies for pocketbook HTML extraction are optional and
installed as an extra, this commit ensures the extractor (and its CLI
option) only gets loaded when those dependencies are installed.
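One way such a guard could look, as a sketch only. The function name
`available_extractors`, the always-present "pdf" entry, and the module
name passed in the usage note are illustrative assumptions, not taken
from this commit.

```python
from importlib.util import find_spec


def available_extractors(optional: dict[str, str]) -> list[str]:
    """Return extractor names whose backing module can be imported.

    `optional` maps an extractor name to the top-level module it needs.
    """
    names = ["pdf"]  # assumption: the PDF extractor has no optional dependency
    for name, module in optional.items():
        # find_spec returns None for a missing top-level module, without importing it
        if find_spec(module) is not None:
            names.append(name)
    return names
```

For example, `available_extractors({"pocketbook": "bs4"})` would list
"pocketbook" only if a module named `bs4` is installed (the module name
is a guess here). The CLI could then offer only the returned names as
choices.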
Renamed the two variables describing an annotation's highlighted PDF
text and its appended note, if any exists. They were previously called
'text' (for the highlighted in-PDF content) and 'content' (for the
additionally supplied content). Now they are called 'content' for the
highlighted words in the PDF and 'note' for the note appended (or not)
to an annotation.
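To illustrate the new naming, a small sketch. Only the field names
`content` and `note` come from this commit; the `Annotation` dataclass
shape and the `render` helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    content: str = ""  # the highlighted words inside the PDF (formerly 'text')
    note: str = ""     # the note appended to the highlight, if any (formerly 'content')


def render(annotation: Annotation) -> str:
    """Show the highlight, followed by its note when one was given."""
    if annotation.note:
        return f"{annotation.content}\n  NOTE: {annotation.note}"
    return annotation.content
```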
Extractor is now a general protocol, with the PDF extraction routine
being one implementation of it. This prepares for adding multiple
extractors (epub, djvu, or specific programmes) in the future.
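A rough sketch of what a protocol-based design can look like. The
method names `can_process` and `run`, the `PdfExtractor` body, and the
`pick_extractor` helper are assumptions for illustration and may differ
from the actual interface.

```python
from pathlib import Path
from typing import Optional, Protocol


class Extractor(Protocol):
    """Anything with these two methods counts as an extractor."""

    def can_process(self, filename: Path) -> bool: ...

    def run(self, filename: Path) -> list[str]: ...


class PdfExtractor:
    """One implementation; it satisfies the protocol without inheriting from it."""

    def can_process(self, filename: Path) -> bool:
        return filename.suffix.lower() == ".pdf"

    def run(self, filename: Path) -> list[str]:
        return []  # placeholder: the real extraction logic would go here


def pick_extractor(filename: Path, extractors: list[Extractor]) -> Optional[Extractor]:
    """Return the first extractor that claims the file, or None."""
    for extractor in extractors:
        if extractor.can_process(filename):
            return extractor
    return None
```

A future epub or djvu extractor would only need to provide the same two
methods to be usable wherever an `Extractor` is expected.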