Hey, #AskFedi... I'm trying to convert PDFs to ODTs (on Debian) with some semblance of formatting retained. So far, I've tried LibreOffice Draw, pdf2odt, pdftotext, and Calibre. All of them either give me images or garbled text. Has anyone got this working?
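One route not in that list: LibreOffice can be told to open a PDF with its Writer PDF import filter (`writer_pdf_import`) instead of Draw, which is what it uses by default. Below is a minimal Python sketch that builds and runs that headless command. It assumes `soffice` is on the PATH, and the helper names are hypothetical. Even with this filter, the resulting ODT may still place text in positioned frames rather than flowing paragraphs.

```python
from __future__ import annotations

import shutil
import subprocess
from pathlib import Path


def soffice_pdf_to_odt_cmd(pdf: str, outdir: str = ".") -> list[str]:
    """Build a headless LibreOffice command that imports a PDF with the
    Writer PDF filter (instead of Draw) and saves it as ODT."""
    return [
        "soffice",
        "--headless",                    # no GUI window
        "--infilter=writer_pdf_import",  # open in Writer, not Draw
        "--convert-to", "odt",
        "--outdir", outdir,
        pdf,
    ]


def convert_pdf_to_odt(pdf: str, outdir: str = ".") -> Path:
    """Run the conversion and return the expected output path."""
    if shutil.which("soffice") is None:
        raise RuntimeError("LibreOffice (soffice) is not on PATH")
    subprocess.run(soffice_pdf_to_odt_cmd(pdf, outdir), check=True)
    # LibreOffice names the output after the input file's stem.
    return Path(outdir) / (Path(pdf).stem + ".odt")
```

For example, `convert_pdf_to_odt("report.pdf", "out")` would be expected to write `out/report.odt`.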
Replies:
@Steve Are the original PDFs actual text PDFs, or just scanned documents? In the latter case, you'll need an OCR tool.
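A quick way to tell the two cases apart, assuming poppler-utils is installed: `pdffonts` lists the fonts a PDF uses, and a pure image scan typically lists none. The sketch below wraps that check in Python (the helper names are hypothetical). It is only a heuristic: a scan that already carries an OCR text layer will report fonts as well.

```python
import subprocess


def has_fonts(pdffonts_output: str) -> bool:
    """Return True if `pdffonts` output lists at least one font.

    pdffonts prints a header line, a line of dashes, then one line
    per font, so anything non-blank after the dashes is a font.
    """
    lines = pdffonts_output.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("---"):
            return any(rest.strip() for rest in lines[i + 1:])
    return False  # no table at all: treat as no fonts


def looks_like_text_pdf(path: str) -> bool:
    """Heuristic: a PDF that uses fonts probably contains real text."""
    out = subprocess.run(
        ["pdffonts", path], capture_output=True, text=True, check=True
    ).stdout
    return has_fonts(out)
```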
@Steve IME there is no clean path for this. Only the most recent iterations of the PDF standard even support the concept of "this text in order" rather than "paint a string here".
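To illustrate the "paint a string here" point: what you can pull out of a PDF is typically a set of text fragments with page coordinates and no guaranteed order, so any reading order has to be reconstructed. Here is a minimal, hypothetical heuristic in Python. It assumes you already have (x, y, text) fragments, groups them into lines by baseline, and sorts each line left to right. It only handles a single column. On a two-column page it would interleave the columns, which is exactly the kind of guesswork described above.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Fragment:
    x: float    # left edge, in PDF user-space units
    y: float    # baseline; PDF y grows upward from the page bottom
    text: str


def reading_order(frags: list[Fragment], line_tol: float = 2.0) -> list[str]:
    """Group fragments into lines by baseline, then join each line left to right."""
    # Top of page first: a larger y means higher on the page.
    ordered = sorted(frags, key=lambda f: (-f.y, f.x))
    lines: list[list[Fragment]] = []
    for f in ordered:
        # Same line if the baseline is within line_tol of the line's first fragment.
        if lines and abs(lines[-1][0].y - f.y) <= line_tol:
            lines[-1].append(f)
        else:
            lines.append([f])
    # Re-sort by x, since small baseline jitter can disturb left-to-right order.
    return [" ".join(f.text for f in sorted(line, key=lambda f: f.x)) for line in lines]
```

For fragments painted out of order, such as `Fragment(300, 700, "world")`, `Fragment(72, 680, "Second line")`, and `Fragment(72, 700.5, "Hello")`, this returns `["Hello world", "Second line"]`.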
@RogerBW MS Word does retain some formatting when converting PDFs, but I suspect it's reconstructing that formatting rather than preserving it.
It doesn't help matters that I'm working on scans. I can OCR them, but the resulting text is broken up in truly bizarre ways.
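One possible approach for the OCR step, assuming Tesseract 4 or later: render the pages (e.g. `pdftoppm -r 300 -png scan.pdf page`), then run `tesseract <image> out tsv` on each image. The TSV output gives every word its page, block, paragraph, and line number. The sketch below regroups the words into paragraphs so that hard-wrapped lines reflow; the function name is hypothetical. It can only be as good as Tesseract's own block detection, and trying a different `--psm` page segmentation mode may help there. Hyphenated line breaks are not rejoined.

```python
from __future__ import annotations

import csv
import io


def paragraphs_from_tsv(tsv: str) -> list[str]:
    """Rebuild paragraphs from Tesseract TSV output.

    Words sharing (page_num, block_num, par_num) form one paragraph;
    line breaks inside a paragraph become spaces so the text reflows.
    """
    paras: dict[tuple[str, str, str], list[str]] = {}
    reader = csv.DictReader(io.StringIO(tsv), delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Level 5 rows are individual words; levels 1-4 are page/block/para/line boxes.
        if row["level"] != "5" or not (row["text"] or "").strip():
            continue
        key = (row["page_num"], row["block_num"], row["par_num"])
        paras.setdefault(key, []).append(row["text"])
    # Dicts keep insertion order, i.e. the order Tesseract emitted the words.
    return [" ".join(words) for words in paras.values()]
```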