Got mad at Amazon for breaking the ability to decrypt my library books via Kindle for PC, so I resurrected an old technique:
- screen-record Kindle for PC
- hit the space bar as fast as humanly possible to page through the entire book in like a minute
- use ffmpeg to remove duplicate frames from the resultant video
- use ffmpeg to dump out each frame/page to a separate file named after the page number
- use tesseract to do OCR on each page and dump that output into a single text file
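The ffmpeg and tesseract steps could be scripted roughly like this. It is a minimal Python sketch, not the exact commands used here. It assumes ffmpeg's `mpdecimate` filter handles the duplicate removal and that tesseract is run with `stdout` as its output base. The function names and file names (`run_pipeline`, `deduped.mp4`, `page_0001.png`, ...) are hypothetical. Frame numbers only match page numbers if the recording starts on page 1 and exactly one frame survives per page, and `mpdecimate`'s default thresholds may need tuning.

```python
import subprocess
import sys
from pathlib import Path


def dedupe_cmd(recording: str, deduped: str) -> list[str]:
    """ffmpeg: drop near-duplicate frames, then renumber timestamps."""
    return [
        "ffmpeg", "-y", "-i", recording,
        # mpdecimate drops frames that barely differ from the previous frame;
        # setpts makes the surviving frames play back-to-back.
        "-vf", "mpdecimate,setpts=N/FRAME_RATE/TB",
        "-an",  # any audio track in the recording is not needed
        deduped,
    ]


def dump_frames_cmd(deduped: str, out_dir: str) -> list[str]:
    """ffmpeg: write each remaining frame as page_0001.png, page_0002.png, ..."""
    return ["ffmpeg", "-y", "-i", deduped, str(Path(out_dir) / "page_%04d.png")]


def ocr_cmd(image: str) -> list[str]:
    """tesseract: OCR one image and print the recognized text to stdout."""
    return ["tesseract", image, "stdout"]


def run_pipeline(recording: str, work_dir: str, text_out: str) -> None:
    """Dedupe the recording, dump one PNG per page, OCR them into one text file."""
    work = Path(work_dir)
    work.mkdir(parents=True, exist_ok=True)
    deduped = str(work / "deduped.mp4")
    subprocess.run(dedupe_cmd(recording, deduped), check=True)
    subprocess.run(dump_frames_cmd(deduped, str(work)), check=True)
    with open(text_out, "w", encoding="utf-8") as out:
        # Zero-padded file names sort in page order.
        for page in sorted(work.glob("page_*.png")):
            result = subprocess.run(
                ocr_cmd(str(page)), check=True, capture_output=True, text=True
            )
            out.write(result.stdout)


if __name__ == "__main__" and len(sys.argv) == 4:
    # Hypothetical usage: python book_ocr.py recording.mkv work/ book.txt
    run_pipeline(sys.argv[1], sys.argv[2], sys.argv[3])
```

Keeping the command builders separate from `run_pipeline` makes it easy to tweak the ffmpeg filter settings without touching the OCR loop.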
This works ridiculously well.
(Simply holding down the space bar slows page rendering, so some captures show partially rendered pages, which defeats the duplicate detection. I'll probably improve this by firing my own space bar press events one at a time.)
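One way to send the presses one at a time is a small loop that presses the key and then waits. This is a minimal sketch that assumes the third-party `pyautogui` package for synthetic key presses; any key-injection tool would do. The name `page_through`, the page count, and the delay are hypothetical. The delay would need tuning until every page finishes rendering before the next press.

```python
import sys
import time
from typing import Callable


def page_through(
    pages: int,
    delay_s: float,
    press: Callable[[], None],
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Send one page-turn key press per page, pausing after each so it can render."""
    for _ in range(pages):
        press()
        sleep(delay_s)
    return pages


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python page_turner.py <pages> <delay_seconds>
    import pyautogui  # third-party (pip install pyautogui); assumed available

    time.sleep(5)  # a few seconds to click into the Kindle for PC window
    page_through(int(sys.argv[1]), float(sys.argv[2]), lambda: pyautogui.press("space"))
```

Passing `press` and `sleep` in as functions keeps the loop testable without a real keyboard. `pyautogui.press` also accepts `presses` and `interval` arguments, which would collapse the loop into a single call.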