Have you ever needed to extract text from images embedded in a #PDF? I can highly recommend the open-source #CLI tool #OCRmyPDF, which is easy to automate, for example as a step in a #DataPipeline.
It uses #Tesseract #OCR under the hood and offers many options to experiment with to get the best possible accuracy for your language and PDF content.
You can get started with just a few commands:
https://samuelplumppu.se/blog/automated-text-extraction-from-pdf-images-with-ocrmypdf
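A minimal sketch of how this could look as a Python pipeline step. It assumes `ocrmypdf` and Tesseract (with the language data you need) are installed and on PATH. The helper names `build_ocrmypdf_command` and `extract_text` are hypothetical. The flags `-l`, `--sidecar` and `--skip-text` are real OCRmyPDF options, but check the docs for your version.

```python
import shutil
import subprocess
from pathlib import Path


def build_ocrmypdf_command(
    input_pdf: Path,
    output_pdf: Path,
    sidecar_txt: Path,
    language: str = "eng",
    skip_text: bool = True,
) -> list[str]:
    """Hypothetical helper: build an ocrmypdf call that writes an OCR'd PDF
    plus a plain-text sidecar file with the recognized text."""
    # -l takes Tesseract language codes, e.g. "eng", "swe" or "swe+eng".
    cmd = ["ocrmypdf", "-l", language, "--sidecar", str(sidecar_txt)]
    if skip_text:
        # Skip OCR on pages that already contain a text layer.
        cmd.append("--skip-text")
    cmd += [str(input_pdf), str(output_pdf)]
    return cmd


def extract_text(input_pdf: Path, workdir: Path, language: str = "eng") -> str:
    """Hypothetical pipeline step: run ocrmypdf and return the sidecar text."""
    if shutil.which("ocrmypdf") is None:
        raise RuntimeError("ocrmypdf was not found on PATH")
    workdir.mkdir(parents=True, exist_ok=True)
    output_pdf = workdir / f"{input_pdf.stem}.ocr.pdf"
    sidecar = workdir / f"{input_pdf.stem}.txt"
    cmd = build_ocrmypdf_command(input_pdf, output_pdf, sidecar, language)
    # check=True raises CalledProcessError if ocrmypdf exits with an error.
    subprocess.run(cmd, check=True)
    return sidecar.read_text(encoding="utf-8")
```

For example, `extract_text(Path("scan.pdf"), Path("ocr-out"), language="swe+eng")` would OCR a Swedish/English scan and return its text. Keeping the command builder separate makes it easy to try different options when tuning accuracy.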