About cmccambridge


  1. cmccambridge

    [Support] cmccambridge - ocrmypdf-auto

    Hi @Abigel, sorry to hear you're having issues. I'm away on vacation at the moment and don't have access to a computer to debug, but one thing comes to mind to try. The most recent change to the code added support for multiple languages, and perhaps we introduced a bug there that didn't surface until now. Could you try explicitly setting OCR_LANGUAGES="eng" (or your language of choice), even though it's supposed to work correctly without it? Let me know if that changes anything.
  2. cmccambridge

    [Support] cmccambridge - ocrmypdf-auto

    Excellent, thanks very much for your help!
  3. cmccambridge

    [Support] cmccambridge - ocrmypdf-auto

    Great catch @trurl, thanks! The unRAID template is already up and running here, but I had forgotten to go back and tidy up my TODO list. There is now unRAID-specific documentation in the project's README file that describes the recommended container settings for anybody who is not installing directly from the defaults in the unRAID template. https://github.com/cmccambridge/ocrmypdf-auto/blob/master/README.md#unraid-integration (Note: At the moment, I still have a few open questions for @Squid about that template in a DM, so I don't believe that it is live in CA just yet. Feel free to wait on moving this thread until the template is live.)
  4. Application: ocrmypdf-auto

    Overview: Automatic OCR of image PDFs from an input directory to an output directory using ocrmypdf and the latest tesseract.

    Docker: https://quay.io/repository/cmccambridge/ocrmypdf-auto

    Application GitHub: https://github.com/cmccambridge/ocrmypdf-auto

    This container automates one stage in a "paperless" document processing pipeline: Take all the PDFs in some input folder, run OCR on them, and save the output to an output folder. It combines the excellent tools ocrmypdf and tesseract with file-monitoring and some new configurability.

    For example, you could configure a wireless document scanner to save all images to one share on your unRAID server, and use this container to monitor all new incoming files, OCR them, and write the finished (searchable!) PDFs to another share.

    For details on how to configure the container and ocrmypdf to tweak OCR behavior, please see the README on GitHub!

    You can configure:

      • What options (per-folder) to pass to ocrmypdf
          • e.g. one folder for clean, normal page size grayscale scans from the document scanner
          • e.g. a separate folder for skewed, poor contrast receipts from a phone app
          • e.g. a separate folder for multi-language scans
      • What to do with original files after OCR
          • Archive them to a 2nd output folder?
          • Delete them?
      • Where to store temporary files
          • By default, within the container
          • Or: configure your own high-speed temporary path (cache disk, ramdisk, etc.)

    Questions? Post any other questions or issues relating to this Docker container in this thread.
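
To make the input-folder-to-output-folder flow described above more concrete, here is a minimal Python sketch of a single processing pass. It is not the container's actual code: the function names (plan_jobs, build_command, process_once), the "skip files that already have an output" rule, and the archive handling are all assumptions made for illustration. The only external piece it relies on is the ocrmypdf command line, called as `ocrmypdf [-l lang1+lang2] input.pdf output.pdf`.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Callable, List, Optional, Sequence, Tuple


def plan_jobs(input_dir: Path, output_dir: Path) -> List[Tuple[Path, Path]]:
    """Pair every PDF under input_dir with a mirrored path under output_dir.

    Hypothetical rule: a PDF whose output file already exists is skipped.
    """
    jobs = []
    for src in sorted(input_dir.rglob("*.pdf")):
        dst = output_dir / src.relative_to(input_dir)
        if not dst.exists():
            jobs.append((src, dst))
    return jobs


def build_command(src: Path, dst: Path, languages: Sequence[str] = ()) -> List[str]:
    """Build an ocrmypdf invocation; languages are joined as e.g. 'eng+deu'."""
    cmd = ["ocrmypdf"]
    if languages:
        cmd += ["-l", "+".join(languages)]
    return cmd + [str(src), str(dst)]


def process_once(
    input_dir: Path,
    output_dir: Path,
    archive_dir: Optional[Path] = None,
    languages: Sequence[str] = (),
    run: Callable = subprocess.run,
) -> int:
    """OCR each pending PDF once and return how many succeeded.

    `run` is injectable so the flow can be exercised without ocrmypdf installed.
    """
    done = 0
    for src, dst in plan_jobs(input_dir, output_dir):
        dst.parent.mkdir(parents=True, exist_ok=True)
        result = run(build_command(src, dst, languages))
        if result.returncode != 0:
            continue  # leave the original in place so it can be retried
        if archive_dir is not None:
            # Assumed archive behaviour: move the original, keeping its subfolder.
            target = archive_dir / src.relative_to(input_dir)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(src), str(target))
        done += 1
    return done
```

Calling `process_once(Path("/input"), Path("/output"), Path("/archive"), languages=["eng"])` on a timer or from a file-system watcher would approximate the monitoring behaviour; those paths are placeholders. For the container's real options, including per-folder settings, the README remains the reference.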