cmccambridge

Members
  • Posts

    17
  • Joined

  • Last visited

Recent Profile Visitors

The recent visitors block is disabled and is not being shown to other users.

cmccambridge's Achievements

Newbie

Newbie (1/14)

6

Reputation

  1. @Br0Ser Apologies for missing this a month ago. There's no mechanism for renaming based on content or integrating external code within ocrmypdf-auto today, but you can use it as one stage in a "pipeline." If you've got your own script, you can have ocrmypdf-auto write to an intermediate directory, then have your script pick up new files from the intermediate location, rename or edit them by content, and write to the final output/storage location. I do this with paperless-ng as the next stage of my pipeline. @Chanchalanch Only supports PDFs today... There have been a few requests for supporting image files, but I don't expect to have the time to implement this myself in the near future. If you're handy with Python, I'm very happy to take a pull request
  2. @Chad Kunsman Can you try the steps from this post, and see whether you are impacted by the same issue? If that's not it, i.e. if the repository is already correct, please let me know: I'm not on the 6.9 series yet as I haven't hit an opportunity where I could do the upgrade and have time to sort out what breaks... 🤞 Hoping my own container isn't one of the breaking changes.
  3. @Calimero It is, yes! I use my scanner in similar fashion to separate gray versus color documents. Two ideas to try: If there's a shared parent directory that includes all of your scanner profiles, you can just use that shared parent as /input or /output (I run my scanner this way) If there is no shared parent in the "real world" (meaning the real paths that your profiles point to), then you can create one within the container by mapping each input (and each output) directory to a sub-directory within the container, like this: For each input folder, add an extra "Path" mapping like: /mnt/user/one_share/path to /input/one /mnt/user/another_share/path to /input/two Likewise for each output folder (if you want the outputs separated), you can add an extra "Path" mapping like: /mnt/user/out_share/path to /output/one /mnt/user/other_out_share/path to /output/two The container will match your /input folder hierarchy into the /output location as the container sees it, as long as you are using the default OCR_OUTPUT_MODE = MIRROR_TREE setting... so /input/one/doc.pdf --> /output/one/doc.pdf, or /input/two/foo.pdf --> /output/two/foo.pdf. You just need to map your shares into the container so that all the inputs end up under /input, and the corresponding /output locations map back to where you would like them to land. Good luck! Let me know how it goes
  4. Hello all, BREAKING CHANGE ANNOUNCEMENT! If you have never customized your Mosquitto container, you MUST take action after the next container update! Starting with Mosquitto 2.0.0, the Eclipse project requires explicit configuration to enable listening on non-loopback network interfaces and to allow anonymous (unauthenticated) connections to the MQTT server. This change was made by the Eclipse project to harden their security posture, and is documented in the 2.0.0 release announcement and the official Eclipse migration guide. The unRAID container is taking this breaking change as well, though I'm trying to make it as painless as possible... Three options for you to consider, in order of increasing difficulty: Set the RUN_INSECURE_MQTT_SERVER variable to return to the old default/insecure 1.x settings: A MQTT server on port 1883 that allows anonymous connections. If you have never customized Mosquitto configuration files, this will be your easiest option! In Docker Manager, click "Check for Updates" and wait for it to complete. Edit your Mosquitto-unraid container configuration You should see a new variable field, "Run Insecure MQTT Server". Change the value from 0 to 1. Save your changes. If available, run the Mosquitto container update. Edit the mosquitto-unraid-default.conf file in the configuration directory. If you would like more hands-on customization of this new security posture, give this a try. Run the Mosquitto container update. You will find a generated file mosquitto-unraid-default.conf in your "Config" path, e.g. /mnt/user/appdata/mosquitto/mosquitto-unraid-default.conf Read the file, select an option to enable, and save your changes. Restart the Mosquitto container. Configure the Mosquitto container to your liking (or update your existing configuration), including at least one listener. For advanced users. Refer to the mosquitto-unraid documentation or the official Mosquitto configuration docs. Restart the Mosquitto container after making your changes. Let me know if you have any questions, and apologies for the disruption! I debated a long time about whether I could somehow work around this breaking change, and in the end concluded that I cannot... without knowing how you may have customized Mosquitto, any automatic change I might attempt could be even more disruptive than this change, so I have erred on the side of caution in propagating the breaking change. Hopefully the pre-generated configuration options in mosquitto-unraid-default.conf or the override RUN_INSECURE_MQTT_SERVER option will provide you a quick way to resume previous functionality!
  5. Great to hear @sunpower, @nik82! Thanks for letting me know that it's working for you - now I can be confident that I correctly understood the problem I will file an issue in my github backlog here to try and provide an explicit warning (or maybe even an error) from the container logs if it detects permissions that are unlikely to work properly... Since the /output directory is mapped to another location (e.g. appdata or a share), the permissions are set and controlled from outside ocrmypdf-auto... but I can try and at least observe them and provide a recommendation for folks to go use the "Docker Safe New Perms" tool. Nice detective work! - @cmccambridge
  6. Hi @Toskache, @nik82, @sunpower, Apologies for being absent this past week, and thanks for your patience! Thank you for letting me know about this issue. From your description and screenshots, I believe I understand the problem, and have deployed a fix to resolve the "Not Available" message under the Version column on the Docker tab. However, since the bug is related to checking for and applying updates... we may need to be a little more hands-on to repair your individual Docker containers. Details of the bug at the bottom, but first, let's get your Docker containers fixed.... this process will be one-time only to get past this update-related issue. From your Docker tab, find the ocrmypdf-auto docker container that you have configured. It will be shown as in @nik82's screenshot, with the "Not Available" status in the Version column. Click on your ocrmypdf-auto docker container and select Edit From the editor, find the field named Repository and edit the value to be exactly: cmccambridge/ocrmypdf-auto Please note that you will likely be deleting the text "quay.io/" from the beginning of this field. At the bottom of the page, click Apply. Unraid will restart your container and update it to point to the Docker Hub repository instead of quay.io. This will also update the image if your local container was not previously up to date. After returning to the main Unraid Docker tab, click the "Check for Updates" button at the bottom of the page. This process will take a few moments. You should now see proper version information for ocrmypdf-auto: That's it! Let me know if you continue to experience any issues. More technical details, for the curious: The container itself was not experiencing any problems. You should have noticed that it was still shown as ▶️started and was operating correctly. The trouble lies in Unraid's docker manager page, which is designed for the standard/first-party container repository Docker Hub. Although Unraid's docker manager does support installing and running a container from a third-party container repository like quay.io, it does not properly handle version and update checks for such a third-party repository. I learned about this rather too late, and had already deployed a few Community Applications using quay.io 🤦‍♂️ ocrmypdf-auto has not had any feature changes since I learned of the version reporting issue, so I had never gone through the migration process to Docker Hub, and Unraid's docker page simply reported "Update available" in many cases when no update was actually needed. For better or worse, the latest Unraid version is improved to be aware of the failure to query version information from third party repositories, and shows this failure as the "Not Available" message we've all been seeing on ocrmypdf-auto lately. The Fix: I've migrated the ocrmypdf-auto repository from quay.io to be mirrored at Docker Hub I've updated my Community App template for ocrmypdf-auto to refer to the Docker Hub version of the container, so all future users will never experience this issue. I've documented the manual remediation steps for you all, as I'm not confident that the update to the Community App template will necessarily be automatically applied for you, given that the bug is related to checking for updates, and I don't want anyone stuck with a broken container! Thanks again for reporting this issue, - @cmccambridge
  7. @sunpower - It sounds like you got it sorted out, but let me know if you are still having any issues!
  8. Ugh, great catch. Thanks for letting me know @JasonJoel! This bug has been there since I migrated from quay.io in April 2019 and I never noticed, because I made the same edit locally that you've just done... Good news is your fix is exactly correct. I've committed the same to the template repository, so other new installs and folks accidentally pinned at 1.5.8 should receive the update soon as well. Fixed with this PR.
  9. @jonp or @trurl could you move this over to the Docker Containers support thread section? @Squid Anything extra I need to do on the CA side for a second template in my existing repository? Thanks all for all of your help!
  10. Application: mosquitto-unraid Overview: Container for eclipse-mosquitto with Unraid ease-of-use tweaks Docker: https://quay.io/repository/cmccambridge/mosquitto-unraid Application GitHub: https://github.com/cmccambridge/mosquitto-unraid This container is a minimal port of the official Eclipse Mosquitto Docker container with minor tweaks to work more conveniently in Unraid. Breaking Change Announcement (2021-01-20): Starting with Version 2.0.0, this container requires manual configuration to run! See more details below: For details on how to configure the container, please see the README on GitHub! You can configure: Persistent Data Logging Authentication TLS Websockets Questions? Post any other questions or issues relating to this Docker container on Unraid in this thread, or by opening an issue on GitHub.
  11. Hi @Abigel - couple of thoughts... First and most important: the tesseract OCR engine that is used by ocrmypdf-auto really isn't optimized for handwriting. It's designed for typeset / printed text which has properties that make it "easier to read" like consistent letter shapes, letter spacing, word spacing, line breaks, etc. You can read all the gory details on the tesseract homepage, or explore some of the academic research efforts to extend the engine to handwriting, but the short version to my understanding is: handwriting recognition is a lot harder than recognizing typeset text. That said... here would be my best tips Sorry that I don't have any solution to this problem... Your example image appears to be a cellphone photo of a page of text. This should work, but you will probably get better results the closer your image looks to a black-and-white piece of paper. Tesseract has tips on improving recognition by improving image quality. For example, in your files the handwriting is blue on a tan background... this is clear to a human, but not as obvious to a computer. It will be easier for the computer to understand if all text is black on white. If you have access to a scanner, I would try that instead of a phone camera, since the scanner will remove some of the artificial "room coloring" that a phone camera sees. Or, convert your phone image to black and white and increase the contrast before trying to run OCR on it, so that the text versus background are very clear for the computer. Since the OCR engine does try to recognize full words, not just individual letters, it's important to tell it what language(s) to expect. This is what the OCR_LANGUAGES variable is for. In your case, since you're writing in German, I would try setting OCR_LANGUAGES="deu" to install the German language data, rather than the default of English. And as a side tip... the best program I've ever seen for recognizing actual handwriting is a somewhat unexpected one: Microsoft OneNote. This may not be helpful to you at all unless you have a Windows computer, but it could be worth a try :). I am not sure whether it will do as good a job recognizing handwriting in a photo as it does recognizing direct pen input on a tablet, though... I did a quick experiment with some of the tips above, and got slightly better results... enough that it might be worth it for you to keep experimenting? Up to you Converted your image to black and white. Increased the contrast until the handwriting was very black and the background was very white. Ran ocrmypdf-auto with OCR_LANGUAGES=deu The result was partial recognition: Das Haus ı5+ gem. Best of luck! input_black_white.pdf output_black_white.pdf
  12. I'm glad that it's working now for you, @Abigel - I believe I know what the problem was there, and will get an update posted so that other folks don't run into the same problem down the road. Thanks for reporting this! Re: handwriting recognition... This isn't really the intended purpose for tesseract, the OCR program that ocrmypdf-auto is using internally. I have limited success with recognizing block letter handwriting, such as the attached example... you can see that it mostly recognized the block letters (mistook "IS" for "1S"), did similarly OK on mixed upper and lowercase printed letters (mistook "Hello" for "Yello" and got some capitalization wrong), and did poorly on cursive lettering. If you want to research this further, here's a link I had found regarding academic research into customizing tesseract for handwriting recognition... it sounds like the accuracy is not very good: https://stackoverflow.com/questions/39556443/using-tesseract-for-handwriting-recognition Note: If it wasn't clear from the documentation or your experience with ocrmypdf-auto, there's one thing I should clarify: The program intentionally does not change the input image of the PDF itself, other than some minor quality enhancements like deskewing, etc. Instead, the program only adds an extra invisible "text layer" to the output PDF that lets you search for and highlight recognized text. For example, if you highlight all the handwriting in the output sample here, you can copy and paste the following "recognized" text: HELLO, THIS 1S AN OCR TEST. Yello, this is an OcR test. Alle, thin ko am OCR et. OCR Test Input.pdf OCR Test Output.pdf
  13. Hi @Abigel, sorry to hear you're having issues... I'm away on vacation at the moment and so don't have access to a computer to debug, but one thing comes to mind to try. The most recent change made to the code was regarding support for multiple languages. Perhaps we introduced a bug there that didn't surface until now. Could you try explicitly setting OCR_LANGUAGES="enu" (or your language of choice) even though it's supposed to work correctly without? Let me know if that changes anything...