cmccambridge (thread author), posted March 18, 2021:
Excellent, glad to hear that it's working now.
Br0Ser, posted April 2, 2021:
Hi all, is there an option to rename the output files based on their content? Or can I run my own script after the OCR step and before the file is saved? Thanks for the great Docker container!
Chanchalanch, posted May 7, 2021:
Is it possible to drop screenshots (e.g. PNG or JPEG) into the import folder and have the container process them, or does this only work with PDFs? Thanks!
cmccambridge (thread author), posted May 8, 2021:
@Br0Ser Apologies for missing this a month ago. There's no mechanism today for renaming files based on their content or for running external code within ocrmypdf-auto, but you can use it as one stage in a "pipeline." If you've got your own script, have ocrmypdf-auto write to an intermediate directory, then have your script pick up new files from that location, rename or edit them based on their content, and write them to the final output/storage location. I do this with paperless-ng as the next stage of my pipeline.

@Chanchalanch It only supports PDFs today. There have been a few requests for supporting image files, but I don't expect to have time to implement this myself in the near future. If you're handy with Python, I'm very happy to take a pull request.
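A minimal Python sketch of the second pipeline stage described above, assuming ocrmypdf-auto's output directory is mounted somewhere the script can read. The directory layout, the polling approach and the names `move_finished_pdfs`, `example_namer` and `watch` are all hypothetical, not part of ocrmypdf-auto. A real content-based namer would first extract the PDF's text (for example with poppler's `pdftotext`), which is left out here; the example namer only prefixes the file's modification date.

```python
# Hypothetical second pipeline stage: ocrmypdf-auto writes finished PDFs to an
# intermediate directory, and this script moves them to a final location.
import shutil
import time
from pathlib import Path
from typing import Callable, List


def move_finished_pdfs(intermediate: Path, final: Path,
                       name_for: Callable[[Path], str],
                       min_age: float = 30.0) -> List[Path]:
    """Move each *.pdf from `intermediate` to `final`, renamed by `name_for`.

    Files modified less than `min_age` seconds ago are skipped, as a crude
    guard against picking up a file that is still being written.
    Returns the destination paths that were written.
    """
    final.mkdir(parents=True, exist_ok=True)
    moved = []
    for pdf in sorted(intermediate.glob("*.pdf")):
        if min_age > 0 and time.time() - pdf.stat().st_mtime < min_age:
            continue  # probably still being written; try again next pass
        target = final / name_for(pdf)
        if target.exists():
            continue  # never overwrite an existing file
        shutil.move(str(pdf), str(target))
        moved.append(target)
    return moved


def example_namer(pdf: Path) -> str:
    """Hypothetical namer: prefix the name with the file's modification date.

    A content-based namer would extract the PDF's text here instead.
    """
    stamp = time.strftime("%Y-%m-%d", time.localtime(pdf.stat().st_mtime))
    return f"{stamp}_{pdf.name}"


def watch(intermediate: Path, final: Path, interval: float = 10.0) -> None:
    """Poll forever; simple, and avoids depending on inotify."""
    while True:
        move_finished_pdfs(intermediate, final, example_namer)
        time.sleep(interval)
```

Polling plus a minimum file age is the simplest option that works across SMB/NFS mounts, where filesystem change notifications are often unreliable; swap in your own `name_for` callable for the actual renaming logic.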
Maddeen, posted September 16, 2021:
@cmccambridge Is there any option to enable automatic page rotation? Sometimes I have documents in landscape orientation and I don't want to rotate them manually.
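For reference, OCRmyPDF itself has a `--rotate-pages` option that rotates pages based on the detected text orientation. The sketch below only assembles that command line for running OCRmyPDF directly; whether ocrmypdf-auto lets you pass this flag (for example through its ocr.config file) is an assumption I haven't verified, and `rotate_argv` is a hypothetical helper name.

```python
import shlex
from typing import List


def rotate_argv(src: str, dst: str) -> List[str]:
    """Build an OCRmyPDF command line that auto-rotates pages.

    --rotate-pages asks OCRmyPDF to detect each page's text orientation
    and rotate the page accordingly in the output.
    """
    return ["ocrmypdf", "--rotate-pages", src, dst]


# Print the command for copy/paste; pass the list to subprocess.run() to execute it.
print(shlex.join(rotate_argv("scan.pdf", "scan_ocr.pdf")))
```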
Mihle, posted October 7, 2021:
It seems this compresses my PDFs more than I'd like when it runs (I'm a perfectionist). The OCRmyPDF documentation describes an "Optimization" setting, the option --optimize N (where N is a level chosen depending on what you want), but I haven't figured out how to set it in the Docker container. Does anyone know how?
Toskache, posted October 7, 2021 (replying to Mihle):
Have you tried setting the option in "ocr.config" (usually in appdata/ocrmypdf-auto)?
Mihle, posted October 10, 2021 (replying to Toskache; edited October 10, 2021):
I tried that, but it still compresses each image in the PDF to about 1.5 MB (the originals are high-resolution).
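If you run OCRmyPDF directly, a command aimed at preserving image quality might look like the sketch below. `--optimize 0` turns OCRmyPDF's image optimization off (level 1 is the default and is meant to be lossless; 2 and 3 are lossy). As an unverified assumption, the default PDF/A output step may also re-encode images, so the sketch additionally sets `--output-type pdf` to skip it. How ocrmypdf-auto reads these options from ocr.config is not shown here, and `preserve_quality_argv` is a hypothetical helper.

```python
import shlex
from typing import List


def preserve_quality_argv(src: str, dst: str, optimize: int = 0,
                          output_type: str = "pdf") -> List[str]:
    """Build an OCRmyPDF command line intended to keep image quality.

    optimize: 0 disables optimization; OCRmyPDF accepts levels 0 to 3.
    output_type: "pdf" skips PDF/A conversion (assumption: that step may
    re-encode images); "pdfa" is OCRmyPDF's default.
    """
    if optimize not in (0, 1, 2, 3):
        raise ValueError("--optimize accepts 0, 1, 2 or 3")
    return ["ocrmypdf", "--optimize", str(optimize),
            "--output-type", output_type, src, dst]


print(shlex.join(preserve_quality_argv("scan.pdf", "scan_ocr.pdf")))
```

Comparing output sizes with and without `--output-type pdf` on the same scan should show which of the two steps is doing the compression.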
artwodeetwo, posted November 3, 2021:
What does "status 2" mean? I'm trying to integrate ocrmypdf into my Nextcloud by presenting the input and output directories as external storage (SMB) shares within the Nextcloud directory structure. file1.pdf and file2.pdf are two copies of the same file: the first was placed through Nextcloud and the second was copied directly to the SMB share. The second one produces an output but the first one does not. The SMB user is the same for both files.

    2021-11-02 09:43:13 - Processing: /input/file1.pdf -> /output/file1.pdf
    2021-11-02 09:43:16 - Processing complete in 3.750000 seconds with status 2: /input/file1.pdf
    2021-11-02 09:46:18 - Processing: /input/file2.pdf -> /output/file2.pdf
    2021-11-02 09:46:33 - Processing complete in 14.970000 seconds with status 0: /input/file2.pdf
artwodeetwo, posted November 5, 2021 (following up on the question above):
Answering my own question: status 2 means the input is not a PDF. I found out that Nextcloud (NC) was encrypting files placed into external storage, so although they still had a .pdf extension, they were no longer readable outside my Nextcloud instance. I found the setting for external storage encryption and disabled it. Working well now.
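Assuming the "status" in ocrmypdf-auto's log is OCRmyPDF's own exit code (which fits the finding that 2 means the input wasn't a usable PDF), the codes can be decoded with a lookup table. The table below reflects OCRmyPDF's `ExitCode` enum as I remember it; treat it as a sketch and check it against the documentation for your OCRmyPDF version. With OCRmyPDF installed, `ocrmypdf.ExitCode(2).name` should give the same answer. `explain_status` is a hypothetical helper name.

```python
# OCRmyPDF exit codes (from its ExitCode enum, to the best of my knowledge;
# verify against the docs for the version inside your container).
OCRMYPDF_EXIT_CODES = {
    0: "ok",
    1: "bad_args",
    2: "input_file",  # input missing, unreadable, or not a valid PDF
    3: "missing_dependency",
    4: "invalid_output_pdf",
    5: "file_access_error",
    6: "already_done_ocr",
    7: "child_process_error",
    8: "encrypted_pdf",
    9: "invalid_config",
    10: "pdfa_conversion_failed",
    15: "other_error",
    130: "ctrl_c",
}


def explain_status(status: int) -> str:
    """Map a logged status number to its OCRmyPDF exit-code name."""
    return OCRMYPDF_EXIT_CODES.get(status, f"unknown ({status})")
```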
Maddeen, posted December 1, 2021 (replying to artwodeetwo; edited December 1, 2021):
I'm facing similar problems, @artwodeetwo. I uploaded about 10-12 scans from my HP scanner to my configured input folder, but only 2 of the files were processed correctly and moved to the output directory. All the other files stay in my input folder. For context, this scan-to-OCR workflow worked for months (nearly a year) without any problems, and I didn't change anything. I get the following output in my logs. Did you find any documentation explaining what status -11 means? Thanks for any help.

    2021-11-29 15:54:21 - Watching /input
    2021-11-29 15:54:21 - Processing: /input/164211scan11282021.pdf -> /output/164211scan11282021.pdf
    2021-11-29 15:54:21 - Processing: /input/164249scan11282021.pdf -> /output/164249scan11282021.pdf
    2021-11-29 15:54:21 - Processing: /input/164322scan11282021.pdf -> /output/164322scan11282021.pdf
    2021-11-29 15:54:28 - Processing complete in 7.730000 seconds with status -11: /input/164211scan11282021.pdf
    2021-11-29 15:54:28 - Processing: /input/164352scan11282021.pdf -> /output/164352scan11282021.pdf
    2021-11-29 15:54:29 - Processing complete in 8.120000 seconds with status -11: /input/164322scan11282021.pdf
    2021-11-29 15:54:29 - Processing: /input/164531scan11282021.pdf -> /output/164531scan11282021.pdf
    2021-11-29 15:54:30 - Processing complete in 9.170000 seconds with status -11: /input/164249scan11282021.pdf
    2021-11-29 15:54:30 - Processing: /input/164622scan11282021.pdf -> /output/164622scan11282021.pdf
    2021-11-29 15:54:36 - Processing complete in 7.680000 seconds with status -11: /input/164531scan11282021.pdf
    2021-11-29 15:54:36 - Processing: /input/164836scan11282021.pdf -> /output/164836scan11282021.pdf
    2021-11-29 15:54:41 - Processing complete in 4.700000 seconds with status -11: /input/164836scan11282021.pdf
    2021-11-29 15:54:41 - Processing: /input/165021scan11282021.pdf -> /output/165021scan11282021.pdf
    2021-11-29 15:54:48 - Processing complete in 18.530000 seconds with status -11: /input/164622scan11282021.pdf
    2021-11-29 15:54:49 - Processing complete in 8.380000 seconds with status -11: /input/165021scan11282021.pdf
    2021-11-29 15:55:01 - Processing complete in 32.790000 seconds with status -11: /input/164352scan11282021.pdf
    2021-11-29 15:55:19 - Processing: /input/164211scan11282021.pdf -> /output/164211scan11282021.pdf
    2021-11-29 15:55:20 - Processing: /input/164249scan11282021.pdf -> /output/164249scan11282021.pdf
    2021-11-29 15:55:21 - Processing: /input/164322scan11282021.pdf -> /output/164322scan11282021.pdf
    2021-11-29 15:55:26 - Processing complete in 7.020000 seconds with status -11: /input/164211scan11282021.pdf
    2021-11-29 15:55:26 - Processing: /input/164352scan11282021.pdf -> /output/164352scan11282021.pdf
    2021-11-29 15:55:29 - Processing complete in 7.800000 seconds with status -11: /input/164322scan11282021.pdf
    2021-11-29 15:55:29 - Processing: /input/164531scan11282021.pdf -> /output/164531scan11282021.pdf
    2021-11-29 15:55:30 - Processing complete in 9.280000 seconds with status -11: /input/164249scan11282021.pdf
    2021-11-29 15:55:30 - Processing: /input/164622scan11282021.pdf -> /output/164622scan11282021.pdf
    2021-11-29 15:55:36 - Processing complete in 7.390000 seconds with status -11: /input/164531scan11282021.pdf
    2021-11-29 15:55:36 - Processing: /input/164836scan11282021.pdf -> /output/164836scan11282021.pdf
    2021-11-29 15:55:42 - Processing complete in 5.400000 seconds with status -11: /input/164836scan11282021.pdf
    2021-11-29 15:55:42 - Processing: /input/165021scan11282021.pdf -> /output/165021scan11282021.pdf
    2021-11-29 15:55:49 - Processing complete in 19.000000 seconds with status -11: /input/164622scan11282021.pdf
    2021-11-29 15:55:50 - Processing complete in 8.620000 seconds with status -11: /input/165021scan11282021.pdf
    2021-11-29 15:55:58 - Processing complete in 31.450000 seconds with status -11: /input/164352scan11282021.pdf
    2021-11-29 15:56:34 - Signal 15 (SIGTERM) Received. Shutting down...
    2021-11-29 15:56:40 - Watching /input
    2021-11-29 15:56:40 - Processing: /input/164211scan11282021.pdf -> /output/164211scan11282021.pdf
    2021-11-29 15:56:40 - Processing: /input/164249scan11282021.pdf -> /output/164249scan11282021.pdf
    2021-11-29 15:56:40 - Processing: /input/164322scan11282021.pdf -> /output/164322scan11282021.pdf
    2021-11-29 15:56:48 - Processing complete in 7.270000 seconds with status 7: /input/164211scan11282021.pdf
    2021-11-29 15:56:48 - Processing: /input/164352scan11282021.pdf -> /output/164352scan11282021.pdf
    2021-11-29 15:56:48 - Processing complete in 8.190000 seconds with status -11: /input/164322scan11282021.pdf
    2021-11-29 15:56:48 - Processing: /input/164531scan11282021.pdf -> /output/164531scan11282021.pdf
    2021-11-29 15:56:50 - Processing complete in 9.810000 seconds with status -11: /input/164249scan11282021.pdf
    2021-11-29 15:56:50 - Processing: /input/164622scan11282021.pdf -> /output/164622scan11282021.pdf
    2021-11-29 15:56:56 - Processing complete in 7.190000 seconds with status -11: /input/164531scan11282021.pdf
    2021-11-29 15:56:56 - Processing: /input/164836scan11282021.pdf -> /output/164836scan11282021.pdf
    2021-11-29 15:57:01 - Processing complete in 5.110000 seconds with status -11: /input/164836scan11282021.pdf
    2021-11-29 15:57:01 - Processing: /input/165021scan11282021.pdf -> /output/165021scan11282021.pdf
    2021-11-29 15:57:08 - Processing complete in 17.850000 seconds with status -11: /input/164622scan11282021.pdf
    2021-11-29 15:57:10 - Processing complete in 8.970000 seconds with status -11: /input/165021scan11282021.pdf
    2021-11-29 15:57:20 - Processing complete in 32.400000 seconds with status -11: /input/164352scan11282021.pdf
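A negative status is most likely not an OCRmyPDF exit code at all. Assuming ocrmypdf-auto logs the return code of a Python `subprocess` call (an assumption, though the SIGTERM line in the log suggests signals are in play), Python reports a child killed by signal N as return code -N on POSIX systems. Under that reading, -11 would mean the OCR process crashed with SIGSEGV (signal 11), while the single positive 7 in the last run is an ordinary exit code. A small decoder, with the hypothetical name `describe_status`:

```python
import signal


def describe_status(status: int) -> str:
    """Describe a child-process return code as reported by Python's subprocess.

    On POSIX, a negative return code -N means the child was killed by signal N.
    """
    if status < 0:
        try:
            name = signal.Signals(-status).name
        except ValueError:
            name = f"signal {-status}"  # number not known to this platform
        return f"killed by {name}"
    return f"exited with code {status}"


for s in (-11, 7, 0):
    print(s, describe_status(s))
```

A segfault that appears suddenly without configuration changes often points at the environment (a container image update, memory, or a particular input) rather than at the settings, but that is a general hint, not a diagnosis of this log.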
artwodeetwo, posted December 2, 2021:
I didn't find any documentation on what each status code means, but I was able to find the root cause of my issue by running the ocrmypdf-auto container in debug mode: set the environment variable OCR_VERBOSITY to "debug".
ANAZHTHTHS, posted May 20, 2023:
Hello, I'm new here. I've used the app with English and Greek PDFs and it works fine, although for Greek the results are not as good as other solutions. My question: when I put a very large file (e.g. 600 MB) in the input folder, it takes an hour to convert and uses less than 10% of the CPU. Can we make it use more CPU resources to speed up the process? Thank you.
PS: I have an 8-core/16-thread CPU, and the other apps and Unraid itself use about 1%.
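OCRmyPDF has a `--jobs N` option that sets how many worker processes it may use; by default it uses all available CPUs and spreads the work across pages. The sketch below pins the job count explicitly when running OCRmyPDF directly. Whether ocrmypdf-auto exposes this setting, or whether something else (such as a document with few but very large pages) is the real bottleneck, is not something I can confirm; `parallel_argv` is a hypothetical helper.

```python
import os
from typing import List


def parallel_argv(src: str, dst: str, jobs: int = 0) -> List[str]:
    """Build an OCRmyPDF command line with an explicit worker count.

    jobs <= 0 means one worker per logical CPU, as reported by os.cpu_count().
    """
    if jobs <= 0:
        jobs = os.cpu_count() or 1  # cpu_count() can return None
    return ["ocrmypdf", "--jobs", str(jobs), src, dst]


print(parallel_argv("big.pdf", "big_ocr.pdf", jobs=16))
```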