[Support] cmccambridge - ocrmypdf-auto


@Br0Ser Apologies for missing this a month ago. There's no mechanism for renaming based on content or integrating external code within ocrmypdf-auto today, but you can use it as one stage in a "pipeline." If you've got your own script, you can have ocrmypdf-auto write to an intermediate directory, then have your script pick up new files from the intermediate location, rename or edit them by content, and write to the final output/storage location. I do this with paperless-ng as the next stage of my pipeline.
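A minimal sketch of such a second pipeline stage, in Python. The function names, the directory layout, and the `name_by_size` rule are all hypothetical placeholders; nothing here is part of ocrmypdf-auto, and a real naming rule would inspect the OCR text rather than the file size.

```python
import shutil
from pathlib import Path
from typing import Callable


def forward_new_pdfs(intermediate: Path, final: Path,
                     choose_name: Callable[[Path], str]) -> list[Path]:
    """Move each PDF from `intermediate` to `final`, renamed by `choose_name`."""
    final.mkdir(parents=True, exist_ok=True)
    moved = []
    for pdf in sorted(intermediate.glob("*.pdf")):
        target = final / choose_name(pdf)
        if target.exists():
            # Never overwrite an existing document; leave it for a later run.
            continue
        shutil.move(str(pdf), str(target))
        moved.append(target)
    return moved


def name_by_size(pdf: Path) -> str:
    """Placeholder rule (hypothetical): a real one would read the OCR text."""
    return f"{pdf.stem}-{pdf.stat().st_size}b.pdf"
```

You would point ocrmypdf-auto's output at the intermediate directory and run something like `forward_new_pdfs(Path("/intermediate"), Path("/final"), name_by_size)` on a schedule or from a file watcher.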

 

@Chanchalanch Only supports PDFs today... There have been a few requests for supporting image files, but I don't expect to have the time to implement this myself in the near future. If you're handy with Python, I'm very happy to take a pull request :)


It seems this compresses my PDFs more than I want when it runs (I'm a perfectionist).

 

The OCRmyPDF documentation describes an "Optimization" setting, controlled by the option

--optimize N

(where N is a number that depends on how much optimization you want).

But I haven't figured out how to set it in the Docker container. Does anyone know how?

Have you tried setting the option in "ocr.config" (usually in appdata/ocrmypdf-auto)?
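For reference, the OCRmyPDF command line accepts `--optimize` with levels 0 to 3: 0 disables optimization, 1 is the lossless default, and 2 and 3 allow increasingly lossy compression. The sketch below only builds the argument list for a direct `ocrmypdf` call. The helper name is hypothetical, and how ocrmypdf-auto parses "ocr.config" is not shown in this thread, so check the container's README for the exact file format.

```python
def ocrmypdf_command(input_pdf: str, output_pdf: str, optimize: int) -> list[str]:
    """Build an ocrmypdf argument list with an explicit optimization level."""
    if optimize not in (0, 1, 2, 3):
        raise ValueError("optimize must be 0, 1, 2, or 3")
    # Level 0 asks OCRmyPDF to skip its image optimization step entirely.
    return ["ocrmypdf", "--optimize", str(optimize), input_pdf, output_pdf]


print(ocrmypdf_command("/input/scan.pdf", "/output/scan.pdf", 0))
```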


What does "status 2" mean? 

 

I'm trying to integrate ocrmypdf into my Nextcloud by presenting the input & output directories as external storage (SMB) shares within Nextcloud directory structure. file1.pdf and file2.pdf are two copies of the same file, the first placed through Nextcloud and the second copied directly to the SMB share. The second one produces an output but the first one does not.  For both files the SMB user is the same.

 

2021-11-02 09:43:13 - Processing: /input/file1.pdf -> /output/file1.pdf
2021-11-02 09:43:16 - Processing complete in 3.750000 seconds with status 2: /input/file1.pdf
TESTOCR_PROCESS_RESULT /input/file1.pdf /output/file1.pdf 2 3.750000
2021-11-02 09:46:18 - Processing: /input/file2.pdf -> /output/file2.pdf
2021-11-02 09:46:33 - Processing complete in 14.970000 seconds with status 0: /input/file2.pdf
TESTOCR_PROCESS_RESULT /input/file2.pdf /output/file2.pdf 0 14.970000

Answering my own question: status 2 means it's not a PDF. I found out that NC was encrypting files placed into external storage, so while it still had a .pdf extension the file was no longer readable outside of my Nextcloud instance. I found the setting for external storage encryption and disabled that. Working well now. 
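A quick way to catch this before the file reaches the input folder is to check for the PDF header, since a real PDF starts with the marker `%PDF-` and an encrypted blob with a .pdf extension will not. This is a rough sketch with a hypothetical helper name; it only checks the header and does not validate the rest of the file.

```python
from pathlib import Path


def looks_like_pdf(path: Path, probe: int = 1024) -> bool:
    """Return True if the PDF header marker appears near the start of the file."""
    with path.open("rb") as handle:
        # Some readers tolerate junk before the header, so scan the first bytes.
        return b"%PDF-" in handle.read(probe)
```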


I'm facing similar problems @artwodeetwo.

I uploaded about 10-12 scans via my HP scanner to my defined input folder, but only 2 of the files were processed correctly and moved to the output directory. All the other files stay in my input folder.

Note that this "scan to OCR file" process worked for months (nearly a year) without any problems, and I didn't change a thing.

 

I get the following output in my logs. Did you find a doc that explains what status -11 means?

Thanks for any help.

 

2021-11-29 15:54:21 - Watching /input
2021-11-29 15:54:21 - Processing: /input/164211scan11282021.pdf -> /output/164211scan11282021.pdf
2021-11-29 15:54:21 - Processing: /input/164249scan11282021.pdf -> /output/164249scan11282021.pdf
2021-11-29 15:54:21 - Processing: /input/164322scan11282021.pdf -> /output/164322scan11282021.pdf
2021-11-29 15:54:28 - Processing complete in 7.730000 seconds with status -11: /input/164211scan11282021.pdf
TESTOCR_PROCESS_RESULT /input/164211scan11282021.pdf /output/164211scan11282021.pdf -11 7.730000
2021-11-29 15:54:28 - Processing: /input/164352scan11282021.pdf -> /output/164352scan11282021.pdf
2021-11-29 15:54:29 - Processing complete in 8.120000 seconds with status -11: /input/164322scan11282021.pdf
TESTOCR_PROCESS_RESULT /input/164322scan11282021.pdf /output/164322scan11282021.pdf -11 8.120000
2021-11-29 15:54:29 - Processing: /input/164531scan11282021.pdf -> /output/164531scan11282021.pdf
2021-11-29 15:54:30 - Processing complete in 9.170000 seconds with status -11: /input/164249scan11282021.pdf
TESTOCR_PROCESS_RESULT /input/164249scan11282021.pdf /output/164249scan11282021.pdf -11 9.170000
2021-11-29 15:54:30 - Processing: /input/164622scan11282021.pdf -> /output/164622scan11282021.pdf
2021-11-29 15:54:36 - Processing complete in 7.680000 seconds with status -11: /input/164531scan11282021.pdf
TESTOCR_PROCESS_RESULT /input/164531scan11282021.pdf /output/164531scan11282021.pdf -11 7.680000
2021-11-29 15:54:36 - Processing: /input/164836scan11282021.pdf -> /output/164836scan11282021.pdf
2021-11-29 15:54:41 - Processing complete in 4.700000 seconds with status -11: /input/164836scan11282021.pdf
TESTOCR_PROCESS_RESULT /input/164836scan11282021.pdf /output/164836scan11282021.pdf -11 4.700000
2021-11-29 15:54:41 - Processing: /input/165021scan11282021.pdf -> /output/165021scan11282021.pdf
2021-11-29 15:54:48 - Processing complete in 18.530000 seconds with status -11: /input/164622scan11282021.pdf
TESTOCR_PROCESS_RESULT /input/164622scan11282021.pdf /output/164622scan11282021.pdf -11 18.530000
2021-11-29 15:54:49 - Processing complete in 8.380000 seconds with status -11: /input/165021scan11282021.pdf
TESTOCR_PROCESS_RESULT /input/165021scan11282021.pdf /output/165021scan11282021.pdf -11 8.380000
2021-11-29 15:55:01 - Processing complete in 32.790000 seconds with status -11: /input/164352scan11282021.pdf
TESTOCR_PROCESS_RESULT /input/164352scan11282021.pdf /output/164352scan11282021.pdf -11 32.790000
2021-11-29 15:55:19 - Processing: /input/164211scan11282021.pdf -> /output/164211scan11282021.pdf
2021-11-29 15:55:20 - Processing: /input/164249scan11282021.pdf -> /output/164249scan11282021.pdf
2021-11-29 15:55:21 - Processing: /input/164322scan11282021.pdf -> /output/164322scan11282021.pdf
2021-11-29 15:55:26 - Processing complete in 7.020000 seconds with status -11: /input/164211scan11282021.pdf
TESTOCR_PROCESS_RESULT /input/164211scan11282021.pdf /output/164211scan11282021.pdf -11 7.020000
2021-11-29 15:55:26 - Processing: /input/164352scan11282021.pdf -> /output/164352scan11282021.pdf
2021-11-29 15:55:29 - Processing complete in 7.800000 seconds with status -11: /input/164322scan11282021.pdf
TESTOCR_PROCESS_RESULT /input/164322scan11282021.pdf /output/164322scan11282021.pdf -11 7.800000
2021-11-29 15:55:29 - Processing: /input/164531scan11282021.pdf -> /output/164531scan11282021.pdf
2021-11-29 15:55:30 - Processing complete in 9.280000 seconds with status -11: /input/164249scan11282021.pdf
TESTOCR_PROCESS_RESULT /input/164249scan11282021.pdf /output/164249scan11282021.pdf -11 9.280000
2021-11-29 15:55:30 - Processing: /input/164622scan11282021.pdf -> /output/164622scan11282021.pdf
2021-11-29 15:55:36 - Processing complete in 7.390000 seconds with status -11: /input/164531scan11282021.pdf
TESTOCR_PROCESS_RESULT /input/164531scan11282021.pdf /output/164531scan11282021.pdf -11 7.390000
2021-11-29 15:55:36 - Processing: /input/164836scan11282021.pdf -> /output/164836scan11282021.pdf
2021-11-29 15:55:42 - Processing complete in 5.400000 seconds with status -11: /input/164836scan11282021.pdf
TESTOCR_PROCESS_RESULT /input/164836scan11282021.pdf /output/164836scan11282021.pdf -11 5.400000
2021-11-29 15:55:42 - Processing: /input/165021scan11282021.pdf -> /output/165021scan11282021.pdf
2021-11-29 15:55:49 - Processing complete in 19.000000 seconds with status -11: /input/164622scan11282021.pdf
TESTOCR_PROCESS_RESULT /input/164622scan11282021.pdf /output/164622scan11282021.pdf -11 19.000000
2021-11-29 15:55:50 - Processing complete in 8.620000 seconds with status -11: /input/165021scan11282021.pdf
TESTOCR_PROCESS_RESULT /input/165021scan11282021.pdf /output/165021scan11282021.pdf -11 8.620000
2021-11-29 15:55:58 - Processing complete in 31.450000 seconds with status -11: /input/164352scan11282021.pdf
TESTOCR_PROCESS_RESULT /input/164352scan11282021.pdf /output/164352scan11282021.pdf -11 31.450000
2021-11-29 15:56:34 - Signal 15 (SIGTERM) Received. Shutting down...
2021-11-29 15:56:40 - Watching /input
2021-11-29 15:56:40 - Processing: /input/164211scan11282021.pdf -> /output/164211scan11282021.pdf
2021-11-29 15:56:40 - Processing: /input/164249scan11282021.pdf -> /output/164249scan11282021.pdf
2021-11-29 15:56:40 - Processing: /input/164322scan11282021.pdf -> /output/164322scan11282021.pdf
2021-11-29 15:56:48 - Processing complete in 7.270000 seconds with status 7: /input/164211scan11282021.pdf
TESTOCR_PROCESS_RESULT /input/164211scan11282021.pdf /output/164211scan11282021.pdf 7 7.270000
2021-11-29 15:56:48 - Processing: /input/164352scan11282021.pdf -> /output/164352scan11282021.pdf
2021-11-29 15:56:48 - Processing complete in 8.190000 seconds with status -11: /input/164322scan11282021.pdf
TESTOCR_PROCESS_RESULT /input/164322scan11282021.pdf /output/164322scan11282021.pdf -11 8.190000
2021-11-29 15:56:48 - Processing: /input/164531scan11282021.pdf -> /output/164531scan11282021.pdf
2021-11-29 15:56:50 - Processing complete in 9.810000 seconds with status -11: /input/164249scan11282021.pdf
TESTOCR_PROCESS_RESULT /input/164249scan11282021.pdf /output/164249scan11282021.pdf -11 9.810000
2021-11-29 15:56:50 - Processing: /input/164622scan11282021.pdf -> /output/164622scan11282021.pdf
2021-11-29 15:56:56 - Processing complete in 7.190000 seconds with status -11: /input/164531scan11282021.pdf
TESTOCR_PROCESS_RESULT /input/164531scan11282021.pdf /output/164531scan11282021.pdf -11 7.190000
2021-11-29 15:56:56 - Processing: /input/164836scan11282021.pdf -> /output/164836scan11282021.pdf
2021-11-29 15:57:01 - Processing complete in 5.110000 seconds with status -11: /input/164836scan11282021.pdf
TESTOCR_PROCESS_RESULT /input/164836scan11282021.pdf /output/164836scan11282021.pdf -11 5.110000
2021-11-29 15:57:01 - Processing: /input/165021scan11282021.pdf -> /output/165021scan11282021.pdf
2021-11-29 15:57:08 - Processing complete in 17.850000 seconds with status -11: /input/164622scan11282021.pdf
TESTOCR_PROCESS_RESULT /input/164622scan11282021.pdf /output/164622scan11282021.pdf -11 17.850000
2021-11-29 15:57:10 - Processing complete in 8.970000 seconds with status -11: /input/165021scan11282021.pdf
TESTOCR_PROCESS_RESULT /input/165021scan11282021.pdf /output/165021scan11282021.pdf -11 8.970000
2021-11-29 15:57:20 - Processing complete in 32.400000 seconds with status -11: /input/164352scan11282021.pdf
TESTOCR_PROCESS_RESULT /input/164352scan11282021.pdf /output/164352scan11282021.pdf -11 32.400000
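
One possible reading of these numbers, assuming the logged status is the return code that Python's subprocess module reports for the ocrmypdf child process (the thread does not show the wrapper code, so this is an assumption): a negative value means the process was killed by that signal number, and on Linux signal 11 is SIGSEGV, a crash. Positive values would be OCRmyPDF's own exit codes, where, as far as I know from its documentation, 2 means the input is not a valid PDF and 7 means a child process failed. The helper below is a hypothetical sketch that decodes a status under that assumption.

```python
import signal

# Subset of OCRmyPDF exit codes seen in this thread (per its documentation).
OCRMYPDF_EXIT_CODES = {
    0: "success",
    2: "input file is not a valid PDF",
    7: "a child process (external tool) failed",
}


def describe_status(status: int) -> str:
    """Explain a subprocess-style return code."""
    if status < 0:
        # subprocess reports death by signal N as the return code -N.
        try:
            name = signal.Signals(-status).name
        except ValueError:
            name = f"signal {-status}"
        return f"killed by {name}"
    return OCRMYPDF_EXIT_CODES.get(status, f"exit code {status}")
```

If that reading is right, `describe_status(-11)` points at a crash inside ocrmypdf or one of the tools it calls rather than at a problem with the files themselves.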

 

Edited by Maddeen

Hello, I am new here. I used the app with English and Greek PDFs and it works fine, although in Greek it is not as good as other solutions.

I would like to ask: when I put a very large file (e.g. 600 MB) in the input folder, it takes an hour to convert and uses less than 10% of the CPU. Can we force it to use more CPU resources to speed up the process?

Thank you.

 

PS: I have an 8-core, 16-thread CPU, and the other apps and Unraid run at about 1%.
