I have been running paperless for a few days and I am absolutely in love with it.
The problem I ran into yesterday is bad performance when a PDF file has more than one page. I uploaded a 2 MB, 8-page file (not that much, actually) and the OCR process took over 30 minutes while using 100% CPU on all 4 cores of a Xeon 1225-v3. Maybe that has something to do with this issue: https://github.com/the-paperless-project/paperless/issues/438 ?
Does anyone have an idea how to optimize that process?
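To put those numbers in perspective, here is a rough back-of-the-envelope sketch. It assumes all 4 cores really were fully busy for the whole run and uses 30 minutes as a lower bound, since the actual run took longer. The function name `core_minutes_per_page` is just made up for this illustration.

```python
# Rough estimate of CPU cost per page, using the figures from this post.
# Assumptions: every core was 100% busy for the whole run, and the run took
# exactly 30 minutes (it actually took longer, so this is a lower bound).

def core_minutes_per_page(wall_minutes, cores, utilization, pages):
    """Return the CPU time (in core-minutes) spent per page."""
    return wall_minutes * cores * utilization / pages

wall_minutes = 30      # "over 30 minutes"
cores = 4              # all 4 cores in use
utilization = 1.0      # 100% CPU
pages = 8              # 8-page PDF

per_page = core_minutes_per_page(wall_minutes, cores, utilization, pages)
print(f"at least {per_page:.1f} core-minutes per page")
```

So under these assumptions each page costs at least 15 core-minutes, which seems like far too much for a 2 MB scan.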
Here is the paperless-consumer Docker log:
Consuming /consume/03.2020.pdf
** Processing: /tmp/paperless/paperless-up38twsl/convert.png
500x700 pixels, 3x16 bits/pixel, RGB
Input IDAT size = 575331 bytes
Input file size = 575592 bytes
Trying:
zc = 9 zm = 9 zs = 0 f = 0 IDAT size = 545251
zc = 9 zm = 8 zs = 0 f = 0 IDAT size = 545208
Selecting parameters:
zc = 9 zm = 9 zs = 0 f = 1 IDAT size = 494809
Output file: /tmp/paperless/paperless-up38twsl/optipng.png
Output IDAT size = 494809 bytes (80522 bytes decrease)
Output file size = 494866 bytes (80726 bytes = 14.02% decrease)
Processing sheet #1: /tmp/paperless/paperless-up38twsl/convert-0002.pnm -> /tmp/paperless/paperless-up38twsl/convert-0002.unpaper.pnm
Processing sheet #1: /tmp/paperless/paperless-up38twsl/convert-0000.pnm -> /tmp/paperless/paperless-up38twsl/convert-0000.unpaper.pnm
Processing sheet #1: /tmp/paperless/paperless-up38twsl/convert-0001.pnm -> /tmp/paperless/paperless-up38twsl/convert-0001.unpaper.pnm
Processing sheet #1: /tmp/paperless/paperless-up38twsl/convert-0003.pnm -> /tmp/paperless/paperless-up38twsl/convert-0003.unpaper.pnm
[pgm_pipe @ 0x55698b596f80] [pgm_pipe @ 0x56315b5eaf80] [pgm_pipe @ 0x55b79cc53f80] Stream #0: not enough frames to estimate rate; consider increasing probesize
Stream #0: not enough frames to estimate rate; consider increasing probesize
[pgm_pipe @ 0x55d75f3f5f80] Stream #0: not enough frames to estimate rate; consider increasing probesize
Stream #0: not enough frames to estimate rate; consider increasing probesize
[image2 @ 0x55b79cc55600] Using AVStream.codec to pass codec parameters to muxers is deprecated, use AVStream.codecpar instead.
[image2 @ 0x55b79cc55600] Encoder did not produce proper pts, making some up.
Processing sheet #1: /tmp/paperless/paperless-up38twsl/convert-0004.pnm -> /tmp/paperless/paperless-up38twsl/convert-0004.unpaper.pnm
[pgm_pipe @ 0x55a4ad8d8f80] Stream #0: not enough frames to estimate rate; consider increasing probesize
[image2 @ 0x55698b598600] Using AVStream.codec to pass codec parameters to muxers is deprecated, use AVStream.codecpar instead.
[image2 @ 0x55698b598600] Encoder did not produce proper pts, making some up.
out of deviation range - NO ROTATING
[image2 @ 0x55d75f3f7600] Using AVStream.codec to pass codec parameters to muxers is deprecated, use AVStream.codecpar instead.
Processing sheet #1: /tmp/paperless/paperless-up38twsl/convert-0005.pnm -> /tmp/paperless/paperless-up38twsl/convert-0005.unpaper.pnm
[image2 @ 0x55d75f3f7600] Encoder did not produce proper pts, making some up.
[pgm_pipe @ 0x564bda956f80] Stream #0: not enough frames to estimate rate; consider increasing probesize
Processing sheet #1: /tmp/paperless/paperless-up38twsl/convert-0006.pnm -> /tmp/paperless/paperless-up38twsl/convert-0006.unpaper.pnm
[pgm_pipe @ 0x5610d26a6f80] Stream #0: not enough frames to estimate rate; consider increasing probesize
[image2 @ 0x56315b5ec600] Using AVStream.codec to pass codec parameters to muxers is deprecated, use AVStream.codecpar instead.
[image2 @ 0x56315b5ec600] Encoder did not produce proper pts, making some up.
Processing sheet #1: /tmp/paperless/paperless-up38twsl/convert-0007.pnm -> /tmp/paperless/paperless-up38twsl/convert-0007.unpaper.pnm
[pgm_pipe @ 0x56090cae1f80] Stream #0: not enough frames to estimate rate; consider increasing probesize
[image2 @ 0x55a4ad8da600] Using AVStream.codec to pass codec parameters to muxers is deprecated, use AVStream.codecpar instead.
[image2 @ 0x55a4ad8da600] Encoder did not produce proper pts, making some up.
[image2 @ 0x564bda958600] Using AVStream.codec to pass codec parameters to muxers is deprecated, use AVStream.codecpar instead.
[image2 @ 0x564bda958600] Encoder did not produce proper pts, making some up.
[image2 @ 0x5610d26a8600] Using AVStream.codec to pass codec parameters to muxers is deprecated, use AVStream.codecpar instead.
[image2 @ 0x5610d26a8600] Encoder did not produce proper pts, making some up.
[image2 @ 0x56090cae3600] Using AVStream.codec to pass codec parameters to muxers is deprecated, use AVStream.codecpar instead.
[image2 @ 0x56090cae3600] Encoder did not produce proper pts, making some up.
OCRing the document
Parsing for deu
Parsing for deu
Parsing for deu
Detected document date 2014-01-20T00:00:00+01:00 based on string 20.01.2014
d
Document 20140120000000: 03.2020 consumption finished