[Support] Paperless Docker

February 5, 2020

Is there a way to back up just the paperless database? I see you can run a full backup that dumps the files and the database into a folder. However, since most of us run on Unraid with parity, the one thing I need and can't figure out is how to run a dump of the database every so often.

February 10, 2020

I got lost on step 4, I think... (attached screenshot: image.png)

I tried removing the port but it just adds it back in

Edited February 10, 2020 by scubieman

February 15, 2020

FYI you can run both the web server and consumer in a single docker container by using a bash script:

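#!/bin/bash

# Start the consumer and the web server in the background,
# then wait so the script (and the container) stays alive.
/sbin/docker-entrypoint.sh document_consumer &
/sbin/docker-entrypoint.sh runserver 0.0.0.0:8000 --insecure --noreload &
wait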

save this file into a volume that's mounted in the container. i just put this in the appdata directory.

then turn on advanced view and override the entry point, e.g.

--entrypoint /usr/src/paperless/data/entry.sh

clear out the 'post arguments', since you're doing that in the bash script now.
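
The `&` / `wait` pattern in the script above can be tried in isolation. This is a minimal sketch, not the paperless entrypoint: `fake_consumer` and `fake_server` are hypothetical stand-ins for the two paperless processes. It shows why the final `wait` matters: without it the script would exit right after starting the background jobs, and the container would stop.

```shell
#!/bin/sh
# Hypothetical stand-ins for the two long-running paperless processes.
fake_consumer() { sleep 1; echo "consumer exited"; }
fake_server()   { sleep 1; echo "server exited"; }

run_both() {
    fake_consumer &   # background job, like `document_consumer &`
    fake_server &     # background job, like `runserver ... &`
    wait              # block until every background job has exited
    echo "all children finished"
}

run_both
```

The order of the two "exited" lines is not guaranteed, since both jobs run concurrently.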

February 18, 2020

I have been running paperless for a few days and I am absolutely in love with it.

The problem I ran into yesterday is bad performance when a PDF file has more than one page. I uploaded a 2 MB, 8-page file (not that much, actually...) and the OCR process took over 30 minutes while using 100% CPU on all 4 Xeon 1225-v3 cores. Could this be related to https://github.com/the-paperless-project/paperless/issues/438 ?

Does anyone have any idea how to optimize that process?

paperless-consumer docker log:

Consuming /consume/03.2020.pdf
** Processing: /tmp/paperless/paperless-up38twsl/convert.png
500x700 pixels, 3x16 bits/pixel, RGB
Input IDAT size = 575331 bytes
Input file size = 575592 bytes

Trying:
zc = 9 zm = 9 zs = 0 f = 0 IDAT size = 545251
zc = 9 zm = 8 zs = 0 f = 0 IDAT size = 545208
Selecting parameters:
zc = 9 zm = 9 zs = 0 f = 1 IDAT size = 494809

Output file: /tmp/paperless/paperless-up38twsl/optipng.png

Output IDAT size = 494809 bytes (80522 bytes decrease)
Output file size = 494866 bytes (80726 bytes = 14.02% decrease)

Processing sheet #1: /tmp/paperless/paperless-up38twsl/convert-0002.pnm -> /tmp/paperless/paperless-up38twsl/convert-0002.unpaper.pnm
Processing sheet #1: /tmp/paperless/paperless-up38twsl/convert-0000.pnm -> /tmp/paperless/paperless-up38twsl/convert-0000.unpaper.pnm
Processing sheet #1: /tmp/paperless/paperless-up38twsl/convert-0001.pnm -> /tmp/paperless/paperless-up38twsl/convert-0001.unpaper.pnm
Processing sheet #1: /tmp/paperless/paperless-up38twsl/convert-0003.pnm -> /tmp/paperless/paperless-up38twsl/convert-0003.unpaper.pnm
[pgm_pipe @ 0x55698b596f80] [pgm_pipe @ 0x56315b5eaf80] [pgm_pipe @ 0x55b79cc53f80] Stream #0: not enough frames to estimate rate; consider increasing probesize
Stream #0: not enough frames to estimate rate; consider increasing probesize
[pgm_pipe @ 0x55d75f3f5f80] Stream #0: not enough frames to estimate rate; consider increasing probesize
Stream #0: not enough frames to estimate rate; consider increasing probesize
[image2 @ 0x55b79cc55600] Using AVStream.codec to pass codec parameters to muxers is deprecated, use AVStream.codecpar instead.
[image2 @ 0x55b79cc55600] Encoder did not produce proper pts, making some up.
Processing sheet #1: /tmp/paperless/paperless-up38twsl/convert-0004.pnm -> /tmp/paperless/paperless-up38twsl/convert-0004.unpaper.pnm
[pgm_pipe @ 0x55a4ad8d8f80] Stream #0: not enough frames to estimate rate; consider increasing probesize
[image2 @ 0x55698b598600] Using AVStream.codec to pass codec parameters to muxers is deprecated, use AVStream.codecpar instead.
[image2 @ 0x55698b598600] Encoder did not produce proper pts, making some up.
out of deviation range - NO ROTATING
[image2 @ 0x55d75f3f7600] Using AVStream.codec to pass codec parameters to muxers is deprecated, use AVStream.codecpar instead.
Processing sheet #1: /tmp/paperless/paperless-up38twsl/convert-0005.pnm -> /tmp/paperless/paperless-up38twsl/convert-0005.unpaper.pnm
[image2 @ 0x55d75f3f7600] Encoder did not produce proper pts, making some up.
[pgm_pipe @ 0x564bda956f80] Stream #0: not enough frames to estimate rate; consider increasing probesize
Processing sheet #1: /tmp/paperless/paperless-up38twsl/convert-0006.pnm -> /tmp/paperless/paperless-up38twsl/convert-0006.unpaper.pnm
[pgm_pipe @ 0x5610d26a6f80] Stream #0: not enough frames to estimate rate; consider increasing probesize
[image2 @ 0x56315b5ec600] Using AVStream.codec to pass codec parameters to muxers is deprecated, use AVStream.codecpar instead.
[image2 @ 0x56315b5ec600] Encoder did not produce proper pts, making some up.
Processing sheet #1: /tmp/paperless/paperless-up38twsl/convert-0007.pnm -> /tmp/paperless/paperless-up38twsl/convert-0007.unpaper.pnm
[pgm_pipe @ 0x56090cae1f80] Stream #0: not enough frames to estimate rate; consider increasing probesize
[image2 @ 0x55a4ad8da600] Using AVStream.codec to pass codec parameters to muxers is deprecated, use AVStream.codecpar instead.
[image2 @ 0x55a4ad8da600] Encoder did not produce proper pts, making some up.
[image2 @ 0x564bda958600] Using AVStream.codec to pass codec parameters to muxers is deprecated, use AVStream.codecpar instead.
[image2 @ 0x564bda958600] Encoder did not produce proper pts, making some up.
[image2 @ 0x5610d26a8600] Using AVStream.codec to pass codec parameters to muxers is deprecated, use AVStream.codecpar instead.
[image2 @ 0x5610d26a8600] Encoder did not produce proper pts, making some up.
[image2 @ 0x56090cae3600] Using AVStream.codec to pass codec parameters to muxers is deprecated, use AVStream.codecpar instead.
[image2 @ 0x56090cae3600] Encoder did not produce proper pts, making some up.
OCRing the document
Parsing for deu
Parsing for deu
Parsing for deu
Detected document date 2014-01-20T00:00:00+01:00 based on string 20.01.2014

d
Document 20140120000000: 03.2020 consumption finished

February 20, 2020

On 2/14/2020 at 4:23 PM, bling said:
FYI you can run both the web server and consumer in a single docker container by using a bash script:
#! /bin/bash

/sbin/docker-entrypoint.sh document_consumer &
/sbin/docker-entrypoint.sh runserver 0.0.0.0:8000 --insecure --noreload &
wait
save this file into a volume that's mounted in the container. i just put this in the appdata directory.

then turn on advanced view and override the entry point, e.g.
--entrypoint /usr/src/paperless/data/entry.sh
clear out the 'post arguments', since you're doing that in the bash script now.

Thanks Bling, can you elaborate on how you override the entrypoint? I don't see entrypoint as a variable in Unraid.

February 21, 2020

On 2/20/2020 at 12:34 PM, ithelpme said:

Thanks Bling, can you elaborate on how you override the entrypoint? I don't see entrypoint as a variable in Unraid.

turn on advanced view and put it into 'extra parameters'.

February 21, 2020

1 hour ago, bling said:

turn on advanced view and put it into 'extra parameters'.

I tried it but it says bad parameter. Any ideas why I'm having that error and it's working for you? Thank you.

February 29, 2020

On 2/21/2020 at 2:45 PM, ithelpme said:

I tried it but it says bad parameter. Any ideas why I'm having that error and it's working for you? Thank you.

Remove everything in that field and only place the text "document_consumer" there (w/o quotes)

March 7, 2020

On 2/4/2020 at 1:01 PM, Nickproof said:

Yeah, at first I created this folder on Unraid 🤦‍♂️. I'm new to Docker and to how it works inside, so don't blame me.

Then I thought about it, and it seems logical to me that it affects the container itself. So I put the file in the container. After a few attempts, it now works. My approach: while the container starts, quickly upload the traineddata file via ssh (~~It is necessary to do so, because the container shuts down after about 30 seconds if it has an error~~ Only if you have any file in the /consumption folder, so make sure it's empty.).

Paperless finds the file and performs OCR without errors (and it works great!).

So the real question is, why does Paperless not download other languages after configuration?

Having the same issue right now

I went into /usr/share/tessdata and downloaded deu.traineddata with:

wget https://github.com/tesseract-ocr/tessdata/blob/master/deu.traineddata
chmod +x deu.traineddata

See:

bash-5.0# cd /usr/share/tessdata/
bash-5.0# ls -l
total 35504
drwxr-xr-x    1 root     root           360 Mar  1 19:05 configs
-rwxr-xr-x    1 root     root         64820 Mar  7 18:14 deu.traineddata
-rwxr-xr-x    1 root     root      23466654 Jul  9  2019 eng.traineddata
-rwxr-xr-x    1 root     root       2251950 Jul  9  2019 equ.traineddata
-rwxr-xr-x    1 root     root      10562874 Jul  9  2019 osd.traineddata
-rw-r--r--    1 root     root           572 Jul  9  2019 pdf.ttf
drwxr-xr-x    1 root     root            88 Mar  1 19:05 tessconfigs

Still getting:

pyocr.error.TesseractError: (1, b'Error opening data file /usr/share/tessdata/deu.traineddata\nPlease make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.\nFailed loading language \'deu\'\nTesseract couldn\'t load any languages!\nCould not initialize tesseract.\n')

and the paperless_consumer docker crashed

Everything worked for some weeks and now this is happening

Any idea why?
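
One possible explanation, not confirmed in this thread: in the `ls -l` output above, deu.traineddata is only 64820 bytes, while eng.traineddata is 23466654 bytes. A GitHub `/blob/` URL serves the HTML file-viewer page rather than the file itself, so wget may have saved a web page under the model's name. The sketch below checks for that; `looks_like_html` is a hypothetical helper, and the raw URL in the comment is an assumption about where the file is served.

```shell
#!/bin/sh
# Return 0 if the file starts like an HTML page rather than binary model data.
looks_like_html() {
    head -c 512 "$1" | LC_ALL=C tr 'A-Z' 'a-z' \
        | LC_ALL=C grep -q -e '<!doctype html' -e '<html'
}

# Example, run inside the container:
#   looks_like_html /usr/share/tessdata/deu.traineddata && echo "HTML page, not model data"
#
# If it is HTML, re-download from the raw URL instead (assumed location):
#   wget -O deu.traineddata https://github.com/tesseract-ocr/tessdata/raw/master/deu.traineddata
```

The executable bit set by `chmod +x` is not needed for a data file; read permission is enough.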

March 10, 2020

I don't know where, when, or to whom this happened, but there are two problems and a kind of workaround.

First, add a new variable named "PAPERLESS_OCR_LANGUAGES" to your container. Note the additional 'S' at the end of the name.

Set both 'PAPERLESS_OCR_LANGUAGES' and 'PAPERLESS_OCR_LANGUAGE' to only one language. In your case that should be 'deu' without the quotes.

I didn't have enough time to test other combinations, but this worked for me.

I hope this will help you

Michael
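
On Unraid the two variables are added in the container template. For anyone starting the container by hand, the workaround above roughly corresponds to two `-e` flags on `docker run`. This is a sketch under that assumption; `ocr_env_flags` is a hypothetical helper that only prints the flags.

```shell
#!/bin/sh
# Print the -e flags that set both variables to the same single language.
ocr_env_flags() {
    lang=$1
    printf '%s\n' "-e PAPERLESS_OCR_LANGUAGE=$lang -e PAPERLESS_OCR_LANGUAGES=$lang"
}

ocr_env_flags deu
```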

March 16, 2020

Anyone know how to reset the master password?

March 16, 2020

also, anyone have a chance to test this in a reverse proxy setup?

March 16, 2020

25 minutes ago, djgizmo said:

Anyone know how to reset the master password?

figured it out...

I used...

 /manage.py changepassword (user)
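
From the Unraid host, the same Django `changepassword` command can be reached through `docker exec`. This is a sketch, not a tested recipe: the container name `paperless` and the user `admin` are hypothetical, the `manage.py` path is the one that appears in the tracebacks posted in this thread, and calling it through `python3` is an assumption. The helper prints the command instead of running it, so it can be checked first.

```shell
#!/bin/sh
# Print (rather than run) the docker exec command for a password reset.
changepassword_cmd() {
    container=$1
    user=$2
    printf 'docker exec -it %s python3 /usr/src/paperless/src/manage.py changepassword %s\n' \
        "$container" "$user"
}

changepassword_cmd paperless admin
```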

March 17, 2020

Hi there.
Can the GUI also be changed to German?
Can I keep the original file name? I save my documents according to this scheme: YYYY-MM-DD - template.
If I see it correctly, the documents in the originals folder are simply numbered consecutively without meaning.

March 17, 2020

On 3/15/2020 at 10:42 PM, djgizmo said:

also, anyone have a chance to test this in a reverse proxy setup?

reverse proxy worked for me just fine.

I use NginxProxyManager

March 17, 2020

Does the consumer reach into directories in the consume directory or just consume in the root? (/consume)

ScannerPro added a (/ScannerPro) directory in my /consume directory and I can't figure out how to remove it.

And paperless hasn't consumed it yet; I assume that's why.

March 26, 2020

If I change the time zone from UTC to something else, the web UI stops working.

This suggestion below worked for me, thanks!

But I think this may be the reason the UI stops working if I change any other settings.

On 2/14/2020 at 6:23 PM, bling said:
FYI you can run both the web server and consumer in a single docker container by using a bash script:
#! /bin/bash

/sbin/docker-entrypoint.sh document_consumer &
/sbin/docker-entrypoint.sh runserver 0.0.0.0:8000 --insecure --noreload &
wait
save this file into a volume that's mounted in the container. i just put this in the appdata directory.

then turn on advanced view and override the entry point, e.g.
--entrypoint /usr/src/paperless/data/entry.sh
clear out the 'post arguments', since you're doing that in the bash script now.

Edited March 27, 2020 by nextgenpotato

March 28, 2020

On 3/26/2020 at 5:32 AM, nextgenpotato said:

If I change the time zone from UTC to something else, the web UI stops working.

This suggestion below worked for me, thanks!

But I think this may be the reason the UI stops working if I change any other settings.

the image doesn't have any time zone info so you need to add a volume mount for /usr/share/zoneinfo

March 31, 2020

On 2/15/2020 at 1:23 AM, bling said:
FYI you can run both the web server and consumer in a single docker container by using a bash script:
#! /bin/bash

/sbin/docker-entrypoint.sh document_consumer &
/sbin/docker-entrypoint.sh runserver 0.0.0.0:8000 --insecure --noreload &
wait
save this file into a volume that's mounted in the container. i just put this in the appdata directory.

then turn on advanced view and override the entry point, e.g.
--entrypoint /usr/src/paperless/data/entry.sh
clear out the 'post arguments', since you're doing that in the bash script now.

I tried this, but then the container doesn't start. I only get this error in the log:

standard_init_linux.go:211: exec user process caused "no such file or directory"

Any idea what went wrong?
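
A common cause of `exec user process caused "no such file or directory"` for a script that does exist is Windows (CRLF) line endings: the kernel then looks for an interpreter literally named `/bin/bash\r`. Other candidates are a wrong path after `--entrypoint` or an interpreter missing from the image. Whether any of these applies here is not confirmed in the thread. The sketch below only covers the CRLF case; `has_crlf` and `strip_crlf` are hypothetical helper names.

```shell
#!/bin/sh
# Return 0 if the file contains carriage returns (Windows line endings).
has_crlf() {
    cr=$(printf '\r')
    grep -q "$cr" "$1"
}

# Write a copy of the file with all carriage returns removed.
strip_crlf() {
    tr -d '\r' < "$1" > "$2"
}

# Example, run where entry.sh is stored:
#   has_crlf entry.sh && strip_crlf entry.sh entry.fixed.sh
```

Saving the script from an editor on Windows over a network share is a typical way for CRLF endings to slip in.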

Edited March 31, 2020 by Eraxar

April 1, 2020

On 3/10/2020 at 8:17 AM, Michael Baecker said:

I don't know where, when, or to whom this happened, but there are two problems and a kind of workaround.

First, add a new variable named "PAPERLESS_OCR_LANGUAGES" to your container. Note the additional 'S' at the end of the name.

Set both 'PAPERLESS_OCR_LANGUAGES' and 'PAPERLESS_OCR_LANGUAGE' to only one language. In your case that should be 'deu' without the quotes.

I didn't have enough time to test other combinations, but this worked for me.

I hope this will help you

Michael

This is great! Now the main paperless docker starts downloading the proper tesseract data. I think it is a problem with the Unraid template for the Docker, and everybody using OCR for a language other than English will run into this problem. How can the docker be updated with this other variable?

April 4, 2020

Author

On 2/5/2020 at 9:28 PM, Nickfmc said:

Is there a way to back up just the paperless database? I see you can run a full backup that dumps the files and the database into a folder. However, since most of us run on Unraid with parity, the one thing I need and can't figure out is how to run a dump of the database every so often.

@Nickfmc I use the "CA Backup / Restore Appdata" plugin to back up the paperless appdata folder (including the paperless data directory) to a backup share on my array. The uploaded documents reside in another share on my array. Then I use the "Unassigned Devices" plugin with a custom script to back up both shares from my array to an external hard disk. Does this answer your question?

On 3/17/2020 at 4:03 PM, OOmatrixOO said:

Hi there.
Can the GUI also be changed to German?
Can I keep the original file name? I save my documents according to this scheme: YYYY-MM-DD - template.
If I see it correctly, the documents in the originals folder are simply numbered consecutively without meaning.

@OOmatrixOO As long as the paperless metadata contains the original file name, you should be safe. If you decide to move your documents to another management system, you can use the paperless Exporter to export the files with their original names. See the Exporter documentation.

If you access the documents outside the paperless web UI (e.g. via the share), the following pull request might solve your problem. However, I can't estimate when the feature will be merged.

On 4/1/2020 at 4:44 PM, pietjebell said:

This is great! Now the main paperless docker starts downloading the proper tesseract data. I think it is a problem with the Unraid template for the Docker, and everybody using OCR for a language other than English will run into this problem. How can the docker be updated with this other variable?

@pietjebell Sorry for the inconvenience. I created a PR including your request. ~~Should be available soon~~. Template change is available.

On 3/28/2020 at 6:14 PM, bling said:

the image doesn't have any time zone info so you need to add a volume mount for /usr/share/zoneinfo

@nextgenpotato @bling I have good news. The newest version of paperless adds the environment variable "TZ" (e.g. TZ=America/Los_Angeles). Unraid now passes your server's time zone to the container automatically. You need to update your container to use this feature, though. I will also remove the PAPERLESS_TIME_ZONE variable from the template, as it works out of the box now.

BTW, the new paperless version also ships with a preview window of your documents in edit mode.

Edited April 4, 2020 by T0a
Answer more questions

April 7, 2020

On 2/28/2020 at 6:23 PM, djgizmo said:

Remove everything in that field and only place the text "document_consumer" there (w/o quotes)

what is "that field" in this case?

I put document_consumer in "Post_arguments"

April 7, 2020

On 2/14/2020 at 6:23 PM, bling said:
FYI you can run both the web server and consumer in a single docker container by using a bash script:
#! /bin/bash

/sbin/docker-entrypoint.sh document_consumer &
/sbin/docker-entrypoint.sh runserver 0.0.0.0:8000 --insecure --noreload &
wait
save this file into a volume that's mounted in the container. i just put this in the appdata directory.

then turn on advanced view and override the entry point, e.g.
--entrypoint /usr/src/paperless/data/entry.sh
clear out the 'post arguments', since you're doing that in the bash script now.

I did this and the document_consumer would run, but the webserver wasn't running. There was an error in the log about /etc/passwd being locked, not sure if that was the problem.

I switched the two lines in the entry.sh (listing the webserver first, then the document_consumer second, as below) and it works now.

#! /bin/bash

/sbin/docker-entrypoint.sh runserver 0.0.0.0:8000 --insecure --noreload &
/sbin/docker-entrypoint.sh document_consumer &
wait

And I also had to make the file executable (chmod +x).

April 9, 2020

Hi, great software btw :) I'm just struggling to get it set up properly. When I scan letters, it often fails to recognise the language as English and therefore tries to parse in another language:

PARSE FAILURE for /consume/scan0002.pdf: Language detection failed. Set PAPERLESS_FORGIVING_OCR in config file to continue anyway.

I think part of the issue is that I live in Wales. Letters are typically in English but have a bilingual header, with part of the title in both English and Welsh. I think this is causing problems with the OCR. I have no desire to have anything read in Welsh, as I cannot speak or read it, ha. Do you have any suggestions on how to overcome this issue?

April 13, 2020

Anyone managed to get email checking to work with this? Running in a single container with the bash script and passing the relevant email variables, but it's not working. Seeing this in the log:

Starting document consumer at /consume with inotify
Traceback (most recent call last):
  File "/usr/src/paperless/src/manage.py", line 11, in <module>
    execute_from_command_line(sys.argv)
  File "/usr/lib/python3.8/site-packages/django/core/management/__init__.py", line 371, in execute_from_command_line
    utility.execute()
  File "/usr/lib/python3.8/site-packages/django/core/management/__init__.py", line 365, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/usr/lib/python3.8/site-packages/django/core/management/base.py", line 288, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/usr/lib/python3.8/site-packages/django/core/management/base.py", line 335, in execute
    output = self.handle(*args, **options)
  File "/usr/src/paperless/src/documents/management/commands/document_consumer.py", line 97, in handle
    self.loop_inotify(mail_delta)
  File "/usr/src/paperless/src/documents/management/commands/document_consumer.py", line 130, in loop_inotify
    self.loop_step(mail_delta)
  File "/usr/src/paperless/src/documents/management/commands/document_consumer.py", line 120, in loop_step
    self.mail_fetcher.pull()
  File "/usr/src/paperless/src/documents/mail.py", line 185, in pull
    for message in self._get_messages():
  File "/usr/src/paperless/src/documents/mail.py", line 203, in _get_messages
    self._login()
  File "/usr/src/paperless/src/documents/mail.py", line 227, in _login
    login = self._connection.login(self._username, self._password)
  File "/usr/lib/python3.8/imaplib.py", line 601, in login
    typ, dat = self._simple_command('LOGIN', user, self._quote(password))
  File "/usr/lib/python3.8/imaplib.py", line 1197, in _quote
    arg = arg.replace('\\', '\\\\')
AttributeError: 'NoneType' object has no attribute 'replace'

Anyone have any idea what I'm doing wrong?

Thanks
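
The last frames of the traceback show `imaplib` calling `_quote(password)` on `None`, which suggests the mail password never reached the consumer process. A quick check is to list which of the mail variables are unset or empty inside the container. This is a sketch: `missing_vars` is a hypothetical helper, and the variable names are recalled from the paperless example config, so treat them as assumptions and compare them with your template.

```shell
#!/bin/sh
# Print the name of every listed variable that is unset or empty.
missing_vars() {
    for name in "$@"; do
        eval "value=\${$name:-}"
        [ -n "$value" ] || echo "$name"
    done
}

# Assumed variable names; verify them against the paperless documentation.
missing_vars PAPERLESS_CONSUME_MAIL_HOST PAPERLESS_CONSUME_MAIL_USER PAPERLESS_CONSUME_MAIL_PASS
```

Any name it prints is not visible to the shell it runs in; with the single-container script, that should be the same environment the consumer inherits.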
