[Support] Paperless-ngx Docker


On 9/18/2023 at 1:55 PM, Tuetenk0pp said:

Maybe something to do with the file format (line endings) or perhaps missing shebang?

 

 

I'm curious, what do you want to achieve with a pre-consumption script? I was always aware of that functionality but found no use for it.

 

I want to remove empty pages when I scan in double-sided mode. So far, a script which I took from here is running fine with regular white paper. When I scan recycled paper (e.g. a German tax letter), it fails: the script checks the sum of the color values on each page, and on recycled paper an empty page is interpreted as a page with some content. This is an exceptional case, though, as most of my documents are on regular white sheets.
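
To illustrate why a plain color sum trips over recycled paper, here is a minimal sketch of an alternative check. It is not the linked script: `is_blank_page`, its thresholds, and the assumption that a page is available as a flat list of grayscale values (0 = black, 255 = white) are all hypothetical. The idea is to estimate the paper color from the median pixel and count only pixels that are clearly darker than that.

```python
def is_blank_page(pixels, tolerance=40, max_ink_fraction=0.005):
    """Return True if a page of grayscale pixels (0=black, 255=white) looks empty.

    Hypothetical helper: instead of summing color values, estimate the paper
    color as the median pixel value and count only clearly darker pixels.
    """
    if not pixels:
        return True
    ordered = sorted(pixels)
    paper = ordered[len(ordered) // 2]  # median approximates the paper color
    ink = sum(1 for p in pixels if p < paper - tolerance)
    return ink / len(pixels) <= max_ink_fraction


white_blank = [250] * 10_000                 # bright white sheet, empty
recycled_blank = [200] * 10_000              # greyish recycled sheet, still empty
recycled_text = [200] * 9_500 + [30] * 500   # recycled sheet with 5% dark pixels

print(is_blank_page(white_blank), is_blank_page(recycled_blank), is_blank_page(recycled_text))
# prints: True True False
```

Because the threshold is relative to the paper color, a uniformly grey page is still treated as empty, while the tolerance and ink fraction would need tuning against real scans.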

 

What does not work is the very elegant approach that is also discussed there: calling a single pre-consumption.sh file and, from there, (potentially) several other scripts. Calling Python files directly (which I would prefer) does not work for me either.

Link to comment
On 9/19/2023 at 12:09 AM, arsvendg said:

I have been trying to trigger a PAPERLESS_PRE_CONSUME_SCRIPT that I got from this site: https://piep.tech/posts/automatic-password-removal-in-paperless-ngx/.

 

I have tried to add a scripts folder to my array share, and added the password txt file and the .py script. The script looks like this:

#!/usr/bin/env python
import pikepdf
import os

def unlock_pdf(file_path):
    password = None
    print("reading passwords")
    with open("/usr/src/paperless/scripts/passwords.txt", "r") as f:
        passwords = f.readlines()
    for p in passwords:
        password = p.strip()
        try:
            with pikepdf.open(file_path, password=password, allow_overwriting_input=True) as pdf:
                print("password is working:" + password)
                pdf.save(file_path)
                break
        except pikepdf.PasswordError:
            print("password isn't working:" + password)
            continue
    if password is None:
        print("Empty password file")

file_path = os.environ.get('DOCUMENT_WORKING_PATH')
unlock_pdf(file_path)

 

I have set a path to give the container access to the scripts folder:

image.thumb.png.bcc3e2d043e4fd9edce2a520d5f8a62b.png

 

I had also set PAPERLESS_PRE_CONSUME_SCRIPT as a variable to run the script; I have just removed it for now since it doesn't work...

 

I am getting a permission denied error in the logs:

[2023-09-18 23:47:43,698] [ERROR] [paperless.consumer] Error while executing pre-consume script: [Errno 13] Permission denied: '/usr/src/paperless/scripts/removepassword.py'

 

At one point, I got these messages from the logs.

[2023-09-18 23:02:34,261] [WARNING] [paperless.consumer] Script stderr:

[2023-09-18 23:02:34,263] [WARNING] [paperless.consumer] /usr/bin/env: ‘python\r’: No such file or directory

[2023-09-18 23:02:34,266] [WARNING] [paperless.consumer] /usr/bin/env: use -[v]S to pass options in shebang lines

[2023-09-18 23:02:34,273] [ERROR] [paperless.consumer] Error while executing pre-consume script: Command '['/usr/src/paperless/scripts/removepassword.py', '/tmp/paperless/tmpl40jims_/loennslipp.pdf']' returned non-zero exit status 127.

 

Does anyone have an idea of what I might be doing wrong? Is there something wrong with the script?

 

I was thinking a little about this. I know absolutely nothing about programming or Python, so please bear with me. I can see the script calls for an "import pikepdf". Will Unraid just do that, or do I have to install something somehow so the script actually has something to import...?

Link to comment
7 hours ago, arsvendg said:

I can see the script calls for an "import pikepdf". Will Unraid just do that, or do I have to install something somehow so the script actually has something to import...?

It depends: if the package is already installed in the Paperless container, the import is the only thing you need. If it's not installed, Python would give an error like this:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'pikepdf'
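
If you would rather check up front than wait for that error, a small sketch like the following can be run with the same Python interpreter the script uses (for example from the container console). `module_available` is just an illustrative helper name; `importlib.util.find_spec` is part of the standard library.

```python
import importlib.util


def module_available(name):
    """Return True if the top-level module `name` can be imported here."""
    return importlib.util.find_spec(name) is not None


for name in ("pikepdf", "os"):
    print(name, "available" if module_available(name) else "missing")
```

The result only says something about the interpreter it runs in, so checking on the Unraid host tells you nothing about the container.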

 

Link to comment
Quote

[2023-09-18 23:47:43,698] [ERROR] [paperless.consumer] Error while executing pre-consume script: [Errno 13] Permission denied: '/usr/src/paperless/scripts/removepassword.py'

Have you checked the permissions on this file? Use `ls -l /mnt/user/share_paperless_ngx/scripts/` for that. It should look like this: -rw-rw-rw-

 

Quote

[2023-09-18 23:02:34,263] [WARNING] [paperless.consumer] /usr/bin/env: ‘python\r’: No such file or directory

I have had these kinds of errors in the past and was able to fix them by converting the file to LF line endings.
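
The `python\r` in the log is the giveaway: the shebang line ends in CRLF (Windows line endings), so `env` looks for a program literally named `python\r`. A minimal sketch of the conversion, assuming you run it against your own copy of the script (`to_lf` and `fix_script` are illustrative names):

```python
from pathlib import Path


def to_lf(data: bytes) -> bytes:
    """Convert Windows (CRLF) line endings to Unix (LF)."""
    return data.replace(b"\r\n", b"\n")


def fix_script(path):
    """Rewrite the file at `path` in place with LF line endings."""
    script = Path(path)
    script.write_bytes(to_lf(script.read_bytes()))


print(to_lf(b"#!/usr/bin/env python\r\nimport os\r\n"))
# prints: b'#!/usr/bin/env python\nimport os\n'
```

Most editors can also save with LF endings directly, which avoids the problem in the first place.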

 

pikepdf comes installed in the paperless image so no need to worry about that.

Link to comment
On 9/20/2023 at 8:24 PM, Tuetenk0pp said:

Have you checked the permissions on this file? Use `ls -l /mnt/user/share_paperless_ngx/scripts/` for that. It should look like this: -rw-rw-rw-

 

I have had these kinds of errors in the past and was able to fix them by converting the file to LF line endings.

 

pikepdf comes installed in the paperless image so no need to worry about that.

I checked the permissions, and it does look like -rw-rw-rw-. I even started the console inside the Docker container, and I can see the script there, in the right place.

Link to comment
On 9/24/2023 at 12:38 AM, arsvendg said:

I checked the permissions, and it does look like -rw-rw-rw-. I even started the console inside the Docker container, and I can see the script there, in the right place.

I found a solution to this. I ran a "chmod 755" command on all the files in the scripts folder, and now it works.
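
For anyone landing here with the same [Errno 13]: -rw-rw-rw- has no execute bit, and Paperless runs the pre-consume script as a program, so the file has to be executable. Below is a small sketch of the same idea in Python; `make_executable` and `is_executable` are illustrative names, and the demo uses a throwaway temp file rather than the real script.

```python
import os
import stat
import tempfile


def make_executable(path):
    """Add the execute bits for user, group and others, like `chmod a+x`."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)


def is_executable(path):
    """Return True if the owner execute bit is set on `path`."""
    return bool(os.stat(path).st_mode & stat.S_IXUSR)


# Demo on a temporary file that starts out as -rw-rw-rw-
with tempfile.NamedTemporaryFile(suffix=".py", delete=False) as tmp:
    demo = tmp.name
os.chmod(demo, 0o666)
print("before:", is_executable(demo))
make_executable(demo)
print("after:", is_executable(demo))
os.remove(demo)
```

Unlike `chmod 755`, this only adds execute bits and leaves the existing read/write bits alone.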

Link to comment
On 2/3/2023 at 4:56 AM, SOULV1CE said:

I just solved a problem I was having with adding the PAPERLESS_TRASH_DIR parameter that I wanted to share.

 

As mentioned in the Paperless documentation, this parameter "Defaults to empty (i.e. really delete documents)."

Because it is empty, it is never mapped to any container path. I verified this by looking in the /paperless.conf.example and src/paperless/settings.py files.

 

Because of this, you first need to create a container variable for PAPERLESS_TRASH_DIR and set it to a container path such as /usr/src/paperless/media/trash, since nothing is defined for it by default.

Once the variable points to that container path, you can map the container path /usr/src/paperless/media/trash to a host path such as /mnt/user/data/media/documents/paperless-ngx/media/trash/.

 

After doing this, a deleted file should be moved to the host path folder /mnt/user/data/media/documents/paperless-ngx/media/trash/ instead of being permanently deleted.

 

image.thumb.png.982b8e2a23d68a7518cb44be1668b705.png

 

Thank you very much for that hint!

 

Additionally, I want to mention that the trash folder

/usr/src/paperless/media/trash

did not exist on my paperless instance. Therefore I created the folder by opening the console:
image.png.add052d22be499330234d09229d82cfa.png

and added the folder with:

root@ab2ac556cb92:/usr/src/paperless/src# cd ../media/
root@ab2ac556cb92:/usr/src/paperless/media# mkdir trash
root@ab2ac556cb92:/usr/src/paperless/media# chown paperless trash/
root@ab2ac556cb92:/usr/src/paperless/media# chgrp paperless trash/

 

Then I followed your hint and created the variable PAPERLESS_TRASH_DIR, pointing to that newly created trash folder

image.png.f586f899253cdd0ae23122f5b5b82e65.png

and mapped that internal trash folder to a path on my SMB share

image.png.882cafcc8fae71505792e0af5f284376.png

 

so I have write access and can delete the files in the trash folder by myself or periodically via user script.
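
In case it helps someone, here is a rough sketch of what such a periodic cleanup could look like. It is only an illustration: `purge_old_files` and the 30-day default are my own assumptions, and the directory you pass in would be whatever host path you mapped to the trash folder.

```python
import time
from pathlib import Path


def purge_old_files(trash_dir, max_age_days=30, now=None):
    """Delete regular files in `trash_dir` older than `max_age_days`.

    Returns the sorted names of the deleted files. Sub-directories are
    left untouched.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # 86400 seconds per day
    deleted = []
    for entry in Path(trash_dir).iterdir():
        if entry.is_file() and entry.stat().st_mtime < cutoff:
            entry.unlink()
            deleted.append(entry.name)
    return sorted(deleted)
```

The age is based on the file's modification time, so it is worth testing on a copy of the folder first to confirm that this matches when the files were actually moved to the trash.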

 

Thank you again for sharing.

Link to comment
  • 2 weeks later...

Hi there,

I hope someone can help me :) I have a Paperless Docker container running under Unraid. I have created 2 variables for the subdirs-as-tags function:

 

PAPERLESS_CONSUMER_RECURSIVE: true
PAPERLESS_CONSUMER_SUBDIRS_AS_TAGS: true

 

I can see that the container starts up with these 2 variables, but when I drop a Folder A into Paperless, it creates no tags for the file.

 

I have Folder A -> Subfolder B -> File, and Paperless only shows me the file without any tags.

 

The log file showed no errors:

[2023-10-11 19:30:55,765] [INFO] [celery.worker.strategy] Task documents.tasks.consume_file[6b1a221f-0668-4425-b725-9753b3d7ef98] received
[2023-10-11 19:30:56,232] [INFO] [paperless.consumer] Consuming 2023-09-01_Ausgabe_M85570_Sammellieferant.pdf
[2023-10-11 19:31:06,146] [INFO] [ocrmypdf._pipeline] skipping all processing on this page
[2023-10-11 19:31:06,147] [INFO] [ocrmypdf._sync] Postprocessing...
[2023-10-11 19:31:07,988] [INFO] [ocrmypdf._pipeline] Image optimization ratio: 1.00 savings: 0.4%
[2023-10-11 19:31:07,988] [INFO] [ocrmypdf._pipeline] Total file size ratio: 0.96 savings: -4.5%
[2023-10-11 19:31:08,002] [INFO] [ocrmypdf._sync] Output file is a PDF/A-2B (as expected)
[2023-10-11 19:31:11,646] [INFO] [paperless.consumer] Document 2023-09-01 2023-09-01_Ausgabe_M85570_Sammellieferant consumption finished
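
For reference, as far as I understand the Paperless documentation, the expected tags are the names of the sub-directories between the consumption directory and the file. The sketch below only illustrates that expectation; `tags_from_subdirs` is a hypothetical helper, not Paperless code, and the file name is made up.

```python
from pathlib import PurePosixPath


def tags_from_subdirs(consume_dir, file_path):
    """Return the sub-directory names between `consume_dir` and the file."""
    relative = PurePosixPath(file_path).relative_to(PurePosixPath(consume_dir))
    return list(relative.parent.parts)


print(tags_from_subdirs(
    "/usr/src/paperless/consume",
    "/usr/src/paperless/consume/Folder A/Subfolder B/File.pdf",
))
# prints: ['Folder A', 'Subfolder B']
```

A file dropped directly into the consumption directory yields an empty list, so the folder has to end up inside the consume path for any tags to be derived.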

 

nico

Link to comment
  • 3 weeks later...

I am hoping someone can give me some guidance on getting Paperless-NGX to work with Nginx Proxy Manager. It works fine when I assign both Redis and Paperless-NGX a br0 address on my LAN. However, when I use a bridge address, it does not work; I get a 502 error.

 

This is for local access only. I run an instance of NPM for all my services, where the names are *.local (unraid.local, plex.local, etc.). I use PiHole for DNS resolution to point these names to the NPM server.

 

I generally prefer to set up custom Docker networks for services like this for the Redis link, but I cannot seem to get the proxy to work without both containers having their own LAN IP.
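
A 502 from a reverse proxy generally means the proxy could not get a valid response from the upstream address it was given, so one thing worth checking is whether the host and port configured in NPM are reachable from where NPM runs. Here is a minimal sketch of such a check; `can_connect` is an illustrative helper, and whatever host and port you pass in are your own values, not something taken from this thread.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example call (hypothetical address and port):
# print(can_connect("192.168.1.50", 8000))
```

With bridge networking, the proxy usually has to target the Unraid host IP and the mapped host port, or share a custom Docker network with the container, which would be my first guess here.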

 

 

 

Link to comment
  • 3 weeks later...

Hello everyone,

 

I recently started having problems with the paperless-ngx Docker. It quits and from the logs I see an "OSError: [Errno 95]"

I don't know what the problem could be, the container has been running for half a year.

 

Thank you

 

 

Log:

 

output = self.handle(*args, **options)
  File "/usr/local/lib/python3.9/site-packages/django/core/management/commands/check.py", line 76, in handle
    self.check(
  File "/usr/local/lib/python3.9/site-packages/django/core/management/base.py", line 475, in check
    all_issues = checks.run_checks(
  File "/usr/local/lib/python3.9/site-packages/django/core/checks/registry.py", line 88, in run_checks
    new_errors = check(app_configs=app_configs, databases=databases)
  File "/usr/src/paperless/src/paperless/checks.py", line 63, in paths_check
    path_check("PAPERLESS_DATA_DIR", settings.DATA_DIR)
  File "/usr/src/paperless/src/paperless/checks.py", line 34, in path_check
    with open(test_file, "w"):
OSError: [Errno 95] Operation not supported: '/usr/src/paperless/data/__paperless_write_test_150__'
Traceback (most recent call last):
  File "/usr/src/paperless/src/manage.py", line 10, in <module>
    execute_from_command_line(sys.argv)
  File "/usr/local/lib/python3.9/site-packages/django/core/management/__init__.py", line 446, in execute_from_command_line
    utility.execute()
  File "/usr/local/lib/python3.9/site-packages/django/core/management/__init__.py", line 440, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/usr/local/lib/python3.9/site-packages/django/core/management/base.py", line 402, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/usr/local/lib/python3.9/site-packages/django/core/management/base.py", line 448, in execute
    output = self.handle(*args, **options)
  File "/usr/local/lib/python3.9/site-packages/django/core/management/commands/check.py", line 76, in handle
    self.check(
  File "/usr/local/lib/python3.9/site-packages/django/core/management/base.py", line 475, in check
    all_issues = checks.run_checks(
  File "/usr/local/lib/python3.9/site-packages/django/core/checks/registry.py", line 88, in run_checks
    new_errors = check(app_configs=app_configs, databases=databases)
  File "/usr/src/paperless/src/paperless/checks.py", line 65, in paths_check
    + path_check("PAPERLESS_MEDIA_ROOT", settings.MEDIA_ROOT)
  File "/usr/src/paperless/src/paperless/checks.py", line 34, in path_check
    with open(test_file, "w"):
OSError: [Errno 95] Operation not supported: '/usr/src/paperless/media/__paperless_write_test_149__'
Traceback (most recent call last):
  File "/usr/src/paperless/src/manage.py", line 10, in <module>
    execute_from_command_line(sys.argv)
  File "/usr/local/lib/python3.9/site-packages/django/core/management/__init__.py", line 446, in execute_from_command_line
    utility.execute()
  File "/usr/local/lib/python3.9/site-packages/django/core/management/__init__.py", line 440, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/usr/local/lib/python3.9/site-packages/django/core/management/base.py", line 402, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/usr/local/lib/python3.9/site-packages/django/core/management/base.py", line 448, in execute
    output = self.handle(*args, **options)
  File "/usr/local/lib/python3.9/site-packages/django/core/management/commands/check.py", line 76, in handle
    self.check(
  File "/usr/local/lib/python3.9/site-packages/django/core/management/base.py", line 475, in check
    all_issues = checks.run_checks(
  File "/usr/local/lib/python3.9/site-packages/django/core/checks/registry.py", line 88, in run_checks
    new_errors = check(app_configs=app_configs, databases=databases)
  File "/usr/src/paperless/src/paperless/checks.py", line 66, in paths_check
    + path_check("PAPERLESS_CONSUMPTION_DIR", settings.CONSUMPTION_DIR)
  File "/usr/src/paperless/src/paperless/checks.py", line 34, in path_check
    with open(test_file, "w"):
OSError: [Errno 95] Operation not supported: '/usr/src/paperless/consume/__paperless_write_test_150__'
Paperless-ngx docker container starting...
Installing languages...
Hit:1 http://deb.debian.org/debian bookworm InRelease
Hit:2 http://deb.debian.org/debian bookworm-updates InRelease
Hit:3 http://deb.debian.org/debian-security bookworm-security InRelease
Reading package lists...
Package tesseract-ocr-deu already installed!
Creating directory /tmp/paperless
Adjusting permissions of paperless files. This may take a while.
Waiting for Redis...
Connected to Redis broker.
Apply database migrations...
Operations to perform:
  Apply all migrations: admin, auth, authtoken, contenttypes, django_celery_results, documents, guardian, paperless_mail, sessions
Running migrations:
  No migrations to apply.
Running Django checks
Paperless-ngx docker container starting...
Installing languages...
Hit:1 http://deb.debian.org/debian bookworm InRelease
Hit:2 http://deb.debian.org/debian bookworm-updates InRelease
Hit:3 http://deb.debian.org/debian-security bookworm-security InRelease
Reading package lists...
Package tesseract-ocr-deu already installed!
Creating directory /tmp/paperless
Adjusting permissions of paperless files. This may take a while.
Waiting for Redis...
Connected to Redis broker.
Apply database migrations...
Operations to perform:
  Apply all migrations: admin, auth, authtoken, contenttypes, django_celery_results, documents, guardian, paperless_mail, sessions
Running migrations:
  No migrations to apply.
Running Django checks

Link to comment

On startup, Paperless performs some checks, such as checking the various paths for existence, readability and writability (data dir, trash dir, media dir, consumption dir).

The path check tries to write a test file to each mounted directory of your container. See here

In your case, the write check fails for your data, media and consumption dirs. I suggest checking the Docker mounts and directories on your server first. My guess would be that your container is not allowed to write to the mounts.
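
To reproduce the check by hand, something like the following sketch can be run from the container console. It mirrors the `open(test_file, "w")` call visible in the traceback, but `can_write` and the test file name are my own and not the Paperless implementation.

```python
import os


def can_write(directory):
    """Try to create and remove a test file in `directory`; return (ok, error)."""
    test_file = os.path.join(directory, f"__write_test_{os.getpid()}__")
    try:
        with open(test_file, "w"):
            pass
        os.remove(test_file)
        return True, None
    except OSError as exc:
        return False, str(exc)


for path in (
    "/usr/src/paperless/data",
    "/usr/src/paperless/media",
    "/usr/src/paperless/consume",
):
    print(path, can_write(path))
```

The error text returned for a failing directory should match what the startup check reports, which helps narrow down which mount is the problem.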

Link to comment
3 hours ago, T0a said:

On startup, Paperless performs some checks, such as checking the various paths for existence, readability and writability (data dir, trash dir, media dir, consumption dir).

The path check tries to write a test file to each mounted directory of your container. See here

In your case, the write check fails for your data, media and consumption dirs. I suggest checking the Docker mounts and directories on your server first. My guess would be that your container is not allowed to write to the mounts.

 

Thanks for the tip. It was the export directory that I had added later; after I took it out again, it works.

Link to comment
On 9/19/2023 at 5:07 AM, Fidelis said:

Hey All,

I am in need of your wisdom and help.

 

I am running Paperless-NGX along with Tika & Gotenberg on my Unraid server. Now, I've got Paperless running fine, and I have no issues consuming PDF files into Paperless. However, when it comes to consuming MS Office documents, I have a little more trouble. I have Tika and Gotenberg installed, but I am not 100% sure I have them installed correctly. When I try to consume a .doc file, it spits out the following error.

[Errno 99] Cannot assign requested address

 

When I set it all up, I installed Redis, Paperless, Apache-Tika-server, and Gotenberg from the Apps section of Unraid, then I added the 3 variables in the attached image to Paperless.

 

So, what the hell have I done wrong?

 

Any thoughts, ideas, or help would be great. Oh, I'm not an IT professional by any stretch of the imagination; I'm not a complete idiot, but I am not a professional either.

Paperless Variables.JPG

Exactly my issue. Have you found a solution?

Link to comment

Hi all,

 

I have the issue that the consumer only picks up files on container start.

I already activated polling, but nothing changes.

 

To me it looks like polling is not activated, because the logs still show "Using inotify to watch directory for changes: /usr/src/paperless/consume"

 

My consume directory is on a remote share.

 

Any other suggestions?

 

image.thumb.png.dfe34e66d613c62055c545b0e62beccb.png

[2023-11-26 20:15:27,485] [INFO] [paperless.management.consumer] Received SIGINT, stopping inotify

[2023-11-26 20:15:27,488] [DEBUG] [paperless.management.consumer] Consumer exiting.

[2023-11-26 20:15:44,058] [INFO] [paperless.management.consumer] Adding /usr/src/paperless/consume/Eingangsbestätigung.pdf to the task queue.

[2023-11-26 20:15:44,252] [INFO] [paperless.management.consumer] Using inotify to watch directory for changes: /usr/src/paperless/consume

 

EDIT: I did not see that the variable PAPERLESS_CONSUMER_POLLING already existed in the Unraid template, and I created it a second time.

 

After deleting my variable again, everything works.
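
For anyone wondering why polling matters here: filesystem notifications such as inotify are generally not delivered for changes made on the remote end of a network share, so the watcher never hears about new files. Polling simply lists the directory at intervals and compares against what it has already seen. The sketch below shows the idea only; `poll_once` is an illustrative helper and not how Paperless implements it.

```python
import os


def poll_once(directory, seen):
    """Return files in `directory` that are not in `seen`, and add them to it."""
    current = {entry.name for entry in os.scandir(directory) if entry.is_file()}
    new_files = sorted(current - seen)
    seen.update(new_files)
    return new_files


# Typical use (not executed here): call poll_once() every few seconds, e.g.
#   seen = set()
#   while True:
#       for name in poll_once("/usr/src/paperless/consume", seen):
#           print("new file:", name)
#       time.sleep(10)
```

In Paperless itself, setting PAPERLESS_CONSUMER_POLLING to a number of seconds greater than 0 is what switches from inotify to polling, and the log line should no longer mention inotify once it takes effect.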

Link to comment

I can no longer start Paperless-ngx!

 

Log:

Traceback (most recent call last):
  File "/sbin/wait-for-redis.py", line 23, in <module>
    with Redis.from_url(url=REDIS_URL) as client:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/redis/client.py", line 139, in from_url
    connection_pool = ConnectionPool.from_url(url, **kwargs)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/redis/connection.py", line 971, in from_url
    url_options = parse_url(url)
                  ^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/redis/connection.py", line 870, in parse_url
    url = urlparse(url)
          ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/urllib/parse.py", line 395, in urlparse
    splitresult = urlsplit(url, scheme, allow_fragments)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/urllib/parse.py", line 500, in urlsplit
    _check_bracketed_host(bracketed_host)
  File "/usr/local/lib/python3.11/urllib/parse.py", line 448, in _check_bracketed_host
    raise ValueError(f"An IPv4 address cannot be in brackets")
ValueError: An IPv4 address cannot be in brackets
Paperless-ngx docker container starting...
Installing languages...
Get:1 http://deb.debian.org/debian bookworm InRelease [151 kB]
Get:2 http://deb.debian.org/debian bookworm-updates InRelease [52.1 kB]
Get:3 http://deb.debian.org/debian-security bookworm-security InRelease [48.0 kB]
Get:4 http://deb.debian.org/debian bookworm/main amd64 Packages [8780 kB]
Get:5 http://deb.debian.org/debian bookworm-updates/main amd64 Packages [6668 B]
Get:6 http://deb.debian.org/debian-security bookworm-security/main amd64 Packages [105 kB]
Fetched 9143 kB in 1s (8742 kB/s)
Reading package lists...
Package tesseract-ocr-eng already installed!
Mapping UID and GID for paperless:paperless to 99:100
Creating directory /tmp/paperless
Adjusting permissions of paperless files. This may take a while.
Waiting for Redis...
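
Reading the traceback, the ValueError is raised while the Redis URL is being parsed, which suggests that the configured URL (presumably the PAPERLESS_REDIS variable) contains an IPv4 address wrapped in square brackets. Brackets in a URL host are only meant for IPv6 literals. The sketch below is one way to spot that pattern; `has_bracketed_ipv4` and the example addresses are hypothetical.

```python
import ipaddress
import re


def has_bracketed_ipv4(url):
    """Return True if `url` contains an IPv4 address wrapped in [brackets]."""
    match = re.search(r"\[([^\]]*)\]", url)
    if match is None:
        return False
    try:
        ipaddress.IPv4Address(match.group(1))
    except ValueError:
        return False
    return True


for url in (
    "redis://[192.168.1.10]:6379",  # IPv4 in brackets: rejected by the parser
    "redis://192.168.1.10:6379",    # plain IPv4: fine
    "redis://[::1]:6379",           # IPv6 literal: brackets are correct here
):
    print(url, has_bracketed_ipv4(url))
```

If the check matches your value, removing the brackets from the variable in the Unraid template would be the first thing to try.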

 

 

 

Link to comment
