[Support] Paperless-ng Docker


Overview: Dedicated support thread for the Docker template paperless-ng provided via the selfhosters/unRAID-CA-templates repository.

Docker Hub: https://hub.docker.com/r/jonaswinkler/paperless-ng

Documentation: https://paperless-ng.readthedocs.io/en/latest/

Repository: https://github.com/jonaswinkler/paperless-ng

Changelog: https://paperless-ng.readthedocs.io/en/latest/changelog.html

 

This is the official paperless-ng Docker support thread. Feel free to ask questions, share your experience with paperless-ng or describe your paperless setup at home. I try to update this main post regularly based on your feedback. From here on, I will use the terms paperless and paperless-ng interchangeably.

 

1. What is paperless-ng and how does it differ from the original paperless?

Paperless-ng is a fork of paperless that adds a new interface and many other changes under the hood. The original paperless project had received few updates and bug fixes for a while, and pull requests were no longer being merged (Update 26.08.2021: the old paperless repository has been archived).

 

Here are a few key features of paperless-ng:

  • New front end built with Angular. It features full-text search with scored and highlighted results, savable filters, a dashboard, and document uploading on the landing page

  • Mobile support is also almost there. Some layouts don't work yet on small screens

  • New mail consumer that supports multiple accounts and custom filters and actions. Fully tested!

  • Paperless-ng trains a neural network on your documents and assigns tags and correspondents automatically, if you instruct it to do so

  • Updated dependencies

  • More tests of critical backend parts

  • A proper task processing queue that can consume multiple documents in parallel. Consumption of many documents is now blazing fast on multi-core systems. Much of the consumer code has been fixed so that, for instance, it no longer blocks the database during consumption

 

2. How to Install

  • Download and install a Redis container from the community application store (CA)
  • Download and configure the paperless-ng container from the CA
    • Make sure you point the container to your Redis instance. Use the actual IP of the host and not localhost, because the address is resolved inside the container. If you need to pass a password to Redis, use the connection string redis://:[PASSWORD]@[IP]:6379 instead. With a plain 'requirepass' setup, Redis authenticates against a single global password rather than per-user credentials, so you can pass anything as a username, including the empty string as in the example here. To configure a password for your Redis container, set 'redis-server --requirepass "your-secret"' as post arguments on the Redis Docker container. Avoid special characters in the password; otherwise paperless might not be able to parse the connection string.
  • Create a user account once the container has been created: in Unraid's Docker UI, click the paperless-ng icon and choose Console, then enter the command "python manage.py createsuperuser" at the prompt and follow the instructions. Alternatively, set 'PAPERLESS_ADMIN_USER' and 'PAPERLESS_ADMIN_PASSWORD' in your paperless-ng Docker template. Be aware that with the latter approach the password is stored in the template, which makes it easier for someone to find the password protecting the sensitive documents stored in paperless.
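
To make the two forms of the Redis connection string concrete, here is a minimal sketch that builds them from an IP and an optional password. The function name redis_url and the sample values are assumptions for illustration only; the result is what goes into the Redis field of the template.

```shell
# Hypothetical helper: print the Redis connection string for paperless-ng.
# $1 = IP of the host running Redis, $2 = optional password, $3 = optional port.
redis_url() {
  host="$1"
  password="$2"
  port="${3:-6379}"
  if [ -n "$password" ]; then
    # Empty username, then the global password set via --requirepass
    printf 'redis://:%s@%s:%s\n' "$password" "$host" "$port"
  else
    printf 'redis://%s:%s\n' "$host" "$port"
  fi
}

redis_url 192.168.1.10             # redis://192.168.1.10:6379
redis_url 192.168.1.10 yoursecret  # redis://:yoursecret@192.168.1.10:6379
```

The sample password is plain alphanumeric on purpose, in line with the advice above to avoid special characters.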

 

3. My personal workflow

I use the iOS app ScannerPro to scan my documents and upload them via the app to a WebDAV target on my Unraid server. The WebDAV target is mounted in the container as the consume directory. Further, I use the pre- and post-consumption hooks to call webhooks so that Home Assistant can check whether processing failed for an uploaded document. Home Assistant then sends notifications about the import status to my phone. This way I can throw away the physical document without worrying that it was not imported.

 

What does your workflow look like? Feel free to share it in this thread. Here you can also find the officially recommended workflow for managing your documents with paperless-ng.

 

4. FAQ

 

4.1 Why does the consumer not pick up my files?

The consumer service uses `inotify` to detect new documents in the consume folder. This subsystem, however, does not support NFS shares. You can disable `inotify` and use a time-based polling mechanism instead (see `PAPERLESS_CONSUMER_POLLING`. If set to a value n greater than 0, inotify is disabled and the directory is polled every n seconds).
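
For a consume folder on an NFS share, the polling switch described above could be set roughly like this. The variable name comes from the paragraph above; the interval of 30 seconds is only an assumed example value. In the Unraid template it would be added as an extra variable with this key and value.

```shell
# Assumed example interval: poll the consume directory every 30 seconds.
# Any value greater than 0 disables inotify, as described above.
PAPERLESS_CONSUMER_POLLING=30
```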

 

4.2 How to customize paperless-ng?

Paperless-ng supports many more environment variables than the Unraid template offers. You can find them in the documentation here.

 

4.3 What scanner do you use for paperless-ng at home?

A list of scanners used by our community:

  • iPhone with ScannerPro app; one time purchase (@T0a)
  • More will be added when you share your scanner

Paperless-ng also maintains a list of recommended scanners. Feel free to open a pull request over there to add your recommended scanner to the documentation too.

 

4.4 Can I use paperless-ng on a mobile device?

Mobile support in paperless-ng is almost there, but some layouts don't work yet on small screens. There is also a mobile app at a fairly early stage of development; so far it is only available for Android.

 

4.5 What is the future of the original paperless template in Unraid?

At some point, I will probably remove the paperless template and close its support thread.

 

4.6 How to configure PostgreSQL as a database?

See this post on how to configure PostgreSQL in the template. The official documentation gives further migration steps needed.

Edited by T0a
Link to comment

Execute Management Utilities

 

Paperless-ng offers a few management utilities that might come in handy in some situations. You can either execute them from the command line or create custom scripts with the User Scripts plugin available via the CA. The latter allows you to schedule the task execution, e.g. to invoke the document exporter for backups on a regular basis. The command is as follows:

 

docker exec -it paperless-ng <command> <arguments>
# Example: Execute the document exporter and remove files that do not belong to the current export
docker exec -it paperless-ng document_exporter /usr/src/paperless/export -d

 

Have a look at the documentation for the available commands and individual arguments.
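
For the scheduled variant, a user script could look roughly like the sketch below. It assumes the container is named paperless-ng and exports to /usr/src/paperless/export as in the example above; the DRY_RUN switch and the function name export_documents are made up for illustration. Note that -it is left out, because a scheduled script has no terminal attached and docker exec with -t fails in that situation.

```shell
#!/bin/bash
# Sketch of a scheduled export; the names below are assumptions, adjust to your setup.
CONTAINER="${CONTAINER:-paperless-ng}"                 # container name from the template
EXPORT_DIR="${EXPORT_DIR:-/usr/src/paperless/export}"  # export path inside the container
DRY_RUN="${DRY_RUN:-1}"                                # hypothetical switch: 1 only prints the command

export_documents() {
  if [ "$DRY_RUN" = "1" ]; then
    # Show what would be executed without touching the container.
    echo "docker exec $CONTAINER document_exporter $EXPORT_DIR -d"
  else
    # No -it here: scheduled scripts run without a terminal.
    docker exec "$CONTAINER" document_exporter "$EXPORT_DIR" -d
  fi
}

export_documents
```

Set DRY_RUN=0 once the printed command looks right for your setup.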

Edited by T0a
Link to comment

Really just want to say THANKS for bringing this template to the community. Paperless (OG) left something to be desired for me, but Paperless-ng really does what I want and has the look/feel I was after too. (Was probably just the dated look/feel of the original I guess)

 

I opted to run with PostgreSQL instead of the baked-in SQLite, so I ended up adding DB host/port/name/user/pass variables to mine, as well as the Secret Key. I do believe the Secret Key is important and should be included in the template for everyone.

 

https://paperless-ng.readthedocs.io/en/ng-0.9.2/configuration.html

 

Quote

PAPERLESS_SECRET_KEY

Paperless uses this to make session tokens. If you expose paperless on the internet, you need to change this, since the default secret is well known.

Use any sequence of characters. The more, the better. You don’t need to remember this. Just face-roll your keyboard.

If anyone needs it later and isn't able to figure it out from the documentation, here's the setup needed for Postgres

 

[Screenshot: PostgreSQL settings in the paperless-ng template]

Edited by CorneliousJD
Link to comment
31 minutes ago, estoris said:

Hi, I tried setting up paperless for the first time but I'm getting this error. Does anyone know what I did wrong?
 

1/5/21, 9:22 PM INFO Using inotify to watch directory for changes: /usr/src/paperless/src/../consume

1/5/21, 9:22 PM ERROR Error while consuming document: Error -2 connecting to ip:6379. Name or service not known.

I have Redis installed and this is what is shown in the template: redis://[IP]:6379. I tried changing it to localhost, but then I got connection refused.

Change "[IP]" to the actual IP of your Unraid server or the server that runs the Redis service, e.g. 192.168.1.10. Localhost would point to the paperless-ng container itself, where no Redis is running, which causes the "connection refused" error. Let me know if that helps.

Edited by T0a
Link to comment
1 minute ago, ICDeadPpl said:

Is it possible to migrate from linuxserver/papermerge to this version?

No, only from original paperless to paperless-ng.

Papermerge is a different project/product altogether.

 

I did, however, migrate manually from Papermerge to paperless-ng myself.

 

I had a few things in Papermerge already so I just grabbed all the PDFs from Papermerge and put them into the consume directory of Paperless-ng and let it consume them and then manually went in and tagged everything I needed to.

Link to comment

I was asked via PM what my Home Assistant integration with paperless-ng looks like. As stated in (3), I use the pre- and post-consumption hooks to call webhooks so that Home Assistant can check whether processing failed for uploaded documents. That way I receive notifications about the import status. Below you can find the scripts and the automation:

 

Pre-consumption script

#!/usr/bin/env bash
# https://paperless.readthedocs.io/en/latest/consumption.html#hooking-into-the-consumption-process
echo "Begin pre-processing script"
# ${1} is the file name that paperless passes to the pre-consumption hook
echo "- Original filename: [${1}]"
# Tell Home Assistant that processing has started (replace <ha-ip> with its address)
curl -X POST "http://<ha-ip>:8123/api/webhook/paperless_start_processing" -d "{\"filename\": \"${1}\"}" -H "Content-Type: application/json"
echo "End pre-processing script"

Post-consumption script

#!/usr/bin/env bash
# https://paperless.readthedocs.io/en/latest/consumption.html#hooking-into-the-consumption-process
echo "Begin post-processing script"
# The positional arguments are passed by paperless to the post-consumption hook
echo "- Document id:    [${1}]"
echo "- Generated filename: [${2}]"
echo "- Source path:    [${3}]"
echo "- Thumbnail path: [${4}]"
echo "- Download URL:   [${5}]"
echo "- Thumbnail URL:  [${6}]"
echo "- Correspondent:  [${7}]"
echo "- Tags:           [${8}]"
# Tell Home Assistant that processing has finished (replace <ha-ip> with its address)
curl -X POST "http://<ha-ip>:8123/api/webhook/paperless_finish_processing" -d "{\"filename\": \"${2}\", \"correspondent\": \"${7}\", \"tags\": \"${8}\"}" -H "Content-Type: application/json"
echo "End post-processing script"
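
One caveat with the two curl calls above: the filename is pasted into the JSON body as-is, so a double quote or backslash in a document name would produce invalid JSON. A small helper along these lines could escape those two characters first. json_escape is a hypothetical name and the sample filename is made up; the sketch does not handle control characters such as newlines.

```shell
# Hypothetical helper: escape backslashes and double quotes for use inside a JSON string.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Build the payload the same way the scripts do, but with the escaped value.
filename=$(json_escape 'Invoice "final".pdf')
payload="{\"filename\": \"${filename}\"}"
printf '%s\n' "$payload"   # {"filename": "Invoice \"final\".pdf"}
```

In the hook scripts, the escaped value would be used in place of the raw ${1} or ${2} inside the -d argument.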

Home Assistant Automation

- alias: 'Job - Paperless Process document'
  initial_state: true
  trigger:
    platform: webhook
    webhook_id: paperless_start_processing
  action:
    - variables:
        document: "{{ trigger.json.filename }}"
    - wait_for_trigger:
      - platform: webhook
        webhook_id: paperless_finish_processing
      timeout: '00:10:00'
    - choose:
      - conditions:
          # No trigger happened before timeout expired
          - condition: template
            value_template: "{{ wait.trigger == None }}"
        sequence:
          - service: notify.telegram
            data_template:
              title: 'Job - Paperless Import failed '
              message: |
                - Import failed for document "{{ document }}"
      default:
        - service: notify.telegram
          data_template:
            title: 'Job - Paperless Import successful '
            message: |
                - Original name: "{{ document }}"
                - Import name: {{ wait.trigger.json.filename }}
                - From: {{ wait.trigger.json.correspondent }}
                - Tags: {{ wait.trigger.json.tags }}

 

Unfortunately, the automation has a small issue: the wait accepts the first finish webhook that arrives, no matter which document it belongs to. When you consume document A and document B at the same time and document B finishes before document A, the automation reports success for document A. I have not had time to look into this yet, but wanted to share the basic idea anyway.

Link to comment
5 hours ago, T0a said:

Change "[IP]" to the actual IP of your Unraid server or the server that runs the Redis service, e.g. 192.168.1.10. Localhost would point to the paperless-ng container itself, where no Redis is running, which causes the "connection refused" error. Let me know if that helps.

Thank you, it's working perfectly now!

Link to comment

Another plus is that this Docker container uses a different method for checking the "consume" inbound directory for new documents.

Although I pointed this directory to a standard user share (including cache = YES), my array disks are able to go to sleep just fine.

This is something I could not solve with the other containers, the original paperless and Papermerge.

Edited by Ford Prefect
Link to comment

Following the container update this morning, paperless-ng now stops shortly after starting up.

All was working fine prior to that.

The problem appears to be related to the PermissionError on the last line of the log but I am fairly new to unRAID and I'm not sure how to proceed in fixing this problem.

The PUID and PGID Container Variables are at default which is 99 and 100 respectively.

Some guidance would be really appreciated. 😁




Get:1 http://deb.debian.org/debian buster InRelease [121 kB]
Get:2 http://security.debian.org/debian-security buster/updates InRelease [65.4 kB]
Get:3 http://deb.debian.org/debian buster-updates InRelease [51.9 kB]
Get:4 http://security.debian.org/debian-security buster/updates/main amd64 Packages [260 kB]
Get:5 http://deb.debian.org/debian buster/main amd64 Packages [7907 kB]
Get:6 http://deb.debian.org/debian buster-updates/main amd64 Packages [7860 B]
Fetched 8414 kB in 5s (1853 kB/s)
Reading package lists...
package tesseract-ocr-fre not found! :(
Mapping UID and GID for paperless:paperless to 99:100
Traceback (most recent call last):
File "manage.py", line 11, in <module>
execute_from_command_line(sys.argv)
File "/usr/local/lib/python3.7/site-packages/django/core/management/__init__.py", line 401, in execute_from_command_line
utility.execute()
File "/usr/local/lib/python3.7/site-packages/django/core/management/__init__.py", line 377, in execute
django.setup()
File "/usr/local/lib/python3.7/site-packages/django/__init__.py", line 24, in setup
apps.populate(settings.INSTALLED_APPS)
File "/usr/local/lib/python3.7/site-packages/django/apps/registry.py", line 91, in populate
app_config = AppConfig.create(entry)
File "/usr/local/lib/python3.7/site-packages/django/apps/config.py", line 90, in create
module = import_module(entry)
File "/usr/local/lib/python3.7/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
File "<frozen importlib._bootstrap>", line 983, in _find_and_load
File "<frozen importlib._bootstrap>", line 953, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
File "<frozen importlib._bootstrap>", line 983, in _find_and_load
File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 728, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/usr/src/paperless/src/paperless_tika/apps.py", line 3, in <module>
from paperless_tika.signals import tika_consumer_declaration
File "/usr/src/paperless/src/paperless_tika/signals.py", line 1, in <module>
from .parsers import TikaDocumentParser
File "/usr/src/paperless/src/paperless_tika/parsers.py", line 9, in <module>
from tika import parser
File "/usr/local/lib/python3.7/site-packages/tika/parser.py", line 19, in <module>
from .tika import parse1, callServer, ServerEndpoint
File "/usr/local/lib/python3.7/site-packages/tika/tika.py", line 155, in <module>
fileHandler = logging.FileHandler(log_file)
File "/usr/local/lib/python3.7/logging/__init__.py", line 1087, in __init__
StreamHandler.__init__(self, self._open())
File "/usr/local/lib/python3.7/logging/__init__.py", line 1116, in _open
return open(self.baseFilename, self.mode, encoding=self.encoding)
PermissionError: [Errno 13] Permission denied: '/tmp/tika.log'

 

Edited by Pacman56
Link to comment
Just click force update - this has been fixed already in another new update.

Link to comment
5 minutes ago, Pacman56 said:

Just switched to the advanced view, clicked on "force update" and all is good now! I have to remember this tip. 🥳

Much appreciated @CorneliousJD.

 

No problem at all! I ran into the same problem this morning when the container updated overnight, looks like we both got the same bad update. Good news is, at my next scheduled update it would have fixed itself, but why wait :D 

Link to comment

Is there any way to change the date format? I can't find the config option PAPERLESS_DATE_ORDER="DMY" from the original paperless in the docs, and it has no effect if I add it to the Docker configuration as an extra variable.

 

Currently the container uses MDY, which is the wrong format for my German documents.

Link to comment

The environment variable PAPERLESS_DATE_ORDER of the original paperless meant something different. AFAIK the date format is adapted with the localization setting. Have you tried to change it to German? Also see this issue.

Link to comment

Ah shoot... nope. I'm running an English system since German localization is horrible most of the time.

I really don't like this trend of binding the software language to the system language.

I'll see if I can find a browser locale switcher add-on...

 

Edit: that's it. When I switch the browser language to German, the date is in the correct format. I would still prefer an option in the paperless config menu.

Edited by comboy
Link to comment
I feel you. I run an English system too, because I prefer having technical terms in English.

 

Quote

Edit: that's it. When I switch the browser language to German, the date is in the correct format. I would still prefer an option in the paperless config menu.

 

There is a feature request here to do so. Raise your voice.

Edited by T0a
Link to comment

I seem to be having a problem with the "created date" tags - they don't seem to be working properly. 

 

For the "PAPERLESS_FILENAME_FORMAT" field I'm using:

{correspondent} - {title} - {created_year}-{created_month}-{created_day}

For a file name of "Bank - Invoice - 2021-01-15" (and have also tried 2021-1-15).

 

The created date field is still bringing over the actual creation date coded from the date that the document was scanned, instead of the date that I'm using as the creation date.  Any help would be appreciated. 

 

-Russ

Link to comment

Is there anything special I need to do to get the scheduled tasks running?

I've installed Redis (with no password), so that should be fine, but I'm not sure how to check whether paperless connected to Redis correctly (I don't see anything in the logs).

Nonetheless, the scheduled tasks don't seem to do anything. Even after setting the time slightly in the future, no run is triggered:

[Screenshot of the scheduled tasks]

 

Link to comment
