[Support] Paperless-ngx Docker



I've noticed that the Paperless container has been stopping at random. It's not too annoying, since on the rare occasion I need to upload or retrieve documents I just spin it up.

 

My best guess is that a weekly update of the Redis or Paperless container causes it, since they rely on each other. Does that sound right? Is there a way to fix this besides a User Script that routinely checks whether Paperless is running and starts it if not?

 

Here are the logs from the stopped container:

 

[2023-04-02 04:00:00,450] [INFO] [celery.app.trace] Task paperless_mail.tasks.process_mail_accounts[50e61c75-e7f7-4e23-aff8-ee65d73cfca9] succeeded in 0.031346329022198915s: 'No new documents were added.'
[2023-04-02 04:00:17,470] [INFO] [paperless.management.consumer] Received SIGINT, stopping inotify

worker: Warm shutdown (MainProcess)
[2023-04-02 04:00:18 -0700] [318] [INFO] Handling signal: term
[2023-04-02 04:00:19 -0700] [318] [INFO] Shutting down: Master
2023-04-02 04:00:16,468 WARN received SIGTERM indicating exit request
2023-04-02 04:00:16,468 INFO waiting for gunicorn, celery, celery-beat, consumer to die
2023-04-02 04:00:17,820 INFO stopped: consumer (exit status 0)
celery beat v5.2.7 (dawn-chorus) is starting.
__    -    ... __   -        _
LocalTime -> 2023-04-01 08:06:56
Configuration ->
    . broker -> redis://192.168.1.200:6379//
    . loader -> celery.loaders.app.AppLoader
    . scheduler -> celery.beat.PersistentScheduler
    . db -> /usr/src/paperless/data/celerybeat-schedule.db
    . logfile -> [stderr]@%INFO
    . maxinterval -> 5.00 minutes (300s)
2023-04-02 04:00:18,106 INFO stopped: celery-beat (exit status 0)
2023-04-02 04:00:18,902 INFO stopped: celery (exit status 0)
2023-04-02 04:00:19,649 INFO stopped: gunicorn (exit status 0)
Paperless-ngx docker container starting...
Installing languages...
Hit:1 http://deb.debian.org/debian bullseye InRelease
Get:2 http://deb.debian.org/debian-security bullseye-security InRelease [48.4 kB]
Get:3 http://deb.debian.org/debian bullseye-updates InRelease [44.1 kB]
Fetched 92.4 kB in 1s (158 kB/s)
Reading package lists...
Package tesseract-ocr-ara already installed!
Creating directory /tmp/paperless
Adjusting permissions of paperless files. This may take a while.
Waiting for Redis...
Redis ping #0 failed.
Error: Error 111 connecting to 192.168.1.200:6379. Connection refused..
Waiting 5s
Redis ping #1 failed.
Error: Error 111 connecting to 192.168.1.200:6379. Connection refused..
Waiting 5s
Redis ping #2 failed.
Error: Error 111 connecting to 192.168.1.200:6379. Connection refused..
Waiting 5s
Redis ping #3 failed.
Error: Error 111 connecting to 192.168.1.200:6379. Connection refused..
Waiting 5s
Redis ping #4 failed.
Error: Error 111 connecting to 192.168.1.200:6379. Connection refused..
Waiting 5s
Failed to connect to redis using environment variable PAPERLESS_REDIS.

** Press ANY KEY to close this window ** 
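
The log above shows Paperless giving up after five failed Redis pings, five seconds apart. As a rough sketch only, a start-up helper could wait longer for the Redis port before Paperless is started. `wait_for_port` is a hypothetical helper name, not part of Paperless or Unraid, and it only checks that the TCP port accepts connections, not that Redis is healthy.

```python
import socket
import time


def wait_for_port(host: str, port: int, attempts: int = 5, delay: float = 5.0) -> bool:
    """Return True once a TCP connection to host:port succeeds.

    Returns False if every attempt fails. This mirrors the retry loop seen
    in the log, but the attempt count and delay are caller-chosen assumptions.
    """
    for attempt in range(attempts):
        try:
            # A successful connect only proves that something is listening.
            with socket.create_connection((host, port), timeout=3):
                return True
        except OSError as exc:
            print(f"Attempt #{attempt} failed: {exc}")
            if attempt < attempts - 1:
                time.sleep(delay)
    return False
```

With the address from the log, `wait_for_port("192.168.1.200", 6379, attempts=30)` would keep trying for roughly two and a half minutes before giving up, which may be enough to ride out a Redis container update.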
Edited by s449
Link to comment

I keep getting a "503 Server Error" when trying to convert to PDF with Paperless-ngx, Tika, and Gotenberg.

 

I can see that if I add the values to the Paperless container, the path is wrong:

http://192.168.0.6:3002/forms/libreoffice/convert#

So I removed everything after the "http://192.168.0.6:3002" and then I got this:

image.thumb.png.adab027aa2e01db128cc77cd89888b86.png

And you can see that it adds "/forms/chromium/convert/html"

 

I read somewhere that the Tika and Gotenberg containers were optimized for Paperless-ngx?

So maybe that's the problem?

 

Anyway, has anyone gotten this to work?


My configuration.

GOTENBERG_ENDPOINT is using port 3002 instead of port 3000.
image.thumb.png.e455209a2703c79482c28355b5583625.png

 

Link to comment
On 4/17/2022 at 7:28 PM, fk_muck1 said:

I installed these 2 containers:

Repository: gotenberg/gotenberg
Repository: apache/tika

In Paperless I added these 3 variables:

Name / Key:    PAPERLESS_TIKA_ENABLED

Value:    1

 

Name / Key:    PAPERLESS_TIKA_ENDPOINT

Value:    http://IP-of-tika-container:9998

 

Name / Key:    PAPERLESS_TIKA_GOTENBERG_ENDPOINT

Value:    http://IP-of-gotenberg-container:3000/forms/libreoffice/convert#

 

 

The http:// before the IP seems to be important. The IP alone didn't work for me.

Hope this helps

11.jpg

12.jpg

13.jpg

 

Does this still work for you? I can see that the Gotenberg image is now only available from v7 upwards.
No matter what I do, I can't get Tika and Gotenberg working. It looks like it adds some string to the API.
image.thumb.png.bbaa6e05bfef67f36cffa39a5ef72104.png

Link to comment
4 hours ago, casperse said:

 

Does this still work for you? I can see that the Gotenberg image is now only available from v7 upwards.
No matter what I do, I can't get Tika and Gotenberg working. It looks like it adds some string to the API.


The only thing I have changed in the meantime is that I switched Gotenberg and Tika to the containers from CA (Community Applications).
I haven't changed the settings and it works.

1.jpg

Link to comment

Update: I found that you need Gotenberg version 7 or above, and you need to use the IP addresses as endpoints rather than the "docker name" on the internal "proxynet". Also, many variables are missing from the docker template if you want a reverse proxy and the Tika and Gotenberg integrations to work. BUT IT WORKS NOW! 🙂

 

I also used the link below to compare the docker config.
paperless-ngx/docker-compose.sqlite-tika.yml at main · paperless-ngx/paperless-ngx (github.com)

 

 

Edited by casperse
Link to comment
  • 3 weeks later...

Hello, one question.

At first I had this:

PAPERLESS_FILENAME_FORMAT={created}-{correspondent}-{title}

With this I created 100 documents.

Then I found this file format:

PAPERLESS_FILENAME_FORMAT={created_year}/{correspondent}/{title}

I find this much better.

How can I convert my 100 documents to this file format?

Thanks

Link to comment
  • 2 weeks later...
On 4/20/2023 at 3:38 PM, Lucascoco said:

Hello, one question.

At first I had this:

PAPERLESS_FILENAME_FORMAT={created}-{correspondent}-{title}

With this I created 100 documents.

Then I found this file format:

PAPERLESS_FILENAME_FORMAT={created_year}/{correspondent}/{title}

I find this much better.

How can I convert my 100 documents to this file format?

Thanks

 

Log in to the container console:
https://docs.paperless-ngx.com/administration/#management-commands

 

and run: document_renamer

https://docs.paperless-ngx.com/administration/#renamer
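
To make the effect of the format change concrete, here is a small illustration of what the two format strings from the question would produce for one document. It uses plain `str.format` and is not how Paperless renders file names internally. The sample date, correspondent and title are made up, and it assumes `{created}` is the ISO date and `{created_year}` the year, as I understand the Paperless-ngx docs.

```python
from datetime import date

OLD_FORMAT = "{created}-{correspondent}-{title}"
NEW_FORMAT = "{created_year}/{correspondent}/{title}"


def render(fmt: str, created: date, correspondent: str, title: str) -> str:
    """Fill the placeholders used in this thread (illustration only)."""
    return fmt.format(
        created=created.isoformat(),   # assumed: ISO date, e.g. 2023-04-20
        created_year=created.year,     # assumed: four-digit year
        correspondent=correspondent,
        title=title,
    )


# Made-up sample document.
sample = dict(created=date(2023, 4, 20), correspondent="ACME", title="Invoice 42")
print(render(OLD_FORMAT, **sample))  # 2023-04-20-ACME-Invoice 42
print(render(NEW_FORMAT, **sample))  # 2023/ACME/Invoice 42
```

The slashes in the new format become directories, so after the renamer has run the files should end up sorted into one folder per year and correspondent.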

Link to comment

Hello, I have a very serious problem with my paperless-ngx install. In short: when Paperless consumes a (fairly large) document, the consumption fails and the docker image starts to fill up rapidly, to the point where containers fail and I had to delete the entire image to get everything back up and running. Here is the Paperless log message:

 

[2023-05-02 20:48:38,906] [INFO] [paperless.management.consumer] Received SIGINT, stopping inotify
[2023-05-02 20:48:38,911] [DEBUG] [paperless.management.consumer] Consumer exiting.
[2023-05-02 20:48:39,483] [ERROR] [paperless.handlers] Updating PaperlessTask failed
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/celery/app/trace.py", line 451, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/celery/app/trace.py", line 734, in __protected_call__
    return self.run(*args, **kwargs)
  File "/usr/src/paperless/src/documents/tasks.py", line 99, in consume_file
    doc_barcode_info = barcodes.scan_file_for_barcodes(
  File "/usr/src/paperless/src/documents/barcodes.py", line 181, in scan_file_for_barcodes
    barcodes = _pdf2image_barcode_scan(pdf_filepath)
  File "/usr/src/paperless/src/documents/barcodes.py", line 158, in _pdf2image_barcode_scan
    pages_from_path = convert_from_path(
  File "/usr/local/lib/python3.9/site-packages/pdf2image/pdf2image.py", line 250, in convert_from_path
    data, err = proc.communicate(timeout=timeout)
  File "/usr/local/lib/python3.9/subprocess.py", line 1134, in communicate
    stdout, stderr = self._communicate(input, endtime, timeout)
  File "/usr/local/lib/python3.9/subprocess.py", line 1979, in _communicate
    ready = selector.select(timeout)
  File "/usr/local/lib/python3.9/selectors.py", line 416, in select
    fd_event_list = self._selector.poll(timeout)
  File "/usr/local/lib/python3.9/site-packages/billiard/common.py", line 119, in _shutdown_cleanup
    sys.exit(-(256 - signum))
  File "/usr/local/lib/python3.9/site-packages/billiard/pool.py", line 283, in exit
    return _exit(status)
SystemExit: -241

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/django/db/backends/utils.py", line 89, in _execute
    return self.cursor.execute(sql, params)
  File "/usr/local/lib/python3.9/site-packages/django/db/backends/sqlite3/base.py", line 357, in execute
    return Database.Cursor.execute(self, query, params)
sqlite3.IntegrityError: NOT NULL constraint failed: documents_paperlesstask.status

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/src/paperless/src/documents/signals/handlers.py", line 594, in task_postrun_handler
    task_instance.save()
  File "/usr/local/lib/python3.9/site-packages/django/db/models/base.py", line 812, in save
    self.save_base(
  File "/usr/local/lib/python3.9/site-packages/django/db/models/base.py", line 863, in save_base
    updated = self._save_table(
  File "/usr/local/lib/python3.9/site-packages/django/db/models/base.py", line 976, in _save_table
    updated = self._do_update(
  File "/usr/local/lib/python3.9/site-packages/django/db/models/base.py", line 1040, in _do_update
    return filtered._update(values) > 0
  File "/usr/local/lib/python3.9/site-packages/django/db/models/query.py", line 1216, in _update
    return query.get_compiler(self.db).execute_sql(CURSOR)
  File "/usr/local/lib/python3.9/site-packages/django/db/models/sql/compiler.py", line 1822, in execute_sql
    cursor = super().execute_sql(result_type)
  File "/usr/local/lib/python3.9/site-packages/django/db/models/sql/compiler.py", line 1398, in execute_sql
    cursor.execute(sql, params)
  File "/usr/local/lib/python3.9/site-packages/django/db/backends/utils.py", line 67, in execute
    return self._execute_with_wrappers(
  File "/usr/local/lib/python3.9/site-packages/django/db/backends/utils.py", line 80, in _execute_with_wrappers
    return executor(sql, params, many, context)
  File "/usr/local/lib/python3.9/site-packages/django/db/backends/utils.py", line 89, in _execute
    return self.cursor.execute(sql, params)
  File "/usr/local/lib/python3.9/site-packages/django/db/utils.py", line 91, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/usr/local/lib/python3.9/site-packages/django/db/backends/utils.py", line 89, in _execute
    return self.cursor.execute(sql, params)
  File "/usr/local/lib/python3.9/site-packages/django/db/backends/sqlite3/base.py", line 357, in execute
    return Database.Cursor.execute(self, query, params)
django.db.utils.IntegrityError: NOT NULL constraint failed: documents_paperlesstask.status
[2023-05-02 20:50:03,379] [INFO] [paperless.management.consumer] Using inotify to watch directory for changes: /usr/src/paperless/consume

 

Link to comment

I just got everything installed and have been getting things sorted. My attempts at adding some Word docs (.docx) have failed. I've tried uploading through the GUI and copying into the consume directory, and both fail with "file extension not recognized".

 

Here is the error:

1479488544_Screenshot2023-05-06095110.thumb.png.387eb8c5cdc1a41abea89c8ff5736926.png

 

I've configured the paperless-ngx container to point to Tika and Gotenberg by following this thread. Here are my settings:

2061935959_Screenshot2023-05-06095002.thumb.png.e3d5e1d1bf2d12410e5270b512b0f609.png

 

Here are the containers, pulled from the unRAID community applications:

2135373010_Screenshot2023-05-06095041.png.6e7daeca97fdec75513eb6b190d91735.png

 

I can hit the API on the Tika server, but I'm not sure how to test Gotenberg. Then again, I don't really know which process actually handles recognition of the docx file.
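
One way to check both services from another machine is a plain HTTP GET against each of them. The sketch below is only that: `http_status` is a hypothetical helper, and the routes mentioned afterwards are assumptions based on my understanding of Gotenberg 7+ and the Tika server, so they are worth confirming against the docs for the image versions in use.

```python
import urllib.error
import urllib.request


def http_status(url: str, timeout: float = 5.0):
    """Return the HTTP status code for a GET on url, or None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status such as 404 or 503.
        return exc.code
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure and similar.
        return None
```

As far as I know, Gotenberg 7 and later answer on `/health` and the Tika server answers on `/version`, so `http_status("http://IP-of-gotenberg-container:3000/health")` and `http_status("http://IP-of-tika-container:9998/version")` should both return 200 when the containers are up. A result of `None` points at networking rather than at the Paperless settings.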

 

Lastly, I tried other Word docs with the same result (text files and PDFs are processed correctly).

 

Appreciate any recommendations, thanks!

 

Link to comment
On 5/8/2023 at 7:22 PM, Sky-Dragon said:

@birdsofprey02

Your PAPERLESS_TIKA_GOTENBERG_ENDPOINT is wrong, use only http://192.168.70.56:3000 .

You don't need the rest, see documentation https://docs.paperless-ngx.com/configuration/#tika

I've tried with the extended URL, with just http://192.168.70.56:3000, and with http://192.168.70.56:3000/, and I still get the same error. I've restarted the container, and stopped and started it. I'm not really sure what else to try.
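
For anyone comparing the URL variants mentioned in this thread, here is a small sketch of the "base URL only" form, meaning scheme, host and port with nothing after it. `base_endpoint` is a hypothetical helper for illustration and is not something Paperless provides.

```python
from urllib.parse import urlsplit


def base_endpoint(url: str) -> str:
    """Reduce a URL to scheme://host[:port], dropping path, query and fragment."""
    parts = urlsplit(url)
    if not parts.scheme or not parts.netloc:
        # A bare "IP:port" without http:// ends up here.
        raise ValueError(f"expected a full URL such as http://host:3000, got {url!r}")
    return f"{parts.scheme}://{parts.netloc}"


print(base_endpoint("http://192.168.0.6:3002/forms/libreoffice/convert#"))
# http://192.168.0.6:3002
print(base_endpoint("http://192.168.70.56:3000/"))
# http://192.168.70.56:3000
```

The `ValueError` branch corresponds to the earlier observation in this thread that an IP without `http://` in front did not work.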

Link to comment
  • 4 weeks later...
On 5/6/2023 at 9:55 AM, birdsofprey02 said:

Here are the containers, pulled from the unRAID community applications:

2135373010_Screenshot2023-05-06095041.png.6e7daeca97fdec75513eb6b190d91735.png

 

 

Why do the containers have different IPs (54, 55, 56)? They would normally all have the same IP as the UNRAID server, each on a different port.

 

Link to comment
  • 2 weeks later...
On 6/16/2023 at 3:14 PM, rhodo said:

 

Why do the containers have different IPs (54, 55, 56)? They would normally all have the same IP as the UNRAID server, each on a different port.

 

I have multiple VLANs set up on my unRAID server. I run over 30 containers and none of them are on the same IP as unRAID, except maybe the Cloudflare Tunnel container. I don't think there's a requirement anywhere that docker containers have to run on the same IP as the host. The containers can all still talk to each other.

Link to comment
  • 2 weeks later...

This might be a silly question, but how do I know if paperless-ngx is using Redis? I have it installed on the same custom docker network and I point Paperless to "redis://[redis]:6379". Paperless is working, but I have no idea if it is actually using Redis.
I honestly don't really understand how external databases work. I'm also wondering how I back up the Redis database, because it doesn't have an appdata mapping.
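
A quick way to at least confirm that Redis answers on that address is to send it a PING and look for PONG. This is a minimal sketch that assumes Redis has no password set (with one, the reply would be an error and the function returns False). `redis_ping` is a hypothetical helper, not a Paperless tool.

```python
import socket


def redis_ping(host: str, port: int = 6379, timeout: float = 3.0) -> bool:
    """Send an inline PING to Redis and report whether it answers +PONG."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"PING\r\n")
            return sock.recv(64).startswith(b"+PONG")
    except OSError:
        # Connection refused, timeout, or name resolution failure.
        return False
```

This only shows that Redis is reachable. Judging by the logs posted in this thread, Paperless waits for Redis at start-up and refuses to start without it, and the celery worker logs a line like `Connected to redis://...`, so a running Paperless with that line in its log is most likely using it.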

Link to comment

Thanks for your response, I'm using Chrome.

image.thumb.png.1fa206bbeb637ad2b27e0091d69b6fa8.png
Log:
 

[2023-07-19 16:41:49 +0800] [67] [INFO] Server is ready. Spawning workers
[2023-07-19 16:41:49,085] [INFO] [paperless.management.consumer] Using inotify to watch directory for changes: /usr/src/paperless/consume
[2023-07-19 16:41:50,597] [INFO] [celery.beat] beat: Starting...
[2023-07-19 16:41:50,720] [INFO] [celery.beat] Scheduler: Sending due task Optimize the index (documents.tasks.index_optimize)
[2023-07-19 16:41:50,736] [INFO] [celery.beat] Scheduler: Sending due task Train the classifier (documents.tasks.train_classifier)
[2023-07-19 16:41:50,741] [INFO] [celery.beat] Scheduler: Sending due task Check all e-mail accounts (paperless_mail.tasks.process_mail_accounts)
[2023-07-19 16:41:50,799] [INFO] [celery.worker.consumer.connection] Connected to redis://192.168.2.53:6380//
[2023-07-19 16:41:50,836] [INFO] [celery.apps.worker] celery@b46f1761350f ready.
[2023-07-19 16:41:50,841] [INFO] [celery.worker.strategy] Task documents.tasks.index_optimize[9e6d5250-2cb0-49f4-a594-2f0255d576bf] received
[2023-07-19 16:41:50,845] [INFO] [celery.worker.strategy] Task documents.tasks.train_classifier[0e04c1e9-aca7-4e79-8fd2-e585ce5f828e] received
[2023-07-19 16:41:50,848] [INFO] [celery.worker.strategy] Task paperless_mail.tasks.process_mail_accounts[72088d1b-028f-42cf-95a3-ca600e2ac119] received
[2023-07-19 16:41:50,935] [INFO] [celery.app.trace] Task documents.tasks.index_optimize[9e6d5250-2cb0-49f4-a594-2f0255d576bf] succeeded in 0.05476205900777131s: None
[2023-07-19 16:41:51,187] [INFO] [celery.app.trace] Task documents.tasks.train_classifier[0e04c1e9-aca7-4e79-8fd2-e585ce5f828e] succeeded in 0.03916469099931419s: None
[2023-07-19 16:41:51,440] [INFO] [celery.app.trace] Task paperless_mail.tasks.process_mail_accounts[72088d1b-028f-42cf-95a3-ca600e2ac119] succeeded in 0.039933549996931106s: 'No new documents were added.'
[2023-07-19 16:42:07,459] [INFO] [paperless.management.consumer] Received SIGINT, stopping inotify

worker: Warm shutdown (MainProcess)
[2023-07-19 16:42:08 +0800] [67] [INFO] Handling signal: term
[2023-07-19 16:42:09 +0800] [67] [INFO] Shutting down: Master
 
 -------------- celery@b46f1761350f v5.3.0 (emerald-rush)
--- ***** ----- 
-- ******* ---- Linux-6.1.38-Unraid-x86_64-with-glibc2.36 2023-07-19 16:41:50
- *** --- * --- 
- ** ---------- [config]
- ** ---------- .> app:         paperless:0x1462dcf8d3a0
- ** ---------- .> transport:   redis://192.168.2.53:6380//
- ** ---------- .> results:     
- *** --- * --- .> concurrency: 1 (prefork)
-- ******* ---- .> task events: ON
--- ***** ----- 
 -------------- [queues]
                .> celery           exchange=celery(direct) key=celery
                

[tasks]
  . documents.tasks.bulk_update_documents
  . documents.tasks.consume_file
  . documents.tasks.index_optimize
  . documents.tasks.sanity_check
  . documents.tasks.train_classifier
  . documents.tasks.update_document_archive_file
  . paperless_mail.mail.apply_mail_action
  . paperless_mail.mail.error_callback
  . paperless_mail.tasks.process_mail_accounts

2023-07-19 01:42:06,457 WARN received SIGTERM indicating exit request
2023-07-19 01:42:06,458 INFO waiting for gunicorn, celery, celery-beat, consumer to die
2023-07-19 01:42:07,634 INFO stopped: consumer (exit status 0)
celery beat v5.3.0 (emerald-rush) is starting.
__    -    ... __   -        _
LocalTime -> 2023-07-19 16:41:50
Configuration ->
    . broker -> redis://192.168.2.53:6380//
    . loader -> celery.loaders.app.AppLoader
    . scheduler -> celery.beat.PersistentScheduler
    . db -> /usr/src/paperless/data/celerybeat-schedule.db
    . logfile -> [stderr]@%INFO
    . maxinterval -> 5.00 minutes (300s)
2023-07-19 01:42:07,826 INFO stopped: celery-beat (exit status 0)
2023-07-19 01:42:08,574 INFO stopped: celery (exit status 0)
2023-07-19 01:42:09,305 INFO stopped: gunicorn (exit status 0)
Paperless-ngx docker container starting...
Creating directory /tmp/paperless
Adjusting permissions of paperless files. This may take a while.
Waiting for Redis...
Connected to Redis broker.
Apply database migrations...
Operations to perform:
  Apply all migrations: admin, auth, authtoken, contenttypes, django_celery_results, documents, guardian, paperless_mail, sessions
Running migrations:
  No migrations to apply.
Running Django checks
System check identified no issues (0 silenced).
Executing /usr/local/bin/paperless_cmd.sh
2023-07-19 01:42:19,643 INFO Set uid to user 0 succeeded
2023-07-19 01:42:19,646 INFO supervisord started with pid 1
2023-07-19 01:42:20,648 INFO spawned: 'gunicorn' with pid 67
2023-07-19 01:42:20,649 INFO spawned: 'celery' with pid 68
2023-07-19 01:42:20,651 INFO spawned: 'celery-beat' with pid 69
2023-07-19 01:42:20,653 INFO spawned: 'consumer' with pid 70
2023-07-19 01:42:20,656 INFO spawned: 'celery-flower' with pid 71
Checking if we should start flower...
Not starting flower
2023-07-19 01:42:20,664 INFO success: celery-flower entered RUNNING state, process has stayed up for > than 0 seconds (startsecs)
2023-07-19 01:42:20,665 INFO exited: celery-flower (exit status 0; expected)
2023-07-19 01:42:21,665 INFO success: gunicorn entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2023-07-19 01:42:21,666 INFO success: celery entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2023-07-19 01:42:21,666 INFO success: celery-beat entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2023-07-19 01:42:21,666 INFO success: consumer entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
[2023-07-19 16:42:22,749] [INFO] [paperless.management.consumer] Using inotify to watch directory for changes: /usr/src/paperless/consume
[2023-07-19 16:42:22 +0800] [67] [INFO] Starting gunicorn 20.1.0
[2023-07-19 16:42:22 +0800] [67] [INFO] Listening at: http://[::]:8000 (67)
[2023-07-19 16:42:22 +0800] [67] [INFO] Using worker: paperless.workers.ConfigurableWorker
[2023-07-19 16:42:22 +0800] [67] [INFO] Server is ready. Spawning workers
[2023-07-19 16:42:22,939] [INFO] [celery.beat] beat: Starting...
 
 -------------- celery@b46f1761350f v5.3.0 (emerald-rush)
--- ***** ----- 
-- ******* ---- Linux-6.1.38-Unraid-x86_64-with-glibc2.36 2023-07-19 16:42:22
- *** --- * --- 
- ** ---------- [config]
- ** ---------- .> app:         paperless:0x14a17911b3a0
- ** ---------- .> transport:   redis://192.168.2.53:6380//
- ** ---------- .> results:     
- *** --- * --- .> concurrency: 1 (prefork)
-- ******* ---- .> task events: ON
--- ***** ----- 
 -------------- [queues]
                .> celery           exchange=celery(direct) key=celery
                

[tasks]
  . documents.tasks.bulk_update_documents
  . documents.tasks.consume_file
  . documents.tasks.index_optimize
  . documents.tasks.sanity_check
  . documents.tasks.train_classifier
  . documents.tasks.update_document_archive_file
  . paperless_mail.mail.apply_mail_action
  . paperless_mail.mail.error_callback
  . paperless_mail.tasks.process_mail_accounts

[2023-07-19 16:42:23,079] [INFO] [celery.worker.consumer.connection] Connected to redis://192.168.2.53:6380//
[2023-07-19 16:42:23,104] [INFO] [celery.apps.worker] celery@b46f1761350f ready.

Link to comment
