Help with Paperless DockerHub -> Unraid



Thanks to everyone who worked on bringing paperless to Unraid! All of the work that has been put into it so far has been super helpful.

 

This is the biggest annoyance so far:
  https://github.com/the-paperless-project/paperless/issues/546
which basically means that if you dump a bunch of files in the consume folder, you can't make any changes to the database until they have finished being processed.

 

Currently the container writes files with GID 1000. I've submitted a PR with a change that will allow us to use GID 100 (users), as we do with everything else on Unraid:
  https://github.com/the-paperless-project/paperless/pull/599
Once that is accepted, the template will need a variable for:
  USERMAP_GID: 100
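Until that change lands, a quick way to see which files paperless has written with a group other than 100 is a small helper like the one below. It is a minimal sketch, not part of paperless: `check_gid` is a hypothetical helper name, GID 100 is assumed to be the Unraid `users` group, and GNU `find` (for `-printf`) is assumed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list files under a directory whose numeric group ID
# differs from the expected one (default 100, assumed to be Unraid "users").
# Assumes GNU find, which supports -printf.
check_gid() {
  local dir="$1"
  local expected_gid="${2:-100}"
  # Prints "<gid> <path>" for every file or directory NOT owned by expected_gid.
  find "$dir" ! -gid "$expected_gid" -printf '%G %p\n'
}
```

For example, `check_gid /mnt/user/appdata/paperless/media` prints nothing if everything under that path (the media default suggested below) already belongs to GID 100.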

 

I haven't fully decided if I will use paperless, but I have some suggestions for the template to help people get started faster. I could submit a PR, but I figured the people who have been using this more might be in a better position to decide if this would be helpful:

  • In the 'Overview', include a link to the documentation:
    https://paperless.readthedocs.io/en/latest/
     
  • For the 'Data' path, set a default of /mnt/user/appdata/paperless/data with the following description:
    Container Path: /usr/src/paperless/data . 
    This contains the paperless database. Should be in appdata.
     
  • For the 'Media' path, set a default of /mnt/user/appdata/paperless/media with the following description:
    Container Path: /usr/src/paperless/media . 
    Once consumed, files will be stored here. You may wish to place this on the array instead of in appdata.
     
  • For the 'Consumption' path, set a default of /mnt/user/appdata/paperless/consume with the following description:
    Container Path: /consume . 
    Files placed here will be consumed by paperless.
     
  • For the 'Export' path, set a default of /mnt/user/appdata/paperless/export with the following description:
    Container Path: /export . 
    Location for files used by the exporter utility.
    See https://paperless.readthedocs.io/en/latest/utilities.html#the-exporter
     
  • For PAPERLESS_OCR_LANGUAGES, set a default value of "eng" and include the following description:
    Container Variable: PAPERLESS_OCR_LANGUAGES.
    Space-separated list of 3-letter language codes used for OCR. List of valid codes available here: https://www.loc.gov/standards/iso639-2/php/code_list.php
     
  • How about adding the PAPERLESS_TIME_ZONE variable, defaulted to "UTC", with the following description: 
    Container Variable: PAPERLESS_TIME_ZONE.
    Override the default UTC time zone. For details see: https://docs.djangoproject.com/en/1.10/ref/settings/#std:setting-TIME_ZONE
     
  • How about adding the PAPERLESS_INLINE_DOC variable, defaulted to "false", with the following description:
    Container Variable: PAPERLESS_INLINE_DOC.
    When true, PDF files will be viewed in the browser. 
    When false, PDF files will be downloaded.
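To make the suggested defaults concrete outside the template editor, here is a dry-run sketch that prints the equivalent `docker run` volume and environment flags. It covers only the paths and variables listed above; ports and the container command are left out, `paperless_run_args` is a hypothetical helper, and `IMAGE` is a placeholder because the image name is not given here.

```shell
#!/usr/bin/env bash
# Dry-run sketch of the suggested template defaults as docker run flags.
# Assumption: IMAGE is a placeholder; set it to the image your template uses.
IMAGE="${IMAGE:-paperless-image-placeholder}"
APPDATA="/mnt/user/appdata/paperless"

paperless_run_args() {
  # One argument per line: -v host:container path pairs, then -e variables.
  printf '%s\n' \
    -v "$APPDATA/data:/usr/src/paperless/data" \
    -v "$APPDATA/media:/usr/src/paperless/media" \
    -v "$APPDATA/consume:/consume" \
    -v "$APPDATA/export:/export" \
    -e "PAPERLESS_OCR_LANGUAGES=eng" \
    -e "PAPERLESS_TIME_ZONE=UTC" \
    -e "PAPERLESS_INLINE_DOC=false"
}

# Collect the flags into an array and print (not run) the resulting command.
mapfile -t args < <(paperless_run_args)
echo docker run -d --name paperless-webserver "${args[@]}" "$IMAGE"
```

Collecting the flags into an array keeps each path as a single argument, so swapping `echo` for a real invocation would not break on paths containing spaces.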

 


Here are a couple of scripts that give a little insight into the paperless consumption process. I called them "pre" and "post", and put them in the "data" directory:

 

/mnt/user/appdata/paperless/data/pre

#!/usr/bin/env bash
# https://paperless.readthedocs.io/en/latest/consumption.html#hooking-into-the-consumption-process
# Quoted so that filenames containing spaces or brackets are printed verbatim.
echo "Begin pre-processing script"
echo "- Original filename: [${1}]"
echo "End pre-processing script"

/mnt/user/appdata/paperless/data/post

#!/usr/bin/env bash
# https://paperless.readthedocs.io/en/latest/consumption.html#hooking-into-the-consumption-process
# Quoted so that values containing spaces or brackets are printed verbatim.
echo "Begin post-processing script"
echo "- Document id:        [${1}]"
echo "- Generated filename: [${2}]"
echo "- Source path:        [${3}]"
echo "- Thumbnail path:     [${4}]"
echo "- Download URL:       [${5}]"
echo "- Thumbnail URL:      [${6}]"
echo "- Correspondent:      [${7}]"
echo "- Tags:               [${8}]"
echo "End post-processing script"

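Then make both scripts executable (for example, chmod +x /mnt/user/appdata/paperless/data/pre /mnt/user/appdata/paperless/data/post) and add these two variables to the paperless-consumer container: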

PAPERLESS_PRE_CONSUME_SCRIPT
/usr/src/paperless/data/pre

PAPERLESS_POST_CONSUME_SCRIPT
/usr/src/paperless/data/post

You can see the output by watching the paperless-consumer container logs (for example, with `docker logs -f paperless-consumer`).
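Before relying on the container logs, it can help to run the hooks by hand. The sketch below checks that both scripts are executable, then calls them with invented sample arguments, including a filename with spaces to confirm the arguments are quoted. `try_hooks` is a hypothetical helper, and the sample values do not come from a real paperless run.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run the pre/post hooks by hand with made-up arguments.
# Paths default to the ones used above; override with PRE=... POST=...
PRE="${PRE:-/mnt/user/appdata/paperless/data/pre}"
POST="${POST:-/mnt/user/appdata/paperless/data/post}"

try_hooks() {
  local script
  for script in "$PRE" "$POST"; do
    # The hook is launched as a program, so it needs the executable bit.
    if [[ ! -x "$script" ]]; then
      echo "not executable: $script (try: chmod +x \"$script\")" >&2
      return 1
    fi
  done
  # A filename containing spaces shows whether the scripts quote their arguments.
  "$PRE" "/consume/Scan 2020-01-06 invoice.pdf" || return 1
  # Eight invented values, in the order the post script above prints them.
  "$POST" 42 "0000042.pdf" "/media/0000042.pdf" "/media/0000042.png" \
    "/fetch/doc/42" "/fetch/thumb/42" "ACME" "bills,2020"
}
```

Source the file on the Unraid host and call `try_hooks`; set `PRE` and `POST` first if your scripts live elsewhere.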

On 1/6/2020 at 7:39 PM, ljm42 said:

I haven't fully decided if I will use paperless, but I have some suggestions for the template to help people get started faster. [...]


I would love to see this as a pull request for the docker template. A few minor remarks from my side:

  • I wouldn't add defaults for the Media, Consumption and Export paths, since this may mislead people into placing them in their appdata. I do like the descriptions, though!
  • I suggest adding `PAPERLESS_OCR_LANGUAGE`, defaulted to "eng", with the following description or similar: "Override the language that tesseract will attempt to use when parsing documents. Use a 3-letter language code consistent with ISO 639: https://www.loc.gov/standards/iso639-2/php/code_list.php".
  • Maybe add a short explanation of the Unraid Docker template installation, i.e. the separate paperless-webserver and paperless-consumer container instances.
  • Maybe add a warning about NFS and inotify to the 'Consumption' path, such as:

    "If you are using NFS mounts for the consume directory, you also need to change the command to turn off inotify, as it doesn't work with NFS: command: ["document_consumer", "--no-inotify"]". Maybe shorter?
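To illustrate the split between the two containers and the NFS workaround, here is a dry-run sketch that prints the consumer container's command. It uses a placeholder image name and the appdata paths suggested earlier in the thread (both assumptions), takes `--no-inotify` and `--loop-time 60` from the consumer setup steps quoted later in this thread, and adds a hypothetical `CONSUME_IS_NFS` switch so inotify is only turned off when the consume folder really is an NFS mount.

```shell
#!/usr/bin/env bash
# Dry-run sketch of the paperless-consumer container command.
# Assumptions: IMAGE is a placeholder; CONSUME_IS_NFS is a hypothetical switch.
IMAGE="${IMAGE:-paperless-image-placeholder}"
APPDATA="/mnt/user/appdata/paperless"

consumer_command() {
  local nfs="${1:-no}"
  local cmd=(document_consumer)
  # inotify does not work on NFS mounts, so fall back to polling the folder.
  if [[ "$nfs" == "yes" ]]; then
    cmd+=(--no-inotify)
  fi
  cmd+=(--loop-time 60)
  printf '%s\n' "${cmd[@]}"
}

mapfile -t consumer_args < <(consumer_command "${CONSUME_IS_NFS:-no}")
# Print (do not run) the command; no port mapping, the consumer has no web UI.
echo docker run -d --name paperless-consumer \
  -v "$APPDATA/data:/usr/src/paperless/data" \
  -v "$APPDATA/media:/usr/src/paperless/media" \
  -v "$APPDATA/consume:/consume" \
  "$IMAGE" "${consumer_args[@]}"
```

With `CONSUME_IS_NFS=yes`, the printed command ends in `document_consumer --no-inotify --loop-time 60`, matching the post argument quoted below.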

Edited by T0a
On 1/4/2020 at 6:43 PM, T0a said:

4. Install paperless as a consumer service to process documents in your /consume folder

4.1 Go to the Docker UI, click Add Container and select the paperless template

4.2 Rename the container to "paperless-consumer"

4.3 Remove the port mapping

4.4 Change the "Post Arguments" parameter to "document_consumer --no-inotify --loop-time 60". This starts the consumer service with a 60-second polling loop. It also disables the inotify feature, which does not work when the consume/ folder is an NFS share.

Thanks mate, this was GOLD :D


Sort of adjacent question: What scanners are people using with Paperless and, importantly, which are the least nonsense to use day to day? I'd like something that I can carelessly throw paper into and press one button to scan and upload, if I can. Home budget.

On 1/8/2020 at 1:45 AM, l3gion said:

Thanks mate, this was GOLD :D

 

Glad I could help!

 

3 hours ago, hpka said:

Sort of adjacent question: What scanners are people using with Paperless and, importantly, which are the least nonsense to use day to day? I'd like something that I can carelessly throw paper into and press one button to scan and upload, if I can. Home budget.

 

Do you mind asking this question again in the new paperless Docker template support thread? I already added a section for scanners. I also created a pull request to change the support thread link in the Docker template.

Edited by T0a
  • trurl locked this topic
9 minutes ago, T0a said:

Do you mind asking this question again in the new paperless Docker template support thread? I already added a section for scanners. I also created a pull request to change the support thread link in the Docker template.

Here is a link to T0a's support thread for this. Please go there for further support.

 

 

This topic is now closed to further replies.