[Support] Linuxserver.io - Papermerge



hell no -- so easy -- 🙈🤣

Seems to work, because now it says "OCR = GERMAN" (see screen) 

I'll verify that with some documents containing special German characters like Ü, Ä, Ö 😜

 

Since I have absolutely no knowledge here, please let me ask another question: if the solution is as easy as you describe, why is there a need for a "mod" instead of bundling the needed Tesseract OCR files (1540 kB of data, according to the shell log below) directly in the Docker container, giving all users a simple language switch?

 

And ...  I get some (maybe?) errors when running the shell command - as you can see here.

Anything to worry about? 

 

# apt-get install tesseract-ocr-deu
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following NEW packages will be installed:
  tesseract-ocr-deu
0 upgraded, 1 newly installed, 0 to remove and 1 not upgraded.
Need to get 745 kB of archives.
After this operation, 1540 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu focal/universe amd64 tesseract-ocr-deu all 1:4.00~git30-7274cfa-1 [745 kB]
Fetched 745 kB in 0s (3742 kB/s)          
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
        LANGUAGE = "en_US.UTF-8",
        LC_ALL = (unset),
        LANG = "en_US.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
debconf: unable to initialize frontend: Dialog
debconf: (No usable dialog-like program is installed, so the dialog based frontend cannot be used. at /usr/share/perl5/Debconf/FrontEnd/Dialog.pm line 76, <> line 1.)
debconf: falling back to frontend: Readline
debconf: unable to initialize frontend: Readline
debconf: (Can't locate Term/ReadLine.pm in @INC (you may need to install the Term::ReadLine module) (@INC contains: /etc/perl /usr/local/lib/x86_64-linux-gnu/perl/5.30.0 /usr/local/share/perl/5.30.0 /usr/lib/x86_64-linux-gnu/perl5/5.30 /usr/share/perl5 /usr/lib/x86_64-linux-gnu/perl/5.30 /usr/share/perl/5.30 /usr/local/lib/site_perl /usr/lib/x86_64-linux-gnu/perl-base) at /usr/share/perl5/Debconf/FrontEnd/Readline.pm line 7, <> line 1.)
debconf: falling back to frontend: Teletype
Selecting previously unselected package tesseract-ocr-deu.
(Reading database ... 10584 files and directories currently installed.)
Preparing to unpack .../tesseract-ocr-deu_1%3a4.00~git30-7274cfa-1_all.deb ...
Unpacking tesseract-ocr-deu (1:4.00~git30-7274cfa-1) ...
Setting up tesseract-ocr-deu (1:4.00~git30-7274cfa-1) ...

 

Bildschirmfoto 2020-10-06 um 20.22.43.png

Edited by Maddeen
10 hours ago, Maddeen said:

Since I have absolutely no knowledge here, please let me ask another question: if the solution is as easy as you describe, why is there a need for a "mod" instead of bundling the needed Tesseract OCR files (1540 kB of data, according to the shell log below) directly in the Docker container, giving all users a simple language switch?

I guess the mod is the cleanest solution, as it's provided by the builder of the container ;)

10 hours ago, Maddeen said:

And ...  I get some (maybe?) errors when running the shell command - as you can see here.

Anything to worry about? 

dunno.... it seems to work ;)

12 hours ago, b0mb said:

i guess the mod is the cleanest solution as its provided by the builder of the container

Mhhh ... isn't the cleanest solution always a full integration instead of providing a mod?

Eugen (the dev of Papermerge) works in Germany, and by default he provides Papermerge with both languages.

So IMHO this Docker container is itself a mod, because it doesn't provide the default files and has to be changed.

Maybe @linuxserver.io can shed some light on this?

Why don't you provide Papermerge "as it is" with both languages, instead of taking this detour with a container (English only) plus an extra mod for the container to bring back features (German) that are integrated by default? I can't understand that, but I honestly don't have any clue about Docker :)

And, additionally, your container seems to have a bug. As soon as I add a file, the CPU fires up to 100% to run the OCR analysis, which is fine.

But in my case this 100% CPU load never stops. I need to stop the whole container; after a restart everything is fine.
I tested for about 2 hours: constantly 100%. That can't be correct.
OCR analysis and extraction is my daily business at work (Abbyy, Kofax, Tesseract too), and a single page should be OCR-analyzed and fully extracted within seconds (with my i7 6700, even if the container is pinned to only two cores).

Can I provide you with anything so that you can fix it?

Thank you in advance.

We didn't add all languages because that increases the size of the container by over 600 MB, which I already explained in a post on the previous page.

I'm not sure why you would expect German to be there, as we are mostly English speaking and none of us are German. So it's not so strange that we didn't add the dependency for German OCR.

We use a mod for adding your language so it doesn't take up 600 MB of useless space for everyone else. If that is too much trouble for you, you don't have to use it.

Jump on our Discord if you want to try to solve the pegging of the core. We are not hanging out here, but on Discord there is always someone available to help, and not only linuxserver.io guys. It is probably not related to the container itself, but to either Papermerge or a dependency.

Edited by saarg
That wasn't a complaint. I'll wait, but I just want to understand it; that's why I'm asking. To me it isn't logical.
First of all, I don't know why you're talking about 600 MB when, as seen in my log above, the German Tesseract package takes only 1.5 MB of extra storage.
And yes, this is an English-speaking forum, but as far as I found out, Papermerge comes with all languages by default and the default language was German (because Eugen is a German developer).
So to me (a non-developer) it sounds easier to just transfer software into a container without any changes than to customize a product by cutting out default features/settings.

But no need to explain that. You're the devs and you do whatever you like, fine with me!

I'll join your Discord, thanks for this. Maybe someone will run into the same issues in the future.

If we add German, we have to add the other languages as well, and the full language pack for Tesseract is over 600 MB.

With the mod you just have to add an environment variable for the German Tesseract package.

We build Papermerge from source and need to supply a config file, as there isn't a default one, just an example that one has to change for one's needs.


After today's latest update I get an "internal server error" when opening Papermerge.

This is what the container log says:

 

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./manage.py", line 18, in <module>
    raise ImportError(
ImportError: Couldn't import Django. Are you sure it's installed and available on your PYTHONPATH environment variable? Did you forget to activate a virtual environment?
[uwsgi-daemons] respawning "/usr/bin/python3 ./manage.py worker" (uid: 1000 gid: 100)
Traceback (most recent call last):
  File "./manage.py", line 10, in <module>
    from django.core.management import execute_from_command_line
ModuleNotFoundError: No module named 'django'

edit: meanwhile the container has been fixed ;)
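
For anyone who runs into a similar traceback after a later update: the error only says that the interpreter running manage.py cannot find the django package. A small check like the one below, run with the same /usr/bin/python3 from the container console, would confirm that before digging further. It is only a diagnostic sketch; module_available is a helper name made up for this example, not part of Papermerge.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if the current interpreter can locate the top-level module."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # "django" is the module the traceback complains about.
    status = "found" if module_available("django") else "MISSING"
    print(f"django: {status}")
```

If this prints MISSING, the problem is in the image's Python environment rather than in your own configuration, which matches the fact that a fixed container image resolved it.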

Edited by b0mb
correction
2 hours ago, blaine07 said:

Does anyone have any idea whether there is a variable for the clock or for setting the timezone? Looking at the last user login, having just logged in, the time is WAY off.

Have you checked that the TZ env variable is in the run command and that it is correct?

Try to exec the date command inside the container and see if that is correct.

3 minutes ago, saarg said:

Have you checked that the TZ env variable is in the run command and that it is correct?

Try to exec the date command inside the container and see if that is correct.

It appears the TZ in the run command is correct. I'm not sure how exactly to run the date command inside the container, though? 😞

301594056_Screenshot2020-10-12at15_10_37.thumb.png.f83d370ce82b4d656bb496381ac9ef4d.png

 

3 hours ago, saarg said:

Click the container logo and choose console. Then type date and hit enter.

The terminal is reporting the correct time.

"# date
Mon Oct 12 18:36:08 CDT 2020
#"

 

Papermerge however in the GUI shows my login time as "Oct. 12, 2020, 11:36 p.m."

 

Something awry somewhere. DOH. It got the "36" part right. :scratches head:
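
The two times differ by exactly five hours, which is the CDT offset from UTC, so the GUI is most likely showing UTC rather than a wrong clock. A minimal Python sketch of that arithmetic, using a fixed UTC-5 offset for CDT (an assumption that is good enough for this one timestamp, not a full timezone database lookup):

```python
from datetime import datetime, timedelta, timezone

# Assumption: CDT is modelled as a fixed UTC-5 offset for this single timestamp.
CDT = timezone(timedelta(hours=-5), "CDT")

# The time reported by `date` in the container console.
console_time = datetime(2020, 10, 12, 18, 36, 8, tzinfo=CDT)

# The same instant expressed in UTC.
as_utc = console_time.astimezone(timezone.utc)

print("console:", console_time.isoformat())
print("as UTC: ", as_utc.isoformat())
```

The UTC value lands on 23:36 on Oct 12, the same "11:36 p.m." the GUI displays, which is why the minutes match and only the hour is off.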

6 hours ago, blaine07 said:

The terminal is reporting the correct time.

"# date
Mon Oct 12 18:36:08 CDT 2020
#"

 

Papermerge however in the GUI shows my login time as "Oct. 12, 2020, 11:36 p.m."

 

Something awry somewhere. DOH. It got the "36" part right. :scratches head:

Looks like you might have to set the correct timezone in papermerge. Isn't there any setting you can set?


@blaine07 it would appear my times are incorrect as well. When I review the logs via the GUI in Papermerge, the times displaying for me are about 5 hours ahead of what my actual time is. Wouldn't have noticed it without you pointing it out lol

 

Right now I'm having trouble dropping anything into the Queue folder. I've updated the config file like @blaine07 has, but for whatever reason I can't do anything with the Queue folder from my other computers on the network. It looks like a permissions issue, but I can't figure out how to update the permissions.


@ShortBusHero - that's why I set up a specific "inbox" folder. I can't copy any files to the queue folder either.

I guess the queue folder is NOT intended to be an inbox folder.
It's what it sounds like: a queue, not an inbox. Maybe it's used by the OCR when you upload a bunch of files, to queue them for the OCR function.

So just create another folder, e.g. "in", set it up in the appdata config file, and I'm pretty sure you'll be ready to go.

 

Here's my papermerge.conf.py

MEDIA_DIR = "/data/media"
STATIC_DIR = "/app/papermerge/static"
MEDIA_URL = "/media/"
STATIC_URL = "/static/"
IMPORTER_DIR = "/data/in"
IMPORTER_URL = "/in/"

LANGUAGE_CODE = "de-DE"

OCR_DEFAULT_LANGUAGE = "deu"

OCR_LANGUAGES = {
    "deu": "Deutsch",
}
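
A small sanity check for this kind of config, as a sketch only: the default OCR language should be one of the keys in OCR_LANGUAGES. The "eng" entry and the check_ocr_config helper are additions for illustration and are not part of the file above; every code listed also needs its Tesseract language package installed in the container.

```python
OCR_DEFAULT_LANGUAGE = "deu"

OCR_LANGUAGES = {
    "deu": "Deutsch",
    # Assumption for illustration: English data ships with the image by default.
    "eng": "English",
}


def check_ocr_config(default: str, languages: dict) -> list:
    """Return a list of human-readable problems; an empty list means the config looks consistent."""
    problems = []
    if not languages:
        problems.append("OCR_LANGUAGES is empty")
    if default not in languages:
        problems.append(f"OCR_DEFAULT_LANGUAGE {default!r} is not a key of OCR_LANGUAGES")
    return problems


if __name__ == "__main__":
    issues = check_ocr_config(OCR_DEFAULT_LANGUAGE, OCR_LANGUAGES)
    print(issues or "OCR language settings look consistent")
```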

 

@blaine07 my timestamps are also not correct. 

current time of my MacMini = 17:54

time in the papermerge log = 15:54

time in console (unraid and docker) after the command "date" = Tue Oct 13 17:54:36 CEST 2020

 

That "exactly 2 hours difference" seems to be a well-known problem when you google it: the log apparently uses UTC (the Linux default), while my local time is the Berlin time zone (+1 hour) plus European daylight saving time (+1 hour).
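
The offsets can be checked with a few lines of Python. This is a sketch with fixed offsets (CEST as UTC+2, CET as UTC+1) instead of a real timezone database, and utc_clock is a helper name invented for the example. It shows the log time matching UTC in October and suggests the gap should shrink to one hour once daylight saving time ends.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed offsets stand in for the Europe/Berlin zone rules.
CEST = timezone(timedelta(hours=2), "CEST")  # Berlin during daylight saving time
CET = timezone(timedelta(hours=1), "CET")    # Berlin during standard time


def utc_clock(local: datetime) -> str:
    """Return the HH:MM wall-clock time in UTC for an aware local datetime."""
    return local.astimezone(timezone.utc).strftime("%H:%M")


# Console time from the post (daylight saving time still active).
october = datetime(2020, 10, 13, 17, 54, tzinfo=CEST)
# A date after the switch back to standard time, for comparison.
november = datetime(2020, 11, 14, 12, 0, tzinfo=CET)

print("October local 17:54 ->", utc_clock(october), "UTC")
print("November local 12:00 ->", utc_clock(november), "UTC")
```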

 

 

  • 2 weeks later...

I'm just getting this set up, but /data/ contains the papermerge.db file along with all the uploaded files/PDFs, etc.

I would assume we all want that papermerge.db file in our appdata folders so it's on our SSDs and part of weekly backups, etc.

Is there a way to get this papermerge.db file into appdata while leaving the actual data (PDFs) somewhere else, such as on the array?

Or is this where we go into papermerge.conf.py and set MEDIA_DIR = "/data/media" to another path that we map, so that we can then point /data at the appdata folder as well?

 

Thanks in advance.

  • 2 weeks later...

Hey, I also use this tool in German. For all the non-English users: check out the mod.

 

https://github.com/linuxserver/docker-mods/tree/papermerge-multilangocr

 

With this you can add additional languages.

 

The problem with installing additional languages from the console in the container is that they are gone with every update.

See the config here for German:

XqPbMTX.png

 

Edited by d3fault
  • 2 weeks later...

Hey,

 

love the docker so far.

I was really excited about the download feature, because it also downloads the whole folder structure, which makes it easier for me to back up the files.

But for some reason I can only successfully download certain folders.

When I try to download the "other" folders I get a damaged 0-byte file.

Unfortunately there is nothing in the logs.

 

 


@d3fault - thanks, that seems to work :)

 

But can you please check the timestamps in your log against the real time?

I set my timezone to Europe/Berlin as well, but the time in the logs differs by one hour from the current time.

726979649_Bildschirmfoto2020-11-14um12_24_15.png.b8067f078a4de5d6c78a9a32b486b984.png

 

@CorneliousJD I found that parameter in settings.py.

Maybe you can change this, but I'm not a Docker and/or Papermerge specialist, so try at your own risk.

But if you get it working, please leave some feedback for others.

 

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.sqlite3",
        "NAME": "/data/papermerge.db",
    }
}
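
Untested, but the change would presumably look something like the sketch below. The /config/papermerge.db path is hypothetical and assumes /config is the volume mapped to appdata. Whether the same override is picked up from papermerge.conf.py instead of settings.py is also an assumption, and edits inside the image itself will probably be lost on the next container update. Stop the container and copy the existing papermerge.db to the new location before trying this.

```python
# Sketch only: point the SQLite database at the appdata volume.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.sqlite3",
        # Hypothetical path; assumes /config is mapped to the appdata share.
        "NAME": "/config/papermerge.db",
    }
}

# Uploaded documents stay on the /data volume (e.g. on the array).
MEDIA_DIR = "/data/media"
```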

 

Bildschirmfoto 2020-11-14 um 11.24.03.png

Edited by Maddeen
