Maddeen Posted October 6, 2020

@b0mb - mhhh, that sounds good, but how do I get to the Docker shell?!? Sorry - I'm relatively new to unRAID AND Docker as well... 🙈

German interface changed successfully - thanks for that, too!!

b0mb Posted October 6, 2020

4 minutes ago, Maddeen said:
    mhhh, that sounds good, but how do I get to the Docker shell?!? Sorry - I'm relatively new to unRAID AND Docker as well... 🙈

unRAID Web Interface => Docker => Papermerge container icon => left mouse button => >_Console

Maddeen Posted October 6, 2020 (edited)

Hell no -- so easy -- 🙈🤣

Seems to work, because now it says "OCR = GERMAN" (see screen). I'll verify that with some documents containing special German characters like Ü, Ä, Ö 😜

Since I know next to nothing about this, please let me ask another question: if the solution is as easy as you describe, why is there a need for a "mod" instead of shipping the needed Tesseract OCR files (1540 kB of data, according to the shell log below) directly in the Docker container and giving all users a simple language switch?

And ... I get some (maybe?) errors when running the shell command, as you can see here. Anything to worry about?

    # apt-get install tesseract-ocr-deu
    Reading package lists... Done
    Building dependency tree
    Reading state information... Done
    The following NEW packages will be installed:
      tesseract-ocr-deu
    0 upgraded, 1 newly installed, 0 to remove and 1 not upgraded.
    Need to get 745 kB of archives.
    After this operation, 1540 kB of additional disk space will be used.
    Get:1 http://archive.ubuntu.com/ubuntu focal/universe amd64 tesseract-ocr-deu all 1:4.00~git30-7274cfa-1 [745 kB]
    Fetched 745 kB in 0s (3742 kB/s)
    perl: warning: Setting locale failed.
    perl: warning: Please check that your locale settings:
        LANGUAGE = "en_US.UTF-8",
        LC_ALL = (unset),
        LANG = "en_US.UTF-8"
        are supported and installed on your system.
    perl: warning: Falling back to the standard locale ("C").
    debconf: unable to initialize frontend: Dialog
    debconf: (No usable dialog-like program is installed, so the dialog based frontend cannot be used. at /usr/share/perl5/Debconf/FrontEnd/Dialog.pm line 76, <> line 1.)
    debconf: falling back to frontend: Readline
    debconf: unable to initialize frontend: Readline
    debconf: (Can't locate Term/ReadLine.pm in @INC (you may need to install the Term::ReadLine module) (@INC contains: /etc/perl /usr/local/lib/x86_64-linux-gnu/perl/5.30.0 /usr/local/share/perl/5.30.0 /usr/lib/x86_64-linux-gnu/perl5/5.30 /usr/share/perl5 /usr/lib/x86_64-linux-gnu/perl/5.30 /usr/share/perl/5.30 /usr/local/lib/site_perl /usr/lib/x86_64-linux-gnu/perl-base) at /usr/share/perl5/Debconf/FrontEnd/Readline.pm line 7, <> line 1.)
    debconf: falling back to frontend: Teletype
    Selecting previously unselected package tesseract-ocr-deu.
    (Reading database ... 10584 files and directories currently installed.)
    Preparing to unpack .../tesseract-ocr-deu_1%3a4.00~git30-7274cfa-1_all.deb ...
    Unpacking tesseract-ocr-deu (1:4.00~git30-7274cfa-1) ...
    Setting up tesseract-ocr-deu (1:4.00~git30-7274cfa-1) ...

Edited October 6, 2020 by Maddeen

b0mb Posted October 7, 2020

10 hours ago, Maddeen said:
    Since I know next to nothing about this, please let me ask another question: if the solution is as easy as you describe, why is there a need for a "mod" instead of shipping the needed Tesseract OCR files (1540 kB of data, according to the shell log below) directly in the Docker container and giving all users a simple language switch?

I guess the mod is the cleanest solution, as it's provided by the builder of the container.

10 hours ago, Maddeen said:
    And ... I get some (maybe?) errors when running the shell command, as you can see here. Anything to worry about?

Dunno... it seems to work.

Maddeen Posted October 7, 2020

12 hours ago, b0mb said:
    I guess the mod is the cleanest solution, as it's provided by the builder of the container.

Mhhh .. isn't the cleanest solution always a full integration instead of providing a mod? Eugen (the dev of Papermerge) works in Germany, and by default he provides Papermerge with both languages. So IMHO this Docker image is the mod, because it doesn't provide the default files and has to be changed.

Maybe @linuxserver.io can shed some light on this? Why don't you provide Papermerge "as it is" with both languages, instead of taking this detour with a Docker image (English only) plus an extra mod to bring back features (German) that are integrated by default? I can't understand that - but honestly, I don't have a clue about Docker.

And, additionally, your Docker image seems to have a bug. As soon as I add a file, the CPU goes to 100% to run the OCR analysis - fine. But in my case this 100% CPU load never stops. I need to stop the whole container; after a restart everything is fine. I tested for about 2 hours: constantly 100%. That can't be correct. OCR analysis and extraction is my daily business at work (Abbyy, Kofax, Tesseract too), and a single page should be OCR-analyzed and fully extracted within seconds (with my i7 6700, even if the container is pinned to only two cores).

Can I provide you with anything so that you can fix it? Thank you in advance.

saarg Posted October 7, 2020 (edited)

1 hour ago, Maddeen said:
    Mhhh .. isn't the cleanest solution always a full integration instead of providing a mod? Eugen (the dev of Papermerge) works in Germany, and by default he provides Papermerge with both languages. So IMHO this Docker image is the mod, because it doesn't provide the default files and has to be changed.
    Maybe @linuxserver.io can shed some light on this? Why don't you provide Papermerge "as it is" with both languages, instead of taking this detour with a Docker image (English only) plus an extra mod to bring back features (German) that are integrated by default? I can't understand that - but honestly, I don't have a clue about Docker.
    And, additionally, your Docker image seems to have a bug. As soon as I add a file, the CPU goes to 100% to run the OCR analysis - fine. But in my case this 100% CPU load never stops. I need to stop the whole container; after a restart everything is fine. I tested for about 2 hours: constantly 100%. That can't be correct. OCR analysis and extraction is my daily business at work (Abbyy, Kofax, Tesseract too), and a single page should be OCR-analyzed and fully extracted within seconds (with my i7 6700, even if the container is pinned to only two cores).
    Can I provide you with anything so that you can fix it? Thank you in advance.

We didn't add all languages because that increases the size of the container by over 600 MB, which I already explained in a post on the previous page. I'm not sure why you would expect German to be there, as we are mostly English speaking and none of us is German. So it's not so strange that we didn't add the dependency for German OCR. We use a mod for adding your language so it doesn't take up 600 MB of useless space for everyone else. If that is too much trouble for you, you don't have to use it.

Jump on our Discord if you want to try to solve the pegging of the core. We are not hanging out here, but on Discord there is always someone available to help, and not only linuxserver.io guys. It is probably not related to the container itself, but to either Papermerge or a dependency.

Edited October 7, 2020 by saarg

Maddeen Posted October 7, 2020

19 minutes ago, saarg said:
    We didn't add all languages because that increases the size of the container by over 600 MB, which I already explained in a post on the previous page. I'm not sure why you would expect German to be there, as we are mostly English speaking and none of us is German. So it's not so strange that we didn't add the dependency for German OCR. We use a mod for adding your language so it doesn't take up 600 MB of useless space for everyone else. If that is too much trouble for you, you don't have to use it.

That wasn't a complaint - I'll wait - but I just want to understand it, that's why I'm asking. To me it's not logical.

First, I don't know why you're talking about 600 MB when, as seen in my log above, the German Tesseract package takes only 1.5 MB of extra storage. And yes, this is an English-speaking forum, but as far as I found out, Papermerge comes with all languages by default, and the default language was German (because Eugen is a German developer).

So (to me, a non-developer) it sounds easier to just transfer a piece of software into a Docker image without any changes than to customize the product by cutting out default features/settings. But no need to explain that --- you're the devs and you do whatever you like - fine with me!

I'll join your Discord - thanks for this. Maybe someone will run into the same issues in the future.

b0mb Posted October 7, 2020

3 minutes ago, Maddeen said:
    First, I don't know why you're talking about 600 MB when, as seen in my log above, the German Tesseract package takes only 1.5 MB of extra storage.

I totally agree with this.

saarg Posted October 7, 2020

18 minutes ago, Maddeen said:
    That wasn't a complaint - I'll wait - but I just want to understand it, that's why I'm asking. To me it's not logical.
    First, I don't know why you're talking about 600 MB when, as seen in my log above, the German Tesseract package takes only 1.5 MB of extra storage. And yes, this is an English-speaking forum, but as far as I found out, Papermerge comes with all languages by default, and the default language was German (because Eugen is a German developer).
    So (to me, a non-developer) it sounds easier to just transfer a piece of software into a Docker image without any changes than to customize the product by cutting out default features/settings. But no need to explain that --- you're the devs and you do whatever you like - fine with me!
    I'll join your Discord - thanks for this. Maybe someone will run into the same issues in the future.

If we add German, we have to add other languages also, and the full language pack for Tesseract is over 600 MB. With the mod you just have to add an environment variable for the German Tesseract package.

We build Papermerge from source and need to supply a config file, as there isn't a default one, just an example that one has to change to suit one's needs.

b0mb Posted October 7, 2020

2 minutes ago, saarg said:
    With the mod you just have to add an environment variable for the German Tesseract package.

Thx a lot! The mod is a nice solution to satisfy the mob.

b0mb Posted October 11, 2020 (edited)

After today's latest update I get "internal server error" when opening Papermerge. This is what the container log says:

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "./manage.py", line 18, in <module>
        raise ImportError(
    ImportError: Couldn't import Django. Are you sure it's installed and available on your PYTHONPATH environment variable? Did you forget to activate a virtual environment?
    [uwsgi-daemons] respawning "/usr/bin/python3 ./manage.py worker" (uid: 1000 gid: 100)
    Traceback (most recent call last):
      File "./manage.py", line 10, in <module>
        from django.core.management import execute_from_command_line
    ModuleNotFoundError: No module named 'django'

Edit: meanwhile the container has been fixed.

Edited October 12, 2020 by b0mb (correction)

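For anyone who runs into a similar `ModuleNotFoundError`, a quick way to confirm whether a package is visible to a given interpreter is to ask Python's import machinery directly. The snippet below is only a minimal diagnostic sketch; the helper name `module_available` is made up for illustration, and it assumes you run it with the same interpreter the log mentions (`/usr/bin/python3`) from the container console.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if the top-level module `name` can be found on the import path."""
    # find_spec returns None when no importer can locate a top-level module.
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # "django" is the module the traceback above complains about.
    status = "found" if module_available("django") else "missing"
    print(f"django: {status}")
```

If this reports "missing", the problem is in the image itself (as it apparently was here) and not in your own configuration.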
Maddeen Posted October 12, 2020

And I can confirm that the bug I mentioned, the constant 100% CPU load, is gone, too! Nice work!

blaine07 Posted October 12, 2020

Does anyone have any idea if there is a variable for the clock or for setting the timezone? Looking at the last user login, having just logged in, the time is WAY off.

saarg Posted October 12, 2020

2 hours ago, blaine07 said:
    Does anyone have any idea if there is a variable for the clock or for setting the timezone? Looking at the last user login, having just logged in, the time is WAY off.

Have you checked that the TZ env variable is in the run command and that it is correct? Try to exec the date command inside the container and see if that is correct.

blaine07 Posted October 12, 2020

3 minutes ago, saarg said:
    Have you checked that the TZ env variable is in the run command and that it is correct? Try to exec the date command inside the container and see if that is correct.

It appears the TZ in the run command is correct. Not sure how exactly to run the date command inside the container, though? 😞

saarg Posted October 12, 2020

9 minutes ago, blaine07 said:
    It appears the TZ in the run command is correct. Not sure how exactly to run the date command inside the container, though? 😞

Click the container logo and choose Console. Then type date and hit Enter.

blaine07 Posted October 12, 2020

3 hours ago, saarg said:
    Click the container logo and choose Console. Then type date and hit Enter.

The terminal is reporting the correct time:

    # date
    Mon Oct 12 18:36:08 CDT 2020

Papermerge, however, shows my login time in the GUI as "Oct. 12, 2020, 11:36 p.m." Something is awry somewhere. DOH. It got the "36" part right. :scratches head:

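The two values in this post differ by exactly five hours, which is what you would see if the GUI were displaying UTC while the console uses CDT (UTC-5). That reading is an assumption drawn from the numbers, not something confirmed by the Papermerge developers. A minimal sketch of the arithmetic, using a fixed UTC-5 offset so it does not depend on the system's timezone database:

```python
from datetime import datetime, timedelta, timezone

# CDT is UTC-5; a fixed offset is enough for this single timestamp.
CDT = timezone(timedelta(hours=-5), "CDT")


def to_utc(local: datetime) -> datetime:
    """Convert a timezone-aware datetime to UTC."""
    return local.astimezone(timezone.utc)


# The time printed by `date` in the container console.
console_time = datetime(2020, 10, 12, 18, 36, 8, tzinfo=CDT)
gui_time = to_utc(console_time)

print("console:", console_time.isoformat())
print("as UTC: ", gui_time.isoformat())
```

The converted value lands on 23:36 the same day, matching the "11:36 p.m." shown in the GUI, so the container clock itself looks fine and only the display timezone is off.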
saarg Posted October 13, 2020

6 hours ago, blaine07 said:
    The terminal is reporting the correct time: "# date Mon Oct 12 18:36:08 CDT 2020". Papermerge, however, shows my login time in the GUI as "Oct. 12, 2020, 11:36 p.m." Something is awry somewhere. DOH. It got the "36" part right. :scratches head:

Looks like you might have to set the correct timezone in Papermerge. Isn't there a setting for that?

blaine07 Posted October 13, 2020

saarg said:
    Looks like you might have to set the correct timezone in Papermerge. Isn't there a setting for that?

I'm guessing you're right, but I'm not seeing it anywhere. The settings menu is VERY limited, and I'm not seeing a variable to set anywhere. Is anyone else's time grossly incorrect?

ShortBusHero Posted October 13, 2020

@blaine07 it would appear my times are incorrect as well. When I review the logs via the GUI in Papermerge, the times displayed are about 5 hours ahead of my actual time. Wouldn't have noticed it without you pointing it out lol

Right now, I am having some difficulty getting access to drop anything into the Queue folder. I've updated the config file like @blaine07 has, but for whatever reason I am unable to do anything with the Queue folder from my other computers on the network. It seems like a permissions issue, but I can't figure out how to update the permissions.

Maddeen Posted October 13, 2020

@ShortBusHero - that's why I set up a specific "inbox" folder. I can't copy any files to the queue folder either. I guess the queue folder is NOT supposed to be an inbox folder. It's what it sounds like: a queue, not an inbox. Maybe it's used for OCR when you upload a bunch of files, to queue them for the OCR function.

So just create another folder, e.g. "in", set it up in the appdata config file, and I'm pretty sure you'll be ready to go. Here's my papermerge.conf.py:

    MEDIA_DIR = "/data/media"
    STATIC_DIR = "/app/papermerge/static"
    MEDIA_URL = "/media/"
    STATIC_URL = "/static/"
    IMPORTER_DIR = "/data/in"
    IMPORTER_URL = "/in/"
    LANGUAGE_CODE = "de-DE"
    OCR_DEFAULT_LANGUAGE = "deu"
    OCR_LANGUAGES = {
        "deu": "Deutsch",
    }

@blaine07 my timestamps are also not correct:

    current time on my Mac mini = 17:54
    time in the Papermerge log = 15:54
    time in the console (unRAID and Docker) after the command "date" = Tue Oct 13 17:54:36 CEST 2020

That "exact 2 hours difference" thing seems to be a well-known problem when you google it .. some kind of UTC (Linux standard) combined with another time zone (Berlin, +1 hour) AND European daylight saving time (+1 hour) ...

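Since papermerge.conf.py is plain Python made of simple assignments, a config like the one in this post can be sanity-checked before restarting the container. The sketch below is hedged: the helper names `load_conf` and `check_ocr_settings` are invented for illustration, and the rule that the default OCR language should also appear in `OCR_LANGUAGES` is an assumption, not documented Papermerge behavior. Only the setting names are taken from the post.

```python
# A trimmed copy of the settings from the post, embedded so the example is self-contained.
CONF_TEXT = '''
IMPORTER_DIR = "/data/in"
OCR_DEFAULT_LANGUAGE = "deu"
OCR_LANGUAGES = {
    "deu": "Deutsch",
}
'''


def load_conf(text):
    """Execute config text and return only its UPPER_CASE settings."""
    namespace = {}
    exec(compile(text, "papermerge.conf.py", "exec"), namespace)
    return {key: value for key, value in namespace.items() if key.isupper()}


def check_ocr_settings(conf):
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []
    default = conf.get("OCR_DEFAULT_LANGUAGE")
    languages = conf.get("OCR_LANGUAGES", {})
    if default is None:
        problems.append("OCR_DEFAULT_LANGUAGE is not set")
    elif default not in languages:
        problems.append(f"{default!r} is missing from OCR_LANGUAGES")
    return problems


if __name__ == "__main__":
    for problem in check_ocr_settings(load_conf(CONF_TEXT)) or ["no problems found"]:
        print(problem)
```

To check a real file, read its contents into a string and pass that to `load_conf`; only do this with a config file you trust, because `exec` runs whatever the file contains.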
CorneliousJD Posted October 22, 2020

Just getting this set up, but /data/ contains the papermerge.db file along with all the uploaded files/PDFs, etc.

I would assume we all want that papermerge.db file in our appdata folders so it's on our SSDs and part of weekly backups, etc.

Is there a way to get this papermerge.db file into appdata while leaving the actual data (PDFs) somewhere else, such as on the array?

Or is this where we go into papermerge.conf.py and set MEDIA_DIR = "/data/media" to another path we map, so that we can then point /data at the appdata folder as well?

Thanks in advance.

d3fault Posted November 3, 2020 (edited)

Hey, I also use this tool in German, and for all the non-English users: check out the mod.

https://github.com/linuxserver/docker-mods/tree/papermerge-multilangocr

With this you can add additional languages. The problem with installing additional languages from the console in the container is that they are gone with every update.

See the config here for German:

Edited November 3, 2020 by d3fault

Adeon Posted November 12, 2020

Hey, love the Docker container so far. I was really excited about the download feature, because it also downloads the whole folder structure, which makes it easier for me to back up the files.

But for some reason I can only successfully download certain folders. When I try to download the "other" folders I get a damaged 0-byte file. Unfortunately there is nothing in the logs.

Maddeen Posted November 14, 2020 (edited)

@d3fault - thanks, that seems to work. But can you please compare the timestamps in your log with the real time? I set my timezone to Europe/Berlin as well, but the time in the logs differs from the current time by one hour.

@CorneliousJD I found this parameter in settings.py. Maybe you can change it - but I'm not a Docker and/or Papermerge specialist, so try at your own risk. If you get it working, please leave some feedback for others.

    DATABASES = {
        "default": {
            "ENGINE": "django.db.backends.sqlite3",
            "NAME": "/data/papermerge.db",
        }
    }

Edited November 14, 2020 by Maddeen

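To make the idea concrete, here is an untested sketch of what splitting the two locations could look like in Django-style settings. Several things are assumptions and not confirmed anywhere in this thread: that `/config` is the container path mapped to appdata, that `/data` stays mapped to a share on the array, and that the container actually honors a `DATABASES` override placed outside settings.py. Treat both paths as hypothetical placeholders.

```python
from pathlib import PurePosixPath

# Assumption: container path mapped to the appdata share (hypothetical).
APPDATA_DIR = PurePosixPath("/config")
# Assumption: container path mapped to the array share that holds the documents.
DOCUMENTS_DIR = PurePosixPath("/data")

# Same structure as the snippet from settings.py above, with only NAME changed.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.sqlite3",
        "NAME": str(APPDATA_DIR / "papermerge.db"),
    }
}

# Uploaded files stay where they were, as in the papermerge.conf.py posted earlier.
MEDIA_DIR = str(DOCUMENTS_DIR / "media")
```

If you try something like this, stop the container and copy the existing papermerge.db to the new location first, and keep a backup of the original file.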