
[Support] Paperless GPT

[Template only, I am not the container author/maintainer]

 

Project: https://github.com/icereed/paperless-gpt

Support: https://github.com/icereed/paperless-gpt/issues

Registry: https://hub.docker.com/r/icereed/paperless-gpt

 

paperless-gpt seamlessly pairs with paperless-ngx to generate AI-powered document titles and tags, saving you hours of manual sorting. While other tools may offer AI chat features, paperless-gpt stands out by supercharging OCR with LLMs, ensuring high accuracy, even with tricky scans. If you're craving next-level text extraction and effortless document organization, this is your solution.
Description of container variables: https://github.com/icereed/paperless-gpt?tab=readme-ov-file#configuration

  • 2 months later...

There is an update for paperless-gpt that adds the Mistral API as an option. I updated the container, but the template has no fields for the new variables. Would it be possible to update the template?

Edited by MrNossi

  • 7 months later...

Posting the steps I took to successfully use the "GLM-OCR" model for OCR duties in paperless-gpt with an ollama backend.

1) Make sure you get glm-ocr from ollama.com. I'm using the "bf16" version rather than the "q8_0", since VRAM is not an issue for me. I would avoid using any other .gguf (for example, one from huggingface.co) with the ollama backend, as I think those versions were assembled for other backends and could be slightly different.

2) I'm using context lengths between 8k and 64k with this model. Anything smaller or larger than that results in poor output or the model crashing for me.

8k = 8192

16k = 16384

32k = 32768

64k = 65536
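For reference, the token counts above are just multiples of 1024, and Ollama's REST API accepts a per-request context length through the `num_ctx` option. Below is a minimal sketch that only builds such a request body without sending it. The model tag `glm-ocr:bf16` and the helper names are assumptions for illustration, not something taken from paperless-gpt itself.

```python
import json

def context_tokens(k: int) -> int:
    """Convert a context size given in 'k' to a token count (1k = 1024)."""
    return k * 1024

def build_generate_payload(model: str, prompt: str, k: int) -> dict:
    """Build a request body for Ollama's /api/generate with a fixed context length."""
    return {
        "model": model,          # assumed tag; check the exact name on ollama.com
        "prompt": prompt,
        "stream": False,
        "options": {"num_ctx": context_tokens(k)},
    }

# Hypothetical usage: a 16k context for one OCR request
payload = build_generate_payload("glm-ocr:bf16", "Extract text from the image.", 16)
print(json.dumps(payload["options"]))
```

How paperless-gpt itself passes the context length to Ollama depends on its configuration, so check the project README for the setting it actually uses.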

3) I changed the ocr prompt in paperless-gpt to the following:

-----------------------------------------------------------------------------------------------

Extract text from the provided document/image while preserving all original formatting and layout. Use precision mode on. Output a text file with these requirements:

• Maintain original spacing between paragraphs and lines.

• Preserve special characters and mathematical notation (use ... for equations).

• Do not add any metadata, comments, or modifications beyond what appears in the document.

• For unclear text, leave as-is without correction.

• Ignore non-text elements unless they contain readable text.

Output should be a clean text file containing only the document's content.

-----------------------------------------------------------------------------------------------

The model has 3 modes: text recognition, table recognition, and figure recognition. I believe the output we want is text, since that is what the "Content" field in paperless-ngx supports. For example, when you get HTML output, I believe the model is trying to recreate a table via table recognition mode.

The above ocr prompt, for me, does a good job of steering the model to focus on text extraction mode.

I've successfully used this on images and pdfs.
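If you want to check whether a given OCR result slipped into table recognition mode anyway, a rough heuristic is to count HTML table tags in the output. This is only a sketch. The function name and the threshold of 3 tags are my own assumptions, not part of paperless-gpt or the model.

```python
import re

# HTML tags that suggest the model answered in table recognition mode (assumed heuristic)
TABLE_TAG = re.compile(r"</?(table|tr|td|th)\b[^>]*>", re.IGNORECASE)

def looks_like_table_mode(ocr_text: str, threshold: int = 3) -> bool:
    """Return True if the OCR output contains at least `threshold` HTML table tags."""
    return len(TABLE_TAG.findall(ocr_text)) >= threshold

html_output = "<table><tr><td>1</td><td>2</td></tr></table>"
plain_output = "Invoice 42\nTotal: 10 < 20"

print(looks_like_table_mode(html_output))
print(looks_like_table_mode(plain_output))
```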

You are right! Thank you. I was experimenting with long financial tables and went for table mode, which was likely the reason for the <td> spam. And increasing the context window, more or less at random, solved that "Asian signs" problem.

However, since my use case also needs the model to describe graphical elements and pictures from time to time, I went back to qwen3-vl. That one still produces random Markdown garbage on rotated tables with almost no content (zeros) and is stupid slow (about 90 min per 30 pages on hard documents), but it is still the best I have achieved so far. It would be awesome if paperless-gpt could be configured to fire multiple requests at the LLM at once. I am running vllm-rocm as the backend.
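To illustrate what "multiple requests at once" would mean, here is a minimal sketch of OCR-ing pages concurrently with a thread pool. It is not a paperless-gpt feature. `ocr_pages_concurrently` and `fake_ocr` are hypothetical names, and `fake_ocr` stands in for a real call to the backend.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

def ocr_pages_concurrently(
    pages: List[bytes],
    ocr_page: Callable[[bytes], str],
    max_workers: int = 4,
) -> List[str]:
    """Run ocr_page on every page in parallel; results keep the page order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(ocr_page, pages))

def fake_ocr(page: bytes) -> str:
    """Stand-in for a real request to the LLM backend (hypothetical)."""
    return page.decode().upper()

results = ocr_pages_concurrently([b"page 1", b"page 2", b"page 3"], fake_ocr)
print(results)
```

Whether this helps in practice depends on how many requests the backend can batch at the same time.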

  • 1 month later...

According to the paperless-gpt GitHub page, there should be support for docling as the OCR provider. There are no options for the docling server URL in the template, and I was not successful adding it myself. Any suggestions or experience on how to get it to work?
