Full Text Search


ChrisW1337


Hello,

I have several scanned PDFs on my Unraid server. Until recently I used desktop search tools (Copernic and dtSearch) to find the documents I need.

I have tried all the related Unraid Docker containers, such as Paperless. However, I want to keep the PDFs in my own directory structure. I don't want them stored in a database, and I don't want to be forced to upload them through a web interface to make them searchable.

So far I haven't found a full-text search engine with a decent web UI where I can search and then click a link to open the document.

I tried Elasticsearch with Kibana but got an "operation too complex" error on layer 8.

Does anyone have a tip on what I could try or where to look? It would be great if it were free or open source.

Thanks

Chris

  • 1 month later...
  • 10 months later...
  • 4 months later...

I have this question too. So far I've looked at:

  • Solr
  • Elasticsearch

So yes, I would also like to hear of an alternative, simpler solution.

In the meantime, my current approach follows (I welcome suggestions, or others joining me on this journey).

I think I got closer with Solr. In any case, I found the Elasticsearch site unclear about whether I would need to buy a license to run Elasticsearch on a server.

So currently I'm pursuing Solr.

The current issue is:

solr 19:28:30.53 INFO  ==> ** Starting solr setup **
solr 19:28:30.55 INFO  ==> Validating settings in SOLR_* env vars...
solr 19:28:30.55 INFO  ==> Initializing Solr ...
realpath: /bitnami/solr/data: No such file or directory
solr 19:28:30.56 INFO  ==> Configuring file permissions for Solr
mkdir: cannot create directory '/bitnami/solr': Permission denied

 

Searching through the various links, I found:

https://hub.docker.com/r/bitnami/solr/

TL;DR: I think I need to either tinker with the Docker image (https://github.com/bitnami/containers/blob/main/bitnami/solr/docker-compose.yml) or find out how to do what the documentation describes:

"mount a volume in the desired location and setting the environment variable with the customized value (as it is pointed above, the default value is data_driven_schema_configs)"

So I'm going to investigate "data_driven_schema_configs", as I think this would persist even if the container were modified by the maintainer.

  • 6 months later...

Is there any solution in the meantime?

I still have no way to run a full-text search over my files (PDF, ...) on the server other than with Copernic or Total Commander.

I still can't get Elasticsearch running.

And I don't want to import my files into a DMS (although I'm worn down at this point and might do so, just for the sake of searching).

I also couldn't get YaCy to crawl my documents locally.


What I have tried now:
* Installed the rdesktop container with the Ubuntu MATE tag: lscr.io/linuxserver/rdesktop:ubuntu-mate
* Gave it a path to my PDFs on the server
* Installed Java and DocFetcher inside it

Now I can connect to DocFetcher via remote desktop.

That works, but I don't know what happens after an update or a reboot.
 

  • 3 weeks later...
On 2/28/2023 at 6:56 AM, JohnGAG said:

I think I got closer with Solr. In any case, I found the Elasticsearch site unclear about whether I would need to buy a license to run Elasticsearch on a server.

So currently I'm pursuing Solr.

The current issue is:

solr 19:28:30.53 INFO  ==> ** Starting solr setup **
solr 19:28:30.55 INFO  ==> Validating settings in SOLR_* env vars...
solr 19:28:30.55 INFO  ==> Initializing Solr ...
realpath: /bitnami/solr/data: No such file or directory
solr 19:28:30.56 INFO  ==> Configuring file permissions for Solr
mkdir: cannot create directory '/bitnami/solr': Permission denied

 

In case someone else comes across this thread looking for help with this error: I was able to get past it by manually creating that folder and setting its permissions.

Roughly, from the Unraid console:

cd /mnt/user/appdata/solr
mkdir solr
chown nobody:users solr

 

 

After this, the container starts up and the WebGUI comes up.
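
The same fix could be wrapped in a small, repeatable script. This is only a sketch: the function name prepare_volume_dir is made up for illustration, and the path and owner in the example call are simply the ones from the commands above, so adjust them to your own share layout.

```shell
#!/bin/sh
# Sketch: prepare a host directory that a container will use as its data volume.
# prepare_volume_dir is a hypothetical helper, not part of Unraid or the Solr image.

prepare_volume_dir() {
    dir="$1"      # host directory to create
    owner="$2"    # optional user:group, e.g. nobody:users

    # -p creates missing parent directories and does not fail if the directory exists
    mkdir -p "$dir" || return 1

    # Only change ownership when an owner was given (this usually needs root)
    if [ -n "$owner" ]; then
        chown "$owner" "$dir" || return 1
    fi

    # Print the resulting owner and permissions for a quick visual check
    ls -ld "$dir"
}

# Example call matching the commands in this post (assumed path and owner):
# prepare_volume_dir /mnt/user/appdata/solr/solr nobody:users
```

Because mkdir -p is used, running it a second time is harmless, which may help if the folder has to be recreated later.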

Now to work out how to use this thing...

  • 6 months later...

I am no closer to a solution than when I started this thread.

I have worked intensively with Paperless, but since I have tons of existing PDFs in a neat directory structure with consistent naming, Paperless is of no great help.

Are there any new options, like AI training tools or similar? I'm still wondering why I'm (almost) the only one with this problem.
