rob_robot Posted September 30, 2020 (edited)

This guide is based on the Samba wiki article "Spotlight with Elasticsearch Backend": https://wiki.samba.org/index.php/Spotlight_with_Elasticsearch_Backend

The goal of this project is to use the Mac Finder to search SMB shares from Mac clients. The solution gives us an index-based full-text search, something I've been waiting on for a long time. Recently added extensions in Samba finally made this possible.

To begin with, I want to say that I'm neither an Unraid nor a Docker expert, so please forgive me if there are better ways to solve this, but my solution seems to be working for now.

To realise this, we first need the latest Unraid beta (6.9.0-beta25 at the time of writing), because the feature requires Samba 4.12, and this Samba revision only ships with the beta. We also need an Elasticsearch Docker container that acts as the search backend, plus an FSCrawler Docker container that crawls the data on a regular basis and feeds the results to Elasticsearch, which then builds the index. Lastly, we enable the Samba settings for Spotlight search support.

The high-level interaction of the tools looks like this:

FSCrawler <-------------- DATA directory (SMB share)
     |
     | (sends data periodically and tracks changes in the data directory)
     |
Elasticsearch --------------> index <---------- SAMBA (4.12) <--------- Finder Spotlight search

Steps:

1.) Install Elasticsearch. I used 7.9.1 from Community Applications.

2.) Install the Elasticsearch ingest-attachment plugin so that PDF and doc files can be searched. Download the User Scripts plugin and define the following script:

#!/bin/bash
# execute command inside container
docker exec -i elasticsearch /usr/share/elasticsearch/bin/elasticsearch-plugin install --batch ingest-attachment

3.) Install FSCrawler. If you go to Settings in Community Applications and allow additional search results from Docker Hub, you can install a version of FSCrawler (I used toto1310/fscrawler, version 2.7): https://hub.docker.com/r/toto1310/fscrawler/

In the template, you need to set the config and data directories. The data directory mount point in FSCrawler needs to match the real mount point in Unraid, because this path is written into the Elasticsearch index later on and then needs to be valid for Samba to read. I used /mnt/user/ to be able to search all shares later on.

To start the container, the following post argument needs to be added (turn on advanced mode in the template):

Post Arguments: fscrawler job_name --restart

The "--restart" option causes a full re-index of the whole share. It is only needed for the first execution of the crawler; later on it can be removed so that the crawler only monitors the data directory for changes and feeds those into the Elasticsearch index.
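Before the first run, it is worth a quick sanity check that the backend from steps 1.) and 2.) is ready. A minimal sketch, assuming the container is named "elasticsearch" and listens on the default port 9200:

# should return the cluster info as JSON
curl http://192.168.xxx.xxx:9200

# should list "ingest-attachment" if the plugin install from step 2.) worked
docker exec elasticsearch /usr/share/elasticsearch/bin/elasticsearch-plugin list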
--- name: "job_name" fs: url: "/mnt/user" update_rate: "15m" excludes: - "*/~*" - "/appdata/*" - "/domains/*" - "/isos/*" json_support: false filename_as_id: false add_filesize: true remove_deleted: true add_as_inner_object: false store_source: false index_content: true attributes_support: false raw_metadata: false xml_support: false index_folders: true lang_detect: false continue_on_error: false ocr: language: "eng" enabled: false pdf_strategy: "ocr_and_text" follow_symlinks: false elasticsearch: nodes: - url: "http://192.168.xxx.xxx:9200" bulk_size: 100 flush_interval: "5s" byte_size: "10mb" FSCrawler should now start crawling the data and create 2 indices (one for the folders and one for the files) under: /mnt/user/appdata/elasticsearch/data/nodes/0/indices For more information on FSCrawler, have a look at FSCrawler documentation https://hub.docker.com/r/toto1310/fscrawler/ 4.) Configure SAMBA to enable spotlight. I have inserted this in the unraid Settings > SMB > SMB Extras section: We need to add parameters in the global and individual share section. To do this you can add to the Samba extra configuration file the following. Please replace share with your share name: [global] # Settings to enable spotlight search spotlight backend = elasticsearch elasticsearch:address = 192.168.xxx.xxx elasticsearch:port = 9200 elasticsearch:use tls = 0 #enable spotlight search in share [share] path = /mnt/user/share spotlight = yes Restart SAMBA (or the server) . 5.) Enjoy searching in Finder with Spotlight (share needs to be selected in finder). 6.) Background information: Spotlight is accessing the Index with specific search queries. SAMBA has for this purpose a mapping file that translates Elasticsearch attributes to the Spotlight queries. I have not changed this mapping file, but it can be found here for reference: /usr/share/samba/mdssvc/elasticsearch_mappings.json There is also another mapping file that FSCrawler uses when creating the Elasticsearch index. This mapping can be found here if Elasticsearch 7.x is used. Also this mapping file was not modified by me: /mnt/user/appdata/fscrawler/config/_default/7/_settings.json 7.) Testing: List Elasticsearch indices on server (replace localhost with server IP): curl http://localhost:9200/_aliases?pretty=true List all content of index job_name_folder curl -H 'Content-Type: application/json' -X GET http://192.168.xxx.xxx:9200/job_name_folder/_search?pretty List all content of index job_name curl -H 'Content-Type: application/json' -X GET http://192.168.xxx.xxx:9200/job_name/_search?pretty Test if Samba search is working: (replace your user name with username), IP address and select a search string mdfind -d=8 --user=username 192.168.xxx.xxx share 'kMDItemTextContent=="searchstring"' 8.) References: Samba 4.12 release notes: https://www.samba.org/samba/history/samba-4.12.0.html Samba mdfind https://www.samba.org/samba/docs/4.12/man-html/mdfind.1.html fscrawler docker package: https://hub.docker.com/r/toto1310/fscrawler Edited September 30, 2020 by rob_robot Quote Link to comment
rob_robot (Author) Posted September 30, 2020

reserved
CuFk Posted October 21, 2020

Does this help speed up folder browsing as well? We're having serious issues while browsing folders containing large numbers of hi-res photos.
Toskache Posted November 25, 2020

Nice tutorial, thank you! Unfortunately I have problems setting up FSCrawler. The docker configuration: (screenshot). Starting the docker shows the following output in the docker log:

16:46:01,584 INFO  [f.p.e.c.f.c.BootstrapChecks] Memory [Free/Total=Percent]: HEAP [226.5mb/3.4gb=6.38%], RAM [4.7gb/15.6gb=30.16%], Swap [0b/0b=0.0].
16:46:01,611 WARN  [f.p.e.c.f.c.FsCrawlerCli] job [job_name] does not exist
16:46:01,611 INFO  [f.p.e.c.f.c.FsCrawlerCli] Do you want to create it (Y/N)?
Exception in thread "main" java.util.NoSuchElementException
        at java.util.Scanner.throwFor(Scanner.java:862)
        at java.util.Scanner.next(Scanner.java:1371)
        at fr.pilato.elasticsearch.crawler.fs.cli.FsCrawlerCli.main(FsCrawlerCli.java:225)

And indeed, there is no directory "job_name":

root@nas:/mnt/user/appdata/fscrawler/config# ls -lah
total 0
drwxrwxrwx 1 nobody users 16 Nov 25 16:44 ./
drwxrwxrwx 1 root   root  12 Nov 25 09:32 ../
drwxr-xr-x 1 root   root   4 Nov 25 09:33 _default/

Creating that directory manually has no effect. Any ideas?
HagenS Posted March 7, 2021

Same error here for me when starting FSCrawler. Any tips, or is this tutorial orphaned?
ungeek67 Posted March 7, 2021 (edited)

5 hours ago, HagenS said: Same error here for me when starting FSCrawler. Any tips, or is this tutorial orphaned?

I got past this back in 6.9.0-rc2 but then got stuck later on; I was actually waiting for 6.9 to start this thread back up!

Make sure your settings file is placed correctly: the container is starting with the arguments for "job_name", but no folder and/or config exists, and you can't interactively send the "y". I used the job name "unraid_data_spotlight", so my config looks like this:

FSCrawler docker post arguments (--restart commented out after the first run):

fscrawler unraid_data_spotlight #--restart

/mnt/user/appdata/fscrawler/unraid_data_spotlight/_settings.yaml:

---
name: "unraid_data_spotlight"
fs:
  url: "/mnt/user"
  update_rate: "15m"
...

I think that's all I did to get it moving a while back; let me know if that doesn't help and I'll blow away my current setup, do it again, and actually take notes this time.

My issue is that all the tests are passing, including mdfind, which is returning the expected results. But when I then try to use Spotlight on macOS Big Sur, I get nothing. mdutil -s /Volumes/media returns "Server search enabled" as expected. Adding "elasticsearch:index = unraid_data_spotlight" to the Samba extra config under [global] hasn't helped either. Anyone get beyond this?
tarzan Posted March 9, 2021

I'm super excited about this... hope it will work when I update to 6.9 in a few weeks :)
rob_robot (Author) Posted March 10, 2021

On 3/7/2021 at 10:01 AM, HagenS said: Same error here for me when starting FSCrawler. Any tips, or is this tutorial orphaned?

It is a bit of a chicken-and-egg problem. The file should get created after the first run, but after all this time I don't remember whether I manually added the file or copied it from inside the Docker container (i.e. not mapping the config file at all and then copying the file out of the container via a docker command). One way is to create the file manually:

1.) Go to /mnt/user/appdata/fscrawler/config/ and create the folder "job_name" (permissions 777, root/root).

2.) Inside the new job_name folder, create a file called _settings.yaml and paste the content from my initial post. Please make sure to change the IP address at the bottom of the file (- url).

Later on there will also be a second file called _status.json, but I don't think it is needed initially.
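For reference, a minimal sketch of that manual route from the Unraid terminal (assuming the appdata path from the template and a container named "fscrawler"):

mkdir -p /mnt/user/appdata/fscrawler/config/job_name
# create _settings.yaml and paste in the YAML from the first post (adjust the - url IP)
nano /mnt/user/appdata/fscrawler/config/job_name/_settings.yaml
# restart the crawler so it picks up the new job
docker restart fscrawler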
parazit15 Posted March 10, 2021

Hi guys, I am trying to get FSCrawler to index my shares, but after a couple of hours I always get this error and the Docker container stops:

# A fatal error has been detected by the Java Runtime Environment:

Does anybody have an idea what the error could be?
rob_robot (Author) Posted March 22, 2021

I didn't encounter this issue as far as I remember. Could it be a memory size issue? Is this the only error, or are there additional error messages in the log file?
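If it turns out to be memory, FSCrawler's JVM heap can be raised through the FS_JAVA_OPTS environment variable that its launch script honors. A sketch, added as an extra variable on the container template (the 4g value is just an example):

FS_JAVA_OPTS="-Xmx4g -Xms4g"

The first line of the log shown in an earlier post (HEAP [226.5mb/3.4gb=...]) tells you how much heap the crawler currently has available.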
ankx7 Posted May 17, 2021

Hi! Thanks in advance for the article! I have a problem: mdfind doesn't find anything. My Samba version is 4.12.3. The indexing is OK; the command

curl -H 'Content-Type: application/json' -X GET http://localhost:9200/myjob/_search?pretty

gives the correct result, but mdfind doesn't find anything! Has anyone had similar problems? My smb.conf:

[global]
workgroup = TESTSAMBA
security = user
netbios name = REDHAT8
passdb backend = tdbsam
printing = cups
printcap name = cups
load printers = yes
cups options = raw
spotlight backend = elasticsearch
elasticsearch:address = localhost
elasticsearch:port = 9200

[testfolder]
comment = test folder
path = /srv/samba/test
valid users = testuser
browseable = Yes
read only = No
spotlight = yes

Thanks!
ecat Posted June 14, 2021

On 5/17/2021 at 2:40 PM, ankx7 said: ... I have a problem: mdfind doesn't find anything ... The indexing is OK ... but mdfind doesn't find anything!

The job name must be the same as the share name, so change your 'myjob' to 'testfolder'.
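Concretely, that means renaming the job in both places so the resulting index matches the [testfolder] share (a sketch; re-run with --restart once so the index is rebuilt under the new name):

fscrawler testfolder --restart

and in _settings.yaml:

name: "testfolder"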
Bwx_Flo Posted October 20, 2021

Hi guys, I did everything according to this tutorial, and crawling seems to be working. But I get an error in the ES docker log, claiming:

{"type": "server", "timestamp": "2021-10-20T07:37:59,873+02:00", "level": "WARN", "component": "o.e.c.c.ClusterFormationFailureHelper", "cluster.name": "docker-cluster", "node.name": "911179884b65", "message": "master not discovered or elected yet, an election requires a node with id [tJO09zgcQSOaJvZadHyMXQ], have discovered [{911179884b65}{tJO09zgcQSOaJvZadHyMXQ}{FzZZFb76SSu6dgTTsjkWJw}{172.17.0.2}{172.17.0.2:9300}{dilmrt}{ml.machine_memory=67047288832, xpack.installed=true, transform.node=true, ml.max_open_jobs=20}] which is a quorum; discovery will continue using [] from hosts providers and [{911179884b65}{tJO09zgcQSOaJvZadHyMXQ}{FzZZFb76SSu6dgTTsjkWJw}{172.17.0.2}{172.17.0.2:9300}{dilmrt}{ml.machine_memory=67047288832, xpack.installed=true, transform.node=true, ml.max_open_jobs=20}] from last-known cluster state; node term 3, last-accepted version 48 in term 3", "cluster.uuid": "JqY853ThR_uPDSn3mURqJA", "node.id": "tJO09zgcQSOaJvZadHyMXQ" }

I guess what ES is saying is that it is looking for a "master" but can't find a node with that ID. Question is: why would it search for that? How do I configure that, and where does that specific ID come from?

At the same time that error occurred, I also got errors in FSCrawler, saying the directory it had just crawled for about half an hour suddenly does not exist anymore:

04:24:27,517 WARN [f.p.e.c.f.FsParserAbstract] Error while crawling /mnt/user/public: /mnt/user/public doesn't exists.

Can anybody make sense of this and maybe even help me fix it? Thanks a lot for this tutorial and your help in advance! Greetings from Germany, Flo
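For what it's worth: the "master not discovered or elected yet" warning usually means the node is trying to form a multi-node cluster. For a standalone container like the one in this guide, Elasticsearch is normally started in single-node mode. A sketch of the relevant setting, assuming the official 7.x image from step 1.) (added as an extra environment variable on the Elasticsearch container):

discovery.type=single-node

With that set, the node forms a single-node cluster by itself instead of looking for other masters.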
Ralph456 Posted February 28, 2022

On 10/21/2020 at 8:15 AM, CuFk said: Does this help speed up folder browsing as well? We're having serious issues while browsing folders containing large numbers of hi-res photos.

Good question. So, will this help with browsing large RAW files on a macOS client?
ovcrash Posted August 23

Hi, did you guys get this working? Elasticsearch is working; indexing and feeding the data into Elasticsearch is working. The Samba configuration doesn't work: on my Mac, Spotlight doesn't use the index, and even if I use mdfind, it doesn't find anything.