
[Support] Collectathon - Hoarder


Recommended Posts

I noticed that CA has two of these. One is the actual Hoarder app and the other is a "worker". Is the Hoarder worker required alongside the Hoarder container on a single server? I don't see any mention of the worker on the app's GitHub page, so I'm not sure what it even does.

Link to comment
13 minutes ago, nirav said:

I noticed that CA has two of these. One is the actual Hoarder app and the other is a "worker". Is the Hoarder worker required alongside the Hoarder container on a single server? I don't see any mention of the worker on the app's GitHub page, so I'm not sure what it even does.

The Hoarder app saves and stores your bookmarks. The worker app then fetches additional information: it fetches the web page for archival with Browserless and automatically creates tags with ChatGPT/Ollama.

  • Like 1
Link to comment
1 hour ago, drmetro said:

How does this app work, and what are its use cases?

The main use case for Hoarder is as a "read-it-later" app. You can save interesting articles, tools, or other content you find while browsing on your phone or desktop, then access and read that content later across devices. The developer built Hoarder as a self-hosted alternative to bookmark managers like Pocket, with inspiration from open-source projects like memos and mymind. They wanted a bookmark manager they could host themselves, with features like link previews and automatic AI-based tagging. You can find more info in the GitHub README.

Link to comment

Hi, I installed the "Server" Hoarder app at 192.168.2.207 and the worker at 192.168.2.211; installing both on .207 was not possible.

Browserless is installed and running, and Meilisearch is installed and running.

But:

How does the app know that the worker is at the alternate address .211?

 

It looks so:

image.thumb.png.408c58b041e677d431ef6bd6ca2f7d66.png

 

 

Link to comment
Posted (edited)
17 hours ago, Trustwbc said:

Hi, I installed the "Server" Hoarder app at 192.168.2.207 and the worker at 192.168.2.211; installing both on .207 was not possible.

Browserless is installed and running, and Meilisearch is installed and running.

But:

How does the app know that the worker is at the alternate address .211?

 

It looks so:

image.thumb.png.408c58b041e677d431ef6bd6ca2f7d66.png

 

 

They access the same database and communicate through Redis. The data directory of the worker needs to be the same as the web container's.
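As an illustration, a shared setup might look like the sketch below. The container names, image tags, host path, and environment variable names are examples and may differ from the actual Unraid templates; check the template for the exact values.

```shell
# Minimal sketch — names, paths, and variables are examples,
# not the exact Unraid template values.

# Web container: serves the UI and writes bookmarks to the data dir
docker run -d --name hoarder \
  -v /mnt/user/appdata/hoarder:/data \
  -e DATA_DIR=/data \
  -e REDIS_HOST=redis \
  ghcr.io/hoarder-app/hoarder

# Worker container: must mount the SAME host path at the SAME DATA_DIR
# so both containers see the same database and stored assets.
docker run -d --name hoarder-workers \
  -v /mnt/user/appdata/hoarder:/data \
  -e DATA_DIR=/data \
  -e REDIS_HOST=redis \
  ghcr.io/hoarder-app/hoarder-workers
```

The key point is the shared `-v` mapping: if the worker mounts a different host path, the web container will never see the content the worker fetches.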

Edited by Collectathon
Link to comment

Are there any plans for a single image with the worker and the app, like an all-in-one package?

 

Also, does Hoarder now support importing bookmarks from Chrome etc.?

 

I really like the idea of AI-based tagging, but I'm still not convinced why I should move from Linkwarden if the rest is still under heavy development.

Link to comment
4 hours ago, schubdog said:

Are there any plans for a single image with the worker and the app, like an all-in-one package?

 

Also, does Hoarder now support importing bookmarks from Chrome etc.?

 

I really like the idea of AI-based tagging, but I'm still not convinced why I should move from Linkwarden if the rest is still under heavy development.

There will be an AIO template if the developer creates an AIO Docker image. From what I can tell, this is on the roadmap, but it's not a priority.

 

Yes, Hoarder supports importing bookmarks from Chrome.

https://docs.hoarder.app/import/

 

You can run both simultaneously to try it out. I'm not trying to convince you to switch; I just like the software, so I made a template.

Link to comment
Posted (edited)
17 hours ago, luisalrp said:

Has anyone managed to get it working using LocalAI? What models in LocalAI and variables in Hoarder? Thanks!

I haven't used LocalAI before, but in theory you can add/modify the variables below, as it should be a drop-in replacement for OpenAI. Others who have actually used it may be able to provide more support.

OPENAI_BASE_URL = http://localhost:8080
OPENAI_API_KEY = sk-XXXXXXXXXXXXXXXXXXXX
INFERENCE_TEXT_MODEL = gpt-4
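As a quick sanity check (assuming LocalAI's default port 8080 and its OpenAI-compatible API), you can list the models the endpoint actually serves before picking a value for INFERENCE_TEXT_MODEL:

```shell
# List the models LocalAI exposes via its OpenAI-compatible API.
# Replace localhost:8080 with your LocalAI host/port if different.
curl http://localhost:8080/v1/models

# The value you set in INFERENCE_TEXT_MODEL must match one of the
# "id" fields returned above.
```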

 

Edited by Collectathon
Link to comment
On 5/24/2024 at 7:15 AM, Collectathon said:

I haven't used LocalAI before, but in theory you can add/modify the variables below, as it should be a drop-in replacement for OpenAI. Others who have actually used it may be able to provide more support.

OPENAI_BASE_URL = http://localhost:8080
OPENAI_API_KEY = sk-XXXXXXXXXXXXXXXXXXXX
INFERENCE_TEXT_MODEL = gpt-4

 

 

Hi, Hoarder's developer here.

 

I highly recommend against the `gpt-4` model. `gpt-4` costs $30 / 1M tokens, which is extremely expensive. Hoarder defaults to `gpt-3.5-turbo-0125` for text, which costs $0.5 / 1M tokens (notice the huge difference!). If you want `gpt-4`-level inference, go for `gpt-4o`, which at $5 / 1M tokens is still much cheaper than the `gpt-4` model.

Link to comment
  • 2 weeks later...

I am trying to get Hoarder installed on Unraid, but I keep running into issues. I have Hoarder, Hoarder-workers, Redis, and Browserless installed; I figured I can add the search feature after I get it working. When I try to add a bookmark, nothing seems to happen. If I refresh the page the bookmark shows up, but with no description, image, or tags. I have tried with Ollama and OpenAI; neither seems to work. I am getting an error in the logs that looks like it is unable to connect to Redis:

Error: connect EHOSTUNREACH 192.168.1.10:6379
    at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1605:16)
    at TCPConnectWrap.callbackTrampoline (node:internal/async_hooks:130:17) {
  errno: -113,
  code: 'EHOSTUNREACH',
  syscall: 'connect',
  address: '192.168.1.10',
  port: 6379
}

I am running the Redis container on Unraid as well. It is in Host mode, 192.168.1.10 is the IP of my Unraid box, and it is running on the default port 6379. In Hoarder, I have tried the hostname redis, http://192.168.1.10, and 192.168.1.10. The only thing that changes is that when I use "redis" as the hostname, I get a different error:

Error: getaddrinfo ENOTFOUND redis
    at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:107:26)
    at GetAddrInfoReqWrap.callbackTrampoline (node:internal/async_hooks:130:17) {
  errno: -3008,
  code: 'ENOTFOUND',
  syscall: 'getaddrinfo',
  hostname: 'redis'
}

I know Redis is working, because I am running Paperless-ngx which uses Redis, and it works just fine. Anyone have any ideas what my problem could be?

Link to comment
3 hours ago, millercb said:

 

Error: connect EHOSTUNREACH 192.168.1.10:6379
    at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1605:16)
    at TCPConnectWrap.callbackTrampoline (node:internal/async_hooks:130:17) {
  errno: -113,
  code: 'EHOSTUNREACH',
  syscall: 'connect',
  address: '192.168.1.10',
  port: 6379
}

 

Error: getaddrinfo ENOTFOUND redis
    at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:107:26)
    at GetAddrInfoReqWrap.callbackTrampoline (node:internal/async_hooks:130:17) {
  errno: -3008,
  code: 'ENOTFOUND',
  syscall: 'getaddrinfo',
  hostname: 'redis'
}

 

Hoarder can't reach Redis. Can you please send a screenshot of your Hoarder config?
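In the meantime, a quick way to test reachability from inside Docker (assuming you can pull the official redis image, which ships redis-cli) is:

```shell
# Reachability test from a throwaway container on the default bridge
# network; 192.168.1.10:6379 is the Redis host/port from the post above.
# A working Redis answers "PONG".
docker run --rm redis redis-cli -h 192.168.1.10 -p 6379 ping

# Note: the bare hostname "redis" only resolves between containers on
# the same user-defined Docker network, not on the default bridge —
# which would explain the ENOTFOUND error.
```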

Link to comment
7 hours ago, Collectathon said:

Thanks for that. Are you able to connect to Redis if it is in Bridge mode instead of Host? Also, change the Redis Host address to just the IP address without the 'http://'.

 

Thanks for the help. I switched Redis to bridge mode and removed the 'http://', but I'm still getting the EHOSTUNREACH error. One other thing I just thought of: I am running AdGuard as a DNS server on my network; would that have an effect on it? Other than that, I'm not sure.

Link to comment
8 hours ago, millercb said:

Thanks for the help. I switched Redis to bridge mode and removed the 'http://', but I'm still getting the EHOSTUNREACH error. One other thing I just thought of: I am running AdGuard as a DNS server on my network; would that have an effect on it? Other than that, I'm not sure.

I believe I have found the issue.

 

Normally Docker does not allow containers to directly access the same subnet as the one used by the host. You can allow this under Settings → Docker by changing "Host access to custom networks" from disabled to enabled.

 

The other option is to move your Redis container to your br0 network as well.
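If you go the br0 route, a sketch might look like the following; the container name, image tag, and the static IP 192.168.1.50 are placeholders for your own values.

```shell
# Run Redis on the custom br0 network with its own LAN IP, so other
# br0 containers can reach it without host access being enabled.
docker run -d --name redis \
  --network br0 \
  --ip 192.168.1.50 \
  redis

# Then point Hoarder's Redis Host setting at 192.168.1.50
# (just the IP, no http:// prefix).
```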

Link to comment
On 6/8/2024 at 5:21 PM, Collectathon said:

I believe I have found the issue.

 

Normally Docker does not allow containers to directly access the same subnet as the one used by the host. You can allow this under Settings → Docker by changing "Host access to custom networks" from disabled to enabled.

 

The other option is to move your Redis container to your br0 network as well.

Thank you!! I finally have everything working.

Edited by millercb
Link to comment
  • 4 weeks later...
Posted (edited)

My worker is always failing and I cannot really figure out why:

 

2024-07-08T13:04:51.625Z info: Workers version: 0.15.0
2024-07-08T13:04:51.627Z info: [Crawler] Browser connect on demand is enabled, won't proactively start the browser instance
2024-07-08T13:04:51.627Z info: Starting crawler worker ...
2024-07-08T13:04:51.628Z info: Starting inference worker ...
2024-07-08T13:04:51.628Z info: Starting search indexing worker ...
2024-07-08T13:05:16.988Z debug: [search][7] Search is not configured, nothing to do now
2024-07-08T13:05:16.990Z info: [search][7] Completed successfully
2024-07-08T13:05:39.245Z info: [Crawler][15] Will crawl "https://www.amazon.de/Homo-Sapiens-Nikolaus-Geyrhalter/dp/B0765NRP1P" for link with id "o81u01cxioxflow8fgim977a"
2024-07-08T13:05:39.245Z info: [Crawler][15] Attempting to determine the content-type for the url https://www.amazon.de/Homo-Sapiens-Nikolaus-Geyrhalter/dp/B0765NRP1P
2024-07-08T13:05:39.382Z info: [Crawler][15] Content-type for the url https://www.amazon.de/Homo-Sapiens-Nikolaus-Geyrhalter/dp/B0765NRP1P is "text/html"
2024-07-08T13:05:39.383Z info: [Crawler] Connecting to existing browser websocket address: ws://172.19.0.19:3000?token=ak1_0b27720fed5e83f0e765_a073391f3ce890473649
2024-07-08T13:05:39.463Z error: [Crawler][15] Crawling job failed: [object Object]
2024-07-08T13:05:40.501Z info: [Crawler][15] Will crawl "https://www.amazon.de/Homo-Sapiens-Nikolaus-Geyrhalter/dp/B0765NRP1P" for link with id "o81u01cxioxflow8fgim977a"
2024-07-08T13:05:40.501Z info: [Crawler][15] Attempting to determine the content-type for the url https://www.amazon.de/Homo-Sapiens-Nikolaus-Geyrhalter/dp/B0765NRP1P
2024-07-08T13:05:40.609Z info: [Crawler][15] Content-type for the url https://www.amazon.de/Homo-Sapiens-Nikolaus-Geyrhalter/dp/B0765NRP1P is "text/html"
2024-07-08T13:05:40.609Z info: [Crawler] Connecting to existing browser websocket address: ws://172.19.0.19:3000?token=ak1_0b27720fed5e83f0e765_a073391f3ce890473649
2024-07-08T13:05:40.612Z error: [Crawler][15] Crawling job failed: [object Object]
2024-07-08T13:05:42.707Z info: [Crawler][15] Will crawl "https://www.amazon.de/Homo-Sapiens-Nikolaus-Geyrhalter/dp/B0765NRP1P" for link with id "o81u01cxioxflow8fgim977a"
2024-07-08T13:05:42.707Z info: [Crawler][15] Attempting to determine the content-type for the url https://www.amazon.de/Homo-Sapiens-Nikolaus-Geyrhalter/dp/B0765NRP1P
2024-07-08T13:05:42.791Z info: [Crawler][15] Content-type for the url https://www.amazon.de/Homo-Sapiens-Nikolaus-Geyrhalter/dp/B0765NRP1P is "text/html"
2024-07-08T13:05:42.791Z info: [Crawler] Connecting to existing browser websocket address: ws://172.19.0.19:3000?token=ak1_0b27720fed5e83f0e765_a073391f3ce890473649
2024-07-08T13:05:42.798Z error: [Crawler][15] Crawling job failed: [object Object]
2024-07-08T13:05:46.822Z info: [Crawler][15] Will crawl "https://www.amazon.de/Homo-Sapiens-Nikolaus-Geyrhalter/dp/B0765NRP1P" for link with id "o81u01cxioxflow8fgim977a"
2024-07-08T13:05:46.823Z info: [Crawler][15] Attempting to determine the content-type for the url https://www.amazon.de/Homo-Sapiens-Nikolaus-Geyrhalter/dp/B0765NRP1P
2024-07-08T13:05:46.902Z info: [Crawler][15] Content-type for the url https://www.amazon.de/Homo-Sapiens-Nikolaus-Geyrhalter/dp/B0765NRP1P is "text/html"
2024-07-08T13:05:46.902Z info: [Crawler] Connecting to existing browser websocket address: ws://172.19.0.19:3000?token=ak1_0b27720fed5e83f0e765_a073391f3ce890473649
2024-07-08T13:05:46.909Z error: [Crawler][15] Crawling job failed: [object Object]
2024-07-08T13:05:54.956Z info: [Crawler][15] Will crawl "https://www.amazon.de/Homo-Sapiens-Nikolaus-Geyrhalter/dp/B0765NRP1P" for link with id "o81u01cxioxflow8fgim977a"
2024-07-08T13:05:54.957Z info: [Crawler][15] Attempting to determine the content-type for the url https://www.amazon.de/Homo-Sapiens-Nikolaus-Geyrhalter/dp/B0765NRP1P
2024-07-08T13:05:55.106Z info: [Crawler][15] Content-type for the url https://www.amazon.de/Homo-Sapiens-Nikolaus-Geyrhalter/dp/B0765NRP1P is "text/html"
2024-07-08T13:05:55.106Z info: [Crawler] Connecting to existing browser websocket address: ws://172.19.0.19:3000?token=ak1_0b27720fed5e83f0e765_a073391f3ce890473649
2024-07-08T13:05:55.114Z error: [Crawler][15] Crawling job failed: [object Object]

 

The token is supposed to be an API token from the Hoarder web interface, right? That's what I assumed, but otherwise I have no idea what else I could do. I even used the Hoarder container's IP instead of the hostname I usually use for container-to-container communication.

 

Communication is definitely possible between the two containers. These are pings from hoarder-workers to the hoarder container:

/app/apps/workers # ping 172.19.0.19
PING 172.19.0.19 (172.19.0.19): 56 data bytes
64 bytes from 172.19.0.19: seq=0 ttl=64 time=0.278 ms
64 bytes from 172.19.0.19: seq=1 ttl=64 time=0.145 ms
64 bytes from 172.19.0.19: seq=2 ttl=64 time=0.147 ms
64 bytes from 172.19.0.19: seq=3 ttl=64 time=0.151 ms
^C
--- 172.19.0.19 ping statistics ---
4 packets transmitted, 4 packets received, 0% packet loss
round-trip min/avg/max = 0.145/0.180/0.278 ms
/app/apps/workers # ping hoarder
PING hoarder (172.19.0.19): 56 data bytes
64 bytes from 172.19.0.19: seq=0 ttl=64 time=0.166 ms
64 bytes from 172.19.0.19: seq=1 ttl=64 time=0.078 ms
64 bytes from 172.19.0.19: seq=2 ttl=64 time=0.143 ms
64 bytes from 172.19.0.19: seq=3 ttl=64 time=0.162 ms
^C
--- hoarder ping statistics ---
4 packets transmitted, 4 packets received, 0% packet loss
round-trip min/avg/max = 0.078/0.137/0.166 ms
/app/apps/workers # 

 

Edited by Greyberry
added pings
Link to comment
Posted (edited)
9 hours ago, Greyberry said:

My worker is always failing and I cannot really figure out why

The token is supposed to be an API token from the Hoarder web interface, right? That's what I assumed, but otherwise I have no idea what else I could do. I even used the Hoarder container's IP instead of the hostname I usually use for container-to-container communication.

There are two different versions of Browserless. The token you need is the one passed as an env variable to the Browserless container, not a Hoarder API token. If you are using Browserless v1, you may not have/need a token. Both versions are available in CA.

Browserless v1

- Token optional

- Example: ws://browserless:3000

 

Browserless v2

- Token required

- Example: ws://browserless:3000?token=my-token
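For illustration, a v2 setup could look like this; the image tag, container name, and token value are placeholders, and I'm assuming Browserless v2 reads its token from a TOKEN environment variable as its docs describe.

```shell
# Start Browserless v2 with a token of your choosing
docker run -d --name browserless \
  -p 3000:3000 \
  -e TOKEN=my-token \
  ghcr.io/browserless/chromium

# Then give the Hoarder worker a browser address that carries
# the same token, e.g.:
#   ws://browserless:3000?token=my-token
```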

Edited by Collectathon
  • Thanks 1
Link to comment
