Everything posted by huquad

  1. Stopped passing through the USB controller and instead passed through specific devices. Now my host has crashed on me while I was actively using it (playing a game in a VM). I noticed my CPU fan doing some weird stuff according to Netdata, so it's possible I'm experiencing a thermal crash. I'm going to reinstall my CPU cooler with fresh paste and see where that gets me. I'm also going to make sure I don't have any wires/junk caught in the fans that could be stopping them. @secretstorage did you ever find a solution?
  2. Bad news bears. I just had my first crash since June 1st. I will say my trick seems to have delayed the crash. I'm going to stop passing through one of my USB controllers from my MOBO to my VMs and see where that gets me.
  3. Since I re-implemented my memory limiter in frigate, I haven't seen any more crashes (fingers crossed). It's been two weeks, which is much better than I was seeing before. @secretstorage did you ever implement this limiter in your docker container? Or are you still chasing the USB angle?
  4. Has anyone tried this more recently, and specifically with Windows 11? I'm getting booted into GRUB. Admittedly, it could be how I installed Windows a long time ago (dual-booting Linux + Windows). Maybe I need to do a fresh install? Before I do that, can I get confirmation that someone is currently running this with Windows 11?
  5. I have this same exact problem, and am also running frigate. Now that you mention it, it could have started when I moved my frigate instance to this machine (from another unraid server which, to my memory, did not show the same crashing). @secretstorage any updates from your tests? For reference, my setup is a 5800X CPU with a 3080 GPU used in VM passthrough. I have 32GB of RAM, of which I pass 16GB to VMs. I also think this is likely a RAM issue. I previously had issues with frigate gobbling up RAM on my other system with 64GB of RAM, but no crashes. I fixed this with a RAM limiter, but just noticed the limiter didn't make the transfer between my machines. I just added the 5GB limit to frigate ("--memory=5G" in extra parameters, for those following along at home). I'll report back with my results. If you don't hear back, assume this fixed my problem. Cheers! Edit: Here are the last lines from my syslog. Make anything of them? I'm in the process of researching them myself.
     Jun 1 18:58:13 Falcon kernel: xhci_hcd 0000:02:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
     Jun 1 18:58:13 Falcon kernel: xhci_hcd 0000:02:00.0: WARN Successful completion on short TX
     Jun 1 18:58:13 Falcon kernel: xhci_hcd 0000:02:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 1
     Jun 1 18:58:13 Falcon kernel: xhci_hcd 0000:02:00.0: Looking for event-dma 00000001042888d0 trb-start 00000001042888e0 trb-end 00000001042888e0 seg-start 0000000104288000 seg-end 0000000104288ff0
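     Edit 2: for anyone running Frigate outside unraid, the same limit expressed as a plain docker run (a sketch only; the image tag, config path, and port below are illustrative rather than copied from my actual template, and I've left out the usual camera/shm options):
        # Cap the Frigate container at 5 GB of RAM so a leak can't starve the host or the VMs.
        docker run -d \
          --name=frigate \
          --memory=5G \
          -v /mnt/user/appdata/frigate:/config \
          -p 5000:5000 \
          ghcr.io/blakeblackshear/frigate:stable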
  6. Is there any way to import recipes from a pre-v1 database? I am now unable to access my pre-v1 instance and just discovered my recipes aren't stored in the recipes folder (only the images are).
  7. After some research, I stumbled upon this post: (https://serverfault.com/questions/1015547/what-causes-ssh-error-kex-exchange-identification-connection-closed-by-remote). The poster appears to have the same problem as I do, and one of the responders mentions that ntopng network discovery (which I had turned on) caused this for them. Unless anyone else has feedback for me, I'm going to assume this was the issue.
  8. I'm seeing SSHD errors in my syslog every ~15 minutes, as shown below. I'm especially concerned because these are coming from my pfSense box, and I'd hate to think it has malware on it. What can I do to troubleshoot this and/or determine whether it's malware? For reference, I do run a fair bit of extra network utilities, such as pfBlocker, ntopng, darkstat, etc. Thank you for the help. One example from syslog (exact IPs replaced):
     Sep 28 06:00:33 MyHostName sshd[11403]: Connection from 192.168.1.1 port 60505 on 192.168.1.2 port 22 rdomain ""
     Sep 28 06:00:33 MyHostName sshd[11403]: error: kex_exchange_identification: Connection closed by remote host
     Sep 28 06:00:33 MyHostName sshd[11403]: Connection closed by 192.168.1.1 port 60505
     Sep 28 06:00:33 MyHostName sshd[11404]: Connection from 192.168.1.1 port 59414 on 192.168.1.2 port 22 rdomain ""
     Sep 28 06:00:33 MyHostName sshd[11404]: error: kex_exchange_identification: Connection closed by remote host
     Sep 28 06:00:33 MyHostName sshd[11404]: Connection closed by 192.168.1.1 port 59414
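     Edit: a couple of commands I'm planning to run on the pfSense shell to narrow down which process is opening these connections (a sketch only; igb0 and the IPs are placeholders matching the redacted ones above):
        # List IPv4 sockets currently connected to the Unraid host on port 22;
        # sockstat ships with FreeBSD, which pfSense is built on, and shows the owning process.
        sockstat -4 -c | grep ':22'
        # Capture a handful of the short-lived connections to confirm they close
        # before the SSH banner exchange completes (i.e. a probe, not a login attempt).
        tcpdump -ni igb0 'tcp port 22 and host 192.168.1.2' -c 50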
  9. I just discovered I can remove/delete the port on the docker container when I go to edit the configuration. As far as I can tell, this removes the external access while still allowing my reverse-proxy container (SWAG) access within my proxy-net. Can anyone tell me if this is the right/best/secure way to go about locking down services?
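     For anyone who finds this later, the same idea expressed as plain docker commands (a sketch; the container, network, and image names are just examples, not my exact setup): publish no ports at all and attach the app to the proxy network, so only other containers on proxy-net (like SWAG) can reach it.
        # Create the shared proxy network once (skip if it already exists).
        docker network create proxy-net
        # No -p/--publish flags, so nothing is exposed on the host's LAN address;
        # the app is reachable only from containers on the same user-defined network.
        docker run -d --name=radarr --network=proxy-net \
          -v /mnt/user/appdata/radarr:/config \
          lscr.io/linuxserver/radarr:latest
        # SWAG joins the same network and proxies to http://radarr:7878 internally.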
  10. Hello friends! I recently set up an Authelia container to work as a front end for most of my apps (sonarr/radarr/heimdall/etc). Ideally, I'd only have to use Authelia for authentication and could then disable the login page for my apps. However, I still have local access to the dockers through the standard http://xxx.xxx.xxx.xxx:7878 (radarr, for example) and don't want to expose my apps locally without protection, to minimize horizontal movement should I get pwned. Is there any way to disable the local connection, or auto-redirect to the reverse proxy link https://radarr.mydomain.url ? Thanks for the advice!
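     Edit: one alternative I'm considering while I wait for advice (a sketch only; I haven't confirmed it plays nicely with Authelia, and the names and paths are examples) is to publish the WebUI port on the host's loopback interface only, so LAN clients can't reach it directly but the reverse proxy on the same box still can over the docker network:
        # 127.0.0.1:7878 is reachable only from the host itself; http://<server-ip>:7878
        # stops working from the LAN, while SWAG still reaches radarr over proxy-net.
        docker run -d --name=radarr --network=proxy-net \
          -p 127.0.0.1:7878:7878 \
          -v /mnt/user/appdata/radarr:/config \
          lscr.io/linuxserver/radarr:latest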
  11. The second parity was being built when the read errors were found, so I don't think the errors would have been corrected (let me know if I'm wrong). Per your advice, I'm now running a non-correcting parity check. Thanks for the quick response!
  12. Hello everyone. I recently added a second parity drive to my setup, and Unraid found 572 read errors. I believe these were written to the parity based on the check box on the main screen. Is this true, or is that just for when you click the check parity button? How can I determine what it did with the read errors, and what is best practice? I then ran an extended SMART test on the drive, which says it completed without errors in the test history. However, I also see some errors when I check the "SMART error log" that appear to have occurred during a SMART self-scan around the same time. How can I tell if the errors are from the extended self-test, and what's going on here? Final question: what is the best practice for when to replace a drive? Currently, I watch the Reallocated Sector and Event counts and the Current Pending Sector count. I also use Scrutiny to visualize all the stats quickly, but I'm not sure if this is automatically updated or if I need to scan periodically. In the past, I've found Scrutiny to show outdated stats compared to Unraid, so I'd be interested to hear y'all's take on this as well. I've attached my diagnostics file below for your reference, but let me know if there's anything else I can add as well. Thanks for reading. Cheers. diagnostics-20230628-0744.zip
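     For anyone comparing notes, these are the raw values I keep an eye on, pulled straight from the drive (a sketch; replace sdX with the disk in question, and attribute names can vary slightly by vendor):
        # Full SMART report: attribute table, self-test history, and the SMART error log.
        smartctl -a /dev/sdX
        # Just the attributes I watch when deciding whether to replace a drive.
        smartctl -A /dev/sdX | grep -E 'Reallocated_Sector_Ct|Reallocated_Event_Count|Current_Pending_Sector'
        # Kick off an extended self-test, then check the result later.
        smartctl -t long /dev/sdX
        smartctl -l selftest /dev/sdX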
  13. Posting for anyone who is also having problems. I haven't had any random crashes since updating to 6.12-rc1. I believe this is due to the updated kernel, which now fully supports the new 7000-series processors. Fingers crossed. Update: I should have kept my mouth shut. My system just crashed after ~30 days of uptime. It seems much more stable, but still not 100% resolved. I will be upgrading to the most recent rc2 version, turning on eco mode, and double-checking my power settings per this thread:
  14. Did anyone ever find a solution to this besides a complete reinstall? If not, can I get confirmation that the fresh install works, or that you have better uptime now? I just swapped from my old Xeon rig to a 7950X/X670E and am now having random shutdowns/reboots and uptime issues. I can attach my syslog and diagnostics as well if that helps. Should I scrub either file beforehand, or are they both safe to share?
  15. The Problem: Hello everyone! After a recent update, I haven't been able to access the binhex docker web servers, along with a non-binhex grafana container. The binhex dockers are running, but the Grafana docker is not. All of them are throwing fatal errors related to not being able to write to their config directories (see error messages below). All my other dockers are fine, so I think it has to be a binhex/grafana problem. Please help!
     Things I've Tried:
       • Turning the docker service on/off
       • Rebooting the individual dockers
       • Forcing individual dockers to update
       • Running dockers in privileged mode
     Errors:
       • Binhex-SABNZBD: Fatal error: Cannot create INI file /config/sabnzbd.ini Specify a correct file or delete this file.
       • Binhex-Sonarr: [v3.0.9.1549] NzbDrone.Common.Exceptions.SonarrStartupException: Sonarr failed to start: AppFolder /config is not writable
       • Binhex-Radarr: [v4.2.4.6635] NzbDrone.Common.Exceptions.RadarrStartupException: Radarr failed to start: AppFolder /config is not writable
       • Grafana (not binhex): Failed to start grafana. error: migration failed (id = managed folder permissions alert actions repeated fixed migration): attempt to write a readonly database
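     Update: things I plan to check next before wiping anything (a sketch; the paths assume the default appdata share, and nobody:users is the ownership unRAID containers normally expect):
        # Confirm the appdata folders are still owned by nobody:users and are writable.
        ls -ld /mnt/user/appdata/binhex-sabnzbd /mnt/user/appdata/binhex-sonarr
        # If ownership or permissions look wrong, reset them for one app and retest:
        chown -R nobody:users /mnt/user/appdata/binhex-sabnzbd
        chmod -R u+rw /mnt/user/appdata/binhex-sabnzbd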
  16. Unfortunately, this did not solve my problem, and Nextcloud has yet again become unstoppable. I looked through my php error logs and found this error just before nextcloud crashed: "Server reached pm.max_children setting (5)". I have since added these lines to my www2.conf file:
     pm = ondemand
     pm.max_children = 300
     pm.process_idle_timeout = 30s
     pm.max_requests = 500
     Update 1: This did resolve my php errors. However, nextcloud still crashed and, after a period of time (30 minutes), caused the WebGUI to crash as well. After some MORE digging, I found this error (with my domain redacted for privacy) happened just before the crash:
     2021/07/17 16:36:05 [error] 414#414: *301015 upstream timed out (110: Operation timed out) while reading response header from upstream, client: 172.18.0.2, server: _, request: "GET / HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "nextcloud.mydomain.com", referrer: "https://nextcloud.mydomain.com"
     However, this error seems to happen every 5 minutes and might just be a symptom of the problem. I found it might be possible I have a RAM allowance issue, which would also somewhat explain the WebGUI crashing. I have reduced the nextcloud logging from debug (0) to info (1) to see if that's the issue as well (tons of logs with debug enabled). At this point, nextcloud seems to randomly crash after a couple of weeks, at which point a reboot resolves the situation. I will report back here if I find a solution and would advise anyone reading this to do the same. Cheers!
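     For anyone tuning the same knobs, a rough sanity check on whether 300 children can even fit in RAM (a sketch; run it inside the Nextcloud container, and note the php-fpm process name varies by image):
        # Average and total resident memory of the php-fpm workers, to size pm.max_children.
        ps -eo rss,comm | awk '/php-fpm/ {sum+=$1; n++} END {if (n) printf "%d workers, %.0f MB total, %.0f MB each\n", n, sum/1024, sum/n/1024}'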
  17. I'm currently having this problem. After 6 or so days, my nextcloud docker hangs and is unkillable (short of a dirty shutdown). I was already using redis, but have updated my config file and php-local.ini file in a couple of places where you had lines that I was missing. I also made the two shares RW/shared. Here are the lines I didn't have:
     config:
       'filesystem_check_changes' => 1,
       'memcache.local' => '\\OC\\Memcache\\APCu',
       (from redis)
       'memcache.distributed' => '\\OC\\Memcache\\Redis',
       'password' => 'THE SAME PASSWORD USED IN THE CONTAINER FOR REDIS_HOST_PASSWORD',
       'filelocking.enabled' => 'true',
       'tempdirectory' => '/tmp/nextcloudtemp',
     php-local.ini:
       upload_tmp_dir = /tmp/php/
       session.save_handler = redis
       session.save_path = "tcp://SAME_IP_FOR_REDIS_THAT_I_USED_IN_CONTAINER:REDIS_PORT?auth=REDIS_PASSWORD"
     However, nextcloud was throwing server errors that prevented me from logging in. I narrowed it down to these three lines between the two files:
     config:
       'password' => 'THE SAME PASSWORD USED IN THE CONTAINER FOR REDIS_HOST_PASSWORD', (I put my plaintext password here, probably not ideal)
     php-local.ini:
       session.save_handler = redis
       session.save_path = "tcp://SAME_IP_FOR_REDIS_THAT_I_USED_IN_CONTAINER:REDIS_PORT?auth=REDIS_PASSWORD" (I put my plaintext password here, probably not ideal)
     I'm not sure why the errors were being thrown, but for now I have omitted these lines. I'm going to try this config and see if it's stable, but does anyone have any idea why the server errors would have been thrown from these lines?
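     Edit: for completeness, this is roughly how I understand the redis settings are meant to be laid out in config.php when I eventually put those lines back, with the password nested inside the 'redis' block rather than sitting at the top level (a sketch based on my reading of the Nextcloud docs, so treat the exact keys and values as my assumption):
        'memcache.local' => '\\OC\\Memcache\\APCu',
        'memcache.distributed' => '\\OC\\Memcache\\Redis',
        'memcache.locking' => '\\OC\\Memcache\\Redis',
        'redis' => array(
          'host' => 'SAME_IP_FOR_REDIS_THAT_I_USED_IN_CONTAINER', // same value as REDIS_HOST in the container
          'port' => 6379,                                         // or whatever REDIS_PORT is mapped to
          'password' => 'REDIS_HOST_PASSWORD',                    // same password as the container variable
        ),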