Migz93 Posted March 4, 2021

Hi All,

I'm hoping you can help me. I've recently upgraded to Unraid 6.9.0 and since then Unraid has been regularly crashing/restarting. I'm not sure if it's actually crashing, but I do know it's randomly rebooting. As far as I know I've changed nothing between 6.8.3, which was completely stable, and 6.9.0, which has had several restarts since.

My box runs one Windows 10 VM with a GPU passed through for rarely used remote gaming, and then the usual stack of media-related containers:

Plexs
Lidarr
Sonarrs
Radarrs
Bazarr
Jackett
NZBGet
Qbittorrent
AMD (automated music downloader)
Telegraf
HDDTemp
Intel-GPU-Tools
Unpackerr
Tautulli
Tdarr
Unraid-API

I've seen that my installed plugins are listed in the diagnostics, so I won't post them separately.

I haven't been doing anything out of the ordinary at the time of the crashes; I've been fairly hands-off with Unraid the last few days. During most of these crashes I'm either in bed or gaming (on my own PC, not the VM), and Unraid is just running the same stack of VM/containers outlined above that it has been running for a while.

I thought about rolling back to 6.8.3, so I checked the "Update OS" page. It shows 6.9.0-beta29 as my previous OS. I think this is because I used the old Nvidia plugin to revert to 6.8.3, as I was 100% on 6.8.3 a few days ago. Seeing this reminded me that I did try a beta version (I assume that was beta 29) and had the same issue then as well. At the time I tried the following:

Memtest for around 12 hours (not long enough I know, but see the next point).
Swapped all RAM between my two Unraid boxes; the original box continued to restart even with completely different RAM.
Fiddled with XMP (or the Intel term). I think I found it crashed less with it off, so it's been left off, and from checking the currently reported speeds in Unraid I believe it's still off.
Bought a new PSU: originally had a Corsair 650, changed to a Corsair 750.

I think eventually I gave up, moved back to 6.8.3 and put it down to beta issues, but now that 6.9.0 is stable I'm having the same problems.

Since the latest crashes the only thing I've tried is keeping "Enable VMs" off. I'd noticed after one of the crashes that it had been forcibly set to No, so I thought maybe that was the cause, but it's still crashing.

I updated to 6.9.0 (not RC) on 02/03/2021 at 15:09:19. Combing through the syslog and my healthcheck notifications, these are the times Unraid restarted unexpectedly:

04/03/2021 - 00:54
04/03/2021 - 00:26
03/03/2021 - 23:55
03/03/2021 - 21:01
03/03/2021 - 18:41
03/03/2021 - 18:26
02/03/2021 - 23:26
02/03/2021 - 20:31
02/03/2021 - 18:53

I've attached syslog files that are output to a 2nd Unraid server. As far as I can see they don't show much, if anything, at the time of the restarts. There are lots of sshd lines littering the syslog; these are from my 2nd Unraid box checking whether SSH is still active as part of my healthchecks. My apologies if it takes a while to sift through the syslog because of them. I've also attached the diagnostics.zip file.

Hardware is:

CPU: Intel i7-9700K (no overclock)
Motherboard: ASUSTeK COMPUTER INC. - TUF Z390M-PRO GAMING
RAM: 3x Corsair 16GB DDR4 2133MHz, CMK16GX4M1D3000C16
GPU: GTX 1650 Super. The IOMMU group is separated as I pass this GPU through to a Windows 10 VM.
LAN: 1Gb motherboard LAN port and a 1Gb USB to Ethernet adapter (this one: https://www.amazon.co.uk/gp/product/B003EDY97A/ref=ppx_yo_dt_b_search_asin_title?ie=UTF8&psc=1). They are meant to be bonded together, but I just noticed that the USB NIC isn't part of the bond, so I will fix that soon.
PSU: Corsair RM750i
Storage: LSI SAS 9211-8i 8-port 6Gb/s PCI-E (this one: https://www.ebay.co.uk/itm/LSI-SAS-9211-8i-8-port-6Gb-s-PCI-E-Internal-HBA-Both-Brackets-IT-MODE-P20/133048746300)
2x 14TB Western Digital HDD
2x 12TB Western Digital HDD
2x 10TB Western Digital HDD
2x 8TB Western Digital HDD
1x 8TB Seagate HDD
1TB Sabrent NVMe
16GB SanDisk Cruzer Blade USB for Unraid OS
No parity configured.

I've googled for a few hours and found general threads about crashing on older versions, mainly related to RAM or PSU, both of which I hope I've ruled out. I've been eagerly checking the latest threads to see if anyone else is having the same issue with 6.9.0, but it seems it's just me, so I thought I'd best raise a thread myself.

Please let me know if you need any more info, if there's anything you want me to try, or if the cause is something really obvious that I've missed.

Thank you in advance.

gdunraid-diagnostics-20210304-1401.zip
Syslogs.zip
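One quick check on the restart times in the opening post is whether they follow any fixed schedule. The following is a minimal Python sketch, not something the poster ran: it takes the nine timestamps from the post and prints the gap between consecutive restarts. The function name `restart_intervals` is made up for illustration, and the year is assumed to be 2021 to match the thread's posting date.

```python
from datetime import datetime

# Restart times copied from the opening post (day/month/year - hour:minute).
# The year is taken as 2021 to match the thread's posting date; the gaps
# between restarts do not depend on it.
RESTARTS = [
    "04/03/2021 - 00:54",
    "04/03/2021 - 00:26",
    "03/03/2021 - 23:55",
    "03/03/2021 - 21:01",
    "03/03/2021 - 18:41",
    "03/03/2021 - 18:26",
    "02/03/2021 - 23:26",
    "02/03/2021 - 20:31",
    "02/03/2021 - 18:53",
]


def restart_intervals(stamps):
    """Return the gaps between consecutive restarts in whole minutes, oldest first."""
    times = sorted(datetime.strptime(s, "%d/%m/%Y - %H:%M") for s in stamps)
    return [int((b - a).total_seconds() // 60) for a, b in zip(times, times[1:])]


intervals = restart_intervals(RESTARTS)
print(intervals)  # [98, 175, 1140, 15, 140, 174, 31, 28]
print("shortest:", min(intervals), "min, longest:", max(intervals), "min")
```

For these timestamps the gaps range from 15 minutes to 19 hours, so nothing here suggests a scheduled job firing at a fixed period, although nine data points are too few to rule much out.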
JorgeB Posted March 4, 2021
Unfortunately I can't see anything in the logs about the crashes, which usually points more to a hardware problem. One thing you can try is to boot the server in safe mode with all Docker containers/VMs disabled and let it run as a basic NAS for a few days. If it still crashes, it's likely a hardware problem; if it doesn't, start turning the other services back on one by one.
Migz93 (Author) Posted March 7, 2021
Thanks for having a look, Jorge. I will get it into safe mode ASAP and see if it continues to crash; I just need to set up my 2nd box to serve my Plex first.
doubley Posted March 8, 2021
I wanted to add that I'm having the exact same experience as you, but haven't had a chance to diagnose hardware issues. Unraid 6.8.3 is stable as a rock for me; Unraid 6.9 is the troublemaker. I first had this issue on 6.9 RC2, figured it was an RC-related issue, and rolled back. Now that I'm on 6.9 stable, it's still the same thing. I've rolled back to 6.8.3 again, as this is looking like a software issue of some kind. I wonder if this is somehow related to ICH's Nvidia Driver plugin? That seems to be a commonality between your system and mine: both are using GPUs.
Ryzen 3950x
64GB 3000MHz
GTX 1650
EVGA 750W Gold
Edited March 8, 2021 by doubley
Migz93 (Author) Posted March 8, 2021
Hmmm, that's interesting. I noticed we both have 1650 variant cards; could be something or could be nothing. Although, checking now, I don't have the ICH Nvidia Driver plugin. I did have the original LinuxServer one, but as I'm not using the GPU at the Unraid level I've just removed it. I will see how my crashes go. If I get another one I'll try removing the GPU completely, and if it crashes again I should by then have Plex and related tools set up on my 2nd box so I can try safe mode.
ChadwickTheCrab Posted March 8, 2021
I am also having a terrible experience switching to 6.9.0. I had 200-something days of uptime and now every morning my server is unresponsive. I have no Nvidia card, no VMs. Just Docker containers.
a_bomb Posted March 8, 2021
I am having the same issue. I have a 1050 in my server, though it seems to be fine in 6.8.3.
doubley Posted March 9, 2021
22 hours ago, Migz93 said: "Hmmm, that's interesting. I noticed we both have 1650 variant cards; could be something or could be nothing. Although, checking now, I don't have the ICH Nvidia Driver plugin. I did have the original LinuxServer one, but as I'm not using the GPU at the Unraid level I've just removed it. I will see how my crashes go. If I get another one I'll try removing the GPU completely, and if it crashes again I should by then have Plex and related tools set up on my 2nd box so I can try safe mode."
Cool, let me know how it goes. I have my server at a buddy's house ~6 hours away, so it's not as easy for me to swap hardware in and out. Very interested to see the results.
Migz93 (Author) Posted March 9, 2021
I did indeed get another crash. I saw there was a new Unraid release today so I tried that, but there has been another crash since. I went to try safe mode but found that most of my disks don't show up, I assume because it's not loading the drivers for my LSI card, so for now I've reverted to a normal boot with the Docker engine turned off. Will see how that goes. I will also see if there's a way to load just the drivers for my LSI card, so I can use safe mode but still access Plex on a 2nd box using the files on the crashing one.
eqjunkie829 Posted March 10, 2021
I have started getting crashes/unresponsiveness since upgrading from 6.9 RC2 to 6.9.0. I have a Quadro P2000 card doing transcoding for Plex, but it was working fine before the upgrade. I checked the IPMI display prior to restarting the server and the only thing it's showing that appears relevant is a kernel panic. I've rolled back to RC2 and will see if I'm still getting daily crashes.
Additional info: I have a custom IP address set for the Plex container so I can utilize my two bonded 1Gb NICs (balance-alb). Some searching has indicated that a custom address on the Docker network may be causing problems. Does anyone else have it set up this way and still have issues?
Edited March 10, 2021 by eqjunkie829
Tristankin Posted March 11, 2021
I just want to add to the chorus. I have an Intel-based system that was rock solid on 6.8.3 with a -30 voltage offset. Since upgrading to both 6.9.0 and 6.9.1 the system seems to hang every 24 hours or so. I notice the web interface stops responding and there is also no response to keyboard input. I have attached the diagnostic report, but the syslog seems to be replaced on each boot, making it quite difficult to see what the issue is.
firefly-diagnostics-20210312-0036.zip
b0rgi85 Posted March 11, 2021
I've got an Intel-based setup, too. No NVIDIA card, just running Docker containers. The system reboots randomly and wants to do a parity check after rebooting. When I cancel the parity check after a few minutes, the system reboots. Hope there will be help ASAP. Here are my logs:
b0rgis-unraid-diagnostics-20210311-1458.zip
Migz93 (Author) Posted March 11, 2021
Just a quick update: with the Docker engine stopped on my main box I haven't had a reboot so far. Uptime is 1 day 22 hours, which is the longest it's gone. I will wait until at least a week of uptime before re-enabling the Docker engine, seeing if the restarts come back, and working out which container is doing it.
I should also mention there's a mix of people here: some have their server hang and become completely unresponsive until rebooted, and others have their server just restart randomly by itself but otherwise act fine. My issue is the latter: my server is completely "fine" beforehand, randomly reboots, then comes back up "fine" by itself and continues working. It doesn't hang and I don't have to power-cycle it for it to come back.
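For the "work out which container is doing it" step, enabling containers one at a time is the approach suggested earlier in the thread. With a long container list, halving the set each round is a different technique that needs fewer observation periods. The Python sketch below only illustrates the logic under loud assumptions: exactly one container is responsible, and the reboot reliably shows up within whatever observation window is chosen. `find_culprit` and `crashes_with` are hypothetical names, and in practice `crashes_with` is a manual step (run that subset for a few days and watch), not a real function.

```python
def find_culprit(containers, crashes_with):
    """Narrow a list of containers down to one suspect by repeated halving.

    crashes_with(subset) is a hypothetical callback that returns True if the
    server rebooted while only `subset` was running. Assumes a single culprit.
    """
    suspects = list(containers)
    while len(suspects) > 1:
        half = suspects[: len(suspects) // 2]
        # If the first half alone reproduces the reboot, the culprit is in it;
        # otherwise, under the single-culprit assumption, it is in the rest.
        suspects = half if crashes_with(half) else suspects[len(half):]
    return suspects[0] if suspects else None


# Demonstration with placeholder names and a pretend culprit; nothing here
# reflects which container, if any, is actually at fault.
CONTAINERS = [f"container-{n}" for n in range(1, 17)]
checks = []


def pretend_observation(subset):
    checks.append(list(subset))
    return "container-11" in subset


print(find_culprit(CONTAINERS, pretend_observation))  # container-11
print(len(checks), "observation rounds")               # 4 observation rounds
```

With 16 containers, as in the opening post, this takes four rounds rather than up to sixteen. The trade-off is that it breaks down if the reboots need two containers running together or if a round is ended before a reboot had time to occur.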
a_bomb Posted March 12, 2021
I am in the "random restart by itself" group with 6.9 and 6.9.1.
b0rgi85 Posted March 13, 2021
On 3/11/2021 at 3:09 PM, b0rgi85 said: "I've got an Intel-based setup, too. No NVIDIA card, just running Docker containers. The system reboots randomly and wants to do a parity check after rebooting. When I cancel the parity check after a few minutes, the system reboots. Hope there will be help ASAP. Here are my logs: b0rgis-unraid-diagnostics-20210311-1458.zip"
I downgraded to 6.9.0 and the system now runs for longer than one day, but I get the restarts when I start to stream something through Plex. Is the problem related to transcoding?
Qubix1 Posted March 14, 2021
I am also having this random crash/hang issue. My server was rock-solid stable for months with my i7 9700k and Asus Z390-P Prime. Since the 6.9 and then 6.9.1 updates, I've been having crashes out of the blue for no apparent reason.
doubley Posted March 14, 2021
I hope this gets some attention from @limetech. Seems to be a common issue. Anything we can do to help diagnose?
Qubix1 Posted March 14, 2021
Server restarted 45 minutes after a restart; reverted back to 6.9.0 for now.
Tristankin Posted March 14, 2021
OK, update: I have shifted some ports around and moved pihole from a secondary IP on my Ethernet interface to everything sharing a single IP (pihole was 192.168.1.9, everything else 192.168.1.10). So far the system has been up 1 day and 3 hours, so networking could be the issue. I'm not sure the hang and restart states are actually that different; it might just be how individual systems deal with the freeze. It was doing this on both 6.9.0 and 6.9.1, and I'm still on 6.9.1.
EDIT: Scratch that, it just went down again at 28 hours of uptime. This is getting boring... I have turned on saving the syslog to USB, so hopefully something appears in there, but going by previous reports I don't have a lot of hope.
Edited March 14, 2021 by Tristankin
JorgeB Posted March 14, 2021
Is anyone having issues using a custom IP address for docker(s)? That's a known issue and it can crash Unraid, more info below:
Tristankin Posted March 14, 2021
1 hour ago, JorgeB said: "Is anyone having issues using a custom IP address for docker(s)? That's a known issue and it can crash Unraid, more info below:"
On 6.9.1 I removed the custom IP from pihole (which was perfectly fine in 6.8.3) to try to fix the issue, but the server still ended up hung. There is some potentially weird stuff happening with some of the bridges in the syslog, but I'm really not sure what I am meant to be looking for.
Mar 14 18:33:13 Firefly root: Starting NTP daemon: /usr/sbin/ntpd -g -u ntp:ntp
Mar 14 18:38:52 Firefly ntpd[23095]: kernel reports TIME_ERROR: 0x2041: Clock Unsynchronized
Mar 14 19:53:32 Firefly ool www[27271]: /usr/local/emhttp/plugins/dynamix/scripts/rsyslog_config
Mar 14 19:53:34 Firefly rsyslogd: [origin software="rsyslogd" swVersion="8.2002.0" x-pid="28603" x-info="https://www.rsyslog.com"] start
Mar 14 19:54:29 Firefly kernel: veth2d9cb7a: renamed from eth0
Mar 14 19:54:29 Firefly kernel: br-b33c13ba4d4e: port 4(veth4b15139) entered disabled state
Mar 14 19:54:29 Firefly kernel: br-b33c13ba4d4e: port 4(veth4b15139) entered disabled state
Mar 14 19:54:29 Firefly kernel: device veth4b15139 left promiscuous mode
Mar 14 19:54:29 Firefly kernel: br-b33c13ba4d4e: port 4(veth4b15139) entered disabled state
Mar 14 19:54:29 Firefly kernel: br-b33c13ba4d4e: port 4(vethe4c338c) entered blocking state
Mar 14 19:54:29 Firefly kernel: br-b33c13ba4d4e: port 4(vethe4c338c) entered disabled state
Mar 14 19:54:29 Firefly kernel: device vethe4c338c entered promiscuous mode
Mar 14 19:54:29 Firefly kernel: br-b33c13ba4d4e: port 4(vethe4c338c) entered blocking state
Mar 14 19:54:29 Firefly kernel: br-b33c13ba4d4e: port 4(vethe4c338c) entered forwarding state
Mar 14 19:54:29 Firefly kernel: eth0: renamed from veth5b308b3
Mar 14 19:54:29 Firefly kernel: IPv6: ADDRCONF(NETDEV_CHANGE): vethe4c338c: link becomes ready
Mar 14 19:55:44 Firefly ool www[28995]: /usr/local/emhttp/plugins/dynamix/scripts/rsyslog_config
Mar 14 19:55:46 Firefly rsyslogd: [origin software="rsyslogd" swVersion="8.2002.0" x-pid="30204" x-info="https://www.rsyslog.com"] start
Mar 14 19:57:57 Firefly kernel: docker0: port 4(veth33796e7) entered blocking state
Mar 14 19:57:57 Firefly kernel: docker0: port 4(veth33796e7) entered disabled state
Mar 14 19:57:57 Firefly kernel: device veth33796e7 entered promiscuous mode
Mar 14 19:57:57 Firefly kernel: docker0: port 4(veth33796e7) entered blocking state
Mar 14 19:57:57 Firefly kernel: docker0: port 4(veth33796e7) entered forwarding state
Mar 14 19:57:57 Firefly kernel: docker0: port 4(veth33796e7) entered disabled state
Mar 14 19:57:57 Firefly kernel: eth0: renamed from veth7755757
Mar 14 19:57:57 Firefly kernel: IPv6: ADDRCONF(NETDEV_CHANGE): veth33796e7: link becomes ready
Mar 14 19:57:57 Firefly kernel: docker0: port 4(veth33796e7) entered blocking state
Mar 14 19:57:57 Firefly kernel: docker0: port 4(veth33796e7) entered forwarding state
my current docker loadout and config
Edited March 14, 2021 by Tristankin
JorgeB Posted March 14, 2021
47 minutes ago, Tristankin said: "There is some potentially weird stuff happening with some of the bridges in the syslog"
Those are normal. The syslog server might help if it catches anything.
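The point of sending the syslog to another machine is that the copy survives the reboot. Unraid has its own syslog server settings, which are what the posts above refer to. Purely as an illustration of what the receiving end does, here is a minimal Python sketch of a UDP syslog listener for a second machine. It is not Unraid's implementation; `LOG_PATH`, `format_entry`, `SyslogHandler` and `serve` are names invented for this example, and it assumes the crashing server has been configured to send plain UDP syslog to this machine.

```python
import socketserver
from datetime import datetime

LOG_PATH = "unraid-remote.log"  # hypothetical output file on the second machine


def format_entry(sender_ip, payload, received_at):
    """Turn one raw syslog datagram into a single timestamped text line."""
    text = payload.decode("utf-8", errors="replace").strip()
    return f"{received_at.isoformat(timespec='seconds')} {sender_ip} {text}"


class SyslogHandler(socketserver.BaseRequestHandler):
    def handle(self):
        # For UDP servers, self.request is a (data, socket) pair.
        data, _sock = self.request
        line = format_entry(self.client_address[0], data, datetime.now())
        # Open, append and close for every message so nothing is left in a
        # buffer on this side when the sender goes down.
        with open(LOG_PATH, "a", encoding="utf-8") as log:
            log.write(line + "\n")


def serve(host="0.0.0.0", port=514):
    """Listen for UDP syslog until interrupted (port 514 usually needs root)."""
    with socketserver.UDPServer((host, port), SyslogHandler) as server:
        server.serve_forever()
```

Calling `serve()` on the second machine starts the listener. A receiver like this can only record what the failing server managed to send, so a hard reset may still leave no final message, which fits JorgeB's "if it catches anything".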
Tristankin Posted March 14, 2021
Yeah, I will be checking in tomorrow, I guess, when it goes down again.
Tristankin Posted March 14, 2021
Well, we got (un)lucky: it went down again and I rebooted at 11:02. The log is pretty bare?
syslog
JorgeB Posted March 14, 2021
11 minutes ago, Tristankin said: "log is pretty bare?"
Yep, unfortunately there's nothing about the crash.
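Even when a saved syslog says nothing about the crash itself, its timestamps can at least bound when the server went quiet. The Python sketch below is a hedged illustration rather than a diagnostic tool: it parses lines in the format quoted earlier in the thread and reports the longest silent stretch together with the last message before it. `parse_line` and `largest_gap` are names made up here, and the year has to be passed in because this syslog format does not record one.

```python
import re
from datetime import datetime

# "Mar 14 19:54:29 Firefly kernel: ..." -> timestamp, host, rest of the line
LINE_RE = re.compile(r"^(\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+(\S+)\s+(.*)$")


def parse_line(line, year):
    """Return (timestamp, message) for one syslog line, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    if not match:
        return None
    stamp = " ".join(match.group(1).split())  # collapse padding such as "Mar  4"
    when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
    return when, match.group(3)


def largest_gap(lines, year):
    """Return (seconds, last_message_before_gap) for the longest silent stretch."""
    entries = [e for e in (parse_line(l, year) for l in lines) if e]
    best = (0.0, None)
    for (t0, msg0), (t1, _msg1) in zip(entries, entries[1:]):
        gap = (t1 - t0).total_seconds()
        if gap > best[0]:
            best = (gap, msg0)
    return best


# Four lines taken from the syslog excerpt posted above.
SAMPLE = [
    "Mar 14 18:33:13 Firefly root: Starting NTP daemon: /usr/sbin/ntpd -g -u ntp:ntp",
    "Mar 14 18:38:52 Firefly ntpd[23095]: kernel reports TIME_ERROR: 0x2041: Clock Unsynchronized",
    "Mar 14 19:53:32 Firefly ool www[27271]: /usr/local/emhttp/plugins/dynamix/scripts/rsyslog_config",
    "Mar 14 19:54:29 Firefly kernel: veth2d9cb7a: renamed from eth0",
]

seconds, last_message = largest_gap(SAMPLE, 2021)
print(seconds, last_message)
```

On an idle server long gaps are normal, so the longest one is not proof of a hang. The gap is mainly useful when it ends in the start of a new boot sequence, because the message just before it is then the last thing logged before the server went down.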