6.9.0 Random Crashes/Restarts Since Upgrading



Hi All,

 

I'm hoping you can help me. I've recently upgraded to Unraid 6.9.0 and since then Unraid has been regularly crashing/restarting. I'm not sure if it's actually crashing, but I do know it's randomly rebooting.
As far as I know I changed nothing between 6.8.3, which was completely stable, and 6.9.0, which has had several restarts since.

 

My box runs 1 Windows 10 VM with a GPU passed through for rarely used remote gaming and then the usual stack of media related containers:

  • Plexs
  • Lidarr
  • Sonarrs
  • Radarrs
  • Bazarr
  • Jackett
  • NZBGet
  • Qbittorrent
  • AMD (automated music downloader)
  • Telegraf
  • HDDTemp
  • Intel-GPU-Tools
  • Unpackerr
  • Tautulli
  • Tdarr
  • Unraid-API

 

I've seen that my installed plugins are listed in the diagnostics, so I'll skip posting those here.

 

I haven't been doing anything out of the ordinary at the time of crashes, I've been fairly hands off Unraid the last few days. Most of these crashes I'm either in bed or just gaming (on my own PC not the VM), Unraid will just be running the same stack of VM/Containers outlined above that it has been doing for a while.

 

I thought about rolling back to 6.8.3, so I checked the "Update OS" page. I notice it shows 6.9.0-beta29 as my previous OS. I think this is because I used the old Nvidia plugin to revert to 6.8.3, as I was 100% on 6.8.3 a few days ago. Seeing this reminded me that I did try a beta version (I assume that was beta 29) and had the same issue then as well.


At the time I tried the following:

  • Memtest for around 12 hours (not long enough I know but see next point)
  • Swapped all RAM between my two Unraid boxes, original box continued to restart even with completely different RAM.
  • Fiddled with XMP (or the Intel term). I think I found that it crashed less with it off, so it's been left off, and from checking the current reported speeds in Unraid I believe it's still off.
  • Bought new PSU, originally had a Corsair 650, changed to Corsair 750.

I think I eventually gave up, moved back to 6.8.3 and put it down to beta issues, but now that 6.9.0 is a stable release I'm having the same problems.

 

Since the latest crashes the only thing I've tried was to keep "Enable VMs" off. I'd noticed after one of the crashes that it had been forcibly set to no so thought maybe that was the cause but it's still crashing since.

 

I updated to 6.9.0 (Not RC) 02/03/2021 - 15:09:19
Combing through syslog & my healthcheck notifications, these are the times Unraid restarted unexpectedly:
04/03/2021 - 00:54
04/03/2021 - 00:26
03/03/2021 - 23:55
03/03/2021 - 21:01
03/03/2021 - 18:41
03/03/2021 - 18:26
02/03/2021 - 23:26
02/03/2021 - 20:31
02/03/2021 - 18:53
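
To make the pattern in those times easier to see, here is a minimal Python sketch that turns the list into minutes of uptime between restarts. The year is an assumption taken from the diagnostics filename (20210304), boot time is ignored, and the function names are just illustrative.

```python
from datetime import datetime

# Restart times copied from the list above as day/month - hour:minute.
# Assumption: the year is 2021, taken from the diagnostics filename (20210304).
YEAR = 2021
RESTARTS = [
    "04/03 - 00:54", "04/03 - 00:26", "03/03 - 23:55",
    "03/03 - 21:01", "03/03 - 18:41", "03/03 - 18:26",
    "02/03 - 23:26", "02/03 - 20:31", "02/03 - 18:53",
]
UPGRADE = "02/03 - 15:09"  # seconds dropped from the upgrade timestamp


def parse(stamp: str) -> datetime:
    """Parse 'DD/MM - HH:MM' into a datetime in YEAR."""
    day_month, hour_minute = stamp.split(" - ")
    day, month = map(int, day_month.split("/"))
    hour, minute = map(int, hour_minute.split(":"))
    return datetime(YEAR, month, day, hour, minute)


def uptimes_minutes(upgrade: str, restarts: list[str]) -> list[int]:
    """Minutes between consecutive events, oldest first."""
    events = sorted(parse(s) for s in [upgrade, *restarts])
    return [int((b - a).total_seconds() // 60) for a, b in zip(events, events[1:])]


if __name__ == "__main__":
    gaps = uptimes_minutes(UPGRADE, RESTARTS)
    print(gaps)  # [224, 98, 175, 1140, 15, 140, 174, 31, 28]
    print("shortest:", min(gaps), "min, longest:", max(gaps), "min")
```

So the shortest run was about 15 minutes and the longest about 19 hours, with no obvious fixed interval.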

 

I've attached syslog files that are sent to a 2nd Unraid server. As far as I can see they don't show much, if anything, at the time of the restarts.
There are lots of sshd lines littering the syslog. These come from my 2nd Unraid box checking whether SSH is still active as part of my healthchecks. My apologies if it takes a while to sift through the syslog because of them.
I've also attached the diagnostics.zip file.

 

Hardware is:

  • CPU: Intel i7-9700K (No Overclock)
  • Motherboard: ASUSTeK COMPUTER INC. - TUF Z390M-PRO GAMING
  • RAM: 3x Corsair 16GB DDR4 running at 2133MHz. CMK16GX4M1D3000C16
  • GPU: GTX 1650 Super. IOMMU group is separated as I pass through this GPU to a Windows 10 VM.
  • LAN:
        1GB Motherboard Lan Port
        1GB USB To Ethernet Adapter (This one https://www.amazon.co.uk/gp/product/B003EDY97A/ref=ppx_yo_dt_b_search_asin_title?ie=UTF8&psc=1)
        They are meant to be bonded together but I just noticed that the USB NIC isn't part of it so will fix that soon.
  • PSU: Corsair RM750i
  • Storage:
        LSI SAS 9211-8i 8-port 6Gb/s PCI-E (This one https://www.ebay.co.uk/itm/LSI-SAS-9211-8i-8-port-6Gb-s-PCI-E-Internal-HBA-Both-Brackets-IT-MODE-P20/133048746300)
            2x 14TB WesternDigital HDD
            2x 12TB WesternDigital HDD
            2x 10TB WesternDigital HDD
            2x 8TB WesternDigital HDD
            1x 8TB Seagate HDD
            1TB Sabrent NVME
            16GB Sandisk Cruzer Blade USB for Unraid OS
            No parity configured.

        
        
I've googled for a few hours, found general threads around crashing on older versions mainly related to RAM or PSU which I hope I've ruled both out. I've been eagerly checking the latest threads to see if anyone else is having the same issue with 6.9.0 but it seems it's just me so thought I'd best just raise a thread myself.

Please let me know if you need any more info & let me know if there's anything you want me to try or if the cause is something really obvious that I've missed. Thank you in advance.

gdunraid-diagnostics-20210304-1401.zip Syslogs.zip

Link to comment

Unfortunately I can't see anything in the logs about the crashes, which usually points more to a hardware problem. One thing you can try is to boot the server in safe mode with all Docker containers and VMs disabled and let it run as a basic NAS for a few days. If it still crashes, it's likely a hardware problem; if it doesn't, start turning the other services back on one by one.

Link to comment

I wanted to add I'm having the exact same experience as you, but haven't had a chance to diagnose hardware issues.

 

Unraid 6.8.3 is stable as a rock for me; Unraid 6.9 is the troublemaker. I first had this issue on 6.9 RC2, figured it was an RC-related issue, and rolled back. Now that I'm on 6.9 stable, it's still the same thing. I've rolled back to 6.8.3 again, as this is looking like a software issue of some kind.

 

I wonder if this is somehow related to ICH's Nvidia Driver plugin? That seems to be a commonality between your system and mine - both using GPUs. 

 

Ryzen 3950x

64GB 3000Mhz

GTX 1650

EVGA 750w Gold

Edited by doubley
Link to comment

Hmmm, that's interesting. Noticed we both have 1650 variant cards, could be something or could be nothing.

 

Although checking now, I don't have the ICH Nvidia Driver plugin. I did have the original LinuxServer one, but as I'm not using the GPU at the Unraid level I've just removed it.

Will see how my crashes go. If I get another one I'll try removing the GPU completely, and if it crashes again I should have Plex & related tools set up on my 2nd box by then so I can try safe mode.

Link to comment

Cool - let me know how it goes. I have my server at a buddy's house ~6 hours away, so it's not as easy for me to swap hardware in and out.

 

Very interested to see the results. 

Link to comment

I did indeed get another crash. I saw there was a new Unraid release today so I tried that, but I've had another crash since.

I went to try safe mode but found that most of my disks don't show up, I assume because it's not loading the drivers for my LSI card. So for now I've reverted to normal boot but with the Docker engine turned off.

 

Will see how that goes. Will also see if there's a way to load just the drivers for my LSI card so I can use safe mode but still access Plex on a 2nd box using the files on the crashing one.

Link to comment

I have started getting crashes/unresponsiveness since upgrading from 6.9 RC2 to 6.9.0. I have a Quadro P2000 card doing transcoding for Plex, but it was working fine before the upgrade. I checked the IPMI display prior to restarting the server and the only thing it's showing that appears relevant is a kernel panic. I've rolled back to RC2 and will see if I'm still getting daily crashes.

 

**additional info- I have a custom IP address set for plex container so I can utilize my 2 1gig bonded nics (balance-alb). Some searching has indicated custom address on docker network may be causing problems. Anyone else have it setup this way and still having issues?

Edited by eqjunkie829
Link to comment

I just want to add to the chorus. I have an Intel-based system that was rock solid on 6.8.3 with a -30 voltage offset. Since upgrading to 6.9.0 and then 6.9.1, the system seems to hang every 24 hours or so. The web interface stops responding and there is no response to keyboard input either.

I have attached the diagnostic report but the syslog seems to be replaced each boot making it quite difficult to see what the issue is.

firefly-diagnostics-20210312-0036.zip

Link to comment

Just a quick update: with the Docker engine stopped on my main box I haven't had a reboot so far. It's at 1 day 22 hours uptime, which is the longest it's gone.

I'll wait for at least a week of uptime before re-enabling the Docker engine, then see if the restarts come back and work out which container is causing them.
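
One way to narrow down the container, sketched in Python below, is to enable half of the remaining suspects at a time instead of one by one. This is only a sketch and rests on two big assumptions: that a single container is responsible on its own, and that a run with no restart really means the culprit was in the disabled half. The container names follow the list in the first post and the helper names are made up.

```python
CONTAINERS = [
    "Plex", "Lidarr", "Sonarr", "Radarr", "Bazarr", "Jackett", "NZBGet",
    "Qbittorrent", "AMD", "Telegraf", "HDDTemp", "Intel-GPU-Tools",
    "Unpackerr", "Tautulli", "Tdarr", "Unraid-API",
]


def split(suspects):
    """Return (containers to enable this round, containers to leave off)."""
    mid = (len(suspects) + 1) // 2
    return suspects[:mid], suspects[mid:]


def narrow(suspects, crashed):
    """Keep the enabled half if the server restarted, else the disabled half."""
    enabled, disabled = split(suspects)
    return enabled if crashed else disabled


def rounds_to_isolate(containers, culprit):
    """Simulate the search for a known culprit and count the test rounds."""
    suspects, rounds = list(containers), 0
    while len(suspects) > 1:
        enabled, _ = split(suspects)
        suspects = narrow(suspects, crashed=culprit in enabled)
        rounds += 1
    return suspects, rounds


if __name__ == "__main__":
    # "Tdarr" is an arbitrary pick to show the mechanics, not a diagnosis.
    print(rounds_to_isolate(CONTAINERS, "Tdarr"))  # (['Tdarr'], 4)
```

With 16 containers that is 4 test rounds rather than up to 16, which matters when each round means waiting days for a restart.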

 

I should also mention there's a mix of reports in this thread: some people's servers hang and become completely unresponsive until rebooted, while others just restart randomly by themselves and otherwise act fine.

 

My issue is the latter one, my server is completely "fine" beforehand, randomly reboots and then comes back up "fine" by itself and continues working. It doesn't hang and I don't have to powercycle it for it to come back.
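
To tell the two cases apart from healthcheck data, a rough Python sketch: if each check records the server's uptime in seconds, or nothing when the server doesn't answer, then a drop in uptime means a reboot and a run of missing answers means a hang or an unreachable box. How the uptime is collected (for example reading /proc/uptime over SSH) is an assumption, and the sample values are invented.

```python
def classify(samples):
    """samples: uptime in seconds per check, or None when there was no answer.

    Returns a list of (check index, event) tuples.
    """
    events = []
    previous = None
    for index, uptime in enumerate(samples):
        if uptime is None:
            events.append((index, "no response"))
            continue
        if previous is not None and uptime < previous:
            events.append((index, "rebooted"))
        previous = uptime
    return events


if __name__ == "__main__":
    # Invented samples, one check per minute.
    print(classify([3600, 3660, 40, 100]))       # [(2, 'rebooted')]
    print(classify([3600, 3660, None, None]))    # [(2, 'no response'), (3, 'no response')]
```

A real reboot will usually also show one or two missed checks while the box boots, so both events can appear together.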

Link to comment
On 3/11/2021 at 3:09 PM, b0rgi85 said:

I have an Intel-based setup, too.
No NVIDIA card, just running Docker containers. The system randomly reboots and wants to do a parity check after rebooting. When I cancel the parity check after a few minutes, the system reboots.

Hope there will be help ASAP.

 

Here are my logs:
 

b0rgis-unraid-diagnostics-20210311-1458.zip 90.01 kB · 1 download

I downgraded to 6.9.0 and the system now runs for longer than one day.

But I get the restarts when I start to stream something through Plex.

Is the problem related to the transcoding?

Link to comment

OK, update: I have shifted some ports around and moved pihole from a secondary IP on my ethernet interface to everything sharing a single IP (pihole was 192.168.1.9, everything else 192.168.1.10). So far the system has been up 1 day and 3 hours. Networking could be the issue. I'm not sure the hang and restart states are actually that different; it might just be how individual systems deal with the freeze. It does look like it might be tied to networking though?

 

Was doing it on 6.9.0 and 6.9.1, still on 6.9.1

 

 

EDIT: Scratch that, just went down again, 28hr uptime. This is getting boring.....

 

I have turned on USB save of syslog so hopefully something appears in there but from previous reports I don't have a lot of hope.

Edited by Tristankin
Link to comment
1 hour ago, JorgeB said:

Anyone having issues using a custom IP address for docker(s)? That's a known issue and it can crash Unraid, more info below:

 

 

 

 

I removed the custom IP from pihole on 6.9.1 to try to fix the issue (it was perfectly fine on 6.8.3), but the server still ended up hung.

 

There is some potentially weird stuff happening with some of the bridges in the syslog but I really am not sure what I am meant to be looking for..

 

Mar 14 18:33:13 Firefly root: Starting NTP daemon:  /usr/sbin/ntpd -g -u ntp:ntp
Mar 14 18:38:52 Firefly ntpd[23095]: kernel reports TIME_ERROR: 0x2041: Clock Unsynchronized
Mar 14 19:53:32 Firefly ool www[27271]: /usr/local/emhttp/plugins/dynamix/scripts/rsyslog_config
Mar 14 19:53:34 Firefly rsyslogd: [origin software="rsyslogd" swVersion="8.2002.0" x-pid="28603" x-info="https://www.rsyslog.com"] start
Mar 14 19:54:29 Firefly kernel: veth2d9cb7a: renamed from eth0
Mar 14 19:54:29 Firefly kernel: br-b33c13ba4d4e: port 4(veth4b15139) entered disabled state
Mar 14 19:54:29 Firefly kernel: br-b33c13ba4d4e: port 4(veth4b15139) entered disabled state
Mar 14 19:54:29 Firefly kernel: device veth4b15139 left promiscuous mode
Mar 14 19:54:29 Firefly kernel: br-b33c13ba4d4e: port 4(veth4b15139) entered disabled state
Mar 14 19:54:29 Firefly kernel: br-b33c13ba4d4e: port 4(vethe4c338c) entered blocking state
Mar 14 19:54:29 Firefly kernel: br-b33c13ba4d4e: port 4(vethe4c338c) entered disabled state
Mar 14 19:54:29 Firefly kernel: device vethe4c338c entered promiscuous mode
Mar 14 19:54:29 Firefly kernel: br-b33c13ba4d4e: port 4(vethe4c338c) entered blocking state
Mar 14 19:54:29 Firefly kernel: br-b33c13ba4d4e: port 4(vethe4c338c) entered forwarding state
Mar 14 19:54:29 Firefly kernel: eth0: renamed from veth5b308b3
Mar 14 19:54:29 Firefly kernel: IPv6: ADDRCONF(NETDEV_CHANGE): vethe4c338c: link becomes ready
Mar 14 19:55:44 Firefly ool www[28995]: /usr/local/emhttp/plugins/dynamix/scripts/rsyslog_config
Mar 14 19:55:46 Firefly rsyslogd: [origin software="rsyslogd" swVersion="8.2002.0" x-pid="30204" x-info="https://www.rsyslog.com"] start
Mar 14 19:57:57 Firefly kernel: docker0: port 4(veth33796e7) entered blocking state
Mar 14 19:57:57 Firefly kernel: docker0: port 4(veth33796e7) entered disabled state
Mar 14 19:57:57 Firefly kernel: device veth33796e7 entered promiscuous mode
Mar 14 19:57:57 Firefly kernel: docker0: port 4(veth33796e7) entered blocking state
Mar 14 19:57:57 Firefly kernel: docker0: port 4(veth33796e7) entered forwarding state
Mar 14 19:57:57 Firefly kernel: docker0: port 4(veth33796e7) entered disabled state
Mar 14 19:57:57 Firefly kernel: eth0: renamed from veth7755757
Mar 14 19:57:57 Firefly kernel: IPv6: ADDRCONF(NETDEV_CHANGE): veth33796e7: link becomes ready
Mar 14 19:57:57 Firefly kernel: docker0: port 4(veth33796e7) entered blocking state
Mar 14 19:57:57 Firefly kernel: docker0: port 4(veth33796e7) entered forwarding state
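
For reading excerpts like this, a small Python sketch that groups the "entered ... state" messages by bridge and veth device can help show where each port ended up. The regular expression is written against the lines above only and is an assumption about the message format in general; the function name is made up.

```python
import re
from collections import defaultdict

# Matches e.g. "kernel: docker0: port 4(veth33796e7) entered blocking state"
PORT_RE = re.compile(r"kernel: (\S+): port (\d+)\((\w+)\) entered (\w+) state")

SAMPLE = """\
Mar 14 19:57:57 Firefly kernel: docker0: port 4(veth33796e7) entered blocking state
Mar 14 19:57:57 Firefly kernel: docker0: port 4(veth33796e7) entered disabled state
Mar 14 19:57:57 Firefly kernel: device veth33796e7 entered promiscuous mode
Mar 14 19:57:57 Firefly kernel: docker0: port 4(veth33796e7) entered blocking state
Mar 14 19:57:57 Firefly kernel: docker0: port 4(veth33796e7) entered forwarding state
Mar 14 19:57:57 Firefly kernel: docker0: port 4(veth33796e7) entered disabled state
Mar 14 19:57:57 Firefly kernel: eth0: renamed from veth7755757
Mar 14 19:57:57 Firefly kernel: IPv6: ADDRCONF(NETDEV_CHANGE): veth33796e7: link becomes ready
Mar 14 19:57:57 Firefly kernel: docker0: port 4(veth33796e7) entered blocking state
Mar 14 19:57:57 Firefly kernel: docker0: port 4(veth33796e7) entered forwarding state
""".splitlines()


def port_history(lines):
    """Map (bridge, veth) to the ordered list of states the port entered."""
    history = defaultdict(list)
    for line in lines:
        match = PORT_RE.search(line)
        if match:
            bridge, _port, veth, state = match.groups()
            history[(bridge, veth)].append(state)
    return dict(history)


if __name__ == "__main__":
    for (bridge, veth), states in port_history(SAMPLE).items():
        print(bridge, veth, "->", states[-1], f"({len(states)} transitions)")
    # docker0 veth33796e7 -> forwarding (7 transitions)
```

In the docker0 excerpt the port ends in the forwarding state.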

 

my current docker loadout and config

[Image: screenshot of the Docker container list and configuration]

Edited by Tristankin
Link to comment
