6.9.0 Random Crashes/Restarts Since Upgrading

Migz93 · March 4, 2021

Hi All,

I'm hoping you can help me. I've recently upgraded to Unraid 6.9.0 and since then I've been having regular occurrences of Unraid crashing/restarting. I'm not sure if it's actually crashing, but I do know it's randomly rebooting.
As far as I know I've changed nothing between 6.8.3 which was completely stable and 6.9.0 which has had several restarts since.

My box runs 1 Windows 10 VM with a GPU passed through for rarely used remote gaming and then the usual stack of media related containers:

Plexs
Lidarr
Sonarrs
Radarrs
Bazarr
Jackett
NZBGet
Qbittorrent
AMD (automated music downloader)
Telegraf
HDDTemp
Intel-GPU-Tools
Unpackerr
Tautulli
Tdarr
Unraid-API

I've seen that my installed plugins are listed in the diagnostic info, so I'll skip posting those.

I haven't been doing anything out of the ordinary at the time of crashes, I've been fairly hands off Unraid the last few days. Most of these crashes I'm either in bed or just gaming (on my own PC not the VM), Unraid will just be running the same stack of VM/Containers outlined above that it has been doing for a while.

I thought about rolling back to 6.8.3 so checked the "Update OS" page. I notice it shows 6.9.0-beta29 as my previous OS. I think this is because I used the old Nvidia plugin to revert back to 6.8.3, as I was 100% on 6.8.3 a few days ago. Seeing this though reminded me that I did try a beta version (I assume that was beta 29) and had the same issue then as well.

At the time I tried the following:

Memtest for around 12 hours (not long enough I know but see next point)
Swapped all RAM between my two Unraid boxes, original box continued to restart even with completely different RAM.
Fiddled with XMP (or the Intel term). I think I found it crashed less with it off? So it's been left off, and from checking the current reported speeds in Unraid I believe it's still off.
Bought new PSU, originally had a Corsair 650, changed to Corsair 750.

I think eventually I gave up and moved back to 6.8.3 and put it down to beta issues, but now that 6.9.0 is a stable release I'm having the same problems.

Since the latest crashes the only thing I've tried was to keep "Enable VMs" off. I'd noticed after one of the crashes that it had been forcibly set to no so thought maybe that was the cause but it's still crashing since.

I updated to 6.9.0 (Not RC) 02/03/2021 - 15:09:19
Combing through syslog & my healthcheck notifications, these are the times Unraid restarted unexpectedly:
04/03/2021 - 00:54
04/03/2021 - 00:26
03/03/2021 - 23:55
03/03/2021 - 21:01
03/03/2021 - 18:41
03/03/2021 - 18:26
02/03/2021 - 23:26
02/03/2021 - 20:31
02/03/2021 - 18:53
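To see how irregular the restarts are, the gaps between them can be worked out with a few lines of Python. This is only a rough sketch: the function name `restart_gaps` is made up for illustration, the timestamps are the ones listed above in DD/MM/YYYY - HH:MM format, and the year is written as 2021 here to match the posting date.

```python
from datetime import datetime

def restart_gaps(stamps):
    """Return the minutes between consecutive restarts, oldest first."""
    times = sorted(datetime.strptime(s, "%d/%m/%Y - %H:%M") for s in stamps)
    return [int((b - a).total_seconds() // 60) for a, b in zip(times, times[1:])]

# Restart times from the list above (year assumed to be 2021).
restarts = [
    "04/03/2021 - 00:54", "04/03/2021 - 00:26", "03/03/2021 - 23:55",
    "03/03/2021 - 21:01", "03/03/2021 - 18:41", "03/03/2021 - 18:26",
    "02/03/2021 - 23:26", "02/03/2021 - 20:31", "02/03/2021 - 18:53",
]

gaps = restart_gaps(restarts)
print(gaps)  # [98, 175, 1140, 15, 140, 174, 31, 28]
```

The gaps range from 15 minutes to 19 hours, so there is no obvious schedule behind the restarts.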

I've attached syslog files that are outputted to a 2nd Unraid server. As far as I can see they don't show much/anything at the time of restarts.
There are lots of sshd lines littering the syslog, these are from my 2nd Unraid box checking if SSH is still active as part of my healthchecks. My apologies if it takes a while to sift through the syslog because of them.
I've also attached the diagnostics.zip file.
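If the sshd noise gets in the way, something like the following could strip it out before reading. It is a minimal sketch that assumes the usual "Mon DD HH:MM:SS host tag[pid]: message" syslog layout; the name `without_sshd` is made up, and the sshd sample line (including its PID, address and port) is hypothetical, not copied from my logs.

```python
import re

# Matches "Mar 14 18:38:52 Firefly ntpd[23095]: ..." and captures the program tag.
TAG = re.compile(r"^\w{3}\s+\d+\s+[\d:]{8}\s+\S+\s+(?P<tag>[^\s:\[]+)")

def without_sshd(lines):
    """Yield syslog lines whose program tag is not sshd."""
    for line in lines:
        match = TAG.match(line)
        if match and match.group("tag") == "sshd":
            continue  # healthcheck noise, skip it
        yield line

sample = [
    "Mar 14 18:38:52 Firefly ntpd[23095]: kernel reports TIME_ERROR: 0x2041: Clock Unsynchronized",
    # Hypothetical sshd line, for illustration only.
    "Mar 14 18:39:01 Firefly sshd[1234]: Connection closed by 192.168.1.20 port 50000 [preauth]",
    "Mar 14 19:54:29 Firefly kernel: veth2d9cb7a: renamed from eth0",
]

for line in without_sshd(sample):
    print(line)
```

Lines that don't match the expected layout are passed through untouched, so nothing other than sshd entries gets dropped.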

Hardware is:

CPU: Intel i7-9700K (No Overclock)
Motherboard: ASUSTeK COMPUTER INC. - TUF Z390M-PRO GAMING
RAM: 3x Corsair 16GB DDR4 2133MHz. CMK16GX4M1D3000C16
GPU: GTX 1650 Super. IOMMU group is separated as I pass through this GPU to a Windows 10 VM.
LAN:
   1GB Motherboard Lan Port
   1GB USB To Ethernet Adapter (This one https://www.amazon.co.uk/gp/product/B003EDY97A/ref=ppx_yo_dt_b_search_asin_title?ie=UTF8&psc=1)
   They are meant to be bonded together but I just noticed that the USB NIC isn't part of it so will fix that soon.
PSU: Corsair RM750i
Storage:
   LSI SAS 9211-8i 8-port 6Gb/s PCI-E (This one https://www.ebay.co.uk/itm/LSI-SAS-9211-8i-8-port-6Gb-s-PCI-E-Internal-HBA-Both-Brackets-IT-MODE-P20/133048746300)
       2x 14TB WesternDigital HDD
       2x 12TB WesternDigital HDD
       2x 10TB WesternDigital HDD
       2x 8TB WesternDigital HDD
       1x 8TB Seagate HDD
       1TB Sabrent NVME
16GB Sandisk Cruzer Blade USB for Unraid OS
       No parity configured.

I've googled for a few hours and found general threads around crashing on older versions, mainly related to RAM or PSU, both of which I hope I've ruled out. I've been eagerly checking the latest threads to see if anyone else is having the same issue with 6.9.0 but it seems it's just me, so I thought I'd best just raise a thread myself.

Please let me know if you need any more info & let me know if there's anything you want me to try or if the cause is something really obvious that I've missed. Thank you in advance.

gdunraid-diagnostics-20210304-1401.zip Syslogs.zip

JorgeB · March 4, 2021

Unfortunately there's nothing in the logs that I can see about the crashes, which usually points more to a hardware problem. One thing you can try is to boot the server in safe mode with all Docker containers/VMs disabled and let it run as a basic NAS for a few days. If it still crashes it's likely a hardware problem; if it doesn't, start turning the other services back on one by one.

Migz93 · March 7, 2021

Thanks for having a look Jorge.

I will get it into safe mode ASAP and see if it continues to crash, I just need to set up my 2nd box to serve my Plex first.

doubley · March 8, 2021

I wanted to add I'm having the exact same experience as you, but haven't had a chance to diagnose hardware issues.

Unraid 6.8.3 is solid as a rock for me; Unraid 6.9 is the troublemaker. I first had this issue on 6.9 RC2, figured it was an RC-related issue, and rolled back. Now that I'm on 6.9 stable, it's still the same thing. I've rolled back to 6.8.3 again, as it's seeming like this is a software issue of some kind.

I wonder if this is somehow related to ICH's Nvidia Driver plugin? That seems to be a commonality between your system and mine - both using GPUs.

Ryzen 3950x

64GB 3000MHz

GTX 1650

EVGA 750w Gold

Edited March 8, 2021 by doubley

Migz93 · March 8, 2021

Hmmm, that's interesting. Noticed we both have 1650 variant cards, could be something or could be nothing.

Although checking now, I don't have the ICH Nvidia Driver plugin. I did have the original LinuxServer one, but as I'm not using the GPU at the Unraid level I've just removed it.

Will see how my crashes go. If I get another one I'll try removing the GPU completely, and if it crashes again I should have Plex & related tools set up on my 2nd box and I can try safe mode.

ChadwickTheCrab · March 8, 2021

I am also having a terrible experience switching to 6.9.0. I had 200 something days of uptime and now every morning my server is unresponsive. I have no Nvidia card, no VMs. Just docker containers.

a_bomb · March 8, 2021

I am having the same issue. I have a 1050 in my server. Though it seems to be fine in 6.8.3.

doubley · March 9, 2021

22 hours ago, Migz93 said:

Hmmm, that's interesting. Noticed we both have 1650 variant cards, could be something or could be nothing.

Although checking now, I don't have the ICH Nvidia Driver plugin. I did have the original LinuxServer one, but as I'm not using the GPU at the Unraid level I've just removed it.

Will see how my crashes go. If I get another one I'll try removing the GPU completely, and if it crashes again I should have Plex & related tools set up on my 2nd box and I can try safe mode.

Cool - let me know how it goes. I have my server at a buddy's house ~6 hours away, so it's not as easy for me to swap hardware in and out.

Very interested to see the results.

Migz93 · March 9, 2021

I did indeed get another crash since, I saw there was a new unraid release today so I tried that but still another crash since.

I went to try safe mode but then found most of my disks don't show, I assume because it's not loading the drivers for my LSI card, so for now I've reverted to normal boot but with the Docker engine turned off.

Will see how that goes. Will also see if there's a way for me to just load the drivers for my LSI card so I can use safe mode but still access Plex on a 2nd box using the files on the crashing one.

eqjunkie829 · March 10, 2021

I have started getting crashes/unresponsiveness since upgrading from 6.9 RC2 to 6.9.0. I have a Quadro P2000 card doing transcoding for Plex, but it was working fine before the upgrade. I checked the IPMI display prior to restarting the server and the only thing it's showing that appears relevant is a kernel panic. I've rolled back to RC2 and will see if I'm still getting daily crashes.

Additional info: I have a custom IP address set for the Plex container so I can utilize my 2 bonded 1-gig NICs (balance-alb). Some searching has indicated a custom address on the Docker network may be causing problems. Anyone else have it set up this way and still having issues?

Edited March 10, 2021 by eqjunkie829

Tristankin · March 11, 2021

I just want to add to the chorus. I have an Intel-based system that was rock solid on 6.8.3 with a -30 voltage offset. Since upgrading to both 6.9.0 and 6.9.1 the system seems to hang every 24 hours or so. I notice the web interface stops responding and there is also no response to keyboard inputs.

I have attached the diagnostic report but the syslog seems to be replaced each boot making it quite difficult to see what the issue is.

firefly-diagnostics-20210312-0036.zip

b0rgi85 · March 11, 2021

I have an Intel-based setup, too.
No NVIDIA card and just running Docker containers. The system randomly reboots and wants to do a parity check after rebooting. When I cancel the parity check, the system reboots after a few minutes.

Hope there will be help ASAP.

Here are my logs:

b0rgis-unraid-diagnostics-20210311-1458.zip

Migz93 · March 11, 2021

Just a quick update, with docker engine stopped on my main box I so far haven't had a reboot since, 1 day 22 hours uptime which is the longest it's gone.

Although I will wait for at least a week of uptime before re-enabling the Docker engine, then see if it starts restarting again and work out which container is causing it.
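With this many containers, testing them one at a time for a week each would take months, so halving the set each round is probably the quicker route. Below is a rough sketch of that idea, assuming there is exactly one culprit and that it reliably causes a restart within the observation period (neither of which I know to be true). The names `find_culprit` and `simulated_week` are made up, and picking Tdarr as the culprit is purely arbitrary for the simulation.

```python
from typing import Callable, List, Sequence

def find_culprit(containers: Sequence[str],
                 crashes_with: Callable[[List[str]], bool]) -> str:
    """Narrow down a single bad container by halving the candidates each round."""
    candidates = list(containers)
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        # Run only `half` for the observation period and keep whichever half crashes.
        if crashes_with(half):
            candidates = half
        else:
            candidates = candidates[len(half):]
    return candidates[0]

# Simulation only: pretend Tdarr is the culprit (arbitrary choice, not a finding).
stack = ["Bazarr", "Jackett", "NZBGet", "Tautulli",
         "Tdarr", "Telegraf", "Unpackerr", "HDDTemp"]
rounds = []

def simulated_week(enabled: List[str]) -> bool:
    rounds.append(list(enabled))
    return "Tdarr" in enabled

print(find_culprit(stack, simulated_week), "found in", len(rounds), "rounds")
# Tdarr found in 3 rounds
```

In practice `crashes_with` is me running that subset for a week and watching the uptime, so 8 containers would take about 3 weeks instead of 8.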

I should also mention there's a mix of people having their server hang and become completely unresponsive until rebooted and people who have their server just restart randomly by itself but excluding the reboot part the server is acting fine.

My issue is the latter one, my server is completely "fine" beforehand, randomly reboots and then comes back up "fine" by itself and continues working. It doesn't hang and I don't have to powercycle it for it to come back.

a_bomb · March 12, 2021

I am in the random restart by itself group with 6.9 and 6.9.1

b0rgi85 · March 13, 2021

On 3/11/2021 at 3:09 PM, b0rgi85 said:

I have an Intel-based setup, too.
No NVIDIA card and just running Docker containers. The system randomly reboots and wants to do a parity check after rebooting. When I cancel the parity check, the system reboots after a few minutes.

Hope there will be help ASAP.

Here are my logs:

b0rgis-unraid-diagnostics-20210311-1458.zip

I downgraded to 6.9.0 and the system is running longer than one day.

But I get the restarts when I start to stream something through Plex.

Is the problem related to the transcoding?

Qubix1 · March 14, 2021

I am also having this random crash / hang issue. Was rock solid stable for months with my i7 9700k and Asus Z390-P Prime. Since the 6.9 and then 6.9.1 update, been having crashes out of the blue for no apparent reason.

doubley · March 14, 2021

I hope this gets some attention from @limetech. Seems to be a common issue. Anything we can do to help diagnose?

Qubix1 · March 14, 2021

Server restarted 45 minutes after a restart, reverted back to 6.9.0 for now.

Tristankin · March 14, 2021

OK, update: I have shifted some ports around and moved pihole from a secondary IP on my ethernet interface to everything sharing a single IP (pihole was 192.168.1.9, everything else 192.168.1.10). So far the system has been up 1 day and 3 hours. Networking could be the issue. Not sure hang/restart states are actually that different, might be just how individual systems deal with the freeze. Does look like it might be tied to networking though?

Was doing it on 6.9.0 and 6.9.1, still on 6.9.1

EDIT: Scratch that, just went down again, 28hr uptime. This is getting boring.....

I have turned on USB save of syslog so hopefully something appears in there but from previous reports I don't have a lot of hope.

Edited March 14, 2021 by Tristankin

JorgeB · March 14, 2021

Is anyone having issues using a custom IP address for Docker container(s)? That's a known issue and it can crash Unraid, more info below:

Tristankin · March 14, 2021

1 hour ago, JorgeB said:

Is anyone having issues using a custom IP address for Docker container(s)? That's a known issue and it can crash Unraid, more info below:

I removed the custom IP from pihole which was perfectly fine in 6.8.3 and tried removing in 6.9.1 to potentially fix the issue but the server still ended up hung.

There is some potentially weird stuff happening with some of the bridges in the syslog but I really am not sure what I am meant to be looking for..

Mar 14 18:33:13 Firefly root: Starting NTP daemon:  /usr/sbin/ntpd -g -u ntp:ntp
Mar 14 18:38:52 Firefly ntpd[23095]: kernel reports TIME_ERROR: 0x2041: Clock Unsynchronized
Mar 14 19:53:32 Firefly ool www[27271]: /usr/local/emhttp/plugins/dynamix/scripts/rsyslog_config
Mar 14 19:53:34 Firefly rsyslogd: [origin software="rsyslogd" swVersion="8.2002.0" x-pid="28603" x-info="https://www.rsyslog.com"] start
Mar 14 19:54:29 Firefly kernel: veth2d9cb7a: renamed from eth0
Mar 14 19:54:29 Firefly kernel: br-b33c13ba4d4e: port 4(veth4b15139) entered disabled state
Mar 14 19:54:29 Firefly kernel: br-b33c13ba4d4e: port 4(veth4b15139) entered disabled state
Mar 14 19:54:29 Firefly kernel: device veth4b15139 left promiscuous mode
Mar 14 19:54:29 Firefly kernel: br-b33c13ba4d4e: port 4(veth4b15139) entered disabled state
Mar 14 19:54:29 Firefly kernel: br-b33c13ba4d4e: port 4(vethe4c338c) entered blocking state
Mar 14 19:54:29 Firefly kernel: br-b33c13ba4d4e: port 4(vethe4c338c) entered disabled state
Mar 14 19:54:29 Firefly kernel: device vethe4c338c entered promiscuous mode
Mar 14 19:54:29 Firefly kernel: br-b33c13ba4d4e: port 4(vethe4c338c) entered blocking state
Mar 14 19:54:29 Firefly kernel: br-b33c13ba4d4e: port 4(vethe4c338c) entered forwarding state
Mar 14 19:54:29 Firefly kernel: eth0: renamed from veth5b308b3
Mar 14 19:54:29 Firefly kernel: IPv6: ADDRCONF(NETDEV_CHANGE): vethe4c338c: link becomes ready
Mar 14 19:55:44 Firefly ool www[28995]: /usr/local/emhttp/plugins/dynamix/scripts/rsyslog_config
Mar 14 19:55:46 Firefly rsyslogd: [origin software="rsyslogd" swVersion="8.2002.0" x-pid="30204" x-info="https://www.rsyslog.com"] start
Mar 14 19:57:57 Firefly kernel: docker0: port 4(veth33796e7) entered blocking state
Mar 14 19:57:57 Firefly kernel: docker0: port 4(veth33796e7) entered disabled state
Mar 14 19:57:57 Firefly kernel: device veth33796e7 entered promiscuous mode
Mar 14 19:57:57 Firefly kernel: docker0: port 4(veth33796e7) entered blocking state
Mar 14 19:57:57 Firefly kernel: docker0: port 4(veth33796e7) entered forwarding state
Mar 14 19:57:57 Firefly kernel: docker0: port 4(veth33796e7) entered disabled state
Mar 14 19:57:57 Firefly kernel: eth0: renamed from veth7755757
Mar 14 19:57:57 Firefly kernel: IPv6: ADDRCONF(NETDEV_CHANGE): veth33796e7: link becomes ready
Mar 14 19:57:57 Firefly kernel: docker0: port 4(veth33796e7) entered blocking state
Mar 14 19:57:57 Firefly kernel: docker0: port 4(veth33796e7) entered forwarding state

my current docker loadout and config

Edited March 14, 2021 by Tristankin

JorgeB · March 14, 2021

47 minutes ago, Tristankin said:

There is some potentially weird stuff happening with some of the bridges in the syslog

Those are normal, syslog server might help if it catches anything.

Tristankin · March 14, 2021

Yeah, I will be checking in tomorrow I guess when it goes down again.

Tristankin · March 14, 2021

Well, we got (un)lucky, it went down again. I rebooted at 11:02, log is pretty bare?

syslog

JorgeB · March 14, 2021

11 minutes ago, Tristankin said:

log is pretty bare?

Yep, unfortunately there's nothing about the crash.
