Unraid crashing and bringing down entire network



I am having issues with my Unraid server crashing, and when it does, my network stops working. Alexa doesn't work, streaming stops, and I can't browse the web. If I unplug the network cable from the server, everything is restored within a couple of minutes. The only fix for the server itself, however, is a hard reboot. My issue is very similar to the one reported in this older forum post: https://forums.unraid.net/topic/59142-unraid-crashing-and-taking-down-network/?_fromLogin=1

I followed the advice given in that thread and tailed the syslog on my monitor. That didn't work for me because, when the server froze, my screen was filled with crash messages rather than output from the syslog. So I decided to pipe the tail to tee, so that the output goes both to the screen and to a syslogcopy file.
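For anyone wanting to try the same thing, here is a minimal sketch of the tail-to-tee approach described above. The function name `mirror_log` is made up for illustration, and the paths in the usage note are assumptions: `/var/log/syslog` as the live log and a file on the flash drive under `/boot` as the copy, so that the copy is still there after a hard reset.

```shell
#!/bin/bash
# Follow a log file and mirror every new line to the screen and to a copy.
# Usage: mirror_log SOURCE COPY   (hypothetical helper, not part of Unraid)
mirror_log() {
    local src="$1" copy="$2"
    # -n 0   : print only lines written from now on
    # -F     : keep following the file even if it is rotated or recreated
    # tee -a : append to the copy instead of overwriting an earlier one
    tail -n 0 -F "$src" | tee -a "$copy"
}
```

It could be started from the console with something like `mirror_log /var/log/syslog /boot/syslogcopy.log` and left running until the next crash. Appending rather than overwriting means the lines captured before a crash are not lost if the same command is started again after the reboot.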

There are two syslogcopy files. The first (not '2') is the one that was running from the last reboot to this crash. The second file ('2') is from this recent reboot up until I took a copy to paste in here. I have 3 photos of my screen taken at the moment Unraid crashed on 3 separate occasions. I have also attached a diagnostics zip taken just after this most recent reboot. I am averaging a crash every two days now. I can barely complete one parity check before the next crash. I had one happen within a few hours of another. I cannot think of anything that would cause this. I had been stable for 2 months prior to this.

Please advise.

IMG_0981.jpg

IMG_0982.jpg

IMG_0984.jpg

syslogcopy.log syslogcopy2.log unraid-diagnostics-20190927-0109.zip

Link to comment

Hi there,

 

Can I start by asking what recently changed?  If things were working solidly for 2+ months and then all of a sudden this started, one of three things comes to mind:

 

1)  An update to the OS / reboot and since then this has been happening.

2)  Something was changed in the hardware/settings/OS configuration and since then this has been happening.

3)  Nothing has changed but these issues just started out of nowhere.

 

If #3 is the answer, then it is likely a hardware-specific issue (something faulty in the electronics), but if it's 1 or 2, that will greatly help us in narrowing down the root cause.

Link to comment

I've seen this happen, and it has happened on at least two of my motherboards. I don't know if it's hardware specific, but it often happens when a VM crashes, and when I try to reboot it, it literally takes everything down with it.

 

I literally had to get something like https://store.resetplug.com/ (but the AliExpress version) to power-cycle the modem and router when Unraid crashes.

Extremely frustrating...

Link to comment

To the best of my knowledge, nothing major has changed. There have been VM updates and Docker updates, but I've been on 6.7.2 for a while now. I had added a new Docker container, but in my troubleshooting I turned it off and disabled autostart. That still did not fix the crashing. To the best of my knowledge, I am number 3, but nothing hardware related has changed in quite a while.

Link to comment
3 hours ago, rottenpotatoes said:

To the best of my knowledge, nothing major has changed. There have been VM updates and Docker updates, but I've been on 6.7.2 for a while now. I had added a new Docker container, but in my troubleshooting I turned it off and disabled autostart. That still did not fix the crashing. To the best of my knowledge, I am number 3, but nothing hardware related has changed in quite a while.

Interesting.  The best thing I can suggest is to turn off both Docker and VMs first and see if the system remains stable.  Then slowly turn things back on one by one until you reproduce the issue.  That way we will at least have narrowed down the specific thing that is causing you to enter this state.

 

The problem here is that we've never been able to reproduce this issue internally, but if we can get more insight into how systems end up in this state to begin with, it would be very helpful.

Link to comment

Weird, that span port is still there.

Try the following:

1. Stop docker service and VM service (see settings)

2. Change your network setting for eth0 to "enable bonding = NO"

3. It seems you don't use IPv6, select "Network protocol = ipv4 only"

 

Restart your system in safe mode and post your diagnostics again.

Link to comment

That was interesting. I stopped my Docker service and, after shutting down my VMs that auto-boot, the VM service. I then went to network settings to check 2 and 3 above. It was already set to IPv4 only, so I tried to change bonding to no. When I tried to hit save, my machine crashed again. I took a photo of my monitor and I'll attach it. I had to do a hard reset. After boot, I went back again to disable bonding. This time, when I tried to hit apply, I got an error stating that my flash drive was not mounted in R/W mode. This time I did a reboot from the Main page. After that boot I was able to apply the bonding setting. Then I rebooted and entered safe mode. I downloaded my diagnostics and have attached the new file.

IMG_0994.jpg

unraid-diagnostics-20191002-1536.zip

Link to comment

You have a lot of interfaces which are not supposed to be there: erspan0, gre0, gretap0, ip_vti0, sit0 & tun10.

These all have to do with tunneling and monitoring, but nothing in stock Unraid sets up tunnel interfaces or monitoring.

I can't explain where these are coming from.
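A quick way to check for these on the console is sketched below. It is only an illustration: the function name `flag_tunnel_ifaces` is made up, and the pattern simply matches the interface name prefixes listed above, so it says nothing about where the interfaces come from.

```shell
#!/bin/bash
# Read `ip -o link show` output on stdin and print the names of any
# interfaces that look like the tunnel/monitoring devices listed above.
# Hypothetical helper, not part of Unraid.
flag_tunnel_ifaces() {
    # Each line looks like "3: gre0@NONE: <NOARP> mtu 1476 ...".
    # Take the second ': '-separated field and drop any "@parent" suffix.
    awk -F': ' '{ split($2, n, "@"); print n[1] }' |
        grep -E '^(erspan|gre|gretap|ip_vti|sit|tun)[0-9]+$'
}
```

Run it as `ip -o link show | flag_tunnel_ifaces`. If it prints nothing, none of those interface names are present.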

 

Check the "Network Stack Configuration" setting in your BIOS and make sure it is set to "disabled".

 

Link to comment

Another thing to try for you :)

Disable the Docker service and the VM service (see settings), so they are not running and won't start after a reboot.

Now reboot your system and post the diagnostics again afterwards.

 

Ps. I will be offline for a couple hours due to other commitments

Edited by bonienl
Link to comment

Yeah, you need to tell your BIOS to boot in legacy mode. I don't know if your MB still supports this (you have very recent hardware).

 

I believe there is another version of memtest available that runs under UEFI, but I don't have the details.

Perhaps somebody else reading this knows more and has some pointers.

 

Link to comment
2 hours ago, bonienl said:

I believe there is another version of memtest available that runs under UEFI, but I don't have the details.

Perhaps somebody else reading this knows more and has some pointers.

Might need to make a bootable usb drive for the latest version of memtest86 https://www.memtest86.com/download.htm . I've had issues trying to boot to the built-in version that unraid has even on a board with legacy bios

Link to comment
9 minutes ago, Mytherium said:

Might need to make a bootable usb drive for the latest version of memtest86 https://www.memtest86.com/download.htm . I've had issues trying to boot to the built-in version that unraid has even on a board with legacy bios

Just to make this clear, Unraid ships the latest version of memtest that is licensed for free redistribution. The newer version on that website can't be packaged by Unraid on the boot drive.

Link to comment

I tried to boot in UEFI mode to run that memtest, but it doesn't even try to boot. In Legacy mode I get an error message that says non-bootable. When I choose the UEFI entry in the boot menu, it returns immediately as if it didn't even try, and the screen just flickers like it refreshed. So now, after ticking that check box, I can't boot from my flash drive at all.

 

And to answer Squid's question: no, I have not tried any alternative cables, router ports, or Ethernet jacks on the motherboard.

Link to comment
