rottenpotatoes Posted September 27, 2019
I am having issues with my Unraid server crashing, and when it crashes, my network stops working: Alexa doesn't work, streaming stops, and I can't browse the web. If I unplug the network cable from the server, everything is restored within a couple of minutes. The only fix for the server itself, however, is a hard reboot.
My issue is very similar to the one reported in this older forum post: https://forums.unraid.net/topic/59142-unraid-crashing-and-taking-down-network/?_fromLogin=1
I followed the advice given in that thread and tailed the syslog on my monitor. That didn't work for me, because when the server froze my screen was filled with crash messages rather than syslog output. So I then piped the tail to tee, so that the output goes both to the screen and to a syslogcopy file. There are two syslogcopy files. The first (not '2') is the one that ran from the last reboot to this crash. The second ('2') is from this most recent reboot up until I took a copy to paste in here. I have 3 photos of my screen taken at the moment Unraid crashed on 3 separate occasions. I have also attached a diagnostics zip taken just after this most recent reboot.
I am averaging a crash every two days now. I can barely complete one parity check before the next crash, and I had one crash happen within a few hours of another. I cannot think of anything that would cause this. I had been stable for 2 months prior to this. Please advise.
Attachments: syslogcopy.log, syslogcopy2.log, unraid-diagnostics-20190927-0109.zip
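Editor's sketch: the "tail piped to tee" capture described in the post above can be approximated in Python as follows. This is a minimal illustration, not the poster's actual command. The function names are hypothetical, and the source and destination paths are left as parameters because the thread does not state them. The copy is flushed and fsync'ed after every write so that the last lines before a hard crash have a better chance of reaching the disk.

```python
import os
import sys
import time


def copy_new_lines(src_path, dst_path, offset):
    """Append everything in src_path past `offset` to dst_path and echo it.

    Returns the new offset into src_path.
    """
    with open(src_path, "rb") as src:
        src.seek(offset)
        chunk = src.read()
        new_offset = src.tell()
    if chunk:
        # Echo to the screen, like tee does.
        sys.stdout.write(chunk.decode("utf-8", errors="replace"))
        sys.stdout.flush()
        # Append to the copy and force it out of the OS cache.
        with open(dst_path, "ab") as dst:
            dst.write(chunk)
            dst.flush()
            os.fsync(dst.fileno())
    return new_offset


def follow(src_path, dst_path, poll_seconds=1.0):
    """Poll src_path forever, mirroring new data to the screen and dst_path."""
    offset = 0
    while True:
        # If the log was rotated or truncated, start again from the top.
        if os.path.getsize(src_path) < offset:
            offset = 0
        offset = copy_new_lines(src_path, dst_path, offset)
        time.sleep(poll_seconds)
```

A call such as `follow("/var/log/syslog", "syslogcopy.log")` would run until interrupted; both paths in that call are assumptions for illustration. Writing the copy to storage that survives a reboot matters more than the exact tool used.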
jonp Posted October 1, 2019
Hi there. Can I start by asking what recently changed? If things were working solidly for 2+ months and then all of a sudden this happens, one of three things comes to mind:
1) An update to the OS / a reboot, and since then this has been happening.
2) Something was changed in the hardware/settings/OS configuration, and since then this has been happening.
3) Nothing has changed, but these issues just started out of nowhere.
If #3 is the answer, then it is likely a hardware-specific issue (something faulty in the electronics), but if it's 1 or 2, that will greatly help us in narrowing down the root cause.
takkkkkkk Posted October 1, 2019
I've seen this happen, and it has happened on at least two of my motherboards. I don't know if it's HW specific, but it often happens when I have a VM crashing, and when I try to reboot, it literally takes everything down with it. I literally had to get something like https://store.resetplug.com/ (but the AliExpress version) to power-cycle the modem and router when Unraid crashes. Extremely frustrating...
rottenpotatoes Posted October 1, 2019 (Author)
To the best of my knowledge, nothing major has changed. There have been VM updates and Docker updates, but I've been on 6.7.2 for a while now. I had added a new Docker container, but during my troubleshooting I turned it off and disabled autostart. This still did not fix my crashing. To the best of my knowledge, I am number 3, but nothing hardware related has changed in quite a while.
jonp Posted October 1, 2019
3 hours ago, rottenpotatoes said: To the best of my knowledge, nothing major has changed. There have been VM updates, and Docker updates, but Ive been on 6.7.2 for a while now. I had added a new docker, but in my troubleshooting, I turned it off and disabled autostart. This still did not fix my crashing. To the best of my knowledge, I am number 3, but nothing hardware related has changed in quite a while.
Interesting. The best thing I can suggest is to turn off both Docker and VMs first and see if you are able to remain stable. Then slowly turn things on one by one until you reproduce the issue. Then we will at least have narrowed down the specific thing that is causing you to enter this state. The issue here is that we've never been able to reproduce this problem internally, but if we can get more insight into how systems end up in this state to begin with, it would be very helpful.
bonienl Posted October 2, 2019
Something has created a SPAN (monitoring) port on your system, which may lead to strange network behaviour. Most likely it is one of the packages in the NerdPack plugin. Uninstall the NerdPack plugin and reboot your system. Please provide diagnostics afterwards.
rottenpotatoes Posted October 2, 2019 (Author)
I removed the NerdPack plugin and then grabbed a fresh diagnostics zip. I have attached it.
Attachment: unraid-diagnostics-20191002-1205.zip
bonienl Posted October 2, 2019
Please reboot your system and take diagnostics afterwards.
rottenpotatoes Posted October 2, 2019 (Author)
I rebooted my server and then grabbed a fresh diagnostics zip. I have attached it.
Attachment: unraid-diagnostics-20191002-1442.zip
bonienl Posted October 2, 2019
Weird, that SPAN port is still there. Try the following:
1. Stop the Docker service and the VM service (see Settings).
2. Change your network setting for eth0 to "enable bonding = NO".
3. It seems you don't use IPv6, so select "Network protocol = IPv4 only".
Restart your system in safe mode and post your diagnostics again.
rottenpotatoes Posted October 2, 2019 (Author)
That was interesting. I stopped my Docker service and, after shutting down the VMs that auto-boot, the VM service. I then went to the network settings to check items 2 and 3 above. The protocol was already set to IPv4 only, so I tried to change bonding to No. When I hit save, my machine crashed again. I took a photo of my monitor and I'll attach it. I had to do a hard reset.
After booting, I went back to disable bonding again. This time, when I hit apply, I got an error stating that my flash drive was not mounted in R/W mode, so I did a reboot from the Main page. After that boot I was able to apply the bonding setting. Then I rebooted and entered safe mode. I downloaded my diagnostics and have attached the new file.
Attachment: unraid-diagnostics-20191002-1536.zip
bonienl Posted October 2, 2019
Are you using a custom kernel or the standard Unraid image?
rottenpotatoes Posted October 2, 2019 (Author)
I am using the standard Unraid image on the stable channel.
bonienl Posted October 2, 2019
You have a lot of interfaces which are not supposed to be there: erspan0, gre0, gretap0, ip_vti0, sit0 and tun10. These all have to do with tunneling and monitoring, but nothing in stock Unraid sets up tunnel interfaces or monitoring. I can't explain where these are coming from. Check the "Network Stack Configuration" setting in your BIOS and make sure it is set to "disabled".
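Editor's sketch: on a Linux system, the interface names the kernel exposes can be read from `/sys/class/net`, so the check described in the post above can be repeated after each reboot with a few lines of Python. The prefix list below is simply derived from the names mentioned in that post and is an assumption for illustration, not an official Unraid list; a prefix such as `tun` can also match interfaces that were set up on purpose (for example by a VPN), so treat the output as a prompt for investigation rather than a verdict.

```python
import os

# Prefixes derived from the interface names mentioned in the thread
# (erspan0, gre0, gretap0, ip_vti0, sit0, tun10). Illustrative only.
TUNNEL_PREFIXES = ("erspan", "gre", "ip_vti", "sit", "tun")


def list_interfaces(sysfs_dir="/sys/class/net"):
    """Return the entries the Linux kernel exposes under /sys/class/net."""
    return sorted(os.listdir(sysfs_dir))


def find_tunnel_interfaces(names, prefixes=TUNNEL_PREFIXES):
    """Return the names that start with one of the tunnel/monitoring prefixes."""
    # str.startswith accepts a tuple, so "gre" also covers "gretap0".
    return [name for name in names if name.startswith(prefixes)]
```

Running `find_tunnel_interfaces(list_interfaces())` on the server and comparing the result before and after disabling Docker, VMs, or a plugin is one way to see which change makes the extra interfaces appear.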
rottenpotatoes Posted October 2, 2019 (Author)
Although I was almost certain that I already had this set properly, I rebooted and went in to verify that the network stack was disabled in the BIOS. It is, and I attached a screenshot just to be thorough. I am now booted back into safe mode, awaiting further instructions.
bonienl Posted October 2, 2019
Another thing for you to try: disable the Docker service and disable the VM service (see Settings), so they are not running and won't start upon a reboot. Now reboot your system and post the diagnostics again afterwards.
P.S. I will be offline for a couple of hours due to other commitments.
Edited October 2, 2019 by bonienl
rottenpotatoes Posted October 2, 2019 (Author)
I rebooted into regular mode and I have attached this diagnostics file.
Attachment: unraid-diagnostics-20191002-1648.zip
bonienl Posted October 2, 2019
In several of your screenshots I see "page fault" errors, which may indicate something flaky with the memory. Did you run memory tests?
Hint: to run these memory tests you need to disable UEFI boot mode; see Main -> Boot Device -> Syslinux Configuration -> Permit UEFI boot mode.
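Editor's sketch: the "page fault" errors above were seen in screenshots, but if the captured syslog copies also contain such messages, a short search can show when they first appear. This is a hedged illustration only: the function name is hypothetical, the default search string is an assumption, and the exact wording of kernel messages varies between kernel versions, so other search strings may be needed.

```python
def find_matching_lines(log_path, needle="page fault"):
    """Return (line_number, text) pairs whose text contains `needle`, ignoring case."""
    matches = []
    wanted = needle.lower()
    # errors="replace" keeps the scan going if the log has odd bytes in it.
    with open(log_path, "r", errors="replace") as log:
        for number, line in enumerate(log, start=1):
            if wanted in line.lower():
                matches.append((number, line.rstrip("\n")))
    return matches
```

For example, `find_matching_lines("syslogcopy.log")` would scan the first attached copy, assuming it has been saved under that name in the current directory.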
rottenpotatoes Posted October 2, 2019 (Author)
No, I have not run any previously. I just cleared that check box on my drive and rebooted. Is there a specific test I need to run?
rottenpotatoes Posted October 2, 2019 (Author)
Actually, when I try to boot in non-UEFI mode, it says the device is not bootable. I checked the BIOS and made sure that UEFI is set to 'other os', not 'windows uefi', and changed the compatibility support module to 'uefi and legacy oprom'.
bonienl Posted October 2, 2019
Yeah, you need to tell your BIOS to boot in legacy mode. I don't know if your MB still supports this (you have very recent hardware). I believe there is another version of memtest available which runs under UEFI, but I don't know/have the details. Perhaps somebody else reading this knows more and has some pointers.
Mytherium Posted October 2, 2019
2 hours ago, bonienl said: I believe there is another version of the memtest available which runs under UEFI, but I don't know/have the details. Perhaps somebody else reading this, may know this and have some pointers.
You might need to make a bootable USB drive with the latest version of memtest86: https://www.memtest86.com/download.htm. I've had issues trying to boot the built-in version that Unraid ships, even on a board with a legacy BIOS.
JonathanM Posted October 2, 2019
9 minutes ago, Mytherium said: Might need to make a bootable usb drive for the latest version of memtest86 https://www.memtest86.com/download.htm . I've had issues trying to boot to the built-in version that unraid has even on a board with legacy bios
Just to make this clear, Unraid includes the latest version of memtest that is licensed for free redistribution. The newer version on that website isn't available for Unraid to package on the boot drive.
Squid Posted October 3, 2019
And just to rule it out, have you tried a different cable, or a different port on the switch or mobo? Bad ethernet ports / cables *can* under certain circumstances take down an entire network.
rottenpotatoes Posted October 3, 2019 (Author)
I tried to go ahead and boot into UEFI mode to run that memtest, but it fails to even try to boot. Whereas Legacy mode comes up with an error message that says non-bootable, when I choose UEFI mode in the boot menu, it immediately returns as if it doesn’t even try, and the screen just flickers like it refreshed. So now, after changing that check box, I can’t boot from my flash drive at all.
And to answer Squid's question: no, I have not tried any alternative cables, router ports, or ethernet jacks on the motherboard.