Riddle me this - unRAID kills my home network


Background:

I have been using unRAID for a few months now. I'm on the Basic license with a couple of HDDs and a couple of SSDs. The SSDs are formatted BTRFS with RAID0 for data and RAID1 for metadata. I run my main OS (Windows 10) as a VM on it as my daily driver. I also have Plex running as a Docker container, USB controller passthrough and the coretemp plugin; that is about the extent of my "customization" of vanilla unRAID 6.1.9.

My home network consists of an LTE modem and an ADSL modem connected to a load-balancing router, to which my unRAID machine connects directly. The unRAID machine is mapped to a static IP.

What happens:

This only seems to happen when I am actively using the Win10 VM, i.e. doing something on the computer. I have never noticed it while the machine sits idle overnight.

Windows freezes: it locks up completely and reacts to nothing. When I switch my monitor input to the unRAID terminal, it won't accept keyboard input and the "tail -f /var/log/syslog" output stops moving, although the cursor in the bottom left is still blinking. Since the network is completely down and the console won't accept keyboard input, I am not actually sure whether the whole unRAID system has locked up or just the VM.

EDIT: Since pressing the power button doesn't shut the system down like it normally does, it seems the whole computer has locked up, unRAID and all. Normally, if I press the power button, unRAID powers off cleanly.

Now what is really peculiar is that when this happens, it also takes my home network down with it. I have no frigging clue how this is even possible, but indeed, when my VM locks up, all other devices on the network lose their internet connection.

At first I thought the cause must be the internet/network outage somehow crashing the VM; that makes a lot more sense, right? But no: if I reboot all my network gear and try to get a connection working on the other devices, it stays down until I disconnect the unRAID computer's network cable. Weird, huh? I have no clue how a locked-up unRAID box could cause the network to die and keep it dead until I either pull the network cable or hard-reset the unRAID computer.

You might say this is just a coincidence, but it has happened at least 3-4 times already. I think there are two separate problems here, related but each needing a fix.

1) Why is the Windows 10 VM locking up?

2) Why does the VM or unRAID manage to kill my network?

For 1), I have a slight hunch that it might be related to memory consumption. The machine has 16 GB of RAM in total, of which 12 GB is allocated to the Windows 10 VM and the remaining 4 GB is left for unRAID itself. The last time the lockup happened I was playing the new Deus Ex game, which eats a fair bit of memory, and the lockup hit when I opened Chrome. My Chrome has about 30 open tabs and consumes something like 6 GB of RAM when fully loaded. So I have a feeling I might have exceeded those 12 GB, but I assume Windows 10 with default settings should fall back on its pagefile, so honestly I don't know why it would lock up.
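It may help to look at the two memory budgets separately: Windows' pagefile should absorb pressure inside the guest's 12 GB, but the 4 GB left on the host has to cover unRAID itself (which runs from RAM), the Plex container and the VM's own overhead. A minimal Python sketch for checking how much memory the host still has while the VM is under load, assuming a Linux host that exposes the standard /proc/meminfo file; the helper names are hypothetical:

```python
from pathlib import Path

KIB_PER_GIB = 1024 * 1024


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}.

    Most fields are in kB; a few (e.g. HugePages_Total) are plain counts.
    """
    fields = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        parts = rest.split()
        if parts:
            fields[key.strip()] = int(parts[0])
    return fields


def host_available_gib(fields):
    """MemAvailable (the kernel's estimate of usable memory) in GiB."""
    return fields["MemAvailable"] / KIB_PER_GIB


if __name__ == "__main__":
    meminfo = Path("/proc/meminfo")
    if meminfo.exists():  # only meaningful on a Linux host such as unRAID
        info = parse_meminfo(meminfo.read_text())
        total = info["MemTotal"] / KIB_PER_GIB
        print(f"host RAM {total:.1f} GiB, available {host_available_gib(info):.1f} GiB")
```

Running it on the unRAID console (or over SSH) while the game and Chrome are both open would show whether the host side is running short; no particular threshold is implied here.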

For 2), I don't have any ideas; I can't begin to understand how the unRAID box could kill my network. It seems to nuke all active devices: their connections get interrupted and they can't get a working connection no matter whether I reboot them or the network gear. The only thing that helps is either pulling the network cable from the unRAID box or hard-resetting it; after that, rebooting the network gear gives me a working network on all devices.

If someone smarter than me could float some ideas on what's causing this, and on how 2) is even possible, I would REALLY appreciate it.

EDIT: Attached the diagnostics zip; let me know if anything else would be helpful.

megathron-diagnostics-20160824-2330.zip

I disabled C-states in the BIOS and my server has now run for 9 days without locking up. My previous best was about 3-4 days before I would find it locked up.

I'm not sure what impact this has on energy use, but I expect little in my case since I have some Docker containers that are always working. I'm going to try to get a power meter on it soon to see whether it makes a difference.
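A BIOS change is not always easy to verify from the OS side. One way to see which idle states the Linux kernel is actually using is the cpuidle sysfs interface. The sketch below assumes the standard layout /sys/devices/system/cpu/cpu0/cpuidle/stateN/ with name, usage and disable files, which depends on the kernel and idle driver in use; the function names are hypothetical:

```python
from pathlib import Path


def _state_index(path):
    """Numeric suffix of a 'stateN' directory, so state10 sorts after state9."""
    return int(path.name[len("state"):])


def list_idle_states(cpu_dir):
    """Return (name, usage, disabled) for each cpuidle state of one CPU.

    cpu_dir is e.g. /sys/devices/system/cpu/cpu0. 'usage' counts how many
    times the kernel has entered that state since boot.
    """
    idle_dir = Path(cpu_dir) / "cpuidle"
    if not idle_dir.is_dir():
        return []  # no cpuidle information exposed for this CPU
    result = []
    for state in sorted(idle_dir.glob("state*"), key=_state_index):
        name = (state / "name").read_text().strip()
        usage = int((state / "usage").read_text())
        disable_file = state / "disable"
        # Older kernels may lack the 'disable' file; treat that as enabled.
        disabled = disable_file.exists() and disable_file.read_text().strip() == "1"
        result.append((name, usage, disabled))
    return result


if __name__ == "__main__":
    for name, usage, disabled in list_idle_states("/sys/devices/system/cpu/cpu0"):
        note = " (disabled)" if disabled else ""
        print(f"{name:>8}: entered {usage} times{note}")
```

If deep states such as C6 still show a rising usage count after the BIOS change, that would suggest the kernel is still requesting them.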

What's your networking scheme? Also, it sounds like a broadcast storm.

I second this. Look at your network switch. Is it going ape-s**t?
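One rough way to check for a storm from a Linux machine on the same network (including the unRAID box before it locks up) is to watch the interface's packet counters: a broadcast storm or switching loop shows up as a packet rate far above that interface's normal baseline. A minimal sketch reading the standard sysfs counters under /sys/class/net/<iface>/statistics/; the interface name eth0 is an assumption (on unRAID it may be eth0 or br0) and the function names are hypothetical:

```python
import time
from pathlib import Path

SYS_NET = Path("/sys/class/net")
COUNTERS = ("rx_packets", "tx_packets")


def read_counters(iface, sys_net=SYS_NET):
    """Read the cumulative packet counters of one interface from sysfs."""
    stats = sys_net / iface / "statistics"
    return {name: int((stats / name).read_text()) for name in COUNTERS}


def packet_rates(before, after, seconds):
    """Packets per second for each counter between two samples."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return {name: (after[name] - before[name]) / seconds for name in COUNTERS}


def sample_rates(iface, seconds=1.0, sys_net=SYS_NET):
    """Take two readings `seconds` apart and return the packet rates."""
    before = read_counters(iface, sys_net)
    time.sleep(seconds)
    return packet_rates(before, read_counters(iface, sys_net), seconds)


if __name__ == "__main__":
    iface = "eth0"  # assumption: adjust to your interface (e.g. br0 on unRAID)
    if (SYS_NET / iface).exists():
        print(sample_rates(iface))
```

Comparing a reading taken during normal use with one taken while the other devices have lost connectivity is more telling than any fixed number.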

A couple of possible causes are two DHCP servers issuing IPs on the same subnet, or conflicting IPs. If your VMs use dynamically assigned IPs and you suspend them, the IP that was on a VM may get reassigned while it sleeps; when you wake the VM there is a conflict because it doesn't ask for a new IP.
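The "conflicting IPs" case can be checked by watching which hardware addresses claim each IP: if two MAC addresses answer ARP for the same IP, there is a conflict. A minimal sketch, assuming you already have (ip, mac) pairs taken from ARP replies seen on the LAN (a packet capture or a monitor such as arpwatch could supply them); the function name and the sample addresses are made up:

```python
from collections import defaultdict


def find_ip_conflicts(observations):
    """Return {ip: [macs]} for every IP claimed by more than one MAC address.

    observations: iterable of (ip, mac) pairs, e.g. sender IP and sender MAC
    taken from ARP replies seen on the LAN.
    """
    claims = defaultdict(set)
    for ip, mac in observations:
        claims[ip].add(mac.lower())  # normalise so "AA:.." and "aa:.." match
    return {ip: sorted(macs) for ip, macs in claims.items() if len(macs) > 1}


if __name__ == "__main__":
    seen = [  # made-up sample data
        ("192.168.1.10", "aa:bb:cc:00:00:01"),
        ("192.168.1.20", "aa:bb:cc:00:00:02"),
        ("192.168.1.20", "aa:bb:cc:00:00:03"),
    ]
    for ip, macs in find_ip_conflicts(seen).items():
        print(f"conflict on {ip}: {', '.join(macs)}")
```

A legitimate MAC change (a replaced NIC, or a VM that comes back with a new MAC) would also be reported, so a hit is a lead rather than proof.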

The next time it happens, run 'ipconfig /release' and 'ipconfig /renew' on the VM, or just reboot it, and see whether that resolves the issue. If it does, stop suspending the VM and/or assign it a static IP.

  • 3 weeks later...

I have had something similar happen a good number of times. I don't run any VMs, though. At one point it was happening because of a duplicated IP address. These days, maybe once every 3 to 6 months, I have to unplug the server to get any computer on the network to have internet access. Usually a reboot (or several) solves the problem, and sometimes the server itself is having an unrelated crash, though not always.

  • 2 weeks later...

Just saw this and can only respond quickly. I've had the same problem without running any VMs, so the VM isn't the issue. Is your system Skylake-based?

Sorry for the late reply; I do indeed have a Skylake, a 6700K.

Do you also happen to have a Skylake CPU?

The VM freezes, i.e. it locks up totally.

Also, as I said earlier, unfortunately I cannot do anything on unRAID itself. Maybe I need a second keyboard (PS/2?) on the machine to see whether I can get anything done on it. Remember that the network dies, so I cannot SSH in, and the Windows VM has claimed the keyboard.

But then again, when this lockup happens, pressing the power button doesn't shut the system down like it should. Normally unRAID shuts down cleanly and the system powers off; in this situation nothing happens and I have to hold the power button down to power it off.
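Since unRAID keeps /var/log in RAM, the syslog is lost on a hard reset. One way to keep the messages leading up to a freeze is to send them to another machine as they happen; anything sent before the network dies may survive there. Below is a minimal UDP receiver sketch to run on a second computer. It assumes the unRAID syslog daemon can be told to forward over UDP (with rsyslog, a rule such as '*.* @192.168.1.50:5140' does this, but whether and where unRAID 6.1.9 exposes that setting is an assumption to verify). The IP address, port 5140, file name and function names are all hypothetical:

```python
import socket
from pathlib import Path


def make_socket(port, host="0.0.0.0"):
    """Bind a UDP socket for syslog datagrams (port 514 normally needs root)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def serve(sock, log_path, max_messages=None):
    """Append each received datagram to log_path; return how many were stored."""
    count = 0
    with open(log_path, "a", encoding="utf-8") as log:
        while max_messages is None or count < max_messages:
            data, (sender, _port) = sock.recvfrom(8192)
            text = data.decode("utf-8", errors="replace").rstrip()
            log.write(f"{sender} {text}\n")
            log.flush()  # keep the file current even if the receiver is killed
            count += 1
    return count
```

On the receiving machine, serve(make_socket(5140), Path("unraid-remote.log")) runs until interrupted; a port above 1023 avoids needing root for the standard port 514.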

Are you still running without lockups with C-states disabled?
