Unraid crashing inconsistently



I haven't made many posts because unraid has been near flawless for many years, but lately I've been having issues that I can't pinpoint and unfortunately I have not done a great job at tracking my changes. Most of the changes include docker updates and downloads.

 

Unraid version: 6.6.7

Plugins: CA Auto updates, CA Backup, Community Applications, FCP, Preclear Disks, Server Layout, Statistics, Unassigned Devices

Dockers: binhex deluge, jackett, unificontroller, duckdns, letsencrypt, radarr, openvpnas, plex, sabnzbd, sonarr, ombi, lazylibrarian

VMs: ubuntu, WinServer 2016 essentials

*letsencrypt, lazylibrarian, ubuntu, ws2016 are stopped most of the time and not running

Hardware: Supermicro - X10SDV-TLN4F

Intel Xeon CPU D-1541 

NVM/IOMMU Enabled

32GB ECC Corsair Memory

 

Issue: Somewhat randomly (it might take a day, it might take several), I lose all connectivity to the WebGUI, Dockers, SSH, seemingly everything.  What continues to work is the local terminal (via mouse/keyboard). I can't pinpoint what I'm doing at the exact time, but generally I'm just watching plex or checking out the unifi controller, so I'm guessing it must be some process happening behind the scenes or potentially something with an automatic downloader.  I can still log in as root on the local terminal, but how far I get varies: sometimes ifconfig will run, other times it won't (it's one of the few Linux commands I know :) ).
Diagnostics never completes - just stays at collecting... for hours. 

I managed to collect a syslog with the command cp /var/log/syslog /boot/syslog


Troubleshooting:  MemTest cleared 2 passes, no recent hardware changes, no SMART errors that I can tell, parity always passes - and I've run dozens of checks over the past couple weeks due to the hard shutdowns. Last night I thought I'd boot to Safe Mode. It was going smoothly until I happened to notice today, while I was VPN'd to my server via the unraid supervisor app on iOS, that the memory usage seemed to be at 76% before it crashed this time. I thought that was a bit unusual because I've never seen it reach that high, but it could have been a coincidence, idk.

 

I appreciate any help/advice. 

syslog.zip


Check that every Docker and plugin that collects data to be stored on the array has each storage area it uses assigned to one of the mount points under   /mnt  

 

If one of them is pointing to a place other than  /mnt , then data is probably being stored on Unraid's RAM disk.  The RAM available for this type of use is quite small, and bad things will usually happen when Unraid runs out of RAM.   🙄
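
A rough way to audit this, assuming you have copied each container's host-side paths out of its template by hand: the Python sketch below flags any host path that is not /mnt or below it. The function name and the sample paths are made up for illustration. Some mappings outside /mnt can be legitimate (device nodes, for example), so treat the result as a list to review, not a list of errors.

```python
from pathlib import PurePosixPath


def paths_outside_mnt(host_paths):
    """Return the host paths that are neither /mnt itself nor below it.

    Paths are compared component by component, so /mntx/foo is flagged.
    Paths are not normalized: something like /mnt/../tmp is NOT flagged.
    """
    flagged = []
    for raw in host_paths:
        parts = PurePosixPath(raw).parts
        if parts[:2] != ("/", "mnt"):
            flagged.append(raw)
    return flagged


if __name__ == "__main__":
    # Hypothetical host paths, typed in from the container templates.
    sample = [
        "/mnt/user/appdata/plex",
        "/mnt/cache/downloads",
        "/tmp/downloads",  # not under /mnt, so it gets flagged for review
    ]
    for path in paths_outside_mnt(sample):
        print("review:", path)
```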


Thanks for the feedback Frank.  I haven't noticed anything you mentioned out of place but I'll give it another look over.

I did actually find FCP pointing out that my two dockers didn't have the Slave option, so I adjusted that to the recommended RW/Slave.  I had the parity check running from yesterday and it's at 81%, but I just discovered the CPU usage is at 100% and isn't moving.  I ran top and the process using my entire CPU is  kworker/u32:1+events_power_efficient

  • 2 weeks later...

 

I thought I had this fixed but issue reoccurred last night with the same kworker process utilizing 100% of CPU.  Only thing I could do was a hard reset.

I did as suggested and everything is pointed to a /mnt directory in docker & plugin configs.

 

Here is something I've been able to catch

IMG_2124.JPG

  • 5 months later...

Hey guys,
I'm still having issues with my server becoming unresponsive to anything and stuck with a screen similar to this..

Unable to access the system at all. This is the only notable interaction, a photo of the screen.

The only commonality I've found is the Unifi-controller docker.  If I enable it then it'll crash my unraid server with the following symptoms intermittently.  When I originally observed this behavior I thought I had it narrowed down to this specific docker after many hours of troubleshooting.  At the time I thought it was because I attempted to downgrade controller version from v5.9 to the LTS branch (~v5.6).  Since then I completely removed the Unifi-controller docker with a clean install and a clean config.  

Here is my current config - same except for newer version of unraid & lts of unifi.

 

Unraid version: 6.7.2

Plugins: CA Auto updates, CA Backup, Community Applications, FCP, Preclear Disks, Server Layout, Statistics, Unassigned Devices

Dockers: binhex deluge, jackett, unificontroller, duckdns, letsencrypt, radarr, openvpnas, plex, sabnzbd, sonarr, ombi, lazylibrarian

Hardware: Supermicro - X10SDV-TLN4F

Intel Xeon CPU D-1541 

NVM/IOMMU Enabled

32GB ECC Corsair Memory

 

Symptoms :

-WebUI not responsive (took too long to respond, site can't be reached)

-Ping reply:  Destination host unreachable

-SSH not responsive (Connection timed out)

-SMB Share / Network name via file Explorer unable to access (Network path was not found)
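
Since nothing can be logged on the server once it hangs, one option is to record from another machine when these services stop answering. The Python sketch below is only an illustration: the IP address is a placeholder, and ports 80, 22 and 445 assume the default WebUI, SSH and SMB ports. It makes a single pass, so it would need to be run on a schedule to build up a timeline.

```python
import socket
from datetime import datetime


def tcp_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    server = "192.168.1.10"  # placeholder: replace with the server's address
    services = {"WebUI": 80, "SSH": 22, "SMB": 445}  # assumed default ports
    stamp = datetime.now().isoformat(timespec="seconds")
    for name, port in services.items():
        state = "up" if tcp_reachable(server, port) else "DOWN"
        print(f"{stamp} {name} (port {port}): {state}")
```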

 

Any other suggestions? I can't grab any logs because the system is completely hung

IMG_2623.JPG

5 minutes ago, jbeazies said:

Unraid version: 6.7.2

 

<<<<   SNIP >>>>

Any other suggestions? I can't grab any logs because the system is completely hung

I am not enough of a Guru to be able to figure out from that screen shot what is going on but hopefully one of the true Gurus will be able to.

 

However, you can now grab the syslog since you upgraded to 6.7.2!   Go to   Settings   >>>>  Syslog Server    and set it up to 'Mirror syslog to flash:'.  You can find out how to do this by using the built-in Unraid "Help" feature--- it is the question-mark-in-a-circle icon on the Toolbar.  

2 hours ago, jbeazies said:

@kingfetty - What did you do as a workaround?  I still want to use static IP for my unifi-controller docker. 

If he did what I ended up doing as documented in the thread of mine you linked, the solution was to create a VLAN for docker containers.  I would get call traces when assigning an IP address on br0, but once I created a VLAN (br0.3) and assigned IP addresses on that VLAN to the docker containers the call traces went away.  It has been well over a year since I had a macvlan-related call trace or server lockup.

 

Your screenshots don't show any macvlan broadcast-related call traces, but, after a few of them occurred, my server would eventually lock up and had to be manually rebooted via the power button.
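
Once a syslog has been saved somewhere that survives the reboot, a quick scan can show whether call traces of this kind are in it. The Python sketch below is a rough heuristic, not an official tool: it assumes kernel traces are introduced by a line containing "Call Trace" and that macvlan-related ones mention "macvlan" within the next few dozen lines. The window size and the file path in the usage example are assumptions.

```python
import re

CALL_TRACE = re.compile(r"Call Trace", re.IGNORECASE)
MACVLAN = re.compile(r"macvlan", re.IGNORECASE)


def summarize_traces(lines, window=40):
    """Return (total call traces, traces with 'macvlan' nearby).

    'Nearby' means within `window` lines starting at the "Call Trace" line.
    Traces that sit close together can share a window, so the second number
    is an estimate.
    """
    total = 0
    macvlan_related = 0
    for i, line in enumerate(lines):
        if CALL_TRACE.search(line):
            total += 1
            if any(MACVLAN.search(l) for l in lines[i:i + window]):
                macvlan_related += 1
    return total, macvlan_related


if __name__ == "__main__":
    path = "syslog"  # placeholder: wherever the saved syslog copy lives
    with open(path, errors="replace") as fh:
        total, related = summarize_traces(fh.read().splitlines())
    print(f"call traces: {total}, mentioning macvlan: {related}")
```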

24 minutes ago, Hoopster said:

I would get call traces when assigning an IP address on br0

Do you use VMs which connect over br0?

 

The observation I have is that VM communication and Docker communication over the same interface (or VLAN) can have a collision once in a while, resulting in a broadcast related call trace.

 

I have separated VM and Docker communications over different VLANs.


VLANs are created under network settings.

For example adding VLAN 10 to eth0 becomes interface eth0.10 (or br0.10 when bridging is enabled)

In the container configuration you need to assign custom network br0.10 to make use of the VLAN network (keep in mind this is a completely separated network from eth0/br0 and your router needs to support VLANs too).

 

15 minutes ago, jbeazies said:

Are there any instructions on creating and assigning VLAN to a docker container? Is it just setting VLAN on and putting the 169.x address into the custom br0 docker config?

Here's my VLAN 3 for Docker containers as defined in unRAID Network Settings (since bridging is enabled it becomes br0.3):

image.thumb.png.7a1009af7400727f53af51e750d635a3.png

 

On the router side (UniFi USG), I also created the VLAN:

image.thumb.png.6e9a9bb7be6101485ef482367eb46f24.png

 

And in the config for a Docker container to which I want to assign an address on the VLAN, I specify br0.3 as the Network Type and an IP address in the range (192.168.3.100...192.168.3.150) I assigned to that VLAN in the router:

image.thumb.png.296578540420e30be5a0a4257c4214db.png
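
As a sanity check before typing values into a container config, the Python sketch below builds the VLAN interface name in the parent.id form used above (br0.3) and tests whether a chosen address falls inside the range set aside on the router. Both helper names are made up for illustration, and the 1 to 4094 limit is the usable 802.1Q VLAN ID range, not something Unraid-specific.

```python
import ipaddress


def vlan_interface(parent, vlan_id):
    """Return the interface name for a VLAN on a parent, e.g. br0 + 3 -> br0.3."""
    if not 1 <= vlan_id <= 4094:
        raise ValueError(f"VLAN ID {vlan_id} is outside 1-4094")
    return f"{parent}.{vlan_id}"


def in_range(ip, first, last):
    """Return True if ip lies between first and last, inclusive."""
    addr = ipaddress.ip_address(ip)
    return ipaddress.ip_address(first) <= addr <= ipaddress.ip_address(last)


if __name__ == "__main__":
    # Values taken from the post above; the container address is an example.
    print(vlan_interface("br0", 3))
    print(in_range("192.168.3.120", "192.168.3.100", "192.168.3.150"))
```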


Thanks all! 
This was primarily for my Unifi controller docker. I suppose I could slowly start migrating my other dockers over, but for now those remain on a combination of bridge/host network types.

I'll monitor the status of the unifi docker and, fingers crossed, hope this resolves the issue. 
 

Now that I think about it, I'm wondering if this had anything to do with my original issue.  I set a static IP to my primary network via USG for the controller docker.  I also set the custom br0 within unraid docker config to that same static IP.  I wonder if this somehow caused the conflict, resulting in unraid sys locking up?
 
