Unraid OS stuck


3dee


Hello everyone,

 

Last week, and again today, I checked my array in the morning and found Unraid partly hung. In the web interface, the Docker and Dashboard tabs didn't load at all, and the Main tab showed all my devices but nothing under "Array Operation" (such as reboot or shutdown). None of my Docker apps were responding. I had to cold-restart the system to get it running again. I don't know where to start looking for the cause.

 

Before the shutdown I was able to pull diagnostics via SSH / WinSCP. They are untouched except for the mover logs I removed.

 

 

My CPU is an Intel Xeon E5-2630L V4 ES (Engineering Sample). All the other hardware information should be in the diagnostics.

 

 

I hope you can help me find the reason for my Unraid OS "crashes".

 

Thanks!

serverpc-diagnostics-20220113-0939.zip serverpc-diagnostics-20220120-0912.zip

Edited by 3dee
Link to comment

This appears to be the start of it

Jan 20 04:29:00 ServerPC kernel: BTRFS critical (device sdj1): corrupt leaf: root=2 block=3314483200 slot=95, unexpected item end, have 428459034 expect 13148

 

Are you running ECC memory? I would start investigating by downloading memtest from https://www.memtest86.com/ and setting it up on a separate boot stick. (The up-to-date versions will catch ECC errors, but due to licencing restrictions they can't be included with the base OS.)

 

This could also have been caused by filling the cache pool to 100% (btrfs does not respond well in that circumstance).
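A quick way to see how close a pool is to full is to compare its usage against a warning level. The sketch below is only an illustration: it assumes the pool is mounted at /mnt/cache, and the 90% threshold is an arbitrary choice, not an Unraid default. The numbers come from the generic filesystem statistics, which on btrfs may differ somewhat from what btrfs's own tools report.

```python
import os
import shutil


def pool_usage_percent(path):
    """Return how full the filesystem holding `path` is, in percent."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return 100.0 * usage.used / usage.total


def is_nearly_full(path, warn_percent=90.0):
    """True if usage is at or above `warn_percent` (assumed threshold)."""
    return pool_usage_percent(path) >= warn_percent


if __name__ == "__main__":
    # /mnt/cache is an assumption; fall back to "/" so the sketch runs anywhere.
    mount = "/mnt/cache" if os.path.isdir("/mnt/cache") else "/"
    print(f"{mount}: {pool_usage_percent(mount):.1f}% used, "
          f"nearly full: {is_nearly_full(mount)}")
```

Run from a scheduled job, something like this could warn before the pool reaches the 99% mentioned below.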

Link to comment

Yes, the RAM is REG ECC.

 

I will do memtest.

 

How can I prevent the cache from filling up? It actually was at 99% sometimes. I'm using rTorrent with pre-allocation of space, but it seems to ignore the 30 GB minimum free space limit I set up for my media share. Files are usually no larger than 5 GB and never larger than 30 GB.

Link to comment

One thing to do is set the shares you're using for downloads to cache-prefer rather than cache-only, and make sure the cache floor limit is set appropriately so the system knows when to overflow to the array. (Also, don't reference /mnt/cache directly in any path mappings; always use /mnt/user/... so that the rules are obeyed.)
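To illustrate why the floor matters, here is a deliberately simplified model. It is an assumption for illustration, not Unraid's actual allocation code: the target for a new file is picked when the file is created, based only on the cache's free space at that moment, and the file's final size is not known in advance. The names `pick_target` and `simulate` are hypothetical.

```python
GB = 1000 ** 3


def pick_target(cache_free, floor):
    """Simplified rule: use the cache only while free space is above the floor."""
    return "cache" if cache_free > floor else "array"


def simulate(cache_free, floor, file_sizes):
    """Place files one by one.

    Returns (placements, lowest free space seen on the cache).
    A negative lowest value means the pool would have run out of space.
    """
    placements = []
    lowest = cache_free
    for size in file_sizes:
        target = pick_target(cache_free, floor)
        if target == "cache":
            # The size only takes effect after the target was chosen.
            cache_free -= size
            lowest = min(lowest, cache_free)
        placements.append(target)
    return placements, lowest


if __name__ == "__main__":
    # 40 GB free, 30 GB floor, four 5 GB files.
    print(simulate(40 * GB, 30 * GB, [5 * GB] * 4))
    # Floor of 0 stands in for a write path that ignores the floor.
    print(simulate(40 * GB, 0, [30 * GB, 30 * GB]))
```

In this model, a 30 GB floor with files no larger than 30 GB never lets the cache run out, while a path that ignores the floor can overfill it with just two large files.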

 

You might also want to have a look at the system event log (BIOS) to see if anything noteworthy is in there regarding the lockups.

Link to comment
Jan 13 06:47:17 ServerPC kernel: macvlan_broadcast+0x10e/0x13c [macvlan]
Jan 13 06:47:17 ServerPC kernel: macvlan_process_broadcast+0xf8/0x143 [macvlan]

 

Macvlan call traces are usually the result of having Docker containers with a custom IP address. Upgrading to v6.10 and switching to ipvlan might fix it (Settings -> Docker Settings -> Docker custom network type -> ipvlan; advanced view must be enabled, top right). See below for more info.

https://forums.unraid.net/topic/70529-650-call-traces-when-assigning-ip-address-to-docker-containers/

See also here:

https://forums.unraid.net/bug-reports/stable-releases/690691-kernel-panic-due-to-netfilter-nf_nat_setup_info-docker-static-ip-macvlan-r1356/
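To check whether a syslog from the diagnostics contains these kinds of entries, a small scan like the following may help. It is a minimal sketch: the two patterns are assumptions derived only from the log lines quoted in this thread, and the pattern names are made up for illustration.

```python
import re
from collections import Counter

# Patterns are guesses based on the lines quoted in this thread.
PATTERNS = {
    "btrfs_critical": re.compile(r"kernel: BTRFS critical"),
    "macvlan_trace": re.compile(r"kernel: .*\[macvlan\]"),
}


def classify(lines):
    """Count how many lines match each known pattern."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


# Sample lines copied from the posts above.
SAMPLE = [
    "Jan 20 04:29:00 ServerPC kernel: BTRFS critical (device sdj1): "
    "corrupt leaf: root=2 block=3314483200 slot=95, unexpected item end, "
    "have 428459034 expect 13148",
    "Jan 13 06:47:17 ServerPC kernel: macvlan_broadcast+0x10e/0x13c [macvlan]",
    "Jan 13 06:47:17 ServerPC kernel: "
    "macvlan_process_broadcast+0xf8/0x143 [macvlan]",
]

if __name__ == "__main__":
    for name, count in sorted(classify(SAMPLE).items()):
        print(f"{name}: {count}")
```

To scan a real file, pass an open file object instead of `SAMPLE`; the path of the syslog inside the diagnostics zip is not assumed here.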

Link to comment

Yes, all my Docker containers are running on br0 with a fixed IP address. Are those macvlan call traces "bad"? Should I change all my containers to ipvlan?

 

 

Edit: I should read the links before asking :) I will check that. Thanks!

Edited by 3dee
Link to comment
  • 2 weeks later...

Hey Guys,

 

So I upgraded to version 6.10 and changed the network type to ipvlan. None of my Docker containers were able to connect to the internet; only the local network was working.

 

So I changed back to macvlan and disabled VLAN tagging in the network settings. Macvlan traces still showed up every now and then, and last night my server crashed completely (kernel panic, I guess).

 

Is there a way I can get ipvlan working with my br0 docker containers? Or can I use macvlan with br0 containers without getting trace errors? I really don't need vlan tagging that much on my server.

Link to comment

So I just changed it again from the fully working macvlan setting to ipvlan and re-enabled Docker. Nothing else changed.

 

The issues are back - to name some of them:

 

binhex-teamspeak - Server not reachable through WAN IP, only LAN

binhex-rtorrentvpn - No torrent connects to its tracker (Tracker: [Couldn't resolve host name])

swag - None of the proxied sites are reachable via their subdomains (but the containers are reachable locally)

pi-hole - "Maximum number of concurrent DNS queries reached (max: 150)"

 

The web interfaces of containers (like owncloud or pi-hole) take 10-20 seconds to connect the first time.

 

 

It seems like none of the containers can connect to the internet, but pinging a server from the container shell still works:

 


root@67a0fc2da785:/# ping google.com
PING google.com (172.217.168.206): 56 data bytes
64 bytes from 172.217.168.206: seq=0 ttl=113 time=10.286 ms
64 bytes from 172.217.168.206: seq=1 ttl=113 time=9.862 ms
64 bytes from 172.217.168.206: seq=2 ttl=113 time=9.996 ms
64 bytes from 172.217.168.206: seq=3 ttl=113 time=9.887 ms
64 bytes from 172.217.168.206: seq=4 ttl=113 time=9.919 ms
64 bytes from 172.217.168.206: seq=5 ttl=113 time=9.940 ms
64 bytes from 172.217.168.206: seq=6 ttl=113 time=9.972 ms
64 bytes from 172.217.168.206: seq=7 ttl=113 time=9.854 ms
64 bytes from 172.217.168.206: seq=8 ttl=113 time=9.853 ms
64 bytes from 172.217.168.206: seq=9 ttl=113 time=9.938 ms
^C
--- google.com ping statistics ---
10 packets transmitted, 10 packets received, 0% packet loss
round-trip min/avg/max = 9.853/9.950/10.286 ms
root@67a0fc2da785:/#
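Since ping by name succeeds here while rTorrent reports "Couldn't resolve host name", it may help to test name resolution and TCP connectivity separately from inside an affected container. The following is a minimal sketch using only Python's standard `socket` module. It assumes Python is available in the container, and the host name and port in the usage part are placeholders, not values from this thread.

```python
import socket


def check_dns(host):
    """Return the sorted addresses `host` resolves to, or [] on failure."""
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return []
    return sorted({info[4][0] for info in infos})


def check_tcp(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and resolution errors.
        return False


if __name__ == "__main__":
    host, port = "tracker.example.org", 443  # placeholder values
    print("DNS:", check_dns(host) or "resolution failed")
    print("TCP:", "ok" if check_tcp(host, port) else "failed")
```

If DNS succeeds but TCP fails, the problem is more likely routing or firewalling than name resolution. If DNS itself fails intermittently, the pi-hole message about concurrent DNS queries above would be a plausible place to look next.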

 

 

I will switch back to macvlan for now, but I would love to get ipvlan running since it may not crash my server. Diagnostics are attached.

 

serverpc-diagnostics-20220202-1914.zip

Link to comment
  • 3 months later...
18 hours ago, hqueiroga said:

@3dee have you found a solution for your problem? I have the same problem as you: IPVLAN doesn’t work for me, whereas MACVLAN works but eventually crashes my server :(

This is so frustrating…

 

Nope :( I don't use VLANs anymore with my server. Switch port is now mode access. It's stable since then.

Link to comment
6 hours ago, 3dee said:

 

Nope :( I don't use VLANs anymore with my server. Switch port is now mode access. It's stable since then.

This is new to me… so with access mode, can you still run different IP addresses? Do you have multiple network cards in your server? Sorry for the questions… I'm trying to find an alternative solution here… otherwise I have no way to use multiple IP addresses… if I choose macvlan it works perfectly but eventually crashes my server… if I choose ipvlan it doesn’t work :(

 

Link to comment

Switchport mode access means that I'm not using VLAN tagging. I'm only using one Ethernet port with a single IP address for the server. Trunk mode makes my server crash about every week, ipvlan does not work for me either, and I'm not willing to change every Docker container from br0 to something else.

Link to comment
3 hours ago, hqueiroga said:

Thanks for your reply @3dee. I think I need the same solution as you... which switch do you use for that?

Not sure what solution you mean. It's the "easiest" way of connecting the server to the network. The switch model is not relevant for this, you could also just use an unmanaged switch or your standard home router.

Link to comment
