Server locking up after a few days, forcing unclean shutdowns


Solved by JorgeB

Hi all - I've been having persistent issues with my server (Unraid version 6.12.6).

 

Basically, the server becomes unresponsive after three to five days and requires a hard reset.

 

Issues/things I know

 

The issue has been going on for a while now - I have no idea whether it was triggered by something or just developed at random.

 

Oddly, the boot order in my BIOS also gets out of whack - after the reset, it puts the USB drive with Unraid at lowest priority and the server hangs, so I have to go into the BIOS to change it back. Note: only the boot order changes - time/date etc. remain consistent.

 

Fix Common Problems reports that macvlan is an issue, and also flags an issue with the NIC - but neither has caused me problems before.

 

Steps I have already taken

 

  • Replaced the cache NVMe drive and formatted everything to XFS (I thought this might be the issue - but no luck)
  • Replaced the CMOS battery (also didn't work)
  • Enabled system logging to try to catch the problem - but I have no idea what I'm looking for (not attached - will do so on request)

 

Request

 

I'm hoping that someone here can interpret my attached diagnostics and see if it is a software issue / easy fix. The hardware is all old - so it may be time for new gear. But the problem is weird and doesn't seem to be related to failing hardware as far as I can tell.

 

Thanks very much in advance

tower-diagnostics-20240128-1244.zip


I was having random lockups/hard resets as well. I ended up switching Settings -> Docker -> Docker custom network type to ipvlan, which fixed it. Found the solution via a Google search. It might not be related to your issue, but it's a fix I tried and all my lockups went away.

20 minutes ago, nkissick said:

Thanks Rickholman - will try that out.

The syslog in the diagnostics is the RAM copy and only shows what has happened since the reboot. It could be worth enabling the syslog server to get a log that survives a reboot, so we can see what happened before it. The mirror to flash option is the easiest to set up, but if you are worried about excessive wear on the flash drive you can put your server's address into the Remote Server field.
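Once a persistent syslog exists, the interesting part is usually the last kernel call trace before the lockup. A minimal sketch of a filter for that (the function name is mine, and the sample lines are illustrative; on a real server you would pipe the persisted syslog file into it):

```shell
# find_traces: filter a syslog stream for lines that commonly precede a
# hard lockup (kernel call traces, macvlan faults, BUGs, oopses).
find_traces() {
  grep -iE 'call trace|macvlan|BUG:|Oops'
}

# Example usage against two sample log lines - only the kernel
# macvlan line should survive the filter:
printf '%s\n%s\n' \
  'Jan 25 17:09:34 Tower kernel: macvlan_broadcast+0x10a/0x150 [macvlan]' \
  'Jan 25 17:09:35 Tower sshd[123]: session opened' | find_traces
```

On a live system this would look something like `find_traces < /path/to/persisted-syslog | tail -n 20`, with the path depending on where you pointed the syslog server.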


Hi Itimpi - thanks for that. I've been running the syslog for a while now while I try to diagnose. If you'd be able to take a quick look at the attached, I'd appreciate it.

 

In the meantime, I ran CA Advisor, which reported that a file system under /dev/mapper is very full. I tried expanding the docker image, which didn't seem to do anything - so I'm not sure if this is related or just a separate issue.

syslog-192.168.1.101(2).log
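For what it's worth, a quick way to confirm which file system the "very full" warning is about is to scan `df` output for anything above the warning threshold. A small sketch (the function name and the sample numbers are mine; on a real server you would feed it `df -P`):

```shell
# flag_full: given `df -P`-style output on stdin, print mount points whose
# use% exceeds the given threshold (default 90). Helps pin down which
# /dev/mapper file system a "very full" warning refers to.
flag_full() {
  threshold="${1:-90}"
  awk -v t="$threshold" 'NR>1 { sub(/%$/, "", $5); if ($5+0 > t) print $6 " is " $5 "% full (" $1 ")" }'
}

# Example against illustrative df output - only the 97%-full docker
# image should be flagged:
printf '%s\n%s\n%s\n' \
  'Filesystem 1024-blocks Used Available Capacity Mounted' \
  '/dev/mapper/docker 20971520 20342000 629520 97% /var/lib/docker' \
  '/dev/nvme0n1p1 500000000 100000000 400000000 20% /mnt/cache' | flag_full 90
```

On a live system: `df -P | flag_full 90`. If the docker image itself is what's full, something inside a container is usually writing to a path that isn't mapped to a host share.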

  • Solution
Jan 25 17:09:34 Tower kernel: macvlan_broadcast+0x10a/0x150 [macvlan]
Jan 25 17:09:34 Tower kernel: ? _raw_spin_unlock+0x14/0x29
Jan 25 17:09:34 Tower kernel: macvlan_process_broadcast+0xbc/0x12f [macvlan]

Macvlan call traces will usually end up crashing the server; switching to ipvlan should fix it (Settings -> Docker Settings -> Docker custom network type -> ipvlan; advanced view must be enabled, top right), then reboot.
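After the reboot, the change can be verified from the command line: `docker network inspect` reports each network's driver. A small sketch (the helper name is mine, and the sample JSON is a simplified stand-in for real `docker network inspect` output):

```shell
# check_driver: given `docker network inspect` JSON on stdin, print the
# network driver, so you can confirm the custom network now uses ipvlan
# rather than macvlan.
check_driver() {
  grep -o '"Driver": *"[a-z]*"' | head -n 1 | sed 's/.*"\([a-z]*\)"$/\1/'
}

# On a live system (not run here): docker network inspect br0 | check_driver
# Example against simplified sample JSON:
printf '[{"Name": "br0", "Driver": "ipvlan"}]\n' | check_driver
```

If this still prints `macvlan` after the reboot, the setting didn't take (for example because advanced view wasn't enabled when it was changed).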
