Server locking up after a few days, forcing unclean shutdowns


Solved by JorgeB

Hi all - I've been having persistent issues with my server (Unraid version 6.12.6).

 

Basically, the server becomes unresponsive after three to five days and requires a hard reset.

 

Issues/things I know

 

The issue has been going on for a while now - I have no idea whether it was triggered by something or just developed at random.

 

Oddly, the boot order in my BIOS also gets out of whack - after the reset, it puts the USB drive with Unraid at lowest priority and the server hangs, so I have to go into the BIOS to change it back. Note: only the boot order changes - time/date etc. remain consistent.

 

Fix Common Problems reports that macvlan is an issue, and also flags an issue with the NIC - but neither has caused me problems before.

 

Steps I have already taken

 

  • Replaced the cache NVMe drive and formatted everything to XFS (I thought this might be the issue - but no luck)
  • Replaced the CMOS battery (also didn't work)
  • Enabled system logging to try to catch the problem - but I have no idea what I'm looking for (not attached - will do so on request)

 

Request

 

I'm hoping that someone here can interpret my attached diagnostics and see if it is a software issue / easy fix. The hardware is all old - so it may be time for new gear. But the problem is weird and doesn't seem to be related to failing hardware as far as I can tell.

 

Thanks very much in advance

tower-diagnostics-20240128-1244.zip


I was having random lockups/hard resets as well. I ended up switching Settings -> Docker -> Docker custom network type to ipvlan, which fixed it. Found the solution via a Google search. It might not be related to your issue, but it's a fix I tried and all my lockups went away.

20 minutes ago, nkissick said:

Thanks Rickholman - will try that out.

The syslog in the diagnostics is the RAM copy and only shows what has happened since the reboot. It could be worth enabling the syslog server to get a log that survives a reboot, so we can see what happened before it. The mirror to flash option is the easiest to set up, but if you are worried about excessive wear on the flash drive you can put your server's address into the Remote Server field.
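Once a persistent syslog exists, the interesting part is usually the last kernel call trace before the lockup. A minimal sketch of a filter for that (the function name is mine, and the sample lines are illustrative; on a real server you would pipe the persisted syslog file into it):

```shell
# find_traces: filter a syslog stream for lines that commonly precede a
# hard lockup (kernel call traces, macvlan faults, BUGs, oopses).
find_traces() {
  grep -iE 'call trace|macvlan|BUG:|Oops'
}

# Example usage against two sample log lines - only the kernel
# macvlan line should survive the filter:
printf '%s\n%s\n' \
  'Jan 25 17:09:34 Tower kernel: macvlan_broadcast+0x10a/0x150 [macvlan]' \
  'Jan 25 17:09:35 Tower sshd[123]: session opened' | find_traces
```

On a live system this would look something like `find_traces < /path/to/persisted-syslog | tail -n 20`, with the path depending on where you pointed the syslog server.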


Hi Itimpi - thanks for that. I've been running the syslog for a while now while I try to diagnose. If you'd be able to take a quick look at the attached, I'd appreciate it.

 

In the meantime, I ran CA Advisor, which reported that a file system under /dev/mapper is very full. I tried expanding the docker image, which didn't seem to do anything - so I'm not sure if this is related or just a separate issue.

syslog-192.168.1.101(2).log
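For what it's worth, a quick way to confirm which file system the "very full" warning is about is to scan `df` output for anything above the warning threshold. A small sketch (the function name and the sample numbers are mine; on a real server you would feed it `df -P`):

```shell
# flag_full: given `df -P`-style output on stdin, print mount points whose
# use% exceeds the given threshold (default 90). Helps pin down which
# /dev/mapper file system a "very full" warning refers to.
flag_full() {
  threshold="${1:-90}"
  awk -v t="$threshold" 'NR>1 { sub(/%$/, "", $5); if ($5+0 > t) print $6 " is " $5 "% full (" $1 ")" }'
}

# Example against illustrative df output - only the 97%-full docker
# image should be flagged:
printf '%s\n%s\n%s\n' \
  'Filesystem 1024-blocks Used Available Capacity Mounted' \
  '/dev/mapper/docker 20971520 20342000 629520 97% /var/lib/docker' \
  '/dev/nvme0n1p1 500000000 100000000 400000000 20% /mnt/cache' | flag_full 90
```

On a live system: `df -P | flag_full 90`. If the docker image itself is what's full, something inside a container is usually writing to a path that isn't mapped to a host share.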

  • Solution
Jan 25 17:09:34 Tower kernel: macvlan_broadcast+0x10a/0x150 [macvlan]
Jan 25 17:09:34 Tower kernel: ? _raw_spin_unlock+0x14/0x29
Jan 25 17:09:34 Tower kernel: macvlan_process_broadcast+0xbc/0x12f [macvlan]

Macvlan call traces will usually end up crashing the server; switching to ipvlan should fix it (Settings -> Docker Settings -> Docker custom network type -> ipvlan; advanced view must be enabled, top right), then reboot.
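After the reboot, the change can be verified from the command line: `docker network inspect` reports each network's driver. A small sketch (the helper name is mine, and the sample JSON is a simplified stand-in for real `docker network inspect` output):

```shell
# check_driver: given `docker network inspect` JSON on stdin, print the
# network driver, so you can confirm the custom network now uses ipvlan
# rather than macvlan.
check_driver() {
  grep -o '"Driver": *"[a-z]*"' | head -n 1 | sed 's/.*"\([a-z]*\)"$/\1/'
}

# On a live system (not run here): docker network inspect br0 | check_driver
# Example against simplified sample JSON:
printf '[{"Name": "br0", "Driver": "ipvlan"}]\n' | check_driver
```

If this still prints `macvlan` after the reboot, the setting didn't take (for example because advanced view wasn't enabled when it was changed).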
