Intermittent hard crashes that also take my LAN down; much worse since 6.10



Several times since the 6.10 upgrade, my Unraid server has crashed in a way that takes my router down with it. I'm attaching both the server's diagnostics .zip and my router's syslog. (Note: you can spot the crashes in syslog.txt where the dates revert to "May 4" until the NTP update. The server is on eth3 on the router.) This DID happen once before the 6.10 release (though I believe it was on an early 6.10 RC build), but it's been much more frequent since.
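
For anyone digging through a similar router log, here is a minimal Python sketch of that "dates revert" check. It assumes BSD-style syslog lines that begin with a `Mon DD HH:MM:SS` timestamp (the router's exact format isn't shown here), and because those timestamps carry no year it treats every line as falling in the same dummy leap year. It just flags every point where a timestamp is earlier than the one before it, which is what a reboot with an unsynchronized clock looks like. The script and function names are hypothetical.

```python
import sys
from datetime import datetime


def parse_ts(line):
    """Parse a leading 'Mon DD HH:MM:SS' syslog timestamp, or return None.

    Assumption: syslog omits the year, so a fixed leap year (2000) is
    prepended; this also lets 'Feb 29' parse. Lines without a timestamp
    return None.
    """
    try:
        return datetime.strptime("2000 " + line[:15], "%Y %b %d %H:%M:%S")
    except ValueError:
        return None


def find_clock_resets(lines):
    """Return (index, previous_line, line) wherever the clock jumps backwards.

    A backwards jump (e.g. from 'May 21' to 'May  4') suggests the device
    rebooted and logged with a stale clock until NTP corrected it.
    Note: a December -> January rollover would also be flagged.
    """
    resets = []
    prev_ts, prev_line = None, None
    for i, line in enumerate(lines):
        ts = parse_ts(line)
        if ts is None:
            continue  # skip lines without a recognizable timestamp
        if prev_ts is not None and ts < prev_ts:
            resets.append((i, prev_line, line))
        prev_ts, prev_line = ts, line
    return resets


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): python find_resets.py syslog.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        lines = f.read().splitlines()
    for i, before, after in find_clock_resets(lines):
        print(f"line {i + 1}: possible reboot")
        print(f"  before: {before}")
        print(f"  after:  {after}")
```

Each hit pairs the last entry logged before the reset with the first one after it, which gives a rough window for when the crash happened.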


Each time, I need to restart my router (which returns my LAN to normal) and then the server itself. Of course, this counts as an unclean shutdown and has caused some issues.

It's a hard crash for the server, though: even with a locally attached keyboard and monitor, not even the keyboard lights (Num Lock, etc.) respond.

I'm kind of at a loss as to what's happening, but it's a new Alder Lake system: a Gigabyte Z690 UD DDR4 motherboard and an Intel 12400 CPU. I am NOT hardware transcoding in Plex; that causes hard crashes too, but doesn't take the LAN down with it. I'm wondering if it's something to do with the 2.5GbE onboard LAN? I vaguely remember reading about some issues there.

Any ideas of what's happening, or how to troubleshoot it?

Attachments: tuna-diagnostics-20220521-0931.zip, syslog.txt

Edited by Wintersdark
On 5/22/2022 at 2:16 AM, JorgeB said:

Enable the syslog server and post that after a crash.

Sorry, I assumed that was in the diagnostics zip.  

The most recent crash happened sometime on the morning of May 23:

May 23 03:33:07 Tuna CA Backup/Restore: #######################
May 23 03:33:07 Tuna CA Backup/Restore: Deleting /mnt/user/backup/appdata_backups/[email protected]
May 23 03:33:07 Tuna CA Backup/Restore: Deleting /mnt/user/backup/appdata_backups/[email protected]
May 23 03:33:07 Tuna CA Backup/Restore: Backup / Restore Completed
May 23 03:40:01 Tuna kernel: mdcmd (36): set md_write_method 1
May 23 03:40:01 Tuna kernel: 
May 23 18:31:49 Tuna kernel: microcode: microcode updated early to revision 0x1f, date = 2022-03-03
May 23 18:31:49 Tuna kernel: Linux version 5.15.40-Unraid (root@Develop) (gcc (GCC) 11.2.0, GNU ld version 2.37-slack15) #1 SMP Mon May 16 10:05:44 PDT 2022
May 23 18:31:49 Tuna kernel: Command line: BOOT_IMAGE=/bzimage initrd=/bzroot
May 23 18:31:49 Tuna kernel: x86/split lock detection: #AC: crashing the kernel on kernel split_locks and warning on user-space split_locks

A look at the Docker containers' logs (no VMs running) shows logging stopping abruptly, with the last entries at around 07:30 to 07:40. None of them show errors at that point; they simply end, followed by the restart in the evening.
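
Since a hard freeze often leaves nothing behind, about the best you can do is look at what was logged last before each restart. A rough sketch of that, assuming the mirrored syslog uses the same format as the excerpt above and that a line containing `kernel: Linux version` marks each fresh boot (as it does in that excerpt): it steps back over the other early-boot lines stamped with the same second (like the microcode line) and returns the few lines logged before them. The marker string and function name are assumptions, not anything Unraid documents.

```python
import sys

# Assumption: every boot logs a line containing this text, as in the excerpt above.
BOOT_MARKER = "kernel: Linux version"


def last_lines_before_boots(lines, context=3):
    """For each boot, return (boot_line, up to `context` lines logged before it)."""
    results = []
    for i, line in enumerate(lines):
        if BOOT_MARKER not in line:
            continue
        boot_stamp = line[:15]  # 'Mon DD HH:MM:SS' prefix of the boot line
        j = i
        # Step back over earlier boot messages from the same second (e.g. microcode).
        while j > 0 and lines[j - 1][:15] == boot_stamp:
            j -= 1
        results.append((line, lines[max(0, j - context):j]))
    return results


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): python last_before_boot.py syslog
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        lines = f.read().splitlines()
    for boot_line, before in last_lines_before_boots(lines):
        print("Last entries before:", boot_line)
        for entry in before:
            print("   ", entry)
```

Run against the excerpt above, this would point at the routine `mdcmd` entries at 03:40:01, i.e. nothing that explains the crash, which matches what the log already shows.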

syslog


Unfortunately, there's nothing logged. One thing you can try is to boot the server in safe mode with all Docker containers and VMs disabled and let it run as a basic NAS for a few days. If it still crashes, it's likely a hardware problem; if it doesn't, start turning the other services back on one by one.


So, for reference for anyone else who happens by: 

I disabled my 2.5Gbps onboard LAN and replaced it with an older Intel 4-port LAN card. Unraid still freezes intermittently, but it no longer takes my router down with it. Small wins, I suppose.

Next steps: 

I realized I was still running Intel GPU Top (though *not* passing anything through to Plex; Plex has no /dev/dri folder). Because the 12400's integrated GPU is a known source of crashes, I removed that as well. This wasn't a problem before 6.10, but the daily crashes ever since the upgrade have been frustrating.

If it still crashes, I'll shut down docker as well and run it as a pure NAS for a while and see.


Well, it's still crashing daily, often in less than a day, and there's nothing in the logs showing why. I'm shutting down Docker entirely and running it as a pure NAS for a while to see how that goes. Gotta say, though, this is not a good experience for someone new to Unraid. I'd love more tools to see what's going on and why it's crashing. I mean, if it is something in Docker, isn't that largely the point of Docker? That an individual container crashing doesn't take the whole system down?


This also happened to me with the latest update: my server (Dell PowerEdge T420) stalled on boot-up with the message "eth0 not connected" and something about "IPv4 address: not set, IPv6 address: not set", meaning no IP address could be found.

I tried everything I could find on this forum and the wider web to sort this out and was pulling my hair out, to no avail. Then I remembered I had a recent-ish flash backup and reverted to that. It wasn't that simple, as I had changed a few things in the last week tidying up files etc., and what a palaver it was getting things back to a working system.

Anyway, I've backed up the previous version, which works for me, until this is rectified. Always back up before upgrading: lesson learned.

PS: I usually do back up before updating, but the updates had been going so well without issues that I thought I'd just do it 🙂 How wrong I was...


And continuing to document, just in case anyone else has a similar problem in the future. 

Still up and stable after 36 hours, which is better than anything I've managed since 6.10 dropped; I never made it to 24 hours. This is with Docker simply turned off. I'm going to leave it tonight, and if it's still running tomorrow after work, I'll re-enable Docker and every container except Plex (I currently have a temporary Plex instance running on another machine). I feel Plex is the most likely culprit, even though it's currently not set up to hardware transcode.

4 months later...

Hey, have you had any more crashes since then? Have you applied any updates as well?

I've been having the same symptoms for the past month, after it had been running for 2 months since the first install... The problem is that the oldest Unraid version I have is 6.10, as I first installed it in August. I'm running the same kind of hardware (Alder Lake).


Yes, I narrowed it down to my 2.5G LAN port. I tossed in an old PCIe Intel network card and the problem stopped. I removed it a week ago and the system crashed within about 12 hours (again taking my whole LAN with it).

Back in went the network adapter, and no more problems. It's frustrating, as I'd rather just use the 2.5G port, but it's not the end of the world.


I'm not in a huge hurry. It's hard to get downtime (my server is used by a LOT of people), and I'm really not fond of hard crashes, not to mention what an utter PITA it is to have my whole LAN go down and have to power-cycle my router. The PCIe adapter I'm using now is a very good quad-port job, and while bonds aren't as good as a single fatter pipe, it does the job.

