Server Repeatedly Hard Crashes

Hello

 

I'm experiencing an issue where my server will randomly crash completely.  No web UI, no SAMBA, everything hard crashes and the box requires a hard restart.  It sounds like a kernel panic but I can't confirm this.

 

At first the server would last between 12-24 hours before crashing, but recently this window has been cut to around 2-6 hours.

 

I was originally under the impression that I had a failing flash drive.  Sometimes before it would crash, I would see the "License file not found" error and that the boot USB device had been moved to Unassigned Devices.  However, I've already swapped to a brand new drive and have been moving it around to different USB ports and controllers, and this hasn't fixed anything.

ca-server-diagnostics-20220915-0935.zip

  • Author

UPDATE:  4 days 1 hour up and counting.  I ended up needing to change both "Power Supply Idle Control" and global C-states, but it appears to be stable now!  Thanks for pointing that out!

 

Will do.  I was a little thrown since I've run unRAID without issue on this board before when I was running a 1900X, but it makes sense that something may have changed when I upgraded.

Edited by SpyisSandvich

  • 2 months later...
  • Author

Reopening this, as the hard crashes have started back up after an Unraid update (not sure exactly which one, but it was stable until at least the beginning of October).  I double-checked the UEFI settings and confirmed that the C-States options and RAM speed configurations have not changed.

17 hours ago, SpyisSandvich said:

I double-checked the UEFI settings and confirmed that the C-States options and RAM speed configurations have not changed.

Also worth completely disabling C-states if just using the power supply idle control setting since it has been reported that it can change with a kernel change.

  • Author
On 12/12/2022 at 2:13 AM, JorgeB said:

Also worth completely disabling C-states if just using the power supply idle control setting since it has been reported that it can change with a kernel change.

I checked again this morning.  Global C-State Control is "Disabled", and Power Supply Idle Control is "Typical Current Idle".
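
For anyone who wants to cross-check this from the running OS rather than from the UEFI screen, here is a minimal sketch that lists the idle states the Linux kernel exposes through the standard cpuidle sysfs directory. It assumes a cpuidle driver is loaded; how the board's "Global C-State Control" toggle shows up in this list is an assumption and may differ between boards and kernels. The function name `read_cpuidle_states` is made up for this example.

```python
from pathlib import Path


def read_cpuidle_states(cpu_dir):
    """Return (name, disabled) for each cpuidle state found under cpu_dir."""
    states = []
    cpuidle = Path(cpu_dir) / "cpuidle"
    if not cpuidle.is_dir():
        return states  # no cpuidle driver loaded, or states not exposed
    # Sort numerically so state10 comes after state2.
    for state in sorted(cpuidle.glob("state[0-9]*"), key=lambda p: int(p.name[5:])):
        name = (state / "name").read_text().strip()
        disable_file = state / "disable"
        # A non-zero "disable" value means the kernel will not enter this state.
        disabled = disable_file.exists() and disable_file.read_text().strip() != "0"
        states.append((name, disabled))
    return states


if __name__ == "__main__":
    for name, disabled in read_cpuidle_states("/sys/devices/system/cpu/cpu0"):
        print(f"{name}: {'disabled' if disabled else 'enabled'}")
```

An empty list only means the kernel exposes no idle states for that CPU. It does not by itself prove what the firmware setting is.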

 

On 12/11/2022 at 9:09 AM, ChatNoir said:

Then set up a syslog server and post the file after the next crash.

Tried with a local syslog server and it didn't capture anything around the time of the last crash (it crashed around 22:30 and the last log message was two hours prior).  I also briefly tried remote syslog with a virtual Debian machine, but switched away from it because for most of the 11th, the only line I ever saw was the one indicating that syslogging had started.  Should I be mirroring to the flash drive, or should I try the remote solution again?

syslog-192.168.2.251.log
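
To put a number on the gap between the last logged line and a crash, a small sketch like the one below can help. It assumes the classic syslog timestamp prefix (for example "Dec 21 08:23:42"), which carries no year, so the year is taken from the crash time you pass in. The function name `last_entry_before` and the sample lines in the usage note are hypothetical, not taken from the attached log.

```python
from datetime import datetime


def last_entry_before(lines, crash_time):
    """Return the newest syslog timestamp at or before crash_time, or None."""
    latest = None
    for line in lines:
        try:
            # Classic syslog prefix is 15 characters; the year is not logged,
            # so borrow it from crash_time (an assumption of this sketch).
            stamp = datetime.strptime(
                f"{crash_time.year} {line[:15]}", "%Y %b %d %H:%M:%S"
            )
        except ValueError:
            continue  # skip lines that do not start with a timestamp
        if stamp <= crash_time and (latest is None or stamp > latest):
            latest = stamp
    return latest
```

Usage would look something like `crash - last_entry_before(open("syslog.log"), crash)` with `crash = datetime(2022, 12, 11, 22, 30)`, which gives the silent period as a `timedelta`.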

18 minutes ago, SpyisSandvich said:

Should I be mirroring to the flash drive

Worth a try, but if it's a hardware issue there's usually nothing relevant logged.

  • Author

There really isn't much here to go on.  The crashes frequently happen several hours after the last log entry.

 

The part that gets me here is that this was stable until I updated unRAID a month or so ago.  I realize it's possible that some of the BIOS settings got messed with, but I've been able to confirm that this hasn't happened.  I would rather not downgrade my unRAID version, and it would be difficult to switch to hardware comparable to what I'm currently running.

 

How does one troubleshoot this without any information?

I would start by downgrading to the last known good release to confirm whether it's update-related or not.

  • Author

Is there a better way to do this than the built-in Update OS screen?  That screen only allows me to go back one version.

 

EDIT:  I suppose I can follow this nugget.  I'll take a backup of my flash drive first though.

 

EDIT 2:  Currently testing unRAID 6.10.3, seemed to take the downgrade just fine.

 

EDIT 3:  6.10.3 crashed last night.  Testing 6.9.2 now.  If this fails, that should rule out the software as the culprit, because 6.9.2 should be well before the point where things stopped working.

Edited by SpyisSandvich

  • Author

Okay, quickly breaking off my 6.9.2 test, as I'm immediately seeing a flood of this message in my syslog:

Quote

Dec 21 08:23:42 CA-Server kernel: vfio-pci 0000:42:00.0: BAR 1: can't reserve [mem 0x80000000-0x87ffffff 64bit pref]
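
When a message like this floods the log, it can help to pull out which PCI device and memory window it refers to and how often it repeats. The sketch below is one way to do that. The regular expression is written against the single line quoted above, so other variants of the kernel message may not match, and the helper name `parse_bar_failure` is made up for this example. It only summarizes the message and does not diagnose why the reservation fails.

```python
import re

# Matches e.g. "vfio-pci 0000:42:00.0: BAR 1: can't reserve [mem 0x80000000-0x87ffffff 64bit pref]"
BAR_RE = re.compile(
    r"vfio-pci (?P<dev>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"BAR (?P<bar>\d+): can't reserve "
    r"\[mem 0x(?P<start>[0-9a-f]+)-0x(?P<end>[0-9a-f]+)"
)


def parse_bar_failure(line):
    """Return device, BAR index, and window size in MiB, or None if no match."""
    match = BAR_RE.search(line)
    if match is None:
        return None
    start = int(match["start"], 16)
    end = int(match["end"], 16)
    return {
        "device": match["dev"],
        "bar": int(match["bar"]),
        # The range is inclusive, hence the +1.
        "size_mib": (end - start + 1) // (1024 * 1024),
    }
```

For the line above this reports device 0000:42:00.0, BAR 1, and a 128 MiB window. Feeding every syslog line through it and counting results per device shows whether a single passed-through device is responsible for the flood.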

 

I'm going to halt the software regression test because it seems like my system can't handle going back this far.

 

EDIT 2:  This is still happening even now that I've gone back to 6.11.5.  Should I restore from my flash backup, or is this something deep in the system configuration that the flash wouldn't touch?

 

I want to revisit the RAM speed, since I was told previously that might be an issue with servers.  This system runs DDR4-2400 with unbuffered ECC.  UEFI was already set to use that speed, and from what I gathered for this configuration, this should be okay.  I could try bringing the speed down and seeing if that improves stability?

 

EDIT:  Downclocking my RAM only seems to have made things worse.  My computer gets into this weird loop where it'll spin the fans for a few seconds, then power back down.  It'll do this 3 times and then stabilize, at which point I'm assuming it falls back to the last known good configuration.  It even does this when I set the speed back to normal.  It will eventually boot though.

 

Attached logs from the 6.9.2 test in case they're helpful.

ca-server-diagnostics-20221221-0832.zip

Edited by SpyisSandvich

2400 MT/s should be fine.  You only have 3 sticks detected, is that expected?

  • Author

Yeah, I've been running like this since I got it set up.  One of the slots is bad, but I've never run into memory capacity issues, so I kinda just left it.  I could do some troubleshooting and potentially reconfigure it to see all 4, just haven't seen the need.
