September 16, 2022 Hello, I'm experiencing an issue where my server randomly crashes completely. No web UI, no SAMBA; everything hard crashes and the box requires a hard restart. It sounds like a kernel panic, but I can't confirm this. At first the server would last 12-24 hours before crashing, but recently this window has shrunk to around 2-6 hours. I was originally under the impression that I had a failing flash drive: sometimes, before a crash, I would see the "License file not found" error and find that the boot USB device had been moved to Unassigned Devices. However, I've already swapped to a brand new drive and have been moving it between different USB ports and controllers, and this hasn't fixed anything. ca-server-diagnostics-20220915-0935.zip
September 16, 2022 Start here: https://forums.unraid.net/topic/46802-faq-for-unraid-v6/?do=findComment&comment=819173
September 16, 2022 Author UPDATE: 4 days 1 hour up and counting. I ended up needing to change both "Power Supply Idle Control" and global C-states, but it appears to be stable now! Thanks for pointing that out! Will do. I was a little thrown since I've run unRAID without issue on this board before when I was running a 1900X, but it makes sense that something may have changed when I upgraded. Edited September 21, 2022 by SpyisSandvich (status update)
December 11, 2022 Author Reopening this, as the hard crashes have started back up after an Unraid update (I'm not sure exactly which one, but the server was stable until at least the beginning of October). I double-checked the UEFI settings and confirmed that the C-States options and RAM speed configuration have not changed.
December 12, 2022 17 hours ago, SpyisSandvich said: I double-checked the UEFI settings and confirmed that the C-States options and RAM speed configurations have not changed. It's also worth completely disabling C-states if you're only using the Power Supply Idle Control setting, since it has been reported that it can change with a kernel change.
December 13, 2022 Author On 12/12/2022 at 2:13 AM, JorgeB said: Also worth completely disabling C-states if just using the power supply idle control setting since it has been reported that it can change with a kernel change.
I checked again this morning. Global C-State Control is "Disabled", and Power Supply Idle Control is "Typical Current Idle".
On 12/11/2022 at 9:09 AM, ChatNoir said: Then set up a syslog server and post the file after the next crash.
I tried a local syslog server, but it didn't capture anything around the time of the last crash (it crashed around 22:30 and the last log message was two hours earlier). I also briefly tried remote syslog to a Debian virtual machine, but switched away from it because for most of the 11th the only line I ever saw was the one indicating that syslogging had started. Should I be mirroring to the flash drive, or should I try the remote solution again?
syslog-192.168.2.251.log
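To put a number on "the last log message was two hours prior", a small script can report the last timestamp in a captured syslog and any long silent stretches between consecutive lines. This is only a rough sketch: it assumes the file uses the traditional `Mon DD HH:MM:SS host ...` line prefix, and because that prefix carries no year, the year has to be supplied by hand. The function names and the 30-minute threshold are made up for illustration.

```python
from datetime import datetime, timedelta


def parse_timestamp(line, year):
    """Parse the leading 'Mon DD HH:MM:SS' stamp of a traditional syslog line.

    Returns None for lines that do not start with such a stamp.
    """
    try:
        # The stamp is 15 characters wide; the year is an assumption
        # supplied by the caller because the log line does not carry one.
        return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
    except ValueError:
        return None


def find_gaps(lines, year, min_gap=timedelta(minutes=30)):
    """Return (last_timestamp, gaps) for an iterable of syslog lines.

    gaps is a list of (before, after) pairs where two consecutive
    timestamped lines are at least min_gap apart.
    """
    stamps = [t for t in (parse_timestamp(l, year) for l in lines) if t is not None]
    gaps = [(a, b) for a, b in zip(stamps, stamps[1:]) if b - a >= min_gap]
    last = stamps[-1] if stamps else None
    return last, gaps


# Example use (the path is whatever file your syslog server wrote):
#     with open("syslog-192.168.2.251.log", errors="replace") as fh:
#         last, gaps = find_gaps(fh, year=2022)
#     print("last entry:", last)
#     for before, after in gaps:
#         print("silent from", before, "to", after, "=", after - before)
```

A long gap that ends with boot messages marks roughly where the box went down; if nothing at all is logged in the minutes before the gap, that fits the point made in the thread that hardware faults often leave nothing relevant in the log.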
December 13, 2022 18 minutes ago, SpyisSandvich said: Should I be mirroring to the flash drive Worth a try, but if it's a hardware issue there's usually nothing relevant logged.
December 20, 2022 Author There really isn't much here to go on; it frequently crashes several hours after the last log entry. The part that gets me is that this was stable until I updated unRAID a month or so ago. I realize it's possible that some of the BIOS settings got changed, but I've been able to confirm that this hasn't happened. I would rather not downgrade my unRAID version, and it would be difficult to switch to hardware comparable to what I'm currently running. How does one troubleshoot this without any information?
December 20, 2022 I would start by downgrading to the last known good release to confirm whether or not it's update-related.
December 20, 2022 Author Is there a better way to do this than the built-in Update OS screen? That screen only allows me to go back one version. EDIT: I suppose I can follow this nugget. I'll take a backup of my flash drive first though. EDIT 2: Currently testing unRAID 6.10.3; it seemed to take the downgrade just fine. EDIT 3: 6.10.3 crashed last night. Testing 6.9.2 now. If this fails, that should rule out the software as the culprit, because I should easily have been past this version when it was working well. Edited December 21, 2022 by SpyisSandvich
December 21, 2022 Author Okay, I'm cutting my 6.9.2 test short, as I'm immediately seeing a flood of this message in my syslog:
Dec 21 08:23:42 CA-Server kernel: vfio-pci 0000:42:00.0: BAR 1: can't reserve [mem 0x80000000-0x87ffffff 64bit pref]
I'm going to halt the software regression test because it seems like my system can't handle going back this far.
EDIT 2: This is still happening even now that I've gone back to 6.11.5. Should I restore from my flash backup, or is this something deep in the system configuration that the flash wouldn't touch?
I want to revisit the RAM speed, since I was told previously that it might be an issue with servers. This system runs DDR4-2400 unbuffered ECC. UEFI was already set to use that speed, and from what I gathered, this should be okay for this configuration. I could try bringing the speed down to see if that improves stability?
EDIT: Downclocking my RAM only seems to have made things worse. My computer gets into a weird loop where it spins the fans for a few seconds, then powers back down. It does this 3 times and then stabilizes, where I'm assuming it falls back to the last good configuration. It even does this after setting the speed back to normal. It will eventually boot, though.
Attached logs from the 6.9.2 test in case they're helpful. ca-server-diagnostics-20221221-0832.zip
Edited December 21, 2022 by SpyisSandvich
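To see how bad the flood is and whether it is limited to one device, the `can't reserve` lines can be tallied per PCI address and BAR. The sketch below is only an illustration: the regular expression is written against the single message quoted in the post above, and the function name is invented here, so other kernel versions or drivers may word the message differently.

```python
import re
from collections import Counter

# Pattern modelled on the one message quoted in the thread, e.g.
# "kernel: vfio-pci 0000:42:00.0: BAR 1: can't reserve [mem ...]".
BAR_FAILURE = re.compile(r"vfio-pci ([0-9a-fA-F:.]+): BAR (\d+): can't reserve")


def count_bar_failures(lines):
    """Count 'can't reserve' messages per (PCI address, BAR number)."""
    counts = Counter()
    for line in lines:
        match = BAR_FAILURE.search(line)
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts


# Example use on a syslog file taken from a diagnostics zip
# (the path is a placeholder):
#     with open("syslog.txt", errors="replace") as fh:
#         for (address, bar), n in count_bar_failures(fh).most_common():
#             print(f"{address} BAR {bar}: {n} messages")
```

If every hit points at the same address, that narrows the question to whatever device sits at that address and is bound to vfio-pci, rather than the system configuration as a whole.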
December 22, 2022 Author Yeah, I've been running like this since I got it set up. One of the slots is bad, but I've never run into memory capacity issues, so I kinda just left it. I could do some troubleshooting and potentially reconfigure it to see all 4, just haven't seen the need.