February 11, 2025
Server rebooted around 1:25am this morning and I am trying to find out the cause. It is currently running a parity check due to the unclean reboot, at 41.1% with 0 errors so far. All drives are listed as healthy. The server is on a UPS (APC UPS 950VA). Running an Intel Core i5-14500 on an ASRock Z790 Pro RS (on the latest BIOS) with 128GB (4x32GB) Corsair Vengeance DDR5 RAM.

Below are 3 points of info that I can provide to people smarter than me in order to hopefully narrow down the cause:

1. I have syslog server enabled already and these were the log entries before and after 1:25am:

Feb 11 01:17:01 Server emhttpd: spinning down /dev/sdf
Feb 11 01:17:33 Server emhttpd: spinning down /dev/sde
Feb 11 01:17:58 Server emhttpd: spinning down /dev/sdc
Feb 11 01:25:46 Server rc.rsyslogd: Syslog server daemon... Started.
Feb 11 01:25:46 Server cache_dirs: Arguments=-p 2 -i archives -i media -l off -d 6
Feb 11 01:25:46 Server cache_dirs: Max Scan Secs=10, Min Scan Secs=1
Feb 11 01:25:46 Server cache_dirs: Scan Type=adaptive
Feb 11 01:25:46 Server cache_dirs: Min Scan Depth=4
Feb 11 01:25:46 Server cache_dirs: Max Scan Depth=6
Feb 11 01:25:46 Server cache_dirs: Use Command='find -noleaf'

2. Going into the system logs via the GUI, I see the following after the reboot:

Feb 11 01:24:54 Server kernel: x86/split lock detection: #AC: crashing the kernel on kernel split_locks and warning on user-space split_locks
Feb 11 01:24:54 Server kernel: ACPI: Early table checksum verification disabled
Feb 11 01:24:54 Server rsyslogd: omfwd/udp: socket 8: sendto() error: Network is unreachable [v8.2102.0 try https://www.rsyslog.com/e/2354 ]
Feb 11 01:24:54 Server rsyslogd: omfwd: socket 8: error 101 sending via udp: Network is unreachable [v8.2102.0 try https://www.rsyslog.com/e/2354 ]
Feb 11 01:24:54 Server kernel: floppy0: no floppy controllers found
Feb 11 01:24:54 Server kernel: i915 0000:00:02.0: [drm] [ENCODER:235:DDI A/PHY A] failed to retrieve link info, disabling eDP
Feb 11 01:25:05 Server sshd[2002]: Server listening on 10.14.1.2 port 22.
Feb 11 01:25:08 Server mcelog: failed to prefill DIMM database from DMI data
Feb 11 01:25:08 Server sshd[2002]: Received signal 15; terminating.
Feb 11 01:25:08 Server sshd[2506]: Server listening on 10.253.0.1 port 22.
Feb 11 01:25:08 Server sshd[2506]: Server listening on 10.14.1.2 port 22.
Feb 11 01:25:24 Server upsmon[5158]: Warning: running as one big root process by request (upsmon -p)
Feb 11 01:25:30 Server sshd[2506]: Received signal 15; terminating.

I am guessing the rsyslogd errors are there just because the server was still starting up and the syslog server hadn't started yet. The kernel warning about split_locks, is that something I should be worried about?

3. I have also attached the diagnostic zip file below.

What would be the first thing I should look into? Appreciate any guidance in advance!

server-diagnostics-20250211-1139.zip

Edited February 11, 2025 by unraid_user11
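For anyone wanting to narrow down the outage window from logs like the ones above, here is a minimal Python sketch that finds the longest silence between consecutive syslog entries. It is only an illustration: the helper names `parse_ts` and `largest_gap` are made up for this example, the year is assumed because syslog lines don't record one, and the sample mixes a few lines from both excerpts in the post.

```python
from datetime import datetime

def parse_ts(line, year=2025):
    # Syslog lines start with "Mon DD HH:MM:SS"; the year is not logged,
    # so it is passed in as an assumption.
    stamp = " ".join(line.split()[:3])
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")

def largest_gap(lines):
    """Return (gap, line_before, line_after) for the longest silence."""
    entries = [(parse_ts(l), l) for l in lines if l.strip()]
    best = None
    for (t0, l0), (t1, l1) in zip(entries, entries[1:]):
        gap = t1 - t0
        if best is None or gap > best[0]:
            best = (gap, l0, l1)
    return best

# Lines copied from the two log excerpts in the post above.
sample = [
    "Feb 11 01:17:01 Server emhttpd: spinning down /dev/sdf",
    "Feb 11 01:17:33 Server emhttpd: spinning down /dev/sde",
    "Feb 11 01:17:58 Server emhttpd: spinning down /dev/sdc",
    "Feb 11 01:24:54 Server kernel: ACPI: Early table checksum verification disabled",
    "Feb 11 01:25:05 Server sshd[2002]: Server listening on 10.14.1.2 port 22.",
]

gap, before, after = largest_gap(sample)
print(f"Longest silence: {gap}")
print(f"  last entry before: {before}")
print(f"  first entry after: {after}")
```

On these lines the longest silence falls between the last drive spin-down at 01:17:58 and the first kernel boot message at 01:24:54, so the reset happened somewhere in that window with nothing logged beforehand.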
February 11, 2025 Community Expert
A server rebooting by itself is almost always a hardware problem, and most often there won't be anything logged in the syslog. There may be something in the SEL (System Event Log), if the board has one. I just had one of my servers reboot yesterday, and checking the SEL shows the reason:
February 11, 2025 Community Expert
Could be the UPS batteries too, if they haven't been replaced in a while.
February 26, 2025 Author
On 2/11/2025 at 12:42 PM, JorgeB said:
Server rebooting by itself is almost always a hardware problem, and most often there won't be anything logged in the syslog, there may be something in the SEL, if the board has one

Appreciate you JorgeB for always being so quick to respond and offering sound advice. Sadly I'm running an ASRock Z790 Pro RS, which doesn't have a SEL.

On 2/11/2025 at 12:47 PM, Michael_P said:
Could be the UPS batteries, too if they haven't been replaced in a while

Hmm, that's an interesting recommendation I would never have thought of. I'm currently running a NUT server on Unraid, wouldn't I see an alert there?
February 26, 2025 Author
Downloaded and ran the newest Memtest v11.2 for over 7 hours with 0 errors. After almost finishing the 3rd pass I cancelled the test, as I am sure I would have seen at least some errors by now if it was the RAM.
February 26, 2025 Author
A little over 12 days and no random reboots. One thing I did do was disable Hybrid mode on the PSU, so the fan now runs constantly. It is a Seasonic FOCUS PX-750. Maybe some PSU components were getting too hot? If the server reboots again, I am thinking of swapping the PSU for a Corsair RM750x. Curious, what would be the next thing you guys would check or replace?
February 26, 2025 Community Expert
It's worth a try. Also remember that memtest is only definitive if it finds errors. If you have multiple sticks, try running the server with just one; if the reboots continue, try a different one. That will basically rule out bad RAM.
February 26, 2025 Community Expert
5 hours ago, unraid_user11 said:
Hmm that's interesting recommendation I would have never thought of. Currently running NUT server on Unraid, wouldn't I see an alert there?

I suggested it because if the batteries are on their way out and the power blinks, it wouldn't alert, it'd just power off/reset.
March 8, 2025 Author
On 2/26/2025 at 5:16 AM, Michael_P said:
Suggested it because If the batteries are on their way out and the power blinks it wouldn't alert it'd just power off/reset

Any suggestion on how I could find out if the battery is the cause?
March 8, 2025
19 minutes ago, unraid_user11 said:
battery is the cause?

It's almost a given that if the batteries are more than a year or two old, they need to be replaced. But that's not necessarily your issue.
March 23, 2025 Author
On 3/7/2025 at 7:37 PM, Squid said:
Almost a given that if the batteries are over a year or 2 old they need to be replaced. But not necessarily your issue.

Batteries have to be replaced every year?!?!
March 23, 2025
1 hour ago, unraid_user11 said:
Batteries have to be replaced every year?!?!

Batteries in my APC UPS units have usually lasted 4-5 years. YMMV.
March 24, 2025 Community Expert
14 hours ago, unraid_user11 said:
Batteries have to be replaced every year?!?!

I got two years to the day on the last set.
March 24, 2025 Community Expert
3 hours ago, unraid_user11 said:
How did you know when it was time to replace the battery?

When my firewall and server in the rack both reset after a power blink (and I mean blink, my TV didn't even flicker). The UPS still showed 100% battery and 50+ minutes of estimated runtime, but as soon as the plug was pulled it wailed like a baby and shut off after a few seconds. 1 of the 4 batteries in the UPS ended up reading 10 V, so one of the cells wasn't celling anymore.

In my experience, having 7 of these around the house from different brands, sometimes the monthly self-test will let you know, but most of the time I find out when it just doesn't work when I need it. So I replace them every couple of years anyway on the stuff I need to keep online. My kid's Power Wheels jeep uses them too, so any 'good' ones end up in there and the rest get recycled.

I'll edit to add that I also have the rack UPS plugged into an Ecoflow Delta 2, which isn't fast enough to handle switchover, but is enough to keep the UPS and rack running long enough to fire up the generator. So the batteries don't get beat on a lot, they just wear out faster than you'd think even if you baby 'em.

Edited March 24, 2025 by Michael_P
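Since NUT came up earlier in the thread, here is a rough Python sketch of screening a UPS's reported values for obvious battery trouble. It assumes you have saved the `name: value` text that NUT's `upsc` client prints; the sample readings are invented for illustration, the helper names and the 0.9 threshold are arbitrary, and which variables appear depends on the UPS driver. As the post above shows, a failing battery can still report 100% charge, so a clean result here does not replace a self-test or a pull-the-plug test.

```python
def parse_upsc(text):
    """Turn 'name: value' lines (the format upsc prints) into a dict."""
    values = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:
            values[name.strip()] = value.strip()
    return values

def battery_warnings(values, min_ratio=0.9):
    """Return a list of warnings; min_ratio is an illustrative threshold."""
    warnings = []
    # NUT uses the RB status flag when the UPS itself asks for a new battery.
    if "RB" in values.get("ups.status", "").split():
        warnings.append("UPS reports replace-battery (RB) status")
    try:
        volts = float(values["battery.voltage"])
        nominal = float(values["battery.voltage.nominal"])
    except (KeyError, ValueError):
        return warnings  # driver does not expose usable voltage readings
    if volts < nominal * min_ratio:
        warnings.append(
            f"battery voltage {volts} V is well below nominal {nominal} V"
        )
    return warnings

# Hypothetical readings, not taken from any UPS in this thread.
sample = """battery.charge: 100
battery.runtime: 3000
battery.voltage: 10.0
battery.voltage.nominal: 12.0
ups.status: OL
"""

for warning in battery_warnings(parse_upsc(sample)):
    print(warning)
```

With the made-up readings above, the charge still reads 100% and the status is on-line, yet the low voltage gets flagged, which is the kind of mismatch described in the post.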