nfriedly Posted February 26 (edited)

I put my server together about 6 years ago and it's been really solid until recently. However, 4 times in the last ~month I've found it completely frozen. It doesn't respond to network requests (file shares, docker images, the admin web UI), and it doesn't respond to keyboard or mouse input at the physical machine. When this happens, the num lock light on the keyboard turns off and the caps lock and scroll lock lights start blinking. It seems to be getting more frequent, with the two most recent freezes happening after only a day or two of uptime.

The screen shows whatever was on it at the moment it froze. So I left `htop` open once, and `dmesg --follow` the next time, but neither showed anything obvious. htop shows shfs using about 10% CPU and transmission using another 15% across two processes. dmesg shows only two recent messages:

md: sync done. time=60938sec
md: recovery thread: exit status 0

I'm not sure what those mean, but "exit status 0" sounds like "not a crash". I'm also attaching an anonymized diagnostics bundle.

The CPU, MB, and RAM (i7-2600K, Asus P8P67 Pro, & 2x8GB Kingston HyperX Fury DDR3-1866) are all recycled from the desktop PC I built about 15 years ago. My first thought is that one of them may be failing, but I'd still like to understand better what's happening. Also, all 4 freezes have happened in the middle of the night, which makes me think some scheduled task might be triggering it.

[Edit] One other thing that comes to mind is that I switched the cache drive from a SATA SSD to an NVMe SSD a couple of months ago. I initially messed up the file owners when copying everything over to the new SSD, which broke some of my docker images, but I think I have it straightened out now.

Does anyone here have any ideas what the root cause might be?

unraid-diagnostics-20240226-1418.zip

Edited February 29 by nfriedly
JorgeB Posted February 27

13 hours ago, nfriedly said:
the numlock light on the keyboard turns off and the caps lock and scroll lock lights start blinking. It seems to be getting more frequent, with the most recent two freezes happening after only a day or two of uptime.

Looks more like a hardware issue, but enable the syslog server and post that after a crash just in case there's something there.
nfriedly Posted February 27 (Author)

11 hours ago, JorgeB said:
Looks more like a hardware issue, but enable the syslog server and post that after a crash just in case there's something there.

OK, I turned on "Mirror syslog to flash". I'll post another update after the next freeze. I see it says a copy of the syslog is stored in the logs folder on the flash drive. What's the right way to retrieve that file? Is it exposed in the web UI or over the network, or should I just plug the flash drive into another computer?
itimpi Posted February 27

2 hours ago, nfriedly said:
Is it exposed in the web UI or over the network, or should I just plug the flash drive into another computer?

If you want, you can expose the flash drive over the network as the 'flash' share. Go into its properties and set the SMB settings appropriately. Plugging it into another computer is another easy way.
nfriedly Posted March 6 (Author) (edited)

OK, I had two more freezes. Yesterday's was a bit different. It happened in the afternoon, the keyboard lights didn't blink, and when I rebooted, it told me the CPU was over temperature. That made me realize the fan on the AIO cooler had died (the pump was still working, though). That might be the root cause of the whole mess. I replaced it with a case fan and rebooted. It kicked off a parity check and seemed to be working when I went to bed, so I thought I might have fixed the issue.

This morning it had crashed again in the same way as before yesterday, with the keyboard lights blinking. I'm attaching the syslog and syslog-previous. My suspicion now is that something got permanently damaged by the CPU overheating.

---------

Update (April 22): maybe it was just overheating. It's been a month and a half, and aside from the one crash the night after I replaced the fan, it's been rock solid. I had ordered an HBA card and was waiting for it to arrive so that I could use a different CPU & motherboard with fewer SATA ports. Now that it's arrived, I'm not sure I actually need it. Oh well, I'll probably swap it out anyway.

syslog
syslog-previous

Edited April 22 by nfriedly