
Server keeps freezing overnight, with caps lock and scroll lock lights blinking on the keyboard.



I put my server together about 6 years ago and it's been really solid until recently. However, 4 times in the last ~month I've found it completely frozen, not responding to network requests (file shares, docker images, the admin web UI), and not responding to keyboard or mouse inputs at the physical machine. When this happens, the numlock light on the keyboard turns off and the caps lock and scroll lock lights start blinking. It seems to be getting more frequent, with the most recent two freezes happening after only a day or two of uptime.

 

The screen shows whatever was happening at the moment it froze, so I left `htop` open once and `dmesg --follow` the next time, but neither shows anything obvious. htop shows shfs using about 10% CPU and transmission using another 15% across two processes; dmesg shows only two recent messages:

 

md: sync done. time=60938sec
md: recovery thread: exit status 0

 

I'm not sure what those mean, but "exit status 0" sounds like "not a crash".

I'm also attaching an anonymized diagnostics bundle.

 

The CPU, MB, and RAM (i7-2600K, Asus P8P67 Pro, & 2x8GB Kingston HyperX Fury DDR3-1866) are all recycled from the desktop PC I built about 15 years ago, so my first thought is that maybe one of them is failing. But I'd still like to better understand what's happening.

Also, all 4 freezes have happened in the middle of the night, which makes me think some scheduled task might be triggering them.

 

[Edit] One other thing that comes to mind is that I switched the cache drive from a SATA SSD to an NVMe SSD a couple of months ago. I initially messed up the file owners when copying everything over to the new SSD, which broke some of my docker images, but I think I have it straightened out now.


Does anyone here have any ideas what the root cause might be?

IMG_20240224_223657671.jpg

IMG_20240224_223713141.jpg

IMG_20240226_094457521_BURST000_COVER.jpg

unraid-diagnostics-20240226-1418.zip

Edited by nfriedly
13 hours ago, nfriedly said:

the numlock light on the keyboard turns off and the caps lock and scroll lock lights start blinking. It seems to be getting more frequent, with the most recent two freezes happening after only a day or two of uptime.

Looks more like a hardware issue, but enable the syslog server and post that after a crash just in case there's something there.

11 hours ago, JorgeB said:

Looks more like a hardware issue, but enable the syslog server and post that after a crash just in case there's something there.

 

Ok, I turned on Mirror syslog to flash. I'll post another update after the next freeze.

 

I see it says a copy of the syslog is stored in the logs folder on the flash drive - what's the right way to retrieve that file? Is it exposed in the web UI or over the network, or should I just plug the flash drive into another computer?

2 hours ago, nfriedly said:

Is it exposed in the web UI or over the network, or should I just plug the flash drive into another computer?

You can expose the flash drive over the network as the 'flash' share if you want, by going into its properties and setting the SMB options appropriately. Plugging it into another computer is another easy way.

Posted (edited)

Ok, I had two more freezes. Yesterday's was a bit different: it happened in the afternoon, the keyboard lights didn't blink, and when I rebooted it, it told me that the CPU was over temperature. This made me realize that the fan on the AIO cooler had died. (The pump was still working though.) That might be the root cause of the whole mess. I swapped it with a case fan and rebooted. It kicked off a parity check and seemed to be working when I went to bed. I thought maybe I had fixed the issue.

 

This morning it was back to the same type of crash as before yesterday, with the keyboard lights blinking.

 

I'm attaching the syslog and syslog-previous, but my suspicion now is that something got permanently damaged by the CPU overheating.
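
In case it helps anyone reading a mirrored syslog after a freeze: a hard lockup often shows up only as a stretch of silence in the log followed by boot messages, so the lines right before that silence are the ones worth a look. Below is a minimal Python sketch, not anything from Unraid itself, that flags such gaps. It assumes classic syslog timestamps like `Feb 26 03:14:07` (no year, so you pass one in); `parse_ts`, `find_gaps`, and the 30-minute threshold are names and values made up for illustration.

```python
from datetime import datetime, timedelta


def parse_ts(line, year):
    """Parse a leading 'Mon DD HH:MM:SS' syslog timestamp; return None if absent."""
    parts = line.split(None, 3)  # month, day, time, rest of the message
    if len(parts) < 3:
        return None
    try:
        return datetime.strptime(
            f"{year} {parts[0]} {parts[1]} {parts[2]}", "%Y %b %d %H:%M:%S"
        )
    except ValueError:
        return None  # not a timestamped line (e.g. a wrapped message)


def find_gaps(lines, year, min_gap=timedelta(minutes=30)):
    """Return (last_line_before, first_line_after, gap) for each long silence.

    Assumption: all lines fall in the same calendar year, in order.
    """
    gaps = []
    prev_ts, prev_line = None, None
    for line in lines:
        ts = parse_ts(line, year)
        if ts is None:
            continue
        if prev_ts is not None and ts - prev_ts >= min_gap:
            gaps.append((prev_line.rstrip("\n"), line.rstrip("\n"), ts - prev_ts))
        prev_ts, prev_line = ts, line
    return gaps
```

Something like `find_gaps(open("syslog"), 2024)` would then list each silence along with the last message logged before it. A quiet server can legitimately go 30 minutes without logging, so treat the hits as places to look rather than proof of a crash.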

 

---------

 

Update (April 22): maybe it was just overheating. It's been a month and a half, and aside from the one crash that happened the night after replacing the fan, it's been rock solid.
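
For anyone who wants to confirm or rule out overheating before the next freeze, one option is to append the hottest sensor reading to a file on a schedule, so the last line written before a lockup shows how hot things were. This is only a rough sketch: it assumes the kernel exposes sensors through the standard Linux hwmon sysfs files (`/sys/class/hwmon/hwmon*/temp*_input`, in millidegrees Celsius), which may or may not be the case on a given board, and `read_temps`, `log_once`, and the log path are hypothetical names.

```python
import glob
import os
import time


def read_temps(hwmon_root="/sys/class/hwmon"):
    """Return {sensor_path: degrees_C} for every readable temp*_input file."""
    temps = {}
    pattern = os.path.join(hwmon_root, "hwmon*", "temp*_input")
    for path in sorted(glob.glob(pattern)):
        try:
            with open(path) as f:
                temps[path] = int(f.read().strip()) / 1000.0  # millidegrees -> C
        except (OSError, ValueError):
            continue  # sensor unreadable or malformed; skip it
    return temps


def log_once(log_path, hwmon_root="/sys/class/hwmon"):
    """Append one timestamped line with the hottest reading; return that reading."""
    temps = read_temps(hwmon_root)
    hottest = max(temps.values()) if temps else None
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    with open(log_path, "a") as f:
        f.write(f"{stamp} max_temp_c={hottest}\n")
        f.flush()
        os.fsync(f.fileno())  # push to disk so the line survives a hard freeze
    return hottest
```

Calling `log_once("/boot/logs/temps.log")` every minute or so from a scheduled job would do it; that path is just an example that assumes the flash drive is mounted at `/boot`, and frequent writes add wear to the flash drive, so another persistent location may be a better choice.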

 

I ordered an HBA card, and was waiting for it to arrive so that I could use a different CPU & motherboard with fewer SATA ports. But now that it's arrived, I'm not sure I actually need it. Oh well. I'll probably swap it out anyways.

 

syslog syslog-previous

Edited by nfriedly