[SOLVED] Server hang (segfaults) - not possible to do a clean shutdown / reboot



Hi everyone,


Today I noticed that my server had become unresponsive. I was able to log in and see that disk activity was 0, with 1% CPU usage and 11% RAM usage. Temperatures were all in check (disks under 40, CPU under 45).

 

The syslog (which I could still open) showed a lot of red error entries, but I was no longer able to download the diagnostics because the system had become too unresponsive. I did manage to connect via SSH. The top command showed nothing to be concerned about, but it crashed after 5 seconds with a "Segmentation fault" error (see attached screenshot).

 

That's when I decided to reboot the server entirely with the Unraid "powerdown -r" command, which again resulted in a Segmentation fault error. Trying again did show the "going down" message, but after waiting another 10 minutes the server still hadn't powered down. Even the local terminal, with a keyboard and display connected directly to the server, was unresponsive: I could still type, but the commands didn't actually do anything.

 

I eventually restarted the server by holding down the power button and then powering it on again. The boot process proceeded as it normally would, except that it started a parity check immediately, which I think is expected after an unclean shutdown.

 

==

 

Does anyone have any suggestions as to what could have caused this? Or any recommendations in terms of next steps? I was thinking of maybe doing a memory test - but this memory kit has been running fine 24/7 since I bought it (4 months ago) and is running stock (non-xmp).

 

Screenshot 2021-06-08 160843.png

syslog-manual.rtf


Today I had this problem again. I've attached the syslog again via manual copy, as it was not possible to download diagnostics while the server was in this state. I also recorded a video of what I saw on screen, though I'm not sure if it's helpful.

 

syslog.txt
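
For anyone else stuck with only a manually copied syslog, here is a minimal sketch of how such a file could be skimmed for crash indicators. The search patterns and the default file name are assumptions chosen for illustration, not taken from the attached log, so adjust them to whatever your own log actually contains.

```python
import re
import sys
from collections import Counter

# Assumed patterns for common crash indicators; extend or replace as needed.
PATTERNS = {
    "segfault": re.compile(r"segfault|segmentation fault", re.IGNORECASE),
    "general_protection": re.compile(r"general protection", re.IGNORECASE),
    "machine_check": re.compile(r"machine check|mce:", re.IGNORECASE),
}


def summarize_log(lines):
    """Count matching lines per pattern and record the first match of each kind."""
    counts = Counter()
    first_seen = {}
    for number, line in enumerate(lines, start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
                # Keep only the earliest occurrence for each pattern.
                first_seen.setdefault(name, (number, line.rstrip()))
    return counts, first_seen


if __name__ == "__main__":
    # Hypothetical default path; pass the real file as the first argument.
    path = sys.argv[1] if len(sys.argv) > 1 else "syslog.txt"
    with open(path, errors="replace") as handle:
        counts, first_seen = summarize_log(handle)
    for name, count in counts.items():
        number, line = first_seen[name]
        print(f"{name}: {count} line(s), first at line {number}: {line}")
```

A summary like this only shows where the errors start and how often they repeat. It does not identify the cause, so a hardware test is still needed to confirm or rule out things like bad RAM.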


I can't see the reason based on the syslog. One thing you can try is to boot the server in safe mode with all Docker containers and VMs disabled, and let it run as a basic NAS for a few days. If it still crashes, it's likely a hardware problem; if it doesn't, start turning the other services back on one by one.


Not sure if this error could be related to RAM, but I decided to do a RAM test anyways.

 

At around 36% of the first pass, errors started appearing.

 

IMG_3194.jpeg

 

So I am starting to suspect my memory modules / motherboard / CPU now (or can it only be the memory itself with this kind of error?).

 

I am using 2 sticks of 16 GB DDR4-2400 memory (CMK32GX4M2A2400C16) in dual-channel mode with XMP disabled, using the "Auto" frequency setting in the BIOS. So what I am doing now is testing each stick individually to see if the errors remain (will post results here).


Turns out one of my modules is bad. I tested them separately: module A seems to be fine, with no errors after 5 hours (3 passes), while module B started spitting out errors during the 2nd pass (after about 1 hour). This was repeatable on another system with a different motherboard and CPU, so it's definitely the module itself.


I've replaced both sticks with a single 8 GB stick that I had lying around and completed 8 passes on it without any errors. I've started up the server again and ran another parity check (luckily still 0 sync errors, so it seems the memory was faulty, but not faulty enough to cause any data loss on the array yet). Hopefully replacing the bad RAM fixed the issue :)

 

I've requested an RMA on the faulty memory kit.

