Server running very slow and becomes unresponsive soon after pre-clearing start



Hi,

 

I posted this Saturday and thought all was good, but I've been experiencing additional issues that are odd: 

I am in the process of preclearing a new replacement disk for the one that is failing. I've turned VMs and Docker off so that no resources are being used elsewhere and so as not to stress the parity drive, which would be emulating the disabled disk. The problem is that, after a period of time, the server becomes unresponsive: the GUI won't load and I am not able to SSH in. Even sending an orderly shutdown via IPMI fails, so I have to power cycle via IPMI. It also runs extremely slowly now, even after a reboot. I was on an older version (6.12.6) and updated to 6.12.8 to see if that was the issue. I ran a SMART test on the new drive to see if it was the cause, but it came back with no errors. Can anyone tell me if they see something awry in the diagnostics file I pulled just now, after a restart and pre-clear start? The diagnostics also took almost 20 minutes to generate.

Thanks!

tower-diagnostics-20240311-1035.zip

Mar 11 10:00:09 Tower mcelog: Fallback Socket memory error count 27350 exceeded threshold: 219209113 in 24h
Mar 11 10:00:09 Tower mcelog: Location SOCKET:1 CHANNEL:? DIMM:? []
Mar 11 10:00:09 Tower mcelog: Running trigger `socket-memory-error-trigger' (reporter: sockdb_fallback)
Mar 11 10:00:09 Tower mcelog: Fallback Socket memory error count 30236 exceeded threshold: 219239350 in 24h
Mar 11 10:00:09 Tower mcelog: Location SOCKET:1 CHANNEL:? DIMM:? []
Mar 11 10:00:09 Tower mcelog: Running trigger `socket-memory-error-trigger' (reporter: sockdb_fallback)
Mar 11 10:00:09 Tower mcelog: Fallback Socket memory error count 30236 exceeded threshold: 219269587 in 24h
Mar 11 10:00:09 Tower mcelog: Location SOCKET:1 CHANNEL:? DIMM:? []

 

Looks like the server may be having RAM issues: mcelog is reporting memory errors on socket 1 that exceed its threshold.
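For anyone who wants to check their own syslog for the same pattern, here is a minimal Python sketch that counts these mcelog threshold messages and tallies which socket they point to. The regular expressions are modeled only on the lines quoted above, so other mcelog versions may format them differently. The function name `summarize_mcelog` and the `SAMPLE` list are made up for illustration.

```python
import re
from collections import Counter

# Patterns modeled on the mcelog lines quoted in this thread (an assumption:
# other mcelog versions or configurations may word these messages differently).
THRESHOLD_RE = re.compile(
    r"mcelog: Fallback Socket memory error count (\d+) exceeded threshold"
)
LOCATION_RE = re.compile(
    r"mcelog: Location SOCKET:(\S+) CHANNEL:(\S+) DIMM:(\S+)"
)


def summarize_mcelog(lines):
    """Return (number of threshold messages, Counter of socket IDs seen)."""
    threshold_events = 0
    per_socket = Counter()
    for line in lines:
        if THRESHOLD_RE.search(line):
            threshold_events += 1
        match = LOCATION_RE.search(line)
        if match:
            per_socket[match.group(1)] += 1
    return threshold_events, per_socket


# Hypothetical sample input: a few of the lines posted above.
SAMPLE = [
    "Mar 11 10:00:09 Tower mcelog: Fallback Socket memory error count 27350 exceeded threshold: 219209113 in 24h",
    "Mar 11 10:00:09 Tower mcelog: Location SOCKET:1 CHANNEL:? DIMM:? []",
    "Mar 11 10:00:09 Tower mcelog: Running trigger `socket-memory-error-trigger' (reporter: sockdb_fallback)",
    "Mar 11 10:00:09 Tower mcelog: Fallback Socket memory error count 30236 exceeded threshold: 219239350 in 24h",
    "Mar 11 10:00:09 Tower mcelog: Location SOCKET:1 CHANNEL:? DIMM:? []",
]

events, sockets = summarize_mcelog(SAMPLE)
print("threshold messages:", events)
print("by socket:", dict(sockets))
```

To run it against a real log, pass an open file object instead of `SAMPLE`. Since the log here reports CHANNEL and DIMM as `?`, a tally like this can only narrow the problem down to a socket, not to a specific module.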


Thanks for the quick reply!

Ah, that's unfortunate. I'm going to try reseating all the RAM modules to see if that helps. I also have more RAM than I need for this machine, so I could remove any modules that have failed.

Also, the CPU utilization is pegged at 100% even though VMs and Docker are disabled for now.

 

2024-03-11_11-11-45.png
