bally12345 Posted February 1, 2021 (edited) I've been having a few issues with my memory filling up with logs. I disabled my Docker containers and it didn't seem to go away, so I thought I would change the memory modules. I got all 8 GB sticks and filled all 12 slots, giving 96 GB of ECC, but every few hours the system just dies and I lose all access. Diagnostics attached: server-diagnostics-20210131-1612.zip. I've been able to reboot the system with IPMI and have posted the last diagnostics file from yesterday from the flash drive, but I'm not sure if it's due to the memory or something else. Also, am I right in thinking memtest won't help as I am using ECC? Any help is appreciated, as always. Edited February 8, 2021 by bally12345
JonathanM Posted February 1, 2021 bally12345 said: "Also am I right in thinking memtest wont help as I am using ECC?" It probably won't show errors, but any errors should be logged in the motherboard BIOS. Have you checked the motherboard diagnostic logs?
bally12345 Posted February 1, 2021 I'm getting this:
7408 02/01/2021 14:16:42 Fan3 Fan Lower Non-Critical - Going Low - Deasserted
7407 02/01/2021 14:16:42 Fan3 Fan Lower Critical - Going Low - Deasserted
7406 02/01/2021 14:16:42 Fan3 Fan Lower Non-Recoverable - Going Low - Deasserted
7405 02/01/2021 14:16:39 Fan3 Fan Lower Non-Recoverable - Going Low - Asserted
7404 02/01/2021 14:16:39 Fan3 Fan Lower Critical - Going Low - Asserted
7403 02/01/2021 14:16:39 Fan3 Fan Lower Non-Critical - Going Low - Asserted
7402 02/01/2021 14:15:02 Fan3 Fan Lower Non-Critical - Going Low - Deasserted
7401 02/01/2021 14:15:02 Fan3 Fan Lower Critical - Going Low - Deasserted
7400 02/01/2021 14:15:02 Fan3 Fan Lower Non-Recoverable - Going Low - Deasserted
7399 02/01/2021 14:14:59 Fan3 Fan Lower Non-Recoverable - Going Low - Asserted
7398 02/01/2021 14:14:59 Fan3 Fan Lower Critical - Going Low - Asserted
7397 02/01/2021 14:14:59 Fan3 Fan Lower Non-Critical - Going Low - Asserted
7396 02/01/2021 07:25:25 OEM Physical Security (Chassis Intrusion) General Chassis Intrusion - Asserted
7395 02/01/2021 07:23:31 VBAT Voltage Upper Non-Recoverable - Going High - Asserted
7394 02/01/2021 07:23:31 VBAT Voltage Upper Critical - Going High - Asserted
7393 02/01/2021 07:23:30 VBAT Voltage Upper Non-Critical - Going High - Asserted
7392 02/01/2021 07:23:26 +5V Voltage Upper Non-Recoverable - Going High - Asserted
7391 02/01/2021 07:23:26 +5V Voltage Upper Critical - Going High - Asserted
7390 02/01/2021 07:23:25 +5V Voltage Upper Non-Critical - Going High - Asserted
7389 02/01/2021 07:23:23 +3.3VSB Voltage Upper Non-Recoverable - Going High - Asserted
7388 02/01/2021 07:23:23 +3.3VSB Voltage Upper Critical - Going High - Asserted
7387 02/01/2021 07:23:22 +3.3VSB Voltage Upper Non-Critical - Going High - Asserted
7386 02/01/2021 07:23:20 +3.3V Voltage Upper Non-Recoverable - Going High - Asserted
7385 02/01/2021 07:23:20 +3.3V Voltage Upper Critical - Going High - Asserted
7384 02/01/2021 07:23:19 +3.3V Voltage Upper Non-Critical - Going High - Asserted
7383 02/01/2021 07:23:15 CPU2 DIMM Voltage Upper Non-Recoverable - Going High - Asserted
7382 02/01/2021 07:23:15 CPU2 DIMM Voltage Upper Critical - Going High - Asserted
7381 02/01/2021 07:23:15 CPU2 DIMM Voltage Upper Non-Critical - Going High - Asserted
7380 02/01/2021 07:23:12 CPU1 DIMM Voltage Upper Non-Recoverable - Going High - Asserted
7379 02/01/2021 07:23:12 CPU1 DIMM Voltage Upper Critical - Going High - Asserted
7378 02/01/2021 07:23:12 CPU1 DIMM Voltage Upper Non-Critical - Going High - Asserted
All the sensors are showing as normal right now.
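For anyone reading a long event log like the one above, a rough way to see which events were asserted and never cleared within the excerpt is to pair each "Asserted" entry with a later "Deasserted" one. The Python below is only a sketch under stated assumptions: it assumes one entry per line in the layout shown in this post (ID, MM/DD/YYYY date, time, description, final state), and the names `parse_line` and `still_asserted` are made up for illustration, not part of any IPMI tool. An event with no deassert in a pasted excerpt is not proof of a live fault, since the clearing entry may simply fall outside the excerpt.

```python
import re
from datetime import datetime

# Assumed layout: "<id> <MM/DD/YYYY> <HH:MM:SS> <description> - <Asserted|Deasserted>"
LINE_RE = re.compile(
    r"^(?P<id>\d+)\s+(?P<date>\d{2}/\d{2}/\d{4})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"(?P<event>.+?)\s+-\s+(?P<state>Asserted|Deasserted)$"
)


def parse_line(line):
    """Return (id, timestamp, event description, state), or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    when = datetime.strptime(m["date"] + " " + m["time"], "%m/%d/%Y %H:%M:%S")
    return int(m["id"]), when, m["event"], m["state"]


def still_asserted(lines):
    """Map each event that has no later 'Deasserted' entry to the time it was last asserted."""
    entries = [e for e in map(parse_line, lines) if e is not None]
    entries.sort(key=lambda e: e[0])  # the log is listed newest first; replay oldest first
    open_events = {}
    for _id, when, event, state in entries:
        if state == "Asserted":
            open_events[event] = when
        else:
            open_events.pop(event, None)
    return open_events


if __name__ == "__main__":
    sample = [
        "7402 02/01/2021 14:15:02 Fan3 Fan Lower Non-Critical - Going Low - Deasserted",
        "7397 02/01/2021 14:14:59 Fan3 Fan Lower Non-Critical - Going Low - Asserted",
        "7392 02/01/2021 07:23:26 +5V Voltage Upper Non-Recoverable - Going High - Asserted",
    ]
    for event, when in still_asserted(sample).items():
        print(when, event)
```

In the three-line sample, the Fan3 event is cleared three seconds after it is raised, while the +5V event has no matching deassert in the sample.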
JonathanM Posted February 1, 2021 The voltage going high sounds like the motherboard isn't happy with the PSU, but that could be a wild goose chase. I'd run a few hours of memtest and then look at the motherboard logs.
bally12345 Posted February 1, 2021 (edited) I have a couple of spare PSUs. I think I'm going to swap them over and run memtest. The only other thing to try is perhaps dropping the number of DIMMs in use to see if that makes any difference. Edited February 1, 2021 by bally12345
bally12345 Posted February 3, 2021 (edited) Changed the PSUs and the server has been up for almost 24 hours. I will give it until the weekend. This seems to have resolved the issue. It was nothing to do with the RAM; it was a faulty PSU. Edited February 8, 2021 by bally12345