bullmoose20 Posted May 9, 2022

During the monthly parity check, one of my older drives (1TB) was taken offline due to excessive errors. I replaced it with a new 4TB drive. After restarting, the rebuild progressed at about 100Mb/s. Now it's between 5 and 10Mb/s. Diagnostics attached: nzwhs01-diagnostics-20220509-0719.zip
ChatNoir Posted May 9, 2022

You have bigger issues than speed.

May 9 06:54:25 NZWHS01 kernel: mce: [Hardware Error]: Machine check events logged
May 9 06:54:25 NZWHS01 kernel: EDAC sbridge MC1: HANDLING MCE MEMORY ERROR
May 9 06:54:25 NZWHS01 kernel: EDAC sbridge MC1: CPU 10: Machine Check Event: 0 Bank 11: cc00008f000800c3
May 9 06:54:25 NZWHS01 kernel: EDAC sbridge MC1: TSC 0
May 9 06:54:25 NZWHS01 kernel: EDAC sbridge MC1: ADDR 288aa0b000
May 9 06:54:25 NZWHS01 kernel: EDAC sbridge MC1: MISC 138116f5b3f9e8c
May 9 06:54:25 NZWHS01 kernel: EDAC sbridge MC1: PROCESSOR 0:206d7 TIME 1652093665 SOCKET 1 APIC 24
May 9 06:54:25 NZWHS01 kernel: EDAC MC1: 2 CE memory scrubbing error on CPU_SrcID#1_Ha#0_Chan#3_DIMM#0 or CPU_SrcID#1_Ha#0_Chan#3_DIMM#1 or CPU_SrcID#1_Ha#0_Chan#3_DIMM#2 (channel:3 page:0x288aa0b offset:0x0 grain:32 syndrome:0x0 - OVERFLOW area:DRAM err_code:0008:00c3 socket:1 ha:0 channel_mask:8 rank:255)
May 9 06:54:25 NZWHS01 kernel: EDAC sbridge MC1: HANDLING MCE MEMORY ERROR
May 9 06:54:25 NZWHS01 kernel: EDAC sbridge MC1: CPU 8: Machine Check Event: 0 Bank 11: cc0000cf000800c3
May 9 06:54:25 NZWHS01 kernel: EDAC sbridge MC1: TSC 0
May 9 06:54:25 NZWHS01 kernel: EDAC sbridge MC1: ADDR 288aa0b000
May 9 06:54:25 NZWHS01 kernel: EDAC sbridge MC1: MISC c900199356db5e8c
May 9 06:54:25 NZWHS01 kernel: EDAC sbridge MC1: PROCESSOR 0:206d7 TIME 1652093665 SOCKET 1 APIC 20
May 9 06:54:25 NZWHS01 kernel: EDAC MC1: 3 CE memory scrubbing error on CPU_SrcID#1_Ha#0_Chan#3_DIMM#0 or CPU_SrcID#1_Ha#0_Chan#3_DIMM#1 or CPU_SrcID#1_Ha#0_Chan#3_DIMM#2 (channel:3 page:0x288aa0b offset:0x0 grain:32 syndrome:0x0 - OVERFLOW area:DRAM err_code:0008:00c3 socket:1 ha:0 channel_mask:8 rank:255)
May 9 06:54:38 NZWHS01 root: Total Spundown: 0
### [PREVIOUS LINE REPEATED 1 TIMES] ###
May 9 06:59:53 NZWHS01 kernel: mce_notify_irq: 11 callbacks suppressed
May 9 06:59:53 NZWHS01 kernel: mce: [Hardware Error]: Machine check events logged
### [PREVIOUS LINE REPEATED 1 TIMES] ###
May 9 07:04:40 NZWHS01 root: Total Spundown: 0
May 9 07:05:20 NZWHS01 kernel: mce_notify_irq: 12 callbacks suppressed
May 9 07:05:20 NZWHS01 kernel: mce: [Hardware Error]: Machine check events logged

You should fix your RAM issues before doing any rebuild. Your motherboard might pinpoint which DIMM(s) are causing problems; you could also run a memtest (from https://www.memtest86.com/, since you have ECC memory).
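For anyone who wants to see how many corrected errors the kernel is reporting per memory channel, here is a minimal Python sketch. It assumes the leading number in each "EDAC MCn: N CE ..." line is the error count for that report, and that the syslog has already been read into a list of lines; the function name is made up for illustration.

```python
import re
from collections import Counter

# Matches the kernel's corrected-error (CE) summary lines, for example:
# "EDAC MC1: 2 CE memory scrubbing error on <labels> (channel:3 page:..."
# The "EDAC sbridge MC1: ..." detail lines are intentionally not matched.
EDAC_CE = re.compile(r"EDAC MC(\d+): (\d+) CE (.+?) on .+? \(channel:(\d+) ")


def tally_corrected_errors(syslog_lines):
    """Sum the reported CE counts per (memory controller, channel)."""
    totals = Counter()
    for line in syslog_lines:
        match = EDAC_CE.search(line)
        if match:
            mc, count, _kind, channel = match.groups()
            totals[(int(mc), int(channel))] += int(count)
    return totals
```

A tally that keeps landing on the same controller and channel narrows the search to the DIMM slots on that channel, although it cannot say which of the listed DIMMs is responsible.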
ChatNoir Posted May 9, 2022

Is it RAM or the CPU?

May 8 17:32:13 NZWHS01 kernel: mce: [Hardware Error]: Machine check events logged
May 8 17:32:13 NZWHS01 kernel: mce: [Hardware Error]: CPU 8: Machine Check: 0 Bank 11: c8009acf00800093
May 8 17:32:13 NZWHS01 kernel: mce: [Hardware Error]: TSC 0 MISC d12040438340de00
May 8 17:32:13 NZWHS01 kernel: mce: [Hardware Error]: PROCESSOR 0:206d7 TIME 1652045491 SOCKET 1 APIC 20 microcode 71a
May 8 17:32:13 NZWHS01 kernel: #9
May 8 17:32:13 NZWHS01 kernel: mce: [Hardware Error]: Machine check events logged
May 8 17:32:13 NZWHS01 kernel: mce: [Hardware Error]: CPU 9: Machine Check: 0 Bank 11: c8002a4f00800093
May 8 17:32:13 NZWHS01 kernel: mce: [Hardware Error]: TSC 0 MISC d120404240019e00
May 8 17:32:13 NZWHS01 kernel: mce: [Hardware Error]: PROCESSOR 0:206d7 TIME 1652045491 SOCKET 1 APIC 22 microcode 71a
May 8 17:32:13 NZWHS01 kernel: #10
May 8 17:32:13 NZWHS01 kernel: mce: [Hardware Error]: CPU 10: Machine Check: 0 Bank 11: c8001b0f00800093
May 8 17:32:13 NZWHS01 kernel: mce: [Hardware Error]: TSC 0 MISC d120018101419e00
May 8 17:32:13 NZWHS01 kernel: mce: [Hardware Error]: PROCESSOR 0:206d7 TIME 1652045491 SOCKET 1 APIC 24 microcode 71a
May 8 17:32:13 NZWHS01 kernel: #11
May 8 17:32:13 NZWHS01 kernel: mce: [Hardware Error]: CPU 11: Machine Check: 0 Bank 11: c80018cf00800093
May 8 17:32:13 NZWHS01 kernel: mce: [Hardware Error]: TSC 0 MISC c908414303039e00
May 8 17:32:13 NZWHS01 kernel: mce: [Hardware Error]: PROCESSOR 0:206d7 TIME 1652045491 SOCKET 1 APIC 26 microcode 71a
May 8 17:32:13 NZWHS01 kernel: #12
May 8 17:32:13 NZWHS01 kernel: mce: [Hardware Error]: CPU 12: Machine Check: 0 Bank 11: c8001c8f00800093
May 8 17:32:13 NZWHS01 kernel: mce: [Hardware Error]: TSC 0 MISC ca09414059f51e00
May 8 17:32:13 NZWHS01 kernel: mce: [Hardware Error]: PROCESSOR 0:206d7 TIME 1652045491 SOCKET 1 APIC 28 microcode 71a
May 8 17:32:13 NZWHS01 kernel: #13
May 8 17:32:13 NZWHS01 kernel: mce: [Hardware Error]: CPU 13: Machine Check: 0 Bank 11: c800224f00800093
May 8 17:32:13 NZWHS01 kernel: mce: [Hardware Error]: TSC 0 MISC d2295e6997215e00
May 8 17:32:13 NZWHS01 kernel: mce: [Hardware Error]: PROCESSOR 0:206d7 TIME 1652045491 SOCKET 1 APIC 2a microcode 71a
May 8 17:32:13 NZWHS01 kernel: #14
May 8 17:32:13 NZWHS01 kernel: mce: [Hardware Error]: CPU 14: Machine Check: 0 Bank 11: c8001d8f00800093
May 8 17:32:13 NZWHS01 kernel: mce: [Hardware Error]: TSC 0 MISC d1286ec40002de00
May 8 17:32:13 NZWHS01 kernel: mce: [Hardware Error]: PROCESSOR 0:206d7 TIME 1652045491 SOCKET 1 APIC 2c microcode 71a
May 8 17:32:13 NZWHS01 kernel: #15
May 8 17:32:13 NZWHS01 kernel: mce: [Hardware Error]: CPU 15: Machine Check: 0 Bank 11: c8001ecf00800093
May 8 17:32:13 NZWHS01 kernel: mce: [Hardware Error]: TSC 0 MISC ca0100c141421e00
May 8 17:32:13 NZWHS01 kernel: mce: [Hardware Error]: PROCESSOR 0:206d7 TIME 1652045491 SOCKET 1 APIC 2e microcode 71a
May 8 17:32:13 NZWHS01 kernel:
May 8 17:32:13 NZWHS01 kernel: .... node #0, CPUs: #16
May 8 17:32:13 NZWHS01 kernel: MDS CPU bug present and SMT on, data leak possible. See https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/mds.html for more details.
May 8 17:32:13 NZWHS01 kernel: #17 #18 #19 #20 #21 #22 #23
May 8 17:32:13 NZWHS01 kernel: .... node #1, CPUs: #24
May 8 17:32:13 NZWHS01 kernel: mce: [Hardware Error]: CPU 24: Machine Check: 0 Bank 11: c800454f00800093
May 8 17:32:13 NZWHS01 kernel: mce: [Hardware Error]: TSC 0 MISC ca01038343c1de00
May 8 17:32:13 NZWHS01 kernel: mce: [Hardware Error]: PROCESSOR 0:206d7 TIME 1652045491 SOCKET 1 APIC 21 microcode 71a
May 8 17:32:13 NZWHS01 kernel: #25
May 8 17:32:13 NZWHS01 kernel: mce: [Hardware Error]: CPU 25: Machine Check: 0 Bank 11: c8001a4f00800093
May 8 17:32:13 NZWHS01 kernel: mce: [Hardware Error]: TSC 0 MISC d12040c081801e00
May 8 17:32:13 NZWHS01 kernel: mce: [Hardware Error]: PROCESSOR 0:206d7 TIME 1652045491 SOCKET 1 APIC 23 microcode 71a
May 8 17:32:13 NZWHS01 kernel: #26
May 8 17:32:13 NZWHS01 kernel: mce: [Hardware Error]: CPU 26: Machine Check: 0 Bank 11: c8001a0f00800093
May 8 17:32:13 NZWHS01 kernel: mce: [Hardware Error]: TSC 0 MISC d120400181801e00
May 8 17:32:13 NZWHS01 kernel: mce: [Hardware Error]: PROCESSOR 0:206d7 TIME 1652045491 SOCKET 1 APIC 25 microcode 71a
May 8 17:32:13 NZWHS01 kernel: #27
May 8 17:32:13 NZWHS01 kernel: mce: [Hardware Error]: CPU 27: Machine Check: 0 Bank 11: c80022cf00800093
May 8 17:32:13 NZWHS01 kernel: mce: [Hardware Error]: TSC 0 MISC ca094242c08b9e00
May 8 17:32:13 NZWHS01 kernel: mce: [Hardware Error]: PROCESSOR 0:206d7 TIME 1652045491 SOCKET 1 APIC 27 microcode 71a
May 8 17:32:13 NZWHS01 kernel: #28
May 8 17:32:13 NZWHS01 kernel: mce: [Hardware Error]: CPU 28: Machine Check: 0 Bank 11: c8001d0f00800093
May 8 17:32:13 NZWHS01 kernel: mce: [Hardware Error]: TSC 0 MISC ca090283c0819e00
May 8 17:32:13 NZWHS01 kernel: mce: [Hardware Error]: PROCESSOR 0:206d7 TIME 1652045491 SOCKET 1 APIC 29 microcode 71a
May 8 17:32:13 NZWHS01 kernel: #29
May 8 17:32:13 NZWHS01 kernel: mce: [Hardware Error]: CPU 29: Machine Check: 0 Bank 11: c8001d0f00800093
May 8 17:32:13 NZWHS01 kernel: mce: [Hardware Error]: TSC 0 MISC d128060580829e00
May 8 17:32:13 NZWHS01 kernel: mce: [Hardware Error]: PROCESSOR 0:206d7 TIME 1652045491 SOCKET 1 APIC 2b microcode 71a
May 8 17:32:13 NZWHS01 kernel: #30
May 8 17:32:13 NZWHS01 kernel: mce: [Hardware Error]: CPU 30: Machine Check: 0 Bank 11: c800178f00800093
May 8 17:32:13 NZWHS01 kernel: mce: [Hardware Error]: TSC 0 MISC d1285b10ce5e5e00
May 8 17:32:13 NZWHS01 kernel: mce: [Hardware Error]: PROCESSOR 0:206d7 TIME 1652045491 SOCKET 1 APIC 2d microcode 71a
May 8 17:32:13 NZWHS01 kernel: #31
May 8 17:32:13 NZWHS01 kernel: mce: [Hardware Error]: CPU 31: Machine Check: 0 Bank 11: c800298f00800093
May 8 17:32:13 NZWHS01 kernel: mce: [Hardware Error]: TSC 0 MISC ca014242a2ff5e00
May 8 17:32:13 NZWHS01 kernel: mce: [Hardware Error]: PROCESSOR 0:206d7 TIME 1652045491 SOCKET 1 APIC 2f microcode 71a
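One way to approach the RAM-versus-CPU question is to decode the 64-bit status words in the log. Below is a minimal Python sketch, assuming the IA32_MCi_STATUS layout described in Intel's Software Developer's Manual (VAL in bit 63, OVER 62, UC 61, MISCV 59, ADDRV 58, PCC 57, and the MCA error code in the low 16 bits, where the pattern 0000 0000 1MMM CCCC marks a memory controller error). The function name and dictionary keys are made up for illustration, and model-specific bits are ignored.

```python
# Memory transaction types (the MMM field), per the assumed Intel encoding.
MEMORY_OPS = {
    0: "generic",
    1: "memory read",
    2: "memory write",
    3: "address/command",
    4: "memory scrubbing",
}


def decode_mci_status(status):
    """Decode the architectural fields of an IA32_MCi_STATUS value."""
    code = status & 0xFFFF  # MCA error code, bits 15:0
    info = {
        "valid": bool((status >> 63) & 1),
        "overflow": bool((status >> 62) & 1),
        "uncorrected": bool((status >> 61) & 1),
        "misc_valid": bool((status >> 59) & 1),
        "addr_valid": bool((status >> 58) & 1),
        "context_corrupt": bool((status >> 57) & 1),
        "mca_code": code,
    }
    # Memory controller errors use the form 0000 0000 1MMM CCCC.
    if (code & 0xFF80) == 0x0080:
        info["memory_op"] = MEMORY_OPS.get((code >> 4) & 0x7, "reserved")
        channel = code & 0xF
        # 1111 means the channel is not specified.
        info["channel"] = None if channel == 0xF else channel
    return info
```

Under those assumptions, `decode_mci_status(0xc8009acf00800093)` reports a valid, corrected (UC clear) memory read error on channel 3. That points at the memory path behind socket 1; it does not by itself say whether a DIMM, the slot, or the CPU's memory controller is at fault.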
bullmoose20 Posted May 9, 2022 Author

"ID","Severity","Class","Last Update","Initial Update","Count","Description",
"447","Critical","CPU","05/08/2022 21:29","05/08/2022 21:29","1","Uncorrectable Machine Check Exception (Board 0, Processor 1, APIC ID 0x00000000, Bank 0x00000004, Status 0xB2000000'72000402, Address 0x00000000'00000000, Misc 0x00000000'00000000)",
"446","Critical","POST Message","05/08/2022 18:13","05/08/2022 18:13","2","POST Error: 207-Memory initialization error on Processor 2 Socket 6. The operating system may not have access to all of the memory installed in the system.",
"445","Critical","POST Message","05/08/2022 18:13","05/08/2022 18:13","2","POST Error: 207-Memory initialization error on Processor 2 Socket 5. The operating system may not have access to all of the memory installed in the system.",
"444","Critical","POST Message","05/08/2022 14:37","05/08/2022 14:37","2","POST Error: 207-Memory initialization error on Processor 2 Socket 5. The operating system may not have access to all of the memory installed in the system.",
"443","Critical","POST Message","05/08/2022 14:37","05/08/2022 14:37","2","POST Error: 207-Memory initialization error on Processor 2 Socket 4. The operating system may not have access to all of the memory installed in the system.",
"442","Caution","Main Memory","04/30/2022 14:57","04/30/2022 14:57","1","Corrected Memory Error threshold exceeded ((Processor 2, Memory Module 5))",
"441","Caution","Main Memory","04/30/2022 14:57","04/30/2022 14:57","1","Corrected Memory Error threshold exceeded ((Processor 2, Memory Module 4))",
"440","Critical","POST Message","04/29/2022 16:09","04/29/2022 16:09","2","POST Error: 207-Memory initialization error on Processor 2 Socket 4. The operating system may not have access to all of the memory installed in the system.",
"439","Informational","POST Message","05/08/2022 21:29","04/29/2022 12:53","6","POST Information: Processor 2, DIMM 12 could not be authenticated as genuine HP SmartMemory. Enhanced and extended HP SmartMemory features will not be active.",
"438","Informational","POST Message","05/08/2022 21:29","04/29/2022 12:53","6","POST Information: Processor 2, DIMM 11 could not be authenticated as genuine HP SmartMemory. Enhanced and extended HP SmartMemory features will not be active.",
"437","Informational","POST Message","05/08/2022 21:29","04/29/2022 12:53","6","POST Information: Processor 2, DIMM 10 could not be authenticated as genuine HP SmartMemory. Enhanced and extended HP SmartMemory features will not be active.",
"436","Informational","POST Message","05/08/2022 21:29","04/29/2022 12:53","6","POST Information: Processor 2, DIMM 9 could not be authenticated as genuine HP SmartMemory. Enhanced and extended HP SmartMemory features will not be active.",
"435","Informational","POST Message","05/08/2022 21:29","04/29/2022 12:53","6","POST Information: Processor 2, DIMM 8 could not be authenticated as genuine HP SmartMemory. Enhanced and extended HP SmartMemory features will not be active.",
"434","Informational","POST Message","05/08/2022 21:29","04/29/2022 12:53","6","POST Information: Processor 2, DIMM 7 could not be authenticated as genuine HP SmartMemory. Enhanced and extended HP SmartMemory features will not be active.",
"433","Informational","POST Message","05/08/2022 21:29","04/29/2022 12:53","6","POST Information: Processor 2, DIMM 6 could not be authenticated as genuine HP SmartMemory. Enhanced and extended HP SmartMemory features will not be active.",
"432","Informational","POST Message","05/08/2022 21:29","04/29/2022 12:53","6","POST Information: Processor 2, DIMM 5 could not be authenticated as genuine HP SmartMemory. Enhanced and extended HP SmartMemory features will not be active.",
"431","Informational","POST Message","05/08/2022 21:29","04/29/2022 12:53","6","POST Information: Processor 2, DIMM 4 could not be authenticated as genuine HP SmartMemory. Enhanced and extended HP SmartMemory features will not be active.",
"430","Informational","POST Message","05/08/2022 21:29","04/29/2022 12:53","6","POST Information: Processor 2, DIMM 3 could not be authenticated as genuine HP SmartMemory. Enhanced and extended HP SmartMemory features will not be active.",
"429","Informational","POST Message","05/08/2022 21:29","04/29/2022 12:53","6","POST Information: Processor 2, DIMM 2 could not be authenticated as genuine HP SmartMemory. Enhanced and extended HP SmartMemory features will not be active.",
"428","Informational","POST Message","05/08/2022 21:29","04/29/2022 12:53","6","POST Information: Processor 2, DIMM 1 could not be authenticated as genuine HP SmartMemory. Enhanced and extended HP SmartMemory features will not be active.",
"427","Informational","POST Message","05/08/2022 21:29","04/29/2022 12:53","6","POST Information: Processor 1, DIMM 12 could not be authenticated as genuine HP SmartMemory. Enhanced and extended HP SmartMemory features will not be active.",
"426","Informational","POST Message","05/08/2022 21:29","04/29/2022 12:53","6","POST Information: Processor 1, DIMM 11 could not be authenticated as genuine HP SmartMemory. Enhanced and extended HP SmartMemory features will not be active.",
"425","Informational","POST Message","05/08/2022 21:29","04/29/2022 12:53","6","POST Information: Processor 1, DIMM 10 could not be authenticated as genuine HP SmartMemory. Enhanced and extended HP SmartMemory features will not be active.",
"424","Informational","POST Message","05/08/2022 21:29","04/29/2022 12:53","6","POST Information: Processor 1, DIMM 9 could not be authenticated as genuine HP SmartMemory. Enhanced and extended HP SmartMemory features will not be active.",
"423","Informational","POST Message","05/08/2022 21:29","04/29/2022 12:53","6","POST Information: Processor 1, DIMM 8 could not be authenticated as genuine HP SmartMemory. Enhanced and extended HP SmartMemory features will not be active.",
"422","Informational","POST Message","05/08/2022 21:29","04/29/2022 12:53","6","POST Information: Processor 1, DIMM 7 could not be authenticated as genuine HP SmartMemory. Enhanced and extended HP SmartMemory features will not be active.",
"421","Informational","POST Message","05/08/2022 21:29","04/29/2022 12:53","6","POST Information: Processor 1, DIMM 6 could not be authenticated as genuine HP SmartMemory. Enhanced and extended HP SmartMemory features will not be active.",
"420","Informational","POST Message","05/08/2022 21:29","04/29/2022 12:53","6","POST Information: Processor 1, DIMM 5 could not be authenticated as genuine HP SmartMemory. Enhanced and extended HP SmartMemory features will not be active.",
"419","Informational","POST Message","05/08/2022 21:29","04/29/2022 12:53","6","POST Information: Processor 1, DIMM 4 could not be authenticated as genuine HP SmartMemory. Enhanced and extended HP SmartMemory features will not be active.",
"418","Informational","POST Message","05/08/2022 21:29","04/29/2022 12:53","6","POST Information: Processor 1, DIMM 3 could not be authenticated as genuine HP SmartMemory. Enhanced and extended HP SmartMemory features will not be active.",
"417","Informational","POST Message","05/08/2022 21:29","04/29/2022 12:53","6","POST Information: Processor 1, DIMM 2 could not be authenticated as genuine HP SmartMemory. Enhanced and extended HP SmartMemory features will not be active.",
"416","Informational","POST Message","05/08/2022 21:29","04/29/2022 12:53","6","POST Information: Processor 1, DIMM 1 could not be authenticated as genuine HP SmartMemory. Enhanced and extended HP SmartMemory features will not be active.",
"415","Critical","POST Message","04/29/2022 12:53","04/29/2022 12:53","4","POST Error: 207-Memory initialization error on Processor 2 Socket 4. The operating system may not have access to all of the memory installed in the system.",
"414","Caution","Power","04/29/2022 08:35","04/29/2022 08:35","1","System Power Supplies Not Redundant",
"413","Caution","Power","04/29/2022 08:35","04/29/2022 08:35","1","System Power Supply: Input Power Loss or Unplugged Power Cord, Verify Power Supply Input (Power Supply 2)",
"412","Informational","Maintenance","04/25/2022 21:52","04/25/2022 21:52","1","IML Cleared (iLO 4 user:Administrator)",

This is what my iLO 4 board is telling me... seems like a possible CPU failure... but man... this does not sound right... Processor 2 also seems to be the one that is always having DIMM problems... So maybe the DIMMs are fine, but the CPU is causing the issue... And the trouble is that you generally need to use these as a pair, so I would need to find another pair of CPUs if one of these is faulty.
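The impression that Processor 2 shows up in most of the memory entries can be checked by tallying the exported log. This is a minimal Python sketch, assuming the IML was exported as CSV with the column order shown in the post (severity in the second column, description in the seventh) and that affected processors appear in the description as "Processor N". The function name is hypothetical.

```python
import csv
import io
import re
from collections import Counter


def tally_by_processor(iml_csv):
    """Count Critical and Caution IML entries per (severity, processor)."""
    counts = Counter()
    for row in csv.reader(io.StringIO(iml_csv)):
        # Skip blank lines, short rows, and the header row.
        if len(row) < 7 or row[0] == "ID":
            continue
        severity, description = row[1], row[6]
        if severity not in ("Critical", "Caution"):
            continue
        match = re.search(r"Processor (\d+)", description)
        if match:
            counts[(severity, int(match.group(1)))] += 1
    return counts
```

Each row is counted once regardless of its "Count" column, and entries that do not name a processor (such as the power supply cautions) are left out, so the result is only a rough indicator of where the errors cluster.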
bullmoose20 Posted May 9, 2022 Author

So I gather my best course of action is to:

- pause the Parity-Sync/Data-Rebuild
- shut down the server and reseat the CPU and RAM
- run memtest
- see what happens
ChatNoir Posted May 9, 2022

1 hour ago, bullmoose20 said:
> So I gather, my best course of action is to??? pause the Parity-Sync/Data-Rebuild shutdown server and reseat CPU and RAM run memtest See what happens

Sounds good.