February 28, 2017
Looking through my logs and nerd tools, it seems I have a failing SSD cache disk and a bad stick of RAM at slot 11. I am not very familiar with the hardware side of Linux, so I was hoping someone with more experience could take a look. I plan on running a memtest at the next downtime; is there anything else I should look at? This is a Dell R320 with 48 GB of ECC DDR3.
Quote
root@Storage:~# mcelog
mcelog: Warning: MCE buffer is overflowed.
Hardware event. This is not a software error. MCE 0 CPU 0 BANK 11 MISC 90000010000568c ADDR a0b0d4000 TIME 1487315830 Thu Feb 16 23:17:10 2017 MCG status: MCi status: Error overflow Corrected error MCi_MISC register valid MCi_ADDR register valid MCA: MEMORY CONTROLLER MS_CHANNEL2_ERR Transaction: Memory scrubbing error MemCtrl: Corrected patrol scrub error STATUS cc00008b000800c2 MCGSTATUS 0 MCGCAP 1000c15 APICID 0 SOCKETID 0 CPUID Vendor Intel Family 6 Model 62 Hardware event. This is not a software error. MCE 1 CPU 0 BANK 11 MISC 90000004000568c ADDR b8b0d1000 TIME 1487315866 Thu Feb 16 23:17:46 2017 MCG status: MCi status: Corrected error MCi_MISC register valid MCi_ADDR register valid MCA: MEMORY CONTROLLER MS_CHANNEL2_ERR Transaction: Memory scrubbing error MemCtrl: Corrected patrol scrub error STATUS 8c00004b000800c2 MCGSTATUS 0 MCGCAP 1000c15 APICID 0 SOCKETID 0 CPUID Vendor Intel Family 6 Model 62 Hardware event. This is not a software error. MCE 2 CPU 0 BANK 11 MISC 90000008000968c ADDR b8b0d5000 TIME 1487315866 Thu Feb 16 23:17:46 2017 MCG status: MCi status: Corrected error MCi_MISC register valid MCi_ADDR register valid MCA: MEMORY CONTROLLER MS_CHANNEL2_ERR Transaction: Memory scrubbing error MemCtrl: Corrected patrol scrub error STATUS 8c00004b000800c2 MCGSTATUS 0 MCGCAP 1000c15 APICID 0 SOCKETID 0 CPUID Vendor Intel Family 6 Model 62 Hardware event. This is not a software error. 
MCE 3 CPU 0 BANK 11 MISC 90000004000568c ADDR a0b0d3000 TIME 1487439109 Sat Feb 18 09:31:49 2017 MCG status: MCi status: Corrected error MCi_MISC register valid MCi_ADDR register valid MCA: MEMORY CONTROLLER MS_CHANNEL2_ERR Transaction: Memory scrubbing error MemCtrl: Corrected patrol scrub error STATUS 8c00004b000800c2 MCGSTATUS 0 MCGCAP 1000c15 APICID 0 SOCKETID 0 CPUID Vendor Intel Family 6 Model 62 Hardware event. This is not a software error. MCE 4 CPU 0 BANK 11 MISC 90000004000568c ADDR b8b0d1000 TIME 1487439137 Sat Feb 18 09:32:17 2017 MCG status: MCi status: Corrected error MCi_MISC register valid MCi_ADDR register valid MCA: MEMORY CONTROLLER MS_CHANNEL2_ERR Transaction: Memory scrubbing error MemCtrl: Corrected patrol scrub error STATUS 8c00004b000800c2 MCGSTATUS 0 MCGCAP 1000c15 APICID 0 SOCKETID 0 CPUID Vendor Intel Family 6 Model 62 Hardware event. This is not a software error. MCE 5 CPU 0 BANK 11 MISC 90000010000568c ADDR a0b0d4000 TIME 1487441733 Sat Feb 18 10:15:33 2017 MCG status: MCi status: Error overflow Corrected error MCi_MISC register valid MCi_ADDR register valid MCA: MEMORY CONTROLLER MS_CHANNEL2_ERR Transaction: Memory scrubbing error MemCtrl: Corrected patrol scrub error STATUS cc0000cb000800c2 MCGSTATUS 0 MCGCAP 1000c15 APICID 0 SOCKETID 0 CPUID Vendor Intel Family 6 Model 62 Hardware event. This is not a software error. MCE 6 CPU 0 BANK 11 MISC 90000010000568c ADDR b8b0d3000 TIME 1487441762 Sat Feb 18 10:16:02 2017 MCG status: MCi status: Error overflow Corrected error MCi_MISC register valid MCi_ADDR register valid MCA: MEMORY CONTROLLER MS_CHANNEL2_ERR Transaction: Memory scrubbing error MemCtrl: Corrected patrol scrub error STATUS cc00008b000800c2 MCGSTATUS 0 MCGCAP 1000c15 APICID 0 SOCKETID 0 CPUID Vendor Intel Family 6 Model 62 Hardware event. This is not a software error. 
MCE 7 CPU 0 BANK 11 MISC 90000008000968c ADDR b8b0d5000 TIME 1487441762 Sat Feb 18 10:16:02 2017 MCG status: MCi status: Error overflow Corrected error MCi_MISC register valid MCi_ADDR register valid MCA: MEMORY CONTROLLER MS_CHANNEL2_ERR Transaction: Memory scrubbing error MemCtrl: Corrected patrol scrub error STATUS cc00008b000800c2 MCGSTATUS 0 MCGCAP 1000c15 APICID 0 SOCKETID 0 CPUID Vendor Intel Family 6 Model 62 Hardware event. This is not a software error. MCE 8 CPU 0 BANK 11 MISC 90000004000568c ADDR a0b0d3000 TIME 1487593857 Mon Feb 20 04:30:57 2017 MCG status: MCi status: Corrected error MCi_MISC register valid MCi_ADDR register valid MCA: MEMORY CONTROLLER MS_CHANNEL2_ERR Transaction: Memory scrubbing error MemCtrl: Corrected patrol scrub error STATUS 8c00004b000800c2 MCGSTATUS 0 MCGCAP 1000c15 APICID 0 SOCKETID 0 CPUID Vendor Intel Family 6 Model 62 Hardware event. This is not a software error. MCE 9 CPU 0 BANK 11 MISC 90000004000568c ADDR b8b0d1000 TIME 1487593885 Mon Feb 20 04:31:25 2017 MCG status: MCi status: Corrected error MCi_MISC register valid MCi_ADDR register valid MCA: MEMORY CONTROLLER MS_CHANNEL2_ERR Transaction: Memory scrubbing error MemCtrl: Corrected patrol scrub error STATUS 8c00004b000800c2 MCGSTATUS 0 MCGCAP 1000c15 APICID 0 SOCKETID 0 CPUID Vendor Intel Family 6 Model 62 Hardware event. This is not a software error. MCE 10 CPU 0 BANK 11 MISC 90000004000968c ADDR a0b0d3000 TIME 1487622441 Mon Feb 20 12:27:21 2017 MCG status: MCi status: Error overflow Corrected error MCi_MISC register valid MCi_ADDR register valid MCA: MEMORY CONTROLLER MS_CHANNEL2_ERR Transaction: Memory scrubbing error MemCtrl: Corrected patrol scrub error STATUS cc00008b000800c2 MCGSTATUS 0 MCGCAP 1000c15 APICID 0 SOCKETID 0 CPUID Vendor Intel Family 6 Model 62 Hardware event. This is not a software error. 
MCE 11 CPU 0 BANK 11 MISC 90000020000968c ADDR a0b0d2000 TIME 1487625500 Mon Feb 20 13:18:20 2017 MCG status: MCi status: Error overflow Corrected error MCi_MISC register valid MCi_ADDR register valid MCA: MEMORY CONTROLLER MS_CHANNEL2_ERR Transaction: Memory scrubbing error MemCtrl: Corrected patrol scrub error STATUS cc00024b000800c2 MCGSTATUS 0 MCGCAP 1000c15 APICID 0 SOCKETID 0 CPUID Vendor Intel Family 6 Model 62 Hardware event. This is not a software error. MCE 12 CPU 0 BANK 11 MISC 90000020002168c ADDR a0b0d5000 TIME 1487625500 Mon Feb 20 13:18:20 2017 MCG status: MCi status: Error overflow Corrected error MCi_MISC register valid MCi_ADDR register valid MCA: MEMORY CONTROLLER MS_CHANNEL2_ERR Transaction: Memory scrubbing error MemCtrl: Corrected patrol scrub error STATUS cc00014b000800c2 MCGSTATUS 0 MCGCAP 1000c15 APICID 0 SOCKETID 0 CPUID Vendor Intel Family 6 Model 62 Hardware event. This is not a software error. MCE 13 CPU 0 BANK 11 MISC 90000008002168c ADDR b8b0d5000 TIME 1487625522 Mon Feb 20 13:18:42 2017 MCG status: MCi status: Error overflow Corrected error MCi_MISC register valid MCi_ADDR register valid MCA: MEMORY CONTROLLER MS_CHANNEL2_ERR Transaction: Memory scrubbing error MemCtrl: Corrected patrol scrub error STATUS cc0003cb000800c2 MCGSTATUS 0 MCGCAP 1000c15 APICID 0 SOCKETID 0 CPUID Vendor Intel Family 6 Model 62 Hardware event. This is not a software error. MCE 14 CPU 0 BANK 11 MISC 90000004000568c ADDR a0b0d3000 TIME 1487628627 Mon Feb 20 14:10:27 2017 MCG status: MCi status: Corrected error MCi_MISC register valid MCi_ADDR register valid MCA: MEMORY CONTROLLER MS_CHANNEL2_ERR Transaction: Memory scrubbing error MemCtrl: Corrected patrol scrub error STATUS 8c00004b000800c2 MCGSTATUS 0 MCGCAP 1000c15 APICID 0 SOCKETID 0 CPUID Vendor Intel Family 6 Model 62 Hardware event. This is not a software error. 
MCE 15 CPU 0 BANK 11 MISC 90000004000568c ADDR b8b0d1000 TIME 1487638437 Mon Feb 20 16:53:57 2017 MCG status: MCi status: Corrected error MCi_MISC register valid MCi_ADDR register valid MCA: MEMORY CONTROLLER MS_CHANNEL2_ERR Transaction: Memory scrubbing error MemCtrl: Corrected patrol scrub error STATUS 8c00004b000800c2 MCGSTATUS 0 MCGCAP 1000c15 APICID 0 SOCKETID 0 CPUID Vendor Intel Family 6 Model 62 Hardware event. This is not a software error. MCE 16 CPU 0 BANK 11 MISC 90000004000568c ADDR b8b0d4000 TIME 1487638437 Mon Feb 20 16:53:57 2017 MCG status: MCi status: Corrected error MCi_MISC register valid MCi_ADDR register valid MCA: MEMORY CONTROLLER MS_CHANNEL2_ERR Transaction: Memory scrubbing error MemCtrl: Corrected patrol scrub error STATUS 8c00004b000800c2 MCGSTATUS 0 MCGCAP 1000c15 APICID 0 SOCKETID 0 CPUID Vendor Intel Family 6 Model 62 Hardware event. This is not a software error. MCE 17 CPU 0 BANK 11 MISC 90000004000568c ADDR b8b0d1000 TIME 1487641376 Mon Feb 20 17:42:56 2017 MCG status: MCi status: Corrected error MCi_MISC register valid MCi_ADDR register valid MCA: MEMORY CONTROLLER MS_CHANNEL2_ERR Transaction: Memory scrubbing error MemCtrl: Corrected patrol scrub error STATUS 8c00004b000800c2 MCGSTATUS 0 MCGCAP 1000c15 APICID 0 SOCKETID 0 CPUID Vendor Intel Family 6 Model 62 Hardware event. This is not a software error. MCE 18 CPU 0 BANK 11 MISC 90000004000568c ADDR a0b0d0000 TIME 1487644716 Mon Feb 20 18:38:36 2017 MCG status: MCi status: Error overflow Corrected error MCi_MISC register valid MCi_ADDR register valid MCA: MEMORY CONTROLLER MS_CHANNEL2_ERR Transaction: Memory scrubbing error MemCtrl: Corrected patrol scrub error STATUS cc00008b000800c2 MCGSTATUS 0 MCGCAP 1000c15 APICID 0 SOCKETID 0 CPUID Vendor Intel Family 6 Model 62 Hardware event. This is not a software error. 
MCE 19 CPU 0 BANK 11 MISC 90000010001168c ADDR a0b0d4000 TIME 1487644716 Mon Feb 20 18:38:36 2017 MCG status: MCi status: Corrected error MCi_MISC register valid MCi_ADDR register valid MCA: MEMORY CONTROLLER MS_CHANNEL2_ERR Transaction: Memory scrubbing error MemCtrl: Corrected patrol scrub error STATUS 8c00004b000800c2 MCGSTATUS 0 MCGCAP 1000c15 APICID 0 SOCKETID 0 CPUID Vendor Intel Family 6 Model 62 Hardware event. This is not a software error. MCE 20 CPU 0 BANK 11 MISC 90000004000568c ADDR b8b0d1000 TIME 1487644756 Mon Feb 20 18:39:16 2017 MCG status: MCi status: Corrected error MCi_MISC register valid MCi_ADDR register valid MCA: MEMORY CONTROLLER MS_CHANNEL2_ERR Transaction: Memory scrubbing error MemCtrl: Corrected patrol scrub error STATUS 8c00004b000800c2 MCGSTATUS 0 MCGCAP 1000c15 APICID 0 SOCKETID 0 CPUID Vendor Intel Family 6 Model 62 Hardware event. This is not a software error. MCE 21 CPU 0 BANK 11 MISC 90000004000568c ADDR b8b0d4000 TIME 1487644756 Mon Feb 20 18:39:16 2017 MCG status: MCi status: Corrected error MCi_MISC register valid MCi_ADDR register valid MCA: MEMORY CONTROLLER MS_CHANNEL2_ERR Transaction: Memory scrubbing error MemCtrl: Corrected patrol scrub error STATUS 8c00004b000800c2 MCGSTATUS 0 MCGCAP 1000c15 APICID 0 SOCKETID 0 CPUID Vendor Intel Family 6 Model 62 Hardware event. This is not a software error. MCE 22 CPU 0 BANK 7 MISC 2140242400 ADDR a0b0d3a40 TIME 1487678903 Tue Feb 21 04:08:23 2017 MCG status: MCi status: Corrected error MCi_MISC register valid MCi_ADDR register valid MCA: MEMORY CONTROLLER RD_CHANNEL2_ERR Transaction: Memory read error STATUS 8c00004000010092 MCGSTATUS 0 MCGCAP 1000c15 APICID 0 SOCKETID 0 CPUID Vendor Intel Family 6 Model 62 Hardware event. This is not a software error. 
MCE 23 CPU 0 BANK 11 MISC 490000004000568c TIME 1487678903 Tue Feb 21 04:08:23 2017 MCG status: MCi status: Corrected error MCi_MISC register valid MCA: MEMORY CONTROLLER RD_CHANNEL2_ERR Transaction: Memory read error MemCtrl: Corrected memory read error STATUS 8800004b00800092 MCGSTATUS 0 MCGCAP 1000c15 APICID 0 SOCKETID 0 CPUID Vendor Intel Family 6 Model 62 Hardware event. This is not a software error. MCE 24 CPU 0 BANK 11 MISC 90000004000568c ADDR a0b0d3000 TIME 1487876971 Thu Feb 23 11:09:31 2017 MCG status: MCi status: Error overflow Corrected error MCi_MISC register valid MCi_ADDR register valid MCA: MEMORY CONTROLLER MS_CHANNEL2_ERR Transaction: Memory scrubbing error MemCtrl: Corrected patrol scrub error STATUS cc000b0b000800c2 MCGSTATUS 0 MCGCAP 1000c15 APICID 0 SOCKETID 0 CPUID Vendor Intel Family 6 Model 62 Hardware event. This is not a software error. MCE 25 CPU 0 BANK 11 MISC 90000028002168c TIME 1487876971 Thu Feb 23 11:09:31 2017 MCG status: MCi status: Error overflow Corrected error MCi_MISC register valid MCA: MEMORY CONTROLLER MS_CHANNEL2_ERR Transaction: Memory scrubbing error MemCtrl: Corrected patrol scrub error STATUS c800010b000800c2 MCGSTATUS 0 MCGCAP 1000c15 APICID 0 SOCKETID 0 CPUID Vendor Intel Family 6 Model 62 Hardware event. This is not a software error. MCE 26 CPU 0 BANK 11 MISC 90000020002168c ADDR a0b0d4000 TIME 1487876971 Thu Feb 23 11:09:31 2017 MCG status: MCi status: Corrected error MCi_MISC register valid MCi_ADDR register valid MCA: MEMORY CONTROLLER MS_CHANNEL2_ERR Transaction: Memory scrubbing error MemCtrl: Corrected patrol scrub error STATUS 8c00004b000800c2 MCGSTATUS 0 MCGCAP 1000c15 APICID 0 SOCKETID 0 CPUID Vendor Intel Family 6 Model 62 Hardware event. This is not a software error. 
MCE 27 CPU 0 BANK 11 MISC 90000008002168c ADDR a0b0d4000 TIME 1487876971 Thu Feb 23 11:09:31 2017 MCG status: MCi status: Error overflow Corrected error MCi_MISC register valid MCi_ADDR register valid MCA: MEMORY CONTROLLER MS_CHANNEL2_ERR Transaction: Memory scrubbing error MemCtrl: Corrected patrol scrub error STATUS cc00008b000800c2 MCGSTATUS 0 MCGCAP 1000c15 APICID 0 SOCKETID 0 CPUID Vendor Intel Family 6 Model 62 Hardware event. This is not a software error. MCE 28 CPU 0 BANK 11 MISC 90000008000568c ADDR a0b0d4000 TIME 1487876971 Thu Feb 23 11:09:31 2017 MCG status: MCi status: Error overflow Corrected error MCi_MISC register valid MCi_ADDR register valid MCA: MEMORY CONTROLLER MS_CHANNEL2_ERR Transaction: Memory scrubbing error MemCtrl: Corrected patrol scrub error STATUS cc0000cb000800c2 MCGSTATUS 0 MCGCAP 1000c15 APICID 0 SOCKETID 0 CPUID Vendor Intel Family 6 Model 62 Hardware event. This is not a software error. MCE 29 CPU 0 BANK 11 MISC 90000004000568c TIME 1487876971 Thu Feb 23 11:09:31 2017 MCG status: MCi status: Error overflow Corrected error MCi_MISC register valid MCA: MEMORY CONTROLLER MS_CHANNEL2_ERR Transaction: Memory scrubbing error MemCtrl: Corrected patrol scrub error STATUS c80000cb000800c2 MCGSTATUS 0 MCGCAP 1000c15 APICID 0 SOCKETID 0 CPUID Vendor Intel Family 6 Model 62 Hardware event. This is not a software error. MCE 30 CPU 0 BANK 11 MISC 90000010000968c ADDR a0b0d5000 TIME 1487876971 Thu Feb 23 11:09:31 2017 MCG status: MCi status: Error overflow Corrected error MCi_MISC register valid MCi_ADDR register valid MCA: MEMORY CONTROLLER MS_CHANNEL2_ERR Transaction: Memory scrubbing error MemCtrl: Corrected patrol scrub error STATUS cc00008b000800c2 MCGSTATUS 0 MCGCAP 1000c15 APICID 0 SOCKETID 0 CPUID Vendor Intel Family 6 Model 62 Hardware event. This is not a software error. 
MCE 31 CPU 0 BANK 11 MISC 90000020002168c ADDR a0b0d5000 TIME 1487876971 Thu Feb 23 11:09:31 2017 MCG status: MCi status: Corrected error MCi_MISC register valid MCi_ADDR register valid MCA: MEMORY CONTROLLER MS_CHANNEL2_ERR Transaction: Memory scrubbing error MemCtrl: Corrected patrol scrub error STATUS 8c00004b000800c2 MCGSTATUS 0 MCGCAP 1000c15 APICID 0 SOCKETID 0 CPUID Vendor Intel Family 6 Model 62
mcelog: warning: 16 bytes ignored in each record
mcelog: consider an update
storage-diagnostics-20170228-0657.zip
Edited February 28, 2017 by arghhh40k: Dell R320, not R420
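For anyone reading the raw `STATUS` values in the log, here is a minimal Python sketch of how the flags that mcelog prints ("Error overflow", "Corrected error", "MCi_ADDR register valid") map onto bits of the machine-check status register. It assumes the Intel `IA32_MCi_STATUS` layout (VAL at bit 63, OVER at 62, UC at 61, MISCV at 59, ADDRV at 58, corrected error count in bits 52:38) and the memory-controller error code form `0000 0000 1MMM CCCC` in the low 16 bits. The function name `decode_status` is made up for illustration and is not part of mcelog. Note that `BANK` in this output is the machine-check register bank, which is not necessarily a DIMM slot number.

```python
def decode_status(status):
    """Decode selected fields of an IA32_MCi_STATUS value (assumed Intel layout)."""
    fields = {
        "valid": bool((status >> 63) & 1),        # VAL: register holds an error
        "overflow": bool((status >> 62) & 1),     # OVER: an earlier error was overwritten
        "uncorrected": bool((status >> 61) & 1),  # UC: clear means the error was corrected
        "misc_valid": bool((status >> 59) & 1),   # MISCV: MISC register is meaningful
        "addr_valid": bool((status >> 58) & 1),   # ADDRV: ADDR register is meaningful
        # Corrected error count; only meaningful when MCG_CAP bit 10 is set.
        "corrected_count": (status >> 38) & 0x7FFF,
        "channel": None,
    }
    mca_code = status & 0xFFFF
    # Memory-controller errors have the form 0000 0000 1MMM CCCC.
    if (mca_code & 0xFF80) == 0x0080:
        fields["channel"] = mca_code & 0xF
    return fields


if __name__ == "__main__":
    # STATUS value copied from MCE 0 in the log above.
    for name, value in decode_status(0xCC00008B000800C2).items():
        print(f"{name}: {value}")
```

For the `cc00008b000800c2` value from MCE 0, this reports overflow set, uncorrected clear, channel 2, and a corrected count of 2, which agrees with the "Error overflow / Corrected error / MS_CHANNEL2_ERR" text that mcelog printed.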
February 28, 2017 Community Expert
23 minutes ago, arghhh40k said: I plan on doing a memtest on the next downtime
If you suspect memory, why would you wait to do the test? Bad RAM can corrupt your data.
February 28, 2017 Author
It's currently in production; I will have a chance in a few hours. As far as I can tell, the ECC memory errors are coming from the patrol scrub and are being corrected.
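One way to check that reading of the log is to tally the records rather than eyeball them. The sketch below is only an illustration: it assumes the mcelog text has been saved to a file (the name `mcelog.txt` is hypothetical) and that each record follows the `MCE n CPU n BANK n MISC ... [ADDR ...] TIME ...` layout seen above. It counts records per bank, per reported address, and per transaction type, and counts how many are flagged as corrected versus uncorrected.

```python
import re
from collections import Counter

# Header of one mcelog record; ADDR is optional because some records omit it.
RECORD_RE = re.compile(
    r"MCE (?P<index>\d+)\s+CPU (?P<cpu>\d+)\s+BANK (?P<bank>\d+)\s+"
    r"MISC (?P<misc>[0-9a-f]+)\s+(?:ADDR (?P<addr>[0-9a-f]+)\s+)?"
    r"TIME (?P<time>\d+)"
)


def summarize(text):
    """Tally mcelog records by bank, address, and transaction type."""
    banks = Counter()
    addrs = Counter()
    for match in RECORD_RE.finditer(text):
        banks[int(match.group("bank"))] += 1
        if match.group("addr"):
            addrs[match.group("addr")] += 1
    return {
        "banks": banks,
        "addrs": addrs,
        "transactions": Counter(re.findall(r"Transaction: (Memory \w+ error)", text)),
        # "Corrected error" with a capital C does not match inside "Uncorrected error".
        "corrected": len(re.findall(r"Corrected error", text)),
        "uncorrected": len(re.findall(r"Uncorrected error", text)),
    }


if __name__ == "__main__":
    # Hypothetical file name; save the mcelog output there first.
    with open("mcelog.txt") as handle:
        summary = summarize(handle.read())
    for key, value in summary.items():
        print(key, dict(value) if isinstance(value, Counter) else value)
```

If the uncorrected count stays at zero and the same handful of addresses keeps recurring, that is consistent with ECC correcting a marginal region, though it does not by itself identify which DIMM is responsible.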
March 1, 2017
19 hours ago, arghhh40k said: Memtest came back ok...
For ECC memory testing, I believe you need PassMark MemTest86. I don't believe the built-in Memtest tests ECC RAM well. Also, how long did you test for? A marginal issue does not necessarily appear on every pass. I'd run it for about 4 hours.