May 4, 2011

Hi All,

Just wondering if someone can help me. I'm running unRAID 5.0beta6a on an AMD MicroServer, and I've just hit a problem when running a parity check. The server has been installed and running for over a month now, and I've seen no errors in the syslog, and the initial parity build went smoothly. I figured it was time to run another parity check, and the GUI tells me that it corrected 944 parity errors. Looking in the syslog, I can see:

May 3 20:48:36 MicroServer kernel: mdcmd (22): check NOCORRECT (unRAID engine)
May 3 20:48:36 MicroServer kernel: md: recovery thread woken up ... (unRAID engine)
May 3 20:48:36 MicroServer kernel: md: recovery thread checking parity... (unRAID engine)
May 3 20:48:36 MicroServer kernel: md: using 1152k window, over a total of 1953514552 blocks. (unRAID engine)

This would seem to indicate that the check should not have corrected the parity, but it definitely looks like it has. I also see numerous errors like this:

May 3 21:04:07 MicroServer kernel: md: parity incorrect: 130170024 (Errors)
May 3 21:04:07 MicroServer kernel: md: parity incorrect: 130170032 (Errors)
May 3 21:04:07 MicroServer kernel: md: parity incorrect: 130170040 (Errors)
May 3 21:04:07 MicroServer kernel: md: parity incorrect: 130170048 (Errors)
May 3 21:04:07 MicroServer kernel: md: parity incorrect: 130170056 (Errors)
May 3 21:04:07 MicroServer kernel: md: parity incorrect: 130170064 (Errors)

Followed by several instances of:

May 3 21:04:11 MicroServer kernel: ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0 (Errors)
May 3 21:04:11 MicroServer kernel: ata6.00: irq_stat 0x40000001 (Drive related)
May 3 21:04:11 MicroServer kernel: ata6.00: failed command: READ DMA EXT (Minor Issues)
May 3 21:04:11 MicroServer kernel: ata6.00: cmd 25/00:78:90:57:c2/00:02:07:00:00/e0 tag 0 dma 323584 in (Drive related)
May 3 21:04:11 MicroServer kernel: res 51/40:00:37:58:c2/00:00:07:00:00/00 Emask 0x9 (media error) (Errors)
May 3 21:04:11 MicroServer kernel: ata6.00: status: { DRDY ERR } (Drive related)
May 3 21:04:11 MicroServer kernel: ata6.00: error: { UNC } (Errors)
May 3 21:04:11 MicroServer kernel: ata6.00: configured for UDMA/133 (Drive related)
May 3 21:04:11 MicroServer kernel: ata6: EH complete (Drive related)

The smartctl output for the drive shows multiple instances of:

Error 6 occurred at disk power-on lifetime: 3097 hours (129 days + 1 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 37 58 c2 07

  Error: UNC at LBA = 0x07c25837 = 130177079

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 78 90 57 c2 e7 00   8d+08:56:23.486  READ DMA EXT
  27 00 00 00 00 00 e0 00   8d+08:56:23.485  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 a0 00   8d+08:56:23.484  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 00   8d+08:56:23.484  SET FEATURES [set transfer mode]
  27 00 00 00 00 00 e0 00   8d+08:56:23.460  READ NATIVE MAX ADDRESS EXT

I've not seen any other errors, so it would be good to get some input on possible causes. I'm beginning to suspect either the cable or the StarTech drive bay.

Any help would be much appreciated. I've attached the syslog and smartctl output to the thread.

Andy.

Attachments: syslog-parity-problems.txt, smartreport-disk0.txt
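For anyone reading the `cmd` and `res` lines above by hand, here is a minimal Python sketch of how the register bytes appear to map to a 48-bit LBA and a sector count. The field order (low bytes in the second group, high-order "HOB" bytes in the third) is an assumption about the kernel's log layout, not something stated in this thread, and `decode_taskfile` is a hypothetical helper name. As a sanity check, the decoded values agree with two numbers that do appear in the post: the `dma 323584` byte count and smartctl's `LBA = 0x07c25837 = 130177079`.

```python
def decode_taskfile(fields):
    """Decode a register dump such as '25/00:78:90:57:c2/00:02:07:00:00/e0'.

    Assumed layout (not confirmed by the thread):
      first/feature:count:lba_low:lba_mid:lba_high/
      hob_feature:hob_count:hob_lba_low:hob_lba_mid:hob_lba_high/device
    """
    first, low, high, _device = fields.split("/")
    _feat, nsect, lbal, lbam, lbah = (int(b, 16) for b in low.split(":"))
    _hfeat, hnsect, hlbal, hlbam, hlbah = (int(b, 16) for b in high.split(":"))
    lba = (hlbah << 40) | (hlbam << 32) | (hlbal << 24) | (lbah << 16) | (lbam << 8) | lbal
    return {
        "first_byte": int(first, 16),      # command opcode on 'cmd', status on 'res'
        "lba": lba,
        "sectors": (hnsect << 8) | nsect,  # 16-bit sector count
    }


# Values copied from the syslog excerpt in the post.
cmd = decode_taskfile("25/00:78:90:57:c2/00:02:07:00:00/e0")
res = decode_taskfile("51/40:00:37:58:c2/00:00:07:00:00/00")

print("read starts at LBA", cmd["lba"], "for", cmd["sectors"], "sectors")
print("bytes requested:", cmd["sectors"] * 512)  # assumes 512-byte sectors
print("failing LBA:", res["lba"], "=", hex(res["lba"]))
print("offset into the read:", res["lba"] - cmd["lba"], "sectors")
```

Under these assumptions the failed READ DMA EXT covered 632 sectors starting at LBA 130176912, and the drive reported the uncorrectable sector 167 sectors into that read.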
May 4, 2011 Author

I've replaced the cable now, just in case, but would still be interested in hearing any thoughts on possible causes.

Thanks,
Andy.
May 4, 2011 Author

I've just started another parity check, and the syslog shows these errors now:

May 4 16:30:28 MicroServer kernel: md: parity incorrect: 130170024 (Errors)
May 4 16:30:28 MicroServer kernel: md: parity incorrect: 130170032 (Errors)
May 4 16:30:28 MicroServer kernel: md: parity incorrect: 130170040 (Errors)
May 4 16:30:28 MicroServer kernel: md: parity incorrect: 130170048 (Errors)
May 4 16:30:28 MicroServer kernel: md: parity incorrect: 130170056 (Errors)
May 4 16:30:28 MicroServer kernel: md: parity incorrect: 130170064 (Errors)
May 4 16:30:28 MicroServer kernel: md: parity incorrect: 130170072 (Errors)
May 4 16:30:28 MicroServer kernel: md: parity incorrect: 130170080 (Errors)
May 4 16:30:28 MicroServer kernel: md: parity incorrect: 130170088 (Errors)
May 4 16:30:28 MicroServer kernel: md: parity incorrect: 130170096 (Errors)
May 4 16:30:28 MicroServer kernel: md: parity incorrect: 130170104 (Errors)
May 4 16:30:28 MicroServer kernel: md: parity incorrect: 130170112 (Errors)
May 4 16:30:28 MicroServer kernel: md: parity incorrect: 130170120 (Errors)
May 4 16:30:28 MicroServer kernel: md: parity incorrect: 130170128 (Errors)
May 4 16:30:28 MicroServer kernel: md: parity incorrect: 130170136 (Errors)

They are exactly the same as previously, which seems to indicate that the errors weren't corrected in the previous run, despite the GUI saying that they were. No low-level disk errors this time, so I think it might be safe to run the parity check with the correct option now. What do folks think?
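Comparing the two runs by eye gets tedious with hundreds of errors, so here is a rough Python sketch that pulls the sector numbers out of the `md: parity incorrect` lines and reports which ones repeat. The function name `parity_sectors` is made up for illustration, and the two sample strings are only the first three lines of each run as quoted in this thread; in practice you would read the saved syslog files instead.

```python
import re

# Matches the sector number in lines like
# "... kernel: md: parity incorrect: 130170024"
PARITY_RE = re.compile(r"md: parity incorrect: (\d+)")


def parity_sectors(log_text):
    """Return the sector numbers from every 'md: parity incorrect' line, in order."""
    return [int(m.group(1)) for m in PARITY_RE.finditer(log_text)]


# Short excerpts from the two checks quoted in the thread.
run_may3 = """\
May 3 21:04:07 MicroServer kernel: md: parity incorrect: 130170024
May 3 21:04:07 MicroServer kernel: md: parity incorrect: 130170032
May 3 21:04:07 MicroServer kernel: md: parity incorrect: 130170040
"""
run_may4 = """\
May 4 16:30:28 MicroServer kernel: md: parity incorrect: 130170024
May 4 16:30:28 MicroServer kernel: md: parity incorrect: 130170032
May 4 16:30:28 MicroServer kernel: md: parity incorrect: 130170040
"""

first = parity_sectors(run_may3)
second = parity_sectors(run_may4)

repeated = sorted(set(first) & set(second))    # reported in both runs
only_first = sorted(set(first) - set(second))  # gone in the second run
steps = {b - a for a, b in zip(second, second[1:])}  # spacing between reports

print("repeated in both runs:", repeated)
print("only in the first run:", only_first)
print("spacing between reported sectors:", steps)
```

If every sector from the first run shows up again in the second, as in these excerpts, that is consistent with the first check having run as NOCORRECT rather than actually rewriting parity.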
May 4, 2011

Quoting the post above: "No low level disk errors this time, so I think I might be safe to run the parity check with correct option now. What do folks think?"

5.X runs a non-correcting parity check. You will have to run a correcting parity check manually from the command line.
May 4, 2011

Quoting the previous reply: "5.X runs a no correct parity check. You will have to run a correcting parity check manually from the command line."

Or check the box on the web interface asking that they be corrected.
May 7, 2011 Author

This turned out to be a bad Molex-to-SATA power Y-splitter. Neither the parity drive nor the cache drive was getting decent power. I replaced it, and several parity checks have now run without error.