Morveus
Posted October 27, 2014

Hi!

Yesterday one of my drives was disabled before I had time to notice the problem (it had been returning errors for about a week, according to the log; unRAID disabled the disk after 10k errors). The first thing I did was to stop the machine, check the cables and restart the server. unRAID then tried to check the parity, found parity sync errors, and after a few hours the disk was disabled again (after exactly 1733 errors).

Here are the errors.

First, the parity sync errors:

Oct 26 13:00:01 Morveus-NAS kernel: md: parity incorrect: 144
Oct 26 13:00:01 Morveus-NAS kernel: md: parity incorrect: 152
Oct 26 13:00:01 Morveus-NAS kernel: md: parity incorrect: 160
Oct 26 13:00:01 Morveus-NAS kernel: md: parity incorrect: 168
Oct 26 13:00:01 Morveus-NAS kernel: md: parity incorrect: 176
Oct 26 13:00:01 Morveus-NAS kernel: md: parity incorrect: 184
Oct 26 13:00:01 Morveus-NAS kernel: md: parity incorrect: 192
Oct 26 13:00:01 Morveus-NAS kernel: md: parity incorrect: 200
Oct 26 13:00:01 Morveus-NAS kernel: md: parity incorrect: 208
Oct 26 13:00:01 Morveus-NAS kernel: md: parity incorrect: 216
Oct 26 13:00:01 Morveus-NAS kernel: md: parity incorrect: 224
Oct 26 13:00:01 Morveus-NAS kernel: md: parity incorrect: 232
Oct 26 13:00:01 Morveus-NAS kernel: md: parity incorrect: 240
Oct 26 13:00:01 Morveus-NAS kernel: md: parity incorrect: 248
Oct 26 13:00:01 Morveus-NAS kernel: md: parity incorrect: 256
Oct 26 13:00:01 Morveus-NAS kernel: md: parity incorrect: 264
Oct 26 13:00:01 Morveus-NAS kernel: md: parity incorrect: 272

Then the "ata" errors:

Oct 26 17:02:08 Morveus-NAS kernel: ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Oct 26 17:02:08 Morveus-NAS kernel: ata5.00: BMDMA2 stat 0x80d0209
Oct 26 17:02:08 Morveus-NAS kernel: ata5.00: failed command: READ DMA EXT
Oct 26 17:02:08 Morveus-NAS kernel: ata5.00: cmd 25/00:08:30:03:bc/00:00:23:00:00/e0 tag 0 dma 4096 in
Oct 26 17:02:08 Morveus-NAS kernel:          res 51/40:00:30:03:bc/00:00:23:00:00/10 Emask 0x9 (media error)
Oct 26 17:02:08 Morveus-NAS kernel: ata5.00: status: { DRDY ERR }
Oct 26 17:02:08 Morveus-NAS kernel: ata5.00: error: { UNC }
Oct 26 17:02:08 Morveus-NAS kernel: ata5.00: configured for UDMA/100
Oct 26 17:02:08 Morveus-NAS kernel: ata5: EH complete
Oct 26 17:02:11 Morveus-NAS kernel: ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Oct 26 17:02:11 Morveus-NAS kernel: ata5.00: BMDMA2 stat 0x80d0209
Oct 26 17:02:11 Morveus-NAS kernel: ata5.00: failed command: READ DMA EXT
Oct 26 17:02:11 Morveus-NAS kernel: ata5.00: cmd 25/00:08:30:03:bc/00:00:23:00:00/e0 tag 0 dma 4096 in
Oct 26 17:02:11 Morveus-NAS kernel:          res 51/40:00:30:03:bc/00:00:23:00:00/10 Emask 0x9 (media error)
Oct 26 17:02:11 Morveus-NAS kernel: ata5.00: status: { DRDY ERR }
Oct 26 17:02:11 Morveus-NAS kernel: ata5.00: error: { UNC }
Oct 26 17:02:11 Morveus-NAS kernel: ata5.00: configured for UDMA/100
Oct 26 17:02:11 Morveus-NAS kernel: ata5: EH complete

And finally:

Oct 27 06:00:52 Morveus-NAS kernel: md: disk4 write error
Oct 27 06:00:52 Morveus-NAS kernel: handle_stripe write error: 65608/4, count: 1
Oct 27 06:00:52 Morveus-NAS kernel: md: disk4 write error
Oct 27 06:00:52 Morveus-NAS kernel: handle_stripe write error: 65616/4, count: 1
Oct 27 06:00:52 Morveus-NAS kernel: md: disk4 write error
Oct 27 06:00:52 Morveus-NAS kernel: handle_stripe write error: 65624/4, count: 1
Oct 27 06:00:52 Morveus-NAS kernel: md: disk4 write error
Oct 27 06:00:52 Morveus-NAS kernel: handle_stripe write error: 65632/4, count: 1
Oct 27 06:00:52 Morveus-NAS kernel: md: disk4 write error
Oct 27 06:00:52 Morveus-NAS kernel: handle_stripe write error: 65640/4, count: 1
Oct 27 06:00:52 Morveus-NAS kernel: md: disk4 write error
Oct 27 06:00:52 Morveus-NAS kernel: handle_stripe write error: 65648/4, count: 1
Oct 27 06:00:52 Morveus-NAS kernel: md: disk4 write error
Oct 27 06:00:52 Morveus-NAS kernel: handle_stripe write error: 65656/4, count: 1
Oct 27 06:00:52 Morveus-NAS kernel: md: disk4 write error
Oct 27 06:00:52 Morveus-NAS kernel: handle_stripe write error: 65664/4, count: 1
Oct 27 06:00:52 Morveus-NAS kernel: md: disk4 write error
Oct 27 06:00:52 Morveus-NAS kernel: handle_stripe write error: 65672/4, count: 1

(hundreds of occurrences)

Oct 27 06:00:52 Morveus-NAS kernel: sd 4:0:0:0: [sde] Unhandled error code
Oct 27 06:00:52 Morveus-NAS kernel: sd 4:0:0:0: [sde] Result: hostbyte=0x04 driverbyte=0x00
Oct 27 06:00:52 Morveus-NAS kernel: sd 4:0:0:0: [sde] CDB: cdb[0]=0x2a: 2a 00 00 00 f7 a8 00 04 00 00
Oct 27 06:00:52 Morveus-NAS kernel: end_request: I/O error, dev sde, sector 63400
Oct 27 06:00:52 Morveus-NAS kernel: sd 4:0:0:0: [sde] Unhandled error code
Oct 27 06:00:52 Morveus-NAS kernel: sd 4:0:0:0: [sde] Result: hostbyte=0x04 driverbyte=0x00
Oct 27 06:00:52 Morveus-NAS kernel: sd 4:0:0:0: [sde] CDB: cdb[0]=0x2a: 2a 00 00 00 fb a8 00 04 00 00
Oct 27 06:00:52 Morveus-NAS kernel: end_request: I/O error, dev sde, sector 64424
Oct 27 06:00:52 Morveus-NAS kernel: sd 4:0:0:0: [sde] Unhandled error code
Oct 27 06:00:52 Morveus-NAS kernel: sd 4:0:0:0: [sde] Result: hostbyte=0x04 driverbyte=0x00
Oct 27 06:00:52 Morveus-NAS kernel: sd 4:0:0:0: [sde] CDB: cdb[0]=0x2a: 2a 00 00 00 ff a8 00 01 28 00

(a few occurrences)

I'm getting a new disk tomorrow, but I'm worried about those parity sync errors. From what I understand, there are two possible scenarios:

1) The errors are on the failing data disk; in that case I won't lose data when rebuilding onto a new disk.
2) The errors are on the parity disk; in that case rebuilding onto a new disk will produce corrupted data.

Is that correct?

I'm pretty sure the faulty disk isn't completely dead and could be plugged into a computer to recover some files. If scenario #2 turns out to have happened (discovered a few days after rebuilding), how can I compare the data on the "old" disk with the data on the "new, rebuilt" disk, and replace the corrupt files if needed?
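If I understand unRAID's single-parity scheme correctly, a replaced disk is rebuilt as the XOR of the parity disk and all the surviving data disks, which is why scenario #2 would push any bad parity bit straight into the rebuilt files. A toy sketch with made-up byte strings (not real disk contents), just to illustrate the two scenarios:

```python
# Toy model of single-parity rebuild: parity is the XOR of all data disks,
# and a replaced disk is reconstructed as parity XOR the surviving disks.
# The disk contents below are invented bytes, purely for illustration.
def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

disk1 = b"\x01\x02\x03"
disk2 = b"\x10\x20\x30"
disk3 = b"\xaa\xbb\xcc"          # the disk we "lose" and then rebuild
parity = xor_bytes(xor_bytes(disk1, disk2), disk3)

# Scenario 1: parity is good -> the rebuild reproduces disk3 exactly.
rebuilt = xor_bytes(xor_bytes(parity, disk1), disk2)
assert rebuilt == disk3

# Scenario 2: parity holds a flipped bit -> that bit lands in the rebuilt data.
bad_parity = bytes([parity[0] ^ 0x01]) + parity[1:]
bad_rebuilt = xor_bytes(xor_bytes(bad_parity, disk1), disk2)
assert bad_rebuilt != disk3
```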
(I'm thinking of something like a "recursive MD5 check", for instance, but I don't know where to start.)

Thank you!

Edit: adding attachment "syslog" (syslog.txt)
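For the record, here is roughly what I have in mind for the recursive MD5 check: hash every file under each mount point, then list the relative paths whose checksum (or presence) differs. The mount points are placeholders, not my real paths. A minimal sketch:

```python
# Sketch of a recursive MD5 comparison between the old disk (mounted
# read-only on another machine) and the rebuilt one. The two mount
# points at the bottom are hypothetical -- adjust them to the real setup.
import hashlib
import os

def md5_tree(root):
    """Map relative file path -> MD5 hex digest for every file under root."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            h = hashlib.md5()
            with open(path, "rb") as f:
                # Hash in 1 MiB chunks so large media files don't fill RAM.
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            digests[os.path.relpath(path, root)] = h.hexdigest()
    return digests

def compare_trees(old_root, new_root):
    """Return sorted relative paths whose content (or presence) differs."""
    old, new = md5_tree(old_root), md5_tree(new_root)
    return sorted(p for p in old.keys() | new.keys() if old.get(p) != new.get(p))

if __name__ == "__main__":
    # Hypothetical mount points for the old disk and the rebuilt one.
    for path in compare_trees("/mnt/old_disk4", "/mnt/new_disk4"):
        print(path)
```

Each differing path would then be a candidate for copying back from the old disk, assuming the old disk's copy is still readable.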