
One dead drive and parity sync errors



Hi!

 

Yesterday one of my drives was disabled before I had time to notice any problem (according to the log, it had been returning errors for about a week; unRAID disabled the disk after 10k errors).

The first thing I did was to stop the machine, check the cables and restart the server. unRAID then tried to check the parity, found parity sync errors, and after a few hours the disk was disabled again (after exactly 1733 errors).

 

Here are the errors:

First, parity sync errors:

Oct 26 13:00:01 Morveus-NAS kernel: md: parity incorrect: 144

Oct 26 13:00:01 Morveus-NAS kernel: md: parity incorrect: 152

Oct 26 13:00:01 Morveus-NAS kernel: md: parity incorrect: 160

Oct 26 13:00:01 Morveus-NAS kernel: md: parity incorrect: 168

Oct 26 13:00:01 Morveus-NAS kernel: md: parity incorrect: 176

Oct 26 13:00:01 Morveus-NAS kernel: md: parity incorrect: 184

Oct 26 13:00:01 Morveus-NAS kernel: md: parity incorrect: 192

Oct 26 13:00:01 Morveus-NAS kernel: md: parity incorrect: 200

Oct 26 13:00:01 Morveus-NAS kernel: md: parity incorrect: 208

Oct 26 13:00:01 Morveus-NAS kernel: md: parity incorrect: 216

Oct 26 13:00:01 Morveus-NAS kernel: md: parity incorrect: 224

Oct 26 13:00:01 Morveus-NAS kernel: md: parity incorrect: 232

Oct 26 13:00:01 Morveus-NAS kernel: md: parity incorrect: 240

Oct 26 13:00:01 Morveus-NAS kernel: md: parity incorrect: 248

Oct 26 13:00:01 Morveus-NAS kernel: md: parity incorrect: 256

Oct 26 13:00:01 Morveus-NAS kernel: md: parity incorrect: 264

Oct 26 13:00:01 Morveus-NAS kernel: md: parity incorrect: 272

 

Then "ata" errors:

Oct 26 17:02:08 Morveus-NAS kernel: ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0

Oct 26 17:02:08 Morveus-NAS kernel: ata5.00: BMDMA2 stat 0x80d0209

Oct 26 17:02:08 Morveus-NAS kernel: ata5.00: failed command: READ DMA EXT

Oct 26 17:02:08 Morveus-NAS kernel: ata5.00: cmd 25/00:08:30:03:bc/00:00:23:00:00/e0 tag 0 dma 4096 in

Oct 26 17:02:08 Morveus-NAS kernel:          res 51/40:00:30:03:bc/00:00:23:00:00/10 Emask 0x9 (media error)

Oct 26 17:02:08 Morveus-NAS kernel: ata5.00: status: { DRDY ERR }

Oct 26 17:02:08 Morveus-NAS kernel: ata5.00: error: { UNC }

Oct 26 17:02:08 Morveus-NAS kernel: ata5.00: configured for UDMA/100

Oct 26 17:02:08 Morveus-NAS kernel: ata5: EH complete

Oct 26 17:02:11 Morveus-NAS kernel: ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0

Oct 26 17:02:11 Morveus-NAS kernel: ata5.00: BMDMA2 stat 0x80d0209

Oct 26 17:02:11 Morveus-NAS kernel: ata5.00: failed command: READ DMA EXT

Oct 26 17:02:11 Morveus-NAS kernel: ata5.00: cmd 25/00:08:30:03:bc/00:00:23:00:00/e0 tag 0 dma 4096 in

Oct 26 17:02:11 Morveus-NAS kernel:          res 51/40:00:30:03:bc/00:00:23:00:00/10 Emask 0x9 (media error)

Oct 26 17:02:11 Morveus-NAS kernel: ata5.00: status: { DRDY ERR }

Oct 26 17:02:11 Morveus-NAS kernel: ata5.00: error: { UNC }

Oct 26 17:02:11 Morveus-NAS kernel: ata5.00: configured for UDMA/100

Oct 26 17:02:11 Morveus-NAS kernel: ata5: EH complete

 

And finally:

Oct 27 06:00:52 Morveus-NAS kernel: md: disk4 write error

Oct 27 06:00:52 Morveus-NAS kernel: handle_stripe write error: 65608/4, count: 1

Oct 27 06:00:52 Morveus-NAS kernel: md: disk4 write error

Oct 27 06:00:52 Morveus-NAS kernel: handle_stripe write error: 65616/4, count: 1

Oct 27 06:00:52 Morveus-NAS kernel: md: disk4 write error

Oct 27 06:00:52 Morveus-NAS kernel: handle_stripe write error: 65624/4, count: 1

Oct 27 06:00:52 Morveus-NAS kernel: md: disk4 write error

Oct 27 06:00:52 Morveus-NAS kernel: handle_stripe write error: 65632/4, count: 1

Oct 27 06:00:52 Morveus-NAS kernel: md: disk4 write error

Oct 27 06:00:52 Morveus-NAS kernel: handle_stripe write error: 65640/4, count: 1

Oct 27 06:00:52 Morveus-NAS kernel: md: disk4 write error

Oct 27 06:00:52 Morveus-NAS kernel: handle_stripe write error: 65648/4, count: 1

Oct 27 06:00:52 Morveus-NAS kernel: md: disk4 write error

Oct 27 06:00:52 Morveus-NAS kernel: handle_stripe write error: 65656/4, count: 1

Oct 27 06:00:52 Morveus-NAS kernel: md: disk4 write error

Oct 27 06:00:52 Morveus-NAS kernel: handle_stripe write error: 65664/4, count: 1

Oct 27 06:00:52 Morveus-NAS kernel: md: disk4 write error

Oct 27 06:00:52 Morveus-NAS kernel: handle_stripe write error: 65672/4, count: 1

(hundreds of occurrences)

 

Oct 27 06:00:52 Morveus-NAS kernel: sd 4:0:0:0: [sde] Unhandled error code

Oct 27 06:00:52 Morveus-NAS kernel: sd 4:0:0:0: [sde]  Result: hostbyte=0x04 driverbyte=0x00

Oct 27 06:00:52 Morveus-NAS kernel: sd 4:0:0:0: [sde] CDB: cdb[0]=0x2a: 2a 00 00 00 f7 a8 00 04 00 00

Oct 27 06:00:52 Morveus-NAS kernel: end_request: I/O error, dev sde, sector 63400

Oct 27 06:00:52 Morveus-NAS kernel: sd 4:0:0:0: [sde] Unhandled error code

Oct 27 06:00:52 Morveus-NAS kernel: sd 4:0:0:0: [sde]  Result: hostbyte=0x04 driverbyte=0x00

Oct 27 06:00:52 Morveus-NAS kernel: sd 4:0:0:0: [sde] CDB: cdb[0]=0x2a: 2a 00 00 00 fb a8 00 04 00 00

Oct 27 06:00:52 Morveus-NAS kernel: end_request: I/O error, dev sde, sector 64424

Oct 27 06:00:52 Morveus-NAS kernel: sd 4:0:0:0: [sde] Unhandled error code

Oct 27 06:00:52 Morveus-NAS kernel: sd 4:0:0:0: [sde]  Result: hostbyte=0x04 driverbyte=0x00

Oct 27 06:00:52 Morveus-NAS kernel: sd 4:0:0:0: [sde] CDB: cdb[0]=0x2a: 2a 00 00 00 ff a8 00 01 28 00

(a few occurrences)

 

I'm getting a new disk tomorrow, but I'm worried about those parity sync errors. From what I could understand, there are two possible scenarios:

 

1) The failing disk is the one with those errors; in that case, I won't lose data when rebuilding onto a new disk.

2) The parity disk is the one with errors; in that case, rebuilding onto a new disk will produce corrupted data.

 

Is that correct?
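To make the two scenarios concrete, here is a toy sketch of a single-parity rebuild. It assumes parity is a plain byte-wise XOR across the data disks, which is how unRAID's single parity is generally described. The disk contents and the helper names `xor_blocks` and `rebuild` are made up for illustration and are not unRAID code.

```python
from functools import reduce
from operator import xor


def xor_blocks(blocks):
    """XOR several equal-length byte blocks together, byte by byte."""
    return bytes(reduce(xor, column) for column in zip(*blocks))


def rebuild(missing_index, disks, parity_block):
    """Reconstruct one disk from the surviving disks plus the parity block."""
    survivors = [d for i, d in enumerate(disks) if i != missing_index]
    return xor_blocks(survivors + [parity_block])


# Hypothetical four-byte "disks"; index 2 plays the role of the failed disk.
data_disks = [b"\x10\x20\x30\x40", b"\x01\x02\x03\x04", b"\xaa\xbb\xcc\xdd"]
parity = xor_blocks(data_disks)

# Scenario 1: parity and the other disks are good, so the rebuild is exact.
rebuilt_ok = rebuild(2, data_disks, parity)

# Scenario 2: one parity byte is wrong, so the rebuilt disk is wrong
# at exactly that position and correct everywhere else.
bad_parity = bytes([parity[0] ^ 0xFF]) + parity[1:]
rebuilt_bad = rebuild(2, data_disks, bad_parity)

print(rebuilt_ok == data_disks[2])
print(rebuilt_bad == data_disks[2])
```

In this model, a wrong parity block only damages the rebuilt disk at the positions where parity is wrong, which is why a file-by-file comparison against the old disk would be enough to find the affected files.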

 

I'm pretty sure that the faulty disk isn't completely dead, and could be plugged into a computer to recover some files. In the event of scenario #2 happening in a few days (after rebuilding), how can I compare data from the "old" disk with data from the "new, rebuilt" disk and replace corrupt files if needed?

(I'm thinking "recursive MD5 check" for instance, but don't know where to start).
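A minimal sketch of such a recursive MD5 comparison in Python, assuming the old disk can be mounted read-only as an ordinary directory next to the rebuilt one. The function names and the mount points in the usage note are hypothetical. Files that cannot be read (likely on the failing disk) are reported separately instead of aborting the run.

```python
import hashlib
import sys
from pathlib import Path


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_tree(root):
    """Map each file's path relative to root to its MD5 (None if unreadable)."""
    root = Path(root)
    hashes = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            try:
                hashes[rel] = md5_of_file(path)
            except OSError:
                hashes[rel] = None  # read error, e.g. a bad sector
    return hashes


def compare_trees(old_root, new_root):
    """Compare two directory trees file by file using MD5 digests."""
    old, new = hash_tree(old_root), hash_tree(new_root)
    report = {"differ": [], "only_old": [], "only_new": [], "unreadable": []}
    for rel in sorted(set(old) | set(new)):
        if rel not in new:
            report["only_old"].append(rel)
        elif rel not in old:
            report["only_new"].append(rel)
        elif old[rel] is None or new[rel] is None:
            report["unreadable"].append(rel)
        elif old[rel] != new[rel]:
            report["differ"].append(rel)
    return report


if __name__ == "__main__" and len(sys.argv) == 3:
    for category, files in compare_trees(sys.argv[1], sys.argv[2]).items():
        for rel in files:
            print(f"{category}: {rel}")
```

Saved as, say, `compare_md5.py`, it could be run as `python compare_md5.py /mnt/old_disk4 /mnt/disk4` (both paths are assumptions). A file listed under `differ` only says the two copies are not identical; it does not say which copy is the good one, so each mismatch still has to be checked by opening the file.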

 

Thank you!

 

Edit: adding attachment "syslog"

syslog.txt
