
Millions of errors on data disk during parity rebuild



Posted

I'm running version 4.7.

I recently had to take 3 disks out of my array because they failed at the same time. I figured I could live with losing 1/7 (5TB) of my data and could eventually replace most of it via recovery tools, re-ripping, etc.

So I was rebuilding parity with my remaining 17 disks, the largest of which are 2TB. Just at 73.5%, one of the 1.5TB drives began to accumulate errors, and it seems writes to parity have stopped. It's currently at 74.5%, and the 'main' display of unRAID seems to think the parity build is continuing as normal. Here's an excerpt from the syslog, which is mostly repeats of what I've included:

 

Dec 8 20:58:22 MediaServer kernel: handle_stripe read error: 2909546072/3, count: 1
Dec 8 20:58:22 MediaServer kernel: md: disk5 read error
Dec 8 20:58:22 MediaServer kernel: handle_stripe read error: 2909546080/3, count: 1
Dec 8 20:58:22 MediaServer kernel: md: disk5 read error
Dec 8 20:58:22 MediaServer kernel: handle_stripe read error: 2909546088/3, count: 1
Dec 8 20:58:22 MediaServer kernel: md: disk5 read error
Dec 8 20:58:22 MediaServer kernel: handle_stripe read error: 2909546096/3, count: 1
Dec 8 20:58:22 MediaServer kernel: md: disk5 read error
Dec 8 20:58:22 MediaServer kernel: handle_stripe read error: 2909546104/3, count: 1
Dec 8 20:58:22 MediaServer kernel: md: disk5 read error
Dec 8 20:58:22 MediaServer kernel: handle_stripe read error: 2909546112/3, count: 1
Dec 8 20:58:22 MediaServer kernel: md: disk5 read error
Dec 8 20:58:22 MediaServer kernel: handle_stripe read error: 2909546120/3, count: 1
Dec 8 20:58:22 MediaServer kernel: md: disk5 read error
Dec 8 20:58:22 MediaServer kernel: handle_stripe read error: 2909546128/3, count: 1
Dec 8 20:58:22 MediaServer kernel: md: disk5 read error
Dec 8 20:58:22 MediaServer kernel: handle_stripe read error: 2909546136/3, count: 1
Dec 8 20:58:22 MediaServer kernel: md: disk5 read error
Dec 8 20:58:22 MediaServer kernel: handle_stripe read error: 2909546144/3, count: 1
Dec 8 20:58:22 MediaServer kernel: md: disk5 read error
Dec 8 20:58:22 MediaServer kernel: handle_stripe read error: 2909546152/3, count: 1
Dec 8 20:58:22 MediaServer kernel: ata1: translated ATA stat/err 0x41/04 to SCSI SK/ASC/ASCQ 0xb/00/00
Dec 8 20:58:22 MediaServer kernel: ata1.00: device reported invalid CHS sector 0
Dec 8 20:58:22 MediaServer kernel: ata1: status=0x41 { DriveReady Error }
Dec 8 20:58:22 MediaServer kernel: ata1: error=0x04 { DriveStatusError }
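
If I've done the math right, those sector numbers put the errors right at the tail end of the 1.5TB drive, which matches where the build is sitting. A rough sanity check, assuming 512-byte sectors and decimal (manufacturer) drive sizing:

# Where do the logged read errors fall on the disk?
# Assumes 512-byte sectors and decimal (manufacturer) capacity units.
SECTOR_BYTES = 512
error_sector = 2909546072        # first failing sector in the syslog excerpt
drive_bytes = 1.5e12             # the failing 1.5 TB data disk
parity_bytes = 2.0e12            # parity spans the largest (2 TB) disk

offset = error_sector * SECTOR_BYTES
print(f"offset into disk:        {offset / 1e12:.3f} TB")       # ~1.490 TB
print(f"fraction of 1.5 TB disk: {offset / drive_bytes:.1%}")   # ~99.3%
print(f"fraction of 2 TB parity: {offset / parity_bytes:.1%}")  # ~74.5%

So the logged errors are in the last percent or so of the 1.5TB drive, exactly where the status display says the rebuild is.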

 

The affected disk, md5, is also showing 0°C as its temperature. I've never had a problem with disk 5.
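
A 0°C reading presumably means the SMART query itself is failing, not that the drive is actually cold. A quick way to tell the two apart, sketched in Python (assuming smartmontools is installed; /dev/sde is just a placeholder device node):

# Flag an implausible SMART temperature, which usually means the drive has
# stopped answering SMART queries rather than actually running at 0 C.
# Assumes smartmontools is installed; /dev/sde is a placeholder.
import subprocess

def read_temperature(device):
    out = subprocess.run(["smartctl", "-A", device],
                         capture_output=True, text=True).stdout
    for line in out.splitlines():
        if "Temperature_Celsius" in line:
            return int(line.split()[9])  # RAW_VALUE is the 10th column
    return None  # no attribute table at all: drive likely not responding

temp = read_temperature("/dev/sde")
if temp is None or temp <= 0:
    print("temperature unreadable or implausible - suspect the link, not the platters")
else:
    print(f"drive reports {temp} C")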

 

OT: What I have had lots of problems with is WD Green drives. All the bad drives I've had in the past two years, except for firmware issues with a Samsung F4 and a Seagate, have been WD drives. I must have had about 8 bad ones.

Posted

Now that the 75% mark has been passed (past the 1.5TB point), the errors have stopped and parity is being written again as expected. Any solutions?

Should I replace the drive after parity is built and recover most of the data that way? Then see what I can get out of the drive, from the 18-20 GB that could not be read during the parity build, by other means?
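
For scale: each percentage point of a 2TB parity build covers 20GB, so the 73.5%-74.5% error window lines up with that 18-20 GB figure, assuming decimal drive sizing:

# Back-of-the-envelope: size of the stretch where disk5 was throwing read
# errors. Decimal (manufacturer) drive sizing assumed.
parity_bytes = 2.0e12
per_point = 0.01 * parity_bytes
print(f"one percentage point of the build = {per_point / 1e9:.0f} GB")  # 20 GB
print(f"73.5%..74.5% error window         = {1.0 * per_point / 1e9:.0f} GB")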

Posted

Well, it seems the drive is now redballed. No SMART data. No idea what went wrong with it, but that's the fourth WD drive to redball in one week. I guess I'll just remove it, see what data I can recover off it, and add that back to the other array disks. I definitely cannot rebuild this disk properly, because at least 3.3% of it couldn't be read when building parity, and if I'm right, that could mean any number of files were affected if they're written in a fragmented way, right?
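
To put a rough number on that: if the unreadable region covers a fraction p of the disk and a file's fragments were placed independently (a big idealization, not how the filesystem actually allocates), the chance a file was hit grows quickly with fragment count:

# Toy model: chance a file touches the unreadable region, as a function of
# fragmentation. Assumes fragments land independently and uniformly, which is
# an idealization, not how any real filesystem allocates.
p = 0.033  # fraction of the disk that couldn't be read (~3.3%)
for fragments in (1, 2, 4, 8, 16):
    hit = 1 - (1 - p) ** fragments
    print(f"{fragments:2d} fragments -> ~{hit:.1%} chance of damage")

So a contiguous file only has about a 3% chance of being affected, but a heavily fragmented one is at much greater risk.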

Posted

You have (if I remember right) a server with port multipliers and external boxes. You know that the power supply in these enclosures can go bad, or a cable can come loose...

And I suspect something is wrong with your configuration if you have seen so many bad WD disks.

Posted

Quote:

You have (if I remember right) a server with port multipliers and external boxes. You know that the power supply in these enclosures can go bad, or a cable can come loose...

And I suspect something is wrong with your configuration if you have seen so many bad WD disks.

 

The external boxes have been great. It's the PITA internal drives that have been a bugger, dying frequently. I'm looking forward to my 4224 becoming operational soon!

Posted

I would agree that something seems off for you to have that many internal drive failures. I wouldn't trust that setup with more drives, and I would test those drives in another setup.

 

Peter

Archived

This topic is now archived and is closed to further replies.
