
Millions of errors on data disk during parity rebuild



Posted

I'm running version 4.7.

I recently had to take 3 disks out of my array because they failed at the same time. I figured I could live with losing 1/7 (5TB) of my data and could eventually replace most of it via recovery tools, re-ripping, etc.

So I was rebuilding parity with my remaining 17 disks, the largest of which are 2TB. Just at 73.5%, one of the 1.5TB drives began to accumulate errors, and it seems writes to parity have stopped. It's currently at 74.5%, and the 'main' display of unRAID seems to think the parity build is continuing as normal. Here's an excerpt from the syslog, which is mostly repeats of what I've included:

 

Dec 8 20:58:22 MediaServer kernel: handle_stripe read error: 2909546072/3, count: 1
Dec 8 20:58:22 MediaServer kernel: md: disk5 read error
Dec 8 20:58:22 MediaServer kernel: handle_stripe read error: 2909546080/3, count: 1
Dec 8 20:58:22 MediaServer kernel: md: disk5 read error
Dec 8 20:58:22 MediaServer kernel: handle_stripe read error: 2909546088/3, count: 1
Dec 8 20:58:22 MediaServer kernel: md: disk5 read error
Dec 8 20:58:22 MediaServer kernel: handle_stripe read error: 2909546096/3, count: 1
Dec 8 20:58:22 MediaServer kernel: md: disk5 read error
Dec 8 20:58:22 MediaServer kernel: handle_stripe read error: 2909546104/3, count: 1
Dec 8 20:58:22 MediaServer kernel: md: disk5 read error
Dec 8 20:58:22 MediaServer kernel: handle_stripe read error: 2909546112/3, count: 1
Dec 8 20:58:22 MediaServer kernel: md: disk5 read error
Dec 8 20:58:22 MediaServer kernel: handle_stripe read error: 2909546120/3, count: 1
Dec 8 20:58:22 MediaServer kernel: md: disk5 read error
Dec 8 20:58:22 MediaServer kernel: handle_stripe read error: 2909546128/3, count: 1
Dec 8 20:58:22 MediaServer kernel: md: disk5 read error
Dec 8 20:58:22 MediaServer kernel: handle_stripe read error: 2909546136/3, count: 1
Dec 8 20:58:22 MediaServer kernel: md: disk5 read error
Dec 8 20:58:22 MediaServer kernel: handle_stripe read error: 2909546144/3, count: 1
Dec 8 20:58:22 MediaServer kernel: md: disk5 read error
Dec 8 20:58:22 MediaServer kernel: handle_stripe read error: 2909546152/3, count: 1
Dec 8 20:58:22 MediaServer kernel: ata1: translated ATA stat/err 0x41/04 to SCSI SK/ASC/ASCQ 0xb/00/00
Dec 8 20:58:22 MediaServer kernel: ata1.00: device reported invalid CHS sector 0
Dec 8 20:58:22 MediaServer kernel: ata1: status=0x41 { DriveReady Error }
Dec 8 20:58:22 MediaServer kernel: ata1: error=0x04 { DriveStatusError }
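
If I've done the math right, those sector numbers put the errors right at the tail end of the 1.5TB drive, which matches where the build is sitting. A rough sanity check, assuming 512-byte sectors and decimal (manufacturer) drive sizing:

# Where do the logged read errors fall on the disk?
# Assumes 512-byte sectors and decimal (manufacturer) capacity units.
SECTOR_BYTES = 512
error_sector = 2909546072        # first failing sector in the syslog excerpt
drive_bytes = 1.5e12             # the failing 1.5 TB data disk
parity_bytes = 2.0e12            # parity spans the largest (2 TB) disk

offset = error_sector * SECTOR_BYTES
print(f"offset into disk:        {offset / 1e12:.3f} TB")       # ~1.490 TB
print(f"fraction of 1.5 TB disk: {offset / drive_bytes:.1%}")   # ~99.3%
print(f"fraction of 2 TB parity: {offset / parity_bytes:.1%}")  # ~74.5%

So the logged errors are in the last percent or so of the 1.5TB drive, exactly where the status display says the rebuild is.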

 

The affected disk, md5, is also showing 0°C as its temperature. I've never had a problem with disk 5.
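
A 0°C reading presumably means the SMART query itself is failing, not that the drive is actually cold. A quick way to tell the two apart, sketched in Python (assuming smartmontools is installed; /dev/sde is just a placeholder device node):

# Flag an implausible SMART temperature, which usually means the drive has
# stopped answering SMART queries rather than actually running at 0 C.
# Assumes smartmontools is installed; /dev/sde is a placeholder.
import subprocess

def read_temperature(device):
    out = subprocess.run(["smartctl", "-A", device],
                         capture_output=True, text=True).stdout
    for line in out.splitlines():
        if "Temperature_Celsius" in line:
            return int(line.split()[9])  # RAW_VALUE is the 10th column
    return None  # no attribute table at all: drive likely not responding

temp = read_temperature("/dev/sde")
if temp is None or temp <= 0:
    print("temperature unreadable or implausible - suspect the link, not the platters")
else:
    print(f"drive reports {temp} C")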

 

OT: What I have had lots of problems with is WD Green drives. All the bad drives I've had in the past two years, except for firmware issues with a Samsung F4 and a Seagate, have been WD drives. I must have had about 8 bad ones.

Posted

Now that the 75% mark has been passed (past the 1.5TB point), the errors have stopped and parity is being written again as expected. Any solutions?

Should I replace the drive after parity is built and recover most of the data that way? Then see what I can get out of the drive, from the 18-20 GB that could not be read during the parity build, by other means?
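
For scale: each percentage point of a 2TB parity build covers 20GB, so the 73.5%-74.5% error window lines up with that 18-20 GB figure, assuming decimal drive sizing:

# Back-of-the-envelope: size of the stretch where disk5 was throwing read
# errors. Decimal (manufacturer) drive sizing assumed.
parity_bytes = 2.0e12
per_point = 0.01 * parity_bytes
print(f"one percentage point of the build = {per_point / 1e9:.0f} GB")  # 20 GB
print(f"73.5%..74.5% error window         = {1.0 * per_point / 1e9:.0f} GB")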

Posted

Well, it seems the drive is now redballed. No SMART data. No idea what went wrong with it, but that's the fourth WD drive to redball in one week. I guess I'll just remove it, see what data I can recover off it, and add that back to the other array disks. I definitely cannot rebuild this disk properly, because at least 3.3% of it couldn't be read when building parity, and if I'm right, that could mean any number of files were affected if they're written in a fragmented way, right?
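
To put a rough number on that: if the unreadable region covers a fraction p of the disk and a file's fragments were placed independently (a big idealization, not how the filesystem actually allocates), the chance a file was hit grows quickly with fragment count:

# Toy model: chance a file touches the unreadable region, as a function of
# fragmentation. Assumes fragments land independently and uniformly, which is
# an idealization, not how any real filesystem allocates.
p = 0.033  # fraction of the disk that couldn't be read (~3.3%)
for fragments in (1, 2, 4, 8, 16):
    hit = 1 - (1 - p) ** fragments
    print(f"{fragments:2d} fragments -> ~{hit:.1%} chance of damage")

So a contiguous file only has about a 3% chance of being affected, but a heavily fragmented one is at much greater risk.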

Posted

You have (if I remember right) a server with port multipliers and external boxes. You know that the power supply in these enclosures can go bad, or a cable can come loose...

And I suspect something is wrong with your configuration if you have seen so many bad WD disks.

Posted

Quote:

You have (if I remember right) a server with port multipliers and external boxes. You know that the power supply in these enclosures can go bad, or a cable can come loose...

And I suspect something is wrong with your configuration if you have seen so many bad WD disks.

 

The external boxes have been great. It's the PITA internal drives that have been a bugger, dying frequently. I'm looking forward to my 4224 becoming operational soon!

Posted

I would agree that something seems off for you to have that many internal drive failures. I wouldn't trust that setup with more drives, and I would test those drives in another setup.

 

Peter

Archived

This topic is now archived and is closed to further replies.
