fitbrit Posted December 9, 2011
I'm running version 4.7. I recently had to take 3 disks out of my array because they failed at the same time. I figured I could live with losing 1/7 (5TB) of my data, and could eventually replace most of it via recovery tools, reripping, etc. So I was rebuilding parity with my remaining 17 disks, the largest of which are 2 TB. Just at 73.5%, one of the 1.5TB drives began to accumulate errors, and writes to parity seem to have stopped. It's currently at 74.5%, and the 'main' display of unRAID seems to think the parity build is continuing as normal. Here's an excerpt from the syslog, which is mostly repeats of what I've included:

    Dec 8 20:58:22 MediaServer kernel: handle_stripe read error: 2909546072/3, count: 1
    Dec 8 20:58:22 MediaServer kernel: md: disk5 read error
    Dec 8 20:58:22 MediaServer kernel: handle_stripe read error: 2909546080/3, count: 1
    Dec 8 20:58:22 MediaServer kernel: md: disk5 read error
    Dec 8 20:58:22 MediaServer kernel: handle_stripe read error: 2909546088/3, count: 1
    Dec 8 20:58:22 MediaServer kernel: md: disk5 read error
    Dec 8 20:58:22 MediaServer kernel: handle_stripe read error: 2909546096/3, count: 1
    Dec 8 20:58:22 MediaServer kernel: md: disk5 read error
    Dec 8 20:58:22 MediaServer kernel: handle_stripe read error: 2909546104/3, count: 1
    Dec 8 20:58:22 MediaServer kernel: md: disk5 read error
    Dec 8 20:58:22 MediaServer kernel: handle_stripe read error: 2909546112/3, count: 1
    Dec 8 20:58:22 MediaServer kernel: md: disk5 read error
    Dec 8 20:58:22 MediaServer kernel: handle_stripe read error: 2909546120/3, count: 1
    Dec 8 20:58:22 MediaServer kernel: md: disk5 read error
    Dec 8 20:58:22 MediaServer kernel: handle_stripe read error: 2909546128/3, count: 1
    Dec 8 20:58:22 MediaServer kernel: md: disk5 read error
    Dec 8 20:58:22 MediaServer kernel: handle_stripe read error: 2909546136/3, count: 1
    Dec 8 20:58:22 MediaServer kernel: md: disk5 read error
    Dec 8 20:58:22 MediaServer kernel: handle_stripe read error: 2909546144/3, count: 1
    Dec 8 20:58:22 MediaServer kernel: md: disk5 read error
    Dec 8 20:58:22 MediaServer kernel: handle_stripe read error: 2909546152/3, count: 1
    Dec 8 20:58:22 MediaServer kernel: ata1: translated ATA stat/err 0x41/04 to SCSI SK/ASC/ASCQ 0xb/00/00
    Dec 8 20:58:22 MediaServer kernel: ata1.00: device reported invalid CHS sector 0
    Dec 8 20:58:22 MediaServer kernel: ata1: status=0x41 { DriveReady Error }
    Dec 8 20:58:22 MediaServer kernel: ata1: error=0x04 { DriveStatusError }

The affected disk, md5, is also showing 0C as its temperature. I've never had a problem with disk 5.

OT: What I have had lots of problems with is WD Green drives. All the bad drives I've had in the past two years, except for firmware issues with a Samsung F4 and a Seagate, have been WD drives. I must have had about 8 bad ones.
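One quick check at this point would be a SMART report for disk5 from the console; a 0C temperature reading often means the drive isn't answering queries at all. This is only a sketch, and the device name is hypothetical - check the actual assignment for disk5 on the Devices page before running it:

    # full SMART report; on unRAID 4.x-era smartctl the -d ata flag was typically needed for SATA drives
    smartctl -a -d ata /dev/sde
    # save a copy to the flash drive so it survives a reboot and can be attached to a post
    smartctl -a -d ata /dev/sde > /boot/smart_disk5.txt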
fitbrit Posted December 9, 2011
Now that the 75% mark has been passed (over the 1.5TB point), the errors have stopped and parity is being written again as expected. Any solutions? Should I replace the drive after parity is built and recover most of the data that way, then see what I can get out of the drive by other means for the 18-20 GB that couldn't be read during the parity build?
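For a rough idea of how much of the disk was skipped, one option is simply to count the read-error lines in the syslog. This is only an estimate: the handle_stripe addresses in the excerpt above step by 8, which would work out to about 4KB per logged error if those are 512-byte sector offsets, and the count will be low if the log has rotated since the errors started:

    # count how many disk5 read errors were logged during the parity build
    grep -c "md: disk5 read error" /var/log/syslog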
fitbrit Posted December 9, 2011
Any recommendations on how to proceed are welcomed.
lionelhutz Posted December 9, 2011
Provide the info requested here - http://lime-technology.com/forum/index.php?topic=9880.0
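That topic generally asks for a full syslog and a SMART report for the affected drive. A minimal way to capture the syslog to the flash drive so it can be attached to a post, assuming the stock log location:

    # copy the current syslog to the flash drive with a date stamp
    cp /var/log/syslog /boot/syslog-$(date +%Y%m%d).txt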
fitbrit Posted December 9, 2011
> provide the info requested here - http://lime-technology.com/forum/index.php?topic=9880.0
I'm really sorry - somehow some information got lost from my original post, above the code excerpt from the syslog. I must have accidentally deleted it during an edit. I'll fix it.
fitbrit Posted December 9, 2011
Well, it seems the drive is now redballed. No SMART data. No idea what went wrong with it, but that's the fourth WD drive to redball in one week. I guess I'll just remove it, see what data I can recover off it, and add that back to the other array disks. I certainly can't rebuild this disk properly, because at least 3.3% of it couldn't be read while building parity, and if I'm right, that could mean any number of files are affected if they're stored in a fragmented way, right?
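If the drive will still spin up outside the array, one common approach (not unRAID-specific, and only a sketch) is to image it with GNU ddrescue onto a known-good disk, then run file-level recovery tools against the image; ddrescue skips and later retries the unreadable areas and keeps a map of what it couldn't copy. The device names below are hypothetical:

    # first pass: grab the easy areas, skipping bad spots quickly (-n = no scraping)
    ddrescue -f -n /dev/sdb /dev/sdc rescue.map
    # second pass: go back and retry the bad areas a few times
    ddrescue -f -r3 /dev/sdb /dev/sdc rescue.map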
bcbgboy13 Posted December 9, 2011
You have (if I remember right) a server with port multipliers and external boxes. You know that the power supply in these enclosures can go bad, or a cable can come loose... And I suspect something is wrong with your configuration if you have seen that many bad WD disks.
fitbrit Posted December 9, 2011
> You have (if I remember right) a server with port multipliers and external boxes. You know that the power supply in these enclosures can go bad, or a cable can come loose... And I suspect something is wrong with your configuration if you have seen that many bad WD disks.
The external boxes have been great. It's the PITA internal drives that have been a bugger, dying frequently. I'm looking forward to my 4224 becoming operational soon!
lionelhutz Posted December 9, 2011
I would agree that something seems off to have that many internal drive failures. I wouldn't trust that setup with more drives, and I would test those drives in another setup.
Peter