December 1, 2016 Just had another drive fail a week ago - wondering what's going on here?! Well, let's look at this one for the moment. I will double-check SATA, power, and controller card seating soon. The drive is disabled, with a red X through it. The first SMART report shows nothing highlighted. I followed these steps from JoeL: stopped array, unassigned drive, started array, stopped array, reassigned drive. It's currently being rebuilt from parity. Seems like a lot to do for a little red X - hope I didn't hose something like I did last time. Wow, this 3TB drive is going to take more than a day... ----------------------------------------------updates below--------------------------------- a more recent note about how to re-enable a drive
December 1, 2016 Author Yes, the parity drive has been up the whole time. Parity is intact, but the drive being rebuilt is emulated, of course. The rebuild's estimated finish is a day and a few hours away, around midnight tomorrow night (12/1/16 2359).
December 1, 2016 Hmmm, if you have a lot of important files not backed up, you may want to copy them to your local drive just to be safe. Let the drive rebuild and see what happens...sorry, flaky things are very annoying...
December 1, 2016 Community Expert It shouldn't take that long to rebuild a 3TB drive unless you have some sort of bottleneck with drive controllers or something. Post diagnostics.
December 1, 2016 Author Diagnostics attached. Reading is about 30 MB/sec. dumbo-diagnostics-20161130-1845.zip
December 1, 2016 Community Expert Nothing unusual in syslog that I noticed. How long has the rebuild been going?
December 1, 2016 Author Only a half hour or so. So it's estimating 33 hours. Not a problem. I'm shopping for hard drives... Side note: http://forre.st/storage was awesome, but recently stopped working. Any other ideas for finding the best value per GB besides hitting up Amazon and Newegg?
December 1, 2016 If you're shopping for drives, I HIGHLY recommend you review the stats from BackBlaze. Very significant research, based on a review of their tens of thousands of drives in operation for years. The guy is a data wonk and has done a quality job. Here's the link: https://www.backblaze.com/blog/hard-drive-failure-rates-q3-2016/ Their blog is chock full of really interesting topics. But the drive failure rates are gold...
December 1, 2016 Author The rebuild from parity inexplicably sped up later on, and the 3TB drive finished in 13 hours. We are all good for the moment - and just in case, I've ordered a couple more hard drives, and am copying the contents of /mnt/disk11 to an external drive until I can get this sorted out! I ran a short SMART test; here are the results: http://hastebin.com/isuwakadit.sql Does it look good enough to leave in the array for now? Should I do a long SMART test and post the results?
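As a rough sanity check on the rebuild times mentioned in this thread, the arithmetic can be sketched as below. It assumes a decimal 3 TB drive (3e12 bytes) and a constant transfer rate, which a real rebuild will not have; the function names are made up for illustration.

```python
def rebuild_hours(capacity_bytes: float, rate_bytes_per_s: float) -> float:
    """Hours needed to read/write the whole drive at a constant rate."""
    return capacity_bytes / rate_bytes_per_s / 3600


def average_rate_mb_s(capacity_bytes: float, hours: float) -> float:
    """Average rate (decimal MB/s) implied by finishing in `hours`."""
    return capacity_bytes / (hours * 3600) / 1e6


CAPACITY = 3e12  # assumption: "3TB" means 3 * 10**12 bytes

print(round(rebuild_hours(CAPACITY, 30e6), 1))     # 27.8 hours at 30 MB/s
print(round(average_rate_mb_s(CAPACITY, 33), 1))   # 25.3 MB/s for a 33-hour estimate
print(round(average_rate_mb_s(CAPACITY, 13), 1))   # 64.1 MB/s for the 13-hour finish
```

Under those assumptions, a 13-hour finish implies the rebuild averaged roughly twice the 30 MB/sec seen early on.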
December 2, 2016 The short SMART self-test only checks basic functions. The extended self-test takes much longer and is much more thorough.
December 2, 2016 Author Long SMART test: http://hastebin.com/iwijaxobuq.sql But it looks really similar to the short SMART test from above. Hmm?
Num  Test_Description  Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline  Completed without error  00%        2342             -
# 2  Short offline     Completed without error  00%        2324             -
# 3  Short offline     Completed without error  00%        2306             -
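For anyone who wants to check a self-test log like the one quoted above programmatically, here is a minimal sketch. It is not part of any SMART tool; the regular expression assumes two-word test descriptions (such as "Short offline") and the column order shown in this post, and the function names are hypothetical.

```python
import re

# Self-test log rows as quoted in the post above.
LOG = """\
# 1 Extended offline Completed without error 00% 2342 -
# 2 Short offline Completed without error 00% 2324 -
# 3 Short offline Completed without error 00% 2306 -
"""

# Assumption: the test description is exactly two words.
ROW = re.compile(
    r"^#\s*(?P<num>\d+)\s+(?P<test>\w+ \w+)\s+(?P<status>.+?)\s+"
    r"(?P<remaining>\d+)%\s+(?P<hours>\d+)\s+(?P<lba>\S+)\s*$"
)


def parse_selftest_log(text):
    """Return one dict per recognised log row; other lines are skipped."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            row = match.groupdict()
            row["num"] = int(row["num"])
            row["remaining"] = int(row["remaining"])
            row["hours"] = int(row["hours"])
            rows.append(row)
    return rows


def all_passed(rows):
    """True if every row completed without error and reports no failing LBA."""
    return bool(rows) and all(
        r["status"] == "Completed without error" and r["lba"] == "-" for r in rows
    )


rows = parse_selftest_log(LOG)
print(len(rows), all_passed(rows))  # 3 True
```

A clean log only says the tests passed; it does not explain why the drive was disabled in the first place.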
December 2, 2016 "... and am copying contents off /mnt/disk11 to an external drive. Until I can get this sorted out!" I interpret this to mean you do NOT have backups -- right? Without re-hashing why that's a bad idea, I will note that if you're not going to keep your data backed up, you should, at a minimum, upgrade the system to dual parity.
December 2, 2016 The short self-test just confirms that the electronics are essentially working. The extended self-test checks that every sector can be read. The way the results are presented is similar, but the tests are very different. The former takes a few minutes; the latter several hours.
December 3, 2016 Author @gary I did a little poking around, but wasn't able to find a succinct explanation of dual parity. I guess it means having two mirrored parity drives in case the parity drive fails? Or does a dual parity drive do even more? My array is mostly 3TB drives, and I have another two 3TB HGST drives coming. I was planning on preclearing both to xfs, deploying one right away to replace the possibly failing drive (reports above), and keeping the second as a standby. Any thoughts there?
December 3, 2016 Community Expert "@gary I did a little poking around, but wasn't able to find a succinct explanation of dual parity. I guess it means having two mirrored parity drives in case the parity drive fails? Or does a dual parity drive do even more? My array is mostly 3TB drives, and I have another two 3TB HGST drives coming. I was planning on preclearing both to xfs, deploying one right away to replace the possibly failing drive (reports above), and keeping the second as a standby. Any thoughts there?" With dual parity, the second parity drive uses a different algorithm from the first (i.e., it is not simply a mirror). With that in place, unRAID can recover from any two drives failing at the same time, regardless of whether they are data or parity drives.
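To illustrate how a second, non-mirrored parity can recover two lost drives, here is a toy sketch of the RAID-6-style "P+Q" scheme over GF(2^8). The thread does not say which algorithm unRAID actually uses, so treat the scheme, the field polynomial 0x11D, the generator 2, and every function name as assumptions for illustration. Each "drive" holds a single byte here.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) using the polynomial 0x11D (assumed)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result


def gf_pow(a: int, n: int) -> int:
    """Raise a to the n-th power by repeated multiplication."""
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result


def gf_inv(a: int) -> int:
    """Multiplicative inverse: a**254 equals a**-1 in GF(2^8)."""
    return gf_pow(a, 254)


def compute_pq(data):
    """P is the plain XOR; Q weights drive i by 2**i in the field."""
    p = q = 0
    for i, d in enumerate(data):
        p ^= d
        q ^= gf_mul(gf_pow(2, i), d)
    return p, q


def recover_two(data, x, y, p, q):
    """Recover the bytes of lost data drives x and y (entries may be None)."""
    pxy, qxy = p, q
    for i, d in enumerate(data):
        if i in (x, y):
            continue
        pxy ^= d                       # leaves d_x ^ d_y
        qxy ^= gf_mul(gf_pow(2, i), d)  # leaves g^x*d_x ^ g^y*d_y
    gx, gy = gf_pow(2, x), gf_pow(2, y)
    dx = gf_mul(qxy ^ gf_mul(gy, pxy), gf_inv(gx ^ gy))
    dy = pxy ^ dx
    return dx, dy


original = [0x12, 0x34, 0x56, 0x78]
p, q = compute_pq(original)
damaged = [0x12, None, 0x56, None]  # drives 1 and 3 lost at the same time
print(recover_two(damaged, 1, 3, p, q) == (0x34, 0x78))  # True
```

Because Q weights each drive differently, P and Q together give two independent equations, which is what allows two unknowns to be solved for; a mirrored copy of P would only ever give one.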
December 3, 2016 The mathematics of the 2nd parity calculation are more complex than the simple longitudinal parity used for the first parity calculation (which can be - and is - done via simple XOR instructions). But the math is really irrelevant -- what matters is that the 2nd parity drive provides a 2nd layer of fault tolerance => i.e. ANY two drives can fail and you won't lose any data. That does NOT, of course, mean you should ignore a drive failure and wait for a 2nd failure before replacing your failed drives! What it means is that if a drive fails, and you get a replacement drive and do a rebuild, a 2nd drive failure during that rebuild won't cause a problem -- the rebuild will still complete successfully. The simple fact is that with the very large sizes of today's drives, the bit error rates are such that it's moderately likely that you could indeed have a bit failure during a rebuild. Another factor is that many folks buy their drives in "bunches" -- so it's likely you have several drives of the same age ... and when one starts failing, another may not be far behind. Being able to sustain two failures without data loss is a MAJOR improvement in the reliability of the system, as long as you don't "push it" and wait for the 2nd failure before doing anything about a failed drive. This is the reason major data centers no longer use RAID-5 => they use RAID-6 instead, which is dual fault tolerant. As I noted earlier, regardless of the degree of fault tolerance your RAID array has, RAID is NOT a substitute for having your data backed up. But that's another topic ...
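The "simple XOR" first parity mentioned above can be sketched in a few lines. This is a generic illustration of single XOR parity, not unRAID's code; the tiny byte strings stand in for whole drives and the function names are made up.

```python
from functools import reduce


def xor_parity(blocks):
    """XOR equal-length byte strings together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks, parity):
    """A lost block is the XOR of the parity and all surviving blocks."""
    return xor_parity(list(surviving_blocks) + [parity])


disks = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = xor_parity(disks)
print(parity == b"\xee\x22")                                   # True
print(rebuild_missing([disks[0], disks[2]], parity) == disks[1])  # True
```

With only this one parity there is a single equation per byte position, so only one missing drive can be solved for, which is why a second failure during a rebuild is fatal without a second parity.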
December 3, 2016 "... I was planning on preclearing both to xfs, deploying one right away to replace possible drive failing (reports above) and having the second for a standby. Any thoughts there?" There's no real reason to change previous disks to XFS ... especially if they're essentially static (i.e. full of media files and rarely written to or modified). As for a standby drive -- not a bad idea; BUT if you don't have dual parity, I'd upgrade to that first. You can always overnight a new drive when one fails; and with dual parity your array will still be protected while you're waiting for it. [Of course a spare "on the shelf" is even better, since you can start the rebuild process immediately.]