Drives failing on every parity check

July 1, 2018

Community Expert

Hello,

I've been toying with unRAID since April as a way to reduce some of the anxiety that came with storing all my data on my old WHS setup (all independent drives): every time a drive failed, it took FOREVER (exaggerating only slightly) to restore from backups. However, it seems I can't run a parity check without one of my drives dropping offline or one of my unassigned devices throwing SMART errors. The check runs, and so far, every time, another drive turns up with read errors.

So far since April, I've RMA'd 7 Toshiba drives (all in the 20k hour range, except 1 with less than 1000), which is kind of a pain: they issue a Visa "gift" card in exchange for the full original value of the drive (which is nice), but it takes 2 months to receive. Each of them was part of the array, and during a parity check began to have read errors and was dropped out of the array. Since they weren't showing any SMART errors at that point, I just reassigned them to backup duty for my WHS VM until the next parity check... Then, during the parity check (which they are not part of, as they're unassigned devices), they begin showing pending and reallocated sectors. So I pop them out and RMA them. Every time I've run the parity check, this happens.

This time, the parity check started as scheduled, and after a while one of my 4TB Seagates dropped out of the array with 770 read errors. It's a couple of years old, and my other 4 Seagate drives of the same vintage are all dead too, so I didn't think it was too odd. So I popped over to Best Buy, bought an 8TB MyBook, shucked it (WD Red score!) and installed it in the Seagate's spot. In my haste, I did not wait for the drive to initialize and mistakenly assigned one of the unassigned 8TB drives to the array, blowing away a couple TBs of backups. Whoops. Annoying, but they're only local machine backups, so no real loss. During the rebuild, another Toshiba drive dropped out of the array with 768 errors. This one is a few years old, but only around ~~10k~~ hours (*edit: 25k hours, I was looking at the wrong drive).

I have dual parity, and the rebuild is continuing. My question is: should I let it continue? Will it fully rebuild the first drive, allowing me to then replace the second? Why is this such a pain? Do I have unusually bad luck? Are Toshiba drives really THIS bad (unfortunately, I still have 5 in the array)? I will say that the drives that have "failed" all allowed me to recover almost all files, save for 1 here and there.

Here's my setup:

Gigabyte Z370XP mobo

Intel i7-8700

G.Skill Ripjaws 32GB DDR4 (8GB*4)

Thermaltake Toughpower Grand RGB 850W 80+ Gold

2x LSI 9207-8i (previously used 2x M1115 in IT mode, same problems)

1x Intel RES2SV240 (dual linked to one 9207 above)

Norco 4224 (latest revision)

I'll note that after each of these mishaps, I've checked and re-checked the cables, replaced them, and replaced the controllers. I don't think it's a backplane, as the drives causing issues have been in different slots on different backplanes, and it's a new slot each time.

I've attached the diagnostics from the first parity failure and the second; I hope someone can help shed some light on this. I'm spending more time watching parity rebuilds than watching the movies stored on it...

Thanks!

urserver-diagnostics-20180630-1934.zip

urserver-diagnostics-20180701-0808.zip

Edited July 1, 2018 by Michael_P

July 1, 2018

Author
Community Expert

And now disk 9 is showing 8 pending sectors.... WD Red less than a month old.

July 1, 2018

Author
Community Expert

Nobody knows if it's OK to let it continue to rebuild?

Bueller?

July 1, 2018

7 hours ago, Michael_P said:

770 read errors

7 hours ago, Michael_P said:

768 errors

7 hours ago, Michael_P said:

Each of them were part of the array, and during a parity check began to have read errors and were dropped out of the array

7 hours ago, Michael_P said:

So far since April, I've RMA'd 7 Toshiba drives

What is your PSU? What is your case? Hot-swap bays? My first thought is that a ton of people have a million drives installed (as you do) and no hot-swap bays.

SATA connectors suck. Beyond suck... Locking connectors (if the drive actually supports them, which many no longer do) don't help. Truth be told, the slightest corrosion and/or breathing on the connector affects its connection. Since it pretty much appears that every time you replace one drive you shortly have to replace another, this *implies* that, in the absence of hot-swap bays, you're disturbing the cabling and/or power to the other drives (btw, power splitters as a general rule are also all terrible quality).

7 hours ago, Michael_P said:

unassigned devices throwing SMART errors

What errors?

7 hours ago, Michael_P said:

dropped out of the array with 770 read errors

Read errors by themselves never cause a drive to drop out. What happens in the case of a read error is that unRAID (rightfully) attempts to rewrite the offending sector with the value calculated from parity and the other drives. If that write fails, then the drive gets dropped.
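To make that mechanism concrete, here is a toy Python model (not unRAID's actual code) of single-parity read repair: parity is the XOR of the data disks, so a block that fails to read can be recomputed from the survivors and written back, and only a failed write disables the disk. The `Disk` class and `read_with_repair` function are hypothetical, and the model ignores parity2 for simplicity.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks column by column (single-parity arithmetic)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


class Disk:
    """Hypothetical in-memory disk: a list of sector blocks plus injectable faults."""

    def __init__(self, sectors):
        self.sectors = list(sectors)
        self.bad_reads = set()   # sector numbers that fail to read
        self.bad_writes = set()  # sector numbers that fail to write
        self.disabled = False    # True once the disk is dropped from the array

    def read(self, n):
        if n in self.bad_reads:
            raise OSError(f"read error at sector {n}")
        return self.sectors[n]

    def write(self, n, data):
        if n in self.bad_writes:
            raise OSError(f"write error at sector {n}")
        self.sectors[n] = data
        # A successful rewrite clears the pending sector (rewritten in place or
        # remapped by the drive), so later reads of it succeed.
        self.bad_reads.discard(n)


def read_with_repair(disks, parity, idx, n):
    """Read sector n of disks[idx]. On a read error, rebuild the block from parity
    and the other data disks, then rewrite it; only a failed rewrite disables the disk."""
    disk = disks[idx]
    try:
        return disk.read(n)
    except OSError:
        survivors = [d.read(n) for i, d in enumerate(disks) if i != idx]
        rebuilt = xor_blocks(survivors + [parity.read(n)])
        try:
            disk.write(n, rebuilt)
        except OSError:
            disk.disabled = True  # the rewrite failed, so the drive is dropped
        return rebuilt
```

In this model a drive with a few unreadable sectors stays in the array as long as the rewrite succeeds; it is the failed write that disables it.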

Actual drive failures are exceedingly rare (although you do have a ton of WDs, which I consider garbage compared to Seagate). By and large, the majority of "failures" are cabling or power issues, and unless you're living on top of an old Indian gravesite, I'd say most if not all of your returns have been good drives.

5 hours ago, Michael_P said:

And now disk 9 is showing 8 pending sectors

This is indeed a sign of a problem (or potential problem) drive. What were all the other drives showing on the SMART reports for them?

July 1, 2018

Author
Community Expert

Hi!

The case and power supply are noted above. I started in a regular PC case with external enclosures, but moved to the Norco after having problems completing the initial parity build.

All cabling now is mini-SAS, and I've replaced them all too. And the controllers.

The SMART errors have all been pending or uncorrectable sectors.

I have so many WDs because the Toshibas have all gone to sh@t.

July 2, 2018

Community Expert

13 hours ago, Michael_P said:

Nobody knows if it's OK to let it continue to rebuild?

Since you have dual parity it's OK, parity2 will be used to provide the correct data.
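Why dual parity can cover two missing data disks: plain XOR parity gives one equation per byte, and a second, independently weighted syndrome gives a second one. Below is a sketch of the well-known RAID-6 P+Q construction (as used by the Linux md driver), working on one byte per disk; it is an assumption that unRAID's parity2 works equivalently, and the function names are hypothetical.

```python
RAID6_POLY = 0x11D  # x^8 + x^4 + x^3 + x^2 + 1, the field polynomial used by Linux md RAID-6


def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) (shift-and-add, reducing by RAID6_POLY)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= RAID6_POLY
        b >>= 1
    return result


def gf_pow(a, n):
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result


def gf_inv(a):
    # The nonzero field elements form a group of order 255, so a^254 == a^-1.
    return gf_pow(a, 254)


def pq_parity(stripe):
    """P is the plain XOR of the data bytes; Q weights data disk i by g^i with g = 2."""
    p = q = 0
    for i, d in enumerate(stripe):
        p ^= d
        q ^= gf_mul(gf_pow(2, i), d)
    return p, q


def recover_two(stripe, x, y, p, q):
    """Rebuild the bytes of two missing data disks x != y from the survivors, P and Q."""
    pxy, qxy = p, q
    for i, d in enumerate(stripe):
        if i not in (x, y):
            pxy ^= d
            qxy ^= gf_mul(gf_pow(2, i), d)
    # Now pxy == Dx ^ Dy and qxy == g^x*Dx ^ g^y*Dy; solve the 2x2 system.
    gx, gy = gf_pow(2, x), gf_pow(2, y)
    dx = gf_mul(qxy ^ gf_mul(gy, pxy), gf_inv(gx ^ gy))
    dy = pxy ^ dx
    return dx, dy
```

A real implementation applies the same math to every byte of every stripe; the missing entries in `stripe` are never read, so they can be `None`.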

July 2, 2018

Author
Community Expert

6 hours ago, johnnie.black said:

Since you have dual parity it's OK, parity2 will be used to provide the correct data.

Yes, thanks. It completed the rebuild of the Seagate disk. I purchased yet another WD 8TB drive (I'm getting pretty adept at shucking these things) and began a rebuild of both suspect drives: the red-balled 5TB and the WD 8TB drive with SMART errors. Hopefully it will complete in the next few hours.

I am beginning to suspect that these Toshiba drives may be more susceptible to vibration than the others; being in a storage shelf probably isn't within their capability. The failure pattern has been the same for them:

Initial parity build and 1 drive was showing millions of read errors
- Manually move data off and remove drive from array
Attempt to build parity again; another drive from the same serial # series (I bought 4 at once) shows millions of read errors
- Purchased 2 more WD 8TB drives
- Manually move data off and remove drive from the array
- Run pre-clear on both drives that showed millions of read errors: both passed, no SMART errors
- Move 1 drive to my WHS VM; the other is removed as a cold spare
Update firmware for the controllers (probably what was causing the read errors)
Parity build completes, parity checks scheduled for first of every month
2 weeks later, scheduled parity check starts - parity drive shows read errors (1 month old Toshiba)
- Purchase ANOTHER 8TB WD to take its place as parity
- Preclear the 8TB Toshiba; it passes
Parity rebuild begins; the Toshiba drive assigned to the VM starts to show pending sectors: first 8, then 16, then rapidly up to 72. Drive is removed and the cold spare takes its place
- RMA process started on this drive
Parity build begins again; a second Toshiba drive begins to show pending sectors, 16 and then 32. Drive is removed and the 8TB Toshiba takes its place.
- Drive is added to RMA
Parity build completes
Days, maybe a week later, another Toshiba drive from a different batch shows pending sectors
- Purchase another 8TB WD
- Drive is removed and added to the other RMA - all 3 are returned
Rebuild of parity completes
Purchased another 8TB WD to add as second parity
8TB Toshiba in VM begins to report pending sectors (just outside of return window, rats.)
- Begin RMA process

At some point, another Toshiba 5TB drive fell out of the array and was added to the WHS VM, I don't remember exactly when.

The scheduled parity check begins. I just happen to be next to the server as it starts to run and hear an intermittent drive noise. If you've heard a Seagate begin to fail, then you know what I mean: almost like a scratching record. So I KNOW that a drive is going to fail.
- Seagate drive falls out of the array
- A short time later, the Toshiba drive attached to the VM begins reporting pending sectors
- Purchase another 8TB WD, screw up adding it to the array and wipe one attached to my VM for local machine backups.
- Add the correct one to the array, rebuild of the Seagate drive begins.
- At some point during the rebuild, another Toshiba drive falls out of the array - rebuild continues
- During the re-build, one of the 8TB WDs begins to show pending sectors, 16, but the count does not increase and the re-build completes successfully (?)
- Purchase ANOTHER 8TB WD, pull both the dropped Toshiba and the 8TB WD with the pending sectors, add in the 8TB WD I wiped by mistake and the newly purchased drive, and begin the rebuild
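The two pending-sector patterns in this timeline (the Toshiba climbing 8, 16, 72 versus the WD appearing at 16 and holding) can be written down as a simple rule of thumb. This is a hypothetical heuristic, not an unRAID or drive-manufacturer rule; it labels one drive's Current_Pending_Sector readings, for example collected from `smartctl -A` after each parity check.

```python
def pending_trend(readings):
    """Label successive Current_Pending_Sector raw values for one drive, oldest first.

    Hypothetical heuristic: "growing" for a count that has climbed since it first
    appeared (like 8, 16, 72), "stable" for one that appeared and has not risen
    (like 16, 16), "cleared" if it has since dropped back to zero.
    """
    nonzero = [r for r in readings if r > 0]
    if not nonzero:
        return "clean"
    if readings[-1] == 0:
        return "cleared"  # the sectors were later rewritten or remapped
    if readings[-1] > nonzero[0]:
        return "growing"  # more sectors went pending since the first sighting
    return "stable"       # no higher than when first seen
```

Comparing against the first nonzero reading, rather than the first reading overall, keeps a drive that jumped from 0 to 16 once and then held steady out of the "growing" bucket.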

Also, in the middle there somewhere I purchased 2 9207s to increase bandwidth.

The Seagate was more or less expected; 3 of its brothers met the same fate, with data loss each time. That's a big part of my decision to move to unRAID: re-ripping movies is a PITA. All of my major data (pictures, documents and such) is backed up offsite, so this is more frustrating than anything.

All of the Toshibas followed the same pattern: read errors, followed shortly after by SMART failures. They all saw very light duty over their lifetime, mostly sitting idle, until getting stressed by the parity checks. It's certainly possible that the parity checks revealed latent defects in marginal drives, but having so many go down in a short period of time certainly raises my eyebrow.

So my theory is vibration: during a parity check, when all the drives are being accessed at once in close proximity, the vibration is simply too much for these Toshibas to bear.

Other pertinent, or not, information:

These were almost all their "High Reliability" NAS drives with the exception of 2 desktop performance drives I bought by mistake.

I do have one question: why in the world would they begin to show pending sectors while a parity check is running, if they were attached to the VM and not part of the pool? My WHS VM is very light duty, only daily PC backups for my Windows machines. Strange.

Thoughts?

Edited July 2, 2018 by Michael_P

July 2, 2018

Community Expert

That's a lot of failures...

I've been having good luck with Toshiba N300 drives (and X300 desktop drives). I've used them to replace WD Green/Blue drives that were failing a lot, I suspect from vibration, so I bought the NAS drives that are supposed to handle it better. Though they are specified for NAS units of up to 8 drives, still no issues so far, but they are all relatively young.

July 2, 2018

Author
Community Expert

9 minutes ago, johnnie.black said:

That's a lot of failures...

I've been having good luck with Toshiba N300 drives (and X300 desktop drives). I've used them to replace WD Green/Blue drives that were failing a lot, I suspect from vibration, so I bought the NAS drives that are supposed to handle it better. Though they are specified for NAS units of up to 8 drives, still no issues so far, but they are all relatively young.

Yes, I replaced all my Greens with the Toshiba N300s and X300s, and one random MD04ACA500 (which was the latest casualty). 12 in all, over a period of 3 years. Only 4 remain in service.

The good thing, so far, is that they fail gracefully. I think I've only had 1 or 2 files unable to move from them due to errors.

WD is usually good for me too, in that regard. But I've NEVER had a Seagate fail gracefully: it just stops working, or becomes so unreadable that files take too long to move off the drive, which was normally how I discovered a problem (files no longer accessible). And it only triggers SMART warnings when it happens, so there's no advance notice (I am partially to blame too, for not running enough scans proactively).

The biggest issue with the Toshiba drives, besides the time wasted, is their RMA process is too slow. I have to ship the drives back to them, at my expense, and then they issue a gift card in compensation two months later. So in the meantime, I have to buy another drive to replace it. 7 so far... Gets expensive.
