On Failed Drives, Failing Drives, and Parity.

October 10, 2021

I've been in a long process of moving my data storage from Drobo to unRAID (you can see most of the saga in my other topics; there aren't many). Until recently, I had the following:

8TB WDC_WD80EFAX-68KNBN0 (parity)
6TB WDC_WD60EFAX-68JH4N0 (data; I planned to also have this be parity when all was finished)
2TB WD2003FZEX (data)

The 6TB and the 2TB had all my data, and parity was valid. At that point, I also had 3x 3TB WD drives and 1x 3TB Seagate drive still operational in the Drobo, with all the data on them as well (that's where it all came from).

Things were working well, but I wanted dual parity drives, and I wanted to make use of all my 3TB drives from the Drobo. However, I only had 6 SATA ports on my motherboard, so I bought what I thought was a good 8-port SATA expansion card. I plugged all my Drobo drives into that card, and began the preclear process on them.

When preclear was done, I added them to the array, but one drive consistently seemed to get kicked out of the array... I would add it, but it would appear in Unassigned Devices when I started the array. Figuring this issue was due to drive failure rather than some sort of issue with the SATA expansion card, I didn't think much of it.

All seemed well with the remaining 3, however, so I used unBalance to scatter (in move mode, argh!) my data from the 6TB drive to all the other ones. That seemed to work well, but then more problems began to appear.

One drive (disk 5 in the array) seemed to intermittently error out. The data that had been moved to it was still available due to the parity drive most of the time, but at one point I noticed that a lot of files and folders were missing from the array, and they were all items that had been on that drive. Not sure why parity wasn't "picking up the slack" in that case.

This was when I posted the earlier diagnostics (in this thread), and was told that the "drives attached to [my SATA card] are continually resetting, probably severely impacting performance."

So that's when I shut everything down, and connected as many drives as I could directly to the motherboard. I had also added a 1TB SSD as a cache drive at an earlier point, so I then had what you see in the attached diagnostics:

8TB WDC_WD80EFAX-68KNBN0 (parity)
6TB WDC_WD60EFAX-68JH4N0 (data, but largely empty)
2TB WD2003FZEX (data)
3TB WDC_WD30EZRX-00D8PB0 (data)
3TB ST3000VN007 (data, but see below)
1TB Samsung SSD 860 EVO (cache)

One thing to note is that the drive that was intermittently erroring out (Disk 5) is physically disconnected for now as I ran out of ports.

Now, another drive, the Seagate, has begun to report hundreds of thousands of read errors (currently 261,818), but without getting kicked from the array. The data on there is family photos going back about 2 decades, but it's also all backed up on Google Photos, so it's (ironically for family photos) not unrecoverable.

I'm currently waiting for another SATA card to arrive, hopefully this week. Then I'll be able to plug back in Disk 5, and hopefully it will be intact and working properly with a proper SATA card.

My questions for all you wizards out there are:

Am I doing things "right", or is there anything else I should be doing now?
Exactly what should I do once the replacement SATA card arrives? Is there a way to check if it's well-behaved before plugging drives into it?
Is there any way to "rebuild" the data that is on the missing Disk 5 from parity, so that it's "actually" stored on one of the other disks that isn't missing?

Thanks for any help!

cube-diagnostics-20211010-0843.zip

Edited October 10, 2021 by Sandwich
Added 3rd question at end

October 10, 2021

Community Expert

Replace cables on disk5, start array and post new diags.

October 10, 2021

Author

1 hour ago, JorgeB said:

Replace cables on disk5, start array and post new diags.

I don't have any free SATA ports at the moment. Should I wait until the SATA card arrives, or is it okay to, say, plug in Disk 5 instead of the cache drive temporarily?

October 10, 2021

Community Expert

Sorry, meant to say disk4.

October 11, 2021

Author

Attached.

cube-diagnostics-20211011-1144.zip

October 11, 2021

Community Expert

There are still constant ATA errors on disk4. If you replaced the cables, it's likely a disk problem. The disabled disk5 can't be correctly emulated because of those errors.
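
To illustrate why errors on disk4 break the emulation of disk5: single parity is a bitwise XOR across the data disks, so reconstructing one missing disk needs a clean read of parity and of every other data disk. The following is only a toy sketch of that idea. The 4-byte "disks", their contents, and the helper name `xor_blocks` are made up for illustration and are not how unRAID is implemented internally.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Toy 4-byte "disks"; the contents are invented for this example.
disks = {
    "disk3": b"\x10\x20\x30\x40",
    "disk4": b"\x0a\x0b\x0c\x0d",
    "disk5": b"\xde\xad\xbe\xef",
}
parity = xor_blocks(list(disks.values()))

# Emulating the missing disk5 needs parity plus every other data disk.
emulated = xor_blocks([parity, disks["disk3"], disks["disk4"]])
print(emulated == disks["disk5"])   # True

# If disk4 returns a wrong byte, the emulated disk5 is wrong at that position.
bad_disk4 = b"\x0a\xff\x0c\x0d"
corrupted = xor_blocks([parity, disks["disk3"], bad_disk4])
print(corrupted == disks["disk5"])  # False
```

In the toy case only the byte that disk4 misreports is wrong in the emulated disk5; everything read cleanly still reconstructs correctly.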

October 11, 2021

Author

Ok, thanks for taking the time to analyze things. What would happen if I replaced disk 4 with disk 5? The parity would then be emulating disk 4, and 5 would hopefully be available regularly, right?

October 11, 2021

Community Expert

41 minutes ago, Sandwich said:

What would happen if I replaced disk 4 with disk 5? The parity would then be emulating disk 4, and 5 would hopefully be available regularly, right?

No, because disk5 is already disabled. Just connecting it won't enable it; it would need to be rebuilt, but since disk4 has issues that's not possible.

October 11, 2021

Author

Ok, so when I get the SATA card and can connect everything back up, what should I do and not do? How do I recover as much data as possible?

October 11, 2021

Community Expert

You can do a new config with disk5 to recover that, assuming the disk is OK. If disk4 is really failing, there's no easy way to recover it, but you can try ddrescue.

October 11, 2021

Author

3 hours ago, JorgeB said:

You can do a new config with disk5 to recover that, assuming the disk is OK

When you say "new config", do you mean to effectively recreate the current array config, with the addition of Disk 5? Or create a new, separate, second array (is that possible?) with Disk 5 as the only assigned device?

October 11, 2021

Community Expert

Current array with disk5 and without disk4.

October 18, 2021

Author

Ok, so I have the SATA card, and it seems to at the very least allow unRAID to recognize drives plugged into it. The other card did that as well, so I'm not sure what that's worth.

In any case, I have a crucial question: I have TWO disks that were excluded from the array when I had to stop using the previous SATA card, and I'm not sure which one was "disk5". The only way I have to reliably identify them is by their manufacturer ID (e.g. WD30EZRX-00DC0B0). Can you tell me which was disk5? If it helps, I'm attaching an earlier diagnostic from Sep 26th; I think I might have had all the disks attached at that point.

Alternately, if there's no way to tell what the manufacturer ID was for disk5, is there a way to browse the files on the disk without having to add it to the array? I'm pretty sure disk5 had lots of data, and disk6(?) was empty.

cube-diagnostics-20210926-2115.zip

Edited October 18, 2021 by Sandwich
Typo

October 18, 2021

Community Expert

This was disk5 in the diags posted:

Sep 25 22:56:48 Cube kernel: md: import disk5: (sdh) WDC_WD30EZRX-00AZ6B0_WD-WMC070040204 size: 2930266532
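
For anyone who wants to pull the same slot-to-disk mapping out of a syslog themselves, here is a minimal Python sketch. It assumes every assignment line has the same shape as the single line quoted above; the function name `slot_map` and the regular expression are illustrative and not part of unRAID.

```python
import re

# Pattern modelled on the one "md: import diskN" line quoted in this thread.
MD_IMPORT = re.compile(
    r"md: import disk(?P<slot>\d+): \((?P<dev>\w+)\) (?P<ident>\S+) size: (?P<size>\d+)"
)

def slot_map(syslog_text):
    """Return {slot number: (device name, model_serial)} for each import line found."""
    result = {}
    for line in syslog_text.splitlines():
        match = MD_IMPORT.search(line)
        if match:
            result[int(match["slot"])] = (match["dev"], match["ident"])
    return result

sample = (
    "Sep 25 22:56:48 Cube kernel: md: import disk5: (sdh) "
    "WDC_WD30EZRX-00AZ6B0_WD-WMC070040204 size: 2930266532"
)
print(slot_map(sample))
# {5: ('sdh', 'WDC_WD30EZRX-00AZ6B0_WD-WMC070040204')}
```

The model-and-serial string is the part to match against the label on the physical drive, since device names like `sdh` can change between boots.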

October 18, 2021

Author

Ahh, great, thanks. So I clicked to add that disk to the array as disk5, but the page seemed to refresh and now the disk has disappeared (see attachment—I swear it was there a moment ago!).

Fresh diagnostics posted.

cube-diagnostics-20211018-2029.zip

October 18, 2021

Community Expert

Oct 18 19:19:02 Cube kernel: ata10: SATA link down (SStatus 0 SControl 300)
Oct 18 19:19:02 Cube kernel: ata10.00: disabled

Disk dropped offline, try replacing cables.

October 19, 2021

Author

I've replaced cables, and now all drives seem to be getting recognized consistently. So I started the array and it began a parity rebuild, which finished "successfully". During that process, two drives reported nearly-identical numbers of read errors: WDC_WD30EZRX-00D8PB0_WD-WMC4N1244814 - 3 TB (sdf) with 732,487,239 errors (disk 3), and ST3000VN007-2E4166_Z730JMM1 - 3 TB (sdg) with 732,562,482 errors (disk 4).

Additionally, I do appear to have lost all the data on disks 3, 4, and 5, despite them still showing as partly/mostly full.

Finally, when I try to browse the filesystems of 3, 4, or 5, it just says "No listing: Too many files".

What's going on?

Fresh diags attached.

cube-diagnostics-20211019-1139.zip

Edited October 19, 2021 by Sandwich
disk clarification

October 19, 2021

Community Expert

You're still having multiple ATA errors on multiple disks:

Oct 18 22:09:43 Cube kernel: ata10.00: status: { DRDY }
Oct 18 22:09:43 Cube kernel: ata10.00: failed command: WRITE FPDMA QUEUED
Oct 18 22:09:43 Cube kernel: ata10.00: cmd 61/40:88:08:58:00/05:00:00:00:00/40 tag 17 ncq dma 688128 out
Oct 18 22:09:43 Cube kernel:         res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Oct 18 22:09:43 Cube kernel: ata10.00: status: { DRDY }
Oct 18 22:09:43 Cube kernel: ata10.00: failed command: WRITE FPDMA QUEUED
Oct 18 22:09:43 Cube kernel: ata10.00: cmd 61/f8:90:48:5d:00/04:00:00:00:00/40 tag 18 ncq dma 651264 out
Oct 18 22:09:43 Cube kernel:         res 40/00:01:01:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Oct 18 22:09:43 Cube kernel: ata10.00: status: { DRDY }
Oct 18 22:09:43 Cube kernel: ata10: hard resetting link
Oct 18 22:09:43 Cube ntpd[1794]: kernel reports TIME_ERROR: 0x41: Clock Unsynchronized
Oct 18 22:09:48 Cube kernel: ata10: link is slow to respond, please be patient (ready=0)
Oct 18 22:09:53 Cube kernel: ata10: COMRESET failed (errno=-16)
Oct 18 22:09:53 Cube kernel: ata10: hard resetting link
Oct 18 22:09:53 Cube kernel: ata10: SATA link up 6.0 Gbps (SStatus 133 SControl 310)
Oct 18 22:09:53 Cube kernel: ata10.00: configured for UDMA/133
Oct 18 22:09:53 Cube kernel: ata10: EH complete
Oct 18 22:10:11 Cube kernel: ata8.00: READ LOG DMA EXT failed, trying PIO
Oct 18 22:10:11 Cube kernel: ata8: failed to read log page 10h (errno=-5)
Oct 18 22:10:11 Cube kernel: ata8.00: exception Emask 0x1 SAct 0xc02000 SErr 0x0 action 0x0
Oct 18 22:10:11 Cube kernel: ata8.00: irq_stat 0x40000001
Oct 18 22:10:11 Cube kernel: ata8.00: failed command: READ FPDMA QUEUED
Oct 18 22:10:11 Cube kernel: ata8.00: cmd 60/20:68:00:ee:4a/00:00:0a:00:00/40 tag 13 ncq dma 16384 in
Oct 18 22:10:11 Cube kernel:         res 51/04:b8:80:87:00/00:00:00:00:00/40 Emask

These are usually a power/connection problem, and the data should come back after a reboot.
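
A rough way to see which ports are affected is to tally these kernel messages per ATA port. The sketch below is only an illustration: the function name `ata_event_counts` and the list of message types are my own choice, taken from the lines quoted in this thread, and the result is keyed by port (`ata10`, `ata8`), not by disk, so matching a port to a drive still means reading the rest of the syslog.

```python
import re
from collections import Counter

# Message types taken from the excerpts in this thread; the list is not exhaustive.
ATA_EVENT = re.compile(
    r"kernel: (ata\d+)(?:\.\d+)?: "
    r"(failed command|hard resetting link|COMRESET failed|SATA link down)"
)

def ata_event_counts(syslog_text):
    """Count matching kernel messages per (port, event type)."""
    counts = Counter()
    for line in syslog_text.splitlines():
        match = ATA_EVENT.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts

sample = """\
Oct 18 22:09:43 Cube kernel: ata10.00: failed command: WRITE FPDMA QUEUED
Oct 18 22:09:43 Cube kernel: ata10: hard resetting link
Oct 18 22:09:53 Cube kernel: ata10: COMRESET failed (errno=-16)
Oct 18 22:09:53 Cube kernel: ata10: hard resetting link
Oct 18 22:10:11 Cube kernel: ata8.00: failed command: READ FPDMA QUEUED
"""
for (port, event), n in sorted(ata_event_counts(sample).items()):
    print(port, event, n)
```

If the counts follow a port when drives and cables are swapped, that points at the controller or cabling; if they follow a drive, that points at the disk.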

October 19, 2021

Author

Is there any indication if this is possibly a problem with the SATA card (again)?

October 19, 2021

Community Expert

Could be, you can try swapping drives and cables with the onboard SATA.

October 19, 2021

Author

I had rebooted the system to see if that brought the data back, but now none of those three drives were detected. -.-

Gonna see if they come back after plugging into the onboard SATA.

October 19, 2021

Author

Well, I suspect disk 5 has fully bit the bullet. I hear those telltale electronic clicks coming from its drive bay, and now unRAID reports it as unmountable, despite it being attached directly to the MB.

cube-diagnostics-20211019-2140.zip

October 20, 2021

Community Expert

If it's making weird noises it's likely the disk, assuming the power cable/slot was already replaced/swapped.

October 24, 2021

Author

Ok, so if I have a drive that's no longer being recognized and its contents are being emulated, what's the best way to copy or move those contents to some other drive, so that they no longer need to be emulated?

October 25, 2021

Community Expert

They will be emulated until you replace that drive. If you want to move the data, you can manually copy it to other disk(s) (or use the unBalance plugin), then do a new config and re-sync parity without that disk.
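
For the manual-copy route, a copy that re-reads and compares each file is worth the extra time when the source is an emulated disk. This is a minimal sketch, not a tested migration tool: the helper names `sha256_of` and `copy_and_verify` are hypothetical, and the mount points in the usage note are an assumption about how the disks appear on this system.

```python
import hashlib
import shutil
from pathlib import Path

def sha256_of(path):
    """Hash a file in 1 MiB chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def copy_and_verify(src_root, dst_root):
    """Copy every file under src_root into dst_root; return sources whose copy differs."""
    src_root, dst_root = Path(src_root), Path(dst_root)
    mismatched = []
    for src in sorted(src_root.rglob("*")):
        if not src.is_file():
            continue
        dst = dst_root / src.relative_to(src_root)
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also preserves timestamps
        if sha256_of(src) != sha256_of(dst):
            mismatched.append(src)
    return mismatched
```

Assuming the emulated disk is mounted at `/mnt/disk5` and the target at `/mnt/disk1`, a call would look like `copy_and_verify("/mnt/disk5", "/mnt/disk1")`; an empty list back means every copy matched its source. The source is left untouched, so nothing is lost if the copy is interrupted.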
