Sandwich posted October 10, 2021

I've been in a long process of moving my data storage from Drobo to unRAID (you can see most of the saga in my other topics; there aren't many). Until recently, I had the following:

- 8 TB WDC_WD80EFAX-68KNBN0 (parity)
- 6 TB WDC_WD60EFAX-68JH4N0 (data; I planned to also make this a parity drive when all was finished)
- 2 TB WD2003FZEX (data)

The 6 TB and the 2 TB drives had all my data, and parity was valid. At that point, I also had 3x 3 TB WD drives and 1x 3 TB Seagate drive still operational in the Drobo, with all the data on them as well (that's where it all came from).

Things were working well, but I wanted dual parity drives, and I wanted to make use of all my 3 TB drives from the Drobo. However, I only had 6 SATA ports on my motherboard, so I bought what I thought was a good 8-port SATA expansion card. I plugged all my Drobo drives into that card and began the preclear process on them. When preclear was done, I added them to the array, but one drive consistently seemed to get kicked out: I would add it, but it would appear in Unassigned Devices when I started the array. Figuring this was due to drive failure rather than some issue with the SATA expansion card, I didn't think much of it.

All seemed well with the remaining 3, however, so I used unBalance to scatter (in move mode, argh!) my data from the 6 TB drive to all the other ones. That seemed to work well, but then more problems began to appear. One drive (disk 5 in the array) seemed to intermittently error out. Most of the time the data that had been moved to it was still available thanks to the parity drive, but at one point I noticed that a lot of files and folders were missing from the array, and they were all items that had been on that drive. I'm not sure why parity wasn't "picking up the slack" in that case.
At this point I posted the earlier diagnostics (in this thread), and was told that the "drives attached to [my SATA card] are continually resetting, probably severely impacting performance." So I shut everything down and connected as many drives as I could directly to the motherboard. I had also added a 1 TB SSD as a cache drive at an earlier point, so I then had what you see in the attached diagnostics:

- 8 TB WDC_WD80EFAX-68KNBN0 (parity)
- 6 TB WDC_WD60EFAX-68JH4N0 (data, but largely empty)
- 2 TB WD2003FZEX (data)
- 3 TB WDC_WD30EZRX-00D8PB0 (data)
- 3 TB ST3000VN007 (data, but see below)
- 1 TB Samsung SSD 860 EVO (cache)

One thing to note is that the drive that was intermittently erroring out (disk 5) is physically disconnected for now, as I ran out of ports. Now another drive, the Seagate, has begun to report hundreds of thousands of read errors (currently 261,818), but without getting kicked from the array. The data on it is family photos going back about two decades, but it's all backed up on Google Photos, so it's (ironically, for family photos) not unrecoverable.

I'm currently waiting for another SATA card to arrive, hopefully this week. Then I'll be able to plug disk 5 back in, and hopefully it will be intact and working properly with a proper SATA card.

My questions for all you wizards out there are:

1. Am I doing things "right", or is there anything else I should be doing now?
2. Exactly what should I do once the replacement SATA card arrives? Is there a way to check if it's well-behaved before plugging drives into it?
3. Is there any way to "rebuild" the data that is on the missing disk 5 from parity, so that it's "actually" stored on one of the other disks that isn't missing?

Thanks for any help!

Attachment: cube-diagnostics-20211010-0843.zip
Edited October 10, 2021 by Sandwich (added 3rd question at end)
JorgeB posted October 10, 2021

Replace cables on disk5, start the array and post new diags.
Sandwich posted October 10, 2021

JorgeB said: "Replace cables on disk5, start array and post new diags."

I don't have any free SATA ports at the moment. Should I wait until the SATA card arrives, or is it okay to, say, plug in disk 5 in place of the cache drive temporarily?
JorgeB posted October 10, 2021

Sorry, meant to say disk4.
Sandwich posted October 11, 2021

Attached: cube-diagnostics-20211011-1144.zip
JorgeB posted October 11, 2021

There are still constant ATA errors on disk4. If you replaced the cables, it's likely a disk problem. Disabled disk5 can't be correctly emulated because of those errors.
Sandwich posted October 11, 2021

Ok, thanks for taking the time to analyze things. What would happen if I replaced disk 4 with disk 5? Parity would then be emulating disk 4, and disk 5 would hopefully be available normally, right?
JorgeB posted October 11, 2021

Sandwich said: "What would happen if I replaced disk 4 with disk 5? The parity would then be emulating disk 4, and 5 would hopefully be available regularly, right?"

No, because disk5 is already disabled; just connecting it won't re-enable it. It would need to be rebuilt, but since disk4 has issues, that's not possible.
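To illustrate why a second misbehaving disk blocks the rebuild: with single parity, unRAID's parity disk stores the bitwise XOR of the corresponding bytes on every data disk, so a missing disk's block equals parity XOR the same block on all the other data disks. The toy sketch below uses hypothetical function names and tiny byte strings standing in for disks (it is not unRAID's md driver code) and models an unreadable sector on disk4 as `None`: once one surviving block is unknown, the XOR has two unknowns and disk5's block cannot be recovered.

```python
from functools import reduce
from typing import Optional, Sequence


def xor_blocks(blocks: Sequence[bytes]) -> bytes:
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def emulate_missing(parity: bytes, surviving: Sequence[Optional[bytes]]) -> Optional[bytes]:
    """Reconstruct a missing disk's block from parity plus every other data disk.

    None stands for a block a disk failed to read (e.g. an ATA read error).
    """
    if any(block is None for block in surviving):
        return None  # one unknown term makes the XOR unsolvable
    return xor_blocks([parity, *surviving])


# Toy "disks": one 4-byte block each (hypothetical contents).
disk1 = bytes([0x01, 0x02, 0x03, 0x04])
disk4 = bytes([0x10, 0x20, 0x30, 0x40])
disk5 = bytes([0xAA, 0xBB, 0xCC, 0xDD])
parity = xor_blocks([disk1, disk4, disk5])

# disk5 missing, all other disks read cleanly: emulation works.
print(emulate_missing(parity, [disk1, disk4]) == disk5)
# disk5 missing and disk4 unreadable: nothing to reconstruct from.
print(emulate_missing(parity, [disk1, None]))
```

Dual parity adds a second, differently computed syndrome that can cover two missing disks at once, but this thread's array had only the single XOR parity disk.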
Sandwich posted October 11, 2021

Ok, so when I get the SATA card and can connect everything back up, what should I do and not do? How do I recover as much data as possible?
JorgeB posted October 11, 2021

You can do a new config with disk5 to recover its data, assuming the disk is OK. If disk4 is really failing, there's no easy way to recover it; you can try ddrescue.
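Trying ddrescue means cloning the failing disk sector by sector onto a healthy one, then working from the copy. A minimal sketch, assuming GNU ddrescue is installed and using hypothetical device names (`/dev/sdX` for the failing disk4, `/dev/sdY` for a blank replacement at least as large) and a hypothetical mapfile path. It only builds and prints the commands, because the target device gets overwritten and the device names must be checked by hand against each drive's serial number first.

```python
import shlex


def ddrescue_commands(source, target, mapfile, retry_passes=3):
    """Return two GNU ddrescue invocations that share one mapfile.

    Pass 1 copies whatever it can read and records unreadable areas in the
    mapfile. Pass 2 resumes from that mapfile and retries only the areas
    still marked bad, up to `retry_passes` times. --force is needed because
    ddrescue refuses to write to a block device without it.
    """
    first = ["ddrescue", "--force", source, target, mapfile]
    second = ["ddrescue", "--force", f"--retry-passes={retry_passes}",
              source, target, mapfile]
    return [first, second]


# Hypothetical devices and mapfile location: verify against your own
# diagnostics before running anything, since /dev/sdY will be overwritten.
for cmd in ddrescue_commands("/dev/sdX", "/dev/sdY", "/mnt/cache/disk4.map"):
    print(shlex.join(cmd))
```

Keeping the mapfile is the point of the design: it lets the second run, or a later run after a reboot, pick up where the previous one stopped instead of rereading the whole failing disk.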
Sandwich posted October 11, 2021

JorgeB said: "You can do a new config with disk5 to recover that, assuming the disk is OK"

When you say "new config", do you mean effectively recreating the current array config with the addition of disk 5? Or creating a new, separate, second array (is that possible?) with disk 5 as the only assigned device?
JorgeB posted October 11, 2021

Current array with disk5 and without disk4.
Sandwich posted October 18, 2021

Ok, so I have the SATA card, and at the very least it lets unRAID recognize drives plugged into it. The other card did that as well, so I'm not sure what that's worth.

In any case, I have a crucial question: I have TWO disks that were excluded from the array when I had to stop using the previous SATA card, and I'm not sure which one was disk5. The only way I have to reliably identify them is by their manufacturer ID (e.g. WD30EZRX-00DC0B0). Can you tell me which was disk5? If it helps, I'm attaching an earlier diagnostic from Sep 26th; I think I might have had all the disks attached at that point.

Alternatively, if there's no way to tell what the manufacturer ID was for disk5, is there a way to browse the files on the disk without adding it to the array? I'm pretty sure disk5 had lots of data, and disk6(?) was empty.

Attachment: cube-diagnostics-20210926-2115.zip
Edited October 18, 2021 by Sandwich (typo)
JorgeB posted October 18, 2021

This was disk5 in the diags posted:

Sep 25 22:56:48 Cube kernel: md: import disk5: (sdh) WDC_WD30EZRX-00AZ6B0_WD-WMC070040204 size: 2930266532
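That answer comes from the syslog in the older diagnostics: each array slot is logged as an `md: import diskN:` line carrying the Linux device name and the drive's model/serial identifier. A small sketch for pulling that slot-to-drive mapping out of a saved syslog, assuming every such line follows the format of the single line quoted above and that the serial is everything after the last underscore (both inferred from that one example). The function and variable names are illustrative.

```python
import re

# Format inferred from: "md: import disk5: (sdh) WDC_WD30EZRX-00AZ6B0_WD-WMC070040204 size: 2930266532"
IMPORT_RE = re.compile(
    r"md: import (?P<slot>disk\d+): \((?P<dev>sd[a-z]+)\) (?P<ident>\S+) size: (?P<size>\d+)"
)


def disk_assignments(syslog_text):
    """Map array slot -> (device, model, serial) from md import lines."""
    result = {}
    for match in IMPORT_RE.finditer(syslog_text):
        # Assumption: the serial follows the last underscore; the model may
        # itself contain underscores (e.g. "WDC_WD30EZRX-...").
        model, _, serial = match.group("ident").rpartition("_")
        result[match.group("slot")] = (match.group("dev"), model, serial)
    return result


line = ("Sep 25 22:56:48 Cube kernel: md: import disk5: (sdh) "
        "WDC_WD30EZRX-00AZ6B0_WD-WMC070040204 size: 2930266532")
print(disk_assignments(line)["disk5"])
```

The serial is the part worth comparing against the label on each physical drive, since two drives can share a model number.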
Sandwich posted October 18, 2021

Ahh, great, thanks. So I clicked to add that disk to the array as disk5, but the page seemed to refresh and now the disk has disappeared (see attachment; I swear it was there a moment ago!). Fresh diagnostics posted.

Attachment: cube-diagnostics-20211018-2029.zip
JorgeB posted October 18, 2021

Oct 18 19:19:02 Cube kernel: ata10: SATA link down (SStatus 0 SControl 300)
Oct 18 19:19:02 Cube kernel: ata10.00: disabled

The disk dropped offline; try replacing the cables.
Sandwich posted October 19, 2021

I've replaced the cables, and now all drives seem to be recognized consistently. So I started the array and it began a parity rebuild, which finished "successfully". During that process, two drives reported nearly identical numbers of read errors:

- WDC_WD30EZRX-00D8PB0_WD-WMC4N1244814 - 3 TB (sdf), disk 3: 732,487,239 errors
- ST3000VN007-2E4166_Z730JMM1 - 3 TB (sdg), disk 4: 732,562,482 errors

Additionally, I appear to have lost all the data on disks 3, 4, and 5, despite them still showing as partly/mostly full. Finally, when I try to browse the filesystems of disks 3, 4, or 5, it just says "No listing: Too many files". What's going on? Fresh diags attached.

Attachment: cube-diagnostics-20211019-1139.zip
Edited October 19, 2021 by Sandwich (disk clarification)
JorgeB posted October 19, 2021

You're still having ATA errors on multiple disks:

Oct 18 22:09:43 Cube kernel: ata10.00: status: { DRDY }
Oct 18 22:09:43 Cube kernel: ata10.00: failed command: WRITE FPDMA QUEUED
Oct 18 22:09:43 Cube kernel: ata10.00: cmd 61/40:88:08:58:00/05:00:00:00:00/40 tag 17 ncq dma 688128 out
Oct 18 22:09:43 Cube kernel: res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Oct 18 22:09:43 Cube kernel: ata10.00: status: { DRDY }
Oct 18 22:09:43 Cube kernel: ata10.00: failed command: WRITE FPDMA QUEUED
Oct 18 22:09:43 Cube kernel: ata10.00: cmd 61/f8:90:48:5d:00/04:00:00:00:00/40 tag 18 ncq dma 651264 out
Oct 18 22:09:43 Cube kernel: res 40/00:01:01:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Oct 18 22:09:43 Cube kernel: ata10.00: status: { DRDY }
Oct 18 22:09:43 Cube kernel: ata10: hard resetting link
Oct 18 22:09:43 Cube ntpd[1794]: kernel reports TIME_ERROR: 0x41: Clock Unsynchronized
Oct 18 22:09:48 Cube kernel: ata10: link is slow to respond, please be patient (ready=0)
Oct 18 22:09:53 Cube kernel: ata10: COMRESET failed (errno=-16)
Oct 18 22:09:53 Cube kernel: ata10: hard resetting link
Oct 18 22:09:53 Cube kernel: ata10: SATA link up 6.0 Gbps (SStatus 133 SControl 310)
Oct 18 22:09:53 Cube kernel: ata10.00: configured for UDMA/133
Oct 18 22:09:53 Cube kernel: ata10: EH complete
Oct 18 22:10:11 Cube kernel: ata8.00: READ LOG DMA EXT failed, trying PIO
Oct 18 22:10:11 Cube kernel: ata8: failed to read log page 10h (errno=-5)
Oct 18 22:10:11 Cube kernel: ata8.00: exception Emask 0x1 SAct 0xc02000 SErr 0x0 action 0x0
Oct 18 22:10:11 Cube kernel: ata8.00: irq_stat 0x40000001
Oct 18 22:10:11 Cube kernel: ata8.00: failed command: READ FPDMA QUEUED
Oct 18 22:10:11 Cube kernel: ata8.00: cmd 60/20:68:00:ee:4a/00:00:0a:00:00/40 tag 13 ncq dma 16384 in
Oct 18 22:10:11 Cube kernel: res 51/04:b8:80:87:00/00:00:00:00:00/40 Emask

These are usually caused by a power or connection problem, and the data should come back after a reboot.
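One way to see which ports keep misbehaving across several diagnostics is to count libata error events per port in the syslog. A rough sketch, assuming the message formats in the excerpt above; the event list is a hand-picked assumption, not an exhaustive set of libata messages. It reports `ataN` port numbers, not array slots: tying a port to a particular drive needs other lines in the same log that identify the device on each port.

```python
import re
from collections import Counter, defaultdict

# "kernel: ata10.00: failed command: ..." or "kernel: ata10: hard resetting link"
ATA_RE = re.compile(r"kernel: (?P<port>ata\d+)(?:\.\d+)?: (?P<msg>.*)")

# Hand-picked message prefixes that indicate trouble (assumption, not exhaustive).
EVENTS = ("failed command", "hard resetting link", "COMRESET failed",
          "SATA link down", "exception Emask")


def tally_ata_events(syslog_text):
    """Return {port: Counter(event -> count)} for the events in EVENTS."""
    counts = defaultdict(Counter)
    for line in syslog_text.splitlines():
        match = ATA_RE.search(line)
        if not match:
            continue  # e.g. "res ..." continuation lines or ntpd messages
        for event in EVENTS:
            if match.group("msg").startswith(event):
                counts[match.group("port")][event] += 1
    return counts


SAMPLE = """\
Oct 18 22:09:43 Cube kernel: ata10.00: failed command: WRITE FPDMA QUEUED
Oct 18 22:09:43 Cube kernel: res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Oct 18 22:09:43 Cube kernel: ata10.00: failed command: WRITE FPDMA QUEUED
Oct 18 22:09:43 Cube kernel: ata10: hard resetting link
Oct 18 22:09:43 Cube ntpd[1794]: kernel reports TIME_ERROR: 0x41: Clock Unsynchronized
Oct 18 22:09:53 Cube kernel: ata10: COMRESET failed (errno=-16)
Oct 18 22:09:53 Cube kernel: ata10: hard resetting link
Oct 18 22:10:11 Cube kernel: ata8.00: exception Emask 0x1 SAct 0xc02000 SErr 0x0 action 0x0
Oct 18 22:10:11 Cube kernel: ata8.00: failed command: READ FPDMA QUEUED
"""

for port, events in sorted(tally_ata_events(SAMPLE).items()):
    print(port, dict(events))
```

If the counts follow a port after drives and cables are swapped, that points at the controller or port; if they follow a drive to a new port, the drive itself (or its power connection) is the more likely cause.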
Sandwich posted October 19, 2021

Is there any indication whether this could be a problem with the SATA card (again)?
JorgeB posted October 19, 2021

Could be; you can try swapping drives and cables with the onboard SATA ports.
Sandwich posted October 19, 2021

I rebooted the system to see if that brought the data back, but now none of those three drives are detected. -.- I'm going to see if they come back after plugging them into the onboard SATA ports.
Sandwich posted October 19, 2021

Well, I suspect disk 5 has finally bitten the dust. I hear those telltale electronic clicks coming from its drive bay, and now unRAID reports it as unmountable, despite it being attached directly to the motherboard.

Attachment: cube-diagnostics-20211019-2140.zip
JorgeB posted October 20, 2021

If it's making weird noises it's likely the disk, assuming the power cable/slot was already replaced/swapped.
Sandwich posted October 24, 2021

Ok, so if I have a drive that's no longer being recognized and its contents are being emulated, what's the best way to copy or move those contents to another drive, so that they no longer need to be emulated?
JorgeB posted October 25, 2021

They will be emulated until you replace that drive. If you want to move the data, you can manually copy it to other disk(s) (or use the unBALANCE plugin), then do a new config and re-sync parity without that disk.
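If you copy the emulated disk's contents by hand, copying (rather than moving) and verifying each file before doing the new config keeps the source intact if a read fails partway through, which matters here because every read from an emulated disk is reconstructed from all the other disks. A minimal sketch, assuming disk-to-disk paths such as `/mnt/disk5` to `/mnt/disk2` (hypothetical slots). Keep both sides as `/mnt/diskN` paths rather than mixing a disk path with a `/mnt/user` share path, which on unRAID is a known way to lose files. rsync or unBALANCE in copy mode do the same job; this only shows the idea.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large files don't fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_verified(src_root, dst_root):
    """Copy every file under src_root to the same relative path under dst_root.

    Nothing is deleted from src_root. Returns (failed, mismatched): relative
    paths that raised OSError (e.g. read errors) and paths whose copy hashes
    differently from the source.
    """
    src_root, dst_root = Path(src_root), Path(dst_root)
    failed, mismatched = [], []
    for src in sorted(p for p in src_root.rglob("*") if p.is_file()):
        rel = src.relative_to(src_root)
        dst = dst_root / rel
        try:
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)  # copies data plus timestamps/permissions
            if sha256_of(src) != sha256_of(dst):
                mismatched.append(rel)
        except OSError:
            failed.append(rel)
    return failed, mismatched


# Hypothetical slots: disk5 is the emulated disk, disk2 has enough free space.
# failed, mismatched = copy_verified("/mnt/disk5", "/mnt/disk2")
```

Only when both lists come back empty (or every listed file is restored from another backup, such as the Google Photos copies) does it make sense to drop the disk with a new config and re-sync parity.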