bblicke1 Posted September 18, 2024
Greetings! I know there are a handful of threads on this topic, but after reading through many of them I haven't been able to find a solution to my problem. I had a 3TB drive go down the other day, and it seemed like there might be a loose cable or something, because I rebooted and it came right back. Well, it went down again, and when I got it to reappear it was being treated as a new drive that needed a rebuild. I attempted to rebuild it, but the speeds were only 2-3 MB/s, so I started looking into it. I found some tuning settings and tried increasing those, but to no avail. Shortly afterward, while I was still testing, the very next drive in the array went down. Luckily I have 2 parity drives, so my data is still being emulated. I went to Micro Center today and grabbed 2 brand new 8TB drives. I replaced both of the disabled disks in the array and started it in maintenance mode. When I started the sync, the rebuild ran at only 400-700 KB/s. Then the second disk got disabled again. I cancelled the rebuild, stopped the array, removed the second disabled disk, and started again in maintenance mode to rebuild just the single disk. Same result... slow as can be. I already replaced the cable last night, and I'm not sure what else to do now. Diags attached. Thanks in advance for any/all help! tower-diagnostics-20240918-1737.zip
bblicke1 Posted September 18, 2024
I should also note that while I was getting such slow speeds the other night (2-3 MB/s), Unraid only seemed to be using one CPU core at a time, almost as if it were rotating between them. I'm not sure what normal behavior looks like, as this is my first time having to rebuild.
Gragorg Posted September 19, 2024
Looks like you have a cabling issue on ATA6.
bblicke1 Posted September 19, 2024
Hm, I already swapped the cable out and tested again, so perhaps it's an issue with the drive bay. I'll have to take a look. Did you see anything explaining why disk 7 failed/was disabled? I searched the logs for ATA6 and saw all the interface resets, but couldn't find anything I could easily understand regarding disk 7. I see a bunch of write errors to different sectors, but this is a brand new HDD. One other question: I assumed ATA6 = disk 6, so what are ATA43 and ATA44? Thanks! Edit: Never mind, it looks like ATA43 = disk 7, so the ATA numbers don't actually correspond to the disk numbers. Edited September 19, 2024 by bblicke1
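As the edit above found, the kernel's ataN numbers are libata port numbers and don't line up with Unraid disk slots. A minimal Python sketch for finding which sdX device sits on which ataN port, assuming a standard Linux sysfs layout in which each /sys/block/sdX symlink target contains an ataN path component for drives on a SATA/AHCI port. The helper names ata_port_for and map_ports are hypothetical, and this has not been checked against Unraid specifically. Matching sdX to a disk slot is then a matter of comparing against the device names the Unraid GUI lists for each slot.

```python
import os
import re
from typing import Dict, Optional

# Matches a libata port component such as "/ata6/" or "/ata43/" in a sysfs path.
_ATA_RE = re.compile(r"/(ata\d+)/")


def ata_port_for(sysfs_target: str) -> Optional[str]:
    """Return the ataN port named in a /sys/block/sdX symlink target, if any.

    Drives behind an LSI (mpt3sas) HBA have no ataN component in their
    sysfs path, so None is returned for them.
    """
    match = _ATA_RE.search(sysfs_target)
    return match.group(1) if match else None


def map_ports(sys_block: str = "/sys/block") -> Dict[str, Optional[str]]:
    """Map each sdX block device to its libata port (or None)."""
    result: Dict[str, Optional[str]] = {}
    for name in sorted(os.listdir(sys_block)):
        if not name.startswith("sd"):
            continue  # skip loop, md, nvme and other non-sd devices
        target = os.readlink(os.path.join(sys_block, name))
        result[name] = ata_port_for(target)
    return result
```

On the server, print(map_ports()) would show a dict of sdX names to ports. Parsing the symlink target as a string, rather than walking sysfs further, keeps the sketch short; the trade-off is that it relies on the ataN component appearing literally in the path.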
bblicke1 Posted September 19, 2024
I just tried to run a rebuild of disk 7 (a new 8TB replacing the 3TB) after removing disk 6 from the array. I've attached the diagnostics for that as well. It failed and the disk was disabled, which is slightly different behavior from disk 6. It also throws errors, including "bad sector" errors, which disk 6 doesn't seem to do. It can't be a coincidence that 2 neighboring disks suddenly began having issues at the same time after months of issue-free operation, and then their replacements had issues too. tower-diagnostics-20240919-0950.zip
JorgeB Posted September 19, 2024
Sep 18 17:26:47 Tower kernel: ata6: SATA link down (SStatus 0 SControl 300)
Sep 18 17:26:52 Tower kernel: ata6: link is slow to respond, please be patient (ready=0)
### [PREVIOUS LINE REPEATED 1 TIMES] ###
Sep 18 17:27:09 Tower kernel: ata6: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Sep 18 17:27:10 Tower kernel: ata6.00: configured for UDMA/133
Sep 18 17:27:10 Tower kernel: ata6: SATA link down (SStatus 0 SControl 300)
Sep 18 17:27:11 Tower kernel: ata43: SATA link down (SStatus 0 SControl 300)
Sep 18 17:27:16 Tower kernel: ata6: link is slow to respond, please be patient (ready=0)
Sep 18 17:27:17 Tower kernel: ata43: SATA link down (SStatus 0 SControl 300)
Sep 18 17:27:17 Tower kernel: ata43: limiting SATA link speed to <unknown>
Sep 18 17:27:22 Tower kernel: ata43: SATA link down (SStatus 0 SControl 3F0)
Sep 18 17:27:22 Tower kernel: ata43.00: disable device
Sep 18 17:27:22 Tower kernel: ata43.00: detaching (SCSI 43:0:0:0)

There are issues with multiple disks, and disk7 ended up dropping offline; this is usually a power/connection issue.
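The excerpt above shows the pattern: two different ports (ata6 and ata43) losing their link within seconds of each other, then ata43 limiting its link speed and the device being disabled and detached. JorgeB's reasoning is that issues on multiple disks at once usually point to power or connections rather than the disks themselves. As a rough sketch for spotting this in a syslog (such as the one in the diagnostics zip), the Python below tallies these messages per port. The regexes are copied from the message text in the excerpt; the function names are hypothetical, and this is a heuristic, not an Unraid tool.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, Iterable, List

# Message fragments copied from the kernel log excerpt; each gets a short label.
_PATTERNS = {
    "link_down": re.compile(r"kernel: (ata\d+): SATA link down"),
    "slow_to_respond": re.compile(r"kernel: (ata\d+): link is slow to respond"),
    "speed_limited": re.compile(r"kernel: (ata\d+): limiting SATA link speed"),
    "detached": re.compile(r"kernel: (ata\d+)\.\d+: detaching"),
}


def tally_link_events(lines: Iterable[str]) -> Dict[str, Counter]:
    """Count link-related kernel events per libata port (ataN)."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for line in lines:
        for label, pattern in _PATTERNS.items():
            match = pattern.search(line)
            if match:
                counts[match.group(1)][label] += 1
    return counts


def ports_with_link_drops(counts: Dict[str, Counter]) -> List[str]:
    """Ports that logged at least one 'SATA link down', sorted by name."""
    return sorted(port for port, events in counts.items() if events["link_down"])
```

Passing an open syslog file to tally_link_events and then calling ports_with_link_drops on the result lists every port that dropped its link. More than one port in that list is the "issues with multiple disks" pattern, though the sketch does not check whether the drops happened in the same time window.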
bblicke1 Posted September 20, 2024
I've ordered new drive trays, since apparently the cable I replaced yesterday wasn't the problem. I'll report back tomorrow, and hopefully the issue will be resolved.
bblicke1 Posted September 20, 2024
Alright, so I've replaced all of the trayless HDD adapters that I'd been using to make swapping drives easier. That means new cables and new adapters. ATA6 is still showing the same behavior. The only thing I can think of now is that the motherboard interface is bad? The other disabled drive is now rebuilding properly at ~120 MB/s. A question about this particular rebuild: I've unassigned disk 6 (which is now being emulated) and am only rebuilding disk 7. Disk 6 was a 3TB, as was disk 7 originally. Since I've now replaced disk 7 with an 8TB drive, will it rebuild just the 3TB that used to be on disk 7 and continue to emulate the 3TB from disk 6, or will it attempt to write both the 3TB from disk 7 and the 3TB from disk 6 to the new 8TB disk 7?
JorgeB Posted September 20, 2024
57 minutes ago, bblicke1 said: "ATA 6 still showing the same behavior."
Swap that disk with a different one and see where the issue follows.
JorgeB Posted September 20, 2024
58 minutes ago, bblicke1 said: "will it rebuild just the 3TB that used to be on disk 7"
This. The rest of the capacity will be available but empty.
bblicke1 Posted September 20, 2024
5 hours ago, JorgeB said: "Swap that disk with a different one and see where the issue follows."
So I'm all out of empty drive bays. After this rebuild completes, can I swap 2 disks around without affecting the array/parity? Note for clarity: the disk on ATA6 that is having problems is a brand new disk that replaced the previous disk that began having problems. I've swapped the cables and now the trays, neither with success. Edited September 20, 2024 by bblicke1 (reason: clarity)
JonathanM Posted September 20, 2024
10 minutes ago, bblicke1 said: "So I'm all out of drive bays. After this rebuild completes can I swap around 2 disks without affecting the array/parity?"
As long as all the drives are still connected, neither parity nor Unraid cares which physical bay/connector is used. Unraid tracks the drives by serial number. The logical slots in the Unraid GUI won't be affected.
bblicke1 Posted September 21, 2024
That's extremely helpful to know. I was so worried about mixing up cables and ports that everything has taken me much longer to do lol. I'll report back.
bblicke1 Posted September 21, 2024
So it would seem the issue lies with the port on the board... which stinks. I'm all out of ports but would like to get the disk 6 drive rebuilt. Can I pull disk 7 (which is now on ATA6), rebuild disk 6 while disk 7 is emulated, and then put disk 7 back in without needing to rebuild it once I have another working port to plug it into?
JorgeB Posted September 21, 2024
Since you have dual parity you can have two disabled disks, but that disk would need to be rebuilt again.
bblicke1 Posted September 23, 2024
Ok, so... update... I ordered and installed an LSI 9300-16i HBA and am now using the HBA exclusively (with forward breakout cables) to connect all 14 of my HDDs. At first boot I noticed it takes a bit longer to POST, but it does so consistently. Once it was up and running everything seemed great, so I added disk 6 to the array and started the array in maintenance mode. Shortly thereafter, disk 7 was disabled with around 130 errors. I found that odd, as it is a brand new disk that was successfully rebuilt just a couple of days ago and had been running fine through this evening. I decided to stop the array and reboot. After rebooting, disk 7 was still disabled, and now disk 8 is missing...

I checked that I was running the correct firmware etc. for the HBA, and the following output seemed to confirm that everything was right:

Avago Technologies SAS3 Flash Utility
Version 17.00.00.00 (2018.04.02)
Copyright 2008-2018 Avago Technologies. All rights reserved.

Adapter Selected is a Avago SAS: SAS3008(C0)

Num  Ctlr         FW Ver       NVDATA       x86-BIOS     PCI Addr
----------------------------------------------------------------------------
0    SAS3008(C0)  16.00.10.00  0e.01.00.03  08.37.00.00  00:03:00:00
1    SAS3008(C0)  16.00.10.00  0e.01.00.03  08.37.00.00  00:05:00:00

Finished Processing Commands Successfully.
Exiting SAS3Flash.

I also pulled the details for each controller:

Avago Technologies SAS3 Flash Utility
Version 17.00.00.00 (2018.04.02)
Copyright 2008-2018 Avago Technologies. All rights reserved.

Adapter Selected is a Avago SAS: SAS3008(C0)

Controller Number : 0
Controller : SAS3008(C0)
PCI Address : 00:03:00:00
SAS Address : 500062b-2-03bb-3380
NVDATA Version (Default) : 0e.01.00.03
NVDATA Version (Persistent) : 0e.01.00.07
Firmware Product ID : 0x2221 (IT)
Firmware Version : 16.00.10.00
NVDATA Vendor : LSI
NVDATA Product ID : SAS9300-16i
BIOS Version : 08.37.00.00
UEFI BSD Version : 18.00.00.00
FCODE Version : N/A
Board Name : SAS9300-16i
Board Assembly : 03-25600-01B
Board Tracer Number : SP81712990

Finished Processing Commands Successfully.
Exiting SAS3Flash.

Avago Technologies SAS3 Flash Utility
Version 17.00.00.00 (2018.04.02)
Copyright 2008-2018 Avago Technologies. All rights reserved.

Adapter Selected is a Avago SAS: SAS3008(C0)

Controller Number : 1
Controller : SAS3008(C0)
PCI Address : 00:05:00:00
SAS Address : 500062b-2-03bb-3b00
NVDATA Version (Default) : 0e.01.00.03
NVDATA Version (Persistent) : 0e.01.00.07
Firmware Product ID : 0x2221 (IT)
Firmware Version : 16.00.10.00
NVDATA Vendor : LSI
NVDATA Product ID : SAS9300-16i
BIOS Version : 08.37.00.00
UEFI BSD Version : 18.00.00.00
FCODE Version : N/A
Board Name : SAS9300-16i
Board Assembly : 03-25600-01B
Board Tracer Number : SP81712990

Finished Processing Commands Successfully.
Exiting SAS3Flash.

I'm not entirely sure what could be causing these issues. At first ATA6 seemed to be having problems, which resulted in disk 6 being disabled, and then disk 7 went down. I rebuilt disk 7, installed the HBA, started everything up, and assigned a new disk to disk 6 to rebuild, and then disk 7 was disabled again and disk 8 went down... I'm really worried that I've now exceeded my limit of 2 failed drives and am going to have significant data loss. I've attached the most recent diagnostics. tower-diagnostics-20240922-2325.zip Edited September 23, 2024 by bblicke1
bblicke1 Posted September 23, 2024
Another update... I shut down, pulled all the drives from the enclosure, and pushed them all back in. Disk 8 has magically reappeared... I assigned disk 6 again, and now disk 7 appears to need another rebuild? This seems odd, considering everything was fine with disk 7 before I installed the HBA. Since disk 8 is back, I suppose I ought to rebuild 6 and 7 now while I have the chance? I just worry that if disk 8 somehow goes "missing" again before the rebuild has finished, I'll be in trouble.
bblicke1 Posted September 23, 2024
I tried to rebuild 6 and 7, but it stopped immediately, saying disk 6 threw write errors. Disk 7 was showing up under unassigned devices (though it still seems to be assigned), and once I stopped the array, disk 8 showed as missing again. What's strange about disk 8 is that it now shows under historical unassigned devices as "in standby." One other thing I noticed after stopping the array: disk 7 is no longer under unassigned devices and is simply disabled. tower-diagnostics-20240922-2358.zip Edited September 23, 2024 by bblicke1
JorgeB Posted September 23, 2024 (marked as the solution)
Looks like a power/connection issue.
bblicke1 Posted September 23, 2024
This morning I installed a spare 1000W PSU that I had and took the opportunity to reorganize all my cables, etc. It looks like disk 8 is present again, and now we're roughly back where we started, with disks 6 and 7 disabled (despite my having already rebuilt disk 7 a few days ago). I started the array in maintenance mode, but now there is no "Sync" button. Also, for future reference, could you share what in the logs suggested that this was a power or connection issue? I skimmed through the logs but clearly missed it, since I was looking for disk-specific errors. Thanks again for all the help!
JorgeB Posted September 23, 2024
Post current diagnostics after starting the array so we can see the status.
bblicke1 Posted September 23, 2024
Latest diagnostics attached: tower-diagnostics-20240923-1112.zip
JorgeB Posted September 23, 2024
No errors so far, but the array was started in maintenance mode, so we can't see whether the disabled disks are mounting. Start in normal mode and check that, or post new diagnostics. If they are mounting and the contents look correct, try to rebuild again.
bblicke1 Posted September 23, 2024
The drives certainly appear to mount. I've also attached more diagnostics. I was under the impression that when you start the array with 2 disks needing rebuilds, they would automatically rebuild with btrfs? I had been starting it in maintenance mode so I could click the Sync button manually instead of having the rebuild proceed automatically, but now I don't see the Sync button when I start in either normal or maintenance mode. tower-diagnostics-20240923-1155.zip
JorgeB Posted September 23, 2024
1 hour ago, bblicke1 said: "I was under the impression that when you started up the array with 2 disks needing rebuilds, that they would automatically rebuild with btrfs?"
Since the disks are disabled, you need to initiate the rebuild yourself: https://docs.unraid.net/unraid-os/manual/storage-management#rebuilding-a-drive-onto-itself