
Extremely Slow Drive Rebuild


bblicke1
Go to solution Solved by JorgeB,


Posted

Greetings! 

 

I know there are a handful of threads on this topic, but after reading through many of them I haven't been able to find a solution to my problem.

 

I had a 3TB drive go down the other day, and it seemed like a loose cable or something similar because I rebooted and it came right back. Then it went down again, and when I got it to reappear it was treated as a new drive requiring a rebuild. I attempted the rebuild, but speeds were around 2-3 MB/s, so I started looking into it. I found some tuning settings and tried increasing them, to no avail. Shortly afterward, after more looking around and testing, the very next drive in the array went down. Luckily I have 2 parity drives, so my data is still being emulated.

 

I went to Micro Center today and grabbed 2 brand new 8TB drives. I replaced both of the disabled disks in the array and started it in maintenance mode. When I started the sync, the rebuild ran at 400-700 KB/s. Then the second disk got disabled again. I cancelled the rebuild, stopped the array, removed the second disabled disk, and tried to start everything in maintenance mode and rebuild just the single disk. Same result... slow as can be.

 

I already replaced the cable last night, not sure what else to do now.

 

Diags attached. Thanks in advance for any/all help!

tower-diagnostics-20240918-1737.zip

Posted

I'd also note that while I was getting such slow speeds the other night (2-3 MB/s), Unraid only seemed to be using one CPU core at a time, almost as if on rotation... I'm not sure what normal behavior is, as this is my first time having to rebuild.

Posted (edited)

Hm, I swapped the cable out and tested again, perhaps it's an issue with the drive bay... I'll have to take a look.

 

Did you see anything pertaining to disk 7 and why that failed/disabled? I looked through and searched ATA6 and saw all the resetting of the interface, but didn't seem to find anything I could easily understand regarding disk 7. I see a bunch of write errors to different sectors, but we're talking about a brand new HDD?

 

One other question, I assume that ATA 6 = Disk 6, then what are ATA 43 or ATA 44?

 

Thanks!

 

-Edit - Never mind, it looks like ATA43 = Disk 7, so the ATA numbers don't actually correlate with the disk numbers, I guess.
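
For readers with the same question: the `ataN` numbers in the kernel log are libata port numbers, not Unraid disk slots. The following is a minimal sketch, assuming a Linux system where the resolved sysfs path of a block device contains an `ataN` component, as it generally does for drives on onboard SATA ports. The function names are hypothetical, and drives behind a SAS HBA will typically not show an `ataN` component at all.

```python
import os
import re
from typing import Optional

# Matches the "ataN" component in a resolved sysfs device path.
ATA_PORT = re.compile(r"/ata(\d+)/")


def ata_port_from_sysfs_path(path: str) -> Optional[int]:
    """Return the libata port number embedded in a sysfs path, or None."""
    match = ATA_PORT.search(path)
    return int(match.group(1)) if match else None


def map_block_devices(sys_block: str = "/sys/block") -> dict:
    """Map device names such as 'sdb' to an ataN port number, where one exists."""
    mapping = {}
    if not os.path.isdir(sys_block):
        return mapping
    for name in sorted(os.listdir(sys_block)):
        # /sys/block/<name> is a symlink into the device tree; resolve it.
        resolved = os.path.realpath(os.path.join(sys_block, name))
        port = ata_port_from_sysfs_path(resolved)
        if port is not None:
            mapping[name] = port
    return mapping


if __name__ == "__main__":
    for dev, port in map_block_devices().items():
        print(f"{dev}: ata{port}")
```

Matching the resulting device name (for example `sdf`) against the device shown next to each slot in the Unraid GUI then tells you which disk an `ataN` log line refers to.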

Edited by bblicke1
Posted

I just tried to run a rebuild of disk 7 (new 8TB replacing 3TB) after removing disk 6 from the array.

 

I've attached the diag for that as well. It failed and the disk was disabled, which is slightly different behavior from disk 6. It also throws errors, including "bad sector" errors, which disk 6 doesn't seem to do.

 

It can't be a coincidence that 2 neighboring disks randomly begin having issues at the same time after months and months of issue free operation, then their replacements have issues too.

tower-diagnostics-20240919-0950.zip

Posted
Sep 18 17:26:47 Tower kernel: ata6: SATA link down (SStatus 0 SControl 300)
Sep 18 17:26:52 Tower kernel: ata6: link is slow to respond, please be patient (ready=0)
### [PREVIOUS LINE REPEATED 1 TIMES] ###
Sep 18 17:27:09 Tower kernel: ata6: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Sep 18 17:27:10 Tower kernel: ata6.00: configured for UDMA/133
Sep 18 17:27:10 Tower kernel: ata6: SATA link down (SStatus 0 SControl 300)
Sep 18 17:27:11 Tower kernel: ata43: SATA link down (SStatus 0 SControl 300)
Sep 18 17:27:16 Tower kernel: ata6: link is slow to respond, please be patient (ready=0)
Sep 18 17:27:17 Tower kernel: ata43: SATA link down (SStatus 0 SControl 300)
Sep 18 17:27:17 Tower kernel: ata43: limiting SATA link speed to <unknown>
Sep 18 17:27:22 Tower kernel: ata43: SATA link down (SStatus 0 SControl 3F0)
Sep 18 17:27:22 Tower kernel: ata43.00: disable device
Sep 18 17:27:22 Tower kernel: ata43.00: detaching (SCSI 43:0:0:0)

 

There are issues with multiple disks, and disk7 ended up dropping offline. This is usually a power/connection issue.
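
As a rough illustration of how to spot this pattern in a syslog, here is a minimal sketch that counts `SATA link down` events per ATA port and notes which ports had a device disabled. It assumes log lines shaped like the excerpt above. The function name and the regular expression are illustrative and not part of any Unraid tool.

```python
import re
from collections import Counter

# Captures the port ("ata6", "ata43") and the message that follows it.
# An optional device suffix such as ".00" is skipped.
LINE = re.compile(r"kernel: (ata\d+)(?:\.\d+)?: (.*)$")

# A subset of the lines quoted in this post.
SAMPLE = """\
Sep 18 17:26:47 Tower kernel: ata6: SATA link down (SStatus 0 SControl 300)
Sep 18 17:27:10 Tower kernel: ata6: SATA link down (SStatus 0 SControl 300)
Sep 18 17:27:11 Tower kernel: ata43: SATA link down (SStatus 0 SControl 300)
Sep 18 17:27:17 Tower kernel: ata43: SATA link down (SStatus 0 SControl 300)
Sep 18 17:27:22 Tower kernel: ata43: SATA link down (SStatus 0 SControl 3F0)
Sep 18 17:27:22 Tower kernel: ata43.00: disable device
"""


def summarize_link_events(log_text: str) -> dict:
    """Count link-down events per port and collect ports with a disabled device."""
    downs = Counter()
    disabled = set()
    for line in log_text.splitlines():
        match = LINE.search(line)
        if not match:
            continue
        port, message = match.groups()
        if message.startswith("SATA link down"):
            downs[port] += 1
        elif message.startswith("disable device"):
            disabled.add(port)
    return {"link_down": dict(downs), "disabled": sorted(disabled)}


if __name__ == "__main__":
    print(summarize_link_events(SAMPLE))
```

Link drops on more than one port within the same few seconds point at something the drives share, such as power or cabling, rather than at a single failing disk.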

Posted

Alright, so I've replaced all of the trayless HDD adapters that I'd been using to make swapping drives easier. That means new cables and new adapters. ATA 6 is still showing the same behavior. The only thing I can think of now is that the motherboard interface is bad?

 

The other disabled drive is now rebuilding properly at ~120 MB/s. I have a question about this particular rebuild.

 

I've unassigned disk 6 (which is now being emulated) and am only rebuilding disk 7. Disk 6 was a 3TB as was disk 7 originally. Since I've now replaced disk 7 with an 8TB drive, will it rebuild just the 3TB that used to be on disk 7 and continue to emulate the 3TB from disk 6 or will it attempt to write the 3TB from disk 7 as well as the 3TB from disk 6 to the new 8TB disk 7?

Posted (edited)
5 hours ago, JorgeB said:

Swap that disk with a different one and see where the issue follows.

So I'm all out of empty drive bays. After this rebuild completes can I swap around 2 disks without affecting the array/parity?

 

Note for clarity - the disk on ATA6 that is having problems is a brand new disk that replaced the previous disk that began having problems. I've swapped cables and now trays, neither with success.

Edited by bblicke1
Clarity
Posted
10 minutes ago, bblicke1 said:

So I'm all out of drive bays. After this rebuild completes can I swap around 2 disks without affecting the array/parity?

As long as all the drives are still connected, neither parity nor Unraid cares which physical bay/connector is used. Unraid tracks the drives by serial number.

 

The logical slots in the Unraid GUI won't be affected.
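
To see the serial numbers the system reports before and after moving drives between bays, something like the sketch below can help. It assumes the common Linux naming for `/dev/disk/by-id` entries of `bus-MODEL_SERIAL`. It is not how Unraid itself does its tracking, the function names are hypothetical, and some entries (for example `wwn-...` links, or names with extra suffixes) do not follow this pattern and are skipped or may parse imperfectly.

```python
import os


def parse_by_id_name(name: str):
    """Split a by-id entry such as 'ata-MODEL_SERIAL' into (bus, model, serial)."""
    bus, sep, rest = name.partition("-")
    if not sep or "_" not in rest:
        return None
    # Assumption: the serial is whatever follows the last underscore.
    model, _, serial = rest.rpartition("_")
    return bus, model, serial


def list_disk_serials(by_id: str = "/dev/disk/by-id") -> dict:
    """Map each whole-disk by-id entry to (resolved device node, serial)."""
    entries = {}
    if not os.path.isdir(by_id):
        return entries
    for name in sorted(os.listdir(by_id)):
        if "-part" in name:
            continue  # skip partition links
        parsed = parse_by_id_name(name)
        if parsed:
            device = os.path.realpath(os.path.join(by_id, name))
            entries[name] = (device, parsed[2])
    return entries


if __name__ == "__main__":
    for name, (device, serial) in list_disk_serials().items():
        print(f"{serial}: {device} ({name})")
```

After a swap, the device node for a given serial may change, but the serial itself stays the same.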

Posted

So it would seem as though the issue lies with the port on the board... Which stinks.

 

I'm all out of ports but would like to get disk 6 rebuilt. Can I pull disk 7, which is now on ATA6, so that disk 6 can rebuild while disk 7 is emulated, and then put disk 7 back without needing to rebuild it again once I have another working port to plug it into?

Posted (edited)

Ok, so... update...

 

I ordered and installed an LSI HBA 9300-16i and am now exclusively using the HBA (with forward breakouts) to connect all 14 of my HDDs.

 

At first boot I noticed it takes a bit longer to POST, but it does so consistently. Once up and running, everything seemed great. I added disk 6 to the array and started the array in maintenance mode. Shortly thereafter, disk 7 was disabled with about 130 errors. I found that odd, as it is a brand new disk that was successfully rebuilt just a couple of days ago and had been running fine through this evening. I decided to stop the array and reboot.

 

After rebooting disk 7 continued to be disabled, but now disk 8 is missing...

 

I checked to make sure I was running correct firmware, etc for the HBA and the following seemed to confirm that everything was right:

Avago Technologies SAS3 Flash Utility
Version 17.00.00.00 (2018.04.02) 
Copyright 2008-2018 Avago Technologies. All rights reserved.

        Adapter Selected is a Avago SAS: SAS3008(C0)

Num   Ctlr            FW Ver        NVDATA        x86-BIOS         PCI Addr
----------------------------------------------------------------------------

0  SAS3008(C0)  16.00.10.00    0e.01.00.03    08.37.00.00     00:03:00:00
1  SAS3008(C0)  16.00.10.00    0e.01.00.03    08.37.00.00     00:05:00:00

        Finished Processing Commands Successfully.
        Exiting SAS3Flash.

 

I also pulled specifics on each controller:

Avago Technologies SAS3 Flash Utility
Version 17.00.00.00 (2018.04.02) 
Copyright 2008-2018 Avago Technologies. All rights reserved.

        Adapter Selected is a Avago SAS: SAS3008(C0)

        Controller Number              : 0
        Controller                     : SAS3008(C0)
        PCI Address                    : 00:03:00:00
        SAS Address                    : 500062b-2-03bb-3380
        NVDATA Version (Default)       : 0e.01.00.03
        NVDATA Version (Persistent)    : 0e.01.00.07
        Firmware Product ID            : 0x2221 (IT)
        Firmware Version               : 16.00.10.00
        NVDATA Vendor                  : LSI
        NVDATA Product ID              : SAS9300-16i
        BIOS Version                   : 08.37.00.00
        UEFI BSD Version               : 18.00.00.00
        FCODE Version                  : N/A
        Board Name                     : SAS9300-16i
        Board Assembly                 : 03-25600-01B
        Board Tracer Number            : SP81712990

        Finished Processing Commands Successfully.
        Exiting SAS3Flash.
Avago Technologies SAS3 Flash Utility
Version 17.00.00.00 (2018.04.02) 
Copyright 2008-2018 Avago Technologies. All rights reserved.

        Adapter Selected is a Avago SAS: SAS3008(C0)

        Controller Number              : 1
        Controller                     : SAS3008(C0)
        PCI Address                    : 00:05:00:00
        SAS Address                    : 500062b-2-03bb-3b00
        NVDATA Version (Default)       : 0e.01.00.03
        NVDATA Version (Persistent)    : 0e.01.00.07
        Firmware Product ID            : 0x2221 (IT)
        Firmware Version               : 16.00.10.00
        NVDATA Vendor                  : LSI
        NVDATA Product ID              : SAS9300-16i
        BIOS Version                   : 08.37.00.00
        UEFI BSD Version               : 18.00.00.00
        FCODE Version                  : N/A
        Board Name                     : SAS9300-16i
        Board Assembly                 : 03-25600-01B
        Board Tracer Number            : SP81712990

        Finished Processing Commands Successfully.
        Exiting SAS3Flash.

 

I'm not entirely sure what could be causing these issues. At first ATA6 seemed to be having problems resulting in disk 6 getting disabled, then disk 7 went down. I rebuilt disk 7, installed HBA, spun up, assigned a new disk to disk 6 to rebuild and then disk 7 disabled again and disk 8 went down.... I'm really worried that I've now exceeded my 2 down drive limit and am going to have significant data loss.
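
The worry above comes down to simple counting. As a minimal sketch, assuming each parity drive lets the array emulate one unavailable data disk (which is what the "2 down drive limit" implies), the check looks like this. The function name and slot labels are illustrative only.

```python
def array_can_emulate(parity_count: int, unavailable_disks: set) -> bool:
    """True if the number of disabled or missing data disks is within parity."""
    return len(unavailable_disks) <= parity_count


if __name__ == "__main__":
    # Two parity drives, disks 6 and 7 disabled: still within the limit.
    print(array_can_emulate(2, {"disk6", "disk7"}))
    # Disk 8 also missing: three unavailable disks against two parity drives.
    print(array_can_emulate(2, {"disk6", "disk7", "disk8"}))
```

A disk that is merely missing because of a connection problem still holds its data, so exceeding the limit this way blocks emulation and rebuilds but is not by itself proof that data has been lost.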

 

I've attached the most recent diagnostics.

tower-diagnostics-20240922-2325.zip

Edited by bblicke1
Posted

Another update... I shut down and pulled all the drives from the enclosure and pushed them all back in. Disk 8 has magically reappeared... 

 

I assigned disk 6 again, and disk 7 now appears to need another rebuild? This seems odd considering everything was fine with disk 7 before I installed the HBA.

 

Considering disk 8 is back, I suppose I ought to just rebuild 6 and 7 now while I have the chance? I just worry that if disk 8 somehow goes "missing" again I'll be in trouble if the rebuild hasn't yet finished.

Posted (edited)

I tried to rebuild 6 and 7, but it stopped immediately and reported that disk 6 threw write errors. Disk 7 was showing up in unassigned devices (though it still seems to be assigned), and once I stopped the array, disk 8 showed as missing again. What's strange about disk 8 is that it now shows in historical unassigned devices as "in standby."

 

One other thing I noticed after stopping the array, disk 7 is no longer in unassigned and is simply disabled.

tower-diagnostics-20240922-2358.zip

Edited by bblicke1
Posted

This morning I installed a spare 1000W PSU that I had and took the opportunity to reorganize all my cables, etc.

 

Looks like disk 8 is present again, and now we're more or less back where we started, with disks 6 and 7 disabled (despite having already rebuilt disk 7 a few days ago?).

 

I spun up the array in maintenance mode, but there is no "Sync" button now?

 

Also - for future reference, would you share what in the logs suggested to you that the issue was a power or connection issue? I skimmed through the logs, but clearly missed that as I was looking for disk specific errors/issues.

 

Thanks again for all the help!

Posted

No errors so far, but the array was started in maintenance mode, so we can't see whether the disabled disks are mounting. Start in normal mode and check that, or post new diags. If they are mounting and the contents look correct, try to rebuild again.

Posted

The drives certainly appear to mount.

 

image.thumb.png.0b3fdab5e57e8e02fe8e51b36e0f2952.png

 

I've also attached more diagnostics.

 

I was under the impression that when you start the array with 2 disks needing rebuilds, they would rebuild automatically with btrfs? I had been starting it in maintenance mode so I could click the Sync button manually rather than have it proceed automatically, but now I'm not seeing the Sync button when I start in either regular or maintenance mode.

tower-diagnostics-20240923-1155.zip
