evakq8r Posted February 18, 2023

I've recently rebuilt my server on new hardware (a custom build replacing an old Dell PowerEdge) and had a disk go into a disabled state during a reboot. I've started the rebuild process and the speeds fluctuate wildly, from 1 MB/s to 125 MB/s. Server specs are in my sig.

The link width of my LSI HBA is showing x8, which as far as I know should be fine for a decent rebuild speed:

lspci -vv -s 31:00.0

31:00.0 RAID bus controller: Broadcom / LSI SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] (rev 03)
        Subsystem: Fujitsu Technology Solutions HBA Ctrl SAS 6G 0/1 [D2607]
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 72
        IOMMU group: 35
        Region 0: I/O ports at e000 [size=256]
        Region 1: Memory at c04c0000 (64-bit, non-prefetchable) [size=16K]
        Region 3: Memory at c0080000 (64-bit, non-prefetchable) [size=256K]
        Expansion ROM at c0000000 [disabled] [size=512K]
        Capabilities: [50] Power Management version 3
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [68] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 4096 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0W
                DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
                        RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
                        MaxPayload 512 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
                LnkCap: Port #0, Speed 5GT/s, Width x8, ASPM L0s, Exit Latency L0s <64ns
                        ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
                LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 5GT/s, Width x8
                        TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range BC, TimeoutDis+ NROPrPrP- LTR-
                         10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix-
                         EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
                         FRS- TPHComp- ExtTPHComp-
                         AtomicOpsCap: 32bit- 64bit- 128bitCAS-
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- 10BitTagReq- OBFF Disabled,
                         AtomicOpsCtl: ReqEn-
                LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
                         EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
                         Retimer- 2Retimers- CrosslinkRes: unsupported
        Capabilities: [d0] Vital Product Data
pcilib: sysfs_read_vpd: read failed: No such device
                Not readable
        Capabilities: [a8] MSI: Enable- Count=1/1 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
        Capabilities: [c0] MSI-X: Enable+ Count=15 Masked-
                Vector table: BAR=1 offset=00002000
                PBA: BAR=1 offset=00003800
        Capabilities: [100 v1] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
                AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
                        MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
                HeaderLog: 00000000 00000000 00000000 00000000
        Capabilities: [138 v1] Power Budgeting <?>
        Capabilities: [150 v1] Single Root I/O Virtualization (SR-IOV)
                IOVCap: Migration- 10BitTagReq- Interrupt Message Number: 000
                IOVCtl: Enable- Migration- Interrupt- MSE- ARIHierarchy- 10BitTagReq-
                IOVSta: Migration-
                Initial VFs: 16, Total VFs: 16, Number of VFs: 0, Function Dependency Link: 00
                VF offset: 1, stride: 1, Device ID: 0072
                Supported Page Size: 00000553, System Page Size: 00000001
                Region 0: Memory at 00000000c04c4000 (64-bit, non-prefetchable)
                Region 2: Memory at 00000000c00c0000 (64-bit, non-prefetchable)
                VF Migration: offset: 00000000, BIR: 0
        Capabilities: [190 v1] Alternative Routing-ID Interpretation (ARI)
                ARICap: MFVC- ACS-, Next Function: 0
                ARICtl: MFVC- ACS-, Function Group: 0
        Kernel driver in use: mpt3sas
        Kernel modules: mpt3sas

Attached are diagnostics. I'm not exactly sure why the speeds fluctuate so heavily, so any help would be appreciated.

notyourunraid-diagnostics-20230218-1311.zip
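For reference, the LnkSta values above (Speed 5GT/s, Width x8) can be turned into a rough bandwidth figure. This is back-of-the-envelope arithmetic only, not something from the diagnostics:

```shell
# Rough usable PCIe bandwidth from the LnkSta line above.
# Gen2 signals at 5 GT/s per lane with 8b/10b encoding (8 data bits
# per 10 bits transferred), so: GT/s * lanes * 8/10 gigabits/s, /8 for bytes.
gts=5
lanes=8
echo "$(( gts * lanes * 8 * 1000 / 10 / 8 )) MB/s"   # prints "4000 MB/s" per direction
```

Even allowing for protocol overhead, roughly 4 GB/s shared across the SAS2008's ports leaves enormous headroom over any single spinning disk, so a properly trained x8 link is consistent with the bottleneck being elsewhere.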
Squid Posted February 18, 2023 (Solution)

You have what appears to be a Docker container that is continually restarting. The docker image is in the system share, which is currently split across the cache drive and disk 3. It's impossible to tell from the diagnostics which of those two drives the docker.img file is on, but if it's on disk 3, then what you're seeing would be expected.

Go to Shares and, next to the system share, hit Calculate. If you see 100G (which may be overkill) on disk 3, then that's your issue. Stop the entire Docker service in Settings → Docker and run Mover from the Main tab.
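The same check can be done from a terminal. A hedged sketch: it mocks up an Unraid-style directory tree under a temp directory so it runs anywhere; on a real server you would point the same `du`/`find` commands at `/mnt/disk*/system` and `/mnt/cache/system` (the stock share name is assumed):

```shell
# Mock an Unraid-style layout so the commands below are runnable as-is.
root=$(mktemp -d)
mkdir -p "$root/disk3/system/docker" "$root/cache/system/docker"
truncate -s 1M "$root/disk3/system/docker/docker.img"  # pretend the image landed on disk 3

# Per-disk footprint of the system share
# (real server: du -sh /mnt/disk*/system /mnt/cache/system 2>/dev/null)
du -sh "$root"/disk*/system "$root"/cache/system

# Exact location of docker.img
# (real server: find /mnt/disk*/system /mnt/cache/system -name docker.img)
find "$root" -name docker.img
```

If the `find` shows the image under a `diskN` path rather than cache, every Docker write during the rebuild lands on the array, which matches the fluctuating rebuild speeds.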
evakq8r Posted February 18, 2023 (Author)

1 hour ago, Squid said:

You have what appears to be a Docker container that is continually restarting. The docker image is in the system share, which is currently split across the cache drive and disk 3. It's impossible to tell from the diagnostics which of those two drives the docker.img file is on, but if it's on disk 3, then what you're seeing would be expected. Go to Shares and, next to the system share, hit Calculate. If you see 100G (which may be overkill) on disk 3, then that's your issue. Stop the entire Docker service in Settings → Docker and run Mover from the Main tab.

Looks like there was a Docker container restarting (which I thought I had resolved a month ago). I've removed the entire container and its image, so that should no longer be a problem.

As for the system share, you're correct. It seems that when I built this system last week, I neglected to check how much space the files already on the cache were using, so I can only assume the system share can't move to cache because there isn't enough space.

Mover is disabled during a data rebuild, so I guess I'll need to wait until it's finished? Or do you suggest cancelling the rebuild anyway, moving system to cache, and then doing the rebuild again? The rebuild is currently at 77.8% and going at a reasonable speed.
itimpi Posted February 18, 2023

3 hours ago, evakq8r said:

It seems that when I built this system last week, I neglected to check how much space the files already on the cache were using, so I can only assume the system share can't move to cache because there isn't enough space.

It would be normal for mover not to move files in the 'system' share if you have either the VM or Docker services enabled, as they keep files open and mover cannot move open files. Those services should be disabled while trying to move files in the system share.

As was mentioned, you also seem to have the docker image file configured to be 100GB, which is far more than you should need. The default of 20GB is fine for most people. If you find that you are filling it, that almost certainly means you have a docker container misconfigured so that it is writing files internally to the image when they should be mapped to an external location on the host.
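The misconfiguration itimpi describes usually means a container path was never mapped to host storage. A hypothetical example of the fix (the container name, image, and paths here are illustrative, not from the diagnostics): every directory the app writes to gets a `-v` mapping, so nothing accumulates inside docker.img.

```shell
# Hypothetical container: all writable paths are mapped out to the host,
# so the container's writable layer inside docker.img stays tiny.
docker run -d --name someapp \
  -v /mnt/user/appdata/someapp:/config \
  -v /mnt/user/downloads:/downloads \
  someapp/image:latest
```

In the Unraid GUI this corresponds to the path mappings in each container's template; any path the app logs, downloads, or transcodes to should appear there.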
evakq8r Posted February 18, 2023 (Author)

4 hours ago, itimpi said:

It would be normal for mover not to move files in the 'system' share if you have either the VM or Docker services enabled, as they keep files open and mover cannot move open files.

This makes sense, however mover was actually disabled because a parity operation was in progress (as advised on the GUI).

4 hours ago, itimpi said:

As was mentioned, you also seem to have the docker image file configured to be 100GB, which is far more than you should need. The default of 20GB is fine for most people. If you find that you are filling it, that almost certainly means you have a docker container misconfigured so that it is writing files internally to the image when they should be mapped to an external location on the host.

I've stopped Docker and the VM service, resized the image to 30GB, deleted the existing image, and am now going through the (very painful) process of setting everything up again. Less than 20 containers in, the 30GB image is already 70% full. The containers are all starting and not in a restart loop, so I'm not sure why the image is already blowing out. I did have about 75 containers running at once though, so I may fall into the 'not most people' category for docker image sizes.
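One way to see which containers are eating the image is `docker ps -s`, which adds a SIZE column showing each container's writable layer. A sketch that sorts such output largest-first; the names and figures below are mocked up so the snippet runs without a Docker daemon:

```shell
# Mocked "NAME SIZE" pairs in the shape docker would give
# (real server: docker ps -s --format '{{.Names}} {{.Size}}').
sample='nginx 2.1M
sonarr 1.4G
plex 12.3G
radarr 890M'

# Sort human-readable sizes, largest first; the top entries are the
# likely culprits writing inside the image.
echo "$sample" | sort -k2 -h -r
```

Containers with writable layers in the gigabytes are almost always writing to an unmapped internal path; the small-layer ones are configured correctly.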
evakq8r Posted February 18, 2023 (Author)

Everything is set up again and the system is much more responsive. The Docker image is now 75GB, 68% used. I'll leave it as is for now; all containers are working as expected. Thanks for your help @Squid.
Squid Posted February 18, 2023

1 hour ago, evakq8r said:

This makes sense, however mover was actually disabled because a parity operation was in progress (as advised on the GUI).

That's actually a bug in the system, and it's fixed in the next rev.
evakq8r Posted February 18, 2023 (Author)

7 minutes ago, Squid said:

That's actually a bug in the system, and it's fixed in the next rev.

Oh! Good to know. Thanks!