
How can I improve my parity check/data rebuild speed?



Hi guys,

 

I was wondering if there is anything I can do to improve my speed for the parity check/data rebuild.

My parity disk is a shucked WD Elements disk; all the other disks are 18 TB HGST drives and two 12 TB WD Reds.

As you can see, with just the HGST disks, the data rebuild is currently at ~80 MB/s, and it won't go any higher. The parity check is even slower at around 70 MB/s max (same speed when the mover (Samsung 990 Pro) is running).

They are all connected via an LSI 9300-16i card. A few weeks back, 6 of them were connected to the mobo via SATA, but the overall speed didn't change.

Would it help if I changed the parity disk to an 18 TB HGST too? Is there a significant difference in speed if all the disks are the same?

Is there even room for improvement in terms of speed, or do you guys have similar speeds for parity check/data rebuild?

 

Cheers

 

grafik.png.976b2f6ba9c7abee8f78cf001ca43e46.png

Link to comment

Parity disks must not only be the largest drives in the array, they should also be the fastest, because ALL write (and parity check) operations depend on them.

A shucked WD Elements disk is not something I would allow in my array, and it is certainly no choice for a parity drive.

My checks run at 270 MB/s (dropping to 140 MB/s; average 193 MB/s).

 

Link to comment
8 minutes ago, MAM59 said:

Parity disks must not only be the largest drives in the array, they should also be the fastest, because ALL write (and parity check) operations depend on them.

A shucked WD Elements disk is not something I would allow in my array, and it is certainly no choice for a parity drive.

My checks run at 270 MB/s (dropping to 140 MB/s; average 193 MB/s).

 

The WD Elements was just lying around, and I spontaneously decided to use it as a parity drive for more safety on top of all my offline backup drives. 270 MB/s is sick compared to my speed.
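To put those speeds in perspective, here is a rough back-of-the-envelope sketch of how long one full pass over the parity disk takes. It assumes an 18 TB parity disk (decimal terabytes) and a constant average speed, which real disks do not deliver since they slow down towards the inner tracks. The function name `check_hours` is made up for this example.

```python
def check_hours(capacity_tb, avg_mb_s):
    """Hours needed to read capacity_tb (decimal TB) end to end at avg_mb_s (MB/s)."""
    total_bytes = capacity_tb * 1e12          # assumption: 1 TB = 10^12 bytes
    seconds = total_bytes / (avg_mb_s * 1e6)  # assumption: 1 MB = 10^6 bytes
    return seconds / 3600


# Speeds quoted in this thread; the 18 TB size is an assumption.
for label, speed in [
    ("data rebuild at ~80 MB/s", 80),
    ("parity check at ~70 MB/s", 70),
    ("193 MB/s average", 193),
]:
    print(f"{label}: {check_hours(18, speed):.1f} h")
```

Under these assumptions the difference is roughly two and a half to three days per pass versus about one day.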

Link to comment
1 hour ago, wulperdinger said:

The WD Element was just lying around

Let it lie and rest in peace. For the "Elements" line, WD uses its slowest drives. They are meant for backup purposes. WD doesn't even dare to sell them without the Elements enclosure.

They do not like being written to very often and do not like seeking around. The controller inside the Elements enclosure compensates a bit for this with intensive caching.

Without the enclosure, the drives are slow as a dog.

So being able to "shuck" them is a questionable success.

A parity drive needs the exact opposite optimizations.

You MAY use it within the array (but then it will pull down the parity check speed quite a bit too).

The other drives are slow too, but the parity disk sets the upper speed limit for all drives. A faster drive will simply be held back by the parity disk.
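The "upper limit" argument can be sketched as a simple model. It assumes that a parity check or rebuild reads all disks in lockstep, so the pass can go no faster than the slowest disk taking part. The speeds below are hypothetical and are not measurements from this thread; `array_speed` is a made-up helper name.

```python
def array_speed(per_disk_mb_s):
    """Return (name, speed) of the slowest disk, which bounds a lockstep pass."""
    slowest = min(per_disk_mb_s, key=per_disk_mb_s.get)
    return slowest, per_disk_mb_s[slowest]


# Hypothetical sustained speeds in MB/s (illustration only).
disks = {"parity": 80, "disk1": 240, "disk2": 235, "disk3": 180}
print(array_speed(disks))

# Swapping in a faster parity disk moves the bottleneck to the next slowest disk.
faster_parity = dict(disks, parity=250)
print(array_speed(faster_parity))
```

The second call is the interesting part: in this model a faster parity disk only helps up to the speed of the next slowest array member.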

 

Link to comment
1 hour ago, JorgeB said:

Just paste that in the terminal window, and post the output here.

 

 lspci -d 1000: -vv
08:00.0 Serial Attached SCSI controller: Broadcom / LSI SAS3008 PCI-Express Fusion-MPT SAS-3 (rev 02)
        Subsystem: Broadcom / LSI SAS 9300-16i
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 18
        Region 0: I/O ports at 4000
        Region 1: Memory at 42140000 (64-bit, non-prefetchable)
        Region 3: Memory at 42100000 (64-bit, non-prefetchable)
        Expansion ROM at 42000000 [disabled]
        Capabilities: [50] Power Management version 3
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [68] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 4096 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0W
                DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
                        RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
                        MaxPayload 256 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend+
                LnkCap: Port #0, Speed 8GT/s, Width x8, ASPM not supported
                        ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
                LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk-
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 8GT/s, Width x8
                        TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range BC, TimeoutDis+ NROPrPrP- LTR-
                         10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix-
                         EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
                         FRS- TPHComp- ExtTPHComp-
                         AtomicOpsCap: 32bit- 64bit- 128bitCAS-
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- 10BitTagReq- OBFF Disabled,
                         AtomicOpsCtl: ReqEn-
                LnkCap2: Supported Link Speeds: 2.5-8GT/s, Crosslink- Retimer- 2Retimers- DRS-
                LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+ EqualizationPhase1+
                         EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
                         Retimer- 2Retimers- CrosslinkRes: unsupported
        Capabilities: [a8] MSI: Enable- Count=1/1 Maskable+ 64bit+
                Address: 0000000000000000  Data: 0000
                Masking: 00000000  Pending: 00000000
        Capabilities: [c0] MSI-X: Enable+ Count=96 Masked-
                Vector table: BAR=1 offset=0000e000
                PBA: BAR=1 offset=0000f000
        Capabilities: [100 v2] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
                AERCap: First Error Pointer: 00, ECRCGenCap- ECRCGenEn- ECRCChkCap- ECRCChkEn-
                        MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
                HeaderLog: 00000000 00000000 00000000 00000000
        Capabilities: [1e0 v1] Secondary PCI Express
                LnkCtl3: LnkEquIntrruptEn- PerformEqu-
                LaneErrStat: 0
        Capabilities: [1c0 v1] Power Budgeting <?>
        Capabilities: [190 v1] Dynamic Power Allocation <?>
        Capabilities: [148 v1] Alternative Routing-ID Interpretation (ARI)
                ARICap: MFVC- ACS-, Next Function: 0
                ARICtl: MFVC- ACS-, Function Group: 0
        Kernel driver in use: mpt3sas
        Kernel modules: mpt3sas

0a:00.0 Serial Attached SCSI controller: Broadcom / LSI SAS3008 PCI-Express Fusion-MPT SAS-3 (rev 02)
        Subsystem: Broadcom / LSI SAS 9300-16i
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 19
        Region 0: I/O ports at 3000
        Region 1: Memory at 42340000 (64-bit, non-prefetchable)
        Region 3: Memory at 42300000 (64-bit, non-prefetchable)
        Expansion ROM at 42200000 [disabled]
        Capabilities: [50] Power Management version 3
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [68] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 4096 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0W
                DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
                        RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
                        MaxPayload 256 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend+
                LnkCap: Port #0, Speed 8GT/s, Width x8, ASPM not supported
                        ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
                LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk-
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 8GT/s, Width x8
                        TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range BC, TimeoutDis+ NROPrPrP- LTR-
                         10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix-
                         EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
                         FRS- TPHComp- ExtTPHComp-
                         AtomicOpsCap: 32bit- 64bit- 128bitCAS-
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- 10BitTagReq- OBFF Disabled,
                         AtomicOpsCtl: ReqEn-
                LnkCap2: Supported Link Speeds: 2.5-8GT/s, Crosslink- Retimer- 2Retimers- DRS-
                LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+ EqualizationPhase1+
                         EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
                         Retimer- 2Retimers- CrosslinkRes: unsupported
        Capabilities: [a8] MSI: Enable- Count=1/1 Maskable+ 64bit+
                Address: 0000000000000000  Data: 0000
                Masking: 00000000  Pending: 00000000
        Capabilities: [c0] MSI-X: Enable+ Count=96 Masked-
                Vector table: BAR=1 offset=0000e000
                PBA: BAR=1 offset=0000f000
        Capabilities: [100 v2] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
                AERCap: First Error Pointer: 00, ECRCGenCap- ECRCGenEn- ECRCChkCap- ECRCChkEn-
                        MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
                HeaderLog: 00000000 00000000 00000000 00000000
        Capabilities: [1e0 v1] Secondary PCI Express
                LnkCtl3: LnkEquIntrruptEn- PerformEqu-
                LaneErrStat: 0
        Capabilities: [1c0 v1] Power Budgeting <?>
        Capabilities: [190 v1] Dynamic Power Allocation <?>
        Capabilities: [148 v1] Alternative Routing-ID Interpretation (ARI)
                ARICap: MFVC- ACS-, Next Function: 0
                ARICtl: MFVC- ACS-, Function Group: 0
        Kernel driver in use: mpt3sas
        Kernel modules: mpt3sas

 

 

What information can you gather from that?
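One thing that can be read from this output is whether the PCIe link negotiated its full speed and width: `LnkCap` is what the card supports, `LnkSta` is what it is actually running at. The sketch below compares the two for a single device. The sample text is copied from the first controller above; the function names are made up for this example, and the bandwidth figure assumes PCIe 3.0 with 128b/130b encoding and ignores protocol overhead.

```python
import re

# Two lines copied from the lspci output above (controller 08:00.0).
SAMPLE = """\
LnkCap: Port #0, Speed 8GT/s, Width x8, ASPM not supported
LnkSta: Speed 8GT/s, Width x8
"""

# Matches "LnkCap:" and "LnkSta:" only, not "LnkCap2:" or "LnkSta2:".
LINK_RE = re.compile(r"(LnkCap|LnkSta):.*?Speed ([\d.]+)GT/s.*?Width x(\d+)")


def link_report(lspci_text):
    """Return (capable, negotiated), each as a (speed_GT_s, width) tuple."""
    found = {}
    for kind, speed, width in LINK_RE.findall(lspci_text):
        # Keep the first match of each kind, so pass the text of one device.
        found.setdefault(kind, (float(speed), int(width)))
    return found["LnkCap"], found["LnkSta"]


def is_downgraded(lspci_text):
    """True if the negotiated speed or width is below what the card supports."""
    cap, sta = link_report(lspci_text)
    return sta[0] < cap[0] or sta[1] < cap[1]


def gen3_link_gb_s(width):
    """Raw PCIe 3.0 bandwidth in GB/s: 8 GT/s per lane, 128b/130b encoding."""
    return 8 * (128 / 130) / 8 * width


cap, sta = link_report(SAMPLE)
print("capable:", cap, "negotiated:", sta, "downgraded:", is_downgraded(SAMPLE))
print(f"raw x{sta[1]} bandwidth: {gen3_link_gb_s(sta[1]):.2f} GB/s")
```

Both controllers in the output report 8GT/s at x8 for `LnkCap` and `LnkSta`, so under these assumptions the link itself offers several GB/s, far more than disks running at 70-80 MB/s each would need.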

Link to comment
2 hours ago, wulperdinger said:

Do you have any recommendations for faster drives with a capacity of 18 TB+?

"Recommendations" is saying too much. I used to use WD Red Pro drives (not the slow non-Pro ones), but then I found Toshiba ones. Those are a bit faster and also a lot cheaper.

No problems so far (they have been running for 2 years now).

grafik.png.110c8228880d95d04ec16a3823fd9a25.png

But then, others may have other recommendations. As long as a drive runs at 7200 rpm, uses CMR (and not SMR!) recording, and is "NAS-approved", it should work and be quite fast.

CMR ensures fast write speeds; "NAS-approved" means that the drive detects vibrations from other disks in the same enclosure and compensates for them.

Link to comment

Okay, does anyone actually have a remedy for the WD drives? I have a similar issue with 52 TB across 13 drives in a mixed environment of 7200 and 5400 rpm disks. Is it possible to bypass the parity/data rebuild after a failed drive? Can these checks be done separately, and if so, in what order?

Link to comment
On 5/22/2024 at 4:36 PM, JorgeB said:

HBA is not downgraded, you can run the diskspeed container test to see if all disks are performing normally.

Should I downgrade the HBA software?

 

Here is a screenshot of my disk speeds:

benchmark-speeds.thumb.png.c7fa098033262299c462e79c117fa659.png

 

I switched out the parity disk, but the parity check speed is still very slow:

1.PNG.99b158af84df912edbd80ca6d24f5933.PNG

 

2.PNG.bc84937a19d87ad604100f69012a138c.PNG

Link to comment
On 5/23/2024 at 5:30 AM, roberth58 said:

You are running a SAS3 controller, use SAS3 drives for your parity. Right now you have a ferrari controller hooked up to moped parity drives.

I switched the parity drive, but the speed is still very slow (see pictures above).

Link to comment

Did you check that nothing else is accessing the disks?

It may seem like a stupid question, but a few weeks ago the same kind of thing happened to me, and the culprit was my Time Machine backup, which was running at the same time.

The parity rebuild was stuck at a low speed (12 MB/s). The speed went up a few minutes after I cancelled the Time Machine backup.
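One way to check for competing disk access is to take two snapshots of `/proc/diskstats` a few seconds apart and compare the sector counters, which Linux reports in 512-byte units. The sketch below does this on two made-up snapshots (the device names and numbers are illustrative, not from this server); `parse_diskstats` and `throughput_mb_s` are hypothetical helper names.

```python
SECTOR_BYTES = 512  # /proc/diskstats counts sectors in 512-byte units


def parse_diskstats(text):
    """Map device name -> (sectors_read, sectors_written)."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 10:
            continue  # skip blank or truncated lines
        # fields: major minor name reads merged sectors_read ms writes merged sectors_written ...
        stats[fields[2]] = (int(fields[5]), int(fields[9]))
    return stats


def throughput_mb_s(before, after, seconds):
    """Per-device (read MB/s, write MB/s) between two snapshots."""
    result = {}
    for dev, (read1, written1) in after.items():
        if dev not in before:
            continue
        read0, written0 = before[dev]
        result[dev] = (
            (read1 - read0) * SECTOR_BYTES / 1e6 / seconds,
            (written1 - written0) * SECTOR_BYTES / 1e6 / seconds,
        )
    return result


# Made-up snapshots taken 10 seconds apart (illustration only).
BEFORE = """\
   8      16 sdb 5000 0 1000000 900 100 0 8000 50 0 700 950
   8      32 sdc 10 0 800 5 200 0 0 40 0 30 45
"""
AFTER = """\
   8      16 sdb 9000 0 2562500 1800 100 0 8000 50 0 1400 1850
   8      32 sdc 10 0 800 5 900 0 234375 90 0 80 95
"""

rates = throughput_mb_s(parse_diskstats(BEFORE), parse_diskstats(AFTER), 10)
for dev, (rd, wr) in rates.items():
    print(f"{dev}: read {rd:.1f} MB/s, write {wr:.1f} MB/s")
```

On a real system the two snapshots would come from reading `/proc/diskstats` twice with a pause in between. A disk showing write traffic during a parity check, or a disk outside the array being busy on the same controller, would be a hint that something else is competing for I/O.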

Link to comment
11 hours ago, caplam said:

Did you check that nothing else is accessing the disks?

It may seem like a stupid question, but a few weeks ago the same kind of thing happened to me, and the culprit was my Time Machine backup, which was running at the same time.

The parity rebuild was stuck at a low speed (12 MB/s). The speed went up a few minutes after I cancelled the Time Machine backup.

I only have binhex Krusader, DiskSpeed and Plex, and they are all turned off while the parity check is running.

Link to comment
13 hours ago, JorgeB said:

Post the test results, looks like this:

It took me a few minutes to figure out how to do that.

controller-benchmark2.thumb.PNG.0ed891713266bea97d4b0e0adc77930a.PNG
controller-benchmark1.thumb.PNG.a13224265eba2edbe162e3e9323d5faf.PNG

It kinda makes sense now, somehow. Is there a reason for the low average speed with all drives active?

Edit: a preclear was running on sdc while the benchmark test was done.

Edited by wulperdinger
Link to comment
8 minutes ago, JorgeB said:

It shows the problem, but doesn't really make sense, the disks are connected directly to the HBA right? No enclosure/expander?

 

A preclear is currently running on sdc (I started it long after I started the parity check), but even when it isn't running, the speed never exceeds 60-80 MB/s.

All disks are connected via the HBA only.

 

Edited by wulperdinger
Link to comment
