
Slow Parity Sync - 2-8 MB/s



Hey, I'm trying to add a parity disk to the array but can't get over how slow it is - the estimate swings between 40 and 600 days. I've tried a few things: stopped Docker and the VMs, made sure nothing is hitting it over SMB. It's a fairly big array, mind you, but surely it can do better than that. One thing of note: I've got SSDs and NVMes in the main array, but if anything I'd expect those to be the last ones to cause issues. I left it running for almost 2 days and got 1-2% done, stopped it and tried again with the same result. I ran a disk check on all drives with no visible issues; all disks do at least 150 MB/s at the start of the platter and slowly fall off to around 100 MB/s, with the new parity drive performing best of the lot, and the SSDs/NVMes doing over 1 GB/s from cache.

 

I moved the cache just in case, and there are no other reads/writes on the array. I have some old plugins, but I don't think they should interfere. I also tried playing with the tunables, with marginal or no improvement (left them at default in the end).

 

Attached diags if anyone could help me out please.

 

image.png

 

Edited by Mizerka
Link to comment

That could've been unbalance - disks 25/26 are crap spare WD Blue SSDs, so I was moving data off them to take them out of the array. But unbalance was happily doing 60-100 MB/s reads on them, so I don't think it was the disks' fault.

 

But that's gone now as far as I can tell - the Main tab was showing zero activity before I started this latest sync. I left it to run overnight and got 1.7% done. I then started up all my normal Docker stuff and saw zero impact; if anything it's gone better overnight with Docker enabled, averaging 5 MB/s now.

 

It feels like something is holding it back. I know the sync is meant to be limited by the slowest disk's reads, but they're all good - at least 100 MB/s across the platter good, not 3-5.

 

Maybe worth noting I recently upgraded the hardware, going from dual Xeons on an X9 board to a Rome 7551P on X11. Maybe some legacy cruft carried over from the old platform to the current install? Would it be worth trying to downgrade the OS to see if that fixes anything?

image.png

Link to comment
Posted (edited)

Okay, back to VMs and Docker off. I removed unbalance and some other plugins I've not used in ages, and removed my SMB mount. Still seeing the same speeds. Can you confirm where you're seeing the disk traffic in the diags? I can't think of anything else that might be touching the disks - is there any command I can run to spit out usage logs? Could Unraid Connect be doing it? I doubt it, but I'm running out of ideas.
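One crude thing I could run myself, I suppose, is diffing /proc/diskstats over a few seconds to see which devices are actually busy - just a rough sketch, no extra tools needed:

cat /proc/diskstats > /tmp/ds1; sleep 10; cat /proc/diskstats > /tmp/ds2; diff /tmp/ds1 /tmp/ds2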

 

Attached the post-change diags. Still 6 MB/s.

 

Looking at what's actually on disk 26, there's only appdata share data on there.

 

Edited by Mizerka
Link to comment

I was seeing it in loads.txt, but there are no extra reads in the latest diags, so the limit may be caused by something else, like a slow disk. It doesn't look like a controller bottleneck - it's too low for that - but post the output of:

 

lspci -d 1000: -vv
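If the full dump is too long, the link lines are the main thing - something like this should do, assuming plain grep:

lspci -d 1000: -vv | grep -E 'Subsystem|LnkCap|LnkSta'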

 

Link to comment
~# lspci -d 1000: -vv
25:00.0 Serial Attached SCSI controller: Broadcom / LSI SAS2116 PCI-Express Fusion-MPT SAS-2 [Meteor] (rev 02)
        Subsystem: Broadcom / LSI 9201-16e 6Gb/s SAS/SATA PCIe x8 External HBA
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 428
        NUMA node: 1
        IOMMU group: 52
        Region 0: I/O ports at 4000 [size=256]
        Region 1: Memory at eb79c000 (64-bit, non-prefetchable) [size=16K]
        Region 3: Memory at eb740000 (64-bit, non-prefetchable) [size=256K]
        Expansion ROM at eb500000 [disabled] [size=512K]
        Capabilities: [50] Power Management version 3
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [68] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 4096 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0W
                DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
                        RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
                        MaxPayload 512 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend-
                LnkCap: Port #0, Speed 5GT/s, Width x8, ASPM L0s, Exit Latency L0s <64ns
                        ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
                LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 5GT/s, Width x8
                        TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range BC, TimeoutDis+ NROPrPrP- LTR-
                         10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix-
                         EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
                         FRS- TPHComp- ExtTPHComp-
                         AtomicOpsCap: 32bit- 64bit- 128bitCAS-
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- 10BitTagReq- OBFF Disabled,
                         AtomicOpsCtl: ReqEn-
                LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
                         EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
                         Retimer- 2Retimers- CrosslinkRes: unsupported
        Capabilities: [d0] Vital Product Data
pcilib: sysfs_read_vpd: read failed: No such device
                Not readable
        Capabilities: [a8] MSI: Enable- Count=1/1 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
        Capabilities: [c0] MSI-X: Enable+ Count=15 Masked-
                Vector table: BAR=1 offset=00002000
                PBA: BAR=1 offset=00003800
        Capabilities: [100 v1] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
                AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
                        MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
                HeaderLog: 04000001 20002103 25010000 dd1aa418
        Capabilities: [138 v1] Power Budgeting <?>
        Capabilities: [150 v1] Single Root I/O Virtualization (SR-IOV)
                IOVCap: Migration- 10BitTagReq- Interrupt Message Number: 000
                IOVCtl: Enable- Migration- Interrupt- MSE- ARIHierarchy+ 10BitTagReq-
                IOVSta: Migration-
                Initial VFs: 7, Total VFs: 7, Number of VFs: 0, Function Dependency Link: 00
                VF offset: 1, stride: 1, Device ID: 0064
                Supported Page Size: 00000553, System Page Size: 00000001
                Region 0: Memory at 00000000eb780000 (64-bit, non-prefetchable)
                Region 2: Memory at 00000000eb580000 (64-bit, non-prefetchable)
                VF Migration: offset: 00000000, BIR: 0
        Capabilities: [190 v1] Alternative Routing-ID Interpretation (ARI)
                ARICap: MFVC- ACS-, Next Function: 0
                ARICtl: MFVC- ACS-, Function Group: 0
        Kernel driver in use: mpt3sas
        Kernel modules: mpt3sas

61:00.0 Serial Attached SCSI controller: Broadcom / LSI SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] (rev 03)
        Subsystem: Broadcom / LSI 9211-8i
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 56
        NUMA node: 3
        IOMMU group: 80
        Region 0: I/O ports at 6000 [size=256]
        Region 1: Memory at cf300000 (64-bit, non-prefetchable) [size=16K]
        Region 3: Memory at cf280000 (64-bit, non-prefetchable) [size=256K]
        Expansion ROM at cee00000 [disabled] [size=512K]
        Capabilities: [50] Power Management version 3
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [68] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 4096 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0W
                DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
                        RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
                        MaxPayload 512 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend-
                LnkCap: Port #0, Speed 5GT/s, Width x8, ASPM L0s, Exit Latency L0s <64ns
                        ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
                LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 5GT/s, Width x8
                        TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range BC, TimeoutDis+ NROPrPrP- LTR-
                         10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix-
                         EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
                         FRS- TPHComp- ExtTPHComp-
                         AtomicOpsCap: 32bit- 64bit- 128bitCAS-
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- 10BitTagReq- OBFF Disabled,
                         AtomicOpsCtl: ReqEn-
                LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
                         EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
                         Retimer- 2Retimers- CrosslinkRes: unsupported
        Capabilities: [d0] Vital Product Data
pcilib: sysfs_read_vpd: read failed: No such device
                Not readable
        Capabilities: [a8] MSI: Enable- Count=1/1 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
        Capabilities: [c0] MSI-X: Enable+ Count=15 Masked-
                Vector table: BAR=1 offset=00002000
                PBA: BAR=1 offset=00003800
        Capabilities: [100 v1] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
                AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
                        MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
                HeaderLog: 04000001 60002103 61010000 06a3f758
        Capabilities: [138 v1] Power Budgeting <?>
        Capabilities: [150 v1] Single Root I/O Virtualization (SR-IOV)
                IOVCap: Migration- 10BitTagReq- Interrupt Message Number: 000
                IOVCtl: Enable- Migration- Interrupt- MSE- ARIHierarchy+ 10BitTagReq-
                IOVSta: Migration-
                Initial VFs: 16, Total VFs: 16, Number of VFs: 0, Function Dependency Link: 00
                VF offset: 1, stride: 1, Device ID: 0072
                Supported Page Size: 00000553, System Page Size: 00000001
                Region 0: Memory at 00000000cf2c0000 (64-bit, non-prefetchable)
                Region 2: Memory at 00000000cee80000 (64-bit, non-prefetchable)
                VF Migration: offset: 00000000, BIR: 0
        Capabilities: [190 v1] Alternative Routing-ID Interpretation (ARI)
                ARICap: MFVC- ACS-, Next Function: 0
                ARICtl: MFVC- ACS-, Function Group: 0
        Kernel driver in use: mpt3sas
        Kernel modules: mpt3sas


I doubt it's the controllers. They won't be able to saturate SATA3 on all disks, but it's not even close at the moment - when I ran speed tests they were doing over 1 GB/s combined across various disks over a period, and that was probably with Docker running as well.

 

One thing I'm wondering: I've got the appdata share inside the array (on the SSDs/NVMes), and the Plex appdata in particular is terrible - it has several million folders and files; I can't remember exactly how many, but it's a big Plex install. Perhaps the sync is slowing down simply because IO is terrible reading all these tiny files? Then again, I'd imagine parity just does raw block reads and doesn't care much about the data.
 

image.png

Link to comment
Posted (edited)

Also, I ran all disks through the DiskSpeed docker and they came back as expected. It complained about not seeing the docker volume mount or something, but that's probably a dev issue and/or 6.12.9. I didn't grab screenshots, but all drives did 200 down to 100 MB/s across the span of the platter, with the SSDs doing better and the NVMes happily doing 1.5 GB/s.

 

The Unraid CLI can do a quick check with hdparm, right? From the Unraid docs:
for ((i=0;i<12;i++)) do hdparm -tT /dev/hda; done
Worth pausing the parity sync and running it across all disks? Something like the loop below, I guess.
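(Just a sketch - assumes all the array drives show up as /dev/sd*; the glob would need tweaking if any device names go past sdz, and the SATA SSDs get swept up too, which is harmless:)

for d in /dev/sd[a-z]; do echo "== $d =="; hdparm -t "$d"; done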

Edited by Mizerka
Link to comment

Controllers look normal, and if the disks are performing normally with DiskSpeed I'm not sure what it could be. You could try syncing parity with just half the array, then the other half, and see if there's a difference; if there's not, try a different disk as parity.

Link to comment

Did that anyway, and this might be the reason - all disks come back at around 200 MB/s (not accurate, but close enough), except this one:

 

root@NekoUnRaid:/dev# for ((i=0;i<1;i++)) do hdparm -tT /dev/sds; done

/dev/sds:
 Timing cached reads:   10762 MB in  1.99 seconds = 5396.88 MB/sec
 Timing buffered disk reads:  16 MB in  3.20 seconds =   5.01 MB/sec

 

sds being disk 26, sob. Getting unbalance back on, moving the data off it, and I'll see how it behaves afterwards.

image.png
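(If unbalance plays up, it's basically rsync under the hood anyway, so a rough by-hand equivalent would be something like this - disk1 is purely an example target, pick whichever data disk has the space:)

rsync -avX --remove-source-files /mnt/disk26/ /mnt/disk1/

(--remove-source-files leaves the empty directory tree behind, which doesn't matter if the disk is coming out anyway.)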

 

Link to comment

I want to replace 25 and 26 either way - 25 was already cleared and sitting empty. I've had no parity for over 3 years, so it can wait a day while I remove them. Thanks for helping btw JorgeB; I'll drop an update, probably tomorrow, with how it turns out - it's about halfway done.

Link to comment

Yeah, that was it - on the shelf they go, never to be used again I guess. Getting roughly 100 MB/s parity writes with 2.3 GB/s of array reads at the moment. Feels like it could do better, but this time it might actually be capping the controller. A mere 2 days to complete.

 

Probably not the right forum for it, but it would be nice to have a "quick read test" button in the GUI for things like this, given how simple a command it is.

 

Again, thanks @JorgeB for taking a look.

image.png

Link to comment
Posted (edited)

I've got 8 devices on the LSI 9211-8i and 16 on the 9201-16e; both are in PCIe 3.0 x8 slots, from memory. Then there are 4 NVMes in an x4/x4/x4/x4 bifurcated slot and another NVMe in the board's M.2 slot (which I believe also eats 4 lanes electrically). It's on an H11SSL-i platform with 128 GB of RAM.

 

Adjusted the tunables a bit and got a bit more juice out of it: changing md stripes from the default 1280 to 8192 got me another 10 MB/s or so; changing IO to 90% didn't seem to do much (not brave enough for 100%), but maybe the effect is too small to notice. So the auto-tuning, or whatever it is, doesn't seem that great, or at least isn't optimising very much.

nekounraid-diagnostics-20240430-1053.zip

Edited by Mizerka
Link to comment
Posted (edited)

Array reads improved, but only temporarily? Interesting. Still worse than where it started - it got a bit wobbly for a while but is now back to a stable 1.8 GB/s. I assume Unraid is doing some stability magic in the background (not certain), or the controller isn't happy. At this point it's past the furthest it's ever got on the new parity disk. Still 20x better than before, but about half the expected read capacity of the disks.

 

On the graph: it started at around 2.3 GB/s, went down to 1.8, came back up to 2 after the tunable changes, wobbled a bit, and is now at 1.9.

 

 

rr.png

Edited by Mizerka
Link to comment
48 minutes ago, Mizerka said:

and 16 on 9201 16e

This will be the main bottleneck: max usable bandwidth is around 1600 MB/s, so with 16 devices that's about 100 MB/s per device.

 

Tunables should not make much of a difference with current releases, they did before 6.8, or maybe 6.7.

 
Link to comment
Posted (edited)

Yeah, figured as much. Hmm, I might have 2 spare 8i's somewhere now that I have spare PCIe lanes. I could also move disks around the hot-swap bays more strategically, so that as the sync stops reading the lower-capacity disks it balances the two controllers better without capping either out. I'll live with the 80-100 it's doing at the moment; it'll take forever either way. And yeah, I did actually read your posts on the tunables stuff - weird that it made a difference, albeit briefly, right after changing poll and num stripes.
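(For the bay shuffling, this lists which sdX hangs off which HBA - the pci-0000:25:00.0 and pci-0000:61:00.0 entries line up with the two controllers in the lspci dump above; just a quick way to map it out:)

ls -l /dev/disk/by-path/ | grep -v part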

 

For now it continues at 1.8-1.9 GB/s with 80 MB/s writes to parity, 2% CPU (crypto) and 2.3 GB of RAM in use. Happy enough for now, thanks.

image.png

Edited by Mizerka
Link to comment
