Mizerka Posted April 28

Hey, trying to add a parity disk to the array, but I just can't get over how slow it is: estimates swing from 40 to 600 days. I've tried a few things, then a few more; stopped Docker and the VMs, and there's no SMB traffic hitting it. It's a fairly big array, mind you, but surely it can do better than that. One thing of note: I've got SSDs and NVMe drives in the main array, but if anything I'd expect those to be the last ones causing issues. I left it running for almost 2 days, got 1-2% done, stopped it and tried again with the same result. I ran a disk check on all drives with no visible issues; every disk does at least 150 MB/s at the start of the platter and slowly falls off to around 100, with the new parity drive performing best of the lot and the SSDs/NVMes doing over 1 GB/s within cache. The cache was moved just in case, there are no other reads/writes on the array, and I have some old plugins, but I don't think they should interfere. Tried playing with the tunables with marginal or no improvement (left them at default in the end). Diags attached if anyone could help me out, please.
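In case it helps, this is roughly how I've been watching progress from the console rather than the GUI. I'm assuming Unraid's md driver still exposes the resync counters in /proc/mdstat and via mdcmd, so treat it as a sketch rather than gospel:

# parity sync counters, if the mdResync fields are still there
grep -i resync /proc/mdstat
# same information through Unraid's own md command
mdcmd status | grep -i resync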
JorgeB Posted April 29

There appears to be something else reading from disk26; try stopping all other activity.
Mizerka Posted April 29 Author

That could have been Unbalance: 25/26 are crap spare WD Blue SSDs, so I was moving data off them to take them out of the array, but Unbalance would happily do 60-100 MB/s on reads from them, so I don't think it was the disks' fault. That's gone now as far as I can tell; the Main tab was showing zero activity before I started this latest run. Left it overnight and got 1.7% done. Then started up all my normal Docker stuff with zero impact; if anything it went better overnight with Docker enabled, averaging 5 MB/s now. It feels like something is holding it back. I know the sync is meant to be limited by the slowest disk's reads, but they're all good, I mean at least 100 MB/s across the platter good, not 3-5. Maybe worth noting I recently upgraded hardware and went from dual Xeons on an X9 platform to an EPYC 7551P; maybe it's some legacy cruft that carried over from the old platform to the current install? Would it be worth trying to downgrade the OS and see if it fixes any components?
Mizerka Posted April 29 Author

Attached. I can kill Docker and uninstall any plugins if you believe they have an impact on this.
JorgeB Posted April 29

There's still something reading from the array; if you don't know what it is, try stopping the Docker service.
Mizerka Posted April 29 Author

Okay, back to VMs and Docker off, removed Unbalance and some other plugins I haven't used in ages, and removed my SMB mount. Still seeing the same speeds. Can you confirm where you're seeing the disk traffic in the diags? I can't think of anything else that might be touching the disk; any command I can run to spit out usage logs, maybe? Could Unraid Connect be doing it? I doubt it, but I'm running out of ideas. Post-change diags attached. Still 6 MB/s. Looking at what's actually on 26, there's only appdata share data on there.
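For the "any command" bit, I'm guessing something along these lines would show whether anything still has files open on disk26 or is pushing reads at it. lsof ships with Unraid as far as I know, and /proc/diskstats is just the kernel's raw counters, so this is only a rough sketch (sdX below is a placeholder for whatever device disk26 maps to):

# anything with an open file on the disk26 mount?
lsof /mnt/disk26
# sectors read from the raw device, sampled twice 10s apart; if the number climbs, something is reading it
awk '$3=="sdX" {print $6}' /proc/diskstats; sleep 10; awk '$3=="sdX" {print $6}' /proc/diskstats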
JorgeB Posted April 29

I was seeing it in loads.txt, but there are no extra reads in the latest diags, so the limit may be caused by something else, like a slow disk. It doesn't look like a controller bottleneck (it's too low for that), but post the output of:

lspci -d 1000: -vv
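The parts that matter are really just the controller lines and the negotiated link speed/width, so if the full dump is unwieldy, something like this filter (my own shortcut, same command underneath) pulls out only those:

# advertised (LnkCap) vs negotiated (LnkSta) PCIe link for the LSI HBAs
lspci -d 1000: -vv | grep -E 'SAS|LnkCap:|LnkSta:'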
Mizerka Posted April 29 Author Share Posted April 29 ~# lspci -d 1000: -vv 25:00.0 Serial Attached SCSI controller: Broadcom / LSI SAS2116 PCI-Express Fusion-MPT SAS-2 [Meteor] (rev 02) Subsystem: Broadcom / LSI 9201-16e 6Gb/s SAS/SATA PCIe x8 External HBA Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0, Cache Line Size: 64 bytes Interrupt: pin A routed to IRQ 428 NUMA node: 1 IOMMU group: 52 Region 0: I/O ports at 4000 [size=256] Region 1: Memory at eb79c000 (64-bit, non-prefetchable) [size=16K] Region 3: Memory at eb740000 (64-bit, non-prefetchable) [size=256K] Expansion ROM at eb500000 [disabled] [size=512K] Capabilities: [50] Power Management version 3 Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-) Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME- Capabilities: [68] Express (v2) Endpoint, MSI 00 DevCap: MaxPayload 4096 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0W DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq- RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset- MaxPayload 512 bytes, MaxReadReq 512 bytes DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend- LnkCap: Port #0, Speed 5GT/s, Width x8, ASPM L0s, Exit Latency L0s <64ns ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp- LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+ ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt- LnkSta: Speed 5GT/s, Width x8 TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt- DevCap2: Completion Timeout: Range BC, TimeoutDis+ NROPrPrP- LTR- 10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix- EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit- FRS- TPHComp- ExtTPHComp- AtomicOpsCap: 32bit- 64bit- 128bitCAS- DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- 10BitTagReq- OBFF Disabled, AtomicOpsCtl: ReqEn- LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis- Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS- Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1- EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest- Retimer- 2Retimers- CrosslinkRes: unsupported Capabilities: [d0] Vital Product Data pcilib: sysfs_read_vpd: read failed: No such device Not readable Capabilities: [a8] MSI: Enable- Count=1/1 Maskable- 64bit+ Address: 0000000000000000 Data: 0000 Capabilities: [c0] MSI-X: Enable+ Count=15 Masked- Vector table: BAR=1 offset=00002000 PBA: BAR=1 offset=00003800 Capabilities: [100 v1] Advanced Error Reporting UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol- CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+ CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+ AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn- MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap- HeaderLog: 04000001 20002103 25010000 dd1aa418 Capabilities: [138 v1] Power Budgeting <?> Capabilities: [150 v1] Single Root I/O Virtualization (SR-IOV) IOVCap: Migration- 10BitTagReq- 
Interrupt Message Number: 000 IOVCtl: Enable- Migration- Interrupt- MSE- ARIHierarchy+ 10BitTagReq- IOVSta: Migration- Initial VFs: 7, Total VFs: 7, Number of VFs: 0, Function Dependency Link: 00 VF offset: 1, stride: 1, Device ID: 0064 Supported Page Size: 00000553, System Page Size: 00000001 Region 0: Memory at 00000000eb780000 (64-bit, non-prefetchable) Region 2: Memory at 00000000eb580000 (64-bit, non-prefetchable) VF Migration: offset: 00000000, BIR: 0 Capabilities: [190 v1] Alternative Routing-ID Interpretation (ARI) ARICap: MFVC- ACS-, Next Function: 0 ARICtl: MFVC- ACS-, Function Group: 0 Kernel driver in use: mpt3sas Kernel modules: mpt3sas 61:00.0 Serial Attached SCSI controller: Broadcom / LSI SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] (rev 03) Subsystem: Broadcom / LSI 9211-8i Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0, Cache Line Size: 64 bytes Interrupt: pin A routed to IRQ 56 NUMA node: 3 IOMMU group: 80 Region 0: I/O ports at 6000 [size=256] Region 1: Memory at cf300000 (64-bit, non-prefetchable) [size=16K] Region 3: Memory at cf280000 (64-bit, non-prefetchable) [size=256K] Expansion ROM at cee00000 [disabled] [size=512K] Capabilities: [50] Power Management version 3 Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-) Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME- Capabilities: [68] Express (v2) Endpoint, MSI 00 DevCap: MaxPayload 4096 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0W DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq- RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset- MaxPayload 512 bytes, MaxReadReq 512 bytes DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend- LnkCap: Port #0, Speed 5GT/s, Width x8, ASPM L0s, Exit Latency L0s <64ns ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp- LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+ ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt- LnkSta: Speed 5GT/s, Width x8 TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt- DevCap2: Completion Timeout: Range BC, TimeoutDis+ NROPrPrP- LTR- 10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix- EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit- FRS- TPHComp- ExtTPHComp- AtomicOpsCap: 32bit- 64bit- 128bitCAS- DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- 10BitTagReq- OBFF Disabled, AtomicOpsCtl: ReqEn- LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis- Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS- Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1- EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest- Retimer- 2Retimers- CrosslinkRes: unsupported Capabilities: [d0] Vital Product Data pcilib: sysfs_read_vpd: read failed: No such device Not readable Capabilities: [a8] MSI: Enable- Count=1/1 Maskable- 64bit+ Address: 0000000000000000 Data: 0000 Capabilities: [c0] MSI-X: Enable+ Count=15 Masked- Vector table: BAR=1 offset=00002000 PBA: BAR=1 offset=00003800 Capabilities: [100 v1] Advanced Error Reporting UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- UESvrt: 
DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol- CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+ CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+ AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn- MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap- HeaderLog: 04000001 60002103 61010000 06a3f758 Capabilities: [138 v1] Power Budgeting <?> Capabilities: [150 v1] Single Root I/O Virtualization (SR-IOV) IOVCap: Migration- 10BitTagReq- Interrupt Message Number: 000 IOVCtl: Enable- Migration- Interrupt- MSE- ARIHierarchy+ 10BitTagReq- IOVSta: Migration- Initial VFs: 16, Total VFs: 16, Number of VFs: 0, Function Dependency Link: 00 VF offset: 1, stride: 1, Device ID: 0072 Supported Page Size: 00000553, System Page Size: 00000001 Region 0: Memory at 00000000cf2c0000 (64-bit, non-prefetchable) Region 2: Memory at 00000000cee80000 (64-bit, non-prefetchable) VF Migration: offset: 00000000, BIR: 0 Capabilities: [190 v1] Alternative Routing-ID Interpretation (ARI) ARICap: MFVC- ACS-, Next Function: 0 ARICtl: MFVC- ACS-, Function Group: 0 Kernel driver in use: mpt3sas Kernel modules: mpt3sas I doubt its the controllers, they wont be able to saturate sata3 on all disks but its not even close atm, when I ran speed tests they were doing over1g over some period across various disks, that was probably with docker running as well; one thing im wondering, I've got appdata share inside array (on ssd/nvmes), the plex appdata is particularly terrible, in that is has several million folders and files, cant remember how much exactly but its a big plex install atm, perhaps its slowing down due to just io being terrible reading all these tiny files? then again I'd imagine parity does just bit reads and doesnt care much about data. Quote Link to comment
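If the parity build really is just raw block reads, a plain sequential read straight off the device should show the same ceiling regardless of how many tiny files are sitting on it. A quick sanity check I could run (with the sync paused so they don't fight each other; /dev/sdX is a placeholder for whichever array disk):

# read 2 GiB straight off the raw device, bypassing the page cache
dd if=/dev/sdX of=/dev/null bs=1M count=2048 iflag=direct status=progress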
Mizerka Posted April 29 Author

Also, I ran all disks through the DiskSpeed docker and they came back as expected. It complained about not seeing the Docker volume mount or something, but that's probably a dev issue and/or 6.12.9. Didn't grab screenshots, but all were 200-100 MB/s over the span of the platter, with the SSDs doing better and the NVMes happily doing 1.5 GB/s.

The Unraid CLI can do a quick check with hdparm, right? From the Unraid docs:

for ((i=0;i<12;i++)) do hdparm -tT /dev/hda; done

Worth pausing the parity sync and running it across all disks?
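Something like the below ought to loop that same check over every sd device in one go. It will also hit the SSDs and the USB flash drive, and hdparm numbers are only rough, so pausing the sync first and treating the results as ballpark seems sensible (this is just my own adaptation of the docs one-liner):

# rough sequential read check on every sd device
for d in /dev/sd[a-z] /dev/sd[a-z][a-z]; do
  [ -b "$d" ] || continue
  echo "== $d =="
  hdparm -t "$d"
done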
JorgeB Posted April 29

Controllers look normal, but if the disks are performing normally with DiskSpeed I'm not sure what it could be. You could try syncing parity with just half the array, then the other half, and see if there's a difference; if there isn't, try a different disk as parity.
Mizerka Posted April 29 Author

Did that anyway, and this might be the reason. All the disks come back at 200 MB/s-ish (not accurate, but close enough) except this one:

root@NekoUnRaid:/dev# for ((i=0;i<1;i++)) do hdparm -tT /dev/sds; done
/dev/sds:
 Timing cached reads: 10762 MB in 1.99 seconds = 5396.88 MB/sec
 Timing buffered disk reads: 16 MB in 3.20 seconds = 5.01 MB/sec

sds being disk 26, sob. Getting Unbalance back on, moving data off it, and we'll see how it behaves afterwards.
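Probably also worth a quick look at SMART on it before it goes on the shelf. Something like this should surface the usual suspects (reallocated/pending sectors, CRC errors), assuming it reports standard SATA attributes; just a rough check, not conclusive either way:

# overall health verdict plus the attributes that usually explain a drive crawling like this
smartctl -H /dev/sds
smartctl -A /dev/sds | grep -iE 'realloc|pending|uncorrect|crc'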
JorgeB Posted April 29

2 hours ago, Mizerka said: "sds being disk 26"

Yeah, that one could be the problem.
ChatNoir Posted April 29

You say that you want to remove it from the array. Why sync the array with disk26 IN the array? Keep it outside (unassigned if need be) and sync with just the rest.
Mizerka Posted April 29 Author

I want to replace 25 and 26 either way; 25 was already cleared and sitting empty. I've had no parity for over 3 years, so it can wait a day for me to remove them. Thanks for helping btw, JorgeB. I'll drop an update probably tomorrow with how it turns out; it's about halfway done.
Mizerka Posted April 30 Author

Yeah, that was it. Onto the shelf they go, never to be used, I guess. Getting roughly 100 MB/s parity write with 2.3 GB/s array read at the moment; feels like it could do better, but this time it might be capping the controller. A mere 2 days to complete. Probably not the best forum for it, but it would be nice to have a "quick read test" button in the GUI for things like this, given how simple a command it is. Again, thanks @JorgeB for taking a look.
JorgeB Posted April 30

I don't remember how many devices were connected to the HBA, and the diags were removed; do you have an expander, or are they connected directly?
Mizerka Posted April 30 Author

Got 8 devices on the LSI 9211-8i and 16 on the 9201-16e; both are in PCIe 3.0 x8 slots from memory. Then 4 NVMes in an x4/4/4/4 slot and one NVMe in the PCIe M.2 (which I believe also eats 4 lanes electrically). This is on an H11SSL-i platform with 128 GB of RAM.

Adjusted the tunables a bit and got a little more juice out of it: changing the md stripes tunable from the default 1280 to 8192 got me another 10 MB/s or so, and changing IO to 90% didn't seem to do much (not brave enough for 100%), but maybe the gain is too small to notice. So the auto-tuning, or whatever it is, doesn't seem that great, or at least isn't optimising very much.

nekounraid-diagnostics-20240430-1053.zip
Mizerka Posted April 30 Author

Array reads improved, but only temporarily? Interesting. Still worse than what it started at; it got a bit wobbly for a while but is now back to a stable 1.8 GB/s. I assume Unraid is doing some stability magic in the background (not certain), or the controller isn't happy. At this point it's past the furthest it has ever got on the new parity disk. Still 20x better than before, but about half the expected read capacity of the disks. On the graph it started at 2.3-ish, went down to 1.8, came back to 2.0 after the tunable changes, wobbled a bit, and is now at 1.9.
JorgeB Posted April 30

48 minutes ago, Mizerka said: "and 16 on 9201 16e"

This will be the main bottleneck: max usable bandwidth is around 1600 MB/s, so with 16 devices that's 100 MB/s per device. Tunables should not make much of a difference with current releases; they did before 6.8, or maybe 6.7.
Mizerka Posted April 30 Author

Yeah, figured as much. Hmm, I might have a couple of spare 8i cards somewhere, now that I have a spare PCIe slot. I could also move disks around the hot-swap bays more strategically, so that as the sync stops reading the lower-capacity disks it can balance the two controllers better without capping out. I'll live with the 80-100 it's doing at the moment; it'll take forever either way. And yeah, I actually read your posts on the tunable stuff; weird that it made a difference, albeit briefly, just after changing poll and num stripes. For now it continues at 1.8-1.9 GB/s with 80 MB/s write to parity, 2% CPU (crypto), and 2.3 GB of RAM in use. Happy enough for now, thanks.