Jetro Posted January 24

Hello everyone, I'm moving a lot of files for a friend, from 6 TrueNAS devices to 2 Unraid servers. For now only the first one is alive, with the following hardware:

- Supermicro X10DRi motherboard
- Supermicro controller in HBA mode (don't remember the exact model)
- 416 GB ECC RAM (should be 512 GB but we've got some defective modules; will go down to 256 GB with the second Unraid)
- 2 x CPU E5-2690 v3
- 36 disks of various capacities, which will be moved during the transfer (currently only 28 + 2 parity in use)
- 10 Gbit SFP+ dual-port network card
- 1 Gbit dual-port integrated network card

I've set up the 1 Gbit network for OOB management, while one 10 Gbit port goes to the LAN and the other 10 Gbit port goes to the TrueNAS I'm moving files from. The first TrueNAS was moved at approx. 145 MB/s, which is the limit of the destination hard drive on Unraid. Parity was disabled. Write mode was "reconstruct write".

Now I've formatted the first TrueNAS's Seagate Exos HDDs (18 TB), installed them in Unraid, and I'm transferring the second NAS. Speed starts at 145 MB/s, then after some time drops to 50 MB/s, which means more than a month of transfer instead of 2 weeks. As the speed was slow, I added parity, let the parity build finish, and then restarted the transfer at exactly the same speed. Tried moving from "reconstruct write" to "auto", and speed increased from 50 to 60 MB/s.

Diagnostics attached: mmpl-2-diagnostics-20240124-2003.zip
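The "2 weeks vs. more than a month" estimate checks out with simple arithmetic. A minimal Python sketch, assuming roughly 180 TB to move per NAS (a figure quoted later in the thread; the actual dataset size may differ):

```python
# Rough transfer-time arithmetic behind the "2 weeks vs. a month" estimate.
# The ~180 TB total is an assumption; adjust for the real dataset size.

def transfer_days(total_bytes: float, rate_mb_s: float) -> float:
    """Days needed to move total_bytes at a sustained rate in MB/s."""
    seconds = total_bytes / (rate_mb_s * 1e6)
    return seconds / 86_400  # seconds per day

total = 180e12  # ~180 TB, assumed

print(f"at 145 MB/s: {transfer_days(total, 145):.1f} days")  # ~14 days
print(f"at  50 MB/s: {transfer_days(total, 50):.1f} days")   # ~42 days
```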
JorgeB Posted January 24

10 minutes ago, Jetro said: Write mode was "reconstruct write".

Write mode only matters if parity is enabled.

12 minutes ago, Jetro said: Tried moving from "reconstruct write" to "auto", and speed increased from 50 to 60 MB/s.

That suggests you have a controller bottleneck, or a slow disk; turbo write needs to read all the disks simultaneously while it writes to the destination (and parity). I suggest running the disk and controller benchmarks with the DiskSpeed docker.
Jetro Posted January 24 (edited)

Thank you for your reply. It looks like I have some really slow disks, which are the already filled ones. As I'm now working on the fastest drives, maybe it's better to keep turbo write off. For example, I was copying to disk 2, which is rated at > 250 MB/s like the parity drives; is it normal that I was writing at < 50 MB/s?

Controller benchmark:
Disk benchmark:
Disk 20 detail:
Speed after resuming transfer:

Tomorrow I'll update with the transfer speed.

Edit: here's what happens: Files are only videos.

Edited January 25 by Jetro + All Disk test
JorgeB Posted January 25

Some slow disks, or at least slow zones, and there's also a controller bottleneck.
Jetro Posted January 25

How can I find where the controller bottleneck is? Based on the 18 TB drives (which I'm using now), speeds seem to be much higher. Also, I'm reading/writing to 3 disks only and CPU usage is < 7%; is it normal that I'm getting these speeds? If it is, is there any suggestion for an HBA controller that wouldn't cause a bottleneck? He's okay with some slow disks, but I'd prefer getting full speed when working on the new disks.
JorgeB Posted January 25

Post the output of:

lspci -d 1000: -vv

Also, what is the backplane model and how is it connected to the HBA, one or two cables?
Jetro Posted January 25

There are two backplanes on that Supermicro server, one to the back (12 disks) and one to the front (24 disks). I have to take them off to read the P/N; I think I could do that in the afternoon.

01:00.0 RAID bus controller: Broadcom / LSI MegaRAID SAS-3 3108 [Invader] (rev 02)
	Subsystem: Broadcom / LSI MegaRAID SAS-3 3108 [Invader]
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 32 bytes
	Interrupt: pin A routed to IRQ 24
	NUMA node: 0
	IOMMU group: 64
	Region 0: I/O ports at 6000 [size=256]
	Region 1: Memory at c7300000 (64-bit, non-prefetchable) [size=64K]
	Region 3: Memory at c7200000 (64-bit, non-prefetchable) [size=1M]
	Expansion ROM at c7100000 [disabled] [size=1M]
	Capabilities: [50] Power Management version 3
		Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [68] Express (v2) Endpoint, MSI 00
		DevCap: MaxPayload 4096 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
			ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 0W
		DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq- RlxdOrd- ExtTag+ PhantFunc- AuxPwr- NoSnoop+
			MaxPayload 256 bytes, MaxReadReq 512 bytes
		DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
		LnkCap: Port #0, Speed 8GT/s, Width x8, ASPM L0s, Exit Latency L0s <2us
			ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
		LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta: Speed 8GT/s, Width x8
			TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Range BC, TimeoutDis+ NROPrPrP- LTR-
			10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix-
			EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
			FRS- TPHComp- ExtTPHComp-
			AtomicOpsCap: 32bit- 64bit- 128bitCAS-
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- 10BitTagReq- OBFF Disabled,
			AtomicOpsCtl: ReqEn-
		LnkCap2: Supported Link Speeds: 2.5-8GT/s, Crosslink- Retimer- 2Retimers- DRS-
		LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
			Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot
		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+ EqualizationPhase1+
			EqualizationPhase2+ EqualizationPhase3+ LinkEqualizationRequest- Retimer- 2Retimers-
			CrosslinkRes: unsupported
	Capabilities: [d0] Vital Product Data
		Not readable
	Capabilities: [a8] MSI: Enable- Count=1/1 Maskable+ 64bit+
		Address: 0000000000000000  Data: 0000
		Masking: 00000000  Pending: 00000000
	Capabilities: [c0] MSI-X: Enable+ Count=97 Masked-
		Vector table: BAR=1 offset=0000e000
		PBA: BAR=1 offset=0000f000
	Capabilities: [100 v2] Advanced Error Reporting
		UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta: RxErr+ BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
		CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
		AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
			MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
		HeaderLog: 00000000 00000000 00000000 00000000
	Capabilities: [1e0 v1] Secondary PCI Express
		LnkCtl3: LnkEquIntrruptEn- PerformEqu-
		LaneErrStat: 0
	Capabilities: [1c0 v1] Power Budgeting <?>
	Capabilities: [148 v1] Alternative Routing-ID Interpretation (ARI)
		ARICap: MFVC- ACS-, Next Function: 0
		ARICtl: MFVC- ACS-, Function Group: 0
	Kernel driver in use: megaraid_sas
	Kernel modules: megaraid_sas
JorgeB Posted January 25

24 minutes ago, Jetro said: There are two backplanes on that Supermicro server, one to the back (12 disks) and one to the front (24 disks).

I assume one cable from the controller is going to each backplane? If you have to take them off to see the model, don't do it for now; it may not be needed.
Jetro Posted January 25

Not really; there is only a single cable going under the mainboard, I assume to something like what HP calls an expander board. On the front backplane I can see two cables connected (24 disks). To access the rear backplane or the expander I have to remove the motherboard.
JorgeB Posted January 25

There can be two cables to the front expander, and then one from the front expander to the back. It would be good to confirm how they are connected, as that can have an impact on the available bandwidth.
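For reference, a rough estimate of why the cabling matters. This sketch assumes a SAS2 (6 Gbit/s per lane) expander backplane, which was common on X10-era Supermicro chassis; the actual backplane models are unconfirmed in this thread, and with SAS3 hardware the numbers roughly double:

```python
# Back-of-the-envelope SAS bandwidth per disk behind an expander.
# Assumption: SAS2 links (6 Gbit/s/lane, 8b/10b encoding -> 80% usable);
# the real backplane model is not confirmed in the thread.

def per_disk_mb_s(lanes: int, gbit_per_lane: float, disks: int) -> float:
    """Usable MB/s per disk when one wide port is shared by `disks` drives."""
    usable_mb_s = lanes * gbit_per_lane * 0.8 / 8 * 1000  # Gbit/s -> MB/s
    return usable_mb_s / disks

# One SAS2 x4 cable (4 x 6 Gbit/s) feeding all 36 bays through an expander:
print(f"{per_disk_mb_s(4, 6, 36):.0f} MB/s per disk")  # ~67 MB/s
# Two x4 cables to the front expander, daisy-chained to the rear backplane:
print(f"{per_disk_mb_s(8, 6, 36):.0f} MB/s per disk")  # ~133 MB/s
```

This shared limit only bites when many disks are active at once, which is exactly the turbo-write case JorgeB describes later; with 3 active disks a single x4 link is plenty.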
Jetro Posted January 27

That makes sense, as there is a big heatsink on the front backplane, like there's some chipset behind it. I'll check next time I'm there. However, I can't understand how I moved something around 180 TB at full speed (> 150 MB/s) to slower disks (14 TB and lower), while now I seem to be stuck at half that speed.
JorgeB Posted January 27

10 hours ago, Jetro said: However, I can't understand how I moved something around 180 TB at full speed (> 150 MB/s) to slower disks (14 TB and lower), while now I seem to be stuck at half that speed.

And were all the same disks already installed?
Jetro Posted January 28 (edited)

Nope, initially there were the slow ones plus other, even slower 5/6/8 TB disks which were never used. When I finished moving from the first NAS, I removed the 18 TB disks from the old one, formatted them, and installed them in the Unraid machine.

Edited January 28 by Jetro
JorgeB Posted January 29

The controller bottleneck will only be noticeable with more disks.
Jetro Posted January 29

I'm not getting this:
- Before: 28 disks (the current ones + smaller, slower disks which were never used)
- Now: 28 disks (removed some small and slow disks and replaced them with 18 TB / 280 MB/s drives)
- With turbo write off, I'm only using 3 disks at a time.

Parity disks have always been 2 x 18 TB Exos drives. Could the overall array size be the culprit? I'm going to build a twin machine, and I'd like to avoid bottlenecks...
JorgeB Posted January 30

13 hours ago, Jetro said: With turbo write off, I'm only using 3 disks at a time.

With turbo write off the limit is the parity write speed: read, modify, write.
Jetro Posted January 30

That's why I don't understand the low speed: the parity drives are the fastest ones (min. 150 MB/s) and I'm moving big video files, so why am I not going at full speed? And, above all, why do the first hundreds of GB go at full speed? Maybe there's some thermal throttling involved (all temperatures seem low)? Or some caching issue?
Solution JorgeB Posted January 30

You cannot get full speed with the read/modify/write mode; it's always around 1/3 of the disk's max speed. You can get full speed with reconstruct (turbo) write if there are no bottlenecks.
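The 1/3 figure lines up with the speeds reported in the next post. A minimal Python sketch of the arithmetic; note the factor of 3 is the empirical rule of thumb quoted here, not a derived constant (read/modify/write means the data and parity disks each read the old sectors, wait for the platter to come back around, and then write the new ones):

```python
# Simplified model of Unraid's read/modify/write mode: each write needs
# read old data + read old parity, then write new data + new parity,
# with at least one platter revolution between reading and rewriting
# the same region. The divisor 3 is the ~1/3 rule of thumb from this
# thread, an empirical factor rather than a law.

def rmw_write_speed(streaming_mb_s: float, passes: int = 3) -> float:
    """Effective read/modify/write speed given the disk's streaming speed."""
    return streaming_mb_s / passes

print(f"{rmw_write_speed(270):.0f} MB/s")  # fast zone of an Exos 18TB -> 90
print(f"{rmw_write_speed(180):.0f} MB/s")  # slower inner zone -> 60
```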
Jetro Posted January 30 (edited)

Got it. So it makes sense: now I'm writing as low as 60 MB/s near the end of the disk (so ~180 MB/s streaming) and 90 MB/s in the faster zone.

Also, there was some kind of problem with the other machine, and it died today. I replaced mobo/CPU/RAM at once (no time to investigate, the data transfer is long enough) and it seems more stable now, acting as expected. Building the new one I'll try to put in only new, identical disks, so it will make sense to keep turbo write on and transfer at full speed!

Quick question 1: does having 256 GB of RAM make sense? I've never seen more than 4 GB used. (Note it will never run containers or VMs, as there's a separate cluster for that.)

Question 2: how much free space is it best to leave on the disks?

Edited January 30 by Jetro Quick questions
JorgeB Posted January 31

9 hours ago, Jetro said: Quick question 1: does having 256 GB of RAM make sense?

It can, but without VMs mostly if you want to dedicate a big part of it to the ZFS ARC.

9 hours ago, Jetro said: Question 2: how much free space is it best to leave on the disks?

Around 1% for xfs; a little more is best for btrfs or zfs.
Jetro Posted January 31

Does the 1% also apply to bigger arrays? I have 382 TB now, and plan to go to 500 TB with the final disks... That would mean leaving 5 TB free... Also, I rolled back from ZFS to XFS because of slowness.
itimpi Posted January 31

3 minutes ago, Jetro said: Does the 1% also apply to bigger arrays? I have 382 TB now, and plan to go to 500 TB with the final disks...

This is not relevant to the main array, as there each drive is an independent file system. I also suspect that it is a legacy requirement and that nothing like it is needed on modern systems.

ZFS in the main array is known to currently have performance issues. This does not apply to ZFS zpools, which are currently the highest-performance option with Unraid.
Jetro Posted January 31

Can ZFS zpools have redundancy (e.g. RAIDZ1)? If not, once the transfer is finished I can set up a ZFS pool with a replication task for the most-used files.
JorgeB Posted January 31

1 hour ago, Jetro said: Can ZFS zpools have redundancy (e.g. RAIDZ1)?

Pools, yes; array devices formatted with zfs are covered by the array's parity-based redundancy.
trurl Posted January 31

On 1/30/2024 at 4:46 AM, Jetro said: why do the first hundreds of GB go at full speed?

RAM buffering
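In other words, Linux absorbs incoming writes into the page cache at network speed until the dirty-page limit is hit, then throttles writers down to disk speed. A rough sketch, assuming the kernel default vm.dirty_ratio of 20% (check the live value with `sysctl vm.dirty_ratio`; Unraid or tuning plugins may set it differently):

```python
# Why the first chunk of a transfer runs at full line rate: Linux buffers
# writes in RAM until roughly vm.dirty_ratio percent of memory is dirty,
# then blocks the writer behind the disk. The 20% default is an assumption;
# verify it on the actual system with `sysctl vm.dirty_ratio`.

def dirty_buffer_gb(ram_gb: float, dirty_ratio_pct: float = 20.0) -> float:
    """Approximate data absorbed into RAM before writes are throttled."""
    return ram_gb * dirty_ratio_pct / 100

print(f"{dirty_buffer_gb(416):.0f} GB")  # ~83 GB absorbed at line rate
```

With 416 GB of RAM that is tens of GB taken at 10 Gbit speed before the real disk write speed shows, matching the "first hundreds of GB are fast" observation.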