talmania Posted March 18

I've run 10GbE on my Unraid server for a really long time now and have finally spent some time trying to optimize my transfer speeds. Some history first:

Initial config and revision:

Workstation: AMD Ryzen 3500, Intel X540-T1 with Cat5e to a Brocade 10Gb switch. Transfer speeds: fluctuating wildly between 450MB/s and 70MB/s.

Workstation revision 3/2024: ConnectX-4 over SMF fiber. Consistent ~400-480MB/s transfer speeds.

Unraid: Supermicro X10DRH-IT, dual E5-2670 v3, 128GB RAM, Intel X540 dual-port 10GBASE-T. Cache: dual Crucial P3 Plus 4TB PCIe 4.0 x4 in a PCIe 3.0 x4 slot (mirror, theoretically 4100MB/s write).

Ever since the initial 6.x upgrade I've seen a lot of variance in speed when writing to cache, and I was never really satisfied with the transfer speed (for reference, I would get ~400MB/s peaks before 6.x). So I decided to do something about it: I ran SMF fiber under my house and replaced my NIC with a Mellanox ConnectX-4, and now I'm seeing consistent ~400 to 480MB/s transfer speeds to the cache drive.

When I run iperf3 from my workstation to the server itself I'm seeing about 7.5Gbps on average (~935MB/s). The network is flat, and there's a single NIC enabled on Unraid with no teaming, jumbo frames or any other tweaks.

I'm curious what step I should take next to troubleshoot and see if I can get a better transfer rate. Thanks for any and all advice.
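For reference, the unit conversion behind the figures in this post can be sketched as below. The helper name `gbps_to_mb_per_s` is hypothetical, and the sketch uses decimal units (1 Gb = 1000 Mb) and ignores protocol overhead.

```python
def gbps_to_mb_per_s(gbps: float) -> float:
    """Convert gigabits per second to megabytes per second (decimal units)."""
    return gbps * 1000 / 8


iperf_rate = gbps_to_mb_per_s(7.5)    # average iperf3 result quoted in the post
line_rate = gbps_to_mb_per_s(10.0)    # raw 10GbE line rate, before any overhead

print(f"iperf3 7.5 Gbps is {iperf_rate:.1f} MB/s")
print(f"10GbE line rate is {line_rate:.1f} MB/s")
# The observed file-transfer ceiling as a fraction of the raw line rate
print(f"480 MB/s is {480 / line_rate:.0%} of the raw line rate")
```

Comparing the file-transfer rate with the iperf3 rate is a quick way to see how much of the gap comes from the network path and how much from storage or share overhead.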
MAM59 Posted March 18 Solution

You would have been better off buying a ConnectX-3 (the latest edition) instead. The X-4 needs 8 lanes from the PCIe slot, otherwise it slows down by half. The 450MB/s you see tells me that you have either put the card into the wrong slot (x4 only) or your BIOS settings somehow prevent it from using all the lanes. With 10G you should expect to see ~1.1GB/s, not much less.

Look at your motherboard's manual to see if there is an appropriate free slot (watch out for (*) notes; many boards have restrictions on certain slot combinations). Usually there is only one real x16 slot, and it is used for the graphics card.

What you need is a ConnectX-3 with PCIe 3.0 x4. (Warning! Many models of this card are PCIe 2.0 x8; these won't work well either.)
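One way to check which link width a card actually negotiated, assuming the machine runs Linux (as the Unraid server does), is to read the PCI attributes the kernel exposes under sysfs. This is a minimal sketch, not a definitive tool: the helper names are hypothetical, and it simply lists every device whose PCI class marks it as a network controller.

```python
from pathlib import Path


def read_attr(device: Path, name: str) -> str:
    """Return a sysfs attribute as text, or 'n/a' if it is missing or unreadable."""
    try:
        return (device / name).read_text().strip()
    except OSError:
        return "n/a"


def pcie_links(root: str = "/sys/bus/pci/devices") -> list:
    """Collect negotiated and maximum PCIe link settings for network controllers."""
    base = Path(root)
    if not base.is_dir():          # not Linux, or sysfs not mounted
        return []
    links = []
    for device in sorted(base.iterdir()):
        # PCI base class 0x02 is "network controller" (Ethernet is 0x020000)
        if not read_attr(device, "class").startswith("0x02"):
            continue
        links.append({
            "address": device.name,
            "current_width": read_attr(device, "current_link_width"),
            "max_width": read_attr(device, "max_link_width"),
            "current_speed": read_attr(device, "current_link_speed"),
            "max_speed": read_attr(device, "max_link_speed"),
        })
    return links


for link in pcie_links():
    print(f"{link['address']}: running x{link['current_width']} at {link['current_speed']} "
          f"(card supports x{link['max_width']} at {link['max_speed']})")
```

If the current width is lower than the maximum width, the slot or the BIOS lane configuration is limiting the card.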
JorgeB Posted March 18

8 hours ago, talmania said: now I'm seeing consistent ~400 to 480MB/s transfer speeds to the cache drive.

Is this with a user share or a disk/exclusive share?
talmania (Author) Posted March 18

6 hours ago, MAM59 said: You would have been better off buying a ConnectX-3 (the latest edition) instead. The X-4 needs 8 lanes from the PCIe slot, otherwise it slows down by half. The 450MB/s you see tells me that you have either put the card into the wrong slot (x4 only) or your BIOS settings somehow prevent it from using all the lanes. With 10G you should expect to see ~1.1GB/s, not much less. Look at your motherboard's manual to see if there is an appropriate free slot (watch out for (*) notes; many boards have restrictions on certain slot combinations). Usually there is only one real x16 slot, and it is used for the graphics card. What you need is a ConnectX-3 with PCIe 3.0 x4. (Warning! Many models of this card are PCIe 2.0 x8; these won't work well either.)

Thank you for responding!! I think you may be onto something here... The ConnectX-4 I purchased is actually a 25GbE card (MCX4121A-ACAT), as I'm awaiting the arrival of a 100GbE switch shortly. The motherboard is an ASUS TUF Gaming X570-Plus WiFi, and the manual does show 2x PCIe 4.0 x16 slots, but when two VGA/PCIe cards are used the 2nd slot runs at PCIe 4.0 x4 instead of x8.

OK, bear with me here and please correct me if I'm totally wrong (I probably am!). I could not locate a board diagram for the card, but I'm assuming it's 50Gb (2x 25Gb ports) across the 8 lanes. Each PCIe 4.0 lane is 2GB/s of bandwidth; PCIe 3.0 is 1GB/s. So assuming the card splits 4 lanes to each port, that's 4GB/s (4000MB/s) per port, at 1000MB/s per lane. If that's running at half speed with only x4 lanes (2 per port??), that's 500MB/s per lane or 1000MB/s per port, but I'm guessing that's split between send and receive?? And holy hell, there's the problem!

Do I have that right? The 7.5Gbps result from iperf3 would be both lanes at 500MB/s adding up to 1000MB/s, which matches the 937.5MB/s iperf3 result once overhead is accounted for.
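The lane arithmetic in this post can be checked with a small sketch. It uses the standard per-lane transfer rates and line encodings for PCIe 2.0, 3.0 and 4.0; the function names are hypothetical, and it models raw link bandwidth only, not how any particular NIC behaves on a narrower link.

```python
# (transfer rate in GT/s, encoding efficiency) for each PCIe generation
PCIE_GENERATIONS = {
    2: (5.0, 8 / 10),       # 8b/10b encoding
    3: (8.0, 128 / 130),    # 128b/130b encoding
    4: (16.0, 128 / 130),   # 128b/130b encoding
}


def pcie_gb_per_s(gen: int, lanes: int) -> float:
    """Raw PCIe link bandwidth in GB/s, per direction (lanes are full duplex)."""
    rate_gt, efficiency = PCIE_GENERATIONS[gen]
    return rate_gt * efficiency / 8 * lanes


def ethernet_gb_per_s(gbps: float, ports: int = 1) -> float:
    """Raw Ethernet line rate in GB/s, per direction, summed over ports."""
    return gbps / 8 * ports


for gen, lanes in [(2, 8), (3, 4), (3, 8), (4, 4)]:
    print(f"PCIe {gen}.0 x{lanes}: {pcie_gb_per_s(gen, lanes):.2f} GB/s per direction")

print(f"1x 10GbE port:  {ethernet_gb_per_s(10):.2f} GB/s")
print(f"2x 25GbE ports: {ethernet_gb_per_s(25, ports=2):.2f} GB/s")
```

Because PCIe lanes are full duplex, these figures apply to each direction separately, so the per-lane rate is not divided between send and receive.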
talmania (Author) Posted March 18

4 hours ago, JorgeB said: Is this with a user share or disk/exclusive share?

It's a user share with cache enabled. I just tried writing a file directly to the cache folder and found it would sustain 625-700MB/s writes.
JorgeB Posted March 18

Disk/exclusive shares will generally perform noticeably better, so if possible for your use case, use them.
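A rough way to compare the two kinds of share is to time a large sequential write to each. The sketch below is an illustration under assumptions: the function name is hypothetical, and the two paths are placeholders for wherever a user share and the cache pool are mounted on the machine running the test. It skips any path that does not exist.

```python
import os
import time
from pathlib import Path


def measure_write_mb_per_s(directory: str, total_mb: int = 1024, block_mb: int = 4) -> float:
    """Write a temporary file sequentially and return the throughput in MB/s."""
    block = os.urandom(block_mb * 1_000_000)    # random data, so it is not compressible
    target = Path(directory) / "throughput_test.bin"
    written_mb = 0
    start = time.perf_counter()
    with open(target, "wb") as handle:
        while written_mb < total_mb:
            handle.write(block)
            written_mb += block_mb
        handle.flush()
        os.fsync(handle.fileno())               # include the time to flush to storage
    elapsed = time.perf_counter() - start
    target.unlink()                             # clean up the test file
    return written_mb / elapsed


# Hypothetical mount points: adjust to the actual user share and cache paths.
for label, path in [("user share", "/mnt/user/test"), ("cache direct", "/mnt/cache/test")]:
    if os.path.isdir(path):
        print(f"{label}: {measure_write_mb_per_s(path):.0f} MB/s")
```

Running the same write against both paths from the same client keeps the network out of the comparison, so any difference comes from the share layer.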
MAM59 Posted March 18

2 hours ago, talmania said: ASUS TUF gaming x570-Plus WiFi

Yeah, I have the same board over here. I just retired it in favour of a ProArt X670 WiFi with a 7750x and DDR5. But it has the same flaw: if you use slot 2, slot 1 drops to x8 and the other 8 lanes go over to slot 2. I think it was the same split on the X570, but with x4 instead of x8, as you said.

Sadly it's not a simple "port 1 uses lanes 1-4 and port 2 uses lanes 5-8" thing. All the existing lanes go to the card's chipset, which handles both ports and reserves bandwidth for both. So I am afraid it will effectively lower port 1's throughput even though port 2 is not used at all. But I might also be wrong.

Anyway, these are "server only" cards; they are not designed to be run on cheap motherboards with limited lane counts. (Server boards usually come with real x8 slots.)
talmania (Author) Posted March 18

Well, I guess this just gives me extra incentive to finish up the Z690 build I've been lazy about. Thankfully it does have a PCIe 4.0 x16 slot that actually operates at x16. Time to test…
talmania (Author) Posted March 18

Well, that was definitely it. I got the new build into a state where I could do some preliminary testing and found that writing to cache-backed user shares with another ConnectX-4 of the same model sustained 450-525MB/s, and when I wrote directly to the cache drive I was getting a sustained 935MB/s to 1.05GB/s.
talmania (Author) Posted March 19

Thanks all for the advice. I'm closing this as solved, but for anyone coming across this thread: the solution was a combination of PCIe lanes and writing directly to cache rather than to a cache-backed user share.