Bandwidth Limit



I seem to have hit a hard limit with my 80TB unRAID server. Under normal operation the system runs very smoothly, but when I run a parity check (after a system crash) I hit a limit of about 400MB/s total on my array. That is very slow for 20 drives. Attached is a screenshot of what I am talking about. I know it's not the drives, because once the 2TB drives finish, the remaining drives speed up again to about 400MB/s total. My cache pool (4x 240GB SSDs) doesn't seem to have any impact on this limit. This limit causes my parity check to take 4-6 days on average.
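For a sense of scale, a parity check has to read the full length of the largest disk, so the 4-6 day figure matches the arithmetic. A rough estimate, assuming (hypothetically) an 8TB parity disk and the observed 400MB/s split evenly across 20 drives:

```python
# Rough parity-check duration estimate (illustrative assumptions only).
total_bandwidth_mb_s = 400          # observed array total, MB/s
num_drives = 20
parity_disk_tb = 8                  # hypothetical parity/largest disk size

per_drive_mb_s = total_bandwidth_mb_s / num_drives          # 20 MB/s per drive
seconds = parity_disk_tb * 1_000_000 / per_drive_mb_s       # TB -> MB
days = seconds / 86_400
print(f"{per_drive_mb_s:.0f} MB/s per drive -> ~{days:.1f} days")  # ~4.6 days
```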

 

I have an LSI 9207-8i capable of 8x 6Gb/s, hooked up to an HP SAS expander capable of sustaining an 8x connection (Datasheet Here). I do have a slight memory limitation at the moment; however, I don't believe much memory is used when doing parity checks.

 

If anyone has any insight, please let me know. Also if I missed anything, please ask. Diags attached

 

Thanks

Lonnie

 

Parity Check.PNG

andromeda-diagnostics-20170430-0005.zip


The HP expander will be a bottleneck with so many disks, but it should still be faster than that. Are you using a single or dual link to the LSI?

 

Max speed per disk with 20 disks is:

single link - 55MB/s

dual link - 110MB/s
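Those numbers follow from splitting the wide-port bandwidth across all spinning drives. A sketch of the arithmetic, assuming roughly 275MB/s of usable throughput per lane (consistent with the ~1100MB/s-per-link SATA2 figure mentioned later in the thread) and 4 lanes per mini-SAS link:

```python
# Per-disk bandwidth when an expander's uplink is shared by many drives.
USABLE_MB_S_PER_LANE = 275   # assumed usable throughput per 3Gbps-limited lane
LANES_PER_LINK = 4           # one mini-SAS (SFF-8087) cable = 4 lanes
num_disks = 20

for links, name in [(1, "single link"), (2, "dual link")]:
    total = links * LANES_PER_LINK * USABLE_MB_S_PER_LANE
    print(f"{name}: {total} MB/s total -> {total / num_disks:.0f} MB/s per disk")
# single link: 1100 MB/s total -> 55 MB/s per disk
# dual link: 2200 MB/s total -> 110 MB/s per disk
```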

 

Also, you're using the default tunables; they are not optimal, especially with so many disks. Try these (Settings -> Disk Settings):

 

Tunable (md_num_stripes): 4096

Tunable (md_sync_window): 2048

Tunable (md_sync_thresh): 2000


I like to test during a read check with unRAID, that is, with all disks assigned as data disks and no parity. Let it run for a minute or so and check the speed reported on Main, or the total bandwidth with the stats plugin; this way there's no parity-calculation overhead.

 

Obviously this is not practical for a working server, so in those cases I use a simple script. The problem with it is that, with some controllers, not all disks will run at the same speed, so the way to check total bandwidth is to use the stats plugin and look at the storage graph during the test.

 

 

Edited by johnnie.black
1 hour ago, johnnie.black said:

I like to test during a read check with unRAID, that's with all disks assigned as data disks and no parity [...]

 

I wonder if LT could implement a read check for all disks even if you use parity.

7 hours ago, johnnie.black said:

The HP expander will be a bottleneck with so many disks [...] are you using single or dual link to the LSI? [...] Also, you're using the default tunables, they are not optimal especially with so many disks [...]

Johnnie, I am using dual link, so the 110MB/s per drive should be more than enough to saturate my 10Gb Ethernet on reads (provided the NICs are set up properly, which they are not). I did change those tunables, but I have some write operations going on, so we will see if that makes any difference.

 

1 hour ago, ashman70 said:

Just wanted to say your drive temps are kind of high, mind you if that is during a parity check I guess its not too bad. What kind of case are you using?

Temps are high, but once the parity check is done they settle out to around 30. I am using the NORCO RPC-4224, so the drives are packed in there.

4 minutes ago, lonnie776 said:

Johnnie, I am using dual link [...] I did change those tunables but I have some write operations going on [...]

 

The 110MB/s limit applies to parity checks/syncs and disk rebuilds; unRAID only reads one disk at a time, so normal reads won't be a problem.

 

There shouldn't be any other array activity during the check, or performance will be much worse.

 

You can check if dual link is working by typing the following in the console:

 

cat /sys/class/sas_host/host1/device/port-1\:0/sas_port/port-1\:0/num_phys

 

If the result is 8 then it's using dual link; 4 means single link.
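The interpretation of that value can also be wrapped in a small helper. A minimal sketch (the sysfs path in the comment is the one from the command above; host and port numbers vary between systems):

```python
def link_mode(num_phys: int) -> str:
    """Map the number of active phys on a SAS wide port to a link mode."""
    if num_phys == 8:
        return "dual link"
    if num_phys == 4:
        return "single link"
    return f"unexpected phy count: {num_phys}"

# On a live system the value would come from sysfs, e.g.:
#   int(open("/sys/class/sas_host/host1/device/port-1:0/sas_port"
#            "/port-1:0/num_phys").read())
print(link_mode(8))  # dual link
print(link_mode(4))  # single link
```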

 

3 minutes ago, johnnie.black said:

 

110MB's limit will be on parity check/sync and disk rebuild [...] If result is 8 then it's using dual link, 4 means single link.

 

Very interesting: apparently unRAID is only using a single link. I'm 100% positive I have the cables hooked up properly, so is this an unRAID setting or something on the cards themselves?

 

Also, this is a production machine, so removing all possible writes to the array is a bit out of the question. When I grabbed that information yesterday there was nothing else going on, though. This issue also came up when I first built the machine, when there were no writes at all.

 

P.S. I was on FreeNAS before this, and their community was... less than helpful, with a lot of condescension. I have to say my experience here has been a very pleasant one.

Thanks for all the help so far on this; I really appreciate a specialist's hand with tough problems like this.


Yes, one SAS 8087 cable = 4 physical SATA/SAS connections. Each connection has a theoretical throughput of 6Gb/s if the drive supports that speed. You will most likely have a SAS expander somewhere, which will allocate those 4 channels to whichever drive needs them, so 4 channels (in my case) can run 24 drives.
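To put rough numbers on that: 6Gb/s SAS/SATA uses 8b/10b encoding, so each lane tops out around 600MB/s of payload, and a 4-lane cable around 2400MB/s. A sketch of the arithmetic (these are theoretical ceilings; real-world throughput is lower):

```python
# Theoretical ceiling of one mini-SAS (SFF-8087) cable at 6Gb/s per lane.
LINE_RATE_GBPS = 6          # raw line rate per lane
ENCODING_EFFICIENCY = 0.8   # 8b/10b encoding: 8 data bits per 10 line bits
LANES = 4                   # lanes per SFF-8087 cable

per_lane_mb_s = LINE_RATE_GBPS * 1000 / 8 * ENCODING_EFFICIENCY  # 600 MB/s
cable_mb_s = per_lane_mb_s * LANES                               # 2400 MB/s
print(f"{per_lane_mb_s:.0f} MB/s per lane, {cable_mb_s:.0f} MB/s per cable")
```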

Just now, johnnie.black said:

 

Yes, each backplane has an integrated expander; if there's just one cable connected it can only be single link (and the expander must support dual link).

 

I don't actually have a backplane; I have 1 cable to 4 drives. So I have 6 "dumb" backplanes connected to my old HP SAS expander, which does what the really expensive backplanes do.


Also, johnnie, while you are here and you seem to know what you are talking about: do you know why my smaller 2TB drives keep splitting at around 500GB instead of 1TB? This behavior seems strange to me, as I have read the intended process for high-water splitting and it doesn't fit.

 

Edit: Also, the drives were not filled to 1TB and then things deleted; they just filled that way. unRAID split to disk 11 yesterday.

Edited by lonnie776

Hopefully this won't confuse the situation, but I'll add some info for comparison, because I have an HP SAS expander in 2 of my machines, both using an HP H220 HBA (which means SATA speeds are capped at 3Gbps). Both expanders are attached to an MD1000 in split mode (half of the disks to one server, the rest to the other).

 

When I run the command 

 

cat /sys/class/sas_host/host1/device/port-1\:0/sas_port/port-1\:0/num_phys

 

on both servers, they both show 8.

 

The original cable I purchased to go between the SAS expanders and the MD1000 was http://www.ebay.com/itm/NEW-IBM-External-5-5m-SAS-to-Mini-SAS-Cable-SFF-8470-to-SFF-8088-95P4588-/381835840239?hash=item58e7308aef:g:bTMAAOSwcUBYGiF-. But it turned out to be like 2x or something... I don't remember exactly. Anyway, speed to the disk enclosure was horrible. I then purchased this one: http://www.ebay.com/itm/5x-Foxconn-SAS-cable-2m-SFF-8470-CX4-screw-type-Mini-SAS-SFF-8088-2GFPGBX-63G-/142167368793?hash=item2119d5e459 and now this is what I show:

 

 

On my main server, a parity check (with a couple slow disks which will be phased out in a few months) shows the following:

 

5906156689513_ScreenShot2017-04-30at12_38_16PM.png.1f69b0bc4364cae8e411ccfe3baee3d7.png

5906154216841_ScreenShot2017-04-30at12_38_07PM.png.51cf13c28d2b6ab68b768ad5305624fa.png

 

 

 

on the secondary server, with only 3 disks and no parity, it shows the following:

 

 

59061572a4b2b_ScreenShot2017-04-30at12_38_44PM.png.57a8a683e04b73cd8ac676c223821ba5.png

5906157bd9211_ScreenShot2017-04-30at12_34_42PM.png.7135eee31ddd864cb989990b87278cee.png

 

 

 

11 minutes ago, 1812 said:

The original cable I purchased to go between the sas expanders and the md1000 [...] turned out to be like 2x or something [...] Then purchased this one [...] and now this is what I show:

 

Oh, so the Chinese cable only had half the physical pins?

 

Edit: That shouldn't be my problem. I got 8 of these

Edited by lonnie776
19 minutes ago, 1812 said:

Hopefully this won't confuse the situation, but I'll add some info for comparison because I have an HP sas expander in 2 of my machines [...] Both expanders are attached to an MD1000 in split mode (1/2 of the disks to 1 server, the rest to the other.)

 

If I understand correctly, the expander is using 2 wide links, one to each server, so both wide ports are in use and the 8 is correct; but obviously in that configuration it doesn't represent dual link.

 

Also, your speeds are more consistent with a SATA 1.5Gbps single link. Are you sure the disks are linking at SATA 3Gbps? A single link at SATA2 is good for about 1100MB/s.

35 minutes ago, lonnie776 said:

Also johnnie [...] Do you know why my smaller 2TB drives keep splitting at around 500GB instead of 1TB? [...]

 

This is normal for the default high water split setting:

 

https://lime-technology.com/wiki/index.php/Un-Official_UnRAID_Manual#High_Water
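As a rough illustration, the rule described on that page can be paraphrased as: start a water mark at half the size of the largest data disk, write to the first disk whose free space is above the mark, and halve the mark whenever no disk qualifies. A toy simulation of that paraphrase (my reading of the manual, not unRAID's actual allocator; disk sizes and write sizes are made up):

```python
# Toy model of high-water allocation (paraphrased from the unRAID manual;
# not the actual implementation). Sizes in GB.
def high_water_targets(disk_free, largest_disk_gb, writes):
    """Yield the disk index chosen for each write under the high-water rule."""
    mark = largest_disk_gb // 2
    for size in writes:
        # Halve the mark whenever no disk has free space above it.
        while mark > 0 and all(free <= mark for free in disk_free):
            mark //= 2
        # Pick the first disk whose free space exceeds the mark.
        for i, free in enumerate(disk_free):
            if free > mark:
                disk_free[i] -= size
                yield i
                break

# Example: a 4000GB disk and a 2000GB disk, 40 writes of 100GB each.
free = [4000, 2000]
picks = list(high_water_targets(free, 4000, [100] * 40))
print(picks.count(0), picks.count(1))  # disk 0 takes 30 writes, then disk 1 gets 10
```

The smaller disk only starts receiving files once the mark has halved below its free space, which is why small disks can sit partly filled while larger disks keep taking writes.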

