[Partially SOLVED] Is there an effort to solve the SAS2LP issue? (Tom Question)


TODDLT


I'm just wondering if it's doing an extra I/O operation (perhaps in error) because of that setting?

 

I usually perform non-correcting checks, because I want to be able to investigate the error before it's corrected, in case it's the data disk at fault instead of the parity. That might be rare, but it's what I've chosen to do.

Link to comment

I have, on my test system, run both correcting and non-correcting for timing purposes -- they've always been identical to the second.

 

... there is, by the way, ONE instance where I run non-correcting checks:  If I've just done a drive rebuild, I immediately run a non-correcting check to confirm the rebuild is okay.    Doing a non-correcting check in this instance allows re-doing the rebuild if anything went awry.    Other than that, I never do non-correcting checks ... if there's an error, I want parity corrected.  Period.    Even if there's a data error on one of the data drives, I still want parity to be correct, so if anything else goes wrong I can do a successful rebuild of the failed disk.
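For reference, either kind of check can also be started from the command line; the following is only a sketch, assuming the stock unRAID mdcmd tool (its exact path can vary between releases):

# Correcting parity check (errors are written back to parity)
/root/mdcmd check CORRECT

# Non-correcting parity check, e.g. right after a drive rebuild
/root/mdcmd check NOCORRECT

# Cancel a running check
/root/mdcmd nocheck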

 

Link to comment

You might want to read this post and follow the links in the post. 

 

      http://lime-technology.com/forum/index.php?topic=43023.msg413195#msg413195

 

It provides a means of optimizing the 'Tunables' on the Disk Settings page.  This might be of some help with this problem!  I did not find that changing the tunables made much difference in my setup, BUT I seem to recall that it has helped some folks in the past. 
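For anyone who prefers the console to the Disk Settings page, the same tunables can be read and changed on the fly; a rough sketch, again assuming the stock mdcmd tool, and with the values shown being examples only rather than recommendations:

# Show the current tunable values
/root/mdcmd status | grep -E 'md_num_stripes|md_sync_window|md_write_limit'

# Try larger values (takes effect immediately; set the old values back to revert)
/root/mdcmd set md_num_stripes 4096
/root/mdcmd set md_sync_window 2048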

 

 

Link to comment

CLEARLY there's something in the parity checks that has changed since v5 and is causing these issues.  It'd certainly be nice if Limetech would have an epiphany and realize what it is !!  One of those forehead-slapping moments when you suddenly realize what's going on  :)

 

I believe the change happened between 5beta12 and 5.0.0, but the slowdown is more subtle on v5 and is mitigated by changing the tunable settings. I don't know what's different about it, but I believe that if the parity check code were more like the parity sync / disk rebuild code, not only would the SAS2LP issue be solved but probably also the single-core Celeron problem, which has similar symptoms.

 

On the other hand I know nothing about writing code so maybe there’s a good reason why it was changed.

 

Link to comment

I've tried a variety of tunables settings, and was indeed able to get some improvement; but the bottom line is that v6 definitely has slower parity checks than v5 ... and johnnie's results clearly show that this is NOT a "necessary" outcome, since parity syncs still work at full speed.

 

It's VERY nice that he has an all-SSD array to do this testing with, and has been kind enough to run a variety of tests to help isolate what works and what doesn't.    The results with the SAS2LP clearly show that the parity check issue is NOT with the card or its drives ... but with the parity check code; since both parity syncs and drive rebuilds work just fine.

 

By the way, my parity check finished, and it wasn't nearly as bad as I had anticipated ... it took about 6% longer than the sync [13:21 sync;  14:08 check => 848 vs. 801 minutes].    My original results with this system were 31% longer; but after changing the tunables, disabling display updates, and disabling folder caching I had it down to 6% ... and this test shows exactly the same result.    Not bad ... certainly "good enough" to stay with v6 ... but nevertheless 6% longer than it needs to be !!

 

I tend to agree with johnnie that if the root cause of this could be isolated and fixed, I suspect that not only would parity checks run at the same speed as syncs; but it would likely eliminate the issue with older single-core Celerons and would probably resolve the SAS2LP issue as well.

 

 

 

Link to comment

In all testing, did anyone play with the disk setting Force NCQ disabled?

 

Does changing it from the default of Yes to No make a difference for those experiencing performance issues?

 

Force NCQ disabled=NO does not appear to have much effect. Looks like unraid-tunables "Best Bang" settings didn't help much either.
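For anyone wanting to verify what the NCQ toggle actually did to their drives, the per-device queue depth is visible in sysfs; a small sketch (sdb is just a placeholder device name, and setting the value to 1 is what the Force NCQ disabled option is generally understood to do):

# Show the current NCQ queue depth for one drive (1 means NCQ is effectively off)
cat /sys/block/sdb/device/queue_depth

# Force NCQ off for that drive by hand
echo 1 > /sys/block/sdb/device/queue_depth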

 

[screenshot: parity check results]

Link to comment

Force NCQ disabled=NO does not appear to have much effect. Looks like unraid-tunables "Best Bang" settings didn't help much either.

 

NCQ setting makes negligible difference for me also.

 

You have one of the worst cases I've seen. Make sure the SAS2LP is in one of the top two slots; in my testing the problem is much worse on any slot that goes through the DMI, which in your case is the bottom slot.

 

 

Link to comment

Force NCQ disabled=NO does not appear to have much effect. Looks like unraid-tunables "Best Bang" settings didn't help much either.

 

NCQ setting makes negligible difference for me also.

 

You have one of the worst cases I've seen. Make sure the SAS2LP is in one of the top two slots; in my testing the problem is much worse on any slot that goes through the DMI, which in your case is the bottom slot.

 

 

It's in the first slot (x8). Actually, opentoe is flashing an H310 to replace my SAS2LP.

Link to comment

...  opentoe is flashing an H310 to replace my SAS2LP.

 

Hopefully that will be a good workaround.    But it's a shame that the perfectly good SAS2LPs aren't working with v6 for parity checks => when they work perfectly for syncs or rebuilds.    Hopefully this WILL be resolved ... although Limetech has been silent on this for quite a while => the last Limetech post was on 8 Sept, when JonP noted that they HAVE been able to duplicate the issue.    Hopefully they're following this and realize that syncs and rebuilds don't have the issue ... this should exonerate the driver and the card itself, so they can focus on what's done differently in the parity checks.    Johnnie's testing on this will hopefully provide the clue they need to resolve it !!

 

 

Link to comment

I don't have any SATA controller cards in my system, but just as a reference point I did a Parity Sync and a Parity Check on a new config for my #2 Server; see below.

The parity sync took 5 hrs, 7 min, 10 sec, avg speed 108.5 MB/s.

The parity check took 5 hrs, 5 min, 22 sec, avg speed 109.2 MB/s.

So my check, correcting, was slightly faster than the sync.

My older (6-year-old) 1TB drive must be a lot slower than the newer 2TB drives, because once the check gets past it the speed goes from 63 MB/s or so to more than 115 MB/s.

Link to comment

... My older (6-year-old) 1TB drive must be a lot slower than the newer 2TB drives, because once the check gets past it the speed goes from 63 MB/s or so to more than 115 MB/s.

 

Clearly the older drive has much lower areal density ... it's likely a 500GB/platter unit (or perhaps even 333GB/platter).    Your newer 2TB drive is almost certainly a 1TB/platter drive.

 

Link to comment
  • 2 weeks later...

UPDATE ON THIS ISSUE!  Today we had an interesting discovery on a test system using this controller.  Please review notes below and report back with your own configs/findings from testing.

 

  • The controller originally had 7 x 3TB HDDs and 1 x 2TB HDD attached to it.
  • This resulted in a parity check speed of between 30-40 MB/s, as others have reported (bad performance)
  • Moved the 2TB HDD from the SAS2LP controller to the controller on the motherboard
  • Parity check speed back to normal (over 100 MB/s)

 

We have a few theories on what might be causing this, but it would definitely explain why some people are experiencing this and others aren't.  It's too premature to discuss our theories, but here is my ask:

 

For anyone who has this controller, please report the devices you have attached, their sizes, model numbers, and your average parity check speed.
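One quick way to gather the requested device details is shown below; just a sketch, assuming lsblk is available (it is part of util-linux, which ships with unRAID 6):

# List every drive with its model and size, ready to paste into a reply
lsblk -d -o NAME,MODEL,SIZE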

 

 

Link to comment

For anyone who has this controller, please report the devices you have attached, their sizes, model numbers, and your average parity check speed.

7 x Seagate 3TB ST3000DM001, 1 x Seagate 500GB ST3500630AS (Cache Drive)

 

Cache drive is obviously not involved in parity checks / rebuilds.  Parity check speed is ~120 MB/s, i.e. no slowdowns with this system.

 

Link to comment

For anyone who has this controller, please report the devices you have attached, their sizes, model numbers, and your average parity check speed.

I'm no longer using the SAS2LP. I replaced it with an H310 (thank you OpenToe). But the attached devices stayed the same for both.

2- 1TB Hitachi HDS721010KLA330

2- 2TB Hitachi HDS721010KLA330

1- 3TB Seagate ST3000DM001-9YN166

1- 4TB Hitachi HDS724040ALE640

2- 4TB WDReds WDC WD40EFRX-68WT0N0

 

Was getting parity check speeds <35 MB/s average with the SAS2LP.

Link to comment

UPDATE ON THIS ISSUE!  Today we had an interesting discovery on a test system using this controller.  Please review notes below and report back with your own configs/findings from testing.

 

For anyone who has this controller, please report the devices you have attached, their sizes, model numbers, and your average parity check speed.

 

Thanks for the update!!

 

I think this is everything you're looking for. Additional system details in signature.

 

 

Disks: (multiple 2TB and 3TB on 2xSAS2LP, parity on mb)

 

Oct 24 18:30:09 Tower emhttp: WDC_WD5000BPKX-00HPJT0_WD-WX61AC4L4H8D (sdb) 488386584 [Cache - on MB SATA]

Oct 24 18:30:09 Tower emhttp: WDC_WD30EFRX-68EUZN0_WD-WMC4N2022598 (sdc) 2930266584 [Parity - on MB SATA]

Oct 24 18:30:09 Tower emhttp: WDC_WD30EFRX-68EUZN0_WD-WMC4N0H2AL9C (sdd) 2930266584 [SAS2LP]

Oct 24 18:30:09 Tower emhttp: WDC_WD30EFRX-68EUZN0_WD-WMC4N0F81WWL (sde) 2930266584 [SAS2LP]

Oct 24 18:30:09 Tower emhttp: WDC_WD30EFRX-68EUZN0_WD-WCC4N4TRHA67 (sdf) 2930266584 [SAS2LP]

Oct 24 18:30:09 Tower emhttp: WDC_WD30EFRX-68EUZN0_WD-WCC4NPRDDFLF (sdg) 2930266584 [SAS2LP]

Oct 24 18:30:09 Tower emhttp: WDC_WD30EFRX-68EUZN0_WD-WCC4N1VJKTUV (sdh) 2930266584 [SAS2LP]

Oct 24 18:30:09 Tower emhttp: Hitachi_HDS722020ALA330_JK1101B9GME4EF (sdi) 1953514584 [SAS2LP]

Oct 24 18:30:09 Tower emhttp: Hitachi_HDS5C3020ALA632_ML0220F30EAZYD (sdj) 1953514584 [SAS2LP]

Oct 24 18:30:09 Tower emhttp: ST2000DL004_HD204UI_S2H7J90C301317 (sdk) 1953514584 [SAS2LP]

Oct 24 18:30:09 Tower emhttp: ST2000DL003-9VT166_6YD1WXKR (sdl) 1953514584 [SAS2LP]

Oct 24 18:30:09 Tower emhttp: WDC_WD30EFRX-68EUZN0_WD-WCC4N4EZ7Z5Y (sdm) 2930266584 [SAS2LP]

Oct 24 18:30:09 Tower emhttp: Hitachi_HDS722020ALA330_JK11H1B9GM9YKR (sdn) 1953514584 [SAS2LP]

Oct 24 18:30:09 Tower emhttp: Hitachi_HDS722020ALA330_JK1101B9GKEL4F (sdo) 1953514584 [SAS2LP]

Oct 24 18:30:09 Tower emhttp: WDC_WD30EFRX-68EUZN0_WD-WCC4N3YFCR2A (sdp) 2930266584 [SAS2LP]

 

* I am not prepared to move all my 2TB disks to the motherboard right now (need cables and free time). If others report success, I'll crack open my case when I have time and plug all my 2TB & cache on the MB and leave the 3TB drives on the SAS2LP.

 

 

Last Parity on 5.0.5:

37311 seconds (avg between 10-11 hrs, same drives plugged into same ports)

 

Parity After upgrade to 6.1.3:

Last checked on Sun 25 Oct 2015 12:11:19 PM MST (four days ago), finding 0 errors.

Duration: 17 hours, 38 minutes, 42 seconds. Average speed: 47.2 MB/sec

Link to comment

UPDATE ON THIS ISSUE!  Today we had an interesting discovery on a test system using this controller.  Please review notes below and report back with your own configs/findings from testing.

 

  • The controller originally had 7 x 3TB HDDs and 1 x 2TB HDD attached to it.
  • This resulted in a parity check speed of between 30-40 MB/s, as others have reported (bad performance)
  • Moved the 2TB HDD from the SAS2LP controller to the controller on the motherboard
  • Parity check speed back to normal (over 100 MB/s)

 

We have a few theories on what might be causing this, but it would definitely explain why some people are experiencing this and others aren't.  It's too premature to discuss our theories, but here is my ask:

 

For anyone who has this controller, please report the devices you have attached, their sizes, model numbers, and your average parity check speed.

 

This is definitely interesting, but remember that the card works perfectly with v5 and only has the issue with v6.    And it works perfectly for parity SYNCs (not checks).    But anything that moves you closer to a solution is certainly a worthwhile discovery  :)

Link to comment

UPDATE ON THIS ISSUE!  Today we had an interesting discovery on a test system using this controller.  Please review notes below and report back with your own configs/findings from testing.

 

For anyone who has this controller, please report the devices you have attached, their sizes, model numbers, and your average parity check speed.

 

Thanks for the update!!

 

I think this is everything you're looking for. Additional system details in signature.

 

 

Disks: (multiple 2TB and 3TB on 2xSAS2LP, parity on mb)

 

...

Last Parity on 5.0.5:

37311 seconds (avg between 10-11 hrs, same drives plugged into same ports)

 

Parity After upgrade to 6.1.3:

Last checked on Sun 25 Oct 2015 12:11:19 PM MST (four days ago), finding 0 errors.

Duration: 17 hours, 38 minutes, 42 seconds. Average speed: 47.2 MB/sec

 

 

Thanks for the detailed info.  Let's see if we can increase your parity check speeds by running the following commands in the console or an SSH session:

 

echo 8 > /sys/block/sdc/queue/nr_requests
echo 8 > /sys/block/sdd/queue/nr_requests
echo 8 > /sys/block/sde/queue/nr_requests
echo 8 > /sys/block/sdf/queue/nr_requests
echo 8 > /sys/block/sdg/queue/nr_requests
echo 8 > /sys/block/sdh/queue/nr_requests
echo 8 > /sys/block/sdi/queue/nr_requests
echo 8 > /sys/block/sdj/queue/nr_requests
echo 8 > /sys/block/sdk/queue/nr_requests
echo 8 > /sys/block/sdl/queue/nr_requests
echo 8 > /sys/block/sdm/queue/nr_requests
echo 8 > /sys/block/sdn/queue/nr_requests
echo 8 > /sys/block/sdo/queue/nr_requests
echo 8 > /sys/block/sdp/queue/nr_requests

 

This will limit the number of requests each HDD (in the array) is handling at a time, from the default of 128 down to 8.  No need to run it on cache devices and/or SSDs.  This seems to make a HUGE difference on Marvell controllers.  I suspect earlier versions of Linux (hence unRAID 5.x) had lower defaults for nr_requests.
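For a system with many array devices, the same thing can be done with a loop; a sketch, assuming bash and that sdc through sdp are the array members as in the list above:

# Drop nr_requests to 8 for every array member in one pass
for dev in sd{c..p}; do
  echo 8 > /sys/block/$dev/queue/nr_requests
done

# Confirm the new values (prints one path:value line per drive)
grep . /sys/block/sd{c..p}/queue/nr_requests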

 

The parity check can already be running (or not) when you try the commands above.  Speed should increase almost instantly...

 

Link to comment

... This will limit the number of requests each HDD (in the array) is handling at a time, from the default of 128 down to 8.  No need to run it on cache devices and/or SSDs.  This seems to make a HUGE difference on Marvell controllers.  I suspect earlier versions of Linux (hence unRAID 5.x) had lower defaults for nr_requests.

 

Is this likely to impact ICH10 ports and/or Adaptec 1430SA's ??    Just curious if this is why parity checks are so much slower in v6 vs. v5 on my old C2SEA.

 

I presume it's safe to try -- right?

 

... also, is it persistent until the next reboot ?

 

Finally, does this explain why parity sync works so much faster than checks?  [i.e. does the sync process cause fewer queued requests?]

 

Link to comment

I have 3x3TB and 5x2TB drives on my card, as well as 3x4TB and 1x3TB on motherboard SATA ports.  I applied the above changes, and instead of the check starting at 40MB/sec, I was getting 105MB/sec.  The Storage graph on the System Stats page showed throughput of about 1.25GB/sec, where before it maxed out around 750MB/sec.

 

HOWEVER, 3 minutes after starting the parity check, it slowed down to 40MB/sec for a few minutes, and now is back to 95MB/sec.

 

My last parity check had an average speed of 65.7MB/sec - I will report back when the current one finishes with the new result, fingers crossed.

[screenshot: System Stats storage throughput graph]

Link to comment

I have 3x3TB and 5x2TB drives on my card, as well as 3x4TB and 1x3TB on motherboard SATA ports.  I applied the above changes, and instead of the check starting at 40MB/sec, I was getting 105MB/sec.  The Storage graph on the System Stats page showed throughput of about 1.25GB/sec, where before it maxed out around 750MB/sec.

 

HOWEVER, 3 minutes after starting the parity check, it slowed down to 40MB/sec for a few minutes, and now is back to 95MB/sec.

 

My last parity check had an average speed of 65.7MB/sec - I will report back when the current one finishes with the new result, fingers crossed.

 

Depending a bit on your CPU, keeping the GUI on screen will impact the parity check speed.  For max speed, close the window between quick peeks at what is happening, and even then don't peek every ten seconds; look at it more like every half hour. 

Link to comment

I don’t want to celebrate too early but it appears you found the cause of the problem!!

 

My test server with 8 x 32GB SSDs:

 

Before: Duration: 5 minutes, 27 seconds. Average speed: 97.9 MB/sec

After: Duration: 2 minutes, 31 seconds. Average speed: 212.0 MB/sec

 

Another server has 11 disks total, 4 on the SAS2LP; you can see the jump in the graph as I entered the commands:

 

 

 

 

[screenshot: disk throughput graph showing the speed jump]

Link to comment

... This will limit the number of requests each HDD (in the array) is handling at a time, from the default of 128 down to 8.  No need to run it on cache devices and/or SSDs.  This seems to make a HUGE difference on Marvell controllers.  I suspect earlier versions of Linux (hence unRAID 5.x) had lower defaults for nr_requests.

 

Is this likely to impact ICH10 ports and/or Adaptec 1430SA's ??    Just curious if this is why parity checks are so much slower in v6 vs. v5 on my old C2SEA.

 

I presume it's safe to try -- right?

 

... also, is it persistent until the next reboot ?

 

Finally, does this explain why parity sync works so much faster than checks?  [i.e. does the sync process cause fewer queued requests?]

 

I'm not sure about ICH10 or your Adaptec card, but I can say those nr_requests changes didn't negatively affect an Intel C220 SATA controller or an LSI SAS2308 controller that had no performance issues to begin with.  It's safe to try :)

 

The defaults will revert once you reboot the machine. 
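Since the change doesn't survive a reboot, one way to reapply it automatically is to append the same commands to the flash drive's startup script; a sketch, assuming the stock /boot/config/go file and that the device letters stay the same between boots (they can change, so match them against your array before relying on this):

# Added to /boot/config/go so it runs at every boot
for dev in sd{c..p}; do
  echo 8 > /sys/block/$dev/queue/nr_requests
done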

 

I presume the parity sync is writing synchronously and blocks more reads from queuing until the parity write is completed, which conveniently finishes by the time the pending reads complete from the other drives.  In other words, if there are no writes (a parity check with no errors), then it'll keep filling the disk queues with async read requests.  Tom would be able to explain it better (or correct my poor interpretation).
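For anyone curious to test that interpretation, the kernel exposes per-device in-flight request counters that can be watched during a check versus a sync; a sketch, assuming sdd is one of the array drives:

# Print the number of in-flight reads and writes for one drive, once per second
while true; do cat /sys/block/sdd/inflight; sleep 1; done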

Link to comment
