[Partially SOLVED] Is there an effort to solve the SAS2LP issue? (Tom Question)


TODDLT


Does the nr_requests script need to be added to the Go file while we wait for the next update?

 

I boot very rarely, so it's easy enough to just manually do the updates when I do that.  I'm optimistic that this issue will be resolved in a future release (hopefully the next one), either by changing the default or by providing a disk "tunable" that allows the user to set the value.
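For anyone wondering what the tweak boils down to, here is a rough sketch in Python. The go file itself is a shell script, so treat this as an illustration of the logic only, not as the script posted earlier in the thread. It assumes the standard Linux sysfs location `/sys/block/<device>/queue/nr_requests` and uses the value 8 that was tested in this thread; the function name `set_nr_requests` and its parameters are made up for the example. Writing to sysfs needs root, and the setting does not survive a reboot, which is why it would have to be reapplied at boot.

```python
from pathlib import Path


def set_nr_requests(value, sysfs_root="/sys/block", prefix="sd"):
    """Write `value` to queue/nr_requests for every block device whose
    name starts with `prefix`. Returns {device: (old_value, new_value)}.

    Hypothetical helper: the name, parameters, and return shape are
    assumptions for illustration, not part of unRAID.
    """
    changed = {}
    for dev in sorted(Path(sysfs_root).glob(f"{prefix}*")):
        knob = dev / "queue" / "nr_requests"
        if not knob.is_file():
            # Skip anything that does not expose the queue setting.
            continue
        old = knob.read_text().strip()
        knob.write_text(f"{value}\n")
        changed[dev.name] = (old, str(value))
    return changed
```

Calling `set_nr_requests(8)` as root would apply the value to every `sd*` device; the `sysfs_root` parameter is only there so the function can be tried against a scratch directory first.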

 

 

Link to comment

I couldn't help but notice you mention this a couple of times, and somehow I missed it up until tonight.  What do we know about this issue?  Do we know what the problem configurations are?  Is there some way to figure out if you have this issue before it causes a data issue, or is there no way to know until it's too late?

 

If there is some way to "know" if I am safe or not, I'll try the SAS2LP again this weekend and see what happens.

Certainly, we can't ask anyone to test something that could cause data issues, so ideally there's someone who can try it on a test system.  Or at least on a system with EVERYTHING backed up elsewhere.  If you do decide to test, we'll be grateful!  But it IS YOUR data, not ours we're risking!  On the other hand, if it has already caused parity issues, where you can't trust the current validity of your parity drive, then you're already in trouble, and have less to lose.  You NEED a parity drive you have confidence in, so this testing has clear benefits over the risks for you.  (not sure how clearly I worded that)

 

 

I have run the SAS2LP in my server before.  Slow speeds but no errors.  I'm confident in my parity as it stands right now. 

 

So does not having seen a parity check error occur in one parity check on this server mean I'm safe (as it relates to this specific error)?  Is this the type of problem that either you have always due to hardware configuration or never have?  Or should I continue to be concerned?

 

 

Link to comment

I am testing a build right now that has nr requests as a tunable setting in the webgui (amongst many many other changes)

 

BTW, this would be a VERY useful feature if there's any chance of getting it into the next release:

http://lime-technology.com/forum/index.php?topic=43765.0

 

... it would have allowed two cases I'm aware of (and probably more) to be easily resolved instead of resulting in lost data.

 

Link to comment

I don't think anyone here can definitively say there's no risk, or how big a risk, because I doubt anyone here has seen the code, examined it and figured out what's wrong, and that's what it would take to be sure of what's risky and what's not.  But if all you have experienced is parity check slowness, then there are no reports or even hints of problems with that.

 

My concern and cautions were more for those with a SAS2LP that has caused more serious issues, either parity errors or possibly even data loss.

Link to comment

Dumb question...  Do we think this setting will improve parity check speeds with the original SASLP card? (and V6)

 

No idea.  Looking at SuperMicro's specs, it IS based on a Marvell chip (Marvell 6480), but I don't know if that has the same issues as the 9480 chip used in the SAS2LP.

 

Very easy to test => just do a parity check (Just run it for perhaps 10 minutes and update the status);  then change the nr_requests values and repeat the process.

 

OK.  With the default I was at 90-95MB/s.  With the change I was at ~102-106MB/s.

So there is a speedup for the original SASLP card as well.

 

I guess I put this in the go script?

 

Jim

 

Just out of curiosity and to see how your speed compares with my own testing, how many disks are on the SASLP? I’m guessing no more than 6?

Link to comment

 


I have 8, but 2 are not in the array.  In fact, it may have been fewer than 6, as I'm not sure where my cache disks (2) are located!

 

Jim

If I trust the ordering sda-sdx, I had 5 disks in the parity check on the SASLP controller (11 total in the array, including parity).

Link to comment

 


 

I expected 6 array disks, as your speed is very close to what I got from my tests with 6 SSDs:

 

default - 93.1MB/s

nr_requests=8 - 105MB/s

 

In any case your improvement is in line with what I get with 5 to 8 disks, between 9 and 12%, so I believe most SASLP users will also benefit from this tweak, naturally only if there aren't any other bottlenecks.
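As a quick sanity check on the numbers posted above, the relative gain can be worked out as below. This is only a back-of-the-envelope sketch: `percent_gain` is a made-up helper name, and taking the midpoint of the reported 90-95 and 102-106 MB/s ranges is an assumption, since only ranges were given.

```python
def percent_gain(before_mb_s, after_mb_s):
    """Relative speed improvement in percent."""
    return (after_mb_s - before_mb_s) / before_mb_s * 100.0


# 6-SSD test figures quoted above: default vs. nr_requests=8
ssd_test = percent_gain(93.1, 105.0)

# Reported SASLP result, using the midpoints of the posted ranges (assumption)
saslp_report = percent_gain((90 + 95) / 2, (102 + 106) / 2)

print(f"6 SSD test: {ssd_test:.1f}%")      # prints "6 SSD test: 12.8%"
print(f"SASLP report: {saslp_report:.1f}%")  # prints "SASLP report: 12.4%"
```

Both land at roughly 12 to 13%, which is around the top of the range mentioned for 5 to 8 disks.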

Link to comment

So another update that I think everyone will be happy to hear.  Tom wasn't happy with having to resort to the nr requests tweak to solve this issue.  Instead, he has been examining and tweaking driver code to see if he could make improvements from that side.  I'm pleased to report that in testing today, we were able to get back to pretty much full v5 performance on our latest build with this controller and without any tweaking to the nr requests item.

 

We still will have nr requests as a tunable setting under disk settings, but we are hopeful that most won't even have to resort to that with 6.2.

Link to comment

That's great news, Jon!

 

I thought I would report my nr_requests results.  The array is a 6TB parity, 6TB data, and 3 3TB data drives.  I went from a parity check time of just under 26 hours averaging 64.6 MB/s to 15.5 hours at an average of 107.5 MB/s.  I don't have good CPU utilization figures but it appears to have gone up modestly - which I suppose makes sense since the CPU is now working harder rather than sitting in wait states.  All in all a huge improvement!

 

I'm one of the people who had consistently reproducible false red balls during parity checks.  I implemented a bunch of workarounds to get past that.  Hopefully between the nr_requests change and/or the driver tweaks Tom is working on that issue will be addressed as well.
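The parity check times reported above line up with the average speeds if you assume the whole 6TB parity disk is read once at the average rate. A small sketch of that arithmetic follows; `parity_check_hours` is a hypothetical helper, and decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) are an assumption.

```python
def parity_check_hours(parity_size_tb, avg_mb_s):
    """Estimated hours to read the full parity disk at a given average speed.

    Assumes decimal units: 1 TB = 1e12 bytes, 1 MB = 1e6 bytes.
    """
    total_bytes = parity_size_tb * 1e12
    seconds = total_bytes / (avg_mb_s * 1e6)
    return seconds / 3600.0


before = parity_check_hours(6, 64.6)    # about 25.8 hours
after = parity_check_hours(6, 107.5)    # about 15.5 hours
print(f"before: {before:.1f} h, after: {after:.1f} h")
```

That matches the "just under 26 hours" and "15.5 hours" figures, so the posted averages and durations are consistent with each other.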

Link to comment

 

 

What I was trying to figure out is whether you've heard of this being a sleeper, i.e. everything works fine with no issues and then one day you start getting problems, or whether it's one of those things that would show up as soon as I tried to do a parity check?

 

Link to comment

 

Although the tweak fixed the issue for me (and, from reading this thread, appears to work for everyone else with the SAS2LP), I'm very happy that Tom has fixed the underlying issue.  I believe that SAS2LP performance has degraded further with almost every new kernel release since V5beta12, so there was a risk that the issue could reemerge in the future.

 


 

Hmmm, I wonder if that means I’ll have to buy a new disk for my servers soon… :P

 

Link to comment


 

Great news!

 

Just curious, are the tweaks specific for SAS2LP, or will for example other Marvell users benefit?

Link to comment


 

Excellent news.  Is this a general update to Marvell-based drivers, or is it specific to the SAS2LP?    To be specific for my case (and, I suspect many others, since Tom used to use these cards a lot) will it help with 1430SA's?    Glad to see the nr_requests is still going to be a tunable "just in case" ... but it'd definitely be nice if this wasn't needed.

 

I hesitate to ask ... but curiosity demands that I do [  :) ] => any idea of the likely timeline for this update ?

 

Link to comment


It was an update to the unRAID driver, not the Marvell controller driver.

 

As far as release timeframe, I think we have been doing a pretty good job with follow up releases since 6 was pushed out (already at 6.1.3, actively working on 6.2 now).  So when I say "soon" here, take it in that context.

Link to comment

 


 

From all reports, there's nothing 'sleeper' about this; problems show up fairly quickly.  The problems may be different on different systems, but they are consistent within any given system.

Link to comment
