[Partially SOLVED] Is there an effort to solve the SAS2LP issue? (Tom Question)


TODDLT


Yeah and the issue I'm having is the redball one.

 

So just to confirm I do the below steps once I get my H310:

 

 

Download the files in Dell link in this post: http://lime-technology.com/forum/index.php?topic=12767.msg409244#msg409244

 

Then perform the steps in this post:  http://lime-technology.com/forum/index.php?topic=12767.msg409058#msg409058

 

 

Thanks,

 

Doug

 

That’s what I used to flash mine.

Link to comment

Well I got my card, but apparently have no way to flash it.  None of my PCs except the UNRAID server have the correct PCI Express slots for this card.  I can't seem to get my UNRAID server to boot into DOS.  I tried one USB stick and when it boots my screen just says "Invalid Partition".  I tried another stick which was actually my other UNRAID stick for a test server and that one won't boot either.  I used that particular stick because I know it boots okay from USB.

 

I confirmed both USB sticks boot to DOS on another PC, but that PC allows me to hit F8 for a boot menu and choose the USB.  My UNRAID server has the SuperMicro X8SIL which doesn't have a boot menu, but allows you to choose the USB stick as the first boot device.

 

I also tried the H310 in a PC with a PCIe x16 slot, but it wouldn't boot with the card in it.  Booted fine after taking the H310 out.

 

Anyone have any ideas how I can get this done?

Link to comment

Scratch that.  I got it to boot by removing the power connectors from all my drives.

 

Now the only problem is when I try the first command all I get back is the "Unknown command megarec" error.  That file isn't in the files provided in the link.  Where does this come from?

 

Doug

 

Try Googling your error message.  I got several hits....

 

 

Link to comment

Ok I got the megarec utility by downloading Fireball's package.  Card is successfully flashed!  Server booted up and everything looks normal.  Now to see if I can get all my drives working correctly again and do a parity check without getting a random redball.

 

Thanks everyone for the help.

 

Doug

Link to comment

Okay, I have received my pair of Dell PERC H310 controllers, flashed the firmware, and completed a parity check.

 

unRAID 5.0.6 with Supermicro AOC-SAS2LP-MV8 controllers:

 

Nov  1 08:42:22 Cortex kernel: md: sync done. time=34871sec (unRAID engine)

Nov  1 08:42:22 Cortex kernel: md: recovery thread sync completion status: 0 (unRAID engine)

 

unRAID 6.1.6 with Dell PERC H310 controllers:

 

Dec 11 01:47:10 Cortex kernel: md: sync done. time=32663sec

Dec 11 01:47:10 Cortex kernel: md: recovery thread sync completion status: 0

 

Timing is a bit better than before.  Didn't have any issues at all with the drives/controllers.  In fact, there's literally nothing in the syslog between the start/end of the parity check except my login to the console.
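
For anyone who wants the two runs side by side, here is a minimal Python sketch that pulls the `time=` value out of the two "sync done" lines quoted above and converts it to hours and minutes. The function names are made up for illustration; only the two log lines come from the post.

```python
import re

# The two "sync done" lines quoted above, copied verbatim from the post.
SYSLOG_LINES = [
    "Nov  1 08:42:22 Cortex kernel: md: sync done. time=34871sec (unRAID engine)",
    "Dec 11 01:47:10 Cortex kernel: md: sync done. time=32663sec",
]

def sync_seconds(line):
    """Return the integer after 'time=' in an md 'sync done' line, or None."""
    match = re.search(r"sync done\. time=(\d+)sec", line)
    return int(match.group(1)) if match else None

def as_hms(seconds):
    """Format a duration in seconds as H:MM:SS."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"

old, new = (sync_seconds(line) for line in SYSLOG_LINES)
saved = old - new
print(f"SAS2LP on 5.0.6: {as_hms(old)}")       # 9:41:11
print(f"H310 on 6.1.6:   {as_hms(new)}")       # 9:04:23
print(f"Saved: {as_hms(saved)} ({saved / old:.1%})")  # 0:36:48 (6.3%)
```

So the H310 run finished roughly 37 minutes sooner, about 6% faster. The unRAID version changed between the two runs as well, so the controller may not account for all of the difference.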

 

My only issue now is I have a bunch of drives that are flagged for potential SMART failure, due to the "Command Timeout" flag ... a result of the SAS2LP controllers losing track of the drives and thus causing the redball issue.  Dunno what if anything can be done about that?  The drives are good.  At the very least, I can keep watching the syslog for any future errors.

 

PS:  If anyone wants to gift me some forward breakout cables, I don't mind holding onto the SAS2LP cards and testing on my alternate server.  PM me if you have some spares.

Link to comment

Most believe there’s no point in monitoring that attribute on Seagates. You can disable it on each Seagate disk, or globally in the global SMART settings.

Link to comment

I did notice this only affected my Seagate drives ... I made the change and it solved my problem.  Thanks for the tip!
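
If you want to see the raw counter behind that warning before disabling it, the sketch below picks SMART attribute 188 (Command_Timeout) out of text in the table layout that `smartctl -A` prints. It is only an illustration: the function name is hypothetical, and the sample rows and numbers are made up, not taken from anyone's drive in this thread.

```python
def command_timeout_raw(smartctl_output):
    """Return the raw value of attribute 188 (Command_Timeout), or None if absent."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with the numeric ID; the raw value is the 10th column onward.
        if len(fields) >= 10 and fields[0] == "188":
            return " ".join(fields[9:])
    return None

# Hypothetical sample in the `smartctl -A` table layout; the values are invented.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   099   000    Old_age   Always       -       12
"""

print(command_timeout_raw(SAMPLE))
```

On a real system you would feed it the saved output of `smartctl -A` for one disk. Comparing the value before and after swapping controllers is one way to check that it has stopped climbing.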

Link to comment

Well I'm able to rebuild drives and do parity checks now without getting read errors on other drives.  So that's good.  My last parity check average looks like it was around 90 MB/s.

 

Unfortunately now I'm dealing with file system corruption issues, most likely because of all the rebuilds I tried to do when the SAS2LP was causing the redball errors.  The disks with corruption are the same ones that were redballing.  Whatever the bug is with that card is pretty ugly in v6.  I never had trouble with v5 and now v6 has been nothing but headaches.

 

Hopefully after getting this last drive file system check done I can get back to a stable environment I can just enjoy again.

 

Doug

Link to comment

I'm still experiencing a system hang every ~ 2-4 days w/ my 2xSAS2LP and RFS drives triggered during writes. My system load will start climbing and never stop (gets well into triple digits)... shares become unresponsive and eventually all I can do is telnet in. However, Powerdown doesn't work, nor can I stop the array or unmount drives manually. There is absolutely nothing in the logs that I can see when this happens.
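
Since the syslog shows nothing, one way to at least capture when the load starts to run away is to record `/proc/loadavg` at intervals. The Python sketch below is a rough idea only, assuming Python is available on the server. The log path on the flash drive (`/boot/loadavg.log`), the threshold of 20 and the function names are all hypothetical choices.

```python
import time

def parse_loadavg(text):
    """Parse the contents of /proc/loadavg into (1min, 5min, 15min) floats."""
    one, five, fifteen = text.split()[:3]
    return float(one), float(five), float(fifteen)

def log_load(path="/proc/loadavg", out="/boot/loadavg.log", threshold=20.0):
    """Append a timestamped reading to `out`; mark it HIGH above `threshold`."""
    with open(path) as f:
        one, five, fifteen = parse_loadavg(f.read())
    flag = " HIGH" if one > threshold else ""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    with open(out, "a") as log:
        log.write(f"{stamp} {one} {five} {fifteen}{flag}\n")

# Made-up reading for illustration; on the server, call log_load() from cron or a loop.
print(parse_loadavg("112.40 87.13 45.02 3/512 20481"))
```

Writing to the flash drive rather than RAM means the readings survive a hard reboot, which is the point here. Whether the write still succeeds once the box is badly hung is something you would have to find out.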

 

In the past, when I would run the mover (usually nightly) the problem would surface after a few days while the mover was running and I'd have to hard-boot to recover. Likewise, when the cache drive was disabled, the problem would happen every 2-4 days (all my data drives are RFS and on SAS2LP, Parity is on motherboard).

 

My 500GB cache drive is XFS and uses a motherboard SATA connector. I've changed it to run the mover only monthly and have been going 10 days now writing to it daily via Sonarr and CP without issue (without running mover).

 

My plan is to run through the holidays like this without running the mover and if it doesn't hang I think I can safely assume it's a combination of the SAS2LP+RFS (and maybe something else specific to my system??) triggering the issue. Someone mentioned to me that moving all their drives from RFS to XFS also corrected this issue for them, but honestly that'll take a LONG time.

 

I have acquired 2xm1015s to install after the new year. I guess if my issue persists with the new controllers, I'll have to decide if I should move ahead with the RFS->XFS conversion or go back to unRAID v5.0.5 which I rarely had to think about. I considered moving to one of the newer unRAID versions now but I don't want to interrupt my test and I don't see anyone saying it resolved this particular issue for them.

 

Hopefully I'll not have any crashes while writing solely to the cache and things will be back to the level of stability I enjoyed in 5.0.5 again after I move to m1015s.

Link to comment

A system hang like you describe is how I found this last round of file system corruption.  I was doing a parity check and near the end the webgui became unresponsive.  I could telnet in still so I took a quick look at the syslog and at the end of it I saw the file system errors telling me to run reiserfsck.  I tried to reboot from the command prompt but even that hung and I had to manually power off the system.  After rebooting I ran the reiserfsck on the disk that was throwing the error and I needed to use the --rebuild-tree option.  That will still be running for several hours, so while I wait for that I'm checking the other drives in another session just to be sure I don't have more.
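
For checking several drives in turn, the sketch below just builds the command lines and does not run anything. It assumes the array disks show up as `/dev/md1`, `/dev/md2` and so on, with the array started but the file systems not mounted, so adjust for your own setup. The function name and disk numbers are hypothetical. `--check` is read-only, and `--rebuild-tree` should only go on a disk where the check output told you to use it.

```python
def reiserfsck_commands(disk_numbers, rebuild=()):
    """Build one reiserfsck command line per array disk.

    Disks listed in `rebuild` get --rebuild-tree; all others get the read-only --check.
    """
    commands = []
    for n in disk_numbers:
        mode = "--rebuild-tree" if n in rebuild else "--check"
        commands.append(["reiserfsck", mode, f"/dev/md{n}"])
    return commands

# Hypothetical layout: four data disks, where the check on disk 3 asked for a tree rebuild.
for cmd in reiserfsck_commands([1, 2, 3, 4], rebuild={3}):
    print(" ".join(cmd))
```

Printing the commands first and then pasting them one at a time into separate sessions keeps you in control of which disk gets the destructive option.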

 

Good luck resolving your issue.  After all the issues I've had I'd recommend replacing those controllers sooner rather than later, especially since you have them in hand.  I've lost some of my data due to these issues.  Luckily nothing important, but still a PITA.

 

Link to comment

Thanks. I have never had a problem with parity checks (aside from them being super slow before LT fixed that) or red-ball but decided to run reiserfsck on each drive after the last hang. Everything came back fine so thankfully I don't seem to be impacted by the corruption issue. I agree that though LT has been fixing things as they can, this controller just seems to cause trouble. I'm done with them. As soon as I get back from my holiday trip, they're coming out and going on eBay.

Link to comment

... I'm done with them. As soon as I get back from my holiday trip, they're coming out and going on eBay.

 

Easy to understand that sentiment.  I have no idea what happened in v6 that has made these so much of a hassle, but I agree I'd certainly not buy one now.  Not very long ago they'd have been my first choice for a new 8-port card ... but that's certainly no longer the case.

 

What's difficult to understand is why they work perfectly in Windows machines or with v5 of UnRAID ... and, for that matter, on SOME folks' v6 systems, but not others.    [I suspect if we had more complete data we'd find some relationship between the systems that are having problems and a particular chipset (or chipsets)]

 

 

Link to comment

Something interesting with these cards. I have three friends all using them in new builds. New as in v6 builds. None of them had a v5 machine and all three systems are working fine.

 

Are they all using XFS (or BTRFS)? I suspect it's localized to SAS2LP-MV8 cards + ReiserFS + Writing operations + (possibly) some additional factor in my setup.

Link to comment

I suffered from these issues as well, and had converted my entire server to XFS.

Link to comment

I will ask and find out...

Link to comment

Interestingly enough, the problem with SAS2LP (at least in my case) seemed to come during a Parity Check.  Isn't that a read-only operation at the sector level, and is therefore considered file-system agnostic?
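
To illustrate why a check would not care about the file system, here is a toy Python sketch, assuming single parity computed as a bytewise XOR across the data disks. The "sectors" are made-up bytes and the function names are hypothetical. The check only reads blocks and compares, and never looks at what the bytes mean.

```python
from functools import reduce

def xor_parity(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def parity_ok(data_blocks, parity_block):
    """A check only reads: recompute parity from the data and compare."""
    return xor_parity(data_blocks) == parity_block

# Made-up 4-byte "sectors" from three data disks; they could belong to any file system.
disks = [b"\x12\x34\x56\x78", b"\xab\xcd\xef\x00", b"\x0f\xf0\x55\xaa"]
parity = xor_parity(disks)
print(parity_ok(disks, parity))     # True

# Flip one bit on the second disk, as a bad read would, and the check no longer matches.
bad_read = [disks[0], b"\xab\xcd\xef\x01", disks[2]]
print(parity_ok(bad_read, parity))  # False
```

Nothing in that loop depends on ReiserFS or XFS, which fits the point that a controller dropping a drive mid-read could cause trouble during a check on any file system.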

Link to comment
