
Parity sync errors with AOC-SAS2LP-MV8


ElJimador


I didn't realize there were still issues using this controller with v6. I had pulled it from my shared Plex server almost a year ago after problems that, when they persisted with the LSI 9211-8i I replaced it with, I later diagnosed as being caused by a bad breakout cable instead (namely, connected drives dropping from the array; IIRC there were also random parity sync errors, which did not carry over to the LSI even before I replaced the cable). Anyway, I figured that if any of the issues really were with the SAS2LP-MV8 itself they would have been fixed by now, so I just tried putting it in my home server as an upgrade to the SASLP-MV8 I'd been using in that machine. That was a mistake, apparently. Not quite halfway into the first parity check with it, it's running a touch slower than with the SASLP-MV8 (in a PCIe 2.0 x16 slot), plus it's already showing 10 sync errors corrected, when I never had any sync errors before and had just run a parity check prior to swapping the cards. So, 2 questions now.

 

1) When I put the SASLP-MV8 back in, should I just run a new parity check to correct the false error corrections made with the SAS2LP-MV8, or would it be a better idea to do a new config with the same drive assignments and rebuild parity altogether?

 

2) I see now that the Hardware Compatibility wiki mentions parity sync errors if you use the SAS2LP-MV8 with 6.1.x or later. Does that mean I'd be safe using it with 5.x or 6.0? I'm building a new backup server that's purely going to be a dumb file server, with none of the advanced features of 6 required, so I'd rather not buy a new controller if it isn't necessary. However, if there's any chance the card won't play nice with those earlier versions either, I'd rather spend the money now and just stick with 6.4 to start. Last, if it is a safe bet with 5.x or 6.0, can I still download those earlier versions, or would I need to email support for that?

 

Thanks.

Link to comment
4 minutes ago, ElJimador said:

When I put the SASLP-MV8 back in should I just run a new parity check to correct the false error corrections with the SAS2LP-MV8,

this

 

5 minutes ago, ElJimador said:

Does that mean I'd be safe to use it with 5.x or 6.0?

It usually worked fine on v5, possibly because of the 32-bit driver.

 

 

Link to comment
2 hours ago, johnnie.black said:

The SASLP is also not really recommended anymore, not because of sync errors but because it sometimes drops disks; LSI is the best option for v6.

 

 

Really?  I've only had problems with the SAS2LP.  Other than being slow on parity checks (since it only uses 4 lanes @ PCIe v.1 speed) the SASLP has been rock solid for me.  

 

Maybe I'll go with the LSI anyway, though. I'm seeing it used for around $50, and the speed bump alone is probably worth the difference, especially with an 8TB parity drive in the new server. Otherwise parity checks are going to take 2 days.
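The "2 days" worry is back-of-envelope arithmetic: a parity check reads all drives in parallel up to the size of the parity drive, so its duration is roughly parity size divided by the sustained per-drive read rate, and on a card whose PCIe link is shared by every port that rate can be capped by the link. A minimal sketch, assuming about 250 MB/s usable per PCIe 1.x lane and about 500 MB/s per PCIe 2.0 lane, decimal terabytes, and all 8 ports busy at once; real throughput is lower, and the drives themselves slow down toward their inner tracks. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Rough parity-check duration, assuming the check must read the whole
# parity drive at a constant per-drive rate (real checks vary over time).
# Usage: parity_check_hours <parity_size_TB> <MB_per_s_per_drive>
parity_check_hours() {
  local size_tb=$1 mb_per_s=$2
  # Decimal units, as drive vendors use: 1 TB = 1,000,000 MB.
  awk -v tb="$size_tb" -v mbs="$mb_per_s" \
      'BEGIN { printf "%.1f\n", tb * 1000000 / mbs / 3600 }'
}

# Theoretical per-drive share of a controller's PCIe link when every
# drive is read at once, as during a parity check.
# Assumption: ~250 MB/s per PCIe 1.x lane, ~500 MB/s per PCIe 2.0 lane.
# Usage: per_drive_mb_s <lanes> <MB_per_s_per_lane> <drives>
per_drive_mb_s() {
  awk -v l="$1" -v r="$2" -v d="$3" 'BEGIN { printf "%.0f\n", l * r / d }'
}
```

For example, `per_drive_mb_s 4 250 8` gives a theoretical ceiling of 125 MB/s per drive for an x4 PCIe 1.x card with 8 busy ports, while an x8 PCIe 2.0 card such as the LSI 9211-8i works out to about 500 MB/s per port. At a sustained 50 MB/s, `parity_check_hours 8 50` comes out at roughly 44 hours for an 8TB parity drive.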

Link to comment
16 minutes ago, ElJimador said:

Really?  I've only had problems with the SAS2LP.  Other than being slow on parity checks (since it only uses 4 lanes @ PCIe v.1 speed) the SASLP has been rock solid for me.  

It doesn't affect everyone, but it does cause problems for a number of users. That's not really surprising, since the SASLP and the SAS2LP use the same driver, which appears to be the main source of the problems on v6.

Link to comment
5 minutes ago, johnnie.black said:

It doesn't affect everyone, but it does cause problems for a number of users. That's not really surprising, since the SASLP and the SAS2LP use the same driver, which appears to be the main source of the problems on v6.

Good to know. In that case I'll just stick it back in my home server, where it's been working fine. I was considering swapping it with the LSI in my shared Plex server once I finish downsizing the latter, but that doesn't seem worth the risk now, when both cards are already working well enough where they are.

 

Come to think of it, maybe I'll try the SAS2LP with the new backup server running 6.4 after all. The motherboard I'm using for that has 8 SATA ports, so if I run the initial array onboard and make sure parity is valid before I try connecting the disks to the card instead, there shouldn't be any risk there, right? If I get parity sync errors or dropped disks, I'll know to buy another card before I need to expand; meanwhile I'd be able to fix any errors by reconnecting the disks onboard and running a new parity check, like I'm going to do on the home server now.

 

My only concern would be not getting any errors initially and having these problems appear only after I've expanded to the full-size array. Even though the data is all backed up, it would still be a pain to recover if multiple disks suddenly dropped during a parity check. But there are v6 users who haven't had any problems with the SAS2LP, correct? And those like me who have, don't they usually see it right off the bat? IDK, seems like it's worth a shot at least.

Link to comment

Also, it bears mentioning: I posted that you should correct parity, since there's no way to know which disk(s) were affected, but at least in some cases those sync errors are caused by the controller changing data on the data disks. The only way to be sure would be to check all files against existing checksums (or run a scrub if the filesystem is btrfs).
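Why there's no way to know which disk was affected follows from how single parity works. Below is a toy model in bash, assuming XOR parity with one byte per disk (a real array does this for every byte position); the function names are hypothetical. A bit flipped on disk 1 and the same bit flipped on disk 3 produce exactly the same mismatch, so the check can tell that a position is out of sync, but not where the change happened.

```shell
#!/usr/bin/env bash
# Toy model of single XOR parity: each argument is one disk's byte (0-255).

# Print the XOR of all data bytes, i.e. what the parity byte should be.
xor_parity() {
  local p=0 b
  for b in "$@"; do
    p=$(( p ^ b ))
  done
  echo "$p"
}

# Print the "syndrome": stored parity XOR parity recomputed from the data.
# 0 means in sync; anything else is a sync error at this position.
# Usage: syndrome <stored_parity> <data bytes...>
syndrome() {
  local stored=$1; shift
  echo $(( stored ^ $(xor_parity "$@") ))
}
```

With data bytes 12, 200 and 7 the parity byte is 195; changing disk 1 to 13 or disk 3 to 6 both give a syndrome of 1. The mismatch looks identical either way, which is why a correcting check can only rewrite parity rather than repair whichever data disk actually changed.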

Link to comment
28 minutes ago, johnnie.black said:

Also, it bears mentioning: I posted that you should correct parity, since there's no way to know which disk(s) were affected, but at least in some cases those sync errors are caused by the controller changing data on the data disks. The only way to be sure would be to check all files against existing checksums (or run a scrub if the filesystem is btrfs).

The file system is XFS.  By correcting parity, you just mean running a new parity check with the box checked for "write corrections to parity", right?  Otherwise I'm not sure what you mean by checking all files against existing checksums.

Link to comment
10 minutes ago, ElJimador said:

By correcting parity, you just mean running a new parity check with the box checked for "write corrections to parity", right? 

correct

 

10 minutes ago, ElJimador said:

Otherwise I'm not sure what you mean by checking all files against existing checksums.

The sync errors might be caused by files being altered (corrupted) on one or more data disks, but it would only be possible to check for that if you already had checksums for all files. On an XFS filesystem these would need to be created beforehand, e.g., using the Dynamix File Integrity plugin. Note also that if some files were corrupted it would likely be a small corruption, probably unnoticeable if, for example, they were video files. I just wanted to alert you that continuing to use that controller if it causes sync errors is a bad idea. Here's an example of one of those cases:

 

https://lime-technology.com/forums/topic/50698-monthly-5-parity-errors/?do=findComment&comment=562695
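Checksums only help if they exist before the corruption happens. The Dynamix File Integrity plugin is the usual route on Unraid; as a plain-coreutils alternative, here is a minimal sketch using `sha256sum`. It is not the plugin's format, the function names are hypothetical, and the manifest path is assumed to be absolute and kept outside the directory being hashed.

```shell
#!/usr/bin/env bash
# Minimal checksum manifest using GNU coreutils sha256sum.
# Assumption: $2 (the manifest) is an absolute path OUTSIDE $1, otherwise
# the manifest would end up hashing itself while being written.

# Hash every regular file under a directory, paths relative to it.
# Usage: make_manifest <dir> <manifest>
make_manifest() {
  local dir=$1 manifest=$2
  ( cd "$dir" && find . -type f -print0 | sort -z | xargs -0 -r sha256sum ) > "$manifest"
}

# Re-hash the files and print only those that no longer match
# (sha256sum --quiet suppresses the per-file "OK" lines).
# Usage: verify_manifest <dir> <manifest>
verify_manifest() {
  local dir=$1 manifest=$2
  ( cd "$dir" && sha256sum --quiet -c "$manifest" 2>/dev/null )
}
```

Run `make_manifest` while the array is known to be good, then `verify_manifest` after a suspicious parity check. Every line it prints (for example `./Movies/file.mkv: FAILED`) is a file whose contents changed since the manifest was made, legitimately or not, so files you edit on purpose need their manifest refreshed.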

 

Link to comment
1 hour ago, johnnie.black said:

correct

 

The sync errors might be caused by files being altered (corrupted) on one or more data disks, but it would only be possible to check for that if you already had checksums for all files...

 

 

If you have backup files, you can also do a binary compare. I have sometimes used rsync to compare the main and backup servers this way. My server backups are organised on a disk-by-disk basis, with one share on the backup server per disk on the main server. So, with the rsync daemon running on the backup server, I can use something along these lines to list, in a differences file on my cache drive, exactly which files are in error...

rsync -rvnc --timeout=18000 /mnt/disk1/ BackupServer::mnt/user/Disk1-backup > /mnt/cache/Disk1_differences.txt 2>&1 &

I also had an SAS2LP causing file corruption, but only on about 3 files.  I switched off VT-d in the BIOS to stop the problem since I don't need it currently for my use case.  That was around 8 months ago.  The card is still giving good service with weekly parity checks and no further errors or file corruptions.
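The rsync dry run above also lists files that exist only on the main server, which is noise if the backup is only a partial copy. A hypothetical alternative sketch that compares only the files present on both sides, byte for byte with `cmp`, assuming the backup share is reachable as a local path (for example through an NFS or SMB mount); the function name and paths are placeholders.

```shell
#!/usr/bin/env bash
# Byte-for-byte compare of files that exist in BOTH trees. Files missing
# from the backup are skipped on purpose (assumption: partial backup).
# Usage: compare_common <main_dir> <backup_dir>
compare_common() {
  local main=$1 backup=$2 rel
  ( cd "$main" && find . -type f -print0 ) |
  while IFS= read -r -d '' rel; do
    # Report a file only when the backup has it AND the contents differ.
    if [ -f "$backup/$rel" ] && ! cmp -s "$main/$rel" "$backup/$rel"; then
      printf '%s\n' "${rel#./}"
    fi
  done
}
```

For example, `compare_common /mnt/disk1 /path/to/backup/Disk1-backup > /mnt/cache/Disk1_differences.txt` writes one relative path per mismatching file. Like the rsync `-c` run, it reads every compared file in full, so expect it to take a while on a large disk.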

Link to comment
9 hours ago, johnnie.black said:

correct

 

The sync errors might be caused by files being altered (corrupted) on one or more data disks, but it would only be possible to check for that if you already had checksums for all files. On an XFS filesystem these would need to be created beforehand, e.g., using the Dynamix File Integrity plugin. Note also that if some files were corrupted it would likely be a small corruption, probably unnoticeable if, for example, they were video files. I just wanted to alert you that continuing to use that controller if it causes sync errors is a bad idea. Here's an example of one of those cases:

 

https://lime-technology.com/forums/topic/50698-monthly-5-parity-errors/?do=findComment&comment=562695

 

Yikes, that's pretty scary. I had assumed any errors thrown up during parity checks with the SAS2LP would be confined to parity and would not corrupt the actual data. Now that I know that's not necessarily the case, I'm going to be a lot more careful in any attempt to test it on the new backup server, because in my case it's not going to be a 100% backup, since it's also going to double as my dad's home server. So I wouldn't be able to use the method S80_UK described to identify what got corrupted (assuming I even had the patience to do all that in the first place).

 

And since I wasn't aware of Dynamix or any other file-integrity tools before, I guess if there was any real data corruption in my case, there's not going to be any way for me to know where it happened until I stumble on it. That's hardly ideal, but what else can I do at this point? The parity check did finish with 10 errors, btw, and I've got the old SASLP installed again and am running a new parity check now. I'll report back if there's any further weirdness with that.

Link to comment
7 hours ago, S80_UK said:

 

If you have backup files, you can also do a binary compare. I have sometimes used rsync to compare the main and backup servers this way. My server backups are organised on a disk-by-disk basis, with one share on the backup server per disk on the main server. So, with the rsync daemon running on the backup server, I can use something along these lines to list, in a differences file on my cache drive, exactly which files are in error...


rsync -rvnc --timeout=18000 /mnt/disk1/ BackupServer::mnt/user/Disk1-backup > /mnt/cache/Disk1_differences.txt 2>&1 &

I also had an SAS2LP causing file corruption, but only on about 3 files.  I switched off VT-d in the BIOS to stop the problem since I don't need it currently for my use case.  That was around 8 months ago.  The card is still giving good service with weekly parity checks and no further errors or file corruptions.

Great info, thanks! As I just mentioned to Johnnie, I'm not sure the rsync compare would be much use to me, given that my backup server is not going to be a 100% backup like yours. However, I will definitely make sure VT-d is turned off in my new server and cross my fingers that it helps prevent any issues using the SAS2LP in that machine.
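To confirm from the console that turning VT-d off actually took effect, one option is to look at sysfs. This sketch assumes the Linux kernel populates `/sys/kernel/iommu_groups` with numbered group directories only when an IOMMU is active, and that the directory is missing or empty when it is disabled; the function name is hypothetical, and the path is a parameter only so the check can be exercised against a test directory.

```shell
#!/usr/bin/env bash
# Succeeds if the kernel appears to have an active IOMMU (Intel VT-d or
# AMD-Vi), judged by whether any IOMMU groups exist under sysfs.
# Usage: iommu_active [groups_dir]   (defaults to /sys/kernel/iommu_groups)
iommu_active() {
  local groups_dir=${1:-/sys/kernel/iommu_groups}
  [ -d "$groups_dir" ] && [ -n "$(ls -A "$groups_dir" 2>/dev/null)" ]
}

# Example:
#   if iommu_active; then echo "IOMMU on"; else echo "IOMMU off"; fi
```

On Intel boards the BIOS setting is usually labelled VT-d; on AMD boards it is typically IOMMU or AMD-Vi.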

Link to comment
1 hour ago, Squid said:

Actually, I proved that the 5 recurring parity check errors had nothing to do with the interrupts, etc., but were rather due to the type of hard drives connected to the SAS2LP

 

https://lime-technology.com/forums/topic/59091-marvel-issues-starting-point-for-investigation/  The other issues that some users have with Marvell controllers, though, are still undetermined as to their cause/solution.

 

Hi Squid. Thanks for this; however, just FYI, none of the drives in my home server are ATA version ATA8-ACS, and none are in my shared Plex server either, where I'd previously had a lot of problems with disks dropping and recurring parity check errors with the SAS2LP-MV8 (the connected drives in both were all 3-6TB WD Reds plus my Crucial SSD cache drives). Also FWIW, IOMMU wasn't enabled on my home server prior to these parity check errors either (though I'm pretty sure it was on the shared Plex server when I was having the issues there).
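Checking which ATA version each drive reports can be done from the console: `smartctl -i` (from smartmontools) prints an `ATA Version is:` line for ATA/SATA drives. A minimal filter for that line is sketched below; the function name and its 0/1/2 return codes are hypothetical choices for this sketch, not part of smartmontools.

```shell
#!/usr/bin/env bash
# Reads `smartctl -i` output on stdin, prints the reported ATA version, and
# returns 0 if it mentions ATA8-ACS, 1 if it doesn't, 2 if there is no
# "ATA Version is:" line at all (e.g. SAS or NVMe devices).
# Usage: smartctl -i /dev/sdb | ata8_acs_check
ata8_acs_check() {
  local line
  line=$(grep -m1 '^ATA Version is:') || return 2
  line=${line#ATA Version is:}
  line="${line#"${line%%[![:space:]]*}"}"   # strip leading whitespace
  printf '%s\n' "$line"
  [[ $line == *ATA8-ACS* ]]
}
```

Run as root, `for d in /dev/sd?; do echo "$d: $(smartctl -i "$d" | ata8_acs_check)"; done` prints each drive's reported ATA version; a drive with nothing after the colon reported no `ATA Version is:` line.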

 

Question for anyone who wants to chime in: the parity check with the original SASLP-MV8 reinstalled is past 57% now with 0 errors corrected. Assuming it completes with 0 errors, is that actually a good thing? Whether the 10 errors with the SAS2LP reflected incorrect changes to parity or corruption of the data itself, I was thinking the first parity check afterwards would have to "re-correct" those before subsequent checks would return to 0. If not, could that be an indicator that nothing actually got corrupted? Or what should I read into that?

Link to comment
9 minutes ago, ElJimador said:

Question for anyone who wants to chime in: the parity check with the original SASLP-MV8 reinstalled is past 57% now with 0 errors corrected. Assuming it completes with 0 errors, is that actually a good thing? Whether the 10 errors with the SAS2LP reflected incorrect changes to parity or corruption of the data itself, I was thinking the first parity check afterwards would have to "re-correct" those before subsequent checks would return to 0. If not, could that be an indicator that nothing actually got corrupted? Or what should I read into that?

If the SAS2LP changed some of your data, those changes were what caused the sync errors, and it was a correcting check, then switching to another controller won't change that data back, so 0 sync errors are expected now.
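This can be checked with a toy model: assume single XOR parity over one byte per disk, and treat a correcting check as overwriting parity with whatever the current data XORs to. The function names are hypothetical, and a real check does this for every byte position across the array.

```shell
#!/usr/bin/env bash
# Toy model of a parity check over one byte per disk (XOR parity assumed).

# XOR of all arguments.
xor_of() { local p=0 b; for b in "$@"; do p=$(( p ^ b )); done; echo "$p"; }

# Usage: parity_check <stored_parity> <correct: 0|1> <data bytes...>
# Prints "<sync_errors> <parity_after_check>".
parity_check() {
  local stored=$1 correct=$2; shift 2
  local computed errors=0
  computed=$(xor_of "$@")
  if [ "$computed" -ne "$stored" ]; then
    errors=1
    # "Write corrections to parity": trust the data, overwrite parity.
    [ "$correct" -eq 1 ] && stored=$computed
  fi
  echo "$errors $stored"
}
```

Start with data bytes 12, 200 and 7 (parity 195), then suppose disk 1 was silently changed to 13. `parity_check 195 1 13 200 7` reports 1 error and rewrites parity to 194; running `parity_check 194 1 13 200 7` afterwards reports 0 errors even though disk 1 still holds the altered byte, because parity now matches the corrupted data. A non-correcting check (second argument 0) would keep reporting the error instead.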

Link to comment
5 minutes ago, johnnie.black said:

If the SAS2LP changed some of your data, those changes were what caused the sync errors, and it was a correcting check, then switching to another controller won't change that data back, so 0 sync errors are expected now.

Oh right. That makes sense. Well, in that case I'm going to assume this check will complete with 0 errors and I'll be back to normal, except for not knowing what got corrupted until I try to play one of those MKVs. Hopefully it's nothing I'll miss too much. Anyway, thanks again.

Link to comment
9 minutes ago, ElJimador said:

Oh right. That makes sense. Well, in that case I'm going to assume this check will complete with 0 errors and I'll be back to normal, except for not knowing what got corrupted until I try to play one of those MKVs. Hopefully it's nothing I'll miss too much. Anyway, thanks again.

If the corrupted files were videos, the corruption should be very small, likely undetectable just by playing the file.

Link to comment

Archived

This topic is now archived and is closed to further replies.
