Joseph

[Resolved] 5 Errors After Every Parity Check

166 posts in this topic Last Reply


CORRECTED TITLE:  5 Parity Errors After Every Reboot

 

Hey everybody, I've been pulling my hair out on this one. On my unRAID box, I keep getting 5 parity errors that are not corrected after every monthly, and even manual, parity check. There are also numerous errors in the syslog that I can't make heads or tails of. I rebooted late last night and ran a parity check overnight. Can someone take a look at the posted log and advise? Any help is much appreciated!!

 

Write corrections to parity is checked:

Last check completed on Sun 02 Apr 2017 10:37:14 AM CDT (today), finding 5 errors.
Duration: 10 hours, 24 minutes, 54 seconds. Average speed: 106.7 MB/sec
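
As a quick sanity check on the numbers above, the duration and average speed can be multiplied to estimate how much data the check read. This is only a rough sketch: it assumes "MB" means 10^6 bytes, and the size of the parity disk is not stated in the post, so the result is just what the arithmetic implies.

```python
# Figures copied from the parity check report above.
hours, minutes, seconds = 10, 24, 54
avg_speed_mb_s = 106.7  # assumption: 1 MB = 1,000,000 bytes

# Total duration of the check in seconds.
duration_s = hours * 3600 + minutes * 60 + seconds

# Approximate amount of data read per disk, in terabytes (10^12 bytes).
data_read_tb = duration_s * avg_speed_mb_s / 1_000_000

print(f"{duration_s} s at {avg_speed_mb_s} MB/s is about {data_read_tb:.2f} TB")
```

The result comes out at roughly 4 TB, which suggests the check ran across the whole disk rather than stopping early.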

 

TL;DR: Don't use Marvell HDD controllers. Issue resolved by removing SATA cables from onboard Marvell controller and replacing the SAS2LP-MV8 HBA with a Dell H310, flashed to LSISAS2008 (P20).

Edited by Joseph
Issue Resolved


It is for a small number of users. Sometimes it helps to disable VT-d (if it's enabled and you don't need it). Other than that, check for BIOS updates and try a different PCIe slot if available. If issues persist, consider getting an LSI-based controller.

8 hours ago, johnnie.black said:

It is for a small number of users. Sometimes it helps to disable VT-d (if it's enabled and you don't need it). Other than that, check for BIOS updates and try a different PCIe slot if available. If issues persist, consider getting an LSI-based controller.

<sarcasm> grrreeeeaaaaat </sarcasm>  I bought this controller brand new right at a year ago too. :( Grrrr!

 

Anyways, thanks for your response Johnny. I can't remember when this issue started, but I suspect it occurred once my drive array grew from 3 or so drives to the 13 (includes 2 parity & 2 cache) drives I have now.

 

I run VMs, so I need VT-d enabled. The mobo and the SAS2LP have the "latest" firmware. FWIW, I believe ASUS has stopped supporting the mobo. Moving the controller to another slot would be rather challenging because of the limitations of the slots and the existing cards installed.

 

Hopefully you can or will be willing to point me in the right direction on a few more related questions:

  • What controller board would you recommend for SATA 3 drives? (I see that Supermicro sells the LSI00188 9200-8E)
  • Is there any real harm in having just 5 parity errors? (In other words, how many files are affected by each parity error?)
  • How was it determined that the controller is at fault? Is there a whitepaper, discussion or perhaps some sort of acknowledgement from Supermicro?
  • If this is a known issue, is there a hardware recall?
  • Other than what you suggested, is there any other recourse?

Looking forward to hearing your thoughts.

 

Thanks again!!

 

Edited by Joseph
fixed typo


I had a SAS2LP controller in my SuperMicro server. I started getting regular errors in my parity check too, then things went crazy. In my case it may have been a compatibility issue between the controller and my backplane, but they are made by the same company.... In any case, I got rid of it and replaced it with an IT-flashed Dell Perc H310, and my issues went away.

3 hours ago, Joseph said:

What controller board would you recommend for SATA 3 drives? (I see that Supermicro sells the LSI00188 9200-8E)

 

Any SAS2008-based controller. The 9200-8e is one, but its ports are external. The equivalent internal models are the 9210-8i and 9211-8i; most people buy the cheaper IBM M1015/Dell H310/H200 on eBay and then crossflash them to LSI IT mode.

 

3 hours ago, Joseph said:

Is there any real harm in having just 5 parity errors?

 

1 parity error is one too many. A single error is enough to cause one file to be corrupt after a rebuild (though depending on the file, the corruption may or may not be noticeable; e.g., a single error or very few errors probably wouldn't be noticeable in a video file).
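
To illustrate why even one wrong parity byte matters, here is a minimal sketch that assumes simple XOR parity across the data disks (the second parity disk uses a different calculation that is not modeled here). The disk contents and the function names `xor_parity` and `rebuild` are made up for the example; this is not unRAID's actual code.

```python
from functools import reduce

def xor_parity(blocks):
    """Return the byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def rebuild(missing_index, blocks, parity):
    """Reconstruct the block at missing_index from the other blocks plus parity."""
    survivors = [b for i, b in enumerate(blocks) if i != missing_index]
    return xor_parity(survivors + [parity])

# Three hypothetical data "disks", four bytes each.
disks = [b"\x10\x20\x30\x40", b"\x01\x02\x03\x04", b"\xaa\xbb\xcc\xdd"]
good_parity = xor_parity(disks)

# Flip one bit in one parity byte to mimic a single uncorrected parity error.
bad_parity = bytearray(good_parity)
bad_parity[2] ^= 0x01
bad_parity = bytes(bad_parity)

# Rebuild disk 1 as if it had failed, first with good parity, then with bad.
rebuilt_ok = rebuild(1, disks, good_parity)
rebuilt_bad = rebuild(1, disks, bad_parity)

print("good parity rebuild matches:", rebuilt_ok == disks[1])
print("bad parity rebuild matches: ", rebuilt_bad == disks[1])
```

With correct parity the rebuilt disk matches the original. With the single bad parity byte, the rebuilt disk differs at exactly that position, so whichever file occupies that spot on the rebuilt disk ends up with a wrong byte.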

 

4 hours ago, Joseph said:

How was it determined that the controller is at fault? Is there a whitepaper, discussion or perhaps some sort of acknowledgement from Supermicro?

 

I can't say whether the problem is the controller, the driver or the combination of the controller and the hardware used, but I'm 99.99% sure that changing to an LSI will fix your issues.

 

4 hours ago, Joseph said:

If this is a known issue, is there a hardware recall?

 

It's a known issue with unRAID v6, but like I said, it only affects a small percentage of users. There's no recall.

 

4 hours ago, Joseph said:

Other than what you suggested, is there any other recourse?

 

I already mentioned the workarounds I know; disabling VT-d is usually the one that works best.


I have installed a SAS2LP in my backup server, disabled VT-d, and am still having parity check errors.
I ordered a Dell Perc H200 today and will report back on whether that card resolves my issues.
Btw, is it safe to run the parity check with the "Write corrections to parity" option once the new controller is installed?


Thanks Johnny for shedding light on the issue and various workarounds.

 

Edgar, assuming your hardware is in working order, it should be OK to run the parity check and write corrections. That will update parity so that it accurately reflects your setup.

Please let me know what you find out once you get the replacement card so I can purchase one.

 

Thank you!!


Does anyone know the difference between the Dell H310 0HV52W and the Dell H310 R1DNH? They look identical except for the sticker with the datamatrix code and the sticker on the chip.

 

0HV52W

http://www.ebay.com/itm/DELL-HV52W-RAID-CONTROLLER-PERC-H310-6GB-S-PCI-E-2-0-X8-0HV52W-/201657131656?hash=item2ef3b3a288:g:FfAAAOSwFdtXxe5r

 

R1DNH

http://www.ebay.com/itm/Dell-R1DNH-0R1DNH-PERC-H310-6GB-s-Low-Profile-SAS-RAID-Controller-w-Cable-B4-E-/131803966259?hash=item1eb020eb33:g:ZKkAAOSw9r1WCYXr

11 hours ago, ashman70 said:

I had a SAS2LP controller in my SuperMicro server. I started getting regular errors in my parity check too, then things went crazy. In my case it may have been a compatibility issue between the controller and my backplane, but they are made by the same company.... In any case, I got rid of it and replaced it with an IT-flashed Dell Perc H310, and my issues went away.

Which one is it, the 0HV52W or R1DNH? (See my other post below with links)


No idea, I'd have to open my server and physically look at my card. Are you connecting the HBA to a backplane or directly to drives?

8 minutes ago, ashman70 said:

No idea, I'd have to open my server and physically look at my card. Are you connecting the HBA to a backplane or directly to drives?

Each cable connects to a drive cage with SATA drives installed. I believe this is equivalent to connecting directly to the hard drives.


Either will work after being flashed to IT mode.


Is there an online guide you can point me to for the correct firmware and how to flash? I found one online that was somewhat confusing and in the comments a couple people said their H310 was bricked or had problems.

Edited by Joseph
additional thoughts


So is there a way within unRAID to stress the SAS cards? I have two Supermicro AOC-SAS2LP-MV8 cards, and over the weekend my whole box became unresponsive during a parity check. I ended up doing a hard reset, which I believe then caused parity errors (200 or so). My worry now is how to know whether it is the cards or a PSU issue (19 HDDs and 2 SSDs on a Corsair 850 PSU).

My unRAID server is headless, so I am wondering whether it can be tested without attaching a monitor and keyboard.

Thoughts?

Patrick


I had all kinds of random parity errors, and I have 30 drives in my server. In my case it was the SAS2LP card; once I replaced it, the random parity errors went away. You know your server better than any of us, so I am sure you can make an educated guess as to where the real problem lies.


fwiw, I just pulled the trigger on the eBay one marked HV52W. Fast ship -- the card should be here by this Thursday. Will let you know how it goes.


Well, I just upgraded my server with a new case and added additional hard drives. The previous weekend the parity check ran fine with no errors; it was this weekend that it had issues. I am also wondering about heat: the new box has the drives running a little hotter, so I would assume the SAS cards are also running a little hotter. Also, in the old setup I only had a single SAS card; I added a second one in the new setup to accommodate the extra drives.

10 minutes ago, crowdx42 said:

My worry now is how to know whether it is the cards or a PSU issue (19 HDDs and 2 SSDs on a Corsair 850 PSU).

The only stress test I know of is the preclear test (which, as you know, will wipe the data on the drive). The thing is, if you're stress testing with faulty hardware, you won't be doing your drives or your mental state any favors. I'd re-seat all power and data cables. I did this and wound up replacing some low-quality Y power connectors with higher-quality ones, and found a couple of items that weren't seated properly on my build. DOH!

 


Anybody wanna buy a gently used SAS2LP-MV8? I'll mark it down 5%... 1% for each parity error I'm receiving! :D


OK guys, the H310 HV52W arrived today. However, I tried it in 2 boxes, and when the card is installed I get no POST and no beep. The green LED on the card blinks continually. Any thoughts on what needs to be done so I can get the card flashed for unRAID?

Edited by Joseph
clarification


Just read up about masking the pins... trying that now.

Crazy, but it worked. :)

 

Just flashed the card. It's no longer seen on boot; hope I didn't just bork it! :S

Gonna test in unRAID box.

 

PEEEOOOWWWW...Success! The boot time was so much faster I thought for sure the card was bricked. *whew*

Will run a couple of parity checks and report back.

 

NEW CARD: DID NOT FIX PROBLEM!!

FAIL!!  FAIL!!  FAIL!!  FAIL!!  FAIL!!  FAIL!!  FAIL!!  FAIL!!  FAIL!!  FAIL!!  FAIL!!  FAIL!!  FAIL!!  FAIL!!  FAIL!!  FAIL!!  FAIL!!  FAIL!! 

Edited by Joseph
FAIL!

