Joseph

[Resolved] 5 Errors After Every Parity Check

Yes, I have noticed that the Supermicro AOC-SAS2LP-MV8 does get hot. I wonder if moving my parity disks to separate controllers helped resolve my issue. I also have a server chassis with pretty good airflow. It makes sense that parity checks would uncover these types of issues, as it is not often in a daily unRAID setup that all drives are spun up and actually working.

Is there a read-me-first-before-building-an-unRAID-box sticky somewhere that has a list of hardware to avoid?

Just now, Joseph said:

Is there a read-me-first-before-building-an-unRAID-box sticky somewhere that has a list of hardware to avoid?

Well, with the number of people who are now having issues with the Supermicro AOC-SAS2LP-MV8, it would probably be on the list. I actually bought two new ones because the first one I had in my previous build worked perfectly for over 2 years. :)

1 hour ago, johnnie.black said:

It's an issue with Linux, mostly on the latest kernels, mostly with VT-d enabled, and it doesn't affect every user.

I used to be in the "doesn't affect every user" group. I was aware of the problem, but foolishly thought I was immune :$. Then my onboard Marvell controller had a hissy fit, and I ended up losing 3 drives connected to it.

I recovered 2 with dual parity, but the third had some serious issues that xfs_repair could not fix. I managed to recover a bunch of files with recovery software, but it was not a fun task :(.

Moral of the story: Get off Marvell controllers ASAP! Especially if you're using VT-d. For the cost of a used Dell 310 controller, I could have avoided the hours I've spent recovering drives and content. Sigh. Live and learn :)

Edited by DoeBoye

Here's my known list :D:

Hardware           Type  Mfg         Issue
P8Z68 DELUXE/GEN3  Mobo  ASUS        Potential timeout issues with the Marvell controller causing parity errors. Disable VT-d or avoid using the Marvell ports.
AOC-SAS2LP-MV8     HBA   Supermicro  Potential timeout issues with the Marvell controller causing parity errors. Disable VT-d or use the preferred LSI2008 HBA.

Follow up, this is what I would recommend:

Parity - currently on onboard Intel port1 (SATA3) - move to free Intel port3 (SATA2)

cache2 - currently on onboard Intel port2 (SATA3) - leave as is

cache - currently on Marvell - move to Intel port1 (SATA3)

parity2 - currently on Marvell - move to free Intel port4 or free LSI port

7 minutes ago, johnnie.black said:

Follow up, this is what I would recommend:

Parity - currently on onboard Intel port1 (SATA3) - move to free Intel port3 (SATA2)

cache2 - currently on onboard Intel port2 (SATA3) - leave as is

cache - currently on Marvell - move to Intel port1 (SATA3)

parity2 - currently on Marvell - move to free Intel port4 or free LSI port

Thank you SO MUCH for this! If I can't get to it this evening, it will have to be this weekend. Regardless, I'll keep this thread posted.

Edited by Joseph
grammar

So it seems the parity errors are actually caused when shutting down... The parity check I've been running today (after last night's reboot, which produced errors) has come back clean.

@johnnie.black, I'm not going to have time to move the HDDs off the onboard Marvell controller until this weekend. However, just for fun, I'm going to reboot and run the parity check again overnight... If things happen the way I anticipate, it will have 5 errors around the 50% complete mark. Then I'll do another check tomorrow without a reboot and see if it's OK. Unless I've missed something, this pattern will confirm it's definitely happening on reboot. I'm sure the drives love the fitness test.

Sure, do it. It would be good to confirm whether errors appear only after a reboot; it may help diagnose similar issues for other users.

8 hours ago, johnnie.black said:

Sure, do it. It would be good to confirm whether errors appear only after a reboot; it may help diagnose similar issues for other users.

As suspected, 5 parity errors after the reboot.

Apr 11 03:05:15 Tower kernel: md: recovery thread: Q corrected, sector=3519069768
Apr 11 03:05:15 Tower kernel: md: recovery thread: Q corrected, sector=3519069776
Apr 11 03:05:15 Tower kernel: md: recovery thread: Q corrected, sector=3519069784
Apr 11 03:05:15 Tower kernel: md: recovery thread: Q corrected, sector=3519069792
Apr 11 03:05:15 Tower kernel: md: recovery thread: Q corrected, sector=3519069800

Next steps: move the drives off the onboard Marvell controller. Wash. Rinse. Repeat. I won't be able to get to it until this weekend. Running one more check now without a reboot to confirm the pattern.
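The five log lines above can be checked mechanically instead of by eye. The sketch below is not part of unRAID; `corrected_sectors` and `strides` are hypothetical helper names, and the regular expression assumes the exact `md: recovery thread: Q corrected, sector=N` wording shown in this post (other unRAID versions may phrase it differently). It pulls out the parity letter and sector number from each line and reports the gap between consecutive sectors.

```python
import re

# Hypothetical parser, assuming the unRAID syslog wording quoted above:
#   "md: recovery thread: Q corrected, sector=3519069768"
# Group 1 is the parity letter (P or Q), group 2 is the sector number.
CORRECTION_RE = re.compile(r"recovery thread: ([PQ]) corrected, sector=(\d+)")


def corrected_sectors(log_lines):
    """Return (parity_letter, sector) tuples for each correction line."""
    hits = []
    for line in log_lines:
        match = CORRECTION_RE.search(line)
        if match:
            hits.append((match.group(1), int(match.group(2))))
    return hits


def strides(sectors):
    """Gaps between consecutive sector numbers, in sectors."""
    return [later - earlier for earlier, later in zip(sectors, sectors[1:])]


LOG = """\
Apr 11 03:05:15 Tower kernel: md: recovery thread: Q corrected, sector=3519069768
Apr 11 03:05:15 Tower kernel: md: recovery thread: Q corrected, sector=3519069776
Apr 11 03:05:15 Tower kernel: md: recovery thread: Q corrected, sector=3519069784
Apr 11 03:05:15 Tower kernel: md: recovery thread: Q corrected, sector=3519069792
Apr 11 03:05:15 Tower kernel: md: recovery thread: Q corrected, sector=3519069800
"""

hits = corrected_sectors(LOG.splitlines())
letters = sorted({letter for letter, _ in hits})
sectors = [sector for _, sector in hits]
print(f"{len(hits)} corrections, parity letters {letters}, strides {strides(sectors)}")
```

Run against this post's log, it reports 5 corrections, all on Q, each 8 sectors apart, i.e. one contiguous run. Saving this output after each check is a quick way to confirm the "same sectors after every reboot" pattern described in this thread.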

It is interesting to note that someone else had a similar issue 3 years ago, with a similar pattern of sectors in question:

MINE:
sector=3519069768
sector=3519069776
sector=3519069784
sector=3519069792
sector=3519069800

THEIRS:
sector=1565565768
sector=1565565776
sector=1565565784
sector=1565565792
sector=1565565800

Source: https://forums.lime-technology.com/topic/30016-monthly-parity-check-found-5-errors/

Edited by Joseph
typo
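The two runs can be compared directly. This sketch assumes the md driver reports sectors in 512-byte units (an assumption, flagged in the code; it is not stated in the thread) and, for each run, computes the number of corrections, the distinct gaps between them, whether the first sector is 4 KiB aligned (a multiple of 8 sectors), and the byte offset where the run starts. `describe_run` is a hypothetical helper name.

```python
# Assumption: sector numbers in the md log are 512-byte units.
SECTOR_BYTES = 512

MINE = [3519069768, 3519069776, 3519069784, 3519069792, 3519069800]
THEIRS = [1565565768, 1565565776, 1565565784, 1565565792, 1565565800]


def describe_run(sectors):
    """Summarize one run of corrected sectors.

    Returns (count, sorted distinct gaps, first sector 4 KiB aligned?,
    start offset in bytes).
    """
    gaps = sorted({b - a for a, b in zip(sectors, sectors[1:])})
    aligned = sectors[0] % 8 == 0  # 8 sectors x 512 bytes = 4 KiB
    return len(sectors), gaps, aligned, sectors[0] * SECTOR_BYTES


for name, run in (("mine", MINE), ("theirs", THEIRS)):
    count, gaps, aligned, start = describe_run(run)
    print(f"{name}: {count} sectors, gaps {gaps}, 4 KiB aligned: {aligned}, "
          f"starts at {start / 1e9:.1f} GB")
```

Under that assumption both runs have the same shape (five 4 KiB-aligned sectors, 8 apart) but start at very different places, roughly 1.8 TB versus 0.8 TB into the disk, so the similarity is in the pattern rather than the location.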

Well, I just finished a parity check on my system and I have 141 errors. Running it again to see if I get the same after a second check.

From what I see, the SATA ports on the MSI H97 are all Intel, so that only leaves the SAS cards. If I get more errors tomorrow, I will just bite the bullet and replace them. Are there any plug-and-play replacement options that don't require flashing the cards to a different BIOS?

AFAIK, the Dell card needs to be flashed with LSI2008 firmware so it will play nice with unRAID. It's fairly easy to do; the hardest part was cutting a sliver of electrical tape to cover the pins. :)

I suppose you could buy an LSI2008-branded card and not have to go through the process. I don't know if there are other cards that work natively; hopefully someone will post feedback.

Well, I just checked my SAS cards: they are on the latest firmware, but they had INT13 enabled, which according to the Limetech hardware forum sticky should be disabled. I have not disabled VT-d yet, as I may want to play with some virtual machines, so fingers crossed. I will re-run the parity check and see what I get.

So it seems that once INT13 is disabled, it REALLY slows the cards down to treacle. I looked it up and it relates to SCSI; I am now wondering whether a SATA setup should have INT13 enabled. Thoughts?

Well regardless, I just bought a couple of the Dell 310s on ebay. They seem to spec a little faster than the Supermicro cards (6gbps vs 5gbps).

It just seems the more I read the more the Supermicro cards make me nervous :(

2 hours ago, crowdx42 said:

Well regardless, I just bought a couple of the Dell 310s on ebay. They seem to spec a little faster than the Supermicro cards (6gbps vs 5gbps).

It just seems the more I read the more the Supermicro cards make me nervous :(

Good call. It's not fun if the Marvell card dumps/corrupts a bunch of drives. I picked up a used 310 off Ebay a few weeks ago. Once I flashed my Dell 310 with fireball3's script, it runs perfectly. No need to cover any pins with tape on mine.

Well, both of the ones I ordered are DELL HV52W RAID CONTROLLER PERC H310 6GB/S PCI-E 2.0 X8 0HV52W. I am hoping that I still get good speeds with them. The parity check with the Supermicro cards, now with INT13 disabled, is painfully slow at only around 70MB/sec; with INT13 enabled they were averaging around 130MB/sec.

6 minutes ago, crowdx42 said:

Well both of the ones I ordered are DELL HV52W RAID CONTROLLER PERC H310 6GB/S PCI-E 2.0 X8 0HV52W

That's the same model I ordered, so hopefully it will work just as well. Using fireball3's script, I had to use the 'b' option when flashing to IT. If you use his script and have to as well, make sure to send him the output txt file :)

Quote

Dell HV52W PERC H310 8-Port 6Gb/s SAS /SATA RAID Controller

Parity Check Update: 60% complete; 0 errors. Unfortunately, this is all the testing I can do until this weekend. But the pattern is 5 errors, in the same sectors, every time unRAID is rebooted. I believe someone suggested that this means it's hanging on a drive during shutdown and then not shutting down cleanly (even though I don't get an unclean shutdown error), and I'm inclined to agree. In any case, I'll move the drives off the onboard Marvell controller, run tests, and keep y'all posted.

Thanks everyone for weighing in.

1 hour ago, crowdx42 said:

...with INT13 enabled they were hitting on average around 130MB/sec

I only get about 125MB/sec during parity checks, and data moves are slower... I use cache drives, but I'd like to see an improvement on the array.

Well, I have two 1TB SSDs in RAID 1 for day-to-day loads to the server, but it makes me nervous having parity checks running too long :P

On 3.4.2017 at 5:04 PM, EdgarWallace said:

I have installed a SAS2LP in my Backup Server, disabled VT-d and still having parity check errors.
I ordered a Dell Perc H200 today and will report back If that card is going to resolve my issues.
Btw. is it safe to run the parity check once the new controller is installed with the "Write corrections to parity" option?

I have finally received my Dell Perc H200 and, thanks to the HUGE efforts of @Fireball3, my backup server is running its first parity check. I have unchecked the "Write corrections to parity" option in order to see if the server will stay at the 8 errors that came up with the AOC-SAS2LP-MV8 card. If everything's running well, I will check again with the "Write corrections" option enabled and finally run a 3rd check. Cross fingers.....

EDIT: 9 sync errors detected at 38.3% of the parity sync. What is making me nervous is this: Pastebin

Edited by EdgarWallace

17 hours ago, crowdx42 said:

Well I have two 1tb SSDs in RAID 1 for day to day loads to the server but it makes me nervous having parity checks running too long :P

It takes ~10 - 12 hours to get through a 4TB parity disk check... I can only imagine 8TB must be about twice that! :|
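The doubling guess is just size divided by average throughput. A minimal sketch with hypothetical function names, using decimal TB and MB as drive vendors do. It assumes check time is set by the parity disk size and a single average read rate; real checks slow down toward the inner tracks and are limited by the slowest disk or controller, so treat the numbers as ballpark only.

```python
def check_hours(parity_tb, avg_mb_per_s):
    """Hours to read parity_tb (decimal TB) at avg_mb_per_s (decimal MB/s)."""
    return parity_tb * 1e12 / (avg_mb_per_s * 1e6) / 3600


def implied_mb_per_s(parity_tb, hours):
    """Average MB/s implied by a check of parity_tb that took `hours`."""
    return parity_tb * 1e12 / (hours * 3600) / 1e6


# Speeds quoted in this thread: ~125 MB/s, and ~70 vs ~130 MB/s with INT13 off/on.
for tb in (4, 8):
    for speed in (70, 125, 130):
        print(f"{tb} TB at {speed} MB/s: {check_hours(tb, speed):.1f} h")

# What a 10 to 12 hour 4 TB check implies about average throughput.
for hours in (10, 12):
    print(f"4 TB in {hours} h: {implied_mb_per_s(4, hours):.0f} MB/s average")
```

At a constant average speed, doubling the parity size doubles the time, so roughly 20 to 24 hours for 8TB is a reasonable first guess. The 10 to 12 hour figure for 4TB implies an average of about 93 to 111 MB/s, lower than the ~125MB/sec quoted earlier in the thread.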

3 hours ago, EdgarWallace said:

If everything's running well I will check again including the "write corrections" options and running a 3rd check finally. Cross fingers.....

I would just run the parity check and write corrections. If errors are found on the first run, then run it again to make sure the issues have been resolved with the new controller. Let us know how it goes.
