What gives with 6.3.x, seems unstable.


May  2 11:26:35 Tower kernel: md: recovery thread: Q incorrect, sector=98251312

 

Q means parity 2, P would mean parity 1, and PQ would mean both.

 

This time there are no ATA errors. You need to run a correcting check to fix those sync errors. After that, run another non-correcting check, and if sync errors are 0, all is well.
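
To see at a glance which parity disk the sync errors are on, the syslog can be tallied by the P/Q/PQ prefix. This is only a rough sketch: the pattern is inferred from the single log line quoted above, and the function name `summarize_sync_errors` and the `LABELS` table are made up for illustration, not part of unRAID.

```python
import re
from collections import Counter

# Pattern inferred from the one log line quoted in this thread, e.g.
#   May  2 11:26:35 Tower kernel: md: recovery thread: Q incorrect, sector=98251312
# Other message variants may exist and would not be matched (assumption).
LINE_RE = re.compile(r"md: recovery thread: (PQ|P|Q) incorrect, sector=(\d+)")

# Human-readable meaning of each prefix, as explained in the post above.
LABELS = {"P": "parity 1", "Q": "parity 2", "PQ": "both parity disks"}


def summarize_sync_errors(lines):
    """Count sync-error lines per parity type (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


sample = [
    "May  2 11:26:35 Tower kernel: md: recovery thread: Q incorrect, sector=98251312",
]

for kind, n in summarize_sync_errors(sample).items():
    print(f"{LABELS[kind]}: {n} sync error(s)")
```

If every counted line is Q, as in the log here, the errors are confined to parity 2.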

Link to comment
7 hours ago, johnnie.black said:

May  2 11:26:35 Tower kernel: md: recovery thread: Q incorrect, sector=98251312

 

Q means parity 2, P would mean parity 1, and PQ would mean both.

 

This time there are no ATA errors. You need to run a correcting check to fix those sync errors. After that, run another non-correcting check, and if sync errors are 0, all is well.

 

Alright I will do that.

This is weird though: I have been getting random parity errors since I upgraded to 6.3.

This is the second time something like this has happened. The last time was about a month or so ago, when I had only a single parity drive.

That is why I put in a second parity drive.

 

Would parity errors occur if the mover was invoked at the time that a parity check ran?

 

Link to comment
11 minutes ago, exist2resist said:

This is weird though, I started getting random parity errors since I upgraded to 6.3

 

You are using the SAS2LP, and some users have this issue with those controllers. But since parity is also on a SAS2LP and all the sync errors are on parity2, first correct them.

 

11 minutes ago, exist2resist said:

Would parity errors occur if the mover was invoked at the time that a parity check ran?

 

No, although running both at the same time is not recommended for performance reasons.

Link to comment
4 minutes ago, johnnie.black said:

 

You are using the SAS2LP, and some users have this issue with those controllers. But since parity is also on a SAS2LP and all the sync errors are on parity2, first correct them.

 

I've had these controllers for 2.5 years. This is the first time I've ever gotten parity errors on my server, and it has now happened twice, about a month apart.

Coincidentally, this started after upgrading to 6.3. I don't know if that is significant or not, but that is what I have observed.

 

What are the alternatives to this controller? Is there a firmware version that I should be considering?

Edited by exist2resist
Link to comment

Alternatives are the LSI controllers.

 

The SAS2LP works without issues for most users (I have 2 myself), but certain combinations of hardware and software have problems.

 

These can sometimes help:

 

-disable VT-d if not needed

-look for a board BIOS update

-use the controller(s) in a different slot if available

Edited by johnnie.black
Link to comment
3 hours ago, exist2resist said:

I haven't changed anything in the last 2.5 years, and all of a sudden, after upgrading to a new version, I start getting parity errors.

I think it is unlikely that the hardware decided to shit the bed. However, I'm going to update all firmware and the BIOS this weekend.

 

Purely speaking from my own experience, I dealt with the same thing as you, and I see a number of similar posts in the forums. I ran disks on a Marvell controller without issue for quite some time. I upgraded to 6.3.2 and started having all sorts of issues. I assumed it was UnRaid that was causing the problem and rolled back to 6.2.4. Problems seemed to clear up. Tried 6.3.3 when it came out: more issues with parity errors, drives dropping out, etc. Rolled back to 6.2.4.

 

Finally ordered a used Dell H310 off eBay to replace the Marvell controller. Flashed it to IT mode, installed it, upgraded to 6.3.3. Smooth sailing :).

 

Something in UnRaid 6.3 (My money is on the Linux kernel, but that's a completely unsupported theory) is more sensitive to Marvell controllers, and some of the people who were using them without issue started having problems.

 

There are even folks who are having issues with earlier versions of UnRaid and Marvell controllers. Frankly, I'd recommend everybody ditch them. I ended up corrupting three drives, was only able to recover 2 (dual parity), and spent countless hours running xfs_repair before finally resorting to various recovery software packages to try to get my data back. I saved the bulk of what I lost, but it was a PITA, and not worth the $50 a used Dell H310 cost me.

 

Lesson learned! :)

 

[Edit] Hurrah! Post 1000! :P

 

 

 

Edited by DoeBoye
Link to comment
9 hours ago, DoeBoye said:

 

Purely speaking from my own experience, I dealt with the same thing as you, and I see a number of similar posts in the forums. I ran disks on a Marvell controller without issue for quite some time. I upgraded to 6.3.2 and started having all sorts of issues. I assumed it was UnRaid that was causing the problem and rolled back to 6.2.4. Problems seemed to clear up. Tried 6.3.3 when it came out: more issues with parity errors, drives dropping out, etc. Rolled back to 6.2.4.

 

Finally ordered a used Dell H310 off Ebay to replace the Marvell controller. Flashed it to IT, installed it, upgraded to 6.3.3. Smooth sailing :).

 


What does "Flashed it to IT," mean?

Link to comment
16 hours ago, exist2resist said:

 

What does "Flashed it to IT," mean?

It's a fairly simple process if you're at all comfortable poking around inside a computer. @Fireball3 has a great little tool in the forum that reduces it to a few keystrokes :).

 

As @flipphos mentioned, IT mode is the preferred mode for Unraid. It disables all the RAID functionality and makes the card a straight-up controller that passes the drives connected to it through to the OS.

 

Edited by DoeBoye
Link to comment
  • 4 weeks later...
On 5/5/2017 at 3:00 PM, DoeBoye said:

 

Purely speaking from my own experience, I dealt with the same thing as you, and I see a number of similar posts in the forums. I ran disks on a Marvell controller without issue for quite some time. I upgraded to 6.3.2 and started having all sorts of issues. I assumed it was UnRaid that was causing the problem and rolled back to 6.2.4. Problems seemed to clear up. Tried 6.3.3 when it came out: more issues with parity errors, drives dropping out, etc. Rolled back to 6.2.4.

 


Dude, you warned me. Yesterday I had a dual drive failure, and I've been seeing more and more parity errors over the month.

Fortunately I had dual parity, and my H310s came in from eBay a day prior.

I am rebuilding my array as we speak. I flashed all 3 H310s to IT mode; hopefully it rebuilds fine and without errors.

But wow, 6.3.3, what a cluster fuck. How did that slip through the cracks?

Do you have a link to the thread where people complain about those SASLP-MV8 cards?

Edited by exist2resist
Link to comment
44 minutes ago, exist2resist said:

Dude, you warned me. Yesterday I had a dual drive failure, and I've been seeing more and more parity errors over the month.

Fortunately I had dual parity, and my H310s came in from eBay a day prior.

I am rebuilding my array as we speak. I flashed all 3 H310s to IT mode; hopefully it rebuilds fine and without errors.

But wow, 6.3.3, what a cluster fuck. How did that slip through the cracks?

Do you have a link to the thread where people complain about those SASLP-MV8 cards?

Glad to hear you got out ahead of data loss! Hopefully the rebuild finishes without issue!

 

I wouldn't blame unRaid. The issue is a bug with Marvell controllers that just didn't seem to crop up much until recently (increased use of VMs? A new Linux kernel?).

 

I don't have a link handy, but if you do a search on Marvell controller and problems, I'm sure you'll come across a bunch of posts.

 

For the record, the Marvell controller that really messed me up was an onboard one. I do have a SASLP-MV8, but I had retired it a few years ago and only brought it out for a bit when an LSI controller started acting wonky. Now that my H310 is installed and running, I'm Marvell free!! :)

Edited by DoeBoye
Link to comment
2 hours ago, DoeBoye said:

Glad to hear you got out ahead of data loss! Hopefully the rebuild finishes without issue!

 

I wouldn't blame unRaid. The issue is a bug with Marvell controllers that just didn't seem to crop up much until recently (increased use of VMs? A new Linux kernel?).

 

I don't have a link handy, but if you do a search on Marvell controller and problems, I'm sure you'll come across a bunch of posts.

 

For the record, the Marvell controller that really messed me up was an onboard one. I do have a SASLP-MV8, but I had retired it a few years ago and only brought it out for a bit when an LSI controller started acting wonky. Now that my H310 is installed and running, I'm Marvell free!! :)

Thanks, I will search.

I still can't believe how crazy all this is, that it would come up all of a sudden like that.

Now I will have to go through those drives and test the files on them to make sure they're not corrupted, ugh.

I think this is something that needs to be documented, as there could be a lot of people having their data corrupted.

Link to comment
