
[6.12.10] Reoccurring issues with array disks and parity


Solved by JorgeB

Hi everyone,

 

For a few weeks now I have been having a lot of issues with my array disks and my parity checks.

I just opened my Unraid webUI: after the data rebuild of disk 10, which had randomly become unmountable, disk 1 is now offline out of the blue.

The disks go offline randomly, without a reboot and without any hardware or software changes.

In the same period my parity checks, which are cumulative, have also been restarting randomly without any of them completing.


All the disks are more than a year old and have worked without hardware issues the whole time, so I'm assuming the disks themselves are fine.

 

I think my 7-year-old Marvell-based SATA expander card is the cause of all this.
I had a few problems with that card in the past, but I ignored them because I thought the card wasn't the cause.
The card I am using is the following one: https://www.supermicro.com/en/products/accessories/addon/AOC-SAS2LP-MV8.php.

It is a Supermicro AOC-SAS2LP-MV8, and it was brand new when I purchased it back in the day.
A friend of mine also thinks the cause is that it is a Marvell-based card, and the FixCommonProblems plugin has been warning me about that card ever since I installed it.

I always ignored the warning, but it is now clear that the card could be the culprit.

 

The FixCommonProblems plugin gives me the following warning:

 

It appears that your server has a Marvel based hard drive controller installed within it. Some users with Marvel based controllers exhibit random drives dropping offline, recurring parity errors during checks etc. This tends to be exacberated if VT-D / IOMMU is enabled in the BIOS. Generally, LSI based controllers would be preferred over Marvel based controllers because of these issues. Note that these issues are out of Limetechs hands. Depending upon the exact combination of hardware present in your server, you may not have any problems whatsoever. If you have no problems, then this warning can be safely ignored, but future versions of Unraid (and later Kernel versions) may (or may not) present you with the previously mentioned issues.
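To check which storage controllers the warning refers to, you can look at the output of `lspci`. The following is a minimal Python sketch that filters pasted `lspci` text for Marvell storage controllers. The function name `find_marvell_storage` and the sample lines are made up for illustration and are not taken from the attached diagnostics. The pattern also accepts the "Marvel" spelling used in the warning text.

```python
import re

# Matches "Marvell" as well as the "Marvel" spelling used in the warning.
MARVELL = re.compile(r"\bMarvell?\b", re.IGNORECASE)

# Device class names that lspci prints for storage controllers.
STORAGE_CLASSES = (
    "SATA controller",
    "RAID bus controller",
    "Serial Attached SCSI controller",
)


def find_marvell_storage(lspci_text):
    """Return (slot, description) pairs for Marvell storage controllers."""
    hits = []
    for line in lspci_text.splitlines():
        # Default lspci lines look like "<slot> <class>: <vendor and device>".
        slot, _, rest = line.partition(" ")
        dev_class, sep, description = rest.partition(": ")
        if sep and dev_class in STORAGE_CLASSES and MARVELL.search(description):
            hits.append((slot, description))
    return hits


# Illustrative sample only; real device names will differ.
SAMPLE = """\
00:1f.2 SATA controller: Intel Corporation Example AHCI Controller
01:00.0 RAID bus controller: Marvell Technology Group Ltd. Example SAS/SATA controller
03:00.0 Ethernet controller: Intel Corporation Example Gigabit Adapter
"""

if __name__ == "__main__":
    for slot, description in find_marvell_storage(SAMPLE):
        print(slot, description)
```

On a real server you would paste the actual `lspci` output in place of `SAMPLE`. A match only shows that such a controller is present, not that it is causing the dropped disks.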

 

What do you guys and girls think about this problem?

If it is the Marvell card, I'm thinking of replacing it with this card: https://amzn.eu/d/hW2tyz6.

Would that be a better solution?

 

I've attached my latest diagnostics and the output of the filesystem check on disk 10, which is causing trouble now.

rik-tower-diagnostics-20240806-1017.zip
Phase 1 - find and verify superbloc.txt

12 minutes ago, JorgeB said:

Issues with multiple disks, most likely power/connection or the controller, reboot and post new diags after array start.

 

I've added some information to my post.

If you would like to look over my post one more time, I would be grateful. 

3 hours ago, JorgeB said:

SMART for disk1 looks OK, so it could be a controller issue. The problem was with disk1 and disk2, so if they share a power splitter, for example, it could also be that.

 

The controller you linked uses SATA port multipliers, so it's not recommended, see here.

 

P.S. you also need to check the filesystem on disk10.

Hi, thanks for your answer.

 

But disk 2 gives me zero problems.
I also wrote in my OP that disk 1 and disk 10 are the problematic ones.

I already ran a filesystem check on disk 10, and the output file is attached to my OP.

After the filesystem check, disk 10 still gives me the "Unmountable: Unsupported or no file system" error.


Is it recommended to replace the Supermicro AOC-SAS2LP-MV8 with the one in the OP?

42 minutes ago, rikdegraaff said:

But disk 2 gives me zero problems.

It didn't get disabled but it's still showing issues in the log.

 

42 minutes ago, rikdegraaff said:

After the filesystem check, disk 10 still gives me the "Unmountable: Unsupported or no file system" error.

If xfs_repair cannot fix the filesystem, there is not much else you can do, other than maybe using a file recovery app like UFS Explorer.

 

43 minutes ago, rikdegraaff said:

Is it recommended to replace the Supermicro AOC-SAS2LP-MV8 with the one in the OP?

Yep, it's been recommended for a long time.

6 hours ago, JorgeB said:

It didn't get disabled but it's still showing issues in the log.

 

If xfs_repair cannot fix the filesystem, there is not much else you can do, other than maybe using a file recovery app like UFS Explorer.

 

Yep, it's been recommended for a long time.

 

Thanks for the help, @JorgeB.

Which issue(s) are you talking about on disk 2?

And I am going to order that controller from Amazon as you said.

 

xfs_repair cannot repair it. I used the following command: "xfs_repair -v /dev/md10p1".

The command runs, but after writing a lot of lines to my terminal it hangs at the following line:


Metadata corruption detected at 0x438a03, xfs_inode block 0x2bc387ab8/0x4000
 

So what should I do now, quit the terminal or let it run forever?

11 hours ago, rikdegraaff said:

Which issue(s) are you talking about on disk 2?

There were ATA errors.

 

11 hours ago, rikdegraaff said:

And I am going to order that controller from Amazon as you said.

Which one?

 

11 hours ago, rikdegraaff said:

So what should I do now, quit the terminal or let it run forever?

Let it run for a few hours; if nothing changes, it may not be fixable.
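To see which ports the ATA errors mentioned above come from, you can count error lines per `ataN` port in the syslog. The following is a minimal Python sketch under stated assumptions. The helper name `count_ata_errors` and the list of message fragments are illustrative choices, not a complete catalogue of kernel messages. The sample lines only imitate the usual libata format and are not taken from the attached diagnostics.

```python
import re
from collections import Counter

# Captures the port ("ata7") from prefixes such as "ata7:" or "ata7.00:".
ATA_LINE = re.compile(r"\b(ata\d+)(?:\.\d+)?: (.*)")

# Message fragments treated as signs of trouble (an assumed, partial list).
ERROR_HINTS = (
    "exception Emask",
    "failed command",
    "hard resetting link",
    "link is slow to respond",
)


def count_ata_errors(log_text):
    """Count suspicious libata messages per ATA port."""
    counts = Counter()
    for line in log_text.splitlines():
        match = ATA_LINE.search(line)
        if match and any(hint in match.group(2) for hint in ERROR_HINTS):
            counts[match.group(1)] += 1
    return counts


# Illustrative sample only.
SAMPLE_LOG = """\
Aug  6 10:01:02 Tower kernel: ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Aug  6 10:01:02 Tower kernel: ata7.00: failed command: READ DMA EXT
Aug  6 10:01:03 Tower kernel: ata7: hard resetting link
Aug  6 10:01:09 Tower kernel: ata8: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
"""

if __name__ == "__main__":
    for port, hits in count_ata_errors(SAMPLE_LOG).most_common():
        print(port, hits)
```

The `ataN` port numbers do not map directly to Unraid disk numbers, so matching a noisy port to a specific disk is a separate step. Repeated errors on ports that share a controller, cable set, or power splitter point toward that shared component rather than the disks.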

5 hours ago, JorgeB said:

There were ATA errors.

 

Which one?

 

Let it run for a few hours; if nothing changes, it may not be fixable.

 

Okay, when I am home I'll start by replacing the SATA data cable for disk2; maybe that is causing the ATA errors on that disk.

 

And I just bought this SATA expansion card: https://amzn.eu/d/hr8HKH9.
I already have a 6-port card from that brand, and it works flawlessly.

 

I just opened PuTTY and started the following command: "xfs_repair -v /dev/md10p1".

I'll let it run for 5 hours. If it has not completed by then, I'll format the disk again, because at that point I'll assume it is not repairable.

1 hour ago, JorgeB said:

 

 

 

Thanks @JorgeB.

I looked at that page, and I was searching on Amazon NL to find one.

I've found this one, with 8 ports: https://amzn.eu/d/iZHfj9V.

I still have those SAS breakout cables to connect it to the SATA HDDs.

 

That only leaves me with 3 questions; maybe you can answer them.

  1. Can I use this out of the box or do I need to flash it to IT mode?
  2. If so, is there a guide anywhere on the internet for doing that?
  3. Would you recommend this card?

Again, thanks in advance!

On 8/8/2024 at 4:54 PM, JorgeB said:

That *should* work but it appears to be an LSI clone, and not sure if it really uses an LSI chip, if you can get an actual LSI it would be better.

Okay, well, I received it last Saturday.

I have already replaced the old Marvell-based controller with this one.

 

So far there are still no problems; the card is working flawlessly!

Thanks one final time for all the help and guidance.

 

I owe you one!

Kind regards,

 

Rikdegraaff
