Multiple issues - remove drive from share




I'm running Unraid 6.9-rc2 with six 8 TB IronWolf drives and a 10 TB parity drive. Also installed in my system is a Broadcom SAS 9300-8i HBA. I can't go back to an earlier version of Unraid because it won't work with my motherboard (6.8.x would not work with the motherboard's NIC). The motherboard is an ASUS TUF Gaming B460M-PLUS.

 

I originally hooked up all but my parity drive to the HBA and was getting random read errors that would kick drives out of the array. Sometimes it took 100+ errors before a drive got kicked out; other times it was 1 error and BAM, gone! I posted my diagnostics here and was told to upgrade the HBA firmware and check the cables. I upgraded the firmware and switched to different cables, and at the same time moved as many drives as possible to the motherboard SATA ports. This has greatly reduced my drive errors, and a drive that had tons of errors hasn't had a peep since, so I'm 99.9% sure it's an HBA issue and not an actual drive issue. Another interesting detail is that this mainly happens at night, as if some sequence of events is triggering the supposed read errors.

 

The remaining drive on the HBA got kicked out yesterday, and I'm tired of always having to rebuild parity, so I want to either pull it or just remove it from my shares. I'm currently using unBalance to move the data off that drive to my other drives. I'd rather not have to rebuild the array, so my idea is to go into my share settings and exclude that drive from all my shares. My theory is that the drive will just sit there as part of the array as an idle member until I decide on a course of action. As long as it's idle, there should be a much smaller chance of it getting kicked out.

 

I don't know whether my version of Unraid isn't playing nicely with the HBA or the HBA itself is just finicky. Maybe I should get another HBA? Any recommendations on a new HBA or just in general?

1 hour ago, TimV said:

My theory is that the drive will just sit there as part of the array as an idle member until I decide on a course of action.

If another drive fails, that drive will be used with all the others to rebuild the failed drive. Any unreliable disk in the array puts others at increased risk.
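To illustrate why every disk matters during a rebuild: Unraid's single parity is an XOR across the data disks, so reconstructing one missing disk needs a good read from parity and from every other data disk. The sketch below is a simplified byte-level model under that assumption, not Unraid's actual driver code; `compute_parity`, `rebuild`, and the use of `None` to stand in for an unreadable disk are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def compute_parity(data_disks):
    """Single (XOR) parity over all data disks, as in this simplified model."""
    return xor_blocks(data_disks)


def rebuild(failed_index, data_disks, parity):
    """Reconstruct one failed data disk from parity plus EVERY other data disk.

    A value of None simulates a second disk that can't be read; in that case
    the missing disk cannot be reconstructed at all.
    """
    survivors = []
    for i, data in enumerate(data_disks):
        if i == failed_index:
            continue  # the failed disk's contents are what we're rebuilding
        if data is None:
            raise OSError(f"disk{i + 1} unreadable: cannot rebuild disk{failed_index + 1}")
        survivors.append(data)
    return xor_blocks(survivors + [parity])


disks = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = compute_parity(disks)
print(rebuild(0, disks, parity))  # b'\x01\x02'
```

If disk2 in this model is flaky (passed as `None`), losing any other disk makes the rebuild fail, which is the risk of leaving an unreliable disk in the array.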

 

1 hour ago, TimV said:

Any recommendations on a new HBA or just in general?

If possible before rebooting, and preferably with the array started, go to Tools - Diagnostics and attach the complete Diagnostics ZIP file to your NEXT post in this thread.

2 hours ago, trurl said:

 

Do you mean you rebuilt parity without the problem disk assigned?

No, I rebuilt parity with all the original drives included. Disk2 (WKD32TB8) is the black sheep drive.


Just had some read errors not long after I started a move. 896 read errors just popped up and it's still in a valid disk whereas other times I get 1 read error and it's kicked out. Attached are the diagnostics. I don't have any user data on that disk; it shows 55G of space used, but I'm not sure there are any valid files. When I go in there as root from the command line and do an ls -lia, nothing shows up. My 30+ years of Unix experience tells me there isn't really anything there, but you never know.

cygnus-diagnostics-20210212-1105.zip

1 hour ago, TimV said:

896 read errors just popped up and it's still in a valid disk whereas other times I get 1 read error and it's kicked out.

Unraid disables a disk when a write to it fails. If the drive can't recover from a failed read, Unraid gets the data from the parity calculation and tries to write it back to the disk, and if that write fails, the disk is disabled.
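A toy model of the behaviour described above may help explain why 896 read errors can leave a disk enabled while a single error can disable it: what matters is whether the write-back succeeds. This is a sketch for illustration only, not Unraid's actual code; the `Disk` class, `read_with_recovery`, and the `reconstruct` callback (standing in for the parity calculation) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Disk:
    """Hypothetical in-memory disk used only for this illustration."""
    name: str
    sectors: dict
    unreadable: set = field(default_factory=set)   # sectors that fail to read
    unwritable: set = field(default_factory=set)   # sectors that also fail to write
    enabled: bool = True
    read_errors: int = 0

    def read(self, lba):
        if lba in self.unreadable:
            self.read_errors += 1
            raise OSError(f"{self.name}: read error at sector {lba}")
        return self.sectors[lba]

    def write(self, lba, value):
        if lba in self.unwritable:
            raise OSError(f"{self.name}: write error at sector {lba}")
        self.sectors[lba] = value
        self.unreadable.discard(lba)  # in this model, a successful rewrite makes the sector readable again


def read_with_recovery(disk, lba, reconstruct):
    """Serve a failed read from parity, write it back, and disable the disk only if that write fails."""
    try:
        return disk.read(lba)
    except OSError:
        value = reconstruct(lba)  # stand-in for parity + all other data disks
        try:
            disk.write(lba, value)
        except OSError:
            disk.enabled = False  # only a failed write disables the disk
        return value


ok = Disk("disk2", {0: 5}, unreadable={0})
read_with_recovery(ok, 0, lambda lba: 5)
print(ok.read_errors, ok.enabled)  # 1 True
```

A read error whose write-back succeeds only bumps the error count; a single error whose write-back also fails disables the disk.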

 

Run an extended SMART test on that disk.


I'm trying to run the extended test. So far I've started it twice, and both times it just sits there with a spinning circle reporting 10% done for over an hour. The disk is spun up and there is activity. My guess is that either something isn't right, or it's reading every sector and won't update the percentage until it's done.

On 2/10/2021 at 2:51 PM, TimV said:

Any recommendations on a new HBA or just in general?

The one you're using is a good option, but it might be a fake, or there may be some other problem or compatibility issue. For a list of recommended controllers see here:

 

