(6.3.5) Disk Disabled (during read?); Try Clearing Slot ID Data or Just Replace?



Just had a disk go disabled while I was writing to an unrelated disk (both disks hold the same share folder, though). There have been hardly any reads/writes on the disabled disk since the system was last booted a couple of days ago; I only noticed it was disabled when it threw up two error messages a little while ago.

 

My instinct is to replace it, but the lack of writes makes me think something else may be going on. Since it's been close to two months since my last parity check (I was planning one this weekend anyway), I'd like to put the disk back in for at least a parity check before I upgrade it to a 6TB as planned.

 

Before pulling diagnostics and shutting the system down, I checked the dashboard and Unraid wasn't able to pull any SMART data at all.  Is that a giant red flag that the disk is totally dead, and that I shouldn't even bother with the FAQ process of clearing the disabled slot's ID data (one array start with the disk missing, then a second start with the disabled disk reinstalled)?
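For anyone following along, this is the kind of quick check I mean: asking the drive for SMART health directly from the console, the way the dashboard would. A minimal sketch, assuming smartmontools is installed; the device path is just an example:

```python
#!/usr/bin/env python3
# Minimal sketch: test whether a drive answers a SMART health query at all.
# Assumes smartmontools (smartctl) is installed; /dev/sdb is an example path.
# Disks behind some SAS controllers may need an extra "-d sat" argument.
import subprocess
import sys

def smart_responds(dev: str) -> bool:
    """Return True if the drive answers a SMART health query."""
    result = subprocess.run(
        ["smartctl", "-H", dev],
        capture_output=True, text=True,
    )
    # smartctl's exit status is a bitmask; bit 1 (value 2) means the device
    # could not be opened or did not respond to the command at all.
    return not (result.returncode & 2)

if __name__ == "__main__":
    dev = sys.argv[1] if len(sys.argv) > 1 else "/dev/sdb"  # example device
    print(f"{dev}: {'responds to SMART' if smart_responds(dev) else 'no SMART response'}")
```

If the drive won't even answer that from the console, the webgui having no SMART data is at least consistent, though it could still be the controller rather than the disk.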

 

Somewhat related, but not directly: is there a "best practices" guide anywhere on keeping an eye on potentially failing disks?  I'm planning to upgrade quite a few in the next month or so, and I'd rather replace the ones leaning toward death first, if there's a solid way to predict which are heading that way beyond the obvious SMART reallocation and error counts.
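For instance, this is the sort of automated scan I have in mind: pull the SMART attributes that most often precede failure and flag any disk with non-zero raw counts. A rough sketch, assuming smartmontools is installed and ATA-style attribute output; the attribute list and device paths are just examples:

```python
#!/usr/bin/env python3
# Rough sketch of a SMART "watch list" scan. Assumes smartmontools is
# installed; the watched attributes and device list are examples only.
import subprocess

WATCH = {
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
    "Reported_Uncorrect",
    "UDMA_CRC_Error_Count",
}

def warning_attributes(dev):
    """Return {attribute: raw_value} for watched attributes with raw > 0."""
    out = subprocess.run(["smartctl", "-A", dev],
                         capture_output=True, text=True).stdout
    flagged = {}
    for line in out.splitlines():
        cols = line.split()
        # ATA attribute rows start with a numeric ID; raw value is column 10.
        if len(cols) >= 10 and cols[0].isdigit() and cols[1] in WATCH:
            raw = cols[9]
            if raw.isdigit() and int(raw) > 0:
                flagged[cols[1]] = int(raw)
    return flagged

for dev in ["/dev/sdb", "/dev/sdc"]:  # example devices
    flags = warning_attributes(dev)
    print(dev, flags if flags else "no watched attributes raised")
```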

tower-diagnostics-20190418-2316.zip


This just got weird. I turned the tower back on to get a temperature read on all the drives, and now Disk 13 is disabled but Disk 12 looks fine?

 

Edit: yeah, this is weird. Now both 12 and 13 are showing a pass on SMART overall health tests in the Dashboard menu despite 13 being disabled. I’m just going to power down for safety’s sake until I hear some ideas on concrete next steps from here. I don’t have a replacement drive ready to go, but I should have one soon.

 

Attaching new diagnostics.

tower-diagnostics-20190418-2344.zip


Good to know, thanks! I’ll dig into threads on that now. Timing-wise, are the SASLP cards enough of an issue that I need to fix them before I run my next parity check / disk upgrade? Regardless of this disk failure, I was planning on upgrading a random disk in this tower next week anyway.

 

Also, I’m an idiot about 12 vs. 13. I just saw it in my screenshot of the webgui.


Well, damn. Looks like I may need an entire new motherboard to go LSI, so this just got interesting. I definitely don’t have the funds or time for a major rebuild anytime soon, and all that mounting/unplugging/replugging feels crazy risky with one disk potentially on the fritz. Knowing myself, if I put off resolving this disabled-disk situation until I can replace my motherboard and controller cards, I may as well write this tower off until the end of summer.

 

So, since I’m stuck with the SASLP for now: if I have to risk losing the disk (I hate to do it, but I’d rather lose this one than almost any other), is there a good way to “grade” my other 18 disks and see whether any are close enough to failure that I’m better off spending what lifespan they have left on a straight rebuild rather than a safety parity check first?
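To make “grading” concrete, something like this crude scoring sketch is what I’m imagining, building on the attribute scan above: a weighted sum of the worrying SMART counters so the worst candidates surface first. The weights and device list are made-up placeholders, not a validated failure model:

```python
#!/usr/bin/env python3
# Crude disk "grading" sketch: rank disks by a weighted sum of SMART
# counters. Weights are arbitrary placeholders, not a validated model.
# Assumes smartmontools is installed; the device list is an example.
import subprocess

WEIGHTS = {
    "Reallocated_Sector_Ct": 10,
    "Current_Pending_Sector": 10,
    "Offline_Uncorrectable": 10,
    "Reported_Uncorrect": 5,
    "UDMA_CRC_Error_Count": 1,  # often cabling, not the disk itself
}

def score(dev):
    out = subprocess.run(["smartctl", "-A", dev],
                         capture_output=True, text=True).stdout
    total = 0
    for line in out.splitlines():
        cols = line.split()
        if len(cols) >= 10 and cols[0].isdigit() and cols[1] in WEIGHTS:
            if cols[9].isdigit():
                total += WEIGHTS[cols[1]] * int(cols[9])
    return total

devices = [f"/dev/sd{c}" for c in "bcdefg"]  # example device list
for dev, s in sorted(((d, score(d)) for d in devices), key=lambda t: -t[1]):
    print(f"{dev}: score {s}")
```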

 

Also (and this may be crazy): since it sounds like most of the SASLP issues started with v6 (I never had any for years), could I conceivably roll back to a pre-6 Unraid release that still supported >4TB drives (I think there were a few stable ones?) and just keep using that on this box? I can live without plugins or any real bells and whistles as long as I keep my parity and the box keeps running without upgrades.


I am surprised you think you need a new motherboard to go LSI.  This is the first time I can recall that being mentioned as an issue.  What makes you think so, and what motherboard do you currently have?

 

You are correct that the SASLP controllers seemed fine on v5.    The suspicion is that the 64-bit Linux drivers that v6 uses are not as reliable as the 32-bit ones that v5 used.   They seem to work fine most of the time on v6, but some users find they are prone to 'dropping' disks for no obvious reason.   You are also on quite an old Unraid release (the current stable being 6.6.7, with 6.7.0 getting close to going stable).

 

It might be possible to go back to v5 (although that means hardly anyone here could then help with issues), but if you do not have a copy in your archives then you may need to contact Limetech to see if they can provide a link to the last v5 release (5.0.6, I think).  Also remember that v5 only supported disks in ReiserFS format, so if any of your drives are formatted XFS or BTRFS, going back to v5 is unlikely to be practical.
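If you want to check quickly whether a rollback is even practical, something like this will list the filesystem type on each array partition. A rough sketch, assuming util-linux's blkid is available; the partition names are examples:

```python
#!/usr/bin/env python3
# Sketch: list filesystem types on array partitions, since v5 only
# handles ReiserFS. Assumes blkid is available; partitions are examples.
import subprocess

def fs_type(part):
    result = subprocess.run(["blkid", "-s", "TYPE", "-o", "value", part],
                            capture_output=True, text=True)
    return result.stdout.strip() or "unknown"

for part in ["/dev/sdb1", "/dev/sdc1"]:  # example partitions
    t = fs_type(part)
    marker = "" if t == "reiserfs" else "  <-- not v5-compatible"
    print(f"{part}: {t}{marker}")
```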


I have 2 SASLP cards in my system and I suffered disk dropouts, but only with my WD 6TB disks (I have a variety of disks from Seagate and WD from 4TB to 10TB).

 

I also seem to remember, from the posts where the Marvell controller problem was identified as the cause, that there appeared to be some correlation with specific disk manufacturer/size/firmware (which would explain why some people have problems and others do not).

 

On my system I set all my WD 6TB disks to never spin down, and this appears to stop them dropping offline.  It is only a workaround, and it might be specific to my system, but it may be worth trying until you are able to replace the SASLP cards.
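For reference, the same workaround can also be applied from the console with hdparm, outside the webgui. A sketch only: the device paths are examples, it needs root, and the command may not pass through every SAS controller:

```python
#!/usr/bin/env python3
# Sketch: disable the standby (spin-down) timer on selected drives.
# "hdparm -S 0" turns the drive's standby timer off entirely.
# Assumes hdparm is installed and run as root; devices are examples.
import subprocess

for dev in ["/dev/sdd", "/dev/sde"]:  # example: the drives that drop offline
    subprocess.run(["hdparm", "-S", "0", dev], check=True)
    print(f"{dev}: standby timer disabled")
```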

(3 months later...)

Bringing this thread back to life because I think I can finally afford to upgrade the SASLP controllers to LSI (hoping this is the only box I need to do it on, but guessing my new issues on a 5.0.6 box are going to be chalked up to something similar, so...).

 

I've been searching through threads discussing the SASLP problems and the LSI solutions, but I can't find a comprehensive guide on what specifics to look for when shopping. All I can gather is that I might need to flash some cards to "IT mode" after purchase, and I can't tell whether I'll need a separate device or cable on one of my non-Unraid machines to do that flash.

 

Am I completely missing some obvious FAQ on the "upgrading your old box to LSI controllers" issue since old SASLP controllers seem to be such a widespread problem for 6.0+ Unraid boxes?

