wheel Posted April 19, 2019

Just had a disk go disabled while I was writing to an unrelated disk (same share folder on both disks, though; hardly any reads/writes on the disabled disk since the system was last booted a couple of days ago, and I only noticed it was disabled when it threw up two error messages a little while ago).

My instinct is to replace it, but the lack of writes makes me think something else may be up. Since it's been close to 2 months since my last parity check (I was planning one this weekend), I'd like to put the disk back in for at least a parity check before I upgrade it to a 6TB as planned anyway.

Before pulling diagnostics and shutting the system down, I checked the dashboard and Unraid wasn't able to pull any SMART data at all. Is this a giant red flag that the disk is totally dead, and that I shouldn't even bother with the FAQ process of clearing the disabled slot's ID data with an array start missing the disk, followed by a second start with the disabled disk reinstalled?

Somewhat related but not directly: is there a "best practices" guide anywhere on how to keep an eye on potentially failing disks? I'm planning to upgrade quite a few in the next month or so, and I'd rather upgrade the ones I know are leaning towards death, if there's a solid way to predict which are heading that way beyond the obvious SMART reallocation or error numbers.

tower-diagnostics-20190418-2316.zip
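On the monitoring question: the SMART attributes most commonly watched as failure predictors are reallocated sectors (ID 5), current pending sectors (197), and offline uncorrectable sectors (198), all readable with smartmontools' `smartctl -A`. The helper below is a sketch only: the function name is made up, and it just parses `smartctl -A` text fed to it, so device names and the exact wiring into a cron job are up to you.

```shell
#!/bin/sh
# Sketch: flag the SMART attributes most associated with impending failure
# (IDs 5, 197, 198) in `smartctl -A` output. On a live box you would run
# something like:  smartctl -A /dev/sdb | flag_smart_attrs sdb
flag_smart_attrs() {
    disk="$1"
    awk -v disk="$disk" '
        # smartctl -A rows look like: ID# ATTRIBUTE_NAME FLAG VALUE ... RAW_VALUE
        # The raw value is the last field; anything above zero is worth a look.
        $1 == 5 || $1 == 197 || $1 == 198 {
            if ($NF + 0 > 0)
                printf "%s: %s raw=%s\n", disk, $2, $NF
        }'
}
```

Run periodically across all array disks, a nonzero and *growing* raw value on any of these three is the usual "replace this one first" signal.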
wheel Posted April 19, 2019

This just got weird. I turned the tower back on to get a temperature read on all the drives, and now Disk 13 is disabled but Disk 12 looks fine?

Edit: yeah, this is weird. Now both 12 and 13 are showing a pass on the SMART overall health test in the Dashboard menu, despite 13 being disabled. I'm just going to power down for safety's sake until I hear some concrete next steps from here. I don't have a replacement drive ready to go, but I should have one soon. Attaching new diagnostics.

tower-diagnostics-20190418-2344.zip
JorgeB Posted April 19, 2019

3 minutes ago, wheel said: "and now Disk 13 is disabled but Disk 12 looks fine?"

It was always disk13, and the disk itself looks fine. The problem appears to have been caused by the SASLP: these controllers have known issues with Unraid and should be replaced with LSI controllers.
wheel Posted April 19, 2019

Good to know, thanks! I'll dig into threads on that now. Timing-wise, are the SASLP cards such an issue that I need to fix them before I run my next parity check / disk upgrade? Regardless of the disk failure, I was planning on upgrading a random disk in this tower next week anyway.

Also, I'm an idiot on 12/13: I just saw that in my screenshot of the webgui.
JorgeB Posted April 19, 2019

3 minutes ago, wheel said: "Timing-wise, are the SASLP cards such an issue that I need to fix them before I run my next parity check / disk upgrade?"

Difficult to say. They can work well most of the time, but you could also get more errors and not even be able to rebuild that disk.
wheel Posted April 19, 2019

Well, damn. It looks like I may need an entire new motherboard to go LSI, so this just got interesting. I definitely don't have the funds or time for a major rebuild anytime soon, and mounting/unplugging/replugging feels crazy risky with one disk potentially on the fritz. Knowing myself, if I put off resolving this disabled-disk situation until I can replace my motherboard and controller cards, I may as well write this tower off until the end of summer.

So, since I'm stuck with the SASLP for now, and I have to run the risk of losing the disk (I hate to do it, but versus almost any other disk it'd be preferable), is there a good way to "grade" my other 18 disks and see if any are close enough to failure that I'm better off using what disk lifespan I have left for a straight rebuild instead of a safety parity check first?

Also (and this may be crazy), since it sounds like most of the SASLP issues started with v6 (I know I'd never had any for years), could I conceivably roll back to a pre-6 Unraid that still supported >4TB drives (of which I think there were a few stable releases?) and just keep using that with this box? I can live without plugins or any real bells and whistles as long as I have my parity and can keep the box running along without upgrades.
itimpi Posted April 19, 2019

I am surprised you think you need a new motherboard to go LSI. This is the first time I can recollect that being mentioned as an issue. What makes you think this? What motherboard do you currently have?

You are correct that the SASLP controllers seemed fine on v5. The suspicion is that the 64-bit Linux drivers that v6 uses are not as reliable as the 32-bit ones that v5 used. They seem to work fine most of the time with v6, but some users find they are prone to 'dropping' disks without any obvious reason.

You are also on quite an old Unraid release (the current stable being 6.6.7, with 6.7.0 getting close to going stable). It might be possible to go back to v5 (although that means hardly anyone here could then help with issues), but if you do not have a copy in your archives you may need to contact Limetech to see if they can provide a link to the last v5 release (5.0.6, I think).

Also remember that v5 only supported disks in ReiserFS format, so if you have any drives in XFS or BTRFS format, going back to v5 is unlikely to be practical.
JorgeB Posted April 19, 2019

wheel said: "Well, damn. Looks like I may need an entire new motherboard to go LSI"

You don't; they work perfectly fine on the board you have.
remotevisitor Posted April 19, 2019

I have 2 SASLP cards in my system and I suffered disk dropouts, but only with my WD 6TB disks (I have a variety of Seagate and WD disks from 4TB to 10TB). I seem to remember some posts, back when the Marvell controller problem was identified as the cause, suggesting there was also some correlation with specific disk manufacturer/size/firmware (which possibly explains why some people have problems and others do not).

On my system I tried setting all my WD 6TB disks to never spin down, and this appears to stop them dropping offline. This is only a workaround, and it might be specific to my system, but it may be worth trying until you are able to replace the SASLP cards.
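This workaround can also be scripted. The sketch below is a dry run under stated assumptions: the function name is made up, the device names are placeholders for whatever your problem drives enumerate as, and it assumes `hdparm` is available (Unraid also exposes a per-disk spin-down setting in the GUI, which is the safer route). It only prints the commands so nothing touches a drive until you pipe the output to `sh`.

```shell
#!/bin/sh
# Dry-run sketch of the "never spin down" workaround. Device names below are
# hypothetical; review the echoed commands, then pipe the output to `sh`.
spin_down_never() {
    for dev in "$@"; do
        # hdparm -S 0 disables the drive's standby (spin-down) timer
        echo "hdparm -S 0 $dev"
    done
}
spin_down_never /dev/sdb /dev/sdc
```

Note this is per-boot: the standby timer resets when the drive power-cycles, so the commands would need to run from the `go` file or an array-start event to persist.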
wheel Posted July 29, 2019

Bringing this thread back to life because I think I can finally afford to upgrade the SASLP controllers to LSI (hoping this is the only box I need to do it on, but guessing my new issues on a 5.0.6 box are going to be chalked up to something similar, so...).

I've been searching through threads that discuss the SASLP problems and the LSI solutions, but I can't seem to find a comprehensive guide on the specifics I should be looking for when shopping. All I can gather is that I might need to "IT flash" some of the cards I purchase, and I can't tell whether I'll need a separate device or cable on one of my non-Unraid machines to make that flash happen. Am I completely missing some obvious FAQ on upgrading an old box to LSI controllers, given that old SASLP controllers seem to be such a widespread problem for 6.0+ Unraid boxes?
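For what it's worth on the IT-flash question: no separate device or second machine is usually needed — the card is flashed in place from a bootable USB stick (DOS or EFI shell) using LSI's `sas2flash` tool. The sequence below is the commonly cited outline for crossflashing a 9211-8i-class (SAS2008) card from IR to IT firmware; the firmware and BIOS image filenames are examples only and depend on the exact card, so treat this as a sketch rather than a recipe, and record the card's SAS address before erasing in case it needs to be re-entered afterwards.

```shell
# Typical sas2flash sequence, run from a bootable DOS/EFI USB stick with the
# IT firmware and boot ROM images for your exact card copied onto it.
# Filenames below are examples only.
sas2flash -listall                           # confirm the controller is detected; note its SAS address
sas2flash -o -e 6                            # erase the existing (IR) firmware
sas2flash -o -f 2118it.bin -b mptsas2.rom    # write IT firmware plus boot ROM
sas2flash -listall                           # verify the card now reports IT firmware
```

Skipping the `-b mptsas2.rom` boot ROM is also common on Unraid boxes, since nothing boots from the array and it shortens POST.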