ericswpark Posted June 16, 2023

Hi everyone, after upgrading from 6.11.5 to 6.12.0 I was greeted with the lovely sight of two drives missing from the array. Thank god for dual parity (and a separate backup server). Not having to worry too much about the data really takes the strain off surprises like these.

However, I suspect that the drives aren't actually "dead", since they were working fine right up to the upgrade. It's also rather strange for two drives to "die" at the same time. I suspect the cable that plugs into those two drives may have come loose, either on the drive end or on the LSI card's end. Unfortunately, the server is located remotely, so I cannot go and check it physically.

I was wondering if anybody could find any clues in the logs as to why the drives are not coming up. I did find some messages saying "SATA link down", but they didn't say why, or I might have missed it. If not, I'll have to look it over the next time I get a chance to inspect it in person. Any ideas are appreciated!

Diagnostics attached: dipper-diagnostics-20230616-0845.zip
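For anyone who wants to pull the "SATA link down" messages out of a syslog quickly, here is a minimal Python sketch. It only searches for the phrase quoted in the post above. The helper names (`find_link_down`, `scan_file`) and the sample lines are made up for illustration and are not taken from the attached diagnostics.

```python
import re

# Phrase quoted in the post; matched case-insensitively.
LINK_DOWN = re.compile(r"SATA link down", re.IGNORECASE)


def find_link_down(lines):
    """Return (line_number, text) pairs for lines that mention a SATA link down."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if LINK_DOWN.search(line)
    ]


def scan_file(path):
    """Scan a syslog file on disk; undecodable bytes are replaced, not fatal."""
    with open(path, errors="replace") as handle:
        return find_link_down(handle)


# Illustrative sample lines only (hypothetical, not from the real diagnostics).
SAMPLE = [
    "Jun 16 08:40:01 host kernel: ata5: SATA link up 6.0 Gbps (SStatus 133 SControl 300)",
    "Jun 16 08:41:12 host kernel: ata5: SATA link down (SStatus 0 SControl 300)",
    "Jun 16 08:41:13 host kernel: some unrelated message",
]

if __name__ == "__main__":
    for number, text in find_link_down(SAMPLE):
        print(f"{number}: {text}")
```

The lines around each hit are usually worth reading too, since the match only shows that a link dropped, not why.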
JorgeB Posted June 16, 2023

Unlikely to be upgrade related. Since the server is remote you cannot look at the BIOS to see if the drives are detected, but you can downgrade to confirm.
ericswpark Posted June 16, 2023 (edited)

That's a shame, thanks @JorgeB. I'll post an update in two weeks if I find anything interesting about how it failed. I don't want to risk downgrading and having it drop even more drives when I don't know the root cause. Access to the data isn't that urgent right now.
ericswpark Posted June 17, 2023

I happened to check the webGUI today and noticed that the parity drive... came back on its own? The syslog just says that a device connect or power-on event occurred. The drive somehow woke back up and decided to work normally. The other data disk is still missing, but I'm hoping that with enough power fed to it, it will eventually recover like the parity drive did.

Still very curious as to what's going on. It doesn't seem like a loose connection at all, as the UDMA CRC error count is still at zero for the parity drive that returned. ¯\_(ツ)_/¯
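The UDMA CRC check mentioned above can be sketched in Python by reading the SMART attribute table that `smartctl -A` prints. This is only a sketch: it assumes the usual ten-column attribute table with the raw value in the last column, and the sample text and the function name `crc_error_count` are made up for illustration. Note that in this thread a zero count did not end up ruling out the cable.

```python
def crc_error_count(smart_text):
    """Return the raw UDMA_CRC_Error_Count from `smartctl -A` text, or None.

    Assumes the standard attribute table layout:
    ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    """
    for line in smart_text.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[1] == "UDMA_CRC_Error_Count":
            try:
                return int(fields[9])
            except ValueError:
                return None  # raw value was not a plain integer
    return None  # attribute not reported by this drive


# Illustrative sample only (hypothetical values, not from the real drives).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  9 Power_On_Hours          0x0032   095   095   000    Old_age   Always       -       4321
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
"""

if __name__ == "__main__":
    print(crc_error_count(SAMPLE))
```

To run it against a real drive, the text could come from something like `subprocess.run(["smartctl", "-A", device], capture_output=True, text=True).stdout`, where `device` is whatever device path the drive has on your system.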
JorgeB Posted June 17, 2023

Likely a power/connection problem.
Solution

ericswpark Posted July 16, 2023

After upgrading to 6.12.3 today I found that the same two drives had died again. A physical inspection last time didn't turn up anything, but I decided to check again. I noticed that the drives that had "died" were connected to my HBA with one of those SAS to SATA cables, and the SATA end of the cable had gotten a bit bent because I built the NAS in a mini-ITX case.

I replaced the entire cable and it seems like the problem has been fixed? I'll keep the old cable around, but as long as the two drives don't drop out during future upgrades I think it's safe to call this a cable issue. The missing drives didn't even show up in the SAS configuration utility when the suspected faulty cable was used.

Moral of the story: change cables and don't build your NAS in a mini-ITX case.
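Since the drives dropped out right after upgrades both times, a quick presence check after each upgrade can catch this early. Below is a rough Python sketch that compares a list of expected identifiers (for example drive serial numbers) against the device names under `/dev/disk/by-id` on a Linux system. The helper names and the placeholder identifiers are hypothetical, and the substring match assumes the identifier appears in the by-id name.

```python
import os


def missing_disks(expected_ids, present_names):
    """Return the expected identifiers that appear in none of the present names."""
    return [
        expected
        for expected in expected_ids
        if not any(expected in name for name in present_names)
    ]


def list_by_id(path="/dev/disk/by-id"):
    """List device names under /dev/disk/by-id; empty list if the path is absent."""
    try:
        return sorted(os.listdir(path))
    except FileNotFoundError:
        return []


if __name__ == "__main__":
    # Placeholder identifiers; replace with the serials of your own drives.
    expected = ["SERIAL_A", "SERIAL_B"]
    for serial in missing_disks(expected, list_by_id()):
        print(f"not detected: {serial}")
```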