jkBuckethead Posted January 14, 2020

A couple of weeks ago, completely out of the blue, I saw I had errors on two storage drives, plus one of my two parity drives was offline. The first sign something was weird was that both storage drives had the exact same number of errors. This would be a huge coincidence if it was physical drive failures. It turned out that all three drives were connected to the same breakout cable (the 4th connector was unused) on my LSI 9207-8i HBA. Thinking it might be a bad cable, I swapped out the cable and rebooted. I rebuilt the 2nd parity drive and everything has been fine for the past two weeks.

Tonight, I updated to version 6.8.1. Right after rebooting I saw a strange warning message that one of my cache drives was unavailable. Oddly, when I checked the drive on the MAIN page it said the drive was operating normally. A few minutes later, the same parity and two storage drives started having similar problems as before. While on a different breakout cable, the cache drive is connected to the same HBA as the other malfunctioning drives.

I shut down and swapped the HBA for a spare I just bought for another machine. It seems like the HBA may be sketchy. I prefer not to put the HBA back into service without confirming it is healthy. I also don't want to buy another if not necessary. Does anyone know of any software tools or other methods for testing an HBA?

Attachment: unbucket-diagnostics-20200113-2308.zip
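The reasoning in the post above (several drives failing together, so look for the hardware they have in common) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the drive names (`parity2`, `disk3`, `disk4`, `cache1`), the cable labels `A` and `B`, and the `shared_components` helper are all hypothetical and not taken from the diagnostics. It does not test an HBA; it only narrows down which component the failing drives share.

```python
def shared_components(drives, failing):
    """Return the components (e.g. HBA, cable) common to every failing drive."""
    common = None
    for name in failing:
        parts = set(drives[name].items())
        # Intersect with what earlier failing drives had in common.
        common = parts if common is None else common & parts
    return dict(common or ())


# Hypothetical layout: three drives on one breakout cable, the cache drive
# on a second cable, all attached to the same HBA.
drives = {
    "parity2": {"hba": "LSI 9207-8i", "cable": "A"},
    "disk3": {"hba": "LSI 9207-8i", "cable": "A"},
    "disk4": {"hba": "LSI 9207-8i", "cable": "A"},
    "cache1": {"hba": "LSI 9207-8i", "cable": "B"},
}

# First incident: parity and two storage drives.
first = shared_components(drives, ["parity2", "disk3", "disk4"])
# Second incident: the same three drives plus the cache drive.
second = shared_components(drives, ["parity2", "disk3", "disk4", "cache1"])

print(first)
print(second)
```

With this made-up layout, the first incident leaves both the cable and the HBA as suspects, while the second leaves only the HBA, which matches the conclusion drawn in the post. Anything else the drives share (the slot the HBA sits in, power, backplane) could be added as extra keys in the same way.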
itimpi Posted January 14, 2020

While not answering your question directly, I thought it might be worth mentioning that I saw similar symptoms on my own server at one point and it turned out to be due to the HBA not being firmly seated in the motherboard slot. I assume this meant that vibration could occasionally cause momentary disconnects. After reseating the HBA the problem disappeared.
JorgeB Posted January 14, 2020

The diags were taken just after rebooting, so there's not much to see. If it happens again, grab and post new ones before rebooting.