steve1977 Posted August 4, 2019 I have some issues with my M1015 controller card. One hard disk was disabled after an error that was probably caused by the cable or controller rather than by the disk itself. After rebooting Unraid, that disk (disk 12) disappeared completely: I can no longer see it at all, and its slot shows as unassigned. Coincidentally, my cache drive also shows "unassigned". I had an NVMe drive assigned to it before the reboot. I can re-add it (the NVMe shows up), but it then shows "blue" (not "green"), which seems strange as it was working before the reboot. Any thoughts? Log attached. tower-diagnostics-20190804-0123.zip
JorgeB Posted August 5, 2019 Check whether the disk is being detected in the controller BIOS. Also, the LSI is using the megaraid driver; it would be best to flash it to IT mode so that it uses the HBA driver.
steve1977 Posted August 19, 2019 It seems like a second disk is disabled now. Shall I just set up a "new config"? I don't think I have written a lot to the disabled drives, so hopefully there is no data loss. Thoughts?
steve1977 Posted August 19, 2019 And here we go with the log. tower-diagnostics-20190819-1712.zip
JorgeB Posted August 19, 2019 The disabled disk isn't showing a full SMART report, so it may or may not be going bad. Try getting a manual SMART report on the console (replace X with the disk's device letter): smartctl -x /dev/sdX
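For anyone scripting the manual check above: smartctl reports problems through its exit status as a bitmask, so a non-zero status does not necessarily mean the disk is failing. The Python sketch below decodes that status. The bit meanings are paraphrased from the smartctl man page; the function names and the `/dev/sdX` placeholder are illustrative assumptions and not part of Unraid.

```python
import subprocess

# Meaning of each bit in smartctl's exit status,
# paraphrased from the EXIT STATUS section of the smartctl man page.
SMARTCTL_EXIT_BITS = {
    0: "command line did not parse",
    1: "device open failed or device did not identify itself",
    2: "a SMART/ATA command failed or a SMART data checksum was wrong",
    3: "SMART status check returned DISK FAILING",
    4: "prefail attributes are at or below threshold",
    5: "attributes were at or below threshold at some time in the past",
    6: "device error log contains errors",
    7: "device self-test log contains errors",
}


def decode_smartctl_status(status):
    """Return the list of conditions encoded in a smartctl exit status."""
    return [msg for bit, msg in SMARTCTL_EXIT_BITS.items() if status & (1 << bit)]


def run_smartctl(device):
    """Run `smartctl -x <device>` and return (exit_status, report_text).

    Hypothetical helper: needs smartmontools installed and root access.
    """
    result = subprocess.run(
        ["smartctl", "-x", device], capture_output=True, text=True
    )
    return result.returncode, result.stdout


# Example use on the server console (not executed here):
#   status, report = run_smartctl("/dev/sdX")
#   for problem in decode_smartctl_status(status):
#       print(problem)
```

A status with only bit 1 set, for example, points at the device not answering at all, which fits a cable or controller problem at least as well as a dying disk.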
steve1977 Posted August 19, 2019 Weird... The disk just reappeared without me doing anything. I started to rebuild the parity, but this is not going to get anywhere given the slow speed... Attached a new log. tower-diagnostics-20190819-1728.zip
steve1977 Posted August 19, 2019 Btw, you may see in the log that I bought a new controller card, just to rule out the card as the issue. This one should already be flashed to IT mode?
JorgeB Posted August 19, 2019 There appears to be a connection problem with disk13; check/replace the cables. There's also a similar problem with disk9, though to a much lesser extent.
steve1977 Posted August 19, 2019 Same old controller issue I am facing 😞 I thought it was solved, but it seems to be back. I wish it was a cabling issue, but that is very unlikely. I just fail to get the M1015 to run stable. I will spend some more time on it over the weekend. Maybe a dedicated fan for the M1015 will help. It gets incredibly hot, which I believe is what causes the unstable connection of the disks attached to the M1015.
JorgeB Posted August 19, 2019 steve1977 said: "I just fail to get the M1015 to run stable." Both disks I mentioned are on the onboard SATA controller, not the HBA.
steve1977 Posted August 19, 2019 Oh... Very interesting... I have never had a single disk fail on the onboard controller. Also, disk 12 is on the M1015. So, this is something different this time. Let me unplug and replug disks 9 and 13. I will get back with a new log shortly. If the error remains on the non-M1015 disks, I'll do the recabling. This is indeed new!
steve1977 Posted August 19, 2019 And here we go. The cables of disks 9 & 13 are indeed quite old. Disk 12 has rather new cables (disk 12 is disabled and connected to the M1015). So if the issue now is with disks 9 & 13, I will look into new cables over the weekend. Updated log attached. Thanks! tower-diagnostics-20190819-1837.zip
steve1977 Posted August 19, 2019 I am starting to think that this may indeed be a different problem. Maybe disk 9 is actually faulty, rather than this being my "common" controller/cable issue? I tried to run a SMART report on disk 12, but it failed. Any insights from the diagnostics on disks 9 & 13 or 12? Thanks! tower-diagnostics-20190819-2305.zip
JorgeB Posted August 20, 2019 SMART looks fine for both disks 9 and 12.
steve1977 Posted August 20, 2019 You mean 9 and 13? The two on the onboard controller. Is 12 also OK? That's the one on the RAID controller, which is the troublemaker (disabled, sometimes not even appearing at all, etc.).
JorgeB Posted August 20, 2019 I meant 9 and 12, but 13 also looks fine.
steve1977 Posted August 20, 2019 Got it. Do disks 9 & 13 still show with cable failures? If so, it is most likely a cabling or controller issue. So, let me look for new cables and add a fan for the controller.
JorgeB Posted August 20, 2019 steve1977 said: "Do disks 9 & 13 still show with cable failures?" On the latest diags, disk 13 is still showing issues.
steve1977 Posted August 20, 2019 Got it, then let me try all of the above.
itimpi Posted August 20, 2019 Just thought it might be worth mentioning that I was getting CRC errors at regular intervals on some drives, and they stopped happening once I used DeoxIT on the SATA connectors of those drives. It is something that is easy to try if swapping cables does not seem to be helping.
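CRC errors of the kind mentioned above usually show up on SATA drives as SMART attribute 199 (commonly named UDMA_CRC_Error_Count) in the attribute table printed by `smartctl -A`. Below is a minimal Python sketch, assuming the standard ten-column ATA attribute table, that pulls out that raw counter so it can be compared before and after cleaning the contacts or swapping a cable. The function name and the sample text are made up for illustration and do not come from the attached diagnostics.

```python
def udma_crc_error_count(attribute_table):
    """Return the raw value of SMART attribute 199, or None if it is absent.

    `attribute_table` is the text printed by `smartctl -A /dev/sdX`.
    Assumes the usual column layout:
    ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    """
    for line in attribute_table.splitlines():
        fields = line.split()
        # Match on the attribute ID, since the name varies between vendors.
        if len(fields) >= 10 and fields[0] == "199":
            try:
                return int(fields[9])
            except ValueError:
                return None  # raw value is not a plain integer
    return None


# Illustrative sample only (hypothetical values, not real drive data).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       12
"""

crc_before = udma_crc_error_count(SAMPLE)
```

The counter never resets, so what matters is whether it keeps rising after the fix. A value that stays flat suggests the connection problem is gone.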
steve1977 Posted August 20, 2019 Interesting, let me try this as well. You are referring to something like: https://www.analogueseduction.net/contact-treatment/cdeo[1].html Where do you apply this? To the mobo, the cable (both ends?), or the HD?
itimpi Posted August 20, 2019 steve1977 said: "Where do you apply this? To the mobo, the cable (both ends?), or the HD?" I was using the Red version to 'clean' the contacts and then the Gold version to re-plate them. I only did it to the drives (I have hot-swap cages), but I cannot see how using it on the other parts of the connection path could cause an issue, as long as you are careful about it.
steve1977 Posted August 20, 2019 Got it. So, you applied it directly to the drives themselves (where the SATA cable connects). First "red", then "gold" right thereafter. No drying or anything, just apply one after the other? It is a new thing that I haven't tried out yet. While I still suspect overheating of the controller card to be the primary issue, it's worth giving it a shot to rule out all options. Also, the overheating may be even more severe since I added a meaningful number of 10TB drives. I remember that a few years ago I could solve the issue by connecting only 7 instead of 8 disks (or by running two controller cards with 4 disks each). I believe this may have helped to reduce heat?
itimpi Posted August 20, 2019 steve1977 said: "First "red", then "gold" right thereafter. No drying or anything, just apply one after the other?" I applied the Red version; left it for about 10 minutes; used a cotton bud to clean off any excess; applied the Gold version; left that for about 5 minutes; then plugged the drive back in. If I have any reason to take the system apart, I might do the SATA cables as well, and possibly even those inside the hot-swap cages.
steve1977 Posted August 20, 2019 Thanks, very helpful. Have you ever experienced any heat issues with your controller card? Even when plugged in for just a few minutes, mine gets incredibly hot (when touching the heat sink).