Gog (Author) Posted January 24, 2020

I just noticed a disabled drive, but I don't know whether I have a cable issue or the drive is dying. Can someone with SMART knowledge read my disk report and advise whether I should change the cable and reseat the drive, or replace the drive? Thanks

WDC_WD40EFRX-68WT0N0_WD-WCC4EF24PX5J-20200124-0804.txt
trurl Posted January 24, 2020

SMART for that disk looks OK. But you should always go to Tools - Diagnostics and attach the complete diagnostics zip file to your NEXT post. Diagnostics include SMART for all disks, syslog that might give a better idea of what happened (if you haven't rebooted), and many other things that give a more complete understanding of your situation.

I will wait on the diagnostics before making any recommendations about how to proceed.
Gog (Author) Posted January 25, 2020

Thanks for the reply, complete diagnostics attached. I've had a number of CRC errors on two disks, but not this one.

tower-diagnostics-20200124-1948.zip
trurl Posted January 25, 2020

Most of your disks are very full, and some are still ReiserFS. Why are you logging Mover?

3 hours ago, Gog said: I just noticed a disabled drive

Jan 15 03:53:42 Tower kernel: md: disk3 write error, sector=1953336760

Looks like disk3 got disabled on Jan 15. Do you not have Notifications set up to alert you immediately by email or other agent as soon as a problem is detected?

You also have problems communicating with cache and disk1. Are these all on the same controller? Disk3 needs to be rebuilt of course, especially since it has been out of sync for more than a week. But I'm not confident about the rebuild with these other issues.

Jan 8 19:31:51 Tower kernel: ata1.00: ATA-10: KINGSTON SA400S37480G, 50026B778227C383, SBFK71B1, max UDMA/133
Jan 8 19:32:09 Tower emhttpd: import 30 cache device: (sdi) KINGSTON_SA400S37480G_50026B778227C383
Jan 23 07:08:31 Tower kernel: ata1.00: exception Emask 0x10 SAct 0x1800000 SErr 0x280100 action 0x6 frozen
Jan 23 07:08:31 Tower kernel: ata1: hard resetting link
...
Jan 8 19:31:51 Tower kernel: ata2.00: ATA-9: HGST HDN726060ALE614, K8H5GNMD, APGNW7JH, max UDMA/133
Jan 8 19:32:08 Tower kernel: md: import disk1: (sdj) HGST_HDN726060ALE614_K8H5GNMD size: 5860522532
Jan 23 03:38:34 Tower kernel: ata2.00: exception Emask 0x10 SAct 0x0 SErr 0x280100 action 0x6 frozen
Jan 23 03:38:34 Tower kernel: ata2: hard resetting link

Let's see if @johnnie.black is still awake and if he has anything to say about your controller or suggestions about how to proceed.
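The syslog excerpts in the post above tie each ATA link (ata1, ata2) to a drive model and show when that link threw an exception or when md reported a write error. As a rough illustration of how those lines can be matched up, here is a minimal Python sketch. The function name `summarize` and the regular expressions are assumptions written against the line shapes quoted in this thread only; other kernel versions may format these messages differently.

```python
import re

# Patterns are assumptions based only on the syslog lines quoted in this thread.
IDENT_RE = re.compile(r"kernel: (ata\d+)\.\d+: ATA-\d+: ([^,]+),")
EXCEPTION_RE = re.compile(
    r"^(\w+\s+\d+ [\d:]+) \S+ kernel: (ata\d+)\.\d+: exception"
)
MD_WRITE_RE = re.compile(
    r"^(\w+\s+\d+ [\d:]+) \S+ kernel: md: (disk\d+) write error, sector=(\d+)"
)


def summarize(lines):
    """Return (models, events) extracted from an iterable of syslog lines.

    models maps an ATA link name such as 'ata1' to the drive model string.
    events is a list of (timestamp, source, description) tuples in input order.
    """
    models = {}
    events = []
    for line in lines:
        if m := IDENT_RE.search(line):
            models[m.group(1)] = m.group(2).strip()
        elif m := EXCEPTION_RE.search(line):
            events.append((m.group(1), m.group(2), "link exception"))
        elif m := MD_WRITE_RE.search(line):
            events.append(
                (m.group(1), m.group(2), f"write error at sector {m.group(3)}")
            )
    return models, events


if __name__ == "__main__":
    sample = [
        "Jan 8 19:31:51 Tower kernel: ata1.00: ATA-10: KINGSTON SA400S37480G, 50026B778227C383, SBFK71B1, max UDMA/133",
        "Jan 8 19:31:51 Tower kernel: ata2.00: ATA-9: HGST HDN726060ALE614, K8H5GNMD, APGNW7JH, max UDMA/133",
        "Jan 15 03:53:42 Tower kernel: md: disk3 write error, sector=1953336760",
        "Jan 23 03:38:34 Tower kernel: ata2.00: exception Emask 0x10 SAct 0x0 SErr 0x280100 action 0x6 frozen",
    ]
    models, events = summarize(sample)
    for when, source, what in events:
        # Fall back to the raw source name when no model is known (e.g. md disks).
        print(when, models.get(source, source), what)
```

To run it against a real log, pass the open syslog file from the diagnostics zip to `summarize` instead of the sample list.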
Gog (Author) Posted January 25, 2020

Quote: Most of your disks are very full, and some are still ReiserFS
Yes, new drives are XFS, but I'm not actively migrating data. I just remove the smallest drive when I add a new one.

Quote: Why are you logging Mover?
I was tracking an odd behavior a while ago and forgot to mute the mover.

Quote: Looks like disk3 got disabled Jan 15. Do you not have Notifications setup to alert you immediately by email or other agent as soon as a problem is detected?
I do, but I missed that email. I did an inbox cleanup and here we are.

Quote: You also have problems communicating with cache and disk1
Yes, these are the CRC errors I mentioned.

Quote: Are these all the same controller?
Not sure. I'll have to power down the server and pull the drawers to verify.
trurl Posted January 25, 2020

12 minutes ago, Gog said: I do but missed that email. I did an inbox cleanup and here we are

Set Array status notification for every day. You would have gotten a new email every day that told you "Array Health FAIL".
JorgeB Posted January 25, 2020

Disk1 and cache need a new SATA cable:

Jan 9 03:23:49 Tower kernel: ata1: SError: { UnrecovData 10B8B BadCRC }
...
Jan 12 11:40:04 Tower kernel: ata2: SError: { UnrecovData 10B8B BadCRC }

It's also highly recommended to update the LSI to the latest firmware, p20.00.07.00. All earlier p20 releases have known issues, and that is possibly what got disk3 disabled.
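For anyone who wants to see how often each link reports CRC problems before and after swapping a cable, the SError lines quoted above can be tallied per ATA port. This is a minimal Python sketch, not an Unraid tool: the function name `count_crc_errors` is hypothetical, and the pattern assumes the exact `ataN: SError: { ... BadCRC }` shape shown in this thread.

```python
import re
from collections import Counter

# Assumed line shape, taken from the lines quoted in this thread:
#   Jan 9 03:23:49 Tower kernel: ata1: SError: { UnrecovData 10B8B BadCRC }
SERROR_RE = re.compile(r"kernel: (ata\d+): SError: \{([^}]*)\}")


def count_crc_errors(lines):
    """Count SError lines that carry the BadCRC flag, keyed by ATA port."""
    counts = Counter()
    for line in lines:
        match = SERROR_RE.search(line)
        # The flags inside the braces are space-separated tokens.
        if match and "BadCRC" in match.group(2).split():
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Jan 9 03:23:49 Tower kernel: ata1: SError: { UnrecovData 10B8B BadCRC }",
        "Jan 12 11:40:04 Tower kernel: ata2: SError: { UnrecovData 10B8B BadCRC }",
        "Jan 23 07:08:31 Tower kernel: ata1: hard resetting link",
    ]
    for port, count in sorted(count_crc_errors(sample).items()):
        print(port, count)
```

A count that keeps climbing on the same port after the cable has been replaced would suggest looking at the port, backplane, or power rather than the cable.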
Gog (Author) Posted January 26, 2020

On 1/25/2020 at 2:54 AM, johnnie.black said: Disk1 and cache need a new SATA cable
Jan 9 03:23:49 Tower kernel: ata1: SError: { UnrecovData 10B8B BadCRC }
...
Jan 12 11:40:04 Tower kernel: ata2: SError: { UnrecovData 10B8B BadCRC }

I replaced those cables. Disk 3 is on the LSI, but disk 1 and cache were not.

Quote: It's also highly recommended to update the LSI to latest firmware p20.00.07.00, all earlier p20 releases have known issues and possibly what got disk3 disabled.

I'm on p20.00.02.00 and trying to get p20.00.07.00 from a reliable source, but Supermicro's FTP is refusing connections from my IP. These instructions are bang on, except I can't get the binaries: https://www.ixsystems.com/community/threads/flashing-the-lsi2308-firmware-on-a-supermicro-x10sl7-f-motherboard.38884/

I found https://www.mediafire.com/?py9c1w5u56xytw2, which gives a procedure to upgrade an LSI SAS 9211-8i to p20.00.07.00. Do you know if the same firmware works on my controller (LSI 2308)? The Broadcom website is not really helpful.
JorgeB Posted January 27, 2020

You can get the package from Broadcom's support site, under legacy controllers.