Hastor Posted May 22, 2021 (edited)

I've been running this array, at least with all the disks in place, for a little over a week, I guess. My first parity drive triggered emails to me about UDMA CRC errors today when mover ran. It reported 12517 errors. I shut down, reseated some cables, rebooted, and it reported 449 errors. I did another cable reseat and reboot, and I'm now running a SMART test on it. No errors have been reported since the last couple of reboots.

This was a fairly small move, as I've moved a lot over the past couple of days, filling the cache drive. I don't know exactly, but I don't think I had more than 20GB or so this time when it happened.

I'm looking for thoughts on what I should check to call this 'probably a cable issue' or 'replace the drive'. This drive is about a year old, not as old as the 10TB drives in my array. Being parity, it was getting worked harder though. It's a Seagate IronWolf. If a SMART test passes, should I go with it, or look into replacing this drive?

I just noticed the spike in prices for 16TB drives since I bought a couple more a little over a month ago. I can't afford to replace it now. In that case, I guess my options are to shut things down and wait for drive prices to return to normal, or go single-parity, which would require being unprotected while it rebuilt. I'd be open to any other suggestions!

Attaching diagnostics. I did disconnect some other drives during a few reboots, as I forgot which physical drive this was. The array doesn't auto-start, so it never started with any disks missing, other than the first parity disk that was taken off due to the errors. This diagnostic was taken right after I started the extended SMART test, which is still running. If we need to wait on that, it's fine. These are just 'first failure' concerns; I don't want to spend money I don't need to, or give up on a drive if it is OK.

This drive, along with some others in the array, was just moved from a Drobo that ran for a couple of years with no problems. A couple of the drives are brand new as well. I appreciate any insight or thoughts! I know this might be a little premature with the SMART test still running, but if anything can be told from the logs, I'd like to know. Of course, with this being rather new, it's worried me! I guess it is still under warranty with Seagate, but I've never used that process and don't know what their criteria are to consider a drive 'not working'.

megachurch-diagnostics-20210522-1219.zip

Edited May 22, 2021 by Hastor
John_M Posted May 22, 2021

The disk looks fine. Replace the SATA cable. The SMART extended self-test checks the disk surface and heads. That isn't where UDMA CRC errors occur. They happen in the SATA connection between the controller and the drive electronics and can be caused by a bad controller, bad drive electronics, or a bad cable or connection. Most of the time the culprit is the cable, which is fine because it's the cheapest and easiest component to replace. While you're there, make sure the power cable is firmly seated too.
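
To check whether a reseat or cable swap actually stopped the errors, one option is to compare the raw value of SMART attribute 199 (UDMA_CRC_Error_Count) before and after a period of heavy I/O. The following is only a minimal sketch: it assumes the usual column layout printed by `smartctl -A`, and it assumes the raw value behaves as a lifetime counter that does not reset, which is typical but not guaranteed for every drive. The sample text and both function names are made up for illustration and are not taken from the attached diagnostics.

```python
import re
from typing import Optional

# Illustrative excerpt in the column layout printed by `smartctl -A`.
# The numbers are invented, not taken from the poster's diagnostics.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       449
"""


def crc_error_count(smart_output: str) -> Optional[int]:
    """Return the raw value of attribute 199, or None if it is not listed."""
    for line in smart_output.splitlines():
        fields = line.split()
        # An attribute row has at least 10 whitespace-separated columns;
        # the raw value is the 10th.
        if len(fields) >= 10 and fields[0] == "199":
            match = re.match(r"\d+", fields[9])
            return int(match.group()) if match else None
    return None


def crc_errors_since(before: int, after: int) -> int:
    """New CRC errors between two readings (assumes a cumulative counter)."""
    return max(after - before, 0)


if __name__ == "__main__":
    baseline = crc_error_count(SAMPLE)
    print("CRC error count:", baseline)
```

If the count stays flat across a parity check or a large mover run, the connection is probably fine now; if it keeps climbing, the cable, port, or drive electronics still need attention.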
Hastor Posted May 22, 2021

It's a breakout cable from a SAS controller to 4 drives. Only this drive has reported issues. It almost sounds like I should do a parity check after the reseat and see how it goes. This cable had only been in use a month or so with no issues and has not been moved recently. I should pick up a spare though.
Frank1940 Posted May 22, 2021 (edited)

Did you bundle the cables up to make things look neat? This can lead to problems, particularly if you got 1 m cables and only needed about 12" (0.3 m) to make the connection.

EDIT: One more thing. Make sure you have slack in the SATA data cables. When they are pulled tight, they are notorious for working loose from hard drive vibration.

Edited May 22, 2021 by Frank1940
Hastor Posted May 22, 2021 (edited)

8 minutes ago, Frank1940 said: Did you bundle the cables up to make things look neat? This can lead to problems, particularly if you got 1 m cables and only needed about 12" (0.3 m) to make the connection.

I did not bundle them, but these connectors come with the cables in a single bigger cable that breaks out for the last few inches. That's how they are always made and used. The shortest I can find anyone even making is 1.6 ft, which is what I'm using. This is for an HBA330 NON-RAID with the SFF-8643 connectors. It seems to be industry standard to make them this way. Should I cut off the 'sleeve' holding the cables together? Of course, they'll always be bundled together on the end that plugs into the controller.

Edited May 22, 2021 by Hastor
Frank1940 Posted May 22, 2021 (edited)

1 hour ago, Hastor said: I did not bundle them but these connectors come with the cables in a single bigger cable that breaks out for the last few inches. That's how they are always made and used.

If it was made that way, hopefully they selected shielded cables. If the after-breakout length is short, make sure that you can make the connections without any backward strain on the connectors. (Any back strain will allow the connector to work loose under the normal vibration of a working drive.) You can test the connection by gently pulling on the cable to see if there is some resistance to removing the data connector from the socket on the drive. (There are a couple of plastic nubs in the SATA connector that are supposed to provide an interference fit for retention.)

Edited May 22, 2021 by Frank1940 Grammar
Frank1940 Posted May 22, 2021

Cable swaps can be used to see if the issue follows a given connector...
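
One way to read the result of such a swap, sketched here as a small lookup: move the suspect breakout lead to a different drive, run some heavy I/O, and see where new CRC errors show up. The names `Follows` and `suspect_after_swap` are hypothetical, and the mapping is a rough rule of thumb based on the causes listed earlier in the thread (controller, drive electronics, cable or connection), not a definitive diagnosis.

```python
from enum import Enum


class Follows(Enum):
    LEAD = "lead"    # new errors appear on the drive now attached to the suspect lead
    DRIVE = "drive"  # new errors stay with the original drive on its new lead
    NONE = "none"    # no new errors on either drive


def suspect_after_swap(follows: Follows) -> str:
    """Map the outcome of a cable swap to the most likely culprit."""
    if follows is Follows.LEAD:
        return "cable lead or the controller port behind it"
    if follows is Follows.DRIVE:
        return "drive electronics or the drive's power connection"
    return "probably a loose connection that the reseat fixed"


if __name__ == "__main__":
    print(suspect_after_swap(Follows.LEAD))
```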