Getting First CRC errors on new array


I've been running this array, at least with all the disks in place etc, for I guess a little over a week.

My 1st parity drive triggered emails to me about UDMA CRC errors today when mover ran. It reported 12517 errors. I powered down, reseated some cables, rebooted, and it reported 449 errors. I did another cable reseat/reboot and am now running a SMART test on it. No errors have been reported since the last couple of reboots. This was a fairly small move, as I've moved a lot over the past couple of days, filling the cache drive. I don't know exactly, but I don't think I had more than 20GB or so this time when it happened.

 

I'm looking for thoughts on what I should check to decide between 'probably a cable issue' and 'replace the drive'. This drive is about a year old, not as old as the 10TB drives in my array. Being parity, it was getting worked harder though. This is a Seagate Ironwolf.

 

If a SMART test passes, should I go with it, or look into replacing this drive? I just noticed the spike in prices for 16TB drives since I bought a couple more a little over a month ago. I can't afford to replace it now. In that case, I guess my options are to shut things down and wait for drive prices to be normal again, or go single-parity, which would require being unprotected while it rebuilt. I'd be open to any other suggestions!

Attaching diagnostics. I did disconnect some other drives during a few reboots, as I forgot which physical drive this was. The array doesn't auto-start, so it never started with any disks missing, other than the 1st parity disk that was taken offline due to the errors. This diagnostic was taken right after I started the extended SMART test, which is still running. If we need to wait on that, it's fine. I just have 'first failure' concerns and don't want to spend money I don't need to, or give up a drive if it is OK. This drive, along with some others in the array, was just moved from a Drobo that ran for a couple of years with no problems. A couple of the drives are brand new as well.

 

I appreciate any insight or thoughts! I know this might be a little premature with the SMART test still running, but if anything can be told from the logs, I'd like to know. Of course, with this array being rather new, it has me worried! I guess the drive is still under warranty with Seagate, but I've never used that process and don't know what their criteria are for considering a drive 'not working'.

megachurch-diagnostics-20210522-1219.zip

Edited by Hastor

The disk looks fine. Replace the SATA cable. The SMART extended self-test checks the disk surface and heads. That isn't where UDMA CRC errors occur. They happen in the SATA connection between the controller and the drive electronics and can be caused by a bad controller, bad drive electronics or bad cable/connections. Most of the time the culprit is the cable, which is fine because it's the cheapest and easiest component to replace. While you're there, make sure the power cable is firmly seated too.
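
To check whether a reseat or cable swap helped, one option is to compare the drive's CRC counter before and after a heavy workload such as a parity check. SMART attribute 199 (UDMA_CRC_Error_Count) is a lifetime counter, so the raw value is not expected to drop back to zero; what matters is whether it keeps climbing. Below is a minimal Python sketch, assuming you have saved the text output of `smartctl -A` for the drive twice. The function names and the sample rows are hypothetical and only for illustration.

```python
def crc_error_count(smartctl_output):
    """Return the raw UDMA_CRC_Error_Count from `smartctl -A` text, or None."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows have 10 columns:
        # ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) >= 10 and fields[0] == "199" and fields[1] == "UDMA_CRC_Error_Count":
            return int(fields[9])
    return None


def new_crc_errors(before, after):
    """Return how many CRC errors were logged between two readings."""
    first = crc_error_count(before)
    second = crc_error_count(after)
    if first is None or second is None:
        raise ValueError("UDMA_CRC_Error_Count not found in smartctl output")
    return second - first


# Illustrative rows only, not taken from the attached diagnostics.
BEFORE = "199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       12"
AFTER = "199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       15"

if __name__ == "__main__":
    print("New CRC errors since last reading:", new_crc_errors(BEFORE, AFTER))
```

A difference of zero after a full parity check suggests the link is now clean. A counter that keeps rising points back at the cable, the connections, or the controller port rather than the disk surface.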


It's a breakout cable from a SAS controller to 4 drives. Only this drive has reported issues. It almost sounds like I should do a parity check after the reseat and see how it goes.

This cable had only been in use for a month or so with no issues and had not been moved recently. I should pick up a spare though.


Did you bundle the cables up to make things look neat? This can lead to problems, particularly if you got 1 m cables and only needed about 12" (0.3 m) to make the connection.

 

EDIT: One more thing. Make sure you have slack in the SATA data cables. When pulled tight, they are notorious for working loose from hard drive vibration.

Edited by Frank1940
8 minutes ago, Frank1940 said:

Did you bundle the cables up to make things look neat? This can lead to problems, particularly if you got 1 m cables and only needed about 12" (0.3 m) to make the connection.

I did not bundle them but these connectors come with the cables in a single bigger cable that breaks out for the last few inches. That's how they are always made and used. The shortest I can find anyone making is 1.6 ft, which is what I'm using. This is for an HBA330 (non-RAID) with SFF-8643 connectors.

 

It seems to be industry standard to make them this way. Should I cut off the 'sleeve' holding the cables together? Of course, they'll always be bundled together at the end that plugs into the controller.

Edited by Hastor
1 hour ago, Hastor said:

I did not bundle them but these connectors come with the cables in a single bigger cable that breaks out for the last few inches. That's how they are always made and used.

If it was made that way, they (hopefully) selected shielded cables. If the after-breakout length is short, make sure that you can make the connections without any backward strain on the connectors. (Any back strain will allow the connector to work loose under the normal vibration of a working drive.)

 

You can test the 'connection' by gently pulling on the cable to see if there is some 'resistance' to pulling the data connector out of the socket on the drive. (There are a couple of plastic nubs in the SATA connector that are supposed to provide an interference fit, which supplies the retention force.)

 

 

Edited by Frank1940
Grammar