
Failing hard drive?



I have an array with two parity drives (2 x WD Red Plus 6 TB) and five data drives (2 x WD Red Plus 4 TB, 1 x WD Green 4 TB, 1 x Toshiba NAS 4 TB, 1 x WD Blue 5 TB). For the past 2 months the WD Blue 5 TB has been generating UDMA CRC errors, and the count is now up to 40. When the thumb turns orange I acknowledge it and the thumb turns green again.

 

My scheduled monthly parity checks report everything is fine.

 

I did do as suggested on this forum in other threads.

 

I switched the sata cable with a brand new one.

Errors returned.

 

I then moved the sata cable from the motherboard's sata port to a port on my PCIE sata controller.

Errors returned.

 

I then unplugged the drive from its connector on one of my SATA power cables and plugged it into an empty connector on a different SATA power cable.

Errors returned.

 

I do have a brand new WD Red Plus 6 TB drive which has not been used yet.

 

Before I swap the failing WD Blue 5 TB with the new WD Red Plus 6 TB is there anything else I could try?

Since the WD Diag Tools are for Windows I can not run these from the Slackware OS. I did install the DiskSpeed docker and the failing drive's performance appears to be fine via DiskSpeed.

 

Sorry to bring up a topic that appears to have been thrashed to death, but I'd rather ask here than blindly proceed.

 

I recently upgraded to Unraid 6.10.3 but these errors were being reported with 6.10.2 as well.

 

Thank you for your time.

 

Link to comment
1 hour ago, Vetteman said:

Errors returned.

 

First thing, CRC errors are not fatal errors. They are always corrected or the system comes to a halt with a read error. If there are a lot of CRC errors, the time to correct them can become an issue. (The data is reread from the disk and resent until it gets there correctly.) They are almost never caused by a problem with the hard disk! One of the problems is that the number of detected CRC errors is stored on the hard disk and that counter cannot be reset. (In Unraid, we can acknowledge the current count and Unraid will not notify us again until the count increases.)
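
To make the "acknowledge, then only warn on an increase" idea concrete, here is a minimal Python sketch. It assumes the attribute table printed by `smartctl -A` with the CRC counter on the attribute 199 line; the sample text is illustrative only (not taken from the drive in this thread), and the function names are made up for the example. It is not how Unraid implements its notifications.

```python
# Illustrative excerpt in the layout printed by `smartctl -A /dev/sdX`.
# The values are an assumption for this example, not real output.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       40
"""


def crc_error_count(smart_text):
    """Return the raw value of attribute 199, or None if it is not listed."""
    for line in smart_text.splitlines():
        fields = line.split()
        # A data row has at least 10 columns; the raw value is the 10th.
        if len(fields) >= 10 and fields[0] == "199":
            return int(fields[9])
    return None


def needs_attention(current, acknowledged):
    """The counter never resets, so only an increase is news."""
    return current > acknowledged


count = crc_error_count(SAMPLE)
print(count, needs_attention(count, acknowledged=40))
```

Because the counter lives on the drive and cannot be cleared, comparing against the last acknowledged value is the only meaningful check.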

 

Second thing. You say the errors returned. Does this mean that the count is actually increasing, or that you just 'noticed' that the disk was still showing the errors? BTW, forty errors is not an excessive number. What period of time are we talking about here: hours, days, weeks, months?
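
One way to answer the "over what period" question is to jot down the count with a date each time it changes and work out the growth rate. A small Python sketch follows; the dates and counts are hypothetical placeholders, and `crc_growth_per_day` is just a name chosen for this example.

```python
from datetime import date


def crc_growth_per_day(readings):
    """Average increase in the CRC count per day between the first and last reading.

    `readings` is a list of (date, count) tuples in any order.
    """
    ordered = sorted(readings)
    (first_day, first_count), (last_day, last_count) = ordered[0], ordered[-1]
    days = (last_day - first_day).days
    if days <= 0:
        raise ValueError("need readings taken on at least two different days")
    return (last_count - first_count) / days


# Hypothetical log: the dates are made up for illustration.
readings = [(date(2022, 4, 15), 1), (date(2022, 6, 14), 40)]
print(f"{crc_growth_per_day(readings):.2f} new CRC errors per day")
```

A slow, steady trickle and a sudden burst after touching the cables point in different directions, so the rate is more useful than the total.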

 

One thing to make sure of is that you have not bundled your SATA data cables together in an effort to make things look 'neat'.  Crosstalk between cables can cause CRC errors.  You also mentioned that it was a WD drive.  You might want to read this Support Article from Western Digital:

 

    https://support-en.wd.com/app/answers/detailweb/a_id/15954

 

You can also run the long SMART test from the Self-Test tab for that disk, which you reach by clicking on the disk in the Main tab of the Unraid GUI. That performs basically the same test as the WD tools.
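
If you prefer the command line, smartmontools can start the same test with `smartctl -t long /dev/sdX` and show the result afterwards with `smartctl -l selftest /dev/sdX`. Below is a rough Python sketch that reads the status column of that log. The sample log is an assumption written in the usual ATA self-test log layout, not output from any drive in this thread, and the parsing is deliberately simple.

```python
import re

# Illustrative log in the layout printed by `smartctl -l selftest /dev/sdX`.
SAMPLE_LOG = """\
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     20469         -
# 2  Short offline       Completed without error       00%     20443         -
"""

# Columns are separated by runs of two or more spaces; the status text
# itself contains single spaces, so it is captured lazily up to the next gap.
ROW = re.compile(r"^#\s*(\d+)\s+(.+?)\s{2,}(.+?)\s{2,}(\d+)%\s+(\d+)\s+(\S+)")


def selftest_results(log_text):
    """Return a list of (test description, status) tuples, newest first."""
    results = []
    for line in log_text.splitlines():
        match = ROW.match(line)
        if match:
            results.append((match.group(2), match.group(3)))
    return results


for description, status in selftest_results(SAMPLE_LOG):
    print(f"{description}: {status}")
```

Keep in mind that a clean self-test exercises the platters and heads, not the SATA link, so it can pass while CRC errors keep accumulating.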

Link to comment

Thanks kindly for the reply.

 

When I said the errors returned, I meant that the number of errors increased. What started out as 1 error is now 40. I did check for crossed cables and made sure the cable for the drive is not touching another SATA cable.

 

Presently running a SMART extended test on the drive.

 

Again many thanks...

Link to comment

The SMART extended test reports "completed without error".

 

Tried different SATA cables, different power connections, and different SATA ports (mobo, PCIe controller). I traced the cable to ensure it is not crossed with another, and it is not. Not sure if my internal SATA cables are shielded or not. Too bad eSATA uses a different connector, as I have a few shielded eSATA cables I could use.

 

So I am perplexed and very concerned why I am getting the UDMA CRC errors.

Link to comment
1 hour ago, Vetteman said:

Using locking cables.

Read that article that I posted the link to carefully.   If the drive does not have the shroud molded into it, a metal locking connector will allow a 'floating' connection as the metal locking connectors seldom have the internal locking nibs.  This floating connection is unreliable in maintaining a good connection as the drive will vibrate due to the rotation of the platters.   (The SATA connector system is a poster child on how not to design any connector system!!!) 

 

You can test for this condition by gently pulling on the cable. With a non-locking cable, you should be able to feel some slight resistance as the cable disconnects. No resistance means a poor connection. With a locking connector, you should not be able to pull the connector out with any reasonable force.

Link to comment

(Side Note:  Any locking cabling that does not have the internal "bumps" violates the spec of the cable and should be avoided in the first place - FWIW, every single locking cable I've ever had that comes included with a motherboard is missing those bumps)

Link to comment

In this photo it shows the WD hard drive SATA connectors that do not have the shroud and should not use the locking connectors but says for this type of connector "use the SATA cable receptables with internal bumps like the images above".  Yet there are NO images above of the SATA cables.

WD-NO-BUMPS.JPG

Link to comment

I have in my hands two SATA cables: a blue one with the metal latches and a red one with black connectors that does not have the metal latches. Other than the latches, they look identical.

 

I see the differences between the SATA connectors on the hard drives for latching and non latching. The shroud is in plain sight. 

 

But with the cables, the only differences I can see are the latches.

 

I've checked Amazon.ca and Newegg.ca for latched and unlatched SATA cables and the only differences are the latches.

 

I will check the connector on the WD Blue 5 TB.

Edited by Vetteman
Link to comment

The nibs are down inside the female connectors on the cable.  They are the same color as the plastic of the connector housing.  They are very small and quite hard to see unless you shine a light down inside of the connector.  If there are the proper nibs on the cable, when you pull the cable off of the drive, you can feel a slight resistance until the connector releases.  (The nibs are designed so that the cable has a slight interference fit when it is inserted into the plug on the drive.  Because these parts are made with thermoplastic, the design can only allow a very minimum of interference before the design would have problems with cold flow of the thermoplastic.)

 

It is generally accepted that ALL cables without the metal latch have the plastic nibs. Furthermore, it is also accepted that any cable with the metal latches does NOT have the plastic nibs.

 

The problem comes with those certain models of WD drives that do not have the shroud. ( The shroud is not a part of the SATA connector spec.)  However, it is present on all other hard drives.  I have grabbed a screenshot of that shroud and circled it.

image.png.d2e50ba044cd9811443c627d89128057.png

The metal tab of the latch-type cable pushes against the shroud to force the mating connectors together (the nibs perform the same service). (The barbs on the latch also provide an additional locking function to prevent the connector from working loose.) As you can visualize, with the shroud missing, there is no connector force as well as no locking function.

 

There is another picture in the link of a WD drive without the shroud.  It may look like the shroud is there but a careful inspection will show that the opening is much larger and the latch will not function properly.   If you can pull a latching connector off the drive with little to no force, the drive does not have the shroud!

Link to comment

Looking over the thread, I think there is one more thing to try.  Swap the drive end of the SATA cable (connected to the drive with the CRC errors) to another drive.  Take the SATA cable from that drive and connect it to the drive with the CRC errors.  Now let's see where the CRC errors occur. 
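
For reading the result of that swap, here is a small Python sketch of the reasoning. The function name and the wording of the conclusions are made up for this example, and the conclusions are rules of thumb rather than a definitive diagnosis. Note that because only the drive end is swapped, the cable and the port it is plugged into move together.

```python
def interpret_swap(suspect_drive_count_rose, other_drive_count_rose):
    """Interpret CRC counts observed for a while AFTER swapping the drive ends.

    suspect_drive_count_rose: the original problem drive, now on the other cable.
    other_drive_count_rose:   the other drive, now on the suspect cable and port.
    """
    if suspect_drive_count_rose and other_drive_count_rose:
        return "both: look at something shared (power, controller, cable routing)"
    if suspect_drive_count_rose:
        return "errors stayed with the drive: suspect its connector or the drive itself"
    if other_drive_count_rose:
        return "errors followed the cable: suspect that cable or the port it is on"
    return "no new errors: reseating may have fixed a loose connection, keep watching"


# Example: the count keeps climbing on the original drive only.
print(interpret_swap(suspect_drive_count_rose=True, other_drive_count_rose=False))
```

Give it enough time and disk activity (a parity check is a good workout) before drawing a conclusion, since the errors so far have arrived slowly.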

Link to comment

You might actually have a most highly unusual situation.  You have a hard drive that does generate CRC errors when transferring data.  (It occurs so rarely and the cost to repair/test is so much greater than most of the other  much more likely causes-- SATA cables and SATA controllers-- that we avoid even considering it until it is the last possible thing left...)

 

Edited by Frank1940
Link to comment
