Number of CRC Errors, when to RMA?


TyantA


My 8TB Toshiba Parity drive that I got about a year ago threw 5 CRC errors pretty close to each other, then nothing for a while. Within the last two months, it inched up to 8. Doesn't seem like much but I don't like the trend. 

 

So I ordered a new drive, which finally showed up. I replaced the Toshiba with an 8TB RED and the rebuild finished this morning. 

 

During the rebuild, I saw that another drive, a 4TB Seagate, now has an error count of 314! That seems far more concerning. I also happen to have ordered a 4TB RED, so I'll get to subbing that one in today. 

 

My question is: what's the threshold for concern? If the drives are still under warranty for a while, is it OK to let them run, or is it best practice to replace a drive as soon as it shows CRC errors? 

 

I intend to replace these under warranty and re-deploy them to replace smaller disks. 

Thoughts? 

Link to comment

I would also suggest double-checking that ALL of the SATA connectors are firmly seated after working around any cabling inside the server! (The SATA design is a poster child for how not to design a connector!) Be wary of locking SATA connectors, as some drives don't actually have the plastic shroud needed to 'lock' the connector in place. (If the connector does not 'lock', use a standard non-locking SATA cable.) 

 

 

Edited by Frank1940
Link to comment
  • 5 weeks later...

So... my drives aren't behaving very nicely these days. 

 

Since I posted: 

  • I did some transfers to the array which generated 2656 errors on one drive, an HGST 6TB drive.
  • In around the same timeframe, Unraid notified me of a CRC error count of 64, then the very next message said 835. 
  • Later that day, a parity check ran, reporting 0 errors. 

I'm not sure what to make of this. 

The other hardware change was installing a Sapphire RX 470 GPU in the last motherboard slot, close to the Dell PERC controller card. It's possible heat is taking its toll. 

 

Not sure how I should proceed.   

Link to comment

As of right now, I have not, but I did re-seat them. There have been no more issues with the original 8TB drive. It's now this 6TB drive that's acting up with CRC errors + actual read errors. 

 

I'll dig up my drive map and figure out which one the 6TB is. I'm sure the parity drive goes straight to the board, while the newly problematic 6TB drive is, I believe, on the Dell H310 controller. To replace those cables, I'm guessing I'd have to get new SFF-8087 to SATA cables like these. I have spare SATA cables but none of those lying around.

 

I think next I'll take the previous parity drive, the 8TB Toshiba, and run either preclear or other diagnostics to confirm it's still OK before changing the cable (if I can) and having it replace the 6TB drive. Once the 6TB is out, I can do the same to verify its condition.

Link to comment
1 hour ago, johnnie.black said:

As mentioned above, UDMA CRC errors are a communication problem, most often a bad SATA cable; replacing that is the first thing to try.

I can confirm that. I had this issue with one of my SAS to 4xSATA cables: it gave me a couple of CRC errors every now and then on one of the disks, and by the time the new cables I ordered arrived, the CRC error count had gone up to 57.

Since I replaced the SAS to 4xSATA cables (I actually replaced both cables on the controller), the disk hasn't had a single CRC error.

 

If anyone is interested, this is the cable I ordered a pair of: https://www.ebay.com/itm/CableDeconn-50-cm-Mini-SAS-36p-SFF-8087-zu-4-SATA-7Pin-90-Grad-Target-Festpla/223499837798  
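
For anyone who wants to check the counter before and after a cable swap: the CRC count discussed here normally corresponds to SMART attribute 199 (UDMA_CRC_Error_Count), which `smartctl -A` from smartmontools prints in its attribute table. Below is a rough Python sketch that pulls the raw value out of that text. The sample output is made up for illustration (the 57 simply mirrors the count mentioned above), and it assumes the usual ten-column table layout, which may differ on some drives.

```python
import re
from typing import Optional

# Illustrative sample of `smartctl -A` output; the values are invented for this example.
SAMPLE_SMARTCTL_OUTPUT = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       57
"""


def crc_error_count(smartctl_text: str) -> Optional[int]:
    """Return the raw value of SMART attribute 199, or None if it is not listed."""
    for line in smartctl_text.splitlines():
        fields = line.split()
        # Attribute rows start with the numeric ID; the raw value is the 10th column.
        if len(fields) >= 10 and fields[0] == "199":
            match = re.match(r"\d+", fields[9])
            return int(match.group()) if match else None
    return None


print(crc_error_count(SAMPLE_SMARTCTL_OUTPUT))  # 57
```

Matching on the ID rather than the name is deliberate, since vendors do not all label attribute 199 the same way.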

Link to comment
On 1/28/2020 at 2:36 AM, johnnie.black said:

As mentioned above, UDMA CRC errors are a communication problem, most often a bad SATA cable; replacing that is the first thing to try.

I'm less concerned about the CRC errors after you mentioned this and more concerned about the 2656 errors in the "Errors" column within Unraid that have cropped up on the 6TB drive. It has both: errors in that column plus CRC errors. Perhaps they're related, perhaps not. 

Link to comment
On 1/28/2020 at 3:48 AM, gberg said:

I can confirm that. I had this issue with one of my SAS to 4xSATA cables: it gave me a couple of CRC errors every now and then on one of the disks, and by the time the new cables I ordered arrived, the CRC error count had gone up to 57.

Since I replaced the SAS to 4xSATA cables (I actually replaced both cables on the controller), the disk hasn't had a single CRC error.

 

If anyone is interested, this is the cable I ordered a pair of: https://www.ebay.com/itm/CableDeconn-50-cm-Mini-SAS-36p-SFF-8087-zu-4-SATA-7Pin-90-Grad-Target-Festpla/223499837798  

I guess I'll order another set as well, if only for backup. While I don't doubt it can be a cable issue, it's curious that it would arise when not much else has changed / moved. Unfortunately the link you posted does not ship to Canada, but I'll look for something similar. 

Link to comment
2 hours ago, TyantA said:

I guess I'll order another set as well, if only for backup. While I don't doubt it can be a cable issue, it's curious that it would arise when not much else has changed / moved. Unfortunately the link you posted does not ship to Canada, but I'll look for something similar. 

Nothing has to be visibly moved / changed for CRC errors to happen. For example:

  • A cable that is bent at an awkward angle may not break immediately, but the stress will cause it to fracture over time
  • A connection may come loose over time due to HDD vibration or just gravity
  • Oxidation of connector pins
  • Poor soldering quality

All it takes really is for the connection to break by a tiny bit and you will have CRC errors.
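
Because the CRC counter is cumulative, the useful signal is whether it keeps climbing after a cable is reseated or replaced, not the absolute number. Here is a minimal Python sketch of that comparison. The labels and the final "after cable swap" reading are hypothetical; 64 and 835 are the two counts reported earlier in this thread.

```python
from typing import List, Tuple


def crc_growth(readings: List[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Return (label, increase) for every reading after the first one."""
    growth = []
    for (_, previous), (label, current) in zip(readings, readings[1:]):
        growth.append((label, current - previous))
    return growth


# Hypothetical labels; the last reading is an assumed value, not a reported one.
readings = [("first alert", 64), ("next alert", 835), ("after cable swap", 835)]

for label, increase in crc_growth(readings):
    status = "still climbing" if increase > 0 else "stable"
    print(f"{label}: +{increase} ({status})")
```

A count that stays flat after the swap points at the old cable or connection; one that keeps rising suggests looking at the port, controller, or power next.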

Link to comment
2 minutes ago, testdasi said:

Nothing has to be visibly moved / changed for CRC errors to happen. For example:

  • A cable that is bent at an awkward angle may not break immediately, but the stress will cause it to fracture over time
  • A connection may come loose over time due to HDD vibration or just gravity
  • Oxidation of connector pins
  • Poor soldering quality

All it takes really is for the connection to break by a tiny bit and you will have CRC errors.

Fair enough. 

 

I guess the relevant question for the 6TB drive is whether the errors reported in the "Errors" column on Unraid's Main page are related to the CRC errors or are something separate. To me those are read/write errors, and a cable not doing its job could be to blame for them. I'm just not sure whether it's something more concerning. 

 

Anyway, placing an Amazon order tonight; I'll see if I can pick up a decent set of replacement cables for my controller card. 

Link to comment
16 hours ago, TyantA said:

I guess I'll order another set as well, if only for backup. While I don't doubt it can be a cable issue, it's curious that it would arise when not much else has changed / moved. Unfortunately the link you posted does not ship to Canada, but I'll look for something similar. 

Here's what looks to be the same cable, but much cheaper. On the downside, shipping from China may take a long time. https://www.ebay.com/itm/0-5M-Mini-SAS-36P-SFF-8087-to-4-SATA-7Pin-90-Degrees-Target-Hard-Disk-Data-Cable/193011984946

Link to comment

I ended up grabbing a cheaper 2 pack from amazon today. We'll see how they work out. I ordered before reading the earlier replies but did end up getting 50cm cables. There were shorter options but I felt that would be a little too short. The only thing these cheaper ones didn't have were the locks. Maybe I should have opted for ones with them. Oh well! There were good reviews from people using them in Unraid with controller cards so... we'll see! 

 

Upon verifying... it looks like I somehow ended up with the 1m long ones. That's excessive; I may cancel my order! 

Link to comment

They wouldn't let me cancel but told me to order the shorter version & send back the longer ones. Both arrived yesterday, I just haven't had time to put them in the server yet. They seem short, but we'll see. Maybe I'll use this opportunity to re-arrange the drives in my server; put the ones on the controller closer to the bottom of the motherboard side of the case while the motherboard sata cables, being closer to the cages, can probably reach the right cage. 

 

I have auto parity checks set to go the first of the month; the Feb check ran with no further CRC or read errors. 

 

Hopefully today or tomorrow I'll get to running a pre-clear on the original Toshiba 8TB drive I started this thread about to assess (with a new cable) if it is in fact failing or was just a connection issue all along. 

Link to comment
On 1/31/2020 at 9:24 AM, Frank1940 said:

You should probably read this:

      https://support-en.wd.com/app/answers/detail/a_id/15954

 

I would.  Get a .5M one.

I missed this. 

 

Interesting; I have yet to come across WD drives w/o a shroud, good to know! 

 

Got the new 50cm cables installed last night. Without a drive location reorg, they *just* reached the furthest drives. I'll definitely want to move things around as right now the sata cables are strewn across the CPU fan. But they work. 

 

Next up, I'm going to be swapping out as many of the cables going to my motherboard's SATA ports as I can with new ones I have lying around. May as well; I value my data. 

 

I've cleared the errors and am nearly done pre-clearing the initial 8TB (previous parity) drive with hopes of being able to re-introduce it to the array and bump a smaller drive. 

 

A parity check was just done at the beginning of the month "finding 0 errors" so I'm presuming I'm good to go to swap out the data disk. 

Link to comment

By the way, Unraid tracks which physical drive is assigned to each logical disk # by the drive's serial number, so you can move cables to clean up the cable tangle. I would just keep all of the SSDs on the MB SATA connections. (Some older LSI controllers don't handle some SSD functions well.) After you do this, double (and triple) check that all SATA data and power cables are secure. Disturbing a SATA cable can loosen it so that it loses connection at a later time!!!
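
To illustrate why moving cables is harmless when assignment is keyed on the serial number, here is a toy Python model. It is not Unraid's actual code; the serial numbers, port names, and the `resolve_slots` helper are all made up for the example.

```python
from typing import Dict

# Hypothetical serial numbers mapped to logical array slots.
assignments: Dict[str, str] = {"SERIAL-AAA": "parity", "SERIAL-BBB": "disk1"}


def resolve_slots(port_to_serial: Dict[str, str],
                  assignments: Dict[str, str]) -> Dict[str, str]:
    """Map each logical slot to the port where its drive was detected."""
    return {
        assignments[serial]: port
        for port, serial in port_to_serial.items()
        if serial in assignments
    }


# Same two drives, with their cables swapped between a motherboard port and an HBA port.
before = resolve_slots({"mb-sata0": "SERIAL-AAA", "hba-0": "SERIAL-BBB"}, assignments)
after = resolve_slots({"hba-0": "SERIAL-AAA", "mb-sata0": "SERIAL-BBB"}, assignments)

print(before["parity"], "->", after["parity"])  # the port changes, the slot does not
```

In this model only the port recorded for each slot changes; the parity and disk1 assignments follow the drives themselves.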

Link to comment
14 hours ago, jonathanm said:

Not a good thing. The connectors don't like any stress, the cables should be relaxed and the connectors perfectly square to the drive.

That's true. I wouldn't call it stress, just the path across the CPU fan is less than ideal. They do sit square on the drive, but it's not how I like to arrange a system. 

Link to comment
14 hours ago, Frank1940 said:

By the way, Unraid tracks which physical drive is assigned to each logical disk # by the drive's serial number, so you can move cables to clean up the cable tangle. I would just keep all of the SSDs on the MB SATA connections. (Some older LSI controllers don't handle some SSD functions well.) After you do this, double (and triple) check that all SATA data and power cables are secure. Disturbing a SATA cable can loosen it so that it loses connection at a later time!!!

Thanks, yes, right now I have 2 SSDs and am planning to add a third to finally start a cache pool (the other is for VMs). I had left a spot open on the motherboard just for such occasions as preclearing (which finally just finished; I think my 8TB drive is good!) and for the eventual second cache SSD. 

 

I certainly *could* simply move the cables about, temporarily for sure; I just have a certain order to things that I'd like to keep. Just need to find some time to get re-organized :). It'll be a good time to put in the new fans I picked up too. 

Link to comment
