UDMA CRC error count


Ever since upgrading to the latest version, I’m getting UDMA CRC error count warnings on many of my drives.

 

I reseated the cables, didn’t work.

 

I bought new SAS to SATA cables, still getting them.

 

It’s happening to every drive. Could the Dell PERC card be bad?

 

Thanks!

 

 

Link to comment

Chances are it is just the updated reporting since 6.4.1.  Once acknowledged, the error counts are known and are only flagged again if they go up.  The errors that you're seeing now are most likely historic and nothing to worry about.  See this thread and also the 6.4.1 release notes... 

 

 

Edited by S80_UK
Link to comment

Thank you!  So I think it's still ticking up.  That's why I think there is something wrong.  All those drives can't possibly be bad, so I'm "guessing" it's the SAS card?  That is the only common denominator after I swapped the SAS cables.

 

Could the Dell PERC SAS card be the culprit for this?

 

OBTW, I originally thought I had a bad PSU so I swapped that out too.

 

Thanks!

Link to comment

Please, read before you do anything about your current hardware.  CRC errors only indicate a problem if they are increasing now!  Read this post:

 

If you acknowledge the error, you should not get another warning UNLESS you have a new error.  As @johnnie.black says, even an occasional one is not a problem.  In fact, you should never lose data because of a CRC error.  Once the error is detected, the data will simply be resent until it is received correctly.  (I still have not quite figured out why they decided to highlight what is a minor problem in virtually all cases.)  
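
To make the acknowledge behaviour concrete, here is a minimal sketch of the rule described above: a drive is flagged again only when its raw SMART 199 count rises above the value that was acknowledged. This is not unRAID's actual code; the function name `drives_to_flag`, the drive names and the counts are all made up for illustration.

```python
def drives_to_flag(acknowledged, current):
    """Return the drives whose CRC count has risen above the acknowledged value.

    Both arguments map a drive name to its raw SMART 199 count. A drive
    with no acknowledged value is treated as acknowledged at 0.
    """
    return sorted(
        name for name, count in current.items()
        if count > acknowledged.get(name, 0)
    )


# Hypothetical counts: only disk2 has gained errors since acknowledgement.
acknowledged = {"disk1": 12, "disk2": 0, "disk3": 8731}
current = {"disk1": 12, "disk2": 3, "disk3": 8731}
print(drives_to_flag(acknowledged, current))  # ['disk2']
```

A large but unchanged count (disk3 here) stays quiet, which is why historic errors are nothing to worry about once they have been acknowledged.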

Link to comment

Be sure that you have not tied the SATA cables together to make it 'neat' on the inside of the case.  You should only do this if the cables are shielded, and 99.9% of SATA cables made today are not!  I am not sure if SAS cables are shielded or not.  The problem is crosstalk between the cables.  (This is a problem with any server because all of the cables will have information on them during many parity operations.) 

 

The next thing to double check is that all of the cables are firmly seated on their connectors (again, tying the cables together can contribute to a problem here).  If you are using metal locking cables, check that they have friction when you pull on the drive end.  For the reason, see here (and WD may not be the only manufacturer to have done this...):  

 

       https://support.wdc.com/knowledgebase/answer.aspx?ID=10477

 

Link to comment
  • 3 weeks later...
  • 2 weeks later...

i have the same issue with a handful of drives (i.e. numbers continue to increase). i have even moved some to a different controller (and therefore different cables) to no avail.

 

i think i'll just have to turn off notification since it is flooding my email with these alerts.

 

steve

 

 

 

Link to comment
2 hours ago, evans036 said:

i think i'll just have to turn off notification since it is flooding my email with these alerts.

 

It's good that it's flooding your mail - the very clear message to you is that you need to do something to solve the original problem.

 

You still have issues with controllers and/or disks and/or cables and/or the PSU - and the SMART data keeps ticking up because it isn't a problem that is safe to ignore.

 

When the counter ticks up, you can be lucky that the drive caught the transfer error - the scary thing is that some transfers may contain errors that the drive doesn't pick up. That means you read out - or write down - incorrect data.

Link to comment

Folks really need to understand what a CRC error is: the data going in one end of a wire is not coming out of the other end as identical data.  It is not a software failure!  It is a hardware issue!  And it is relatively uncommon, but it is not totally unexpected.  (Otherwise, there would be no need for checking using CRC!)  The error itself is correctable, and that is always done.  
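
As a rough illustration of "detect, then resend until it arrives intact", here is a small simulation. It is only a sketch: `zlib.crc32` stands in for whatever CRC the real link uses, the "wire" is a function that flips one bit on the first few attempts, and every name in it is hypothetical.

```python
import zlib


def send_frame(payload):
    """Sender side: return the payload together with its CRC."""
    return payload, zlib.crc32(payload)


def flip_first_bit(data):
    """Simulate noise on the wire by flipping one bit of the first byte."""
    return bytes([data[0] ^ 0x01]) + data[1:]


def transfer(payload, bad_attempts):
    """Resend until the CRC matches; return (received data, CRC error count)."""
    crc_errors = 0
    attempt = 0
    while True:
        data, checksum = send_frame(payload)
        if attempt < bad_attempts:
            data = flip_first_bit(data)  # corrupted in transit
        attempt += 1
        if zlib.crc32(data) == checksum:
            return data, crc_errors      # arrived intact
        crc_errors += 1                  # mismatch: count it and resend


received, errors = transfer(b"parity block", bad_attempts=2)
print(received == b"parity block", errors)  # True 2
```

The data ends up correct either way; what the counter records is how many times the transfer had to be repeated, which is why a climbing count points at cabling or other hardware rather than at lost data.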

 

Better than 90% of the time, multiple error situations are cable related, with loose cables or crosstalk between cables being common causes.  There could be problems in the decoding and checking hardware, but again that is rare and can be a bear to figure out exactly where it is. (I have had a flaky SATA expansion card in the last six months that threw up a couple of hundred CRC errors in a few hours as well as an End-to-End error.  So that possibility must always be kept in mind.)

 

For these reasons, each one of you who has a problem should post up in a new thread about the issues in your system.  Keeping track of several different investigations with only a slight chance of any two having the same cause and solution is very confusing. 

Link to comment
28 minutes ago, Frank1940 said:

For these reasons, each one of you who has a problem should post up in a new thread about the issues in your system.  Keeping track of several different investigations with only a slight chance of any two having the same cause and solution is very confusing. 

Sorry I let this get out of hand.

 

This thread belongs to the OP, ChaOConnor. Anybody else who wants help with this should start a separate thread for themselves.

Link to comment
On 3/5/2018 at 7:54 PM, Frank1940 said:

If you acknowledge the error, you should not get another warning UNLESS you have a new error.  As @johnnie.black says, even an occasional one is not a problem.  In fact, you should never lose data because of a CRC error.  Once the error is detected, the data will simply be resent until it is received correctly.  (I still have not quite figured out why they decided to highlight what is a minor problem in virtually all cases.)  

 

I skipped 6.4.1 and went to 6.5 and have two drives with crc errors. Would you mind describing how to acknowledge them? I see the crc counts in the smart listing on the affected drives' pages highlighted in orange, but there are no buttons, etc, to acknowledge the error.

 

Thanks,

- Eric

Link to comment
3 minutes ago, eweitzman said:

 

I skipped 6.4.1 and went to 6.5 and have two drives with crc errors. Would you mind describing how to acknowledge them? I see the crc counts in the smart listing on the affected drives' pages highlighted in orange, but there are no buttons, etc, to acknowledge the error.

 

Thanks,

- Eric

If you click on the orange icon in the Dashboard page there is an option to acknowledge the value there.

Link to comment
  • 1 month later...

I'm on 6.5.2, and a week ago I started getting a lot of these errors and then write errors, which offlined my newest drive, only in service a month. I replaced the drive and had been fine until now, when I received 1 UDMA CRC error with the brand new drive. Diagnostics on the "old" new drive passed on a different system with only 3Gb/s SATA. In fact, I just RMA'd the drive and shipped it today. I had actually tried to put it back in service, but as soon as unRAID booted back up it started to get UDMA CRC errors again on a different tray.

 

I'm running an IBM M1015 reflashed to an LSI 9211-8i. First time I've ever seen this kind of thing.

 

Since this is something that all of a sudden just started happening I'm not sure what to do. The two drives I've been having issues with are in a 4 slot drive cage. The other 5 drives are direct connected. The drives I've been having trouble with are new old stock HGST 7K4000 2TB drives. Not sure whether the cage or the controller or SAS to 4xSATA cable could be the issue? I have a shorter cable I could try.

 

For now it's just a single error so nothing has gone offline again. This is still troubling.

Edited by Taddeusz
Link to comment

How is your M1015 installed? These controllers get very warm, even when idle. With rising ambient temperatures this could also be the culprit. I would highly recommend installing the Noctua NF-A4x10 FLX on top of the heat sink. If you are up for it, replace the thermal compound with something better. Mine was dried up and flaky. I used a small cable tie to hold the fan in place, but others are using screws (image is not from an M1015, but results are pretty much the same):

 

[Image: lsi-9212-4i-2-700.jpg]

 

I leave it running at full speed, it is rather quiet and the heat sink is cool to the touch, even under load.

 

Try the cables first, as suggested by @johnnie.black, but maybe also consider installing the fan.

Link to comment

Well, had my newest drive offlined Friday night. Spent a lot of yesterday troubleshooting. I've come to the conclusion that there's something wrong with the backplane in my drive cage. It's a Rosewill RSV-SATA-Cage-34. I removed the backplane and rigged up some fans for cooling. Direct connected all the drives. It ran a rebuild just fine with no more UDMA CRC errors.

 

I tried to replace the thermal compound on my SAS card but they used thermal glue so it's not coming off. I've got a couple drive cages coming but they don't have backplanes. Also Rosewill but really low cost.

Link to comment
  • 1 month later...
4 hours ago, bombz said:


Any thoughts on my post?

unRAID Disk 10 SMART health [199]    Warning [UNRAID] - udma crc error count is 8731    WDC_WD20EARS

I have done cable replacements, the drive itself has 12GB free, could that be a prob? 
The drive is old.

 

Both of your posts in both threads are a bit out of the wild blue yonder.  (You should really have started a new thread...) The basic question that always has to be asked for any SMART 199 errors is this:  Is the count continuing to increase?   The reason is that this counter can never be reset.  The error itself is a nuisance more than anything else.  The error will always be automatically fixed by requesting that the data be resent until the CRC code is correct.  The only real problem is that it slows down the data transfer rate.  If it is an occasional error, this delay is insignificant in the bigger picture.  Unfortunately, you will get a warning every time unRAID reboots (and, as I recall, with the periodic status reports) unless you turn them off.  (Where to do this at the moment escapes me but I think it was on the Dashboard page...)
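
One way to answer the "is it still increasing?" question is to note the raw value of attribute 199 and compare it with a later reading. The sketch below pulls that value out of attribute-table text in the usual `smartctl -A` column layout. The sample table is illustrative only: the 8731 is the count from the quoted warning, the other column values are placeholders, and the function names are made up.

```python
def crc_error_count(smart_text):
    """Return the raw value of SMART attribute 199, or None if it is absent."""
    for line in smart_text.splitlines():
        fields = line.split()
        # Attribute rows have 10 columns; the last one is the raw value.
        if len(fields) >= 10 and fields[0] == "199":
            return int(fields[9])
    return None


def is_increasing(previous, current):
    """True only if the counter has gone up since the previous reading."""
    return current > previous


# Illustrative attribute table (placeholder values except the raw count).
sample = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       8731
"""

count = crc_error_count(sample)
print(count, is_increasing(previous=8731, current=count))  # 8731 False
```

If a reading taken after a few days of normal use, or after a parity check, is the same as the earlier one, the errors are historic; if it has gone up, the cable, backplane or controller is still worth chasing.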

Edited by Frank1940
Link to comment
