udma crc error count



4 hours ago, Frank1940 said:

The error will always be automatically fixed by requesting that the data be resent until CRC code is correct.

 

Just note that there is a limit to how large an error the CRC can detect.


A well-chosen x-bit CRC polynomial can catch all errors that flip an odd number of bits, as well as any single burst error up to x-1 bits long. But if two bit errors are further apart than that burst capacity, the CRC may start to accept broken packets.
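
To make that limit concrete, here is a minimal sketch using a toy 8-bit CRC. The polynomial 0x07, the 4-byte payload and the helper names are all assumptions picked for illustration; this is not the CRC that a SATA link actually uses. It checks every error burst that fits within 8 bits, then tries one wider error pattern (four flipped bits spread over 9 bit positions).

```python
def crc8(data: bytes, poly: int = 0x07) -> int:
    """Toy bitwise CRC-8 (MSB first, zero initial value, no final XOR)."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ poly) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc


def flip(data: bytes, error: int) -> bytes:
    """XOR an integer error pattern into the message, as if bits flipped in transit."""
    value = int.from_bytes(data, "big") ^ error
    return value.to_bytes(len(data), "big")


def undetected_short_bursts(message: bytes) -> int:
    """Count error bursts spanning at most 8 bits that leave the CRC unchanged."""
    good = crc8(message)
    total_bits = 8 * len(message)
    missed = 0
    for pattern in range(1, 256):  # every non-zero pattern that fits in 8 bits
        for shift in range(total_bits - pattern.bit_length() + 1):
            if crc8(flip(message, pattern << shift)) == good:
                missed += 1
    return missed


message = b"\xde\xad\xbe\xef"  # arbitrary example payload (hypothetical)
short_bursts_missed = undetected_short_bursts(message)

# Four flipped bits spread over 9 positions, matching the generator x^8 + x^2 + x + 1.
wide_error = 0x107
wide_error_missed = crc8(flip(message, wide_error)) == crc8(message)

print(short_bursts_missed)  # how many short bursts escaped detection
print(wide_error_missed)    # whether the wider pattern escaped detection
```

With this toy CRC, no short burst escapes (the count is 0), while the wider pattern leaves the checksum unchanged, so the corrupted message would be accepted.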

 

For a busy system with a single CRC error now and then, that isn't a problem, because the probability of bit errors is then low and the probability of having multiple bit errors in the same packet is even lower. But for a system with a constantly ticking CRC counter, there is a danger that a packet may have more than one bit error at different locations in the packet - and this may result in a broken packet being accepted.

So it's generally OK to have a system where the counter ticks up once a month, while it's outright dangerous to have a system where the CRC counter ticks hundreds of times a day. Each user must then decide where their comfort zone is and when transfer errors must be fixed.
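
A rough way to see why a noisy link is disproportionately worse: the sketch below assumes independent bit errors (a simplification, since real link noise tends to be bursty), a hypothetical 8 KiB transfer unit, and two made-up bit error rates. It estimates what share of damaged packets carry two or more bit errors, which are the ones a CRC is more likely to miss.

```python
import math


def packet_error_probs(bit_error_rate: float, packet_bits: int) -> tuple[float, float]:
    """Return (P(at least 1 bit error), P(at least 2 bit errors)) for one packet.

    Assumes every bit fails independently with the same probability.
    """
    log_ok = packet_bits * math.log1p(-bit_error_rate)  # log of P(no bit errors)
    p_any = -math.expm1(log_ok)
    p_exactly_one = packet_bits * bit_error_rate * math.exp(
        (packet_bits - 1) * math.log1p(-bit_error_rate)
    )
    return p_any, max(p_any - p_exactly_one, 0.0)


PACKET_BITS = 8 * 8192  # hypothetical 8 KiB transfer unit

for ber in (1e-9, 1e-6):  # hypothetical bit error rates: a decent link and a poor one
    p_any, p_multi = packet_error_probs(ber, PACKET_BITS)
    print(f"BER {ber:g}: share of bad packets with 2+ bit errors = {p_multi / p_any:.2e}")
```

Under these assumptions the share of multi-bit damage grows roughly in proportion to the bit error rate, so a link that errors far more often also produces far more of the packets that can slip past the CRC.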

 

Note that the frequency of CRC errors must be put in relation to the number of transfers - an idle disk that ticks a CRC error or two every day could produce a huge number of errors if a TB-sized transfer is started.
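
As a back-of-the-envelope illustration of that scaling, the sketch below assumes the error rate per byte stays constant (which may not hold in practice) and uses made-up numbers for an idle disk.

```python
def projected_crc_errors(errors_seen: int, bytes_transferred: float, planned_bytes: float) -> float:
    """Scale an observed CRC error count to a planned transfer size.

    Assumes a constant error rate per byte transferred.
    """
    if bytes_transferred <= 0:
        raise ValueError("bytes_transferred must be positive")
    return errors_seen * planned_bytes / bytes_transferred


GB = 10**9
TB = 10**12

# Hypothetical idle disk: 2 CRC errors while moving only 5 GB in a day.
projected = projected_crc_errors(errors_seen=2, bytes_transferred=5 * GB, planned_bytes=4 * TB)
print(projected)  # projected CRC errors for a 4 TB transfer
```

With these example numbers, two errors a day on a nearly idle disk would project to 1600 errors over a 4 TB transfer.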

Link to comment
9 hours ago, Taddeusz said:

From the dashboard screen you can acknowledge the warning by clicking on the warning icon.

And once acknowledged, you will get new warnings when the count increases, which is what you want. If it is not increasing, there is no new problem. Note that this is just for CRC errors. You typically don't want to acknowledge other SMART errors since they may mean you need to replace the disk.

Link to comment
  • 1 year later...

Hello, 
I have acknowledged the error, which changed from 0 to 1.
If the count goes up again, would that be a reason to replace the disk?
This disk is on the LSI SAS 9211-8i breakout cable. They're not easy cables to swap (I'd have to order new breakout cables).
Is it advisable to replace the cable or just the disk (if the errors continue to rise)?

udma crc error count returned to normal value

array health report [PASS]

smart.JPG

error.JPG

error2.JPG

Edited by bombz
Link to comment
19 minutes ago, bombz said:

This disk is on the LSI SAS 9211-8i breakout cable. They're not easy cables to swap (I'd have to order new breakout cables).

 

9 minutes ago, trurl said:

This is a connection issue. If it only increases a small amount infrequently then I just acknowledge it, otherwise look for the problem causing the connection issue.

This error can also be caused by cross-talk between SATA cables. Be careful when dressing cables that you do not tie them tightly together, particularly the 1 m long ones. Tying them together also makes them more prone to working loose with vibration.

Link to comment
  • 7 months later...
On 6/15/2020 at 8:58 AM, bombz said:

Hello, 
I have acknowledged the error, which changed from 0 to 1.
If the count goes up again, would that be a reason to replace the disk?
This disk is on the LSI SAS 9211-8i breakout cable. They're not easy cables to swap (I'd have to order new breakout cables).
Is it advisable to replace the cable or just the disk (if the errors continue to rise)?

udma crc error count returned to normal value

array health report [PASS]

smart.JPG

error.JPG

error2.JPG

I have 5 of 8 8TB WD Red drives that have a Cycle Count of 1 with this same error. Could this error correct itself in Unraid, or should I replace these disks?

 

 

Screen Shot 2021-01-17 at 9.55.06 PM.png

Link to comment
14 hours ago, JorgeB said:

CRC errors are a connection problem. Just acknowledge them, and as long as the count doesn't keep increasing you're fine. If it does, start by replacing the SATA cables on the affected disks.

I purchased eight 8TB WD Red drives the other night for the unRAID server I built. I put them all in and booted, and 5 of the 8 drives showed those errors with an error count of 1. That's all. I tried to zero the drives. I moved them around, and it didn't matter which bay they were placed in: they always had the error, even in the bays the good ones had been installed in. I would just hate to waste money on new SATA cables from my RAID card only to find out that it doesn't help. The count doesn't seem to be changing. When I check the SMART info in CrystalDiskInfo and other tools, it all comes out perfect. Is there any way in unRAID to ignore the count if it stays the same and just mark the disk as healthy?

Edited by ryanleeis
Link to comment
3 minutes ago, ryanleeis said:

I purchased eight 8TB WD Red drives the other night for the unRAID server I built. I put them all in and booted, and 5 of the 8 drives showed those errors with an error count of 1. That's all. I tried to zero the drives. I moved them around, and it didn't matter which bay they were placed in: they always had the error, even in the bays the good ones had been installed in. I would just hate to waste money on new SATA cables from my RAID card only to find out that it doesn't help. The count doesn't seem to be changing. When I check the SMART info in CrystalDiskInfo and other tools, it all comes out perfect.

The CRC errors are permanently recorded in SMART. As long as you don’t get any more it should be ok.
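
For anyone who wants to read the stored count directly, here is a minimal sketch that pulls the raw value of SMART attribute 199 (UDMA_CRC_Error_Count) out of the attribute table that `smartctl -A` prints. The sample text is a made-up excerpt and the function name is hypothetical; on a live system the text would come from running `smartctl -A` against the disk.

```python
from typing import Optional

# Hypothetical excerpt in the layout of a `smartctl -A` attribute table.
SAMPLE_SMARTCTL_OUTPUT = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       42
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       1
"""


def udma_crc_raw_value(smartctl_attributes: str) -> Optional[int]:
    """Return the raw value of attribute 199, or None if the attribute is not listed."""
    for line in smartctl_attributes.splitlines():
        fields = line.split()
        # An attribute row has at least 10 whitespace-separated columns; the last is RAW_VALUE.
        if len(fields) >= 10 and fields[0] == "199":
            return int(fields[9])
    return None


crc_count = udma_crc_raw_value(SAMPLE_SMARTCTL_OUTPUT)
print(crc_count)  # raw CRC error count from the sample text
```

The raw value is the number to watch: it is the running total kept by the drive, so comparing two readings taken some time apart shows whether new errors have occurred.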

Link to comment
10 minutes ago, ryanleeis said:

Is there any way in unRAID to ignore the count if it stays the same and just mark the disk as healthy?

As I recall, open the Dashboard tab.  Then click on the drive with the problem.  You should be able to tell it to ignore the error.   This means it will not report the drive as having a CRC error unless the count increases. 

Link to comment
47 minutes ago, Frank1940 said:

As I recall, open the Dashboard tab.  Then click on the drive with the problem.  You should be able to tell it to ignore the error.   This means it will not report the drive as having a CRC error unless the count increases. 

Thanks, I can't seem to find the option to do that. Is there an admin setting or something? It would be great if it showed healthy and then showed the error again if the count increases.

 

57 minutes ago, Taddeusz said:

The CRC errors are permanently recorded in SMART. As long as you don’t get any more it should be ok.

Thanks.. 

Link to comment
1 hour ago, ryanleeis said:

can't seem to find the option to do that

 

2 hours ago, Frank1940 said:

open the Dashboard tab

In the section in the lower right corner of the Dashboard it shows all of your disks, with a column labeled SMART which shows the health of each disk with a thumbs up or thumbs down icon. Click on the thumbs down icon and you get a popup that lets you Acknowledge the current count. It won't warn you again unless it increases.
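
The acknowledge behaviour described above can be sketched as follows. This is only an illustration of the logic under my own assumptions, not Unraid's actual code; the class and method names are hypothetical.

```python
class CrcWarningTracker:
    """Warn about a disk only while its CRC count exceeds the last acknowledged count."""

    def __init__(self) -> None:
        self._acknowledged: dict[str, int] = {}  # disk name -> count the user accepted

    def needs_warning(self, disk: str, raw_count: int) -> bool:
        """True if the current count is higher than what was last acknowledged."""
        return raw_count > self._acknowledged.get(disk, 0)

    def acknowledge(self, disk: str, raw_count: int) -> None:
        """Record the current count as accepted by the user."""
        self._acknowledged[disk] = raw_count


tracker = CrcWarningTracker()
print(tracker.needs_warning("disk3", 1))  # count went from 0 to 1: warn
tracker.acknowledge("disk3", 1)
print(tracker.needs_warning("disk3", 1))  # unchanged since acknowledgement: stay quiet
print(tracker.needs_warning("disk3", 2))  # count increased again: warn
```

The point of the design is that the stored SMART count never has to be reset; only the threshold that triggers a warning moves.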

Link to comment
  • 2 weeks later...

Hi Guys,

 

I know this is an old thread, but I wanted to get some advice on this topic. I have had my server running for about 2 years now, and I have six drives in total.

 

My UDMA errors are as follows:

 

Parity: 1 error, can't recall when

Drive 1: 0 errors

Drive 2: 2 errors, didn't keep track of when

Drive 3: 3 errors, now gone to 4 as of 1/02/21

Disk 4: 0 errors

Disk 5: 2 errors, gone to 3 as of 1/1/21, then to 4 on 1/2/21

 

Should I be worried or making a change at this stage?

 

Cheers

 

 

 

 

 

Link to comment

An occasional error of that type is not really something to worry about; it just indicates that a transfer failed a CRC check and was retried. If they occur regularly then you may have a cabling issue. Note that the counts never get reset to zero. They can only increase, so if they are stable, do not worry.

Link to comment
47 minutes ago, itimpi said:

An occasional error of that type is not really something to worry about; it just indicates that a transfer failed a CRC check and was retried. If they occur regularly then you may have a cabling issue. Note that the counts never get reset to zero. They can only increase, so if they are stable, do not worry.

 

Thanks for the reply,

 

So one a month is not that bad?

Link to comment
  • 2 weeks later...

Hello, 

I just got a new setup using 2x 4TB Seagate IronWolf drives. One of the drives has this flag, and its UDMA CRC count is 1.

 

I understand that CRC errors can be ignored; once I acknowledge it, the status becomes healthy. But the issue is that on each reboot the alert pops back up and I need to re-acknowledge it.

 

Please let me know whether this is the intended behaviour of Unraid, or whether I am missing some other, permanent way of acknowledging it.

 

It's been just 2 days since I got the drives, and I have a 15-day replacement policy. Would it be good to get the drive replaced with a new one? I just don't want to carry SMART error counts from the beginning.

 

Thanks in advance :)

Edited by SaranG
Link to comment

You are not super clear about the number of errors.  Does this drive have only one error?  Or are you getting a new error on each startup?  

 

You do know that on the Dashboard you can click on the 'thumbs-down' icon and acknowledge the error, and you will never hear about CRC errors on that drive UNLESS more CRC errors occur. As an aside, I have a drive that has 72,300 CRC errors on it. It has been a couple of years since the last one occurred. Remember that CRC errors are not actual drive errors in 99.99% of cases. They are data transmission errors, and most of the time they are a cable issue. (In my case, I had a cheap China-made two-port SATA card that was flaky.) About half of my drives have one or more CRC errors. If you are getting more than one a month on a drive, you should investigate to find the reason.
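
If you keep a note of the count now and then, the "more than one a month" rule of thumb is easy to check. A minimal sketch, with hypothetical dated readings and a hypothetical helper name, treating a month as 30 days:

```python
from datetime import date


def errors_per_30_days(readings: list[tuple[date, int]]) -> float:
    """Average CRC count increase per 30 days between the first and last reading."""
    ordered = sorted(readings)
    (first_day, first_count), (last_day, last_count) = ordered[0], ordered[-1]
    days = (last_day - first_day).days
    if days <= 0:
        raise ValueError("need readings taken on at least two different days")
    return (last_count - first_count) * 30 / days


# Hypothetical log of (date, raw CRC count) notes for one disk.
log = [(date(2021, 1, 1), 2), (date(2021, 2, 1), 3), (date(2021, 3, 2), 4)]

rate = errors_per_30_days(log)
print(rate)      # average new errors per 30 days
print(rate > 1)  # above the rule of thumb, so worth investigating?
```

With these example readings the count rose by 2 over 60 days, which is one per 30 days: right at the threshold rather than over it.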

Edited by Frank1940
Link to comment

Hey there @Frank1940 

 

Thank you for quick reply. 

 

Attaching the current screenshot. I am going by the RAW value, which remains at 1. Should I be seeing the count on each reboot?

 

I think if it is the count, I will do a couple of reboots, and in case it varies I will reseat the drive and redo the cabling. The cables are brand new, but I did tie both SATA cables together. I read somewhere that this should not be done, so I'll come back to check these things later on.

 

Thanks again, have a good day! 

Screenshot_20210213-172549_Chrome.jpg

Link to comment
  • 11 months later...
  • 9 months later...

I had the same problem for a month or two, ever since I moved from FreeNAS.

Just found out that it's due to a mismatch for the controller. On the drive's configuration page, it was set to "Auto".

I looked it up and set the correct company (Marvell, for my old Toshiba drives), and that fixed the errors.

 

** Writing this because the thread is among Google's top results for "Unraid UDMA error".

You are very welcome, good fellows.

 

Link to comment
  • 10 months later...

I too have been experiencing this issue lately. From different posts in the forums, I see that the possible root cause would be the cable, the SAS card itself or the hard drive.

 

If the card were the issue, would we not see this error randomly on different drives, or even on more than one drive? The cable I can understand, because of vibration, and some reseating may need to be done. If the quality of the cable is the issue, what does the forum recommend for quality SAS cables?

 

Thanks.

The LSI 9211-8i is the card I am presently using.

Link to comment
