UDMA CRC Errors



Hi

I have an HP DL180 G6 with 18TB (9 x 2TB) of SAS drives, 2 x 2TB Red NAS SATA drives, and a SanDisk 120GB SSD cache drive. I have AppData set to the cache drive, as well as the downloads share (which is just a temporary intermediary location until the various apps file the downloads away to their correct locations), and I have the system drive on the cache too.


This has been running fine for several weeks, but I found the 120GB SSD Cache drive to be too small, so I decided to upgrade it.

 

I got a Kingston 480GB SSD, installed it, and got everything back up and running, but I got tons of UDMA CRC error notifications whenever I was downloading something. Over time I also noticed that the CA Backup/Restore AppData backup wasn't running correctly, the dockers wouldn't start up again in the early hours after scheduled maintenance, and the docker log would fill up to 100%. After about a week of this, the whole thing needed a reboot to get running again, after which everything ran fine until the next morning. All the while there were constant CRC error notifications on the cache drive. The errors only tick up one at a time, for a total of about 100 errors over a week.

 

So I reverted to the original SSD, and everything worked fine again: no CRC errors, no docker log filling up, etc. I figured the new SSD was faulty, so I returned it and got another one, this time a Crucial 500GB SSD. With this one installed I'm getting UDMA CRC errors again, but not to the same extent as with the last SSD, and with none of the server/docker errors, so far anyway. The CRC error count is currently at about 25.

 

So, what could be the problem here? I can't change any cables, which seem to be the usual culprit for CRC errors, as all the drives are on the same backplane, connected to the controller via SAS cables. None of the other drives are giving CRC errors.

 

If the server continues to run fine with the second SSD, albeit with the CRC errors, can I just ignore the errors?  Is there any way to disable the constant CRC error notifications without masking other errors that might arise in the future, or do I just have to live with them?

 

Is this possibly a hardware incompatibility between the drive controller in my server and the drives themselves, so that the original SanDisk SSD was fine but the Kingston and Crucial are not? Should I get a 500GB SanDisk SSD and try that instead?

 

Thanks in advance!

 

Leo


Thanks for the reply.

 

If it were the backplane or cables, surely the other mechanical drives, or the original SSD, would also be showing errors? When I swap back to the original SSD, there are no errors; it's just with these two new SSDs.

 

There is nothing mission critical on the cache, and anything that might be lost can either be redownloaded or recovered from the previous day's backup, so I'm not overly concerned about disabling the error notifications. I can always check the count manually from time to time to see how it's doing. I just don't really want my phone chirping every few hours while something is downloading, with a notification telling me what I already know.
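For the manual check mentioned above, here is a minimal sketch of how the counter could be read from the attribute table that `smartctl -A` prints. The sample text and its numbers are made up for illustration (they are not output from this server), and the helper names `crc_error_count` and `new_crc_errors` are hypothetical. It matches SMART attribute ID 199 rather than the name, on the assumption that the name can differ between drive vendors.

```python
# Illustrative text shaped like the attribute table from `smartctl -A`.
# The values are invented for this example, not taken from the thread.
SAMPLE_SMARTCTL_OUTPUT = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       312
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       25
"""


def crc_error_count(smartctl_text):
    """Return the raw value of SMART attribute 199, or None if it is absent."""
    for line in smartctl_text.splitlines():
        fields = line.split()
        # Attribute rows have 10 columns: column 0 is the ID, column 9 the raw value.
        if len(fields) >= 10 and fields[0] == "199":
            return int(fields[9])
    return None


def new_crc_errors(previous, current):
    """Errors added since the last check (hypothetical helper, never negative)."""
    return max(0, current - previous)


count = crc_error_count(SAMPLE_SMARTCTL_OUTPUT)
print("CRC error count:", count)
print("New since last check:", new_crc_errors(20, count))
```

On a real system the text would come from running `smartctl -A` against the cache device instead of the sample string. Comparing against the previous reading is what matters, since the raw counter never resets and only an increase points to fresh link errors.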

14 minutes ago, darkcyde said:

If it were the backplane or cables, surely the other mechanical drives, or the original SSD, would also be showing errors? When I swap back to the original SSD, there are no errors; it's just with these two new SSDs.

Not necessarily. Hard disks don't reach SSD speeds, so a backplane that can, for example, handle 200MB/s without issues might not handle 500MB/s. Some devices are also pickier than others about connection quality. The problem isn't the SSDs, for sure.


Yeah, you could be right there. Looking at the specs, the original SSD is rated at 180MB/s read and 133MB/s write, whereas the new drives are upwards of 530MB/s, so it's probably the backplane not handling the throughput very well.
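To put those numbers in context, here is a rough sketch of which SATA signalling rate each drive needs just to carry its rated throughput. It only uses the nominal SATA line rates and the 8b/10b encoding overhead (10 bits on the wire per payload byte). It does not show what speed any drive actually negotiated on this backplane, which the thread doesn't state, and the function names are made up for the example.

```python
# Nominal SATA line rates in Gb/s. With 8b/10b encoding, each payload
# byte costs 10 bits on the wire.
SATA_LINE_RATES_GBPS = {
    "SATA 1.5 Gb/s": 1.5,
    "SATA 3 Gb/s": 3.0,
    "SATA 6 Gb/s": 6.0,
}


def payload_ceiling_mb_s(line_rate_gbps):
    """Upper bound on payload throughput in MB/s for a given line rate."""
    return line_rate_gbps * 1000 / 10


def slowest_link_for(drive_mb_s):
    """Slowest SATA generation whose payload ceiling covers the drive's speed."""
    by_rate = sorted(SATA_LINE_RATES_GBPS.items(), key=lambda item: item[1])
    for name, rate in by_rate:
        if drive_mb_s <= payload_ceiling_mb_s(rate):
            return name
    return None


# Rated speeds quoted in the post above.
for label, speed in [("original SSD (read)", 180), ("new SSDs", 530)]:
    print(f"{label}: {speed} MB/s needs at least {slowest_link_for(speed)}")
```

The old drive's rated speed fits within a 3 Gb/s link, while 530MB/s is only possible at 6 Gb/s. That fits the idea that a marginal connection could stay clean at lower signalling rates and throw CRC errors at higher ones, though it doesn't prove the backplane is at fault.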

 

I'll just disable the CRC error notification and see how things go. As long as it doesn't go back to the whole thing crashing down like it did before, I'll stick with the latest drive.

 

Thanks for your help!
