Extreme UDMA CRC Error Counts



Hi there,


After digging around on Google and the forums, I believe the problems with my array come down to UDMA CRC errors on a number of my drives, but honestly I'm not sure where to start looking for the cause. From my reading, I think it could be one or a combination of three things:

  1. My SAS-to-SATA breakout cables (maybe crosstalk? They seem the likely candidate). I've tried two different brands but still get the issue, though the cables from both brands looked the same apart from slightly different colours. - https://www.amazon.ca/gp/product/B0736J45V2/
  2. My drive cages: I have a Rosewill RSV-L4412, which came with 3 drive cages (I can't remember their part number). - https://www.rosewill.com/product/rosewill-rsv-l4412-4u-rackmount-server-case-or-chassis-12-sata-sas-hot-swap-drives-5-cooling-fans-included/
  3. My SAS controller, a Fujitsu (?) card flashed to an LSI 9211-8i in "IT" mode.

 

At this point I suspect the cables, but I'd be interested to hear what others think. Eight of my disks connect through these breakout cables; the other four go directly to the motherboard SATA ports. Interestingly, the drives on the breakout cables seem to have the issue much worse, though this is only a short-term observation since I started reading about this, and the directly wired cage currently holds only 3 drives while the others are fully loaded with 4.

I'm curious which of these options people think would be most likely to solve this:

  1. Get different breakout cables.
  2. Get new drive cages.
  3. Change out the controller.

In any case, I'd be interested in people's recommendations on this.

This all started when I noticed what I think are VERY high read error counts while rebuilding my array after replacing a drive. Attached is the diagnostics file from the server. It's in the middle of rebuilding that drive, as I mentioned, so whatever I decide, I'm a couple of days away from actually implementing it, assuming I can even get the parts at this point.

I'm interested to see what people think. Thanks!

tower-diagnostics-20200318-1415.zip


Quick update: after I finished writing this, I noticed that some of the drives on direct SATA cables are also getting these errors, though not all of them. The parity drive reports 0 over its entire life, but the drive nearest it shows an increasing number of CRC errors, growing more slowly than on the drives on the SAS-to-SATA breakouts. Maybe this points to a combination of the cables and the cages? I really don't know at this point.
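To check whether the breakout-cable drives really are accumulating CRC errors faster, one option is to save `smartctl -A` output for each drive at two points in time and compare attribute 199 (UDMA_CRC_Error_Count). The sketch below assumes smartctl's usual attribute-table layout, where RAW_VALUE is the 10th whitespace-separated column; the drive names, connection labels and counts in the example are made up for illustration, not taken from the diagnostics.

```python
from collections import defaultdict

CRC_ATTR_ID = "199"  # UDMA_CRC_Error_Count in smartctl's attribute table


def crc_count(smartctl_output: str) -> int:
    """Return the raw UDMA CRC error count from `smartctl -A` text, or 0 if the row is absent."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with the numeric ID; RAW_VALUE is the 10th column.
        if len(fields) >= 10 and fields[0] == CRC_ATTR_ID:
            return int(fields[9])
    return 0


def growth_by_connection(before, after, connection):
    """Sum the CRC-count increase between two snapshots, grouped by connection type.

    before/after map drive name -> `smartctl -A` text; connection maps drive name -> label.
    """
    totals = defaultdict(int)
    for drive, label in connection.items():
        totals[label] += crc_count(after[drive]) - crc_count(before[drive])
    return dict(totals)


def row(raw):
    """Build a hypothetical attribute-199 row with the given raw count."""
    return ("199 UDMA_CRC_Error_Count    0x003e   200   200   000    "
            f"Old_age   Always       -       {raw}")


# Hypothetical snapshots taken some hours apart.
before = {"disk1": row(10), "disk2": row(5), "parity": row(0)}
after = {"disk1": row(40), "disk2": row(7), "parity": row(0)}
connection = {"disk1": "breakout", "disk2": "onboard", "parity": "onboard"}
print(growth_by_connection(before, after, connection))  # {'breakout': 30, 'onboard': 2}
```

Since 8 drives are on breakouts and 4 are onboard, dividing each total by the number of drives in that group would give a fairer comparison.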

4 minutes ago, johnnie.black said:

Always, but they weren't reported before; the attribute wasn't monitored.

Interesting.

Otherwise, if you don't mind me asking, does the system look fine at a cursory glance, other than the glaring "it's rebuilding a drive" thing, of course? UNRAID is still reporting high read errors as well.

6 minutes ago, johnnie.black said:

BTW, there's no point in rebuilding a disk with multiple disk errors.

What do you think the path forward would be, then? I'm not sure what I should do. I have multiple disks reporting read errors, but none show issues in SMART other than the CRC errors. Should I stop the rebuild, flash the firmware, and then... what? Rebuild if a parity check goes well?

On 3/18/2020 at 2:29 PM, johnnie.black said:

Mar 18 13:03:31 Tower kernel: mpt2sas_cm0: LSISAS2008: FWVersion(20.00.00.00), ChipRevision(0x03), BiosVersion(07.39.00.00)

 

Update the LSI firmware to 20.00.07.00; it's a known issue with that one.

Can you tell me how to update the LSI firmware? I have the same issue.

Thank you 
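Before flashing anything, it can help to confirm which firmware the controller actually reports. The sketch below only reads the kernel log line in the format quoted above (the `FWVersion(...)` field logged by `mpt2sas`) and compares it against the 20.00.07.00 version recommended in the reply; it does not perform the update, and the function names are hypothetical.

```python
import re

# Version recommended in the reply above; 20.00.00.00 is the one it flags as problematic.
RECOMMENDED_FW = (20, 0, 7, 0)

# Matches the FWVersion(...) field, not BiosVersion(...).
FW_PATTERN = re.compile(r"FWVersion\((\d+)\.(\d+)\.(\d+)\.(\d+)\)")


def firmware_version(log_line: str):
    """Extract the controller firmware version as an int tuple, or None if not present."""
    match = FW_PATTERN.search(log_line)
    return tuple(int(part) for part in match.groups()) if match else None


def needs_update(log_line: str) -> bool:
    """True if the logged firmware is older than the recommended 20.00.07.00."""
    version = firmware_version(log_line)
    return version is not None and version < RECOMMENDED_FW


line = ("Mar 18 13:03:31 Tower kernel: mpt2sas_cm0: LSISAS2008: "
        "FWVersion(20.00.00.00), ChipRevision(0x03), BiosVersion(07.39.00.00)")
print(firmware_version(line))  # (20, 0, 0, 0)
print(needs_update(line))      # True
```

Comparing versions as integer tuples avoids string-comparison pitfalls such as "20.00.07.00" vs "20.00.10.00".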
