
(Solved) UDMA CRC Errors from Only One Disk on an LSI Broadcom SAS 9300-8i


Solved by JorgeB


Hi all.

 

Just curious if anyone could shed some light on why I'm getting UDMA CRC errors from only one disk, a WD Red Plus 10TB, attached to an LSI Broadcom SAS 9300-8i HBA card in IT mode. I'm using two CableCreation Internal HD Mini SAS (SFF-8643 host) cables to connect the card to seven disks. I don't remember what firmware is on the card, but I flashed it about two years ago when I built the system.

 

I forgot to screenshot the error logs, but the errors only show up after the disks are woken up for a SMART check.

I rebooted the system and more errors appeared. The count was at 777 and is now at 879. No other disk is showing these errors.
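A simple way to follow a count like this between SMART checks is to pull attribute 199 (UDMA_CRC_Error_Count) out of `smartctl -A` output and compare two snapshots. The sketch below assumes the usual ATA attribute table layout, where the raw value is the tenth column; the helper name `udma_crc_count` is hypothetical, and the sample rows are illustrative (only the counts 777 and 879 come from this post).

```python
import re
from typing import Optional


def udma_crc_count(smartctl_text: str) -> Optional[int]:
    """Return the raw UDMA_CRC_Error_Count from `smartctl -A`-style text.

    Hypothetical helper: assumes the standard ATA attribute table, where
    column 1 is the attribute ID, column 2 its name and column 10 the raw value.
    """
    for line in smartctl_text.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[0] == "199" and fields[1] == "UDMA_CRC_Error_Count":
            match = re.match(r"\d+", fields[9])  # keep only the leading digits
            return int(match.group()) if match else None
    return None  # attribute not found (different table format, SAS disk, etc.)


if __name__ == "__main__":
    # Illustrative rows: the flag and normalized values are placeholders.
    before = "199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 777"
    after = "199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 879"
    print(udma_crc_count(after) - udma_crc_count(before))  # 102
```

What matters is whether the difference keeps growing between checks; the raw value is a running total that typically does not reset.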

 

I have yet to power down the system, unplug every cable, reseat the HBA card, then reattach the cables, making sure they're not kinked and away from power cables. I will do that at some point. 

 

So I'm wondering if I should buy new cables, because I had some of these same errors when building the system, but I assumed the disks were bad and had them replaced. The replacement disks didn't show any of these UDMA CRC errors.

 

Any help would be appreciated. 

Thanks for reading.

Chris

 

Unraid 6.9.2

System specs are in my signature

Screenshot 2022-03-20 181226.jpg

Screenshot 2022-03-20 181404.jpg

threadripper19-diagnostics-20220320-1814.zip threadripper19-smart-20220320-1815.zip


First, read this article:

 

        https://support-en.wd.com/app/answers/detail/a_id/15954

 

You are using a WD disk with the problem. The picture of the cable that you say you are using has the metal locking tabs on it, so check out the situation from that standpoint. When you pull on the cable, you should not be able to move it with a moderate amount of force if the locking mechanism is working.

 

Second, if you are considering purchasing new cables, look for the half-meter ones unless you absolutely require the extra length. That extra four meters of cable inside a case is nothing but a source of different problems, any one of which can cause CRC errors. Here is a listing for a half-meter cable.

 

       https://www.amazon.com/dp/B013G4EMH8/ref=sspa_dk_detail_0?psc=1&pd_rd_i=B013G4EMH8&pd_rd_w=eKGhZ&pf_rd_p=57cbdc41-b731-4e3d-aca7-49078b13a07b&pd_rd_wg=TAt6U&pf_rd_r=KCA29QC3ASN0EGJ4P8R3&pd_rd_r=b08db1ca-1d4c-40ab-8c74-071b8bdcde98&s=pc&spLa=ZW5jcnlwdGVkUXVhbGlmaWVyPUFMRFA2MUpSRjdEMkQmZW5jcnlwdGVkSWQ9QTA3OTM1MjgyRENJMFU2TVA0V0w2JmVuY3J5cHRlZEFkSWQ9QTA3MTcyNzEzTUc4QUZSUUpUNUdBJndpZGdldE5hbWU9c3BfZGV0YWlsX3RoZW1hdGljJmFjdGlvbj1jbGlja1JlZGlyZWN0JmRvTm90TG9nQ2xpY2s9dHJ1ZQ==

 


Thanks for the response.

The WD drives are attached to a backplane inside my Rosewill 4U server chassis, so the connection shouldn't be a problem. Hopefully.

 

I agree about the long lengths, though. At the time, I bought the longer ones because I wasn't sure what length to get.

I will absolutely get the 0.5-meter ones.

 

Thanks for the listing.


@Frank1940

 

I finally got new 0.5-meter cables and installed them, and after a while I got a UDMA CRC error on a completely different disk, Disk #6. I didn't change which cable connects to which disk. I also blew out all the dust I could find in my system and then reseated every cable firmly. I even pulled my HBA card and did the same thing.

The error count only went up to 2. 

The other disk has yet to get another UDMA CRC error, so at least that one is good now. 

 

I'm also getting these errors:

Mar 25 17:41:35 ThreadRipper19 kernel: sd 8:0:4:0: [sdj] tag#1716 CDB: opcode=0x85 85 06 20 00 00 00 00 00 00 00 00 00 00 40 e5 00
Mar 25 17:46:34 ThreadRipper19 kernel: sd 8:0:4:0: [sdj] tag#1666 UNKNOWN(0x2003) Result: hostbyte=0x0b driverbyte=0x00 cmd_age=0s

No idea what those are. I've never seen that error before. 
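The two kernel lines can be partly decoded by hand. In the first, `sd 8:0:4:0` is host:channel:target:LUN, and opcode 0x85 is the SCSI ATA PASS-THROUGH (16) command; in the second, hostbyte=0x0b is `DID_SOFT_ERROR` in the Linux SCSI midlayer's host-byte codes. The sketch below decodes those fields; the function names are hypothetical and the lookup tables cover only a handful of values. Byte 14 of this CDB is 0xE5 (ATA CHECK POWER MODE), but which process sent it is not shown in the log.

```python
import re

# Small subsets of the relevant code tables (not exhaustive).
SAT_PROTOCOLS = {3: "Non-data", 4: "PIO Data-In", 5: "PIO Data-Out", 6: "DMA"}
ATA_COMMANDS = {0xB0: "SMART", 0xE5: "CHECK POWER MODE", 0xEC: "IDENTIFY DEVICE"}
HOST_BYTES = {0x00: "DID_OK", 0x03: "DID_TIME_OUT", 0x07: "DID_ERROR",
              0x08: "DID_RESET", 0x0B: "DID_SOFT_ERROR"}


def decode_ata_pass_through_16(cdb_hex: str) -> dict:
    """Decode a few fields of an ATA PASS-THROUGH (16) CDB given as hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 16 or cdb[0] != 0x85:
        raise ValueError("not an ATA PASS-THROUGH (16) CDB")
    protocol = (cdb[1] >> 1) & 0x0F  # byte 1, bits 4..1: transfer protocol
    command = cdb[14]                # byte 14: the ATA command code
    return {
        "protocol": SAT_PROTOCOLS.get(protocol, f"protocol {protocol}"),
        "ck_cond": bool(cdb[2] & 0x20),  # byte 2, bit 5 (CK_COND)
        "ata_command": ATA_COMMANDS.get(command, f"0x{command:02X}"),
    }


def decode_kernel_line(line: str) -> dict:
    """Pull the SCSI address and host byte out of an sd kernel log line."""
    out = {}
    addr = re.search(r"sd (\d+):(\d+):(\d+):(\d+):", line)
    if addr:
        out["host"], out["channel"], out["target"], out["lun"] = map(int, addr.groups())
    host = re.search(r"hostbyte=0x([0-9a-fA-F]{2})", line)
    if host:
        value = int(host.group(1), 16)
        out["hostbyte"] = HOST_BYTES.get(value, f"0x{value:02x}")
    return out


if __name__ == "__main__":
    print(decode_ata_pass_through_16("85 06 20 00 00 00 00 00 00 00 00 00 00 40 e5 00"))
    # {'protocol': 'Non-data', 'ck_cond': True, 'ata_command': 'CHECK POWER MODE'}
```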

 

Do you think I have some bad RAM modules?

Or should I re-flash my HBA card with the latest firmware?

 

I'm at a loss at this point. 

There's one other strange thing on my system: the Plex Streams app is loading extremely slowly. The page used to populate within 3 seconds; now I'm waiting 30-45 seconds.

 

Do you have any idea what's going on?

 

Thanks for your help.

Chris

Screenshot 2022-03-25 181142.jpg

Screenshot 2022-03-25 181320.jpg

threadripper19-diagnostics-20220325-1812.zip threadripper19-smart-20220325-1805.zip

13 hours ago, FQs19 said:

and after a while I got a UDMA CRC error on a completely different disk

If it's happening with multiple disks, and since I understand the cables were replaced, the number one suspect would be the backplane, and after that the controller itself. Not the firmware: AFAIK that firmware has no known issues, and it's the same one I'm using. Though if it were the controller, I would expect errors on all disks.

Thanks for looking into this problem of mine. 

If I understand you correctly, you believe it's either the backplane in my Rosewill chassis or the LSI card itself?

 

And you don't think it has anything to do with my RAM modules, correct?

2 minutes ago, FQs19 said:

you believe it's either the backplane in my Rosewill chassis or the LSI card itself?

Yes, most likely the backplane.

 

2 minutes ago, FQs19 said:

And you don't think it has anything to do with my RAM modules, correct?

No, RAM won't cause UDMA_CRC errors. They indicate a communication problem, so the cause can only be the controller, the cable/connection, or the disks, and the disks are very rarely the cause.
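The reasoning is that the CRC is computed over data as it is sent across the link and checked when it arrives, so only corruption in between (controller, cable, backplane, disk interface) increments the counter; in this simplified model, a RAM error changes data before it is sent or after it has been checked, which the link CRC cannot see. The toy model below uses Python's `zlib.crc32` as a stand-in (real SATA link CRC uses its own CRC-32 parameters), and the frame layout and function names are hypothetical.

```python
import zlib


def send_frame(payload: bytes) -> bytes:
    """Sender side: append a CRC-32 of the payload (4 bytes, little-endian)."""
    return payload + zlib.crc32(payload).to_bytes(4, "little")


def receive_frame(frame: bytes) -> bool:
    """Receiver side: recompute the CRC and compare it with the one sent."""
    payload, crc = frame[:-4], int.from_bytes(frame[-4:], "little")
    return zlib.crc32(payload) == crc


def flip_bit(frame: bytes, bit_index: int) -> bytes:
    """Simulate a single bit flipped in transit (e.g. by a marginal connection)."""
    data = bytearray(frame)
    data[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(data)


crc_error_count = 0
frame = send_frame(b"sector data")
# First transfer is corrupted on the wire, the retry arrives intact.
for attempt in (flip_bit(frame, 13), frame):
    if receive_frame(attempt):
        break
    crc_error_count += 1  # the link error is counted, then the transfer is retried
```

After the loop, `crc_error_count` is 1 and the data that was finally accepted is intact, which matches the behavior described here: the counter records link errors that were caught and retried, not silent data corruption.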

 

My Rosewill case has 12 hot-swappable bays. So I'm going to remove every disk, re-tighten all the screws that attach the disks to the drive sleds, blow out any dust in the backplane connections, then re-insert each disk, making sure they seat firmly.

 

I'll do the same with the cables from my HBA to the backplane. I'll also try to arrange my power cables so that they don't run parallel to the data cables. I don't think any do, but I can clean up my cabling a little.

Just for giggles, I'm going to test my RAM with MemTest86 to see if it shows any errors. I've been thinking about getting ECC RAM, but didn't want to spend another $300-400 on memory.

13 minutes ago, JorgeB said:

Also note that if you get one a day or so, it's not ideal but not a big deal; constant errors are more of a problem since they will impact performance.

I have yet to get an error on the first disk that gave me errors, but the errors went up to ~1500 before I could replace the cables. 

 

The errors on this second disk went up to 2 after changing the cables. No other errors since. 
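JorgeB's rule of thumb (an occasional error is tolerable, a steadily climbing count is not) can be turned into a rate by comparing two timestamped readings of the counter. The sketch below is hypothetical: the function names and the one-per-day threshold are assumptions loosely based on "one a day or so", not an Unraid feature.

```python
from datetime import datetime
from typing import Tuple

# (time the count was read, UDMA_CRC_Error_Count at that time)
Snapshot = Tuple[datetime, int]


def crc_errors_per_day(earlier: Snapshot, later: Snapshot) -> float:
    """Average number of new CRC errors per day between two readings."""
    (t0, c0), (t1, c1) = earlier, later
    days = (t1 - t0).total_seconds() / 86400
    if days <= 0:
        raise ValueError("the later snapshot must be taken after the earlier one")
    return (c1 - c0) / days


def classify(rate_per_day: float, tolerable_per_day: float = 1.0) -> str:
    # The default threshold is an assumption, not a documented limit.
    if rate_per_day <= 0:
        return "stable"
    if rate_per_day <= tolerable_per_day:
        return "occasional"
    return "frequent"
```

For example, a count that went up by 2 over two days works out to 1.0 per day and is classed as "occasional"; a count that keeps climbing by hundreds lands in "frequent".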

 

Thank you again for the help. 

I've spent a ton of money and time on this server, so getting all these random errors is nerve-racking.

 

With my luck, one of my original cables was bad and one of my new cables is bad too. Lol

3 minutes ago, Frank1940 said:

 

Check the user reviews on vendor sites that sell this Rosewill case and see if there are any reports of problems similar to what you are having.

That's a great idea. Thanks. 

I was going to contact Rosewill to see if they had replacement backplanes, but looking up reviews first is even easier. 

As you suspected, most of the reviews of my Rosewill chassis mention that the hot-swap bays aren't the best.

I might try to contact Rosewill to see if they'll replace them.

If they don't I'll look into removing the bays and putting in cages without a backplane. Otherwise, I'll start my search for another case. 

 

Thank you both, @Frank1940 and @JorgeB, for your help. I'll mark this as solved, with the most likely culprit being the Rosewill case's backplane. I'll add a photo of my server when I get home.
