firrae Posted March 18, 2020

Hi there,

After digging around on Google and the forums, I believe the issues with my array come down to UDMA CRC errors on a number of my drives, but honestly I'm not sure where to begin looking for the cause. From what I've read, it could be one or a combination of three things:

- My SAS to SATA breakout cables (maybe they're cross-talking? They seem the likely candidate). I've tried two different brands and still get the issue, though the cables from both brands looked the same apart from slightly different colours. https://www.amazon.ca/gp/product/B0736J45V2/
- My drive cages: I have a Rosewill RSV-L4412, which came with 3 drive cages (I can't remember their part number). https://www.rosewill.com/product/rosewill-rsv-l4412-4u-rackmount-server-case-or-chassis-12-sata-sas-hot-swap-drives-5-cooling-fans-included/
- My SAS controller, which is a Fujitsu (?) card flashed to be an LSI 9211-8i in "IT" mode.

At this point I suspect the cables, but I'd be interested in hearing what others think. 8 of my disks connect through these breakout cables; the other 4 go directly to the motherboard SATA ports. Interestingly, the drives on the breakout cables seem to have the issue much worse, though that's only a short-term observation since I started reading about this, and the directly wired cage currently has only 3 drives in it while the others are fully loaded with 4.

Which of these options do people think I'd be better served trying first?

- Get different breakout cables.
- Get new drive cages.
- Change out the controller.

In any case, I'd be interested in seeing people's recommendations. This all started because I'm seeing what I think are VERY high read error counts while rebuilding my array after swapping out a drive. Attached is my diagnostics file from the server. It's in the middle of rebuilding that drive, as I mentioned, so whatever I decide, I'm a couple of days away from actually implementing it, assuming I can even get the parts at this point. I'm interested to see what people think. Thanks!

tower-diagnostics-20200318-1415.zip
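The CRC counts being watched here come from SMART attribute 199 (UDMA_CRC_Error_Count). Because that counter never resets, what matters is whether it is still growing. A minimal Python sketch, assuming the standard `smartctl -A` attribute table layout (ID, name, flag, value, worst, threshold, type, updated, when_failed, raw value) and that you capture the text for each drive yourself; the function names and the sample rows are hypothetical, not part of Unraid:

```python
from typing import Dict, Optional


def crc_error_count(smartctl_output: str) -> Optional[int]:
    """Return the raw value of SMART attribute 199 from `smartctl -A` text, or None."""
    for line in smartctl_output.splitlines():
        # Attribute rows have 10 columns; the last (RAW_VALUE) may contain spaces.
        fields = line.split(maxsplit=9)
        if len(fields) == 10 and fields[0] == "199":
            raw = fields[9].split()[0]
            return int(raw) if raw.isdigit() else None
    return None


def growing_crc_drives(before: Dict[str, str], after: Dict[str, str]) -> Dict[str, int]:
    """Map each drive whose CRC counter increased between snapshots to the increase."""
    grown = {}
    for drive, text in after.items():
        old = crc_error_count(before.get(drive, ""))
        new = crc_error_count(text)
        if old is not None and new is not None and new > old:
            grown[drive] = new - old
    return grown


# Hypothetical snapshots taken some time apart (only the attribute 199 row is shown).
row = "199 UDMA_CRC_Error_Count   0x003e   200   200   000    Old_age   Always       -       {}"
before = {"sdb": row.format(12), "sdc": row.format(0)}
after = {"sdb": row.format(40), "sdc": row.format(0)}
print(growing_crc_drives(before, after))  # {'sdb': 28}
```

Comparing snapshots rather than absolute values separates an old, static count from a link that is still producing errors.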
firrae Posted March 18, 2020

Quick update: after I finished writing this, I noticed that some of my drives on plain SATA cables were also getting these errors, though not all of them. The parity drive reports 0 over its entire life, but the drive nearest it shows an increasing number of CRC errors, though they are rising more slowly than on the drives connected through the SAS to SATA breakouts. Maybe this points to a combination of the cables and the cages? I really don't know at this point.
JorgeB Posted March 18, 2020

Mar 18 13:03:31 Tower kernel: mpt2sas_cm0: LSISAS2008: FWVersion(20.00.00.00), ChipRevision(0x03), BiosVersion(07.39.00.00)

Update the LSI firmware to 20.00.07.00; it's a known issue with that one (20.00.00.00).
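Spotting the affected firmware comes down to reading the FWVersion field from the mpt2sas line in the syslog. A minimal Python sketch, assuming the log line format shown above; the regex, `KNOWN_GOOD`, and the function names are hypothetical helpers, not an Unraid or LSI tool. It simply flags any controller reporting a version older than 20.00.07.00:

```python
import re
from typing import Optional, Tuple

# Matches e.g. "mpt2sas_cm0: LSISAS2008: FWVersion(20.00.00.00)"
FW_PATTERN = re.compile(r"(mpt[23]sas_cm\d+): (\S+): FWVersion\((\d+(?:\.\d+){3})\)")
# 20.00.07.00, the version recommended in this thread, as a comparable tuple
KNOWN_GOOD = (20, 0, 7, 0)


def parse_fw(line: str) -> Optional[Tuple[str, str, Tuple[int, ...]]]:
    """Return (controller, chip, firmware version tuple) from a log line, or None."""
    m = FW_PATTERN.search(line)
    if m is None:
        return None
    version = tuple(int(part) for part in m.group(3).split("."))
    return m.group(1), m.group(2), version


def needs_update(line: str) -> bool:
    """True if the line reports firmware older than KNOWN_GOOD."""
    parsed = parse_fw(line)
    return parsed is not None and parsed[2] < KNOWN_GOOD


line = ("Mar 18 13:03:31 Tower kernel: mpt2sas_cm0: LSISAS2008: "
        "FWVersion(20.00.00.00), ChipRevision(0x03), BiosVersion(07.39.00.00)")
print(needs_update(line))  # True
```

Comparing integer tuples avoids string-comparison pitfalls with the dotted, zero-padded version format.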
firrae Posted March 18, 2020

@johnnie.black is that a newer issue, or has it been that way since before v6? I only got into UNRAID as v6 was launching and don't remember there being an issue until recently.
JorgeB Posted March 18, 2020

Always, but the errors weren't reported before because the attribute wasn't monitored.
firrae Posted March 18, 2020

johnnie.black said: "Always, but they weren't reported before, the attribute wasn't monitored."

Interesting. Otherwise, if you don't mind me asking, does the system look fine at a cursory glance? UNRAID is still reporting high read errors as well. Other than the glaring "it's rebuilding a drive" thing, of course.
JorgeB Posted March 18, 2020

firrae said: "I do still have UNRAID reporting high read errors as well."

Likely related to the same issue. Do the firmware upgrade first and see how it goes. BTW, there's no point in rebuilding a disk while multiple disks are showing errors.
firrae Posted March 18, 2020

johnnie.black said: "BTW no point in rebuilding a disk with multiple disk errors."

What do you think the path forward would be, then? I'm not sure what I should do. I have multiple disks reporting read errors, but none show issues in SMART other than the CRC errors. Should I stop the rebuild, flash the firmware, and then... what? Rebuild if a parity check goes well?
JorgeB Posted March 18, 2020

Cancel the rebuild, update the firmware, start the rebuild again and see if the read errors go away.
firrae Posted March 18, 2020

OK, will try that. Thanks for pointing me in a direction @johnnie.black!
firrae Posted March 18, 2020

Well, I can't thank you enough @johnnie.black! After updating the firmware I see 0 read errors, and the CRC errors have completely stopped increasing on all drives. Thanks a bunch for pointing this out; I would NEVER have thought of this being the issue.
Rene Posted March 26, 2020

On 3/18/2020 at 2:29 PM, johnnie.black said: "Mar 18 13:03:31 Tower kernel: mpt2sas_cm0: LSISAS2008: FWVersion(20.00.00.00), ChipRevision(0x03), BiosVersion(07.39.00.00). Update the LSI firmware to 20.00.07.00, it's a known issues with that one."

Can you tell me how to update the LSI firmware? I have the same issue. Thank you.
JorgeB Posted March 26, 2020

Rene said: "Can you tell me how to update the LSI firmware,"

Download the latest firmware package from Broadcom's support site, then boot from a DOS flash drive and follow the instructions included with the package.
Rene Posted March 26, 2020

OK, thank you.