bkastner Posted October 15, 2022
For some reason I've been having an issue since May or so. I had a drive fail and got an RMA replacement from WD. I precleared it on another machine, shut down UnRAID, swapped the drive, and brought UnRAID back up to start the rebuild. However, as soon as the rebuild started I began getting UDMA CRC error count errors on a drive that had previously reported no issues. I figured it was just a fluke that a second drive failed while rebuilding, but once the rebuild was done I RMA'd the new drive and repeated the process. Then another drive failed during that rebuild with the same symptoms, and this just keeps happening over and over. I am now rebuilding my 5th or 6th drive, and again I am getting a ton of UDMA CRC errors (Disk 3, shown as "disk dsbl" in the logs). While all my previously failed drives were bought a couple of years ago, the new Disk 3 with errors is a WD Gold I bought in May (just before all this started), and I precleared it with no issues at the time. I don't understand what's going on and am hoping someone can take a look at my diagnostics and provide some insight. This is happening far too often for me to believe it's genuine drive failure after drive failure, but I could be wrong. I know rebooting sometimes clears the CRC errors, but I'd like to understand the root cause and see if I can find a more permanent fix. cydstorage-diagnostics-20221015-1520.zip
JorgeB Posted October 16, 2022 (marked as Solution)
Disk3 dropped offline so there's no SMART data, but it looks more like a power/connection problem. Check or replace the cables and post new diags. 13 hours ago, bkastner said: "I started getting udma crc error count errors on a drive that previously reported no issues." These are not a disk problem; they're usually caused by a bad SATA cable. You should also update the LSI firmware, since it's quite old.
bkastner Posted October 17, 2022
Thank you. I've ordered replacement cables and will check the LSI firmware and see how things progress.
bkastner Posted October 18, 2022
lol.. sorry, I didn't realize I'd sent that. I was going to ask about the LSI flash. I had the files on the flash drive but couldn't make them executable, so I was going to ask you, but figured I could create an MS-DOS boot USB with the files and do it that way.
bkastner Posted October 18, 2022
Okay, so I've flashed the firmware (what a pain in the ass that was), replaced all my SFF-8643 cables, and brought the system back up. Everything seems better. The last drive that was screaming at me has 130050 UDMA CRC errors, but the count doesn't seem to be moving. When I started rebuilding the last failed drive, the CRC errors on this drive were skyrocketing, so I'm guessing that the count staying stable now is a good sign. Is there an easy way to test? I've browsed the drive through the GUI, figuring that would cause read operations that might make the CRC errors climb, but the count is still the same. I also ran a short SMART test and it came back without error. Is it fairly safe to assume it was a controller/cable issue and I shouldn't have any more drive failures? Or is there another test I should run before considering this case closed?
JorgeB Posted October 19, 2022
Run a non-correcting parity check; that's a good stress test.
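One way to confirm the UDMA CRC counter isn't still climbing during a stress test like this is to save the SMART attributes before and after the check and compare attribute 199 (UDMA_CRC_Error_Count). Below is a minimal, hypothetical Python sketch. It assumes each snapshot was saved from `smartctl -A /dev/sdX` (with `/dev/sdX` standing in for the disk's device) in smartctl's usual attribute-table text layout. The function names and sample values are illustrative only. Because the raw value is typically a running total on most drives, the difference between snapshots matters more than the absolute number.

```python
"""Hypothetical helper: compare SMART attribute 199 (UDMA_CRC_Error_Count)
between two saved `smartctl -A` text snapshots. Sample data is illustrative."""

from typing import Optional

CRC_ATTR_ID = 199  # UDMA_CRC_Error_Count on most ATA/SATA drives

# Illustrative snapshot in smartctl's attribute-table layout (not real output).
SAMPLE_BEFORE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
194 Temperature_Celsius     0x0022   118   105   000    Old_age   Always       -       34
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       130050
"""

SAMPLE_AFTER = SAMPLE_BEFORE  # unchanged counter: no new CRC errors logged


def crc_error_count(smartctl_output: str) -> Optional[int]:
    """Return the raw value of attribute 199, or None if that row is missing."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID; RAW_VALUE is the 10th column.
        if len(fields) >= 10 and fields[0].isdigit() and int(fields[0]) == CRC_ATTR_ID:
            return int(fields[9])
    return None


def crc_delta(before: str, after: str) -> int:
    """Number of CRC errors logged between the two snapshots."""
    start = crc_error_count(before)
    end = crc_error_count(after)
    if start is None or end is None:
        raise ValueError("attribute 199 not found in one of the snapshots")
    return end - start


if __name__ == "__main__":
    new_errors = crc_delta(SAMPLE_BEFORE, SAMPLE_AFTER)
    print(f"New UDMA CRC errors since first snapshot: {new_errors}")
```

A zero delta after a full non-correcting parity check is consistent with the cable/firmware fix having worked. A growing delta would point back at the cabling, backplane, or controller rather than at the disk itself.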
bkastner Posted October 19, 2022
I have 2 drives on that particular backplane: the one with 130000 CRC errors (disk3) and a second drive that has 1 random one (disk8). Disk3 isn't in the array anymore and is being emulated. I had to run a parity check over the weekend because I inadvertently caused an unclean shutdown, and Disk8 didn't log any CRC errors during the check. The 130000 CRC errors on Disk3 occurred while UnRAID was rebuilding the previously failed drive, but Disk8 just had that one minor hiccup. Is a parity check still worthwhile in this scenario? I'm guessing not, but I want to confirm in case I'm missing something.
JorgeB Posted October 19, 2022
If you already ran a check without issues, it should be solved.
bkastner Posted October 19, 2022
Perfect. Thank you for your help, Jorge.