ChaOConnor Posted March 6, 2018 Ever since upgrading to the latest version, I'm getting UDMA CRC error count warnings on many of my drives. I reseated the cables, but that didn't help. I bought new SAS-to-SATA cables and am still getting them. It's happening to every drive. Could the Dell PERC card be bad? Thanks!
S80_UK Posted March 6, 2018 Chances are it is just the updated reporting since 6.4.1. Once acknowledged, the error counts are known and are only flagged again if they go up. The errors that you're seeing now are most likely historic and nothing to worry about. See this thread and also the 6.4.1 release notes... Edited March 6, 2018 by S80_UK
ChaOConnor Posted March 6, 2018 Thank you! So I think it's still ticking up. That's why I think there is something wrong. All those drives can't possibly be bad, so I'm "guessing" it's the SAS card? That is the only common denominator after I swapped the SAS cables. Could the Dell PERC SAS card be the culprit for this? OBTW, I originally thought I had a bad PSU so I swapped that out too. Thanks!
Frank1940 Posted March 6, 2018 Please read before you do anything about your current hardware. CRC errors only indicate a problem if they are increasing now! Read this post: If you acknowledge the error, you should not get another warning UNLESS you have a new error. As @johnnie.black says, even an occasional one is not a problem. In fact, you should never lose data because of a CRC error. Once the error is detected, the data will simply be resent until it is received correctly. (I still have not quite figured out why they decided to highlight what is a minor problem in virtually all cases.)
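The "acknowledge once, warn only on a new error" behaviour described in this post can be sketched roughly as follows. This is only an illustration of the idea, not how unRAID implements it; the function names `new_crc_errors` and `should_warn` are made up for the example.

```python
def new_crc_errors(acknowledged: int, current: int) -> int:
    """Return how many CRC errors appeared since the count was acknowledged."""
    # The SMART 199 counter only ever counts up, so a lower reading is
    # unexpected (for example, the two readings came from different drives).
    if current < acknowledged:
        raise ValueError("CRC counter went backwards; check which drive was read")
    return current - acknowledged


def should_warn(acknowledged: int, current: int) -> bool:
    """Warn only when the count has grown past the acknowledged baseline."""
    return new_crc_errors(acknowledged, current) > 0
```

With a baseline of 12 acknowledged errors, a later reading of 12 stays quiet, while a reading of 13 would raise a fresh warning.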
JorgeB Posted March 6, 2018 5 hours ago, ChaOConnor said: "So I think it's still ticking up." If they are still increasing then there's still a problem; if it's not the cables, the controller is the next candidate.
Frank1940 Posted March 6, 2018 Be sure that you have not tied the SATA cables together to make it 'neat' on the inside of the case. You should only do this if the cables are shielded, and 99.9% of SATA cables made today are not! I am not sure if SAS cables are shielded or not. The problem is crosstalk between the cables. (This is a problem with any server because all of the cables will have information on them during many parity operations.) The next thing to double-check is that all of the cables are firmly seated on their connectors (again, tying the cables together can contribute to a problem here). If you are using metal locking cables, check that they have friction when you pull on the drive end. For the reason, see here (and WD may not be the only manufacturer to have done this...): https://support.wdc.com/knowledgebase/answer.aspx?ID=10477
ChaOConnor Posted March 12, 2018 So, fingers crossed it was the controller. I put a new SAS to SATA controller in and I haven't had an error tick up in 3 days. Thank you!
Vr2Io Posted March 13, 2018 It seems several posts report these errors even with LSI controllers (not all of them confirm it is a controller problem; it may be fake cards built with bad components).
yanakis Posted March 30, 2018 I also have an LSI controller (9211-8i) and I have CRC errors on 5 drives out of 8... and counting? Edited March 30, 2018 by yanakis
JorgeB Posted March 31, 2018 10 hours ago, yanakis said: "I also have a LSI controller (9211-8i) and I have CRC errors on 5 drives out of 8....and counting?" If the number of errors keeps increasing there's a problem.
evans036 Posted April 8, 2018 i have the same issue with a handful of drives (i.e. numbers continue to increase). i have even moved some to a different controller (and therefore different cables) to no avail. i think i'll just have to turn off notification since it is flooding my email with these alerts. steve
JorgeB Posted April 8, 2018 29 minutes ago, evans036 said: "i think i'll just have to turn off notification" That's a bad idea; if it keeps increasing there's still a problem.
pwm Posted April 8, 2018 2 hours ago, evans036 said: "i think i'll just have to turn off notification since it is flooding my email with these alerts." It's good that it's flooding your mail - the very clear message to you is that you need to do something to solve the original problem. You still have issues with controllers and/or disks and/or cables and/or the PSU - and the SMART data keeps ticking up because it isn't a problem that is safe to ignore. When the counter ticks up, you can be lucky that the drive caught the transfer error - the scary thing is that some transfers may contain errors that the drive doesn't pick up. That means you read out - or write down - incorrect data.
Frank1940 Posted April 8, 2018 Folks really need to understand what a CRC error is: the data going in one end of a wire is not coming out of the other end identical. It is not a software failure! It is a hardware issue! It is relatively uncommon, but it is not totally unexpected. (Otherwise, there would be no need for checking using CRC!) The error itself is correctable, and that is always done. Better than 90% of the time, multiple-error situations are cable related: loose cables or crosstalk between cables are common causes. There could be problems in the decoding and checking hardware, but again that is rare and it can be a bear to figure out exactly where it is. (I have had a flaky SATA expansion card in the last six months that threw up a couple of hundred CRC errors in a few hours as well as an End-to-End error. So that possibility must always be kept in mind.) For these reasons, each one of you who has a problem should post in a new thread about the issues in your system. Keeping track of several different investigations, with only a slight chance of any two having the same cause and solution, is very confusing.
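The detect-and-resend behaviour described in this post can be illustrated with a small simulation. It is a sketch under stated assumptions: `zlib.crc32` stands in for the checksum the link hardware computes, and the "link" that corrupts the first few attempts is entirely hypothetical. A CRC catches the vast majority of corrupted frames, though not every conceivable one.

```python
import zlib


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with one bit inverted, imitating noise on the wire."""
    corrupted = bytearray(data)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)


def transfer(payload: bytes, corrupt_first_n: int):
    """Send payload over a hypothetical link that corrupts the first N attempts.

    Returns the data finally accepted and the number of CRC errors counted.
    """
    crc_errors = 0
    while True:
        sent_crc = zlib.crc32(payload)           # sender computes the checksum
        received = flip_bit(payload, 0) if crc_errors < corrupt_first_n else payload
        if zlib.crc32(received) == sent_crc:     # receiver re-checks it
            return received, crc_errors
        crc_errors += 1                          # mismatch: count it and resend
```

In this toy model the receiver ends up with the correct data in every case; the only visible effects are the extra attempts and the growing error counter, which is roughly what SMART attribute 199 reports.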
trurl Posted April 8, 2018 28 minutes ago, Frank1940 said: "For these reasons, each one of you who has a problem should post up in a new thread about the issues in your system. Keeping track of several different investigations with only a slight chance of any two have the same cause and solution is very confusing." Sorry I let this get out of hand. This thread belongs to the OP, ChaOConnor. Anybody else who wants help with this should start a separate thread for themselves.
eweitzman Posted April 16, 2018 On 3/5/2018 at 7:54 PM, Frank1940 said: "If you acknowledge the error, you should not get another warning UNLESS you have a new error. As @johnnie.black says even an occasional one is not a problem. In fact, you should never lose data because of a CRC error. Once the error is detected, the data will simply be resent until it is received correctly. (I still have not quite figured out why they decided to highlight what is a minor problem in virtually all cases.)" I skipped 6.4.1 and went to 6.5 and have two drives with CRC errors. Would you mind describing how to acknowledge them? I see the CRC counts in the SMART listing on the affected drives' pages highlighted in orange, but there are no buttons, etc., to acknowledge the error. Thanks, - Eric
itimpi Posted April 16, 2018 3 minutes ago, eweitzman said: "I skipped 6.4.1 and went to 6.5 and have two drives with crc errors. Would you mind describing how to acknowledge them? I see the crc counts in the smart listing on the affected drives' pages highlighted in orange, but there are no buttons, etc, to acknowledge the error. Thanks, - Eric" If you click on the orange icon on the Dashboard page, there is an option to acknowledge the value there.
Taddeusz Posted June 8, 2018 I'm on 6.5.2 and a week ago I started getting a lot of these errors, and then write errors which offlined my newest drive, only in service a month. I replaced the drive and have been fine until now, when I received 1 UDMA CRC error with the brand new drive. Diagnostics on the "old" new drive passed on a different system with only 3Gb/s SATA. In fact, I just RMA'd the drive and shipped it today. I had actually tried to put it back in service, but as soon as unRAID booted back up it started to get UDMA CRC errors again on a different tray. I'm running an IBM M1015 reflashed to an LSI 9211-8i. This is the first time I've ever seen this kind of thing. Since this is something that all of a sudden just started happening, I'm not sure what to do. The two drives I've been having issues with are in a 4 slot drive cage. The other 5 drives are direct connected. The drives I've been having trouble with are new old stock HGST 7K4000 2TB drives. Not sure whether the cage or the controller or SAS to 4xSATA cable could be the issue? I have a shorter cable I could try. For now it's just a single error so nothing has gone offline again. This is still troubling. Edited June 8, 2018 by Taddeusz
JorgeB Posted June 8, 2018 13 minutes ago, Taddeusz said: "Not sure whether the cage or the controller or SAS to 4xSATA cable could be the issue? I have a shorter cable I could try." Try the cable first; if it's not that, then the cage is the next most likely candidate.
Taddeusz Posted June 8, 2018 51 minutes ago, johnnie.black said: "Try the cable first, if it's not that then the cage is the next most likely candidate." I guess I have some digging to do this weekend.
Seige Posted June 8, 2018 How is your M1015 installed? These controllers get very warm, even at idle. With rising ambient temperatures this could also be the culprit. I would highly recommend installing the Noctua NF-A4x10 FLX on top of the heat sink. If you are up for it, replace the thermal compound with something better. Mine was dried up and flaky. I used a small cable tie to hold the fan in place, but others are using screws (image is not from an M1015, but results are pretty much the same). I leave it running at full speed; it is rather quiet and the heat sink is cool to the touch, even under load. Try the cables first, as suggested by @johnnie.black, but maybe also consider installing the fan.
Taddeusz Posted June 8, 2018 I know that card gets hot. It just has a heatsink on it. I'll look at putting a fan on. I'll at least put some fresh thermal compound on there. It could probably use it.
Taddeusz Posted June 10, 2018 Well, I had my newest drive offlined Friday night. I spent a lot of yesterday troubleshooting. I've come to the conclusion that there's something wrong with the backplane in my drive cage. It's a Rosewill RSV-SATA-Cage-34. I removed the backplane, rigged up some fans for cooling, and direct connected all the drives. It ran a rebuild just fine with no more UDMA CRC errors. I tried to replace the thermal compound on my SAS card, but they used thermal glue so it's not coming off. I've got a couple of drive cages coming, but they don't have backplanes. They are also Rosewill, but really low cost.
bombz Posted July 30, 2018 Any thoughts on my post? unRAID Disk 10 SMART health [199] Warning [UNRAID] - udma crc error count is 8731 WDC_WD20EARS. I have done cable replacements. The drive itself has 12GB free; could that be a problem? The drive is old.
Frank1940 Posted July 31, 2018 4 hours ago, bombz said: "Any thoughts on my post? unRAID Disk 10 SMART health [199] Warning [UNRAID] - udma crc error count is 8731 WDC_WD20EARS I have done cable replacements, the drive itself has 12GB free, could that be a prob? The drive is old." Both of your posts in both threads are a bit out of the wild blue yonder. (You should really have started a new thread...) The basic question that always has to be asked for any SMART 199 errors is this: Is the count continuing to increase? The reason is that this counter can never be reset. The error itself is a nuisance more than anything else. It will always be fixed automatically by requesting that the data be resent until the CRC code is correct. The only real problem is that it slows down the data transfer rate. If it is an occasional error, this delay is insignificant in the bigger picture. Unfortunately, you will get a warning every time unRAID reboots (and, as I recall, with the periodic status reports) unless you turn them off. (Where to do this escapes me at the moment, but I think it was on the Dashboard page...) Edited July 31, 2018 by Frank1940
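To answer "is the count continuing to increase?" outside the GUI, one option is to pull the raw value of attribute 199 from a SMART attribute table and compare it between two points in time. The sketch below assumes the text comes from something like `smartctl -A /dev/sdX` and that the table has the usual ten columns ending in the raw value. The sample line is illustrative only (it reuses the count of 8731 from the post above; the other columns are invented), and the function name is hypothetical. Drives behind some SAS controllers may need extra `smartctl` options or report in a different layout.

```python
# Illustrative sample in the usual ATA attribute-table layout; only the raw
# value 8731 comes from the thread, the rest is made up for the example.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  9 Power_On_Hours          0x0032   040   040   000    Old_age   Always       -       44100
199 UDMA_CRC_Error_Count    0x0032   200   001   000    Old_age   Always       -       8731
"""


def crc_error_count(smart_text: str):
    """Return the raw value of SMART attribute 199, or None if it is absent."""
    for line in smart_text.splitlines():
        fields = line.split()
        # Attribute rows have at least ten columns: the ID first, raw value tenth.
        if len(fields) >= 10 and fields[0] == "199":
            return int(fields[9])
    return None
```

Recording the value today and again after a few days of normal use (or a parity check) shows whether the problem is historic or still active: an unchanged number can be acknowledged, a growing one means cables, cage, or controller still need attention.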