February 16, 2016 I just changed to a different motherboard the other day and all of a sudden I'm starting to get BadCRC errors, and the log is showing "interface failure errors" on one of my drives. I've got 1 drive now showing a UDMA CRC error count of 5. I've re-seated the cable and have also tried changing to a different reverse breakout cable to my backplane. Should I blame this on a bad onboard SATA controller and send the motherboard back for replacement? I really don't want to be stuck with it if there is a chance it is bad. I still have about 20 days left for a refund. I just find this too much of a coincidence. My other board was performing with no problems and I was having no issues. It's just the onboard SATA causing problems. I've got an IBM 1015 with a bunch of other drives running from it and they are all fine. Thank you!
February 16, 2016 Community Expert If at all possible, I would switch back to the old MB and see if the problem disappears. If that is not possible, do an immediate exchange for a replacement motherboard.
February 16, 2016 Author Unfortunately my other board has already been sold. I will call Newegg and ask for a replacement MB (Supermicro X10SLH-F). Hopefully this fixes things. If not, I will try switching backplanes next. Thank you!
February 16, 2016 I had this same problem recently and went nuts trying to figure it out. For me it happened with my two WD Green data drives (both WD20EZRX). The errors were really intermittent. One thing that would always trigger them was running a check with the dynamix file integrity plugin, I'm assuming because of the nature of that workload. First I swapped all my SATA cables with brand new ones and still had the problem. Then I swapped the backplane, no help. Going further, I plugged the drives straight into the motherboard (ASRock H97 Pro4)...and no luck. The thought of RMA'ing the board was a nightmare, so I stepped away from it for a while. After more research and troubleshooting, it turned out to be a combination of NCQ, WD Green drives, and the Intel controller. The Greens just didn't like having NCQ enabled on the motherboard controller. My WD Black and Seagate were fine. I ran with NCQ disabled for a few days, no errors in the log. The write speeds were worse with NCQ disabled, so I grabbed an extra Dell PERC H310 card from work and flashed it to the LSI IT firmware to turn it into an HBA. I plugged all my drives into the card and enabled NCQ. It's coming up on 2 weeks error free this way with great performance. Long-winded, I apologize. Try messing around with the Force NCQ Disabled setting.
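For anyone who wants to confirm whether NCQ is actually in use on a given drive, here is a minimal sketch, assuming a Linux host where libata exposes the per-device queue depth at `/sys/block/<dev>/device/queue_depth` (a depth of 1 means commands are not being queued). The function names and the `sysfs_root` parameter are my own, added only so the code can be pointed at a test directory; how the Force NCQ Disabled setting is implemented internally is not something this thread confirms.

```python
from pathlib import Path


def ncq_queue_depth(device: str, sysfs_root: str = "/sys/block") -> int:
    """Return the queue depth the kernel reports for a device such as 'sdb'.

    sysfs_root is a hypothetical parameter for testing; on a real system
    the default path is where Linux publishes this value.
    """
    path = Path(sysfs_root) / device / "device" / "queue_depth"
    return int(path.read_text().strip())


def ncq_active(device: str, sysfs_root: str = "/sys/block") -> bool:
    """Treat any depth above 1 as 'NCQ in use' (assumption: SATA drive)."""
    return ncq_queue_depth(device, sysfs_root) > 1
```

Calling `ncq_active("sdb")` before and after toggling the setting is a quick way to check that the change took effect on the drive that is throwing errors.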
February 17, 2016 Author Thank you for posting that, beardedpants. So this might be just an Intel controller issue. I honestly don't think my motherboard is bad but I'm going to change it anyway since I already have a replacement on the way. Those BadCRC error warnings in the log seem to have gone away, at least for now, after I shut everything down and rebooted the server. I've only seen a couple of them so far so it's hopefully not a big deal. The WD Red 4TB drive that's showing 5 CRC errors is over a year old, so that's not really bad for its age, but I'm pretty sure all those errors happened recently after I changed my motherboard. I ran a full SMART test on the drive today and it's fine. I'm probably going to just start running LSI/IBM 1015 cards for my 24 array drives and save the onboard Intel ports for SSD cache and VM drives.
February 17, 2016 Author Got more errors last night. See the attached log file (error_message.txt). SMART shows UDMA CRC jumped from 5 to 7 on that drive. That's it!!!! I'm shutting everything down and all my array drives are going on the LSI controllers!!!
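To watch that counter without eyeballing the SMART page each time, here is a rough sketch that pulls the raw value of attribute 199 out of `smartctl -A` style text. The sample text in the usage note is illustrative only and is not the output from this drive; just the 5 and 7 come from the posts above. The function name is made up, and it assumes the usual ten-column attribute table with the raw value in the last column.

```python
from typing import Optional


def udma_crc_count(smartctl_output: str) -> Optional[int]:
    """Return the raw value of SMART attribute 199, or None if not found.

    Assumes the common `smartctl -A` table layout:
    ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    """
    for line in smartctl_output.splitlines():
        fields = line.split()
        # The header row starts with "ID#", so it never matches "199".
        if len(fields) >= 10 and fields[0] == "199":
            return int(fields[9])
    return None
```

A usage example with made-up table text:

```python
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       7
"""
previous = 5  # count reported earlier in the thread
increase = udma_crc_count(SAMPLE) - previous  # new CRC errors since then
```

The raw count never resets, so the increase between two readings is what matters, not the absolute number.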
February 17, 2016 To be clear, a BadCRC error is a communication error, subsequently retried successfully. It usually indicates an issue with the cables and connections between the drive and controller. In my experience, the most common cause is a bad SATA cable, followed by defective connectors, and lastly (and rarely) something wrong or inconsistent with the power to the drive. That means there's nothing wrong with the drive or the controller (unless it's a defective connector on the motherboard). The BadCRC flag is raised when there's a corrupted packet received by the controller from the drive. The UDMA CRC count increases when there's a corrupted packet received by the drive from the controller. It's not a problem if you are seeing one of either type perhaps once a year. It does indicate a small problem if you're seeing more than that. However, corrupted packets are detected (by the CRC calc/test), then retried until successfully sent, so there's no danger of data corruption. You could safely live with it, although performance may be impacted slightly by the delays.
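The detect-and-retry behaviour described above can be sketched in a few lines. This is only an illustration of the principle: it uses `zlib.crc32` as a stand-in checksum, whereas a real SATA link computes its own CRC in hardware, and `make_frame`, `frame_ok`, `send_with_retry` and `flaky_link` are hypothetical names invented for the example.

```python
import zlib


def make_frame(payload: bytes) -> bytes:
    """Append a 4-byte CRC to the payload (stand-in for the link-layer CRC)."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def frame_ok(frame: bytes) -> bool:
    """Recompute the CRC on arrival and compare it with the one received."""
    payload, received = frame[:-4], frame[-4:]
    return zlib.crc32(payload).to_bytes(4, "big") == received


def flaky_link(bad_frames: int):
    """Simulate a bad cable: flip one bit in the first `bad_frames` frames."""
    remaining = [bad_frames]

    def link(frame: bytes) -> bytes:
        if remaining[0] > 0:
            remaining[0] -= 1
            return bytes([frame[0] ^ 0x01]) + frame[1:]
        return frame

    return link


def send_with_retry(payload: bytes, link, max_tries: int = 5):
    """Resend until a frame arrives with a matching CRC; count the attempts."""
    for attempt in range(1, max_tries + 1):
        arrived = link(make_frame(payload))
        if frame_ok(arrived):
            return arrived[:-4], attempt
    raise IOError("link never delivered a clean frame")


data, tries = send_with_retry(b"sector data", flaky_link(2))
```

Here the payload still arrives intact, but only on the third attempt. The two failed attempts are what would show up as logged errors and a small delay, which matches the point that the data is safe while performance can suffer a little.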
February 17, 2016 Author Thanks RobJ. I have 4 drives connecting to the onboard Intel ports (4 SATA ports to SAS backplane - reverse breakout cable). Only 1 of those drives is showing problems. I've already switched to a new cable and I'm still getting errors. I will be replacing this motherboard I just got, just to rule that part out. I have moved everything over to my LSI controllers just in case. It's nice to know that this isn't going to cause any data corruption.
February 19, 2016 Author Just for the record, it was absolutely a bad SATA port on the MB. I put in a replacement board this morning and everything is working fine now.