700+ CRC Errors on Samsung 860 SSD



Hello,

 

I just built my first UNRAID box and everything appears to be going fine, except that my cache drive, a Samsung 860 250GB SSD, was getting a LOT of CRC errors, flooding my dashboard with them.

 

Attempts to fix:

 

1)  First thing I did was replace the cable.

2) Tried a new port on the motherboard (ASUSTeK COMPUTER INC. - SABERTOOTH 990FX R3.0 with an AMD FX 8350).

3) I installed a PCIe SATA card I had lying around.

None of these worked, so I went to step number 4.

4) I bought a new Samsung 860 1TB drive to rule out the drive being the problem and to give me more room.

 

Now the new drive gets the same CRC errors.

 

What can I do?

I love Samsung drives in all my Windows PCs. Is there a compatibility problem with UNRAID?

Is it a trim problem?   

Do I need to buy a network IT quality SATA card?

 

I appreciate any help.

 

Thanks,

Poppa John

 

1 hour ago, PoppaJohn said:

I love Samsung drives in all my Windows PCs. Is there a compatibility problem with UNRAID?

No, you just don't get warnings about CRC errors in Windows.

 

1 hour ago, PoppaJohn said:

Is it a trim problem?   

No.

 

1 hour ago, PoppaJohn said:

Do I need to buy a network IT quality SATA card?

No, though you should avoid Marvell-based controllers. Use ASMedia for 2 ports or LSI for more than 2.

 

99 times out of 100, CRC errors are caused by the SATA cable. Samsung SSDs are particularly picky about cables and require high-quality ones, so you're likely using less than optimal cables.


I know this may sound strange, and maybe someone will know why this works, but I seem to have (at least for now) solved the CRC errors by installing the Dynamix SSD Trim plugin. I set the SSD TRIM schedule to "Hourly" with Time of day set to "Every Hour", and now, for the first time, there are no more CRC errors.

 

 

 

 

 

13 hours ago, johnnie.black said:

Happy for you, still unrelated.

 

Let me rephrase that: almost certainly unrelated, and it wouldn't make any sense, but stranger things have happened.

 

@PoppaJohn, the reason @johnnie.black is saying this is that when data is sent to and from a hard drive over a SATA cable, it is encoded with an error-detecting code (a cyclic redundancy check, or CRC). A CRC error is triggered when that code detects a data error. (The data is resent until it is received without error.)
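To make the detect-and-resend idea concrete, here is a minimal Python sketch. It is not how a SATA controller actually frames data: it simply uses zlib's CRC-32 as the error-detecting code, and the `transfer` helper and its `bad_attempts` parameter are hypothetical, standing in for a flaky cable that corrupts the first few sends.

```python
import zlib


def frame(payload: bytes) -> bytes:
    """Append a CRC-32 of the payload (zlib's CRC-32, used here for illustration)."""
    return payload + zlib.crc32(payload).to_bytes(4, "little")


def check(received: bytes) -> bool:
    """Recompute the CRC over the payload and compare it with the trailing 4 bytes."""
    payload, trailer = received[:-4], received[-4:]
    return zlib.crc32(payload).to_bytes(4, "little") == trailer


def flip_bit(data: bytes, bit: int) -> bytes:
    """Simulate a marginal cable by flipping one bit in transit."""
    buf = bytearray(data)
    buf[bit // 8] ^= 1 << (bit % 8)
    return bytes(buf)


def transfer(payload: bytes, bad_attempts: int) -> tuple[bytes, int]:
    """Resend a frame until it arrives intact; the first `bad_attempts` sends are corrupted.

    Returns the delivered payload and the number of CRC errors seen along the way.
    """
    crc_errors = 0
    attempt = 0
    while True:
        sent = frame(payload)
        received = flip_bit(sent, 3) if attempt < bad_attempts else sent
        attempt += 1
        if check(received):
            return received[:-4], crc_errors
        # Mismatch: count it (roughly what an interface CRC error counter tallies) and resend.
        crc_errors += 1
```

For example, `transfer(b"cache write", bad_attempts=3)` returns the payload intact along with a count of 3: the data gets through in the end, but every resend costs time.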

 

As a complete aside, I have never quite figured out why CRC errors are flagged as such a big issue. They really don't endanger data unless you are getting thousands of them per second! In that case, they would significantly slow down data transfer rates and, perhaps more importantly, an error could slip through undetected, since every error-detecting code can fail when the number of errors in a data block exceeds what it can detect in a single block.
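A small sketch of how an error can slip through, assuming a toy 8-bit CRC (polynomial 0x07) rather than the 32-bit CRC used on the SATA link. A single flipped bit is always caught, but an error that happens to flip exactly the bits of the generator polynomial leaves the checksum unchanged. For random error patterns, a 32-bit CRC misses roughly 1 in 2^32 versus 1 in 2^8 for this toy, which fits the point that it takes a very high error rate before this becomes a realistic risk.

```python
def crc8(data: bytes, poly: int = 0x07) -> int:
    """Bitwise CRC-8, MSB first, initial value 0, no final XOR.

    With these settings the result is the remainder of m(x) * x^8 divided by
    g(x) = x^8 + x^2 + x + 1 (poly 0x07 plus the implicit x^8 term).
    """
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ poly) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Apply an error pattern to a message of the same length."""
    return bytes(x ^ y for x, y in zip(a, b))


message = b"cache drive data"

# One flipped bit: always caught, because g(x) has more than one term.
one_bit = xor_bytes(message, bytes(len(message) - 1) + b"\x01")

# Four flipped bits forming g(x) itself (0x01 0x07 in the last two bytes):
# the CRC of this error pattern is 0, so the corruption goes unnoticed.
four_bits = xor_bytes(message, bytes(len(message) - 2) + b"\x01\x07")

print(crc8(message) != crc8(one_bit))    # True: error detected
print(crc8(message) == crc8(four_bits))  # True: error slips through
```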

53 minutes ago, Frank1940 said:

 

As a complete aside, I have never quite figured out why CRC errors are flagged as such a big issue. They really don't endanger data unless you are getting thousands of them per second! In that case, they would significantly slow down data transfer rates and, perhaps more importantly, an error could slip through undetected, since every error-detecting code can fail when the number of errors in a data block exceeds what it can detect in a single block.

When you get a CRC error, it can sometimes take a few seconds for the system to recover from it. In terms of disk access speeds this is an eternity, so even a relatively small number of errors can hurt performance if they occur regularly.

 

@PoppaJohn: it could be worth reversing the change that you think ‘fixed’ your problem to see if the errors reappear. If they don't, then the fact that the errors stopped with this change is probably just a coincidence.
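One way to run that experiment is to snapshot the drive's CRC counter (SMART attribute 199) before and after, for example with `smartctl -A /dev/sdX`, where `/dev/sdX` is a placeholder for the cache device. The Python sketch below parses that text output; the helper names are hypothetical, and it matches on the attribute ID because the attribute name varies between drives.

```python
import re
from typing import Optional


def crc_error_count(smartctl_output: str) -> Optional[int]:
    """Return the raw value of SMART attribute 199 from `smartctl -A` text, or None."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) >= 10 and fields[0] == "199":
            match = re.match(r"\d+", fields[9])
            if match:
                return int(match.group())
    return None


def still_increasing(before: str, after: str) -> bool:
    """True if the CRC counter grew between two snapshots (these counters normally don't reset)."""
    start, end = crc_error_count(before), crc_error_count(after)
    if start is None or end is None:
        raise ValueError("attribute 199 not found in smartctl output")
    return end > start
```

Save the output once right after reverting the change and again after a day or so of normal use. If the counter stays flat with the change reverted, the earlier "fix" was most likely a coincidence.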

23 minutes ago, itimpi said:

When you get a CRC error, it can sometimes take a few seconds for the system to recover from it. In terms of disk access speeds this is an eternity, so even a relatively small number of errors can hurt performance if they occur regularly.

Thinking about this further, I can see how a considerable delay could occur, since all hard drives now use a RAM cache and the drive head would probably have to physically move back to the data area to re-read the media and resend the data. Since bad SATA cables and bad SATA connections cause virtually all CRC errors, and that condition is so easily fixed, there is little need for a quicker error recovery procedure...

  • 3 months later...
Guest and_re

I've had the same problem with the 860 EVO 250GB in my AMD system (AMD G785 mobo).
The first symptom was that I couldn't get into Windows 7, even after a clean reinstall. When I did manage to get in, the system soon became very unstable. The error was similar to the common SSD freezing: the mouse cursor could still move, but clicks did nothing.
I initially worked around it by switching from the "AMD AHCI Controller" driver to the Windows default driver, "Standard AHCI 1.0 Serial ATA Controller". After that it worked well, except that the CRC error count kept increasing. This always came with an Event ID 11 atapi error in the Windows Event Log.

Recently I found the solution on the site below, and now my SSD is working perfectly. The solution is to disable NCQ by creating a new registry value.
https://superuser.com/questions/1294158/what-may-cause-very-high-crc-errors-on-ssd-apart-from-bad-sata-cables-if-any

 

1) If you use the default Microsoft storahci driver, add this to the registry:
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\storahci\Parameters\Device]
"NcqDisabled"=dword:00000001
or
"SingleIO"=hex(7):2a,00,00,00,00,00

2) If you use the AMD SATA driver, add this instead:
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\services\amd_sata\Parameters\Device]
"AmdSataNCQDisabled"=dword:0000000F
or
"AmdSataQueueDepth"=dword:00000001
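If you'd rather apply option 1) as a .reg file than edit the registry by hand, the Python sketch below builds one. It is only a sketch: the key and value are copied from the post, the helper names and any output file name are hypothetical, and it writes UTF-16 with a BOM because that is how regedit exports files with the version 5.00 header. Back up the registry before importing anything like this.

```python
from pathlib import Path

# Key copied from option 1) in the post (default Microsoft storahci driver).
STORAHCI_KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\storahci\Parameters\Device"


def reg_file_text(key: str, name: str, dword: int) -> str:
    """Build the text of a .reg file that sets a single REG_DWORD value."""
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{key}]",
        f'"{name}"=dword:{dword:08X}',
        "",
    ]
    return "\r\n".join(lines)  # .reg files use Windows (CRLF) line endings


def write_reg_file(path: Path, text: str) -> None:
    """Write the file as UTF-16 with a BOM, matching what regedit exports."""
    with open(path, "w", encoding="utf-16", newline="") as f:
        f.write(text)
```

Usage would be `write_reg_file(Path("disable_ncq.reg"), reg_file_text(STORAHCI_KEY, "NcqDisabled", 1))`; passing the `amd_sata` key with `"AmdSataNCQDisabled"` and `0x0F` covers option 2) the same way.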

 

I think this is caused by some kind of incompatibility between AMD SATA chipsets and the Samsung 860 SSD, since the latest Samsung Magician software even turns off RAPID mode automatically when a motherboard has an AMD or ASMedia chipset.


Edited by and_re
  • 2 months later...
On 5/9/2019 at 2:18 PM, Guest and_re said:

I've had the same problem with the 860 EVO 250GB in my AMD system (AMD G785 mobo).

I'm using an AMD setup and I'm on my second Samsung 860 SSD. Changing cables, ports and drives has done nothing for the continuous CRC errors.

Do you have any tips that aren't Windows-related? I'm not using any VMs in my UnRAID setup.

  • 2 months later...

I've seen exactly the same.

I added a new Samsung 860 EVO 500GB SSD and pre-cleared it for fun.

It started getting more and more CRC errors. I'm now at around 100 after a few hours.

My setup is close to yours:
SABERTOOTH 990FX R2.0 with AMD FX 8350, 16 gig.

The Intel 300GB SSD I had before didn't have a single issue with the same data cables, power cables and motherboard port.

 

What I tried:

- swapping cables

- using an LSI SATA controller instead of the onboard one

- trying a different power rail

 

This seems like a driver issue or a more general one.

 

  • 1 month later...

Sorry for bumping this older topic, but since my problems are exactly the same, I thought it would be a waste to make another topic about this.

I recently swapped my unraid server's mobo/RAM/CPU with my desktop hardware, and the CRC errors began. It used to run fine for months on a Ryzen 1600 CPU with an X370 mobo. Now that I've switched it to an AMD 8320 on an AM3+ (Gigabyte 990FXA UD3 R5) mobo with 16GB RAM, it gives constant CRC errors. I've tried replacing the SATA cables with brand new ones, changed the cables from the PSU, and put the SSDs on different connectors on the mobo, but nothing worked.

Is there any BIOS setting or unraid config I can do to fix this, or is there no known fix as of now?

I don't have the money at the moment to get a decent SAS controller for all my drives (I'd need one capable of running 12 drives, for future expansion), so that's a no-go for now.

Does anyone have any ideas how to fix this?

Edited by FooYoungHi
typo

I am indeed using Samsung SSDs as cache drives (2x 250GB 860 EVO).

I assume there is no need to paste the diagnostics zip anymore, since there is no software fix?

 

If I were to look for a SAS controller for my setup, are there some things I need to look for, compatibility-wise? I currently have a 12-drive license, so with the 6 ports on my mobo, a 2-port SAS card (= 8 SATA drives with 'special' cables, right?) should suffice for now.

8 minutes ago, FooYoungHi said:

If I were to look for a SAS controller for my setup, are there some things I need to look for, compatibility-wise?

We recommend LSI, but note that trim doesn't work on SAS2 LSI HBAs, at least not with the latest firmware (it works with an older release). Trim works fine with SAS3 LSI models as long as the SSDs support deterministic trim, which is a requirement for all LSI HBAs. The 860 EVO does; the 850 EVO, for example, doesn't.
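On Linux, `hdparm -I /dev/sdX` lists a drive's TRIM features, so you can check for deterministic trim before buying an LSI HBA. The Python sketch below just searches that text; the function name is hypothetical, the sample string is made up in hdparm's format, and it assumes hdparm's usual wording ("Data Set Management TRIM supported", "Deterministic read data/ZEROs after TRIM"), so compare against your own drive's output. It does not cover the SAS2 firmware caveat.

```python
import re


def trim_support(hdparm_output: str) -> dict[str, bool]:
    """Summarise the TRIM-related feature lines in `hdparm -I` output.

    Assumes hdparm only prints these lines when the drive reports the feature.
    """
    text = hdparm_output.lower()
    trim = "data set management trim supported" in text
    deterministic = re.search(r"deterministic read (data|zeros) after trim", text) is not None
    return {
        "trim": trim,
        "deterministic_trim": deterministic,
        # LSI HBAs need both for TRIM to pass through.
        "ok_behind_lsi": trim and deterministic,
    }


sample = """Commands/features:
\t   *\tData Set Management TRIM supported (limit 8 blocks)
\t   *\tDeterministic read ZEROs after TRIM
"""
print(trim_support(sample))
# {'trim': True, 'deterministic_trim': True, 'ok_behind_lsi': True}
```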
