[6.9.2] UDMA CRC ERROR COUNT RAPIDLY INCREASING



Hello,

    I just installed a NetApp DS4246 disk shelf, connected to an LSI SAS 9200-8e controller with a 40G QSFP SFF-8436 to Mini SAS SFF-8088 DDR hybrid SAS cable (4 m, for NetApp). I then moved my array of 10 SATA hard drives to the shelf and installed 4 new SSDs in the main server box for new cache pools. I updated the firmware of the LSI SAS 9200-8e to 20.00.07.00. Everything was recognized by Unraid and I was able to start the array.

   After I started the array, a parity check began. Everything looked good until I was spammed with UDMA CRC error count warnings from the 10 hard drives in the DS4246. I stopped the parity check, moved the 10 hard drives back into the server box, and put the 6 SSDs into the DS4246. I also reseated all cards and cables.

   I started the system again and everything worked, with no more errors. Later that night my system ran its app backup and I got a UDMA CRC error count warning on one of the SSDs, so I started googling for a solution. The common theme in UDMA CRC error count problems seems to be connection issues or bad cards.

   I ordered a new LSI SAS 9207-8e card and it arrived today. I updated its firmware to 20.00.07.00 and used it to replace the LSI SAS 9200-8e. I moved 2 of the hard drives from my server box to the DS4246 and started a parity check to see whether that had solved the issue, but after a while the errors came back on the 2 drives in the DS4246. I switched ports on both the LSI card and the DS4246: still errors. I swapped the IOM6 modules in the DS4246: still errors. I even removed the bottom IOM6 module to see whether one of the modules was at fault.
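   To judge whether a given swap actually helped, the raw value of SMART attribute 199 (UDMA_CRC_Error_Count) can be recorded for each drive before and after a test run and the two snapshots compared. Below is a minimal Python sketch of that idea, not something taken from my system. It assumes smartmontools is installed and that `smartctl -A` prints the usual ten-column attribute table; the function names and the device paths are made up for illustration. Drives behind a SAS HBA may also need a device-type option such as `-d sat`, which this sketch leaves out.

```python
import subprocess


def parse_crc_count(smartctl_output):
    """Return the raw value of SMART attribute 199, or None if it is absent."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows have ten columns: ID, name, flag, value, worst,
        # thresh, type, updated, when_failed, raw_value.
        if len(fields) >= 10 and fields[0] == "199":
            return int(fields[9])
    return None


def read_crc_count(device):
    """Run `smartctl -A` on one device and extract the CRC error count."""
    # smartctl's exit status is a bitmask, so a non-zero code is not
    # treated as a failure here.
    result = subprocess.run(
        ["smartctl", "-A", device], capture_output=True, text=True
    )
    return parse_crc_count(result.stdout)


def crc_delta(before, after):
    """Return {device: increase} for devices whose count went up."""
    delta = {}
    for device, new_count in after.items():
        old_count = before.get(device)
        if old_count is None or new_count is None:
            continue  # no usable reading for this device in one snapshot
        if new_count > old_count:
            delta[device] = new_count - old_count
    return delta
```

   Typical use would be to build one snapshot with `{dev: read_crc_count(dev) for dev in devices}` before a parity check and another afterwards, then pass both to `crc_delta`. The raw count never resets, so only an increase between snapshots points at a link that is still producing errors.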

  I have ordered a new 2 m cable, which will arrive next week, and I have moved the 2 hard drives back into the server box. Does anyone have suggestions for things I can try to fix this issue? If I need to exchange my DS4246, I only have about 2 weeks left to do so.

 

 

 

System - Unraid 6.9.2

  Motherboard - ASUS KGPE-D16

  Processor - 2x AMD Opteron™ 6386 SE

  Memory - 128 GB

  Network main - Mellanox Technologies MT26448 10 GbE

  Network Legacy - 4 port gigabit Ethernet 

  Disk Controller 1 - SAS9211-8i

  Disk Controller 2 - SAS9207-8e paired with a NetApp DS4246 disk shelf

  

  Main Array 

  •     2 - 8 TB parity.
  •     7 - 8 TB storage.
  •     1 - 4 TB storage.

 

Cache Pools

  • Bulk_File - 2 x 1 TB SSD
  • Cache_docker - 2 x 1 TB SSD
  • Cache_VM - 2 x 1 TB SSD

 

Thanks

   The Doc.

 

I noticed a lot of "unraid kernel: w83795 1-002f: Failed to read from register 0x045" errors in the log. After searching, I found that this is a sensor error; I removed the "Dynamix System Temp" plugin and the errors went away. I am fairly sure this is unrelated to the issue above.

unraid-diagnostics-20220318-1616.zip

Edited by DocLove
added more information
  • DocLove changed the title to [6.9.2] UDMA CRC ERROR COUNT RAPIDLY INCREASING
