Missing Data / Increasing UDMA CRC Count on hard drives

ProductionPete · July 14, 2021

Hello. Over the past year I have been getting increasingly frequent reports of the UDMA CRC error count increasing on each hard drive in my server, and eventually the drives would become disabled. I started off replacing my old WD drives from a previous server with IronWolf 4TB drives, but the problem kept happening even after I had sent the drives back to Seagate for repair. After some research I found that the increasing numbers were nothing to be that worried about and could be down to poor SATA cables or even bends in the cables. I improved my cable management and rebuilt the drive that had been disabled, and all seemed fine.

Cut to yesterday, when it happened again. This time, when I rebuilt the drive, the parity check passed but the data isn't visible. Unraid shows the correct drive percentage under the 'Main' tab of the GUI, but browsing the drive by clicking the file logo gives me the message 'No Listing: Too many files'. The shares have disappeared from the 'Shares' tab and Windows cannot access them either.

The drive with the problem is drive 4 in my setup, but drive 3 (an old WD) is also giving me the same problems and creating thousands of error lines. I have one spare (brand new IronWolf) drive to swap into the box if needed, but I don't want to try this until I'm sure it will help rather than make things worse. I have attached the zip file from the diagnostics page (as per the 'Read me first' post on this forum) and downloaded all the SMART reports in case they are helpful later.
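For anyone wanting to track the CRC count across the saved SMART reports, here is a minimal Python sketch. It assumes the reports contain the usual `smartctl` attribute table, where attribute 199 is UDMA_CRC_Error_Count and the raw value is the last column. The function name `udma_crc_count` and the sample text are made up for illustration and are not taken from the attached diagnostics. The raw value is cumulative and does not reset after a cable swap, so what matters is whether it keeps rising between reports.

```python
import re

def udma_crc_count(smart_report: str):
    """Return the raw value of SMART attribute 199, or None if it is absent."""
    for line in smart_report.splitlines():
        fields = line.split()
        # Attribute rows start with the numeric ID; RAW_VALUE is the 10th column.
        if len(fields) >= 10 and fields[0] == "199":
            match = re.match(r"\d+", fields[9])
            return int(match.group()) if match else None
    return None

# Hypothetical sample in smartctl's attribute-table layout (not real drive data).
sample = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  9 Power_On_Hours          0x0032   095   095   000    Old_age   Always       -       4821
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       37
"""

print(udma_crc_count(sample))
```

Running it on two reports taken a few days apart and comparing the numbers would show whether the cabling fix actually stopped the errors.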

I am running Unraid 6.9.2 with these plugins: Recycle Bin, Unassigned Devices, Community Applications, Disk Location, Dynamic Active Streams, Dynamix SSD Trim, Fix Common Problems, Preclear Disks, Unassigned Devices Plus, unbalance, User Scripts. My hardware is an ASRock H370M-ITX, i7-8700 with 32GB RAM.

Any help is much appreciated.

fileserver1-diagnostics-20210714-1733.zip

JorgeB · July 14, 2021

Disk3 dropped offline 1 minute after you started rebuilding disk4, so the rebuilt disk4 will be mostly corrupt. If system notifications are enabled you'd have been notified about the read errors. The syslog cuts off due to log spam. If you were rebuilding on top of the old disk, all data on that disk will be gone, and there will likely be filesystem corruption preventing user shares from working. Reboot and post new diags after array start.

ProductionPete · July 14, 2021

Hi JorgeB. As grateful as I was for your response I was tearing my hair out thinking I had lost all that data. However after following your instructions and rebooting the server, all the shares are working and my data is accessible! This is amazing.

I have still posted the diagnostics file in case you can help me prevent this from happening again. Advice on best practices for storing data would certainly be appreciated: should I run two parity disks, or get a bigger case, as my Fractal Design Node 304 is pretty tight?

Regards

fileserver1-diagnostics-20210714-1936.zip

JorgeB · July 15, 2021

Data on disk4 can look OK but should still be mostly corrupt, unless the previous diags don't show the whole story.
