Ascii227

[SOLVED] Hash mismatch errors when copying to shares


Hi,

I have been trying to copy large chunks of data onto my Unraid shares. Unraid is reporting nothing wrong with the system, and the cache and all shares seem to be working OK. The issue is that when I copy files over to a share, the MD5 hashes never match those of the original files. I am using TeraCopy with the MD5 verify option enabled, but I always get hash mismatch errors after copying. If, for example, I copy video files onto an Unraid share and then watch them back, I get visual glitches and audio chirps where the data is obviously corrupt. These are not present when watching the original files.

I get the same issue whether I am copying from a separate Windows machine on the network or directly from an internal Windows 10 VM onto the share.

 

Is there anything I can do to try and troubleshoot this issue? I would really like to be able to copy large files reliably to unraid!

I have included my diagnostics just in case. Thanks in advance for any advice.

asq-nas-diagnostics-20190213-1909.zip
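
For anyone who wants to repeat the hash check without TeraCopy, here is a minimal Python sketch of the same idea: hash every file in a source folder and compare it with the copy on the share. The function names (`md5_of`, `find_mismatches`) and the two directory arguments are illustrative assumptions, not part of TeraCopy or Unraid.

```python
import hashlib
import sys
from pathlib import Path


def md5_of(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, read in chunks to keep memory use flat."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(source_dir, dest_dir):
    """Compare every file under source_dir with its copy under dest_dir.

    Returns a sorted list of relative paths that are missing from the
    destination or whose MD5 differs from the source.
    """
    source_dir, dest_dir = Path(source_dir), Path(dest_dir)
    bad = []
    for src in source_dir.rglob("*"):
        if not src.is_file():
            continue
        rel = src.relative_to(source_dir)
        dst = dest_dir / rel
        if not dst.is_file() or md5_of(src) != md5_of(dst):
            bad.append(str(rel))
    return sorted(bad)


# Hypothetical usage: python verify_copy.py <source_dir> <dest_dir>
if __name__ == "__main__" and len(sys.argv) == 3:
    for rel in find_mismatches(sys.argv[1], sys.argv[2]):
        print("MISMATCH", rel)
```

Running it several times against the same copy can also help show whether the mismatches are stable (bad data on disk) or change between runs (bad data in transit or in memory).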

1 hour ago, johnnie.black said:

Start by running memtest, bad RAM would be my first suspect.

Thanks very much for the advice. I have just run memtest86+ for the last hour and got a 100% pass rate on all tests. I then ran the original memtest86 as well, just to be thorough, and got no errors there either.

 

Is there another place I could look to track down these errors? Thanks again.


Well, this is annoying. I have the ACS override enabled to separate my GPU into its own IOMMU group for VM passthrough. Your comment about memory made me suspicious, so I rebooted with the ACS override patch disabled. So far I have copied 5 x 9 GB movies to my share and all 5 have matching MD5 hashes. Does this point to instability in the IOMMU groups on my motherboard? Is there anything I can do to investigate this further? Thanks for any advice.
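
One way to investigate further is to record how devices are grouped with and without the override and compare the two listings. The sketch below reads the standard Linux sysfs tree at `/sys/kernel/iommu_groups`; the function name `list_iommu_groups` is an illustrative assumption, and the script only reports the grouping, it does not test stability.

```python
from pathlib import Path


def list_iommu_groups(root="/sys/kernel/iommu_groups"):
    """Map each IOMMU group number to the sorted PCI addresses it contains."""
    groups = {}
    root = Path(root)
    if not root.is_dir():
        return groups  # IOMMU disabled, or not a Linux host
    for group_dir in root.iterdir():
        devices_dir = group_dir / "devices"
        if group_dir.name.isdigit() and devices_dir.is_dir():
            groups[int(group_dir.name)] = sorted(p.name for p in devices_dir.iterdir())
    return groups


if __name__ == "__main__":
    for number, devices in sorted(list_iommu_groups().items()):
        print(f"group {number}: {', '.join(devices)}")
```

Saving the output from a boot with the override and a boot without it shows which devices only end up in separate groups because of the override.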

2 hours ago, Ascii227 said:

Does this point to instability in the IOMMU groups on my motherboard?

If just disabling it solved the issue, it's very likely. You were having extreme data corruption: btrfs was complaining of checksum errors left and right, and there were even a couple of segfaults, so the issue should be easily repeatable.

7 hours ago, johnnie.black said:

If just disabling it solved the issue, it's very likely. You were having extreme data corruption: btrfs was complaining of checksum errors left and right, and there were even a couple of segfaults, so the issue should be easily repeatable.

Thanks for taking a look. I went through the syslog and saw the BTRFS checksum errors and segfaults you mentioned. I disabled the cache drive for the share, as that is where the errors seem to be, and have now copied lots of data successfully to the drives.
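
Going through a long syslog by eye is tedious, so a small counter can help when checking whether the errors come back after a change. This is a rough sketch: the two regular expressions are assumptions about how the kernel words these messages (a btrfs line containing "csum failed" and a line containing "segfault at") and may need adjusting for a given kernel version. The syslog path in the usage comment is also an assumption.

```python
import re
import sys
from collections import Counter

# Assumed message shapes; adjust if your kernel words them differently.
PATTERNS = {
    "btrfs_csum": re.compile(r"BTRFS.*csum failed", re.IGNORECASE),
    "segfault": re.compile(r"segfault at", re.IGNORECASE),
}


def count_events(lines):
    """Count how many lines match each named pattern."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


# Hypothetical usage: python scan_syslog.py /var/log/syslog
if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1], errors="replace") as handle:
        for name, total in sorted(count_events(handle).items()):
            print(f"{name}: {total}")
```

A count that stays at zero after a reboot with the changed setting and a few large copies is a reasonable sign that the change helped.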

 

However, that seems to have led me to another issue, which may have been the cause all along. I have files on my cache drive which shouldn't be there; the mover should have moved them to the array overnight. I manually invoked the mover, and my syslog suddenly filled up with BTRFS checksum errors again, with all the files still on the cache drive. Does this now point to something wrong with my cache drive, or is it still something with my motherboard?

 

It is really strange, because I have my huge Plex config folder stored on the cache drive and have no issues using Plex whatsoever. I also have my VM vDisk on there and have no problems with my VM. Only the mover seems to be having issues. Could it be that there is just some very corrupt data left on the cache drive from previously copying errored data, which the mover is having trouble dealing with? The data it is having trouble moving is some weeks old.

 

I have a brand new SSD here that I can use as a replacement cache drive, but I am not quite sure how to swap it in safely, especially since the mover doesn't seem to be working properly. Any advice on how to achieve this would be very much appreciated. Thank you again.

 

New diagnostics attached just in case.

 

asq-nas-diagnostics-20190214-0719.zip

8 minutes ago, Ascii227 said:

However, that seems to have led me to another issue, which may have been the cause all along. I have files on my cache drive which shouldn't be there; the mover should have moved them to the array overnight.

btrfs, like all other checksummed filesystems, will give an I/O error if it finds a corrupt file, so you'll know there's a problem. You need to delete those files and replace them with backups. This corruption is the result of the original problem; the difference is that xfs doesn't detect corruption and btrfs does.
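
Since a checksummed filesystem refuses to return corrupt data, one way to build the list of files to delete is simply to read every file and note which reads fail. The sketch below does that; `find_unreadable` is an illustrative name, and the `/mnt/cache` path in the usage comment is an assumption about where the cache pool is mounted.

```python
import errno
import os
import sys


def find_unreadable(root, chunk_size=1024 * 1024):
    """Read every file under root; return sorted (path, errno name) pairs for failures."""
    failures = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as handle:
                    # Read to the end so every block's checksum gets verified.
                    while handle.read(chunk_size):
                        pass
            except OSError as exc:
                code = errno.errorcode.get(exc.errno, str(exc.errno))
                failures.append((path, code))
    return sorted(failures)


# Hypothetical usage: python find_unreadable.py /mnt/cache
if __name__ == "__main__" and len(sys.argv) == 2:
    for path, code in find_unreadable(sys.argv[1]):
        print(code, path)
```

Files reported with `EIO` are the likely corruption candidates; other codes (permissions, broken links) point to unrelated problems and should not be deleted on that basis.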

31 minutes ago, johnnie.black said:

btrfs, like all other checksummed filesystems, will give an I/O error if it finds a corrupt file, so you'll know there's a problem. You need to delete those files and replace them with backups. This corruption is the result of the original problem; the difference is that xfs doesn't detect corruption and btrfs does.

Thank you so much, I think I've finally got there!

 

I removed all the errored files from the cache, and now I get no BTRFS errors when running the mover. I tested it by enabling the cache on the share and copying a 10 GB file; the MD5 checksum was fine. I then ran the mover, and the file was copied onto the array and removed from the cache. Another MD5 verification of the final file on the array was successful, so I am now pretty confident those issues have gone.

 

Thanks again for stepping me through this. In future I will be much better prepared to diagnose this kind of thing myself!
