Corrupted files


Solved by Konnor378

Hi, I ran into a problem with file corruption. I should say straight away that I'm using non-ECC RAM, with no cache and no parity disk, in case that matters. I previously thought the btrfs filesystem was to blame, but after switching to xfs and zfs I found that neither the filesystem nor the drives are the cause: the problem spreads to all drives, even a brand new cheap drive (bought specifically to check this) that had only been running for a few hours with completely clean SMART data. The built-in zfs and btrfs checks do not reveal any problems. I then tested the RAM with the built-in memtest86+, and after 25 hours it showed no errors; every pass completed successfully. But when I started the array again, I saw that the files I had downloaded just before running memtest86+ had also become corrupted. Maybe it's related to my Unraid upgrade from 6.12.4 to 6.12.6, since I first noticed the problem on 6.12.6. I don't know what to think or what the problem could be. I don't know whether all file types are affected, but .exe files definitely get corrupted sooner or later.

bigshell-diagnostics-20240106-2327.zip

Edited by Konnor378

None of the above should cause file corruption. Are you accessing the files over the network? Please check the network side too.

Locally (on Unraid), you can hash a file, e.g. b3sum <file>, and keep track of whether the hash result changes over time.
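A minimal sketch of that kind of tracking in Python 3.9+, assuming Python is available on the server or on a client with the share mounted. It swaps in SHA-256 from the standard library for BLAKE3 (b3sum's hash needs a third-party package in Python), and the function names and manifest file are illustrative only. It records one digest per file, then later lists any file whose digest has changed:

```python
import hashlib
import json
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def snapshot(root: Path) -> dict[str, str]:
    """Map every regular file under root (as a relative path) to its digest."""
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def save_manifest(root: Path, manifest: Path) -> None:
    """Record the current digests of all files under root."""
    manifest.write_text(json.dumps(snapshot(root), indent=2))


def changed_files(root: Path, manifest: Path) -> list[str]:
    """List files whose digest differs from the manifest (or that vanished)."""
    recorded = json.loads(manifest.read_text())
    current = snapshot(root)
    return [name for name, digest in recorded.items() if current.get(name) != digest]
```

Call save_manifest once right after copying the test files, then call changed_files every day or so. Keep the manifest outside the directory being checked, otherwise it gets hashed along with the test files the next time save_manifest runs.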

Edited by Vr2Io

Yes, I access all the files over the network. I have also copied corrupted files to several computers, and on all of them the files were corrupted. The checksum was also different from the original. I'll attach a screenshot of the error in case it helps find a solution. The Dynamix File Integrity plugin also marks the files as corrupted.

error.png

Edited by Konnor378
21 minutes ago, Konnor378 said:

I wrote earlier that I ran memtest86+ for 24 hours and didn't get a single error

One problem with memtest is that while a failure is a definitive result, a pass is not: it is possible to pass and still have RAM issues.

12 minutes ago, Lolight said:

OK, how many passes?

About 30 passes

10 minutes ago, itimpi said:

One problem with memtest is that while a failure is a definitive result, a pass is not: it is possible to pass and still have RAM issues.

In that case, do I need to run memtest for a few days or even a week?

2 hours ago, Konnor378 said:

In that case, do I need to run memtest for a few days or even a week?

That does not help. Sometimes the issue is not the RAM itself but something like the memory controller or the motherboard struggling under certain types of load or RAM configurations.

A good check is to see whether running with less RAM stabilizes the system.


I agree that a 24-hour memory test is already enough; in fact, I usually test for only a few hours or a few passes. I have never used ECC memory.

The problem is that not much troubleshooting information has been provided, so I don't know which direction to point you in.

When you download a file to Unraid, what are the hashes at the source and at the destination? How long does it take for a file to become corrupted? If it is a static file, it is hard to imagine why it would become corrupted later.

Please run a test with rsync: copy some files to /mnt/user/test1/

then run

rsync -ah /mnt/user/test1/ /mnt/user/test2/

Does it complete without errors?
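To go one step beyond rsync's exit status, here is a rough Python sketch (hypothetical helper names, SHA-256 as the hash) that compares every file under test1 with its copy under test2 and reports files that are missing or whose contents differ:

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def compare_trees(src: Path, dst: Path) -> dict[str, list[str]]:
    """Compare every regular file under src with its counterpart under dst."""
    report: dict[str, list[str]] = {"missing": [], "mismatch": []}
    for p in sorted(src.rglob("*")):
        if not p.is_file():
            continue
        rel = p.relative_to(src)
        q = dst / rel
        if not q.is_file():
            report["missing"].append(rel.as_posix())
        elif sha256_of(p) != sha256_of(q):
            report["mismatch"].append(rel.as_posix())
    return report
```

Usage would be compare_trees(Path("/mnt/user/test1"), Path("/mnt/user/test2")). One caveat: right after the copy, reads may be served from the page cache in RAM rather than from the disks, so it is worth repeating the comparison later (for example after a reboot) to check what is actually stored. rsync can do a similar check itself with rsync -ahcn -i /mnt/user/test1/ /mnt/user/test2/ (-c compares file checksums, -n makes it a dry run, -i lists files that would be changed).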

 

 

Edited by Vr2Io

The rsync test passed without errors. When I upload a file to the system, the checksums match, but within 1-3 days the file gets corrupted, and after that its checksum differs from the original. The files are static and are not moved anywhere; they are just stored on the disk. Before running memtest86+ I put clean test files on the disk and made sure they were intact, and after the RAM test I found those test files corrupted again, while memtest showed no errors. I also tried reseating the RAM, though in the same slot, but the files got corrupted again.
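Since you still have the original files, it may help to look at how the corrupted copy differs, not just that the checksum differs. A rough Python sketch (hypothetical names, assuming both files have the same length) that counts differing bytes and flipped bits and records where the first difference is. As a rule of thumb, not a diagnosis, isolated single-bit flips are often read as a hint towards RAM, CPU or memory-controller trouble, while long runs of changed bytes look more like a storage, controller or transfer problem:

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class DiffSummary:
    differing_bytes: int = 0
    flipped_bits: int = 0
    single_bit_bytes: int = 0           # bytes where exactly one bit differs
    first_offset: Optional[int] = None  # offset of the first differing byte


def diff_summary(original: Path, copy: Path, chunk_size: int = 1 << 20) -> DiffSummary:
    """Compare two equal-length files byte by byte and summarise the damage."""
    if original.stat().st_size != copy.stat().st_size:
        raise ValueError("files differ in length; this sketch only handles equal lengths")
    summary = DiffSummary()
    offset = 0
    with open(original, "rb") as a, open(copy, "rb") as b:
        while True:
            ca, cb = a.read(chunk_size), b.read(chunk_size)
            if not ca:
                break
            if ca != cb:  # only walk chunks that actually differ
                for i, (x, y) in enumerate(zip(ca, cb)):
                    if x != y:
                        bits = bin(x ^ y).count("1")
                        summary.differing_bytes += 1
                        summary.flipped_bits += bits
                        if bits == 1:
                            summary.single_bit_bytes += 1
                        if summary.first_offset is None:
                            summary.first_offset = offset + i
            offset += len(ca)
    return summary
```

For example, print(diff_summary(Path("original.exe"), Path("corrupted.exe"))), where both file names are placeholders.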

Edited by Konnor378

If the memory test is fine, I believe something else is causing the corruption.

As mentioned, if a static file just sits in storage, there is not much reason for it to become corrupted. Could you make it read-only and monitor whether its content changes?

Please make sure nothing else is copying or backing up to the same destination, and that nothing is being copied from a user share to a disk share or vice versa.
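For the read-only test, a small Python sketch (hypothetical names, SHA-256 as the hash) that clears the write bits on a file, records its size, mtime and digest, and later classifies what happened. The idea: a program that rewrites a file normally updates its mtime, so changed content with unchanged size and mtime points below the normal write path. Two caveats: a process running as root can still write to a file without write bits, and some tools deliberately restore mtime, so treat the result as a hint only.

```python
import hashlib
import os
import stat
from pathlib import Path

WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH


def sha256_file(path: Path) -> str:
    """SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def make_readonly_baseline(path: Path) -> dict:
    """Clear the write bits on path, then record its size, mtime and digest."""
    os.chmod(path, stat.S_IMODE(path.stat().st_mode) & ~WRITE_BITS)
    st = path.stat()
    return {"size": st.st_size, "mtime_ns": st.st_mtime_ns, "sha256": sha256_file(path)}


def check_baseline(path: Path, baseline: dict) -> str:
    """Classify what happened to path since the baseline was taken."""
    st = path.stat()
    if sha256_file(path) == baseline["sha256"]:
        return "unchanged"
    if st.st_size == baseline["size"] and st.st_mtime_ns == baseline["mtime_ns"]:
        # Contents differ, but nothing that updates size/mtime touched the file.
        return "silent change"
    return "rewritten"
```

Save the returned baseline (for example with json) and call check_baseline a day or two later.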

Edited by Vr2Io

Yes, I made one file read-only and left another one as is to compare the changes. To rule out other trivial problems, I also updated the motherboard BIOS, moved the RAM to a different slot as advised, and restricted access from other devices for the time being. We'll see how the files behave.


AMD 3000G / A320 chipset may not support ECC memory.

The table below (not for A320) also shows that ECC support differs between CPUs and chipsets.

image.png

I suspect a motherboard problem (or some other unknown cause) more than the memory.

Could you try copying some files to /tmp (a RAM disk) and then monitoring whether the file content / hash changes over time?
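A rough Python sketch of that /tmp test: it hashes the source file, copies it into a RAM-backed directory, and re-hashes the copy at a fixed interval, printing whether it still matches. The function name, interval and SHA-256 choice are assumptions for illustration, and it assumes /tmp is RAM-backed as described above.

```python
import hashlib
import shutil
import time
from pathlib import Path


def sha256_file(path: Path) -> str:
    """SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def copy_and_watch(src: Path, ram_dir: Path, rounds: int, interval_s: float) -> list[bool]:
    """Copy src into ram_dir, then re-hash the copy `rounds` times.

    Returns one flag per round: True if the copy's digest differed from
    the digest of src taken before copying.
    """
    expected = sha256_file(src)
    dst = ram_dir / src.name
    shutil.copy2(src, dst)
    results = []
    for i in range(rounds):
        if i:
            time.sleep(interval_s)
        changed = sha256_file(dst) != expected
        print(f"{time.strftime('%Y-%m-%d %H:%M:%S')} round {i}: {'CHANGED' if changed else 'ok'}")
        results.append(changed)
    return results
```

For example, copy_and_watch(Path("/mnt/user/test1/sample.exe"), Path("/tmp"), rounds=24, interval_s=3600) would check once an hour for a day (sample.exe is a placeholder). The process has to keep running for the whole period.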

 

It also doesn't quite make sense for this to be a memory problem:

- If it were a memory problem, I would expect the system to crash, not just files to get corrupted.

- If files on the array disks were being corrupted, I would also expect system files / USB files to be corrupted, and the system would crash as a result.

I once had a strange experience with a newly built platform: all tests passed and it ran well, but when I installed an NVMe drive, files copied to it were corrupted immediately, no matter which NVMe drive I used (no error logs, not even PCIe errors, and no problems on the SATA disks). Then the problem suddenly went away after a few days of troubleshooting. In the end I RMA'd the motherboard and the problem never happened again.

 

 

Edited by Vr2Io
On 1/9/2024 at 8:28 AM, Vr2Io said:

AMD 3000G / A320 chipset may not support ECC memory.

Okay. I'll try copying the test files to /tmp.

As for ECC memory, the motherboard supports unbuffered ECC, but I'm not sure about the processor. I have an ASUS PRIME A320I-K board and an AMD Athlon 3000G, and I've come across mixed information about whether this processor can work with ECC memory.

I also suspect that the Unraid image on the USB stick it boots from may be corrupted. I don't know if this is true, but this never happened on version 6.11.... I don't mean to disparage Unraid in any way, but I previously ran OMV on the same hardware and it kept my files intact, although the excessive workarounds and almost constant console work were a bit stressful.

Edited by Konnor378

I moved the file from the /tmp folder back to the disk, and it was even more broken than usual. Looks like it's the RAM after all. I'll get new ECC RAM today and see if the situation changes.


I think of myself as a fool/idiot, but I would never put multiple drives, each configured as more than one drive, in a system without ECC memory.

Of course, I lied. I did. I learned. Now, "I would never put multiple drives, each configured as more than one drive, in a system without ECC memory."

I hope you get it all sorted, Konnor378.

MrGrey.


So, I finished testing the hardware and found that it's not the components but Unraid itself. Different RAM, with ECC, did not help. I tried OMV, TrueNAS and Windows Server 2022, and none of them showed any file corruption; moreover, SMB transfer rates were higher on all of them (150-190 MB/s), while on Unraid they were around 40-50 MB/s. Maybe there is some incompatibility with Unraid, but I clearly remember that files were fine on version 6.11.... I then completely reinstalled Unraid, wiping the partition on the flash drive, installed no plugins or containers, just created shares on the drives and exported them via SMB, and a day later the files were corrupted again. I now have OMV installed, also running from the same flash drive, and the files are fine. Looks like I'll have to wait for a newer Unraid version that fixes the bug before I can go back to it.

Edited by Konnor378

I don't see how it can be Unraid itself; maybe a driver in the current kernel doesn't like your hardware. Since there's a Realtek NIC, did you try using the Realtek driver plugin? There are some known issues with those NICs, though corruption is not one of them AFAIK.

  • Solution

Yes, I installed drivers from ich777: as far as I remember, ITE, Realtek and radeon-top. Maybe the speed problem could be solved by forcing SMB3, since I also have SMB3 set as the minimum protocol in OMV; the file corruption is probably some kind of component incompatibility. I also read on another forum that Unraid doesn't work well on boards with a Marvell controller (I don't remember the details), and maybe that's my case.
