Konnor378 Posted January 6 (edited) Hi, I ran into a problem with file corruption. I should say straight away that I'm using non-ECC RAM, no cache and no parity disc, in case that affects anything. I previously thought the btrfs filesystem was to blame, but after switching to xfs and zfs I found that neither the filesystem nor the drives are at fault: the problem spreads to all drives, including new ones that have only been running for a few hours with completely clean SMART data (I bought a brand new cheap drive specifically to check). The checks built into zfs and btrfs do not reveal any problems. I then tested the RAM with the built-in memtest86+, and after 25 hours it showed no errors and every pass completed successfully. On starting the array again, I saw that files downloaded shortly before the memtest86+ run had also become corrupted. Maybe it's my Unraid upgrade from 6.12.4 to 6.12.6, as I first noticed the problem on 6.12.6. I don't know what to think or what the problem could be. I don't know whether all file types are affected, but exe files definitely get corrupted sooner or later. bigshell-diagnostics-20240106-2327.zip Edited January 6 by Konnor378
Vr2Io Posted January 6 (edited) None of the above is the reason for the file corruption. Are you accessing the files through the network? Please check the network side too. Locally (on Unraid), you can hash a file, e.g. b3sum <file>, and keep track of whether the hash result ever changes. Edited January 6 by Vr2Io
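The hashing suggestion above can be extended from a single file to a whole folder. The following is only a minimal sketch under stated assumptions: it uses sha256sum from GNU coreutils instead of b3sum (either should work for detecting changes), and the function names and paths are hypothetical, not something from this thread.

```shell
# Sketch: record a checksum baseline for a folder and re-check it later.
# Assumptions: GNU find/sort/xargs/sha256sum are available; sha256sum stands
# in for b3sum; function names and paths are made up for illustration.

# make_baseline DIR MANIFEST
# Hash every regular file under DIR and write "hash  path" lines to MANIFEST.
# Keep MANIFEST outside DIR so it does not end up hashing itself, and pass an
# absolute DIR so the later check works from any working directory.
make_baseline() {
    local dir=$1 manifest=$2
    # -print0 / -z / -0 keep file names containing spaces intact.
    find "$dir" -type f -print0 | sort -z | xargs -0 -r sha256sum > "$manifest"
}

# check_baseline MANIFEST
# Re-hash every file listed in MANIFEST. Prints only the files whose hash no
# longer matches and returns non-zero if there is at least one mismatch.
check_baseline() {
    local manifest=$1
    sha256sum --check --quiet "$manifest"
}

# Example (hypothetical paths):
#   make_baseline /mnt/user/test1 /boot/test1.sha256
#   ...wait a day or two...
#   check_baseline /boot/test1.sha256 && echo "all files unchanged"
```

Running the check at intervals shows which files changed and roughly when, which is more informative than spotting a broken exe by chance.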
Konnor378 (Author) Posted January 6 (edited) 36 minutes ago, Vr2Io said: "None of the above is the reason for the file corruption. Are you accessing the files through the network? Please check the network side too. Locally (on Unraid), you can hash a file, e.g. b3sum <file>, and keep track of whether the hash result ever changes." Yes, I access all the files over the network. I have also copied corrupted files to several computers, and they were corrupted in every case. The checksum also differed from the original one. I'll attach a screenshot of the error if that helps with the solution. The Dynamix File Integrity plugin also marks the files as corrupted. Edited January 6 by Konnor378
Konnor378 (Author) Posted January 7 3 hours ago, JorgeB said: "Start by running memtest." I wrote earlier that I ran memtest86+ for 24 hours and didn't get a single error.
Lolight Posted January 7 19 minutes ago, Konnor378 said: "I wrote earlier that I ran memtest86+ for 24 hours and didn't get a single error." OK, how many passes?
itimpi Posted January 7 21 minutes ago, Konnor378 said: "I wrote earlier that I ran memtest86+ for 24 hours and didn't get a single error." One problem with memtest is that although a failure is a definitive result, a pass is not, so it is possible to pass and still have RAM issues.
Konnor378 (Author) Posted January 7 12 minutes ago, Lolight said: "OK, how many passes?" About 30 passes. 10 minutes ago, itimpi said: "One problem with memtest is that although a failure is a definitive result, a pass is not, so it is possible to pass and still have RAM issues." In that case, do I need to run memtest for a few days or even a week?
itimpi Posted January 7 2 hours ago, Konnor378 said: "In that case, do I need to run memtest for a few days or even a week?" That does not help. Sometimes the issue is not the RAM directly but something like the memory controller on the motherboard struggling under certain types of load or with certain RAM configurations. A good check is to see whether running with less RAM stabilizes the system.
Vr2Io Posted January 7 (edited) I agree that a 24-hour memory test is already enough; in fact, I usually test for only a few hours or a few passes. I have never had ECC memory. The problem is that not much troubleshooting info has been provided, so I don't know which direction to suggest for a fix. When you download a file to Unraid, what are the hashes at the source and at the destination? How long does it take for the file to become corrupted? If it is a static file, it is hard to imagine why it would become corrupted later. Please run a test with rsync: copy some files to /mnt/user/test1/, then run rsync -ah /mnt/user/test1/ /mnt/user/test2/ Does it succeed without error? Edited January 7 by Vr2Io
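A clean exit from rsync only says the transfer finished without an error; it does not by itself confirm that the two copies hold identical bytes. As a rough sketch (the function name is hypothetical, and diff -rq is a stand-in chosen here, not something proposed in the thread), the two test folders could be compared after the copy:

```shell
# Sketch: confirm that a copied tree matches its source byte for byte.
# Assumption: GNU diffutils is available; verify_copy is a made-up name.

# verify_copy SRC DST
# Walk both trees and print the name of every file that differs or exists
# on one side only. Returns 0 only when the trees are identical.
verify_copy() {
    local src=$1 dst=$2
    diff -rq "$src" "$dst"
}

# Example with the paths from the post above:
#   rsync -ah /mnt/user/test1/ /mnt/user/test2/
#   verify_copy /mnt/user/test1 /mnt/user/test2 && echo "copies identical"
#
# A comparison made right after the copy may be served from the page cache
# in RAM rather than from the disks. To force a re-read (as root), the Linux
# caches can be dropped first:
#   sync; echo 3 > /proc/sys/vm/drop_caches
```

Repeating the comparison a day or two later, once cached data is long gone, would show whether the copy on disk has drifted from the source.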
Lolight Posted January 7 37 minutes ago, itimpi said: "A good check is to see if running with less RAM stabilizes the system." It looks like he runs only one stick. In that case @Konnor378 should try moving it to another RAM socket. If nothing changes, then try another RAM stick.
Konnor378 (Author) Posted January 7 (edited) 1 hour ago, Vr2Io said: "When you download a file to Unraid, what are the hashes at the source and at the destination? How long does it take for the file to become corrupted? If it is a static file, it is hard to imagine why it would become corrupted later. Please run a test with rsync: copy some files to /mnt/user/test1/, then run rsync -ah /mnt/user/test1/ /mnt/user/test2/ Does it succeed without error?" The rsync test passed without errors. When I upload a file to the system, the checksums match, but within 1-3 days the file gets corrupted, and after that the checksum differs from the original. The files are static and are not moved anywhere, just stored on the disc. Before I ran memtest86+ I put clean test files on the disc and made sure they were intact; after the RAM check I found those test files corrupted again, even though memtest showed no errors. I also tried reseating the RAM, though in the same slot, but got corrupted files again. Edited January 7 by Konnor378
Konnor378 (Author) Posted January 7 1 hour ago, Lolight said: "It looks like he runs only one stick. In that case @Konnor378 should try to change to another RAM socket. If no change then try another RAM stick." I will try moving the RAM to a different slot. If anything changes, I will let you know.
Vr2Io Posted January 7 (edited) If the memory test is fine, I believe some other issue is causing the corruption. As mentioned, if a static file just sits in storage, there is not much reason for it to become corrupted. Could you make it read-only and monitor whether its content changes? Please make sure no one copies or backs it up to the same destination, and that nothing is copied from a user share to a disk share or vice versa. Edited January 7 by Vr2Io
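The read-only test above can be paired with a timestamped hash log, which narrows down when the content changes instead of only showing that it did. This is a sketch under assumptions: sha256sum is used for hashing, the function names and example paths are hypothetical, and the script would need to be run periodically (for example from cron).

```shell
# Sketch: log a file's hash over time to see when (or whether) it changes.
# Assumptions: GNU coreutils and awk are available; function names and the
# example paths are made up for illustration.

# log_hash FILE LOG
# Append one line "UTC-timestamp  hash  path" to LOG. Returns non-zero if
# FILE cannot be read.
log_hash() {
    local file=$1 log=$2 sum
    sum=$(sha256sum < "$file") || return 1
    # sha256sum prints "hash  -" when reading stdin; keep only the hash.
    printf '%s  %s  %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "${sum%% *}" "$file" >> "$log"
}

# distinct_hashes LOG
# Print how many different hashes appear in LOG. Anything above 1 means the
# content changed at some point between two log entries.
distinct_hashes() {
    awk '{print $2}' "$1" | sort -u | wc -l
}

# Example (hypothetical paths):
#   chmod a-w /mnt/user/test1/sample.exe     # remove write permission
#   log_hash /mnt/user/test1/sample.exe /boot/sample-hash.log
#   distinct_hashes /boot/sample-hash.log
```

If the hash changes while the file is read-only and nothing has opened it for writing, that points away from an application or user modifying the file.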
Konnor378 (Author) Posted January 7 53 minutes ago, Vr2Io said: "If the memory test is fine, I believe some other issue is causing the corruption. As mentioned, if a static file just sits in storage, there is not much reason for it to become corrupted. Could you make it read-only and monitor whether its content changes? Please make sure no one copies or backs it up to the same destination, and that nothing is copied from a user share to a disk share or vice versa." Yes, I set one file to read-only and left another as is so I can compare them. To rule out other trivial problems, I updated the BIOS on the motherboard and, following the advice, moved the RAM to a different slot and restricted access from other devices for the time being. We'll see how the files behave.
Konnor378 (Author) Posted January 8 The problem still exists. I'm going to buy another RAM stick, this time with ECC, tomorrow. I will report back in a few days, after the tests.
Vr2Io Posted January 9 (edited) The AMD 3000G / A320 chipset may not support ECC memory. The table below (not for A320) also indicates that different CPUs with different chipsets have different ECC support. I believe it is a motherboard problem (or some other unknown cause) rather than memory. Could you try copying some files to /tmp (RAM disk) and then monitor whether the file content / hash changes over time? It also doesn't make sense for this to be a memory problem:
- If it were a memory problem, the system would crash, not just corrupt files.
- If files on the array disks were corrupted, you would expect system files / files on the USB drive to be corrupted too, and as a result the system would also crash.
I had a strange experience with a newly built platform: all tests were fine and it ran well, but when I inserted an NVMe drive, files copied to it were corrupted immediately, regardless of which NVMe drive I used (no error log at all, not even a PCIe error, and no problem on SATA disks). The problem suddenly went away after a few days of troubleshooting. In the end I RMA'd the motherboard and the problem never happened again. Edited January 9 by Vr2Io
Konnor378 (Author) Posted January 11 (edited) On 1/9/2024 at 8:28 AM, Vr2Io said: "The AMD 3000G / A320 chipset may not support ECC memory. [...] Could you try copying some files to /tmp (RAM disk) and then monitor whether the file content / hash changes over time?" Okay, I'll try copying the test files to /tmp. As for ECC memory, the motherboard supports unbuffered ECC, but I'm not sure about the processor. I have an ASUS Prime A320I-K board and an AMD Athlon 3000G processor, and I've come across mixed information about whether that processor can work with ECC memory. I also suspect that the image on the USB stick that Unraid boots from may be corrupted. I don't know if this is true, but on version 6.11.... this never happened. I don't mean to disparage Unraid in any way, but I previously ran OMV on the same hardware and it kept my files intact; the excessive workarounds and almost constant console work were a bit stressful, though. Edited January 11 by Konnor378
Konnor378 (Author) Posted January 12 On 1/9/2024 at 8:28 AM, Vr2Io said: "Could you try copying some files to /tmp (RAM disk) and then monitor whether the file content / hash changes over time?" I moved the file from the /tmp folder back to the disc, and it was even more broken than usual. Looks like it's the RAM after all. I'll get new ECC RAM today and see if the situation changes.
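"More broken than usual" can be put into numbers by comparing a corrupted file against a known-good copy byte by byte. The sketch below uses cmp -l for that; the function names and paths are hypothetical. Reading the result is an assumption rather than a rule from this thread: a few isolated bytes versus long contiguous damaged runs is at best a rough hint about where the damage originates, not a diagnosis.

```shell
# Sketch: measure how badly a file is corrupted relative to a good copy.
# Assumptions: cmp (diffutils) and awk are available; the function names and
# example paths are made up for illustration.

# count_diff_bytes GOOD BAD
# Print the number of byte positions at which the two files differ.
# cmp -l prints one line per differing byte: the 1-based offset, then the
# two byte values in octal.
count_diff_bytes() {
    cmp -l "$1" "$2" 2>/dev/null | wc -l
}

# first_diff_offset GOOD BAD
# Print the 1-based offset of the first differing byte, or nothing if the
# files are identical.
first_diff_offset() {
    cmp -l "$1" "$2" 2>/dev/null | awk 'NR == 1 { print $1; exit }'
}

# Example (hypothetical paths):
#   count_diff_bytes /backup/sample.exe /mnt/user/test1/sample.exe
#   first_diff_offset /backup/sample.exe /mnt/user/test1/sample.exe
```

Comparing the counts for the copy kept in /tmp and the copy kept on the array would show whether both locations degrade in a similar way.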
MrGrey Posted January 12 I think of myself as a fool/idiot, but I would never put multiple drives, configured as more than one drive each, in a system without ECC memory. Of course, I lied. I did. I learned. Now, "I would never put multiple drives, configured as more than one drive each, in a system without ECC memory." I hope you get it all sorted, Konnor378. MrGrey.
Konnor378 (Author) Posted January 13 (edited) In short, the problem remains with a different RAM module with ECC. I will try another OS to rule out an incompatibility between my components and Unraid. Edited January 13 by Konnor378
Konnor378 (Author) Posted January 16 (edited) On 1/9/2024 at 8:28 AM, Vr2Io said: "I believe it is a motherboard problem (or some other unknown cause) rather than memory." So, I finished testing the hardware and found that the problem is not the components but Unraid itself. Another RAM module with ECC did not help. I tried OMV, TrueNAS and Windows Server 2022, and none of them showed any file corruption. Moreover, the SMB transfer rate was higher on all of those systems (150-190 MB/s) than on Unraid (around 40-50 MB/s). Maybe there is some incompatibility with Unraid, but I clearly remember that the files were fine on version 6.11.... I then completely reinstalled Unraid, wiping the partitions on the flash drive, installed no plugins or containers, just created shared folders on the drives and shared them via SMB, and a day later the files were corrupted again. I now have OMV installed, booting from the same flash drive, and the files are fine. Looks like I'll have to wait for a newer version of Unraid with the bug fixed before I can come back to it. Edited January 16 by Konnor378
Vr2Io Posted January 16 Strange, all my builds are on the latest version and have no problem.
JorgeB Posted January 16 I don't see how it can be Unraid; maybe a driver in the current kernel doesn't like your hardware. Since there's a Realtek NIC, did you try using the Realtek driver plugin? There are some known issues with those NICs, though corruption is not one of them AFAIK.
Solution: Konnor378 (Author) Posted January 16 4 hours ago, JorgeB said: "I don't see how it can be Unraid; maybe a driver in the current kernel doesn't like your hardware. Since there's a Realtek NIC, did you try using the Realtek driver plugin? There are some known issues with those NICs, though corruption is not one of them AFAIK." Yes, I installed the drivers from ich777: as far as I remember, ITE, Realtek and radeon-top. Maybe the speed problem can be solved by forcing SMB3, because I also have SMB3 set as the minimum protocol in OMV, and the file corruption is probably some kind of component incompatibility. I also read on another forum that Unraid doesn't work well on boards with a Marvell controller (I don't remember the details), and maybe that is my case.