August 27, 2023 (2 yr)

I have been getting frequent occurrences of my docker.img file becoming corrupt. There are no other drive or filesystem errors, and SMART tests do not report any errors.

I don't know if the constant problem is my cache drive, docker, Unraid, or ZFS.

My guess: My cache drive is ZFS. The docker file is a BTRFS vDisk. Maybe there's a conflict?

    # zpool status -xv
      pool: cache
     state: ONLINE
    status: One or more devices has experienced an error resulting in data
            corruption.  Applications may be affected.
    action: Restore the file in question if possible.  Otherwise restore the
            entire pool from backup.
       see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
      scan: scrub repaired 0B in 00:15:38 with 3 errors on Sun Aug 27 12:39:33 2023
    config:

            NAME        STATE     READ WRITE CKSUM
            cache       ONLINE       0     0     0
              sdi1      ONLINE       0     0   128

    errors: Permanent errors have been detected in the following files:

            /mnt/cache/system/docker/docker.img

tower-diagnostics-20230827-1425.zip
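For anyone who wants to watch for this kind of report automatically, here is a minimal sketch that reads `zpool status` text and pulls out the devices with non-zero error counters plus the entries listed as permanently damaged. It is not an official tool: the function name `summarize` and the `SAMPLE` text (abridged from the output above) are made up for illustration, it assumes a single pool, and it assumes plain integer counters (real output can abbreviate large counts, e.g. with a `K` suffix, which this sketch does not handle).

```python
import re

# Abridged from the `zpool status -xv` output posted above.
SAMPLE = """\
  pool: cache
 state: ONLINE
  scan: scrub repaired 0B in 00:15:38 with 3 errors on Sun Aug 27 12:39:33 2023
config:

        NAME        STATE     READ WRITE CKSUM
        cache       ONLINE       0     0     0
          sdi1      ONLINE       0     0   128

errors: Permanent errors have been detected in the following files:

        /mnt/cache/system/docker/docker.img
"""

# A config row: device name, state, then READ / WRITE / CKSUM as integers.
ROW = re.compile(r"^\s*(\S+)\s+(\S+)\s+(\d+)\s+(\d+)\s+(\d+)\b")


def summarize(text):
    """Return (devices with non-zero counters, entries listed as damaged)."""
    bad_devices = {}
    damaged = []
    in_errors = False
    for line in text.splitlines():
        if line.startswith("errors:"):
            # Everything after this header is treated as a damaged entry.
            in_errors = True
            continue
        if in_errors:
            if line.strip():
                damaged.append(line.strip())
            continue
        match = ROW.match(line)
        if match:
            name, _state, read, write, cksum = match.groups()
            counts = (int(read), int(write), int(cksum))
            if any(counts):
                bad_devices[name] = counts
    return bad_devices, damaged


if __name__ == "__main__":
    devices, files = summarize(SAMPLE)
    print(devices)
    print(files)
```

In practice the text would come from running `zpool status -v` on the server rather than from a pasted sample.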
August 27, 2023 (2 yr) · Community Expert

17 minutes ago, Jaybau said:
> My cache drive is ZFS. The docker file is BTRFS vDisk. Maybe there's a conflict?

That's fine, I would start by running memtest.
August 28, 2023 (2 yr) · Author

6 hours ago, JorgeB said:
> That's fine, I would start by running memtest.

memtest = 1 pass, 0 errors.

Edited August 28, 2023 by Jaybau
December 31, 2024 (1 yr)

Did you ever solve this? I have the same issue happening.

memtest has passed multiple times. I check monthly, and although I have only been running Unraid for one month, I have run it at least 3 times and every run passed.

    root@NAS2:~# zpool status cache -v
      pool: cache
     state: ONLINE
      scan: scrub repaired 0B in 00:00:29 with 1 errors on Tue Dec 31 10:16:17 2024
    config:

            NAME           STATE     READ WRITE CKSUM
            cache          ONLINE       0     0     0
              mirror-0     ONLINE       0     0     0
                nvme0n1p1  ONLINE       0     0     4
                nvme1n1p1  ONLINE       0     0     4

    errors: Permanent errors have been detected in the following files:

            cache/domains:<0x4>

    root@NAS2:~# zpool status cache -v
      pool: cache
     state: ONLINE
      scan: scrub repaired 0B in 00:00:29 with 1 errors on Tue Dec 31 10:16:17 2024
    config:

            NAME           STATE     READ WRITE CKSUM
            cache          ONLINE       0     0     0
              mirror-0     ONLINE       0     0     0
                nvme0n1p1  ONLINE       0     0     7
                nvme1n1p1  ONLINE       0     0     7

    errors: Permanent errors have been detected in the following files:

            /mnt/cache/system/docker/docker.img
            cache/domains:<0x4>

As you can see, docker.img pops up in between the two command runs for some reason, without any change in the docker containers. The <0x4> entry shows because I deleted that file earlier today, as it was irrelevant, but now the same is happening to docker.img.

I am having similar issues on other drives as well; all ZFS/BTRFS filesystems are showing corruption.
16TBx2 MIRROR Seagate Exos x18

    root@NAS2:~# zpool status data -v
      pool: data
     state: ONLINE
      scan: scrub in progress since Tue Dec 31 10:00:33 2024
            2.21T / 4.78T scanned at 504M/s, 972G / 4.78T issued at 216M/s
            256K repaired, 19.87% done, 05:09:15 to go
    config:

            NAME        STATE     READ WRITE CKSUM
            data        ONLINE       0     0     0
              mirror-0  ONLINE       0     0     0
                sdc1    ONLINE       0     0     7  (repairing)
                sde1    ONLINE       0     0     7  (repairing)

    errors: Permanent errors have been detected in the following files:

            /mnt/data/Backup/2024-12-23.zip
            /mnt/data/PlexData/Series/...1080p.mkv

8TBx2 MIRROR IronWolf

    UUID:             9920b33f-88eb-49cf-a2d7-784bb99eaab9
    Scrub started:    Mon Dec 30 03:03:45 2024
    Status:           finished
    Duration:         16:03:12
    Total to scrub:   13.85TiB
    Rate:             251.06MiB/s
    Error summary:    csum=2
      Corrected:      0
      Uncorrectable:  2
      Unverified:     0

It shouldn't be related to the SSDs/drives, as they were running fine without any issues in another system. All drive extended SMART tests are passing, HDDScan ERASE passed, and the drives were newly formatted afterwards.

Running 7.0.0-rc.2
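The two `zpool status cache -v` runs in this post differ only in the CKSUM counters and in the list of damaged entries. A rough sketch of how two such snapshots could be compared mechanically is below. The helper names `parse` and `compare` and the `FIRST` / `SECOND` samples (abridged from the output in this post) are hypothetical, and the parser assumes a single pool with plain integer counters.

```python
import re

# A config row: device name, state, then READ / WRITE / CKSUM as integers.
ROW = re.compile(r"^\s*(\S+)\s+(\S+)\s+(\d+)\s+(\d+)\s+(\d+)\b")


def parse(text):
    """Return ({device: CKSUM count}, set of damaged entries) for one pool."""
    cksum, damaged, in_errors = {}, set(), False
    for line in text.splitlines():
        if line.startswith("errors:"):
            in_errors = True
        elif in_errors:
            if line.strip():
                damaged.add(line.strip())
        else:
            match = ROW.match(line)
            if match:
                cksum[match.group(1)] = int(match.group(5))
    return cksum, damaged


def compare(before, after):
    """Return (CKSUM increase per device, newly listed damaged entries)."""
    c0, d0 = parse(before)
    c1, d1 = parse(after)
    grew = {dev: n - c0.get(dev, 0) for dev, n in c1.items() if n > c0.get(dev, 0)}
    return grew, sorted(d1 - d0)


# Abridged from the two runs posted above.
FIRST = """\
config:

        NAME           STATE     READ WRITE CKSUM
        cache          ONLINE       0     0     0
          mirror-0     ONLINE       0     0     0
            nvme0n1p1  ONLINE       0     0     4
            nvme1n1p1  ONLINE       0     0     4

errors: Permanent errors have been detected in the following files:

        cache/domains:<0x4>
"""

SECOND = """\
config:

        NAME           STATE     READ WRITE CKSUM
        cache          ONLINE       0     0     0
          mirror-0     ONLINE       0     0     0
            nvme0n1p1  ONLINE       0     0     7
            nvme1n1p1  ONLINE       0     0     7

errors: Permanent errors have been detected in the following files:

        /mnt/cache/system/docker/docker.img
        cache/domains:<0x4>
"""

if __name__ == "__main__":
    grew, new_entries = compare(FIRST, SECOND)
    print(grew)
    print(new_entries)
```

On these samples both mirror members gain 3 checksum errors and docker.img is the only new entry. Identical counts on both sides of a mirror are what you would expect if bad data was written to both devices, rather than one drive failing on its own.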
December 31, 2024 (1 yr) · Community Expert

5 hours ago, hhhhh said:
> memtest passed multiple times,

memtest is only definitive if it finds errors. If you have multiple sticks, try using the server with just one; if the problem is the same, try with a different one. That will basically rule out bad RAM, which is the #1 reason for data corruption.
January 5, 2025 (1 yr)

This morning the server was unresponsive on the web UI. I tried to log in locally, which was very laggy and ended with "login: timed out after 60 seconds" after entering the username, the exact behavior mentioned in the following post:

Because of these stability issues and the recent checksum issues, I ran a memtest after a reboot, which failed when using 2x16GB. I then tried each RAM stick on its own (1x16GB) in either slot when running memtest, and both slots/sticks failed.

Both memory sticks somehow started failing in the last month. That is strange, as this memory was working fine in the same system running Windows prior to using Unraid. I performed a memtest after deciding to go with ZFS, which passed in mid-December. This system ran fine for 2 years.

Either way, I purchased a new 1x32GB stick and it shows no memtest errors. I will keep monitoring, and hope the faulty memory caused the issues I experienced.

I did notice the CPU was getting quite hot due to some scheduled maintenance/backup tasks between 2 and 3:25 AM, which I believe is when Unraid got stuck. HA (running on Unraid) kept collecting data from the external power monitor, but no longer from Unraid, until, I suppose, even docker got stuck, and no data was stored after 6 AM.

I've changed my fan settings to keep the CPU temperature lower under full load, and also improved ventilation so that the memory and cache drives run less hot. Hoping these changes will solve the issues with system instability.

Edited January 5, 2025 by hhhhh