Docker.img corruption (again)

August 27, 2023

My docker.img file keeps becoming corrupt.

There are no other drive or filesystem errors, and SMART tests report no errors.

I don't know if the constant problem is my cache drive, docker, Unraid, or ZFS.

My guess:

My cache drive is ZFS. The docker file is BTRFS vDisk. Maybe there's a conflict?

# zpool status -xv
  pool: cache
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub repaired 0B in 00:15:38 with 3 errors on Sun Aug 27 12:39:33 2023
config:

        NAME        STATE     READ WRITE CKSUM
        cache       ONLINE       0     0     0
          sdi1      ONLINE       0     0   128

errors: Permanent errors have been detected in the following files:

        /mnt/cache/system/docker/docker.img
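
For anyone who wants to track these counters over time rather than reading the output by eye, here is a minimal Python sketch. It assumes the output keeps the layout shown above and that the counters are plain integers (abbreviated values such as "1.2K" are skipped). The name `parse_zpool_status` and the trimmed `SAMPLE` text are illustrative only and not part of any ZFS tooling.

```python
import re

# Trimmed copy of the `zpool status -xv` output shown above.
SAMPLE = """\
  pool: cache
 state: ONLINE
config:

        NAME        STATE     READ WRITE CKSUM
        cache       ONLINE       0     0     0
          sdi1      ONLINE       0     0   128

errors: Permanent errors have been detected in the following files:

        /mnt/cache/system/docker/docker.img
"""

# A device row: name, state, then the READ, WRITE and CKSUM counters.
ROW = re.compile(r"^\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)")


def parse_zpool_status(text):
    """Return ({device: cksum_count}, [entries listed under 'errors:'])."""
    cksum = {}
    damaged = []
    section = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("NAME") and "CKSUM" in stripped:
            section = "config"   # device rows follow this header
            continue
        if stripped.startswith("errors:"):
            section = "errors"   # damaged files follow this line
            continue
        if not stripped:
            continue
        if section == "config":
            match = ROW.match(line)
            # Only plain integer counters are handled in this sketch.
            if match and match.group(5).isdigit():
                cksum[match.group(1)] = int(match.group(5))
        elif section == "errors":
            damaged.append(stripped)
    return cksum, damaged


if __name__ == "__main__":
    counts, files = parse_zpool_status(SAMPLE)
    print(counts)
    print(files)
```

In practice the text would come from running `zpool status -v` yourself; the sketch deliberately works on a string so it can be tried without a pool.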

tower-diagnostics-20230827-1425.zip


August 27, 2023

Community Expert

17 minutes ago, Jaybau said:

My cache drive is ZFS. The docker file is BTRFS vDisk. Maybe there's a conflict?

That's fine, I would start by running memtest.


August 28, 2023

Author

6 hours ago, JorgeB said:

That's fine, I would start by running memtest.

memtest = 1 pass, 0 errors.

Edited August 28, 2023 by Jaybau


December 31, 2024

Did you ever solve this? I have the same issue happening.

memtest passed multiple times, checked monthly. Although I have only been running Unraid for one month, I have run it at least 3 times and every run passed.

root@NAS2:~# zpool status cache -v
  pool: cache
 state: ONLINE
  scan: scrub repaired 0B in 00:00:29 with 1 errors on Tue Dec 31 10:16:17 2024
config:
        NAME           STATE     READ WRITE CKSUM
        cache          ONLINE       0     0     0
          mirror-0     ONLINE       0     0     0
            nvme0n1p1  ONLINE       0     0     4
            nvme1n1p1  ONLINE       0     0     4
errors: Permanent errors have been detected in the following files:
        cache/domains:<0x4>
        
root@NAS2:~# zpool status cache -v
  pool: cache
 state: ONLINE
  scan: scrub repaired 0B in 00:00:29 with 1 errors on Tue Dec 31 10:16:17 2024
config:
        NAME           STATE     READ WRITE CKSUM
        cache          ONLINE       0     0     0
          mirror-0     ONLINE       0     0     0
            nvme0n1p1  ONLINE       0     0     7
            nvme1n1p1  ONLINE       0     0     7
errors: Permanent errors have been detected in the following files:
        /mnt/cache/system/docker/docker.img
        cache/domains:<0x4>

As you can see, docker.img appears between the two command runs for some reason, without any change to the Docker containers.
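
The change between the two runs can also be checked mechanically. Below is a minimal Python sketch with the CKSUM counters and error entries typed in by hand from the two outputs above; `diff_runs` is a hypothetical helper name, not an existing tool.

```python
def diff_runs(before, after):
    """Compare two (cksum_counts, damaged_entries) snapshots.

    Returns the devices whose CKSUM counter grew, as
    {device: (old, new)}, and the entries that are new in `after`.
    """
    counts_before, entries_before = before
    counts_after, entries_after = after
    grew = {
        dev: (counts_before.get(dev, 0), new)
        for dev, new in counts_after.items()
        if new > counts_before.get(dev, 0)
    }
    new_entries = [e for e in entries_after if e not in entries_before]
    return grew, new_entries


# Values copied by hand from the two `zpool status cache -v` runs above.
first_run = (
    {"nvme0n1p1": 4, "nvme1n1p1": 4},
    ["cache/domains:<0x4>"],
)
second_run = (
    {"nvme0n1p1": 7, "nvme1n1p1": 7},
    ["/mnt/cache/system/docker/docker.img", "cache/domains:<0x4>"],
)

if __name__ == "__main__":
    grew, new_entries = diff_runs(first_run, second_run)
    print(grew)
    print(new_entries)
```

Both mirror members go from 4 to 7 together, and docker.img is the only new entry.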

The 0x4 entry shows because I deleted that file earlier today, as it was irrelevant, but now the same thing is happening to docker.img.

I am having similar issues on other drives as well; all of my ZFS and BTRFS pools are showing corruption.

16TBx2 MIRROR Seagate Exos x18

root@NAS2:~# zpool status data -v
  pool: data
 state: ONLINE
  scan: scrub in progress since Tue Dec 31 10:00:33 2024
        2.21T / 4.78T scanned at 504M/s, 972G / 4.78T issued at 216M/s
        256K repaired, 19.87% done, 05:09:15 to go
config:

        NAME        STATE     READ WRITE CKSUM
        data        ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            sdc1    ONLINE       0     0     7  (repairing)
            sde1    ONLINE       0     0     7  (repairing)

errors: Permanent errors have been detected in the following files:

        /mnt/data/Backup/2024-12-23.zip
        /mnt/data/PlexData/Series/...1080p.mkv

8TBx2 MIRROR IronWolf

UUID:             9920b33f-88eb-49cf-a2d7-784bb99eaab9
Scrub started:    Mon Dec 30 03:03:45 2024
Status:           finished
Duration:         16:03:12
Total to scrub:   13.85TiB
Rate:             251.06MiB/s
Error summary:    csum=2
  Corrected:      0
  Uncorrectable:  2
  Unverified:     0
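
If you want to alert on a summary like the one above, a small Python sketch follows. It assumes the "key: value" layout shown here stays the same; `parse_scrub_summary` and `has_uncorrectable` are illustrative names, not part of btrfs-progs.

```python
# Copy of the scrub summary shown above.
SAMPLE = """\
UUID:             9920b33f-88eb-49cf-a2d7-784bb99eaab9
Scrub started:    Mon Dec 30 03:03:45 2024
Status:           finished
Duration:         16:03:12
Total to scrub:   13.85TiB
Rate:             251.06MiB/s
Error summary:    csum=2
  Corrected:      0
  Uncorrectable:  2
  Unverified:     0
"""


def parse_scrub_summary(text):
    """Split each 'key: value' line on the first colon only,
    so timestamps and durations keep their own colons."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def has_uncorrectable(fields):
    """True if the scrub reported errors it could not repair."""
    return int(fields.get("Uncorrectable", "0")) > 0


if __name__ == "__main__":
    summary = parse_scrub_summary(SAMPLE)
    print(summary["Error summary"], has_uncorrectable(summary))
```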

It shouldn't be related to the ssd/drives, as they were running fine without any issues in another system.

Extended SMART tests pass on all drives, and HDDScan ERASE passed as well; the drives were newly formatted afterwards.

Running 7.0.0-rc.2


December 31, 2024

Community Expert

5 hours ago, hhhhh said:

memtest passed multiple times,

memtest is only definitive if it finds errors. If you have multiple sticks, try running the server with just one; if the problem persists, try a different one. That will basically rule out bad RAM, which is the #1 reason for data corruption.


January 5, 2025

This morning the server was unresponsive on the web UI. I tried to log in locally, which was very laggy and ended with "login: timed out after 60 seconds" after entering the username, the exact behavior mentioned in the following post:

Because of these stability issues and the recent checksum errors, I ran memtest after a reboot, and it failed with 2x16GB installed. I then tested each 16GB stick on its own in either slot, and both sticks failed in both slots. Both memory sticks somehow started failing within the last month. This is strange, as the memory was working fine in the same system running Windows before I switched to Unraid, and the system ran fine for 2 years. The memtest I ran in mid-December, after deciding to go with ZFS, passed. Either way, I purchased a new 1x32GB stick, which shows no memtest errors. I will keep monitoring, and I hope faulty memory caused the issues I experienced.

I did notice the CPU was getting quite hot due to some scheduled maintenance/backup tasks between 2 and 3:25 AM, which I believe is when Unraid got stuck. HA (running on Unraid) kept collecting data from the external power monitor, but no longer from Unraid, until, I suppose, Docker itself got stuck, so nothing was stored after 6 AM.

I've changed my fan settings to keep the CPU temp lower under full load, and also improved ventilation so that the memory / cache drives run less hot.

Hopefully these changes will resolve the system instability.

Edited January 5, 2025 by hhhhh
