
Cache Disk Full Warning: Docker and VMs Crash


Solved by thefarelkid

This started happening almost a year ago, but hadn't happened for a long while until now. I do get docker.img warnings above 70% when I update my Docker containers, but that is temporary and goes back to normal after the updates are complete.

 

But cache disk utilization at 100% is very confusing, as my cache disk is usually at about 60%.

 

Diagnostics attached. Thank you to anyone who can help.

gemini-diagnostics-20240506-0540.zip


Cache floor is set quite high for the system and appdata shares, but the pool still seems to have some free space. Where are you seeing that it's full?

 

Also, ZFS is detecting data corruption; post the output of:

 

zpool status -v

 


Thanks for your help, Jorge. Neither the main pool nor the cache is usually very full. The main pool is around 47% and the cache is 60%. Also, I just noticed I don't have a scrub schedule for my cache pool. Oops.

 

root@Gemini:~# zpool status -v
  pool: cache
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
config:

        NAME           STATE     READ WRITE CKSUM
        cache          ONLINE       0     0     0
          mirror-0     ONLINE       0     0     0
            nvme1n1p1  ONLINE       0     0     0
            nvme2n1p1  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        /mnt/cache/system/sanoid/cache_appdata/sanoid.conf

 

 

2 minutes ago, thefarelkid said:

Main Pool is around 47% and cache is 60%.

So where were you seeing this?

4 hours ago, thefarelkid said:

But cache disks utilization at 100%

 

 

2 minutes ago, thefarelkid said:
/mnt/cache/system/sanoid/cache_appdata/sanoid.conf

This file should be deleted or restored from a backup, and if more corruption is found I recommend running memtest.
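A minimal sketch of that cleanup, assuming the pool name `cache` and the file path shown in the `zpool status` output above (the `command -v` guard simply makes the script a no-op on systems without ZFS):

```shell
#!/bin/sh
# Sketch only: remove the corrupted file, then scrub so ZFS re-reads
# every block and can drop the permanent-error entry once the file is gone.
# Pool name and file path are taken from the zpool status output above.
if command -v zpool >/dev/null 2>&1; then
    rm -f /mnt/cache/system/sanoid/cache_appdata/sanoid.conf
    zpool scrub cache        # re-reads all data and rechecks checksums
    zpool status -v cache    # watch scrub progress / remaining errors
fi
```

After the scrub completes cleanly, `zpool clear cache` resets the error counters; if new checksum errors keep appearing on a healthy mirror, that points back at RAM, hence the memtest suggestion.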


[screenshot: normal disk utilization]

 

That's what I see for disk utilization normally.

 

But overnight I'll get notifications that read: "Unraid Cache disk disk utilization Alert [GEMINI] - Cache disk is low on space (100%) Description PCIe_SSD_21010751200007 (nvme1n1) Priority alert". I got that one at 1:11am, and it was the last. The first arrived at 12:41am at 71%, and the percentage climbed from there.

 

I will attempt to restore the sanoid.conf file and run a memtest to be sure. I'm trying to research what the share cache floor setting means, but not getting far.

17 minutes ago, thefarelkid said:

But overnight I'll get notifications that read: Unraid Cache disk disk utilization Alert [GEMINI] - Cache disk is low on space (100%) Description

That suggests it is getting full, and then the mover probably moves some data off. You will need to check that, because the cache filling to 100% can cause other issues.


That was my thought. And you're right about other issues too. This morning the cache pool had returned to normal, but I couldn't get the docker service to start without initiating a full restart. Maybe I could have if I knew how to do it from the command line. Back when this was happening more regularly, the GUI would crash as well. The only thing I can think of that would do that much writing to the cache is something like SABnzbd, but that hadn't had any activity since 11 last night. Could it be a backup of my HomeAssistant VM? Those backups are only ~400MB, and they don't start running until 2:00am, so no. I'm stumped for now. Is there a way to log the disk utilization by service?
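There's no per-service accounting built in, but a small cron-style logger can narrow it down. A hedged sketch, where the mount point `/mnt/cache` and the log path are assumptions; each run appends one snapshot, so schedule it every minute overnight:

```shell
#!/bin/sh
# Sketch only: append one usage snapshot per run (e.g. via cron each minute).
# Mount point and log path are assumptions; pass a different mount as $1.
LOG=/tmp/cache-usage.log
MNT=${1:-/mnt/cache}
{
    date                                              # timestamp each snapshot
    df -h "$MNT" 2>/dev/null                          # overall pool fill level
    du -sh "$MNT"/* 2>/dev/null | sort -rh | head -5  # five biggest top dirs
} >> "$LOG"
```

Comparing consecutive snapshots shows which top-level share is growing when the alerts fire.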

  • Solution

I think I solved it. I remembered that I have Veeam running on a Windows PC that backs up every night at 12:30. It looks like last night's was a rather large job. The target for that backup is a share that uses the cache for speed, after which the mover moves the data off the cache. But since it's an overnight backup, I will move the target to a new share that doesn't touch the cache at all.

 

Thanks for helping me discover my other issues, though. I definitely need to sort out my ZFS problems.
