
Cache Disk Full Warning: Docker and VMs Crash


Solved by thefarelkid

This started happening almost a year ago, but hadn't happened for a long while until now. I do get docker.img warnings above 70% when I update my Docker containers, but that is temporary and goes back to normal after the updates are complete.

 

But cache disk utilization at 100% is very confusing, as my cache disk is usually at about 60%.

 

Diagnostics attached. Thank you to anyone who can help.

gemini-diagnostics-20240506-0540.zip


Cache floor is set quite high for the system and appdata shares, but the pool still seems to have some free space. Where are you seeing that it's full?

 

Also, ZFS is detecting data corruption; post the output of:

 

zpool status -v

 


Thanks for your help, Jorge. Neither the main pool nor the cache is usually very full. The main pool is around 47% and the cache is 60%. Also, I just noticed I don't have a scrub schedule for my cache pool. Oops.

 

root@Gemini:~# zpool status -v
  pool: cache
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
config:

        NAME           STATE     READ WRITE CKSUM
        cache          ONLINE       0     0     0
          mirror-0     ONLINE       0     0     0
            nvme1n1p1  ONLINE       0     0     0
            nvme2n1p1  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        /mnt/cache/system/sanoid/cache_appdata/sanoid.conf

 

 

2 minutes ago, thefarelkid said:

Main Pool is around 47% and cache is 60%.

So where were you seeing this?

4 hours ago, thefarelkid said:

But cache disks utilization at 100%

 

 

2 minutes ago, thefarelkid said:
/mnt/cache/system/sanoid/cache_appdata/sanoid.conf

This file should be deleted or restored from a backup, and if more corruption is found I recommend running memtest.
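A minimal sketch of that cleanup, assuming the pool name `cache` and the file path shown in the `zpool status` output above (the `command -v` guard simply makes the script a no-op on systems without ZFS):

```shell
#!/bin/sh
# Sketch only: remove the corrupted file, then scrub so ZFS re-reads
# every block and can drop the permanent-error entry once the file is gone.
# Pool name and file path are taken from the zpool status output above.
if command -v zpool >/dev/null 2>&1; then
    rm -f /mnt/cache/system/sanoid/cache_appdata/sanoid.conf
    zpool scrub cache        # re-reads all data and rechecks checksums
    zpool status -v cache    # watch scrub progress / remaining errors
fi
```

After the scrub completes cleanly, `zpool clear cache` resets the error counters; if new checksum errors keep appearing on a healthy mirror, that points back at RAM, hence the memtest suggestion.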


[screenshot: normal disk utilization]

 

That's what I see for disk utilization normally.

 

But overnight I'll get notifications that read: "Unraid Cache disk disk utilization Alert [GEMINI] - Cache disk is low on space (100%) Description PCIe_SSD_21010751200007 (nvme1n1) Priority alert". I got that one at 1:11am, and it was the last. The first arrived at 12:41am at 71%, and the percentage climbed from there.

 

I will attempt to restore the sanoid.conf file and run a memtest to be sure. I'm trying to research what the share cache floor setting means, but not getting far.

17 minutes ago, thefarelkid said:

But overnight I'll get notifications that read: Unraid Cache disk disk utilization Alert [GEMINI] - Cache disk is low on space (100%) Description

That suggests it is getting full, and then the mover probably moves some data off. You will need to check that, because the cache filling to 100% can cause other issues.


That was my thought. And you're right about other issues too. This morning the cache pool had returned to normal, but I couldn't get the docker service to start without initiating a full restart. Maybe I could have if I knew how to do it from the command line. Back when this was happening more regularly, the GUI would crash as well. The only thing I can think of that would do that much writing to the cache is something like SABnzbd, but that hadn't had any activity since 11 last night. Could it be a backup of my HomeAssistant VM? Those backups are only ~400MB, and they don't start running until 2:00am, so no. I'm stumped for now. Is there a way to log the disk utilization by service?
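There's no per-service accounting built in, but a small cron-style logger can narrow it down. A hedged sketch, where the mount point `/mnt/cache` and the log path are assumptions; each run appends one snapshot, so schedule it every minute overnight:

```shell
#!/bin/sh
# Sketch only: append one usage snapshot per run (e.g. via cron each minute).
# Mount point and log path are assumptions; pass a different mount as $1.
LOG=/tmp/cache-usage.log
MNT=${1:-/mnt/cache}
{
    date                                              # timestamp each snapshot
    df -h "$MNT" 2>/dev/null                          # overall pool fill level
    du -sh "$MNT"/* 2>/dev/null | sort -rh | head -5  # five biggest top dirs
} >> "$LOG"
```

Comparing consecutive snapshots shows which top-level share is growing when the alerts fire.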

  • Solution

I think I solved it. I remembered that I have Veeam running on a Windows PC that backs up every night at 12:30. It looks like last night's was a rather large job. The target for that backup is a share that uses the cache for speed, after which the mover moves the data off the cache. But since it's an overnight backup, I will move the target to a new share that doesn't touch the cache at all.

 

Thanks for helping me discover my other issues, though. I definitely need to sort out my ZFS problems.
