Hi All,
Over the last few months, I've started having trouble with certain docker containers freezing: they stop responding on their webUI and also ignore docker kill and docker restart commands. Once this starts, unRaid and the machine become more and more unstable, and a reboot is needed within the next 12 hours or so. If I leave it longer than that without a reboot, things progress: more docker containers become inaccessible, then the unRaid GUI stops responding, and eventually I can't even SSH into the machine. This progression takes about 4-8 hours.
I *think* it has something to do with rclone, docker, and copying many files at once, because it mainly happens when downloading and importing a few seasons of a TV show in one go. It's almost always the Sonarr container that freezes, but sometimes it's Plex or Radarr. I have the appropriate slave option set in the Docker settings for each share that touches an unassigned device. According to netdata and the unRaid GUI, I always have lots of free RAM, so I don't think it's a memory issue, but it sure *acts* like one.
This setup worked just fine until about 3 months ago, so I'm having a hard time pinning down what the actual problem is and what I need to change. Any help would be greatly appreciated!
Software:
Unraid 6.8.3 Nvidia (but this happened with 6.8.2 and I think 6.8.1... so I think it's a config issue on my side, not an unRaid problem)
Dockers - Plex, Sonarr, Radarr (x2), Caddy, DelugeVPN, EAPController, Tautulli, Ombi, Sabnzbd, Jackett, netdata
Plugins - rclone VFS mount set up with an unassigned SSD as the write cache
VMs are set up, but since this issue started happening, I've stopped using them to limit variables
Hardware:
4790K - Gigabyte mobo - 32GB RAM
3x 8TB Seagate SMR drives
1 256GB NVMe drive in a PCI Express adapter
1 256GB SATA SSD
^^^^ these 2 SSDs are pooled in a BTRFS pool
1 500GB SATA SSD mounted as an unassigned device
Zotac 1060 6GB mini
I'm attaching a few logs where this has happened. The logs would be from after the
tower-diagnostics-20200324-0957.zip
tower-diagnostics-20200328-1928.zip
tower-diagnostics-20200329-2023.zip
tower-diagnostics-20200330-0917.zip