Hi,
I have an issue with Unraid where I believe the FUSE filesystem dies when a specific docker container receives a specific interaction. I know this sounds vague at first, but please read on: it is always reproducible and rather easy to trigger.
I'm using the latest Unraid (6.11.1, Basic license) and just upgraded my setup to add a parity drive and a cache drive (clean install, not a migration).
My setup is: 2x 4TB HDD (1 parity, 1 xfs) + 1x 500GB SSD as cache (tried both btrfs and xfs here).
What I did:
Installed all prerequisites for Apps and docker-compose, enabled Docker, etc. - everything needed to add the MariaDB-Official app and to run docker-compose.
Within Apps, added MariaDB Official, made its database persistent under /mnt/user/appdata/... and created a dedicated network for it ("mariadb").
I don't think the external MariaDB or the custom network is relevant here, but I didn't want to modify the example. I simply wanted a single MariaDB instance for other use cases too, hence this initial setup.
Brought up Owncloud from their official docker-compose file (modified file attached so the issue can be reproduced; my modifications make it use the MariaDB Official container and add persistent storage for both the Owncloud and Redis containers - for both, I added volume mounts under /mnt/user/appdata/owncloud/...).
Enabled remote syslog to gather some logs, as after the issue happens, there's no way to get any diagnostics...
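For clarity, the relevant part of my compose modifications looks roughly like this. Treat it as an illustrative sketch rather than the literal attached file: the service names follow ownCloud's official compose file, and the image tags and container-side mount targets are assumptions on my part.

```yaml
# Sketch of my changes to ownCloud's official docker-compose.yml
# (image tags and mount targets are illustrative; see the attached file):
services:
  owncloud:
    image: owncloud/server
    volumes:
      # bind mount on the Unraid user share instead of a named docker volume
      - /mnt/user/appdata/owncloud/files:/mnt/data
    networks:
      - mariadb          # join the network of the external MariaDB Official container
  redis:
    image: redis:6
    volumes:
      - /mnt/user/appdata/owncloud/redis:/data

networks:
  mariadb:
    external: true       # created beforehand in Unraid as "mariadb"
```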
What is observed:
After completing the initial Owncloud setup via the web interface, everything works fine in the browser as well as with the PC client. HOWEVER:
As soon as you access the server via the iOS Owncloud application, everything dies instantly - and I mean everything (you don't even have to do anything; the iOS client merely requesting a login, or similar, is enough):
You cannot access the owncloud web interface, it's dead.
You cannot get into the owncloud container in any way; it's dead.
(Docker writes no logs of any kind for the container, not even an error.)
You cannot stop/remove the container:
Error response from daemon: cannot stop container: owncloud_server: tried to kill container, but did not receive an exit event
You cannot stop the docker service:
stopping dockerd... waiting for docker to die... repeat x15 times... umount: /var/lib/docker: target is busy.
You cannot stop the storage array anymore; you get an endless loop of the following log entries (it loops forever and ignores any timeout configured in the array settings):
Unmounting disks...
shcmd (72301): umount /mnt/disk1
umount: /mnt/disk1: target is busy.
shcmd (72301): exit status: 32
shcmd (72302): umount /mnt/cache
umount: /mnt/cache: target is busy.
shcmd (72302): exit status: 32
Retry unmounting disk share(s)...
Unmounting disks...
You cannot create diagnostics either; it hangs indefinitely at "Starting diagnostics collection..." and generates no log of any kind.
powerdown -r also does not work the first time you issue it.
The second invocation of powerdown -r reboots the server with the following log entries (these are the last ones I receive via syslog; nothing follows until the rebooted system starts logging again):
md: md_notify_reboot
md: stopping all md devices
md: 1 devices still in use.
sd 4:0:0:0: [sdd] Synchronizing SCSI cache
sd 2:0:0:0: [sdc] Synchronizing SCSI cache
sd 1:0:0:0: [sdb] Synchronizing SCSI cache
A bonus observation: after Owncloud dies, if you try to open the FTP settings (Settings / FTP Server) before doing anything else with the system, the Unraid web interface also dies completely, and nothing brings it back short of a reboot. I didn't find any logs related to this either.
Of course, since the array did not stop cleanly, a full parity check has to be run again, which takes 8+ hours with the 4TB disks - not nice.
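Since the "target is busy" loop above never says *what* is actually holding the mounts, here is a minimal shell sketch of what I plan to run next time before rebooting (assuming lsof is available on the box; the paths are from my setup):

```shell
# Processes stuck in uninterruptible sleep (state "D") are the usual reason
# umount keeps reporting "target is busy"; list them (plus the header row):
ps axo pid,stat,comm | awk 'NR==1 || $2 ~ /^D/'

# Show which open files still pin the mount points (paths from my setup).
# If the FUSE mount itself is dead, lsof can hang too, hence the timeout:
timeout 10 lsof /mnt/disk1 /mnt/cache 2>/dev/null || true
```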
After reading about many similar issues (albeit very outdated ones, from 2015-2016), it seemed to me that the culprit could be the internal FUSE filesystem, which transparently routes writes through the cache drive. To put this theory to the test, I recreated the Owncloud container with the volumes mounted at /mnt/cache/appdata/owncloud and, in a second test, at /mnt/disk1/appdata/owncloud - bypassing the /mnt/user/appdata/... construct. In both cases the application worked perfectly: no crashes, and most importantly no Unraid OS-level crashes (or rather, no storage-array hangs).
The problem with this workaround is that I lose the cache feature, which would be very beneficial in keeping the HDDs from spinning up every time something is written to them; so I'd really like to use the /mnt/user/appdata/... path. (As far as I understand, if you point a volume mount directly at /mnt/cache/appdata, you are indeed using the cache drive, but it also means that:
the data is never flushed to the HDDs, where parity would protect it;
the usable space is limited to the cache drive's capacity, as everything is stored there permanently.)
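For reference, the bypass test only changed the host side of the bind mounts; roughly (again a sketch under the same assumptions as above, not the literal file):

```yaml
# Same compose file, with only the host paths changed so the FUSE layer
# (/mnt/user) is bypassed; /mnt/disk1/appdata/... worked equally well:
services:
  owncloud:
    volumes:
      - /mnt/cache/appdata/owncloud/files:/mnt/data
  redis:
    volumes:
      - /mnt/cache/appdata/owncloud/redis:/data
```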
What I've attached to this post:
The remote syslog capture - it's rather long, but you can skip ahead to Oct 8 11:14:37 in the log; the important part starts there.
The owncloud docker-compose.yml file, which can be used to reproduce the issue (warning: you WILL face a parity check at the end...). I replaced the users/passwords in the file with generic values and removed the dependency on the external .env file for easier testing. If you prefer, you can get the unmodified versions from their site: https://doc.owncloud.com/server/10.11/admin_manual/installation/docker/#docker-compose - scroll down to find the .env and docker-compose.yml files - just make sure you modify them to use an external share instead of a docker volume.
A diagnostics zip file created AFTER the system rebooted (I've noticed in some posts that a post-reboot diagnostics file was also requested, so here it is in advance).
Any help in solving this while keeping the ability to use the cache feature would be appreciated.
tower-diagnostics-20221008-2003.zip
syslog
docker-compose.yml