Hi all,
I'm running unRaid 6.12.3 and a few days ago I noticed that the Docker page is showing "Docker Service failed to start."
I've found a few other threads on this and tried some of the suggestions. The main one was to stop the Docker service, delete docker.img, and restart the Docker service. That did nothing, so I checked from a terminal whether the file was actually being deleted, and it didn't seem to be. Trying to delete docker.img manually from the terminal reported that my array was read-only.
One disk in the array was full, so I scattered shares around with unBalance and rebooted, which seemed to get the array out of read-only mode. I tried stopping Docker and deleting docker.img again. I can confirm that it was deleted this time, but it still didn't fix the problem.
I see some BTRFS errors in my syslog, and after scrubbing my 1TB NVMe cache drive (nvme0n1), which holds the appdata and system shares, the drive seems to have 3 uncorrectable errors. I can't find out more about them because when I try to run a SMART test, nothing seems to happen. The drive has about 300GB free, so I'm not sure what is causing the errors or whether the drive is dying (it's only about 2 years old).
Some highlights from my syslog below, and full diagnostics are attached.
Aug 8 12:59:23 fractal emhttpd: mounting /mnt/cache
Aug 8 12:59:23 fractal emhttpd: shcmd (46): mkdir -p /mnt/cache
Aug 8 12:59:23 fractal emhttpd: /sbin/btrfs filesystem show 9e62a0a8-02e6-4e30-9f96-4d413126e2b5 2>&1
Aug 8 12:59:23 fractal emhttpd: Label: none uuid: 9e62a0a8-02e6-4e30-9f96-4d413126e2b5
Aug 8 12:59:23 fractal emhttpd: Total devices 1 FS bytes used 657.42GiB
Aug 8 12:59:23 fractal emhttpd: devid 2 size 931.51GiB used 736.03GiB path /dev/nvme0n1p1
Aug 8 12:59:23 fractal emhttpd: /mnt/cache uuid: 9e62a0a8-02e6-4e30-9f96-4d413126e2b5
Aug 8 12:59:23 fractal emhttpd: shcmd (47): mount -t btrfs -o noatime,space_cache=v2 -U 9e62a0a8-02e6-4e30-9f96-4d413126e2b5 /mnt/cache
Aug 8 12:59:23 fractal kernel: BTRFS info (device nvme0n1p1): using crc32c (crc32c-intel) checksum algorithm
Aug 8 12:59:23 fractal kernel: BTRFS info (device nvme0n1p1): using free space tree
Aug 8 12:59:23 fractal kernel: BTRFS info (device nvme0n1p1): bdev /dev/nvme0n1p1 errs: wr 0, rd 0, flush 0, corrupt 3, gen 0
Aug 8 12:59:23 fractal kernel: BTRFS info (device nvme0n1p1): enabling ssd optimizations
Aug 8 12:59:23 fractal emhttpd: shcmd (48): mount -o remount,discard=async /mnt/cache
Aug 8 12:59:23 fractal kernel: BTRFS info (device nvme0n1p1: state M): turning on async discard
Aug 8 12:59:23 fractal kernel: BTRFS error (device nvme0n1p1): incorrect extent count for 4778965336064; counted 1519, expected 1512
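For anyone who wants to pull the per-device error counters out of the full syslog in the diagnostics rather than reading it by eye, here is a rough sketch. It assumes Python 3 is available wherever the log is being read, and the function name parse_btrfs_errs is just something made up for illustration. It only handles the "bdev ... errs:" line format shown above.

```python
import re

# Matches the per-device counter summary that btrfs logs at mount time, e.g.
# "bdev /dev/nvme0n1p1 errs: wr 0, rd 0, flush 0, corrupt 3, gen 0"
ERRS_RE = re.compile(
    r"bdev (?P<dev>\S+) errs: wr (?P<wr>\d+), rd (?P<rd>\d+), "
    r"flush (?P<flush>\d+), corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)


def parse_btrfs_errs(line):
    """Return (device, counters) for a 'bdev ... errs:' line, else None.

    Hypothetical helper for illustration; not part of any btrfs tooling.
    """
    m = ERRS_RE.search(line)
    if m is None:
        return None
    counters = {k: int(v) for k, v in m.groupdict().items() if k != "dev"}
    return m.group("dev"), counters


# The counter line copied from the syslog excerpt above.
sample = (
    "Aug 8 12:59:23 fractal kernel: BTRFS info (device nvme0n1p1): "
    "bdev /dev/nvme0n1p1 errs: wr 0, rd 0, flush 0, corrupt 3, gen 0"
)

dev, counters = parse_btrfs_errs(sample)
nonzero = {name: count for name, count in counters.items() if count}
print(dev, nonzero)  # prints: /dev/nvme0n1p1 {'corrupt': 3}
```

On the line from my log, the only nonzero counter is corrupt, at 3; the write, read, flush and generation counters are all 0.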
Would it be worth moving the appdata and system share from cache to array (currently set to array --> cache and previously 'cache only') and then trying to start docker without using the (potentially) faulty nvme cache drive?
I also noticed that after deleting docker.img and restarting the docker service, it seems to create system/docker/docker.img but also system/docker/docker/docker.img - is this normal behaviour?
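To show exactly which docker.img entries exist after a restart, a small sketch like the one below could list them. The path /mnt/user/system is an assumption about where the system share is mounted, and find_docker_images is a hypothetical helper name. It also reports whether each entry is a regular file or a directory, since the two paths above might not be the same kind of object.

```python
from pathlib import Path


def find_docker_images(root):
    """Return sorted (path, kind, size_bytes) for every 'docker.img' under root.

    kind is "file" or "dir"; size_bytes is 0 for directories.
    Hypothetical helper for illustration only.
    """
    found = []
    for p in Path(root).rglob("docker.img"):
        if p.is_dir():
            found.append((str(p), "dir", 0))
        elif p.is_file():
            found.append((str(p), "file", p.stat().st_size))
    return sorted(found)


# Assumed mount point of the system share; adjust to the real location.
SYSTEM_SHARE = "/mnt/user/system"

for path, kind, size in find_docker_images(SYSTEM_SHARE):
    print(f"{kind:4}  {size / 2**30:8.2f} GiB  {path}")
```

If the root path does not exist, nothing is printed.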
Server info: Fractal: 4u rackmount ~ Intel® Core™ i5-10400 CPU @ 2.90GHz ~ Gigabyte B460M DS3H ~ 48 GB Corsair Vengeance DDR4 @ 3200 MHz ~ 1 TB WD Blue M.2 NVMe ~ 4x 6TB, 1x 2TB, single parity
Thanks so much for any help!
fractal-diagnostics-20230808-1715.zip