September 17, 2022

Hello! I could really use some help. I recently discovered my Docker containers were offline and went to find out why. When pulling up the Docker tab on Unraid I get the error "Docker Service failed to start.". Digging further, I also found the error "Unable to write to nvme_cache" from Fix Common Problems. I have tried to fix this in the ways I know, but I am not sure how to proceed without potentially causing more harm than good.

Things I have tried so far:
- Deleting the Docker vDisk file (did not work)
- Running a BTRFS scrub on the nvme_cache (gets aborted immediately)

Notable recent occurrences:
- This occurred days before I had to move houses, so I had to shut down the server and come back to it.

I have attached the diagnostics file from my server (pulled just now) to hopefully provide better details than I can. Any ideas on how to fix this and get my Docker services running properly again?

Thanks!

anton-diagnostics-20220916-2213.zip
September 17, 2022 (Community Expert)

Sep 16 21:39:46 Anton kernel: BTRFS info (device nvme0n1p1): bdev /dev/nvme0n1p1 errs: wr 80548, rd 0, flush 79403, corrupt 25, gen 0

This shows that the nvme0n1 device dropped offline at some point in the past. Start with a scrub and post the output. Also run a scrub on the other pool, since there's corruption found, and see here for better pool monitoring.
September 17, 2022 (Author)

Thank you for the ideas! I have two cache pools: one is 4x1TB SATA SSDs and the other is 2x2TB NVMe SSDs.

For the SATA pool (named cache) the output of the scrub is:

UUID: 0ad59d90-fcd8-4af3-a622-ade321c10ea0
Scrub started: Sat Sep 17 10:07:52 2022
Status: finished
Duration: 0:07:15
Total to scrub: 416.91GiB
Rate: 981.42MiB/s
Error summary: no errors found

Running the command from your post gives this output:

root@Anton:~# btrfs dev stats /mnt/cache
[/dev/sdc1].write_io_errs 0
[/dev/sdc1].read_io_errs 0
[/dev/sdc1].flush_io_errs 0
[/dev/sdc1].corruption_errs 317
[/dev/sdc1].generation_errs 0
[/dev/sdb1].write_io_errs 0
[/dev/sdb1].read_io_errs 0
[/dev/sdb1].flush_io_errs 0
[/dev/sdb1].corruption_errs 901
[/dev/sdb1].generation_errs 0
[/dev/sde1].write_io_errs 0
[/dev/sde1].read_io_errs 0
[/dev/sde1].flush_io_errs 0
[/dev/sde1].corruption_errs 992
[/dev/sde1].generation_errs 0
[/dev/sdaf1].write_io_errs 0
[/dev/sdaf1].read_io_errs 0
[/dev/sdaf1].flush_io_errs 0
[/dev/sdaf1].corruption_errs 886
[/dev/sdaf1].generation_errs 0

For the NVMe pool (named Nvme_cache) the output of the scrub is:

UUID: 94c08dd5-7765-4b75-8d62-7c23c4b37b3f
Scrub started: Sat Sep 17 10:08:20 2022
Status: aborted
Duration: 0:00:00
Total to scrub: 2.44TiB
Rate: 0.00B/s
Error summary: no errors found

Running the command from your post gives this output:

root@Anton:~# btrfs dev stats /mnt/nvme_cache
[/dev/nvme0n1p1].write_io_errs 80548
[/dev/nvme0n1p1].read_io_errs 0
[/dev/nvme0n1p1].flush_io_errs 79438
[/dev/nvme0n1p1].corruption_errs 276
[/dev/nvme0n1p1].generation_errs 0
[/dev/nvme1n1p1].write_io_errs 0
[/dev/nvme1n1p1].read_io_errs 0
[/dev/nvme1n1p1].flush_io_errs 0
[/dev/nvme1n1p1].corruption_errs 0
[/dev/nvme1n1p1].generation_errs 0

It appears I cannot run the scrub on the Nvme_cache pool; it immediately reports a status of "aborted". Any ideas on how to correct this? Is this the point of hardware failure and replacement?

Thanks!
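With this many counters, the non-zero ones are easy to miss. A minimal sketch of a filter that keeps only the counters above zero is shown here; `nonzero_btrfs_stats` is a hypothetical helper name (not part of btrfs-progs), and it assumes the `[device].counter value` line format seen in the output above.

```shell
#!/bin/sh
# Print only the non-zero counters from `btrfs dev stats` output read on stdin.
# nonzero_btrfs_stats is a hypothetical helper, not a btrfs-progs command.
nonzero_btrfs_stats() {
    # Each input line looks like: [/dev/sdc1].write_io_errs 0
    # Keep lines with exactly two fields whose second field is a number > 0.
    awk 'NF == 2 && $2 ~ /^[0-9]+$/ && $2 + 0 > 0 { print }'
}

# Usage on a live system (needs root and a mounted pool):
#   btrfs dev stats /mnt/nvme_cache | nonzero_btrfs_stats
```

Applied to the NVMe pool output above, this would leave only the three nvme0n1p1 lines for write, flush, and corruption errors.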
September 18, 2022 (Community Expert)

The SATA pool is OK; the corruption errors are old, and you should clear the stats. For the NVMe pool, if the scrub fails, the best bet is to back up and re-format.
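For reference, clearing the stats can be done with the `-z` option of `btrfs dev stats`, which prints the current counters and then resets them to zero. The sketch below wraps it in a hypothetical `reset_pool_stats` function that defaults to a dry run, so nothing changes unless `DRY_RUN=0` is set; the wrapper and the `DRY_RUN` variable are illustrative assumptions, only the `btrfs` command itself comes from btrfs-progs.

```shell
#!/bin/sh
# reset_pool_stats MOUNTPOINT
# Hypothetical wrapper: with DRY_RUN unset or 1 it only prints the command.
reset_pool_stats() {
    mountpoint=$1
    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Show what would be run without touching the pool.
        echo "btrfs dev stats -z $mountpoint"
    else
        # -z prints the current counters, then resets them to zero.
        btrfs dev stats -z "$mountpoint"
    fi
}

# Usage:
#   reset_pool_stats /mnt/cache             # dry run, prints the command
#   DRY_RUN=0 reset_pool_stats /mnt/cache   # actually resets the counters
```

Resetting makes any new errors stand out, since every counter starts again from zero.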
September 19, 2022 (Author)

@JorgeB I was afraid that was going to be the answer, but thank you for confirming.

As prep for re-formatting, I am trying to move the data I can off the NVMe pool. After reading other threads, it seemed the best method was to change the shares on the NVMe pool from "Prefer" to "Yes" for the pool. I then ran the mover, which took a full day. However, the pool still shows the same used storage (1.34TB). If I'm understanding this correctly, this is a result of the NVMe pool being read-only.

So how can I verify whether the shares were written to the array? Should I be using another method of backing up?

Thanks!
September 19, 2022 (Community Expert)

2 minutes ago, buccadebeppo said: If I'm understanding this correctly, this is a result of the NVMe pool being read only.

Most likely.

3 minutes ago, buccadebeppo said: So how can I verify if the shares were written to the array?

Run rsync -av /path/to/source/ /path/to/dest/ and it will copy only the missing data.
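To see what is missing before copying anything, one option is a read-only comparison of the two trees. The sketch below uses a hypothetical `list_missing` helper built on `find`; it checks only whether each file exists at the destination, not its size or content, and the paths in the usage lines are the same placeholders as in the rsync command above.

```shell
#!/bin/sh
# list_missing SRC DEST
# Print paths (relative to SRC) of regular files that exist under SRC but
# not under DEST. Hypothetical helper; checks existence only, not content.
list_missing() {
    src=$1
    dest=$2
    # The subshell keeps the cd from affecting the caller or the loop.
    ( cd "$src" && find . -type f ) | while IFS= read -r rel; do
        [ -e "$dest/$rel" ] || printf '%s\n' "${rel#./}"
    done
}

# Usage:
#   list_missing /path/to/source /path/to/dest
# rsync can report the same thing without copying via its dry-run flag:
#   rsync -avn /path/to/source/ /path/to/dest/
```

Neither command writes to the source pool, so both are safe to run while it is read-only.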