May 10, 2021
Woke up this morning to hundreds of warnings and emails saying "Cache pool BTRFS missing device". I decided to restart Unraid, and now my pool is read-only, but the drive is there. I'm going to try copying everything off the cache to the array and then format the drives in the pool. This isn't the first time my btrfs pool has gone read-only, and I'm getting frustrated by it.
1. Is what I'm planning to do alright?
2. Is there anything I can do to prevent this from happening again?
3. Could someone help me figure out what happened?
Attached are the diagnostics from before the restart: tatooine-diagnostics-20210510-0630.zip
Edited May 10, 2021 by 5252525111 (reorder wording)
May 10, 2021 Author
Ran btrfs dev stats. Both devices show non-zero counters:
Tatooine:~# btrfs dev stats /mnt/app_cache/
[/dev/nvme0n1p1].write_io_errs 2350795
[/dev/nvme0n1p1].read_io_errs 953269
[/dev/nvme0n1p1].flush_io_errs 74132
[/dev/nvme0n1p1].corruption_errs 9861
[/dev/nvme0n1p1].generation_errs 0
[/dev/nvme1n1p1].write_io_errs 0
[/dev/nvme1n1p1].read_io_errs 0
[/dev/nvme1n1p1].flush_io_errs 0
[/dev/nvme1n1p1].corruption_errs 239
[/dev/nvme1n1p1].generation_errs 0
Is this a btrfs issue or an M.2 issue? Should I be looking to replace my NVMe drives?
Edited May 10, 2021 by 5252525111
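For anyone skimming output like the above, a small filter that prints only the non-zero counters can make it easier to see which device is affected. This is a minimal sketch, not an Unraid feature: the function name `nonzero_btrfs_stats` is made up for illustration, and it only assumes the two-column `name value` layout shown in the post.

```shell
# Hypothetical helper: read `btrfs dev stats` output on stdin and
# print only the counters whose value is not zero.
nonzero_btrfs_stats() {
    # $1 is "[device].counter_name", $2 is the counter value
    awk '$2 ~ /^[0-9]+$/ && $2 != 0 { print $1, $2 }'
}

# Usage (pool path taken from the post above):
#   btrfs dev stats /mnt/app_cache/ | nonzero_btrfs_stats
```

Note that these counters persist across reboots until they are reset, so non-zero values may be old. `btrfs dev stats -z <mountpoint>` prints the counters and then resets them to zero, which makes it easier to tell whether new errors are still appearing.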
May 10, 2021
The second device is showing corruption errors; unless they are old, that suggests a hardware problem, like bad RAM. NVMe devices dropping offline is usually a BIOS/kernel issue. It could also be a bad device, though that's less likely. This sometimes helps:
Some NVMe devices have issues with power states on Linux. To try this, on the main GUI page click on the flash device, scroll down to "Syslinux Configuration", make sure it's set to "menu view" (top right), and add this to your default boot option, after "append initrd=/bzroot":
nvme_core.default_ps_max_latency_us=0
e.g.:
append initrd=/bzroot nvme_core.default_ps_max_latency_us=0
Reboot and see if it makes a difference.
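After the reboot it can be worth confirming that the parameter actually reached the kernel. The sketch below checks a kernel command line string for the exact option given above; the function name `has_nvme_ps_disabled` is hypothetical, and the commented usage assumes the standard Linux `/proc/cmdline` and the `nvme_core` module parameter file under `/sys/module`.

```shell
# Hypothetical helper: succeeds if the kernel command line passed as $1
# contains nvme_core.default_ps_max_latency_us=0 as a separate option.
has_nvme_ps_disabled() {
    case " $1 " in
        *" nvme_core.default_ps_max_latency_us=0 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage after rebooting:
#   has_nvme_ps_disabled "$(cat /proc/cmdline)" && echo "parameter active"
#   cat /sys/module/nvme_core/parameters/default_ps_max_latency_us
```

If the check fails, the option was most likely added to a boot entry other than the default one.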
May 10, 2021 Author
Thanks @JorgeB. I added `nvme_core.default_ps_max_latency_us=0` and decided I may as well do a BIOS update too. The cache pool was still read-only, but I've backed everything up and will be formatting the drives. Hopefully it won't occur again.
May 10, 2021 Author
Just started copying everything back to the pool and I already have this:
Tatooine:~# btrfs dev stats /mnt/app_cache/
[/dev/nvme0n1p1].write_io_errs 129163
[/dev/nvme0n1p1].read_io_errs 0
[/dev/nvme0n1p1].flush_io_errs 3
[/dev/nvme0n1p1].corruption_errs 0
[/dev/nvme0n1p1].generation_errs 0
[/dev/nvme1n1p1].write_io_errs 0
[/dev/nvme1n1p1].read_io_errs 0
[/dev/nvme1n1p1].flush_io_errs 0
[/dev/nvme1n1p1].corruption_errs 0
[/dev/nvme1n1p1].generation_errs 0
I take it this could be a failing drive, specifically `nvme0n1p1`?
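To see at a glance which pool member the errors are landing on, the counters can be totalled per device. This is a rough sketch with a made-up function name (`errors_per_device`); it assumes the `[device].counter value` layout shown above and simply sums all five counters for each device.

```shell
# Hypothetical helper: read `btrfs dev stats` output on stdin and
# print one line per device with the sum of all its error counters.
errors_per_device() {
    awk 'NF == 2 {
        dev = $1
        sub(/^\[/, "", dev)       # drop the leading "["
        sub(/\]\..*$/, "", dev)   # drop "].counter_name"
        total[dev] += $2
    }
    END { for (d in total) print d, total[d] }' | sort
}

# Usage (pool path taken from the post above):
#   btrfs dev stats /mnt/app_cache/ | errors_per_device
```

A total that is large on one device and zero on the other, as in the output above, points at that device or its slot rather than at the filesystem as a whole.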
May 10, 2021
It dropped again. Try swapping the NVMe slots and see if the problem stays with the slot or follows the device.
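When swapping slots, keep in mind that the `nvme0`/`nvme1` names may not stay attached to the same physical drive, so it helps to note each drive's serial number and PCI address before and after the swap. The sketch below reads the `serial` and `address` attributes that the Linux NVMe driver exposes under `/sys/class/nvme`; the function name `list_nvme_slots` and the optional directory argument are assumptions added for illustration.

```shell
# Hypothetical helper: list each NVMe controller with its serial number
# and PCI address. $1 is the sysfs directory to scan and defaults to
# /sys/class/nvme.
list_nvme_slots() {
    base=${1:-/sys/class/nvme}
    for ctrl in "$base"/nvme*; do
        [ -r "$ctrl/serial" ] || continue
        # Trim the space padding around the serial string
        serial=$(sed 's/^ *//; s/ *$//' "$ctrl/serial")
        address=$(cat "$ctrl/address" 2>/dev/null)
        echo "${ctrl##*/} serial=$serial address=$address"
    done
}

# Usage: run once before and once after swapping the drives
#   list_nvme_slots
```

If the errors show up under the same serial number after the swap, the problem followed the device; if they show up at the same PCI address with the other serial, it stayed with the slot.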
May 10, 2021 Author
The problem followed the drive. I swapped it out and it seems to be good now. Thanks for the help!!! Much appreciated.