5252525111 Posted May 10, 2021

Woke up this morning to hundreds of warnings and emails saying "Cache pool BTRFS missing device". I decided to restart Unraid, and now my pool is read-only even though the drive is still there. I'm going to try copying everything off the cache to the array and then formatting the drives in the pool. This isn't the first time my btrfs pool has gone read-only, and it's getting frustrating.

1. Is what I'm planning to do alright?
2. Is there anything I can do to prevent this from happening again?
3. Could someone help me figure out what happened?

Attached are the diagnostics from before the restart: tatooine-diagnostics-20210510-0630.zip
JorgeB Posted May 10, 2021

One of the NVMe devices dropped offline, see here for better pool monitoring.
5252525111 Posted May 10, 2021

I ran the btrfs stats, and it seems like both devices have nonzero numbers:

```
Tatooine:~# btrfs dev stats /mnt/app_cache/
[/dev/nvme0n1p1].write_io_errs    2350795
[/dev/nvme0n1p1].read_io_errs     953269
[/dev/nvme0n1p1].flush_io_errs    74132
[/dev/nvme0n1p1].corruption_errs  9861
[/dev/nvme0n1p1].generation_errs  0
[/dev/nvme1n1p1].write_io_errs    0
[/dev/nvme1n1p1].read_io_errs     0
[/dev/nvme1n1p1].flush_io_errs    0
[/dev/nvme1n1p1].corruption_errs  239
[/dev/nvme1n1p1].generation_errs  0
```

Is this a btrfs issue or an M.2 issue? Should I be looking to replace my NVMe drives?
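(For anyone wanting the kind of pool monitoring JorgeB mentions: the `btrfs dev stats` output above is easy to parse for alerting. Below is a minimal, unofficial sketch — not an Unraid or btrfs-progs tool — that flags devices with any nonzero error counter, using the output from this thread as sample input.)

```python
import re

def parse_btrfs_dev_stats(text):
    """Parse `btrfs dev stats` output into {device: {counter: value}}."""
    stats = {}
    pattern = r"\[(?P<dev>[^\]]+)\]\.(?P<counter>\w+)\s+(?P<value>\d+)"
    for m in re.finditer(pattern, text):
        stats.setdefault(m.group("dev"), {})[m.group("counter")] = int(m.group("value"))
    return stats

def devices_with_errors(stats):
    """Return only the devices and counters that are nonzero."""
    return {
        dev: {c: v for c, v in counters.items() if v > 0}
        for dev, counters in stats.items()
        if any(v > 0 for v in counters.values())
    }

# Sample input taken verbatim from the stats posted above.
sample = """\
[/dev/nvme0n1p1].write_io_errs 2350795
[/dev/nvme0n1p1].read_io_errs 953269
[/dev/nvme0n1p1].flush_io_errs 74132
[/dev/nvme0n1p1].corruption_errs 9861
[/dev/nvme0n1p1].generation_errs 0
[/dev/nvme1n1p1].write_io_errs 0
[/dev/nvme1n1p1].read_io_errs 0
[/dev/nvme1n1p1].flush_io_errs 0
[/dev/nvme1n1p1].corruption_errs 239
[/dev/nvme1n1p1].generation_errs 0
"""

print(devices_with_errors(parse_btrfs_dev_stats(sample)))
```

In a cron job you would feed it `subprocess.run(["btrfs", "dev", "stats", "/mnt/app_cache"], ...)` output instead of the hard-coded sample and send a notification when the result is non-empty.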
JorgeB Posted May 10, 2021 Share Posted May 10, 2021 The second one is showing corruption errors, unless they are old it suggests a hardware problem, like bad RAM. NVMe devices dropping are usually a BIOS/kernel issue, but could also be a bad device, though unlikely, this sometimes helps: Some NVMe devices have issues with power states on Linux, try this, on the main GUI page click on flash, scroll down to "Syslinux Configuration", make sure it's set to "menu view" (on the top right) and add this to your default boot option, after "append initrd=/bzroot" nvme_core.default_ps_max_latency_us=0 e.g.: append initrd=/bzroot nvme_core.default_ps_max_latency_us=0 Reboot and see if it makes a difference. 1 Quote Link to comment
5252525111 Posted May 10, 2021 Author Share Posted May 10, 2021 Thanks @JorgeB. I put `nvme_core.default_ps_max_latency_us=0` in and decided I may as well do a BIOS update. The cache pool was still read only, but I've back everything up and will be formatting the drives. Hopefully it won't occur again. Quote Link to comment
5252525111 Posted May 10, 2021

Just started copying everything back to the pool and already have this:

```
Tatooine:~# btrfs dev stats /mnt/app_cache/
[/dev/nvme0n1p1].write_io_errs    129163
[/dev/nvme0n1p1].read_io_errs     0
[/dev/nvme0n1p1].flush_io_errs    3
[/dev/nvme0n1p1].corruption_errs  0
[/dev/nvme0n1p1].generation_errs  0
[/dev/nvme1n1p1].write_io_errs    0
[/dev/nvme1n1p1].read_io_errs     0
[/dev/nvme1n1p1].flush_io_errs    0
[/dev/nvme1n1p1].corruption_errs  0
[/dev/nvme1n1p1].generation_errs  0
```

I take it this could be a failing drive, specifically `nvme0n1p1`?
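(Worth noting for readers: these counters are cumulative and persist until you reset them with `btrfs dev stats -z <path>`, so comparing snapshots taken before and after a workload is how you separate old errors from fresh ones — the question JorgeB raised earlier. A minimal sketch of that comparison, using made-up snapshot values shaped like the stats in this thread:)

```python
def new_errors(previous, current):
    """Report counters that increased between two stats snapshots.

    Both arguments use the {device: {counter: value}} shape.
    """
    deltas = {}
    for dev, counters in current.items():
        before = previous.get(dev, {})
        grown = {c: v - before.get(c, 0)
                 for c, v in counters.items()
                 if v > before.get(c, 0)}
        if grown:
            deltas[dev] = grown
    return deltas

# Hypothetical snapshots: taken right after formatting, then after copying data back.
before = {"/dev/nvme0n1p1": {"write_io_errs": 0, "flush_io_errs": 0}}
after = {"/dev/nvme0n1p1": {"write_io_errs": 129163, "flush_io_errs": 3}}
print(new_errors(before, after))
# {'/dev/nvme0n1p1': {'write_io_errs': 129163, 'flush_io_errs': 3}}
```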
5252525111 Posted May 10, 2021 Author Share Posted May 10, 2021 tatooine-diagnostics-20210510-1346.zip New diags Quote Link to comment
JorgeB Posted May 10, 2021 Share Posted May 10, 2021 It dropped again, try swapping NVMe slots and see if the problems stays with the slot or follows the device. 1 Quote Link to comment
5252525111 Posted May 10, 2021

The problem followed the drive. Swapped it out and everything seems good now. Thanks for the help, much appreciated!