May 20, 2025
Hi guys,

So, I have a dual NVMe SSD btrfs cache pool that has been running fine for at least a couple of years now. Well... it was, until earlier this morning (although I only just got a mail for it).

Now, with a drive failure I know I can remove it and re-add it if I think it's cables or something. What can I try with an NVMe drive? Reboot? Stop/start the array? Is it toast and RMA time?

I have this message in the logs that could be a clue:

May 20 05:36:11 monty kernel: nvme nvme0: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0xffff
May 20 05:36:11 monty kernel: nvme nvme0: Does your device have a faulty power saving mode enabled?
May 20 05:36:11 monty kernel: nvme nvme0: Try "nvme_core.default_ps_max_latency_us=0 pcie_aspm=off pcie_port_pm=off" and report a bug

Not sure how to check the power profile while the drive is disabled though.

Appreciate any help!

Cheers,
Pacman

monty-diagnostics-20250520-1315.zip
May 20, 2025 Community Expert
SudoPacman said: Try "nvme_core.default_ps_max_latency_us=0 pcie_aspm=off pcie_port_pm=off"

Add these parameters to syslinux.cfg, for the boot option you are using, then retest.
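For anyone following along, here is a minimal sketch of what adding those parameters to a boot entry can look like. It assumes the usual syslinux layout, where each `label` block has an `append` line holding the kernel command line. The sample config text and the `add_kernel_params` helper are hypothetical, written only for illustration. On Unraid the file is normally edited on the flash drive or through the GUI, and this sketch only works on text, it does not touch any file.

```python
# Kernel parameters quoted from the log message in the first post.
NVME_PARAMS = [
    "nvme_core.default_ps_max_latency_us=0",
    "pcie_aspm=off",
    "pcie_port_pm=off",
]

# Hypothetical sample config, assumed to resemble a stock syslinux.cfg.
SAMPLE_CFG = """\
default menu.c32
label Unraid OS
  menu default
  kernel /bzimage
  append initrd=/bzroot
label Unraid OS GUI Mode
  kernel /bzimage
  append initrd=/bzroot,/bzroot-gui
"""


def add_kernel_params(cfg_text, label, params=NVME_PARAMS):
    """Return cfg_text with params added to the append line of one label."""
    out = []
    in_label = False
    for line in cfg_text.splitlines():
        stripped = line.strip()
        if stripped.lower().startswith("label "):
            # Track whether we are inside the boot entry we want to change.
            in_label = stripped[6:].strip() == label
        elif in_label and stripped.lower().startswith("append "):
            present = stripped.split()[1:]
            # Only add parameters that are not already on the line.
            missing = [p for p in params if p not in present]
            if missing:
                line = line.rstrip() + " " + " ".join(missing)
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    print(add_kernel_params(SAMPLE_CFG, "Unraid OS"))
```

Only the chosen boot entry is changed, and running the helper twice does not duplicate the parameters. The new command line takes effect on the next boot.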
May 20, 2025 Author
Thanks, I'll try. Will the disabled drive be re-enabled automatically, or do I need to do something?
May 20, 2025 Author
Ah. Rebooted and the drive is not there, so the array has not started... I guess I change the pool to a single drive, start it, then stop it and see if the drive reappears?
May 20, 2025 Author
Started the array, and everything is running. I think my drive might be hosed though, since it is not showing in unassigned devices.

Pool looks like this:
May 20, 2025 Community Expert
You typically need to power cycle the server to get the device back; just rebooting won't be enough.
May 20, 2025 Author
@JorgeB Okay, power cycled and the old drive has appeared in unassigned devices, so that's good.

However, the other cache drive, which is part of a RAID1, is now showing as unmountable! I do have a backup, but would rather avoid having to restore if possible!

When I stop the array, the cache shows as a single slot. If I change to 2 slots and add the drive in, it will not let me start the array. I get the following:

What's my next step please?

Cheers!
May 20, 2025 Author
@JorgeB Okay, removed the second drive and changed slots back to 1.

Output from btrfs fi show gives:

Label: none  uuid: 057dcd04-fb86-434a-be64-ee1d0bf433eb
    Total devices 1 FS bytes used 416.00KiB
    devid 1 size 1.00GiB used 126.38MiB path /dev/loop2

Label: none  uuid: ebee1354-a882-4fbd-8b63-3d6a56422b17
    Total devices 2 FS bytes used 530.13GiB
    devid 2 size 931.51GiB used 561.03GiB path /dev/nvme0n1p1
    devid 3 size 931.51GiB used 527.03GiB path /dev/nvme1n1p1
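As a rough aid for reading that output, the sketch below parses `btrfs fi show` text and checks whether every filesystem lists as many member devices as it expects. The `parse_fi_show` helper is hypothetical, and the regular expressions assume the line layout seen in the post above; other btrfs-progs versions may print extra lines, which this sketch ignores.

```python
import re

# Text copied from the btrfs fi show output in the post above.
FI_SHOW = """\
Label: none  uuid: 057dcd04-fb86-434a-be64-ee1d0bf433eb
    Total devices 1 FS bytes used 416.00KiB
    devid 1 size 1.00GiB used 126.38MiB path /dev/loop2

Label: none  uuid: ebee1354-a882-4fbd-8b63-3d6a56422b17
    Total devices 2 FS bytes used 530.13GiB
    devid 2 size 931.51GiB used 561.03GiB path /dev/nvme0n1p1
    devid 3 size 931.51GiB used 527.03GiB path /dev/nvme1n1p1
"""

DEVID_RE = re.compile(
    r"devid\s+(\d+)\s+size\s+(\S+)\s+used\s+(\S+)\s+path\s+(\S+)"
)


def parse_fi_show(text):
    """Return one dict per filesystem with its uuid and member devices."""
    filesystems = []
    current = None
    for line in text.splitlines():
        m = re.search(r"uuid:\s*(\S+)", line)
        if m:
            # A uuid line starts a new filesystem record.
            current = {"uuid": m.group(1), "total_devices": None, "devices": []}
            filesystems.append(current)
            continue
        if current is None:
            continue
        m = re.search(r"Total devices\s+(\d+)", line)
        if m:
            current["total_devices"] = int(m.group(1))
            continue
        m = DEVID_RE.search(line)
        if m:
            current["devices"].append(
                {"devid": int(m.group(1)), "size": m.group(2),
                 "used": m.group(3), "path": m.group(4)}
            )
    return filesystems


if __name__ == "__main__":
    for fs in parse_fi_show(FI_SHOW):
        complete = len(fs["devices"]) == fs["total_devices"]
        paths = [d["path"] for d in fs["devices"]]
        print(fs["uuid"], paths, "complete" if complete else "devices missing")
```

For the pool in question, both NVMe partitions appear under the same uuid, which suggests btrfs itself still sees a two-device filesystem even though the GUI shows a single slot.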
May 20, 2025 Author
Interestingly, if I mount the supposedly failed drive in unassigned devices, I can access it and it seems OK...
May 20, 2025 Author
If I stop the array and switch one drive for the other, it shows the one that failed as disabled.
May 20, 2025 Author
If I remove both but do not start the array, I can mount them both... Interestingly, one shows up as Pool...
May 20, 2025 Author
Ahh, clicked on cache and have managed to remove the pool. I'll try to re-add it now.
May 20, 2025 Author
Okay, removed the pool and re-created it. Little bit nervy, since I wasn't sure if it was going to wipe it, but it seems to have come back up okay...

Now I have a warning:

Event: Unraid Cache disk message
Subject: Warning [MONTY] - pool BTRFS too many profiles (You can ignore this warning when a pool balance operation is in progress)
Description: WD_BLACK_SN850X_1000GB_23230X803108 (nvme0n1)
Importance: warning

Do I need to do a rebalance or something?
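One way to see what the warning might be pointing at is to look at the block group profiles. The sketch below assumes the warning means that a block group type (Data, Metadata or System) exists in more than one profile, for example both RAID1 and single, and that `btrfs filesystem df` prints lines in its usual `Type, PROFILE: total=..., used=...` form. The sample text is made up for illustration and is not taken from this pool, and `mixed_profiles` is a hypothetical helper.

```python
import re
from collections import defaultdict

# Illustrative sample only, NOT output from the pool in this thread.
SAMPLE_DF = """\
Data, RAID1: total=500.00GiB, used=480.00GiB
Data, single: total=30.00GiB, used=25.00GiB
System, RAID1: total=32.00MiB, used=96.00KiB
Metadata, RAID1: total=3.00GiB, used=2.00GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
"""


def profiles_by_type(df_text):
    """Map each block group type to the set of profiles it appears in."""
    profiles = defaultdict(set)
    for line in df_text.splitlines():
        m = re.match(r"\s*(\w+),\s*(\w+):", line)
        if m:
            profiles[m.group(1)].add(m.group(2))
    return dict(profiles)


def mixed_profiles(df_text):
    """Return only the block group types that have more than one profile."""
    return {
        kind: sorted(found)
        for kind, found in profiles_by_type(df_text).items()
        if len(found) > 1
    }


if __name__ == "__main__":
    print(mixed_profiles(SAMPLE_DF))
```

If a type shows up with two profiles, a balance that converts everything back to a single profile is the usual remedy, but it is worth confirming against the real output from the pool before running one.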