August 18, 2024
I have a BTRFS mirror of 2 NVMe drives. Twice now the same NVMe drive has "failed", yet the volume shows as healthy and there are no warnings in the UI about the failed disk. If I reboot the system, the drive comes back online and a scrub with repair resolves the problem. It is concerning that the UI shows the volume as healthy. The only clue is that the drive reports no temperature, just a *. Any thoughts? This is a new Samsung 990 Pro drive and it has the latest firmware. See attached images. I am on Unraid 6.12.11. Do you think the drive is bad, or the controller? I am running this system on an HP EliteDesk 800 G6 and using the 2 motherboard slots for the NVMe drives. Also, why didn't Unraid report the failed array/drive?
Edited August 18, 2024 by ihavoc: add more information
August 19, 2024 Community Expert Solution
Aug 18 08:47:37 unraid kernel: nvme nvme1: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0xffff
Aug 18 08:47:37 unraid kernel: nvme nvme1: Does your device have a faulty power saving mode enabled?
Aug 18 08:47:37 unraid kernel: nvme nvme1: Try "nvme_core.default_ps_max_latency_us=0 pcie_aspm=off" and report a bug
The NVMe device is dropping offline. Try adding those options to syslinux, after /bzroot, then reboot and retest. As for the lack of a warning, pools are not currently monitored by the GUI. This is an old issue; see here for now.
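Since the GUI does not flag pool errors, one stopgap is to check the btrfs per-device error counters yourself. The following is only a rough sketch, not an official Unraid tool: it parses the text printed by `btrfs device stats` and lists devices with non-zero counters. The function names and the `/mnt/cache` mount point are assumptions for illustration; adjust the path to your own pool.

```python
import re
import subprocess

# Matches lines such as "[/dev/nvme1n1p1].write_io_errs    12"
STAT_LINE = re.compile(r"^\[(?P<dev>[^\]]+)\]\.(?P<counter>\w+)\s+(?P<value>\d+)$")


def parse_device_stats(text):
    """Return {device: {counter: value}} from `btrfs device stats` output."""
    stats = {}
    for line in text.splitlines():
        m = STAT_LINE.match(line.strip())
        if m:
            stats.setdefault(m["dev"], {})[m["counter"]] = int(m["value"])
    return stats


def devices_with_errors(stats):
    """Keep only devices that have at least one non-zero error counter."""
    return {
        dev: {name: count for name, count in counters.items() if count}
        for dev, counters in stats.items()
        if any(counters.values())
    }


def check_pool(mount_point="/mnt/cache"):
    """Run `btrfs device stats` on the pool (assumed path) and report errors."""
    out = subprocess.run(
        ["btrfs", "device", "stats", mount_point],
        capture_output=True, text=True, check=True,
    ).stdout
    return devices_with_errors(parse_device_stats(out))
```

Calling `check_pool()` from a scheduled script and sending a notification when it returns a non-empty dict would give an early warning when a device starts logging errors. Note that the counters persist until they are reset, so old errors keep showing up after a successful scrub.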
August 19, 2024 Author
Thank you @JorgeB. I have made the syslinux change and removed all power management options in the BIOS. Normally, if I have an issue, it can take a week or more to show up. I will report back if I see any issues.
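Because the problem can take a week or more to reappear, it may be worth confirming after the reboot that the kernel actually picked up the new options. Here is a minimal sketch, assuming the standard Linux `/proc/cmdline` file is readable on the server; the helper names are made up for illustration.

```python
# Options suggested by the kernel log message quoted above.
REQUIRED = {
    "nvme_core.default_ps_max_latency_us": "0",
    "pcie_aspm": "off",
}


def parse_cmdline(cmdline):
    """Split a kernel command line into {key: value}; bare flags map to None."""
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def missing_options(cmdline, required=REQUIRED):
    """Return the required options that are absent or set to another value."""
    params = parse_cmdline(cmdline)
    return [f"{key}={value}" for key, value in required.items()
            if params.get(key) != value]


def check_running_kernel(path="/proc/cmdline"):
    """Check the command line of the currently running kernel."""
    with open(path) as f:
        return missing_options(f.read())
```

An empty list from `check_running_kernel()` means both options are present on the running kernel. Anything else suggests the syslinux edit was made under a different boot entry or was not saved.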