January 16, 2025
I have a ZFS cache pool (raidz1) that randomly becomes degraded when one of its NVMe drives drops offline. The syslog shows the following:
zio pool=cache vdev=/dev/nvme6n1p1 error=5 type=6 offset=189249257472 size=45056 flags=524480
The 7 NVMe drives are attached to a PLX8749 card (which supports up to 8 NVMe drives). All 7 drives are healthy, and a different drive drops out each time. I don't know why.
tower-diagnostics-20250116-2115.zip
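For anyone reading a similar line, here is a minimal sketch that splits the `zio` message above into its fields and translates the numeric `error` value into its errno name. The function name `parse_zio_line` is made up for illustration; the only fact relied on is that errno 5 is `EIO` (a generic I/O error), which fits a device that stopped responding.

```python
import errno


def parse_zio_line(line: str) -> dict:
    """Split a 'zio key=value ...' syslog fragment into a dict of strings."""
    fields = {}
    for token in line.split():
        if "=" in token:
            key, _, value = token.partition("=")
            fields[key] = value
    return fields


# The exact line quoted in the post above.
sample = (
    "zio pool=cache vdev=/dev/nvme6n1p1 error=5 type=6 "
    "offset=189249257472 size=45056 flags=524480"
)

fields = parse_zio_line(sample)
# errno.errorcode maps numeric errno values to their symbolic names.
err_name = errno.errorcode.get(int(fields["error"]), "unknown")
print(fields["vdev"], err_name)  # prints "/dev/nvme6n1p1 EIO"
```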
January 16, 2025 Community Expert Solution
Jan 16 21:05:41 Tower kernel: nvme nvme6: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0x10
Jan 16 21:05:41 Tower kernel: nvme nvme6: Does your device have a faulty power saving mode enabled?
Jan 16 21:05:41 Tower kernel: nvme nvme6: Try "nvme_core.default_ps_max_latency_us=0 pcie_aspm=off pcie_port_pm=off" and report a bug
The NVMe device is dropping offline. Try adding those options to syslinux.cfg.
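As a rough sketch of what "adding those options" means: the three parameters go at the end of the `append` line in syslinux.cfg. The snippet below does that on a config held in a string, and skips any parameter that is already present. The sample config layout (`append initrd=/bzroot`) is an assumption about a typical Unraid boot entry, and `add_kernel_params` is a hypothetical helper, not an Unraid tool. Editing the file by hand works just as well.

```python
# Parameters suggested by the kernel message in the post above.
NVME_PARAMS = [
    "nvme_core.default_ps_max_latency_us=0",
    "pcie_aspm=off",
    "pcie_port_pm=off",
]


def add_kernel_params(cfg_text: str, params=NVME_PARAMS) -> str:
    """Append any missing params to every 'append' line of a syslinux config."""
    out = []
    for line in cfg_text.splitlines():
        tokens = line.split()
        if tokens and tokens[0].lower() == "append":
            missing = [p for p in params if p not in tokens[1:]]
            if missing:
                line = line.rstrip() + " " + " ".join(missing)
        out.append(line)
    return "\n".join(out) + "\n"


# Assumed example of a boot entry; a real syslinux.cfg may differ.
sample_cfg = """label Unraid OS
  menu default
  kernel /bzimage
  append initrd=/bzroot
"""

print(add_kernel_params(sample_cfg))
```

Note that this sketch changes every `append` line, so all boot entries get the parameters. Keep a backup of the original file, and reboot afterwards, since kernel parameters are only read at boot.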
January 17, 2025 Author
11 hours ago, JorgeB said:
Jan 16 21:05:41 Tower kernel: nvme nvme6: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0x10
Jan 16 21:05:41 Tower kernel: nvme nvme6: Does your device have a faulty power saving mode enabled?
Jan 16 21:05:41 Tower kernel: nvme nvme6: Try "nvme_core.default_ps_max_latency_us=0 pcie_aspm=off pcie_port_pm=off" and report a bug
NVMe is dropping offline, try adding those options to syslinux.cfg

Many thanks. I have tested the zpool for several hours with the settings "nvme_core.default_ps_max_latency_us=0 pcie_aspm=off pcie_port_pm=off", and everything is fine.
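To confirm the settings actually took effect after a reboot, one option is to compare them against the running kernel's command line, which Linux exposes in /proc/cmdline. A small sketch follows; the helper name `missing_params` and the sample command line are illustrative assumptions.

```python
from pathlib import Path

WANTED = [
    "nvme_core.default_ps_max_latency_us=0",
    "pcie_aspm=off",
    "pcie_port_pm=off",
]


def missing_params(cmdline: str, wanted=WANTED) -> list:
    """Return the wanted parameters that do not appear in the command line."""
    active = set(cmdline.split())
    return [p for p in wanted if p not in active]


# Assumed example of a command line where one option was left out.
example = "BOOT_IMAGE=/bzimage initrd=/bzroot pcie_aspm=off pcie_port_pm=off"
print(missing_params(example))  # prints ['nvme_core.default_ps_max_latency_us=0']

# On a Linux system, check the live command line as well.
proc = Path("/proc/cmdline")
if proc.exists():
    print(missing_params(proc.read_text()))
```

An empty list from the second call means all three options are active on the running system.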