technicavivunt Posted March 29, 2022

Looks like my Google-Fu has failed me, but maybe someone can shed some insight. My XPG S70 Blade seems to have connection dropouts with no rhyme or reason that I can find. It'll be fine for a few hours, then all of a sudden I get a notification that the device is missing.

Mar 29 11:00:27 TheRedQueen kernel: nvme nvme2: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0xFFFF

I tried passing it through to a VM to check its firmware for an update, but to no avail. With the passthrough in place, the error has shifted to:

TheRedQueen kernel: vfio-pci 0000:02:00.0: VPD access failed. This is likely a firmware bug on this device. Contact the card vendor for a firmware update

I should mention that in the VM the drive never actually drops out of the OS, but in Unraid it definitely does, and a reboot is necessary to get it back on track. I've reseated it once just in case, but the issue seems to happen about once every 24 hours. Thoughts?

Supermicro M12SWA-TF
AMD Ryzen Threadripper PRO 3955WX
NVIDIA GTX 1060 6GB (for transcoding purposes)
2x LSI 9202-16e HBAs
LSI 9272-8i HBA
2x T-Force Cardea 1TB (cache) in an ASUS Hyper M.2 expansion card (bifurcated x4/x4/x4/x4)
Seasonic PRIME 1000W Platinum PSU

theredqueen-diagnostics-20220329-1129.zip
JorgeB Posted March 29, 2022

The below might help and it's worth a shot; if not, the best bet is a different NVMe device (or a different board). Some NVMe devices have issues with power states on Linux. Try this: on the main GUI page click on the flash device, scroll down to "Syslinux Configuration", make sure it's set to "menu view" (on the top right), and add this to your default boot option after "append initrd=/bzroot":

nvme_core.default_ps_max_latency_us=0

e.g.:

append initrd=/bzroot nvme_core.default_ps_max_latency_us=0

Reboot and see if it makes a difference.
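For reference, the full default boot stanza in syslinux.cfg would then look something like this (a sketch assuming the stock Unraid label name; a customized flash may differ):

```
label Unraid OS
  menu default
  kernel /bzimage
  append initrd=/bzroot nvme_core.default_ps_max_latency_us=0
```

Setting the max latency to 0 tells the kernel's NVMe driver to disable APST (autonomous power state transitions), which keeps the drive out of the deep low-power states that trip up some controllers.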
technicavivunt (Author) Posted March 29, 2022

Looks like I'm still getting this while the SSD is passed through, even after the Syslinux Configuration change. I'm going to poke around in my BIOS and turn off C-states if possible when I get home, to see if that's the root cause. Definitely feels power-management related.

vfio-pci 0000:02:00.0: VPD access failed. This is likely a firmware bug on this device. Contact the card vendor for a firmware update
technicavivunt (Author) Posted March 30, 2022

Didn't find much regarding C-states, but in the BIOS there's an option under the NVMe configuration for AMI firmware versus vendor firmware. Selecting AMI firmware seems to have resolved the issue (at least over the past 12+ hours). I'll test it with some less important stuff over the next few days just in case and will post an update.
technicavivunt (Author) Posted April 3, 2022

After some testing, it looks like the combination of using the AMI firmware for the NVMe drives and @JorgeB's kernel-parameter fix in Unraid has kept the drive stable for the last few days.
Splash Posted December 8, 2023

I don't have the drives (BTRFS RAID1) being dropped by the Unraid OS, but I do see a lot of these in my dmesg logs. I'm also going to try the kernel option to see if it helps.

[320250.699810] nvme 0000:03:00.0: VPD access failed. This is likely a firmware bug on this device. Contact the card vendor for a firmware update
[320250.829812] nvme 0000:06:00.0: VPD access failed. This is likely a firmware bug on this device. Contact the card vendor for a firmware update
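To see whether the VPD errors are spread across devices or hammering one drive, a quick tally per PCI address can help. A minimal sketch (the sample lines below stand in for live output; on a real box pipe `dmesg` in instead):

```shell
# Sample dmesg lines; replace with:  dmesg | grep 'VPD access failed'
log='[320250.699810] nvme 0000:03:00.0: VPD access failed. This is likely a firmware bug on this device. Contact the card vendor for a firmware update
[320250.829812] nvme 0000:06:00.0: VPD access failed. This is likely a firmware bug on this device. Contact the card vendor for a firmware update'

# Extract the PCI address from each failure line and count occurrences per device.
printf '%s\n' "$log" | grep 'VPD access failed' \
  | grep -o '[0-9a-f]\{4\}:[0-9a-f]\{2\}:[0-9a-f]\{2\}\.[0-7]' \
  | sort | uniq -c
```

If one address dominates the count, that drive (or its slot) is the one to focus on; an even spread points more toward a shared cause like the platform or power management.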