May 6, 2023

Hello, I've been having an issue that recently started popping up: one of my NVMe cache drives keeps dropping. I have four cache pools set up, with the following names:

Cache
DownloadCache
Plexcache (this is the faulty/disappearing one, device nvme1n1p1)
Systemcache

I have four cache pools because when I attempted to run them in btrfs RAID 10 it was horrible. But that isn't the issue here. All four cache pools are single NVMe drives; none are in any form of RAID. They are formatted with XFS.

The NVMe drives I am using are XPG_GAMMIX_S50_Lite (https://www.xpg.com/us/xpg/681?tab=specification). I have all four drives running on the HBA card AORUS Gen4 AIC Adapter (https://www.gigabyte.com/us/Solid-State-Drive/AORUS-Gen4-AIC-Adaptor/sp#sp). The motherboard I am using is the ASRock Rack ROMED8-2T (https://www.asrockrack.com/general/productdetail.asp?Model=ROMED8-2T#Specifications).

The slot the HBA is in is set to 4x4x4x4 mode with the speed manually set to PCIe 3.0. Previously I was getting PCIe errors with the drives running at PCIe 4.0, but those errors went away when I forced the speed down to 3.0 (I think this was due to communication errors/signal integrity?).

The error I am having is that the cache pool named Plexcache will sometimes drop out. This device is nvme1n1p1. None of the other NVMe drives on the HBA are exhibiting this error or inconsistent behavior.

Luckily I have an Elastic cluster that ingests my Unraid server's syslog, so I can see the error messages, but I don't know how to solve the issue. I have run a SMART short self-test on the drive and it reports "No Errors Logged". The SMART report is attached. The attached file "Syslog nvme keyword search.csv" contains the syslog filtered for *nvme*. The attached file "Syslog 2 days.csv" contains all syslog data from the past two days.

If anyone has experienced something like this, please let me know.
Specs:

Motherboard: ROMED8-2T
  BMC Firmware Version: 1.19.00
  BIOS Firmware Version: P3.50

CPU: AMD EPYC 7542
  Cores: 32
  Threads: 64
  Base: 2.9 GHz
  Boost: 3.4 GHz
  Cache: 128MB L3 Cache
  Memory Controller: 3200 MHz
  Memory Channels: 8
  PCI Express Revision: 4.0
  PCI Express Lanes: 128
  Socket: SP3
  TDP: 225W
  Series: AMD EPYC 7002

CPU Cooler: Noctua NH-U9 TR4-SP3

RAM: Kingston 32GB DDR4
  Model: KSM32RD4/32HDR

Flash Storage:
  4x XPG GAMMIX S50 Lite 1TB M.2 2280 PCIe Gen4x4 NVMe
    XPG_GAMMIX_S50_Lite_2L252LQJ58LY
    XPG_GAMMIX_S50_Lite_2L252LQH8ERH
    XPG_GAMMIX_S50_Lite_2L25292BJACA
    XPG_GAMMIX_S50_Lite_2L2529QB66YE
  1x 970 EVO Plus 1TB
    Samsung_SSD_970_EVO_Plus_1TB_S59ANJ0N123475B (Unused)

Case: Norco RPC-4220

NVMe PCIe 4.0 Adapter: AORUS Gen4 AIC Adapter
  Model: GC-4XM2G4

Attachments:
  XPG_GAMMIX_S50_Lite_2L252LQH8ERH-20230506-1424.txt
  Syslog nvme keyword search.csv
  Syslog 2 days.csv
May 7, 2023, Community Expert, Solution

This can help in some cases: on the main GUI page click on the flash drive, scroll down to "Syslinux Configuration", make sure it's set to "menu view" (top right), and add this to your default boot option, after "append initrd=/bzroot":

nvme_core.default_ps_max_latency_us=0 pcie_aspm=off

e.g.:

append initrd=/bzroot nvme_core.default_ps_max_latency_us=0 pcie_aspm=off

Reboot and see if it makes a difference.
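After the reboot, it can be worth confirming that both parameters actually reached the kernel, since an edit to the wrong boot entry is easy to miss. Below is a minimal sketch of such a check. It assumes Python 3 is available on the server (it may need to be installed separately), and the names `REQUIRED_PARAMS` and `missing_boot_params` are made up for this example. It only reads `/proc/cmdline`, the standard Linux file holding the command line the running kernel was booted with.

```python
from pathlib import Path

# The two parameters suggested in the post above.
REQUIRED_PARAMS = ("nvme_core.default_ps_max_latency_us=0", "pcie_aspm=off")


def missing_boot_params(cmdline, required=REQUIRED_PARAMS):
    """Return the required parameters that are absent from a kernel command line."""
    present = set(cmdline.split())
    return [param for param in required if param not in present]


def main():
    proc_cmdline = Path("/proc/cmdline")
    if not proc_cmdline.exists():
        # Not a Linux system (or /proc is not mounted); nothing to check.
        print("No /proc/cmdline found; run this on the server itself.")
        return
    missing = missing_boot_params(proc_cmdline.read_text())
    if missing:
        print("Not on the kernel command line:", " ".join(missing))
    else:
        print("Both parameters are on the kernel command line.")


if __name__ == "__main__":
    main()
```

Running `cat /proc/cmdline` in a terminal and reading the line by eye gives the same information without Python.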
May 7, 2023, Author

8 hours ago, JorgeB said: This can help in some cases [...]

On it, will let you know the results. It may take a while to determine whether the solution has worked. Thank you for responding.
December 7, 2023, Author

Been a while. During my testing I was unable to solve the issues related to the NVMe drives dropping out while all four were in RAID 10 with btrfs as the filesystem. I ended up having to move back to XFS with all four drives as independent cache pools.

Also, I do find it humorous that soon after I started testing RAID cache pools, ZFS was officially supported by Unraid. 🤷‍♂️

Unfortunately, in the interim before ZFS was officially supported, I needed to upgrade one of the NVMe drives from 1TB to 4TB, which limits my ability to use it in a ZFS pool. So instead of moving a ton of data around just to go back to a smaller drive, I will upgrade the other three NVMe drives over time and then use RAID 10 or RAIDZ2, depending on my storage size needs at the time.

I'm not really sure why I had such a bad experience with btrfs, and I'm sure others have had great success with it. But my opinion is that ZFS is probably safer and more stable anyway.

Thanks for the help @JorgeB.