Unraid

NVMe Cache Drive Dropping Out

Hello,

I've been having an issue that recently started popping up: one of my NVMe cache drives keeps dropping out. I have four cache pools set up, with the following names:

  • Cache
  • DownloadCache
  • Plexcache (nvme1n1p1; this is the faulty/disappearing one)
  • Systemcache

I have four separate cache pools because when I attempted to run the drives together in btrfs RAID 10, it was a horrible experience. But that isn't the issue here. Each of the four cache pools is a single NVMe drive; none are in any form of RAID, and all are formatted with XFS.

The NVMe drives I am using are XPG_GAMMIX_S50_Lite (https://www.xpg.com/us/xpg/681?tab=specification). All four drives sit on an AORUS Gen4 AIC Adapter card (https://www.gigabyte.com/us/Solid-State-Drive/AORUS-Gen4-AIC-Adaptor/sp#sp). The motherboard is an ASRock Rack ROMED8-2T (https://www.asrockrack.com/general/productdetail.asp?Model=ROMED8-2T#Specifications). The slot the adapter is in is set to x4x4x4x4 bifurcation, with the link speed manually set to PCIe 3.0. Previously I was getting PCIe errors with the drives running at PCIe 4.0, but those errors went away when I forced the speed down to 3.0 (I believe this was a signal-integrity problem).
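Since forcing the slot down to PCIe 3.0 made the earlier errors go away, it may be worth confirming what each drive actually negotiated. Below is a minimal sketch, assuming a Linux host where each controller under /sys/class/nvme has a `device` link to its PCI device exposing the standard `current_link_speed` and `current_link_width` attributes; the function name `nvme_link_status` is hypothetical. A link forced to PCIe 3.0 should report 8 GT/s (PCIe 4.0 is 16 GT/s), and each drive on a x4x4x4x4 slot should report a width of 4.

```python
from pathlib import Path


def nvme_link_status(sysfs_root="/sys/class/nvme"):
    """Return {controller: (link speed, link width)} for each NVMe controller.

    Hypothetical helper: reads the PCI sysfs attributes through each
    controller's 'device' symlink. Controllers without a PCI link are skipped.
    """
    status = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        return status  # no NVMe controllers (or not a Linux host)
    for ctrl in sorted(root.iterdir()):
        dev = ctrl / "device"
        speed_file = dev / "current_link_speed"
        width_file = dev / "current_link_width"
        if not (speed_file.is_file() and width_file.is_file()):
            continue  # e.g. a fabrics controller with no PCI link attributes
        status[ctrl.name] = (
            speed_file.read_text().strip(),
            width_file.read_text().strip(),
        )
    return status


if __name__ == "__main__":
    for name, (speed, width) in nvme_link_status().items():
        print(f"{name}: {speed}, x{width}")
```

If one drive reports a lower speed or width than its siblings, that could hint at a problem with that particular M.2 position or its signal path, though it would not prove it.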

The problem is that the Plexcache pool (nvme1n1p1) sometimes drops out. None of the other NVMe drives on the adapter card show this inconsistent behavior.

Luckily, I have an Elastic cluster that ingests my Unraid server's syslog, so I can see the error messages, but I don't know how to solve the issue. I ran a SMART short self-test on the drive, and it reports "No Errors Logged"; the SMART report is attached. The attached file "Syslog nvme keyword search.csv" contains the syslog filtered for *nvme*, and "Syslog 2 days.csv" contains all syslog data from the past two days.
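To compare how often each drive shows up in the logs, a rough count per device could help. This is a sketch assuming the log has been exported as plain text lines (for example the message field from the Elastic export, since the CSV column layout isn't described here); `count_nvme_messages` and the file name in the usage comment are hypothetical. It groups by the number after "nvme", so "nvme nvme1:", "nvme1n1" and "nvme1n1p1" all count toward nvme1; the namespace number usually, but not always, matches the controller index.

```python
import re
import sys
from collections import Counter

# Matches "nvme1", "nvme1n1" and "nvme1n1p1"; group 1 is the number after "nvme".
NVME_RE = re.compile(r"\bnvme(\d+)(?:n\d+(?:p\d+)?)?\b")


def count_nvme_messages(lines):
    """Count how many log lines mention each nvmeN device (at most once per line)."""
    counts = Counter()
    for line in lines:
        # A set, so a line naming both "nvme1" and "nvme1n1p1" is counted once.
        found = {m.group(1) for m in NVME_RE.finditer(line)}
        for idx in found:
            counts[f"nvme{idx}"] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python count_nvme.py exported_syslog.txt  (hypothetical file name)
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for device, n in count_nvme_messages(fh).most_common():
            print(f"{device}: {n} lines")
```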

If anyone has experienced something like this, please let me know.

Specs:

Motherboard: ROMED8-2T

   BMC Firmware Version: 1.19.00

   BIOS Firmware Version: P3.50

CPU: AMD EPYC 7542

    Cores: 32
    Threads: 64
    Base: 2.9 GHz
    Boost: 3.4 GHz
    Cache: 128MB L3 Cache
    Memory Controller: 3200 MHz
    Memory Channels: 8
    PCI Express Revision: 4.0
    PCI Express Lanes: 128
    Socket: SP3
    TDP: 225W
    Series: AMD EPYC 7002

CPU Cooler: Noctua NH-U9 TR4-SP3
RAM: Kingston 32GB DDR4

    Model: KSM32RD4/32HDR

Flash Storage:

    4 XPG GAMMIX S50 Lite 1TB M.2 2280 PCIe Gen4 x4 NVMe
        XPG_GAMMIX_S50_Lite_2L252LQJ58LY
        XPG_GAMMIX_S50_Lite_2L252LQH8ERH
        XPG_GAMMIX_S50_Lite_2L25292BJACA
        XPG_GAMMIX_S50_Lite_2L2529QB66YE
    1 970 EVO Plus 1TB
        Samsung_SSD_970_EVO_Plus_1TB_S59ANJ0N123475B (Unused)

Case: Norco RPC-4220
NVME PCIe 4.0 Adapter: AORUS Gen4 AIC Adapter

    Model: GC-4XM2G4

Attachments: XPG_GAMMIX_S50_Lite_2L252LQH8ERH-20230506-1424.txt, Syslog nvme keyword search.csv, Syslog 2 days.csv

Solved by JorgeB

  • Community Expert
  • Solution

This can help in some cases: on the Main GUI page, click on the flash drive, scroll down to "Syslinux Configuration", make sure it's set to "menu view" (top right), and add this to your default boot option, after "append initrd=/bzroot":

nvme_core.default_ps_max_latency_us=0 pcie_aspm=off

e.g.:

append initrd=/bzroot nvme_core.default_ps_max_latency_us=0 pcie_aspm=off


Reboot and see if it makes a difference.
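After rebooting, a quick check can confirm the new options were actually applied. For context, nvme_core.default_ps_max_latency_us=0 disables NVMe Autonomous Power State Transitions (APST) and pcie_aspm=off disables PCIe Active State Power Management; the usual rationale is that some drive/platform combinations misbehave when entering or leaving those low-power states. A minimal sketch, assuming a Linux host exposing /proc/cmdline and the nvme_core module parameter under /sys/module; the helper names are hypothetical.

```python
from pathlib import Path

# The two options suggested above.
WANTED = ("nvme_core.default_ps_max_latency_us=0", "pcie_aspm=off")
APST_PARAM = "/sys/module/nvme_core/parameters/default_ps_max_latency_us"


def missing_boot_params(cmdline, wanted=WANTED):
    """Return the wanted options that do not appear as tokens in a kernel command line."""
    tokens = set(cmdline.split())
    return [p for p in wanted if p not in tokens]


def apst_latency_us(path=APST_PARAM):
    """Return nvme_core's APST latency limit (0 means APST disabled), or None if unreadable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    cmdline_file = Path("/proc/cmdline")
    cmdline = cmdline_file.read_text() if cmdline_file.exists() else ""
    print("missing options:", missing_boot_params(cmdline) or "none")
    print("default_ps_max_latency_us:", apst_latency_us())
```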

  • Author

On it; I'll let you know the results. It may take a while to determine whether the solution has worked. Thank you for responding.

  • 6 months later...
  • Author

It's been a while.

During my testing, I was unable to solve the issue of the NVMe drives dropping out while all four were in RAID 10 with btrfs as the filesystem. I ended up moving back to XFS with all four drives as independent cache pools. I also find it humorous that soon after I started testing RAID cache pools, ZFS became officially supported by Unraid. 🤷‍♂️

Unfortunately, in the interim before ZFS was officially supported, I needed to upgrade one of the NVMe drives from 1TB to 4TB, which limits my ability to use it in a ZFS pool alongside the 1TB drives. So rather than moving a ton of data around just to go back to a smaller drive, I will instead upgrade the other three NVMe drives over time and then use RAID 10 or RAIDZ2, depending on my storage needs at the time.
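The reason the single 4TB drive doesn't help yet is that redundant layouts are sized by their smallest members. A rough sketch of that arithmetic, with hypothetical helper names, ignoring ZFS metadata/padding overhead and the TB vs TiB difference: in RAID 10 each mirrored pair contributes its smaller drive, and a single RAIDZ2 vdev gives roughly (number of drives - 2) x the smallest drive.

```python
def raid10_usable_tb(mirror_pairs):
    """Approximate usable TB of striped mirrors: each pair contributes its smaller drive."""
    return sum(min(a, b) for a, b in mirror_pairs)


def raidz2_usable_tb(drives):
    """Approximate usable TB of one RAIDZ2 vdev: (n - 2) x the smallest drive."""
    if len(drives) <= 2:
        raise ValueError("RAIDZ2 needs more than two drives (two are parity)")
    return (len(drives) - 2) * min(drives)


if __name__ == "__main__":
    mixed = [4, 1, 1, 1]      # one upgraded 4TB drive plus three 1TB drives
    upgraded = [4, 4, 4, 4]   # all four drives upgraded to 4TB
    print("mixed RAID 10:", raid10_usable_tb([(4, 1), (1, 1)]), "TB")
    print("mixed RAIDZ2:", raidz2_usable_tb(mixed), "TB")
    print("all-4TB RAID 10:", raid10_usable_tb([(4, 4), (4, 4)]), "TB")
    print("all-4TB RAIDZ2:", raidz2_usable_tb(upgraded), "TB")
```

With the current mix, both layouts come out at about 2 TB, the same as four 1TB drives, so the extra 3TB on the upgraded drive would go unused; with all four drives at 4TB, both come out at about 8 TB.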

I'm not really sure why I had such a bad experience with btrfs, and I'm sure others have had great success with it. But my opinion is that ZFS is probably safer and more stable anyway. Thanks for the help, @JorgeB.
