cache drive dying?

I didn't want to hijack @Elmojo 's thread about his failing cache drive, https://forums.unraid.net/topic/199306-cache-pool-errors-failing-drive-or/

But it seems I have a very similar problem, at a similar time, with a similar nvme ssd - a Samsung 990 Pro. Passes all tests and doesn't directly report any read / crc / anything else errors.

Last week I came back from vacation and Unraid reported the drive was offline - I forget the wording as I thought it was just a glitch and rebooted the server - to find my cache gone. Rebooted a few times, checked a couple things, nothing changed.

Server was kept off for a couple days while I ordered an NVMe-to-USB dock to just check the drive - because isn't that what we'd all do? :D Disk seemed good, went back into the server and booted up - everything worked. Hmm.. ok weird, right?

Fast forward until tonight, about an hour ago the cache share disappeared again - no messages from Unraid other than console logs such as:

UNRAID kernel: nvme nvme1: I/O tag 735 (c2df) opcode 0x2 (I/O Cmd) QID 14 timeout, aborting req_op:READ(0) size:131072

UNRAID kernel: I/O error, dev nvme1n1, sector 1954447249 op 0x1:(WRITE) flags 0x29800 phys_seg 1 prio class 2

UNRAID kernel: I/O error, dev loop3, sector 75840 op 0x0:(READ) flags 0x1000 phys_seg 4 prio class 2

UNRAID kernel: BTRFS error (device loop3): bdev /dev/loop3 errs: wr 0, rd 7, flush 0, corrupt 0, gen 0

UNRAID emhttpd: device /dev/nvme1n1 has size zero
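
For anyone wanting to track how often this happens, here is a minimal Python sketch that counts the three kinds of messages quoted above per device. The regexes are written against these exact lines, so other kernel versions may word things differently, and the names `PATTERNS` and `summarize` are made up for illustration.

```python
import re
from collections import Counter

# Patterns derived from the log lines quoted in this post (assumption:
# other kernels may phrase these messages differently).
PATTERNS = {
    "nvme_timeout": re.compile(r"nvme (nvme\d+): I/O tag \d+ .* timeout"),
    "io_error": re.compile(r"I/O error, dev (\w+), sector \d+"),
    "size_zero": re.compile(r"device /dev/(\w+) has size zero"),
}


def summarize(lines):
    """Count matching messages, keyed by (message kind, device name)."""
    counts = Counter()
    for line in lines:
        for kind, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                counts[(kind, match.group(1))] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "UNRAID kernel: nvme nvme1: I/O tag 735 (c2df) opcode 0x2 (I/O Cmd) "
        "QID 14 timeout, aborting req_op:READ(0) size:131072",
        "UNRAID kernel: I/O error, dev nvme1n1, sector 1954447249 "
        "op 0x1:(WRITE) flags 0x29800 phys_seg 1 prio class 2",
        "UNRAID emhttpd: device /dev/nvme1n1 has size zero",
    ]
    for (kind, device), count in sorted(summarize(sample).items()):
        print(kind, device, count)
```

To run it against a live log, pass an open file instead of the sample list, e.g. `summarize(open("/var/log/syslog"))` (the path is an assumption about where the syslog lives on your system).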

This time I rebooted, and everything is fine again. Preemptively I'm going to start backing up the cache drive to the array like Elmojo is doing, but does this all point to a failing ssd or should I be looking at some other cause?

Diagnostic file doesn't have anything of interest relating to this drive other than the logs above, and I've been running 7.2.4 for quite some time so it's not a recent upgrade issue.

  • Community Expert

Try this: on Main, click on the flash drive, scroll down to "Syslinux Configuration", make sure it's set to "menu view", and add this to your default boot option, after "append initrd=/bzroot":

nvme_core.default_ps_max_latency_us=0 pcie_aspm=off pcie_port_pm=off

e.g.:

append initrd=/bzroot nvme_core.default_ps_max_latency_us=0 pcie_aspm=off pcie_port_pm=off

Reboot and then see if it makes a difference; if it still drops, post the diagnostics the next time it happens.
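
After the reboot it can be worth confirming that the options actually reached the running kernel. The following is a rough Python sketch, assuming a Linux system where the active boot line is readable at `/proc/cmdline`; the helper name `missing_options` is hypothetical.

```python
from pathlib import Path

# The three options suggested above.
WANTED = [
    "nvme_core.default_ps_max_latency_us=0",
    "pcie_aspm=off",
    "pcie_port_pm=off",
]


def missing_options(cmdline, wanted=WANTED):
    """Return the wanted options that do not appear in a kernel command line."""
    present = set(cmdline.split())
    return [option for option in wanted if option not in present]


if __name__ == "__main__":
    path = Path("/proc/cmdline")  # assumption: running on Linux
    if path.exists():
        missing = missing_options(path.read_text())
        if missing:
            print("Not active:", " ".join(missing))
        else:
            print("All three options are active.")
```

An empty result means all three options are on the boot line; anything listed was either not saved in the Syslinux configuration or was added to a boot entry other than the one that was started.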

  • Author

The only change from what's currently there is the pcie_port_pm=off

Original:

label Unraid OS

menu default

kernel /bzimage

append initrd=/bzroot nvme_core.default_ps_max_latency_us=0 pcie_aspm=off

I'll give this a go and see what happens. Thanks!

It's very peculiar that this has only just started, and that it's occurred a week apart. If the SSD were dying I would expect a much more consistent problem.

  • Author

@JorgeB So things have been running fine for the last week, until today. Same problem occurred out of nowhere.

Disk Location: nvme0n1 Alert - Device failure

Samsung SSD 990 PRO with Heatsink 2TB

Jun 18 18:49:04 UNRAID kernel: I/O error, dev loop3, sector 76160 op 0x0:(READ) flags 0x1000 phys_seg 4 prio class 2

Jun 18 18:49:04 UNRAID kernel: BTRFS error (device loop3): bdev /dev/loop3 errs: wr 0, rd 1, flush 0, corrupt 0, gen 0

I find it a little concerning/coincidental that there have been a few threads recently about NVMe cache drive problems, all occurring around the same time out of nowhere. If the drive were failing I would expect it to be failing daily, not running fine for a week without issue. It's as if something on the system is causing a breakdown. I have 2 identical NVMe drives that were installed at the same time; granted, they're used for different purposes and the cache drive gets used with more activity, but this is very strange. I ran SATA SSD cache drives for years without problems, and now the NVMe for ~2 years without a problem, so I'm not sure what's going on. Is it the drive or something in Unraid?

unraid-diagnostics-20260618-1848.zip

  • Author

I rebooted the server and had the same problem upon start, drive missing.

I shut down the server (power off) and turned it back on and drive is online.

  • Community Expert

Jun 18 03:11:30 UNRAID kernel: nvme nvme0: Abort status: 0x371

Jun 18 03:11:51 UNRAID kernel: nvme nvme0: Device not ready; aborting reset, CSTS=0x1

Jun 18 03:11:51 UNRAID kernel: nvme nvme0: Disabling device after reset failure: -19

The NVMe device is still dropping offline; if the kernel options don't help, the best bet is to use a different brand/model device (or a different board).
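
For anyone decoding the two numbers in those log lines, here is a small Python sketch. It assumes the standard NVMe controller status (CSTS) register layout (bit 0 RDY, bit 1 CFS, bits 3:2 SHST) and that the trailing -19 is a negated Linux errno; the helper name `decode_csts` is made up for illustration.

```python
import errno
import os


def decode_csts(value):
    """Split an NVMe CSTS register value into its low status fields."""
    return {
        "RDY": bool(value & 0x1),    # controller ready
        "CFS": bool(value & 0x2),    # controller fatal status
        "SHST": (value >> 2) & 0x3,  # shutdown status
    }


if __name__ == "__main__":
    print(decode_csts(0x1))
    # The kernel logs the error as a negative errno, so look up its absolute value.
    code = abs(-19)
    print(errno.errorcode[code], "-", os.strerror(code))
```

With CSTS=0x1 only the RDY bit is set, and errno 19 is ENODEV. Both are at least consistent with the controller becoming unreachable rather than the drive reporting bad data, though that is a reading of the logs and not a diagnosis.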

7 hours ago, Energen said:

I rebooted the server and had the same problem upon start, drive missing.

I shut down the server (power off) and turned it back on and drive is online.

This is normal when this happens; a power cycle is required to bring the device back, not just a reboot.
