Dual NVME Cache Drive Errors

New system build and new to Unraid, so this could be something simple. I've been having a weird issue where enabling cache disks on a share causes the cache drives to error out and forces a reboot. The cache drives are 2x Samsung M.2 NVME drives set up in a pool for redundancy. Whenever I use Krusader to transfer files from a network NAS, it will 100% error out and require a reboot if the share is using the cache drives. I've tested with cache drive usage disabled for the share and get no errors.

 

Diagnostic logs attached, snapshot taken from non-error state

syslog attached from error state

 

Any help or insight would be appreciated because I've spent hours trying to isolate this issue and it's driving me nuts.

empunraid-diagnostics-20191110-0800.zip empunraid-syslog-20191110-0700.zip

  • Author

Update:

I've tried taking both NVME drives out of the cache pool and clearing them - Crashed Unraid

I've tried taking both NVME drives out of the cache pool and checking them - Crashed Unraid

I've tried breaking the pool, reformatting a single NVME drive as XFS and re-adding as a cache drive - Crashed Unraid

 

Edited by dis3as3d

One of the NVMe devices is dropping offline; this looks like a hardware problem.

  • Author

The strange thing is it happens to either of the two NVME drives independently.  I've tested both drives independently and they both fall offline whenever I initiate a large data transfer or I/O heavy operation.  I may be holding out hope, but I'm wondering if this could be a drivers issue.

Just now, dis3as3d said:

I may be holding out hope, but I'm wondering if this could be a drivers issue.

It could, but IMHO more likely a board/controller/bios issue.

  • Author

I flashed the BIOS to the latest version last night as well, no dice. Agreed it could be a MOBO/controller issue, but the fact it only happens under heavy I/O feels more like software. There's also this long thread dating back to 2017 (seriously, why is a bug this old still open?!) about issues with some Samsung NVME drives and Unraid. It seems strangely similar, and the thread goes on to talk about sector sizes on some Samsung NVME drives causing issues.

 

I'm new to Linux so I've got no clue where to start troubleshooting this.  Might just give up on Unraid and run Windows. 

 

 

31 minutes ago, dis3as3d said:

Seems strangely similar,

Not at all, since no devices drop offline in that case.

52 minutes ago, dis3as3d said:

but the fact it only happens under heavy I/O

Maybe simple overheating of the controller? How are these drives connected to the Mobo?

  • Author
10 minutes ago, uldise said:

Maybe simple overheating of the controller? How are these drives connected to the Mobo?

The Mobo has 3 M.2 slots directly on the board. The drives are in slots 0 and 1 at the moment. The other HDDs are running off an LSI 9211, since NVME shares a bus with the SATA ports. I haven't smelled any burning, and I even ran an IR gun over the board and drives looking for hot spots and didn't find anything. I think I'd have to be very specific about where I'm reading the temp, so I'm not sure I would've caught the controller overheating.

 

*Edit - The drives crash in 2-5 min of starting a heavy I/O operation as well. I'd expect any overheating to take longer than that.

Edited by dis3as3d
update

14 minutes ago, dis3as3d said:

The drives crash in 2-5 min of starting a heavy I/O operation as well

This is more than enough time to start overheating if your case doesn't have good ventilation.

But you can run a simple test: place a fan near the drives to blow hot air away from them.

I have no personal experience with NVMe drives, but I see many PCIe NVMe controllers that come with their own fans.

  • Author
5 minutes ago, uldise said:

This is more than enough time to start overheating if your case doesn't have good ventilation.

But you can run a simple test: place a fan near the drives to blow hot air away from them.

I have no personal experience with NVMe drives, but I see many PCIe NVMe controllers that come with their own fans.

The drives themselves aren't overheating, I'll have to look up where the controller is on the board and give that a test.

  • Author

Oh, looks like NVME is controlled by the southbridge and not a separate chip. I definitely turned the IR gun on the southbridge. Don't think that's the issue.

  • Author

New Theory: NVME drives go into a lower power standby mode.  Samsung drives in particular seem to give Linux problems.  While in standby you can still do low I/O transfers, and that aligns with my issue where I can see the drive and even write to it some, but larger I/O transfers give me problems.

 

The recommended fix is to add this to the syslinux.cfg:

nvme_core.default_ps_max_latency_us=5500

Being new to Linux I tried and couldn't get it working.  Below is my edited syslinux.cfg.  What did I do wrong?

default menu.c32
menu title Lime Technology, Inc.
prompt 0
timeout 50
label Unraid OS
  menu default
  kernel /bzimage
  append initrd=/bzroot
  append nvme_core.default_ps_max_latency_us=5500
label Unraid OS GUI Mode
  kernel /bzimage
  append initrd=/bzroot,/bzroot-gui
label Unraid OS Safe Mode (no plugins, no GUI)
  kernel /bzimage
  append initrd=/bzroot unraidsafemode
label Unraid OS GUI Safe Mode (no plugins)
  kernel /bzimage
  append initrd=/bzroot,/bzroot-gui unraidsafemode
label Memtest86+
  kernel /memtest

 

Don't use a second append line; just add the parameter to the end of the first one.
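
For reference, a minimal sketch of what the corrected default entry might look like, assuming the stock syslinux.cfg posted above is otherwise unchanged. Kernel command-line parameters are separated by spaces, so the new parameter simply follows initrd=/bzroot on the same append line:

```
label Unraid OS
  menu default
  kernel /bzimage
  append initrd=/bzroot nvme_core.default_ps_max_latency_us=5500
```

Only the default entry is changed in this sketch; the GUI and safe-mode entries would need the same addition if the parameter should also apply when booting them.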

  • Author

Is it space-delimited? Just a space and then nvme_core...?

Edited by dis3as3d

  • Author

Well, that didn't work.  Back to square 1.  Tried both of the below and the drive still crashes.  Anyone got any ideas?

 

nvme_core.default_ps_max_latency_us=0

nvme_core.default_ps_max_latency_us=5500
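
Before concluding that the setting has no effect, it may be worth confirming that the kernel actually received it. The sketch below is one way to check from a terminal after rebooting. The helper name `cmdline_value` is made up for illustration, and the script assumes the `nvme_core` module exposes its parameters under `/sys/module`, which may not hold on every build:

```shell
#!/bin/sh
# Print the value of a kernel parameter found in a command line string,
# or return non-zero if the parameter is absent.
# Usage: cmdline_value NAME [CMDLINE]
# CMDLINE defaults to the running kernel's /proc/cmdline.
# (cmdline_value is a hypothetical helper, not an Unraid or kernel tool.)
cmdline_value() {
    name=$1
    cmdline=${2-$(cat /proc/cmdline 2>/dev/null)}
    for word in $cmdline; do
        case $word in
            "$name"=*) printf '%s\n' "${word#*=}"; return 0 ;;
        esac
    done
    return 1
}

param=nvme_core.default_ps_max_latency_us

# Did the parameter make it onto the boot command line?
if value=$(cmdline_value "$param"); then
    echo "boot command line sets $param=$value"
else
    echo "$param is not on the boot command line"
fi

# What the driver itself reports, if the module exposes the parameter.
sysfs=/sys/module/nvme_core/parameters/default_ps_max_latency_us
if [ -r "$sysfs" ]; then
    echo "nvme_core reports: $(cat "$sysfs")"
fi
```

If the parameter shows up in both places and the drive still drops out under load, the power-state theory becomes less likely and the hardware explanations suggested earlier in the thread look more plausible.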

 

 

  • Author

One other thing I tried is updating to the unstable version of Unraid and still had issues.

  • 9 months later...

Sorry to necro, but this is the exact issue I'm facing with an NVMe drive at the moment. Fine under small loads, then chokes when things get busy.

XFS, single cache drive.

Expecting it to be a hardware issue at this point.

Latest beta has a different alignment for SSDs which may improve performance with some devices.

 

Archived

This topic is now archived and is closed to further replies.
