Strange NVME Cache issue

Hi All, 

 

I am new to Unraid. I switched from a QNAP NAS after it failed and they wanted £1040 to try and fix it or buy a replacement :(

 

I have the following:

  • Asus PRIME B660-PLUS D4
  • i5-2400
  • 48GB DDR4 - 2x 8GB 3200 DDR4, 2x 16GB 3200 DDR4
  • 1x LSI 9211-8i IT Mode HBA SAS SATA
  • 8x Seagate IronWolf 3TB drives (ST3000VN007) - connected to the LSI card
  • 2x 1TB WDC WDS100T2B0C connected to a two port QNAP nvme to PCIe adapter (moved from the QNAP NAS)
  • 1x  1TB WDC WDS100T2B0C in a motherboard slot
  • 1x INTEL SSDPEKNW512G8

 

Main Array (8x ST3000VN007, connected to the LSI Card)

  1. Parity (ST3000VN007)
  2. Parity2 (ST3000VN007)
  3. Disk  1 .. 6 ST3000VN007 (xfs)

Cache_ssd (xfs) - used for iso share

  1. INTEL_SSDPEKNW512G8_PHNH9420007Q512A - 512 GB (nvme1n1)

Cache_nvme (xfs) - used for docker, vm and system folders

  1. WDC_WDS100T2B0C-00PXH0_2131CQ471804 - 1 TB (nvme0n1)

Cache_protected (btrfs mirror) - used to cache data to the main array - Connected via the QNAP nvme to PCIe card

  1. WDC_WDS100T2B0C-00PXH0_20427P467007 - 1 TB (nvme2n1) 

  2. WDC_WDS100T2B0C-00PXH0_2052FR446602 - 1 TB (nvme3n1)    

 

I have been testing unraid before I move any data across, so have copied 4 movie files (~50-80GB in size) and I am having some major issues with the Cache_protected pool.

 

  1. Copying from a Windows PC -> Cache_protected over 1Gb Ethernet works fine (~280GB, 4 files).
  2. Running Mover from the Array -> Cache_protected has worked every time I have tested it (4-5 times now).
  3. Running Mover from Cache_protected -> Array fails: after 150-200GB, nvme3n1 (the second disk in Cache_protected) disconnects from Unraid and the log shows the following errors:

 

Mar 27 13:45:39 Obsidian move: file: /mnt/cache_protected/movies/test.mkv
Mar 27 13:53:06 Obsidian kernel: nvme nvme3: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0xffff
Mar 27 13:53:06 Obsidian kernel: nvme nvme3: Does your device have a faulty power saving mode enabled?
Mar 27 13:53:06 Obsidian kernel: nvme nvme3: Try "nvme_core.default_ps_max_latency_us=0 pcie_aspm=off" and report a bug
Mar 27 13:53:08 Obsidian kernel: nvme3n1: I/O Cmd(0x2) @ LBA 111992656, 1024 blocks, I/O Error (sct 0x3 / sc 0x71) 
Mar 27 13:53:08 Obsidian kernel: I/O error, dev nvme3n1, sector 111992656 op 0x0:(READ) flags 0x84700 phys_seg 5 prio class 2
Mar 27 13:53:08 Obsidian kernel: nvme3n1: I/O Cmd(0x2) @ LBA 111993680, 1024 blocks, I/O Error (sct 0x3 / sc 0x71) 
Mar 27 13:53:08 Obsidian kernel: I/O error, dev nvme3n1, sector 111993680 op 0x0:(READ) flags 0x84700 phys_seg 5 prio class 2
Mar 27 13:53:08 Obsidian kernel: nvme3n1: I/O Cmd(0x2) @ LBA 111994704, 1024 blocks, I/O Error (sct 0x3 / sc 0x71) 
Mar 27 13:53:08 Obsidian kernel: I/O error, dev nvme3n1, sector 111994704 op 0x0:(READ) flags 0x84700 phys_seg 5 prio class 2
Mar 27 13:53:08 Obsidian kernel: nvme3n1: I/O Cmd(0x2) @ LBA 111995728, 1024 blocks, I/O Error (sct 0x3 / sc 0x71) 
Mar 27 13:53:08 Obsidian kernel: I/O error, dev nvme3n1, sector 111995728 op 0x0:(READ) flags 0x84700 phys_seg 5 prio class 2
Mar 27 13:53:08 Obsidian kernel: nvme3n1: I/O Cmd(0x2) @ LBA 111996752, 1024 blocks, I/O Error (sct 0x3 / sc 0x71) 
Mar 27 13:53:08 Obsidian kernel: I/O error, dev nvme3n1, sector 111996752 op 0x0:(READ) flags 0x84700 phys_seg 5 prio class 2
Mar 27 13:53:08 Obsidian kernel: nvme3n1: I/O Cmd(0x2) @ LBA 111997776, 1024 blocks, I/O Error (sct 0x3 / sc 0x71) 
Mar 27 13:53:08 Obsidian kernel: I/O error, dev nvme3n1, sector 111997776 op 0x0:(READ) flags 0x84700 phys_seg 5 prio class 2
Mar 27 13:53:08 Obsidian kernel: nvme3n1: I/O Cmd(0x2) @ LBA 111998800, 1024 blocks, I/O Error (sct 0x3 / sc 0x71) 
Mar 27 13:53:08 Obsidian kernel: I/O error, dev nvme3n1, sector 111998800 op 0x0:(READ) flags 0x84700 phys_seg 24 prio class 2
Mar 27 13:53:08 Obsidian kernel: nvme3n1: I/O Cmd(0x2) @ LBA 111999824, 1024 blocks, I/O Error (sct 0x3 / sc 0x71) 
Mar 27 13:53:08 Obsidian kernel: I/O error, dev nvme3n1, sector 111999824 op 0x0:(READ) flags 0x80700 phys_seg 32 prio class 2
Mar 27 13:53:08 Obsidian kernel: nvme3n1: I/O Cmd(0x2) @ LBA 112000848, 1024 blocks, I/O Error (sct 0x3 / sc 0x71) 
Mar 27 13:53:08 Obsidian kernel: I/O error, dev nvme3n1, sector 112000848 op 0x0:(READ) flags 0x84700 phys_seg 32 prio class 2
Mar 27 13:53:08 Obsidian kernel: nvme3n1: I/O Cmd(0x2) @ LBA 112001872, 1024 blocks, I/O Error (sct 0x3 / sc 0x71) 
Mar 27 13:53:08 Obsidian kernel: I/O error, dev nvme3n1, sector 112001872 op 0x0:(READ) flags 0x84700 phys_seg 20 prio class 2
Mar 27 13:53:08 Obsidian kernel: nvme 0000:09:00.0: Unable to change power state from D3cold to D0, device inaccessible
Mar 27 13:53:08 Obsidian kernel: nvme nvme3: Removing after probe failure status: -19
Mar 27 13:53:08 Obsidian kernel: nvme3n1: detected capacity change from 1953525168 to 0
Mar 27 13:53:08 Obsidian kernel: BTRFS error (device nvme2n1p1): bdev /dev/nvme3n1p1 errs: wr 21837240, rd 7, flush 5, corrupt 1, gen 0
Mar 27 13:53:08 Obsidian kernel: BTRFS error (device nvme2n1p1): bdev /dev/nvme3n1p1 errs: wr 21837240, rd 7, flush 5, corrupt 1, gen 0
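
A log like the one above is easier to compare between test runs once it is summarised. The following is only a rough sketch: the regular expressions are written against the exact message wording quoted above, and the `summarise` helper and `sample` list are names made up for this example. Other kernel versions may word these messages differently.

```python
import re
from collections import Counter

# Patterns written against the kernel messages quoted above (assumption:
# other kernel versions may phrase them differently).
IO_ERROR = re.compile(r"I/O error, dev ([^,\s]+), sector (\d+) op 0x[0-9a-f]+:\((\w+)\)")
CONTROLLER_DOWN = re.compile(r"nvme (nvme\d+): controller is down")


def summarise(lines):
    """Return (I/O error counts keyed by (device, operation), controllers reported down)."""
    io_errors = Counter()
    down = []
    for line in lines:
        m = IO_ERROR.search(line)
        if m:
            io_errors[(m.group(1), m.group(3))] += 1
            continue
        m = CONTROLLER_DOWN.search(line)
        if m and m.group(1) not in down:
            down.append(m.group(1))
    return io_errors, down


# Three lines copied from the log above.
sample = [
    "Mar 27 13:53:06 Obsidian kernel: nvme nvme3: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0xffff",
    "Mar 27 13:53:08 Obsidian kernel: I/O error, dev nvme3n1, sector 111992656 op 0x0:(READ) flags 0x84700 phys_seg 5 prio class 2",
    "Mar 27 13:53:08 Obsidian kernel: I/O error, dev nvme3n1, sector 111993680 op 0x0:(READ) flags 0x84700 phys_seg 5 prio class 2",
]

io_errors, down = summarise(sample)
print(dict(io_errors), down)
```

Feeding it the full syslog from each Mover run would show whether the failures are always READs on nvme3n1 or whether other devices appear as well.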

 

It is strange that nvme3n1 only disconnects when moving from Cache_protected to the Array; every other operation works fine.

 

I have removed Parity2 from the array to speed up testing, and it seems to copy more data before nvme3n1 disconnects. Any ideas or suggestions?

 

I am going to try swapping nvme3n1 and nvme2n1 and see if the error follows the port or the drive.

 

NOTE: I have tested the RAM with memtest overnight and it passed.

NOTE: The LSI card and all eight drives have been tested in Windows using HD Sentinel (extended SMART test, then 2x write-read surface scans), all working fine without error.

NOTE: I have no Docker containers or VMs running.

Solved by Fluxonium

  • Community Expert
1 hour ago, Fluxonium said:
Try "nvme_core.default_ps_max_latency_us=0 pcie_aspm=off"

Try this first; if it doesn't help, try a different M.2 slot or a different NVMe device (or board).
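
For anyone unsure where those parameters go: kernel parameters are normally added to the `append` line of the boot entry. The sketch below only illustrates the text edit. It assumes a syslinux-style config with an `append initrd=/bzroot` line (on Unraid this is usually /boot/syslinux/syslinux.cfg, but check your own install), and `add_kernel_params`, `PARAMS` and `sample_cfg` are names made up for this example. It does not touch any file; it just transforms a string.

```python
# The two parameters suggested by the kernel message in the log.
PARAMS = ["nvme_core.default_ps_max_latency_us=0", "pcie_aspm=off"]


def add_kernel_params(cfg_text, params):
    """Add any missing params to every 'append' line of a syslinux-style config."""
    out = []
    for line in cfg_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("append "):
            present = stripped.split()[1:]
            missing = [p for p in params if p not in present]
            if missing:
                line = line.rstrip() + " " + " ".join(missing)
        out.append(line)
    return "\n".join(out) + "\n"


# Assumed shape of a boot entry; the real file may contain more entries.
sample_cfg = (
    "label Unraid OS\n"
    "  menu default\n"
    "  kernel /bzimage\n"
    "  append initrd=/bzroot\n"
)

print(add_kernel_params(sample_cfg, PARAMS))
```

The function changes every `append` line it finds; narrow the match if only one boot entry should be modified. A reboot is needed before the new parameters take effect.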

  • Author

"nvme_core.default_ps_max_latency_us=0 pcie_aspm=off"

 

I will give it a go, though it seems unlikely to fix the issue, as the other WDS100T2B0C has no problems.

 

I have also tested the drives and controller in Windows using HD Sentinel (extended SMART test, then 2x write-read surface scans), all with no issues. I also copied enough data to fill the drive to 90% in Windows with no issues or warnings, using the same hardware with just a different boot drive, so this looks like an issue with Linux / Unraid for some reason.

 

I am also getting the following in the log from time to time:

 

Mar 27 16:03:29 Obsidian kernel: pcieport 0000:00:1c.4: AER: Multiple Corrected error received: 0000:08:00.0
Mar 27 16:03:29 Obsidian kernel: nvme 0000:08:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Mar 27 16:03:29 Obsidian kernel: nvme 0000:08:00.0:   device [15b7:5009] error status/mask=00000001/0000e000
Mar 27 16:03:29 Obsidian kernel: nvme 0000:08:00.0:    [ 0] RxErr                 
Mar 27 16:24:19 Obsidian kernel: pcieport 0000:00:1c.4: AER: Multiple Corrected error received: 0000:09:00.0
Mar 27 16:24:19 Obsidian kernel: nvme 0000:09:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Mar 27 16:24:19 Obsidian kernel: nvme 0000:09:00.0:   device [15b7:5009] error status/mask=00000001/0000e000
Mar 27 16:24:19 Obsidian kernel: nvme 0000:09:00.0:    [ 0] RxErr                 

 

Device 15b7:5009 is the controller ID reported for both drives on the QNAP NVMe-to-PCIe adapter (nvme2n1 and nvme3n1):

 

IOMMU group 22:	[15b7:5009] 08:00.0 Non-Volatile memory controller: Sandisk Corp SanDisk Ultra 3D / WD Blue SN550 NVMe SSD (rev 01)
	[N:2:1:1]    disk    WDC WDS100T2B0C-00PXH0__1                  /dev/nvme2n1  1.00TB
IOMMU group 23:	[15b7:5009] 09:00.0 Non-Volatile memory controller: Sandisk Corp SanDisk Ultra 3D / WD Blue SN550 NVMe SSD (rev 01)
	[N:3:1:1]    disk    WDC WDS100T2B0C-00PXH0__1                  /dev/nvme3n1  1.00TB
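
To see whether the corrected AER errors cluster on one of the two drives, the log lines can be counted per device. This is a rough sketch, not a diagnostic tool: the regular expression is written against the AER messages quoted above, the PCI-address-to-device mapping is copied from the IOMMU listing, and `count_aer` and `aer_sample` are names made up for this example.

```python
import re
from collections import Counter

# Pattern written against the AER messages quoted above (assumption:
# the wording may differ on other kernel versions).
AER_LINE = re.compile(
    r"nvme ([0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f]): "
    r"PCIe Bus Error: severity=(\w+), type=([^,]+)"
)

# PCI address -> block device, copied from the IOMMU listing above.
PCI_TO_DEV = {"0000:08:00.0": "nvme2n1", "0000:09:00.0": "nvme3n1"}


def count_aer(lines):
    """Count AER reports keyed by (device, severity, error type)."""
    counts = Counter()
    for line in lines:
        m = AER_LINE.search(line)
        if m:
            dev = PCI_TO_DEV.get(m.group(1), m.group(1))
            counts[(dev, m.group(2), m.group(3))] += 1
    return counts


# Two lines copied from the log above.
aer_sample = [
    "Mar 27 16:03:29 Obsidian kernel: nvme 0000:08:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)",
    "Mar 27 16:24:19 Obsidian kernel: nvme 0000:09:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)",
]

aer_counts = count_aer(aer_sample)
print(dict(aer_counts))
```

In the excerpt above both drives report one corrected physical-layer error each, so the counts alone do not single out nvme3n1; a longer log would be needed to see a pattern.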

 

BIOS has been updated to the latest version and all drive firmware has been checked in Windows for the latest version.

 

Thanks for your help.

  • Community Expert
1 hour ago, Fluxonium said:

I am also getting the following in the log from time to time:

 

Might also be related to the power management issue

  • Author

Unfortunately, "nvme_core.default_ps_max_latency_us=0 pcie_aspm=off" did not work.
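
Before ruling the parameters out, it may be worth confirming that the running kernel actually received them. A minimal sketch, assuming a Linux system where /proc/cmdline holds the boot command line; `missing_params` and `WANTED` are names made up for this example.

```python
WANTED = ["nvme_core.default_ps_max_latency_us=0", "pcie_aspm=off"]


def missing_params(cmdline, wanted):
    """Return the wanted parameters that do not appear on the kernel command line."""
    present = cmdline.split()
    return [p for p in wanted if p not in present]


if __name__ == "__main__":
    try:
        with open("/proc/cmdline") as f:
            cmdline = f.read()
    except OSError:
        # Not on Linux, or /proc is unavailable: treat as an empty command line.
        cmdline = ""
    print("missing:", missing_params(cmdline, WANTED))
```

An empty list means both parameters were passed to the kernel, in which case the disconnects are probably not explained by these power-saving settings alone.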

 

The strange thing is that I cannot break the drives in Windows; both work perfectly on the same hardware (other than the boot disk). I am going to try the following, but I cannot see it working, unless Windows is simply not doing something that Unraid is (or has better / more stable drivers).

 

I have even tried sleep / hibernate from Windows and both drives still work.

 

  1. Remove adapter and swap drive positions (clean drive and adapter contacts)
  2. Remove both drives from the adapter and insert into motherboard and re-test

 

  • Community Expert

If you swap them does the problem follow the device or the slot?
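
The reasoning behind the swap test can be written down as a small decision helper. This is only a sketch: `classify_swap` is a name made up for this example, the "before" values are the serial and PCI address of nvme3n1 from the earlier posts, and the "after" values are hypothetical, not a reported result.

```python
def classify_swap(failing_before, failing_after):
    """Each argument is a (serial, pci_address) pair for the device that dropped out."""
    serial_before, slot_before = failing_before
    serial_after, slot_after = failing_after
    if serial_before == serial_after and slot_before != slot_after:
        return "follows the drive"
    if slot_before == slot_after and serial_before != serial_after:
        return "follows the slot"
    return "inconclusive"


# Before the swap: the drive with serial 2052FR446602 failed at 0000:09:00.0.
before = ("2052FR446602", "0000:09:00.0")
# Hypothetical outcome after the swap: the other drive now fails at the same address.
after = ("20427P467007", "0000:09:00.0")

print(classify_swap(before, after))  # prints "follows the slot"
```

If the same serial fails at a different address instead, the drive itself is the more likely suspect.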

  • Author
  • Solution

OK, after some more testing it looks like the issue is the QNAP QM2-2P-374, as the NVMe drives work in the motherboard slots without error (so far :)).

 

I am guessing this is a Linux / unraid driver issue as the card works fine in Windows.
