NVMe Drive Disappears/Drops Offline


Hi,

 

I am an absolute newbie to UNRAID, and I am just trying to set up my server.

 

I've only just got the system booted into the UNRAID OS, but I am facing this issue where my Samsung 970 Evo Plus 1TB NVMe SSD randomly disappears and only reappears when I shut down and turn on the server again.

 

My Samsung 850 Evo 1TB SATA SSD and WD 2TB HDD (old reused devices) appear under unassigned devices just fine, but for some reason, after starting the server, the Samsung 970 Evo Plus NVMe SSD appears for a short while then disappears. It is a brand new drive so I'm not sure what the issue is.

 

The 970 Evo is in an MSI Z590i UNIFY motherboard which supports 2 NVMe drives, though I am only using one.

 

I have attached my Diagnostics ZIP here.

 

I appreciate any help and thank you in advance!

diagnostics-20220215-2353.zip

Link to comment

The log is spammed with Bluetooth-related errors, so look for a BIOS update for the board. The suggestion below also helps sometimes.

 

Some NVMe devices have issues with power states on Linux. Try this: on the main GUI page click on the flash device, scroll down to "Syslinux Configuration", make sure it's set to "menu view" (top right), and add the following to your default boot option, after "append initrd=/bzroot":

nvme_core.default_ps_max_latency_us=0

e.g.:

append initrd=/bzroot nvme_core.default_ps_max_latency_us=0
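For anyone who prefers to script the change, here is a minimal sketch of the same edit. It assumes the boot entries live in a syslinux config on the flash drive (commonly /boot/syslinux/syslinux.cfg, which is an assumption here) and that the default entry is the one containing a "menu default" line placed before its "append" line. The sample config and its label names are illustrative only, and `add_kernel_param` is a hypothetical helper, not part of Unraid.

```python
PARAM = "nvme_core.default_ps_max_latency_us=0"

# Illustrative config only; label names and layout are assumptions.
SAMPLE_CFG = """\
default menu.c32
prompt 0
timeout 50
label Unraid OS
  menu default
  kernel /bzimage
  append initrd=/bzroot
label Unraid OS GUI Safe Mode
  kernel /bzimage
  append initrd=/bzroot,/bzroot-gui unraidsafemode
"""


def add_kernel_param(cfg_text: str, param: str) -> str:
    """Append `param` to the append line of the entry marked 'menu default'."""
    out = []
    in_default = False
    for line in cfg_text.splitlines():
        stripped = line.strip().lower()
        if stripped.startswith("label "):
            in_default = False  # a new boot entry starts here
        elif stripped == "menu default":
            in_default = True  # this entry is the one booted by default
        elif in_default and stripped.startswith("append "):
            # Only add the parameter if it is not already present.
            if param not in line.split():
                line = f"{line.rstrip()} {param}"
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    print(add_kernel_param(SAMPLE_CFG, PARAM))
```

Only the default entry is touched, so a non-default entry such as a safe mode one keeps its original append line.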


Reboot and see if it makes a difference.
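After the reboot it may be worth confirming that the parameter actually reached the kernel before judging whether it helped. The sketch below reads /proc/cmdline and the nvme_core module parameter exposed under /sys/module; both paths are standard on Linux, but whether the second one is present depends on nvme_core being loaded. The helper names are hypothetical.

```python
from pathlib import Path

PARAM_NAME = "nvme_core.default_ps_max_latency_us"
SYSFS_PARAM = "/sys/module/nvme_core/parameters/default_ps_max_latency_us"


def cmdline_has_param(cmdline: str, name: str, value: str) -> bool:
    """Return True if `name=value` is one of the kernel command-line tokens."""
    return f"{name}={value}" in cmdline.split()


def read_text_or_none(path: str):
    """Read a small text file, returning None if it cannot be read."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    cmdline = read_text_or_none("/proc/cmdline") or ""
    print("on command line:", cmdline_has_param(cmdline, PARAM_NAME, "0"))
    print("value seen by nvme_core:", read_text_or_none(SYSFS_PARAM))
```

If the first line prints False, the edit most likely went into a boot entry other than the one that was actually booted.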

Link to comment
Hi JorgeB, thank you for your reply!

I had already updated the BIOS to its latest version before booting into UNRAID. I'm not sure what the Bluetooth errors are, but could they be related to the wireless keyboard and mouse USB dongles I have attached?

I tried the fix that you suggested, but I can't seem to even "see" the NVMe device in UNRAID although it appears in the BIOS.

Also, under my "Syslinux Configuration", there is additional content after "append initrd=/bzroot", please see below:

 

append initrd=/bzroot,/bzroot-gui unraidsafemode

 

UNRAID is started in normal mode, not safe mode, so I'm not sure why that suffix is shown.

 

I have appended the "nvme_core.default_ps_max_latency_us=0" after that as in:

 

append initrd=/bzroot,/bzroot-gui unraidsafemode nvme_core.default_ps_max_latency_us=0

 

This does not seem to fix the issue. The NVMe drive does not appear under unassigned devices even after full shutdown and startup. 

 

I also tried removing the ",/bzroot-gui unraidsafemode" to make it as you suggested:
 

append initrd=/bzroot nvme_core.default_ps_max_latency_us=0

 

However, the NVMe drive still does not appear under unassigned devices.

 

What should I do in this case?


Attached are the new diagnostic files.

 

Thank you for your help!

Edited by stealth007
wrongly used "quote" function rather than "code"
Link to comment
Feb 15 18:04:11 TheSentinel kernel: nvme nvme0: pci function 0000:04:00.0
Feb 15 18:04:11 TheSentinel kernel: nvme 0000:04:00.0: can't change power state from D3hot to D0 (config space inaccessible)
Feb 15 18:04:11 TheSentinel kernel: nvme nvme0: Removing after probe failure status: -19

 

The device is failing to initialize. I don't think there's much you can do other than try a different NVMe device (or a different board). You can also try v6.10-rc2, but I doubt it would help.
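For readers who want to look for the same signature in their own syslog, here is a rough sketch that picks out the two telling lines quoted above. The regular expressions are written against those exact lines and may not match other kernel versions. The status is turned into a name with Python's errno table, where 19 is ENODEV ("no such device"). The function name `scan` is hypothetical.

```python
import errno
import re

# Patterns modelled on the log lines quoted in this thread.
POWER_FAIL = re.compile(
    r"nvme (?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"can't change power state from (?P<src>\S+) to (?P<dst>\S+)"
)
PROBE_FAIL = re.compile(
    r"nvme (?P<ctrl>nvme\d+): Removing after probe failure status: (?P<status>-?\d+)"
)

SAMPLE = [
    "Feb 15 18:04:11 TheSentinel kernel: nvme nvme0: pci function 0000:04:00.0",
    "Feb 15 18:04:11 TheSentinel kernel: nvme 0000:04:00.0: can't change power "
    "state from D3hot to D0 (config space inaccessible)",
    "Feb 15 18:04:11 TheSentinel kernel: nvme nvme0: Removing after probe "
    "failure status: -19",
]


def scan(lines):
    """Return (kind, device, detail) tuples for NVMe init failures."""
    findings = []
    for line in lines:
        m = POWER_FAIL.search(line)
        if m:
            findings.append(("power_state", m["pci"], f"{m['src']} -> {m['dst']}"))
            continue
        m = PROBE_FAIL.search(line)
        if m:
            code = abs(int(m["status"]))
            # Map the numeric status to its errno name where one exists.
            findings.append(("probe_failure", m["ctrl"], errno.errorcode.get(code, str(code))))
    return findings


if __name__ == "__main__":
    for finding in scan(SAMPLE):
        print(finding)
```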

Link to comment
Thank you for your help and advice all this while! I've turned off Bluetooth and Wi-Fi in the BIOS itself.

 

As for the NVMe troubles, I think I actually managed to "fix" them in the dumbest way possible.

 

Since my motherboard has 2 slots, I simply put it in the other slot and somehow it seems to be detected and works fine for now.

 

I'll just have to keep in mind that if I get a second NVMe drive for the original slot, it probably should not be a 970 Evo Plus, so as to avoid this issue again.

 

As for why it works in one slot and not the other: according to my motherboard specs, the slot I had it in initially is controlled by the motherboard chipset, whereas the slot I have it in now is controlled by the CPU, so perhaps something is going on there that I'm not familiar with.
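One way to see which PCIe path a given NVMe controller hangs off is to resolve its sysfs link and list the PCI addresses along the way. This is only a sketch: /sys/class/nvme/nvme0/device is the usual symlink on Linux, but the example path below is hypothetical (only 0000:04:00.0 comes from the log in this thread), and whether the first address is a CPU or chipset root port still has to be checked against the board's documentation.

```python
import os
import re

PCI_ADDR = re.compile(r"^[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]$")

# Hypothetical resolved path; the bridge address 0000:00:1b.0 is made up.
EXAMPLE_PATH = "/sys/devices/pci0000:00/0000:00:1b.0/0000:04:00.0"


def pci_chain(sysfs_path: str) -> list[str]:
    """Return the PCI addresses in a resolved sysfs path, root port first."""
    return [part for part in sysfs_path.split("/") if PCI_ADDR.match(part)]


def controller_path(ctrl: str = "nvme0") -> str:
    """Resolve the sysfs symlink of an NVMe controller to its PCI device path."""
    return os.path.realpath(f"/sys/class/nvme/{ctrl}/device")


if __name__ == "__main__":
    print(pci_chain(EXAMPLE_PATH))
    # On a live system: print(pci_chain(controller_path("nvme0")))
```

Running this for the drive in each slot would show two different upstream ports, which at least confirms the two slots take different routes.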

 

I hope this helps anyone else who stumbles across a similar issue.

Link to comment
Hi JorgeB,

 

I bought a Sabrent Rocket NVMe drive instead, and am facing the same issue as before where the drive appears online for a while then drops offline.

 

I still have the code you provided in syslinux:

 

append initrd=/bzroot nvme_core.default_ps_max_latency_us=0

 

Any idea what could be causing this issue? Is it really an issue with the motherboard? The drive appears fine in the BIOS.

 

Thank you for your help.

diagnostics-20220218-1330.zip

Link to comment
On 2/18/2022 at 5:20 PM, JorgeB said:

This is usually board or NVMe device related, or both together.

 

Hi JorgeB, sorry for the late reply.

 

I was doing some testing in Windows instead.

 

Right now I have a Samsung 970 Evo Plus Gen 3 drive installed in the M2_1 slot and a Sabrent Rocket Gen 3 drive installed in the M2_2 slot.

 

The M2_2 slot is the one whose drive keeps dropping offline in UNRAID. When I had the Samsung drive installed in that slot it would drop offline, and now the Sabrent drive does the same thing.

 

I loaded Windows on the Samsung drive in the M2_1 slot and added the Sabrent drive in the M2_2 slot as a secondary (D:) drive. I ran the PC in Windows for about an hour, and every so often I would access the Sabrent "D:" drive and copy files between it and the Samsung "C:" drive.

 

In the scenario above, the drive never dropped offline and always remained online and accessible throughout.

 

The issue where the drive is first detected and then suddenly drops offline within 5 mins of booting only occurs in UNRAID. Could this then be an issue unique to UNRAID?
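To pin down when the drive drops so it can be matched against the syslog, a small polling loop over /sys/class/nvme is one option. This is a sketch, not a diagnostic tool: the directory is the standard Linux location for NVMe controllers, while `list_controllers` and `watch` are hypothetical helpers, and the interval and number of rounds are arbitrary.

```python
import os
import time


def list_controllers(sys_class: str = "/sys/class/nvme") -> set:
    """Return the set of NVMe controller names currently present."""
    try:
        return set(os.listdir(sys_class))
    except FileNotFoundError:
        return set()


def watch(lister=list_controllers, interval=5.0, rounds=60, sleep=time.sleep):
    """Poll `lister` and return ('removed'|'added', name) events in order."""
    events = []
    previous = lister()
    for _ in range(rounds):
        sleep(interval)
        current = lister()
        for name in sorted(previous - current):
            events.append(("removed", name))
        for name in sorted(current - previous):
            events.append(("added", name))
        previous = current
    return events


if __name__ == "__main__":
    # Roughly five minutes of polling, matching the window described above.
    for kind, name in watch(interval=5.0, rounds=60):
        print(time.strftime("%H:%M:%S"), kind, name)
```

Note that the events are only printed once polling finishes, so for live timestamps the print would need to move inside the loop.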

 

Thank you for your help.
