
NVMe disks suddenly won't be detected but show up in BIOS



Hello there,

This is my first time actively using this (or any other) forum. First of all, I want to thank everybody here who works on other people's problems. I've found a lot of the discussions here very helpful in the past. For this problem, none of the solutions offered have worked, which is why I hope someone has another hint.

 

I've been running two NVMe drives as a cache for the last four months and they have been wonderful so far, until yesterday, when the VM I was working in suddenly froze. In UNRAID, all VMs were gone, and Docker as well. After a reboot, my two NVMe drives are missing (diag_1.zip). They show up in the BIOS, but neither lspci nor lsblk shows anything. If I understand the log correctly, they were dropped because of a modprobe error during startup.
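For anyone who wants to check the same things on their box, something along these lines should do it (just the standard tools, nothing specific to my setup):

# Does the PCIe bus see the NVMe controllers at all?
lspci -nn | grep -i nvme

# Are there any NVMe block devices / namespaces?
lsblk -o NAME,MODEL,SIZE,TYPE | grep -i nvme

# Is the nvme driver loaded, and what does the kernel log say?
lsmod | grep nvme
dmesg | grep -i nvme

On the affected machine, the lspci and lsblk checks come back empty.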

 

There is some data on them that I don't have a backup of (I know...), but it is mainly data that would just cause a lot of work to recreate (appdata etc.). So ideally I would like to restore that data. If that doesn't work, the world will keep on spinning :)

 

Disclaimer: I use BTRFS RAID 5. I know it is considered unstable, but my research suggested it is surprisingly stable in practice. However, I don't think this has anything to do with the write hole bug, since there was no power loss or anything like that.

 

Setup that might be relevant:

MSI X99S SLI PLUS (modified BIOS to allow PCIe bifurcation)

2x WD Blue SN570 NVMe (total bytes written so far <6 TB) on a PCIe x16 -> 4x M.2 adapter

 

 

What I already tried

1. Checked the syslog and followed the solutions that worked for other people

I added nvme_core.default_ps_max_latency_us=0 and pcie_aspm=off to the boot options, in every variation and order, but nothing helped. The logs didn't change.
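In case it helps someone: on UNRAID these go onto the append line in /boot/syslinux/syslinux.cfg (or via the flash device settings in the web UI). My entry ended up looking roughly like this; the labels and any extra entries may differ on your install:

label Unraid OS
  menu default
  kernel /bzimage
  append nvme_core.default_ps_max_latency_us=0 pcie_aspm=off initrd=/bzroot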

 

2. Reverted the UNRAID version

As some solutions were tailored to 6.10.3, I downgraded and somehow broke my install, so I had to reinstall. All the effort was for nothing: while the logs changed, my NVMe drives are still not recognized (diag_2.zip).

 

3. OS change

I tried Ubuntu and also Windows. Neither showed the drives. However, the Windows result might not be reliable (see below).

 

4. Hardware change

Since my BIOS is modded and it's a rather old platform, I put the NVMe drives into my desktop PC (AMD 3900X) and ran UNRAID. No success with either UNRAID version (diag_3_xxx.zip). At this point I was sure my drives were gone, but then I tried my desktop's Windows install and they both show up in Windows. I installed WinBTRFS, but they don't get mounted. I don't have much experience with WinBTRFS, so I stopped digging into that.
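If I ever get them to stay visible under a Linux kernel for long enough, my rough plan for pulling data off would be a read-only degraded mount rather than anything that writes to the pool (the device names below are just placeholders for whatever the pool members show up as):

# Let the kernel find all btrfs member devices
btrfs device scan

# Read-only mount; 'degraded' allows mounting even if a member is missing
mkdir -p /mnt/recover
mount -o ro,degraded /dev/nvme0n1p1 /mnt/recover

# If mounting fails completely, btrfs restore can try to copy files off an unmountable filesystem
btrfs restore -v /dev/nvme0n1p1 /mnt/some_other_disk/recovered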

 

The way I see it, the drives are not gone, but for some reason they are in a weird state that no Linux kernel seems to like. Does any of you have another idea of what I could try?

 

Thank you and best regards

Jakob

Attachments: diag_1.zip, diag_2.zip, diag_3_6.10.3.zip, diag_3_6.11.5.zip


Apr 18 17:47:15 Aither kernel: nvme nvme1: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0x10
Apr 18 17:47:15 Aither kernel: nvme nvme1: Does your device have a faulty power saving mode enabled?
Apr 18 17:47:15 Aither kernel: nvme nvme1: Try "nvme_core.default_ps_max_latency_us=0 pcie_aspm=off" and report a bug

Those lines in the log look concerning to me. Others had a similar issue here:
 

 


As described above, I already changed the mainboard and had no luck with UNRAID there either. However, they are detected in Windows.
I also put another NVMe into my original mainboard, and it worked.

As my faulty SSDs have worked in the past, I think they are just locked up in some weird state.

 

That's why I'm a little bit lost.


This happening out of the blue, and the devices having the same problem with a different board, suggests to me that the devices might have failed. They are detected during boot by both boards, including btrfs detecting the filesystem, but after 30 seconds they drop offline. It would be weird for both to fail at the same time, but stranger things have been known to happen.
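If you can catch them during the window before they drop, and if the nvme-cli tool is available, the SMART data might tell you more; something like this, adjusting the device name to whatever shows up:

# Controllers the kernel currently sees
nvme list

# Health / SMART attributes and the device error log for one drive
nvme smart-log /dev/nvme0
nvme error-log /dev/nvme0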


I'm afraid that might be the case. The only thing that leaves me hoping is that they are only about four months old and they are being detected in Windows. Do you have any idea why the Windows kernel seems to be able to work with them? Windows Disk Management reports that they are both working normally, and the size is displayed correctly as well. It just doesn't know BTRFS, but I wouldn't expect Windows to be able to read that anyway. Do you see a chance that I could get this to work using Windows?

24 minutes ago, Jokel said:

The only thing that leaves me hoping is that they are only about four months old and they are being detected in Windows.

They are also being detected in Linux for 30 seconds, until the filesystem is detected; they will probably also fail in Windows if you try to use them.

 

25 minutes ago, Jokel said:

Do you see a chance that I could get this to work using Windows?

I don't know, but most likely it won't work anyway. You know it's not Unraid, since it was working before.
