Jokel Posted April 19, 2023

Hello there, this is my first time actively using this (or any other) forum. First of all I want to thank everybody here who works on the problems of others - I've found a lot of the discussions here very helpful in the past. For this problem none of the solutions offered so far have worked, which is why I hope someone has another hint.

I've been running 2 NVMes as a cache for the last 4 months and they have been wonderful so far - until yesterday, when I was working on my VM and it suddenly froze. In UNRAID all VMs were gone, and Docker as well. After a reboot my two NVMes are missing (diag_1.zip). They show up in the BIOS, but neither lspci nor lsblk shows anything. If I understand the log correctly, they were dropped because of a modprobe error during startup. There is some data on them that I don't have a backup of (I know...), but it is mainly data that would cause a lot of work to recreate (appdata etc.). So ideally I would like to restore that data. If it doesn't work, the world will keep on spinning.

Disclaimer: I use BTRFS RAID 5. I know it is considered unstable, but my research suggested it is surprisingly stable in practice. However, I don't think this has anything to do with the write hole bug, since there was no power loss or anything like that.

Setup that might be relevant:
MSI X99S SLI PLUS (modified BIOS to allow PCIe bifurcation)
2x WD Blue SN570 NVMe (total bytes written so far <6 TB) on a PCIe x16 -> 4x M.2 adapter

What I already tried:

1. Checked the syslogs and followed the solutions that worked for other people
I added nvme_core.default_ps_max_latency_us=0 and pcie_aspm=off in all variations and orders to the boot options, but nothing helped. The logs didn't change.

2. Reverted the UNRAID version
As some solutions were tailored to 6.10.3, I downgraded and somehow broke my install, so I had to reinstall. All the effort was for nothing: while the logs changed, my NVMes are still not recognized (diag_2.zip).

3. OS change
I tried Ubuntu and also Windows. Neither showed the drives. However, the Windows result might not be reliable (see below).

4. Hardware change
As my BIOS is modded and the platform is rather old, I put the NVMes into my desktop PC (AMD 3900X) and ran UNRAID. No success with either UNRAID version (diag_3_xxx.zip).

At this point I was sure my drives were gone, but then I tried my desktop's Windows install and they both show up in Windows. I installed WinBTRFS but they are not mounted. I don't have much experience with WinBTRFS, so I stopped digging into that.

The way I see it, the drives are not dead, but for some reason they are in a weird state that no Linux kernel seems to like. Does anyone have any other idea what I could try?

Thank you and best regards
Jakob

diag_3_6.10.3.zip diag_3_6.11.5.zip diag_2.zip diag_1.zip
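For reference, this is roughly how I checked whether Linux sees the devices at all (the nvme list command assumes nvme-cli is available, which may not be the case on every install; device names are just examples):

lspci -nn | grep -i -E 'nvme|non-volatile'   # NVMe controllers should show up as PCIe devices
lsblk -o NAME,MODEL,SIZE                     # the namespaces should appear as nvme0n1 / nvme1n1
nvme list                                    # nvme-cli view of the controllers the driver bound to
dmesg | grep -i nvme                         # driver messages: resets, timeouts, modprobe errors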
smdion Posted April 19, 2023

Apr 18 17:47:15 Aither kernel: nvme nvme1: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0x10
Apr 18 17:47:15 Aither kernel: nvme nvme1: Does your device have a faulty power saving mode enabled?
Apr 18 17:47:15 Aither kernel: nvme nvme1: Try "nvme_core.default_ps_max_latency_us=0 pcie_aspm=off" and report a bug

Those look concerning to me in the logs. Others had a similar issue here:
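In case it helps, those parameters go on the append line of the boot entry in syslinux.cfg on the flash drive (on Unraid this can be edited from the Main tab by clicking the flash device). A typical default entry would then look roughly like this - the label and kernel lines are just the usual defaults and may differ on your install:

label Unraid OS
  menu default
  kernel /bzimage
  append nvme_core.default_ps_max_latency_us=0 pcie_aspm=off initrd=/bzroot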
Jokel Posted April 19, 2023

Hi smdion, thanks for the quick reply. I stumbled across those as well and added nvme_core.default_ps_max_latency_us=0 pcie_aspm=off in various combinations, with no luck (see point 1 in my post).
JorgeB Posted April 20, 2023

10 hours ago, Jokel said:
added nvme_core.default_ps_max_latency_us=0 pcie_aspm=off in various combinations, with no luck (see point 1 in my post).

If this doesn't help, the best bet, if possible, is to use different model devices (or a different board).
Jokel Posted April 20, 2023

As described above, I already changed the mainboard and had no luck with UNRAID there either. However, they are detected in Windows. I also put another NVMe into my original mainboard, and that one worked. As my faulty SSDs have worked fine in the past, I think they are just locked up in some weird state. That's why I'm a little bit lost.
JorgeB Posted April 20, 2023

I thought they were being detected and then dropping offline after some time. This is usually for a different error, but worth a try: https://forums.unraid.net/topic/132930-drives-and-usb-devices-visible-in-bios-not-available-once-booted-asus-wrx80-sage-5965wx/?do=findComment&comment=1208035
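If you try that, it's also worth checking afterwards that the options actually made it onto the kernel command line, so a typo in syslinux.cfg can be ruled out (generic Linux commands, nothing Unraid-specific):

cat /proc/cmdline                        # should list pci=realloc=off and the nvme/aspm options
dmesg | grep -i -E 'nvme|aspm|realloc'   # look for controller resets or PCI resource messages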
Jokel Posted April 20, 2023

Thank you for helping me out. Unfortunately it didn't work; I attached the diagnostics. I also tried it with

append pci=realloc=off nvme_core.default_ps_max_latency_us=0 pcie_aspm=off initrd=/bzroot

but still no luck.

diag_pci_realloc.zip
JorgeB Posted April 20, 2023

This happening out of the blue, and the devices having the same problem with a different board, suggests to me that the devices might have failed. They are detected during boot by both boards, including btrfs detecting the filesystem, but after 30 seconds they drop offline. It would be weird for both to fail at the same time, but stranger things have been known to happen.
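If they really do stay up for about 30 seconds after boot, it might be worth trying to copy data off during that window before they drop, for example with btrfs restore to a folder on the array. This is just a sketch - the device name and target path are examples, and with a RAID5 pool both members need to be visible at the same time for it to have a chance:

mkdir -p /mnt/user/rescue
btrfs restore -v /dev/nvme0n1p1 /mnt/user/rescue

The window may well be too short for a full copy, but btrfs restore reads the filesystem without mounting or writing to it, so it costs nothing to try.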
Jokel Posted April 20, 2023

I'm afraid that might be the case. The only thing that leaves me hoping is that they are only about 4 months old and they are being detected in Windows. Do you have any idea why the Windows kernel seems to be able to work with them? Windows Disk Management reports that they are both working normally, and the size is displayed correctly as well. It just doesn't know BTRFS, but I don't think Windows would be able to detect that anyway. Do you see any chance of getting this to work using Windows?
JorgeB Posted April 20, 2023

24 minutes ago, Jokel said:
The only thing that leaves me hoping is that they are only about 4 months old and they are being detected in Windows.

They are also being detected in Linux, for about 30 seconds, until the filesystem is detected; they will probably also fail with Windows if you try to use them.

25 minutes ago, Jokel said:
Do you see any chance of getting this to work using Windows?

I don't know, but most likely it won't work anyway. You know it's not Unraid, since it was working before.
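One more thing that might be worth grabbing while a device is briefly visible is its SMART/health data (assuming smartmontools or nvme-cli is on hand; the device name is just an example):

smartctl -a /dev/nvme0n1
nvme smart-log /dev/nvme0n1

Critical warnings or media errors in that output would point at genuine hardware failure rather than a power-state or driver issue.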
Jokel Posted April 21, 2023

Alright then. I'll keep investigating, and if I find anything I will post it here. In the meantime I will buy new NVMes. Thank you anyway for digging into it.
thisfantastic Posted March 25

Hello - did you find a resolution to this? I am having the same issue. I tried all the different kernels (OEM and HWE) and replaced the drive. I guess it's good to know someone else is having this! I am on an HP Z2 G9, BIOS 6.5, with an AMD GPU.