NVME Drive Disconnecting/Errors

Hi All,

 

Sorry for the long post, but I have been experiencing some very strange SSD issues since I upgraded my platform from an i9-10900 to an i7-12700. It's been driving me mad. I had a stable Z490/DDR4/i9-10900 server which I migrated to B660/DDR5/i7-12700 and then to W680/DDR5/i7-12700. I know SSDs are technically not supported in the array, but I am not using parity, so I think it's OK for my purposes (also understanding that my data is unprotected).

 

ARRAY: TWO Intel P4510 8TB U.2 NVME drives - NO PARITY

CACHE: TWO WD SN850X 1TB M.2 NVME drives - BTRFS RAID 1

 

ARRAY DRIVE ISSUES: 

Upon moving to the new platform (ASUS ROG STRIX B660-I) I started experiencing issues where the NVME drives would randomly drop out. CACHE drives were connected directly to the motherboard M.2 slots and ARRAY drives were connected via a Highpoint SSD7104 PCI-e 3.0 x16 RAID card with U.2 to M.2 adapters because of the ITX motherboard's limitations. CACHE drives were solid/stable, but the ARRAY drives would drop off randomly - only one at a time, never both. The server would also lock up and need a hard restart every few days. The motherboard was experiencing some coil whine as well, so the entire situation drove me to replace the motherboard.

 

CACHE DRIVE ISSUES:

Now I have everything in a SUPERMICRO MBD-X13SAE-F-O W680 setup. The CACHE drives were connected straight to the motherboard via the M.2 slots and the ARRAY drives are now connected via U.2 to PCI-e 3.0 x4 adapters - I got rid of the Highpoint PCI-e card since I have additional PCI-e slots now. Memory in this system is NON-ECC Team T-Force Vulcan 32GB (2 x 16GB) DDR5 FLBD532G5600HC36BDC01. Everything seemed to be running fine and stable until, after a few days, the CACHE drives started disappearing and I was getting NVME drive BTRFS errors in my system log. Only one drive would disappear, not both.

 

I tried setting the PCI-e link speed on the M.2 drives to 4.0 manually instead of AUTO, even though the slots on the motherboard and the drives are both 4.0. It worked fine for a bit and then one drive would drop out. I did some digging on the forum and people mentioned this could be a memory-related issue, so I set the memory to default settings. A single CACHE drive dropped and the error logs came back. At this point I figured it was something with the M.2 slots on the motherboard, so I reintroduced the Highpoint SSD7104 into the system and put the CACHE drives on there. Again, everything worked fine for a bit and then one CACHE drive dropped off. I tried setting the PCI-e 5.0 slot on the motherboard to 3.0 manually to match the Highpoint SSD7104 card. It worked fine for a bit and then a single drive was gone again.

 

At this point I assume the Highpoint SSD7104 card might be the culprit since originally the ARRAY drives were dropping off connected to it (via the U.2 to M.2 adapters) and now the CACHE drives are dropping off connected to it. However, this doesn't explain why the CACHE drives were dropping off while connected to the M.2 slots on the SUPERMICRO MBD-X13SAE-F-O directly. 

 

As it stands, I don't know if I need to replace the Team Group DDR5 memory with some ECC memory/different memory or if the WD SN850X drives are not good as CACHE drives on this motherboard. I have had to reconfigure all my dockers and my docker image numerous times in the past few weeks. All of these drives are super expensive and have worked reliably in the past on the older DDR4 platform. I am hoping someone can take a look at my diagnostics and give me an idea of where to start on fixing this. I have done some research on ASPM to see if that could be causing issues with the NVME drives but I don't know what settings to change, if any. 

 

Thank you in advance for reading and any input at all!

 

fc-unraid-diagnostics-20230215-1022.zip

Edited by FCruz2489

Some NVMe devices have issues with power states on Linux. Try this: on the main GUI page, click on the flash drive, scroll down to "Syslinux Configuration", make sure it's set to "menu view" (top right), and add the following to your default boot option, after "append initrd=/bzroot":

nvme_core.default_ps_max_latency_us=0 pcie_aspm=off

e.g.:

append initrd=/bzroot nvme_core.default_ps_max_latency_us=0 pcie_aspm=off


Reboot and see if it makes a difference.
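After the reboot, it can be worth confirming that the two parameters actually reached the kernel. The following is a minimal sketch, not part of the original suggestion: `has_param` is a hypothetical helper name, and the sysfs path is assumed to be present only when the `nvme_core` driver is loaded.

```shell
#!/bin/sh
# Hypothetical helper: succeed if PARAM appears as a whole word on a kernel
# command line. The second argument defaults to the running kernel's cmdline.
has_param() {
  param=$1
  cmdline=${2-$(cat /proc/cmdline)}
  case " $cmdline " in
    *" $param "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Report whether each parameter from the post above is active on this boot.
for p in nvme_core.default_ps_max_latency_us=0 pcie_aspm=off; do
  if has_param "$p"; then
    echo "active:  $p"
  else
    echo "missing: $p"
  fi
done

# Assumption: this sysfs file exists when nvme_core is loaded; it holds the
# value the driver is currently using.
f=/sys/module/nvme_core/parameters/default_ps_max_latency_us
if [ -r "$f" ]; then
  echo "nvme_core.default_ps_max_latency_us = $(cat "$f")"
fi
```

If either parameter is reported as missing, the edit most likely went into a boot entry other than the one the server actually started from.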

  • Author
5 minutes ago, JorgeB said:

Some NVMe devices have issues with power states on Linux, try this, on the main GUI page click on the flash drive, scroll down to "Syslinux Configuration", make sure it's set to "menu view" (top right) and add this to your default boot option, after "append initrd=/bzroot"

nvme_core.default_ps_max_latency_us=0 pcie_aspm=off

e.g.:

append initrd=/bzroot nvme_core.default_ps_max_latency_us=0 pcie_aspm=off


Reboot and see if it makes a difference.

 

Thank you so much for the response! I added that to my flash config and am rebooting the server now. My docker and VM images are jacked again, so I will reconfigure them with this setting enabled and see if all drives can finally be reliable. 

 

If I wanted to add this to the GUI boot option as well, would it be the text below or do I need a comma? 

append initrd=/bzroot,/bzroot-gui nvme_core.default_ps_max_latency_us=0 pcie_aspm=off

 

Flash.png

  • Author

Sadly, one of the drives disappeared with this setting enabled while I was running the cache pool BTRFS scrub/repair. But they are still connected via the Highpoint card, so maybe I need to put them back into the M.2 slots with this new setting.

image.png.97cde9f2ac9375b62a4dbd0e039d3374.png

1 hour ago, FCruz2489 said:

would it be the text below or do I need a comma? 

Just add the text after a space; no comma is needed.
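To illustrate the answer, here is a rough sketch of a filter that appends the parameters, separated by a space, to each `append initrd=...` line. It is an illustration only: `add_nvme_params` is a hypothetical name, the filter touches every boot entry it finds rather than just the default and GUI ones, and editing through the Syslinux Configuration page as described above remains the route recommended in this thread.

```shell
#!/bin/sh
PARAMS="nvme_core.default_ps_max_latency_us=0 pcie_aspm=off"

# Hypothetical filter: read a syslinux config on stdin and write it to stdout,
# appending PARAMS (after a single space) to each "append initrd=..." line
# that does not already contain pcie_aspm=off.
add_nvme_params() {
  while IFS= read -r line || [ -n "$line" ]; do
    case "$line" in
      *append*initrd=*pcie_aspm=off*) printf '%s\n' "$line" ;;
      *append*initrd=*)               printf '%s %s\n' "$line" "$PARAMS" ;;
      *)                              printf '%s\n' "$line" ;;
    esac
  done
}

# Demo on the two append lines discussed in this thread.
printf '%s\n' \
  "  append initrd=/bzroot" \
  "  append initrd=/bzroot,/bzroot-gui" | add_nvme_params
```

The comma in `initrd=/bzroot,/bzroot-gui` belongs to the initrd list itself; the kernel parameters that follow are separated by spaces only.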

  • Author
On 2/15/2023 at 11:45 AM, JorgeB said:

Some NVMe devices have issues with power states on Linux, try this, on the main GUI page click on the flash drive, scroll down to "Syslinux Configuration", make sure it's set to "menu view" (top right) and add this to your default boot option, after "append initrd=/bzroot"

nvme_core.default_ps_max_latency_us=0 pcie_aspm=off

e.g.:

append initrd=/bzroot nvme_core.default_ps_max_latency_us=0 pcie_aspm=off


Reboot and see if it makes a difference.

 

Hi! Just wanted to provide an update - since I added the line you suggested to the config AND changed all my PCI-e and NVME slots to disable ASPM in the BIOS, the drives have been solid for a week. I'll continue to monitor it, but this is very promising and I hope it did the trick. Thank you again!
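For anyone wanting to double-check that ASPM really is off after changing the BIOS and boot settings, the sketch below counts PCIe links that still report an enabled ASPM state. It assumes `lspci` (from pciutils) is available, that it is run as root so the link control registers are readable, and that the `LnkCtl:` lines look like `ASPM Disabled;` or `ASPM L1 Enabled;`. `count_aspm_enabled` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read `lspci -vv` output on stdin and print the number
# of LnkCtl lines that report an enabled ASPM state.
count_aspm_enabled() {
  # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true".
  grep -c 'LnkCtl:.*ASPM [^;]*Enabled' || true
}

if command -v lspci >/dev/null 2>&1; then
  echo "links with ASPM enabled: $(lspci -vv 2>/dev/null | count_aspm_enabled)"
else
  echo "lspci not found; skipping ASPM check"
fi
```

A count of 0 is consistent with ASPM being disabled on every link; a non-zero count suggests some slot or device still has it enabled.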
