unRAID not detecting both M.2 drives



I just purchased 2 M.2 drives hoping to use them as a cache pool. The UEFI menu will show both drives, but the unRAID web GUI will only list one of the drives. Below is a little about my setup.

 

Motherboard: ASRock X370 Taichi

CPU: Ryzen 5 2600

M.2 drives: 2x XPG SX6000 Lite M.2 2280 512GB PCI-Express 3.0 x4 3D NAND

HBAs: 2x LSI 9207-8i

GPU: ATI FireMV 2250

 

I tested each drive separately and each will be detected in the web GUI in either M.2 slot when installed one at a time. However, unRAID isn't showing both of them when they're both installed, although the UEFI sees both of them.

 

After swapping drives and slots, I removed the GPU thinking it may be some odd PCIe lane allocation issue, but the issue persisted. I also updated from 6.7.0 to 6.7.2. Attached are the diagnostics.

 

I looked through syslog.txt and it looks to be an issue with both drives using the same NVMe Qualified Name (NQN). I'm researching this path at the moment, but hope someone with more experience can weigh in with a possible solution. Any ideas?

 

 

elysium-diagnostics-20190713-0247.zip

Edited by Alabaster

Both NVMe drives are detected on the PCI bus, but only one is bound to the nvme driver. The problem may be fixed by a kernel or NVMe firmware update.

 

01:00.0 Non-Volatile memory controller [0108]: Realtek Semiconductor Co., Ltd. Device [10ec:5762] (rev 01)
    Subsystem: Realtek Semiconductor Co., Ltd. Device [10ec:5762]
    Kernel driver in use: nvme
    Kernel modules: nvme

 

21:00.0 Non-Volatile memory controller [0108]: Realtek Semiconductor Co., Ltd. Device [10ec:5762] (rev 01)
    Subsystem: Realtek Semiconductor Co., Ltd. Device [10ec:5762]
    Kernel modules: nvme

 

Jul 12 21:26:04 Elysium kernel: nvme nvme1: ignoring ctrl due to duplicate subnqn (nqn.2018-05.com.example:nvme:nvm-subsystem-OUI00E04C).
Jul 12 21:26:04 Elysium kernel: nvme nvme1: Removing after probe failure status: -22
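For anyone who wants to check their own syslog for the same symptom, here is a minimal Python sketch that pulls out the controller name and the duplicated NQN. The regular expression is only modelled on the two log lines above; the function name `find_duplicate_subnqn` is made up for illustration, and other kernel versions may word the message differently.

```python
import re

# Pattern modelled on the kernel message quoted above (an assumption:
# other kernel versions may phrase it differently).
DUPLICATE_NQN = re.compile(
    r"nvme (?P<ctrl>nvme\d+): ignoring ctrl due to duplicate subnqn "
    r"\((?P<nqn>[^)]+)\)"
)


def find_duplicate_subnqn(log_text):
    """Return (controller, nqn) pairs for every rejected controller."""
    return [(m.group("ctrl"), m.group("nqn"))
            for m in DUPLICATE_NQN.finditer(log_text)]


# The two lines from the diagnostics posted in this thread.
SAMPLE_LOG = (
    "Jul 12 21:26:04 Elysium kernel: nvme nvme1: ignoring ctrl due to "
    "duplicate subnqn (nqn.2018-05.com.example:nvme:nvm-subsystem-OUI00E04C).\n"
    "Jul 12 21:26:04 Elysium kernel: nvme nvme1: Removing after probe "
    "failure status: -22\n"
)

if __name__ == "__main__":
    for ctrl, nqn in find_duplicate_subnqn(SAMPLE_LOG):
        print(f"{ctrl} was dropped because its NQN is already in use: {nqn}")
```

To run it against a real log, read the contents of syslog.txt from the diagnostics zip into a string and pass that in instead of `SAMPLE_LOG`.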

 

https://forums.lenovo.com/t5/ThinkPad-X-Series-Laptops/X1-Extreme-Intel-NVMe-Firmware-Upgrade-NQN-Duplicate-Issue/m-p/4415819#M99048

 

commit b9453f9bb66e864f8b7d7e112aea475bdd7a4e2b
Author: James Dingwall <james@dingwall.me.uk>
Date:   Tue Jan 8 10:20:51 2019 -0700

    nvme: introduce NVME_QUIRK_IGNORE_DEV_SUBNQN
    
    [ Upstream commit 6299358d198a0635da2dd3c4b3ec37789e811e44 ]
    
    If a device provides an NQN it is expected to be globally unique.
    Unfortunately some firmware revisions for Intel 760p/Pro 7600p devices did
    not satisfy this requirement.  In these circumstances if a system has >1
    affected device then only one device is enabled.  If this quirk is enabled
    then the device supplied subnqn is ignored and we fallback to generating
    one as if the field was empty.  In this case we also suppress the version
    check so we don't print a warning when the quirk is enabled.
    
    Reviewed-by: Keith Busch <keith.busch@intel.com>
    Signed-off-by: James Dingwall <james@dingwall.me.uk>
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
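To illustrate what the quirk described in that commit message changes, here is a rough Python sketch. It is not the kernel code: the fallback name format (a fixed prefix followed by PCI vendor ID, subsystem vendor ID, serial and model) is my understanding of what the driver generates and should be treated as an assumption, and the serial numbers and model string are hypothetical placeholders.

```python
# NQN that both drives in this thread report, taken from the syslog lines.
REPORTED_NQN = "nqn.2018-05.com.example:nvme:nvm-subsystem-OUI00E04C"


def fallback_subnqn(vid, ssvid, serial, model):
    """Build a per-device name when the firmware NQN is ignored.

    Assumed format: prefix + vendor IDs + serial padded to 20 characters
    + model padded to 40 characters.
    """
    return "nqn.2014.08.org.nvmexpress:%04x%04x%-20s%-40s" % (
        vid, ssvid, serial[:20], model[:40])


# Hypothetical identify data for two otherwise identical drives.
DRIVES = [
    {"vid": 0x10EC, "ssvid": 0x10EC, "serial": "SERIAL-A", "model": "EXAMPLE MODEL"},
    {"vid": 0x10EC, "ssvid": 0x10EC, "serial": "SERIAL-B", "model": "EXAMPLE MODEL"},
]


def unique_names(drives, ignore_device_nqn):
    """Return the set of subsystem names the host would end up with."""
    if ignore_device_nqn:
        return {fallback_subnqn(d["vid"], d["ssvid"], d["serial"], d["model"])
                for d in drives}
    # Without the quirk every drive presents the same firmware NQN.
    return {REPORTED_NQN for _ in drives}


if __name__ == "__main__":
    print("without quirk:", len(unique_names(DRIVES, False)), "usable name(s)")
    print("with quirk:   ", len(unique_names(DRIVES, True)), "usable name(s)")
```

The point is only that the generated names differ as long as the serial numbers differ, so the second controller no longer collides with the first.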

 

Edited by Benson

I am also having this same issue with 2 Adata XPG GAMMIX S5 256GB drives.  Both are seen in the BIOS but I see the following during system start up.

 

Jul 14 21:56:11 TheWatchtower kernel: nvme nvme1: ignoring ctrl due to duplicate subnqn (nqn.2018-05.com.example:nvme:nvm-subsystem-OUI00E04C).

Jul 14 21:56:11 TheWatchtower kernel: nvme nvme1: Removing after probe failure status: -22

 

Is there a config tweak that can be made or will this have to be added by the main dev team?

 

Alabaster, were you able to get the issue sorted out?

 

Cheers,

 

Chris

 

On 7/15/2019 at 12:09 AM, drkCrix said:

I am also having this same issue with 2 Adata XPG GAMMIX S5 256GB drives.  Both are seen in the BIOS but I see the following during system start up.


Unrelated to your issue?

 

Did you buy your ADATA drives from Massdrop?


That was me. I ran into this issue with my new Thinkpad X1 Extreme laptop. Feel free to adapt the patch if you have different variants of this SSD. `lspci -nn` will give you the PCI vendor and device IDs. If it's Realtek, it should be 0x10ec for the vendor ID and the device ID will be different for different models of the SSD.
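A small Python sketch of pulling the vendor and device IDs out of an `lspci -nn` line, in case it saves someone from reading the output by eye. The sample line is the one posted earlier in this thread; the function name `pci_ids` is hypothetical, and the pattern assumes the usual `[vendor:device]` bracket that `lspci -nn` prints.

```python
import re

# lspci -nn prints the IDs as [vvvv:dddd]; the class code, e.g. [0108],
# has no colon and therefore does not match.
PCI_ID = re.compile(r"\[([0-9a-f]{4}):([0-9a-f]{4})\]")


def pci_ids(lspci_line):
    """Return (vendor_id, device_id) as hex strings, or None if absent."""
    match = PCI_ID.search(lspci_line)
    return (match.group(1), match.group(2)) if match else None


SAMPLE_LINE = (
    "01:00.0 Non-Volatile memory controller [0108]: "
    "Realtek Semiconductor Co., Ltd. Device [10ec:5762] (rev 01)"
)

if __name__ == "__main__":
    vendor, device = pci_ids(SAMPLE_LINE)
    print(f"vendor=0x{vendor} device=0x{device}")
```

Those two values are the ones a kernel quirk entry would need to match for a given drive model.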

Also, we really should get ADATA/Realtek to patch their lame firmware :( I'm sure the Linux kernel guys aren't happy with an ever-growing list of quirks.

Also, here's a resource for building custom kernels for unRAID: 
https://wiki.unraid.net/Building_a_custom_kernel

(Note: I'm not an unRAID user, just circling back here as I hate the phenomenon of finding some post about some problem, but no solutions.)

9 hours ago, drkCrix said:

@Alabaster

 

I found this today

 

https://lkml.org/lkml/2019/7/15/57

 

Looks like it is for drives with the Realtek controller, like the ones we have.

 

Edited by mishan

@drkCrix

No, I haven't sorted this out.

 

According to the following link, it appears there will be a fix in the Linux 5.3 kernel. I'm not Linux savvy, so I'm not sure how that will play out for unRAID.

https://forum.proxmox.com/threads/only-one-of-two-nvme-detected-in-linux-duplicate-subnqn.54480/

 

I thought the NQN was something configurable by the manufacturer, so I would blame ADATA and not Realtek. I could be completely wrong about that, though.

 

I contacted ADATA "customer service" (in quotes since it is a joke). Their response email looks completely automated as it starts with "Dear Customer" and doesn't even have the name I entered when filling out the online form to contact them. The email starts out acknowledging I have an issue and then goes right into stating they are here to assist with the return of the product. The email did provide a few generic troubleshooting steps, but had absolutely no mention of the issue I explicitly detailed for them (again, since this is an automated, impersonal email.) I'll try replying to the email to see if that actually gets anywhere.

 

I think I'll probably just return the drives and spend a bit more on another brand.

Edited by Alabaster

Ok, I have cancelled the MP510 order.

 

@Benson are there nvme drives that just work?  

 

Samsung ?

HP EX920 ?

 

I noticed that there are a few firmware updates for the Phison E12 family of NVMe drives (currently on 12.3). I wonder if any of the updates fixed the trim issues.

 

Thanks

Edited by drkCrix
  • 2 weeks later...

UPDATE:

I returned one of the ADATA drives and purchased an HP EX920. Both drives now show up.

 

I was testing my UPS setup to make sure the server will come back on after a power outage. I have a CyberPower CP1500AVRLCD UPS and am using the NUT plugin for communication. I noticed the UPS was turning off before the server had finished shutting down. This caused a hard shutdown, and the ADATA drive would not show up after powering the server back on, not even in BIOS/UEFI. It wouldn't show up on subsequent reboots either. I would have to swap the drives between the M.2 slots and boot it back up for both drives to show up again (not convenient!!). I was able to tweak the settings and config files in the NUT plugin to get my desired shutdown sequence and avoid this problem.

 

In case anyone was wondering, I ran the Autodetect function in NUT, switched the 'Enable Manual Config Only' option to Yes, then added the two highlighted lines in the screenshot to the ups.conf file. Adding 'offdelay' instructs my UPS to delay its shutdown for roughly X seconds, which gives the server more time to shut down. The default is 20 seconds, which is simply not enough. I tried adding the same lines to ups.conf with 'Enable Manual Config Only' set to No, but the value would revert to the default when the service was started, so I had to set it to Yes. Below are the settings I'm using, for visual reference. (I didn't have the service started in the screenshot as I was still testing, and I only had the shutdown time set to 1 minute for testing purposes.) Of course, your mileage may vary. Hope that helps someone!

image.thumb.png.042667086da8b09e926cdbad79585f03.png
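As a rough sanity check on the 'offdelay' idea described above, here is a Python sketch that reads an ups.conf-style text and reports the delay the UPS would wait. Everything in the sample is an assumption for illustration: the section name `ups`, the `usbhid-ups` driver, and the value 120 are placeholders, not the contents of the screenshot, and the parsing relies on ups.conf being simple enough for Python's `configparser` to read.

```python
import configparser

# Hypothetical ups.conf contents; the real values are in the screenshot.
SAMPLE_UPS_CONF = """\
[ups]
driver = usbhid-ups
port = auto
offdelay = 120
"""

# Default mentioned in the post when 'offdelay' is not set.
DEFAULT_OFFDELAY = 20


def offdelay_seconds(conf_text, section="ups"):
    """Return the configured offdelay, or the default if it is missing."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    return parser.getint(section, "offdelay", fallback=DEFAULT_OFFDELAY)


def leaves_enough_time(conf_text, server_shutdown_seconds):
    """True if the UPS waits longer than the server needs to shut down."""
    return offdelay_seconds(conf_text) > server_shutdown_seconds


if __name__ == "__main__":
    # 90 seconds is a made-up shutdown time; measure your own server.
    print("offdelay:", offdelay_seconds(SAMPLE_UPS_CONF), "seconds")
    print("enough for a 90 s shutdown:", leaves_enough_time(SAMPLE_UPS_CONF, 90))
```

Timing an actual clean shutdown of the server and comparing it with the configured delay is the simplest way to pick a value.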

 

 

 

  • 1 month later...
8 minutes ago, johnnie.black said:

Then you have a different problem, they are known to work with Unraid.

The ones previously seen perhaps.

 

I doubt that it is an entirely different issue, given these are same-model NVMe drives. Giving such an absolute response without even knowing which models are affected is "specious" at best.

 

No?
