Unraid: Thunderbolt eGPU shows up without the GPU



Hello

 

I have a Lenovo Legion BoostStation,

a Belkin Thunderbolt 3 USB-C cable (100 W, 2 m),

an ASRock W480 Creator with the latest BIOS (tried with Thunderbolt security set to either No Security or User Authorisation)

and an NVIDIA GTX 1660.

 

I tried on Unraid 6.9, and now again on 6.10.0-rc2.

I have tried the PCIe ACS override set to Downstream, Multi-function and Both.

 

I turn on the BoostStation and its fan spins up (the GPU is connected by HDMI to a monitor in standby).

I turn on the power to the server and boot Unraid.

The GPU fans spin up.

 

Unraid reports:

IOMMU group 29:[8086:15ea] 0c:00.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018] (rev 06)

IOMMU group 30:[8086:15ea] 0d:00.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018] (rev 06)

IOMMU group 31:[8086:15ea] 0d:01.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018] (rev 06)

IOMMU group 32:[8086:15ea] 0d:02.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018] (rev 06)

IOMMU group 33:[8086:15ea] 0d:04.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018] (rev 06)

IOMMU group 34:[8086:15eb] 0e:00.0 System peripheral: Intel Corporation JHL7540 Thunderbolt 3 NHI [Titan Ridge 4C 2018] (rev 06)

IOMMU group 35:[8086:15ec] 42:00.0 USB controller: Intel Corporation JHL7540 Thunderbolt 3 USB Controller [Titan Ridge 4C 2018] (rev 06)

                          Bus 005 Device 001 Port 5-0 ID 1d6b:0002 Linux Foundation 2.0 root hub

                          Bus 006 Device 001 Port 6-0 ID 1d6b:0003 Linux Foundation 3.0 root hub

 

 

About 1 in 30 times I've somehow got the NVIDIA 1660 to show up, but I don't know what I did to make it happen; when it does happen, I can never reproduce or keep the result.

 

I am at a complete loss. Am I correct in saying it is detecting the Lenovo Legion enclosure but not the GPU?
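One way to check from the shell whether the enclosure, the GPU, or neither has been enumerated is to look for each piece separately. In the listings in this thread, the JHL7540 (Titan Ridge) entries appear on every boot, even with Thunderbolt disabled in the BIOS, so they look like the server's own controller, while the enclosure-side Alpine Ridge bridges (JHL6340 with the Razer Core X, JHL6540 with the Legion) only appear together with the GPU. The sketch below is a hypothetical helper, `check_egpu`, that greps `lspci -nn`-style lines for PCI IDs copied from these listings. It is a diagnostic aid under those assumptions, not a fix.

```shell
#!/bin/sh
# Sketch: report which parts of the eGPU chain are visible on the PCI bus.
# Reads lspci-style lines on stdin. PCI IDs come from the listings in this thread:
#   8086:15ea              JHL7540 Titan Ridge (appears to be the server's own controller)
#   8086:15da / 8086:15d3  JHL6340 / JHL6540 Alpine Ridge (enclosure-side bridge)
#   10de:xxxx              NVIDIA; only VGA or 3D controller functions count as "the GPU"

# $1 = label, $2 = captured text, $3 = extended regex to look for
report() {
  if printf '%s\n' "$2" | grep -Eq "$3"; then
    echo "$1: present"
  else
    echo "$1: missing"
  fi
}

check_egpu() {
  input=$(cat)
  report host-thunderbolt "$input" '\[8086:15ea\]'
  report enclosure-bridge "$input" '\[8086:15d[a3]\]'
  # Matches both the Unraid format ("[10de:2184] 45:00.0 VGA compatible controller: ...")
  # and the lspci -nn format ("45:00.0 VGA compatible controller [0300]: ... [10de:2184]").
  report nvidia-gpu "$input" '\[10de:[0-9a-f]{4}\].*(VGA compatible|3D) controller|(VGA compatible|3D) controller.*\[10de:'
}

# Usage on the server:
#   lspci -nn | check_egpu
```

With only the first listing above as input, this would report the host controller as present and both the enclosure bridge and the GPU as missing.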

 

Can anyone help?

 

Thank you

Chris

 

Edited by SmokeyColes

Round 2

 

I have bought a Razer Core X enclosure, which came with its own cable.

With the GTX 1660S installed, it still does not work.

This makes me think the GPU could be defective or incompatible with Unraid.

 

That said, in 1 of the 15 boots it did show up (again, whatever I did, I repeated it and could not reproduce the result).

 

IOMMU group 29:[1106:3432] 0b:00.0 USB controller: VIA Technologies, Inc. VL800/801 xHCI USB 3.0 Controller (rev 03)

Bus 003 Device 001 Port 3-0 ID 1d6b:0002 Linux Foundation 2.0 root hub

Bus 003 Device 002 Port 3-1 ID 2109:0811 VIA Labs, Inc. Hub

Bus 004 Device 001 Port 4-0 ID 1d6b:0003 Linux Foundation 3.0 root hub

IOMMU group 30:[8086:15ea] 0c:00.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018] (rev 06)

IOMMU group 31:[8086:15ea] 0d:00.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018] (rev 06)

IOMMU group 32:[8086:15ea] 0d:01.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018] (rev 06)

IOMMU group 33:[8086:15ea] 0d:02.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018] (rev 06)

IOMMU group 34:[8086:15ea] 0d:04.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018] (rev 06)

IOMMU group 35:[8086:15eb] 0e:00.0 System peripheral: Intel Corporation JHL7540 Thunderbolt 3 NHI [Titan Ridge 4C 2018] (rev 06)

IOMMU group 36:[8086:15ec] 42:00.0 USB controller: Intel Corporation JHL7540 Thunderbolt 3 USB Controller [Titan Ridge 4C 2018] (rev 06)

Bus 005 Device 001 Port 5-0 ID 1d6b:0002 Linux Foundation 2.0 root hub

Bus 006 Device 001 Port 6-0 ID 1d6b:0003 Linux Foundation 3.0 root hub

IOMMU group 37:[8086:15da] 43:00.0 PCI bridge: Intel Corporation JHL6340 Thunderbolt 3 Bridge (C step) [Alpine Ridge 2C 2016] (rev 02)

IOMMU group 38:[8086:15da] 44:01.0 PCI bridge: Intel Corporation JHL6340 Thunderbolt 3 Bridge (C step) [Alpine Ridge 2C 2016] (rev 02)

IOMMU group 39:[10de:2184] 45:00.0 VGA compatible controller: NVIDIA Corporation TU116 [GeForce GTX 1660] (rev a1)

IOMMU group 40:[10de:1aeb] 45:00.1 Audio device: NVIDIA Corporation TU116 High Definition Audio Controller (rev a1)

IOMMU group 41:[10de:1aec] 45:00.2 USB controller: NVIDIA Corporation TU116 USB 3.1 Host Controller (rev a1)

Bus 007 Device 001 Port 7-0 ID 1d6b:0002 Linux Foundation 2.0 root hub

Bus 008 Device 001 Port 8-0 ID 1d6b:0003 Linux Foundation 3.0 root hub

IOMMU group 42:[10de:1aed] 45:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU116 USB Type-C UCSI Controller (rev a1)

IOMMU group 43:[1bb1:5012] 77:00.0 Non-Volatile memory controller: Seagate Technology PLC FireCuda 510 SSD (rev 01)

[N:1:1:1] disk Seagate FireCuda 510 SSD ZP1000GM30031__1 /dev/nvme1n1 1.00TB

IOMMU group 44:[1d6a:07b1] 78:00.0 Ethernet controller: Aquantia Corp. AQC107 NBase-T/IEEE 802.3bz Ethernet Controller [AQtion] (rev 02)

 

It also seems more likely to show up in the IOMMU list when I power down, unplug the eGPU enclosure, swap from a different GPU back to the 1660, and power back up. This makes me think Unraid is the issue: the card seems to be compatible, but it only shows up after a hardware change or after I have changed something significant in the BIOS.

 

I have effectively ruled out the cable and the eGPU enclosure.

 

Currently the Razer Core X is working with a:

IOMMU group 39:[10de:0de1] 45:00.0 VGA compatible controller: NVIDIA Corporation GF108 [GeForce GT 430] (rev a1)

On every reboot, the GeForce GT 430 shows up.

I make no other changes to the system: I just power down and swap the 1660 for the GT 430. The GT 430 works; I power off and back on, and it still shows up.

Wish the 1660 would behave the same!


I am at a complete loss.

 

I couldn't reproduce the 1660 showing up.

 

These are the steps I took:

 

        The 1660 is in and doesn't show up

        Using the 2nd eGPU enclosure (Razer Core X) with a 0.5 m Thunderbolt cable instead of the 2 m one

        Thunderbolt security is set to No Security in the BIOS

        PCIe ACS override is set to Both in Unraid

 

  1. With lots of testing already done, start from the initial conditions above
  2. Power down Unraid, switch off, replace with GT 430, boot Unraid - GT 430 shows
  3. Power down Unraid, power up - GT 430 shows
  4. Power down Unraid, power up - GT 430 shows
  5. Power down Unraid, replace with GTX 1660, boot Unraid - 1660 shows
  6. Power down Unraid, power up - nothing shows
  7. Power down Unraid, change BIOS setting from No Security to User Authorisation - nothing shows
  8. Power down Unraid, replace with GT 430, boot Unraid - nothing shows
  9. Power down Unraid, change BIOS setting from User Authorisation to No Security - nothing shows
  10. Power down Unraid, power up - GT 430 shows
  11. Power down Unraid, power up - GT 430 shows
  12. Power down Unraid, power up - GT 430 shows
  13. Power down Unraid, replace with GTX 1660, boot Unraid - nothing shows
  14. Power down Unraid, power up - nothing shows
  15. Power down Unraid, power up - nothing shows
  16. Disabled Thunderbolt support in the BIOS, and the Thunderbolt entries still show in the IOMMU groups (see below).
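For a long power-cycle test like the one above, it may help to record the result of each boot automatically rather than from memory. A minimal sketch, assuming `lspci` is available: the hypothetical `log_boot_result` function appends a timestamp, `present`/`missing`, and a free-form step label to a log file. The default path under `/boot` (the Unraid flash drive, so the log survives reboots) is a hypothetical choice; set `LOG` to anything you like.

```shell
#!/bin/sh
# Sketch: append one line per boot recording whether an NVIDIA display device
# was enumerated, so results across many power cycles can be compared later.
# LOG is a hypothetical location on the flash drive; override it as needed.
LOG=${LOG:-/boot/logs/egpu-boot.log}

# $1 = free-form label for the test step; lspci-style lines are read from stdin.
log_boot_result() {
  # A GPU counts as present if an NVIDIA (10de) VGA or 3D controller line exists.
  if grep -E '(VGA compatible|3D) controller' | grep -q '\[10de:'; then
    result=present
  else
    result=missing
  fi
  mkdir -p "$(dirname "$LOG")"
  printf '%s\t%s\t%s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$result" "$1" >> "$LOG"
  echo "$result"
}

# Usage on the server, e.g. after step 13:
#   lspci -nn | log_boot_result "step 13: GTX 1660, No Security"
```

The tab-separated log can sit alongside the diagnostics ZIPs, which remain the authoritative record of each boot.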

I have diagnostic ZIPs for steps 5, 6, 7, 8, 10, 11, 12, 13, 15 and 16

 

 

Next I'm going to try swapping the 1660 with the 3060 in my PC.

 

Step 16

IOMMU group 30:[8086:15ea] 0c:00.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018] (rev 06)

IOMMU group 31:[8086:15ea] 0d:00.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018] (rev 06)

IOMMU group 32:[8086:15ea] 0d:01.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018] (rev 06)

IOMMU group 33:[8086:15ea] 0d:02.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018] (rev 06)

IOMMU group 34:[8086:15ea] 0d:04.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018] (rev 06)

IOMMU group 35:[8086:15eb] 0e:00.0 System peripheral: Intel Corporation JHL7540 Thunderbolt 3 NHI [Titan Ridge 4C 2018] (rev 06)

IOMMU group 36:[8086:15ec] 10:00.0 USB controller: Intel Corporation JHL7540 Thunderbolt 3 USB Controller [Titan Ridge 4C 2018] (rev 06)

Bus 005 Device 001 Port 5-0 ID 1d6b:0002 Linux Foundation 2.0 root hub

Bus 006 Device 001 Port 6-0 ID 1d6b:0003 Linux Foundation 3.0 root hub


So I swapped the 3060 and 1660.

The 1660 worked fine in my PC.

 

The 3060 showed up once, seemingly at random, and then would not show again.

 

 

*************************

Now I've found a way to make the 3060 show up. I've no idea what I'm doing, but I type:

 

# echo 1 > /sys/bus/pci/rescan

 

On 3 occasions when the GeForce RTX 3060 did not show up, the above command made it show. Then, when I started the array, I could select the devices (because I had bound them).
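Since the rescan brings the card back, a small guard could check sysfs first and only rescan when no NVIDIA display device is enumerated. This is a sketch under stated assumptions: the function names are hypothetical, it reads the standard `vendor` and `class` attributes under `/sys/bus/pci/devices`, and `SYSFS` is overridable only so the logic can be dry-run on a fake directory tree.

```shell
#!/bin/sh
# Sketch: request a PCI rescan only when no NVIDIA display device is enumerated.
# SYSFS defaults to /sys; overriding it lets the logic be dry-run on a fake tree.
SYSFS=${SYSFS:-/sys}

# True if any PCI function has vendor 0x10de (NVIDIA) and a display class
# (0x0300xx = VGA compatible controller, 0x0302xx = 3D controller).
nvidia_gpu_present() {
  for dev in "$SYSFS"/bus/pci/devices/*; do
    [ -r "$dev/vendor" ] && [ -r "$dev/class" ] || continue
    [ "$(cat "$dev/vendor")" = "0x10de" ] || continue
    case "$(cat "$dev/class")" in
      0x0300*|0x0302*) return 0 ;;
    esac
  done
  return 1
}

rescan_if_missing() {
  if nvidia_gpu_present; then
    echo "GPU already enumerated; no rescan needed"
  else
    echo "GPU missing; requesting PCI rescan"
    echo 1 > "$SYSFS/bus/pci/rescan"   # same write as the command above
  fi
}
```

Checking the class as well as the vendor means the GPU's audio or USB functions alone do not count as "the GPU being there".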

 

I have not loaded Ubuntu yet, as I plan to power off and swap the 3060 for the 1660; if I hit the same issue, I will run the above command.

I don't want to do this every time; part of me thinks this must be a bug in Unraid...?

 

This is the IOMMU list after running the rescan command in the terminal. (I've decided to only rescan when the array is stopped, as I don't want to bugger up the three SAS PCI controllers, and I really have no clue what I am doing! I found the command on an Arch Linux page.)
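The "only rescan when the array is stopped" rule could be enforced by a wrapper rather than by memory. A minimal sketch, with a loudly labelled assumption: it reads the array state from the `mdState` key in `/var/local/emhttp/var.ini`, which is what many community scripts use, but you should confirm that file and key on your own server before relying on it. The function names are hypothetical, and `VAR_INI`/`SYSFS` are overridable for a dry run.

```shell
#!/bin/sh
# Sketch: only request a PCI rescan while the Unraid array is stopped.
# ASSUMPTION: the array state is the mdState key in /var/local/emhttp/var.ini
# (as read by many community scripts); check the file on your own server first.
VAR_INI=${VAR_INI:-/var/local/emhttp/var.ini}
SYSFS=${SYSFS:-/sys}

# Prints the value of mdState (e.g. STARTED or STOPPED), or nothing if unavailable.
array_state() {
  sed -n 's/^mdState="\([^"]*\)"$/\1/p' "$VAR_INI" 2>/dev/null
}

safe_rescan() {
  state=$(array_state)
  if [ "$state" = "STOPPED" ]; then
    echo 1 > "$SYSFS/bus/pci/rescan"
    echo "rescan requested"
  else
    # Refuse when the array is running or the state cannot be read.
    echo "array state is '${state:-unknown}'; not rescanning"
    return 1
  fi
}
```

Failing closed (no rescan when the state is unknown) matches the cautious approach described above.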

 

IOMMU group 24:[8086:15ea] 0c:00.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018] (rev 06)

IOMMU group 25:[8086:15ea] 0d:00.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018] (rev 06)

IOMMU group 26:[8086:15ea] 0d:01.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018] (rev 06)

IOMMU group 27:[8086:15ea] 0d:02.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018] (rev 06)

IOMMU group 28:[8086:15ea] 0d:04.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018] (rev 06)

IOMMU group 29:[8086:15eb] 0e:00.0 System peripheral: Intel Corporation JHL7540 Thunderbolt 3 NHI [Titan Ridge 4C 2018] (rev 06)

IOMMU group 30:[8086:15ec] 42:00.0 USB controller: Intel Corporation JHL7540 Thunderbolt 3 USB Controller [Titan Ridge 4C 2018] (rev 06)

Bus 005 Device 001 Port 5-0 ID 1d6b:0002 Linux Foundation 2.0 root hub

Bus 006 Device 001 Port 6-0 ID 1d6b:0003 Linux Foundation 3.0 root hub

IOMMU group 31:[8086:15da] 43:00.0 PCI bridge: Intel Corporation JHL6340 Thunderbolt 3 Bridge (C step) [Alpine Ridge 2C 2016] (rev 02)

IOMMU group 32:[8086:15da] 44:01.0 PCI bridge: Intel Corporation JHL6340 Thunderbolt 3 Bridge (C step) [Alpine Ridge 2C 2016] (rev 02)

[10de:2489] 45:00.0 VGA compatible controller: NVIDIA Corporation GA104 [GeForce RTX 3060 Ti Lite Hash Rate] (rev a1)

[10de:228b] 45:00.1 Audio device: NVIDIA Corporation GA104 High Definition Audio Controller (rev a1)

 

 

 

Next step

 

Remove the Razer Core X.

Put the 1660 in the Lenovo Legion.

Try the rescan command when Unraid's System Devices page doesn't see it (fingers crossed).

 


@Squid thank you, I'm sure you'll know more.

 

OK, so I removed the 3060 and the Razer Core X.

I moved the 1660 from the PC into the Lenovo Legion BoostStation, using the 2 m Thunderbolt cable (the original setup).

I powered up and, sure enough, it did not show.

I issued the rescan command and yes, it all showed!

I bound the GPU, audio and USB functions.

I rebooted Unraid; it came back up, but the GPU did not show, even after a rescan.

(But I had only rebooted. Unlike the Razer Core X, which only has a main PSU on/off switch, the Legion has a small push power button at the front, so the enclosure itself never switched off. So this time I shut Unraid down rather than rebooting it, turned the Legion off and back on, and then booted Unraid. I think the Razer possibly power cycles itself, unlike the Legion.)

This time the GPU didn't show at first; I rescanned and it showed, as expected.

 

I took the diagnostics before and after.

 

It seems the setup has been working all along, but in 30-ish reboots I've had mixed results. The only card that behaved better than the 1660 and 3060 was the GT 430, which doesn't need a PCI rescan. I'm not sure exactly what's happening, but I am positive about my rescan approach.

 

IOMMU group 24:[8086:15ea] 0c:00.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018] (rev 06)

IOMMU group 25:[8086:15ea] 0d:00.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018] (rev 06)

IOMMU group 26:[8086:15ea] 0d:01.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018] (rev 06)

IOMMU group 27:[8086:15ea] 0d:02.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018] (rev 06)

IOMMU group 28:[8086:15ea] 0d:04.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018] (rev 06)

IOMMU group 29:[8086:15eb] 0e:00.0 System peripheral: Intel Corporation JHL7540 Thunderbolt 3 NHI [Titan Ridge 4C 2018] (rev 06)

IOMMU group 30:[8086:15d3] 0f:00.0 PCI bridge: Intel Corporation JHL6540 Thunderbolt 3 Bridge (C step) [Alpine Ridge 4C 2016] (rev 02)

IOMMU group 31:[8086:15d3] 10:01.0 PCI bridge: Intel Corporation JHL6540 Thunderbolt 3 Bridge (C step) [Alpine Ridge 4C 2016] (rev 02)

[10de:2184] 11:00.0 VGA compatible controller: NVIDIA Corporation TU116 [GeForce GTX 1660] (rev a1)

[10de:1aeb] 11:00.1 Audio device: NVIDIA Corporation TU116 High Definition Audio Controller (rev a1)

[10de:1aec] 11:00.2 USB controller: NVIDIA Corporation TU116 USB 3.1 Host Controller (rev a1)

[10de:1aed] 11:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU116 USB Type-C UCSI Controller (rev a1)

IOMMU group 32:[8086:15d3] 10:04.0 PCI bridge: Intel Corporation JHL6540 Thunderbolt 3 Bridge (C step) [Alpine Ridge 4C 2016] (rev 02)

[8086:15c0] 12:00.0 PCI bridge: Intel Corporation JHL6240 Thunderbolt 3 Bridge (Low Power) [Alpine Ridge LP 2016] (rev 01)

 

(Note: somewhere along the line I switched the ACS override to Multi-function rather than Both.

I don't think this matters for the problem; I mention it in case anyone is curious why the earlier IOMMU groupings look different.)

 

Please see attached 

 

 

diagnostics17.zip diagnostics18.zip


@limetech Hi, can you provide any advice?

The Thunderbolt side of the eGPU setup (Titan Ridge) shows up, but not the GPU.

 

echo 1 > /sys/bus/pci/rescan

 

will then show the GPU every time...

 

However, by the time I can type the command, the system has already booted and the GPU has not been bound to VFIO (going by the green dots in the device list).

Is there any way I can force this PCI rescan before the VFIO bindings are applied, so that it sees the GPU?
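I can't say whether Unraid offers a hook that runs before its own VFIO binding step. A possible workaround is to bind the eGPU's functions by hand right after the rescan, using the kernel's generic `driver_override` mechanism in sysfs. This is a sketch under assumptions: the `vfio-pci` driver is available, the function name is hypothetical, the example addresses are taken from the 3060 listing and will differ for other cards and enclosures, and whether Unraid's VM manager treats devices bound this way exactly like boot-time bindings is untested.

```shell
#!/bin/sh
# Sketch: after a PCI rescan, hand a device to vfio-pci using the kernel's
# generic driver_override mechanism. Assumes the vfio-pci driver is available.
# SYSFS defaults to /sys; overriding it allows a dry run on a fake tree.
SYSFS=${SYSFS:-/sys}

# $1 = full PCI address including the domain, e.g. 0000:45:00.0
bind_to_vfio() {
  dev="$SYSFS/bus/pci/devices/$1"
  [ -d "$dev" ] || { echo "skip $1: not present"; return 1; }
  # From now on, only vfio-pci may claim this device.
  echo vfio-pci > "$dev/driver_override"
  # Release it from whatever driver currently holds it, if any.
  if [ -e "$dev/driver" ]; then
    echo "$1" > "$dev/driver/unbind"
  fi
  # Ask the PCI core to probe the device again; driver_override steers it to vfio-pci.
  echo "$1" > "$SYSFS/bus/pci/drivers_probe"
  echo "requested vfio-pci for $1"
}

# Example (hypothetical addresses from the 3060 listing; check yours after each rescan):
#   echo 1 > /sys/bus/pci/rescan
#   for bdf in 0000:45:00.0 0000:45:00.1; do bind_to_vfio "$bdf"; done
```

The function only requests the binding; checking the device's `driver` link afterwards (or the System Devices page) would confirm whether vfio-pci actually took it.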

 

Thanks

