Thunderbolt 3 eGPU with Windows 10



Hello,

I found UNRAID on the Home Assistant forum and I'm trying to see if I can change my current setup without too much hassle.

Let me start by saying I'm focused on the lowest power consumption rather than the best performance, since the system runs mainly Home Assistant and home control.

My setup is:

- NUC (NUC8i5BEH) with 32 GB of RAM, a 1 TB NVMe drive and a 1 TB SATA SSD

- Akitio Node Lite Thunderbolt 3 enclosure

- Gigabyte GTX 1050 Ti that drives two HDMI monitors when I need to check something, and occasionally for some non-demanding games

Right now I have Windows 10 installed with Hyper-V enabled, running an Ubuntu Server VM that hosts my Home Assistant installation plus many Docker containers.

If I want to use Windows (web browsing, games, office work) I just power on the Node Lite with the 1050 Ti and after a few seconds it connects to the monitors.

The only trick is to log off the user before switching the Thunderbolt 3 enclosure off again, otherwise the RDP connection hangs.

At idle it doesn't consume that much, but I see room to improve the system: if I don't really need Windows, why not just shut it off and leave only the Home Assistant side running?

Now the UNRAID part.

I tried UNRAID on a SATA SSD for testing and successfully migrated the Hyper-V Ubuntu VM with Home Assistant, but I'm struggling a bit with Thunderbolt 3 and the Windows VM.

First of all, the host has to boot with the Thunderbolt 3 enclosure switched on, otherwise the NVIDIA card never shows up in the device list.

Going forward, I can assign the GPU to the Windows 10 VM I created for testing and install the driver, but I see tearing and glitches: not a really "clean" picture when I resize windows or open Task Manager, and on YouTube I see some artifacts. Not terrible, but not great either.

Moreover, if I switch off the Thunderbolt 3 enclosure after stopping the VM (if I don't need Windows or VMs, leaving a GPU on doing nothing 24/7 is a waste of power), I can't start the VM anymore after powering the enclosure back on: it says it can't find the associated device (the NVIDIA card ID). I don't know if there is a command to rescan for the device, or something similar, to run before starting the VM again.
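
For reference, the kind of command I have in mind is the generic PCI rescan below, which I found in Linux documentation; I haven't verified whether it actually helps on UNRAID:

echo 1 > /sys/bus/pci/rescan    # ask the kernel to re-enumerate the PCI bus for new devices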

I see potential in UNRAID for my setup, but I need to sort out the Windows 10 VM (I will need a Windows 10 installation) and the Thunderbolt issue before I can really switch to this solution.

Thanks for reading, and maybe helping 🙂

 

P.S. I sent a diagnostics file of my system, in case that helps.


I did some tests meanwhile.

It appears UNRAID doesn't track when the Thunderbolt 3 external enclosure is disconnected/reconnected: "lspci" still shows the device even when it's disconnected, but after reconnection the VM with the associated eGPU can't start, with the error from my initial post.

For the tearing, I just used an updated ISO (1809).

For comparison I tried Proxmox (latest version as of now), and it detects when the Thunderbolt 3 Node Lite is disconnected and reconnected, starting the VM just fine. I tried many start/stop cycles followed by unplugging the TB3 cable and plugging it back in: no issue at all.

So for now I think I'll go with Proxmox; we'll see if the UNRAID team ever sorts out Thunderbolt 3 plug & play.

Bye :)


Hi @Malaga,

 

By passing through the eGPU, i.e. ticking the device in the VM template, Unraid adds a hostdev section to your Windows 10 template. When you then turn off the device, it is no longer available to the host, but the hostdev section remains in your Windows 10 template. When you start the VM, it will try to find the device (which is turned off) and throw an error. I think this is expected behaviour.
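
For illustration, you can see that section with virsh from the Unraid console; the VM name and PCI address below are just examples and will differ on your system:

virsh dumpxml "Windows 10" | grep -A 5 "<hostdev"
# prints something along the lines of:
#   <hostdev mode='subsystem' type='pci' managed='yes'>
#     <source>
#       <address domain='0x0000' bus='0x07' slot='0x00' function='0x0'/>
#     </source>
#   </hostdev>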

 

However, it's sad to hear that Unraid does not recognize the device when you turn the eGPU back on, resulting in a VM startup issue. I guess this is truly an issue with Unraid, as both utilize KVM under the hood. I haven't used Proxmox myself, so please correct me if I'm wrong. Thunderbolt support can still be considered experimental in Unraid. Personally, I'm also interested in eGPU support. Some people were able to run Unraid as a VM in Proxmox [1], in case you end up missing Unraid features in Proxmox.

 

Quote

So for now I think I'll go with Proxmox; we'll see if the UNRAID team ever sorts out Thunderbolt 3 plug & play.

but I see tearing and glitches

 

A solution might be passing through the Thunderbolt controller to the VM. Why? Search the forum for posts where people have problems passing through USB devices (crackling sound with USB sound cards, etc.). The solution to all of those problems is to pass through the whole USB controller. That also allows them to use USB hot plug inside their VM.

In your case it could solve the issues you observe with Thunderbolt. It's just a theory; you would be one of the first Unraid users experimenting with Thunderbolt passthrough.
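
To figure out which controller the eGPU actually hangs off, the PCI tree view should help (a sketch, to be run on the Unraid console; the address below is only an example, take yours from lspci):

lspci -tv
# the NVIDIA card shows up at the end of a chain of PCI bridges; those bridges and the
# Thunderbolt NHI/USB controller next to them are the Thunderbolt side of the system
readlink -f /sys/bus/pci/devices/0000:07:00.0
# prints the full upstream path of the GPU (replace 0000:07:00.0 with the address
# lspci reports for the GTX 1050 Ti on your system)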

 

cheers

 

[1] https://forums.unraid.net/topic/29679-guide-virtualizing-unraid-on-proxmox-31/

 


 

@T0a Thanks for the answer and your considerations, I really appreciate it :)

Let me start by saying I'm not a Linux expert. I saw that with lspci, if I disconnect the TB3 enclosure, the device is still shown on UNRAID, while on Proxmox it disappears. Maybe there are differences under the hood that are not only KVM related?

I did the same thing on both systems, passing through the GPU, but UNRAID doesn't manage the plug & play part, while Proxmox handles it quite well.

I saw that the TB3 enclosure also uses the xhci_hcd driver, since it has 2 TB3 ports, and I don't know if that can conflict with the built-in USB controller. That one has some issues of its own: I can't use an external HDD by sharing only the USB port, I have to pass through the whole controller, and even that has issues. This is with the USB devices on the 00:14.0 USB controller.

On Proxmox I mitigated the problem by passing through the whole 08:00.0 device and using a USB-C to USB 3 adapter, and that works quite well. This is my PCI list:

root@proxmox:~# lspci -nnk
00:00.0 Host bridge [0600]: Intel Corporation 8th Gen Core Processor Host Bridge/DRAM Registers [8086:3ed0] (rev 08)
        Subsystem: Intel Corporation 8th Gen Core Processor Host Bridge/DRAM Registers [8086:2074]
        Kernel driver in use: skl_uncore
00:02.0 VGA compatible controller [0300]: Intel Corporation Iris Plus Graphics 655 [8086:3ea5] (rev 01)
        Subsystem: Intel Corporation Iris Plus Graphics 655 [8086:2074]
        Kernel driver in use: i915
        Kernel modules: i915
00:08.0 System peripheral [0880]: Intel Corporation Skylake Gaussian Mixture Model [8086:1911]
        Subsystem: Intel Corporation Xeon E3-1200 v5/v6 / E3-1500 v5 / 6th/7th Gen Core Processor Gaussian Mixture Model [8086:2074]
00:12.0 Signal processing controller [1180]: Intel Corporation Cannon Point-LP Thermal Controller [8086:9df9] (rev 30)
        Subsystem: Intel Corporation Cannon Point-LP Thermal Controller [8086:2074]
        Kernel driver in use: intel_pch_thermal
        Kernel modules: intel_pch_thermal
00:14.0 USB controller [0c03]: Intel Corporation Cannon Point-LP USB 3.1 xHCI Controller [8086:9ded] (rev 30)
        Subsystem: Intel Corporation Cannon Point-LP USB 3.1 xHCI Controller [8086:2074]
        Kernel driver in use: xhci_hcd
00:14.2 RAM memory [0500]: Intel Corporation Cannon Point-LP Shared SRAM [8086:9def] (rev 30)
        Subsystem: Intel Corporation Cannon Point-LP Shared SRAM [8086:2074]
00:14.3 Network controller [0280]: Intel Corporation Cannon Point-LP CNVi [Wireless-AC] [8086:9df0] (rev 30)
        Subsystem: Intel Corporation Cannon Point-LP CNVi [Wireless-AC] [8086:0034]
        Kernel driver in use: iwlwifi
        Kernel modules: iwlwifi
00:16.0 Communication controller [0780]: Intel Corporation Cannon Point-LP MEI Controller [8086:9de0] (rev 30)
        Subsystem: Intel Corporation Cannon Point-LP MEI Controller [8086:2074]
        Kernel driver in use: mei_me
        Kernel modules: mei_me
00:17.0 SATA controller [0106]: Intel Corporation Cannon Point-LP SATA Controller [AHCI Mode] [8086:9dd3] (rev 30)
        Subsystem: Intel Corporation Cannon Point-LP SATA Controller [AHCI Mode] [8086:2074]
        Kernel driver in use: ahci
        Kernel modules: ahci
00:1c.0 PCI bridge [0604]: Intel Corporation Cannon Point-LP PCI Express Root Port [8086:9db8] (rev f0)
        Kernel driver in use: pcieport
00:1c.4 PCI bridge [0604]: Intel Corporation Cannon Point-LP PCI Express Root Port [8086:9dbc] (rev f0)
        Kernel driver in use: pcieport
00:1d.0 PCI bridge [0604]: Intel Corporation Cannon Point-LP PCI Express Root Port [8086:9db0] (rev f0)
        Kernel driver in use: pcieport
00:1d.6 PCI bridge [0604]: Intel Corporation Cannon Point-LP PCI Express Root Port [8086:9db6] (rev f0)
        Kernel driver in use: pcieport
00:1f.0 ISA bridge [0601]: Intel Corporation Cannon Point-LP LPC Controller [8086:9d84] (rev 30)
        Subsystem: Intel Corporation Cannon Point-LP LPC Controller [8086:2074]
00:1f.3 Audio device [0403]: Intel Corporation Cannon Point-LP High Definition Audio Controller [8086:9dc8] (rev 30)
        Subsystem: Intel Corporation Cannon Point-LP High Definition Audio Controller [8086:2074]
        Kernel driver in use: snd_hda_intel
        Kernel modules: snd_hda_intel, snd_sof_pci
00:1f.4 SMBus [0c05]: Intel Corporation Cannon Point-LP SMBus Controller [8086:9da3] (rev 30)
        Subsystem: Intel Corporation Cannon Point-LP SMBus Controller [8086:2074]
        Kernel driver in use: i801_smbus
        Kernel modules: i2c_i801
00:1f.5 Serial bus controller [0c80]: Intel Corporation Cannon Point-LP SPI Controller [8086:9da4] (rev 30)
        Subsystem: Intel Corporation Cannon Point-LP SPI Controller [8086:2074]
00:1f.6 Ethernet controller [0200]: Intel Corporation Ethernet Connection (6) I219-V [8086:15be] (rev 30)
        Subsystem: Intel Corporation Ethernet Connection (6) I219-V [8086:2074]
        Kernel driver in use: e1000e
        Kernel modules: e1000e
02:00.0 PCI bridge [0604]: Intel Corporation JHL6340 Thunderbolt 3 Bridge (C step) [Alpine Ridge 2C 2016] [8086:15da] (rev 02)
        Kernel driver in use: pcieport
03:00.0 PCI bridge [0604]: Intel Corporation JHL6340 Thunderbolt 3 Bridge (C step) [Alpine Ridge 2C 2016] [8086:15da] (rev 02)
        Kernel driver in use: pcieport
03:01.0 PCI bridge [0604]: Intel Corporation JHL6340 Thunderbolt 3 Bridge (C step) [Alpine Ridge 2C 2016] [8086:15da] (rev 02)
        Kernel driver in use: pcieport
03:02.0 PCI bridge [0604]: Intel Corporation JHL6340 Thunderbolt 3 Bridge (C step) [Alpine Ridge 2C 2016] [8086:15da] (rev 02)
        Kernel driver in use: pcieport
04:00.0 System peripheral [0880]: Intel Corporation JHL6340 Thunderbolt 3 NHI (C step) [Alpine Ridge 2C 2016] [8086:15d9] (rev 02)
        Subsystem: Intel Corporation JHL6340 Thunderbolt 3 NHI (C step) [Alpine Ridge 2C 2016] [8086:2074]
        Kernel driver in use: thunderbolt
        Kernel modules: thunderbolt
05:00.0 PCI bridge [0604]: Intel Corporation JHL6540 Thunderbolt 3 Bridge (C step) [Alpine Ridge 4C 2016] [8086:15d3] (rev 02)
        Kernel driver in use: pcieport
06:01.0 PCI bridge [0604]: Intel Corporation JHL6540 Thunderbolt 3 Bridge (C step) [Alpine Ridge 4C 2016] [8086:15d3] (rev 02)
        Kernel driver in use: pcieport
06:04.0 PCI bridge [0604]: Intel Corporation JHL6540 Thunderbolt 3 Bridge (C step) [Alpine Ridge 4C 2016] [8086:15d3] (rev 02)
        Kernel driver in use: pcieport
07:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP107 [GeForce GTX 1050 Ti] [10de:1c82] (rev a1)
        Subsystem: Gigabyte Technology Co., Ltd GP107 [GeForce GTX 1050 Ti] [1458:3746]
        Kernel driver in use: vfio-pci
        Kernel modules: nvidiafb, nouveau
07:00.1 Audio device [0403]: NVIDIA Corporation GP107GL High Definition Audio Controller [10de:0fb9] (rev a1)
        Subsystem: Gigabyte Technology Co., Ltd GP107GL High Definition Audio Controller [1458:3746]
        Kernel driver in use: vfio-pci
        Kernel modules: snd_hda_intel
08:00.0 USB controller [0c03]: Intel Corporation JHL6540 Thunderbolt 3 USB Controller (C step) [Alpine Ridge 4C 2016] [8086:15d4] (rev 02)
        Subsystem: Akitio JHL6540 Thunderbolt 3 USB Controller (C step) [Alpine Ridge 4C 2016] [1cf0:030d]
        Kernel driver in use: vfio-pci
6c:00.0 USB controller [0c03]: Intel Corporation JHL6340 Thunderbolt 3 USB 3.1 Controller (C step) [Alpine Ridge 2C 2016] [8086:15db] (rev 02)
        Subsystem: Intel Corporation JHL6340 Thunderbolt 3 USB 3.1 Controller (C step) [Alpine Ridge 2C 2016] [8086:2074]
        Kernel driver in use: xhci_hcd
6d:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981 [144d:a808]
        Subsystem: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981 [144d:a801]
        Kernel driver in use: nvme
6e:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTS522A PCI Express Card Reader [10ec:522a] (rev 01)
        Subsystem: Intel Corporation RTS522A PCI Express Card Reader [8086:2074]
        Kernel driver in use: rtsx_pci
        Kernel modules: rtsx_pci

Returning to UNRAID :

 

16 hours ago, T0a said:

When you start the VM, it will try to find the device (which is turned off) and throw an error. I think this is expected behaviour.

When I start the VM, the TB3 enclosure is ON and connected to the NUC. Maybe I explained it badly; my steps to reproduce the error are:

  • boot with TB3 on and connected to the NUC
  • start Win10 VM
  • stop Win10 VM
  • disconnect TB3 cable from eGPU enclosure and NUC
  • reconnect the TB3 cable ( same port on the enclosure )
  • start Win10 VM: ERROR.

With Proxmox this doesn't happen, it starts fine again. But I can't pinpoint where the issue is.

16 hours ago, T0a said:

Thunderbolt support can still be considered experimental in Unraid. Personally, I'm also interested in eGPU support

The eGPU works fine, from what I tested, as long as you don't disconnect the cable or turn off the power to the enclosure; otherwise you're stuck :)

The only way to start the VM again with eGPU is to reboot the host.

16 hours ago, T0a said:

A solution might be passing through the Thunderbolt controller to the VM.

This could be a neat solution, but how to achieve this?

Does it pass the GPU correctly then?

Is there a guide or something?

I didn't see that option when passing devices to the VM, but maybe I'm blind 😅

Another problem: if I disconnect it and then reconnect it... will UNRAID find it again, or will I end up with the same issue?

From what I saw, the answer to that last question is probably yes, but if I figure out how to pass the whole controller, I can test it without problems.

16 hours ago, T0a said:

You would be one of the first Unraid users experimenting with Thunderbolt passthrough.

That's interesting, and a bit scary; I thought a NUC + eGPU wasn't such a strange setup.

10 hours ago, Malaga said:

Returning to UNRAID :

 

When I start the VM, the TB3 enclosure is ON and connected to the NUC. Maybe I explained it badly; my steps to reproduce the error are:

  • boot with TB3 on and connected to the NUC
  • start Win10 VM
  • stop Win10 VM
  • disconnect TB3 cable from eGPU enclosure and NUC
  • reconnect the TB3 cable ( same port on the enclosure )
  • start Win10 VM: ERROR.

With Proxmox this doesn't happen, it starts fine again. But I can't pinpoint where the issue is.

The eGPU works fine, from what I tested, as long as you don't disconnect the cable or turn off the power to the enclosure; otherwise you're stuck :)

The only way to start the VM again with eGPU is to reboot the host.

 

This sounds like some sort of reset bug with the graphics card. What graphics card are you using (AMD)?

 

Quote

This could be a neat solution, but how to achieve this?

Does it pass the GPU correctly then?

Is there a guide or something?

I didn't see that option when passing devices to the VM, but maybe I'm blind 😅

Another problem: if I disconnect it and then reconnect it... will UNRAID find it again, or will I end up with the same issue?

From what I saw, the answer to that last question is probably yes, but if I figure out how to pass the whole controller, I can test it without problems.

That's interesting, and a bit scary; I thought a NUC + eGPU wasn't such a strange setup.

This would be cool, as I don't own the hardware to try it myself. Have a look at this video. You'll need to adapt it a bit, since you want to pass through your Thunderbolt controller instead of the USB controller. To briefly summarize:

 

* Install the VFIO PCIE Config plugin from the Apps store

* The plugin should list your IOMMU groups and also show the connected devices, such as your eGPU

* Tick the Thunderbolt controller that has the eGPU connected to its port. This will blacklist the whole controller from the host (make sure you don't tick the controller your Unraid flash drive is attached to, otherwise you cannot boot anymore). You may need to reboot.

* Add the Thunderbolt controller to your VM template as shown in the video linked above

  ** For troubleshooting, please include the VM template XML and any errors that occur while starting the VM in your next post

* Once booted, make sure Windows uses the correct Thunderbolt driver (Device Manager)
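
After the reboot you can verify from the Unraid console that the stubbing worked; the controller, and everything behind it including the eGPU, should now report vfio-pci as the driver in use. A quick check (the addresses are only examples, take yours from your own lspci output):

lspci -nnk -s 08:00.0    # the Thunderbolt controller you ticked
lspci -nnk -s 07:00.0    # the GTX 1050 Ti behind it
# both should end with:  Kernel driver in use: vfio-pci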

 

Using this method we rule out Unraid as the culprit, since we blacklist the whole controller from the host. I'm curious whether this works for you! Feel free to reach back if you run into any problems.

 

 

19 hours ago, T0a said:

This sounds like some sort of reset bug with the graphics card. What graphics card are you using (AMD)?

 

Hello,

I have a GTX 1050 Ti, since it fits in the Akitio Node Lite and doesn't require extra power beyond the PCIe slot.

19 hours ago, T0a said:

To briefly summarize:

Thanks for the detailed info, if I have time I'll try it this Friday or over the weekend.

I read about locking yourself out, mainly because UNRAID runs from a USB stick.

I have a question, if you can answer it: does vfio cover ALL the IDs you pass? Does every ID I pass to vfio get taken care of by it, or is it not that simple? Are all drivers vfio "compatible" (excuse the term)?

Tell me if I'm right or wrong, but besides the ID I want, I also have to pass to vfio all the other IDs in that IOMMU group, right?

The NUC is quite good in this regard: many groups contain just one device/ID.
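
To check that myself, I suppose something like this would list every group and its members (copied from generic Linux/VFIO documentation, nothing UNRAID-specific):

for g in /sys/kernel/iommu_groups/*; do
    echo "IOMMU group ${g##*/}:"
    for d in "$g"/devices/*; do
        echo -n "  "
        lspci -nns "${d##*/}"    # print each device in the group with its vendor:device ID
    done
done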

Thanks.

  • 4 weeks later...
On 5/19/2020 at 5:12 PM, Malaga said:

Thanks for the detailed info, if I have time I'll try it this Friday or over the weekend.

I read about locking yourself out, mainly because UNRAID runs from a USB stick.

How did it go? 

I am looking for parts for a mini-ITX build that can handle two GPUs, either with TB3 or with an M.2 NVMe adapter to a PCIe x16 riser cable.

I want to be able to run a Windows VM, but with the option to unplug the eGPU when I don't need the VM anymore, to save power. I want to do this without having to restart the server for it to work again.

TB3, as far as I know, should be hot-pluggable, but I'm not sure whether the M.2 NVMe PCIe adapter supports hot swap, or whether the Unraid server can only detect the PCIe slots during the boot sequence.

Would love to hear about your progress.

15 hours ago, TIE Fighter said:

How did it go?

I am looking for parts for a mini-ITX build that can handle two GPUs, either with TB3 or with an M.2 NVMe adapter to a PCIe x16 riser cable.

I want to be able to run a Windows VM, but with the option to unplug the eGPU when I don't need the VM anymore, to save power. I want to do this without having to restart the server for it to work again.

TB3, as far as I know, should be hot-pluggable, but I'm not sure whether the M.2 NVMe PCIe adapter supports hot swap, or whether the Unraid server can only detect the PCIe slots during the boot sequence.

Would love to hear about your progress.

 

I have some experience with this. I'm using an M.2 to PCIe riser for an external GPU setup, but I have the M.2 riser on the main graphics PCIe x16 slot through a PCIe -> M.2 adapter card. The motherboard is mini-ITX with an old 2nd/3rd gen i7, so it doesn't have real M.2 slots.

 

The GPU has its own power supply. When I turn on the eGPU while the Unraid server is running, it can cause Unraid to error out and even makes drives in the array drop. Those drives are on a SATA HBA in a mini PCIe 1x slot. I must have the eGPU powered on before the Unraid server boots up. The riser is powered from the eGPU power supply; maybe that plays a role? Maybe it's a motherboard issue.

 

On the flip side, I can turn off the eGPU with Unraid running and it's fine, as long as the VM is turned off. I also have a manual script to remove the PCI devices after powering off the GPU.
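
The script is roughly along these lines (the PCI addresses are examples, use whatever lspci reports for your card and its audio function):

#!/bin/bash
# detach the eGPU's PCI functions from the host after the enclosure is powered off
echo 1 > /sys/bus/pci/devices/0000:07:00.0/remove   # GPU
echo 1 > /sys/bus/pci/devices/0000:07:00.1/remove   # HDMI audio function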

On 6/13/2020 at 10:17 AM, TIE Fighter said:

How did it go?

I am looking for parts for a mini-ITX build that can handle two GPUs, either with TB3 or with an M.2 NVMe adapter to a PCIe x16 riser cable.

I want to be able to run a Windows VM, but with the option to unplug the eGPU when I don't need the VM anymore, to save power. I want to do this without having to restart the server for it to work again.

TB3, as far as I know, should be hot-pluggable, but I'm not sure whether the M.2 NVMe PCIe adapter supports hot swap, or whether the Unraid server can only detect the PCIe slots during the boot sequence.

Would love to hear about your progress.

Hello, I haven't had time to test it further on Unraid; post-Covid, work has gotten more intense 😄

I can only say I found a sweet spot handling my NUC with Proxmox: I made a Node-RED routine that turns on the desk smart switch, waits a bit and powers up the VM, then (in reverse) shuts the VM down and waits a bit before cutting the power. It works great.

That's the key: give the system time to hand the hardware to the VM, and time to take it back afterwards.
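
Translated into plain commands, the routine does roughly this (a sketch: the VM ID and the delays are examples, and the smart switch is actually toggled from Node-RED, not from a shell):

# smart switch for the enclosure is turned on from Node-RED, then:
sleep 30          # give Proxmox time to detect the eGPU over Thunderbolt
qm start 100      # start the Windows VM (100 = VM ID)
# ...later, when I'm done with Windows:
qm shutdown 100   # clean guest shutdown, the hardware goes back to the host
sleep 30          # give the host time to settle before cutting power to the enclosure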

On Unraid this wasn't working well. I did a quick try assigning the TB3 controller, but my original issue remained: once I turn off the enclosure, it won't be detected again. Maybe I need to pass through more, who knows.

 

I think the M.2 adapter route is a bit different, more like a directly connected GPU, and I don't know if it's hot-pluggable... I suggest you give Proxmox a try too, just to test it.
