lotte

Ryzen Threadripper Platform Issues


I was asked to post my Threadripper issues here for extra support. First, I have already tried this with 6.3.3 through 6.3.5 stable. I will describe the issue on 6.4 RC7 as well.

 

My specifications

X399 Zenith

1950x

32 GB Trident 3200 (if this matters)

GTX 1070 FE (42:00.0)

GTX 1070 EVGA FE (I think this is 09:00.0)

GTX 1070 Gigabyte FE (I think this is 41:00.0)

Zotac GT 710 x1 edition

 

(I mainly boot into the UNRAID GUI on the first card because I wasn't sure how to make the x1 card the display output for the UNRAID GUI)

 

PCIe x16 slot one: GTX 1070 Nvidia FE

PCIe x16 slot two: GTX 1070 EVGA FE

PCIe x4 slot three: not populated

PCIe x16 slot four: Blackmagic SDI card, disabled via the hardware PCIe switch (see the Zenith manual)

PCIe x1 slot five: Zotac GT 710 x1 edition

PCIe x16 slot six: GTX 1070 Gigabyte FE

 

https://www.asus.com/us/Motherboards/ROG-ZENITH-EXTREME/

 

(manual related)

 

PCIe x16 slot one

 

Issues that occur with the x1 edition card in both 6.3.5 stable and 6.4 RC7:

 

(x1 GPU card issue starts below)

 

internal error: qemu unexpectedly closed the monitor: 2017-08-15T18:08:40.145365Z qemu-system-x86_64: -chardev pty,id=charserial0: char device redirected to /dev/pts/0 (label charserial0)
2017-08-15T18:08:40.173453Z qemu-system-x86_64: -device vfio-pci,host=06:00.0,id=hostdev0,x-vga=on,bus=pci.4,addr=0x0: vfio error: 0000:06:00.0: group 11 is not viable
Please ensure all devices within the iommu_group are bound to their vfio bus driver.

 

(x1 GPU card issue ends above)
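
The "group 11 is not viable" message above asks that every device in the IOMMU group be bound to a vfio bus driver. One way to see which devices are in the way is to walk the group in sysfs and report the driver each device is bound to. The following is only a minimal Python sketch: the function names and the `sysfs_root` parameter are made up for illustration, and the set of "acceptable" drivers (unbound, vfio-pci, pci-stub, plus pcieport for bridges) is a simplification of what the kernel actually checks.

```python
import os

# Assumption: drivers treated as harmless for group viability.
# This is a simplification, not the kernel's exact rule.
ACCEPTABLE_DRIVERS = {None, "vfio-pci", "pci-stub", "pcieport"}


def group_driver_report(group, sysfs_root="/sys"):
    """Return {pci_address: driver_name_or_None} for one IOMMU group."""
    devices_dir = os.path.join(
        sysfs_root, "kernel", "iommu_groups", str(group), "devices"
    )
    report = {}
    for addr in sorted(os.listdir(devices_dir)):
        # Each device directory has a "driver" symlink when a driver is bound.
        driver_link = os.path.join(devices_dir, addr, "driver")
        if os.path.islink(driver_link):
            report[addr] = os.path.basename(os.readlink(driver_link))
        else:
            report[addr] = None  # no driver bound
    return report


def blocking_devices(report):
    """List devices whose current driver would likely keep the group non-viable."""
    return [addr for addr, drv in report.items() if drv not in ACCEPTABLE_DRIVERS]
```

On the host, something like `blocking_devices(group_driver_report(11))` would list the devices in group 11 that are still held by a regular host driver. In a group that also contains the onboard NIC, USB and SATA controllers, binding all of them to vfio is usually not practical, so the sketch is more useful for diagnosis than as a fix.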

 

(main issue starts below)

 

The main issue is that there is no video output from the 1070 at 09:00.0 after launching the VM on 6.3.5 stable (already known to be a kernel-related issue). I then moved to 6.4 RC7 and started the VM with the 1070 at 09:00.0; it still outputs no video (no TianoCore POST). The only thing I can do is pause the VM and then force stop it. Starting a VM with the same 1070 (09:00.0) again then causes the error below.

 

internal error: Unknown PCI header type '127' (6.4 RC7, 1070 at 09:00.0; also happens on the second 1070)
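
A note on where '127' probably comes from: the header type lives in byte 0x0E of PCI configuration space, and the top bit is the multi-function flag, so the type itself is the low 7 bits. A device that has stopped responding typically reads back as all 0xFF, and 0xFF masked with 0x7F is 127. That would also be consistent with the "(rev ff)" shown for two of the 1070s in the listing below. The Python sketch below illustrates the arithmetic; the function names are hypothetical, and reading the sysfs `config` file is assumed to be done as root on the host.

```python
import os

PCI_HEADER_TYPE_OFFSET = 0x0E  # header type byte in config space
PCI_HEADER_TYPE_MASK = 0x7F    # low 7 bits; bit 7 is the multi-function flag


def read_config(addr, sysfs_root="/sys", length=64):
    """Read the first bytes of a device's config space, e.g. addr='0000:09:00.0'."""
    path = os.path.join(sysfs_root, "bus", "pci", "devices", addr, "config")
    with open(path, "rb") as f:
        return f.read(length)


def header_type(config_bytes):
    """Return the header type with the multi-function bit masked off."""
    return config_bytes[PCI_HEADER_TYPE_OFFSET] & PCI_HEADER_TYPE_MASK


def looks_unresponsive(config_bytes):
    """Heuristic: an all-0xFF standard header suggests the device is not answering."""
    return all(b == 0xFF for b in config_bytes[:16])
```

Valid header types are 0 (endpoint), 1 (PCI bridge) and 2 (CardBus bridge), so a value of 127 says more about the state of the device than about the VM configuration. If `looks_unresponsive(read_config("0000:09:00.0"))` is true after the VM is force stopped, the card has most likely not come back from its reset.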

IOMMU group 0
	[1022:1452] 00:01.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1452
IOMMU group 1
	[1022:1453] 00:01.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 1453
IOMMU group 2
	[1022:1453] 00:01.3 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 1453
IOMMU group 3
	[1022:1452] 00:02.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1452
IOMMU group 4
	[1022:1452] 00:03.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1452
IOMMU group 5
	[1022:1452] 00:04.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1452
IOMMU group 6
	[1022:1452] 00:07.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1452
	[1022:1454] 00:07.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 1454
	[1022:145a] 0a:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Device 145a
	[1022:1456] 0a:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Device 1456
	[1022:145c] 0a:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] USB3 Host Controller
IOMMU group 7
	[1022:1452] 00:08.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1452
	[1022:1454] 00:08.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 1454
	[1022:1455] 0b:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Device 1455
	[1022:7901] 0b:00.2 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] (rev 51)
	[1022:1457] 0b:00.3 Audio device: Advanced Micro Devices, Inc. [AMD] Device 1457
IOMMU group 8
	[1022:790b] 00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller (rev 59)
	[1022:790e] 00:14.3 ISA bridge: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge (rev 51)
IOMMU group 9
	[1022:1460] 00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1460
	[1022:1461] 00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1461
	[1022:1462] 00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1462
	[1022:1463] 00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1463
	[1022:1464] 00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1464
	[1022:1465] 00:18.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1465
	[1022:1466] 00:18.6 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1466
	[1022:1467] 00:18.7 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1467
IOMMU group 10
	[1022:1460] 00:19.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1460
	[1022:1461] 00:19.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1461
	[1022:1462] 00:19.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1462
	[1022:1463] 00:19.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1463
	[1022:1464] 00:19.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1464
	[1022:1465] 00:19.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1465
	[1022:1466] 00:19.6 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1466
	[1022:1467] 00:19.7 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1467
IOMMU group 11
	[1022:43ba] 01:00.0 USB controller: Advanced Micro Devices, Inc. [AMD] Device 43ba (rev 02)
	[1022:43b6] 01:00.1 SATA controller: Advanced Micro Devices, Inc. [AMD] Device 43b6 (rev 02)
	[1022:43b1] 01:00.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 43b1 (rev 02)
	[1022:43b4] 02:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 43b4 (rev 02)
	[1022:43b4] 02:01.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 43b4 (rev 02)
	[1022:43b4] 02:02.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 43b4 (rev 02)
	[1022:43b4] 02:03.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 43b4 (rev 02)
	[1022:43b4] 02:04.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 43b4 (rev 02)
	[1022:43b4] 02:09.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 43b4 (rev 02)
	[168c:003e] 03:00.0 Network controller: Qualcomm Atheros QCA6174 802.11ac Wireless Network Adapter (rev 32)
	[1ae9:0310] 04:00.0 Network controller: Wilocity Ltd. Wil6200 802.11ad Wireless Network Adapter (rev 02)
	[8086:1539] 05:00.0 Ethernet controller: Intel Corporation I211 Gigabit Network Connection (rev 03)
	[10de:128b] 06:00.0 VGA compatible controller: NVIDIA Corporation GK208 [GeForce GT 710B] (rev a1)
	[10de:0e0f] 06:00.1 Audio device: NVIDIA Corporation GK208 HDMI/DP Audio Controller (rev a1)
	[1b21:2142] 08:00.0 USB controller: ASMedia Technology Inc. Device 2142
IOMMU group 12
	[10de:1b81] 09:00.0 VGA compatible controller: NVIDIA Corporation GP104 [GeForce GTX 1070] (rev ff)
	[10de:10f0] 09:00.1 Audio device: NVIDIA Corporation GP104 High Definition Audio Controller (rev ff)
IOMMU group 13
	[1022:1452] 40:01.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1452
IOMMU group 14
	[1022:1453] 40:01.3 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 1453
IOMMU group 15
	[1022:1452] 40:02.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1452
IOMMU group 16
	[1022:1452] 40:03.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1452
IOMMU group 17
	[1022:1453] 40:03.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 1453
IOMMU group 18
	[1022:1452] 40:04.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1452
IOMMU group 19
	[1022:1452] 40:07.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1452
	[1022:1454] 40:07.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 1454
	[1022:145a] 43:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Device 145a
	[1022:1456] 43:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Device 1456
	[1022:145c] 43:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] USB3 Host Controller
IOMMU group 20
	[1022:1452] 40:08.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1452
	[1022:1454] 40:08.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 1454
	[1022:1455] 44:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Device 1455
	[1022:7901] 44:00.2 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] (rev 51)
IOMMU group 21
	[10de:1b81] 41:00.0 VGA compatible controller: NVIDIA Corporation GP104 [GeForce GTX 1070] (rev ff)
	[10de:10f0] 41:00.1 Audio device: NVIDIA Corporation GP104 High Definition Audio Controller (rev ff)
IOMMU group 22
	[10de:1b81] 42:00.0 VGA compatible controller: NVIDIA Corporation GP104 [GeForce GTX 1070] (rev a1)
	[10de:10f0] 42:00.1 Audio device: NVIDIA Corporation GP104 High Definition Audio Controller (rev a1)
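
To make the listing above easier to read for passthrough purposes, here is a small Python sketch that parses text in this format and reports which non-bridge devices share an IOMMU group with a given card. It is only an illustration: the function names are made up, `SAMPLE` is a shortened excerpt of the listing above, and the "skip anything with bridge in its description" filter is a rough assumption.

```python
import re
from collections import defaultdict

GROUP_RE = re.compile(r"^IOMMU group (\d+)$")
DEVICE_RE = re.compile(
    r"^\[([0-9a-f]{4}:[0-9a-f]{4})\]\s+([0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\s+(.*)$"
)

# Shortened excerpt of the listing in this post.
SAMPLE = """\
IOMMU group 11
    [1022:43ba] 01:00.0 USB controller: Advanced Micro Devices, Inc. [AMD] Device 43ba (rev 02)
    [1022:43b1] 01:00.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 43b1 (rev 02)
    [8086:1539] 05:00.0 Ethernet controller: Intel Corporation I211 Gigabit Network Connection (rev 03)
    [10de:128b] 06:00.0 VGA compatible controller: NVIDIA Corporation GK208 [GeForce GT 710B] (rev a1)
    [10de:0e0f] 06:00.1 Audio device: NVIDIA Corporation GK208 HDMI/DP Audio Controller (rev a1)
IOMMU group 12
    [10de:1b81] 09:00.0 VGA compatible controller: NVIDIA Corporation GP104 [GeForce GTX 1070] (rev ff)
    [10de:10f0] 09:00.1 Audio device: NVIDIA Corporation GP104 High Definition Audio Controller (rev ff)
"""


def parse_groups(text):
    """Return {group_number: [(address, vendor:device, description), ...]}."""
    groups = defaultdict(list)
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        m = GROUP_RE.match(line)
        if m:
            current = int(m.group(1))
            continue
        m = DEVICE_RE.match(line)
        if m and current is not None:
            groups[current].append((m.group(2), m.group(1), m.group(3)))
    return dict(groups)


def companions(groups, addr):
    """Non-bridge devices in the same group as addr, excluding addr's own slot."""
    slot = addr.rsplit(".", 1)[0]
    for devices in groups.values():
        if any(a == addr for a, _, _ in devices):
            return [
                d for d in devices
                if d[0].rsplit(".", 1)[0] != slot and "bridge" not in d[2].lower()
            ]
    return []
```

Run against the full listing, `companions(parse_groups(listing), "06:00.0")` would return the chipset USB and SATA controllers, the three network controllers and the ASMedia USB controller, which is why the GT 710 in the x1 slot cannot be passed through on its own. The three 1070s each sit in a group of their own (12, 21 and 22), so their problem is a different one.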

CPU Thread Pairings

cpu 0 / cpu 16
cpu 1 / cpu 17
cpu 2 / cpu 18
cpu 3 / cpu 19
cpu 4 / cpu 20
cpu 5 / cpu 21
cpu 6 / cpu 22
cpu 7 / cpu 23
cpu 8 / cpu 24
cpu 9 / cpu 25
cpu 10 / cpu 26
cpu 11 / cpu 27
cpu 12 / cpu 28
cpu 13 / cpu 29
cpu 14 / cpu 30
cpu 15 / cpu 31

USB Devices

Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 001 Device 002: ID 0b05:1868 ASUSTek Computer, Inc. 
Bus 001 Device 003: ID 1b1c:1b27 Corsair 
Bus 001 Device 004: ID 046d:c52b Logitech, Inc. Unifying Receiver
Bus 001 Device 005: ID 0b05:1867 ASUSTek Computer, Inc. 
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 003 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 004 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 004 Device 002: ID 1b1c:1a0f Corsair 
Bus 005 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 005 Device 002: ID 054c:05c4 Sony Corp. DualShock 4
Bus 006 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 007 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 007 Device 002: ID 0461:4d51 Primax Electronics, Ltd 0Y357C PMX-MMOCZUL (B) [Dell Laser Mouse]
Bus 007 Device 003: ID 04d9:0167 Holtek Semiconductor, Inc. 
Bus 008 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub

SCSI Devices

[0:0:0:0]    disk    Corsair  Voyager GO       000B  /dev/sda   31.0GB
[1:0:0:0]    disk    ATA      SanDisk SDSSDHII 00RL  /dev/sdb    480GB
[3:0:0:0]    disk    ATA      INTEL SSDSA2CW12 0302  /dev/sdc    120GB
[4:0:0:0]    disk    ATA      INTEL SSDSA2CW12 0302  /dev/sdd    120GB
[5:0:0:0]    disk    ATA      SanDisk SDSSDHII 00RL  /dev/sde    240GB
[6:0:0:0]    disk    ATA      SanDisk Ultra II 00RL  /dev/sdf    960GB

 

I have attached both the real-time log and the tower diagnostics for 6.4 RC7.

 

(main issue ends above)

tower-syslog-20170815-1147.zip

tower-diagnostics-20170815-1151.zip


Are you using OVMF or SeaBIOS? I've had more luck with SeaBIOS... but now I'm getting the same PCI header type '127' error you are. I can't even tell it to turn on, and I'm too new at this to go digging in logs, but I'll get to it. grrrr. :)

 

I wonder if a reboot would fix this. These Ryzen processors seem awesome... can't wait for the software to develop. :)


Will try it again just to make sure. At the moment I cannot get past the '127' error unless I reboot the system and then try again. I think that once you shut down the VM, it takes the PCIe device down with it, leaving each device reporting '127'. Reading through the syslog, it looks like a kernel panic starts when this happens and the system stays that way until you reboot.


I have not seen any comments about loading the vBIOS for the Nvidia cards.

 

I forget now whether that was only needed when passing through the primary card or was required for secondary cards as well.

On 16/08/2017 at 5:51 AM, lotte said:

Will try it again just to make sure. At the moment I cannot get past the '127' error unless I reboot the system and then try again. I think that once you shut down the VM, it takes the PCIe device down with it, leaving each device reporting '127'. Reading through the syslog, it looks like a kernel panic starts when this happens and the system stays that way until you reboot.

 

How are you getting on with the Zenith board and UNRAID? I just bought this board but am concerned about the reports of GPU/PCIe passthrough issues.


There were passthrough issues with TR / Ryzen that were at least partially resolved with the RC10 release. Would suggest starting with that. 

 

Looking at OP's config, that is the most maxed out I have seen. If it were me, I'd back down to two video cards, the Zotac as primary and a 1070 as secondary. Remove the other two video cards. Read all the threads here about configuring C-states, NPT, etc. properly before starting. Follow the SpaceInvader One daily driver videos to get VM1 working in non-passthrough and then passthrough modes. Once you have it installed and stable for a few days, swap the video card with a different one and make sure each of them works with the passthrough. They are all virtually the same card and should be interchangeable, with maybe some tweaking of the passthrough configuration.

 

Then install the second video card, make sure VM still works.

 

Then install new VM with no passthrough with both video cards installed. Leave VM1 powered down.

 

Once it is up and working, turn on VM1 and get both working together.

 

Then power down VM1 and get passthrough enabled on VM2. Once working, bring up VM1 and get them working together.

 

Then move on to VM3 taking a similar small step after small step approach.

 

Taking this in steps is the best way to create such a complex configuration, especially with no other person having a very similar config you can emulate and ask questions of. If it fails along the way, back down to the last working configuration and consider ways to take even smaller steps. Once you are stuck, you have a lot of data to share about what changed between working and not working, and you would have a far better chance of getting useful suggestions. And in the meantime, you have a working configuration to use while you plot your next move or wait for the next update.

 

Building success on success is far more effective than going all in, expecting it to work, and then trying to troubleshoot from there. This is rule #1, especially when operating at the leading or bleeding edge!


I would strongly suggest following something similar to what SSD has said.

 

I've not looked into Threadripper or the boards yet, but can you really run all of those slots at x16? I would have expected you to be down to x8 at least with that many cards. Also, on my board the NVMe slot shares bandwidth with one of my PCIe slots (based on reading the manual), so I've never populated it.


There's a thread that may be of some value, here:

 

https://www.reddit.com/r/Amd/comments/6vbe6w/threadripper_broken_on_linux_for_pci_passthrough/

 

Dunno if it's exactly the same issue, or possibly a different issue.  The last post was a few hours ago, so near as I can tell, this issue is still open.

 

I haven't been able to justify a Threadripper for myself; with the general lack of funds, this is unlikely to change anytime in the near future.

 

I still like to monitor the progress of others, so I hope this helps!
