[Support] ich777 - AMD Vendor Reset, CoralTPU, hpsahba,...



6 minutes ago, jungle said:

Question. I have a Ryzen 5 3400G APU and I've installed the GPU Stats and Radeon TOP plugins - I can see some activity when in the CLI and running radeontop but there is no activity on the dashboard. Am I missing a step or is this due to the APU?

 


Wrong place. Please report this in the GPU Statistics support thread; the radeontop plugin itself seems to work just fine.

 

Link to comment

Just found the new AMD top plugin, awesome job! I'd like to test some containers using GPU transcoding (AFAIK Plex doesn't support AMD hardware acceleration, right?). What value would we add to the Extra Parameters section of the containers to enable an AMD GPU?

 

--device=/dev/dri ?

Edited by Cpt. Chaz
typo
Link to comment
7 hours ago, Cpt. Chaz said:

What value would we add to the Extra Parameters section of the containers to enable an AMD GPU?

Currently only my Jellyfin container supports transcoding with AMD hardware as far as I know.

The description explains how to do that, but yes, '/dev/dri' is the device to add.

Please also note that I recommend adding it via the button at the bottom of the template, 'Add another Path, Port, Variable, Label or Device', and then selecting 'Device' from the drop-down menu.
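
For reference, the 'Device' entry in the template boils down to a `--device` flag on the Docker command line. A minimal sketch, assuming a shell on the host; `dri_device_flag` is a hypothetical helper (not part of any plugin), and `my-container` and `example/image` are placeholders:

```shell
# Print the docker flag for a DRI device path, or fail if the path is missing.
# $1: device path, defaults to /dev/dri
dri_device_flag() {
    dev_dir="${1:-/dev/dri}"
    if [ ! -e "$dev_dir" ]; then
        echo "no such device path: $dev_dir" >&2
        return 1
    fi
    printf '%s\n' "--device=$dev_dir"
}

# Usage (placeholder names, prints the command instead of running it):
#   flag=$(dri_device_flag) && echo "docker run -d --name my-container $flag example/image"
```

If the helper reports a missing path, the GPU driver is probably not loaded on the host yet, so there is nothing to hand to the container.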

Link to comment

Just to double check: is the RadeonTop module in the kernel helper the right one for using a Navi card with GPU Statistics?

 

I've installed it but it's showing 100% usage when nothing is running. Wondered if I'd done something wrong or should report this in the GPU Statistics thread?

Link to comment
1 hour ago, cobhc said:

Just to double check: is the RadeonTop module in the kernel helper the right one for using a Navi card with GPU Statistics?

Exactly, but please also be sure to build it with the gnif/vendor-reset patch (also note that if you want to use the card in VMs, I don't think GPU Statistics will work while it's in use by a VM).

If you only plan to use it for Docker containers then you can also use the stock unRAID builds and install the Plugin from the CA App.

 

1 hour ago, cobhc said:

I've installed it but it's showing 100% usage when nothing is running. Wondered if I'd done something wrong or should report this in the GPU Statistics thread?

Can you post a screenshot?

Link to comment
1 hour ago, ich777 said:

Exactly, but please also be sure to build it with the gnif/vendor-reset patch (also note that if you want to use the card in VMs, I don't think GPU Statistics will work while it's in use by a VM).

If you only plan to use it for Docker containers then you can also use the stock unRAID builds and install the Plugin from the CA App.

 

Can you post a screenshot?

I did also build the kernel with the vendor reset patch. I only really wanted to use RadeonTop for the GPU Statistics as it's nice to have showing on the main screen in the UI.

 

Here's a screenshot of what I'm seeing when the server is idle (GPU definitely not in use as the fans aren't spinning). Strangely, when I spin up a VM, the figures all drop back down and then appear to work correctly when the VM is running/under load.

Screenshot_20210322-125427.jpg

Edited by cobhc
Link to comment
5 minutes ago, cobhc said:

Here's a screenshot of what I'm seeing when the server is idle (GPU definitely not in use as the fans aren't spinning). Strangely, when I spin up a VM, the figures all drop back down and then appear to work correctly when the VM is running/under load.

Can you give me the output of 'lsmod' without a VM loaded up? Is the VM set to autostart?

Link to comment
31 minutes ago, ich777 said:

Can you give me the output of 'lsmod' without a VM loaded up? Is the VM set to autostart?

 

  Here you go:-

Module                  Size  Used by
iptable_raw            16384  1
wireguard              86016  0
curve25519_x86_64      32768  1 wireguard
libcurve25519_generic    49152  2 curve25519_x86_64,wireguard
libchacha20poly1305    16384  1 wireguard
chacha_x86_64          28672  1 libchacha20poly1305
poly1305_x86_64        28672  1 libchacha20poly1305
ip6_udp_tunnel         16384  1 wireguard
udp_tunnel             20480  1 wireguard
libblake2s             16384  1 wireguard
blake2s_x86_64         20480  1 libblake2s
libblake2s_generic     20480  1 blake2s_x86_64
libchacha              16384  1 chacha_x86_64
xt_CHECKSUM            16384  1
ipt_REJECT             16384  2
ip6table_mangle        16384  1
ip6table_nat           16384  1
vhost_net              24576  0
tun                    49152  2 vhost_net
vhost                  32768  1 vhost_net
vhost_iotlb            16384  1 vhost
tap                    24576  1 vhost_net
xt_nat                 16384  38
veth                   24576  0
xt_MASQUERADE          16384  31
iptable_nat            16384  4
nf_nat                 36864  4 ip6table_nat,xt_nat,iptable_nat,xt_MASQUERADE
nfsd                  196608  11
lockd                  77824  1 nfsd
grace                  16384  1 lockd
sunrpc                446464  14 nfsd,lockd
md_mod                 45056  3
iptable_mangle         16384  2
nct6775                53248  0
hwmon_vid              16384  1 nct6775
ip6table_filter        16384  1
ip6_tables             28672  3 ip6table_filter,ip6table_nat,ip6table_mangle
iptable_filter         16384  2
ip_tables              28672  6 iptable_filter,iptable_raw,iptable_nat,iptable_mangle
amdgpu               4493312  0
gpu_sched              32768  1 amdgpu
i2c_algo_bit           16384  1 amdgpu
drm_kms_helper        167936  1 amdgpu
ttm                    77824  1 amdgpu
drm                   385024  4 gpu_sched,drm_kms_helper,amdgpu,ttm
backlight              16384  2 amdgpu,drm
agpgart                36864  2 ttm,drm
syscopyarea            16384  1 drm_kms_helper
sysfillrect            16384  1 drm_kms_helper
sysimgblt              16384  1 drm_kms_helper
fb_sys_fops            16384  1 drm_kms_helper
vendor_reset           81920  0
wmi_bmof               16384  0
mxm_wmi                16384  0
edac_mce_amd           32768  0
kvm_amd                98304  0
kvm                   667648  1 kvm_amd
crct10dif_pclmul       16384  1
crc32_pclmul           16384  0
crc32c_intel           24576  6
ghash_clmulni_intel    16384  0
aesni_intel           364544  0
crypto_simd            16384  1 aesni_intel
cryptd                 20480  2 crypto_simd,ghash_clmulni_intel
glue_helper            16384  1 aesni_intel
rapl                   16384  0
btusb                  45056  0
btrtl                  16384  1 btusb
i2c_piix4              24576  0
btbcm                  16384  1 btusb
btintel                24576  1 btusb
k10temp                16384  0
ccp                    73728  1 kvm_amd
igc                    90112  0
i2c_core               65536  5 drm_kms_helper,i2c_algo_bit,amdgpu,i2c_piix4,drm
ahci                   40960  4
bluetooth             405504  5 btrtl,btintel,btbcm,btusb
libahci                32768  1 ahci
ecdh_generic           16384  1 bluetooth
ecc                    28672  1 ecdh_generic
nvme                   36864  1
nvme_core              81920  3 nvme
input_leds             16384  0
led_class              16384  1 input_leds
wmi                    24576  2 wmi_bmof,mxm_wmi
acpi_cpufreq           16384  0
button                 16384  0

And I don't have any VMs set to autostart.

Edited by cobhc
Link to comment
10 minutes ago, ich777 said:

Looks good, can you tell me if the VM is set to autostart like asked before?

No, there aren't any VMs set to autostart. I did put that in the previous post, but it was below the code block so it might have been hard to see! :)

 

Edit: Just rebooted and still the same and then I tried booting up the VM again and now it's also stuck at 100% usage after dropping back to 0% temporarily. 

Edited by cobhc
Link to comment
2 minutes ago, cobhc said:

No, there aren't any VMs set to autostart. I did put that in the previous post, but it was below the code block so it might have been hard to see!

Maybe @b3rs3rk can help here, since this seems to have something to do with the front end. It's really weird, though, because as you wrote above it seems to work once you start up the VM.

 

Can you please post your Diagnostics (Tools -> Diagnostics -> Download -> drop the downloaded file here in the textbox)?

Link to comment
4 minutes ago, ich777 said:

Maybe @b3rs3rk can help here, since this seems to have something to do with the front end. It's really weird, though, because as you wrote above it seems to work once you start up the VM.

 

Can you please post your Diagnostics (Tools -> Diagnostics -> Download -> drop the downloaded file here in the textbox)?

Please see attached.

 

Apologies, I edited my previous post: the VM is now also showing 100% usage, and running radeontop in the terminal shows full usage as well. Maybe there's an issue with my particular card.

tower-diagnostics-20210322-1407.zip

Link to comment
3 minutes ago, cobhc said:

Please see attached.

 

Apologies, I edited my previous post: the VM is now also showing 100% usage, and running radeontop in the terminal shows full usage as well. Maybe there's an issue with my particular card.

From what I see in your Diagnostics you have bound the card to VFIO since you use it in your Windows 10 VM.

 

That's one reason why it won't work. For testing purposes you can try to unbind the card and see if 'radeontop' works without the card bound to VFIO (if you bind the card to VFIO it's only "accessible" by the VMs).
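
A quick way to see which kernel driver currently owns the card is the `driver` symlink that the kernel exposes for each PCI device in sysfs. A minimal sketch, assuming a standard Linux sysfs layout; `pci_driver` is a hypothetical helper and the address `0000:0a:00.0` is only an example, so take the real one from `lspci -D`:

```shell
# Print the kernel driver a PCI device is bound to, or "none" if unbound.
# $1: PCI address including domain, e.g. 0000:0a:00.0 (example value only)
# $2: sysfs PCI device root, defaults to the real one
pci_driver() {
    addr="$1"
    root="${2:-/sys/bus/pci/devices}"
    link="$root/$addr/driver"
    if [ -L "$link" ]; then
        # The symlink points at .../drivers/<name>; keep only <name>.
        basename "$(readlink "$link")"
    else
        echo "none"
    fi
}

# Usage:
#   pci_driver 0000:0a:00.0
```

A result of `vfio-pci` would mean the card is reserved for VMs, while `amdgpu` would mean the host driver has it and tools like radeontop can read it. `lspci -k` reports the same information under "Kernel driver in use".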

Link to comment
3 hours ago, ich777 said:

From what I see in your Diagnostics you have bound the card to VFIO since you use it in your Windows 10 VM.

 

That's one reason why it won't work. For testing purposes you can try to unbind the card and see if 'radeontop' works without the card bound to VFIO (if you bind the card to VFIO it's only "accessible" by the VMs).

That makes sense, and yes, it works without the VFIO bind. However, without the bind I cannot pass through my GPU to my VMs, so it looks like I'll have to give both GPU Statistics and any hardware passthrough in Docker containers a miss.

 

Thanks for your help anyway :)

Link to comment
On 3/20/2021 at 12:02 AM, ich777 said:

Currently only my Jellyfin container supports transcoding with AMD hardware as far as I know.

The description explains how to do that, but yes, '/dev/dri' is the device to add.

Please also note that I recommend adding it via the button at the bottom of the template, 'Add another Path, Port, Variable, Label or Device', and then selecting 'Device' from the drop-down menu.

cool, thanks!

Link to comment

Hey again @ich777!

 

I just saw that you've released Radeon Top and it should save a lot of time for users trying to install a container I just had added to CA. Just a few things I was wondering about if that's cool with you?

 

1. I've noticed in the code it runs modprobe amdgpu, does this mean that a reboot isn't necessary after installing it before the GPU becomes available to the host?

2. Does it mean that it's not necessary to run touch /boot/config/modprobe.d/amdgpu.conf per the user guide to get Unraid to load the drivers as long as this is installed?

3. Does this override everything that could prevent Unraid from loading the drivers? eg. vfio passthrough being enabled for the GPU, or the GPU being stubbed

 

Thanks mate!

Edited by lnxd
Link to comment

 

8 minutes ago, lnxd said:

1. I've noticed in the code it runs modprobe amdgpu, does this mean that a reboot isn't necessary after installing it?

Exactly, no reboot required after installing the plugin.

 

8 minutes ago, lnxd said:

2. Does it mean that it's not necessary to run touch /boot/config/modprobe.d/amdgpu.conf per the user guide to get Unraid to load the drivers as long as this is installed?

No, it's not necessary, because it's basically the same thing. The driver is loaded a little earlier in the boot process that way than with the plugin, but that should not affect a container since Docker is started much later in the process...
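
The two ways of getting the driver loaded can be sketched roughly as follows. `module_loaded` is a hypothetical helper that only scans `lsmod`-style output; the `modprobe` and `touch` lines are the ones mentioned in the posts, and they need root on the Unraid host, so they are left as comments here:

```shell
# Succeed if the named module is listed in 'lsmod'-style output read from stdin.
# Only the first column is compared, so a module that merely appears in another
# module's "Used by" column does not count.
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }'
}

# What the plugin effectively does: load the driver now, no reboot needed.
#   lsmod | module_loaded amdgpu || modprobe amdgpu
#
# The user-guide alternative: have Unraid load the driver at boot.
#   touch /boot/config/modprobe.d/amdgpu.conf
```

Either way, `lsmod | module_loaded amdgpu && echo loaded` should print `loaded` afterwards, provided the card is not bound to VFIO.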

 

8 minutes ago, lnxd said:

3. Does this override everything that could prevent Unraid from loading the drivers? eg. vfio passthrough being enabled for the GPU, or the GPU being stubbed

If VFIO is enabled for the card or device ID, neither of the two methods above will work since Unraid can't "see" the card (this also applies if you bind it to VFIO in the syslinux.cfg).

 

8 minutes ago, lnxd said:

I just saw that you've released Radeon Top and it should save a lot of time for users trying to install a container I just had added to CA. Just a few things I was wondering about if that's cool with you?

I've already seen that container. ;)

Do you pass through the directory '/dev/dri' or how does this work?

 

EDIT: Does your container use the Radeon Pro drivers or the open-source ones? From what I know, the open-source ones are significantly faster.

Link to comment
5 minutes ago, ich777 said:

Exactly, no reboot required after installing the plugin.

Perfect! Thank you, looks like my understanding was correct. I'll update the instructions to reflect as such.

 

6 minutes ago, ich777 said:

I've already seen that container. ;)

Do you pass through the directory '/dev/dri' or how does this work?

😂 You are everywhere, and yep nice and simple 😉

 

7 minutes ago, ich777 said:

Does your container use the Radeon Pro drivers or the open-source ones? From what I know, the open-source ones are significantly faster.

amdgpu-pro-20.20. This isn't the first time I've heard this, but unlike yourself I'm not a genius so I need to do a lot of research first before I know what works. Now that I know it meets the requirements for CA, it's worth spending some time to thoroughly investigate. I'm curious to learn if there is any impact on using eg. open source in the container, vs. proprietary on the host, vs. different versions of the drivers, etc.

 

I'll probably also need to add tags with different driver versions as different cards don't work with PhoenixMiner on specific amdgpu-pro versions. I also want to see if I get different results on different versions of the kernels available with PhoenixMiner, and I might benchmark against other mining software too.

Link to comment
28 minutes ago, lnxd said:

😂 You are everywhere, and yep nice and simple 😉

:D

 

29 minutes ago, lnxd said:

amdgpu-pro-20.20. This isn't the first time I've heard this, but unlike yourself I'm not a genius so I need to do a lot of research first before I know what works.

I'm also not a genius. Look at the Jellyfin container: it contains the Mesa open-source drivers (keep in mind that I switched over to Debian Bullseye - yes I love Debian... :D - to get the latest drivers). You can try it with a version of my base image, 'FROM ich777/debian-baseimage:bullseye', and then it should be enough to do an 'apt-get update && apt-get -y install mesa-va-drivers'.
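
Put together, that suggestion amounts to a two-line Dockerfile. A minimal sketch that writes it out from the shell; `write_dockerfile` is a hypothetical helper, the image and package names are the ones quoted above, and the cleanup of the apt lists is an added assumption (common practice to keep images small), not something from the post:

```shell
# Write a minimal Dockerfile using the base image and package named above.
# $1: output path, defaults to ./Dockerfile
write_dockerfile() {
    out="${1:-Dockerfile}"
    cat > "$out" <<'EOF'
FROM ich777/debian-baseimage:bullseye
RUN apt-get update && \
    apt-get -y install mesa-va-drivers && \
    rm -rf /var/lib/apt/lists/*
EOF
}

# Usage ('my-test-image' is a placeholder tag):
#   write_dockerfile ./Dockerfile && docker build -t my-test-image .
```

Whether a given application inside the container can actually use the Mesa drivers is a separate question and depends on the application.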

 

32 minutes ago, lnxd said:

I might benchmark against other mining software too.

I'm really not into mining, although it has always sounded interesting to me, but getting GPUs these days is really hard, at least for a normal price (I'm not even asking for a good price... :D ).

 

34 minutes ago, lnxd said:

Perfect! Thank you, looks like my understanding was correct. I'll update the instructions to reflect as such.

No problem, I'm here to help. ;)

Link to comment
On 3/24/2021 at 12:00 AM, ich777 said:

I'm really not into mining, although it has always sounded interesting to me, but getting GPUs these days is really hard, at least for a normal price (I'm not even asking for a good price... :D ).

It's basically impossible here (in Australia) as well. AMD GPUs are completely sold out everywhere, the only thing you can get are entry level Nvidia cards. I managed to get the 5500 XT overpriced and it's now worth more used than what I paid for it. I got into it back in around 2012-2013, but ASICs killed that. I came back to it just in the past few months now that it is profitable again but it's too late to get GPUs now.

 

On 3/24/2021 at 12:00 AM, ich777 said:

I'm also not a genius, look at the Jellyfin container

You aren't proving your point 😉

 

EDIT: @ich777 I tried using your bullseye container as a base image last night, forcing an install of the latest mesa drivers and then pulling PhoenixMiner in. It looked promising but PhoenixMiner couldn't see any cards (that rely on amdgpu or radeon), even though I could see them from the container. It was expedited because @Lobsi has an R9 270x that they wanted to try mining with. I then built a container on Ubuntu 14.04 with the 15.12 Radeon drivers to no avail. 

 

I'm pretty sure PhoenixMiner can only see cards using certain drivers but I just wanted to say thanks for the tip! Would have been nice if it worked with the open source drivers, not only would they (theoretically) be faster but the image could be so much smaller and more efficient. I'm going to see if I have more luck with a different miner.

Edited by lnxd
Link to comment
3 hours ago, Dazog said:

Yes, I know, the server already built it about 1 and a half hours ago.

 

EDIT: Please give my scripts a little time. I check for new versions every two hours, and the build for the Nvidia driver alone takes about 30 minutes, so the longest you have to wait after a new driver is released is two and a half hours.

 

EDIT2: But the good news is that the automated build is working. :D

Link to comment
  • ich777 changed the title to [Support] ich777 - AMD Vendor Reset, CoralTPU, hpsahba,...
