[Support] ich777 - AMD Vendor Reset, CoralTPU, hpsahba,...



23 minutes ago, Paff said:

Does removing the plugin and restarting the system bring back my old power consumption? Or did it brick my card now?

Are you really sure that this was the cause of the issue?

 

Have you tried running:

radeontop

from a Unraid terminal and see what the output is?

 

24 minutes ago, Paff said:

Or do I need to also remove some code from a file?

No definitely not.

 

24 minutes ago, Paff said:

I can still use the card in a VM (settings not changed, same Bios) like I did before.

Have you bound the card to VFIO?

 

25 minutes ago, Paff said:

But idling still doesn't lower the consumption; it stays around the same. I want my old consumption back.

I would strongly recommend that you wait for the parity check to finish and then see what the real power consumption is.

 

25 minutes ago, Paff said:

Thanks in Advance. 

Always post your Diagnostics with issues like this. I really can't tell much without them, and everything above is an assumption...

Link to comment
Hey ich777,

 

First of all, thank you for your support. I am pretty sure, because it more or less crashed directly after I installed the plugin. I'll wait for the check to finish, then I will try your suggestions. I'm a little scared of crashing the system again.

Sorry, I should have attached the diagnostics right away. The card is bound to VFIO. Could this be the root cause?

Thanks again for the support.

BR

Paff 

tower-diagnostics-20230730-2036 2.zip

Link to comment
52 minutes ago, Paff said:

directly after I installed the plugin.

RadeonTOP or GPU Statistics?

 

By default, RadeonTOP does nothing other than modprobe the amdgpu driver. GPU Statistics also does nothing by default if you haven't configured it.

 

53 minutes ago, Paff said:

The card is bound to VFIO. Could this be the root cause?

Pretty sure. If you bind a card to VFIO, the host basically can't "see/use" it; it is reserved for use in VMs. There is no point in installing either of the two plugins for a card that is bound to VFIO.

 

I really can't imagine that you now have a higher power draw because of this, because, as said, the two plugins can't even access your GPU.

Please try a restart after the parity check has finished. Maybe shut down the system completely, wait a bit, and then turn it back on.
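
If you want to confirm from the host which driver currently owns the card, here is a minimal sketch that reads the `driver` symlink in sysfs. The PCI address `0000:01:00.0` is only a placeholder (an assumption, not taken from the diagnostics); use the address that `lspci -D` shows for your card.

```python
import os


def bound_driver(pci_address, sysfs_root="/sys/bus/pci/devices"):
    """Return the name of the kernel driver bound to a PCI device, or None."""
    link = os.path.join(sysfs_root, pci_address, "driver")
    if not os.path.islink(link):
        # No driver is bound, or the device does not exist at this address.
        return None
    # The 'driver' entry is a symlink pointing at .../drivers/<driver name>.
    return os.path.basename(os.readlink(link))


if __name__ == "__main__":
    # Placeholder address (assumption); replace it with your GPU's address.
    print(bound_driver("0000:01:00.0"))
```

A result of `vfio-pci` would mean the card is reserved for VMs and the host-side plugins cannot reach it.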

Link to comment

@ich777 I have an ASUSTOR AS6704T that I'm trying to get fan control on. sensors-detect reports: `ITE IT8625E Super IO Sensors' (address 0xa30, driver `to-be-written')

 

After installing your ITE driver, the System Temp plugin sees an IT8625 fan, which it reports at 0% / 800 rpm. However, the fan control plugin does not list any PWM controllers, and neither does pwmconfig.
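
For reference, this is roughly how one could check what the loaded driver exposes. It is only a sketch that assumes the standard hwmon sysfs layout (`/sys/class/hwmon/hwmonN/name` and `pwmN` files); the function name is made up for illustration. If a chip shows an empty list, the driver is not exposing PWM control for it, which would match what pwmconfig reports.

```python
import glob
import os


def list_pwm_controls(hwmon_root="/sys/class/hwmon"):
    """Return (hwmon dir, chip name, pwm files) for every hwmon device."""
    found = []
    for chip_dir in sorted(glob.glob(os.path.join(hwmon_root, "hwmon*"))):
        try:
            with open(os.path.join(chip_dir, "name")) as fh:
                chip = fh.read().strip()
        except OSError:
            chip = "unknown"
        # Keep pwm1, pwm2, ... and skip helper files such as pwm1_enable.
        pwms = sorted(
            os.path.basename(path)
            for path in glob.glob(os.path.join(chip_dir, "pwm[0-9]*"))
            if os.path.basename(path)[3:].isdigit()
        )
        found.append((os.path.basename(chip_dir), chip, pwms))
    return found


if __name__ == "__main__":
    for hwmon, chip, pwms in list_pwm_controls():
        print(hwmon, chip, pwms or "no pwm controls exposed")
```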

 

I found these instructions for getting PWM control on Debian, which use a modified ITE driver. I do not know if it is the same source your driver is built from. I was going to build it myself, but the devpack is no longer supported, and my Linux knowledge is somewhat limited. https://gist.github.com/johndavisnz/b5aae0236141666a77aac094701d7839

 

I've also found this, which refers to some kernel drivers for it87: https://github.com/mafredri/asustor-platform-driver

 

Any assistance you can render in getting me fan control would be deeply appreciated (and compensated?). My CPU temps are regularly climbing up to 95 °C, and I think it's because the fan is only spinning at its lowest speed.

Link to comment
6 hours ago, Terebi said:

My CPU temps are regularly climbing up to 95 °C, and I think it's because the fan is only spinning at its lowest speed.

As a side note, fan control is made for HDD fans (and is controlled only by HDD temps); it is not intended for the CPU fan. I'm mentioning this before you invest a lot of time. It is also stated pretty often in the fan control thread.

 

6 hours ago, Terebi said:

My CPU temps are regularly climbing up to 95 °C

Then you really should check your BIOS fan curve, because by default the fan is controlled by your BIOS and not by the OS.

Link to comment
Interesting. I use a GTX 1050 Ti for transcoding, so I had GPU Statistics active for a long time before. I was just interested in whether I could get statistics the same way, so I downloaded the plugin (even though the GPU was bound to VFIO). Afterwards I could, surprisingly, select the AMD GPU in the GPU Statistics app. I did that, and afterwards it broke. Hmmm, maybe it reset the BIOS or something on the crash? I don't know. I will try to find a solution when the parity check is done. It's really weird behavior. Thanks again! :)

BR
Paff

Link to comment
2 hours ago, Paff said:

I was just interested if I could get statistics the same way.

Not if you have bound the card to VFIO.

 

2 hours ago, Paff said:

So I just downloaded the plugin (even though the GPU was bound to VFIO). Afterwards I could, surprisingly, select the AMD GPU in the GPU Statistics app.

It may be the case that radeontop works differently and tries to access it directly which in your case resulted in a crash.

 

2 hours ago, Paff said:

Hmmm probably it reset the BIOS or something on crash?

Your motherboard BIOS or your GPU BIOS? The GPU BIOS is usually read-only. My suspicion is that the card was not properly reset.

 

2 hours ago, Paff said:

I will try to find a solution, when Parity is done.

Shut down your server, cut the power at the wall for about a minute, and then try to boot again.

Link to comment
14 hours ago, Terebi said:

I found these instructions for getting pwm control on debian, which uses a modified ITE driver. I do not know if it is the same source your driver is built from. I was going to build it myself, but the devpack

NerdTools is still available.

 

14 hours ago, Terebi said:

Any assistance you can render in getting me fan control would be deeply appreciated (and compensated?). My CPU temps are regularly climbing up to 95 °C, and I think it's because the fan is only spinning at its lowest speed.

Sorry, but I no longer build drivers for hardware that I don't have on hand, because I simply can't support it or tell when something is broken or doesn't work.

Link to comment
6 minutes ago, orlando500 said:

Hi, I updated the python3 plugin, but afterwards I get this:

WARNING: pip is configured with locations that require TLS/SSL, however the ssl module in Python is not available.

Is there something I have to do after the upgrade to make things work again?

 

@dlandon something seems wrong with the package that was updated yesterday, can you take a look at that?

Link to comment
25 minutes ago, ich777 said:

@dlandon something seems wrong with the package that was updated yesterday, can you take a look at that?

It looks like the latest version of python3 requires the updated ssl libraries that are not included in Unraid yet.  I'll have to roll back to the previous python3 version.  Give me a couple of hours and then remove the python3 plugin and re-install it from CA.
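
A quick way to confirm whether an installed interpreter is affected is to try importing the `ssl` module directly. This is only a sketch; the helper name `ssl_status` is made up for illustration.

```python
def ssl_status():
    """Report whether this interpreter can import its ssl module."""
    try:
        import ssl
    except ImportError as exc:
        # This is the condition behind pip's TLS/SSL warning.
        return False, str(exc)
    # OPENSSL_VERSION names the OpenSSL library the module was loaded with.
    return True, ssl.OPENSSL_VERSION


if __name__ == "__main__":
    ok, detail = ssl_status()
    print("ssl available:" if ok else "ssl NOT available:", detail)
```

Run it with the same `python3` that shows the warning. If the import fails, pip and anything using HTTPS (for example requests) will fail as well.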

Link to comment
1 hour ago, dlandon said:

It looks like the latest version of python3 requires the updated ssl libraries that are not included in Unraid yet.  I'll have to roll back to the previous python3 version.  Give me a couple of hours and then remove the python3 plugin and re-install it from CA.

I updated the python3 with a package that does not give the ssl warning.  Updating the plugin will straighten things out.

Link to comment
6 hours ago, dlandon said:

I updated the python3 with a package that does not give the ssl warning.  Updating the plugin will straighten things out.

@dlandon I still get SSL errors after deleting and reinstalling the plugin.

Error "ImportError: Can't connect to HTTPS URL because the SSL module is not available."

 

version of plugin: 2023.08.01a 

Link to comment
2 hours ago, orlando500 said:

@dlandon I still get SSL errors after deleting and reinstalling the plugin.

Error "ImportError: Can't connect to HTTPS URL because the SSL module is not available."

 

version of plugin: 2023.08.01a 

I don't see it:

root@BackupServer:~# python3
Python 3.9.16 (main, Dec  7 2022, 11:25:14) 
[GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> 

 

What version of Unraid are you using?

Link to comment
2 hours ago, dlandon said:

What version of Unraid are you using?

6.12.3

 

and my python3

Python 3.9.17 (main, Jun  8 2023, 14:52:17) 
[GCC 13.1.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> 

 

EDIT: I see that python3 is still there after uninstalling the plugin. I tried deleting the python lib dir, but when I install the plugin I get: "| Skipping package python3-3.9.16-x86_64-1 (newer vesion already installed)"

 

I see mine is newer. I just tried removing and installing python3 again. Same error:

"root@Hal:/mnt/user/appdata/arrsync2# python3 traktarr.py -s
Traceback (most recent call last):
  File "/mnt/user/appdata/arrsync2/traktarr.py", line 240, in <module>
    main()
  File "/mnt/user/appdata/arrsync2/traktarr.py", line 233, in main
    trakt_api_helper.process_sonarr(sonarrURL, sonarrAPIKey, sonarr_list_privacy, sonarrtrakt_list)
  File "/mnt/user/appdata/arrsync2/traktarr.py", line 201, in process_sonarr
    self.add_to_trakt_list(sonarrtrakt_list, sonarr_list_privacy, monitored_tvdb_ids, "shows", response_json)
  File "/mnt/user/appdata/arrsync2/traktarr.py", line 87, in add_to_trakt_list
    response = traktSession.post(f"https://api.trakt.tv/users/{self.trakt_user}/lists/{trakt_list}/items", headers=trakt_add_hdr, data=json.dumps(trakt_add),timeout=5)
  File "/usr/lib64/python3.9/site-packages/requests/sessions.py", line 637, in post
    return self.request("POST", url, data=data, json=json, **kwargs)
  File "/usr/lib64/python3.9/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/lib64/python3.9/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
  File "/usr/lib64/python3.9/site-packages/requests/adapters.py", line 486, in send
    resp = conn.urlopen(
  File "/usr/lib64/python3.9/site-packages/urllib3/connectionpool.py", line 769, in urlopen
    conn = self._get_conn(timeout=pool_timeout)
  File "/usr/lib64/python3.9/site-packages/urllib3/connectionpool.py", line 295, in _get_conn
    return conn or self._new_conn()
  File "/usr/lib64/python3.9/site-packages/urllib3/connectionpool.py", line 1056, in _new_conn
    raise ImportError(
ImportError: Can't connect to HTTPS URL because the SSL module is not available."

Link to comment
1 hour ago, orlando500 said:

6.12.3

 

and my python3

Python 3.9.17 (main, Jun  8 2023, 14:52:17) 
[GCC 13.1.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> 

 

EDIT: I see that python3 is still there after uninstalling the plugin. I tried deleting the python lib dir, but when I install the plugin I get: "| Skipping package python3-3.9.16-x86_64-1 (newer vesion already installed)"

 

I see mine is newer. I just tried removing and installing python3 again. Same error:

"ImportError: Can't connect to HTTPS URL because the SSL module is not available."

Uninstall the python3 plugin and reboot, then re-install it from CA. It's out of sync, and the latest plugin won't update python3.
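
To illustrate the "out of sync" state: the plugin ships `python3-3.9.16-x86_64-1` while the running interpreter reports 3.9.17. The sketch below assumes the usual Slackware `name-version-arch-build` package naming and assumes the skip is a plain version comparison; the actual check inside the plugin is not shown in this thread, and the function names are made up.

```python
def parse_slackware_package(pkg):
    """Split 'name-version-arch-build' into its four fields."""
    name, version, arch, build = pkg.rsplit("-", 3)
    return name, version, arch, build


def version_tuple(version):
    """Turn '3.9.16' into (3, 9, 16) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def installed_is_newer(pkg, installed_version):
    """True when the installed version is newer than the packaged one."""
    _, pkg_version, _, _ = parse_slackware_package(pkg)
    return version_tuple(installed_version) > version_tuple(pkg_version)


if __name__ == "__main__":
    # Values taken from the posts above; prints True.
    print(installed_is_newer("python3-3.9.16-x86_64-1", "3.9.17"))
```

Since the installed 3.9.17 files live in RAM on Unraid, a reboot clears them, which is why the reinstall should only be done afterwards.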

Link to comment

People refer to the "Radeon-Top Plugin" to get an AMD GPU working on Unraid. I can get some output from `radeontop`:

 

                 radeontop v1.4-4-gec97e6f, running on HAWAII bus 01, 120 samples/sec
                                       Graphics pipe   0.00% │
─────────────────────────────────────────────────────────────┼──────────────────────────────────────────────────────────
                                        Event Engine   0.00% │
                                                             │
                         Vertex Grouper + Tesselator   0.00% │
                                                             │
                                   Texture Addresser   0.00% │
                                                             │
                                       Shader Export   0.00% │
                         Sequencer Instruction Cache   0.00% │
                                 Shader Interpolator   0.00% │
                                                             │
                                      Scan Converter   0.00% │
                                  Primitive Assembly   0.00% │
                                                             │
                                         Depth Block   0.00% │
                                         Color Block   0.00% │
                                                 UVD   0.00% │
                                                 VCE   0.00% │
                                                             │
                                    15M / 8192M VRAM   0.18% │
                                      5M / 2043M GTT   0.23% │
                          0.15G / 0.00G Memory Clock    inf% │
                          0.30G / 1.04G Shader Clock  28.85% │------------
                                                             │

 

But I can never get anything to actually utilise it. I have made many attempts at passing it through to a Docker container, with no luck.

 

What steps can I take to get this working? Thanks for any help~

Link to comment
1 hour ago, ohare93 said:

What steps can I take to get this working?

RadeonTOP is working, but if you get no readings from it, it seems that nothing is using your AMD GPU.

 

RadeonTOP is just a diagnostics tool.

 

To which containers (which repository) were you passing the GPU, and how did you pass it through?

Do you want to use it for transcoding? If yes: because this is a Hawaii-based GPU, I don't think it supports many formats (IIRC only h264 and everything below).

 

You would need something more recent for h265.

Link to comment
  • 3 weeks later...
On 8/6/2023 at 7:55 AM, ich777 said:

To which containers (which repository) were you passing the GPU, and how did you pass it through?

Do you want to use it for transcoding? If yes: because this is a Hawaii-based GPU, I don't think it supports many formats (IIRC only h264 and everything below).

 

You would need something more recent for h265.

 

Sorry for the delayed response. Kids, aye?

 

Looking more into my issues I found that my logs are filled with kernel errors

 

Quote

Aug 20 04:40:01 TheBox kernel: radeon 0000:01:00.0: ring 0 stalled for more than 60683110msec
Aug 20 04:40:01 TheBox kernel: radeon 0000:01:00.0: GPU lockup (current fence id 0x0000000000000001 last fence id 0x000000000000000d on ring 0)
Aug 20 04:40:01 TheBox kernel: radeon 0000:01:00.0: ring 3 stalled for more than 60683113msec
Aug 20 04:40:01 TheBox kernel: radeon 0000:01:00.0: GPU lockup (current fence id 0x0000000000000941 last fence id 0x0000000000000ed3 on ring 3)
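
For scale, the stall durations in these lines are in milliseconds, so 60683110 msec is roughly 16.9 hours of the ring being stuck. Below is a small sketch for pulling such lines out of a syslog; the regular expression is written against the four lines above only and is an assumption about the general message format.

```python
import re

# Matches e.g. "kernel: radeon 0000:01:00.0: ring 0 stalled for more than 60683110msec"
STALL_RE = re.compile(
    r"kernel: (?P<driver>\w+) (?P<pci>[0-9a-f:.]+): "
    r"ring (?P<ring>\d+) stalled for more than (?P<ms>\d+)msec"
)


def find_ring_stalls(log_lines):
    """Yield (pci_address, ring, hours_stalled) for each 'ring N stalled' line."""
    for line in log_lines:
        match = STALL_RE.search(line)
        if match:
            hours = int(match["ms"]) / 3_600_000  # milliseconds per hour
            yield match["pci"], int(match["ring"]), hours


if __name__ == "__main__":
    sample = [
        "Aug 20 04:40:01 TheBox kernel: radeon 0000:01:00.0: ring 0 stalled for more than 60683110msec",
        "Aug 20 04:40:01 TheBox kernel: radeon 0000:01:00.0: ring 3 stalled for more than 60683113msec",
    ]
    for pci, ring, hours in find_ring_stalls(sample):
        print(f"{pci} ring {ring}: stalled for about {hours:.1f} h")
```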

 

I see here someone suggested blacklisting the radeon driver. Would that simply make it use the amdgpu driver, or just make it stop working? 🤔 I am woefully unknowledgeable about this stuff.

 

 

But in answer to your original question, I have been trying to run this container with GPU support: https://github.com/ahmetoner/whisper-asr-webservice/ I am using the right image (":latest-gpu") and have tried passing through the device in various ways in the extra parameters: `--gpus all` gives an error, whereas `--device /dev/dri` and `--gpus device=0` "work" but do nothing. Though I imagine that may be due to my kernel issues 🙈

 

Lastly, I am getting slightly more of a response from radeontop 🤔


     

Quote

 

                 radeontop v1.4-4-gec97e6f, running on HAWAII bus 01, 120 samples/sec
                                       Graphics pipe 100.00% │-------------------------------------
─────────────────────────────────────────────────────────────┼──────────────────────────────────────────────────────────
                                        Event Engine   0.00% │
                                                             │
                         Vertex Grouper + Tesselator   0.00% │
                                                             │
                                   Texture Addresser   0.00% │
                                                             │
                                       Shader Export   0.00% │
                         Sequencer Instruction Cache   0.00% │
                                 Shader Interpolator   0.00% │
                                                             │
                                      Scan Converter   0.00% │
                                  Primitive Assembly   0.00% │
                                                             │
                                         Depth Block   0.00% │
                                         Color Block   0.00% │
                                                 UVD   0.00% │
                                                 VCE   0.00% │
                                                             │
                                    73M / 8192M VRAM   0.90% │-
                                      8M / 2043M GTT   0.37% │
                          0.15G / 0.00G Memory Clock    inf% │
                          1.04G / 1.04G Shader Clock 100.00% │-------------------------------------

 


 

It probably means nothing, but I just thought it was weird that the graphics pipe went up to 100% and I had some more RAM usage 😅

Link to comment
1 hour ago, ohare93 said:

Would that simply make it use the amdgpu driver, or just make it stop working?

It will completely stop working AFAIK.

 

1 hour ago, ohare93 said:

--gpus all

This is for Nvidia GPUs.

 

1 hour ago, ohare93 said:

--device /dev/dri

This is correct for AMD GPUs.
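
Before passing `/dev/dri` to a container, it can help to check that the host actually has DRM nodes there. This is a minimal sketch assuming the usual `cardN` / `renderDN` naming; the function name is made up for illustration.

```python
import glob
import os


def list_dri_nodes(dri_dir="/dev/dri"):
    """Return the card*/renderD* node names found in the DRI directory."""
    nodes = glob.glob(os.path.join(dri_dir, "card[0-9]*"))
    nodes += glob.glob(os.path.join(dri_dir, "renderD[0-9]*"))
    return sorted(os.path.basename(node) for node in nodes)


if __name__ == "__main__":
    nodes = list_dri_nodes()
    # An empty list means no GPU driver has created device nodes yet.
    print(nodes or "no DRI nodes found")
```

If the list is empty, `--device /dev/dri` has nothing to hand to the container, regardless of which image is used.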

 

1 hour ago, ohare93 said:

It probably means nothing, but I just thought it was weird that the graphics pipe went up to 100% and I had some more RAM usage 😅

I don't know whether that container will work with your GPU since it's Hawaii-based, but I think it at least might. I would recommend asking on the container's GitHub whether your GPU can/will work.

 

 

Sorry for not being more helpful with that, but I don't actually own an AMD GPU.

Link to comment

Hi ich777,

 

I'm having trouble installing the RadeonTOP plugin. I have an AMD APU as follows:

 

# lspci | grep VGA
04:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Cezanne [Radeon Vega Series / Radeon Vega Mobile Series] (rev c5)

 

When I try to enable the amdgpu kernel module (which the plugin fails to do), I get this error message:

 

modprobe -v amdgpu
insmod /lib/modules/6.1.38-Unraid/kernel/drivers/gpu/drm/amd/amdgpu/amdgpu.ko.xz 
modprobe: ERROR: could not insert 'amdgpu': Unknown symbol in module, or unknown parameter (see dmesg)

 

dmesg isn't helpful:

 

amdgpu: Unknown symbol dev_coredumpm (err -2)
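
"Unknown symbol" means the module needs a function that the running kernel does not provide. As far as I know, `dev_coredumpm` belongs to the kernel's device coredump support, so it may simply not be enabled in this kernel build; treat that as an assumption. One way to check whether the running kernel knows the symbol is to look it up in `/proc/kallsyms`, sketched here with a made-up helper name:

```python
def kernel_has_symbol(symbol, kallsyms_path="/proc/kallsyms"):
    """Return True if the symbol name is listed in the kallsyms file."""
    with open(kallsyms_path) as fh:
        for line in fh:
            fields = line.split()
            # Each line is: address, type, name, and optionally [module].
            if len(fields) >= 3 and fields[2] == symbol:
                return True
    return False
```

Calling `kernel_has_symbol("dev_coredumpm")` on the server would show whether the symbol is present at all; if it is not, loading amdgpu will keep failing until the kernel provides it.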

 

I can enable the 'radeon' kernel module, but I don't think this is the right driver for my device. Enabling this module doesn't create /dev/dri.

 

Any idea what I can do? Thanks!

Link to comment
