
[Plugin] Nvidia-Driver


ich777


18 minutes ago, PeeLoW said:

I did swap the slot (and back again), so there is a slight chance I fucked up...

It seems to me that the card doesn't work in all slots:

Oct 10 21:53:02 PeeLoW kernel: NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x24:0xffff:1211)
Oct 10 21:53:02 PeeLoW kernel: NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
Oct 10 21:53:03 PeeLoW kernel: resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000a0000-0x000dffff window]
Oct 10 21:53:03 PeeLoW kernel: caller _nv000720rm+0x1ad/0x200 [nvidia] mapping multiple BARs
Oct 10 21:53:03 PeeLoW kernel: NVRM: GPU at PCI:0000:01:00: GPU-8e555b0d-adbf-c913-e46f-820a8f9ad65e
Oct 10 21:53:03 PeeLoW kernel: NVRM: Xid (PCI:0000:01:00): 62, pid=14325, 0a7b(14c8) 00000000 00000000
Oct 10 21:53:15 PeeLoW kernel: NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x25:0x65:1241)
Oct 10 21:53:15 PeeLoW kernel: NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
Oct 10 21:53:15 PeeLoW kernel: resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000a0000-0x000dffff window]
Oct 10 21:53:15 PeeLoW kernel: caller _nv000720rm+0x1ad/0x200 [nvidia] mapping multiple BARs
Oct 10 21:53:23 PeeLoW kernel: NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x25:0x65:1241)
Oct 10 21:53:23 PeeLoW kernel: NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
Oct 10 21:53:23 PeeLoW kernel: resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000a0000-0x000dffff window]
Oct 10 21:53:23 PeeLoW kernel: caller _nv000720rm+0x1ad/0x200 [nvidia] mapping multiple BARs
Oct 10 21:53:23 PeeLoW kernel: NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x24:0xffff:1211)
Oct 10 21:53:23 PeeLoW kernel: NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0

 

These are the errors; it seems like a BIOS issue.

Please also make sure that you have Above 4G Decoding enabled in your BIOS.
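Above 4G Decoding allows the firmware to map a card's PCI BARs above the 4 GiB address boundary. To see where the GPU's BARs actually ended up, here is a minimal Python sketch that reads the device's sysfs `resource` file. It assumes the standard Linux sysfs layout (three hex fields per line: start, end, flags), and the Bus-Id `0000:01:00.0` is only borrowed from the log above as an example.

```python
"""Sketch: list a PCI device's BARs from sysfs and flag any mapped above 4 GiB.

Assumes the Linux sysfs format of /sys/bus/pci/devices/<addr>/resource:
one line per resource, three hex fields (start, end, flags); unused
resource slots are all zeros.
"""
from pathlib import Path

FOUR_GIB = 1 << 32


def parse_bars(resource_text: str) -> list[tuple[int, int, int]]:
    """Return (index, start, size) for each populated resource line."""
    bars = []
    for index, line in enumerate(resource_text.splitlines()):
        fields = line.split()
        if len(fields) < 2:
            continue
        start, end = int(fields[0], 16), int(fields[1], 16)
        if start == 0 and end == 0:
            continue  # unused resource slot
        bars.append((index, start, end - start + 1))
    return bars


def bars_above_4g(resource_text: str) -> list[int]:
    """Indices of resources whose end address is at or above 4 GiB."""
    return [i for i, start, size in parse_bars(resource_text) if start + size > FOUR_GIB]


if __name__ == "__main__":
    # Hypothetical example address; use your GPU's Bus-Id from lspci or nvidia-smi.
    path = Path("/sys/bus/pci/devices/0000:01:00.0/resource")
    if path.exists():
        text = path.read_text()
        for index, start, size in parse_bars(text):
            print(f"resource {index}: start={start:#x} size={size // (1 << 20)} MiB")
        print("above 4 GiB:", bars_above_4g(text))
```

If the large BAR only shows up below 4 GiB (or the driver complains about BAR mapping as in the log above), that is a hint to look at the Above 4G Decoding setting.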

29 minutes ago, ich777 said:

Please also make sure that you have Above 4G Decoding enabled in your BIOS.

I have changed back to the old slot, which was working before, and the BIOS is on the same version it worked with before.

1 minute ago, PeeLoW said:

I have changed back to the old slot, which was working before, and the BIOS is on the same version it worked with before.

So I assume it is not working?

Can you please post updated Diagnostics with the card installed in the old slot?

 

Did you add any other cards or make any other changes?

Please make sure that you've enabled Above 4G Decoding and Resizable BAR support in the BIOS.

35 minutes ago, ich777 said:

Please make sure that you've enabled Above 4G Decoding and Resizable BAR support in the BIOS.

Thanks for the help so far! I can't check the BIOS right now but will later today. Here are the diagnostics.

peelow-diagnostics-20231011-0856.zip

45 minutes ago, PeeLoW said:

Thanks for the help so far! I can't check the BIOS right now but will later today. Here are the diagnostics.

Please look at this table: Click

 

The syslog lists Xid error 62 (second line):

Oct 11 08:13:56 PeeLoW kernel: NVRM: GPU at PCI:0000:01:00: GPU-8e555b0d-adbf-c913-e46f-820a8f9ad65e
Oct 11 08:13:56 PeeLoW kernel: NVRM: Xid (PCI:0000:01:00): 62, pid=18862, 0a7b(14c8) 00000000 00000000
Oct 11 08:14:08 PeeLoW kernel: NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x25:0x65:1241)
Oct 11 08:14:08 PeeLoW kernel: NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0

 

Which means:

Internal micro-controller halt, which can be caused by a hardware error, a driver error, or a thermal issue.

 

But keep in mind that this can also be caused by your BIOS and conflicting resources, which to me seems more likely to be the case.
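To check a longer syslog for the same pattern, here is a minimal Python sketch that counts Xid codes and RmInitAdapter failure codes. The regular expressions are written against the line formats quoted above and are an assumption for other driver versions; what each Xid means still has to be looked up in NVIDIA's table.

```python
"""Sketch: pull NVRM Xid codes and RmInitAdapter failures out of a syslog."""
import re
import sys
from collections import Counter

# Matches lines such as:
#   NVRM: Xid (PCI:0000:01:00): 62, pid=18862, ...
XID_RE = re.compile(r"NVRM: Xid \(PCI:[0-9a-fA-F:.]+\): (?P<code>\d+)")
# Matches lines such as:
#   NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x25:0x65:1241)
INIT_RE = re.compile(r"NVRM: GPU \S+: RmInitAdapter failed! \((?P<detail>[^)]*)\)")


def summarize(log_text: str) -> tuple[Counter, Counter]:
    """Count Xid codes and RmInitAdapter failure details in the given log text."""
    xids: Counter = Counter()
    init_failures: Counter = Counter()
    for line in log_text.splitlines():
        if (m := XID_RE.search(line)):
            xids[int(m.group("code"))] += 1
        elif (m := INIT_RE.search(line)):
            init_failures[m.group("detail")] += 1
    return xids, init_failures


if __name__ == "__main__":
    # Read a log file given as argument, or stdin otherwise.
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
            text = fh.read()
    else:
        text = sys.stdin.read()
    xids, failures = summarize(text)
    print("Xid codes:", dict(xids))
    print("RmInitAdapter failures:", dict(failures))
```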

24 minutes ago, ich777 said:

But keep in mind that this can also be caused by your BIOS and conflicting resources, which to me seems more likely to be the case.

Thanks for the insights, I will look into this! Really appreciate your help!


Hi all,

 

this might be a stupid question since I am new to Unraid. My setup is an HPE MicroServer Gen10+ v2 with a Xeon processor and an Nvidia T400 GPU via PCIe.

 

I run Home Assistant as a VM and Plex as Docker.

In Home Assistant I have Frigate as a plugin, which needs the GPU for decoding/encoding the video streams of my cameras.

Plex needs the GPU for that as well.

 

Do I need the driver now, or is it not recommended since the card should work in both the VM and Docker?

 

Thank you and warm regards

Sebastian

5 hours ago, tischer.s said:

Do I need the driver now, or is it not recommended since the card should work in both the VM and Docker?

 

You would need the driver to run it in Docker containers; for VMs you only pass the card through... (see page 1, post 1)

 

But you CAN'T use it for both at the same time... Either the card is passed through to a VM (while it's running in ONE VM, the card is bound to vfio and not usable by the host, Docker containers, or other VMs...) OR you run it on the host for host usage, Docker containers, ... also described on page 1...

 

So you have to decide if you want to use it in your HA VM... OR... for Docker containers like Plex.

 

As a note, you can use the card for multiple Docker containers simultaneously... that would work; for example, Plex, Emby, Jellyfin, ... could all make use of your ONE GPU at the same time... but with VMs... ONLY 1 VM at a time... no 2nd VM, no Docker containers, nothing else... at least while it's running.
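Which of the two situations you are in can be read from the kernel driver the card is currently bound to: `vfio-pci` means it is reserved for passthrough, `nvidia` means the host (and therefore Docker containers) can use it. A minimal Python sketch, assuming the standard Linux sysfs layout; the Bus-Id used below is a hypothetical example.

```python
"""Sketch: report which kernel driver a PCI device is currently bound to."""
import os
from pathlib import Path
from typing import Optional


def bound_driver(pci_addr: str, sysfs_root: str = "/sys/bus/pci/devices") -> Optional[str]:
    """Return the driver name (e.g. 'nvidia' or 'vfio-pci'), or None if unbound."""
    link = Path(sysfs_root) / pci_addr / "driver"
    if not link.is_symlink():
        return None
    # The 'driver' entry is a symlink into /sys/bus/pci/drivers/<name>.
    return os.path.basename(os.readlink(link))


def usable_by_host(pci_addr: str, sysfs_root: str = "/sys/bus/pci/devices") -> bool:
    """True when the card is bound to the host nvidia driver rather than vfio-pci."""
    return bound_driver(pci_addr, sysfs_root) == "nvidia"


if __name__ == "__main__":
    addr = "0000:01:00.0"  # hypothetical Bus-Id; take yours from lspci or nvidia-smi
    print(addr, "->", bound_driver(addr), "| usable by host:", usable_by_host(addr))
```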

 

In the end, either check whether the HA Docker container could make use of Frigate with NVENC (I assume not) or get a 2nd card for your use case...

 

Or maybe read up on the Coral TPU... that might also help in your use case with Frigate...

https://docs.frigate.video/frigate/hardware/

6 hours ago, alturismo said:

You would need the driver to run it in Docker containers; for VMs you only pass the card through... (see page 1, post 1)

 

Thanks a lot for the extensive reply and your explanation :) I will do it like you said...

 

- VM with HASS and Frigate Proxy Addon

- Docker for Frigate (with Coral & GPU usage), Plex etc.

 

I will need another Windows VM, but since the HP MicroServer doesn't support bifurcation of the Intel processor's GPU, this needs to run on the processor alone. I only need it for KNX ETS, so that should be no problem, I hope :)

2 hours ago, tischer.s said:

 

- VM with HASS and Frigate Proxy Addon

Why not use Home Assistant in a Docker container or in an LXC container? This would solve your problem, because you could then use the GPU for both containers or even more containers.

 

For Home Assistant, a VM is most certainly a waste of resources.

On 10/14/2023 at 2:41 PM, ich777 said:

Why not use Home Assistant in a Docker container or in an LXC container? This would solve your problem, because you could then use the GPU for both containers or even more containers.

 

For Home Assistant, a VM is most certainly a waste of resources.

 

The problem with HASS as a Docker container is that it doesn't support add-ons, some of which are necessary :(

 

I will check it when I have all my hardware available :D 

24 minutes ago, tischer.s said:

VSCodeServer (tightly integrated)

Why not on Unraid?

 

25 minutes ago, tischer.s said:

SkyConnect Zigbee

Why not pass that over to a Docker?

 

25 minutes ago, tischer.s said:

Cloudflared

Why not on Unraid?

 

25 minutes ago, tischer.s said:

MQTT (this can be easily excluded :D)

That can also be done on Unraid (but let's exclude it here :D).

 

26 minutes ago, tischer.s said:

HASS Google Drive Backup

Why not with Unraid (rclone)?

 

 

I don't understand why someone would do it like that; Unraid has apps for all of these things and you are basically wasting a lot of resources with a VM...

 


 

 

Anyways, this should not be part of this support thread.

7 hours ago, ich777 said:

I don't understand why someone would do it like that; Unraid has apps for all of these things and you are basically wasting a lot of resources with a VM...

 

I am really thankful for your help and opinion - I am totally new to Unraid to be honest, so you are totally right :) I will definitely test the docker version, no doubt.

The only thing I am not sure about, is the GoogleDrive Backup addon, since it is not "just" a file copy but a specific HomeAssistant restorable backup. But I can find a solution for that ;)

1 hour ago, tischer.s said:

I am totally new to Unraid to be honest, so you are totally right :) I will definitely test the docker version, no doubt.

If you are using the HA VM and its add-ons, these are basically all Docker containers too; I even think that the MQTT one is exactly the same as the one in the CA App.

 

1 hour ago, tischer.s said:

The only thing I am not sure about, is the GoogleDrive Backup addon, since it is not "just" a file copy but a specific HomeAssistant restorable backup.

I think you can also install that through HACS, which is also compatible with the Docker version of HA. However, it is usually a simple file copy to Drive; maybe it is zipped or something like that, but the principle is basically the same.


Nvidia Driver:

 

root@Vesper:~# nvidia-smi
Wed Oct 18 11:46:10 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.113.01             Driver Version: 535.113.01   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce GT 1030         Off | 00000000:01:00.0 Off |                  N/A |
| 44%   31C    P0              N/A /  30W |      0MiB /  2048MiB |      1%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
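For scripts, the table above is awkward to parse; `nvidia-smi` can print selected fields as CSV instead (`--query-gpu` with `--format=csv,noheader`). A minimal Python sketch; the chosen fields are just an example. The `uuid` field is the `GPU-...` value commonly used for `NVIDIA_VISIBLE_DEVICES` in a container template.

```python
"""Sketch: query GPUs via nvidia-smi's CSV output instead of parsing the table."""
import csv
import io
import subprocess

# Example selection of --query-gpu fields.
FIELDS = ["index", "name", "driver_version", "pci.bus_id", "uuid"]


def parse_gpu_csv(text: str) -> list[dict[str, str]]:
    """Turn 'noheader' CSV rows into dicts keyed by the queried field names."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    return [dict(zip(FIELDS, row)) for row in reader if row]


def query_gpus() -> list[dict[str, str]]:
    """Run nvidia-smi (must be on PATH) and parse its CSV output."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={','.join(FIELDS)}", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_csv(out)


if __name__ == "__main__":
    for gpu in query_gpus():
        print(gpu)
```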
 

Homebridge Docker


 

Homebridge config (cameras are all available in HomeKit, works great):

{
    "controllers": [
        {
            "address": "192.168.69.2",
            "password": "***",
            "username": "***"
        }
    ],
    "_bridge": {
        "username": "***",
        "port": 48233
    },
    "platform": "UniFi Protect",
    "options": [
        "Enable.Video.Transcode.Hardware.7483C271B443",
        "Enable.Video.Transcode.7483C271B443",
        "Enable.Video.Transcode.Hardware",
        "Enable.Video.Transcode"
    ]
}

 

Error from Homebridge on Docker start:
[10/18/2023, 8:00:48 AM] [homebridge-unifi-protect] Hardware-accelerated decoding and encoding using qsv will be unavailable: unable to successfully validate capabilities.

 

Did I skip a step?

2 hours ago, bjsmith911 said:

Did I skip a step?

As the error implies:

2 hours ago, bjsmith911 said:

Hardware-accelerated decoding and encoding using qsv will be unavailable: unable to successfully validate capabilities.

It seems like the container wants to use QSV (Quick Sync = Intel). Maybe it's a configuration error, but I can't tell because I don't know the container; this question would be better suited for the container's support thread.
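To see which acceleration methods an ffmpeg binary was built with (for example whether it offers `cuda` in addition to `qsv`), `ffmpeg -hide_banner -hwaccels` lists them. A minimal Python sketch that runs and parses it; which ffmpeg binary the Homebridge plugin actually uses is an assumption here, and a listed method only means the build supports it, not that the GPU is reachable from the container.

```python
"""Sketch: list the hardware acceleration methods an ffmpeg build reports."""
import subprocess

HEADER = "Hardware acceleration methods:"


def parse_hwaccels(text: str) -> list[str]:
    """Parse `ffmpeg -hwaccels` output: a header line, then one method per line."""
    lines = [line.strip() for line in text.splitlines()]
    if HEADER in lines:
        lines = lines[lines.index(HEADER) + 1:]
    return [line for line in lines if line]


def ffmpeg_hwaccels(ffmpeg: str = "ffmpeg") -> list[str]:
    """Run the given ffmpeg binary (path is an assumption) and parse its list."""
    out = subprocess.run(
        [ffmpeg, "-hide_banner", "-hwaccels"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_hwaccels(out)


if __name__ == "__main__":
    methods = ffmpeg_hwaccels()
    print("methods:", methods)
    print("cuda in this build:", "cuda" in methods)
```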

 

Please keep in mind that the GT1030 can only transcode H.264, not H.265, because it's based on the older Pascal architecture.

1 minute ago, ich777 said:

As the error implies:

It seems like the container wants to use QSV (Quick Sync = Intel). Maybe it's a configuration error, but I can't tell because I don't know the container; this question would be better suited for the container's support thread.

Oh, so that's what QSV is. Gotcha; it's looking for the Intel iGPU (which I wasn't using for this, since it's been assigned to Plex). Thank you!

5 hours ago, ich777 said:

Please keep in mind that the GT1030 can only transcode H.264, not H.265, because it's based on the older Pascal architecture.

 

@bjsmith911 the GT1030 has no NVENC at all... so no transcoding at all; it can only decode H.264 and HEVC (H.265), not encode... so probably not the right card for your "workload"... rather look into getting your iGPU in there as well (the iGPU can serve Plex and other Docker containers simultaneously)


Thank you for this! Made it super easy to get going. I have a P400 for Jellyfin transcoding. Is it preferred to have Above 4G Decoding/Resizable BAR enabled for this use case?

 

EDIT: I searched the thread and found where you told other people to enable it. I'll do the same.

7 minutes ago, faptaincrunch said:

is it preferred to have Above 4G Decoding/Resizable BAR enabled for this use case?

Sometimes it's necessary (hardware related)... if it's working now like this, there's no benefit for encoding processes... for VM usage there would be a benefit, but that's a totally different use case ;)


Hey guys, I'm having trouble getting hardware transcoding working with Plex. Right now, all files that need to be converted are transcoded on the CPU. My diagnostics and configuration are attached.

The error log I get from Plex:
Oct 20, 2023 16:41:45.123 [22821048994616] ERROR - [Req#44b3/Transcode] [FFMPEG] - Cannot load libcuda.so.1
Oct 20, 2023 16:41:45.123 [22821048994616] ERROR - [Req#44b3/Transcode] [FFMPEG] - Could not dynamically load CUDA
Oct 20, 2023 16:41:45.140 [22821048994616] ERROR - [Req#44b3/Transcode] [FFMPEG] - Cannot load libcuda.so.1
Oct 20, 2023 16:41:45.140 [22821048994616] ERROR - [Req#44b3/Transcode] [FFMPEG] - Could not dynamically load CUDA
Oct 20, 2023 16:41:45.146 [22821048994616] ERROR - [Req#44b3/Transcode] [FFMPEG] - Cannot load libcuda.so.1
Oct 20, 2023 16:41:45.146 [22821048994616] ERROR - [Req#44b3/Transcode] [FFMPEG] - Could not dynamically load CUDA
Oct 20, 2023 16:41:45.156 [22821048994616] ERROR - [Req#44b3/Transcode] [FFMPEG] - Cannot load libcuda.so.1
Oct 20, 2023 16:41:45.156 [22821048994616] ERROR - [Req#44b3/Transcode] [FFMPEG] - Could not dynamically load CUDA
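`Cannot load libcuda.so.1` means the transcoder could not open the NVIDIA driver's CUDA library inside the container. That library is typically injected by the NVIDIA container runtime when the container runs with it and its driver capabilities include compute (e.g. `NVIDIA_DRIVER_CAPABILITIES=all`). A minimal Python sketch of the same load attempt, assuming a Python interpreter is available where you run it (the Plex images may not ship one):

```python
"""Sketch: check whether a shared library (e.g. libcuda.so.1) can be loaded here."""
import ctypes


def can_load(library: str) -> bool:
    """Return True if the dynamic loader can open `library`, False otherwise."""
    try:
        ctypes.CDLL(library)
    except OSError:
        # Raised when the loader cannot find or open the library.
        return False
    return True


if __name__ == "__main__":
    print("libcuda.so.1 loadable:", can_load("libcuda.so.1"))
```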

Things I have tried over the last two weeks:

  1. Restarting the server.
  2. Restarting the Plex Docker container.
  3. Double-checking that I do have a Plex Pass and that hardware transcoding is turned on in the settings.
  4. Deleting the Codecs folder and letting the Docker container recreate it on restart.
  5. Updating the drivers in the Nvidia plugin.
  6. Switching to older and different drivers in the Nvidia plugin.
  7. Uninstalling the plugin, turning off Docker containers, reinstalling the plugin, restarting, updating the drivers, and enabling Docker containers again.
  8. Remapping transcoding to RAM via /tmp.
  9. Pulling my Nvidia 1080 out of VM passthrough and adding it to the cards available for transcoding (wondering if my M2000 was the problem), and setting Plex to transcode with all GPUs.
  10. Switching the Docker container I used from Plex-Media-Server to linuxserver's Plex.
  11. Maybe a few other things I forgot that I've done in the last two weeks.

Can someone please help me find a solution to this mission impossible? I feel like with two different dockers, reinstalling the plugins, and with two different working drives, I would have found the problem by now.

 

As a side note, I hope this isn't related but I'll put it here. I started seeing this issue come up when one of my cache drives failed and started corrupting my entire appdata share. I ended up deleting the whole thing since not even appdata backup could save it--which was devastating in terms of server progress. However, ever since I rebuilt the docker container from scratch, I have never been able to get the docker to transcode using the hardware again.



rocky-diagnostics-20231020-1633.zip

Screenshot 2023-10-20 163634.png

Screenshot 2023-10-20 164007.png

image.png

17 minutes ago, alturismo said:

Maybe look at page 1 of this thread; there is a tutorial incl. pictures on how to add the NV GPU...
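As a rough checklist for the settings usually described there (`--runtime=nvidia` under Extra Parameters plus the `NVIDIA_VISIBLE_DEVICES` and `NVIDIA_DRIVER_CAPABILITIES` variables), here is a minimal Python sketch. The function name and the plain-dict input are hypothetical, not an Unraid API; you would fill them in by hand from the container template.

```python
"""Sketch: sanity-check hand-copied Docker template values for NVIDIA GPU access."""


def check_gpu_settings(extra_params: str, env: dict[str, str]) -> list[str]:
    """Return a list of problems found; an empty list means nothing obvious is missing."""
    problems = []
    # Only the '--runtime=nvidia' spelling is recognized by this simple check.
    if "--runtime=nvidia" not in extra_params.split():
        problems.append("Extra Parameters is missing --runtime=nvidia")
    if not env.get("NVIDIA_VISIBLE_DEVICES", "").strip():
        problems.append("NVIDIA_VISIBLE_DEVICES is empty or missing")
    caps = {c.strip() for c in env.get("NVIDIA_DRIVER_CAPABILITIES", "").split(",") if c.strip()}
    # compute provides libcuda, video provides the NVENC/NVDEC libraries.
    if "all" not in caps and not {"compute", "video"} <= caps:
        problems.append("NVIDIA_DRIVER_CAPABILITIES should be 'all' or include compute and video")
    return problems


if __name__ == "__main__":
    print(check_gpu_settings("--runtime=nvidia", {
        "NVIDIA_VISIBLE_DEVICES": "GPU-xxxxxxxx",  # placeholder UUID
        "NVIDIA_DRIVER_CAPABILITIES": "all",
    }))
```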


I followed it... multiple times throughout this process. My setup should be correct. I should add that I once had it working, but then one day I noticed that my CPU was transcoding a file and realized that something had stopped working along the way.
