[Plugin] Nvidia-Driver


ich777


2 hours ago, Jacon said:

This started after moving from 6.11.2 to 6.11.4 and I haven't had any trouble before what appears to be the last upgrade.

You could upgrade to 6.11.5, which is the latest. That actually shouldn't be the issue, but who knows ;)

 

And yes, it looks like some kind of memory write error. AFAIK nothing changed kernel-wise from 6.11.2 to 6.11.4 (or 6.11.5); the last change came with 6.11.2, but since it was working until then, it's hard to say.

Link to comment

Hello,

I've changed the graphics card in my Unraid server from a GTX 970 to an NVIDIA Tesla P4.

It is detected by the Nvidia Driver plugin:

 

immagine.thumb.png.168594b56ab89b1f363534a4b2723cb2.png

 

I've changed the GPU UUID in my Plex container, but it no longer transcodes on the GPU.

immagine.png.fbd52f6abca14959779e602053f1038a.png

 

Can someone please help me? Thanks 🙂

 

Link to comment
5 minutes ago, ich777 said:

These cards are not supported by this plugin (because the driver simply doesn't support the card, see Click).

Only consumer and workstation cards are supported.

 

A Tesla is a Datacenter card…

Hello,

 

Thanks for the answer.

I bought it because it's energy efficient and can transcode a lot 🙂

 

So there is no way to make it work with Unraid then?

Link to comment
Just now, TDA said:

So there is no way to make it work with Unraid then?

Not currently AFAIK.

The main issue with these kinds of cards is that I don't own one, so I can't test if something isn't working or goes wrong.

I don't even know whether they will work with the default container runtime that is needed (they should be supported, however).

 

Just now, TDA said:

I bought it because it's energy efficient and can transcode a lot 🙂

Wouldn't it be better to use a Quadro instead?

Link to comment
7 minutes ago, ich777 said:

Not currently AFAIK.

The main issue with these kinds of cards is that I don't own one, so I can't test if something isn't working or goes wrong.

I don't even know whether they will work with the default container runtime that is needed (they should be supported, however).

 

Wouldn't it be better to use a Quadro instead?

A P4 costs around $200; a Quadro with the same specs costs over 2k.

The P4's maximum power draw is 75 W.

Link to comment
9 minutes ago, TDA said:

A P4 costs around $200; a Quadro with the same specs costs over 2k.

Yes, but I hope you understand my point of view: buying a card for which I don't even have a specific use case doesn't make any sense.

 

I really don't like creating things when I don't have the hardware on hand to test them myself. AMD Vendor Reset is an example: it is really hard for me to troubleshoot because I don't own such a card, and the people who said they would test and help if something came up are most of the time not available, or have sold their hardware and can't or won't help when an issue comes up for someone else.

 

Sure, it should be possible to create such a driver package. For example, I already have a routine ready for the Open Source Kernel Module, but the kernel module itself isn't really ready because it doesn't work as of the time of writing -> see this issue: Click

Those drivers I can actually test, but as said above, not the Tesla cards. The same applies to GRID cards.

Link to comment
29 minutes ago, ich777 said:

Yes, but I hope you understand my point of view: buying a card for which I don't even have a specific use case doesn't make any sense.

 

I really don't like creating things when I don't have the hardware on hand to test them myself. AMD Vendor Reset is an example: it is really hard for me to troubleshoot because I don't own such a card, and the people who said they would test and help if something came up are most of the time not available, or have sold their hardware and can't or won't help when an issue comes up for someone else.

 

Sure, it should be possible to create such a driver package. For example, I already have a routine ready for the Open Source Kernel Module, but the kernel module itself isn't really ready because it doesn't work as of the time of writing -> see this issue: Click

Those drivers I can actually test, but as said above, not the Tesla cards. The same applies to GRID cards.

Yes, I fully understand.

Is there any chance I could be helpful?
I bought the card and it is now useless xD

I could run tests for you, send reports, or whatever.

Link to comment
10 minutes ago, TDA said:

I bought the card and it is now useless xD

I could run tests for you, send reports, or whatever.

I will have to look into this, but please keep in mind I really don't know if I have enough time this week.

 

This can take some time, since I have to think about how to implement this in the existing plugin; I don't think it would be beneficial to create a new plugin.

 

Please write me a short PM so that I don't forget about that... :)

Link to comment
1 hour ago, Jacon said:

Here you go!  I ran several Plex sessions on different clients to get the problem to occur more than once.

Is this only happening with a specific movie you try to transcode? What Plex container are you using? Is the container up to date? It seems like Plex itself, or at least its encoder, is causing the issue.

 

Can you maybe try to install Jellyfin (just for testing purposes) and see if the same happens there? <- I would recommend that you use the official container:

grafik.png.412eb681230dc4e03ec4e51ad92035eb.png

Link to comment
4 minutes ago, ich777 said:

Is this only happening with a specific movie you try to transcode? What Plex container are you using? Is the container up to date? It seems like Plex itself, or at least its encoder, is causing the issue.

 

Can you maybe try to install Jellyfin (just for testing purposes) and see if the same happens there? <- I would recommend that you use the official container:

grafik.png.412eb681230dc4e03ec4e51ad92035eb.png

Happens on all movies and I'm on the latest Official Plex Container.  I will try downgrading to an earlier 1.29.2.x version and report back.

Edited by Jacon
Link to comment
3 minutes ago, Jacon said:

Happens on all movies and I'm on the latest Official Plex Container.  I will try downgrading to an earlier 1.29.2.x version and report back.

I have to say, between Unraid 6.11.2 and the newer versions nothing is different in terms of the Nvidia driver and kernel, because all of these versions use kernel 5.19.17.

Link to comment
35 minutes ago, ich777 said:

I have to say, between Unraid 6.11.2 and the newer versions nothing is different in terms of the Nvidia driver and kernel, because all of these versions use kernel 5.19.17.

I'm also noticing that Plex is reverting to software transcoding, yet the GPU Statistics plugin is registering the process and showing that Plex has an active stream. I'm officially puzzled.

Link to comment
1 hour ago, Jacon said:

I'm also noticing that Plex is reverting to software transcoding, yet the GPU Statistics plugin is registering the process and showing that Plex has an active stream. I'm officially puzzled.

Are you really sure that it is SW transcoding and not HW transcoding? Is it possible that the file you are trying to transcode has baked-in subtitles, and that this is what is using your CPU heavily?

This is a known issue with Plex, at least to my knowledge; I'm not really a Plex guy, I mainly use Emby.

Link to comment

Just installed a P100, replacing an old Maxwell card that was used for transcoding. I'm getting the driver error:

"NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running."

 

Have already checked connections and been through the VFIO unbind troubleshooting and rebooted. Still same message.

 

Attached diagnostics

media-diagnostics-20221124-0714.zip

Link to comment
29 minutes ago, Joker169 said:

Have already checked connections and been through the VFIO unbind troubleshooting and rebooted. Still same message.

Your card is still bound to VFIO:

42:00.0 3D controller [0302]: NVIDIA Corporation GP100GL [Tesla P100 PCIe 16GB] [10de:15f8] (rev a1)
    Subsystem: NVIDIA Corporation GP100GL [Tesla P100 PCIe 16GB] [10de:118f]
    Kernel driver in use: vfio-pci
    Kernel modules: nvidia_drm, nvidia
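
For anyone who wants to check this without reading the whole diagnostics file, here is a minimal Python sketch that pulls the "Kernel driver in use" value out of lspci-style text. The sample is the output quoted above; the function name drivers_in_use is made up for illustration and is not part of the plugin or of Unraid.

```python
import re

# Sample text copied from the lspci output quoted above.
LSPCI_OUTPUT = """\
42:00.0 3D controller [0302]: NVIDIA Corporation GP100GL [Tesla P100 PCIe 16GB] [10de:15f8] (rev a1)
    Subsystem: NVIDIA Corporation GP100GL [Tesla P100 PCIe 16GB] [10de:118f]
    Kernel driver in use: vfio-pci
    Kernel modules: nvidia_drm, nvidia
"""


def drivers_in_use(lspci_text):
    """Map each PCI address to the kernel driver currently bound to it (or None)."""
    drivers = {}
    current = None
    for line in lspci_text.splitlines():
        if line and not line[0].isspace():
            # Device header line: the PCI address is the first token.
            current = line.split()[0]
            drivers.setdefault(current, None)
        elif current is not None:
            match = re.match(r"\s*Kernel driver in use:\s*(\S+)", line)
            if match:
                drivers[current] = match.group(1)
    return drivers


for address, driver in drivers_in_use(LSPCI_OUTPUT).items():
    status = "bound to VFIO" if driver == "vfio-pci" else f"driver: {driver}"
    print(address, status)
```

A device that shows vfio-pci here is reserved for VM passthrough, so the Nvidia driver cannot claim it.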

 

You can find the file that binds the device to VFIO at /boot/config/vfio-pci.cfg. These are the contents of the file:

BIND=0000:42:00.0|10de:15f8
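
As a rough illustration of how that line is structured, the sketch below splits a BIND= entry into the PCI address and the vendor:device ID. It only assumes the format visible in the line above; treating several entries on one line as space-separated is an assumption, and parse_vfio_bindings is a hypothetical helper name.

```python
def parse_vfio_bindings(cfg_text):
    """Return (pci_address, vendor_device_id) pairs found on BIND= lines."""
    bindings = []
    for line in cfg_text.splitlines():
        line = line.strip()
        if not line.startswith("BIND="):
            continue
        # Assumption: multiple entries on one line are separated by spaces.
        for entry in line[len("BIND="):].split():
            address, _, ids = entry.partition("|")
            bindings.append((address, ids))
    return bindings


# Sample copied from the file contents quoted above.
sample = "BIND=0000:42:00.0|10de:15f8\n"
for address, ids in parse_vfio_bindings(sample):
    print(f"{address} (ID {ids}) is configured to bind to vfio-pci")
```

If the GPU's address shows up in this list, it will be bound to VFIO again on the next boot.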

 

Link to comment
28 minutes ago, ich777 said:

Your card is still bound to VFIO:

42:00.0 3D controller [0302]: NVIDIA Corporation GP100GL [Tesla P100 PCIe 16GB] [10de:15f8] (rev a1)
    Subsystem: NVIDIA Corporation GP100GL [Tesla P100 PCIe 16GB] [10de:118f]
    Kernel driver in use: vfio-pci
    Kernel modules: nvidia_drm, nvidia

 

You can find the file that binds the device to VFIO at /boot/config/vfio-pci.cfg. These are the contents of the file:

BIND=0000:42:00.0|10de:15f8

 

:facepalm: I must have clicked the box. It should be unbound now, right? I looked in System Devices and didn't see anything bound. I'm new to this, but it looks unbound to me. I'm still getting the error after unchecking the box and rebooting.

media-diagnostics-20221124-0816.zip

Link to comment
16 minutes ago, Joker169 said:

:facepalm: I must have clicked the box. It should be unbound now, right? I looked in System Devices and didn't see anything bound. I'm new to this, but it looks unbound to me. I'm still getting the error after unchecking the box and rebooting.

Something seems wrong with your BIOS configuration:

Nov 24 08:15:23 Media kernel: NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
Nov 24 08:15:23 Media kernel: NVRM: BAR1 is 0M @ 0x0 (PCI:0000:42:00.0)
Nov 24 08:15:23 Media kernel: nvidia: probe of 0000:42:00.0 failed with error -1

 

Please make sure that you've enabled:

  • Above 4G Decoding
  • Resizable BAR

in your BIOS. Since this is a Dell motherboard, I really don't know whether you even have these options or whether they are named differently.
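
To spot this condition in a syslog quickly, a small Python sketch along these lines could be used. The regular expression is derived only from the three log lines quoted above, so other driver versions may word the message differently; unassigned_bars is a hypothetical helper name.

```python
import re

# Sample copied from the syslog lines quoted above.
SYSLOG = """\
Nov 24 08:15:23 Media kernel: NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
Nov 24 08:15:23 Media kernel: NVRM: BAR1 is 0M @ 0x0 (PCI:0000:42:00.0)
Nov 24 08:15:23 Media kernel: nvidia: probe of 0000:42:00.0 failed with error -1
"""

# Assumption: the message format matches the lines above.
BAR_LINE = re.compile(
    r"NVRM: BAR(\d+) is (\d+)M @ (0x[0-9a-fA-F]+) \(PCI:([0-9a-fA-F:.]+)\)"
)


def unassigned_bars(log_text):
    """Return (pci_address, bar_index) for every BAR reported as 0M at address 0x0."""
    found = []
    for match in BAR_LINE.finditer(log_text):
        bar, size_mb, base, address = match.groups()
        if int(size_mb) == 0 and int(base, 16) == 0:
            found.append((address, int(bar)))
    return found


for address, bar in unassigned_bars(SYSLOG):
    print(f"{address}: BAR{bar} was not assigned, check the BIOS settings listed above")
```

A BAR of size 0M at 0x0 means the firmware never mapped that memory region for the card, which is why the driver probe fails.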

Link to comment
39 minutes ago, ich777 said:

Something seems wrong with your BIOS configuration:

Nov 24 08:15:23 Media kernel: NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
Nov 24 08:15:23 Media kernel: NVRM: BAR1 is 0M @ 0x0 (PCI:0000:42:00.0)
Nov 24 08:15:23 Media kernel: nvidia: probe of 0000:42:00.0 failed with error -1

 

Please make sure that you've enabled:

  • Above 4G Decoding
  • Resizable BAR

in your BIOS. Since this is a Dell motherboard, I really don't know whether you even have these options or whether they are named differently.

Hey,

 

Dell calls those things something else, weird (because Dell... 🙄) but somewhat similar. I changed them, and now there is no error and everything shows up.

Thanks for all the help!

Link to comment
On 11/23/2022 at 12:17 PM, ich777 said:

Are you really sure that it is SW transcoding and not HW transcoding? Is it possible that the file you are trying to transcode has baked-in subtitles, and that this is what is using your CPU heavily?

This is a known issue with Plex, at least to my knowledge; I'm not really a Plex guy, I mainly use Emby.

I'm positive.  It attempts to access the GPU and when the process fails, it reverts to the CPU.  No subtitles being forced.

 

I uninstalled the docker, cleared the docker, restarted and reinstalled.  Still getting the same errors with the same description in syslog.

 

Link to comment
On 11/15/2020 at 2:22 PM, ich777 said:
  1. Add '--runtime=nvidia' in your Docker template in 'Extra Parameters' (you have to enable 'Advanced view' in the template to see this option)
  2. Add a variable to your Docker template with the Key: 'NVIDIA_VISIBLE_DEVICES' and as Value: 'YOURGPUUUID' (like 'GPU-9cfdd18c-2b41-b158-f67b-720279bc77fd')
  3. Add a variable to your Docker template with the Key: 'NVIDIA_DRIVER_CAPABILITIES' and as Value: 'all'
  4. Make sure to enable hardware transcoding in the application/container itself
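
Steps 1 to 3 can be sketched in Python as the parameters a container would end up with. This is only an illustration: the helper names are hypothetical, and joining several UUIDs with commas is an assumption based on how the NVIDIA container runtime documents NVIDIA_VISIBLE_DEVICES, not something confirmed in this thread.

```python
def nvidia_docker_settings(gpu_uuids, capabilities="all"):
    """Build the 'Extra Parameters' string and template variables from steps 1-3."""
    if not gpu_uuids:
        raise ValueError("at least one GPU UUID is required")
    return {
        "extra_parameters": "--runtime=nvidia",
        "variables": {
            # Assumption: several UUIDs may be passed as a comma-separated list.
            "NVIDIA_VISIBLE_DEVICES": ",".join(gpu_uuids),
            "NVIDIA_DRIVER_CAPABILITIES": capabilities,
        },
    }


def as_docker_run_args(settings):
    """Express the same settings as arguments for a plain 'docker run'."""
    args = [settings["extra_parameters"]]
    for key, value in settings["variables"].items():
        args += ["-e", f"{key}={value}"]
    return args


# Example UUID taken from step 2 above.
settings = nvidia_docker_settings(["GPU-9cfdd18c-2b41-b158-f67b-720279bc77fd"])
print(" ".join(as_docker_run_args(settings)))
```

Step 4 still has to be done inside the application itself; nothing in the template can enable hardware transcoding for it.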

 

 

I couldn't find whether this has been asked before when I searched, but I am trying to add multiple GPUs to a single Docker container. I used the steps above and the video and got one GPU working, but I can't seem to add the second. Is there a way for a Docker container to communicate directly with NVIDIA SMI? (The container I am using is Tdarr and I have 3x 3080 GPUs.) Thanks!

Link to comment

Hello,

Can I get the plugin file from your GitHub repository and install it on the Plugins --> Install Plugin page? I can't use CA because it shows an SSL verification failure. 😔

After using this method, it shows:

915855750_ZB@0818XJR1@V6LGJL5L4T.png.4a693255241bc1e25c83b3094fcb4e6b.png

I clicked the Done button, rebooted the server, and switched Docker off and on. Then this showed up:

2073493075_1JU)7VNPOADFRNCG1UJ)9.png.1105cefe77a225372ab06d6cfb8f6523.png

When I rebooted, my KVM displayed this line:

1642698619_PBP5W0TV(OTNUN5ECPG.png.d27e6f19426f3b429a3f1b4fbb031876.png

Is my method wrong? Here are my diagnostics.

BTW, my card is a Tesla P4 🙂

stannas-diagnostics-20221125-0948.zip

Link to comment

Thank you so much for the plugin. I have a question.

My graphics card is an Nvidia T600.
I installed your plugin correctly and can use it to transcode in Jellyfin,
but my syslog keeps reporting this error:

26693652024601.png.9594439ff0f5db17efb23c2294db175c.png

Nov 25 16:15:03 LeoreyHome kernel: ACPI BIOS Error (bug): Failure creating named object [\_SB.PC00.PEG1.PEGP._DSM.USRG], AE_ALREADY_EXISTS (20220331/dsfield-184)

Nov 25 16:15:03 LeoreyHome kernel: ACPI Error: AE_ALREADY_EXISTS, CreateBufferField failure (20220331/dswload2-477)
Nov 25 16:15:03 LeoreyHome kernel: ACPI Error: Aborting method \_SB.PC00.PEG1.PEGP._DSM due to previous error (AE_ALREADY_EXISTS) (20220331/psparse-529)

_16693651271962.thumb.png.67718db9107721a72ae94253fea1af5c.png

 

My Unraid version is 6.11.5.

My motherboard model is
ASUSTeK COMPUTER INC. PRIME Z690M-PLUS D4
The BIOS is the latest version (2014).

My CPU is a 13600K.

 

336693652752756.png.db3d8a87d04dc1d0bb8bd7f1a5ad74ef.png

Errors are happening every second. They have seriously affected my normal use of Unraid. They will fill up the log memory and cause the system to go down.
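
To put a number on how often these messages repeat, a short Python sketch like the following could be run over the syslog. It uses the three lines quoted above as sample input; the function name count_acpi_errors is hypothetical, and the filter (messages starting with "ACPI" and containing "Error") is an assumption based on those lines only.

```python
from collections import Counter

# Lines copied from the syslog excerpt above (raw strings keep the backslashes intact).
SYSLOG_LINES = [
    r"Nov 25 16:15:03 LeoreyHome kernel: ACPI BIOS Error (bug): Failure creating named object [\_SB.PC00.PEG1.PEGP._DSM.USRG], AE_ALREADY_EXISTS (20220331/dsfield-184)",
    r"Nov 25 16:15:03 LeoreyHome kernel: ACPI Error: AE_ALREADY_EXISTS, CreateBufferField failure (20220331/dswload2-477)",
    r"Nov 25 16:15:03 LeoreyHome kernel: ACPI Error: Aborting method \_SB.PC00.PEG1.PEGP._DSM due to previous error (AE_ALREADY_EXISTS) (20220331/psparse-529)",
]


def count_acpi_errors(lines):
    """Count ACPI error messages, ignoring the timestamp and host prefix."""
    counts = Counter()
    for line in lines:
        _, sep, message = line.partition(" kernel: ")
        if sep and message.startswith("ACPI") and "Error" in message:
            counts[message] += 1
    return counts


for message, count in count_acpi_errors(SYSLOG_LINES).most_common():
    print(f"{count}x {message}")
```

Pointing the same function at the lines of the real syslog shows which message dominates and how quickly the log is growing.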

 

I've tried the following, but to no avail:
Switching between graphics driver versions v525.53, v515.86.01, and v520.56.06.

I even observed a strange phenomenon: while the graphics card is transcoding, the error messages stop briefly.

 

I also still have a GT 1030. It cannot encode, but it runs normally under the v525.53 driver and no errors are reported.

 

Your help is requested, many thanks

Edited by crushleorey
Link to comment
