[Plugin] Nvidia-Driver


ich777


All the previous files that weren't working before are working now on 470.  IDK, I'm also confused.

 

The only thing different this time is that before downgrading to 470, I uninstalled the plugin, rebooted, installed the plugin, selected 470, rebooted, then refreshed the docker image after reboot.

 

Just checked this morning and NVIDIA deprecated 515.86.01 from Production.  525.60.11 is now Production: https://www.nvidia.com/download/driverResults.aspx/196723/en-us/

Edited by Jacon
Link to comment
12 hours ago, Jacon said:

All the previous files that weren't working before are working now on 470.  IDK, I'm also confused.

 

The only thing different this time is that before downgrading to 470, I uninstalled the plugin, rebooted, installed the plugin, selected 470, rebooted, then refreshed the docker image after reboot.

 

Just checked this morning and NVIDIA deprecated 515.86.01 from Production.  525.60.11 is now Production: https://www.nvidia.com/download/driverResults.aspx/196723/en-us/

Upgraded to 525.60.11 tonight and was able to successfully transcode all videos.  So, I'm not sure if r515 is a bad set of drivers for my card or if there was another underlying issue that prevented the previous driver installations from running.

 

Either way, with successful r525 drivers, I'm not attempting to install r515 and risk it again.  Considering this closed.

 

Appreciate your patience @ich777

Link to comment
4 hours ago, Jacon said:

Either way, with successful r525 drivers, I'm not attempting to install r515 and risk it again.

Why? Please try it, I'm really curious; I've never heard of an issue like the one you are experiencing.

 

16 hours ago, Jacon said:

Just checked this morning and NVIDIA deprecated 515.86.01 from Production.

This is not the right terminology: the older driver is still from the production branch, and the fact that a new driver is released doesn't automatically mean that 515.86.01 is deprecated...

 

16 hours ago, Jacon said:

525.60.11 is now Production

This is the new driver, yes. I compiled it and uploaded the package to GitHub so that it is available in the plugin for users to download and install on Unraid.

Link to comment
5 hours ago, ich777 said:

Why? Please try it, I'm really curious; I've never heard of an issue like the one you are experiencing.

I will give it a shot at a later date, when my wife isn't so fed up with me troubleshooting my server for 5 days straight 🙃.  She's just happy that she can now watch the shows that need transcoding on the back TV and the kids can watch their movies on the main TV.

 

Quote

This is not the right terminology: the older driver is still from the production branch, and the fact that a new driver is released doesn't automatically mean that 515.86.01 is deprecated...

Understood, but I guess we have different understandings of the definition of 'deprecated'.  They removed it from THE most recent driver tool, so yes, it's still available, but it's no longer the most recent recommended production package. If I use this tool, r515 isn't listed; they've moved on to r525: https://www.nvidia.com/download/index.aspx

 

Quote

This is the new driver yes, I compiled it and uploaded the package to GitHub so that is available and in the plugin for users to download and install on Unraid.

Separate question - does your plugin automatically fetch the most recent driver package from Nvidia's site, or do you have to fetch each one manually and compile it for your GitHub?  Seems like a very manual process.

Link to comment

Hello there,

is there any chance to get access to the nvidia-settings options? I'm trying to reduce power consumption, but without access to the nvidia-settings options that won't work.

Or does anyone know another option to tweak settings without nvidia-settings?
 

cheers,

skies

Link to comment
36 minutes ago, skies said:

is there any chance to get access to the nvidia-settings options? I'm trying to reduce power consumption, but without access to the nvidia-settings options that won't work.

No, that won't work, because you need a GUI for that, and even if you boot in GUI mode Unraid doesn't ship all the necessary libraries for it.

 

36 minutes ago, skies said:

Or does anyone know another option to tweak settings without nvidia-settings?

Simply run this command from an Unraid terminal:

nvidia-persistenced

wait a minute or two and the card should drop into P8, which is the lowest power state.

 

You can also put this command in the go file so that it persists across reboots and you don't have to enter it manually every time.
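For reference, a go file with that command appended might look like the sketch below. Everything besides the stock `emhttp` line is an addition here, and you should adapt it to whatever is already in your own go file:

```shell
#!/bin/bash
# /boot/config/go -- executed once at every boot
# start the Management Utility (stock Unraid default)
/usr/local/sbin/emhttp &

# keep the Nvidia card in persistence mode so it can drop to P8 at idle
# (do NOT add this if the card will be passed through to a VM)
nvidia-persistenced
```

You can check the resulting power state afterwards with `nvidia-smi --query-gpu=pstate,power.draw --format=csv`.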

 

Please don't use this command if you are planning to use the card in a VM, because it can crash your server as soon as you start the VM.

Link to comment

Thanks @ich777.

 

That is actually what I am doing at the moment (I am using "nvidia-smi --persistence-mode=1"). In this state, the card is down to 28 watts of power consumption.

However, that does not seem to be the lowest consumption. As soon as I start and stop a Windows VM with Nvidia drivers installed, consumption goes down to 17 watts. So the Windows VM seems to make additional changes that persist even after shutting it down and loading the Unraid Nvidia drivers again.

So I am trying to find out what the Windows VM does to lower the consumption even more.

And still, even lower consumption should be possible (<10 watts).

So for further changing clocks, timings etc., the features of nvidia-settings seem to be required.

I am very open to any suggestions regarding that topic.

 

cheers,

skies

Link to comment
10 hours ago, ich777 said:

Please don't use this command if you are planning to use the card in a VM because this can crash your server as soon as you start the VM.

 

Using "nvidia-smi --persistence-mode=1" instead of nvidia-persistenced (never tried that one, actually) brings the card to P8 and lets you pass it through to a VM without crashes or any other additional modifications. At least for me.

 

Link to comment
1 hour ago, skies said:

Using "nvidia-smi --persistence-mode=1" instead of nvidia-persistenced (never tried that one, actually) brings the card to P8 and lets you pass it through to a VM without crashes or any other additional modifications. At least for me.

Don't use this anymore:

--persistence-mode=1

because Nvidia will deprecate it soon, and even if you are using it, please disable it before starting a VM; you can definitely crash your entire server because persistence mode is not meant for such a use case.

 

1 hour ago, skies said:

That is actually what I am doing at the moment (I am using "nvidia-smi --persistence-mode=1"). In this state, the card is down to 28 watts of power consumption.

Where are you seeing that, in nvidia-smi or on a power meter? And is the value nvidia-smi displays even accurate?

 

1 hour ago, skies said:

As soon as I start and stop a Windows VM with installed Nvidia-drivers, consumption goes down to 17 watts. So the Windows VM seems to make additional changes that persist even after shutting it down and loading the Unraid-nvidia-drivers again.

There is a little bit of a misunderstanding here... The Windows driver does nothing to the Unraid Nvidia driver; it does something to the card itself that changes the power consumption, if what nvidia-smi displays is even accurate.

This behavior is also not consistent between all cards, and it seems to me that it has something to do with how it's implemented in the BIOS of the GPU itself, but I really can't tell for sure.

 

You have to understand that the card isn't fully initialized by nvidia-persistenced; it only tricks the card into a state where it thinks it sits idle on the desktop, and that's why it's pulled down to P8. But as said above, nvidia-persistenced doesn't initialize the card fully.

 

Anyway, if you are really sure that the power consumption is higher (measured from the wall), then do something like this in a user script at startup:

virsh start VMNAME
sleep 120
virsh shutdown VMNAME

This will basically start the VM once, wait 120 seconds so that it is fully booted (in case Windows Update kicks in), and shut the VM down again after those 120 seconds.
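A slightly more defensive version of that idea, as a user-script sketch (the VM name and the timings are assumptions, use your own):

```shell
#!/bin/bash
# start the VM once so its Windows driver reinitializes the card,
# then shut it down again
VM="Windows10"        # hypothetical VM name -- replace with yours
virsh start "$VM"
sleep 120             # give Windows time to boot completely
virsh shutdown "$VM"  # graceful ACPI shutdown
# wait until the VM has really powered off before finishing
while virsh domstate "$VM" 2>/dev/null | grep -q running; do
    sleep 5
done
```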

Link to comment
10 hours ago, ich777 said:
12 hours ago, skies said:

That is actually what I am doing at the moment (I am using "nvidia-smi --persistence-mode=1"). In this state, the card is down to 28 watts of power consumption.

Where are you seeing that, in nvidia-smi or on a power meter? Is that even true if nvidia-smi displays that?

 

Both. nvidia-smi says 28 W before / 17-18 W after; my power-metering plug shows about 10 W less.

And after all the fiddling, 18 W is the lowest I see with the 3090 (Gigabyte).

 

Link to comment

Just installed the Nvidia-Driver plugin and I can no longer access the Web GUI - it locked up and now loads indefinitely when trying to access from a new browser tab.

 

Any advice? I can try connecting an old keyboard/mouse to the server and connect to the TV next to my NAS but it's not booted in GUI mode. I'd rather avoid a hard shutdown.

 

Edit: no output from HDMI. I can connect to my shares via SMB still. I tried telnet on my old MacBook and the connection was refused. Have not tried SSH.

 

Edit2: Oddly I was able to connect via web GUI on another device. Not sure why it stopped working on my other computer. I uninstalled the plugin and initiated a reboot that for some reason is taking ages to complete. It just says "System is going down..." and has been counting for like five minutes.

Edited by Kyle W
Link to comment
6 hours ago, Kyle W said:

Just installed the Nvidia-Driver plugin and I can no longer access the Web GUI - it locked up and now loads indefinitely when trying to access from a new browser tab.

6 hours ago, Kyle W said:

Edit2: Oddly I was able to connect via web GUI on another device. Not sure why it stopped working on my other computer.

Can you please share a bit more information about your system? Can you maybe post your Diagnostics, even without the Nvidia Plugin installed?

 

It seems a bit of a coincidence that the WebGUI stopped working while you were installing the plugin; were you able to do everything from the other computer? The plugin shouldn't harm the WebGUI whatsoever - maybe if the module crashes, but even then you shouldn't be able to connect to Unraid at all.

Next time, please try to SSH into your server and pull the Diagnostics from the CLI by simply typing:

diagnostics

they will be saved to your USB Boot device.
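Assuming SSH is enabled and the server is reachable under the default hostname (both assumptions, adjust as needed), the whole sequence from another machine is roughly:

```shell
# from your desktop or laptop:
ssh root@tower   # or use the server's IP address
# then, in the Unraid shell:
diagnostics      # saves a zip to the USB boot device (typically /boot/logs/)
```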

 

6 hours ago, Kyle W said:

Edit: no output from HDMI. I can connect to my shares via SMB still. I tried telnet on my old MacBook and the connection was refused. Have not tried SSH.

Do you have telnet enabled? AFAIK it is now disabled by default.

 

Can you try to install the plugin again and pull the Diagnostics again so that I can see what's going on there?

Did you install the plugin because you want to enable transcoding?

Link to comment

Last night after attempting to trigger a system shutdown and waiting for about 5 minutes, I was still able to access the GUI from my MacBook so I stopped the array and tried again. I waited for another few minutes but ended up just doing a hard shutdown. After that, everything seems to be working fine including my Macinabox VM.

 

Tbh I'm not sure what purpose I have for this plugin at the moment, I threw an old Quadro P4000 into my NAS just for fun. I've only had this system running for about two weeks and I'm pretty new to Unraid in general, though I'm considering getting Plex running and/or hosting some game servers. I don't need to pass through the GPU to my Mac VM since it's just for a BlueBubbles client. It's a Ryzen 5700G, 32GB system with 3 8TB drives in the array and a 1TB NVMe cache along with the aforementioned Quadro P4000.

 

I've attached diagnostics, but to be honest I'm not comfortable installing this plugin again right now. I was not sure how to SSH into the server last night (I will be installing PuTTY on my Windows machines today) but I did check just now and both SSH and Telnet are disabled.

server-diagnostics-20221201-0935.zip

Link to comment
2 hours ago, Kyle W said:

I've attached diagnostics, but to be honest I'm not comfortable installing this plugin again right now.

That's why it would be great to have Diagnostics, to see what happened. Maybe the driver crashed, but when it crashes there is most of the time something wrong with the hardware itself; are you sure that the P4000 is working correctly?

I can only tell you that, judging from the download counts for the driver packages for Unraid 6.11.5, about 6,000 to 10,000 people are using the plugin, and I assume that they have no issue whatsoever...

 

2 hours ago, Kyle W said:

I've attached diagnostics, but to be honest I'm not comfortable installing this plugin again right now. I was not sure how to SSH into the server last night (I will be installing PuTTY on my Windows machines today) but I did check just now and both SSH and Telnet are disabled.

From a Mac this should work:

ssh [email protected]

 

You can even try the same from a Terminal on a Windows machine (WSL is installed on most Windows instances nowadays by default).

Link to comment

I've noticed lately that my Nvidia RTX A2000 seems to behave more like it's been passed through to a VM when it's used in Docker containers.

 

Steam Headless is running and making use of the card. When that happens, the card no longer outputs a video signal to the TV, causing this error to appear in the Unraid log:

Dec  3 09:55:32 Moulin-Rouge kernel: nvidia-modeset: WARNING: GPU:0: Unable to read EDID for display device DP-0
Dec  3 09:55:32 Moulin-Rouge kernel: nvidia-modeset: WARNING: GPU:0: Unable to read EDID for display device DP-0
Dec  3 09:55:35 Moulin-Rouge kernel: nvidia-modeset: WARNING: GPU:0: Unable to read EDID for display device DP-0
Dec  3 09:55:35 Moulin-Rouge kernel: nvidia-modeset: WARNING: GPU:0: Unable to read EDID for display device DP-0
Dec  3 09:55:35 Moulin-Rouge kernel: nvidia-modeset: WARNING: GPU:0: Unable to read EDID for display device DP-0
Dec  3 09:55:35 Moulin-Rouge kernel: nvidia-modeset: WARNING: GPU:0: Unable to read EDID for display device DP-0
Dec  3 09:55:36 Moulin-Rouge kernel: traps: light-locker[24133] trap int3 ip:14d76b777ca7 sp:7ffce24d21e0 error:0 in libglib-2.0.so.0.6600.8[14d76b73b000+88000]
Dec  3 09:55:51 Moulin-Rouge  emhttpd: read SMART /dev/sdf
Dec  3 09:55:58 Moulin-Rouge kernel: nvidia-modeset: WARNING: GPU:0: Unable to read EDID for display device DP-0
Dec  3 09:55:58 Moulin-Rouge kernel: nvidia-modeset: WARNING: GPU:0: Unable to read EDID for display device DP-0
Dec  3 09:56:00 Moulin-Rouge kernel: traps: xdg-desktop-por[25388] trap int3 ip:153139c26ca7 sp:7ffdf0571510 error:0 in libglib-2.0.so.0.6600.8[153139bea000+88000]
Dec  3 09:56:00 Moulin-Rouge kernel: traps: xdg-desktop-por[25349] trap int3 ip:1522ceb10ca7 sp:7fffb2f9a690 error:0 in libglib-2.0.so.0.6600.8[1522cead4000+88000]
Dec  3 09:56:26 Moulin-Rouge kernel: nvidia-modeset: WARNING: GPU:0: Unable to read EDID for display device DP-0
Dec  3 09:56:26 Moulin-Rouge kernel: nvidia-modeset: WARNING: GPU:0: Unable to read EDID for display device DP-0
Dec  3 09:56:36 Moulin-Rouge kernel: fossilize_repla[27672]: segfault at dc0 ip 0000561cb3c07ead sp 000015249261c940 error 4 in fossilize_replay[561cb3bbb000+23e000]
Dec  3 09:56:36 Moulin-Rouge kernel: Code: 00 85 c0 74 13 48 8b 76 08 e9 0f ed ff ff 0f 1f 80 00 00 00 00 45 31 c0 44 89 c0 c3 90 8b 46 10 85 c0 75 61 55 53 48 83 ec 08 <48> 8b 2f 0f b6 85 c1 0e 00 00 84 c0 74 0d 48 83 c4 08 5b 5d c3 66
Dec  3 09:56:36 Moulin-Rouge kernel: fossilize_repla[27670]: segfault at dc0 ip 0000561cb3c07ead sp 00001524ac676940 error 4 in fossilize_replay[561cb3bbb000+23e000]
Dec  3 09:56:36 Moulin-Rouge kernel: Code: 00 85 c0 74 13 48 8b 76 08 e9 0f ed ff ff 0f 1f 80 00 00 00 00 45 31 c0 44 89 c0 c3 90 8b 46 10 85 c0 75 61 55 53 48 83 ec 08 <48> 8b 2f 0f b6 85 c1 0e 00 00 84 c0 74 0d 48 83 c4 08 5b 5d c3 66
Dec  3 09:56:37 Moulin-Rouge kernel: fossilize_repla[27669]: segfault at dc0 ip 0000561cb3c07ead sp 000015249261c940 error 4 in fossilize_replay[561cb3bbb000+23e000]
Dec  3 09:56:37 Moulin-Rouge kernel: Code: 00 85 c0 74 13 48 8b 76 08 e9 0f ed ff ff 0f 1f 80 00 00 00 00 45 31 c0 44 89 c0 c3 90 8b 46 10 85 c0 75 61 55 53 48 83 ec 08 <48> 8b 2f 0f b6 85 c1 0e 00 00 84 c0 74 0d 48 83 c4 08 5b 5d c3 66

 

It didn't happen before, so I'm trying to work out if it's a bug in Unraid 6.11.5, a Steam Headless bug, or a driver bug?

Edited by dopeytree
Link to comment
7 minutes ago, dopeytree said:

I've noticed lately that my Nvidia RTX A2000 seems to behave more like it's been passed through to a VM when it's used in Docker containers.

I don't understand, what do you mean by that?

If you've passed it through to a VM, it wouldn't behave like that.

 

7 minutes ago, dopeytree said:

Steam Headless is running and making use of the card. When that happens, the card no longer outputs a video signal to the TV, causing this error to appear in the Unraid log:

Is your card used as the primary display output for the console?

 

If yes, this seems pretty normal to me, since I assume the container is taking over one or more outputs of the card, but there is nothing I can do about that because it has nothing to do with this plugin. Your question would be better suited to the Steam Headless support thread.

 

EDIT: Please don't double post; it's also just a waste of time, because you have to wait until you get an answer. I saw you've already posted on the Steam Headless container support thread.

Link to comment

Hi, thanks - it's not passed through to a VM, only to Steam Headless, which is a Docker container.

It's also not the primary GPU, as there is a built-in Intel GPU in the i9.

The intel gpu is on a separate HDMI cable.

The Nvidia card uses a Mini DisplayPort -> HDMI adapter which is always plugged in.

Will see where we get with Steam Headless support. It's not a big deal, just odd that it stopped working.

 

Edited by dopeytree
Link to comment

Hello,
As mentioned by another user in the last few pages, I got a Tesla P4 too for $100 and I would love to be able to use it with Plex on Unraid in the future. It can do as much as a 10xx card for less power draw at a lower price, and it only uses one slot since it's a small card.
I read your post that you don't own it and only support commercial GPUs, but since this one is going to be sold to a lot of people on eBay, maybe you can have a look at it.

 

Thank you.

Link to comment
5 hours ago, Aspiro said:

As mentioned by another user in the last few pages, I got a Tesla P4 too for $100 and I would love to be able to use it with Plex on Unraid in the future. It can do as much as a 10xx card for less power draw at a lower price, and it only uses one slot since it's a small card.

That was misinformation; the latest driver supports Tesla cards, and it is the exact same driver that you can download from Nvidia.

 

Another user with a P100 confirmed that it is working, see this post:

 

...and the user who reported the issue with his P4 also solved it (he had forgotten to put --runtime=nvidia in the template).
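For context, the relevant template settings correspond roughly to the manual `docker run` sketch below. The container image and GPU UUID are placeholders; on Unraid these are normally set through the Docker template's Extra Parameters and variables rather than on the command line:

```shell
# --runtime=nvidia is the part that is easy to forget;
# get your GPU UUID from `nvidia-smi -L`; the image is just an example
docker run -d --name=plex \
  --runtime=nvidia \
  -e NVIDIA_VISIBLE_DEVICES='GPU-xxxxxxxx' \
  -e NVIDIA_DRIVER_CAPABILITIES='all' \
  plexinc/pms-docker
```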

Link to comment

Hello!

I am reaching out to the community since I can't seem to find a solution to my error being:

 NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

I understand that this is a common issue, but I believe I have run through the common troubleshooting fixes related to this error. I don't have much time to look through every comment on this thread relating to my error, so I am creating this post hoping someone has a potential fix for me. I will attach my diagnostic logs so you all can get a clearer picture of my scenario.

Thanks!

messiah-diagnostics-20221205-0041.zip

Link to comment
28 minutes ago, SaltShakerOW said:

I am reaching out to the community since I can't seem to find a solution to my error being:

You have bound your card to VFIO:

01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP104 [GeForce GTX 1070] [10de:1b81] (rev a1)
	Subsystem: ASUSTeK Computer Inc. GP104 [GeForce GTX 1070] [1043:8598]
	Kernel driver in use: vfio-pci
	Kernel modules: nvidia_drm, nvidia
01:00.1 Audio device [0403]: NVIDIA Corporation GP104 High Definition Audio Controller [10de:10f0] (rev a1)
	Subsystem: ASUSTeK Computer Inc. GP104 High Definition Audio Controller [1043:8598]
	Kernel driver in use: vfio-pci

 

Please unbind it in Tools -> System Devices and reboot.

Link to comment
19 hours ago, ich777 said:

You have bound your card to VFIO:

01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP104 [GeForce GTX 1070] [10de:1b81] (rev a1)
	Subsystem: ASUSTeK Computer Inc. GP104 [GeForce GTX 1070] [1043:8598]
	Kernel driver in use: vfio-pci
	Kernel modules: nvidia_drm, nvidia
01:00.1 Audio device [0403]: NVIDIA Corporation GP104 High Definition Audio Controller [10de:10f0] (rev a1)
	Subsystem: ASUSTeK Computer Inc. GP104 High Definition Audio Controller [1043:8598]
	Kernel driver in use: vfio-pci

 

Please unbind it in Tools -> System Devices and reboot.

 

I thought I had unbound my card from VFIO? The box is unchecked and I rebooted, unless I am looking at the wrong thing...

image.png

Edited by SaltShakerOW
Link to comment
2 hours ago, SaltShakerOW said:

I thought I had unbound my card to VFIO? The box is unchecked and I rebooted unless I am looking at the wrong thing...

Looking at your Diagnostics again: you are binding it through the kernel command line via syslinux.conf, so it is still bound to VFIO:

BOOT_IMAGE=/bzimage initrd=/bzroot pcie_acs_override=downstream vfio-pci.ids=10de:1b81,10de:10f0

 

Please remove the VFIO binding from the syslinux.conf and reboot.
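In practice that means trimming the `append` line of the default boot entry in `/boot/syslinux/syslinux.cfg`, roughly like this (the surrounding entries will differ per system):

```shell
# before -- the card is bound to vfio-pci at boot:
#   append pcie_acs_override=downstream vfio-pci.ids=10de:1b81,10de:10f0 initrd=/bzroot
# after -- binding removed so the Nvidia driver can claim the card:
append pcie_acs_override=downstream initrd=/bzroot
```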

Link to comment
