[Plugin] Nvidia-Driver


On 2/28/2023 at 1:57 AM, ich777 said:

First of all I would strongly recommend that you remove the nvidia.conf file in the modprobe.d directory, you are not using the OpenSource Driver module...

 

Have you tried booting with CSM (Legacy) instead of UEFI mode yet? Please also make sure that you are on the latest BIOS version and that you've enabled Above 4G Decoding and Resizable BAR Support in the BIOS.

 

I would also try to switch from MACVLAN to IPVLAN in the Docker settings first.

 

Is this only happening with Tdarr? (If yes: IIRC it is nothing new that Tdarr can crash your server, but TBH I really don't know whether that has been fixed already.)

Have you tried disabling Tdarr yet to see whether this also happens with Emby/Jellyfin/Plex?

 

Can you test the card in another system? (Install the drivers and put some 3D load on it for about 10 minutes; something like FurMark should do the job just fine.)

I have removed the nvidia.conf file, I was planning to use the open source drivers in the future. I didn't think it would hurt leaving it in there.

 

I just changed it to CSM boot and it's working well. I am on the latest BIOS version and have already enabled Above 4G Decoding and Resizable BAR.

 

When the crashes happened, TDARR was not transcoding any videos and was not using the GPU at all. I just had another crash now, the TDARR docker was launched but not doing anything. The GPU utilization was 0. Should I still try to stop the TDARR docker and test?

 

It would be difficult, but I could try in a while from now. I was hoping I could pass the GPU through to a Windows 10 VM and stress test it that way. The load on the GPU has been relatively low: the most I've ever seen is 20% utilization and maybe 80 W. Dumb question, but even if this was caused by the GPU and the GPU went haywire, wouldn't at most the Docker containers using the GPU crash, like TDARR? Why would the kernel crash when it doesn't (or shouldn't) rely on the GPU while running in headless mode?

 

During the crash that happened again, I stopped all docker containers and attempted to do a soft shutdown using the `powerdown` command and the `poweroff` command but nothing happened after waiting 15 minutes. I'm not good at Linux so I had to do a hard reboot.

I have a new diagnostics log for you, taken right after the crash.

dragon-diagnostics-20230302-0307.zip


  • Author
25 minutes ago, LimesKey said:

When the crashes happened, TDARR was not transcoding any videos and was not using the GPU at all. I just had another crash now, the TDARR docker was launched but not doing anything. The GPU utilization was 0. Should I still try to stop the TDARR docker and test?

Then the crash is most certainly not related to the Nvidia Driver.

 

Have you changed MACVLAN to IPVLAN in your Docker settings yet? MACVLAN is notorious for crashing servers with similar kernel panics.
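For anyone unsure which mode their server is currently using, here is a minimal sketch (not part of the plugin) that lists Docker networks still on the macvlan driver. It assumes the `docker` CLI is on the PATH and uses the standard `docker network ls --format` template fields `.Name` and `.Driver`; the function names are made up for illustration.

```python
import subprocess


def list_docker_networks() -> str:
    """Return 'name driver' lines for all Docker networks (requires the docker CLI)."""
    result = subprocess.run(
        ["docker", "network", "ls", "--format", "{{.Name}} {{.Driver}}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_network_drivers(output: str) -> dict:
    """Map each network name to its driver; lines that are not 'name driver' are skipped."""
    drivers = {}
    for line in output.splitlines():
        parts = line.split()
        if len(parts) == 2:
            drivers[parts[0]] = parts[1]
    return drivers


def macvlan_networks(output: str) -> list:
    """Return the sorted names of networks whose driver is macvlan."""
    drivers = parse_network_drivers(output)
    return sorted(name for name, driver in drivers.items() if driver == "macvlan")
```

Running `macvlan_networks(list_docker_networks())` on the server returns an empty list once no network uses macvlan any more. The switch itself is still done in the Docker settings, as described above.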

 

27 minutes ago, LimesKey said:

I have removed the nvidia.conf file, I was planning to use the open source drivers in the future. I didn't think it would hurt leaving it in there.

May I ask why? The open source module has no real benefit...

I'm currently having trouble trying to update the NVIDIA driver. I am running the newest plugin, 2023.03.02.

When I select "latest" or manually select "v530.30.02" it still acts like it is trying to download v470.141.03 and spits out "Can't download Nvidia Driver Package v470.141.03". Is this a plugin error or something with my server?

Screenshot 2023-03-02 101652.png

Screenshot 2023-03-02 101612.png

  • Author
1 hour ago, tusculumgolfer said:

When I select "latest" or manually select "v530.30.02" it still acts like it is trying to download v470.141.03 and spits out "Can't download Nvidia Driver Package v470.141.03". Is this a plugin error or something with my server?

I can't reproduce this on my end, I've even tried to downgrade to the driver version 470.141.03 and then set it again to latest and it properly downloaded the latest (v530.30.02) driver version.

@ich777 Thanks a lot for your answer before. I was able to compile and load the vGPU guest drivers, and they're working fine. However, I have some questions:

1) Can I post the scripts and steps to make the packages in this thread? (the user must provide the driver package from nvidia, the scripts do not download any drivers)

2) I'm using your plugin to load and manage the GUID of the GPU for docker, however I'm having an issue with the GUID: it resets every time the vm does a cold boot, which is an inconvenience, and I'm not sure about the cause of this.

Edited by midi

  • Author
25 minutes ago, midi said:

1) Can I post the scripts and steps to make the packages in this thread? (the user must provide the driver package from nvidia, the scripts do not download any drivers)

I don't think anyone can come after you for posting the scripts to build the non-public drivers, as long as you also describe how to legally obtain a license and the vGPU binaries that are needed to do so.

 

26 minutes ago, midi said:

2) I'm using your plugin to load and manage the GUID of the GPU for docker, however I'm having an issue with the GUID: it resets every time the vm does a cold boot, which is an inconvenience, and I'm not sure about the cause of this.

Sorry, I really can't help here because I simply don't have a card that is capable of using vGPU, and I don't even know how this all works.

1 minute ago, ich777 said:

I don't think anyone can come after you for posting the scripts to build the non-public drivers, as long as you also describe how to legally obtain a license and the vGPU binaries that are needed to do so.

Yeah, technically those who are into this already know where to get the drivers legally (the Nvidia Enterprise Portal). The goal of the scripts is just to patch and repack the drivers, not to explain how to get them, but I will point out how to get those.

 

3 minutes ago, ich777 said:

Sorry, I really can't help here because I simply don't have a card that is capable of using vGPU, and I don't even know how this all works.

There is a community project that unlocks the drivers to enable vGPU capabilities on consumer graphics cards, if you want to test. It also works with natively supported cards (like the Tesla P4/P40...), where it just lifts some Nvidia limitations. Yes, it is against Nvidia's EULA, which is why I'm not sharing any files here, but anyone can get the files legally from Nvidia's Enterprise Portal.

8 hours ago, ich777 said:

I can't reproduce this on my end, I've even tried to downgrade to the driver version 470.141.03 and then set it again to latest and it properly downloaded the latest (v530.30.02) driver version.

I've tried restarting the server and it is still doing the same thing. Do you have any suggestions for a resolution?

  • Author
3 hours ago, tusculumgolfer said:

I've tried restarting the server and it is still doing the same thing. Do you have any suggestions for a resolution?

Sounds a bit complicated but can you do the following:

  1. Uninstall the Plugin
  2. Reboot
  3. Install the Plugin again
  4. Reboot once more or restart the Docker service

I've now tried it again on another machine running 6.11.5 and it is working there as it should too.

I will ask another user whether he can test that for me too.
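After reinstalling, one way to confirm which driver is actually loaded (rather than which one the plugin page says is selected) is to ask `nvidia-smi` directly. The sketch below is only an illustration, not how the plugin itself works: it assumes `nvidia-smi` is available on the host and uses its standard `--query-gpu=driver_version --format=csv,noheader` options. The function names are hypothetical.

```python
import subprocess


def read_driver_versions() -> str:
    """Return one driver version per line, one line per GPU (requires nvidia-smi)."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def loaded_versions(output: str) -> set:
    """Collect the distinct, non-empty version strings from nvidia-smi output."""
    return {line.strip() for line in output.splitlines() if line.strip()}


def matches_selection(output: str, selected: str) -> bool:
    """True if every GPU reports exactly the selected version (a leading 'v' is ignored)."""
    wanted = selected.lstrip("v")
    return loaded_versions(output) == {wanted}
```

For example, `matches_selection(read_driver_versions(), "v530.30.02")` would be false on a server that is still running 470.141.03.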

  • Author
5 hours ago, midi said:

There is a community project that unlocks the drivers to enable vGPU capabilities on consumer graphics cards, if you want to test

No thank you…

 

5 hours ago, midi said:

technically those who are into this already know where to get the drivers legally (the nvidia enterprise portal), the goal of the scripts is just patch and repack the drivers not how to get them

Sure thing, I know that too, but redistribution of the drivers is not allowed AFAIK, and that's why I have not created a plugin for that (and I really don't want them to take down my GitHub).

So sharing the scripts here on the forums or even in a Git repo of yours should be fine.

13 hours ago, tusculumgolfer said:

When I select "latest" or manually select "v530.30.02" it still acts like it is trying to download v470.141.03 and spits out "Can't download Nvidia Driver Package v470.141.03". Is this a plugin error or something with my server?

 

I guess it's more likely something wrong with your flash drive... it's also working fine here.

 

Just as a simple test run: downgraded to 470, then up to 525, and now 530, all fine.

 

[Three screenshots attached]

10 hours ago, midi said:

@ich777 Thanks a lot for your answer before. I was able to compile and load the vGPU guest drivers, and they're working fine. However, I have some questions:

1) Can I post the scripts and steps to make the packages in this thread? (the user must provide the driver package from nvidia, the scripts do not download any drivers)

2) I'm using your plugin to load and manage the GUID of the GPU for docker, however I'm having an issue with the GUID: it resets every time the vm does a cold boot, which is an inconvenience, and I'm not sure about the cause of this.

I have the same issue with GUID resets on cold boot as well.
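This does not explain why the GUID changes, but a small sketch like the following could at least detect a changed GUID after a cold boot, so the value used for Docker can be updated. It assumes `nvidia-smi` is available inside the VM and uses its standard `--query-gpu=index,uuid --format=csv,noheader` options; the function names and the example GUIDs are hypothetical.

```python
import subprocess


def query_gpu_guids() -> str:
    """Return raw 'index, uuid' lines from nvidia-smi (requires a loaded driver)."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=index,uuid", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_guids(output: str) -> dict:
    """Map GPU index to GUID; malformed lines are skipped."""
    guids = {}
    for line in output.splitlines():
        fields = [field.strip() for field in line.split(",")]
        if len(fields) == 2 and all(fields):
            guids[fields[0]] = fields[1]
    return guids


def changed_guids(before: dict, after: dict) -> dict:
    """Return {index: (old, new)} for every GPU whose GUID differs between two snapshots."""
    changed = {}
    for index in sorted(set(before) | set(after)):
        if before.get(index) != after.get(index):
            changed[index] = (before.get(index), after.get(index))
    return changed
```

Saving the output of `parse_guids(query_gpu_guids())` before a shutdown and comparing it with a fresh snapshot after the next cold boot shows which GPU index received a new GUID.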

How to add CUDA Toolkit and cuDNN?

  • Author
30 minutes ago, realies said:

How to add CUDA Toolkit and cuDNN?

You have to install that in the container.
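To see whether a given container already ships them, a quick check can be run from inside the container. This is only a rough sketch: it looks for the `nvcc` compiler on the PATH and asks the system linker lookup for libraries named `cudart` and `cudnn`. `ctypes.util.find_library` can miss libraries that are installed outside the linker's search path, so a `False` here is a hint rather than proof. The function name is made up for illustration.

```python
import ctypes.util
import shutil


def cuda_components() -> dict:
    """Report which CUDA pieces are discoverable in the current environment."""
    return {
        # CUDA Toolkit compiler, found via the PATH
        "nvcc": shutil.which("nvcc") is not None,
        # CUDA runtime library, found via the system linker lookup
        "cudart": ctypes.util.find_library("cudart") is not None,
        # cuDNN library, found via the system linker lookup
        "cudnn": ctypes.util.find_library("cudnn") is not None,
    }
```

If `cuda_components()` reports everything as missing, the toolkit and cuDNN most likely still need to be installed in that container or supplied by a different image.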

Can someone confirm this will allow the use of one Nvidia card for Docker and one Nvidia card for VMs? Also, does this allow the non-VM card to run the Unraid GUI in addition to Docker? I am running a Ryzen 5900X, so no iGPU.

 

Thanks, 

  • Author
5 hours ago, galloglypg said:

Can someone confirm this will allow the use of one Nvidia card for Docker and one Nvidia card for VMs?

Yes, this is of course possible.

 

5 hours ago, galloglypg said:

Also, does this allow the non-VM card to run the Unraid GUI in addition to Docker? I am running a Ryzen 5900X, so no iGPU.

Yes, but you may have to switch slots, because you are forced to use the card that outputs the BIOS screen/console output from Unraid on boot.

You may also have to change a line in the config (if you only get a blinking cursor in GUI mode), but this is something for later.

Hi,

 

Probably VERY stupid question! (Sorry).

 

But what is the benefit of keeping the driver up to date if v470.141.03 is working as expected ?

 

I use an Nvidia GTX 1650 for Plex transcoding in Docker.

 

What are the benefits to moving to say v525.89.02 ?

 

Thanks

 

D.

 

 

  • Author
8 hours ago, BigDanT said:

But what is the benefit of keeping the driver up to date if v470.141.03 is working as expected ?

No benefits at all as long as you are using it for transcoding.

 

8 hours ago, BigDanT said:

What are the benefits to moving to say v525.89.02 ?

No benefits at all as long as you are using it for transcoding.

 

The answer changes a bit if you use it for things other than transcoding, because you get performance improvements for CUDA-accelerated workloads or if you are using the card with the Steam-Headless container.

On 3/2/2023 at 10:55 PM, ich777 said:

Sounds a bit complicated but can you do the following:

  1. Uninstall the Plugin
  2. Reboot
  3. Install the Plugin again
  4. Reboot once more or restart the Docker service

I've now tried it again on another machine running 6.11.5 and it is working there as it should too.

I will ask another user whether he can test that for me too.

This worked. Thank you!

9 hours ago, ich777 said:

No benefits at all as long as you are using it for transcoding.

 

No benefits at all as long as you are using it for transcoding.

 

The answer changes a bit if you use it for things other than transcoding, because you get performance improvements for CUDA-accelerated workloads or if you are using the card with the Steam-Headless container.

Thanks @ich777 keep up the good work !

 

It's good to know I'm not missing out on anything. My instinct is to always keep up with the latest updates, be that Unraid or your drivers. It's comforting to know that in this instance, if it ain't broke, don't update :)

Hey @ich777 I was pointed in your direction for this question: 
I'm currently still running 6.8.2 with the old linuxserver Nvidia build. I'm looking to finally upgrade to the latest stable Unraid. In the past, however, I've always reverted to the vanilla build before upgrading. The old plugin no longer works and is no longer supported, so I am curious whether you would suggest trying to get my hands on a vanilla build of 6.8.2 before attempting to upgrade to 6.11 and/or any other gotchas you might think of regarding my current situation?

Thanks

  • Author
7 minutes ago, manofcolombia said:

attempting to upgrade to 6.11 and/or any other gotchas you might think of regarding my current situation?

Just upgrade to 6.11.5, no need for an old 6.8.2 build.

3 minutes ago, ich777 said:

Just upgrade to 6.11.5, no need for an old 6.8.2 build.

Thanks for the confirmation to set my mind at ease!

I have been having some issues recently with the plugin. From time to time I come home and find that my server GUI is showing a 500 Internal Server Error. SSH into the server works fine, and all Docker containers and VMs are running, but the web GUI just fails to load. If I SSH in and run "/etc/rc.d/rc.php-fpm restart", the GUI comes back, sometimes for a short time and sometimes for a while, but eventually I find myself back at the 500 Internal Server Error. I have this plugin as well as GPU Statistics by b3rs3rk installed. The other weird thing is that when I do get the GUI back, the Statistics on the main page for my GPU are all blank.

I have an ASUS Strix 1050 Ti installed, with the latest official driver offered by the plugin. I only use the GPU for Tdarr transcoding and have had no issues while the GPU is doing that task. The output files also look fine, so I don't think that my GPU is dying. Any thoughts?

eos-diagnostics-20230101-1425.zip

Edited by Tithonius
added diagnostics

  • Author
6 hours ago, Tithonius said:

The other weird thing is that when I do get the GUI back, the Statistics on the main page for my GPU are all blank.

I don't see an Nvidia GPU connected to your system in the attached Diagnostics, and I don't even see the Nvidia Driver plugin installed...

 

The only thing that I see is an Intel iGPU. May I ask why you aren't using the iGPU for HW transcoding?
