[Plugin] Nvidia-Driver


ich777


On 9/24/2022 at 12:02 AM, DavidNguyen said:

I think either the card is faulty or there is a hardware incompatibility at this point unless there's anything else I can try.

From what I see in your Diagnostics, everything seems fine to me; nvidia-smi reports your card correctly.

I can also see that your first card is recognized as an amdgpu device and your second card is your Nvidia.

Have you tried removing nvidia-persistenced from your go file and/or disabling your iGPU to see if that helps?

Do you use your iGPU for something on your system?

Link to comment
1 hour ago, ich777 said:

From what I see in your Diagnostics, everything seems fine to me; nvidia-smi reports your card correctly.

I can also see that your first card is recognized as an amdgpu device and your second card is your Nvidia.

Have you tried removing nvidia-persistenced from your go file and/or disabling your iGPU to see if that helps?

Do you use your iGPU for something on your system?

Tried removing the persistenced setting, without luck.

I thought of the iGPU too; I had tried disabling it before.

It just so happens that this was my previous daily driver and it was being used to run PiKVM.

Link to comment
16 hours ago, ich777 said:

Please post your Diagnostics.

Transcoding works fine. It seems to be an energy-saving issue that appears after a while. steam-headless may be triggering a driver bug: it's not working at the moment, so I have to shut it down manually, and that seems to be the point at which the dashboard stops showing any info in the GPU statistics.

Anyway, over to the pros. Thanks.

 

moulin-rouge-diagnostics-20220925-2155.zip

Edited by dopeytree
Link to comment
10 hours ago, dopeytree said:

Transcoding works fine. It seems to be an energy-saving issue that appears after a while. steam-headless may be triggering a driver bug: it's not working at the moment, so I have to shut it down manually, and that seems to be the point at which the dashboard stops showing any info in the GPU statistics.

Please try to boot with Legacy and not UEFI.

 

Do you have a display connected to the card? It seems like something is wrong here:

kernel: nvidia-modeset: WARNING: GPU:0: Unable to read EDID for display device DP-0

Or is this some kind of fake dongle device? If so, make sure that it is working properly.
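To check whether the warning shows up again after swapping the cable or dongle, something like the following could be used. This is only a sketch: it assumes the system log is at /var/log/syslog, and the function name edid_warnings is made up for illustration.

```shell
# Hypothetical helper: print nvidia-modeset EDID warnings from a log file.
# Assumption: the log lives at /var/log/syslog unless another path is given.
edid_warnings() {
  grep -F 'nvidia-modeset: WARNING' "${1:-/var/log/syslog}" \
    | grep -F 'Unable to read EDID'
}

# Usage:
# edid_warnings                  # search the default log
# edid_warnings /path/to/syslog  # search a log extracted from a Diagnostics zip
```

No output means the warning was not logged in that file.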

 

BTW I've also noticed this in your go file:

# ------------------------------------------------
# Disables FTP & Telnet 
# ------------------------------------------------

sed -i -e 's/^telnet/#telnet/;s/^ftp/#ftp/' /etc/inetd.conf
/etc/rc.d/rc.inetd restart

Is this really needed anymore? Both services should be disabled by default.
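A quick way to find out whether the snippet still does anything is to list the telnet and ftp entries that are not commented out. This is a minimal sketch; the function name is hypothetical and the default path is the one from the go file snippet above.

```shell
# Hypothetical helper: list telnet/ftp entries that are NOT commented out.
# Default path is taken from the go file snippet; pass another file to override.
active_inetd_services() {
  grep -E '^(telnet|ftp)[[:space:]]' "${1:-/etc/inetd.conf}"
}

# Usage:
# active_inetd_services
```

If this prints nothing, both services are already disabled and the sed line in the go file is redundant.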

Link to comment
On 9/25/2022 at 6:05 PM, ich777 said:

Even if you disable the iGPU with modprobe.d (so that the driver will not load), it will still output the console, so you can keep using PiKVM just fine.

So, I tried disabling the iGPU and it still wouldn't work.

Came back after a while and tried running nvidia-persistenced manually, and suddenly both the Plex & JF containers started successfully.

Then I tried rebooting and running persistenced again, and they wouldn't start.

So, that's another symptom: sometimes it starts, sometimes not, and when it does, it doesn't survive a reboot.

Could be a hardware problem; maybe I should do a teardown to see if there's a busted cap, or flash the vBIOS.

Link to comment
16 hours ago, ich777 said:

Please try to boot with Legacy and not UEFI.

 

Do you have a display connected to the card? It seems like something is wrong here:

kernel: nvidia-modeset: WARNING: GPU:0: Unable to read EDID for display device DP-0

Or is this some kind of fake dongle device? If so, make sure that it is working properly.

 

BTW I've also noticed this in your go file:

# ------------------------------------------------
# Disables FTP & Telnet 
# ------------------------------------------------

sed -i -e 's/^telnet/#telnet/;s/^ftp/#ftp/' /etc/inetd.conf
/etc/rc.d/rc.inetd restart

Is this really needed anymore? Both services should be disabled by default.

Thanks, yeah, I removed that last night but have kept the Nvidia eco persistence setting, as otherwise it idles at 24 W rather than 12 W (it was idling at 7 W before the latest driver update), although this may be the issue. Once the GPU has been used to transcode or game, it doesn't return to its eco state, and some bug is happening.
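To put numbers on the idle draw instead of reading it off the dashboard, a few nvidia-smi samples can be averaged. A rough sketch, assuming the card reports power.draw at all; the avg_watts helper is made up for illustration and simply averages numeric lines from standard input.

```shell
# Hypothetical helper: average the numeric lines read from stdin.
# Non-numeric lines (for example "[N/A]") are ignored.
avg_watts() {
  awk '/^[0-9]+(\.[0-9]+)?$/ { sum += $1; n++ }
       END { if (n) printf "%.2f\n", sum / n }'
}

# Usage: take three samples, five seconds apart, while the GPU is idle.
# for i in 1 2 3; do
#   nvidia-smi --query-gpu=power.draw --format=csv,noheader,nounits
#   sleep 5
# done | avg_watts
```

Running it once right after boot and once after a transcode or gaming session should show whether the card really fails to drop back to its low-power state.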

 

There is a display connected via normal display port cable. 

 

Will try legacy instead of UEFI tomorrow.

Edited by dopeytree
Link to comment
5 hours ago, wgstarks said:

They were collected prior to reboot.

Did you try to upgrade from 6.10.3 to 6.11.0?

 

When was the last time you rebooted? Chances are high that the Plugin Update Helper is not up to date, but that is nothing to worry about if you have an active internet connection on boot, since the driver will be downloaded then.

 

If you don't have an active internet connection on boot, I would recommend that you remove the plugin, reboot, and then install it again from the CA App.

Link to comment
4 hours ago, DavidNguyen said:

Could be a hardware problem, maybe should do a teardown to see if there's a busted cap, or flash vbios.

I'm not too sure anymore. I know that nvidia-persistenced can cause issues on some systems, because it can sometimes introduce higher-than-usual latencies that cause such errors in the container runtime.

Link to comment
18 minutes ago, googen75 said:

I even tried downgrading, but still wont get past Starting Samba:  hangs

Try to remove the nvidia-driver.plg file.

The exact location is: /boot/config/plugins/nvidia-driver.plg

(I would also recommend that you remove the nvidia-driver folder that is located in there).

 

After that you should be able to boot, and you can grab a fresh copy of the Nvidia Driver plugin from the CA App.

 

May I ask where in the world you are located?
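If you would rather keep a copy than delete the files outright, a sketch like this moves both out of the plugins directory. The function name and the default backup directory /boot/nvidia-driver-backup are made up for illustration; if the flash drive is mounted on another machine, pass the real paths as arguments.

```shell
# Hypothetical helper: move the plugin file and its folder into a backup
# directory instead of deleting them.
# $1 = plugins directory (default taken from the path given above)
# $2 = backup directory (assumed name, pick any location you like)
stash_nvidia_plugin() {
  plugins_dir="${1:-/boot/config/plugins}"
  backup_dir="${2:-/boot/nvidia-driver-backup}"
  mkdir -p "$backup_dir" || return 1
  for item in nvidia-driver.plg nvidia-driver; do
    if [ -e "$plugins_dir/$item" ]; then
      mv "$plugins_dir/$item" "$backup_dir/" || return 1
    fi
  done
}

# Usage:
# stash_nvidia_plugin
# stash_nvidia_plugin /mnt/usb/config/plugins /mnt/usb/nvidia-driver-backup
```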

Link to comment
5 hours ago, ich777 said:

I'm not too sure anymore. I know that nvidia-persistenced can cause issues on some systems, because it can sometimes introduce higher-than-usual latencies that cause such errors in the container runtime.

It's started working again. I'm completely confused, because I literally just went to sleep, woke up, and it worked.

Anyway, I've included the diagnostics file again in case you spot a difference from my old file, from when it wasn't working.

tower-diagnostics-20220927-1912.zip

Link to comment
1 minute ago, DavidNguyen said:

It's started working again. I'm completely confused, because I literally just went to sleep, woke up, and it worked.

Nothing changed. As said above, the main reason why it sometimes works and sometimes doesn't is most likely nvidia-persistenced, which you trigger through the go file.
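To rule it out, the line can be commented out rather than deleted, so it is easy to restore later. A minimal sketch, assuming the go file is at /boot/config/go (the usual Unraid location) and that the line starts with nvidia-persistenced; the function name is hypothetical.

```shell
# Hypothetical helper: comment out any line that starts with nvidia-persistenced.
# A backup copy is written next to the file as <file>.bak.
# Assumption: the go file is /boot/config/go unless another path is given.
disable_persistenced() {
  sed -i.bak -E 's/^([[:space:]]*)(nvidia-persistenced.*)$/\1#\2/' \
    "${1:-/boot/config/go}"
}

# Usage (reboot afterwards so the daemon is no longer started):
# disable_persistenced
```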

Link to comment
8 hours ago, ich777 said:

Did you try to upgrade from 6.10.3 to 6.11.0?

 

When was the last time you rebooted? Chances are high that the Plugin Update Helper is not up to date, but that is nothing to worry about if you have an active internet connection on boot, since the driver will be downloaded then.

 

If you don't have an active internet connection on boot, I would recommend that you remove the plugin, reboot, and then install it again from the CA App.

I was in the process of updating when I got the error. I really think it was just a transient problem since I didn’t have any issues downloading the driver manually. Wouldn’t even have posted probably if it hadn’t been for the popup window requesting it.

Link to comment

This plugin hasn't been working normally for a month on my server.

I noticed that it never notified me of a driver update. It never downloaded and installed the update even after rebooting my server. I had to update the driver manually and reboot my server. 

I've always kept the plugin up to date.

 

So I deleted the plugin, rebooted, and re-installed the plugin. Now I can't change the Notification setting from Enabled to Disabled. I don't see any Save button on the plugin page. I also can't change Select Driver Preferred Version setting. I usually select Production Branch, but I can't change it from Latest Branch. 

 

Fortunately this plugin is still working for local gui access and my plex docker. 

 

Does anyone have any idea what is causing these issues?

 

Thanks for the help. 

1404573494_ScreenShot2022-09-27at1_22_55PM.thumb.png.2372d3865071f842496eccdd5f209964.png

179969730_ScreenShot2022-09-27at1_22_02PM.thumb.png.909c1c60e7e913b96388ab1cd6cc1958.png

threadripper19-diagnostics-20220927-1329.zip

Link to comment

🤓 On unraid - 6.11 - please can we get the option to roll back to the driver - 515.65.01 - August 2, 2022

 

The monitor is a Samsung 75" TV and has been connected in the same way for months with the previous driver. It isn't always turned on, but it also isn't expected to output an image (apart from the BIOS and early Unraid startup), so that shouldn't be an issue. The cable is a Mini DisplayPort to HDMI cable (because the RTX A2000 only has Mini DisplayPorts... 4 of them 😱).

 

Many Thanks

 

 

Edited by dopeytree
Link to comment
46 minutes ago, FQs19 said:

Now I can't change the Notification setting from Enabled to Disabled.

Exact same issue over here, I will look into that and report back ASAP.

 

47 minutes ago, FQs19 said:

I also can't change Select Driver Preferred Version setting. I usually select Production Branch, but I can't change it from Latest Branch.

Click Update & Download and it will update the selected version, on my machine this works just fine.

Link to comment
5 minutes ago, ich777 said:

Exact same issue over here, I will look into that and report back ASAP.

 

Click Update & Download and it will update the selected version, on my machine this works just fine.

Glad to see it's not just me.

 

About the Select Preferred Driver Version, I don't want it to keep updating to the latest, which is why I'm trying to select the Production Branch and not Latest Production Branch. Unfortunately, I'm unable to keep the Production Branch selected. 

Link to comment
19 minutes ago, dopeytree said:

🤓 On unraid - 6.11 - please can we get the option to roll back to the driver - 515.65.01 - August 2, 2022

I don't understand, what should that change exactly?

 

Did you reinstall the driver at some point, and/or do you have an iGPU built in? Do you use GUI mode?

 

You can however try to execute this command from the command line and reboot afterwards:

sed -i "/disable_xconfig=/c\disable_xconfig=true" /boot/config/plugins/nvidia-driver/settings.cfg

(but this would be the last thing that I would try)
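For anyone who does try it, a slightly more cautious variant keeps a backup and prints the resulting line. This is only a sketch: it uses a plain s/// substitution instead of the c\ command above, so it only touches lines that start with disable_xconfig=, and the function name is made up.

```shell
# Hypothetical helper: set disable_xconfig=true, keeping a backup as <file>.bak.
# Uses s/// (not the c\ command from the post), anchored to the start of the line.
set_disable_xconfig() {
  cfg="${1:-/boot/config/plugins/nvidia-driver/settings.cfg}"
  sed -i.bak 's/^disable_xconfig=.*/disable_xconfig=true/' "$cfg" || return 1
  # Show the line as it now stands so the change can be confirmed.
  grep '^disable_xconfig=' "$cfg"
}

# Usage (reboot afterwards):
# set_disable_xconfig
```

To undo it, copy settings.cfg.bak back over settings.cfg and reboot again.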

Link to comment
1 minute ago, ich777 said:

Select the Production Branch and click on Update & Download.

AHHH...

Sorry, I just expected there to be a Save button. 

The Update & Download button seemed like it was going to download and update even though I was already up to date.

I clicked Update & Download and it saw that I was already on the Production branch and just finished without updating. The Production Branch is also now selected. 

 

Thanks for the help.

Link to comment
6 minutes ago, FQs19 said:

Sorry, I just expected there to be a Save button. 

No, I changed that a while back to combine both features, because if someone wanted to downgrade or change the version to something else, they had to click Save and afterwards Download. To make it a bit more convenient, you can now just click Update & Download.

 

As said above I will look into the Update notification ASAP.

Link to comment
