
[Plugin] Nvidia-Driver


ich777


16 minutes ago, erak said:

Why does it change driver version to "latest" after a system update, even if I had previously chosen "Production Branch"?

A bit more insight would be better.

Can you post your Diagnostics after this happened? What have you done exactly, are you on the legacy 470.xx series driver?

Did you wait for the Plugin Update Helper to finish before rebooting to the new version?

 

I really can't tell anything without more information.

  • Like 1
Link to comment
7 hours ago, robertpaolella said:

I just upgraded to 6.12.3 from 6.12.2 and as soon as I did

Please uninstall the Nvidia Driver plugin, reboot, install the Nvidia Driver Plugin again and reboot.

 

The driver package, or rather the plugin, is not installed correctly. I think you also didn't wait until the Plugin Update Helper told you that it's now safe to reboot (these are notifications that show up after you trigger the update).

-rw------- 1 root root 132200960 Aug 11 16:32 nvidia-535.98-6.1.38-Unraid-1.txz

The listing above shows that the package was only partially downloaded.
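A quick way to check a package file for this kind of truncation is sketched below. It assumes the .txz is an xz-compressed tarball and that `xz` is available on the system; `check_pkg` is a hypothetical helper name and the path in the example call is a placeholder, not the plugin's documented location.

```shell
# Hypothetical helper: report whether a driver package looks complete.
# Assumption: the .txz is an xz-compressed tarball and `xz` is installed.
check_pkg() {
    pkg="$1"
    if [ ! -f "$pkg" ]; then
        echo "missing: $pkg"
        return 2
    fi
    # `xz -t` tests the compressed stream without extracting it;
    # a truncated download fails this test.
    if xz -t "$pkg" 2>/dev/null; then
        echo "ok: $pkg"
    else
        echo "incomplete or corrupt: $pkg"
        return 1
    fi
}

# Example call (placeholder directory, adjust to where the package is stored):
# check_pkg /path/to/nvidia-535.98-6.1.38-Unraid-1.txz
```

If the check fails, reinstalling the plugin as described above should fetch the package again.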

Link to comment
3 hours ago, ich777 said:

A bit more insight would be better.

Can you post your Diagnostics after this happened? What have you done exactly, are you on the legacy 470.xx series driver?

Did you wait for the Plugin Update Helper to finish before rebooting to the new version?

 

I really can't tell anything without more information.

I've been running Unraid for about a year and I think this has happened almost every time, so I thought it was by design for some reason. I'm not using the legacy driver; the card is a T600. I updated today from 6.12.2 to 6.12.3. I think I waited for the OK, but now I can't say for certain. I usually do, though, and the version type change has happened anyway.

server-diagnostics-20230812-1909.zip

Link to comment
43 minutes ago, erak said:

Updated today from 6.12.2 to 6.12.3. I think I waited for the ok, but now I can't say for certain.

I think you were waiting for the message that Unraid itself gives you to reboot, but that changed a bit in the last versions.

After the download and everything else from Unraid itself has finished, you should get messages that the Plugin Update Helper kicked in, and it will notify you when it's safe to reboot. The Plugin Update Helper is my take on how to update third-party plugins, but since the messages now fade out in the new Unraid versions, they are a little hard for the user to catch.

 

The Plugin Update Helper also does account for the selected branch btw. ;)

 

The driver for the new Unraid version was not downloaded properly by the Plugin Update Helper, which is why it fell back to the latest branch:

Aug 12 14:26:56 Micro-8 root: -----------------Downloading Nvidia Driver Package v535.98------------------
Aug 12 14:26:56 Micro-8 root: ----------This could take some time, please don't close this window!------------
Aug 12 14:27:25 Micro-8 root:
Aug 12 14:27:25 Micro-8 root: ----Successfully downloaded Nvidia Driver Package v535.98, please wait!----
Aug 12 14:27:27 Micro-8 root:
Aug 12 14:27:27 Micro-8 root: -----------------Installing Nvidia Driver Package v535.98-------------------
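To look for these messages yourself after a reboot, something like the following sketch could be used. It assumes the log is at /var/log/syslog and that the message wording matches the excerpt above; `nvidia_pkg_messages` is a hypothetical helper name.

```shell
# Hypothetical helper: print the driver download/install messages from a log file.
# Assumption: the wording "Nvidia Driver Package" is taken from the excerpt above
# and may differ between plugin versions.
nvidia_pkg_messages() {
    logfile="${1:-/var/log/syslog}"
    # -F matches the text literally instead of as a regular expression
    grep -F 'Nvidia Driver Package' "$logfile"
}

# Example call:
# nvidia_pkg_messages /var/log/syslog
```

Going by the explanation above, download lines that appear during boot suggest the package was fetched at boot time rather than in advance by the Plugin Update Helper.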

 

Sorry about that but maybe now that you are aware of the Plugin Update Helper you'll catch it next time. :)

 

After the Plugin Update Helper kicks in, the Nvidia Driver download (about 200MB) to the USB Flash device usually takes about 30 seconds.

 

Hope that explains it a bit better.

  • Like 1
Link to comment
4 hours ago, Necrotic said:

but I think the lower power state would be a good thing to be built into the plugin when possible

Thanks for the recommendation, but I won't do that because it can also be quite dangerous: if someone is using a card for both Docker and a VM, this would crash the server, and it would be really hard for me to detect that someone is doing that.

 

The simplest way of enabling persistence mode is putting this line in the go file:

nvidia-persistenced


If you want to end the process, do something like this:

kill $(pidof nvidia-persistenced)
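For anyone who prefers not to edit the file by hand, here is a minimal sketch that appends the line only if it is not already present. It assumes the go file lives at /boot/config/go (the usual location on Unraid, but check your system); `add_to_go` is a hypothetical helper name.

```shell
# Hypothetical helper: append a line to the go file only if it is not there yet.
add_to_go() {
    line="$1"
    gofile="${2:-/boot/config/go}"   # assumed default location of the go file
    # -x matches the whole line, -F treats it as a fixed string, -q suppresses output
    if ! grep -qxF "$line" "$gofile" 2>/dev/null; then
        printf '%s\n' "$line" >> "$gofile"
    fi
}

# Example call:
# add_to_go nvidia-persistenced
```

Running it twice leaves a single entry, so it is safe to repeat. The go file only runs at boot, so the change takes effect after the next reboot unless nvidia-persistenced is also started manually.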

 

Link to comment
31 minutes ago, ich777 said:

Thanks for the recommendation, but I won't do that because it can also be quite dangerous: if someone is using a card for both Docker and a VM, this would crash the server, and it would be really hard for me to detect that someone is doing that.

 

The simplest way of enabling persistence mode is putting this line in the go file:

nvidia-persistenced


If you want to end the process, do something like this:

kill $(pidof nvidia-persistenced)

 

I get it, maybe not a default setting but an option with a giant warning all over it would be enough.

In any case, great job with this and the other stuff, I still use the gameservers :P

Link to comment
5 hours ago, Necrotic said:

maybe not a default setting but an option with a giant warning all over it would be enough.

Sorry, but no, I don't want to integrate that because there would be too many variables. For example, if a user enables this, forgets about it, and then tries to boot a VM with the GPU passed through, the server will ultimately crash and nobody will know why. Even if he uploads Diagnostics, no one will be able to see what caused the issue.

 

I think one line in the go file is doable even for a Linux newbie. :)

Link to comment

This is for an RTX 4060. I am having an issue where my GPU reports that persistenced is enabled, but it appears not to be working. I followed spaceinvaderone's video and, as you can see, the script is working. The GPU works when transcoding in Plex. However, before and after running persistenced the GPU shows a wattage of 50W/115W. It does not appear to be doing anything for the GPU, even though the GPU shows a state of P8. The wattage should be much lower. I'm currently using the latest driver. Does this sound like a bad driver issue? Is there a driver we know works better, or could it be because of how new the RTX 4060 is?

Screenshot 2023-08-12 at 12.07.56 PM.png

Screenshot 2023-08-12 at 12.08.58 PM.png

Screenshot 2023-08-13 at 7.46.51 AM.png

Link to comment
1 hour ago, FlyingTexan said:

This is for an RTX 4060. I am having an issue where my GPU reports that persistenced is enabled, but it appears not to be working. I followed spaceinvaderone's video and, as you can see, the script is working. The GPU works when transcoding in Plex. However, before and after running persistenced the GPU shows a wattage of 50W/115W. It does not appear to be doing anything for the GPU, even though the GPU shows a state of P8. The wattage should be much lower. I'm currently using the latest driver. Does this sound like a bad driver issue? Is there a driver we know works better, or could it be because of how new the RTX 4060 is?

Screenshot 2023-08-12 at 12.07.56 PM.png

Screenshot 2023-08-12 at 12.08.58 PM.png

Screenshot 2023-08-13 at 7.46.51 AM.png

Is there a reason why you are using the open source driver and not the latest one?

 

image.png

 

Also have you followed

 

image.png

Link to comment
9 minutes ago, SimonF said:

Is there a reason why you are using the open source driver and not the latest one?

 

image.png

 

Also have you followed

 

image.png

 

It doesn't show that as open source for me. It shows that as the latest. I checked under plugins and it says the plugin is up-to-date as of 07-02-2023. Why would it show differently than what you are seeing?

1609501965_Screenshot2023-08-13at9_23_31AM.thumb.png.7e4edaeb150ada1b6bd3af3aa67a2411.png

 

Edited by FlyingTexan
Link to comment
1 hour ago, FlyingTexan said:

RTX 4060.  I am having an issue where my GPU

Have you measured yet how much it draws from the wall?

RTX 40 series are known to report the wrong power draw in nvidia-smi.

 

Please, please don't trust software readings; they might be off by a lot in some cases.

 

Also, please use nvidia-persistenced instead of the script. Putting nvidia-persistenced in the go file is enough, and it doesn't need to be run at intervals. The reason why I recommend nvidia-persistenced is that the documentation from Nvidia states that persistence mode from nvidia-smi is soon to be removed <- who knows how long they will leave it in...
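For reference, the values discussed here can be read in one line with nvidia-smi's CSV query. The sketch below shows that query as a comment and a small formatter for its output; `summarize_gpu` is a hypothetical helper, and the reported draw is still a software reading with the caveat mentioned above.

```shell
# Query name, power state, reported draw, power limit and persistence mode
# (run on the server itself):
# nvidia-smi --query-gpu=name,pstate,power.draw,power.limit,persistence_mode --format=csv,noheader

# Hypothetical helper: turn one CSV line from the query above into a short summary.
# Assumption: fields are separated by a comma followed by a space.
summarize_gpu() {
    echo "$1" | awk -F', ' '{ printf "%s: state %s, reported draw %s of %s, persistence %s\n", $1, $2, $3, $4, $5 }'
}

# Example call with an illustrative line built from the numbers reported in this thread:
# summarize_gpu "NVIDIA GeForce RTX 4060, P8, 50.00 W, 115.00 W, Enabled"
```

Comparing this reading with a measurement at the wall is the only way to tell whether the reported draw is plausible.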

Link to comment
2 minutes ago, ich777 said:

Have you measured yet how much it draws from the wall?

RTX 40 series are known to report the wrong power draw in nvidia-smi.

 

Please, please don't trust software readings; they might be off by a lot in some cases.

 

Also, please use nvidia-persistenced instead of the script. Putting nvidia-persistenced in the go file is enough, and it doesn't need to be run at intervals. The reason why I recommend nvidia-persistenced is that the documentation from Nvidia states that persistence mode from nvidia-smi is soon to be removed <- who knows how long they will leave it in...

My system idle seems to support the measurement. With the dockers off and the array down I still draw around 120W. I don't understand your next statement about the user script. If nvidia-persistenced is in the script, what does it matter? It doesn't hurt to run the command again, does it? I'm confused about what should be done differently.

Link to comment
12 minutes ago, FlyingTexan said:

 

It doesn't show that as open source for me. It shows that as the latest. I checked under plugins and it says the plugin is up-to-date as of 07-02-2023. Why would it show differently than what you are seeing?

1609501965_Screenshot2023-08-13at9_23_31AM.thumb.png.7e4edaeb150ada1b6bd3af3aa67a2411.png

 

My plugin version is 2023.07.06. It may also be related to the OS version; I am currently on 6.12.3.

Link to comment
1 minute ago, FlyingTexan said:

My system idle seems to support the measurement. With the dockers off and array down I still draw around 120w.

Again, I don't know anything about your system without the Diagnostics. :)

 

1 minute ago, FlyingTexan said:

I'm confused about what should be done differently.

Remove the script and just put nvidia-persistenced in your go file. ;)

  • Like 1
Link to comment
2 minutes ago, ich777 said:

Neither @SimonF nor I can tell more without your Diagnostics, but as already mentioned in the previous post, don't trust software readouts:

 

Telling me to simply not trust the numbers I'm seeing isn't a solution. My UPS numbers support the GPU using more wattage. I can't provide system diagnostics because I am not home. When I try to do it remotely, it stalls and never completes.

Link to comment
1 minute ago, FlyingTexan said:

Telling me to simply not trust the numbers I'm seeing isn't a solution.

No it's not but again, I don't know anything about your system...

 

1 minute ago, FlyingTexan said:

I can't provide system diagnostics because I am not home.

Then please pull the Diagnostics from the system when you are at home and post them here, easy fix. :)

 

1 minute ago, FlyingTexan said:

Wouldn't that only work until the GPU is used?

No, persistenced runs in the background and simulates that it is in use so that it can go to a lower power state, to be precise P8.

 

You wouldn't even need persistenced if you, for example, ran a transcode on your GPU and then ended it; this would also put your card in a lower power state <- this is only for explanation purposes, it's clear to me that nobody does this and that it is not practical.

  • Like 1
Link to comment
6 minutes ago, ich777 said:

No it's not but again, I don't know anything about your system...

 

Then please pull the Diagnostics from the system when you are at home and post them here, easy fix. :)

 

No, persistenced runs in the background and simulates that it is in use so that it can go to a lower power state, to be precise P8.

 

You wouldn't even need persistenced if you, for example, ran a transcode on your GPU and then ended it; this would also put your card in a lower power state <- this is only for explanation purposes, it's clear to me that nobody does this and that it is not practical.

diagnostics were posted above

  • Thanks 1
Link to comment

@FlyingTexan are you really sure that the card is the cause of the issue, I mean really sure...?

 

You have 23 disks in your system, are these disks in standby mode or are these disks running?

I have 8 spinning disks (not to mention the SSDs) in my system, and when they are spun down my entire system consumes about 50W; when I spin them up, I'm hovering at about 90W. This is also something you should consider.

 

Did you install the card just recently?

What are you using the card for? Only for transcoding? For example, an Nvidia T400, T600, or T1000 is a much better choice for transcoding because these cards draw only a few watts at idle, and the T400 has a maximum (and locked) TDP of 35W.

If you are using it only for transcoding, you can also use the iGPU, which is more than capable of transcoding a few 4K streams, consumes nearly 0W at idle, and draws a maximum of about 15W while transcoding.

I would also recommend that you upgrade Unraid to something more recent like 6.12.3

 

It also looks like one of your containers is constantly restarting, or restarting very often, for whatever reason; this could also cause higher power usage.

 

I also see that your load average for the last 15 minutes is around 4.73, which can also be the cause of all of this, since the 11600K is a really power-hungry CPU when it has to Turbo. If the value 4.73 doesn't tell you much: my load average for the last 15 minutes is about 0.7 <- but I'm not doing anything special other than running my update scripts for the plugins, pulling in some other values and uploading to GitHub; I think a TV stream is also running from the DVB-C cards in my system.
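To check this value on your own system, a minimal sketch is shown below. It relies on /proc/loadavg, which on Linux lists the 1-, 5- and 15-minute load averages as its first three fields; `load15` is a hypothetical helper name.

```shell
# Hypothetical helper: print the 15-minute load average.
# /proc/loadavg holds the 1-, 5- and 15-minute averages as its first three fields.
load15() {
    awk '{ print $3 }' "${1:-/proc/loadavg}"
}

# Example call:
# load15
```

The same three numbers are also shown at the end of the `uptime` output.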

 

But what I also see is that Radarr, unrar and so on are all working and especially downloading can cause spikes:

grafik.png.f3d89918f5ee22a9b4a23b970e9100aa.png

 

Also, I don't think that the driver is bad, otherwise it wouldn't work...

There is nothing I can do about it if the card won't draw less power (if you still think it's the dGPU), since this is a driver thing that Nvidia has to fix, and as you can see from the Reddit posts, users were able to narrow the issue down to the value not being displayed correctly. Even if you report that on the Nvidia forums, I don't think you will get help there because of something that I mention in the last paragraph of this post.

 

1 minute ago, FlyingTexan said:

diagnostics were posted above

Yes, I see that, but please be a bit more patient. I usually go through the full diagnostics to possibly identify other things too.

 

 

Please also note that I won't give any further support because you are running another script that violates the Nvidia EULA and may also be causing this. If you need more hints, search for this line in your syslog and see how it continues:

Aug 12 22:14:00 Tower  emhttpd: /usr/local/emhttp/plugins/user.scripts/backgroundScript.sh

 

  • Like 3
Link to comment
3 hours ago, ich777 said:

@FlyingTexan are you really sure that the card is the cause of the issue, I mean really sure...?

 

You have 23 disks in your system, are these disks in standby mode or are these disks running?

I have 8 spinning disks (not to mention the SSDs) in my system, and when they are spun down my entire system consumes about 50W; when I spin them up, I'm hovering at about 90W. This is also something you should consider.

 

Did you install the card just recently?

What are you using the card for? Only for transcoding? For example, an Nvidia T400, T600, or T1000 is a much better choice for transcoding because these cards draw only a few watts at idle, and the T400 has a maximum (and locked) TDP of 35W.

If you are using it only for transcoding, you can also use the iGPU, which is more than capable of transcoding a few 4K streams, consumes nearly 0W at idle, and draws a maximum of about 15W while transcoding.

I would also recommend that you upgrade Unraid to something more recent like 6.12.3

 

It also looks like one of your containers is constantly restarting, or restarting very often, for whatever reason; this could also cause higher power usage.

 

I also see that your load average for the last 15 minutes is around 4.73, which can also be the cause of all of this, since the 11600K is a really power-hungry CPU when it has to Turbo. If the value 4.73 doesn't tell you much: my load average for the last 15 minutes is about 0.7 <- but I'm not doing anything special other than running my update scripts for the plugins, pulling in some other values and uploading to GitHub; I think a TV stream is also running from the DVB-C cards in my system.

 

But what I also see is that Radarr, unrar and so on are all working and especially downloading can cause spikes:

grafik.png.f3d89918f5ee22a9b4a23b970e9100aa.png

 

Also, I don't think that the driver is bad, otherwise it wouldn't work...

There is nothing I can do about it if the card won't draw less power (if you still think it's the dGPU), since this is a driver thing that Nvidia has to fix, and as you can see from the Reddit posts, users were able to narrow the issue down to the value not being displayed correctly. Even if you report that on the Nvidia forums, I don't think you will get help there because of something that I mention in the last paragraph of this post.

 

Yes, I see that, but please be a bit more patient. I usually go through the full diagnostics to possibly identify other things too.

 

 

Please also note that I won't give any further support because you are running another script that violates the Nvidia EULA and may also be causing this. If you need more hints, search for this line in your syslog and see how it continues:

Aug 12 22:14:00 Tower  emhttpd: /usr/local/emhttp/plugins/user.scripts/backgroundScript.sh

 


Obviously this is falling on deaf ears. Ask about the idle draw of a card and the response is always “why do you have a card”. The following response is basically teaching me that computers run on electricity. Yes, I’m double, triple sure, because as I’ve already stated, I measured it. Don’t want to believe that? Fine. But the screenshots are accurate.

  • Thanks 1
Link to comment
27 minutes ago, FlyingTexan said:

Obviously this is falling on deaf ears.

I could say the same... Have you even read my post? Especially:

4 hours ago, ich777 said:

There is nothing I can do about it if the card won't draw less power (if you still think it's the dGPU), since this is a driver thing that Nvidia has to fix, and as you can see from the Reddit posts, users were able to narrow the issue down to the value not being displayed correctly. Even if you report that on the Nvidia forums, I don't think you will get help there because of something that I mention in the last paragraph of this post.

4 hours ago, ich777 said:

I would also recommend that you upgrade Unraid to something more recent like 6.12.3

 

30 minutes ago, FlyingTexan said:

Don’t want to believe that? Fine. But the screenshots are accurate.

I have never said that the screenshots are not accurate.

 

I'm just trying to help, but I really lose more and more interest because of such posts. :/

Not a single question answered... :D

 

What should I say? Anyway, I think you also didn't read this part:

4 hours ago, ich777 said:

Please also note that I won't give any further support because you are running another script that violates the Nvidia EULA and may also be causing this

 

Sorry, but I'm really upset by your post: not a single question answered, and I simply tried to help. Just sad... :P

  • Like 1
Link to comment

 

“Did you install the card just recently?

What are you using the card for? Only for transcoding? For example, an Nvidia T400, T600, or T1000 is a much better choice for transcoding because these cards draw only a few watts at idle, and the T400 has a maximum (and locked) TDP of 35W.

If you are using it only for transcoding, you can also use the iGPU, which is more than capable of transcoding a few 4K streams, consumes nearly 0W at idle, and draws a maximum of about 15W while transcoding.”

 

 

What does a single one of your questions have to do with what I asked? Nothing. I asked about the driver and GPU idle wattage. Using Quick Sync has what to do with that? Nothing. What am I using the card for? Who cares, what does that have to do with idle card draw? Nothing. When did I install it? Who cares, what does that have to do with idle card draw? Nothing.

 

You wanted diagnostics; OK, you got them. I posted screenshots of everything as well. Your response was to say not to believe them. I told you they were accurate and your response was to question me again. So I thank you for ceasing your help.

Edited by FlyingTexan
  • Thanks 1
Link to comment
