
[Plugin] Nvidia-Driver


ich777

Recommended Posts

2 hours ago, timbo72 said:

Tesla P4

First of all, I would recommend that you upgrade to 6.12.6.

 

2 hours ago, timbo72 said:

but my X79-UD3 doesn't have either the 4g

This is sometimes named differently and may be something like "Support Large Address Space" or similar, and it hides in one of the PCI sub-menus.

 

Please also make sure that you have the latest BIOS version installed.

Have you tried putting the card in another PCIe slot yet (if that's possible on this board)?

You can also try to boot with UEFI instead of Legacy.

 

I have also seen that you are using S3 Sleep; are you actively using it? Please keep in mind that a Tesla is a datacenter card and was never meant to be put into sleep mode. That said, I don't recommend putting the server into sleep mode anyway, because on some systems this causes issues.

 

BTW, you don't have to create a nvidia-bug-report.log since all the necessary information is in the Diagnostics.

Link to comment

Hey guys.

 

I'm having trouble installing the Nvidia drivers again after I uninstalled them a while back because I didn't need them anymore. I slotted my old card back into my system and am looking to use it in Docker again, but I'm getting this error that doesn't really point me in any direction. If anyone has seen this and can tip me off in the right direction, please do. The screenshot of the error and my Unraid diagnostics are attached to this reply.


Thanks!

 

 

Screen Shot 2023-12-30 at 2.48.00 AM.png

messiah-diagnostics-20231230-0253.zip

Link to comment
13 minutes ago, SaltShakerOW said:

If anyone has seen this and can tip me off in the right direction, please do. The screenshot of the error and my Unraid diagnostics are attached to this reply.

Are you sure that your server is able to reach the GitHub API?

Do you have any AdBlocking somewhere on your network?

Do you have any Unifi gear on your network?

 

From what I can see, something is preventing the server from communicating with the GitHub API, so please make sure that it can reach it.
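If you want to double-check, you can test connectivity from the Unraid terminal; a minimal sketch, assuming the standard GitHub API endpoint at api.github.com:

curl -sI https://api.github.com | head -n 1
# a successful reply starts with "HTTP/2 200" (or "HTTP/1.1 200");
# a timeout or DNS error points to something on the network blocking it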

 

However, I would also strongly recommend that you upgrade to the latest Unraid version 6.12.6

Link to comment
32 minutes ago, ich777 said:

Glad that you've solved it!

Oh, I think that those cards don't have a cooler on them, correct?

Yeah, as you mentioned, they are ex-datacentre cards, so positive pressure on the cold side and negative on the hot side.

In my case it would be passively drawing hot air off the HDD stack. I've knocked up a shroud for a spare 40mm fan, which has done the trick.

  • Like 1
Link to comment
10 hours ago, ich777 said:

Are you sure that your server is able to reach the GitHub API?

Do you have any AdBlocking somewhere on your network?

Do you have any Unifi gear on your network?

 

From what I can see, something is preventing the server from communicating with the GitHub API, so please make sure that it can reach it.

 

However, I would also strongly recommend that you upgrade to the latest Unraid version 6.12.6

I'm honestly not sure why the GitHub API was being blocked, but an update to 6.12.6 fixed it. Thanks for the pointers!

  • Like 1
Link to comment
  • 2 weeks later...

A few days after I reboot my Unraid server, my 1070 Ti becomes unresponsive to any of the Docker applications that normally use it. The log starts spamming this:
Jan 13 20:56:15 Tower kernel: NVRM: GPU 0000:29:00.0: RmInitAdapter failed! (0x22:0x56:762)
Jan 13 20:56:15 Tower kernel: NVRM: GPU 0000:29:00.0: rm_init_adapter failed, device minor number 0
 

I have attached my diagnostics zip. Just so that I'm not rebooting twice a week, I have switched to CPU encoding, but it's sort of odd that this keeps happening. Anyone have any advice?

tower-diagnostics-20240113-2055.zip

Link to comment
2 hours ago, moose1be said:

Anyone have any advice?

When you look into your logs:

 

Jan 13 04:39:37 Tower kernel: NVRM: Xid (PCI:0000:29:00): 79, pid='<unknown>', name=<unknown>, GPU has fallen off the bus.
Jan 13 04:39:37 Tower kernel: NVRM: GPU 0000:29:00.0: GPU has fallen off the bus.

 

1/ Maybe check if the GPU is still working overall in another system?

2/ Power supply checked?

3/ Tested in another PCIe slot?

4/ Checked if it's really pushed into the PCIe slot?

5/ BIOS check: Above 4G Decoding and Resizable BAR activated; you may also try it the other way round.

 

Overall, hard to say; the logs say the card went off ... and I assume you didn't pull it out while running ;) The most common causes are power, the PCIe slot, a defective card, ...

  • Like 1
Link to comment
17 minutes ago, moose1be said:

I guess I will poke around.

An Xid error code 79 is a pretty generic error; it can mean basically anything (thermal issue, GPU firmware issue, computer firmware issue, bus error, …).

 

Most likely it's an issue with your BIOS; check if there is a BIOS update and make sure that you've enabled Above 4G Decoding and Resizable BAR support, as @alturismo suggested.
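If the card drops off the bus again, you can check for further NVRM/Xid messages from the terminal; a minimal sketch, assuming the standard Unraid syslog location:

grep -i "NVRM: Xid" /var/log/syslog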

Link to comment

Hi guys, I think I fucked up.

I've got a P40 for really cheap and I thought it would be a great card for Unraid - Docker - Stable Diffusion / ComfyUI.
But I think it is not supported in the driver v545.29.06 :D

 

Is there a way to get this to work in Docker?

 

My System

Dell R730xd (no ReBAR support - only Above 4G, SR-IOV, etc.)

2x E5-2680 V4

128GB DDR ECC RAM

A lot of SSD Storage

 

With best regards

Marc

Link to comment
40 minutes ago, dezai said:

Attached the Diagnostics / screenshot of the P40 in System Devices / screenshot of your great driver with the working P2000

Did you read the first and second post in this thread on how to get it working in a Docker container?

 

The card is recognized just fine.

 

Can you spot why it's not working:

Jan 20 12:18:36 R730XD kernel: NVRM: GPU 0000:82:00.0: GPU does not have the necessary power cables connected.
Jan 20 12:18:36 R730XD kernel: NVRM: GPU 0000:82:00.0: RmInitAdapter failed! (0x24:0x1c:1436)
Jan 20 12:18:36 R730XD kernel: NVRM: GPU 0000:82:00.0: rm_init_adapter failed, device minor number 1

 

I also noticed this:

Jan 20 12:18:36 R730XD kernel: NVRM: Persistence mode is deprecated and will be removed in a future release. Please use nvidia-persistenced instead.

 

Did you install the user script from SpaceinvaderOne? If yes, please remove it and simply append this line to your go file:

nvidia-persistenced
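For reference, the go file lives at /boot/config/go on the USB boot device; after the change the end of the file could look something like this (a minimal sketch, the rest of your go file may differ):

#!/bin/bash
# Start the Management Utility
/usr/local/sbin/emhttp &
# start the NVIDIA persistence daemon so the GPU can drop into a low-power idle state
nvidia-persistenced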

 

  • Like 1
Link to comment
3 hours ago, ich777 said:

Did you read the first and second post in this thread on how to get it working in a Docker container?

 

The card is recognized just fine.

 

Can you spot why it's not working:

Jan 20 12:18:36 R730XD kernel: NVRM: GPU 0000:82:00.0: GPU does not have the necessary power cables connected.
Jan 20 12:18:36 R730XD kernel: NVRM: GPU 0000:82:00.0: RmInitAdapter failed! (0x24:0x1c:1436)
Jan 20 12:18:36 R730XD kernel: NVRM: GPU 0000:82:00.0: rm_init_adapter failed, device minor number 1

 

I also noticed this:

Jan 20 12:18:36 R730XD kernel: NVRM: Persistence mode is deprecated and will be removed in a future release. Please use nvidia-persistenced instead.

 

Did you install the user script from SpaceinvaderOne? If yes, please remove it and simply append this line to your go file:

nvidia-persistenced

 

 

OK, I did some research and it was the last thing I thought about.

I'm using the R730xd with 2x 750W PSUs; the CPUs are 2x 120W and the P40 is 250W, so I thought I'd be fine with the 2x 750W (I've also tried non-redundant mode).

After some research I found out that Dell limits the GPU TDP to 150W (even if you use the 225W riser) if you are using the 750W PSUs.

 

So my GTX 970 is working fine / P2000 is also working just fine.

But the P40 doesn't work...

I'm trying to get 2x 1100W PSUs now...

 

Yep, I did use the SpaceinvaderOne script; I've deleted it now and put the nvidia-persistenced command into the go file.

 

And yep, I've been using your driver for a long time :)

 

 

So a little conclusion... if my Dell server gave the P40 the full 250W, it should work out of the box with the driver? 😮 😮

 

 

 

 

Edited by dezai
Link to comment
6 hours ago, dezai said:

But the P40 doesn't work...

I think you misread what I meant; it says this:

GPU does not have the necessary power cables connected.

 

Are you sure that you've plugged the auxiliary power into the card, since it needs auxiliary power connected (I think it's an 8-pin PCIe power cable from the PSU)?

 

6 hours ago, dezai said:

it should work out of the box with the driver? 😮 😮

Yes, as long as you've connected the auxiliary power:

grafik.png.e1f68f61e84f8844efa0f131e25d40d0.png

  • Like 2
Link to comment
6 hours ago, ich777 said:

I think you misread what I meant; it says this:

GPU does not have the necessary power cables connected.

 

Are you sure that you've plugged the auxiliary power into the card, since it needs auxiliary power connected (I think it's an 8-pin PCIe power cable from the PSU)?

 

Yes, as long as you've connected the auxiliary power:

grafik.png.e1f68f61e84f8844efa0f131e25d40d0.png

 

No no, all good :) it is an 8-pin EPS 12V from the riser and an 8-pin EPS 12V into the GPU - and I've got a cable for exactly that setup / R730 - Tesla P40.

 

But the R730xd can only support 250W GPUs with the 1100W PSUs.
My server has the 750W PSUs - if you are using those, the PCIe riser only delivers 125-150W to the 8-pin 12V EPS connector.

That's the reason why every GPU in my house works correctly but the P40 did not.

If you are using the 1100W PSUs you will get 225W from the 8-pin 12V EPS connector / 75W from the PCIe slot.

 

I've got 2x 1100W PSUs from a reseller for 17.50€ each; I can hopefully test it next week.

 

Quote from a Reddit thread with the same problem:
GPUs require a redundant 1100W power supply and GPU enablement kit

 

I will get back in touch here when the PSUs are in the server.


 

Edited by dezai
  • Like 1
Link to comment

I'm running into an issue when trying to use my NVIDIA RTX 3070 in HandBrake; it errors out with this:

[hevc_nvenc @ 0x147319ec4640] Driver does not support the required nvenc API version. Required: 12.1 Found: 11.1

 

My current driver version: v515.43.04

 

Any thoughts?

Link to comment

Well, all I needed to do was upgrade my Unraid version; I was a little behind. (This meant I didn't have access to the latest drivers - the "latest" available was 515.x.x.)

 

Now I have access to the latest drivers, which have what I need! 

Edited by theMs
clarification
  • Like 2
Link to comment
On 1/21/2024 at 2:30 PM, dezai said:

 

No no, all good :) it is an 8-pin EPS 12V from the riser and an 8-pin EPS 12V into the GPU - and I've got a cable for exactly that setup / R730 - Tesla P40.

 

But the R730xd can only support 250W GPUs with the 1100W PSUs.
My server has the 750W PSUs - if you are using those, the PCIe riser only delivers 125-150W to the 8-pin 12V EPS connector.

That's the reason why every GPU in my house works correctly but the P40 did not.

If you are using the 1100W PSUs you will get 225W from the 8-pin 12V EPS connector / 75W from the PCIe slot.

 

I've got 2x 1100W PSUs from a reseller for 17.50€ each; I can hopefully test it next week.

 

Quote from a Reddit thread with the same problem:
GPUs require a redundant 1100W power supply and GPU enablement kit

 

I will get back in touch here when the PSUs are in the server.


 

Hey guys, I can confirm that the "wrong cable" errors came from the 2x 750W power supplies.

I'm now using the 2x 1100W and it is working.

 

But I have another problem: the Tesla P40 doesn't go into the idle state - it stays at P0 and uses 50W. Is there a fix for that?

  • Like 1
Link to comment
31 minutes ago, dezai said:

But I have another problem: the Tesla P40 doesn't go into the idle state - it stays at P0 and uses 50W. Is there a fix for that?

Append this to your go file:

nvidia-persistenced

 

For testing purposes you can also issue the command from the terminal, wait a minute or two, and it should automatically go to P8 (depending on whether something is using the card or not).
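To verify, you can check the reported performance state and power draw with nvidia-smi; a minimal sketch (exact output varies by driver version):

nvidia-persistenced
# wait a minute or two, then check the performance state - "P8" means the card has idled down
nvidia-smi --query-gpu=name,pstate,power.draw --format=csv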

  • Like 1
Link to comment
