
[Plugin] Nvidia-Driver


ich777

Recommended Posts

2 hours ago, timbo72 said:

Tesla P4

First of all, I would recommend that you upgrade to 6.12.6.

 

2 hours ago, timbo72 said:

but my X79-UD3 doesn't have either the 4g

This is sometimes named differently and may be something like "Support Large Address Space" or similar, and it hides in one of the PCI sub-menus.

 

Please also make sure that you have the latest BIOS version installed.

Have you tried putting the card in another PCIe slot yet (if that's possible on this board)?

You can also try to boot with UEFI instead of Legacy.

 

I have also seen that you are using S3 Sleep; are you actively using it? Please keep in mind that a Tesla is a datacenter card and was never meant to be put into sleep mode. That said, I don't recommend putting the server into sleep mode anyway, because on some systems this causes issues.

 

BTW, you don't have to create a nvidia-bug-report.log since all the necessary information is in the Diagnostics.

Link to comment

Hey guys.

 

I'm having trouble installing the Nvidia drivers again after I uninstalled them a while back because I didn't need them anymore. I slotted my old card back into my system and am looking to use it in Docker again, but I'm getting this error that doesn't really point me in any direction. If anyone has seen this and can tip me off in the right direction, please do. The screenshot of the error and my Unraid diagnostics are attached to this reply.


Thanks!

 

 

Screen Shot 2023-12-30 at 2.48.00 AM.png

messiah-diagnostics-20231230-0253.zip

Link to comment
13 minutes ago, SaltShakerOW said:

If anyone has seen this and can tip me off in the right direction, please do. The screenshot of the error and my Unraid diagnostics are attached to this reply.

Are you sure that your server is able to reach the GitHub API?

Do you have any AdBlocking somewhere on your network?

Do you have any Unifi gear on your network?

 

From what I can see, something is preventing the server from communicating with the GitHub API, so please make sure that it can reach it.
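If you want to double-check, you can test connectivity from the Unraid terminal; a minimal sketch, assuming the standard GitHub API endpoint at api.github.com:

curl -sI https://api.github.com | head -n 1
# a successful reply starts with "HTTP/2 200" (or "HTTP/1.1 200");
# a timeout or DNS error points to something on the network blocking it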

 

However, I would also strongly recommend that you upgrade to the latest Unraid version 6.12.6

Link to comment
32 minutes ago, ich777 said:

Glad that you've solved it!

Oh, I think that those cards don't have a cooler on them, correct?

Yeah, as you mentioned, they are ex-datacentre cards, so positive pressure on the cold side and negative on the hot side.

In my case it would be passively drawing hot air off the HDD stack. I've knocked up a shroud for a spare 40mm fan, which has done the trick.

  • Like 1
Link to comment
10 hours ago, ich777 said:

Are you sure that your server is able to reach the GitHub API?

Do you have any AdBlocking somewhere on your network?

Do you have any Unifi gear on your network?

 

From what I can see, something is preventing the server from communicating with the GitHub API, so please make sure that it can reach it.

 

However, I would also strongly recommend that you upgrade to the latest Unraid version 6.12.6

I'm honestly not sure why the GitHub API was being blocked, but an update to 6.12.6 fixed it. Thanks for the pointers!

  • Like 1
Link to comment
  • 2 weeks later...

A few days after I reboot my Unraid server, my 1070 Ti becomes unresponsive to any of the Docker applications that normally use it. The log starts spamming this:
Jan 13 20:56:15 Tower kernel: NVRM: GPU 0000:29:00.0: RmInitAdapter failed! (0x22:0x56:762)
Jan 13 20:56:15 Tower kernel: NVRM: GPU 0000:29:00.0: rm_init_adapter failed, device minor number 0
 

I have attached my diagnostics zip. Just so that I'm not rebooting twice a week, I have switched to CPU encoding, but it's sort of odd that this keeps happening. Anyone have any advice?

tower-diagnostics-20240113-2055.zip

Link to comment
2 hours ago, moose1be said:

Anyone have any advice?

When you look into your logs:

 

Jan 13 04:39:37 Tower kernel: NVRM: Xid (PCI:0000:29:00): 79, pid='<unknown>', name=<unknown>, GPU has fallen off the bus.
Jan 13 04:39:37 Tower kernel: NVRM: GPU 0000:29:00.0: GPU has fallen off the bus.

 

1/ Maybe check if the GPU is still working overall in another system?

2/ Power supply checked?

3/ Tested in another PCIe slot?

4/ Checked if it's really pushed into the PCIe slot?

5/ BIOS check: Above 4G Decoding and Resizable BAR activated; you may also try it the other way round.

 

Overall, hard to say; the logs say the card went off ... and I assume you didn't pull it out while running ;) The most common causes are power, the PCIe slot, a defective card, ...

  • Like 1
Link to comment
17 minutes ago, moose1be said:

I guess I will poke around.

An Xid error code 79 is a pretty generic error; it can mean basically anything (thermal issue, GPU firmware issue, computer firmware issue, bus error, …).

 

Most likely it's an issue with your BIOS; check if there is a BIOS update and make sure that you've enabled Above 4G Decoding and Resizable BAR support, as @alturismo suggested.
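If the card drops off the bus again, you can check for further NVRM/Xid messages from the terminal; a minimal sketch, assuming the standard Unraid syslog location:

grep -i "NVRM: Xid" /var/log/syslog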

Link to comment

Hi guys, I think I fucked up.

I've got a P40 for really cheap and I thought it would be a great card for Unraid - Docker - Stable Diffusion / ComfyUI.
But I think it is not supported in the driver v545.29.06 :D

 

Is there a way to get this to work in Docker?

 

My System

Dell R730xd (no ReBAR support - only Above 4G, SR-IOV, etc.)

2x E5-2680 V4

128GB DDR ECC RAM

A lot of SSD Storage

 

With best regards

Marc

Link to comment
40 minutes ago, dezai said:

Attached the Diagnostics / screenshot of the P40 in System Devices / screenshot of your great driver with the working P2000

Did you read the first and second post in this thread on how to get it working in a Docker container?

 

The card is recognized just fine.

 

Can you spot why it's not working:

Jan 20 12:18:36 R730XD kernel: NVRM: GPU 0000:82:00.0: GPU does not have the necessary power cables connected.
Jan 20 12:18:36 R730XD kernel: NVRM: GPU 0000:82:00.0: RmInitAdapter failed! (0x24:0x1c:1436)
Jan 20 12:18:36 R730XD kernel: NVRM: GPU 0000:82:00.0: rm_init_adapter failed, device minor number 1

 

I also noticed this:

Jan 20 12:18:36 R730XD kernel: NVRM: Persistence mode is deprecated and will be removed in a future release. Please use nvidia-persistenced instead.

 

Did you install the user script from SpaceinvaderOne? If yes, please remove it and simply append this line to your go file:

nvidia-persistenced
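For reference, the go file lives at /boot/config/go on the USB boot device; after the change the end of the file could look something like this (a minimal sketch, the rest of your go file may differ):

#!/bin/bash
# Start the Management Utility
/usr/local/sbin/emhttp &
# start the NVIDIA persistence daemon so the GPU can drop into a low-power idle state
nvidia-persistenced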

 

  • Like 1
Link to comment
3 hours ago, ich777 said:

Did you read the first and second post in this thread on how to get it working in a Docker container?

 

The card is recognized just fine.

 

Can you spot why it's not working:

Jan 20 12:18:36 R730XD kernel: NVRM: GPU 0000:82:00.0: GPU does not have the necessary power cables connected.
Jan 20 12:18:36 R730XD kernel: NVRM: GPU 0000:82:00.0: RmInitAdapter failed! (0x24:0x1c:1436)
Jan 20 12:18:36 R730XD kernel: NVRM: GPU 0000:82:00.0: rm_init_adapter failed, device minor number 1

 

I also noticed this:

Jan 20 12:18:36 R730XD kernel: NVRM: Persistence mode is deprecated and will be removed in a future release. Please use nvidia-persistenced instead.

 

Did you install the user script from SpaceinvaderOne? If yes, please remove it and simply append this line to your go file:

nvidia-persistenced

 

 

OK, I did some research and it was the last thing I thought about.

I'm using the R730xd with 2x 750W PSUs; the CPUs are 2x 120W and the P40 is 250W, so I thought I'd be fine with the 2x 750W (I've also tried non-redundant mode).

After some research I found out that Dell limits the GPU TDP to 150W (even if you use the 225W riser) if you are using the 750W PSUs.

 

So my GTX 970 is working fine / P2000 is also working just fine.

But the P40 doesn't work...

I'm trying to get 2x 1100W PSUs now...

 

Yep, I did use the SpaceinvaderOne script; I've deleted it now and put the nvidia-persistenced command into the go file.

 

And yep, I've been using your driver for a long time :)

 

 

So a little conclusion... if my Dell server gave the P40 the full 250W, it should work out of the box with the driver? 😮 😮

 

 

 

 

Edited by dezai
Link to comment
6 hours ago, dezai said:

But the P40 doesn't work...

I think you misread what I meant; it says this:

GPU does not have the necessary power cables connected.

 

Are you sure that you've plugged the auxiliary power into the card, since it needs auxiliary power connected (I think it's an 8-pin PCIe power cable from the PSU)?

 

6 hours ago, dezai said:

it should work out of the box with the driver? 😮 😮

Yes, as long as you've connected the auxiliary power:

grafik.png.e1f68f61e84f8844efa0f131e25d40d0.png

  • Like 2
Link to comment
6 hours ago, ich777 said:

I think you misread what I meant; it says this:

GPU does not have the necessary power cables connected.

 

Are you sure that you've plugged the auxiliary power into the card, since it needs auxiliary power connected (I think it's an 8-pin PCIe power cable from the PSU)?

 

Yes, as long as you've connected the auxiliary power:

grafik.png.e1f68f61e84f8844efa0f131e25d40d0.png

 

No no, all good :) it is an 8-pin EPS 12V from the riser and an 8-pin EPS 12V into the GPU - and I've got a cable for exactly that setup / R730 - Tesla P40.

 

But the R730xd can only support 250W GPUs with the 1100W PSUs.
My server has the 750W PSUs - if you are using those, the PCIe riser only delivers 125-150W to the 8-pin 12V EPS connector.

That's the reason why every GPU in my house works correctly but the P40 did not.

If you are using the 1100W PSUs you will get 225W from the 8-pin 12V EPS connector / 75W from the PCIe slot.

 

I've got 2x 1100W PSUs from a reseller for 17.50€ each; I can hopefully test it next week.

 

Quote from a Reddit thread with the same problem:
GPUs require a redundant 1100W power supply and GPU enablement kit

 

I will get back in touch here when the PSUs are in the server.


 

Edited by dezai
  • Like 1
Link to comment

I'm running into an issue when trying to use my NVIDIA RTX 3070 in HandBrake; it errors out with this:

[hevc_nvenc @ 0x147319ec4640] Driver does not support the required nvenc API version. Required: 12.1 Found: 11.1

 

My current driver version: v515.43.04

 

Any thoughts?

Link to comment

Well, all I needed to do was upgrade my Unraid version; I was a little behind. (This meant I didn't have access to the latest drivers - the "latest" available was 515.x.x.)

 

Now I have access to the latest drivers, which have what I need! 

Edited by theMs
clarification
  • Like 2
Link to comment
On 1/21/2024 at 2:30 PM, dezai said:

 

No no, all good :) it is an 8-pin EPS 12V from the riser and an 8-pin EPS 12V into the GPU - and I've got a cable for exactly that setup / R730 - Tesla P40.

 

But the R730xd can only support 250W GPUs with the 1100W PSUs.
My server has the 750W PSUs - if you are using those, the PCIe riser only delivers 125-150W to the 8-pin 12V EPS connector.

That's the reason why every GPU in my house works correctly but the P40 did not.

If you are using the 1100W PSUs you will get 225W from the 8-pin 12V EPS connector / 75W from the PCIe slot.

 

I've got 2x 1100W PSUs from a reseller for 17.50€ each; I can hopefully test it next week.

 

Quote from a Reddit thread with the same problem:
GPUs require a redundant 1100W power supply and GPU enablement kit

 

I will get back in touch here when the PSUs are in the server.


 

Hey guys, I can confirm that the "wrong cable" errors came from the 2x 750W power supplies.

I'm now using the 2x 1100W and it is working.

 

But I have another problem: the Tesla P40 doesn't go into the idle state - it stays at P0 and uses 50W. Is there a fix for that?

  • Like 1
Link to comment
31 minutes ago, dezai said:

But I have another problem: the Tesla P40 doesn't go into the idle state - it stays at P0 and uses 50W. Is there a fix for that?

Append this to your go file:

nvidia-persistenced

 

For testing purposes you can also issue the command from the terminal, wait a minute or two, and it should automatically go to P8 (depending on whether something is using the card or not).
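To verify, you can check the reported performance state and power draw with nvidia-smi; a minimal sketch (exact output varies by driver version):

nvidia-persistenced
# wait a minute or two, then check the performance state - "P8" means the card has idled down
nvidia-smi --query-gpu=name,pstate,power.draw --format=csv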

  • Like 1
Link to comment
