
[Plugin] Nvidia-Driver


ich777


19 minutes ago, Bahamut___ said:

Here, i forgot to attach that

I think something is wrong with your network, since Unraid can't even fetch the latest versions of your plugins:

Could not download current plugin versions

community.applications.plg - 2024.06.08a
dynamix.unraid.net.plg - 2024.05.15.1138
fix.common.problems.plg - 2024.05.04
gpustat.plg - 2024.06.21
NerdTools.plg - 2024.02.17
nut-dw.plg - 2024.06.23
nvidia-driver.plg - 2024.01.19
rclone.plg - 2024.05.27
unbalanced.plg - 2024.03.26
unRAIDServer.plg - 6.12.10
user.scripts.plg - 2024.03.29

 

I see a lot of these messages in your syslog:

...
Jul  2 12:10:56 Tower kernel: eth0: renamed from vethe3bbea7
Jul  2 12:11:04 Tower kernel: vethe3bbea7: renamed from eth0
Jul  2 12:11:04 Tower kernel: eth0: renamed from veth5e7f227
Jul  2 12:11:06 Tower kernel: veth5e7f227: renamed from eth0
Jul  2 12:11:07 Tower kernel: eth0: renamed from vethb315cdc
Jul  2 12:11:08 Tower kernel: vethb315cdc: renamed from eth0
Jul  2 12:11:09 Tower kernel: eth0: renamed from veth6e05170
...

 

 

Have you tried rebooting yet?

I don't think that the Nvidia Driver plugin is the issue here.
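
For anyone who wants to check their own syslog for the same pattern, here is a minimal Python sketch. It is not part of Unraid or the plugin, and the name `count_veth_renames` is made up for illustration. It counts the veth rename messages per minute, assuming the syslog line format shown above. These messages normally appear when a Docker container starts or stops, so a steady stream of them usually hints at a container that keeps restarting.

```python
import re
from collections import Counter

# Matches lines such as:
#   "Jul  2 12:10:56 Tower kernel: eth0: renamed from vethe3bbea7"
RENAME_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}):\d{2}\s+\S+\s+kernel:\s+"
    r"(?P<new>\S+): renamed from (?P<old>\S+)$"
)


def count_veth_renames(lines):
    """Return a Counter mapping 'Mon D HH:MM' to the number of veth renames."""
    per_minute = Counter()
    for line in lines:
        match = RENAME_RE.match(line.strip())
        # Only count renames where one side is a veth interface.
        if match and "veth" in (match["new"] + match["old"]):
            # Collapse the double space syslog uses for single-digit days.
            per_minute[" ".join(match["stamp"].split())] += 1
    return per_minute


# The lines quoted in the post above.
sample = [
    "Jul  2 12:10:56 Tower kernel: eth0: renamed from vethe3bbea7",
    "Jul  2 12:11:04 Tower kernel: vethe3bbea7: renamed from eth0",
    "Jul  2 12:11:04 Tower kernel: eth0: renamed from veth5e7f227",
    "Jul  2 12:11:06 Tower kernel: veth5e7f227: renamed from eth0",
    "Jul  2 12:11:07 Tower kernel: eth0: renamed from vethb315cdc",
    "Jul  2 12:11:08 Tower kernel: vethb315cdc: renamed from eth0",
    "Jul  2 12:11:09 Tower kernel: eth0: renamed from veth6e05170",
]

for minute, count in sorted(count_veth_renames(sample).items()):
    print(minute, count)
```

To run it against a real log, pass the lines of the syslog file (assumed here to be `/var/log/syslog`) instead of `sample`.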

Link to comment

Um, I'm sure I'm not the first, but I did a thing and only read this post afterwards. Is there any way to reset the hard lock? I had the GPU running in my Docker containers and then, with a VM turned off, decided to add it to the VM. I never started the VM and have been going crazy trying to work out why it's not working in my containers anymore. It seems like I must have hard locked it.

Link to comment
3 minutes ago, EvoX008 said:

Um, I'm sure I'm not the first, but I did a thing and only read this post afterwards. Is there any way to reset the hard lock? I had the GPU running in my Docker containers and then, with a VM turned off, decided to add it to the VM. I never started the VM and have been going crazy trying to work out why it's not working in my containers anymore. It seems like I must have hard locked it.

I'm really not sure what the issue is. Are you sure that the GPU hard locked, or did the server hard lock?

Did you maybe bind the card to VFIO?

 

Can you please post your Diagnostics?

Link to comment
9 minutes ago, EvoX008 said:

but it does not show up in the Nvidia Drivers plugin

This happens because the GPU doesn't initialize correctly (from your syslog):

Jul  3 01:04:56 Cloud kernel: NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x62:0x65:2477)
Jul  3 01:04:56 Cloud kernel: NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0

 

I would recommend the following steps:

  • Update your BIOS (your BIOS is from 2015 and the newest one is from 2018)
  • Enable Above 4G Decoding in your BIOS (some manufacturers also use something like: "Support for large address space" in the PCI options)
  • Enable Resizable BAR (if your BIOS has the option)
  • Try to boot in Legacy Mode instead of UEFI

 

Sadly, some older motherboards are not fully compatible with new GPUs and have firmware (BIOS) issues that prevent the card from working properly. However, I would recommend trying the steps above first to see if you can get it to work.
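
For anyone who wants to check their own syslog for the same failure, here is a minimal Python sketch. The helper name `find_gpu_init_failures` is hypothetical and the regular expression only assumes the NVRM line format quoted above. It lists the PCI address and message of every NVRM line that reports a failure.

```python
import re

# Matches the NVRM lines quoted above, e.g.
#   "... kernel: NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x62:0x65:2477)"
NVRM_RE = re.compile(
    r"kernel: NVRM: GPU (?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"(?P<message>.*failed.*)$"
)


def find_gpu_init_failures(lines):
    """Return (pci_address, message) tuples for NVRM lines that report a failure."""
    failures = []
    for line in lines:
        match = NVRM_RE.search(line.strip())
        if match:
            failures.append((match["pci"], match["message"]))
    return failures


# The two lines from the syslog quoted above.
sample = [
    "Jul  3 01:04:56 Cloud kernel: NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x62:0x65:2477)",
    "Jul  3 01:04:56 Cloud kernel: NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0",
]

for pci, message in find_gpu_init_failures(sample):
    print(pci, "->", message)
```

If the list is empty for a full syslog, the driver at least did not log this kind of initialization failure and the cause is probably somewhere else.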

Link to comment
Just now, ich777 said:

This is caused because the GPU doesn't initialize correctly:

Jul  3 01:04:56 Cloud kernel: NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x62:0x65:2477)
Jul  3 01:04:56 Cloud kernel: NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0

 

I would recommend the following steps:

  • Update your BIOS (your BIOS is from 2015 and the newest one is from 2018)
  • Enable Above 4G Decoding in your BIOS
  • Enable Resizable BAR (if your BIOS has the option)
  • Try to boot in Legacy Mode instead of UEFI

 

lol, I just updated the BIOS. I was running a 2014 version but wasn't sure about the 2018 one because it says beta, but I'll go for it and check out those other things.

Link to comment
10 hours ago, ich777 said:

I think something is wrong with your network, since Unraid can't even fetch the latest versions of your plugins:

Could not download current plugin versions

community.applications.plg - 2024.06.08a
dynamix.unraid.net.plg - 2024.05.15.1138
fix.common.problems.plg - 2024.05.04
gpustat.plg - 2024.06.21
NerdTools.plg - 2024.02.17
nut-dw.plg - 2024.06.23
nvidia-driver.plg - 2024.01.19
rclone.plg - 2024.05.27
unbalanced.plg - 2024.03.26
unRAIDServer.plg - 6.12.10
user.scripts.plg - 2024.03.29

 

I see a lot of these messages in your syslog:

...
Jul  2 12:10:56 Tower kernel: eth0: renamed from vethe3bbea7
Jul  2 12:11:04 Tower kernel: vethe3bbea7: renamed from eth0
Jul  2 12:11:04 Tower kernel: eth0: renamed from veth5e7f227
Jul  2 12:11:06 Tower kernel: veth5e7f227: renamed from eth0
Jul  2 12:11:07 Tower kernel: eth0: renamed from vethb315cdc
Jul  2 12:11:08 Tower kernel: vethb315cdc: renamed from eth0
Jul  2 12:11:09 Tower kernel: eth0: renamed from veth6e05170
...

 

 

Have you tried rebooting yet?

I don't think that the Nvidia Driver plugin is the issue here.


Seems like a "double reboot" fixed the issue. I don't know if the network stuff is related to some operations I was doing on a container, but everything else was working fine as far as I could see. Thanks a lot for the help anyway 🙏

Link to comment
7 hours ago, EvoX008 said:

 

lol, I just updated the BIOS. I was running a 2014 version but wasn't sure about the 2018 one because it says beta, but I'll go for it and check out those other things.

Got it working. I did the BIOS update and enabled Above 4G Decoding.

I didn't see an option for Resizable BAR, but it's back working. Thanks for the quick response, I really appreciate it. I would have been beating my head against the wall all night, it was already like 2pm.

Link to comment
On 6/5/2024 at 10:39 PM, ich777 said:

I see that but I really don't know why it fails on your machine.

The way the check works is that it checks for updates between 8 and 10 am by getting the latest version for the selected branch. If a version newer than the one you are using is found, it is pulled down and a message is sent to the user that the new driver has been downloaded.

 

The message that you see is generated when the download of the driver fails.

 

Please note that the driver needs about 250MB of free space on your USB Boot device.

BTW, you have a lot of FSCK files on your boot device. This indicates an issue with your USB Boot device, however this is outside the scope of this thread.
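
To make the logic described in this quote easier to follow, here is a minimal Python sketch. It is not the plugin's actual code: every function name is hypothetical, and the time window and the 250MB threshold are simply taken from the description above. It shows the three conditions that have to hold before a driver download would start.

```python
import shutil
from datetime import time

CHECK_WINDOW = (time(8, 0), time(10, 0))  # "between 8 and 10 am"
REQUIRED_FREE_MB = 250                    # "about 250MB of free space"


def parse_version(version):
    """Turn '550.100' or 'v550.90.07' into a comparable tuple of ints."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def in_check_window(now):
    """True if the time of day falls inside the update check window."""
    start, end = CHECK_WINDOW
    return start <= now <= end


def is_newer(latest, installed):
    """True if `latest` is a higher driver version than `installed`."""
    return parse_version(latest) > parse_version(installed)


def has_free_space(path, required_mb=REQUIRED_FREE_MB):
    """True if the filesystem holding `path` has enough free space."""
    free_mb = shutil.disk_usage(path).free / (1024 * 1024)
    return free_mb >= required_mb


def should_download(now, latest, installed, boot_path):
    """Combine the three conditions from the description above."""
    return (
        in_check_window(now)
        and is_newer(latest, installed)
        and has_free_space(boot_path)
    )


print(is_newer("550.100", "550.90.07"))
```

On an Unraid server the USB boot device is mounted at `/boot`, so that is the path one would pass as `boot_path`. Comparing the versions as tuples of integers matters here because a plain string comparison would rank `550.90.07` above `550.100`.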

Updates are still failing to automatically download. Just happened again with v550.100. Works perfectly fine if I download manually. Should I try reinstalling the plugin?

Link to comment

Hi,

 

Hope you can help.

 

I have an Nvidia 1050 TI, but none of the drivers work, even though it is listed as supported.

 

Here is an extract from my logs:

Jul  9 21:23:58 Lofty kernel: nvidia: loading out-of-tree module taints kernel.
Jul  9 21:23:58 Lofty kernel: nvidia: module license 'NVIDIA' taints kernel.
Jul  9 21:23:58 Lofty kernel: Disabling lock debugging due to kernel taint
Jul  9 21:23:58 Lofty kernel: AES CTR mode by8 optimization enabled
Jul  9 21:23:58 Lofty kernel: [drm] Initialized mgag200 1.0.0 20110418 for 0000:01:00.1 on minor 0
Jul  9 21:23:58 Lofty kernel: fbcon: mgag200drmfb (fb0) is primary device
Jul  9 21:23:58 Lofty kernel: BTRFS: device fsid ab17ef47-9a14-4f2a-975b-53afb10fcaf5 devid 1 transid 15717 /dev/sdd1 scanned by udevd (888)
Jul  9 21:23:58 Lofty kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 244
Jul  9 21:23:58 Lofty kernel: 
Jul  9 21:23:58 Lofty kernel: nvidia 0000:04:00.0: enabling device (0040 -> 0043)
Jul  9 21:23:58 Lofty kernel: nvidia 0000:04:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none
Jul  9 21:23:58 Lofty kernel: NVRM: The NVIDIA GPU 0000:04:00.0 (PCI ID: 10de:1c82)
Jul  9 21:23:58 Lofty kernel: NVRM: installed in this system is not supported by the
Jul  9 21:23:58 Lofty kernel: NVRM: NVIDIA 470.239.06 driver release.
Jul  9 21:23:58 Lofty kernel: NVRM: Please see 'Appendix A - Supported NVIDIA GPU Products'
Jul  9 21:23:58 Lofty kernel: NVRM: in this release's README, available on the operating system
Jul  9 21:23:58 Lofty kernel: NVRM: specific graphics driver download page at www.nvidia.com.
Jul  9 21:23:58 Lofty kernel: nvidia: probe of 0000:04:00.0 failed with error -1
Jul  9 21:23:58 Lofty kernel: NVRM: The NVIDIA probe routine failed for 1 device(s).
Jul  9 21:23:58 Lofty kernel: NVRM: None of the NVIDIA devices were initialized.
Jul  9 21:23:58 Lofty kernel: nvidia-nvlink: Unregistered the Nvlink Core, major device number 244
Jul  9 21:23:58 Lofty kernel: Console: switching to colour frame buffer device 128x48
Jul  9 21:23:58 Lofty kernel: mgag200 0000:01:00.1: [drm] fb0: mgag200drmfb frame buffer device
Jul  9 21:23:58 Lofty kernel: sr 5:0:0:0: [sr0] scsi3-mmc drive: 40x/12x writer dvd-ram cd/rw xa/form2 cdda tray
Jul  9 21:23:58 Lofty kernel: cdrom: Uniform CD-ROM driver Revision: 3.20
Jul  9 21:23:58 Lofty kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 244
Jul  9 21:23:58 Lofty kernel: 
Jul  9 21:23:58 Lofty kernel: nvidia 0000:04:00.0: vgaarb: changed VGA decodes: olddecodes=none,decodes=none:owns=none
Jul  9 21:23:58 Lofty kernel: NVRM: The NVIDIA GPU 0000:04:00.0 (PCI ID: 10de:1c82)
Jul  9 21:23:58 Lofty kernel: NVRM: installed in this system is not supported by the
Jul  9 21:23:58 Lofty kernel: NVRM: NVIDIA 470.239.06 driver release.
Jul  9 21:23:58 Lofty kernel: NVRM: Please see 'Appendix A - Supported NVIDIA GPU Products'
Jul  9 21:23:58 Lofty kernel: NVRM: in this release's README, available on the operating system
Jul  9 21:23:58 Lofty kernel: NVRM: specific graphics driver download page at www.nvidia.com.
Jul  9 21:23:58 Lofty kernel: sr 5:0:0:0: Attached scsi CD-ROM sr0
Jul  9 21:23:58 Lofty kernel: nvidia: probe of 0000:04:00.0 failed with error -1
Jul  9 21:23:58 Lofty kernel: NVRM: The NVIDIA probe routine failed for 1 device(s).
Jul  9 21:23:58 Lofty kernel: NVRM: None of the NVIDIA devices were initialized.

 


04:00.0 VGA compatible controller: NVIDIA Corporation GP107 [GeForce GTX 1050 Ti] (rev a1)
        Kernel modules: nvidia_drm, nvidia
04:00.1 Audio device: NVIDIA Corporation GF116 High Definition Audio Controller (rev a1)

The logs indicate that the GPU is not supported, but from what I can see the 1050 Ti is still supported by these drivers?

I've tried the latest Production driver and the earliest drivers available in the plugin.

 

Any suggestions?

 

Thanks in advance

Link to comment
1 hour ago, shiftylilbastrd said:

Should I try reinstalling the plugin?

No.

 

1 hour ago, shiftylilbastrd said:

Updates are still failing to automatically download. Just happened again with v550.100. Works perfectly fine if I download manually.

Are you sure that your server time is correct? I will have to wait until tomorrow morning and see if my test server pulls the update automatically (last time it did).

 

May I ask what you use the card for in your server?

Link to comment
48 minutes ago, Tom Sealey said:

I have an Nvidia 1050 TI, but none of the drivers work, even though it is listed as supported.

Please always include your Diagnostics.

 

However, it seems I have bad news for you... Your card is based on:

48 minutes ago, Tom Sealey said:
GF116

GF116 is a Fermi 2.0 based chip (more information on Fermi here), i.e. from the GTS 4xx and 5xx series, and not Pascal based like a GTX 1050 Ti should be. This means you got a counterfeit card.
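
The same check can be done on any `lspci` output. Here is a minimal Python sketch (the name `mismatched_slots` is made up, and it assumes the default `lspci` format without a PCI domain prefix). It compares the first two letters of the chip codename, for example `GP` or `GF`, across all NVIDIA functions in the same slot. This is only a heuristic: a mismatch is a strong hint, a match proves nothing.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   "04:00.0 VGA compatible controller: NVIDIA Corporation GP107 [GeForce GTX 1050 Ti] (rev a1)"
LSPCI_RE = re.compile(
    r"^(?P<slot>[0-9a-f]{2}:[0-9a-f]{2})\.[0-7] [^:]+: "
    r"NVIDIA Corporation (?P<chip>[A-Z]{2}\d{3}\w*)"
)


def mismatched_slots(lspci_lines):
    """Map each slot whose functions report different chip families to its chips."""
    chips_by_slot = defaultdict(list)
    for line in lspci_lines:
        match = LSPCI_RE.match(line.strip())
        if match:
            chips_by_slot[match["slot"]].append(match["chip"])
    return {
        slot: chips
        for slot, chips in chips_by_slot.items()
        # Compare only the two-letter family prefix, since a genuine card
        # may report e.g. "GP107" for video and "GP107GL" for audio.
        if len({chip[:2] for chip in chips}) > 1
    }


# The lspci output posted above.
sample = [
    "04:00.0 VGA compatible controller: NVIDIA Corporation GP107 [GeForce GTX 1050 Ti] (rev a1)",
    "        Kernel modules: nvidia_drm, nvidia",
    "04:00.1 Audio device: NVIDIA Corporation GF116 High Definition Audio Controller (rev a1)",
]

print(mismatched_slots(sample))
```

For the output posted above this flags slot `04:00`, because the video function claims to be GP107 while the audio function on the same card reports GF116.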

 

How much did you pay for the card? I hope you have an option to return it.

 

If you want to use the card for transcoding only, I would recommend that you look into an Nvidia T400. You should be able to get it for under $100, it is really power efficient (only a max. of 35W) and it is enough for 3 to 4 simultaneous 4K transcodes (depending on bitrate and so on).

Link to comment
6 hours ago, shiftylilbastrd said:

It's used for Plex transcoding. Just double checked server time and it's correct.

TBH, if you have trouble with the update check, disable it, because if you just use the card for transcoding the driver updates won't do much for you anyway.

 

You have to wait a bit longer until my Test Server checks for updates so that I can see if my Server pulls the update.

 

EDIT: I found and fixed a bug in the update check for Production and New Feature Branch, please update the plugin, it should work next time if Latest Production or New Feature Branch is selected.

Link to comment

Thanks @ich777.

 

I bought it on Ebay as new but unused...

 

8 hours ago, ich777 said:

Your card is based on: GF116

GF116 is a Fermi 2.0 based chip (more information on Fermi here), i.e. from the GTS 4xx and 5xx series, and not Pascal based like a GTX 1050 Ti should be. This means you got a counterfeit card.

 

 

I did wonder if it was fake, but the PCI ID seemed right and the line above says GP107, so I thought it was legit. I missed checking the audio device line.

 

I'll complain to eBay, I might get my money back...

Thanks for the help.

Link to comment
10 hours ago, ich777 said:

Nvidia T400. You should be able to get it for under $100, it is really power efficient (only a max. of 35W) and it is enough for 3 to 4 simultaneous 4K transcodes (depending on bitrate and so on).

Thanks for the suggestion. It's for transcoding cctv streams for Frigate.

I'll look for a T400... a genuine one. 

Link to comment
5 minutes ago, Tom Sealey said:

I'll look for a T400... a genuine one. 

You can even look for a higher tier like T600 or T1000 depending on how many simultaneous streams you have.

If you have many streams something like a Quadro should be considered.

Link to comment
6 hours ago, ich777 said:

You can even look for a higher tier like T600 or T1000 depending on how many simultaneous streams you have.

If you have many streams something like a Quadro should be considered.

Thanks.

T400 should be good, it's 2 4k CCTV cameras. 
Should be perfect and power efficient.

 

Link to comment

Hello! New unraid user here.

 

I have built a NAS that includes a 3050 6GB low profile card. I am running into the error 'NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.' I am also having an issue where the display output of the card cuts out after a short time. I believe that my card is at least getting sufficient power from the PCIe slot on my Aorus B550I Pro AX board, mostly due to the following:

 

1. running `lspci` reveals:

07:00.0 VGA compatible controller: NVIDIA Corporation GA107 [GeForce RTX 3050 6GB] (rev a1)

 

2. and running `modinfo nvidia` reveals:

filename:       /lib/modules/6.8.12-Unraid/kernel/drivers/video/nvidia.ko
alias:          char-major-195-*
version:        555.58.02
supported:      external
license:        NVIDIA
firmware:       nvidia/555.58.02/gsp_tu10x.bin
firmware:       nvidia/555.58.02/gsp_ga10x.bin
srcversion:     7CC59AD55E0DD69F0C28592

 

I am attaching my page and my diagnostics as well. Any help would be appreciated.

plugin.png

scheherazade-diagnostics-20240711-2254.zip

Link to comment
1 hour ago, ahaseros said:

I have constructed a NAS including a 3050 6gb low profile card.

Here is your issue:

Jul 11 22:48:55 scheherazade kernel: nvidia 0000:07:00.0: Unable to change power state from D3cold to D0, device inaccessible
Jul 11 22:48:55 scheherazade kernel: nvidia 0000:07:00.0: vgaarb: VGA decodes changed: olddecodes=none,decodes=none:owns=none
Jul 11 22:48:55 scheherazade kernel: NVRM: The NVIDIA GPU 0000:07:00.0
Jul 11 22:48:55 scheherazade kernel: NVRM: (PCI ID: 10de:2584) installed in this system has
Jul 11 22:48:55 scheherazade kernel: NVRM: fallen off the bus and is not responding to commands.

 

From what I see you are also using an outdated BIOS. Make sure to upgrade your BIOS first, then continue with the steps in this post.

 

It seems that you are using a Ryzen CPU, so please make sure to disable C-States in the BIOS entirely and check that Above 4G Decoding (sometimes called Support for large Address Space) and Resizable BAR Support are enabled in the BIOS.

 

If it is still not working after you did all of the above, try to boot into legacy mode instead of UEFI and see if that helps.

This could also be caused by a firmware bug (BIOS issue), however I would recommend trying the steps above first to see if you can get it working. In its current state it seems the card is not answering and the driver can't initialize the card.

Are you sure that your power supply is also up to the task and is capable of delivering enough power?

Link to comment
