[Plugin] Nvidia-Driver


ich777

7 minutes ago, alturismo said:

No performance issues, but if you have an active Docker container using the driver and then turn on the VM, this can crash the whole system, because it is either-or usage only: either your VM uses the GPU via passthrough, or the host uses it (for example for the GUI or for Docker containers). Just be aware ;)

 

OK, I see the potential problem. It happened to me during the first days of the transition to the VM, so I will follow this recommendation, haha! 😄

 

Thank you for the prompt answer! Have a great day!

 

Link to comment
20 minutes ago, Obbyed said:

When I install the Nvidia drivers for my Quadro P400, it crashes the web UI and I have to remove them from the config folder for the system to boot again.

Do you by any chance have a syslog from a crash? Can you still connect to Unraid via SSH after installing the drivers? Was the card working before?

 

Nothing obvious in the Diagnostics from what I see.

Link to comment
1 hour ago, ich777 said:

Do you by any chance have a syslog from a crash? Can you still connect to Unraid via SSH after installing the drivers? Was the card working before?

 

Nothing obvious in the Diagnostics from what I see.

Nothing appears to crash; the web UI is simply unavailable once the drivers are installed. The card was working before. Sorry, I'm not familiar with SSH'ing into the server, but I can ping it. I previously had a 1050 Ti installed; could the driver version be my issue?

 

This is the error I get when I go to install the drivers:

 

Warning: file_put_contents(): Only -1 of 163 bytes written, possibly out of free disk space in /usr/local/emhttp/plugins/dynamix/scripts/notify on line 218

Warning: file_put_contents(): Only -1 of 174 bytes written, possibly out of free disk space in /usr/local/emhttp/plugins/dynamix/scripts/notify on line 219

 

Edited by Obbyed
Link to comment
2 minutes ago, Obbyed said:

At your earlier suggestion I have tried it in another system and it is fully functional.

Did you install a driver and put a 3D load on it too?

 

1 hour ago, Obbyed said:

Warning: file_put_contents(): Only -1 of 163 bytes written, possibly out of free disk space in /usr/local/emhttp/plugins/dynamix/scripts/notify on line 218

Uh wait, how much RAM do you have in your system?

How much space do you have on your USB Flash device?

Link to comment
1 minute ago, ich777 said:

Did you install a driver and put a 3D load on it too?

 

Uh wait, how much RAM do you have in your system?

How much space do you have on your USB Flash device?

Yes I did

 

I currently have 4GB, as it was bought as a barebones server. I have 16GB of space on the USB.

Link to comment
9 minutes ago, Obbyed said:

Currently have 4gb as it was bought as a barebones server

4GB of RAM is too little for the base system and the driver nowadays.

 

You need at least 8GB of RAM for the driver to work, since the drivers are getting bigger and bigger with each revision.
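
If you want to check this from a terminal, here is a minimal sketch that reads the total RAM from /proc/meminfo and compares it against a threshold. The function names and the 7168 MiB default are my own assumptions for illustration (a machine with 8GB installed reports a bit less than 8192 MiB), not something the plugin itself provides.

```shell
#!/bin/sh
# Print total RAM in MiB, read from a meminfo-style file (default: /proc/meminfo).
mem_total_mib() {
  awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 }' "${1:-/proc/meminfo}"
}

# Succeed if total RAM is at least $1 MiB (assumed default: 7168).
# $2 optionally points to a different meminfo-style file.
has_enough_ram() {
  [ "$(mem_total_mib "${2:-/proc/meminfo}")" -ge "${1:-7168}" ]
}

if has_enough_ram; then
  echo "RAM looks sufficient: $(mem_total_mib) MiB"
else
  echo "Only $(mem_total_mib) MiB of RAM, the driver package may not fit"
fi
```

Since Unraid runs from RAM, a "possibly out of free disk space" warning during the driver installation can be a sign that the RAM is full rather than the USB Flash device.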

Link to comment
Just now, ich777 said:

4GB of RAM is too little for the base system and the driver nowadays.

 

You need at least 8GB of RAM for the driver to work, since the drivers are getting bigger and bigger with each revision.

Thankfully I have an 8GB kit coming on Monday just to keep me online. Thank you very much for your help!

Link to comment
On 1/27/2023 at 7:08 PM, ich777 said:

I really don't know what causes that issue, because I can barely reproduce it, and I'm pretty much clueless about what it could be, since Unraid runs from RAM and the package is also installed on boot, which basically means everything is installed fresh on each reboot.

 

I can only imagine that something in Docker prevents it from running properly because that's the only place in the chain where something is stored across reboots.

 

Anyways, glad that everything is now working for you again!

Hey I found the root cause!

Krusader grabs something locking out  /proc/sys/kernel/overflowuid

I haven't dug into it too much, but basically if I run Krusader I cannot restart anything that uses the GPU without a system reboot.

 

This only applies to (looks it up) the one from your repository; the binhex one is OK even when it has root privileges. (I did run yours with root privileges and the advised settings.)

Edited by BomB191
Link to comment
3 hours ago, TDA said:

After a couple of hours (or days, it depends) it stops working and is no longer recognized by the plugin:

It seems the kernel module crashed; that usually happens because of some kind of hardware failure or hardware incompatibility.

Jan 26 02:59:57 Sekiro kernel: NVRM: GPU at PCI:0000:05:00: GPU-5d7f9c28-8f84-bb4a-7ae8-2456be369183
Jan 26 02:59:57 Sekiro kernel: NVRM: GPU Board Serial Number: 0322717107488
Jan 26 02:59:57 Sekiro kernel: NVRM: Xid (PCI:0000:05:00): 79, pid='<unknown>', name=<unknown>, GPU has fallen off the bus.
Jan 26 02:59:57 Sekiro kernel: NVRM: GPU 0000:05:00.0: GPU has fallen off the bus.
Jan 26 02:59:57 Sekiro kernel: NVRM: GPU 0000:05:00.0: GPU serial number is 0322717107488.
Jan 26 02:59:57 Sekiro kernel: NVRM: A GPU crash dump has been created. If possible, please run
Jan 26 02:59:57 Sekiro kernel: NVRM: nvidia-bug-report.sh as root to collect this data before
Jan 26 02:59:57 Sekiro kernel: NVRM: the NVIDIA kernel module is unloaded.

 

Is your card cooled well enough that it doesn't overheat?
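
To see how often this happens, here is a small sketch that pulls the PCI address and Xid code out of the NVRM lines in a syslog. The function name xid_events is made up for illustration, and /var/log/syslog as the default path is an assumption; point it at whatever log file you have.

```shell
#!/bin/sh
# Print "<PCI address> <Xid code>" for every NVRM Xid line in a log file.
# $1: log file to scan (assumed default: /var/log/syslog).
xid_events() {
  sed -n 's/.*NVRM: Xid (\(PCI:[^)]*\)): \([0-9][0-9]*\),.*/\1 \2/p' "${1:-/var/log/syslog}"
}

# Usage:
#   xid_events /var/log/syslog
# To watch the temperature while the card is under load:
#   nvidia-smi --query-gpu=temperature.gpu --format=csv,noheader
```

For the log above this would print the address PCI:0000:05:00 together with code 79, which is the "GPU has fallen off the bus" event.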

Link to comment
3 hours ago, BomB191 said:

but basically if I run krusader I cannot restart anything that uses the GPU without a system reboot.

I will try that hopefully soon (my test server is still broken), thank you for the hint, but by default Krusader shouldn't be able to edit anything on the host...

Link to comment
7 hours ago, 56025192 said:

How can I solve it? None of the methods I found online work.

Please read the first post again, did you restart the Docker service after installing the plugin?

 

EDIT: Now I see the issue, you are updating the daemon.json on boot in the go file:

# Update mirrors
tee /etc/docker/daemon.json <<-'EOF'
{
 "registry-mirrors" : [
  "http://REMOVED.com",
  "https://REMOVED.com",
  "http://REMOVED.cn"]
}
EOF

 

Please delete this entry, or at least comment it out, then reboot and see how it looks when no modification is made.

I would recommend that you add the required entries for Nvidia to your go file routine.
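
A rough sketch of what such a go file routine could look like, so that the mirrors and the Nvidia runtime end up in the same daemon.json instead of one overwriting the other. The runtime name and path in it are assumptions based on the usual nvidia-container-runtime setup, not something confirmed in this thread; compare them with the daemon.json that the plugin creates on an unmodified boot and copy the values from there. The function name write_daemon_json is made up for illustration.

```shell
#!/bin/sh
# Write a daemon.json that contains both the registry mirrors and a runtime entry.
# $1: target file (default: /etc/docker/daemon.json).
# ASSUMPTION: the "runtimes" section mirrors a typical nvidia-container-runtime
# configuration; replace it with the entries from the plugin's own daemon.json.
write_daemon_json() {
  cat > "${1:-/etc/docker/daemon.json}" <<'EOF'
{
  "registry-mirrors": [
    "http://REMOVED.com",
    "https://REMOVED.com",
    "http://REMOVED.cn"
  ],
  "runtimes": {
    "nvidia": {
      "path": "/usr/bin/nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
EOF
}

# Usage in the go file:
#   write_daemon_json
```

The Docker service has to be restarted afterwards so that the file is read again.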

 

Hope that helps. ;)

 

EDIT2: Also take a look at this post, where I explain how you can inject your mirrors too:

 

Link to comment
Thank you. The problem has been solved

Link to comment

I'm having trouble getting my A4000 working. I enabled UEFI and above 4G decoding, but I'm still getting this error:

Jan 30 08:30:32 montero kernel: NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x25:0x65:1457)
Jan 30 08:30:32 montero kernel: NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0

nvidia-smi and driver plugin both say no devices found, but the card shows up in system devices, so I'm not sure what's the issue. Card works fine on my gaming pc running windows.

 

montero-diagnostics-20230130-1136.zip
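
When a card shows up in System Devices but nvidia-smi finds nothing, it can help to check which kernel driver is currently bound to it. A minimal sketch, assuming lspci is available; the helper name gpu_drivers is made up for illustration, and 10de is the PCI vendor ID of NVIDIA.

```shell
#!/bin/sh
# Read `lspci -k` output on stdin and print "<slot> <driver>" for each device
# that has a "Kernel driver in use" line.
gpu_drivers() {
  awk '
    /^[0-9a-f]/             { slot = $1 }        # device header line, e.g. 01:00.0
    /Kernel driver in use:/ { print slot, $NF }  # last field is the driver name
  '
}

# Usage:
#   lspci -k -d 10de: | gpu_drivers
```

If the driver shown is vfio-pci, the card is reserved for VM passthrough and the Nvidia driver on the host cannot use it; if it is nvidia, the driver is bound and the problem lies elsewhere.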

Link to comment
2 hours ago, TealNerd said:

nvidia-smi and driver plugin both say no devices found, but the card shows up in system devices, so I'm not sure what's the issue. Card works fine on my gaming pc running windows.

First of all you have an error:

Jan 30 08:30:25 montero root: libkmod: kmod_config_parse: /etc/modprobe.d/nvidia.conf line 1: ignoring bad line starting with 'nvidia'

in your nvidia.conf file, it should be:

options nvidia NVreg_OpenRmEnableUnsupportedGpus=1

 

Second, you aren't even using the Open Source driver package, so please remove the file completely, reboot afterwards, and post the Diagnostics again if the card still isn't working.
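
For reference, the libkmod message above means that a line in the file does not start with a directive that modprobe understands. A minimal sketch of a check for that, assuming the directive list from the modprobe.d man page (alias, blacklist, install, options, remove, softdep); newer kmod versions may accept more, and the function name bad_modprobe_lines is made up for illustration.

```shell
#!/bin/sh
# Print "<line number>: <line>" for every line in a modprobe.d file that does
# not start with a known directive. Comments and blank lines are skipped.
bad_modprobe_lines() {
  awk '
    /^[ \t]*#/ { next }   # comment
    NF == 0    { next }   # blank line
    $1 !~ /^(alias|blacklist|install|options|remove|softdep)$/ {
      printf "%d: %s\n", NR, $0
    }
  ' "$1"
}

# Usage:
#   bad_modprobe_lines /etc/modprobe.d/nvidia.conf
```

A line that starts directly with the module name would be reported, while the corrected "options nvidia ..." form would not.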

 

Did you also enable Resizable BAR support in your BIOS?

 

2 hours ago, TealNerd said:

I enabled UEFI

If possible, boot in legacy mode. I see in the initial post that you wrote it was working but stopped working after a reboot. Did you boot in legacy mode when it was working?

Link to comment
59 minutes ago, TealNerd said:

I saw the advice in other places to boot in legacy mode, but my motherboard doesn't seem to support enabling above 4g decoding and resizable bar in legacy mode.

Then try booting in legacy mode without these options and see if it helps.

In the other thread you said that it was working before, correct?

 

Can you please check if a BIOS update is available for your motherboard? It seems to me that this is some kind of hardware incompatibility issue, or at least some kind of wrong implementation in the BIOS.

Link to comment
1 minute ago, ich777 said:

Then try booting in legacy mode without these options and see if it helps.

In the other thread you said that it was working before, correct?

 

Can you please check if a BIOS update is available for your motherboard? It seems to me that this is some kind of hardware incompatibility issue, or at least some kind of wrong implementation in the BIOS.

I'll check for a BIOS update. As for legacy mode with the options disabled, it doesn't work then either. The diagnostics from the other thread were taken with legacy mode and no PCIe options.

Edited by TealNerd
Link to comment

Will these drivers allow UNRAID to access the 8 GPU cores on an older A10 series CPU?

 

hmmm... a test of the installation indicates that no, it won't find the "built in" GPU:

[attached screenshot]

 

Anyone have any tips on configuration items that might enable this?

 

Had I been paying enough attention when I purchased the CPU, I wouldn't have purchased this one for a server. Sigh...

Edited by FreeMan
Link to comment
32 minutes ago, ich777 said:

Then try booting in legacy mode without these options and see if it helps.

In the other thread you said that it was working before, correct?

 

Can you please check if a BIOS update is available for your motherboard? It seems to me that this is some kind of hardware incompatibility issue, or at least some kind of wrong implementation in the BIOS.

I updated the BIOS to the latest version and booted in legacy mode, but no dice: I'm getting the same error message. Is there anything specific I can look for in the logs to get a clearer picture of what's going on?

Link to comment
