[Plugin] Nvidia-Driver


ich777


27 minutes ago, marnon said:

Is there any way to make my GPU work?

No. If you want to use it in containers for hardware transcoding or anything else, you need a driver version of at least 400.

I would recommend looking for something newer on the used market, or even new. For example, you should be able to pick up an NVIDIA T400 brand new for about 130,- (low profile, with a maximum power draw of about 30 W).


I am having issues with my Unraid server and the NVIDIA drivers. Everything was working fine until the last two days. I'm unsure what triggered the issue, but when I have the NVIDIA drivers installed, it causes a kernel error. This freezes my server. I cannot even reboot (not even through SSH), and it requires a hard reboot. I did not load my Docker containers, because the Docker vDisk has shown as corrupted since this issue started; I rebuilt it once, and then this happened again. I think it's related to the NVIDIA driver. My hardware is:

 

  • Intel® Xeon® CPU E5-2680 v2 @ 2.80GHz
  • 112 GiB DDR3 Multi-bit ECC
  • Intel Motherboard
  • NVIDIA Quadro P2000 as the video card

I have attached the diagnostics.

tower-diagnostics-20220328-1938.zip

6 hours ago, littlebudha said:

unsure of what triggered the issue, but when I have the NVIDIA drivers installed, it causes kernel error.

Yes, I can see this in your log: about 5 minutes after the driver was installed, the system runs into a kernel panic.

 

  • Was the card working before?
  • Have you changed anything in your setup (BIOS update, BIOS settings, hardware installation)?
  • What exactly do you use the card for on your server?
  • Do you have a monitor connected to the server?
  • (Would it be possible for you to put the card in another computer to stress-test it? I would only recommend this as a last step, after we have troubleshot everything else.)

 

6 hours ago, littlebudha said:

I cannot even reboot ( even through SSH) and requires hard reboot. 

If you run into trouble, for example if it crashes instantly on boot, remove the 'nvidia-driver.plg' file located on your USB boot device in .../config. This basically prevents the plugin from being installed on boot.
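Since the server may freeze before you can reach it, the file can also be moved aside from another computer with the USB flash drive plugged in. A minimal Python sketch, assuming you pass in the config directory you find at .../config on the mounted stick plus a backup directory outside of it; `move_plugin_out` is a hypothetical helper, not part of the plugin. Moving the file instead of deleting it lets you copy it back later to re-enable the plugin.

```python
import shutil
from pathlib import Path
from typing import Optional

PLUGIN_FILE = "nvidia-driver.plg"  # file name taken from the post above


def move_plugin_out(config_dir: str, backup_dir: str) -> Optional[Path]:
    """Move the plugin file out of config_dir so it is not installed on boot.

    Hypothetical helper: config_dir is wherever the file lives on your
    mounted USB boot device; backup_dir should be outside that directory.
    Returns the new location, or None if the file was not found.
    """
    src = Path(config_dir) / PLUGIN_FILE
    if not src.is_file():
        return None  # nothing to do, the plugin file is not present
    dest_dir = Path(backup_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)  # create backup dir if needed
    dest = dest_dir / PLUGIN_FILE
    shutil.move(str(src), str(dest))
    return dest
```

Copying the file from the backup directory back into the config directory before the next boot should undo the change.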

 

Please double-check whether Above 4G Decoding is turned on in your BIOS.

 

I would recommend the following procedure:

  1. Stop the Array
  2. Disable Autostart for the Array
  3. Install the Nvidia Driver again
  4. Check whether it crashes; if not, reboot (without starting the Array)
  5. After the reboot, check whether it crashed; if not, start the Array and see whether it crashes again

I have the following issue after upgrading Unraid to 6.9.2 (previously everything was working properly):

"Failed to initialize NVML: Unknown Error"

 

But the GPU is listed properly in sysdevices.

I tried reinstalling the plugin, with no success.

Any ideas what to do in that case?

IOMMU.PNG

Plugin.PNG


Sorry, I misled you in my previous message. The upgrade is not the cause, because the plugin was working on the same 6.9.2 version before. However, before the issue appeared, my BIOS settings were reset (due to a USB boot failure), and since then I have had this error. It seems I should check the BIOS settings, but I'm not sure how they could affect the GPU and its functionality.

1 minute ago, ionedji said:

Sorry, I found out that a VM with GPU passthrough was started automatically, and this was the cause of the issue.

Yes, I've now had time to look and saw this:

03:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP104 [GeForce GTX 1080] [10de:1b80] (rev a1)
	Subsystem: ASUSTeK Computer Inc. GP104 [GeForce GTX 1080] [1043:8592]
	Kernel driver in use: vfio-pci
	Kernel modules: nvidia_drm, nvidia

As you can see, the GTX 1080 is bound to VFIO (Kernel driver in use: vfio-pci), so the Nvidia driver can't use it.
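Reading the "Kernel driver in use:" line, as done above, is easy to script when a system has several GPUs. A minimal Python sketch, assuming the text layout of `lspci -nnk` output shown in this post (device line at column 0, tab-indented details below it); `drivers_in_use` is a hypothetical helper that only parses text you feed it.

```python
from typing import Dict, Optional


def drivers_in_use(lspci_output: str) -> Dict[str, str]:
    """Map each PCI device header line to its 'Kernel driver in use' value."""
    result: Dict[str, str] = {}
    current: Optional[str] = None
    for line in lspci_output.splitlines():
        if line and not line[0].isspace():
            current = line.strip()  # new device header, e.g. "03:00.0 VGA ..."
        elif current and line.strip().startswith("Kernel driver in use:"):
            # keep only the text after the first colon, e.g. "vfio-pci"
            result[current] = line.split(":", 1)[1].strip()
    return result


# Output copied from the post above
sample = (
    "03:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP104 "
    "[GeForce GTX 1080] [10de:1b80] (rev a1)\n"
    "\tSubsystem: ASUSTeK Computer Inc. GP104 [GeForce GTX 1080] [1043:8592]\n"
    "\tKernel driver in use: vfio-pci\n"
    "\tKernel modules: nvidia_drm, nvidia\n"
)

for device, driver in drivers_in_use(sample).items():
    if driver == "vfio-pci":
        print(f"bound to VFIO, not usable by the Nvidia driver: {device}")
```

Devices without a "Kernel driver in use:" line (no driver bound) simply do not appear in the result.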

 

Please also note that your GT630 isn't supported by the newer drivers anymore.

 

Do you want to use the GTX1080 or the GT630 in a VM or for containers?

32 minutes ago, ich777 said:


Do you want to use the GTX1080 or the GT630 in a VM or for containers?

 

I'm using the 1080 for Docker containers; the GT630 is used as the main GPU for Unraid, so nothing to worry about there. Thank you for your support, and sorry for the disturbance :)


The most recent update (v510.60.02) broke my GPU functionality (GeForce GTX 1070): Unraid could no longer see the GPU after the update, and I had to roll back to v510.54. If there is anything anyone wants me to attach for diagnostics, please let me know.

For now I am going to stay off the latest channel until the next update.

 

Thanks!

Edited by BourbonCoffee
1 hour ago, BourbonCoffee said:

If there is anything anyone wants me to attach for diagnostics, please let me know. 

Yes, please attach the Diagnostics; otherwise I can't diagnose anything, because I don't know which Unraid version you are on, what the driver itself is doing, and so on…

But keep in mind that I need the Diagnostics with the driver version that doesn't work.

Please select the non-working driver, click the Download button on the Nvidia Driver plugin page, wait for it to finish, then reboot and pull the Diagnostics.


I just had to replace my flash drive and everything is back up and running, but I noticed that an issue reported earlier is apparently still occurring, and the message now appears twice.

 

Quote

On 10/7/2021 at 2:12 PM, N385 said:

Warning: commands will be executed using /bin/sh and it sits on that for about 60 seconds

 

@ich777's reply:

 

Quote

This can be ignored and will be fixed in an upcoming version from the plugin, the ~60 seconds are actually the time it takes to install the plugin.

 

The attached picture of my normal boot now shows 2 of those messages:

 

EndofBootErrorsAnimNAS1.jpg

 

I fixed the /var/temp/go issue (ZFS plugin related), but I'm still trying to find what causes the 'logger: send message failed: Bad file descriptor' message. Any thoughts on the job 3/4 messages?

 

Edited by AgentXXL
On 4/6/2022 at 9:56 PM, ich777 said:

Please attach your Diagnostics.

 

I will grab a clean set after my next reboot, but I prefer to send them directly rather than post them publicly. I do use the anonymize function, but having looked through them, there's still information captured that I'd prefer not to make public. That said, my question is mostly about the job 3/job 4 messages, which seem very similar to the message reported by N385 last year. Has the issue that caused those messages been fixed?

I find no evidence of those messages in syslog, and haven't found any logs specific to the Nvidia plugin that I can look through. I understand wanting a full set of diagnostics, but it would be nice to get tips from more advanced users like yourself so that we can do our own troubleshooting.

Edited by AgentXXL
1 hour ago, AgentXXL said:

I prefer to send them direct vs public posting.

You can always send them directly to me...

 

1 hour ago, AgentXXL said:

That said, it's more about the job 3/job 4 messages which seem to be very similar to the message reported by N385 last year. Has the issue that caused those messages been fixed?

This is not really an issue at all, and I don't think the messages are caused by the Nvidia plugin; I rather think they come from another plugin. I could be wrong about that, though: maybe I introduced such a message myself with the plugin update that added the automatic update check.

 

1 hour ago, AgentXXL said:

but it would be nice to get tips from more advanced users like yourself so that we can do our own troubleshooting.

As said above, these are not really issues at all. I don't think you have any trouble running the plugin, or that something is causing instability, or am I wrong?

This is mostly because I have to run a background process that checks my GitHub repo every day to see whether a newer Nvidia driver is available.
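At its core, such a daily check compares the installed driver version with the newest published one. The plugin's actual script isn't shown in this thread, so this is only a minimal Python sketch under assumptions: versions are plain dotted numbers like the v510.54 and v510.60.02 mentioned earlier in the thread, fetching the list from GitHub is left out, and `parse_version` and `newer_driver` are hypothetical names. Comparing numeric tuples avoids the string-comparison trap where '470.103.01' sorts before '470.86'.

```python
from typing import Iterable, Optional, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn 'v510.60.02' or '510.60.02' into (510, 60, 2)."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def newer_driver(installed: str, available: Iterable[str]) -> Optional[str]:
    """Return the highest available version if it is newer than installed."""
    candidates = list(available)
    if not candidates:
        return None  # nothing published, nothing to update to
    latest = max(candidates, key=parse_version)  # numeric, not string, max
    return latest if parse_version(latest) > parse_version(installed) else None
```

For example, `newer_driver("510.54", ["510.54", "510.60.02"])` returns `"510.60.02"`, while with v510.60.02 installed it returns `None`.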

 

Another background process runs for the Plugin Update Helper (not sure if you have already seen it in action), which is always running regardless of which driver plugin (Nvidia, DVB, ZFS, USBIP, ...) you have installed.

These are possibly the two things causing the messages.

But as said above, these messages will not harm your system in any way. I'll have to look into whether I can hide them, but I'm really not entirely sure...

14 hours ago, ich777 said:


But as said above these messages will not harm your system in any way, have to look into it if I can hide those messages, but I'm really not entirely sure...

 

Thank you... I'm not seeing any issues that I can trace to the job 3/job 4 messages, but my OCD prefers to see a clean boot. I'm doing some minor hardware maintenance today, so I'll capture a fresh set of diagnostics and send them to you. It's odd that I can't see anything in syslog that mirrors those messages, and the same goes for the 'logger' message that I'm also trying to track down.

I haven't seen anything that would point at the Plugin Update Helper, but I'm using both your Nvidia and ZFS plugins. If you need me to check anything on my system, let me know.

It's obviously not urgent, but hopefully we can track down the source of these messages. Even if there's no simple way to prevent them, knowing what actually causes them will help satisfy my OCD.
20 hours ago, AgentXXL said:

I haven't seen anything that would point at the Plugin Update Helper, but I'm using both your Nvidia and ZFS plugins. If you need me to check anything on my system, let me know.

 

@ich777 I just had a small realization about why there are two such messages: I'm now using two of your plugins, Nvidia and ZFS. That makes it likely that the same code is in both plugins, and as such it could be related to the Plugin Update Helper if both use it. Note that the messages go away if I boot without those plugins installed, so it certainly looks like they're the culprits.

I'll be sending you my fresh diagnostics from yesterday shortly. I may have traced the logger message to my use of a remote syslog server: the rsyslog functions are initialized before the network is fully up, so they fail with a few errors during boot. That's a native Unraid issue, so I'll follow up with a bug report for them.

Regardless, thank you again for your plugins! I'm very pleased with the functionality of both of the ones I'm using.
