[Plugin] Nvidia-Driver


ich777


5 hours ago, ich777 said:

Are you sure that you've followed the directions from the Nvidia Driver plugin page for the Open Source Kernel Module?

I don't see the folder /boot/config/modprobe.d and the nvidia.config file is also missing.

 

If you really want to use the open source driver, you have to follow the directions on the Nvidia Driver plugin page for the Open Source Kernel Module.

 

The next thing is that you've bound your P4 to VFIO, and that won't work either; you have to unbind it from VFIO so that it can show up on the host and therefore in the plugin:

0b:00.0 3D controller [0302]: NVIDIA Corporation GP104GL [Tesla P4] [10de:1bb3] (rev a1)
    Subsystem: NVIDIA Corporation GP104GL [Tesla P4] [10de:11d8]
    Kernel driver in use: vfio-pci
    Kernel modules: nvidia_drm, nvidia
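For example, after unbinding the P4 (Tools > System Devices in the webGUI) and rebooting, a quick check like the following should no longer show vfio-pci as the driver in use (a sketch only; the PCI address is taken from the output above):

# Show which kernel driver currently claims the Tesla P4 (slot 0b:00.0 from above).
# While bound to VFIO this reads "Kernel driver in use: vfio-pci"; after
# unbinding and rebooting it should read "nvidia" once the driver has loaded.
lspci -nnk -s 0b:00.0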

 

And lastly, the NVS 310 isn't supported by the Open Source Kernel Module or any other available driver package, because it is simply too old and needs the legacy 390 driver series. That series isn't available through the plugin because there wouldn't be much benefit: you simply can't use the card in Docker containers.

As you can see here, the driver also told you that your card is not supported by the Open Source Kernel Module:

Jan 18 12:40:06 Executor kernel: NVRM: The NVIDIA GPU 0000:0a:00.0 (PCI ID: 10de:107d)
Jan 18 12:40:06 Executor kernel: NVRM: installed in this system is not supported by open
Jan 18 12:40:06 Executor kernel: NVRM: nvidia.ko because it does not include the required GPU
Jan 18 12:40:06 Executor kernel: NVRM: System Processor (GSP).

This is the card the driver is referring to:

0a:00.0 VGA compatible controller [0300]: NVIDIA Corporation GF119 [NVS 310] [10de:107d] (rev a1)
	Subsystem: Hewlett-Packard Company GF119 [NVS 310] [103c:1154]
	Kernel modules: nvidia_drm, nvidia

 

 

Please don't forget that the Open Source Kernel Module will only work if you create the file with the content described on the Nvidia Driver plugin page and reboot afterwards, but I would recommend that you stick to the latest available closed source driver.
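As a rough sketch of that procedure (the exact file name and option line(s) must be taken from the plugin page; they are only placeholders here):

# Sketch only: get the exact file name and content from the Nvidia Driver
# plugin page for the Open Source Kernel Module.
mkdir -p /boot/config/modprobe.d
nano /boot/config/modprobe.d/nvidia.conf   # add the line(s) from the plugin page; no .txt extension!
# reboot afterwards so the open source kernel module is picked up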

Thanks! I did as you asked. I had left the .txt extension on the nvidia file and fixed it. I also removed the VFIO mapping. Still the same. As for the NVS, I have it as a basic display card for the machine, a leftover from my T320.

 

I just want to use the P4 with a VM. Is that going to be possible or am I wasting my time? That is the only reason I am trying the open source driver. If I am going about this wrong, please let me know. I am using a Dell Precision T6710, so it doesn't have onboard video. I was told that if I pass through a GPU I could use it for the Unraid output to a monitor. That is why I am only using single-slot cards; I have no spare slots. Please correct me on this. I just want a VM to do video encoding for videos I work on.

 

I really appreciate your assistance.  I uploaded the new diags. 

 

 

 

 

executor-diagnostics-20230118-2254.zip

13 minutes ago, Trackpads said:

I just want to use the P4 with a VM.  Is that going to be possible or am I wasting my time?

Maybe, as a note from page 1:

 

[screenshot of the note from page 1 attached]

 

So as you are using two cards, this plugin could make sense for card 1 (the NVS 310) in Docker container(s), if I see this correctly, but it's useless for your VM project, if that's the question.

 

Although the NVS 310 is very old ...

3 hours ago, wkipling said:

Is there anything I can do to troubleshoot/reset?

You don't have to reset anything.

 

The only indication that I see is this in your syslog:

Jan 19 10:41:59 Rack720XD kernel: NVRM: GPU 0000:43:00.0: RmInitAdapter failed! (0x31:0xffff:2476)
Jan 19 10:41:59 Rack720XD kernel: NVRM: GPU 0000:43:00.0: rm_init_adapter failed, device minor number 0
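If you want to dig further yourself, something like this lists all NVRM messages from the Nvidia kernel driver in the current syslog (a simple sketch; /var/log/syslog is the standard Unraid log location):

# list all Nvidia kernel driver (NVRM) messages from the running syslog
grep -i nvrm /var/log/syslog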

 

What Plex container are you using? Did you already try to fully shut down the server and start it again?

1 hour ago, Trackpads said:

That is the only reason I am trying the open source driver.

But why the open source one?

The Tesla P4 isn't supported by the Open Source Kernel Module...

 

Why don't you use the latest driver? Anyway, as @alturismo said, you don't need this plugin if you want to use the card in a VM, as mentioned on the first page.

9 hours ago, ich777 said:

You don't have to reset anything.

 

The only indication that I see is this in your syslog:

Jan 19 10:41:59 Rack720XD kernel: NVRM: GPU 0000:43:00.0: RmInitAdapter failed! (0x31:0xffff:2476)
Jan 19 10:41:59 Rack720XD kernel: NVRM: GPU 0000:43:00.0: rm_init_adapter failed, device minor number 0

 

What Plex container are you using? Did you already try to fully shut down the server and start it again?

Thanks for your quick reply.

 

I have done multiple reboots and tried reseating the card in the same slot.

I am using binhex plex pass. It had been working fine until a couple of days ago. One morning I saw Plex pinning some cores; it didn't appear to be doing any scheduled tasks, so I tried to restart the Docker container, which failed. Upon eventual reboot I got "no devices found".
Interestingly, when this issue first occurred, a reboot fixed it, but not the second time.

 

I'll look into that syslog message. Thanks again for your time and hard work.

9 hours ago, ich777 said:

But why the open source one?

The Tesla P4 isn't supported by the Open Source Kernel Module...

 

Why don't you use the latest driver? Anyway, as @alturismo said, you don't need this plugin if you want to use the card in a VM, as mentioned on the first page.

Well, that settles it then. I removed the P4 since I can't get it to work in the VM; there is no way to export the vBIOS. I ordered a Titan and will use it instead. Thanks again for the patience and help!

1 hour ago, wkipling said:

Interestingly, when this issue first occurred, a reboot fixed it, but not the second time.

Check whether Above 4G Decoding and Resizable BAR Support are enabled in your BIOS.

I would also recommend that you boot in legacy mode, since this can solve a lot of issues with the Nvidia drivers too.

 

1 hour ago, wkipling said:

I am using binhex plex pass.

Maybe also try the official one from the CA App, but if you didn't see the card on the plugin page, there seems to be another cause for this issue.

 

Can you test the card in a desktop computer: install the drivers, put a 3D load (FurMark or something like that) on it for about 30 minutes, and see if everything is working?

 

I know it maybe sounds stupid because the card was working before, but can you also check whether the power supply is the cause of this issue? The driver seems to load fine, but after you put a load on it (a transcode in your case) the message from my previous post appeared in your syslog.


Hi all

 

I had the plugin and drivers working a while back, but I upgraded to 6.11.5 and didn't bother to reinstall the drivers.

In a state of temporary mental derangement, I clicked "Install latest driver" from the plugin page today and waited until I got the reboot message. 

Unfortunately, my system now won't boot anymore. It gets as far as "Loading /bzroot...", but then I only get a blinking cursor on an otherwise empty screen.

What did I do wrong? How can I revert to the state before, where at least my Unraid was running smoothly?

 

Thank you and kind regards,

SnakeZZ

5 hours ago, SnakeZZ said:

How can I revert to the state before, where at least my Unraid was running smoothly?

First, start in safe mode (no plugins).

 

Uninstall the plugin, then remove the driver from the stick:

 

rm -R /boot/config/plugins/nvidia-driver

 

Reboot; now you are running without the driver... this will completely remove all of it.

 

Then you can start from scratch and install the plugin, then install your previously installed driver (stable?) ...

8 hours ago, SnakeZZ said:

In a state of temporary mental derangement

I also have these moments... :D

 

...but in all seriousness, the plugin, or rather the driver, should not prevent your server from booting.

I have a suspicion that we may be dealing with a bad boot drive here, or another issue related to the boot device.

 

As @alturismo said, please remove the plugin manually. If you do this from another machine (because your server obviously isn't booting), delete this file too:

/config/plugins/nvidia-driver.plg
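A rough sketch, assuming the stick is mounted at /mnt/usb on the other machine (the mount point is just an example):

# /mnt/usb is an example mount point for the Unraid USB stick on another Linux machine
rm -r /mnt/usb/config/plugins/nvidia-driver      # the plugin folder (driver packages and settings)
rm -f /mnt/usb/config/plugins/nvidia-driver.plg  # the plugin file itself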

 

8 hours ago, SnakeZZ said:

It gets as far as "Loading /bzroot...", but then I only get a blinking cursor on an otherwise empty screen.

Does it go immediately to a blinking cursor and display nothing?

On what machine are you running Unraid?

45 minutes ago, ich777 said:

Does it go immediately to a blinking cursor and display nothing?

On what machine are you running Unraid?

 

Deleting the plugin folder and .cfg didn't help.

It goes directly to a blank screen/blinking cursor after "Loading /bzroot".

 

So... I guess something else is wrong with the stick... how do I recover from that?

I've been running Unraid on a Supermicro CSE-848 for nearly 3 years now.

The boot stick is a Kingston DataTraveler 16 GB, if that makes any difference.

 

1 minute ago, SnakeZZ said:

So... I guess something else is wrong with the stick... how do I recover from that?

  • Did you enable Flash Backup in the My Servers plugin? If yes, you can restore your flash backup like this: Click
     
  • Do you have a backup of your USB Boot device? If yes, you can simply create a new USB Boot device with the USB Creator Tool for Unraid and replace the whole "config" folder on the new USB Boot device with the one from your backup.
     
  • You can also try to download the latest release (6.11.5 at the time of writing) from here and replace only the bz* files in the root of the USB Boot device with the ones from the downloaded archive (see the sketch below). Maybe this will fix the issue, but I would recommend that you buy a new USB Boot device since it is likely that something corrupted the files.
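A rough sketch of that bz* replacement, assuming the downloaded release is extracted to ~/unraid-6.11.5 and the stick is mounted at /mnt/usb (both paths are just examples):

# copy only the bz* files from the extracted release to the root of the stick
cp ~/unraid-6.11.5/bz* /mnt/usb/
sync
# compare checksums of the source and copied files to rule out a bad copy or a failing flash drive
sha256sum ~/unraid-6.11.5/bz* /mnt/usb/bz*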

 

Did you by any chance change anything in the BIOS or in the config from your USB Boot device?

 

I would also recommend that you go through this thread for a new USB Boot device (I personally recommend the Transcend JetFlash 600 USB 2.0 32GB as a flash drive).

3 hours ago, ich777 said:
  • Do you have a backup of your USB Boot device? If yes, you can simply create a new USB Boot device with the USB Creator Tool for Unraid and replace the whole "config" folder on the new USB Boot device with the one from your backup.
     
  • You can also try to download the latest release (6.11.5 at the time of writing) from here and replace only the bz* files in the root of the USB Boot device with the ones from the downloaded archive. Maybe this will fix the issue, but I would recommend that you buy a new USB Boot device since it is likely that something corrupted the files.

 

Did you by any chance change anything in the BIOS or in the config from your USB Boot device?

 

I had a second USB Stick lying around (bought 2 back in the day), so I used the USB Flash Creator and manually copied back the config folder.

Unfortunately, I now get the same issue from that stick. Stuck on the blinking cursor again.

 

Now I am a bit stuck... can my HW be faulty? It was running fine before.

I have ECC RAM, so I suppose RAM shouldn't be an issue.

 

Man.... this s*cks.

23 minutes ago, SnakeZZ said:

Now I am a bit stuck... can my HW be faulty? It was running fine before.

I have ECC RAM, so I suppose RAM shouldn't be an issue.

I would recommend that you create a post on the General Support sub-forum since I'm not too familiar with Supermicro boards.

 

I really can't think of a reason why the Nvidia Driver plugin should cause such an issue.

 

Were no changes made to the hardware or software? Does the motherboard have an onboard video card to which it maybe outputs the console?

28 minutes ago, ich777 said:

I really can't think of a reason why the Nvidia Driver plugin should cause such an issue.

 

Me neither... But thank the Flying Spaghetti Monster and the Invisible Pink Unicorn, I just got it running again on the old stick.

I had a lot of devices in the boot sequence in BIOS and threw out everything but the USB stick.

That solved it. Very strange.

 

Thank you very, very much for your help... I posted a similar post in the General forum before posting here, but got no replies at all 😕

 


Hello, my friend. I've been troubleshooting an issue now for over a week, and I could use some of your expertise in an area where I'm well over my head. I posted a thread last week in the General sub-forum, but haven't had much response.

 

I recently upgraded my server with a new 5700G and B550 motherboard. The server retains the Quadro P400. The goal is to use the AMD iGPU as the Unraid console (to use KVM) and the P400 for transcoding in Unraid Docker containers.

 

The issue is I cannot get the iGPU to display the Unraid login screen when booting into "Unraid OS GUI Mode" with the P400 and the Nvidia plugin installed. With a monitor attached to the iGPU, from a power-on boot I can watch the boot sequence on the display: the initial Unraid boot text, the GRUB menu, then more Unraid text. It displays "Starting Samba" for just over a minute, then the display is blanked.

 

In my current configuration, if I boot into GUI Safe Mode, the display on the iGPU boots into the Unraid login and works as I would expect and desire.

 

I have tried so many different combinations of things: UEFI vs. legacy boot, uninstalling/reinstalling both the Nvidia and Radeon_Top plugins, removing the P400, plugging an HDMI dummy plug into the P400 (which was needed in an earlier non-iGPU setup to boot), and more. Each time I (usually) start from a power-off boot. I have so many hours of troubleshooting now that I have lost track. What is common is that if I remove the Nvidia card from the situation (physically or via Safe Mode), the iGPU displays fine. It seems that Unraid switches the console from the iGPU to the P400 at some point.

 

Any input you can offer is greatly appreciated. The attached diags are from the current/latest configuration in a standard GUI boot-up.

malta-tower-diagnostics-20230124-0845.zip

6 minutes ago, ConnerVT said:

The issue is I cannot get the iGPU to display the Unraid login screen when booting into "Unraid OS GUI Mode" with the P400 and the Nvidia plugin installed.

Please run this command from an Unraid terminal:

sed -i "/disable_xconfig=/c\disable_xconfig=true" /boot/config/plugins/nvidia-driver/settings.cfg

and reboot after that.
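If you want to double-check that the setting was written before you reboot, something like this should now show disable_xconfig=true:

# quick sanity check of the plugin settings file
grep disable_xconfig /boot/config/plugins/nvidia-driver/settings.cfg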


WOO HOO! You are my hero! This is the first time I've seen this 100% correct since I started on it a week ago Sunday.

 

I had at one point added "disable_xconfig=true" to the config file. It ended up displaying output that was squished down and extended too wide for the monitor (see picture). I had found this in a bug report thread for 6.9. I then had so many issues undoing it (setting it to 'false', removing it entirely) that I ended up rebuilding my flash from an earlier backup. (That's a whole other frustrating story.) So I decided not to go down that path again.

 

So thank you, once again, for your help.  Not just for this, but all you do for the Unraid community!

squish.jpg


A quick follow-up. When I tried "disable_xconfig=true" earlier, I may have been booting with UEFI. One notable difference I saw on my machine is that the console (and the GRUB menu) had smaller text. In hindsight, this may be why I had the funny-looking resolution when I tried this config entry before. Something to think about.

