[Plugin] Nvidia-Driver


ich777


44 minutes ago, alturismo said:

Does it work without the driver installed?

 

I just wonder how these things are supposed to be related... assuming you mean the Dell MM iDRAC.

Yes, it does. This is what I get with the driver installed though:

 

[Attached screenshot: image.thumb.png.85511de3ba4d4bd16c437dbdbed17fa2.png]

 

Edit: Console mode works fine, of course, but sometimes you want GUI mode.

Edited by WenzelComputing
Link to comment
On 2/15/2023 at 12:27 PM, ich777 said:

You are talking about the driver, correct, not the firmware...?

 

Does the server freeze after you click Update & Download, or after you click Reboot?

Are you really sure that the issue is actually the Nvidia Driver plugin? Do you maybe have an SSH session open somewhere? That will also prevent the reboot most of the time.

 

Have you tried not installing the driver and rebooting the server, just to double-check whether the server reboots?

From a technical standpoint, nothing is different after clicking Update & Download, since it only downloads the driver and nothing more. The driver is actually installed when the server boots.

Hey ich - following up with you. I've removed the card and it (Unraid) has been stable. The Nvidia M60 is powered by an 8-pin CPU cable, and I don't think the stock 8-pin CPU cable will suffice, since the official adapter takes 2 PCI-E cables and 'must carry 200+W....'.

TL;DR - I don't think it's a plugin issue. I'll follow up again once I get the correct adapter. I can only surmise that the card worked previously with the stock 8-pin CPU cable because my previous build had 7 drives and my latest build has double that. As always, thank you for your continued support and prompt responses!

 

Cheers!

  • Like 1
Link to comment

Hi,

I am running Unraid inside a VM with Proxmox as the host. I have a vGPU setup with an Nvidia card, and I need to install the GRID guest drivers on Unraid to get support for the vGPU attached to it. I have already read that this might be outside the scope of this project, but I would like to know how you set up your build environment to get the kernel modules to compile. I googled for a while and still couldn't get it working (the Nvidia kernel modules would not compile properly, and my build environment is not set up properly anyway).

Any help or pointers would be appreciated.

Edited by midi
Link to comment
21 minutes ago, midi said:

I have already read that this might be outside the scope of this project, but I would like to know how you set up your build environment to get the kernel modules to compile.

I compile everything in a custom-made Docker container based on Slackware, but you can even compile it on Unraid itself if you install all the necessary packages.

 

You can even cross-compile the kernel on another distribution, but make sure that you actually use the .config file from Unraid and that you compile the same kernel version that your Unraid VM is running.
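
As a minimal sketch of the version check described above (not the actual build scripts used for the plugin), the following compares the running kernel release with the version of a kernel source tree. The function names and the source path in the usage comment are hypothetical; `uname -r` and the kernel Makefile's `kernelversion` target are the only external pieces relied on.

```shell
#!/bin/sh
# Sketch: confirm that a kernel source tree matches the running Unraid kernel
# before building modules against it. Function names are illustrative only.

# Strip the local version suffix, e.g. "5.19.17-Unraid" -> "5.19.17".
kernel_base_version() {
    printf '%s\n' "${1%%-*}"
}

# Succeeds only if both release strings share the same base version.
kernel_versions_match() {
    [ "$(kernel_base_version "$1")" = "$(kernel_base_version "$2")" ]
}

# Assumed usage (the source path is a placeholder):
#   running="$(uname -r)"
#   source_ver="$(make -s -C /path/to/linux-src kernelversion)"
#   kernel_versions_match "$running" "$source_ver" || echo "Kernel version mismatch" >&2
```

The check only covers the version number; the .config copied from Unraid still has to be placed in the source tree separately.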

 

I really don't know how the GRID driver for vGPUs and everything around it works, but from what I have read it is really complicated. One user tried to do the same thing you want to do; he ultimately gave up and switched to Ubuntu because he didn't need the special array-type filesystem that Unraid offers.

 

Also keep in mind that Unraid runs from RAM, so you have to install the driver and binaries again each time you reboot the system. The next thing to consider is that when a new Unraid version drops, you have to recompile everything.

 

There is no plugin available for Unraid because redistributing the driver for the GRID vGPUs is not allowed; doing so would ultimately violate the EULA.

 

This post in the docs is a bit outdated but should be a good starting point: Click

Here is also a good post on how to set everything up and compile the Kernel for Unraid: Click

(but please keep in mind that the build process will change slightly over time due to advancements and improvements in Unraid - it's sometimes hard even for me to keep up... :D )

  • Like 2
Link to comment
15 minutes ago, stranford said:

It shows up under System Devices, not bound to vfio. Any suggestions? 

Please make sure that you are on the latest BIOS version for your motherboard and that you've enabled Resizable BAR support/Above 4G Decoding in your BIOS.

Feb 24 18:20:06 Tower kernel: nvidia 0000:01:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=io+mem
Feb 24 18:20:06 Tower kernel: NVRM: loading NVIDIA UNIX x86_64 Kernel Module  525.89.02  Wed Feb  1 23:23:25 UTC 2023
Feb 24 18:23:49 Tower kernel: resource sanity check: requesting [mem 0x000e0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000d4000-0x000e7fff window]
Feb 24 18:23:49 Tower kernel: caller _nv036385rm+0x2a/0x60 [nvidia] mapping multiple BARs

 

From what I see, you are using an Intel board, and some people have had really bad experiences with those and Nvidia cards on Linux.

 

If the above doesn't work, try booting in UEFI mode and see if that makes a difference.
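
To check whether the BAR-related messages quoted above still appear after a BIOS change, a small filter like the following could be used. This is only a sketch: the function name is made up for illustration, and the two patterns are taken directly from the log lines in this post.

```shell
#!/bin/sh
# Sketch: pick out the BAR-mapping complaints from a kernel log.
# Reads log text on stdin and prints only the matching lines.
# The function name is hypothetical; the patterns come from the log above.
find_bar_errors() {
    grep -E 'resource sanity check|mapping multiple BARs'
}

# Assumed usage on a live system:
#   dmesg | find_bar_errors
```

If the filter prints nothing after the change, the messages are gone from the log that was checked, although that alone does not prove the card is working.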

Link to comment
1 hour ago, ich777 said:

Please make sure that you are on the latest BIOS version for your motherboard and that you've enabled Resizable BAR support/Above 4G Decoding in your BIOS.

Feb 24 18:20:06 Tower kernel: nvidia 0000:01:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=io+mem
Feb 24 18:20:06 Tower kernel: NVRM: loading NVIDIA UNIX x86_64 Kernel Module  525.89.02  Wed Feb  1 23:23:25 UTC 2023
Feb 24 18:23:49 Tower kernel: resource sanity check: requesting [mem 0x000e0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000d4000-0x000e7fff window]
Feb 24 18:23:49 Tower kernel: caller _nv036385rm+0x2a/0x60 [nvidia] mapping multiple BARs

 

From what I see, you are using an Intel board, and some people have had really bad experiences with those and Nvidia cards on Linux.

 

If the above doesn't work, try booting in UEFI mode and see if that makes a difference.

 

Thanks, yeah, it's an Intel S1200BT. I updated the BIOS and still don't have a Resizable BAR option. It might just be time for an upgrade.

Link to comment
22 hours ago, stranford said:

Thanks, yeah, it's an Intel S1200BT. I updated the BIOS and still don't have a Resizable BAR option. It might just be time for an upgrade.

Try booting in UEFI mode, but I'm not entirely sure whether it will work after you do so...

I remember a few users with Intel boards who have had issues so far.

Link to comment
5 minutes ago, i_max said:

I tried searching through the thread, but maybe I missed it: is there any way to enable hardware transcoding for Photoprism?

Please see the second post in this thread. If Photoprism supports it, it works by adding the variables and the runtime in the Extra Parameters, just as you would for Emby, Jellyfin or Plex.

Link to comment
10 minutes ago, ich777 said:

Please see the second post in this thread. If Photoprism supports it, it works by adding the variables and the runtime in the Extra Parameters, just as you would for Emby, Jellyfin or Plex.

 

So I did try adding the Nvidia runtime parameter with a semicolon after the --restart=unless-stopped parameter, which crashed the container. I did add all the other parameters from here -

 

https://docs.photoprism.app/getting-started/advanced/transcoding/

 

Nvidia hardware transcoding is supported by Photoprism. I'm still seeing NVENC transcoding failures in the logs.

Edited by i_max
Link to comment
Just now, i_max said:

So I did try adding the Nvidia runtime parameter with a semicolon after the --restart=unless-stopped parameter, which crashed the container. I did add all the other parameters from here -

That is really not much information to go on... What exactly crashes?

Have you also added the other variables, as mentioned in the second post?

 

I really can't help with that little information...

 

If Photoprism does support it then it will work with the plugin for sure.

Link to comment
2 minutes ago, ich777 said:

That is really not much information to go on... What exactly crashes?

Have you also added the other variables, as mentioned in the second post?

 

I really can't help with that little information...

 

If Photoprism does support it then it will work with the plugin for sure.

 

Sorry, I didn't mean to. Actually, it just started working. I was seeing in the Photoprism logs that the encoding was failing. I removed the restart parameter and used only the runtime parameter, which seems to have done the trick.

 

How do I add a second extra parameter to the container, so I can re-add the --restart=unless-stopped parameter? When I added the Nvidia runtime parameter with the semicolon, the container crashed while trying to spin up. Here are the other variables I had added.

 

  -e 'PHOTOPRISM_FFMPEG_ENCODER'='nvidia'
  -e 'PHOTOPRISM_INIT'='tensorflow'
  -e 'NVIDIA_VISIBLE_DEVICES'='GPU-UUID'
  -e 'NVIDIA_DRIVER_CAPABILITIES'='all'

 

In case you want to add this to the beginning of the thread for anyone else trying this.

 

Link to comment
14 minutes ago, i_max said:

I removed the restart parameter and used only the runtime parameter, which seems to have done the trick.

Then you just missed the space or something like that; the --restart=unless-stopped parameter does not harm anything...

 

You have to separate the parameters with a space, not a semicolon.
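
To illustrate the point about the separator, here is a minimal sketch that builds the string for the Extra Parameters field. It assumes the runtime parameter being discussed is `--runtime=nvidia` (the posts above only call it the runtime parameter), and the variable names are made up for the example.

```shell
#!/bin/sh
# Sketch: options in the Extra Parameters field are separated by spaces,
# the same way they would be on a docker run command line.
# Assumption: "--runtime=nvidia" is the runtime parameter meant in this thread.
restart_opt="--restart=unless-stopped"
runtime_opt="--runtime=nvidia"

# Join with a single space (no semicolon between the options).
extra_params="$restart_opt $runtime_opt"
printf '%s\n' "$extra_params"
```

The printed line is what would go into the Extra Parameters field as a whole; the NVIDIA_VISIBLE_DEVICES and NVIDIA_DRIVER_CAPABILITIES values stay as separate container variables.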

  • Like 1
Link to comment

Hello all
I have already searched the forum and Google, unfortunately without success.
Perhaps someone can kindly help me or already has a solution for this problem.

 

I have the classic error:

NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
 

I have an Nvidia Quadro P4000

The driver is the v525.89.02

Unraid Version: 6.11.5 

 

And yes, the Quadro 4000 is supported by the driver and is listed.

Here is the list of system devices. The graphics card is listed and not bound.

Could someone please help me with this?

 

Thanks a lot!

MP

Spoiler

PCI Devices and IOMMU Groups

IOMMU group 0:[8086:3405] 00:00.0 Host bridge: Intel Corporation 5520/5500/X58 I/O Hub to ESI Port (rev 22)

IOMMU group 1:[8086:3408] 00:01.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 1 (rev 22)

IOMMU group 2:[8086:340a] 00:03.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 3 (rev 22)

IOMMU group 3:[8086:340e] 00:07.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 7 (rev 22)

IOMMU group 4:[8086:342e] 00:14.0 PIC: Intel Corporation 7500/5520/5500/X58 I/O Hub System Management Registers (rev 22)

IOMMU group 5:[8086:3422] 00:14.1 PIC: Intel Corporation 7500/5520/5500/X58 I/O Hub GPIO and Scratch Pad Registers (rev 22)

IOMMU group 6:[8086:3423] 00:14.2 PIC: Intel Corporation 7500/5520/5500/X58 I/O Hub Control Status and RAS Registers (rev 22)

IOMMU group 7:[8086:3a37] 00:1a.0 USB controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #4

Bus 003 Device 001 Port 3-0 ID 1d6b:0001 Linux Foundation 1.1 root hub

[8086:3a38] 00:1a.1 USB controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #5

Bus 004 Device 001 Port 4-0 ID 1d6b:0001 Linux Foundation 1.1 root hub

[8086:3a39] 00:1a.2 USB controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #6

Bus 005 Device 001 Port 5-0 ID 1d6b:0001 Linux Foundation 1.1 root hub

[8086:3a3c] 00:1a.7 USB controller: Intel Corporation 82801JI (ICH10 Family) USB2 EHCI Controller #2

Bus 001 Device 001 Port 1-0 ID 1d6b:0002 Linux Foundation 2.0 root hub

Bus 001 Device 002 Port 1-5 ID 090c:1000 Silicon Motion, Inc. - Taiwan (formerly Feiya Technology Corp.) Flash Drive

IOMMU group 8:[8086:3a3e] 00:1b.0 Audio device: Intel Corporation 82801JI (ICH10 Family) HD Audio Controller

IOMMU group 9:[8086:3a40] 00:1c.0 PCI bridge: Intel Corporation 82801JI (ICH10 Family) PCI Express Root Port 1

[8086:3a4a] 00:1c.5 PCI bridge: Intel Corporation 82801JI (ICH10 Family) PCI Express Root Port 6

[14e4:1681] 08:00.0 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5761 Gigabit Ethernet PCIe (rev 10)

IOMMU group 10:[8086:3a34] 00:1d.0 USB controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #1

Bus 006 Device 001 Port 6-0 ID 1d6b:0001 Linux Foundation 1.1 root hub

[8086:3a35] 00:1d.1 USB controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #2

Bus 007 Device 001 Port 7-0 ID 1d6b:0001 Linux Foundation 1.1 root hub

[8086:3a36] 00:1d.2 USB controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #3

Bus 008 Device 001 Port 8-0 ID 1d6b:0001 Linux Foundation 1.1 root hub

[8086:3a3a] 00:1d.7 USB controller: Intel Corporation 82801JI (ICH10 Family) USB2 EHCI Controller #1

Bus 002 Device 001 Port 2-0 ID 1d6b:0002 Linux Foundation 2.0 root hub

IOMMU group 11:[8086:244e] 00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 90)

IOMMU group 12:[8086:3a16] 00:1f.0 ISA bridge: Intel Corporation 82801JIR (ICH10R) LPC Interface Controller

[8086:3a22] 00:1f.2 SATA controller: Intel Corporation 82801JI (ICH10 Family) SATA AHCI Controller

[2:0:0:0] disk ATA WDC WD30EFRX-68E 0A82 /dev/sdf 3.00TB

[3:0:0:0] disk ATA WDC WD30EFRX-68E 0A82 /dev/sdg 3.00TB

[5:0:0:0] disk ATA WDC WD30EFRX-68E 0A82 /dev/sdh 3.00TB

[6:0:0:0] disk ATA WDC WD30EFRX-68E 0A82 /dev/sdi 3.00TB

[8086:3a30] 00:1f.3 SMBus: Intel Corporation 82801JI (ICH10 Family) SMBus Controller

IOMMU group 13:[12d8:2304] 01:00.0 PCI bridge: Pericom Semiconductor PI7C9X2G304 EL/SL PCIe2 3-Port/4-Lane Packet Switch (rev 05)

IOMMU group 14:[12d8:2304] 02:01.0 PCI bridge: Pericom Semiconductor PI7C9X2G304 EL/SL PCIe2 3-Port/4-Lane Packet Switch (rev 05)

IOMMU group 15:[12d8:2304] 02:02.0 PCI bridge: Pericom Semiconductor PI7C9X2G304 EL/SL PCIe2 3-Port/4-Lane Packet Switch (rev 05)

IOMMU group 16:[1b21:0612] 03:00.0 SATA controller: ASMedia Technology Inc. ASM1062 Serial ATA Controller (rev 02)

[8:0:0:0] disk ATA SAMSUNG MZ7LN256 300Q /dev/sdj 256GB

[9:0:0:0] disk ATA SAMSUNG MZ7LN256 300Q /dev/sdk 256GB

IOMMU group 17:[1b6f:7023] 04:00.0 USB controller: Etron Technology, Inc. EJ168 USB 3.0 Host Controller (rev 01)

Bus 010 Device 001 Port 10-0 ID 1d6b:0003 Linux Foundation 3.0 root hub

Bus 010 Device 002 Port 10-1 ID 152d:0567 JMicron Technology Corp. / JMicron USA Technology Corp. JMS567 SATA 6Gb/s bridge

Bus 009 Device 001 Port 9-0 ID 1d6b:0002 Linux Foundation 2.0 root hub

IOMMU group 18:[10de:06dd] 05:00.0 VGA compatible controller: NVIDIA Corporation GF100GL [Quadro 4000] (rev a3)

[10de:0be5] 05:00.1 Audio device: NVIDIA Corporation GF100 High Definition Audio Controller (rev a1)

IOMMU group 19:[8086:2c41] 3f:00.0 Host bridge: Intel Corporation Xeon 5500/Core i7 QuickPath Architecture Generic Non-Core Registers (rev 05)

[8086:2c01] 3f:00.1 Host bridge: Intel Corporation Xeon 5500/Core i7 QuickPath Architecture System Address Decoder (rev 05)

IOMMU group 20:[8086:2c10] 3f:02.0 Host bridge: Intel Corporation Xeon 5500/Core i7 QPI Link 0 (rev 05)

[8086:2c11] 3f:02.1 Host bridge: Intel Corporation Xeon 5500/Core i7 QPI Physical 0 (rev 05)

IOMMU group 21:[8086:2c18] 3f:03.0 Host bridge: Intel Corporation Xeon 5500/Core i7 Integrated Memory Controller (rev 05)

[8086:2c19] 3f:03.1 Host bridge: Intel Corporation Xeon 5500/Core i7 Integrated Memory Controller Target Address Decoder (rev 05)

[8086:2c1c] 3f:03.4 Host bridge: Intel Corporation Xeon 5500/Core i7 Integrated Memory Controller Test Registers (rev 05)

IOMMU group 22:[8086:2c20] 3f:04.0 Host bridge: Intel Corporation Xeon 5500/Core i7 Integrated Memory Controller Channel 0 Control Registers (rev 05)

[8086:2c21] 3f:04.1 Host bridge: Intel Corporation Xeon 5500/Core i7 Integrated Memory Controller Channel 0 Address Registers (rev 05)

[8086:2c22] 3f:04.2 Host bridge: Intel Corporation Xeon 5500/Core i7 Integrated Memory Controller Channel 0 Rank Registers (rev 05)

[8086:2c23] 3f:04.3 Host bridge: Intel Corporation Xeon 5500/Core i7 Integrated Memory Controller Channel 0 Thermal Control Registers (rev 05)

IOMMU group 23:[8086:2c28] 3f:05.0 Host bridge: Intel Corporation Xeon 5500/Core i7 Integrated Memory Controller Channel 1 Control Registers (rev 05)

[8086:2c29] 3f:05.1 Host bridge: Intel Corporation Xeon 5500/Core i7 Integrated Memory Controller Channel 1 Address Registers (rev 05)

[8086:2c2a] 3f:05.2 Host bridge: Intel Corporation Xeon 5500/Core i7 Integrated Memory Controller Channel 1 Rank Registers (rev 05)

[8086:2c2b] 3f:05.3 Host bridge: Intel Corporation Xeon 5500/Core i7 Integrated Memory Controller Channel 1 Thermal Control Registers (rev 05)

IOMMU group 24:[8086:2c30] 3f:06.0 Host bridge: Intel Corporation Xeon 5500/Core i7 Integrated Memory Controller Channel 2 Control Registers (rev 05)

[8086:2c31] 3f:06.1 Host bridge: Intel Corporation Xeon 5500/Core i7 Integrated Memory Controller Channel 2 Address Registers (rev 05)

[8086:2c32] 3f:06.2 Host bridge: Intel Corporation Xeon 5500/Core i7 Integrated Memory Controller Channel 2 Rank Registers (rev 05)

[8086:2c33] 3f:06.3 Host bridge: Intel Corporation Xeon 5500/Core i7 Integrated Memory Controller Channel 2 Thermal Control Registers (rev 05)

 

Edited by MasterPepe
Used the "spoiler" Feature
Link to comment
1 hour ago, MasterPepe said:

Hello all
I have already searched the forum and Google, unfortunately without success.
Perhaps someone can kindly help me or already has a solution for this problem.

 

I have the classic error:

NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
 

I have an Nvidia Quadro P4000

The driver is the v525.89.02

Unraid Version: 6.11.5 

 

And yes, the Quadro 4000 is supported by the driver and is listed.

Here is the list of system devices. The graphics card is listed and not bound.

Could someone please help me with this?

 

Thanks a lot!

MP


 

Are you passing the GPU through to a VM, or have you bound it to VFIO?
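
As a rough aid for this kind of check, the sketch below filters an lspci-style listing for NVIDIA devices by PCI vendor ID 10de, so the exact model string can be read off. The function name is hypothetical. Note that the listing in the post reports the card as GF100GL [Quadro 4000] rather than Quadro P4000, so it may be worth comparing the exact model against the driver's supported-products list.

```shell
#!/bin/sh
# Sketch: print only the NVIDIA devices (PCI vendor ID 10de) from an
# lspci-style listing on stdin. The function name is illustrative only.
nvidia_devices() {
    grep -i '\[10de:[0-9a-f]\{4\}\]'
}

# Assumed usage on a live system:
#   lspci -nn | nvidia_devices
# To see which kernel driver (for example nvidia or vfio-pci) is bound:
#   lspci -nnk -s 05:00.0
```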

Link to comment

I'm getting some crashes, presumably from the NVIDIA driver plugin in Unraid. I've already run a short memtest for about 24 hours, and I can run a longer one if required. A Pastebin of the entire crash is here. I did some research already and wasn't able to find much about this. My RAM is not ECC, which I know is problematic, but I'm hoping it's not a RAM issue.

Feb 19 21:53:17 Dragon kernel: BUG: unable to handle page fault for address: ffffc90011110200
Feb 19 21:53:17 Dragon kernel: #PF: supervisor write access in kernel mode
Feb 19 21:53:17 Dragon kernel: #PF: error_code(0x0002) - not-present page
Feb 19 21:53:17 Dragon kernel: PGD 100000067 P4D 100000067 PUD 100180067 PMD 0 
Feb 19 21:53:17 Dragon kernel: Oops: 0002 [#1] PREEMPT SMP NOPTI
Feb 19 21:53:17 Dragon kernel: CPU: 5 PID: 9577 Comm: nvidia-smi Tainted: P           O      5.19.17-Unraid #2
Feb 19 21:53:17 Dragon kernel: Hardware name: System manufacturer System Product Name/ROG STRIX B450-F GAMING II, BIOS 5003 02/03/2023

I am on the latest version of the plugin and have NVIDIA driver 525.89.02 installed for my 2070 Super GPU. I rarely use the GPU, and only for Tdarr. The diagnostics below are from right now, and the crash happened a week ago, so I'm not sure how reliable they will be.

dragon-diagnostics-20230227-2250.zip

Link to comment

Hi all, newbie here
  
I need some help with applying the Nvidia GRID license in Unraid, since I can't find any information about this.
  
My setup is Proxmox as the host, virtualizing Unraid with vGPU passthrough. I have successfully got everything set up. Unraid and the Nvidia plugin recognize the vGPU and can use it for transcoding.
  
However, because the vGPU is unlicensed, GPU performance will be limited to 15 frames per second, with CUDA degraded after 20 minutes. I followed the SUSE guide for licensing a vGPU in the VM guest (the ClientConfigTokenPath section), but with no luck. Is applying the license different in Unraid?

 

Any pointers would be appreciated!

 

root@UNRAID-VM:~# nvidia-smi -q | grep "License"
    vGPU Software Licensed Product
        License Status                    : Unlicensed

 

root@UNRAID-VM:~# nvidia-smi
Mon Feb 27 22:43:44 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.05    Driver Version: 525.85.05    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GRID P100-1Q        Off  | 00000000:06:10.0 Off |                    0 |
| N/A   N/A    P0    N/A /  N/A |      0MiB /  2048MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
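
For scripting the license check shown in the first command above, a small filter along these lines could be used. It is only a sketch: the function name is made up, and it assumes the "License Status" line keeps the layout shown in the output above.

```shell
#!/bin/sh
# Sketch: extract the vGPU license status from `nvidia-smi -q` output on stdin.
# Assumes the "License Status : <value>" layout shown in the output above.
# The function name is hypothetical.
license_status() {
    sed -n 's/^[[:space:]]*License Status[[:space:]]*:[[:space:]]*//p'
}

# Assumed usage inside the VM:
#   nvidia-smi -q | license_status
```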
Edited by Roland Vessalius
Added link and fix some text
Link to comment
2 hours ago, LimesKey said:

The diagnostics below are from right now, and the crash happened a week ago, so I'm not sure how reliable they will be.

First of all, I would strongly recommend that you remove the nvidia.conf file in the modprobe.d directory; you are not using the open source driver module...
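
A minimal sketch for locating that file before removing it is shown below. The function name is hypothetical, and the directory in the usage comment is an assumption about where the persistent modprobe.d directory lives on the flash drive; adjust it to match the actual system.

```shell
#!/bin/sh
# Sketch: print the path of nvidia.conf if it exists in the given
# modprobe.d directory, and return non-zero if it does not.
# The function name is illustrative only.
find_nvidia_conf() {
    conf="$1/nvidia.conf"
    [ -f "$conf" ] && printf '%s\n' "$conf"
}

# Assumed usage (the directory is an assumption, not confirmed in this thread):
#   find_nvidia_conf /boot/config/modprobe.d
```

Printing the path first, instead of deleting right away, leaves room to look at the file's contents before it is removed.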

 

Have you tried booting with CSM (Legacy) instead of UEFI mode? Please also make sure that you are on the latest BIOS version and that you've enabled Above 4G Decoding and Resizable BAR Support in the BIOS.

 

I would also try to switch from MACVLAN to IPVLAN in the Docker settings first.

 

Is this only happening with Tdarr? (If yes: IIRC it is nothing new that Tdarr can crash your server, but TBH I really don't know whether that has been fixed already.)

Have you tried disabling Tdarr to see whether this also happens with Emby/Jellyfin/Plex?

 

Can you test the card in another system (install the drivers and put some 3D load on it for about 10 minutes; something like FurMark should do the job just fine)?

Link to comment
16 minutes ago, Roland Vessalius said:

I need some help with applying the Nvidia GRID license in Unraid, since I can't find any information about this.

This is not (easily) possible because of Unraid's unique design (it runs from RAM), and you need the GRID guest driver, which is not available to the public. I can't create a plugin/package for it because distributing the GRID driver would ultimately violate Nvidia's EULA. I've explained it a bit more in depth here:

 

Link to comment
