[Plugin] Nvidia-Driver


ich777


3 hours ago, BrockTrainedAsh said:


I followed it... multiple times throughout this process. My setup should be correct. I should add that I once had it working, but then one day I noticed that my CPU was transcoding a file and realized that something had stopped working along the way.

I'm in the same boat. I have a GTX 1060 that was working great for transcoding. I recently noticed that my GPU Statistics said "N/A" for everything and the GPU doesn't work anymore. I'm getting a ton of those RmInitAdapter failed messages now, but I didn't change anything with my BIOS or HW. Not sure why this would just start all of a sudden and it seems like it's happening to multiple people.

5 hours ago, BrockTrainedAsh said:

My setup should be correct

But it is not.

 

Look at that:

[Screenshot]

Doing it the way you did, you end up passing -e twice, which is wrong.

 

It should say: NVIDIA_VISIBLE_DEVICES=<GPUUUID>

I also don't recommend using all, because it often causes issues. Decide which GPU you want to use and replace <GPUUUID> with its UUID.
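To find the UUID that replaces <GPUUUID>, `nvidia-smi -L` prints one line per GPU, ending in "(UUID: GPU-...)". A minimal sketch, assuming that output format; the helper name `list_gpu_uuids` is hypothetical:

```bash
#!/bin/bash
# Hypothetical helper: print the UUID of every GPU listed by `nvidia-smi -L`.
# Assumed input line format:
#   GPU 0: NVIDIA GeForce GTX 1060 6GB (UUID: GPU-xxxxxxxx-...)
list_gpu_uuids() {
  # Print only the "GPU-..." text between "(UUID: " and the closing ")".
  sed -n 's/.*(UUID: \(GPU-[^)]*\)).*/\1/p'
}
```

Usage: `nvidia-smi -L | list_gpu_uuids`, then paste the UUID of the card you want into NVIDIA_VISIBLE_DEVICES.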

 

The capabilities are also missing. Are you sure you've followed the second post in this thread? There is a step-by-step tutorial for Plex there.

On 10/21/2023 at 12:47 AM, ich777 said:

I don't think so, because you are facing a different issue (the initialization failed), but I can't say anything for certain without the Diagnostics.

Ah, I may have quoted the wrong post as I was reading through this. Diagnostics are attached. It's just odd that it was working fine and then randomly stopped.

 

 

tower-diagnostics-20231023-1206.zip

Screenshot 2023-10-23 121618.jpg

Screenshot 2023-10-23 121558.jpg

Edited by atlasalex
Adding screenshots
24 minutes ago, atlasalex said:

It's just odd that it was working fine and then randomly stopped.

Are you sure that the motherboard/power supply is up to the task?

 

It seems that your card dropped from the system and now fails to initialize:

Oct 20 15:10:26 Tower kernel: caller _nv038252rm+0x35/0x70 [nvidia] mapping multiple BARs
Oct 20 15:10:26 Tower kernel: NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x24:0x72:1428)
Oct 20 15:10:26 Tower kernel: NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
Oct 20 15:10:26 Tower kernel: nvidia-uvm: Loaded the UVM driver, major device number 243.
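To check whether these failures keep recurring after a reboot, you can filter the log for the NVRM messages shown above. A minimal sketch, assuming the live log is /var/log/syslog; the helper name `nvrm_failures` is hypothetical, and "fallen off the bus" is included as another common sign of a card dropping out:

```bash
#!/bin/bash
# Hypothetical helper: print NVRM lines showing that the GPU failed to
# initialize or dropped off the bus. Argument: path to a log file.
nvrm_failures() {
  grep -E 'NVRM: .*(RmInitAdapter failed|rm_init_adapter failed|fallen off the bus)' "$1"
}
```

Usage: `nvrm_failures /var/log/syslog`. Any output means the driver still cannot bring the card up.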

 

Did you update the BIOS or change anything in terms of BIOS or Hardware?

 

Make sure that Above 4G Decoding is enabled in the BIOS, and also check whether there is a BIOS update for your motherboard. I know this is a pretty old/ancient system, but I think there should be an update for it.

 

You should also upgrade to Unraid 6.12.4 since 6.11.5 is now almost a year old...


Hi! I recently upgraded my hardware and have run into a frustrating issue. I do have a workaround at the moment (using Intel Quick Sync), but I will need more streams sooner rather than later, so I want to figure this out. When I have the Nvidia plugin installed to use my Quadro P2200, the log memory usage grows quickly and hits 100% after about 24 hours. When that happens, the UI is no longer accessible and I have to telnet in to restart the system and clear the log. Diagnostics are attached. From what I can tell, the syslog is just filling up with tens of thousands of repeated entries like the ones below, but I can't figure out why it's happening.

 

I saw on page 101 that there was a workaround to suppress ACPI error messages. I might give that a shot if there are no other recommendations or solutions. The weird thing is, despite the message log spam, the system works great otherwise. I can transcode on the Nvidia card perfectly fine, and it would be perfect if the system didn't become unstable when the log maxes out.

 

Oct 11 17:22:40 Unraid kernel: ACPI BIOS Error (bug): Failure creating named object [\_SB.PC00.PEG1.PEGP._DSM.USRG], AE_ALREADY_EXISTS (20220331/dsfield-184)
Oct 11 17:22:40 Unraid kernel: ACPI Error: AE_ALREADY_EXISTS, CreateBufferField failure (20220331/dswload2-477)
Oct 11 17:22:40 Unraid kernel: ACPI Error: Aborting method \_SB.PC00.PEG1.PEGP._DSM due to previous error (AE_ALREADY_EXISTS) (20220331/psparse-529)

 

unraid-diagnostics-20231011-1722.zip
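To see how fast the log is growing and which of the three messages dominates, you can count each one. A minimal sketch; the helper name `count_acpi_spam` is hypothetical, and the log path you pass in (for example /var/log/syslog) is an assumption:

```bash
#!/bin/bash
# Hypothetical helper: count how often each of the three ACPI messages
# from the post appears in a log file.
# Output: one "<count><TAB><message>" line per message.
count_acpi_spam() {
  local log="$1" pattern
  for pattern in 'ACPI BIOS Error (bug)' \
                 'ACPI Error: AE_ALREADY_EXISTS' \
                 'ACPI Error: Aborting method'; do
    # -F matches the text literally, -c prints only the number of matching lines.
    printf '%s\t%s\n' "$(grep -cF -- "$pattern" "$log")" "$pattern"
  done
}
```

Running it twice a few minutes apart gives a rough idea of how many lines per minute end up in the log.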

3 hours ago, nooneralpha123 said:

The weird thing is, despite the message log spam, the system works great otherwise.

This happens because, as the message implies, your motherboard firmware has some kind of bug in it.

 

Have you tried disabling IPMI to see if that changes anything? Also try disabling the ASPEED iGPU.

Do you have Above 4G Decoding, Large Address Space for PCI devices, and Resizable BAR support enabled in your BIOS?

In your case I would also try to boot with UEFI instead of CSM (Legacy) and see if that makes a difference.

15 hours ago, ich777 said:

Have you tried disabling IPMI to see if that changes anything? Also try disabling the ASPEED iGPU.

Do you have Above 4G Decoding, Large Address Space for PCI devices, and Resizable BAR support enabled in your BIOS?

In your case I would also try to boot with UEFI instead of CSM (Legacy) and see if that makes a difference.

 

If I disable CSM, the iGPU gets disabled (I don't understand why they are linked) and I still have the recurring log entries. Above 4G Decoding was already enabled, but I did have Resizable BAR support set to Disabled. I enabled that and the system was stable for about 90 minutes, then the log spam started again. I tried the workaround of editing my go file as shown below and then rebooting, but that also did not work.

 

#!/bin/bash
# Start the Management Utility
/usr/local/sbin/emhttp &
modprobe i915
# Suppress ACPI Error messages
echo ":msg,contains,\"ACPI Error: AE_ALREADY_EXISTS\" stop" >> /etc/rsyslog.d/01-blocklist.conf
echo ":msg,contains,\"ACPI Error: Aborting method\" stop" >> /etc/rsyslog.d/01-blocklist.conf
echo ":msg,contains,\"ACPI BIOS Error (bug)\" stop" >> /etc/rsyslog.d/01-blocklist.conf
/etc/rc.d/rc.rsyslogd restart

 

9 hours ago, nooneralpha123 said:

iGPU

Are we talking about your Intel iGPU or the ASPEED one?

 

If you mean the Intel iGPU, this is almost certainly because boards usually disable it when a dGPU is installed. You can prevent this by setting the iGPU as your primary graphics output and enabling Multi-Monitor Support; that setting is what keeps the iGPU enabled when a dGPU is installed.

 

9 hours ago, nooneralpha123 said:

I did have Resizable BAR support set to Disabled

I would recommend that you also enable this feature with newer hardware.

 

9 hours ago, nooneralpha123 said:

I enabled that and the system was stable for about 90 minutes, then the log spam started again.

Maybe someone started transcoding?

 

First of all, remove this file: /boot/config/modprobe.d/i915.conf <- you are blacklisting your iGPU there and then enabling it in the go file with this line:

10 hours ago, nooneralpha123 said:
modprobe i915

 

Even with the blacklist in place, the line in the go file is not necessary, because the Intel-GPU-TOP plugin you have installed enables your iGPU anyway. So please also remove the line mentioned above from the go file.
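Both removals can be done from the console. A minimal sketch, assuming the go file lives at /boot/config/go (the Unraid default) and that the line is exactly "modprobe i915"; the helper name `remove_i915_workaround` is hypothetical, and it keeps a backup of the go file before editing:

```bash
#!/bin/bash
# Hypothetical helper: undo the i915 workaround.
# Defaults are the Unraid flash paths; pass other paths to try it on copies first.
remove_i915_workaround() {
  local conf="${1:-/boot/config/modprobe.d/i915.conf}"
  local go="${2:-/boot/config/go}"
  # Remove the blacklist file (no error if it is already gone).
  rm -f "$conf"
  # Keep a backup of the go file before editing it.
  cp "$go" "$go.bak"
  # Delete only lines consisting of "modprobe i915" (surrounding spaces allowed).
  sed -i '/^[[:space:]]*modprobe i915[[:space:]]*$/d' "$go"
}
```

Reboot afterwards so the change takes effect.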

 

10 hours ago, nooneralpha123 said:

also did not work.

I would recommend that you change the lines you added to the go file like this:

# Suppress ACPI Error messages
echo ':msg,contains,"ACPI Error: AE_ALREADY_EXISTS" stop
:msg,contains,"ACPI Error: Aborting method" stop
:msg,contains,"ACPI BIOS Error (bug)" stop' > /etc/rsyslog.d/95-blockacpi.conf
/etc/rc.d/rc.rsyslogd restart

(this will work)
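The same three filters can also be written with a heredoc. Because the delimiter is quoted ('EOF'), the shell expands nothing inside, so the double quotes need no escaping. A minimal sketch; the helper name `write_acpi_blocklist` is hypothetical, and the target path is a parameter so you can preview the file in /tmp first:

```bash
#!/bin/bash
# Hypothetical helper: write the three rsyslog "stop" filters to a drop-in file.
# Default target is the file name used above.
write_acpi_blocklist() {
  local target="${1:-/etc/rsyslog.d/95-blockacpi.conf}"
  cat > "$target" <<'EOF'
:msg,contains,"ACPI Error: AE_ALREADY_EXISTS" stop
:msg,contains,"ACPI Error: Aborting method" stop
:msg,contains,"ACPI BIOS Error (bug)" stop
EOF
}
```

In the go file you would call `write_acpi_blocklist` and then `/etc/rc.d/rc.rsyslogd restart`, as in the version above.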


Hello, I'm sorry to disturb you all, but I have a problem. The plugin works well, but I have an old NVIDIA Quadro 4000 and the drivers in the plugin are too new. I need to install an older driver like "NVIDIA-FreeBSD-x86_64-390.141.tar", but I don't know where or how to put them for the plugin to add them. Can anyone help me with this, please?

4 hours ago, AndreB said:

but I don't know where or how to put them for the plugin to add them.

This is not supported anymore. Also consider what you want to do with this card: it has no NVENC encoder, so you would be better off switching to something newer. Take a look at the recommended posts above yours (the first one).


I have a new install on a Dell R730XD server and I am looking to install a Quadro P2000 in it. I have Unraid installed and working with the Nvidia driver installed; however, it is not seeing my card. I know there are people using the same card in an R730xd, and I'm wondering what was done to make it work. I am using dual 750 W power supplies; do I need to upgrade to the 1100 W ones? Let me know what info you may need.

On 10/20/2023 at 10:44 PM, ich777 said:

But it is not.

 

Look at that:

[Screenshot]

Doing it the way you did, you end up passing -e twice, which is wrong.

 

It should say: NVIDIA_VISIBLE_DEVICES=<GPUUUID>

I also don't recommend using all, because it often causes issues. Decide which GPU you want to use and replace <GPUUUID> with its UUID.

 

The capabilities are also missing. Are you sure you've followed the second post in this thread? There is a step-by-step tutorial for Plex there.

[Screenshot]

Hey ich777, thanks for replying. I did have my old configuration set up as you describe when I was using the plex-media-server docker. However, the linuxserver docker instructions are a little different in setting up nvidia passthrough.

However, I tried your recommendations anyway and unfortunately they didn't work.

11 minutes ago, BrockTrainedAsh said:

However, the linuxserver docker instructions are a little different in setting up nvidia passthrough.

Actually they are the same, just a different approach.

 

That instruction is for the CLI (docker run ... -e NVID.... =all ...), while in Unraid you used the GUI, where this line is built from 3 parts

 

by adding this via the Unraid GUI:

 

[Screenshot]

 

This then "builds" the command you see in the LSIO instructions; that's the only difference.

 

You would also see this if you check the generated docker run command: in your setup there is a double "-e -e ..." and so on.
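A simplified model of why the double "-e" appears, as a hypothetical bash helper (the real Unraid template builder also adds quoting and other parameters, which are left out here): each template variable becomes exactly one "-e KEY=VALUE" argument, so an entry that already contains "-e" ends up as "-e -e ...".

```bash
#!/bin/bash
# Hypothetical, simplified model: turn KEY=VALUE entries into docker run "-e" flags,
# the way each template variable adds one "-e" to the generated command.
build_env_flags() {
  local out=() pair
  for pair in "$@"; do
    out+=("-e" "$pair")
  done
  # Join the collected arguments with single spaces.
  printf '%s\n' "${out[*]}"
}
```

For example, `build_env_flags NVIDIA_VISIBLE_DEVICES=GPU-abc NVIDIA_DRIVER_CAPABILITIES=all` gives "-e NVIDIA_VISIBLE_DEVICES=GPU-abc -e NVIDIA_DRIVER_CAPABILITIES=all", while `build_env_flags '-e NVIDIA_VISIBLE_DEVICES=all'` gives "-e -e NVIDIA_VISIBLE_DEVICES=all". That second form is the broken one.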

 

In the end, if it's still not working after following the instructions (either way), look at your logs and ask in the LSIO Plex thread what the reason could be.

15 minutes ago, alturismo said:

Actually they are the same, just a different approach.

 

That instruction is for the CLI (docker run ... -e NVID.... =all ...), while in Unraid you used the GUI, where this line is built from 3 parts

 

by adding this via the Unraid GUI:

 

[Screenshot]

 

This then "builds" the command you see in the LSIO instructions; that's the only difference.

 

You would also see this if you check the generated docker run command: in your setup there is a double "-e -e ..." and so on.

 

In the end, if it's still not working after following the instructions (either way), look at your logs and ask in the LSIO Plex thread what the reason could be.

That was it! That worked! I was just too stupid to realize the GUI was changing the command. THANK YOU!

On 10/27/2023 at 5:16 PM, ich777 said:

The Diagnostics please.

Here is a diagnostics dump.

pdlvcunraid01-diagnostics-20231029-0610.zip

 

I only have two 750 W power supplies in my R730xd, so I'm not sure if that could be my problem or not. I read in the manual that with GPUs you need 1100 W. Just wondering if others who have an R730xd with a Quadro card working are using the higher-wattage power supply.

Edited by vcadm
7 hours ago, vcadm said:

I only have two 750 W power supplies in my R730xd, so I'm not sure if that could be my problem or not. I read in the manual that with GPUs you need 1100 W. Just wondering if others who have an R730xd with a Quadro card working are using the higher-wattage power supply.

I don't even see an Nvidia GPU listed in your Diagnostics. Are you sure that the motherboard can provide enough power?

 

The card isn't even recognized, which is strange: with too little power the card would usually still be recognized, but the driver would complain that it dropped from the bus or similar.
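Whether the card is enumerated at all can be checked independently of the driver: `lspci -nn` prints vendor:device IDs in brackets, and NVIDIA's PCI vendor ID is 10de. A minimal sketch; the helper name `nvidia_pci_devices` is hypothetical:

```bash
#!/bin/bash
# Hypothetical helper: keep only PCI devices with NVIDIA's vendor ID (10de)
# from `lspci -nn` output, e.g. "... NVIDIA Corporation ... [10de:1c30] (rev a1)".
nvidia_pci_devices() {
  grep -i '\[10de:[0-9a-f]\{4\}\]'
}
```

Usage: `lspci -nn | nvidia_pci_devices`. Empty output means the card is not visible on the PCI bus at all, before the Nvidia driver is even involved; `lspci -d 10de:` filters by vendor directly as well.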

 

Can you maybe double check if the card is working in a second system?

8 hours ago, ich777 said:

I don't even see an Nvidia GPU listed in your Diagnostics. Are you sure that the motherboard can provide enough power?

 

The card isn't even recognized, which is strange: with too little power the card would usually still be recognized, but the driver would complain that it dropped from the bus or similar.

 

Can you maybe double check if the card is working in a second system?

Yeah, I realize the card isn't being recognized. The fan comes on and all, it's just like it's not there. That is why I was curious whether other users with a Dell R730xd had the 750 W or the 1100 W power supply. The Dell manual states that GPUs require the 1100 W power supply, so I'm not sure if they just don't allow it to be enabled without the higher-end PSU. Not sure how they would regulate that and all. I will see if I can put it into another system and then see what it does. I did end up ordering the 1100 W power supplies; they should be here Tuesday.

6 hours ago, vcadm said:

Not sure how they would regulate that and all.

Dell and HP server hardware is notorious for hardware locks; often they only allow vendor-specific hardware in their servers.

 

6 hours ago, vcadm said:

I will see if I can put it into another system and then see what it does.

I would strongly recommend doing that first. This would not be the first P2xxx series card I've seen that doesn't work at all and is basically dead (I hope yours isn't)...

These cards are pretty "old" now.

2 hours ago, ich777 said:

Dell and HP server hardware is notorious for hardware locks; often they only allow vendor-specific hardware in their servers.

 

I would strongly recommend doing that first. This would not be the first P2xxx series card I've seen that doesn't work at all and is basically dead (I hope yours isn't)...

These cards are pretty "old" now.

What would be something equivalent that I could go with in order to be able to use it for Plex Transcoding? This server is set up in our church, and I got rid of all DVD players about 2-3 years ago. I've been running a VMware stack for the last couple of years and am looking to kill that expense and move to something less power-hungry and more of an all-in-one.

5 minutes ago, vcadm said:

What would be something equivalent that I could go with in order to be able to use it for Plex Transcoding?

What requirements do you have?

How many streams?

How many drives are connected?

 

I would say that if you only transcode a maximum of 2 simultaneous 4K streams, or maybe 4 simultaneous 1080p streams, an Intel iGPU is also enough (I would recommend that you don't go lower than 10th gen).

 

If you only have 1x 4K stream, you can even go as low as something like an ASRock N100 motherboard; the N100 is basically a 4-core Intel mobile CPU.

 

Maybe create a post in the Hardware subforums describing what you need and what you want to do with the server; this is a bit out of scope for this thread.

 

You can also take a look at my server over here (it's in German, but you can use Google Translate if you want to read it). I don't have the add-on cards installed anymore (only 2x 6-port ASM1166 SATA cards), and it has 7 spinning disks, 4 SATA SSDs and 2 NVMe drives. In this configuration (with a lot of services/containers running in the background) it draws around 50 W of power with the HDDs spun down.
