[Plugin] Nvidia-Driver


ich777


15 minutes ago, Rayce185 said:

What happened is that the local GUI no longer works, but the GPU is still recognized the same way.

What? That should not happen.

Please keep in mind that I only wrote the plugin to make installation easy; the driver itself is compiled by Limetech and downloaded from them.

 

Can you give me the syslog output?

Could it be that your USB drive is about to fail?

 

EDIT: If possible, please also post your diagnostics (Tools -> Diagnostics -> Download) and upload the zip file here.

1 hour ago, ich777 said:

Can you give me the syslog output?

EDIT: If possible, please also post your diagnostics (Tools -> Diagnostics -> Download) and upload the zip file here.

I sent the syslog by PM; the anonymized diagnostics are here: LINK REMOVED


Hi, lately I've been getting:

Feb 14 00:54:56 Magatzem2 kernel: resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000c0000-0x000dffff window]

and the whole Unraid system locks up.

It seems to happen roughly once an hour.


Specs:

M/B: ASRock X570 Creator Version - s/n: M86-xxxxxx

BIOS: American Megatrends Inc. Version P3.13. Dated: 11/05/2020

CPU: AMD Ryzen 7 3700X 8-Core @ 3600 MHz

HVM: Enabled

IOMMU: Enabled

Cache: 512 KiB, 4 MB, 32 MB

Memory: 64 GiB DDR4 (max. installable capacity 128 GiB)

Network: bond0: fault-tolerance (active-backup), mtu 1500
 eth0: 10000 Mbps, full duplex, mtu 1500
 eth1: 1000 Mbps, full duplex, mtu 1500

Kernel: Linux 5.10.1-Unraid x86_64

OpenSSL: 1.1.1h

 

nVidia Info:

Nvidia Driver Version: 455.45.01

Installed GPU(s): 0: GeForce GTX 970, 32:00.0, GPU-b542cc0e-xxxxxxxxxxxxxxxxx

 

After a few minutes the system recovers... following these kernel errors, with the CPU at 100%.
 

It happens when playing from the browser with the hardware-accelerated Plex Docker container, using GPU passthrough via the Nvidia plugin.

P.S.: I'm starting to think this is a consequence, not the root cause, of my server problems. I will investigate further.
The problems seem to point more to another Docker container that causes the system to lock up. Also, I'm using a USB 3.0 stick with 6.9 rc2, probably not the best combination.

Edited by Kanashii
5 hours ago, Kanashii said:

Feb 14 00:54:56 Magatzem2 kernel: resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000c0000-0x000dffff window]

This message should be harmless...

 

5 hours ago, Kanashii said:

I'm starting to think this is a consequence, not the root cause, of my server problems. I will investigate further.

Which container do you think is causing the issue?

 

Please also check whether you are on the latest BIOS version or whether a newer one is available.

 

5 hours ago, Kanashii said:

Also, I'm using a USB 3.0 stick with 6.9 rc2, probably not the best combination.

I would recommend using a USB 2.0 boot device, since they normally run cooler and you normally only need them at system boot.

SLC USB keys are my recommendation, but they are not cheap. I think I paid about 30 euros for my Transcend JetFlash 170 1GB.

Feb 20 14:14:08 Magatzem2 kernel: caller _nv000709rm+0x1af/0x200 [nvidia] mapping multiple BARs
Feb 20 14:14:09 Magatzem2 kernel: resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000c0000-0x000dffff window]
Feb 20 14:14:09 Magatzem2 kernel: caller _nv000709rm+0x1af/0x200 [nvidia] mapping multiple BARs
Feb 20 14:14:10 Magatzem2 kernel: resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000c0000-0x000dffff window]
Feb 20 14:14:10 Magatzem2 kernel: caller _nv000709rm+0x1af/0x200 [nvidia] mapping multiple BARs
Feb 20 14:14:11 Magatzem2 kernel: resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000c0000-0x000dffff window]
Feb 20 14:14:11 Magatzem2 kernel: caller _nv000709rm+0x1af/0x200 [nvidia] mapping multiple BARs
Feb 20 14:14:12 Magatzem2 kernel: resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000c0000-0x000dffff window]
Feb 20 14:14:12 Magatzem2 kernel: caller _nv000709rm+0x1af/0x200 [nvidia] mapping multiple BARs
Feb 20 14:14:13 Magatzem2 kernel: resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000c0000-0x000dffff window]

 

I'm still getting lots of these messages.


I have a JDownloader Docker container that uses a lot of resources when it uses "chrome" to capture captchas. It sometimes gets stuck and pushes the CPU to 100%.

 

Which BIOS are you talking about? I'm on Unraid 6rc3, and the Nvidia plugin shows that it is the latest one available.

Anyway, thanks for this plugin, it's really handy.

Edited by Kanashii
4 hours ago, Kanashii said:

I'm still getting lots of these messages.

These are just normal messages...
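If these lines are drowning out the rest of the syslog, a minimal Python sketch can count the recurring "resource sanity check" / "mapping multiple BARs" pair and hand back every other line for review. It assumes the syslog has already been read into a list of strings; the function name is hypothetical, and the pattern list only covers the two messages quoted above. It is not an official list of harmless messages.

```python
import re

# The two recurring messages quoted above. Treating them as benign is an
# assumption based on this thread, not an official classification.
BENIGN_PATTERNS = [
    re.compile(r"resource sanity check: requesting \[mem 0x000c0000-0x000fffff\]"),
    re.compile(r"caller _nv\w+\+0x[0-9a-f]+/0x[0-9a-f]+ \[nvidia\] mapping multiple BARs"),
]


def split_syslog(lines):
    """Return (number of benign lines, list of all other lines)."""
    benign = 0
    other = []
    for line in lines:
        if any(p.search(line) for p in BENIGN_PATTERNS):
            benign += 1
        else:
            other.append(line)
    return benign, other


if __name__ == "__main__":
    sample = [
        "Feb 20 14:14:08 Magatzem2 kernel: caller _nv000709rm+0x1af/0x200 [nvidia] mapping multiple BARs",
        "Feb 20 14:14:09 Magatzem2 kernel: resource sanity check: requesting [mem 0x000c0000-0x000fffff], "
        "which spans more than PCI Bus 0000:00 [mem 0x000c0000-0x000dffff window]",
        "Feb 20 14:15:00 Magatzem2 kernel: some unrelated message",  # made-up line
    ]
    count, rest = split_syslog(sample)
    print(count, rest)  # 2 benign lines; only the made-up line remains
```

Anything left in the second list is what is actually worth looking at when the server locks up.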

 

4 hours ago, Kanashii said:

I have a JDownloader Docker container that uses a lot of resources when it uses "chrome" to capture captchas. It sometimes gets stuck and pushes the CPU to 100%.

So you think that this container is the issue?

 

4 hours ago, Kanashii said:

Which BIOS are you talking about?

The motherboard BIOS itself. I recently had problems with my motherboard too (after upgrading to new hardware) and noticed that a newer BIOS was available for it; after installing it, everything now works smoothly. ;)

 

Are you booting Unraid with UEFI? If so, try switching to Legacy Boot or CSM and see if the errors go away (this was discussed earlier in this thread, and it was the solution for getting rid of the warnings).
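If you're not sure which mode the server actually booted in, here is a quick Python sketch: on Linux the kernel only exposes `/sys/firmware/efi` when the system was started by UEFI firmware, so checking for that directory is a reasonable indicator. The function name and return strings are just for illustration.

```python
from pathlib import Path


def boot_mode(efi_dir: Path = Path("/sys/firmware/efi")) -> str:
    """Report 'UEFI' if the EFI sysfs directory exists, otherwise 'Legacy/CSM'."""
    # /sys/firmware/efi is only present when the kernel was started by UEFI firmware.
    return "UEFI" if efi_dir.is_dir() else "Legacy/CSM"


if __name__ == "__main__":
    print(boot_mode())
```

Run it on the Unraid host itself after switching the boot mode to confirm the change actually took effect.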

 

EDIT: By the way, you can also build your own images with the Nvidia drivers built in; here is the thread for my Unraid-Kernel-Helper:

 


Hello, 

 

For a couple of days now, my GPU keeps disappearing from the plugin and from nvidia-smi, and I am seeing these errors in the logs. It comes back and then disappears again.

NVRM: GPU 0000:08:00.0: Failed to copy vbios to system memory.
NVRM: GPU 0000:08:00.0: RmInitAdapter failed! (0x30:0xffff:802)
NVRM: GPU 0000:08:00.0: rm_init_adapter failed, device minor number 0
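When the failures are intermittent, it can help to see how often they hit the same device across a longer syslog. Here is a small Python sketch that groups NVRM messages by PCI bus ID. It assumes the log lines look like the three above, and the helper name is hypothetical.

```python
import re

# Matches kernel lines such as:
#   NVRM: GPU 0000:08:00.0: RmInitAdapter failed! (0x30:0xffff:802)
NVRM_RE = re.compile(
    r"NVRM: GPU (?P<bus>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): (?P<msg>.*)"
)


def nvrm_messages_by_gpu(lines):
    """Group NVRM messages by PCI bus ID (e.g. '0000:08:00.0')."""
    grouped = {}
    for line in lines:
        match = NVRM_RE.search(line)
        if match:
            grouped.setdefault(match.group("bus"), []).append(match.group("msg").strip())
    return grouped


if __name__ == "__main__":
    log = [
        "NVRM: GPU 0000:08:00.0: Failed to copy vbios to system memory.",
        "NVRM: GPU 0000:08:00.0: RmInitAdapter failed! (0x30:0xffff:802)",
        "NVRM: GPU 0000:08:00.0: rm_init_adapter failed, device minor number 0",
    ]
    for bus, messages in nvrm_messages_by_gpu(log).items():
        print(bus, len(messages), "messages")
```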

 

Before this, it had worked fine for about 8 months. I am not able to find much information about these messages, but it sounds like it could be a driver issue.

Any hints to where to start troubleshooting?

 

Thanks in advance


EDIT: When it is available and I stress test it, it seems to work fine. I don't think it is a hardware failure, but I haven't tested that yet.

tower-diagnostics-20210221-1423.zip

nvidia-bug-report.log.gz

Edited by Glasti
5 hours ago, Glasti said:

Before this, it had worked fine for about 8 months. I am not able to find much information about these messages, but it sounds like it could be a driver issue.

Have you changed anything in your system's configuration lately, or installed any new hardware?

Have you installed a VM or a new container or bound a device to VFIO?

 

What kind of server do you have, a custom build or a prebuilt one?

19 hours ago, ich777 said:

Have you changed anything in your system's configuration lately, or installed any new hardware?

Have you installed a VM or a new container or bound a device to VFIO?

 

What kind of server do you have, a custom build or a prebuilt one?

Thank you for your reply. I should have given a bit more detail.

It is a custom-built machine.
- ROG Strix B450-F Gaming board
- Ryzen 3700x

I have not changed anything hardware-wise in the last few months, besides replacing some HDDs in December.

There are no VMs running or any devices bound to VFIO. Also, Plex is the only container using the GPU.

What I did realize, and forgot to mention here: I recently left the HDMI cable plugged into the GPU and only unplugged it from the monitor. I have since removed it from the GPU.
The GPU disappeared once after that, but it has now been available for the last 20 hours or so.

I'm pretty sure the problems started after I left the cable plugged in; it feels like that was causing the issue.

 


I'm feeling a little silly because I think I just realized what my issue is with this patch and my hardware setup.

I believe the issue I have is not with the patch but with my hardware setup. Could anyone please review and comment?

 

I have a Dell R720 with a GTX 1650 Turbo and two 750 W power supplies.

My card is recognized and it starts to transcode but quickly fails and kills the processes. It didn't even occur to me that I only had 750 W power supplies, or that they might not be enough, until I was physically changing my setup for something else.

 

Does it seem plausible that my PSU is too weak and fails as soon as the GPU draws anything beyond idle? I've ordered two 1100 W power supplies (they won't arrive until mid-March!), but I'd like to know if anyone can comment... even if it's "you idiot, of course you need a larger PSU when you add a GPU!"

 

34 minutes ago, bellyup said:

Does it seem plausible that my PSU is too weak and fails as soon as the GPU draws anything beyond idle?

This could be the case, but likely only if the power supply has lost its ability to deliver sufficient power because of age. According to Nvidia, a GTX 1650 requires about 100 watts and the recommended minimum power supply is only 350 watts.

 

I read on a benchmarking site that a GTX 1650 under a constant Furmark load may draw up to 200 watts. On the surface, it would appear that a 750 W power supply should be more than sufficient.

 

You may want to enter your server specs (CPU, motherboard, drives, GPU, etc.) into a PSU calculator to see what it recommends.

 

I have had a situation where failing power supplies did cause my PC to lock up and turn off when the GPU was under any kind of load, but that was not because the PSU was underpowered (from a specs standpoint); it was just failing.

 

I don't think your PSU is underpowered, but perhaps it is failing. I assume the two 750 W PSUs are for redundancy. Try running on the redundant PSU with the current main PSU removed to see if the problem is repeatable.

1 hour ago, Hoopster said:

You may want to enter your server specs (CPU, motherboard, drives, GPU, etc.) into a PSU calculator to see what it recommends.

 

That PSU calculator is pretty cool! Thank you so much for your response and help.

With my basic components entered into the calculator, it recommends ~600 W, so 750 W should be fine. I've already ordered the 1100 W units, so I will install them anyway and see... in March. Shipping to Canada is never great.

Quote

Load Wattage: 543 W
Recommended UPS rating: 1000 VA
Recommended PSU Wattage: 593 W
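The headroom arithmetic behind "750 W should be fine" is simple enough to sketch in Python. The sketch assumes the two PSUs run 1+1 redundant, so a single supply has to carry the whole estimated load on its own. The 543 W figure is the calculator's estimate, not a measurement, and the function name is hypothetical.

```python
from __future__ import annotations


def psu_headroom(load_w: float, psu_w: float) -> tuple[float, float]:
    """Return (spare watts, load as a fraction of the PSU rating)."""
    return psu_w - load_w, load_w / psu_w


if __name__ == "__main__":
    load = 543  # "Load Wattage" from the calculator output above (an estimate)
    for psu in (750, 1100):
        spare, fraction = psu_headroom(load, psu)
        print(f"{psu} W PSU: {spare:.0f} W spare, {fraction:.0%} of rating")
```

With that estimate, one 750 W unit still has about 207 W spare (roughly 72% loaded). That supports the view that a healthy PSU should not be the bottleneck, while a failing one still could be.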

 

7 hours ago, bellyup said:

My card is recognized and it starts to transcode but quickly fails and kills the processes.

You are using Plex, right?

If you are on Plex, please try a native client such as Plex for iOS or Plex for Android and transcode a file from there.

 

I have now had a few reports that when transcoding through the web client, switching the quality to something other than what it was when playback started doesn't work.

 

If you can also reproduce this behaviour, it would be nice if you could post about it on the Plex forums; a user in this thread has already opened a ticket:

 


That's exactly what I'm doing. You're brilliant!

I was using the web client (multiple times on one pc) to run multiple streams for testing transcoding and I was forcing the quality change.

I wanted to test not only that it was transcoding, but also that it could handle more than 2 concurrent sessions. I just got to work, so I can't test until much later tonight, but thank you so much.

 

EDIT: I can confirm that was my problem; my transcoding has been fine the whole time. The way I was testing it was wrong. LOL? More sad than funny, but thank you!

Edited by bellyup
Solution

Firstly, great plugin/work!

 

Unfortunately, I'm having some issues getting Plex transcoding to work on my GT710. The very first time I installed the plugin and set everything up, it worked fine until my next system restart. Since then, no combination of troubleshooting steps I've tried has gotten it to work again.

 

I'm running binhex-plexpass with a lifetime Plex Pass. The rest of the info is in the screenshots; I've tried to include everything I've seen in other posts in this thread that seems helpful. Sorry if I've overdone it :D

 

It should be noted that I have no errors in the system log or the Plex container log. None whatsoever.

 

EDIT: For clarity, I'm using the PCI override because I have a dual Intel NIC for a dedicated pfSense VM, so it's not really optional.

 

2021-02-25 10_33_12-Window.png

2021-02-25 10_31_34-Window.png

2021-02-25 10_31_08-Window.png

2021-02-25 10_29_08-Window.png

2021-02-25 10_29_01-Window.png

2021-02-25 10_28_30-Window.png

2021-02-25 10_27_39-Window.png

Edited by DaveDoesStuff
44 minutes ago, DaveDoesStuff said:

Unfortunately, I'm having some issues getting Plex transcoding to work on my GT710

First of all, the GT710 isn't capable of transcoding h265, so I think that's the main problem here. Also, from what I've seen, this is Direct Play anyway; you have to force a lower quality to initiate transcoding.

 

For a full overview of what your card is capable of transcoding, look here: Click
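The point about Direct Play can be illustrated with a deliberately simplified Python model. If the client can already play the source and the selected quality doesn't force a lower bitrate, nothing gets transcoded, so the GPU never comes into play. This is an assumption-laden sketch, not how Plex actually decides. The codec names, bitrates, and function and class names are made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Source:
    codec: str          # e.g. "h264" or "h265"
    bitrate_kbps: int


def playback_mode(src: Source, client_codecs: set[str], quality_cap_kbps: int) -> str:
    """Hypothetical, simplified model: NOT Plex's actual decision logic.

    If the client can play the source codec and the selected quality does not
    force a lower bitrate, the file is sent as-is (Direct Play); otherwise the
    server has to transcode.
    """
    if src.codec in client_codecs and src.bitrate_kbps <= quality_cap_kbps:
        return "direct play"
    return "transcode"


if __name__ == "__main__":
    movie = Source(codec="h265", bitrate_kbps=8000)
    client = {"h264", "h265"}  # assumed: a client that can play h265 natively
    print(playback_mode(movie, client, quality_cap_kbps=20000))  # direct play
    print(playback_mode(movie, client, quality_cap_kbps=4000))   # transcode
```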

19 minutes ago, ich777 said:

First of all, the GT710 isn't capable of transcoding h265, so I think that's the main problem here. Also, from what I've seen, this is Direct Play anyway; you have to force a lower quality to initiate transcoding.

 

For a full overview of what your card is capable of transcoding, look here: Click

That's actually a really cool website, bookmarked.

 

It never occurred to me that the media, coupled with the lack of h265 support, was the problem... but it makes total sense. Thanks for the steer.

 

I'll try a different format and lower the quality and report back!

EDIT: So the HW transcoding kicks in when playing this second show and transcoding to h.264...but it seems to be using CPU not GPU...or am I misreading this?

Hmm, does the source also have to be h.264, or could a GT710 transcode HEVC Main 10 to h.264 at all? I clearly need to improve my knowledge in this area :P

2021-02-25 12_27_13-Window.png

2021-02-25 12_25_42-Window.png

Edited by DaveDoesStuff
17 minutes ago, DaveDoesStuff said:

EDIT: So the HW transcoding kicks in when playing this second show and transcoding to h.264...but it seems to be using CPU not GPU...or am I misreading this?

No, it uses both: the GPU encodes the file to h264, and the CPU decodes the source file (because the GPU can't decode h265).

 

Try, for example, a 1080p h264 file and transcode it to 720p h264.

 

If you want something decent for transcoding, try to get a used GTX 1050 or GTX 1050 Ti.
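The split described above, where the CPU decodes what the GPU can't and the GPU encodes, can be expressed as a tiny Python lookup. This is a sketch under stated assumptions. The capability table is hand-written and hypothetical: the GT 710 entry reflects what is said in this thread, and its h264 decode support is an assumption. A real transcoder's fallback rules may differ.

```python
from __future__ import annotations

# Illustrative capability table only. The GT 710 entry reflects this thread
# (h264 encode on the GPU, no h265 decode); h264 decode is an assumption.
# Check Nvidia's NVENC/NVDEC support matrix for the real values of a card.
GPU_CAPS = {
    "GT 710": {"decode": {"h264"}, "encode": {"h264"}},
}


def transcode_plan(gpu: str, src_codec: str, dst_codec: str) -> dict[str, str]:
    """Say which device would handle decoding and encoding for one transcode."""
    caps = GPU_CAPS[gpu]
    return {
        "decode": "GPU" if src_codec in caps["decode"] else "CPU",
        "encode": "GPU" if dst_codec in caps["encode"] else "CPU",
    }


if __name__ == "__main__":
    print(transcode_plan("GT 710", "h265", "h264"))  # {'decode': 'CPU', 'encode': 'GPU'}
    print(transcode_plan("GT 710", "h264", "h264"))  # {'decode': 'GPU', 'encode': 'GPU'}
```

The first case matches what DaveDoesStuff saw: CPU load for the h265 decode, even though hardware encoding is active.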


So I was wondering if I could get some help. I am not very well versed in all this yet, but I previously had Unraid 6.8.3 running with the previous Nvidia driver support. I am running an Nvidia Quadro P2000. Everything worked great with Plex; it would transcode everything I threw at it.

 

Last week I decided it was time for an upgrade, so I built a new system, installed my P2000 in it, and installed a fresh copy of Unraid 6.9.0-rc2. I also installed the new version of the Nvidia Driver. I got all my Docker containers installed and configured. I was initially able to see the card and get transcoding working, and I was a happy camper. Then the card started disappearing randomly, not even under load. I have to reboot the server every time to get the card back. So I am not sure what to do or what to check here. If you can provide some guidance, I can provide whatever logs you may need. I am hoping I can get this fixed and not have to revert.
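Since the card vanishes at random, recording exactly when it happens makes it easier to match against the syslog or diagnostics. Here is a minimal Python sketch, assuming `nvidia-smi` is on the PATH of the Unraid host. It uses the standard `--query-gpu` and `--format=csv,noheader` options; the helper names are hypothetical. Running it every few minutes (for example from a cron job) is an assumption about your setup.

```python
from __future__ import annotations

import shutil
import subprocess
from datetime import datetime


def parse_gpu_list(csv_text: str) -> list[tuple[str, str]]:
    """Parse 'name, pci.bus_id' lines from nvidia-smi CSV output."""
    gpus = []
    for line in csv_text.strip().splitlines():
        if "," not in line:
            continue  # skip anything that isn't a CSV data row
        name, bus_id = (field.strip() for field in line.rsplit(",", 1))
        gpus.append((name, bus_id))
    return gpus


def visible_gpus() -> list[tuple[str, str]]:
    """Return the GPUs nvidia-smi can see; empty if it's missing, fails, or hangs."""
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name,pci.bus_id", "--format=csv,noheader"],
            capture_output=True, text=True, timeout=30,
        )
    except subprocess.TimeoutExpired:
        return []
    if result.returncode != 0:
        return []
    return parse_gpu_list(result.stdout)


if __name__ == "__main__":
    stamp = datetime.now().isoformat(timespec="seconds")
    gpus = visible_gpus()
    print(stamp, gpus if gpus else "no GPU visible to nvidia-smi")
```

The first timestamp that reports no GPU is the point to look for in the syslog.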


I noticed that the Nvidia Driver was no longer available unless I upgraded to 6.9.

 

image.png.60bba3de1b0f7e20cf2b1f98e897e840.png

 

image.thumb.png.95521cb2b3f19002f4fee7c7cde82351.png

Edited by SiRMarlon
6 minutes ago, SiRMarlon said:

Last week I decided it was time for an upgrade, so I built a new system, installed my P2000 in it, and installed a fresh copy of Unraid 6.9.0-rc2.

Can you give me the diagnostics after it starts to disappear?

 

Maybe something is wrong with the card but that is just a guess...

Have you bound the card to VFIO, or are you using it in a VM too?

Are you running Unraid on something like ESXi or VMware?

Have you installed any power tweaks on the system or added anything to the syslinux.cfg?

Have you changed anything on the system lately?

Are you booting UEFI or Legacy? If you are booting UEFI, try a Legacy boot.

 

What kind of hardware are you using the card with (did you build the server yourself, or is it a prebuilt one)?

The first thing I would recommend is to reseat the card in the slot, or at least try another slot if one is available.

 

The last thing I can recommend is building your own images with the Nvidia driver built in, but I would only do that after we have troubleshot this.

4 minutes ago, ich777 said:

Can you give me the diagnostics after it starts to disappear? Where can I pull that from? 

 

Maybe something is wrong with the card, but that is just a guess... It worked flawlessly in my previous system.

 

Have you bound the card to VFIO, or are you using it in a VM too? I don't use VMs.

 

Are you running Unraid on something like ESXi or VMware? I don't use any virtualization on this server.

 

Have you installed any power tweaks on the system or added anything to the syslinux.cfg? Nope.

 

Have you changed anything on the system lately? It's a brand-new build; I just installed Docker apps.

 

Are you booting UEFI or Legacy? If you are booting UEFI, try a Legacy boot. UEFI.

 

What kind of hardware are you using the card with (did you build the server yourself, or is it a prebuilt one)? It's a brand-new system I just built: an ASRock B450 motherboard with an AMD Ryzen 9 3700x and 32 GB of DDR4 3600 MHz memory.

 

The first thing I would recommend is to reseat the card in the slot, or at least try another slot if one is available. I did try that; it was one of the first things I did. The thing is, my monitor doesn't wig out. I can see the login screen.

 

The last thing I can recommend is building your own images with the Nvidia driver built in, but I would only do that after we have troubleshot this. You lost me, bud, that is way beyond my scope of knowledge! lol We can talk networking and PC building all day, but coding and building images is over my head. You guys are rockstars for all the work you do for this community!

 

5 minutes ago, SiRMarlon said:

Where can I pull that from? 

Tools -> Diagnostics -> Download

 

5 minutes ago, SiRMarlon said:

It worked flawlessly in my previous system

As I said, it's only a guess, but as you probably know, such things can sometimes happen when you upgrade or change something. Again, it's only a guess and doesn't have to be the case here.

 

7 minutes ago, SiRMarlon said:

UEFI

Please try switching to Legacy, since UEFI often causes trouble with Nvidia cards...

 

7 minutes ago, SiRMarlon said:

The thing is my monitor doesn't wig out. I can see the login screen.

Please try removing the monitor cable from the card, since another user reported that it caused trouble on his build.

 

8 minutes ago, SiRMarlon said:

You lost me bud that is way beyond my scope of knowledge!

This is really easy; I've created a container for it that does everything for you, but that would be the last thing I would recommend. :)

I understand that this sounds like a bit much, but it's really easy, trust me. As I said, though, it's the last thing I'd recommend trying.

 

As said above, please try booting with Legacy or CSM (whatever it is called in your BIOS ;) ) and remove the HDMI cable to your screen (I think it was THIS post).

