
[Plugin] Nvidia-Driver


ich777


Thank you so much for your response!

 

I have a 750 watt power supply, which I believe should be able to handle a 5600x and a 3050. I have enabled both above 4G decoding and resizable BAR support in the BIOS.

 

I will attempt to update my BIOS and try booting into legacy mode.

Link to comment
5 hours ago, ahaseros said:

the error persists

If you have already checked the seating, the power plugs, and so on, I would next check whether the card itself may be broken.

 

Did you try checking whether the card works on another OS?

 

For example, boot from a live Linux stick and see what happens there: whether you get a display at all and whether it runs properly.

Link to comment
  • 2 weeks later...

Hey. I just installed a GTX 1070 into my PowerEdge T310 (a little bit dated lol) and Nvidia Driver says "No device found" on Latest and: "NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running" on Open Source Driver. The graphics card itself works and shows up in the server devices, I even have a monitor plugged into it, but I can't get it to pass through to Plex. I tried adding "options nvidia NVreg_OpenRmEnableUnsupportedGpus=1" to the specified file but no bueno. Any help would be appreciated, I am ready to hurl my computer out a window.

 

lcl-server-syslog-20240727-0353.zip

Link to comment
2 hours ago, LIONGENZ9629 said:

Please post your Diagnostics, the syslog doesn't help much.

 

2 hours ago, LIONGENZ9629 said:

on Latest and: "NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver.

Diagnostics would help a lot, pulled after a restart following the installation of the latest driver version (not the Open Source one; the Open Source driver only works for Turing cards, RTX 2xxx, and up).

 

BTW, don't forget to remove that:

2 hours ago, LIONGENZ9629 said:

I tried adding "options nvidia NVreg_OpenRmEnableUnsupportedGpus=1" to the specified file but no bueno.

Link to comment
3 hours ago, LIONGENZ9629 said:

Attached is diag file.

As you can see here, it's a BAR issue:

 

Jul 27 15:59:55 LCL-SERVER kernel: nvidia: loading out-of-tree module taints kernel.
Jul 27 15:59:55 LCL-SERVER kernel: nvidia: module license 'NVIDIA' taints kernel.
Jul 27 15:59:55 LCL-SERVER kernel: Disabling lock debugging due to kernel taint
Jul 27 15:59:55 LCL-SERVER kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 243
Jul 27 15:59:55 LCL-SERVER kernel: 
Jul 27 15:59:55 LCL-SERVER kernel: kvm: VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL does not work properly. Using workaround
Jul 27 15:59:55 LCL-SERVER kernel: NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
Jul 27 15:59:55 LCL-SERVER kernel: NVRM: BAR3 is 0M @ 0x0 (PCI:0000:04:00.0)
Jul 27 15:59:55 LCL-SERVER kernel: NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
Jul 27 15:59:55 LCL-SERVER kernel: NVRM: BAR4 is 0M @ 0x0 (PCI:0000:04:00.0)
Jul 27 15:59:55 LCL-SERVER kernel: nvidia 0000:04:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=io+mem
Jul 27 15:59:55 LCL-SERVER kernel: NVRM: loading NVIDIA UNIX x86_64 Kernel Module  560.28.03  Thu Jul 18 19:32:18 UTC 2024
Jul 27 15:59:55 LCL-SERVER kernel: nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  560.28.03  Thu Jul 18 20:27:27 UTC 2024
Jul 27 15:59:55 LCL-SERVER kernel: [drm] [nvidia-drm] [GPU ID 0x00000400] Loading driver
Jul 27 15:59:55 LCL-SERVER kernel: [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:04:00.0 on minor 0
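For anyone who wants to check their own syslog for the same symptom, here is a minimal Python sketch that looks for NVRM lines reporting a BAR of size 0M at address 0x0, as in the log above. The function name `find_unassigned_bars` is made up for illustration, and the pattern is only modeled on the two NVRM lines quoted here; other driver versions may word the message differently.

```python
import re

# Pattern modeled on the NVRM message quoted above, e.g.
# "NVRM: BAR3 is 0M @ 0x0 (PCI:0000:04:00.0)"
BAR_LINE = re.compile(
    r"NVRM: BAR(?P<bar>\d+) is (?P<size>\d+)M @ (?P<addr>0x[0-9a-fA-F]+) "
    r"\(PCI:(?P<pci>[0-9a-fA-F:.]+)\)"
)


def find_unassigned_bars(log_text):
    """Return (pci_address, bar_index) pairs for BARs reported as 0M @ 0x0."""
    hits = []
    for line in log_text.splitlines():
        match = BAR_LINE.search(line)
        if not match:
            continue
        size_mb = int(match.group("size"))
        address = int(match.group("addr"), 16)
        # A BAR with no size and no address was never assigned by the firmware.
        if size_mb == 0 and address == 0:
            hits.append((match.group("pci"), int(match.group("bar"))))
    return hits
```

You would feed it the text of the syslog from your diagnostics; an empty list only means this particular message was not found, not that the card is healthy.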

 

This sometimes also happens when the power supply is not sufficient ...

 

Some things to try:

 

1/ Check that the card is properly inserted.

2/ Check the PSU wiring (and whether the PSU is sufficient).

3/ Check that the GPU is actually working, e.g. boot some live Linux and see if you get display output.

4/ Try booting Unraid in EFI instead of Legacy (or vice versa, depending on your current boot setup).

5/ Check the BIOS for primary GPU / multi-monitor settings.

...

But in the end, if above 4G / rBAR is not working, you may be out of luck ... it's an "aged" platform ;)

 

A last resort would be to add this to your syslinux configuration:

 

pci=realloc

 

This forces reallocation of PCI resources ... but it can always end in a crashing, unstable system. If so, remove it again.
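To illustrate where the parameter goes, here is a small Python sketch that works on the text of a syslinux configuration. It assumes the usual layout in which kernel parameters sit on an `append` line (on Unraid this file is typically /boot/syslinux/syslinux.cfg, which is an assumption here, as is the sample stanza). The helper names `add_kernel_param` and `remove_kernel_param` are made up, and the sketch touches every `append` line rather than only the default boot entry.

```python
def _indent_of(line):
    """Return the leading whitespace of a line."""
    return line[: len(line) - len(line.lstrip())]


def add_kernel_param(cfg_text, param="pci=realloc"):
    """Insert `param` right after `append` on every append line that lacks it."""
    out = []
    for line in cfg_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "append" and param not in parts[1:]:
            line = _indent_of(line) + " ".join(["append", param] + parts[1:])
        out.append(line)
    return "\n".join(out) + "\n"


def remove_kernel_param(cfg_text, param="pci=realloc"):
    """Undo add_kernel_param, e.g. if the system becomes unstable."""
    out = []
    for line in cfg_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "append" and param in parts[1:]:
            line = _indent_of(line) + " ".join(p for p in parts if p != param)
        out.append(line)
    return "\n".join(out) + "\n"


# Assumed sample stanza, modeled on a typical Unraid boot entry.
SAMPLE = """label Unraid OS
  menu default
  kernel /bzimage
  append initrd=/bzroot
"""
```

With the sample above, the boot entry would end up as `append pci=realloc initrd=/bzroot`. Editing the line by hand gives the same result; keep a copy of the original file so the change is easy to revert.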

Link to comment

I'm actually running 2 power supplies, a 600w (with the on switch shorted to on) for the card, and one 300w stock PSU for the rest of the system and the drives. Could there be some mismatch between the two causing this? The PSU powering the card is always on and doesn't turn on or off with the system.

 

It launches unraid just fine with the monitor being plugged into the gpu, but I will try with another OS.

 

UEFI causes a hang and a failed launch, a known issue with my system and unraid with seemingly no fix. Legacy is therefore the only option.

 

I have onboard graphics as well, with the same issue happening with onboard video disabled and while enabled.

 

The BIOS really only gives options for Date, Time, SATA Controller, and some other little various options but none including 4g or rbar, it is not the most modern system. It sure doesn't quit though!

 

I already tried the "pci=realloc" command and no luck, but will try again and maybe it will work!

 

I'll check to see if I got a dead GPU and haven't forgotten to plug something in properly, but I may just have to give up the whole "transcoding with a system that's now 12 years old" idea.

I will update with results.

Link to comment
12 hours ago, LIONGENZ9629 said:

It launches unraid just fine with the monitor being plugged into the gpu, but I will try with another OS.

?

 

12 hours ago, LIONGENZ9629 said:

The BIOS really only gives options for Date, Time, SATA Controller, and some other little various options but none including 4g or rbar, it is not the most modern system. It sure doesn't quit though!

Please double-check that you are on the latest BIOS version.

 

Since this is a Dell motherboard, everything is named differently. Search for something like Support Large Address Space in the PCI section of your BIOS, and in general look through the PCI submenu (if it is even called that).

 

12 hours ago, LIONGENZ9629 said:

I'll check to see if I got a dead GPU and haven't forgotten to plug something in properly, but I may just have to give up the whole "transcoding with a system that's now 12 years old" idea.

In general I think this is a firmware (BIOS) issue, so there might be an option you can enable that would solve it; otherwise you are sadly out of luck.

Link to comment

This might be a problem of age, but where can I check what my GPU is capable of transcoding? I have a 2 GB GeForce 710 installed, but I can't get it to transcode anything on the machine when streaming. I also have a 1080 Ti to the side that I could install too (but haven't yet). Before I do, though, I wanted to check what they are capable of transcoding, i.e. are they usable for 1080/2160 type quality, and what it takes to make the (hw) indicator appear in Plex so I know it's working. At the moment (hw) does not appear for me, despite Plex being able to see the GPU in the transcoder settings.

Link to comment
1 hour ago, alcxander said:

This might be a problem of age, but where can I check what my GPU is capable of transcoding?

Here (but you won't find your GPU there).

 

1 hour ago, alcxander said:

I have a 2Gb geforce 710 installed but i cant get it to transcode anything on the machine when streaming.

Because this is a GPU that nobody should buy: it is based on Kepler, which was released in 2012 and is really outdated.

The GT710 is barely able to transcode H.264...

 

Something like a Nvidia T400 (or even T600, T800 or T1000) will do the job just fine.

The T400 is really low power (maximum 30 Watts) and is able to transcode 4 x 4K streams simultaneously (depending on the bitrate and so on).

 

You could of course also install your GTX1080Ti but that is a bit overkill for transcoding only in my opinion.

Link to comment
1 hour ago, ich777 said:

Here (but you won't find your GPU there).

 

Because this is a GPU that nobody should buy: it is based on Kepler, which was released in 2012 and is really outdated.

The GT710 is barely able to transcode h264...

 

Something like a Nvidia T400 (or even T600, T800 or T1000) will do the job just fine.

The T400 is really low power (maximum 30 Watts) and is able to transcode 4 x 4K streams simultaneously (depending on the bitrate and so on).

 

You could of course also install your GTX1080Ti but that is a bit overkill for transcoding only in my opinion.

Thanks! That's very helpful. I only got the 710 because it was a hand-off. I could get a T series card, they all look very approachable. I have a 1080 Ti as a spare, so I may go with it anyway. But thank you for the above recommendations.

Link to comment

Hi all,

 

Gfx card: Nvidia GTX 1650

Driver: v560.28.03

Unraid: 7.0.0 beta 2

 

Hoping to get some help with a driver issue; I'm at my wits' end with it. After a random amount of time (could be a few hours or days) I get warnings in my syslog about GPU errors, starting with:

 

NVRM: Xid (PCI:0000:02:00): 119, pid=1185, name=nv_open_q, Timeout after 6s of waiting for RPC response from GPU0 GSP! Expected function 103 (GSP_RM_ALLOC) (0x80 0x38).

 

and

 

NVRM: Rate limiting GSP RPC error prints for GPU at PCI:0000:02:00 (printing 1 of every 30).  The GPU likely needs to be reset.
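To see which Xid codes show up in a syslog and how often, the following Python sketch pulls the fields out of NVRM Xid lines like the first one quoted above. The names `parse_xid` and `count_xids` are made up for illustration, and the pattern is only modeled on that one line (the pid/name part is treated as optional on the assumption that some messages omit it).

```python
import re
from collections import Counter

# Modeled on the line quoted above:
# "NVRM: Xid (PCI:0000:02:00): 119, pid=1185, name=nv_open_q, Timeout after ..."
XID_LINE = re.compile(
    r"NVRM: Xid \(PCI:(?P<pci>[0-9a-fA-F:.]+)\): (?P<xid>\d+), "
    r"(?:pid=(?P<pid>[^,]+), name=(?P<name>[^,]+), )?"
    r"(?P<msg>.*)"
)


def parse_xid(line):
    """Return a dict describing one Xid log line, or None if it is not one."""
    match = XID_LINE.search(line)
    if not match:
        return None
    return {
        "pci": match.group("pci"),
        "xid": int(match.group("xid")),
        "pid": match.group("pid"),    # None when the line has no pid field
        "name": match.group("name"),  # None when the line has no name field
        "message": match.group("msg"),
    }


def count_xids(log_text):
    """Count how many times each Xid code appears in a block of log text."""
    events = (parse_xid(line) for line in log_text.splitlines())
    return Counter(event["xid"] for event in events if event)
```

For the first line above this yields Xid 119 on PCI:0000:02:00 from process nv_open_q; the "Rate limiting" line is not an Xid line and is skipped.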

 

This eventually results in a total system crash requiring a system reboot. I have had this card for a year or so and everything has been fine up until now.

 

Appreciate any help anyone is able to provide.

 

Link to comment
4 hours ago, Thundermonk said:

Sorry, hopefully thats everything.

It seems that your BIOS has issues assigning address space for the GPU:

Jul 29 09:32:49 NAS kernel: resource: resource sanity check: requesting [mem 0x00000000000e0000-0x00000000000fffff], which spans more than PCI Bus 0000:00 [mem 0x000d0000-0x000e7fff window]
Jul 29 09:32:49 NAS kernel: caller _nv043843rm+0x35/0x70 [nvidia] mapping multiple BARs

 

However since this is a GTX 1650 I would strongly recommend that you switch from Legacy boot to UEFI boot, please also make sure that you've enabled above 4G Decoding and if available Resizable BAR support, but I don't think that your BIOS has the last option in it.

 

You can also take a look here what the XID error means (in your case 119).

 

Did this happen with older drivers or did you just install the card into your server?

Please also check if your system memory is okay and maybe do a memtest.

Link to comment
4 hours ago, ich777 said:

It seems that your BIOS has issues assigning address space for the GPU:

Jul 29 09:32:49 NAS kernel: resource: resource sanity check: requesting [mem 0x00000000000e0000-0x00000000000fffff], which spans more than PCI Bus 0000:00 [mem 0x000d0000-0x000e7fff window]
Jul 29 09:32:49 NAS kernel: caller _nv043843rm+0x35/0x70 [nvidia] mapping multiple BARs

 

However since this is a GTX 1650 I would strongly recommend that you switch from Legacy boot to UEFI boot, please also make sure that you've enabled above 4G Decoding and if available Resizable BAR support, but I don't think that your BIOS has the last option in it.

 

You can also take a look here what the XID error means (in your case 119).

 

Did this happen with older drivers or did you just install the card into your server?

Please also check if your system memory is okay and maybe do a memtest.

 

 

Thanks again for looking in to this for me.

 

From what I can tell it was using UEFI boot. 4G decoding was enabled but no resizeable BAR support (as you hinted).

 

Memtest was fine, passed with no errors.

 

With regard to the drivers, I am unsure what version I was using when everything was working fine. It was not something I even looked at until I started having problems. It had been working fine for a year or so until now, with no changes to the system or hardware. I could try rolling back through the drivers to see if that helps.

 

Although it sounds like it might be time to retire the motherboard (after I confirm the gfx card works in another machine ok), she has served me well over the years :)

 

Thanks again for your help, very much appreciated.

Link to comment
16 minutes ago, Thundermonk said:

From what I can tell it was using UEFI boot.

You are using legacy:
grafik.png.26220e4f73e4597fdf0b5ccf7e3e9069.png

Otherwise this folder would be named EFI (without the - at the end).

 

Maybe try to switch to UEFI and see if that helps (click on the blue text Flash on the Main page, at the bottom click Permit UEFI boot and click Apply).

After that you should be able to boot with UEFI.
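If you want to double-check which mode the running system actually booted in, here is a minimal Python sketch. It relies on the general Linux behavior that /sys/firmware/efi only exists when the kernel was started via UEFI, and on the EFI vs. EFI- folder naming described above. The function names are made up, and /boot as the flash drive mount point is an assumption.

```python
import os


def boot_mode(sysfs_root="/sys"):
    """Return "UEFI" if the kernel exposes EFI firmware data, else "Legacy"."""
    efi_dir = os.path.join(sysfs_root, "firmware", "efi")
    return "UEFI" if os.path.isdir(efi_dir) else "Legacy"


def flash_permits_uefi(flash_root="/boot"):
    """True if the flash drive has a folder named exactly EFI (not EFI-)."""
    return os.path.isdir(os.path.join(flash_root, "EFI"))
```

The two checks answer different questions: `boot_mode()` reports how the current session booted, while `flash_permits_uefi()` only reports whether the flash drive is set up to allow UEFI boot. The BIOS boot order still decides which one is used.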

Link to comment
1 hour ago, ich777 said:

You are using legacy:
grafik.png.26220e4f73e4597fdf0b5ccf7e3e9069.png

otherwise this folder would be named EFI (without - at the end).

 

Maybe try to switch to UEFI and see if that helps (click on the blue text Flash on the Main page, at the bottom click Permit UEFI boot and click Apply).

After that you should be able to boot with UEFI.

Not sure if you are looking at a different motherboard bios?

 

The only reference I could find to UEFI was in the attached images; changing it to "UEFI only" causes it to boot into the BIOS constantly. Could that folder be a holdover from a previous system I used to run? The second image suggests to me that it's booting in UEFI?

bios.JPG

BIOS2.JPG

Link to comment
