[Plugin] Nvidia-Driver


ich777


14 hours ago, ich777 said:

You have a lot of:

Oct 20 18:47:00 Tower kernel: NVRM: GPU 0000:00:10.0: RmInitAdapter failed! (0x25:0x65:1411)
Oct 20 18:47:00 Tower kernel: NVRM: GPU 0000:00:10.0: rm_init_adapter failed, device minor number 0

 

in your syslog. Is this the first time you are trying to install the Nvidia Driver?

 

Please make sure that you've disabled C-States in your BIOS and that you enable Above 4G-Decoding and Resizable BAR Support if you have these options in your BIOS.

Thanks, those BIOS settings worked! My GPU is now being read by the Nvidia Driver:

image.thumb.png.b11262dac5645351a9464a87534c91a4.png

However, with transcoding enabled in Plex, video fails to play with error code s1003. The Docker container cannot be stopped via the GUI (pictured below) or killed via the terminal. I had to force stop my Unraid VM.

image.thumb.png.3dd435166401eda8e0b4cbfdda787da8.png

With all of the part 2 configuration in place but transcoding disabled within the Plex app, the videos play fine. It's only when I enable the acceleration settings:

image.thumb.png.47cb7e5641657650a537fe014fa23542.png

Here is my part 2 configuration:

image.thumb.png.4a18da9d5ee2f9759458cf2f001fdc94.png

 

Attached are my diagnostics. Any ideas?

tower-diagnostics-20221021-1456.zip

 

 

Edited by AndrewClaus
Link to comment
6 minutes ago, AndrewClaus said:

The docker container cannot be stopped via GUI or killed via terminal. I had to force stop my Unraid VM.

If you are virtualizing Unraid I really can't help, because I run it on bare metal, and one wrong setting on the host can prevent the card from working when it is virtualized.

Maybe ask on the Proxmox forums or over in the Virtualization subforum, as @trurl already pointed out.

Link to comment
21 minutes ago, ich777 said:

Do you use one card for a VM and transcoding? If yes, don't do that!

 

Or are you virtualizing Unraid?

If you are virtualizing Unraid I really can't help because I use it on bare metal and one wrong setting on the Host can prevent the card from working.

I'm running Unraid as a VM within Proxmox and passing the GPU through to Unraid.

 

I'll revisit the passthrough instructions and ask elsewhere. Worst case, I run Unraid on bare metal too. It also supports VMs, but I need to compare its limitations. Thanks.

Edited by AndrewClaus
Link to comment

Hello,

 

Since I upgraded to 6.11 I have been having an issue where one of my GPUs disappears from the Nvidia Diag plugin. I found this in the logs:
Oct 24 10:04:27 Tower kernel: NVRM: GPU 0000:04:00.0: RmInitAdapter failed! (0x23:0xffff:1365)
Oct 24 10:04:27 Tower kernel: NVRM: GPU 0000:04:00.0: rm_init_adapter failed, device minor number 0
Oct 24 10:04:27 Tower kernel: NVRM: GPU 0000:04:00.0: RmInitAdapter failed! (0x23:0xffff:1365)
Oct 24 10:04:27 Tower kernel: NVRM: GPU 0000:04:00.0: rm_init_adapter failed, device minor number 0
Oct 24 10:04:28 Tower kernel: NVRM: GPU 0000:04:00.0: RmInitAdapter failed! (0x23:0xffff:1365)
Oct 24 10:04:28 Tower kernel: NVRM: GPU 0000:04:00.0: rm_init_adapter failed, device minor number 0
Oct 24 10:04:28 Tower kernel: NVRM: GPU 0000:04:00.0: RmInitAdapter failed! (0x23:0xffff:1365)
Oct 24 10:04:28 Tower kernel: NVRM: GPU 0000:04:00.0: rm_init_adapter failed, device minor number 0
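
When a syslog contains many repeats like the ones above, it can help to summarise how often each GPU hit the failure. The following is only a rough sketch: the function name `count_rminit_failures` is made up for illustration, and it assumes the syslog lines have the same shape as the excerpt above.

```shell
#!/bin/sh
# Hypothetical helper: count "RmInitAdapter failed" lines per PCI address.
# Reads syslog text on stdin and prints "<count> <pci-address>" per GPU.
count_rminit_failures() {
    grep 'RmInitAdapter failed' |
        sed -n 's/.*NVRM: GPU \([0-9a-fA-F:.]*\): RmInitAdapter failed.*/\1/p' |
        sort | uniq -c | awk '{print $1, $2}'
}

# Example (assumes the log is at /var/log/syslog):
#   count_rminit_failures < /var/log/syslog
```

A count that keeps growing for a single address only shows that the driver repeatedly failed to initialise that one card. It does not say why.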

 

Any help would be appreciated.

 

Edited by JC2020
Link to comment
56 minutes ago, JC2020 said:

Seems like you have an issue with your mining container:

Oct 24 02:01:32 Tower kernel: miner1[3117]: segfault at 21 ip 0000000000467ce4 sp 0000147641dfd740 error 4 in t-rex[400000+19fa000]
Oct 24 02:01:32 Tower kernel: Code: df e8 30 7b 4f 00 48 83 7b 58 00 0f 84 8d 00 00 00 48 8b 43 48 0f b6 50 21 31 c0 84 d2 74 54 48 83 c4 18 5b 5d c3 48 8b 43 48 <0f> b6 40 21 84 c0 0f 94 c0 48 83 c4 18 5b 5d c3 0f 1f 40 00 48 8b

 

Are you sure your card is working properly? Was this card working before?

Link to comment
5 hours ago, JC2020 said:

Is there any way other than a reboot to try and restart the GPU?

Yes and no. It is rather complicated: it depends on when the card fell off the bus and how hard it crashed, and even then it is not guaranteed to work.

Please also make sure that you enable or disable, as appropriate, the BIOS options that I've already mentioned.

Link to comment
3 hours ago, PneuMatix said:

Hello, I'm new to Unraid and am currently setting up Jellyfin with a 1650 Super. I downloaded the Nvidia Driver, but it shows "NVIDIA-SMI failed due to a communication failure with the NVIDIA driver. Check that the most recent NVIDIA driver is installed and running." I tried going through the steps a few times and searching for the issue, but maybe I'm missing something.

I've confirmed that the card works in another machine and that it appears in my server's hardware list.

I tried different versions of the graphic card driver, but the message remained the same.
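
For this nvidia-smi message, one quick thing to look at is whether the `nvidia` kernel module is loaded at all. This is only a minimal sketch: the helper name `nvidia_module_loaded` is invented here, and it just scans `lsmod` output passed on stdin.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the lsmod output on stdin lists a
# module named exactly "nvidia" (not just nvidia_drm or nvidia_uvm).
nvidia_module_loaded() {
    awk '$1 == "nvidia" { found = 1 } END { exit !found }'
}

# Example:
#   lsmod | nvidia_module_loaded && echo "loaded" || echo "not loaded"
```

If the module is not loaded, the kernel log (for example `dmesg | grep -i nvrm`) may show why the driver could not bind to the card.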

Any help would be appreciated.

tower-diagnostics.zip 59.03 kB · 3 downloads

 

https://unraid.net/buy-genuine-license

Link to comment

I can't start any Docker containers with --runtime=nvidia; they all fail with "Bad Parameter".

Any advice on what to look at?

docker run
  -d
  --name='frigate'
  --net='custom'
  --privileged=true
  -e TZ="America/New_York"
  -e HOST_OS="Unraid"
  -e HOST_HOSTNAME="##"
  -e HOST_CONTAINERNAME="frigate"
  -e 'FRIGATE_RTSP_PASSWORD'='##'
  -e 'NVIDIA_VISIBLE_DEVICES'='GPU-20e89059-##'
  -e 'NVIDIA_DRIVER_CAPABILITIES'='all'
  -l net.unraid.docker.managed=dockerman
  -l net.unraid.docker.webui='http://[IP]:[PORT:5000]'
  -l net.unraid.docker.icon='https://raw.githubusercontent.com/yayitazale/unraid-templates/main/frigate.png'
  -p '5000:5000/tcp'
  -p '1935:1935/tcp'
  -v '/mnt/cachessd/appdata/frigate':'/config':'rw'
  -v '/mnt/user/Documents/Media/frigate':'/media/frigate':'rw'
  -v '/tmp/frigate':'/tmp/cache':'rw'
  -v '/etc/localtime':'/etc/localtime':'rw'
  --device='/dev/usb'
  --shm-size=5G
  --restart unless-stopped
  --runtime=nvidia 'blakeblackshear/frigate:stable-amd64'

d1ad2297da28b5f58cb0c0c08aca28f0f1cec9ffb0395a0f122eb1d2551bfb58
docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #1: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: open failed: /proc/sys/kernel/overflowuid: permission denied: unknown.

The command failed.
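
The error above names `/proc/sys/kernel/overflowuid`, so one simple first check is whether that file can be read on the host at all. This is only a sketch and the helper name `check_readable` is made up. It does not identify the cause of the hook failure, it just rules out one possibility.

```shell
#!/bin/sh
# Hypothetical helper: report whether a file can actually be read by the
# current user. Prints "readable" or "not readable"; returns 0 or 1.
check_readable() {
    if [ -r "$1" ] && cat "$1" >/dev/null 2>&1; then
        echo "readable"
    else
        echo "not readable"
        return 1
    fi
}

# Example on the Unraid host (path taken from the error message):
#   check_readable /proc/sys/kernel/overflowuid
```

If the file is readable from a normal host shell, the denial is more likely tied to the environment in which the container hook runs than to the file itself.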

 

diagnostics-20221030-0832.zip

Edited by Findthelorax
Link to comment
7 hours ago, UBS said:

I have the same issue, and it started all of a sudden.

Do you run the script from SpaceInvaderOne?

Please remove that script, reboot and see if it's the same.

 

If that doesn't help, please open the template of an affected container, change something and change it back so that you can press the Apply button, then see if anything changes after pressing Apply.

Link to comment
58 minutes ago, gamertaboo said:

Thank you for any help, I really appreciate it. Usually I can figure this stuff out, but this has me stumped.

Please make sure to enable Resizable BAR Support and Above 4G Decoding in your BIOS.

Also please try to boot with Legacy (CSM) Mode instead of UEFI.

 

Did you maybe do a BIOS update in between too? It seems like something is set wrong in your BIOS:

Nov  2 18:30:57 Plex2 kernel: NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
Nov  2 18:30:57 Plex2 kernel: NVRM: BAR0 is 0M @ 0x0 (PCI:0000:2d:00.0)
Nov  2 18:30:57 Plex2 kernel: nvidia: probe of 0000:2d:00.0 failed with error -1

 

Since you are using an AMD CPU, please also try to disable C-States.
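
To find every device affected by this kind of invalid BAR assignment, the syslog can be filtered for the `BAR... is 0M @ 0x0` lines. This is a rough sketch: the function name `find_invalid_bars` is hypothetical, and it assumes the line format shown in the excerpt above.

```shell
#!/bin/sh
# Hypothetical helper: print "BAR<n> <pci-address>" for every NVRM line
# on stdin that reports a BAR of size 0M at address 0x0.
find_invalid_bars() {
    sed -n 's/.*NVRM: BAR\([0-9]\) is 0M @ 0x0 (PCI:\([0-9a-fA-F:.]*\)).*/BAR\1 \2/p'
}

# Example (assumes the log is at /var/log/syslog):
#   find_invalid_bars < /var/log/syslog
# The reported address can then be inspected with:
#   lspci -vv -s 0000:2d:00.0
```

In the `lspci -vv` output, the `Region` lines show which memory ranges the firmware assigned to the card, which makes it easier to tell whether a BIOS change had any effect.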

Link to comment
8 hours ago, ich777 said:

Please make sure to enable Resizable BAR Support and Above 4G Decoding in your BIOS.

Also please try to boot with Legacy (CSM) Mode instead of UEFI.

 

Did you maybe do a BIOS update in between too? It seems like something is set wrong in your BIOS:

Nov  2 18:30:57 Plex2 kernel: NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
Nov  2 18:30:57 Plex2 kernel: NVRM: BAR0 is 0M @ 0x0 (PCI:0000:2d:00.0)
Nov  2 18:30:57 Plex2 kernel: nvidia: probe of 0000:2d:00.0 failed with error -1

 

Since you are using an AMD CPU, please also try to disable C-States.

Hello,

I cleared the CMOS and set every setting you asked me to, but still no luck. Thanks.

tower-diagnostics-20221103-0903.zip

Link to comment
1 hour ago, gamertaboo said:

I cleared the CMOS and set every setting you asked me to, but still no luck. Thanks.

Are you really sure that you have enabled Above 4G Decoding and Resizable BAR Support?

Please double check if you have any other options for BAR or something like 64bit support in the BIOS.

 

Please also try to disable the onboard Aspeed AST graphics, if possible, to see if this makes any difference.

 

I'm pretty certain that a setting in the BIOS is wrong. Please also update your BIOS, since you are on version P1.20 and P1.50 is available.

Link to comment
