[Plugin] Linuxserver.io - Unraid Nvidia



15 minutes ago, tmchow said:

 
I'm running unraid 6.7.2
 
watch nvidia-smi does show the transcoder working and video plays fine. The error just bothers me because I'm trying to watch for the malformed-image error, and when someone who doesn't know how to use direct play on my server comes on, I get spammed with this error.


I had this error when I was mistakenly including a modprobe command in my go file, which I think was exposing my Quick Sync CPU. I can't remember which modprobe it was, but as soon as I removed it and rebooted, my Plex logs no longer had the error.

...or it could've been a coincidence and the reboot is what actually solved it.

Well, I have not rebooted in a while... may have to give that a go.
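For context on the go file mentioned above: on Unraid it lives at /boot/config/go and runs at every boot. The poster doesn't recall which modprobe line was involved, so the following is only a hypothetical sketch of the pattern being described (manually loading the Intel iGPU driver to expose Quick Sync), not the actual line that caused the problem.

```
#!/bin/bash
# /boot/config/go -- runs once at boot on Unraid.

# Hypothetical example of the kind of line being discussed: loading the
# Intel graphics driver so Quick Sync is available to containers.
# The exact module the poster removed is not known.
modprobe i915

# Start the Unraid management interface (present in the stock go file).
/usr/local/sbin/emhttp &
```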

Link to comment

I restarted my Plex server yesterday and now several services such as Docker, VMs, and Apps won't start, the system will not shut down cleanly, etc. See this thread where I have put more details:

 

This seems related to Unraid Nvidia 6.7.2: if I revert the system to regular Unraid 6.7.2 everything seems fine, but when I convert back to Unraid Nvidia 6.7.2 I get the odd behavior with no Docker, VMs, etc., and the system will not shut down cleanly.  Can someone please provide some guidance on how to troubleshoot this issue?  I have been running Nvidia 6.7.2 since it was released without issues, through multiple reboots.  I use the Nvidia version to pass a P2000 through to my Plex docker.  My system is built on a Threadripper 2920X.  Any help you can offer would be very much appreciated.

 

My diagnostics file is attached in the thread linked above

 

Thank you

 

EDIT - As a test I also tried Unraid Nvidia 6.7.1 with the same result.  I have a monitor on my console now, and when the system is shut down it keeps waiting for Docker to die.

 

Edited by Grimjack
Link to comment

More troubleshooting: I pulled out my Nvidia P2000 card (the one passed through to my Plex Docker) and my Nvidia 1050Ti (passed through to my Win10 VM), and the system booted normally and services started (Unraid Nvidia 6.7.1).  I shut down (cleanly, for a change), reinstalled my P2000, and brought the system back up, and Docker and the VM started normally.  I upgraded back to Nvidia 6.7.2 and restarted (cleanly again) and everything is back up and running.  I am going to leave the 1050Ti out of the system and test it in another PC I am building to see if there are any issues with that card.

 

Has anyone experienced anything similar to this with Nvidia Unraid? 

Link to comment
29 minutes ago, Grimjack said:

More troubleshooting: I pulled out my Nvidia P2000 card (the one passed through to my Plex Docker) and my Nvidia 1050Ti (passed through to my Win10 VM), and the system booted normally and services started (Unraid Nvidia 6.7.1).  I shut down (cleanly, for a change), reinstalled my P2000, and brought the system back up, and Docker and the VM started normally.  I upgraded back to Nvidia 6.7.2 and restarted (cleanly again) and everything is back up and running.  I am going to leave the 1050Ti out of the system and test it in another PC I am building to see if there are any issues with that card.

 

Has anyone experienced anything similar to this with Nvidia Unraid? 

Yes, very similar issues. I have an RTX 2080 passed to a Windows VM and a P2000 for transcodes. Even with hardware acceleration disabled and the NVIDIA variables removed from the Plex docker, I'm getting random crashes that require a hard reboot. Haven't had much time to troubleshoot, but I was planning to start by reverting to vanilla unRAID to see if that was the issue. I'll continue following your progress and will report updates as well. Thanks!

Link to comment
1 hour ago, JasonM said:

Yes, very similar issues. I have an RTX 2080 passed to a Windows VM and a P2000 for transcodes. Even with hardware acceleration disabled and the NVIDIA variables removed from the Plex docker, I'm getting random crashes that require a hard reboot. Haven't had much time to troubleshoot, but I was planning to start by reverting to vanilla unRAID to see if that was the issue. I'll continue following your progress and will report updates as well. Thanks!

 

Are you guys stubbing the card you pass through?

If you are not, do try to stub the card passed through.

The crashes might be caused by passing the card through when the nvidia driver controls the gpu.
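For anyone unsure what "stubbing" means in practice: on Unraid 6.7.x the usual approach is binding the passthrough card to vfio-pci at boot via syslinux.cfg. A minimal sketch follows; the 10de:xxxx IDs and the 01:00.0 bus address are placeholders, so substitute whatever lspci reports for your own card and its audio function.

```
# 1) Find the vendor:device IDs of the GPU you want to pass through
#    (and its HDMI audio function).
lspci -nn | grep -i nvidia

# 2) Add the IDs to the append line in /boot/syslinux/syslinux.cfg
#    (Main -> Flash -> Syslinux Configuration), then reboot.
#    The IDs below are placeholders from step 1:
#
#      append vfio-pci.ids=10de:xxxx,10de:yyyy initrd=/bzroot
#
#    Note: matching by ID stubs every card with that ID, so this only
#    works cleanly when the passthrough card is a different model from
#    the one the nvidia driver should keep.

# 3) After the reboot, confirm vfio-pci (not nvidia) owns that card.
#    01:00.0 is a placeholder bus address:
lspci -nnk -s 01:00.0 | grep "Kernel driver in use"
```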

Link to comment
2 minutes ago, saarg said:

 

Are you guys stubbing the card you pass through?

If you are not, do try to stub the card passed through.

The crashes might be caused by passing the card through when the nvidia driver controls the gpu.

I am not currently, but I'll try that tonight. If I wake up in the morning to a functioning unRAID UI, that may be the fix.

Link to comment
On 8/21/2019 at 5:16 PM, saarg said:

 

Are you guys stubbing the card you pass through?

If you are not, do try to stub the card passed through.

The crashes might be caused by passing the card through when the nvidia driver controls the gpu.

Regrettably, this did not correct the problem, at least not for me. I'm going to leave it stubbed since that makes sense anyway, but I'm still chasing random crashes. It seems to happen overnight, so I suspect a scheduled task is making things freak out. I'll post diags to the main forum and see what we get.

Link to comment

I have noticed recently that Plex states it is using HW transcoding, but when I look at my CPU usage and watch nvidia-smi, nothing is using the GPU.

 

I have tried both typing in the GPU and copying and pasting the UUID, and nothing changes.

 


 

Has anyone else run into this?
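Not an answer, but one way to narrow this down is to check whether the GPU is actually reachable from inside the Plex container rather than only on the host. A quick sketch follows; the container name "plex" is just a placeholder for whatever yours is called.

```
# On the host: watch for transcode processes showing up on the card.
watch -n 1 nvidia-smi

# Inside the container: if the NVIDIA runtime is wired up correctly,
# nvidia-smi works here too and should list the same GPU/UUID.
# ("plex" is a placeholder container name.)
docker exec -it plex nvidia-smi

# Check what the container was started with; NVIDIA_VISIBLE_DEVICES
# should be the GPU UUID (or "all") and the runtime should be "nvidia".
docker inspect plex | grep -iE "nvidia|runtime"
```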

Link to comment

I have an old nvidia card which requires the 340.xx driver line for support, and the card *does* support nvenc/nvdec.

 

I'm able to compile and load the appropriate nvidia driver myself, but I'd also like to take advantage of the other modifications that have been done as part of the work for this plugin for docker compatibility, beyond simply loading the driver.

 

Where is the source for the additional changes made to the underlying unraid/docker system with build instructions to create these distributed packages?  A general outline is fine, I can figure it out from there.

Link to comment
21 hours ago, nick5429 said:

I have an old nvidia card which requires the 340.xx driver line for support, and the card *does* support nvenc/nvdec.

 

I'm able to compile and load the appropriate nvidia driver myself, but I'd also like to take advantage of the other modifications that have been done as part of the work for this plugin for docker compatibility, beyond simply loading the driver.

 

Where is the source for the additional changes made to the underlying unraid/docker system with build instructions to create these distributed packages?  A general outline is fine, I can figure it out from there.

I wonder if that's why my GTX 760 is detected by Emby and Plex but never actually gets used anymore as of more recent builds.

 

I think there's something in this build https://www.nvidia.com/Download/driverResults.aspx/149785/en-us that may fix it, but I'm still not 100% sure what the main issue is.

 

I've been kinda thinking I should just get over it and buy a P2000 rather than keep bringing up the problem I've been having.

 

I don't know tho.

Edited by AnnabellaRenee87
Link to comment

It is stated that it's recommended not to share the GPU with other VMs while transcoding, but I wonder: does another Docker container count as a "virtual machine"?

The reason I'm wondering is that I would like to share my GPU between 2 dockers at the same time.

Link to comment

Hi, thanks for creating this awesome plugin.

 

This plugin is not working when one of the two GPUs installed in my system (a GTX 750 Ti and an RTX 2080) is excluded for my VM with VFIO.

 

Situation: 

GTX 750 ti (ideally to use for docker container, GPU transcoding etc.)

RTX 2080 (passthrough to VM, excluded via VFIO flag)

 

Error:

NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.    

 

and from `dmesg`:

```

[   23.725090] nvidia-nvlink: Nvlink Core is being initialized, major device number 246
[   23.726094] NVRM: The NVIDIA probe routine was not called for 2 device(s).
[   23.726897] NVRM: This can occur when a driver such as:
               NVRM: nouveau, rivafb, nvidiafb or rivatv
               NVRM: was loaded and obtained ownership of the NVIDIA device(s).
[   23.729240] NVRM: Try unloading the conflicting kernel module (and/or
               NVRM: reconfigure your kernel without the conflicting
               NVRM: driver(s)), then try loading the NVIDIA kernel module
               NVRM: again.
[   23.732388] NVRM: No NVIDIA devices probed.
[   23.733304] nvidia-nvlink: Unregistered the Nvlink Core, major device number 246
[   23.781765] nvidia-nvlink: Nvlink Core is being initialized, major device number 246
[   23.782691] NVRM: The NVIDIA probe routine was not called for 2 device(s).
[   23.783451] NVRM: This can occur when a driver such as:
               NVRM: nouveau, rivafb, nvidiafb or rivatv
               NVRM: was loaded and obtained ownership of the NVIDIA device(s).
[   23.785747] NVRM: Try unloading the conflicting kernel module (and/or
               NVRM: reconfigure your kernel without the conflicting
               NVRM: driver(s)), then try loading the NVIDIA kernel module
               NVRM: again.
[   23.788687] NVRM: No NVIDIA devices probed.

``` 

 

Thanks in advance,

Tomas
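For readers who hit the same "probe routine was not called" message: it generally means something other than the nvidia module already owns the card(s). A quick diagnostic sketch (run from the Unraid console) to see which kernel driver is bound to each NVIDIA device:

```
# List every NVIDIA PCI function (vendor ID 10de) and the kernel
# driver currently bound to it.
lspci -nnk -d 10de:

# In the output, check the "Kernel driver in use:" line per device:
#   nvidia   -> usable by the plugin and containers
#   vfio-pci -> stubbed for VM passthrough
#   nouveau  -> the in-kernel driver grabbed it and will block nvidia
```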

Link to comment
14 hours ago, Sic79 said:

It is stated that it's recommended not to share the GPU with other VMs while transcoding, but I wonder: does another Docker container count as a "virtual machine"?

The reason I'm wondering is that I would like to share my GPU between 2 dockers at the same time.

Dockers are not virtual machines, and the nvidia container runtime should support multiple dockers using the same card's resources. Know that if both of these containers require access to the transcoding pipeline you may run into problems, especially if you have a card that isn't licensed for more than 2 simultaneous transcode processes.
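To make that concrete, here is a rough sketch of the same card being handed to two containers via the NVIDIA runtime, using plain docker run. The image and container names are only examples; on Unraid you would normally set the same two variables in each container's template instead.

```
# Get the GPU's UUID once (or just use "all" if there is only one card).
nvidia-smi --query-gpu=uuid --format=csv,noheader

# Two containers pointed at the same GPU. GPU-xxxxxxxx is a placeholder
# for the UUID returned above; image names are illustrative.
docker run -d --name=plex --runtime=nvidia \
  -e NVIDIA_VISIBLE_DEVICES=GPU-xxxxxxxx \
  -e NVIDIA_DRIVER_CAPABILITIES=all \
  linuxserver/plex

docker run -d --name=emby --runtime=nvidia \
  -e NVIDIA_VISIBLE_DEVICES=GPU-xxxxxxxx \
  -e NVIDIA_DRIVER_CAPABILITIES=all \
  linuxserver/emby

# Both containers can submit NVENC/NVDEC work, but a consumer card's
# 2-session encode limit applies to the card as a whole, not per container.
```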

Link to comment
11 hours ago, teumaauss said:

Hi, thanks for creating this awesome plugin.

 

This plugin is not working when one of the two GPUs installed in my system (a GTX 750 Ti and an RTX 2080) is excluded for my VM with VFIO.

 

Situation: 

GTX 750 ti (ideally to use for docker container, GPU transcoding etc.)

RTX 2080 (passthrough to VM, excluded via VFIO flag)

 

Error:

NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.    

 

and from `dmesg`:

```

[   23.725090] nvidia-nvlink: Nvlink Core is being initialized, major device number 246
[   23.726094] NVRM: The NVIDIA probe routine was not called for 2 device(s).
[   23.726897] NVRM: This can occur when a driver such as:
               NVRM: nouveau, rivafb, nvidiafb or rivatv
               NVRM: was loaded and obtained ownership of the NVIDIA device(s).
[   23.729240] NVRM: Try unloading the conflicting kernel module (and/or
               NVRM: reconfigure your kernel without the conflicting
               NVRM: driver(s)), then try loading the NVIDIA kernel module
               NVRM: again.
[   23.732388] NVRM: No NVIDIA devices probed.
[   23.733304] nvidia-nvlink: Unregistered the Nvlink Core, major device number 246
[   23.781765] nvidia-nvlink: Nvlink Core is being initialized, major device number 246
[   23.782691] NVRM: The NVIDIA probe routine was not called for 2 device(s).
[   23.783451] NVRM: This can occur when a driver such as:
               NVRM: nouveau, rivafb, nvidiafb or rivatv
               NVRM: was loaded and obtained ownership of the NVIDIA device(s).
[   23.785747] NVRM: Try unloading the conflicting kernel module (and/or
               NVRM: reconfigure your kernel without the conflicting
               NVRM: driver(s)), then try loading the NVIDIA kernel module
               NVRM: again.
[   23.788687] NVRM: No NVIDIA devices probed.

``` 

 

Thanks in advance,

Tomas

You'll want to post the diagnostics zip. Most likely you are stubbing a PCI-E bus address that is in the same IOMMU group as the other card. This will stub both cards, even though you only intended to stub one. This can be worked around, but I'll leave that advice to people more experienced than I am with these things.
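For anyone who wants to check this themselves, a common sketch for dumping the IOMMU groups and the devices inside each one (run from the Unraid console):

```
#!/bin/bash
# Print every IOMMU group and the PCI devices it contains. If the GPU
# you stubbed shares a group with the GPU the nvidia driver should keep,
# both can end up bound to vfio-pci.
for group in /sys/kernel/iommu_groups/*; do
    echo "IOMMU group ${group##*/}:"
    for dev in "$group"/devices/*; do
        echo -n "  "
        lspci -nns "${dev##*/}"
    done
done
```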

Link to comment
1 minute ago, Xaero said:

Dockers are not virtual machines, and the nvidia container runtime should support multiple dockers using the same card's resources. Know that if both of these containers require access to the transcoding pipeline you may run into problems, especially if you have a card that isn't licensed for more than 2 simultaneous transcode processes.

Are you saying that we can use the Nvidia card in 2 different Dockers at the same time (e.g. Plex & Emby), assuming a card that allows more than 2 transcodes?

Link to comment
34 minutes ago, Pducharme said:

Are you saying that we can use the Nvidia card in 2 different Dockers at the same time (e.g. Plex & Emby), assuming a card that allows more than 2 transcodes?

In theory, yes.

So containerization is different than virtualization in several ways. On Linux this has some big benefits.

For example, on a Linux OS devices are populated into a filesystem we call "sysfs".
Sysfs nodes exist for every single sensor, switch, and register for every single device that is found and initialized. As such, your GPU also becomes a sysfs node, and all of its features become exposed through sysfs as well. With a virtual machine, we "remove" the card from the host OS and "install" that card into the guest OS.

In a containerized environment, specifically in Docker, sysfs nodes can exist on both "machines" simultaneously. The application, driver, et al. are none the wiser about the existence of the other OS, and the card exists in both at the same time. As far as the card is concerned (and nvidia-smi, for that matter), two processes on the same system are using the same card at the same time, which is perfectly acceptable. In theory you could use it with both Plex and Emby at the same time on a card limited to 2 transcodes - but any more than 1 transcode at a time on either would result in broken transcoding on the other. Not a desirable situation.

I have successfully had the card in use with 3 "operating systems" at the same time:
- Unraid (I spawned a fake xorg server on a 'virtual display' running at 640x480 so I could run nvidia-settings over ssh; a rough sketch of this is below)
- The LS.IO Plex docker
- netdata
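For the curious, a very rough sketch of the "virtual display" trick mentioned in the first bullet above. Assumptions: an X server and nvidia-xconfig are actually present on the host (they are not part of stock Unraid) and display :99 is free; treat this as the general shape of the approach rather than the exact commands used.

```
# Let the NVIDIA driver start X with no monitor attached (headless).
nvidia-xconfig --allow-empty-initial-configuration

# Start a throwaway X server on display :99 in the background.
X :99 &

# From an SSH session, point tools at that display, e.g. query the GPUs
# with nvidia-settings.
DISPLAY=:99 nvidia-settings -q gpus
```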
 

Edited by Xaero
Link to comment
3 minutes ago, Xaero said:

In theory, yes.

So containerization is different than virtualization in several ways. On Linux this has some big benefits.

For example, on a Linux OS devices are populated into a filesystem we call "sysfs"
Sysfs nodes exist for every single sensor, switch, and register for every single device that is found and initialized. As such, your GPU also becomes a sysfs node, and all of its features become exposed through sysfs as well. With a virtual machine, we "remove" the card from the host OS and "install" that card into the guest OS.

In a containerized environment, specifically in docker, sysfs nodes can exist on both "machines" simultaneously. The application, driver, et al aren't any wiser about the existence of the other OS, and the card exists in both at the same time.

I have successfully had the card in use with 3 "operating systems" at the same time:
- Unraid (I spawned a fake xorg server on a 'virtual display' running at 640x480 so I could run nvidia-settings over ssh)
- The LS.IO Plex docker
- netdata
 

Interesting! Might start playing with Emby to see if it got any better since the last time I tried it. I think I have a lifetime license on Emby too.

Link to comment
Dockers are not virtual machines, and the nvidia container runtime should support multiple dockers using the same card's resources. Know that if both of these containers require access to the transcoding pipeline you may run into problems, especially if you have a card that isn't licensed for more than 2 simultaneous transcode processes.

@Xaero Thanks, exactly the answer I wanted :).
Link to comment
On 8/26/2019 at 1:30 PM, ramblinreck47 said:

Time has come! It’s finally here! At least in an early beta... https://forums.plex.tv/t/plex-media-server-1-16-7-1573-new-transcoder-preview/451135

Sorry for the noob question.  Is this an indication of successful 4K transcoding, NVENC support, both, or something else?  I appreciate it's not ready for prime time, and some hackage might be needed to get it running in an Unraid docker environment, but I'm not sure whether I should be looking forward to ditching my ENC user script, integrating my 4K files into my main library (and probably upgrading my 1050i card), or what...

Link to comment
52 minutes ago, Cessquill said:

Sorry for the noob question.  Is this an indication of successful 4K transcoding, NVENC support, both, or something else?  I appreciate it's not ready for prime time, and some hackage might be needed to get it running in an Unraid docker environment, but I'm not sure whether I should be looking forward to ditching my ENC user script, integrating my 4K files into my main library (and probably upgrading my 1050i card), or what...

I guess I should have been a little more descriptive in my post but I assumed most everyone was waiting for it like I was.

 

Essentially, this alpha version of the Plex transcoder allows for official NVDEC (hardware decoding) support in Linux. If you are using a decoder script, you'd want to disable it when using this new Plex version. Mind you, this is an alpha version and hasn't been added to the Linuxserver.io Docker image yet. It could be at some point, but as far as I know it hasn't been so far. A beta version will probably be out relatively soon though.

 

The “zero-copy” stuff means that the transcoder should be more efficient as well going forward. It also wasn’t listed in their preview post but the new version finally has support for 9th Gen Intel QuickSync. Now, we’re just waiting on UnRAID to update to a newer kernel (>4.20) and then it should be good to go.

Link to comment
  • trurl locked this topic
This topic is now closed to further replies.