[Plugin] Nvidia-Driver


ich777


14 minutes ago, FlyingTexan said:

I asked about the driver and GPU idle wattage.

The wattage reported by nvidia-smi is not always accurate, but I really don't know what's going on with your card, or rather with the driver; you should report that on the Nvidia forums.

 

15 minutes ago, FlyingTexan said:

What am I using the card for? Who cares

I care because I was trying to narrow down what you are using the card for; if you are using it in a VM alongside nvidia-persistenced, that would ultimately crash your system.

 

16 minutes ago, FlyingTexan said:

When did I install it? Who cares

I care because I was trying to narrow down whether you installed it in your system only recently and your power draw got higher.

 

17 minutes ago, FlyingTexan said:

you wanted diagnostics, ok you got them.

Thank you. I even pointed out other things to look for, and it seems that everything is working correctly.

 

18 minutes ago, FlyingTexan said:

I posted screenshots of everything as well.

Thank you.

 

18 minutes ago, FlyingTexan said:

Your response was to say not to believe them.

I can only tell you what I read online from other people reporting about that card, because I simply can't buy every card, and I do the things I do here for free. I read that post on Reddit about a week ago, I think, and thought it might be the right place to point you to. Now that I've read every comment again, I see that you may be the user "ElBigBad", but of course I could be wrong:

grafik.png.a174add7454ec4b66534483b540a044f.png

 

21 minutes ago, FlyingTexan said:

So I thank you for ceasing your help.

This is the main reason (first post of this thread):

grafik.png.13b3021a903670dd875b9af34e1d6197.png

 

Please remove the script and upgrade to 6.12.3; then we can diagnose this issue further.

  • Like 1
Link to comment
21 minutes ago, FlyingTexan said:

 

“Did you install the card just recently?

What are you using the card for? Only for transcoding? For example, a Nvidia T400, T600, or T1000 is a much better choice for transcoding, because these cards draw only a few watts at idle, and the T400 has a max (and locked) TDP of 35W.

If you are using it for transcoding only, you can also use the iGPU, which is more than capable of transcoding a few 4K streams, consumes nearly 0W at idle, and draws a maximum of about 15W while transcoding.”

 

 

What does a single one of your questions have to do with what I asked? Nothing. I asked about the driver and GPU idle wattage. What does using Quick Sync have to do with that? Nothing. What am I using the card for? Who cares; what does that have to do with idle card draw? Nothing. When did I install it? Who cares; what does that have to do with idle card draw? Nothing.

 

You wanted diagnostics; OK, you got them. I posted screenshots of everything as well. Your response was to say not to believe them. I told you they were accurate, and your response was to question me again. So I thank you for ceasing your help.


In case you didn’t know, Plugin developers are largely volunteering their time and effort here.

 

Why people feel the need to get snarky in light of this is beyond me. 
 

 

  • Like 4
Link to comment

Hi all, I installed a GTX 770 with driver 470.199.02, since newer drivers no longer support a card that old. The card is detected, but when I run nvidia-smi almost every field shows N/A. I have already set up Jellyfin, but there is no power draw and no activity shown when I try to transcode. The strange thing is that it does read my temperature.

 

Here is my console output:

Tue Aug 15 13:13:25 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.199.02   Driver Version: 470.199.02   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0 N/A |                  N/A |
|  0%   32C    P0    N/A /  N/A |      0MiB /  1998MiB |     N/A      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

 

Thanks in advance!

Link to comment
9 minutes ago, MaCribHome said:

The card is detected, but when I run nvidia-smi almost every field shows N/A.

This is a common thing with older and even with newer cards.

 

Nvidia breaks the readings for older cards in favour of supporting all readings on new cards, and they usually don't care to keep compatibility with old cards.

 

11 minutes ago, MaCribHome said:

I have already set up Jellyfin, but there is no power draw and no activity shown when I try to transcode.

What type of media are you trying to transcode? The GTX 770 is only capable of transcoding h264 (AVC) and won't transcode h265 (HEVC).

 

Have you set up your card in Jellyfin as described in the second post of this thread?

  • Like 1
Link to comment
On 8/10/2023 at 2:02 PM, ich777 said:

The driver sees 3 GPUs:

Attached GPUs                             : 3
...
GPU 00000000:08:00.0
    Product Name                          : NVIDIA GeForce RTX 2070 SUPER
...
GPU 00000000:09:00.0
    Product Name                          : NVIDIA GeForce GT 1030
...
GPU 00000000:44:00.0
    Product Name                          : NVIDIA GeForce GT 1030

 

Is one GPU passed through to the VM or are more GPUs passed through to VMs?

Please don't forget that if you pass a GPU through to a VM and the VM is running, it won't show on the plugin page...

 

However I see this in your lspci:

08:00.0 VGA compatible controller [0300]: NVIDIA Corporation TU104 [GeForce RTX 2070 SUPER] [10de:1e84] (rev a1)
	Subsystem: NVIDIA Corporation TU104 [GeForce RTX 2070 SUPER] [10de:1e84]
	Kernel driver in use: nvidia
	Kernel modules: nvidia_drm, nvidia
08:00.1 Audio device [0403]: NVIDIA Corporation TU104 HD Audio Controller [10de:10f8] (rev a1)
	Subsystem: NVIDIA Corporation TU104 HD Audio Controller [10de:1e84]
08:00.2 USB controller [0c03]: NVIDIA Corporation TU104 USB 3.1 Host Controller [10de:1ad8] (rev a1)
	Subsystem: NVIDIA Corporation TU104 USB 3.1 Host Controller [10de:1e84]
	Kernel driver in use: xhci_hcd
08:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU104 USB Type-C UCSI Controller [10de:1ad9] (rev a1)
	Subsystem: NVIDIA Corporation TU104 USB Type-C UCSI Controller [10de:1e84]
09:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP108 [GeForce GT 1030] [10de:1d01] (rev a1)
	Subsystem: ASUSTeK Computer Inc. GP108 [GeForce GT 1030] [1043:85f4]
	Kernel driver in use: nvidia
	Kernel modules: nvidia_drm, nvidia
09:00.1 Audio device [0403]: NVIDIA Corporation GP108 High Definition Audio Controller [10de:0fb8] (rev a1)
	Subsystem: ASUSTeK Computer Inc. GP108 High Definition Audio Controller [1043:85f4]
...
43:00.0 VGA compatible controller [0300]: NVIDIA Corporation GA106 [GeForce RTX 3060 Lite Hash Rate] [10de:2504] (rev a1)
	Subsystem: Gigabyte Technology Co., Ltd GA106 [GeForce RTX 3060 Lite Hash Rate] [1458:4096]
	Kernel driver in use: vfio-pci
	Kernel modules: nvidia_drm, nvidia
43:00.1 Audio device [0403]: NVIDIA Corporation GA106 High Definition Audio Controller [10de:228e] (rev a1)
	Subsystem: Gigabyte Technology Co., Ltd Device [1458:4096]
	Kernel driver in use: vfio-pci
44:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP108 [GeForce GT 1030] [10de:1d01] (rev a1)
	Subsystem: Micro-Star International Co., Ltd. [MSI] GP108 [GeForce GT 1030] [1462:8c98]
	Kernel driver in use: nvidia
	Kernel modules: nvidia_drm, nvidia
44:00.1 Audio device [0403]: NVIDIA Corporation GP108 High Definition Audio Controller [10de:0fb8] (rev a1)
	Subsystem: Micro-Star International Co., Ltd. [MSI] GP108 High Definition Audio Controller [1462:8c98]

 

Nvidia RTX2070 Super: nvidia driver loaded

Nvidia GT1030: nvidia driver loaded

Nvidia RTX3060: bound to VFIO

Nvidia GT1030: nvidia driver loaded

 

In short, everything seems completely fine and the Nvidia Driver plugin lists the cards correctly.
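As a side note, you can generate such a driver-per-card summary yourself from the console. This is just a sketch that parses the usual `lspci -k` output; adjust the patterns if your output format differs:

```shell
# Print one line per Nvidia VGA device together with the kernel
# driver currently bound to it (nvidia vs. vfio-pci).
lspci -k | awk '
  /VGA compatible controller.*NVIDIA/ { gpu = $0; next }
  gpu && /Kernel driver in use:/ { print gpu " -> " $NF; gpu = "" }
'
```

A card bound to `vfio-pci` here is invisible to the Nvidia Driver plugin by design.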

 

In these Diagnostics I see that you have bound the RTX 3060 to VFIO, which means the Nvidia Driver plugin can't see it.

I don't know how you've bound it to VFIO, because I don't see any indication in your Diagnostics why it should be bound, but the syslog shows that it was bound to VFIO on boot:

Aug 10 12:19:48 Server kernel: vfio-pci 0000:43:00.0: vgaarb: changed VGA decodes: olddecodes=none,decodes=io+mem:owns=none

 

Again, please note that your RTX 2070 Super won't show up while you use it in a running VM; that is the default behaviour.

Thank you for your detailed response.

I did not know that passed-through GPUs don't show. (Is that only while the VM is running?)

I found out that my psu was not powerful enough to supply all my GPUs so I put another PSU in the server. Now everything should have enough power.

However, the question still remains: I don't recall binding the 3060 to VFIO. Is there any way to "unbind" it? Thank you.

Link to comment
47 minutes ago, Zeze21 said:

I did not know that passed through GPUs don't show. (That is as long as the vm is running?)

Exactly. This is because the card is used by the VM and is technically not visible to the host anymore.

 

47 minutes ago, Zeze21 said:

However I did not bind the 3060 to VFIO. Can I somehow unbind it?

Please shut down all VMs, make sure that all GPUs show up in the Nvidia Driver plugin, and post the Diagnostics again.
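In the meantime, you can also check where the binding could be coming from. On a stock Unraid install, two usual places are the VFIO-PCI config file and the syslinux append line; the paths below are the defaults, so treat this as a sketch and adjust if your setup differs:

```shell
# Look for VFIO bindings in the usual Unraid locations.
# No output means neither file pins a device to vfio-pci at boot.
grep -H 'vfio' /boot/config/vfio-pci.cfg /boot/syslinux/syslinux.cfg 2>/dev/null
```

If a `vfio-pci.ids=...` entry or a checked device in `vfio-pci.cfg` matches the 3060, removing it (Tools -> System Devices on recent Unraid versions) and rebooting should unbind the card.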

Link to comment
5 minutes ago, ich777 said:

Exactly. This is because the card is used by the VM and is technically not visible to the host anymore.

 

Please shut down all VMs, make sure that all GPUs show up in the Nvidia Driver plugin, and post the Diagnostics again.

They are all shut down and all 4 cards/GPUs show up in the Nvidia Driver plugin.

I attached the Diagnostics.

 

Thank you in advance.

server-diagnostics-20230816-1733.zip

Link to comment
27 minutes ago, Zeze21 said:

They are all shut down and all 4 cards/GPUs show up in the Nvidia Driver plugin.

Then it should work as expected. I think that if a VM is started and currently using a card, it shows up as if it were bound to VFIO; at least that is what I think happened in the last Diagnostics.

 

Is everything on your system now working as expected? Please keep in mind that it is completely normal for the card not to show up in the plugin while it is used in a VM.

Link to comment
1 hour ago, ich777 said:

Then it should work as expected. I think that if a VM is started and currently using a card, it shows up as if it were bound to VFIO; at least that is what I think happened in the last Diagnostics.

 

Is everything on your system now working as expected? Please keep in mind that it is completely normal for the card not to show up in the plugin while it is used in a VM.

Well, the weird thing is that the VM I stopped has no GPU assigned (it is Home Assistant), and this VM does not work as it should.

It does not get the assigned IP address because it somehow uses two MAC addresses other than the one stated in the VM configuration. But that is something for a different topic.

When I started the VM again, the situation in the Nvidia Driver plugin remained unchanged (all 4 GPUs showed).

  • Like 1
Link to comment

I'd just like to add: thanks for this plugin, it works as expected. I have 2 Nvidia GPUs, an RTX 3080 and a GTX 1080. I run this plugin for my Steam Headless Docker container and it picks up my 3080 just fine. I have the 1080 bound to VFIO so I can run VMs, and it works perfectly.

 

I did run into an issue when I didn't bind my 2nd GPU: both would show up in the Nvidia plugin, and when I would attempt to start a VM with my 1080 it would hard-lock my server. So just FYI, fellas, make sure the Nvidia plugin only sees the GPU you plan to use for Docker, or it could hard-lock like mine did. From my experience, at least.

Link to comment
43 minutes ago, Mrtj18 said:

if I didn't bind my 2nd GPU. Both would show up in the nvidia plugin

Yes, that's expected.

 

43 minutes ago, Mrtj18 said:

when I would attempt to start a VM with my 1080 it would hard lock my server

Do you run the script from @SpaceInvaderOne which uses nvidia-smi -pm 1, or do you run nvidia-persistenced on boot? If the answer is yes, this is also expected behaviour; you would first have to disable/kill persistence mode, and after that you can start your VM without crashing your server.

 

If you are using nvidia-persistenced, you could also set up a script like the one below, which runs via a User Script on boot and kills the daemon after about 30 seconds so that it pulls your cards down to a lower power state:

#!/bin/bash
# start the persistence daemon so the driver initializes the cards
nvidia-persistenced
# give the cards about 30 seconds to settle into a lower power state
sleep 30
# kill the daemon again so it can't conflict with starting a VM later
kill $(pidof nvidia-persistenced)

 

It is however possible that, after you shut down the VM, the card is in a higher power state again and you have to call the script once more. You could also do that with a libvirt hook; I think @alturismo wrote a tutorial somewhere on how to do that.
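For reference, such a hook could look roughly like this. This is only a sketch, not a tested hook: it relies on libvirt invoking /etc/libvirt/hooks/qemu with the VM name as $1 and the operation (e.g. "stopped") as $2:

```shell
#!/bin/bash
# Hypothetical /etc/libvirt/hooks/qemu sketch: after any VM stops,
# start nvidia-persistenced briefly so the card settles back into a
# lower power state, then kill the daemon again so the next VM start
# doesn't conflict with it.
if [ "$2" = "stopped" ]; then
    nvidia-persistenced
    sleep 30
    kill "$(pidof nvidia-persistenced)"
fi
```

You could also match on `$1` to limit this to the specific VM that uses the GPU.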

Link to comment

Hi all,

Not exactly sure if this is the correct thread to report this but...
After upgrade to 6.12.3 i see this on the plugin page:

image.png.9934b6e1bb9150a16ede1dff0e5726a8.png

The actual plugin works and all is fine (as far as I can see), but it would be cool to take care of this too, as it clearly isn't supposed to be there.

Any pointers what I should look for?

Link to comment
1 hour ago, ich777 said:

I don't see that on my system. What plugin version are you on?

grafik.thumb.png.51e2f78225ba091fadf3d79d35644aa7.png

 

 

Do you maybe have this enabled in the Tools -> PHP Settings menu:

grafik.png.4acad158488e4fc4151aab3ac59ccb08.png

 

If yes, please disable it by clicking RESET.


I am on the latest version, 2023.07.06, and didn't have this before I updated Unraid to 6.12.3.
PHP error reporting is not enabled.

In the PHP log I have a bunch of these:
 

#0 /usr/local/emhttp/plugins/dynamix/include/DefaultPageLayout.php(705) : eval()'d code(64): log('0\x02', 1000)
#1 /usr/local/emhttp/plugins/dynamix/include/DefaultPageLayout.php(705) : eval()'d code(173): presetSpace('0\x02')
#2 /usr/local/emhttp/plugins/dynamix/include/DefaultPageLayout.php(705): eval()
#3 /usr/local/emhttp/plugins/dynamix/template.php(82): require_once('/usr/local/emht...')
#4 {main}
  thrown in /usr/local/emhttp/plugins/dynamix/include/DefaultPageLayout.php(705) : eval()'d code on line 64
[18-Aug-2023 09:27:24 Europe/Paris] PHP Fatal error:  Uncaught TypeError: log(): Argument #1 ($num) must be of type float, string given in /usr/local/emhttp/plugins/dynamix/include/DefaultPageLayout.php(705) : eval()'d code:64
Stack trace:
#0 /usr/local/emhttp/plugins/dynamix/include/DefaultPageLayout.php(705) : eval()'d code(64): log('0\x02', 1000)
#1 /usr/local/emhttp/plugins/dynamix/include/DefaultPageLayout.php(705) : eval()'d code(173): presetSpace('0\x02')
#2 /usr/local/emhttp/plugins/dynamix/include/DefaultPageLayout.php(705): eval()
#3 /usr/local/emhttp/plugins/dynamix/template.php(82): require_once('/usr/local/emht...')
#4 {main}
  thrown in /usr/local/emhttp/plugins/dynamix/include/DefaultPageLayout.php(705) : eval()'d code on line 64
[18-Aug-2023 09:34:54 Europe/Paris] PHP Deprecated:  trim(): Passing null to parameter #1 ($string) of type string is deprecated in /usr/local/emhttp/plugins/dynamix/include/DefaultPageLayout.php(705) : eval()'d code on line 174

 

Link to comment
2 hours ago, ich777 said:

This is a good question, do you have any custom themes installed or something like that?

 

Maybe @bonienl knows what causes this.

 

As you see in the screenshot above I don't see these messages.

Nothing like that, I run it as minimally as possible :)

I will keep trying things and see if I can find the root cause.

Link to comment
On 8/17/2023 at 8:37 AM, ich777 said:

Do you run the script from @SpaceInvaderOne which uses nvidia-smi -pm 1 or do your run nvidia-persistenced on boot?

@ich777 Should I be running a script like this? Currently I do not run a script or nvidia-persistenced on boot. I just bound my 2nd GPU (the 1080) to VFIO so the Nvidia plugin does not see it, and I have no issues gaming or starting and stopping VMs.

Edited by Mrtj18
Link to comment

Hello.

 

I'd like to be able to use my Nvidia 1070 to do some encoding for me via Unmanic or Tdarr. I also plan on using it for Plex, although I don't expect much usage there.

 

I have used this plugin in the past and it seemed to work fine. I had it uninstalled for a while and recently reinstalled it. It was working on 6.11.1, and the other day I upgraded to 6.12.3. I'm not 100% sure when the issues began, but this was the biggest change I've made recently.

 

This is the error message when I click Apply and Unraid redeploys the Docker container:

 

docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #1: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: open failed: /proc/sys/kernel/overflowuid: permission denied: unknown.

 

Over the last week or so I've been reading through the comments here, especially the recent ones, and attempting various suggestions. I have tried completely turning off iGPU virtualization (Intel 8700K), restarted the server probably 50 times, deleted the docker image twice, and even started using the directory option. Same results. I was getting this error in Plex over and over, and then suddenly (I really do not know what I changed this time) the 1070 became an option in my Plex server and seems to be working! But Unmanic and Tdarr still do not work. I have posted the Unmanic-related screenshots, but just be aware Tdarr is doing the same thing. Plex seems to be working (?) for now, anyway.

1681424373_Screenshot2023-08-21192029.thumb.png.7a4b94e81dbcd733e8a99cd8f3ac1c7a.png

 

Please let me know if you need other info or screenshots.

Thanks for your time helping people and for all your work as well.

 

 

Screenshot 2023-08-21 192349.png

Screenshot 2023-08-21 192244.png

Screenshot 2023-08-21 192140.png

Screenshot 2023-08-21 193541.png

tower-diagnostics-20230821-1941.zip

Link to comment
6 hours ago, UnJustice said:

nvidia-container-cli: initialization error: open failed: /proc/sys/kernel/overflowuid: permission denied: unknown.

I read about this from time to time but really can't reproduce it on my end.

 

Did you maybe downgrade to an older Unraid version at some point?

 

6 hours ago, UnJustice said:

I have tried completely turning off iGPU virtualization (Intel 8700K), restarted the server probably 50 times

This is certainly something that you don't have to do.

 

6 hours ago, UnJustice said:

Please let me know if you need other info or screenshots.

Can you post screenshots from your docker run command from Plex too?

Have you tried downgrading to the legacy driver v470.xx yet?

 

I can only assume that something is different in the docker run command; do you use the exact same GPU UUID?

BTW, you don't have to hide the UUID; nobody can do anything with it alone.

 

Have you tried booting with UEFI instead of legacy mode yet?

 

The main issue is that I see this only from time to time, and it happens (at least it seems to me) randomly for users and in very rare circumstances.

 

Most of the time it is fixed after a reboot and/or recreating the container.
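If someone wants to dig into it themselves, one quick sanity check (just my suggestion, not an official troubleshooting step) is to verify on the host that the procfs file the container runtime fails to open actually exists and is readable; on a default kernel it holds 65534:

```shell
# nvidia-container-cli complains about opening this plain procfs file,
# so confirm it is present and world-readable on the host first.
ls -l /proc/sys/kernel/overflowuid
cat /proc/sys/kernel/overflowuid
```

If that file reads fine on the host, the problem is more likely in the container runtime's environment than in the kernel itself.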

  • Like 2
Link to comment
On 8/22/2023 at 1:58 AM, ich777 said:

1) Did you maybe downgrade to an older Unraid version at some point?

 

2) Can you post screenshots from your docker run command from Plex too?

 

3) Have you yet tried to downgrade to the legacy driver v470.xx?

 

4) I can only assume that something is different in the docker run command, do you use the exact same GPU UUID?

BTW you don't have to hide the UUID, nobody can do anything with it alone.

 

5) Have you yet tried to boot with UEFI instead of legacy mode?

Thank you for the response. I read it earlier today, but I was saving up a post for once I had done everything I could. So, here are some updates and answers (I numbered your statements for ease of reference):

 

1) Yes, but only after the issues began. So yes as of now and yesterday, but no for the very first time I had issues. I tried rolling back from 6.12.3 to 6.11.1, and the docker image disappeared and some other stuff was being weird. I probably could've fixed it, but I just didn't want to mess around with that. So I went back to the current 6.12.3.

 

2) I attached the Plex stuff without the GPU redactions.

 

3) I tried the 470.xx driver today after reading this. It gave the exact same output, the same error, so I went back to the latest driver.

 

4) I figured the UUID didn't matter, but, you know. I've uploaded the Plex Docker stuff along with the Nvidia driver plugin info and nvidia-smi info. I'm fairly confident I copied the numbers over and formatted them correctly, as it has worked at certain times (I posted proof yesterday). Also, as the screenshots show, Plex working was some sort of fluke; I'm honestly at a loss trying to explain what's going on, even to myself, at this point. The GPU hardware is absolutely fine. I've used it, and still can (I did it yesterday and the day before just to check), as a VM passthrough GPU, and it works perfectly fine in a Win 11 VM for encoding tasks. So it's hard for me to lean too much in the direction of bad hardware (not that you implied such; this is just my own mind running out of ideas). Also, I used a totally different Plex Docker image: previously I was using the one from linuxserver, and now I'm using the binhex plex-pass one (I do have a Plex Pass lifetime license, just to answer any question there). I made it completely fresh using a different directory and re-added my media and stuff. Everything is functional and fine, but it still won't start with the Nvidia runtime parameter. (This also happened with the linuxserver version, just for clarity: it was working, but then it stopped for some reason. I think I noticed the container had stopped running after I tried to run Unmanic, which didn't run.)

 

5) I did, today, after reading this. My Unraid is up and running right now, perfectly fine (minus this issue of course), with a fresh UEFI install (copied the contents folder). I changed out the USB drive because the old drive was... very old. No joke, I found it in a library about 20 years ago! So it was time to retire it anyway, and I wanted to absolutely rule out any hardware issues on the part of the USB stick. I had purchased a replacement drive almost a year ago and never felt like dealing with swapping it out; this was an excuse.

 

I've also figured out and fixed some other issues that kept causing unclean shutdowns and then parity checks. One was an rclone mounting script that was failing to unmount; I fixed that so it unmounts now. The other, still a bit of a mystery, is a hassOS VM I run: if I don't force it to shut down manually, it won't shut down during the normal shutdown procedure. Anyway, I've been getting clean reboots/shutdowns since identifying and working around these issues.

 

I mean, I honestly don't know what else to mess with here. I've tried reseating and even put the GPU in a different PCIe slot. Same result. I've tried deleting the docker image as I mentioned before, and I get the same result even when I have literally just Plex installed and running in Docker: no VMs, just the array running, the plugins, and Plex. It still gives the error. It's got to be something somewhere that I changed, or a minor setting in my UEFI settings; I don't know. The only other OS/software solution I'm considering is a clean USB to see what happens. I should've tried that today, but I wasn't thinking (dinner time, you understand). Perhaps that's my next step: just try a completely clean USB boot and see what happens? My understanding is that Unraid reinstalls into RAM on every boot, though, so I don't know that I could've messed something up so badly on the flash drive, but it's possible, maybe. I'd be curious about your opinion on even bothering to try that.

 

I have other hardware I've considered testing out, basically just for fun/science at this point. I have an AMD card (a 6500 XT, I think?) that I know for a fact works. I don't use it because it can't handle encoding, which sucks. I also have an old 700-something series Nvidia card that was ripped out of a server or something; it also works but doesn't support encoding either. Neither can serve my overall intended purpose, but I might try them just to see what happens, especially the Nvidia card. I would use my desktop's 3080, but the work of ripping that thing out and putting it back in later makes me very sad to consider. I also generally try not to play with my toys that are working.

 

Well, I'll look for your reply and again I appreciate your time and help.

 

 

Screenshot 2023-08-22 213609.png

Screenshot 2023-08-22 213538.png

Screenshot 2023-08-22 213503.png

Screenshot 2023-08-22 213249.png

Screenshot 2023-08-22 213340.png

tower-diagnostics-20230822-2136.zip

 

 

Edit:

 

OK, I'm 99% sure I found the issue. It's actually completely unrelated to the Nvidia driver plugin, Plex, and the other apps that use the plugin.

 

Short version: the Krusader Docker container is causing it for some reason. I understand this isn't the place to ask for support on that, so I'm not; I'm simply reporting that it was the thing causing the runc error.

 

Longer version: After several hours of deleting and rebuilding all my docker containers, I figured out a couple things. Perhaps this will be helpful for someone else in the future.

 

First, once the OCI runc issue starts flagging, I absolutely HAVE to restart Unraid. Turning Docker off, waiting, then on again does not work. Stopping the array also doesn't work. It has to be a full reboot to make the error stop popping up.

 

Once I figured that out, I quickly realized I could resolve this by process of elimination: start with a core of containers I know have no issues after using them for over a year. I installed just the basics to get my Plex server online: Traefik, the accompanying database, Redis, and Plex itself. And that worked! Like yesterday (or rather two days ago at this point, considering the current hour where I am, oh boy). OK, so I knew I was onto something. I reinstalled Unmanic. It also worked! I sat there and stopped and started the GPU-using containers repeatedly. I stopped the Docker service and restarted it a couple of times. Everything was working fine with these few containers only. I rebooted the server and everything was still working. Great!

 

So I kept installing from there, expanding my core group of containers based on how long I had been using them and, basically, vibes. I did five at a time, then tested for failure by repeatedly starting/stopping the containers and the Docker service itself, and rebooting the server to try to force the error. Eventually I worked my way up to about 10 containers. Krusader was added, and immediately, when I stopped Plex and tried to restart it: blam-o! I got the error again. I uninstalled Krusader, did a clean reboot, and all was fine. Reinstalled Krusader and blam-o! again. I'm honestly not sure where to begin on making that container work, and I rarely use it anyway, so I may just keep it uninstalled for my sanity.

 

Screenshot 2023-08-23 021518.png

Edited by UnJustice
solved
  • Like 1
Link to comment
