[Plugin] Nvidia-Driver


ich777


14 hours ago, ich777 said:

Do you run the script from SpaceInvaderOne?

Please remove that script, reboot and see if it's the same.

 

If that doesn't help, please open the container template of an affected container, change something and change it back so that you can press the Apply button, and see if anything changes after pressing Apply.

First of all, thank you for the quick response!

As for the issue, I have removed SpaceInvaderOne's P8 power state script, restarted, and re-applied a Docker template to make sure it's updated...
 

But all containers using the Nvidia runtime still fail to start and show the same error:

 

docker run
  -d
  --name='HandBrake'
  --net='dokernet'

. .

. .

. .

. -removed by me

. .

. .

--runtime=nvidia 'zocker160/handbrake-nvenc:latest' 

e2eeca1ec513824709bbf7e3be1241e0fb39845ad1ffe943f40d675804be4746


docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #1: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: open failed: /proc/sys/kernel/overflowuid: permission denied: unknown.

The command failed.

 

Appreciate your help,

Uri

----------------EDIT---------------------

OK, now I see there is also a problem with one of my virtual machines, so I think it is more widespread than just the Nvidia containers. I will have to dig into it. I will update when I find a solution.

 

1595983706_Screenshot2022-11-03at21_42_39.thumb.png.691e517e63159c672a745aec7f96376f.png

 

Edited by UBS
updated the issue
Link to comment
1 hour ago, UBS said:

The command failed.

Have you tried booting in Legacy (CSM) mode yet?

 

The issue with the VM should be unrelated to that, but please maybe try to reboot.

Also make sure that you've disabled C-States in the BIOS, and that you've enabled Above 4G Decoding and Resizable BAR support in your BIOS.

 

These issues occur most often on AMD systems; I'm not 100% sure what's causing them.

Did you recently update the BIOS or anything similar?

Link to comment
4 minutes ago, ich777 said:

Have you tried booting in Legacy (CSM) mode yet?

 

The issue with the VM should be unrelated to that, but please maybe try to reboot.

Also make sure that you've disabled C-States in the BIOS, and that you've enabled Above 4G Decoding and Resizable BAR support in your BIOS.

 

These issues occur most often on AMD systems; I'm not 100% sure what's causing them.

Did you recently update the BIOS or anything similar?

 

I will update over the weekend, as I need to take the server out of the closet for these steps.

  • Like 1
Link to comment
1 hour ago, UBS said:

as i need to take the server out of the closet for these steps

Have you tried rebooting yet?

 

Oh, can you try to force an update of an affected container? Also, please make sure that you uninstall any packages installed through the Nerd Pack. IIRC one user had an issue where an extra package he had installed caused the same problem.

Link to comment
5 minutes ago, ich777 said:

Have you yet tried to reboot?

 

Oh, can you try to force an update of an affected container? Also, please make sure that you uninstall any packages installed through the Nerd Pack. IIRC one user had an issue where an extra package he had installed caused the same problem.

I have rebooted many times and tried a force update as well. There were no new BIOS or hardware changes on my side.
I do have many plugins, and I will check whether Nerd Tools may be the reason. I only had it for Perl (temp).

 

Do you think Portainer-EE might be the reason? I switched from CE because they offered a free license.

Edited by UBS
Link to comment
15 minutes ago, UBS said:

Do you think Portainer-EE might be the reason? I switched from CE because they offered a free license.

I really don't know because I don't use it.

Maybe it messes with Docker or changes some kind of runtime. What is the output of:

cat /etc/docker/daemon.json

 

Link to comment
10 minutes ago, ich777 said:

I really don't know because I don't use it.

Maybe it messes with Docker or changes some kind of runtime. What is the output of:

cat /etc/docker/daemon.json

 

root@UNRAID:~# cat /etc/docker/daemon.json
{
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
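
For anyone who wants to check this without eyeballing the file, here is a minimal Python sketch that tests whether a daemon.json registers the nvidia runtime. The function name `has_nvidia_runtime` is made up for illustration, and it only checks that the entry exists and has a path. It does not verify that the binary is present or that Docker actually loaded the config.

```python
import json


def has_nvidia_runtime(daemon_json_text: str) -> bool:
    """Return True if the Docker daemon config registers an 'nvidia' runtime with a path."""
    config = json.loads(daemon_json_text)
    runtime = config.get("runtimes", {}).get("nvidia")
    # A usable entry is a JSON object with a non-empty "path" value.
    return isinstance(runtime, dict) and bool(runtime.get("path"))


# Sample copied from the output posted above.
DAEMON_JSON_SAMPLE = """
{
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
"""

if __name__ == "__main__":
    print(has_nvidia_runtime(DAEMON_JSON_SAMPLE))
```

The posted config does contain the entry, so a missing runtime registration looks unlikely to be the cause here.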

I will continue tomorrow. I appreciate your kind assistance!

  • Like 1
Link to comment

Posted my diagnostics. I'm on Intel and haven't changed any hardware recently. Same issue though. It worked at one point, then randomly worked again before crashing, and now I'm stuck. I've tried re-installing the container, downgrading the Nvidia driver, and updating again, with no luck.

diagnostics-20221030-0832.zip

 

EDIT: Well, I just tried again for fun after messing with my iDRAC settings on my R720xd, and now I can start the containers. Not sure if it will last, but I'll post my diagnostics from after it started working in case that helps somehow.

diagnostics-20221103-2023.zip

Edited by Findthelorax
Link to comment
3 hours ago, Findthelorax said:

Well I just tried again for fun after messing with my idrac setting on my r720xd and now I can start the containers. Not sure if it will last but i'll post my diagnostics after it working incase that helps somehow.

Do you remember what you changed in iDRAC?

 

Did you install anything in the meantime? As said above, it could also be a package that was installed manually.

Link to comment

I have installed the Nvidia drivers and have two Nvidia cards installed, a GeForce GT 730 and a Quadro M4000. Both show up under System Devices.

image.thumb.png.07aa0d1195830a693ccfc3874d71ef0a.png

 

But, the Nvidia Driver Package says no devices found.

image.thumb.png.a49b36b291f171f0be929139d808b73b.png

 

SSH command:

root@Media:~# nvidia-smi
No devices were found

 

I had used the latest driver when installing and changed to Production to see if that would help. I rebooted every time.

 

The GT 730 is used only for a Windows VM. I had it bound, but un-bound it when trying to get the drivers to show. Diags are attached just in case.

 

Thanks in advance,

Eddie

 

EDIT:

I just installed the M4000 today. I just found a note about no power adapter in the logs and just ordered a cable. But, the GT 730 does not install drivers even with the M4000 removed.

Nov  4 15:11:42 Media kernel: NVRM: GPU 0000:03:00.0: GPU does not have the necessary power cables connected.
Nov  4 15:11:42 Media kernel: NVRM: GPU 0000:03:00.0: RmInitAdapter failed! (0x24:0x1c:1377)
Nov  4 15:11:42 Media kernel: NVRM: GPU 0000:03:00.0: rm_init_adapter failed, device minor number 0
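
When two cards are installed it helps to see which PCI address each NVRM message belongs to. Below is a rough Python sketch that groups such kernel log lines by GPU address. The function name `nvrm_messages_by_gpu` and the regular expression are illustrative assumptions based only on the three lines quoted above, so other NVRM message formats will not match.

```python
import re

# Matches lines such as: "... kernel: NVRM: GPU 0000:03:00.0: <message>"
NVRM_GPU_LINE = re.compile(r"kernel: NVRM: GPU (?P<pci>[0-9a-fA-F:.]+): (?P<msg>.*)$")


def nvrm_messages_by_gpu(log_text: str) -> dict[str, list[str]]:
    """Group NVRM kernel messages by the PCI address of the GPU they mention."""
    grouped: dict[str, list[str]] = {}
    for line in log_text.splitlines():
        match = NVRM_GPU_LINE.search(line)
        if match:
            grouped.setdefault(match.group("pci"), []).append(match.group("msg").strip())
    return grouped


# Sample copied from the syslog lines posted above.
NVRM_POWER_SAMPLE = """\
Nov  4 15:11:42 Media kernel: NVRM: GPU 0000:03:00.0: GPU does not have the necessary power cables connected.
Nov  4 15:11:42 Media kernel: NVRM: GPU 0000:03:00.0: RmInitAdapter failed! (0x24:0x1c:1377)
Nov  4 15:11:42 Media kernel: NVRM: GPU 0000:03:00.0: rm_init_adapter failed, device minor number 0
"""

if __name__ == "__main__":
    for pci, messages in nvrm_messages_by_gpu(NVRM_POWER_SAMPLE).items():
        print(pci)
        for message in messages:
            print("  " + message)
```

In the quoted lines all three messages point at the same device, 0000:03:00.0, which can then be matched against the addresses listed under System Devices.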

media-diagnostics-20221104-1511.zip

Edited by Eddie Seelke
power adapter
Link to comment
34 minutes ago, Eddie Seelke said:

the GT 730 does not install drivers even with the M4000 removed.

The GT730 needs the legacy drivers. These are the 470 series drivers, as you can see at the bottom of your screenshot.

 

The GT730 is a pretty bad card for transcoding because it doesn't support h.265 (HEVC).

Link to comment
23 hours ago, ich777 said:

I really don't know because I don't use it.

Maybe it messes with Docker or changes some kind of runtime. What is the output of:

cat /etc/docker/daemon.json

 

Hello again,

Containers are now starting with the Nvidia runtime...
But just as I'm not sure how the problem started, I'm also not sure how it got fixed.

  1. I removed the GPU Statistics add-on.
  2. I went to Nerd Tools, found these packages installed (not by me, not sure why), and updated them:

93040470_Screenshot2022-11-04at23_38_30.thumb.png.568a72ace716061eadac28437c597187.png

 

  3. Rebooted.

  4. Fixed... maybe by one of these actions.
But I have a feeling that I'm not done with this problem, as I don't know how to keep it from occurring again.

I'm attaching new diagnostics taken after I got it working, in case this helps.

 

Thanks again for your help!

 

unraid-diagnostics-20221104-2336.zip

Link to comment
2 minutes ago, UBS said:

but I have a feeling that I'm not finished with this problem as I don't know how to not make it occur again

I have a few users that reported such an issue; most of them solved it by uninstalling some unnecessary packages installed by the Nerd Pack. I really don't know what the issue is here.

 

Simply search this thread for your error (not the whole thing, just the part with OCI...) and you will find a few posts.
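
To pick out the part of the error that is worth searching for, something like the following Python sketch could be used. The function name `search_terms` is made up for illustration, and the choice of which fragments count as searchable is an assumption based only on the error message quoted earlier in this thread.

```python
def search_terms(docker_output: str) -> list[str]:
    """Pick the short, distinctive fragments of a failed 'docker run' output."""
    terms = []
    # The generic OCI phrase is shared by most reports of this problem.
    if "OCI runtime create failed" in docker_output:
        terms.append("OCI runtime create failed")
    # The nvidia-container-cli line carries the specific failure reason.
    for line in docker_output.splitlines():
        line = line.strip()
        if line.startswith("nvidia-container-cli:"):
            terms.append(line)
    return terms


# Sample copied from the error reported earlier in this thread.
DOCKER_ERROR_SAMPLE = (
    "docker: Error response from daemon: failed to create shim task: "
    "OCI runtime create failed: runc create failed: unable to start container process: "
    "error during container init: error running hook #1: error running hook: exit status 1, "
    "stdout: , stderr: Auto-detected mode as 'legacy'\n"
    "nvidia-container-cli: initialization error: open failed: "
    "/proc/sys/kernel/overflowuid: permission denied: unknown."
)

if __name__ == "__main__":
    for term in search_terms(DOCKER_ERROR_SAMPLE):
        print(term)
```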

 

IIRC the first issue like yours came up on Unraid 6.9.2, but I really don't know what caused it. I was also able to reproduce the issue, but only once.

  • Thanks 1
Link to comment
1 minute ago, ich777 said:

I have a few users that reported such an issue; most of them solved it by uninstalling some unnecessary packages installed by the Nerd Pack. I really don't know what the issue is here.

 

Simply search this thread for your error (not the whole thing, just the part with OCI...) and you will find a few posts.

 

IIRC the first issue like yours came up on Unraid 6.9.2, but I really don't know what caused it. I was also able to reproduce the issue, but only once.

I will update if I find any new information.

  • Like 1
Link to comment
15 minutes ago, neunghaha28 said:

After updating from 6.11.1 to 6.11.2, please help me.

The driver download is incomplete. Did you wait before upgrading to 6.11.2, where it said to wait until it displays a message that everything is done?

 

Anyway, please select the driver that you want to install and press the "Update & Download" button on the Nvidia Driver page. This should redownload the driver.

  • Thanks 1
Link to comment
1 hour ago, ich777 said:

The driver download was incomplete, have you waited after upgrading to 6.11.2 where it said to wait until it displays a message that everything is done?

 

Anyways, please press the "Update & Download" button on the Nvidia Driver page, this should redownload the driver.

 

Thanks, I will try it. I feel that the download speed is very slow.

 

Update: Now it works fine. Thanks again for the help.

 

Edited by neunghaha28
  • Like 1
Link to comment
1 hour ago, mr2web said:

Please let me know if anyone needs any more information. 🙂

Seems like something went wrong with the driver download. Please reboot. The reboot may take a bit longer, since the plugin also tries to download the driver on start if the download failed, as in your case (as long as you have an active internet connection on boot).

 

Please report back if it is working afterwards. If not, I would recommend uninstalling the plugin and pulling a fresh copy from the CA App.

Link to comment
On 10/20/2022 at 1:11 PM, carnivorebrah said:

 

Hmm, alright.

 

It is a Supermicro X9DRL-iF, which does have IPMI and a VGA port for displaying video, but I have the BIOS set to the PCI slot the GPU is in for video output because the VGA port wouldn't work before.

I can try switching the BIOS to the VGA port and seeing if it works now instead of the GPU...

 

So, as a test, I switched out the P2200 for a spare GT 710 I had lying around.

 

I'm able to get it to load all the way to the login screen with this card, but oddly, it doesn't show up in the Nvidia plug-in:

image.thumb.png.549f3f1824df0e25cd2b0d059e9a9851.png

 

It does show in my IOMMU groups though:

image.png.b5c743720b23ebeb7d8451b0b9bd0e99.png

 

Why wouldn't it show in the Nvidia plugin?

 

Diagnostics attached.

 

It's really odd that everything was working with the P2200, except the login screen, but with the GT 710, it's the complete opposite.

diagnostics-20221106-1324.zip

Link to comment
10 minutes ago, carnivorebrah said:

I'm able to get it to load all the way to the login screen with this card, but oddly, it doesn't show up in the Nvidia plug-in:

12 minutes ago, carnivorebrah said:

Why wouldn't it show in the Nvidia plugin?

This is because you need to select the 470 series driver:

Nov  6 13:12:46 AKWSERVER kernel: NVRM: The NVIDIA GeForce GT 710 GPU installed in this system is
Nov  6 13:12:46 AKWSERVER kernel: NVRM:  supported through the NVIDIA 470.xx Legacy drivers. Please
Nov  6 13:12:46 AKWSERVER kernel: NVRM:  visit http://www.nvidia.com/object/unix.html for more
Nov  6 13:12:46 AKWSERVER kernel: NVRM:  information.  The 520.56.06 NVIDIA driver will ignore
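
If you want to pull this information out of a syslog automatically, here is a small Python sketch that extracts the GPU name and the legacy driver branch from an NVRM notice like the one above. The function name `legacy_driver_notice` and both regular expressions are assumptions derived only from the wording of these four log lines, so a differently worded notice will return None.

```python
import re


def legacy_driver_notice(log_text: str):
    """Return (gpu_name, legacy_branch) from an NVRM legacy-driver notice, or None."""
    # Drop the syslog prefix from each NVRM line and join the wrapped text.
    parts = []
    for line in log_text.splitlines():
        _, sep, rest = line.partition("kernel: NVRM:")
        if sep:
            parts.append(rest.strip())
    message = " ".join(parts)
    gpu = re.search(r"The NVIDIA (.+?) GPU installed in this system", message)
    branch = re.search(r"NVIDIA (\d+)\.xx Legacy drivers", message)
    if not (gpu and branch):
        return None
    return gpu.group(1), branch.group(1)


# Sample copied from the log excerpt above (the last line is cut off in the post).
NVRM_LEGACY_SAMPLE = """\
Nov  6 13:12:46 AKWSERVER kernel: NVRM: The NVIDIA GeForce GT 710 GPU installed in this system is
Nov  6 13:12:46 AKWSERVER kernel: NVRM:  supported through the NVIDIA 470.xx Legacy drivers. Please
Nov  6 13:12:46 AKWSERVER kernel: NVRM:  visit http://www.nvidia.com/object/unix.html for more
Nov  6 13:12:46 AKWSERVER kernel: NVRM:  information.  The 520.56.06 NVIDIA driver will ignore
"""

if __name__ == "__main__":
    print(legacy_driver_notice(NVRM_LEGACY_SAMPLE))
```

For the excerpt above this yields the GT 710 and the 470 branch, which is the driver series to select on the Nvidia Driver page.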

The GT710 is a really horrible card IMHO... :D

 

Please report back what happens if you switch over to the 470 series driver after a reboot.

Link to comment
1 minute ago, ich777 said:

This is because you need to select the 470 series driver:

Nov  6 13:12:46 AKWSERVER kernel: NVRM: The NVIDIA GeForce GT 710 GPU installed in this system is
Nov  6 13:12:46 AKWSERVER kernel: NVRM:  supported through the NVIDIA 470.xx Legacy drivers. Please
Nov  6 13:12:46 AKWSERVER kernel: NVRM:  visit http://www.nvidia.com/object/unix.html for more
Nov  6 13:12:46 AKWSERVER kernel: NVRM:  information.  The 520.56.06 NVIDIA driver will ignore

The GT710 is a really horrible card IMHO... :D

 

Please report back what happens if you switch over to the 470 series driver after a reboot.

 

What is the recommended card?
 

My logic of buying the top of the line card was obviously wrong because the P2200 didn't last 2 years.

 

I would expect to get 4+ years out of a $400 card at the very least.

Link to comment
