[Plugin] Nvidia-Driver


ich777


17 minutes ago, leeknight1981 said:

GPU as I’m Not the only person with the issue 😕

Sorry, I really can't help with that, since @HellraiserOSU hasn't reported back yet and you haven't tried HW transcoding with Unraid on the gaming machine.

As I said, the drivers are still the same and nothing was changed there at all, so if you roll back to the driver version that worked before, I see no reason why it shouldn't work now.

On 6/28/2021 at 3:06 PM, HellraiserOSU said:

I'm having an issue where my card appears for a little bit and then disappears. It's an EVGA GeForce RTX 3060 and on the plugins it'll show the driver version and the Installed GPU fine after I reboot. After I check it and refresh the page, it's gone and says No devices found.

I don't have a VM

I do see in the logs

NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x23:0xffff:1199)

NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0


I'm having an identical problem and am being told it's my card, yet the card works in other machines and shows up in my Unraid before it disappears. If you find a fix, please let me know :))

1 hour ago, ich777 said:

Sorry, I really can't help with that, since @HellraiserOSU hasn't reported back yet and you haven't tried HW transcoding with Unraid on the gaming machine.

As I said, the drivers are still the same and nothing was changed there at all, so if you roll back to the driver version that worked before, I see no reason why it shouldn't work now.

No worries. I rolled back to the version from before the update and rebooted, and all was OK. But as soon as I set Enable Docker to No and then back to Yes, I get the error, so it's not my card or server. I guess it's wait and see.

2 hours ago, ich777 said:

Possibly for computational work? CUDA acceleration or something similar?

Could be, but the OP mentioned assigning it to a Plex container and I was curious as to why. I have a few of those 1030s lying around, so if there's a good reason to use them, I'm all ears.

Edited by Michael_P

I just picked up a GTX 1050ti to use for Plex transcoding and I'm getting the error message: 

NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

 

It shows up in lspci -v and I've made sure it's not checked under Tools>System Devices (VMs are disabled.)

 

I see this in the syslog:

kernel: nvidia: probe of 0000:01:00.0 failed with error -1

kernel: NVRM: The NVIDIA probe routine failed for 1 device(s).
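
To pull only these driver messages out of a long log, a small filter along the following lines may help. This is just a sketch: the function name `nvidia_log_lines` is made up for illustration, the "usb" line in the example is invented filler, and the log is assumed to be piped in on stdin (on Unraid the syslog is usually /var/log/syslog, but check your own system).

```shell
# Hypothetical helper (not part of the plugin): keeps only the lines that
# contain NVRM messages or a failed probe of the nvidia kernel module.
# Reads log text on stdin and writes the matching lines to stdout.
nvidia_log_lines() {
    # "|| true" keeps the exit status at 0 when nothing matches.
    grep -E 'NVRM:|nvidia: probe of' || true
}

# Example: the two lines quoted above plus one unrelated (invented) line.
printf '%s\n' \
    'kernel: nvidia: probe of 0000:01:00.0 failed with error -1' \
    'kernel: usb 1-1: new high-speed USB device' \
    'kernel: NVRM: The NVIDIA probe routine failed for 1 device(s).' \
    | nvidia_log_lines
```

Against a real log it could be called as `nvidia_log_lines < /var/log/syslog`, assuming that path exists on your system.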

 

Google doesn't give me any results except that it may be a bug in the nvidia drivers.

Is there anything I can do?

 


Edited by Dr.Power
3 hours ago, Dr.Power said:

Google doesn't give me any results except that it may be a bug in the nvidia drivers.

Is there anything I can do?

Please post your Diagnostics (Tools -> Diagnostics -> Download -> drop the downloaded zip file here in the text box).

 

Can you also post a picture from the card itself?


Here is the diagnostics .zip.

What do you mean "a picture from the card itself?"

Like boot Unraid into GUI mode and take a picture?

nas-diagnostics-20210707-1630.zip

 

EDIT: Booting to GUI mode gives the same error I see in the syslog. I just get left on a black screen. If I change console sessions (Ctrl+Alt+F1) I get the usual console login prompt. startx does nothing.

It seems like the system recognizes the card, but the driver doesn't. The device ID(?): '0000:01:00.0' doesn't seem right to me.

Edited by Dr.Power
On 7/2/2021 at 4:34 AM, ich777 said:

Sorry, I really can't help with that, since @HellraiserOSU hasn't reported back yet and you haven't tried HW transcoding with Unraid on the gaming machine.

As I said, the drivers are still the same and nothing was changed there at all, so if you roll back to the driver version that worked before, I see no reason why it shouldn't work now.

Hey, sorry! I was away.

So I turned on Above 4G Decoding... still doesn't work. Here's my diagnostic log.
 

 

Edited by HellraiserOSU
8 hours ago, Dr.Power said:

Like boot Unraid into GUI mode and take a picture?

I've now gone through the Diagnostics, and from these two lines it seems to me that this is a counterfeit card... :/

 

01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP107 [GeForce GTX 1050 Ti] [10de:1c82] (rev a1)
    Kernel modules: nvidia_drm, nvidia
01:00.1 Audio device [0403]: NVIDIA Corporation GF116 High Definition Audio Controller [10de:0bee] (rev a1)

 

The first line reports a 1050 Ti, but the second line shows that this is actually a Fermi-based card (GeForce 400 and 500 series). A Pascal-based card should have something like a GP106 audio controller on it, not a GF116 (Fermi-based).
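
The comparison described above can be sketched as a small script. This is only an illustration under a few assumptions: the function name `check_gpu_audio_family` is made up, the input is `lspci -nn` style text on stdin for a single NVIDIA card, and the two-letter chip prefix (GP, GF, GA, ...) is used as a rough stand-in for the GPU generation. If lspci does not name the audio chip at all, the check simply reports "unknown".

```shell
# Hypothetical helper: compares the chip prefix on the VGA function with the
# chip prefix named on the audio function of the same NVIDIA card.
# Reads lspci-style text on stdin; assumes only one NVIDIA card is listed.
check_gpu_audio_family() {
    awk '
        /VGA compatible controller/ && /NVIDIA/ {
            # First token like GP107 or GA106; keep its two-letter prefix.
            if (match($0, /G[A-Z][0-9]+/)) vga = substr($0, RSTART, 2)
        }
        /Audio device/ && /NVIDIA/ {
            if (match($0, /G[A-Z][0-9]+/)) audio = substr($0, RSTART, 2)
        }
        END {
            if (vga == "" || audio == "") print "unknown"
            else if (vga == audio) print "match"
            else print "mismatch: VGA=" vga " audio=" audio
        }
    '
}

# Example with the two lines from the Diagnostics quoted above.
# prints: mismatch: VGA=GP audio=GF
check_gpu_audio_family <<'EOF'
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP107 [GeForce GTX 1050 Ti] [10de:1c82] (rev a1)
01:00.1 Audio device [0403]: NVIDIA Corporation GF116 High Definition Audio Controller [10de:0bee] (rev a1)
EOF
```

A mismatch is only a hint that the card may not be what it claims to be, not proof; a physical look at the card is still worthwhile.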

 

6 hours ago, HellraiserOSU said:

So I turned on Above 4G Decoding... still doesn't work. Here's my diagnostic log.

From the Diagnostics I see nothing suspicious (please remove the line that is executed on boot for the card, I think you know what I mean). Do you boot in UEFI or Legacy mode? If you are booting in Legacy mode, please try booting with UEFI and send me the Diagnostics again.
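
If you are unsure which mode the server actually booted in, one common check on Linux is whether the directory /sys/firmware/efi exists, since the kernel only exposes it after a UEFI boot. A minimal sketch follows; the function name `boot_mode` and its optional argument are made up for illustration. Note that this tells you how the OS was booted, not how individual BIOS options for PCI devices are set.

```shell
# Prints "uefi" if the running Linux kernel was started through UEFI firmware,
# otherwise "legacy". The optional argument is a hypothetical override of the
# sysfs root, only there so the function can be tried against a fake tree.
boot_mode() {
    sysfs_root="${1:-/sys}"
    if [ -d "$sysfs_root/firmware/efi" ]; then
        echo "uefi"
    else
        echo "legacy"
    fi
}

# Report the boot mode of the machine this runs on.
boot_mode
```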

5 hours ago, ich777 said:

I've now gone through the Diagnostics, and from these two lines it seems to me that this is a counterfeit card... :/

01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP107 [GeForce GTX 1050 Ti] [10de:1c82] (rev a1)
    Kernel modules: nvidia_drm, nvidia
01:00.1 Audio device [0403]: NVIDIA Corporation GF116 High Definition Audio Controller [10de:0bee] (rev a1)

The first line reports a 1050 Ti, but the second line shows that this is actually a Fermi-based card (GeForce 400 and 500 series). A Pascal-based card should have something like a GP106 audio controller on it, not a GF116 (Fermi-based).

From the Diagnostics I see nothing suspicious (please remove the line that is executed on boot for the card, I think you know what I mean). Do you boot in UEFI or Legacy mode? If you are booting in Legacy mode, please try booting with UEFI and send me the Diagnostics again.

 

So I've replaced the card in the system a few times. And I would hope it's not a counterfeit because I got it straight from eVGA :)

 

Do you think the issue is because I've replaced it with a different card? I removed every script that runs on array start and I'm still seeing "No devices found".

 

It is in UEFI mode.

 

So I rebooted, and in lspci.txt I see:

 

01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GA106 [GeForce RTX 3060] [10de:2503] (rev a1)
    Subsystem: eVga.com. Corp. Device [3842:3655]
    Kernel driver in use: nvidia
    Kernel modules: nvidia_drm, nvidia
01:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:228e] (rev a1)
    Subsystem: eVga.com. Corp. Device [3842:3655]

 

Edited by HellraiserOSU
30 minutes ago, HellraiserOSU said:

So I've replaced the card in the system a few times. And I would hope it's not a counterfeit because I got it straight from eVGA :)

This wasn't in reply to your message. ;)

 

31 minutes ago, HellraiserOSU said:

It is in UEFI mode.

Please try to boot in legacy mode and see if it works.

 

32 minutes ago, HellraiserOSU said:

Do you think the issue is because I've replaced it with a different card? I removed every script that runs on array start and I'm still seeing "No devices found".

No, I don't think so, but this could be a hardware compatibility issue. I think you also have a Sandy Bridge based system, or am I wrong?

7 minutes ago, ich777 said:

This wasn't in reply to your message. ;)

 

Please try to boot in legacy mode and see if it works.

No, I don't think so, but this could be a hardware compatibility issue. I think you also have a Sandy Bridge based system, or am I wrong?

 

Oh haha my bad :D

Oh, it's Skylake, running an i7-6700K. I had a GeForce GTX 1660 in there before, working fine, then removed it and swapped in a GeForce RTX 3060.

 

Legacy mode is a no go as well.

Edited by HellraiserOSU (reason: not Sandy Bridge)
2 minutes ago, HellraiserOSU said:

Yes it's a Sandy Bridge system. Running an i7-6700k.

Wait, what? That's not possible; Sandy Bridge is 2nd gen Intel...

 

2 minutes ago, HellraiserOSU said:

GeForce RTX 3060

Can you try to reset the BIOS and enable above 4G support and boot in legacy mode?

 

The strange thing is that the card should work and should be recognized by nvidia-smi. At least I can see nothing suspicious in your syslog...

 

From my perspective, that is a hardware combination issue or a misconfigured BIOS setting.

6 minutes ago, ich777 said:

Wait, what? That's not possible; Sandy Bridge is 2nd gen Intel...

Can you try to reset the BIOS and enable above 4G support and boot in legacy mode?

The strange thing is that the card should work and should be recognized by nvidia-smi. At least I can see nothing suspicious in your syslog...

From my perspective, that is a hardware combination issue or a misconfigured BIOS setting.

Yeah, sorry, I edited my original response to say it's Skylake. I have another server here running an i7-2700K, so I got mixed up :)

I'll look to reset the BIOS

3 minutes ago, HellraiserOSU said:

Yeah, sorry, I edited my original response to say it's Skylake. I have another server here running an i7-2700K, so I got mixed up :)

Please report back with your findings. This is a really strange issue if it's a Skylake system; it should at least work...

 

Where did your Diagnostics go?


I MAYBE got it.

In the BIOS there was a PCI device mode setting and it was set to UEFI, so I changed it to Legacy. There were other areas that let you choose between UEFI and Legacy mode, and they were all set to Legacy except for this one.

Now I am not getting the RmInitAdapter failed error.

 


 

So, hopefully this is it.. Thanks for your help!

15 hours ago, ich777 said:

The first line reports a 1050 Ti, but the second line shows that this is actually a Fermi-based card (GeForce 400 and 500 series). A Pascal-based card should have something like a GP106 audio controller on it, not a GF116 (Fermi-based).

 

Unfortunately, I think you're right. I should have known better: it has a VGA port, and Nvidia hasn't done analog video since Maxwell.

That's what I get for not paying close attention to the listing.

Thanks for the help!

On 6/17/2021 at 12:27 AM, ich777 said:

Do you know which version you were on previously, and for how long you were on it?

 

Can you open up a terminal from Unraid and execute this command:


sed -i '/disable_xconfig=/c\disable_xconfig=true' "/boot/config/plugins/nvidia-driver/settings.cfg"

 

after that please enter this command:


cat /boot/config/plugins/nvidia-driver/settings.cfg

 

and check that the line starting with 'disable_xconfig' is set to 'true', i.e. the file should contain the line disable_xconfig=true (a screenshot of the expected output was attached here).

 

After that please reboot and see if it is working again and please let me know.
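
For anyone who would like to see what that sed command does before running it against the real file, it can be rehearsed on a scratch copy. This is only a sketch, assuming GNU sed (which Unraid ships); the function name `set_disable_xconfig` and the `placeholder_setting=1` line are made up for illustration and are not taken from the real settings.cfg.

```shell
# Hypothetical helper: applies the same sed edit as above to whatever file is
# passed in, so it can be tried on a scratch copy first. Any line containing
# "disable_xconfig=" is replaced with "disable_xconfig=true".
set_disable_xconfig() {
    sed -i '/disable_xconfig=/c\disable_xconfig=true' "$1"
}

# Rehearsal on a temporary file with made-up content.
scratch="$(mktemp)"
printf '%s\n' 'placeholder_setting=1' 'disable_xconfig=false' > "$scratch"
set_disable_xconfig "$scratch"
cat "$scratch"
# prints:
#   placeholder_setting=1
#   disable_xconfig=true
rm -f "$scratch"
```

Note that the command only rewrites an existing line; if the file has no 'disable_xconfig=' line at all, sed leaves it unchanged.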

Sorry for the late response, I have been overseas doing contract work. I do not recall which version I was on before I updated, but the command fixed my issue. Thank you for your time.


Well, I thought it was fixed, but now it's back to this (two screenshots attached). The server hasn't been rebooted in 3 days. Guess I'll keep digging :(

Edit: I rebooted the server; the card appeared under Installed GPU for a second and then disappeared again with the RmInitAdapter failed errors.

Edited by HellraiserOSU