
[Plugin] Nvidia-Driver


ich777


Update: Disregard this; the failure was caused by an IPv6 issue, which has now been fixed.

 

 

After a recent upgrade to Unraid 6.12.13 the plugin stopped working, so I removed it and tried to reinstall it, but I am encountering the following error message:

 

--------------Can't download Nvidia Driver Package v560.35.03-----------------
plugin: run failed: '/bin/bash' returned 1
Executing hook script: post_plugin_checks

 

 

tower-diagnostics-20240825-0822.zip

Edited by multilateral-pheasant3747
5 hours ago, multilateral-pheasant3747 said:

Update: Disregard this; the failure was caused by an IPv6 issue, which has now been fixed.

 

If you are sure your network is working as it should and you are not using any ad blockers etc. that could cause this, you may uninstall the plugin, remove all Nvidia driver-related files from the stick ./boot/config/plugins/nvidia....., reboot, and install again cleanly.
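The original failure above was traced to an IPv6 issue. As a quick check before a clean reinstall, the sketch below tries a plain TCP connection to a host separately over IPv4 and over IPv6. It is a hypothetical diagnostic helper, not part of the plugin; `reachable_by_family` is an illustrative name, and treating github.com (port 443) as the download host is an assumption based on the GitHub downloads mentioned later in this thread.

```python
import socket
from typing import Dict


def reachable_by_family(host: str, port: int = 443, timeout: float = 3.0) -> Dict[str, bool]:
    """Try a TCP connect to host:port over IPv4 and IPv6 separately.

    Hypothetical diagnostic helper (not part of the Nvidia-Driver plugin).
    Returns {"IPv4": bool, "IPv6": bool}.
    """
    results: Dict[str, bool] = {}
    for label, family in (("IPv4", socket.AF_INET), ("IPv6", socket.AF_INET6)):
        ok = False
        try:
            # Resolve only addresses of this family (A vs. AAAA records).
            infos = socket.getaddrinfo(host, port, family, socket.SOCK_STREAM)
        except OSError:
            infos = []  # DNS failure or no address of this family
        for fam, socktype, proto, _canonname, addr in infos:
            try:
                with socket.socket(fam, socktype, proto) as sock:
                    sock.settimeout(timeout)
                    sock.connect(addr)
                ok = True
                break
            except OSError:
                continue  # refused, unreachable or timed out; try the next address
        results[label] = ok
    return results
```

Running `reachable_by_family("github.com")` on the server and getting `IPv6: False` next to `IPv4: True` would be consistent with an IPv6 problem like the one described above, though it does not prove that it is the cause.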


Hello!

 

I have been asking about this in "General Support", but now trying here also.

 

I started getting random crashes of the Unraid server on 16/8, after the system had been running stable for over a year. At first about once a day, then a lot more often. When it happens, I need to hold the power-button for a few seconds to shut the server down.

 

I am now fairly sure it is somehow related to the Nvidia driver/plugin. With the driver uninstalled, the server has not hung/crashed once. If I install it again (latest version), the problem usually appears within hours. It crashes even if there is no transcoding going on in the Plex server (which is the only thing I use the GPU for).

 

The motherboard is really old (Hewlett-Packard p6-2265eo/2ADA, BIOS 7.12 06/07/2012), but it has worked fine with an Nvidia GTX 1650 (until now)?!

 

Any suggestions/ideas what to try? An older (specific) version of the driver? Has someone else seen something similar?

 

Thanks!

 

1 hour ago, _JC_ said:

I have been asking about this in "General Support", but now trying here also.

Please post the link.

 

1 hour ago, _JC_ said:

When it happens, I need to hold the power-button for a few seconds to shut the server down.

This means that you got some kind of hard crash. Did you already set up a syslog server and/or mirror the syslog to flash?

 

1 hour ago, _JC_ said:

The motherboard is really old (Hewlett-Packard p6-2265eo/2ADA, BIOS 7.12 06/07/2012), but it has worked fine with an Nvidia GTX 1650 (until now)?!

Are you sure that the power supply is still up to the task? Did you change anything recently? Are you also sure that you are on the latest BIOS version?

 

1 hour ago, _JC_ said:

Any suggestions/ideas what to try? An older (specific) version of the driver? Has someone else seen something similar?

Without Diagnostics or a syslog from when the server crashed, I really can't tell anything. Maybe something is defective now, if it isn't working as expected any more.

 

If it was working fine and you changed nothing, it seems hardware-related.

 

Again, Diagnostics would be very helpful.

4 hours ago, ich777 said:

This means that you got some kind of hard crash. Did you already set up a syslog server and/or mirror the syslog to flash?

 

Only the short part I put in the other post (where I found "nvidia"). I will let the server run for some time now without the Nvidia driver to ensure it is stable.

 

4 hours ago, ich777 said:

Are you sure that the power supply is still up to the task? Did you change anything recently? Are you also sure that you are on the latest BIOS version?

 

Everything else except the MB and CPU has been replaced and is fairly new (<2 years). Performance has been fine, so all "ok". No changes at all recently.

 

 

4 hours ago, ich777 said:

Without Diagnostics or a syslog from when the server crashed, I really can't tell anything. Maybe something is defective now, if it isn't working as expected any more.

 

If it was working fine and you changed nothing, it seems hardware-related.

 

Again, Diagnostics would be very helpful.

 

OK, I will get that once I install the driver again! So, apparently nobody else has the same problem now? Maybe it is time to get a new MB, CPU, and memory. And hope it is not the gfx-card!

 

Btw, how is it possible to get relevant diagnostics if the server just freezes?

 

Edited by _JC_
28 minutes ago, _JC_ said:

Btw, how is it possible to get relevant diagnostics if the server just freezes?

With the syslog server. The words "syslog server" should be clickable, since the forum turns them into a link automatically.

 

29 minutes ago, _JC_ said:

Only the short part I put in the other post (where I found "nvidia"). I will let the server run for some time now without the Nvidia driver to ensure it is stable.

If the crash happened and nvidia-smi was mentioned, you must have called it somewhere. Are you sure you weren't on the Dashboard, or that nothing was transcoding? Otherwise nothing would call nvidia-smi on its own; it is only called when you are on the Dashboard or similar.

 

30 minutes ago, _JC_ said:

So, apparently nobody else has the same problem now?

Not that I know of; a few issues here and there, but nothing similar currently.

 

31 minutes ago, _JC_ said:

And hope it is not the gfx-card!

If it's the GPU, I would recommend that you look for something like an Nvidia T400 (2GB); it is low power and suitable for most users.

26 minutes ago, ich777 said:

With the syslog server. The words "syslog server" should be clickable, since the forum turns them into a link automatically.

 

Sorry, what do you mean? 🙂

 

26 minutes ago, ich777 said:

If the crash happened and nvidia-smi was mentioned, you must have called it somewhere. Are you sure you weren't on the Dashboard, or that nothing was transcoding? Otherwise nothing would call nvidia-smi on its own; it is only called when you are on the Dashboard or similar.

 

Ah, ok. Yes, dashboard was open, and probably GPU statistics...

 

26 minutes ago, ich777 said:

Not that I know of; a few issues here and there, but nothing similar currently.

 

Ok, not what I was hoping for, but well...

 

 

26 minutes ago, ich777 said:

If it's the GPU, I would recommend that you look for something like an Nvidia T400 (2GB); it is low power and suitable for most users.

 

Ok, good to know. Maybe time to upgrade. Looking at ASUS PRIME A520M-K + AMD Ryzen 5 4500 as a reasonable upgrade?

 

Thanks!

 

10 minutes ago, _JC_ said:

Sorry, what do you mean? 🙂

The text "syslog server" should be clickable, and there you will see how to get the log from crashes.
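For readers who would rather run the receiving side themselves on another machine, here is a minimal sketch of a UDP syslog listener in Python. It assumes the Unraid server has been configured to send its log to this machine's IP over UDP; the port 5514 and the file name `remote-syslog.log` are hypothetical choices (the standard port 514 usually needs root). Each message is appended to the file as soon as it arrives, so the last lines before a hard crash end up on a machine that kept running.

```python
import re
import socketserver
from typing import Optional, Tuple

# Syslog lines start with "<PRI>", where PRI = facility * 8 + severity.
PRI_RE = re.compile(r"^<(\d{1,3})>")


def parse_pri(message: str) -> Optional[Tuple[int, int, str]]:
    """Split a syslog line into (facility, severity, rest); None if no PRI header."""
    match = PRI_RE.match(message)
    if match is None:
        return None
    pri = int(match.group(1))
    return pri // 8, pri % 8, message[match.end():]


class SyslogHandler(socketserver.BaseRequestHandler):
    log_path = "remote-syslog.log"  # hypothetical output file

    def handle(self) -> None:
        data, _sock = self.request  # UDP handlers receive (bytes, socket)
        line = data.decode("utf-8", errors="replace").rstrip("\r\n")
        parsed = parse_pri(line)
        severity = parsed[1] if parsed else "-"
        # Open, append and close per message so nothing sits in a buffer.
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(f"{self.client_address[0]} sev={severity} {line}\n")


def serve(host: str = "0.0.0.0", port: int = 5514) -> None:
    """Listen for UDP syslog messages until interrupted."""
    with socketserver.UDPServer((host, port), SyslogHandler) as server:
        server.serve_forever()
```

Calling `serve()` on the receiving machine and then reproducing the crash should leave the final messages in the file; severity 0 is the most urgent and 7 is debug.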

 

10 minutes ago, _JC_ said:

Ah, ok. Yes, dashboard was open, and probably GPU statistics...

Yes, that's what's calling nvidia-smi. It would be really interesting to see what happens if you leave it closed and whether it's stable that way. If it is, maybe start a transcode; maybe that will crash it, but I doubt it.

 

10 minutes ago, _JC_ said:

Ok, not what I was hoping for, but well...

I mean, you can go through the previous few posts, but all issues were resolved IIRC (one or two hardware failures and also a plugin error that I can't reproduce).

 

10 minutes ago, _JC_ said:

Ok, good to know. Maybe time to upgrade. Looking at ASUS PRIME A520M-K + AMD Ryzen 5 4500 as a reasonable upgrade?

I don't know if I would go with AMD. If you go with Intel (something like 10th to 12th gen), you get a pretty capable iGPU and wouldn't even need a dGPU for transcoding; Intel iGPUs can also handle about 3 to 4 simultaneous transcodes without any issue. This is just my recommendation, since I never had issues with Intel, and you can get a CPU + motherboard on the used market (just make sure the CPU has an iGPU; most of them do, but some don't).

On 8/28/2024 at 8:45 PM, ich777 said:

Yes, that's what's calling nvidia-smi. It would be really interesting to see what happens if you leave it closed and whether it's stable that way. If it is, maybe start a transcode; maybe that will crash it, but I doubt it.

 

This is interesting! I am fairly sure I had the dashboard open (or I just tried to login) every time the server crashed. I now removed GPU statistics, re-installed the nvidia driver, and activated it in the Plex server. It has now been running fine since last evening, and today I had it hw-transcode 4 movies at the same time, with no issue (yet).

I will let it run like this (and not even logging into the Unraid UI...) to see if it is stable! 

 

UPDATE 30/8: Server still stable with no crash after I removed the GPU statistics plugin. This is great, but it would be interesting to know why?! 🤔

Edited by _JC_
Update

Crash after installing Nvidia Driver?!

 

I have had a server up and running since the beginning of 2024, and all was well until a few weeks ago, when I came home to find it crashed and repeatedly rebooting when I attempted to start it back up.

Since then I have had nothing but further issues and crashes, so today I decided to try starting fresh. At this point I've narrowed it down to something going on with my Nvidia 3060Ti. On my most recent attempt, I booted into a fresh Unraid OS from a brand-new thumb drive, started up a 30-day trial, and went directly to Apps. I installed CA and then the Nvidia driver plugin. Went to Settings, and when pressing the Nvidia icon to check on the driver, the server instantly reboots. This is reproducible.

My 3060Ti is only a few months old and was working very well in Plex, Tdarr, Stable Diffusion, and others before the crash a few weeks ago.
Does anyone have any idea where I should start with trying to solve this? I've done okay up until this point but I am far from being an Unraid expert.

Edit to add: I have this issue on 6.10.13 and 7.0 beta 2, makes no difference. The server was originally running 6.10.13 before the initial crash.

Edit 2: I just tried swapping in my old 980Ti and am now able to see the Nvidia Plugin options after clicking on it. Did my 3060Ti crap out?? This keeps getting stranger. 😵

 

TLDR;
Unraid server in new config crashes immediately after installing Nvidia driver and trying to access Settings/Nvidia

Edited by MissEmma
6 hours ago, MissEmma said:

Went to Settings, and when pressing the Nvidia icon to check on the driver, the server instantly reboots. This is reproducible.

when you say your server reboots ...

 

then it's most likely a hardware issue: too aggressive power-saving features, XMP, ... well, hardware-related in the end, hard reboots ...

a software crash ends in a "freeze", where you have to hard reboot yourself as the server doesn't respond anymore ...

 

sadly ... hardware issues like this are hard to narrow down, as you usually won't find anything in the logs; it's like pulling the plug.

 

to start debugging, it's more or less a clean BIOS (defaults, no power savings, no XMP), clean Unraid, no power savings ...

memtest to check RAM

 

then step by step either replace hardware or check if you can narrow it down to specific hardware by usage.

 

in your case it looks like it happens somehow when the NV GPU is accessed

 

power supply

GPU ... (tried it in another machine for testing?)

PCIe slot (tried another one for testing?)

...

 

like I said, hard reboots are usually hardware == hard to nail down ...

6 hours ago, MissEmma said:

I just tried swapping in my old 980Ti and am now able to see the Nvidia Plugin options after clicking on it. Did my 3060Ti crap out?? This keeps getting stranger. 😵

Seems like a hardware issue with your 3060Ti when another card is working.

Do you have another PC where you can put the card in, install the driver and let a benchmark run for some time?

 

Maybe try enabling "mirror syslog to flash" and see if you get something in the mirrored syslog.
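A mirrored syslog can be long, so it may help to filter it down to Nvidia-related lines plus a little surrounding context. The patterns below are deliberately broad assumptions: the Nvidia kernel driver usually prefixes its messages with `NVRM:` and reports GPU faults as `Xid` errors, and anything mentioning `nvidia` (such as nvidia-smi) is kept as well. The location of the mirrored log is not given in this thread, so the hypothetical `scan_file` helper takes the path as a parameter.

```python
import re
from typing import Iterable, List

# Broad match: driver messages ("NVRM:"), GPU error reports ("Xid"), or any "nvidia" mention.
NV_PATTERN = re.compile(r"NVRM|Xid|nvidia", re.IGNORECASE)


def nvidia_lines(lines: Iterable[str], context: int = 0) -> List[str]:
    """Return matching lines plus `context` lines before and after each match."""
    all_lines = list(lines)
    keep = set()
    for i, line in enumerate(all_lines):
        if NV_PATTERN.search(line):
            start = max(0, i - context)
            stop = min(len(all_lines), i + context + 1)
            keep.update(range(start, stop))
    return [all_lines[i].rstrip("\n") for i in sorted(keep)]


def scan_file(path: str, context: int = 2) -> List[str]:
    """Filter a saved syslog file; undecodable bytes are replaced, not fatal."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return nvidia_lines(f, context)
```

The lines just before the last entry in the mirrored log are usually the most interesting, since the log stops where the crash happened.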

On 8/29/2024 at 10:34 AM, _JC_ said:

This is interesting! I am fairly sure I had the dashboard open (or I just tried to login) every time the server crashed. I now removed GPU statistics, re-installed the nvidia driver, and activated it in the Plex server. It has now been running fine since last evening, and today I had it hw-transcode 4 movies at the same time, with no issue (yet).

I will let it run like this (and not even logging into the Unraid UI...) to see if it is stable! 

 

UPDATE 30/8: Server still stable with no crash after I removed the GPU statistics plugin. This is great, but it would be interesting to know why?! 🤔

It is now quite safe to say that the problem was due to the GPU statistics plugin. Server still running fine, with many hw-transcodes.

27 minutes ago, _JC_ said:

It is now quite safe to say that the problem was due to the GPU statistics plugin. Server still running fine, with many hw-transcodes.

What happens when you visit the plugin page? (Just to let you know, it will also call nvidia-smi, so a crash might also happen.)


Every time I try to install it, I get an error and it won't install. I get the following error every single time I try to install it. I have an Nvidia 2060 Super and an Arc A380 (it's passed through to Jellyfin). Any help would be nice. I am trying to set it up for HandBrake.

 

Install Plugin:

plugin: downloading: nvidia-driver.plg ... done
plugin: downloading: nvidia-driver-2024.07.10.txz ... done
+==============================================================================
| Skipping package nvidia-driver-2024.07.10 (already installed)
+==============================================================================
-----ERROR - ERROR - ERROR - ERROR - ERROR - ERROR - ERROR - ERROR - ERROR------
---Can't get latest Nvidia driver version and found no installed local driver---
plugin: run failed: '/bin/bash' returned 1
Executing hook script: post_plugin_checks

11 minutes ago, Maxachaka said:

I get an error and it won't install.

enough space on the USB?

USB stick OK?

some ad blocker ... running that could block git downloads?

...

 

and may I ask: if you already have a card for hardware transcoding (ARC) ... why not use it for both, Jellyfin and HandBrake?

as long as you are using Docker ... but that's your choice ;)

23 minutes ago, alturismo said:

enough space on the USB?

USB stick OK?

There is plenty of space on the USB stick, and it has been working fine for what I have going on it: 1G out of 64G used.

24 minutes ago, alturismo said:

some ad blocker ... running that could block git downloads?

I don't believe there is anything, and I double-checked just in case and found nothing.

24 minutes ago, alturismo said:

and may I ask: if you already have a card for hardware transcoding (ARC) ... why not use it for both, Jellyfin and HandBrake?

as long as you are using Docker ... but that's your choice ;)

The issue is with the Arc A380. It works perfectly for decoding AV1 media and the like. However, not only is it the biggest pain in the ass to set up on Unraid (I had to manually add it as a path in Unraid, which was a pain in the butt to figure out), but it also sucks for encoding. It encodes at ~20fps for 4k HDR AV1. My CPU does ~12-15fps but with a much smaller file size. This is mainly for when I have to HandBrake a lot of media en masse (which is when I do x265) and don't want to create a literal space heater in my small little dorm (the server uses a Ryzen 9 5900X).

12 minutes ago, Maxachaka said:

but it also sucks for encoding. It encodes at ~20fps for 4k HDR AV1

but you are aware that the NV 20xx series (and also the 30xx series) can't encode to AV1 in hardware ...

 

the 40xx series can ... or Intel ARC ;) if that's your main goal ... encoding to AV1

27 minutes ago, Maxachaka said:

There is plenty of space on the USB stick, and it has been working fine for what I have going on it: 1G out of 64G used.

Please post your Diagnostics.

Did you do anything custom to your Unraid installation?

 

This error usually means that either the connection to GitHub & the GitHub API does not work, or something on your network is eating up all your GitHub API tokens (PiAlert was one thing that did this; I don't know if that's fixed already).
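Both possibilities can be told apart with GitHub's REST API `/rate_limit` endpoint, which reports how many requests are left; querying it does not itself count against the quota. Unauthenticated clients get 60 requests per hour per public IP, so on a shared network something else polling GitHub can use them up. The sketch below assumes the plugin's version lookups are unauthenticated and therefore share that per-IP budget; this is an assumption, not something stated in this thread. `summarize_rate_limit` and `fetch_rate_limit` are hypothetical helper names.

```python
import json
import urllib.request
from datetime import datetime, timezone

RATE_LIMIT_URL = "https://api.github.com/rate_limit"


def summarize_rate_limit(payload: dict) -> dict:
    """Pick out the 'core' REST quota from a /rate_limit response body."""
    core = payload["resources"]["core"]
    reset_utc = datetime.fromtimestamp(core["reset"], tz=timezone.utc)
    return {
        "limit": core["limit"],
        "remaining": core["remaining"],
        "reset_utc": reset_utc.isoformat(),
        "exhausted": core["remaining"] == 0,
    }


def fetch_rate_limit(timeout: float = 10.0) -> dict:
    """Query GitHub directly; raises URLError/HTTPError if GitHub is unreachable."""
    request = urllib.request.Request(
        RATE_LIMIT_URL,
        headers={
            "Accept": "application/vnd.github+json",
            "User-Agent": "rate-limit-check",  # GitHub rejects requests without a User-Agent
        },
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return summarize_rate_limit(json.load(response))
```

Run from the Unraid box (or anything behind the same public IP): `"exhausted": True` would fit the token explanation, while a connection error from `fetch_rate_limit()` would point to GitHub or its API being blocked instead.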

 

27 minutes ago, Maxachaka said:

I don't believe there is anything, and I double-checked just in case and found nothing.

May I ask where you are located in the world? Do you have Unifi network gear? Please make sure to allow access to GitHub and the GitHub API.

 

28 minutes ago, Maxachaka said:

However, not only is it the biggest pain in the ass to set up on Unraid (I had to manually add it as a path in Unraid, which was a pain in the butt to figure out), but it also sucks for encoding. It encodes at ~20fps for 4k HDR AV1. My CPU does ~12-15fps but with a much smaller file size. This is mainly for when I have to HandBrake a lot of media en masse (which is when I do x265) and don't want to create a literal space heater in my small little dorm (the server uses a Ryzen 9 5900X).

Is encoding even reliably working on Linux? TBH I haven't looked into that for a long time; my ARC card is working fine on Windows and can definitely push more than 20fps (without HDR).

21 hours ago, ich777 said:

What happens when you visit the plugin page? (Just to let you know, it will also call nvidia-smi, so a crash might also happen.)

I have tried that quite a few times now, without any issues. I will report back if I notice any further problems. What might be causing the GPU statistics plugin on the dashboard to crash the server?!

51 minutes ago, _JC_ said:

I have tried that quite a few times now, without any issues. I will report back if I notice any further problems. What might be causing the GPU statistics plugin on the dashboard to crash the server?!

I really don't know, since this shouldn't happen; maybe @SimonF has a clue?

To what update interval did you set GPU Statistics? I can only imagine that if you set a really, really short interval, you basically force it to crash. I would recommend setting it to at least 2000 or even 5000, since this means it will update every 2 or 5 seconds, which should be more than enough for most users.
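To put those numbers in perspective: the interval is in milliseconds, so 2000 means one query every 2 s (1,800 per hour) and 5000 one every 5 s (720 per hour), whereas a hypothetical 500 would be 7,200 per hour. The sketch below is a hypothetical stand-alone poller, not the GPU Statistics plugin's code; it calls `nvidia-smi` with its `--query-gpu` CSV output and clamps the interval to the 2000 ms floor suggested above. The function names are illustrative only.

```python
import subprocess
import time
from typing import Dict, List

MIN_INTERVAL_MS = 2000  # floor suggested above; 5000 is the more relaxed choice


def polls_per_hour(interval_ms: int) -> int:
    """How many times a poller with this interval runs per hour."""
    return 3_600_000 // interval_ms


def clamp_interval(interval_ms: int, minimum: int = MIN_INTERVAL_MS) -> int:
    """Never poll faster than the minimum."""
    return max(interval_ms, minimum)


def parse_query(csv_text: str) -> List[Dict[str, int]]:
    """Parse 'utilization, temperature' rows (one per GPU) from nvidia-smi CSV output."""
    rows = []
    for line in csv_text.strip().splitlines():
        util, temp = (field.strip() for field in line.split(","))
        rows.append({"utilization_pct": int(util), "temperature_c": int(temp)})
    return rows


def query_gpus() -> List[Dict[str, int]]:
    """One nvidia-smi call; every call is real work for the driver and the CPU."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=utilization.gpu,temperature.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True, timeout=10,
    )
    return parse_query(result.stdout)


def poll(interval_ms: int, iterations: int) -> None:
    """Print GPU stats `iterations` times, waiting at least MIN_INTERVAL_MS between calls."""
    delay = clamp_interval(interval_ms) / 1000
    for _ in range(iterations):
        print(query_gpus())
        time.sleep(delay)
```

The clamp is the point of the design: whatever interval is requested, the driver is never asked more often than the floor allows.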

Always keep in mind that every status update also produces some kind of CPU load...

1 hour ago, ich777 said:

I really don't know, since this shouldn't happen; maybe @SimonF has a clue?

To what update interval did you set GPU Statistics? I can only imagine that if you set a really, really short interval, you basically force it to crash. I would recommend setting it to at least 2000 or even 5000, since this means it will update every 2 or 5 seconds, which should be more than enough for most users.

Always keep in mind that every status update also produces some kind of CPU load...

Thanks. Maybe at some point I will test, but for now I'll let it run without the GPU statistics :) Once, when the server crashed, the boot drive got corrupted, so I had to restore it from the cloud backup :( I don't think I changed the interval from the default in the plugin config.

6 hours ago, ich777 said:

I really don't know, since this shouldn't happen; maybe @SimonF has a clue?

To what update interval did you set GPU Statistics? I can only imagine that if you set a really, really short interval, you basically force it to crash. I would recommend setting it to at least 2000 or even 5000, since this means it will update every 2 or 5 seconds, which should be more than enough for most users.

Always keep in mind that every status update also produces some kind of CPU load...

 

4 hours ago, _JC_ said:

Thanks. Maybe at some point I will test, but for now I'll let it run without the GPU statistics :) Once, when the server crashed, the boot drive got corrupted, so I had to restore it from the cloud backup :( I don't think I changed the interval from the default in the plugin config.

If you do try again, can you have logs going to a syslog server? I cannot think why it is crashing the host. Maybe it's related to the refresh frequency.


I'm having an issue where the Nvidia Driver isn't detecting/reporting GPUs. I had a 1050 Ti in my first PCIe slot and everything was working with no problems. I recently moved the 1050 Ti to the 3rd slot and added a 3060 to the 1st slot. This is a configuration I had in the server a few months ago. But now the driver isn't finding the GPUs, even though Unraid is and can use them in VMs. I tried deleting the plugin and driver and reinstalling, but no luck. Any help would be greatly appreciated.

[Screenshot attached]
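A first step may be to compare what the PCI bus reports with what the driver reports. The sketch below (hypothetical helper names) parses `lspci -nnk`, keeps NVIDIA display devices (PCI vendor ID `10de`) together with the kernel driver bound to each, and parses the GPU list printed by `nvidia-smi -L`. One possibility worth ruling out, an assumption rather than a diagnosis: a card bound to `vfio-pci` for VM passthrough is not handed to the `nvidia` driver, so it would appear in the first result but not the second.

```python
import re
import subprocess
from typing import Dict, List, Tuple

# `nvidia-smi -L` prints lines like "GPU 0: NVIDIA GeForce GTX 1050 Ti (UUID: GPU-...)".
SMI_GPU = re.compile(r"^GPU (\d+): (.+?) \(UUID: ")


def nvidia_pci_devices(lspci_nnk: str) -> Dict[str, str]:
    """Map PCI address -> kernel driver in use, for NVIDIA VGA/3D devices only."""
    devices: Dict[str, str] = {}
    current = None
    for line in lspci_nnk.splitlines():
        if line and not line[0].isspace():
            # Device header line; detail lines that follow are indented.
            is_gpu = ("VGA compatible controller" in line or "3D controller" in line) \
                and "[10de:" in line
            current = line.split()[0] if is_gpu else None
            if current:
                devices[current] = "none"
        elif current and "Kernel driver in use:" in line:
            devices[current] = line.split(":", 1)[1].strip()
    return devices


def driver_gpus(smi_text: str) -> List[str]:
    """GPU names the Nvidia driver itself reports."""
    names = []
    for line in smi_text.splitlines():
        match = SMI_GPU.match(line)
        if match:
            names.append(match.group(2))
    return names


def collect() -> Tuple[Dict[str, str], List[str]]:
    """Run both tools; nvidia-smi exits non-zero when it finds no GPUs, so don't check it."""
    lspci = subprocess.run(["lspci", "-nnk"], capture_output=True, text=True, check=True).stdout
    smi = subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True).stdout
    return nvidia_pci_devices(lspci), driver_gpus(smi)
```

If `collect()` shows both cards on the PCI side but an empty list from the driver, the "Kernel driver in use" values are the next thing to look at.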

