Posts posted by BomB191
-
On 2/23/2023 at 9:46 PM, juan11perez said:
Thank you for sharing this info. I also have the asus 470 and have had the problem for months.
I ended up installing a fan controller .
I'll try this fix
Edit: so I've added the fix. The ASUS sensors are gone, but I also lost CPU fan speeds etc.
Is this the case for others?
Yes, this removes Unraid's ability to read or touch any of that. I personally figure not knowing the temps is better than knowing my fans will randomly shut off and cause thermal shutdowns.
Just set the BIOS to the fan levels you need and call it a day.
-
On 2/22/2023 at 9:28 AM, general18 said:
Do you put anything in the file or what?
I've tried putting the text from Reddit inside the disable-asus-wmi.conf file (plain-text format). This text:
# Workaround broken firmware on ASUS motherboards
blacklist asus_wmi_sensors
I then rebooted the server and went to bed with a noisy server. When I then woke up, it was completely silent, and was running around 85-95 degrees C on a 3800X.
Motherboard is: ASUSTeK COMPUTER INC. PRIME X470-PRO
And to be 100% sure: Can I just add the file to the USB drive through the share, or do I have to take the USB drive into my PC and do it that way?
Just open your flash and go to *\flash\config\modprobe.d
Create whatever file you like with a .conf extension; mine is "disable-asus-wmi.conf".
In that file paste:
# Workaround broken firmware on ASUS motherboards
blacklist asus_wmi_sensors
Reboot, and the random fan stops are resolved.
I'm not the correct person to explain exactly wtf it does or how it works, but I assume it just makes Unraid/Linux not touch that fan controller.
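For anyone following along, the steps above can also be done from an Unraid terminal. This is just a sketch of the same fix; the filename is whatever you chose, and the file needs to end up in config\modprobe.d on the flash drive:

```shell
# Write the blacklist file locally, then copy it into config\modprobe.d
# on the Unraid flash drive (same contents as the post above).
cat > disable-asus-wmi.conf <<'EOF'
# Workaround broken firmware on ASUS motherboards
blacklist asus_wmi_sensors
EOF
# Sanity-check the contents before rebooting:
grep '^blacklist' disable-asus-wmi.conf
```

After a reboot, `lsmod | grep asus_wmi_sensors` should come back empty if the blacklist took effect.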
-
I can also confirm the fix has resolved the same issues I was having.
- 1
-
On 1/27/2023 at 7:08 PM, ich777 said:
I really don't know what causes that issue because I can barely reproduce it, and I'm pretty much clueless what it could be since Unraid runs from RAM and the package is also installed on boot, so this basically means everything is installed fresh on each reboot.
I can only imagine that something in Docker prevents it from running properly because that's the only place in the chain where something is stored across reboots.
Anyways, glad that everything is now working for you again!
Hey I found the root cause!
Krusader grabs something, locking out /proc/sys/kernel/overflowuid. I haven't dug into it too much, but basically if I run Krusader I cannot restart anything that uses the GPU without a system reboot.
This is only with the one from your repository; the binhex one is OK even when it has root privileges. (I did have yours running with root privileges and the advised settings.)
-
58 minutes ago, ich777 said:
What did you change to fix the fan issues that you were having? Is there maybe a setting that you've changed that could affect the Nvidia Driver?
It was a fan issue with ASUS X470 boards: the fan-controller driver that's now in the base Unraid/Linux image bugs out, causing all fans to stop after a random amount of time, within a week or so. I have a file in 'config\modprobe.d' named 'disable-asus-wmi.conf'
With the text of
'# Workaround broken firmware on ASUS motherboards
blacklist asus_wmi_sensors'
1 hour ago, ich777 said:
This happened once to me but a force update from the container fixed this issue for me back then on my test server.
Back then I tried also to downgrade to a previous version from Unraid and upgraded again but I couldn‘t reproduce it.
So, an update while I was awaiting a reply.
I downgraded back to Unraid 6.10.3, tried the usual stuff, and failed.
Updated back to 6.11.5, did the usual thing, and now it works. I have no idea why, but it has persisted through several reboots and transcoding now works on the GPU.
Best guess: something just got stuck initially and fixed itself when I cycled through the downgrade/upgrade.
-
6 hours ago, ich777 said:
What do you mean exactly with that? Do you have nvidia-persistenced enabled? If so, you kill it by doing:
kill $(pidof nvidia-persistenced)
from a Unraid Terminal.
I enabled it, then killed it (someone else was having power-state issues).
6 hours ago, ich777 said:
Can you please double check if the UUID from the GPU matches the one in the template? Can you also maybe post a screenshot of the container template so that I can see which parameters you've added for the Nvidia Driver to work?
Please note that on most newer driver versions the Key "all" for the GPU UUID causes issues, and you should always put in the UUID from your card.
Also please add a Variable, as described in the second post from this thread, with the Key: "NVIDIA_CAPABILITIES" and as Value: "all"
This should fix the issue
Still failed
Tried uninstalling then reinstalling as per the instructions, and this also failed.
Same with a force update.
The weird thing is everything else works.
The driver picks it up OK and so does nvidia-smi (I did try downgrading versions).
Same issue with the Plex docker from Plex.
I would expect this to work, but something's weird.
So, update: I think it's something to do with /proc/sys/kernel/overflowuid. It's weird; even logged in as root I don't have permission to delete it or modify its permissions.
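Editor's note on the point above: /proc/sys/kernel/overflowuid is a kernel tunable exposed through procfs, not a regular file, which is why even root can't delete it or change its permissions. You can only read or write its value:

```shell
# overflowuid is the UID the kernel substitutes when a real UID doesn't fit;
# it typically reads 65534 (the "nobody" user). It can be read and written,
# but never deleted -- procfs entries aren't ordinary files.
cat /proc/sys/kernel/overflowuid
```

So a process "locking it out" would be changing or holding the value, not the file itself.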
-
I think I'm finally stuck.
Updated to 6.11.5 (after some system fan issues) and updated the Nvidia driver to 525.85.05.
I can see it in the plugin, but I cannot use '--runtime=nvidia'; it just fails with 'bad parameter', or if I check it I get the below.
I have tried a full uninstall/reinstall with reboots, changing versions, etc. It did work once while I was looking through other people's issues with 'nvidia-persistenced' and 'kill $(pidof nvidia-persistenced)', but now it just won't take at all.
I'm sure it's something silly like it always is, haha. Any help would be huge. Diagnostics attached.
-
Fantastic news! Please report back once you can confirm no more issues after a week or so.
Then I can give this a spin too.
-
On 1/2/2023 at 11:49 PM, Haldanite said:
Been running stable on the rollback to 6.10.3 for almost 3 weeks now. Today I just got a Fix Common Problems warning to upgrade to 6.11.5; that's not going to happen again.
Same here. I don't know how to test and I need the stability
-
On 12/26/2022 at 2:59 PM, Minijuice said:
Only up for 12 hours but so far we're stable.
I have yet to have a random fan stop/power-down since downgrading to 6.10.3; it would have happened by now. And with, I think, 4 other people having spoken up with the exact same problem and the exact same resolution, it has to be something related to Unraid and the airflow monitor. There's no way this many people on the same-ish hardware are all having hardware problems that go away after downgrading.
-
4 hours ago, chest069 said:
'Out Of Memory errors detected on your server' is an error I get when running Unmanic. Does anyone know how to fix this?
You're running out of RAM, so your system is killing processes to prevent a crash. The only workaround is to add more memory/RAM.
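To confirm which process the kernel is actually killing, you can grep the kernel log for OOM events. A sketch using a hypothetical log line for illustration (on a real server, pipe `dmesg` or the syslog in instead):

```shell
# Hypothetical OOM log line; real ones show up in `dmesg` or /var/log/syslog
# when the kernel's OOM killer fires.
sample='Out of memory: Killed process 1234 (ffmpeg) total-vm:8000000kB'
printf '%s\n' "$sample" | grep -o 'Killed process [0-9]* ([^)]*)'
```

If the killed process is always an Unmanic worker, that points at its memory use rather than something else on the box.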
-
I'm also finding this. uptime is now 3 days 22 hours.
I now wonder how we go about figuring out what's causing the problem.
-
25 minutes ago, ConnerVT said:
Your hardware *not* shutting down when the fans stop? (Sorry, a moment of levity seemed needed)
Have you asked around in the ASUS forum? Perhaps it is more widespread than Unraid (such as across several distros/kernels)
Keep cool.
Ha, well yes, that's very true. The magic smoke would be released if that happened.
I did have a quick Google, but most of what I found was rookies not explaining it too well, plus a ton of recommendations to get AI Tuner running on the OS (the ASUS fan-controller software), so kinda useless for us unfortunately.
-
9 hours ago, Haldanite said:
This has me thinking that there might be a common denominator with the MB. The BIOS on my ROG STRIX X470 is version 5220 from 9/12/2019. There have been several updates since this version, and nothing specifically notes any issues with the system fans shutting down, but it might be worth updating to the latest BIOS version 6042, upgrading Unraid back to 6.11.5, and seeing what happens. Not really a fan of knowing that my system shuts down due to hitting high temps; I really don't want to do that too many times.
So that's exactly what I did after discovering it was the fans: latest BIOS and latest Unraid. The fans still shut off randomly; I got 18 hours and 26 hours.
I have done the same as you and rolled Unraid back to 6.10.3; it's only been 9 hours so far.
I also couldn't think of anything worse for my hardware than having to shut down due to temperature protections. So now we wait.
-
Interesting so we have 3 servers on Asus ROG STRIX X470-F Gaming motherboards. All with intermittent total fan failure.
I'm running 6.11.5 and had the issue on 6.11.2, and I think the version before that too. (I initially thought I had RAM issues, before I saw the fans stop in real time; the logs show nothing.)
It would be one hell of a coincidence for 3 motherboards to have intermittent fan-controller problems. My wild guess is maybe some weird power-management issue between Unraid and the BIOS.
Do keep us posted on 6.10.3. I have managed anywhere from 18 hours all the way up to 1.5 weeks.
-
Update to that:
the fans still turned off. Made it to 26 hours, though. What's even more annoying is I don't get anything in the logs.
-
Same issue as above, but with a STRIX X470-F GAMING mobo and a 2700X.
I never had the fan control plugin, but I do have System Temp installed.
Edit: just updated the BIOS and confirmed all fans are at 100% except the CPU. So now we wait.
-
10 minutes ago, ich777 said:
What did you do exactly?
Did you remove the entry with ‘all‘ or did you remove your UUID?
Yes, when I went to create a fresh container I noticed it under 'Show more settings'.
So on my container I had 2x NVIDIA_VISIBLE_DEVICES: one with my GPU and one with 'all' in the field.
I deleted the variable I created, used the one already in the container, and set it to my GPU.
So the container now has the below in the settings regarding the GPU.
-
9 minutes ago, ich777 said:
No worries, so it is now working or am I wrong?
Yes, confirmed now working!
Thank you very much 💖
-
5 minutes ago, ich777 said:
No, have you yet tried to change a value in the container and to basically recreate it on your server and see if this fixes your issue?
Can you maybe post your Diagnostics?
I require a dunce hat for tonight.
I went to make a new container and noticed these 2 params hiding under 'Show more settings'.
Figures it would be something extremely stupid; I didn't even contemplate checking in there.
The disappointment in myself is immeasurable.
TIL: check 'Show more settings ...'. Sorry for wasting your time, and thank you immensely for the assistance.
-
48 minutes ago, ich777 said:
Do you have nvidia-persistenced enabled? If yes, please disable it with:
kill $(pidof nvidia-persistenced)
and try it again after you've disabled it.
Do you have all the variables in the Docker template as described in the second post of this thread?
After running 'kill $(pidof nvidia-persistenced)'
I get the same error
docker: Error response from daemon: failed to create shim: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #1:: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy' nvidia-container-cli: device error: false: unknown device: unknown.
Can also confirm both required variables are in the docker
Key : NVIDIA_VISIBLE_DEVICES
Value : GPU-9ef5c7e3-966f-cd37-8881-73507c0b7e0a
Key : NVIDIA_DRIVER_CAPABILITIES
Value : all
This is in the Unmanic container. I assume I'm not yet at the point of the container itself having issues.
I am on version 6.10.3; should I hop onto 6.11.0-rc3?
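For reference, those two template variables are what Unraid passes to Docker as `-e` flags. The equivalent plain `docker run` looks like the sketch below (the image name is a placeholder, and the command is built with `echo` here so it can be dry-run without a GPU):

```shell
# Dry-run sketch: how the two template variables map onto docker run flags.
# "some/image" is a placeholder, not a real image.
cmd="docker run --rm --runtime=nvidia \
  -e NVIDIA_VISIBLE_DEVICES=GPU-9ef5c7e3-966f-cd37-8881-73507c0b7e0a \
  -e NVIDIA_DRIVER_CAPABILITIES=all \
  some/image"
echo "$cmd"
```

Running the real equivalent against a small image is a quick way to test the runtime outside of any one container's template.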
-
3 hours ago, ich777 said:
A user already had the same issue and I was able to reproduce this on my test server while up- and downgrading Unraid version.
I was able to solve it here:
Please also look at the following post since the user had many packages installed from the NerdPack and in his case it seems to be that this caused the issue.
...please report back if you got it working again.
Unfortunately I attempted those fixes before posting.
The only Nerd Pack item I had installed was perl (can't even remember what I installed it for, to be fair).
But it has all been removed completely and the system rebooted; I also tried reinstalling the driver after this, with the same result.
nvidia-persistenced on the command line is accepted, but no change.
NVIDIA_VISIBLE_DEVICES I think is where my issue might be
I'm copying the information from
Confirmed no spaces, Tried re copy pasting
Correct "GPU-9ef5c7e3-966f-cd37-8881-73507c0b7e0a"
Error response from daemon: failed to create shim: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #1:: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy' nvidia-container-cli: device error: false: unknown device: unknown.
Incorrect: "asfa". I also tried 'all', as I saw that somewhere while I was searching.
Error response from daemon: failed to create shim: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #1:: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy' nvidia-container-cli: device error: false: unknown device: unknown.
The item is as per the instructions in the first post.
NVIDIA_DRIVER_CAPABILITIES, however, spits out a different error when I set it to 'some':
Error response from daemon: failed to create shim: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #1:: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy' unsupported capabilities found in 'some' (allowed ''): unknown.
With the correct 'all' I get:
Error response from daemon: failed to create shim: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #1:: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy' nvidia-container-cli: device error: false: unknown device: unknown.
My final attempt was:
put '--runtime=nvidia' in the extra parameters;
fail the save/compile;
go back in, edit the template, and re-paste 'GPU-9ef5c7e3-966f-cd37-8881-73507c0b7e0a'.
It failed with the same NVIDIA_VISIBLE_DEVICES error as above.
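One editor's note on the UUID checks above: a common copy/paste failure is a stray space or newline around the value. A quick format sanity check, using the UUID quoted in the posts (`nvidia-smi -L` prints the canonical value for comparison):

```shell
# The UUID should be "GPU-" followed by a hex UUID (8-4-4-4-12 groups)
# with no surrounding whitespace.
uuid='GPU-9ef5c7e3-966f-cd37-8881-73507c0b7e0a'
if printf '%s' "$uuid" | grep -Eq '^GPU-[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}$'; then
  echo 'UUID format OK'
else
  echo 'UUID format looks wrong'
fi
```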
-
On 8/3/2022 at 8:23 PM, ich777 said:
On 8/3/2022 at 7:43 PM, Scootter said:
First, thanks for all the work you do to provide this plugin! I suddenly have started having issues with any docker that uses --runtime=nvidia. I first noticed after a system reboot and saw that Plex had not started up. When I tried to start it I immediately got "Execution Error, Bad parameter".
This usually indicates that the runtime is not working properly and also is logged in your syslog.
What packages have you installed from the Nerd Pack? I can only imagine that you have something installed that is interfering with the Nvidia Driver.
Have you changed anything recently in your system, may it be the hardware or software (Docker, Plugins,...)
So I appear to be having this issue.
It's a fresh install though, so it has never worked before.
Just uninstalled Nerd Pack and rebooted
Getting
docker: Error response from daemon: failed to create shim: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #1:: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy' nvidia-container-cli: device error: false: unknown device: unknown.
I'm sure I'm missing something.
Done the usual reboot/reinstall etc. (initially I was having VFIO problems; the system used to pass it through, now it doesn't).
The driver shows the below and appears to be A-OK.
GPU stats are being pulled correctly too.
I'm like 99% sure I'm missing something dumb. What logs would you need?
Edit: also confirmed these are OK too.
-
I have rolled back, but now I'm getting:
/bin/sh: 1: apk: not found
Nothing else has changed
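For context on that error: `apk` is Alpine Linux's package manager, so `apk: not found` inside a container usually means the image (or the tag rolled back to) isn't Alpine-based. A quick way to check what base a container actually runs, from a shell inside it:

```shell
# /etc/os-release identifies the distribution on nearly every modern image;
# on an Alpine image the ID line reads "ID=alpine".
grep '^ID=' /etc/os-release
```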
Ultimate UNRAID Dashboard (UUD)
in User Customizations
Ooo, yes please! I've always loved this dashboard but never had the energy or time to make something I like.