BomB191

Members
  • Posts: 104
  • Days Won: 1

Everything posted by BomB191

  1. Ooo, yes please. I've always loved this dashboard but never had the energy or time to make something I like.
  2. Yes, this removes Unraid's ability to look at or touch any of that. I personally figure not knowing the temps is better than knowing my fans will randomly shut off and cause thermal shutdowns. Just set the BIOS to the fan levels you need and call it a day.
  3. Just open your flash drive and go to *\flash\config\modprobe.d. Create whatever file you like with a .conf extension; mine is "disable-asus-wmi.conf". In that file paste '# Workaround broken firmware on ASUS motherboards' on one line and 'blacklist asus_wmi_sensors' on the next, then reboot, and the random fan stops are resolved. I'm not the right person to explain exactly what it does or how it works, but I assume it just makes Unraid/Linux not touch that fan controller.
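     For reference, a minimal sketch of doing the same thing from the Unraid terminal, assuming the flash drive is mounted at /boot (the Unraid default):

       # Write the blacklist file to the flash drive so it survives reboots
       mkdir -p /boot/config/modprobe.d
       printf '%s\n' \
         '# Workaround broken firmware on ASUS motherboards' \
         'blacklist asus_wmi_sensors' \
         > /boot/config/modprobe.d/disable-asus-wmi.conf
       # Then reboot so the asus_wmi_sensors module is never loaded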
  4. Hey, I found the root cause! Krusader grabs something that locks out /proc/sys/kernel/overflowuid. I haven't dug into it too much, but basically if I run Krusader I cannot restart anything that uses the GPU without a system reboot. This only happens with the one from your repository; the binhex one is OK even when it has root privileges. (I did have yours on root privileges with the advised settings.)
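     If anyone else wants to poke at this, a hypothetical check (not from the original post) to see the current value and whether any process is holding that proc entry open:

       # Show the current overflowuid value
       cat /proc/sys/kernel/overflowuid
       # List any processes that currently have the file open (may print nothing)
       lsof /proc/sys/kernel/overflowuid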
  5. It was a fan issue with ASUS X470 boards: the fan controller driver that's now in the base Unraid/Linux image bugs out, causing all fans to stop after a random amount of time, within a week or so. I have a file in 'config\modprobe.d' named 'disable-asus-wmi.conf' with the text '# Workaround broken firmware on ASUS motherboards' followed by 'blacklist asus_wmi_sensors'. An update while I was awaiting a reply: I downgraded back to Unraid 6.10.3, tried the usual stuff, and failed. I then re-updated to 6.11.5, did the usual thing, and now it works. I have no idea why, but it has persisted through several reboots, and transcoding now works on the GPU. Best guess: something just got stuck initially and fixed itself when I cycled through the downgrade/upgrade.
  6. I enabled then killed it (someone else was having power-state issues). Still failed. I tried uninstalling then reinstalling as per the instructions, and this also failed. Same with a force update. The weird thing is everything else works: the driver picks it up OK and so does nvidia-smi (I did try downgrading versions). Same issue with the Plex docker from Plex, which I would expect to work, so something is weird. Update: I think it's something to do with /proc/sys/kernel/overflowuid. It's weird; even logged in as root I don't have permission to delete it or modify its permissions.
  7. I think I'm finally stuck. I updated to 6.11.5 (after some system fan issues) and updated the Nvidia driver to 525.85.05. I can see it in the plugin, but I cannot use '--runtime=nvidia'; it just fails with a bad parameter, or if I check it I get the below. I have tried a full uninstall/reinstall with reboots, changing versions, etc. It did work once when I was looking around at other people's issues and ran 'nvidia-persistenced' then 'kill $(pidof nvidia-persistenced)', but now it just won't take at all. I'm sure it's something silly like it always is, haha. Any help would be huge. Diagnostics attached: tower-diagnostics-20230127-0816.zip
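     As a side note, one quick way to sanity-check the Nvidia runtime outside of any template is a throwaway container run from the console; the image tag here is only an example, substitute whatever CUDA image you prefer:

       # If the runtime is healthy this should print the same table as nvidia-smi on the host
       docker run --rm --runtime=nvidia \
         -e NVIDIA_VISIBLE_DEVICES=all \
         -e NVIDIA_DRIVER_CAPABILITIES=all \
         nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi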
  8. Fantastic news! Please come back once we can confirm there are no more issues after a week or so. Then I can give this a spin too.
  9. I have yet to have a random fan stop/power-down since downgrading to 6.10.3; it would have happened by now. And with, I think, 4 other people who have spoken up with the exact same problem and exact same resolution, it has to be something related to Unraid and the airflow monitor. There's no way this many people on same-ish hardware are all having hardware problems that go away after downgrading.
  10. You're running out of RAM, so your system is killing processes to protect itself. The only workaround is to add more memory/RAM.
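     If you want to confirm it's the out-of-memory killer at work, the kernel log will show it (a generic check, not specific to this setup):

       # Look for OOM kills in the kernel log
       dmesg | grep -iE "oom|out of memory"
       # See how much RAM is actually free right now
       free -h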
  11. I'm also seeing this. Uptime is now 3 days 22 hours. I now wonder how we go about figuring out what's causing the problem.
  12. Ha, well, yes, that's very true. The magic smoke would be released if that happened. I did have a quick Google, but most of what I found was rookies not explaining things too well and a ton of recommendations to get AI Tuner running on the OS (ASUS fan controller software), so it's kind of useless for us, unfortunately.
  13. So that's exactly what I did after discovering it was the fans: latest BIOS and latest Unraid. The fans still shut off randomly; I got 18 hours and then 26 hours. I have done the same as you and rolled Unraid back to 6.10.3; it's only been 9 hours so far. I also can't think of anything worse for my hardware than having to shut down due to temperature protection. So now we wait.
  14. Interesting, so we have 3 servers on ASUS ROG STRIX X470-F Gaming motherboards, all with intermittent total fan failure. I'm running 6.11.5 and had the issue on 6.11.2, and I think the version before that too. (I initially thought I had RAM issues, before I saw the fans stop in real time; the logs show nothing.) It would be one hell of a coincidence for 3 motherboards to have intermittent fan controller problems. My wild guess is maybe some weird power-management issue between Unraid and the BIOS. Do keep us posted on 6.10.3. I have managed to go anywhere from 18 hours all the way up to 1.5 weeks.
  15. Update to that: the fans still turned off. Made it to 26 hours, though. What's even more annoying is I don't get anything in the logs.
  16. Same issue as above, but on a STRIX X470-F GAMING mobo and a 2700X. I've never had the fan control plugin, but I do have System Temp installed. Edit: Just updated the BIOS and confirmed all fans are at 100% except the CPU. So now we wait.
  17. Holy shit! I've been dealing with this for what feels like 2 months. At first I thought it was RAM, but I just watched the damn thing: all fans stopped, it slowly overheated, then it killed itself from temperature protection.
  18. Yes, when I went to create a fresh container I noticed it under 'Show more settings'. So on my container I had 2x NVIDIA_VISIBLE_DEVICES: one with my GPU and one with 'all' in the field. I deleted the variable I created and used the one already in the container, setting it to my GPU. So the container now has the below in the settings regarding the GPU.
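     (The screenshot isn't reproduced here; going by the values quoted elsewhere in the thread, the working variables would presumably look like this, with the GPU UUID being mine:)

       NVIDIA_VISIBLE_DEVICES     = GPU-9ef5c7e3-966f-cd37-8881-73507c0b7e0a
       NVIDIA_DRIVER_CAPABILITIES = all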
  19. Yes, confirmed now working! Thank you very much 💖
  20. I require a dunce hat for tonight. I went to make a new container and noticed these 2 params hiding under more settings. Figures it would be something extremely stupid; I didn't even contemplate checking in there. The disappointment in myself is immeasurable. TIL: check 'Show more settings ...'. Sorry for wasting your time, and thank you immensely for the assistance.
  21. After running 'kill $(pidof nvidia-persistenced)' I get the same error:
     docker: Error response from daemon: failed to create shim: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #1:: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy' nvidia-container-cli: device error: false: unknown device: unknown.
     I can also confirm both required variables are in the docker template:
     Key: NVIDIA_VISIBLE_DEVICES, Value: GPU-9ef5c7e3-966f-cd37-8881-73507c0b7e0a
     Key: NVIDIA_DRIVER_CAPABILITIES, Value: all
     This is in the Unmanic container; I assume I'm not at the point of the container itself having issues yet. I am on Version 6.10.3. Should I hop onto 6.11.0-rc3?
  22. Unfortunately, I attempted those fixes before posting. The only Nerd Pack item I had installed was perl (can't even remember what I installed it for, to be fair), but it has all been removed completely and the server rebooted. I also tried reinstalling the driver after this - same result. 'nvidia-persistenced' on the command line is accepted, but there's no change.
     NVIDIA_VISIBLE_DEVICES is where I think my issue might be. I'm copying the information exactly as shown; confirmed no spaces, and tried re-copy-pasting.
     Correct value "GPU-9ef5c7e3-966f-cd37-8881-73507c0b7e0a":
     Error response from daemon: failed to create shim: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #1:: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy' nvidia-container-cli: device error: false: unknown device: unknown.
     Incorrect value "asfa" (I also tried 'all', as I saw that somewhere while searching):
     Error response from daemon: failed to create shim: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #1:: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy' nvidia-container-cli: device error: false: unknown device: unknown.
     That item is set as per the instructions in the first post. NVIDIA_DRIVER_CAPABILITIES, however, spits out a different error when I set it to 'some':
     Error response from daemon: failed to create shim: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #1:: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy' unsupported capabilities found in 'some' (allowed ''): unknown.
     With the correct 'all' I get:
     Error response from daemon: failed to create shim: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #1:: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy' nvidia-container-cli: device error: false: unknown device: unknown.
     My final attempt was to put '--runtime=nvidia' in the extra parameters, let the save fail, then go back in, edit the template, and re-paste 'GPU-9ef5c7e3-966f-cd37-8881-73507c0b7e0a'. It failed with the same NVIDIA_VISIBLE_DEVICES error as above.
  23. In reply to: "This usually indicates that the runtime is not working properly, and it is also logged in your syslog. What packages have you installed from the Nerd Pack? I can only imagine that you have something installed that is interfering with the Nvidia driver. Have you changed anything recently in your system, be it hardware or software (Docker, plugins, ...)?"
     So I appear to be having this issue, though mine is a fresh install, so it has never worked before. I just uninstalled Nerd Pack and rebooted. I'm getting:
     docker: Error response from daemon: failed to create shim: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #1:: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy' nvidia-container-cli: device error: false: unknown device: unknown.
     I'm sure I'm missing something. I've done the usual reboot/reinstall etc. (Initially I was having VFIO problems; the system used to pass the card through and now it doesn't.) The driver shows the below and appears to be A-OK, and GPU stats are being pulled correctly too. I'm like 99% sure I'm missing something dumb. What logs would you need? Edit: Also confirmed these are OK too.