-
Server Crash - Unraid 6.12.6
@JorgeB Thanks. I've considered using a remote syslog server in the past, or even the local option. However, I already have syslog mirrored to flash, and that mirroring didn't catch the crash either. So how would a remote or local syslog server catch a crash?
-
-
Server Crash - Unraid 6.12.6
Overall a stable system, but it crashed last weekend and I wasn't able to get this posted until now. It crashed sometime between Oct 13th and the morning of Oct 14th, 2024. The logs didn't capture anything unusual that I can find. I did manage to get a screenshot of the frozen server, but I'm not sure there is much of value there. I'd appreciate any insight into why this occurred. Info attached: blackbox-diagnostics-20241017-2014.zip 2024-10-14_080009.pdf
-
Going crazy trying to figure out GPU Passthrough stability issues
**Possible Solution**

I had this exact same issue and it was driving me nuts too. I see this thread is a little old and there's not much on here, so I figured I'd share my experience and solution, but your mileage may differ.

I was building a gaming VM, passing through a GTX 1080 Ti GPU, and could play for around 10-20 mins before the VM would just “crash” and Unraid would fill the logs with this error:

vfio-pci 0000:01:00.0: vfio_bar_restore: reset recovery - restoring BARs
vfio-pci 0000:01:00.0: vfio_bar_restore: reset recovery - restoring BARs
vfio-pci 0000:01:00.0: vfio_bar_restore: reset recovery - restoring BARs
...

Literally thousands of those until I would force stop the VM. After that I could no longer restart the VM; I would get a 127 error, not able to access the GPU device (see below for my assumption as to why). My GPU was on 0000:01:00.0 as well. Others might be in the #2 slot and so forth, and your addressing will follow that. I didn’t have the CoreFreq plugin as suggested above, so that’s ruled out.

So to cut right to it: after hours and yes, DAYS of ruling out configuration settings, trying vbios, MB BIOS settings, VM graphics drivers, etc, etc. IT WAS THE DAMN PSU (power supply). 🤬 I only came across this once, on another thread, with a somewhat similar but not 100% identical setup and experience.

In my particular case, I added the GPU above (used, from eBay), and since my server case is more of a server-grade case, there are no 6- or 8-pin connectors for GPUs. Yes, I could “adapt” to those, but I only have two main power lines: one powering my disk array and the other powering the SATA drive enclosure. I didn’t want to steal power from “Unraid”, so I bought a secondary power supply (flat/modular, 400W) and wired it up to power on with the primary PSU. (You can google how to do all this; there are “multiple power supply adapters”. DON’T just jump it with a paper clip… 🤦‍♂️)

Anyway, after swapping the GPU to my desktop computer, I could play for “hours” without any issues, no crashes, etc. (using that machine's primary 500W PSU, which has two main cables with 6-8 pin plugs). This ruled out any GPU card issues… it’s fine. 😁

I then pulled the secondary PSU from the Unraid server and used it in the desktop to “replicate” my setup. And BINGO, 10-20 mins into the game it CRASHED! I looked over at the GPU, which now had blinking LEDs on the power ports… 🤔 Looking at the secondary PSU, the only tell was that the fan was not spinning. Yep, it “died”. 🤦‍♂️ So I’m guessing it has some thermal/overheat protection, or it can’t supply 250W long term, which is what my GPU needs at 95-100% usage.

So don’t just buy a crappy PSU for GPUs. You have to have a PSU that can drive 300W+ long term, for hours, JUST for the GPU. That would mean a system PSU of 600W+, just to be clear. Again, I’m just driving the GPU, so 400W should have been enough, but that PSU just can’t handle the load… "junk", and getting returned.

This is also why I had to power cycle / reboot Unraid every time this would crash. Since there was no power to the GPU, Unraid could not reset the board (GPU). I would also get these errors (clues) in the logs sometimes, not every time:

vfio-pci 0000:01:00.1: Unable to change power state from D0 to D3hot, device inaccessible
vfio-pci 0000:01:00.1: Unable to change power state from D3cold to D0, device inaccessible
vfio-pci 0000:01:00.1: Unable to change power state from D3cold to D0, device inaccessible

Your experience could be slightly different. If you’re using a primary PSU, the voltage might just drop momentarily, because the 250W (or more, very GPU specific here) the GPU needs continuously is too much of a long-term demand, and the voltage sags just enough that the GPU will “crash”. At that point the MB / OS tries to reset the card, which produces the logging above. Once the card is reset, some people are then able to restart their VMs after the GPU crashed. Unless the primary PSU also dies, you’d never know that it dropped in voltage for a moment, just enough for the GPU to crash, as Unraid and the rest of your system would be “fine” (but could crash too).

For me the secondary PSU was off/dead, so there was no way to reset and repower the GPU other than to reboot. Now that I think about it, I probably could have just pulled the power plug on the secondary PSU and plugged it back in. But whatever, a full reboot was necessary for me, which “reset” the secondary PSU to power back up. (And I only rebooted like 100+ times, trying to figure all this out, for DAYS.) AHHHH 🤬

Overall, you get the idea… this post is now long enough. I would very, very strongly recommend you rule out the PSU if you get these or similar errors, especially if the VM/GPU is fine during normal operation but crashes and shows these errors only when it’s under load.

Good luck and hope this helps someone else.
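If you want to check whether your own log is filling up with the same two messages, here is a rough Python sketch that counts them per PCI address. The patterns are copied from the log lines quoted above; the function name `count_vfio_errors` and the sample list are just made up for illustration, and your log lines may carry extra prefixes (timestamps, hostname), which the search tolerates.

```python
import re
from collections import Counter

# Patterns based on the two kinds of messages quoted in the post.
# The capture group is the PCI address (e.g. 0000:01:00.0).
PATTERNS = {
    "bar_restore": re.compile(
        r"vfio-pci (\S+): vfio_bar_restore: reset recovery - restoring BARs"
    ),
    "power_state": re.compile(
        r"vfio-pci (\S+): Unable to change power state from \S+ to \S+, "
        r"device inaccessible"
    ),
}


def count_vfio_errors(lines):
    """Count matching messages, keyed by (kind, PCI address).

    `lines` can be any iterable of strings, such as an open log file.
    (Hypothetical helper, not part of Unraid or any plugin.)
    """
    counts = Counter()
    for line in lines:
        for kind, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                counts[(kind, match.group(1))] += 1
    return counts


# Sample input: the exact lines quoted in the post.
sample = [
    "vfio-pci 0000:01:00.0: vfio_bar_restore: reset recovery - restoring BARs",
    "vfio-pci 0000:01:00.0: vfio_bar_restore: reset recovery - restoring BARs",
    "vfio-pci 0000:01:00.0: vfio_bar_restore: reset recovery - restoring BARs",
    "vfio-pci 0000:01:00.1: Unable to change power state from D0 to D3hot, device inaccessible",
    "vfio-pci 0000:01:00.1: Unable to change power state from D3cold to D0, device inaccessible",
    "vfio-pci 0000:01:00.1: Unable to change power state from D3cold to D0, device inaccessible",
]

for (kind, address), n in count_vfio_errors(sample).most_common():
    print(f"{address} {kind}: {n}")
```

To run it against a real log, pass an open file instead of the sample list, e.g. `with open(path) as f: count_vfio_errors(f)`, where `path` is wherever your syslog lives. A large count that only appears while the GPU is under load would fit the PSU story above, but it doesn't prove it.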
-
Can't stop docker container Nextcloud or reboot the whole system
I too am now having the same issues as above... no point in duplicating. I had no issues on NC 17 or 18. It was only when I went to NC 20.0.4 that this started to occur... So is it NC? As mentioned above, I recently added (on NC 20) a binding/redirect from the docker /tmp path back to my cache disk. After it just locked up again, I'm going to remove that... and see what happens. What has been consistent with the "crash" is accessing NC via the iPhone iOS app. For whatever reason that seems to be the "thing" that locks up the docker. Not always, but at least once a day or so, when I try to access the app on my iPhone, it will do a server request and then just die. Curious if others see that too. Could it be the iOS app? Other access with the NC client on Windows 10 hasn't locked anything up yet... Really hoping someone figures this out!!!
-
[Support] Linuxserver.io - Nextcloud
I'm having a very similar issue. How did you solve it? I never saw a follow-up or response to your post.
howiser1