Hi guys, my Unraid server had been running 24/7 for a few months when Docker services suddenly started to fail and the Docker daemon became unavailable. I tried logging in to the WebUI, but it was completely frozen. Sadly, I wasn't able to copy diagnostics, since SSH was also unavailable.
Using the IPMI, I managed to get the following screenshot while initiating an ACPI shutdown. The server was completely unresponsive for a few minutes until it finally shut down.
Upon restart, a parity check was automatically triggered, which indicates to me that the “soft shutdown” wasn't performed correctly.
The same thing happened again a day later: the same messages in the IPMI, and the WebUI was barely usable and indicated "no flash drive". I'm not sure whether this was just a frontend bug or whether the flash drive actually has issues.
I have been using unraid for 5 years now and never saw anything like this before.
The diagnostics file was generated after the restart.
I searched for the error message, but didn't find anything related to my case. My system has 128GB of ECC RAM, so this is unlikely to be the kind of OOM you would run into on a 2GB system.
anton-diagnostics-20240609-0925.zip
Edit:
Flash drive issues are the likely cause of these errors. It begins with this log message:
emhttpd: Unregistered Flash device error (ENOFLASH7)
Followed by this:
emhttpd: Plus key detected, GUID: 0781-5581-0000-100124105314 FILE: /boot/config/Plus.key
emhttpd: error: device_read_smart, 9567: Cannot allocate memory (12): device_spinup: stream did not open: nvme1n1
After several of these flash drive errors, followed by more memory allocation errors, the whole system becomes more and more unresponsive. As a result, Docker containers can no longer allocate resources and become increasingly unstable.
This leads to the original error message:
Cannot fork: Resource temporarily unavailable
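To check whether the three kinds of messages really show up in the order described above, the syslog can be scanned for them. The following is a minimal sketch in Python, assuming the messages appear in the log exactly as quoted in this post. The function name `classify_lines` and the category labels are made up for illustration and are not Unraid terminology.

```python
import re
from collections import Counter

# Patterns copied from the messages quoted in this post. The keys are
# hypothetical labels chosen for this sketch.
PATTERNS = {
    "flash": re.compile(r"Unregistered Flash device error"),
    "enomem": re.compile(r"Cannot allocate memory \(12\)"),
    "fork": re.compile(r"Cannot fork: Resource temporarily unavailable"),
}


def classify_lines(lines):
    """Return (counts, first_seen) for the known error categories.

    first_seen maps each category to the index of the first matching
    line, which shows the order in which the error types started.
    """
    counts = Counter()
    first_seen = {}
    for index, line in enumerate(lines):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
                first_seen.setdefault(name, index)
    return counts, first_seen


if __name__ == "__main__":
    # Sample input built from the lines quoted in this post.
    sample = [
        "emhttpd: Unregistered Flash device error (ENOFLASH7)",
        "emhttpd: error: device_read_smart, 9567: Cannot allocate memory (12): "
        "device_spinup: stream did not open: nvme1n1",
        "Cannot fork: Resource temporarily unavailable",
    ]
    counts, first_seen = classify_lines(sample)
    for name in sorted(first_seen, key=first_seen.get):
        print(f"{name}: first at line {first_seen[name]}, {counts[name]} total")
```

On a real system the lines would come from the syslog file instead of the sample list, for example by passing an open file object to `classify_lines`. Which file that is depends on the setup (I'm assuming the syslog included in the diagnostics zip would work).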
At this point, the unraid WebGUI is barely responsive, and this stage can continue for about 1-3h, during which I am still able to gracefully shut down the server using the IPMI KVM with the powerdown command.
At this stage, Uptime Kuma is the only service still able to push out a Discord notification that a ping process has failed.
If I miss this window, the server locks up completely so that even an ACPI Shutdown doesn't get through. The only way to shut down now is by turning off the PSU.
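To notice the degraded stage before the complete lock-up, one option is to periodically compare the number of tasks on the system with the kernel-wide limits, since `fork` can fail with "Resource temporarily unavailable" when those limits are reached. This is only a sketch under the assumption that task exhaustion is involved, which nothing in the logs above confirms. It also does not check per-user or cgroup limits, which can cause the same error. The function name `fork_headroom` and its `proc_root` parameter are hypothetical.

```python
from pathlib import Path


def fork_headroom(proc_root="/proc"):
    """Compare the current task count with the kernel-wide limits.

    Reads standard Linux procfs files. proc_root is only a parameter so
    the function can be pointed at a copy of the files for testing.
    """
    root = Path(proc_root)
    pid_max = int((root / "sys/kernel/pid_max").read_text())
    threads_max = int((root / "sys/kernel/threads-max").read_text())
    # The fourth field of loadavg has the form "running/total", where
    # total is the number of scheduling entities (processes and threads).
    fourth_field = (root / "loadavg").read_text().split()[3]
    total_tasks = int(fourth_field.split("/")[1])
    limit = min(pid_max, threads_max)
    return {
        "tasks": total_tasks,
        "limit": limit,
        "used_fraction": total_tasks / limit,
    }


if __name__ == "__main__":
    info = fork_headroom()
    print(f"{info['tasks']} of {info['limit']} tasks in use "
          f"({info['used_fraction']:.1%})")
```

If the used fraction stays low while the fork errors appear, the global task limits are probably not the cause, and memory allocation or the other limits mentioned above would be the next thing to look at.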