[SOLVED] Unraid 6.9.2 randomly restarts

czaj · June 27, 2021

Dear Unraid Community,

I am looking for help identifying the root cause of my Unraid server's unplanned restarts, which are driving me nuts. I am hoping for suggestions on what can be checked, how I can capture more logs from the time right before an unplanned restart, and how I can try to trigger one in order to find the problem and solve it.

The Unraid system runs a few Docker containers and VMs, and everything is configured and running perfectly unless a random reboot happens. The server can run without problems for days or weeks, but sometimes it reboots day after day. The time of the reboot is random (although I have observed that it usually happens in the morning hours, between 5 AM and 8 AM). This is not a power issue, as I have a UPS.
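To check whether the morning pattern is real, the boot times can be pulled from a saved syslog and counted per hour. This is only a rough sketch. It assumes traditional syslog timestamps (for example "Jun 27 05:41:02") and that each boot begins with a kernel line containing "Linux version". The function names, the host name "Tower" in the comment, and the default year are made up for illustration.

```python
import re
import sys
from collections import Counter
from datetime import datetime

# Assumption: lines look like "Jun 27 05:41:02 Tower kernel: Linux version ..."
# ("Tower" is a placeholder host name). Adjust the pattern if your syslog differs.
BOOT_LINE = re.compile(
    r"^(\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) \S+ kernel: .*Linux version "
)


def boot_times(lines, year=2021):
    """Return a datetime for every line that looks like the start of a boot."""
    times = []
    for line in lines:
        match = BOOT_LINE.match(line)
        if match:
            # Syslog pads single-digit days ("Jun  7"); collapse the padding.
            stamp = " ".join(match.group(1).split())
            times.append(datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S"))
    return times


def boots_per_hour(times):
    """Count boots by hour of day (0-23)."""
    return Counter(t.hour for t in times)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python boot_hours.py syslog-unraid.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        counts = boots_per_hour(boot_times(handle))
    for hour in sorted(counts):
        print(f"{hour:02d}:00  {counts[hour]} boot(s)")
```

Traditional syslog timestamps carry no year, so the year is passed in as a parameter. A cluster of boots in the same hours would point toward a scheduled job or a load pattern rather than a purely random fault.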

So far I have done the following:

Checked RAM with memtest - no issues.
Replaced the PSU with a new one (I had already planned to switch to a modular PSU, so I used this as an opportunity to troubleshoot).
Enabled syslog mirroring to flash to be able to review logs after an unplanned reboot, but there is nothing helpful in the logs.
Disabled a few containers/VMs to check whether they are the root cause of the problem, without luck. I suspected that the Windows VM (vdesktop01) might somehow be causing it, but recently a restart happened even with the Windows VM turned off.
Identified an IRQ #16 issue and fixed it by disabling this feature (in modprobe: options i2c-i801 disable_features=0x10), but that didn't help.
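Since the syslog shows nothing before a reboot, one option is a small heartbeat script that appends a timestamped line to a file at a fixed interval, so the last line narrows down when the machine went away. This is a minimal sketch, not a tested Unraid tool. The names `write_heartbeat` and `run`, the 60-second interval, and the log path passed on the command line are all assumptions. It records only the load average; temperatures or voltages would need extra tooling not shown here.

```python
import os
import sys
import time
from datetime import datetime


def write_heartbeat(path, now=None):
    """Append one timestamped line with the load average and force it to disk."""
    now = now or datetime.now()
    load1, load5, load15 = os.getloadavg()
    line = (
        f"{now.isoformat(timespec='seconds')} "
        f"load={load1:.2f},{load5:.2f},{load15:.2f}\n"
    )
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(line)
        handle.flush()
        os.fsync(handle.fileno())  # ask the OS to push the line to the device now
    return line


def run(path, interval_seconds=60):
    """Write a heartbeat forever; the last line brackets the time of a crash."""
    while True:
        write_heartbeat(path)
        time.sleep(interval_seconds)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python heartbeat.py /path/to/heartbeat.log  (path is your choice)
    run(sys.argv[1])
```

The log has to live on storage that survives the reboot. Writing every minute to a USB flash drive adds wear, so a longer interval or a different persistent disk may be the better trade-off.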

I am attaching the following files for review:

Syslog, which contains logs from a few hours before the unplanned restart and the full boot log after the crash.
Diagnostics data.

System specification:

Mobo: MSI B250M PRO-VDH
CPU: Intel Pentium G4600
RAM:
- GOODRAM 4GB (1x4GB) 3000MHz CL16 IRDM X Black
- G.SKILL 32GB (2x16GB) 3000MHz CL16 Aegis
PSU: SeaSonic FOCUS GX-650 ATX 80+ GOLD
Drives:
- 2x WD RED 4TB for array
- 2x GOODRAM IRDM Pro gen. 2 512GB for cache
- 1x GOODRAM 120GB 2,5" SATA SSD IRDM GEN. 2 - directly attached to VM
UPS: APC SmartUPS 750

Please let me know what additional data should be provided to review this case.

elysium-diagnostics-20210627-1226.zip syslog-unraid.log

JorgeB · June 27, 2021

This sounds like a hardware issue, and the fact that nothing is logged at the time points the same way. It is very difficult to diagnose remotely; basically you need to start swapping hardware. The easiest thing to test is using only one DIMM at a time. If that doesn't help, and since you already replaced the PSU, the next thing would be using a different board.

czaj · June 27, 2021

Thank you for the quick reply.

I was thinking about a hardware issue as well, but I intentionally set that aside since this setup ran for 2 years without an issue on the old OS, and the problems started not long after installing Unraid (of course I could be totally wrong, as a hardware fault might have appeared around the same time as the Unraid installation).

Leaving only one memory DIMM in is a very good suggestion. I stress tested the memory by running dedicated stress-test software on VMs and nothing happened, but I didn't test by physically removing DIMMs. I will do that and update this thread if I find something.

If there are any other ideas, especially on how to force that behavior again, I am open to suggestions.

Tristankin · June 27, 2021

Any chance you are using iGPU transcoding in Plex? You may have used old config steps. I was having the same issue on my system.

JorgeB · June 28, 2021

Another thing you can try is to boot the server in safe mode with all Docker containers/VMs disabled and let it run as a basic NAS for a few days. If it still crashes, it's likely a hardware problem; if it doesn't, start turning the other services back on one by one.

czaj · June 28, 2021

Tristankin, I don't use Plex, so this is not it. Thank you for the suggestion!

JorgeB, I tried running the server with all Docker containers and VMs powered down, and it worked for 2 weeks without any problem.

Actually, I remembered that not so long ago I enabled the XMP profile to increase the memory speed from 2133 MHz to 2400 MHz. After reading online that XMP can cause unplanned restarts, I disabled it and am currently waiting for results (with all memory DIMMs and all containers/VMs enabled). If that doesn't help, my next step will be to leave one DIMM in and test again.

czaj · June 29, 2021

The server restarted again this morning, so this is not XMP. I am going to leave one memory stick in and see what happens.

czaj · August 7, 2021

To provide an update on my situation: the server is still randomly rebooting.

The only pattern I have noticed is that when there is not much running (only core containers/VMs), reboots are less frequent (about once per week), but when I enable everything, a reboot usually happens within 24 hours.

I've tried:

Moving the USB stick with the OS from a USB3 to a USB2 port (to check whether the USB stick is the problem),
Running it without the UPS (to check whether the UPS is the problem),
Resetting the BIOS to default settings,
Running the system on one memory stick (tried each of the 3, in different DIMM slots).

I have run out of ideas for what else in the current setup can be tested. If someone has any other ideas about what I can check or what the problem might be, please let me know!

I am planning to buy a new motherboard (Gigabyte B560M DS3H) and CPU (Intel i5-11400) to upgrade the current setup and hopefully confirm that the reboots no longer happen. I went through the forum to check whether the B560 chipset and 11th gen CPUs work with Unraid 6.9.2, and based on the forum posts they do. So hopefully I won't be surprised here.

Edited August 7, 2021 by czaj

Tristankin · August 8, 2021

Have you tried 6.8.3? I had to roll back to stop the random hangs.

czaj · August 8, 2021

Were the symptoms in your situation the same as mine? I mean literally no logs indicating what happened, and a reboot (not just a hang that requires a manual restart)? Did you have these hangs in 6.9.1 as well?

Rolling back to 6.8.3 sounds like a time-consuming process (considering, for example, that the cache pool won't be detected in 6.8.x). Maybe rolling back to 6.9.1 is something I need to test.

PS. I am really hoping that this won't turn out to be a software issue, because keeping an old version of the OS running is not the best thing from a security perspective.

Tristankin · August 9, 2021

Tell me about it. The new 1MB-aligned cache works fine on 6.8.3, though.

Mine hung rather than rebooted, with nothing in the syslog. So maybe it is different? It's just a cheaper option than having to replace hardware.

Edited August 9, 2021 by Tristankin

czaj · September 1, 2021

Update: Today is the 21st day of the NAS running without a crash under a full workload. It seems that changing the motherboard (Gigabyte B560M DS3H) and CPU (Intel i5-11400) resolved this problem.

There are two potential root causes:

An issue with the old motherboard/CPU: most likely, but I haven't tested it yet to confirm,
An incompatibility between the OS and hardware components: least likely but still possible (I guess ;)).

No matter what the root cause is, this case seems to be resolved. Thank you for the help and suggestions on how to resolve this.
