
[SOLVED] Unraid 6.9.2 randomly restarts



Dear Unraid Community,

 

I am looking for help identifying the root cause of unplanned restarts of my Unraid server, which are driving me nuts. I am hoping for suggestions on what to check, how to capture more logs from the moments right before an unplanned restart, and how to trigger one deliberately so I can find the problem and solve it.

The Unraid system runs a few Docker containers and VMs; everything is configured and runs perfectly, except for the random reboots. The server can run without problems for days or weeks, but sometimes the reboots happen day after day. The time of the reboot is random (although I have observed that it usually happens in the morning, between 5 AM and 8 AM). This is not a power issue, as I have a UPS.
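
The morning-hours observation could be checked against the logs rather than from memory. Below is a minimal sketch in Python, assuming the syslog uses classic timestamps such as "Jun 27 05:13:02" and that the kernel banner containing "Linux version" is logged once per boot. The function name `boot_hours` is made up for illustration. Note that the banner marks the boot that follows a crash, so the crash itself happened slightly earlier.

```python
import re
from collections import Counter
from typing import Iterable

# Assumption: classic syslog timestamps such as "Jun 27 05:13:02".
TIMESTAMP = re.compile(r"^[A-Z][a-z]{2}\s+\d{1,2}\s+(\d{2}):\d{2}:\d{2}")
# Assumption: the kernel banner containing this text is logged once per boot.
BOOT_MARKER = "Linux version"


def boot_hours(lines: Iterable[str]) -> Counter:
    """Count boots per hour of day (0-23), based on kernel banner lines."""
    hours = Counter()
    for line in lines:
        if BOOT_MARKER not in line:
            continue
        match = TIMESTAMP.match(line)
        if match:
            hours[int(match.group(1))] += 1
    return hours
```

Passing an open file handle for a copy of the syslog to `boot_hours` returns a counter keyed by hour. A cluster between 5 and 8 would support the pattern, and could then be compared with scheduled jobs that run in that window.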

 

So far I have done the following:

  1. Checked RAM with memtest - no issues.
  2. Replaced the PSU with a new one (I had already planned to switch to a modular PSU, so I used this as an opportunity to troubleshoot).
  3. Enabled syslog mirroring to flash so I can review logs after an unplanned reboot, but there is nothing helpful in the logs.
  4. Disabled a few containers/VMs to check whether they are the root cause - no luck. I suspected that the Windows VM (vdesktop01) might somehow be causing this, but recently a restart happened even with the Windows VM turned off.
  5. Identified an IRQ #16 issue and fixed it by disabling the feature (in modprobe - options i2c-i801 disable_features=0x10) - didn't help.
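
For reviewing a syslog saved across reboots, a minimal sketch in Python that pulls out the last lines written before each boot. It assumes the kernel banner containing "Linux version" marks the start of every boot. The function name `lines_before_boots` and the `context` parameter are made up for illustration.

```python
from collections import deque
from typing import Iterable, List

# Assumption: the kernel banner containing this text is logged once per boot.
BOOT_MARKER = "Linux version"


def lines_before_boots(lines: Iterable[str], context: int = 20) -> List[List[str]]:
    """For each boot banner, return up to `context` lines that preceded it."""
    recent = deque(maxlen=context)  # rolling window of the latest lines
    tails = []
    for line in lines:
        line = line.rstrip("\n")
        if BOOT_MARKER in line:
            if recent:  # skip a banner that has nothing before it
                tails.append(list(recent))
            recent.clear()
            continue
        recent.append(line)
    return tails
```

Each returned list holds the final entries logged before a restart. If those entries are only routine messages every time, that fits an abrupt reset with no chance to log anything, which is the behavior described here.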

 

I am attaching the following files for review:

  1. Syslog, which contains logs from a few hours before the unplanned restart and a full boot log from after the crash.
  2. Diagnostics data.

 

System specification:

  • Mobo: MSI B250M PRO-VDH
  • CPU: Intel Pentium G4600
  • RAM:
    • GOODRAM 4GB (1x4GB) 3000MHz CL16 IRDM X Black
    • G.SKILL 32GB (2x16GB) 3000MHz CL16 Aegis
  • PSU: SeaSonic FOCUS GX-650 ATX 80+ GOLD
  • Drives:
    • 2x WD RED 4TB for array
    • 2x GOODRAM IRDM Pro gen. 2 512GB for cache
    • 1x GOODRAM 120GB 2,5" SATA SSD IRDM GEN. 2 - directly attached to VM
  • UPS: APC SmartUPS 750

 

Please let me know what additional data should be provided to review this case.

elysium-diagnostics-20210627-1226.zip syslog-unraid.log

Link to comment

This sounds like a hardware issue, and nothing being logged at the time points to the same. It is very difficult to diagnose remotely; basically you need to start swapping hardware. The easiest thing to test is using only one DIMM at a time. If that doesn't help, and since you already replaced the PSU, the next thing would be using a different board.

Link to comment

Thank you for the quick reply.

 

I was thinking about a hardware issue as well, but I intentionally set that aside since this setup ran for 2 years without an issue on the old OS, and the problems started not long after installing Unraid (of course I could be totally wrong, as a hardware fault might have developed around the same time as the Unraid installation).

 

Leaving one memory DIMM installed is a very good suggestion. I stress-tested the memory by running dedicated stress-test software on the VMs and nothing happened, but I didn't test by physically removing DIMMs. I will do that and update this thread if I find something.

 

If there are any other ideas - especially on how to force that behavior again - I am open to suggestions.

 

Link to comment

Tristankin, I don't use Plex, so this is not it - thank you for the suggestion!

 

JorgeB, I tested how the server behaves with all Docker containers and VMs powered down - it worked for 2 weeks without any problem.

 

Actually, I remembered that not so long ago I enabled the XMP profile to increase memory speed from 2133 MHz to 2400 MHz. After reading online that XMP can cause unplanned restarts, I disabled it and am currently waiting for results (with all memory DIMMs and all containers/VMs enabled). If that doesn't help, my next step will be to leave one DIMM installed and test again.

 

Link to comment
  • 1 month later...

To provide an update on my situation - the server is still randomly rebooting.

The only pattern I have noticed is that when there is not much running (only core containers/VMs), reboots are less frequent (about once per week), but when I enable everything, a reboot usually happens within 24 hours.

 

I've tried:

  • Moving the USB stick with the OS from a USB3 port to a USB2 port (to check whether the USB stick is the problem),
  • Running without the UPS (to check whether the UPS is the problem),
  • Resetting the BIOS to default settings,
  • Running the system on one memory stick (tried each of the 3, in different DIMM slots).

I have run out of ideas for what else in the current setup can be tested. If someone has any other ideas about what I can check or what the problem might be - please let me know!

 

I am planning to buy a new motherboard (Gigabyte B560M DS3H) and CPU (Intel i5-11400) to upgrade the current setup and hopefully confirm that the reboots no longer happen. I went through the forum to check whether the B560 chipset and 11th gen CPUs work with unRAID 6.9.2, and based on the forum posts they do. So hopefully I won't be surprised here.

Edited by czaj
Link to comment

Were the symptoms in your situation the same as mine? I mean literally no logs indicating what happened, and a reboot (not just a hang which requires a manual restart)? Did you have these hangs in 6.9.1 as well?

 

Rolling back to 6.8.3 sounds like a time-consuming process (considering, for example, that the cache array won't be detected in 6.8.x). Maybe rolling back to 6.9.1 is something I need to test.

 

PS. I am really hoping that this won't be a software issue, because keeping an old version of the OS running is not the best thing from a security perspective.

Link to comment
  • 4 weeks later...

Update: Today is the 21st day of the NAS running without a crash under a full workload. It seems that changing the motherboard (Gigabyte B560M DS3H) and CPU (Intel i5-11400) resolved the problem.

 

There are two potential root causes:

  1. An issue with the old motherboard/CPU - most likely, but I haven't tested it yet to confirm,
  2. An incompatibility between the OS and hardware components - least likely but still possible (I guess ;)).

 

Whatever the root cause, this case seems to be resolved. Thank you for the help and suggestions on how to resolve this.

Link to comment
  • JorgeB changed the title to [SOLVED] Unraid 6.9.2 randomly restarts
