Unraid

UnRAID 6.12.2 server crashed and restarted


I was configuring a new VM with the SPICE VM Console Protocol when my VM stopped responding. Unfortunately, that was because my server had crashed (dirty) and restarted. Fix Common Problems now tells me my server has detected hardware errors and suggests installing mcelog via the NerdPack plugin, but I don't think that plugin is available anymore.

I'm not sure what hardware error occurred, but I'd like to learn how to find out.

Edited by Alyred
Remove irrelevant diagnostics.

  • Author

Thanks. Did you see something in particular in my diagnostics that made you suspect a specific setting? The system ran TrueNAS for months before Unraid without issue, though I did update the BIOS once for the new AGESA version when I installed Unraid. I have most of the overclocking turned off, though the memory is running at an overclocked profile as it was before, and I did tune the Curve Optimizer down slightly.

 

I'll check the C-state settings.

  • Author

I set the BIOS option Power Supply Idle Control to "Typical", and the server just crashed again while idle. Since it ran for weeks before the change and this time crashed within an hour, the change appears to have made things less stable, or I just got unlucky.

 

@JorgeB, a couple of years ago you replied to someone else's post saying that the go file line to disable C6 should no longer be needed with Power Supply Idle Control set to Typical. Is that still the case, or is there possibly a regression bug in 6.12.x?

I've now set up the local syslog server, but the only option it gives for "local syslog folder" is "<custom>", and it won't accept any input. I've mirrored it to flash for the time being with a 10 MB maximum file size.

  • Community Expert
10 hours ago, Alyred said:

is there possibly a regression bug in 6.12.x?

Not AFAIK.

  • Author

So it restarted again this morning. I have the syslog from the syslog server mirrored to the flash drive. I'm not sure whether it's safe to upload in its raw form like that, but I'm happy to do so if needed.

 

This is with the Power Supply Idle Control set to Typical still in the BIOS.

 

Glancing through, I see the following "hardware errors" logged during bootup, I believe:
 

Jul 14 08:17:26 Moissanite kernel: microcode: CPU0: patch_level=0x0a20120a
Jul 14 08:17:26 Moissanite kernel: microcode: CPU1: patch_level=0x0a20120a
Jul 14 08:17:26 Moissanite kernel: microcode: CPU2: patch_level=0x0a20120a
Jul 14 08:17:26 Moissanite kernel: microcode: CPU3: patch_level=0x0a20120a
Jul 14 08:17:26 Moissanite kernel: microcode: CPU4: patch_level=0x0a20120a
Jul 14 08:17:26 Moissanite kernel: microcode: CPU5: patch_level=0x0a20120a
Jul 14 08:17:26 Moissanite kernel: microcode: CPU6: patch_level=0x0a20120a
Jul 14 08:17:26 Moissanite kernel: microcode: CPU7: patch_level=0x0a20120a
Jul 14 08:17:26 Moissanite kernel: microcode: CPU8: patch_level=0x0a20120a
Jul 14 08:17:26 Moissanite kernel: microcode: CPU9: patch_level=0x0a20120a
Jul 14 08:17:26 Moissanite kernel: microcode: CPU10: patch_level=0x0a20120a
Jul 14 08:17:26 Moissanite kernel: microcode: CPU11: patch_level=0x0a20120a
Jul 14 08:17:26 Moissanite kernel: microcode: CPU12: patch_level=0x0a20120a
Jul 14 08:17:26 Moissanite kernel: microcode: CPU13: patch_level=0x0a20120a
Jul 14 08:17:26 Moissanite kernel: microcode: CPU14: patch_level=0x0a20120a
Jul 14 08:17:26 Moissanite kernel: microcode: CPU15: patch_level=0x0a20120a
Jul 14 08:17:26 Moissanite kernel: microcode: CPU16: patch_level=0x0a20120a
Jul 14 08:17:26 Moissanite kernel: microcode: CPU17: patch_level=0x0a20120a
Jul 14 08:17:26 Moissanite kernel: microcode: CPU18: patch_level=0x0a20120a
Jul 14 08:17:26 Moissanite kernel: microcode: CPU19: patch_level=0x0a20120a
Jul 14 08:17:26 Moissanite kernel: microcode: CPU20: patch_level=0x0a20120a
Jul 14 08:17:26 Moissanite kernel: microcode: CPU21: patch_level=0x0a20120a
Jul 14 08:17:26 Moissanite kernel: microcode: CPU22: patch_level=0x0a20120a
Jul 14 08:17:26 Moissanite kernel: microcode: CPU23: patch_level=0x0a20120a
Jul 14 08:17:26 Moissanite kernel: microcode: CPU24: patch_level=0x0a20120a
Jul 14 08:17:26 Moissanite kernel: microcode: CPU25: patch_level=0x0a20120a
Jul 14 08:17:26 Moissanite kernel: microcode: CPU26: patch_level=0x0a20120a
Jul 14 08:17:26 Moissanite kernel: microcode: CPU27: patch_level=0x0a20120a
Jul 14 08:17:26 Moissanite kernel: microcode: CPU28: patch_level=0x0a20120a
Jul 14 08:17:26 Moissanite kernel: microcode: CPU29: patch_level=0x0a20120a
Jul 14 08:17:26 Moissanite kernel: microcode: CPU30: patch_level=0x0a20120a
Jul 14 08:17:26 Moissanite kernel: microcode: CPU31: patch_level=0x0a20120a
Jul 14 08:17:26 Moissanite kernel: microcode: Microcode Update Driver: v2.2.
Jul 14 08:17:26 Moissanite kernel: IPI shorthand broadcast: enabled
Jul 14 08:17:26 Moissanite kernel: sched_clock: Marking stable (1536275102, 317374787)->(1886336138, -32686249)
Jul 14 08:17:26 Moissanite kernel: mce: [Hardware Error]: Machine check events logged
Jul 14 08:17:26 Moissanite kernel: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 5: baa0000000030150
Jul 14 08:17:26 Moissanite kernel: mce: [Hardware Error]: TSC 0 MISC d012000200000000 SYND 4d000002 IPID 500b000000000 
Jul 14 08:17:26 Moissanite kernel: mce: [Hardware Error]: PROCESSOR 2:a20f12 TIME 1689347821 SOCKET 0 APIC 0 microcode a20120a

 

moissanite-diagnostics-20230714-0944.zip
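Since mcelog is no longer easy to get through NerdPack, the raw fields in the "Bank 5" record above can at least be split apart by hand. This Python sketch assumes the architectural MCi_STATUS layout (flag bits 63 to 57, MCA error code in bits 15:0), plus the AMD-specific extended error code in bits 21:16 and the AMD SMCA IPID layout (McaType in bits 63:48, hardware ID in bits 43:32). The function names are hypothetical, and this is no substitute for a real decoder such as mcelog or the kernel's edac_mce_amd reporting.

```python
from datetime import datetime, timezone

# Architectural MCi_STATUS flag bits (same positions on Intel and AMD).
STATUS_FLAGS = {
    63: "VAL (register contents are valid)",
    62: "OVER (an earlier error was overwritten)",
    61: "UC (uncorrected error)",
    60: "EN (error reporting was enabled)",
    59: "MISCV (MISC register is valid)",
    58: "ADDRV (ADDR register is valid)",
    57: "PCC (processor context corrupt)",
}


def decode_mci_status(status: int) -> dict:
    """Split a raw MCi_STATUS value into its flag bits and error-code fields."""
    return {
        "flags": [name for bit, name in STATUS_FLAGS.items() if (status >> bit) & 1],
        "mca_error_code": status & 0xFFFF,             # bits 15:0 (architectural)
        "extended_error_code": (status >> 16) & 0x3F,  # bits 21:16 (AMD-specific)
    }


def decode_ipid(ipid: int) -> dict:
    """Pull the bank-identifying fields out of an AMD SMCA IPID value."""
    return {
        "mca_type": (ipid >> 48) & 0xFFFF,    # bits 63:48
        "hardware_id": (ipid >> 32) & 0xFFF,  # bits 43:32
    }


def mce_time_utc(seconds: int) -> datetime:
    """The TIME field is a Unix timestamp; convert it to an aware UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


if __name__ == "__main__":
    # Values copied from the "Bank 5" record in the log above.
    status = decode_mci_status(0xBAA0000000030150)
    for flag in status["flags"]:
        print("flag:", flag)
    print(f"MCA error code:      {status['mca_error_code']:#06x}")
    print(f"extended error code: {status['extended_error_code']:#x}")
    ipid = decode_ipid(0x500B000000000)
    print(f"IPID: McaType={ipid['mca_type']:#x} HWID={ipid['hardware_id']:#x}")
    print("TIME:", mce_time_utc(1689347821).isoformat())
```

For this record the script reports VAL, UC, EN, MISCV and PCC set, MCA error code 0x0150, extended error code 0x3, McaType 0x5 with hardware ID 0xb0, and a TIME of 2023-07-14 15:17:01 UTC. An uncorrected error with PCC set is the kind the kernel cannot recover from, which would be consistent with a spontaneous reset. In the Linux kernel's AMD SMCA bank table, hardware ID 0xB0 with McaType 5 appears to be the execution-unit (EX) bank, but that mapping is worth double-checking against the kernel source or AMD's documentation. For errors picked up at boot, TIME likely reflects when the kernel read the bank during startup, not necessarily when the fault happened.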

  • Community Expert

You are still overclocking the RAM, please see the link above again.

  • Author

Right, I thought I turned off XMP when I set Power Supply Idle Control, but apparently I didn't save that config.

 

Question for future visitors: Are you seeing that as part of the syslog error I posted (with the Hardware error), or something else you saw in the diagnostics?

 

I'll get XMP disabled (though it seems a waste, especially since it was stable for months in TrueNAS) and let it run to see if I get further lockups.

  • Community Expert
2 minutes ago, Alyred said:

Are you seeing that as part of the syslog error I posted

In the diags, was just double checking since I assumed it had already been done.

Keep in mind that Unraid runs entirely from RAM, which is a different situation from TrueNAS.

  • Author

1 hour ago, JorgeB said:

In the diags, was just double checking since I assumed it had already been done.

Thanks. I have it fixed at 2666 MHz now with XMP turned off, and I upped the voltage a tad to 1.250 V for a bit more stability since I'm not having heat issues. Edit: Decided to just run it at "Auto", which is 1.200 V.

1 hour ago, bonienl said:

Keep in mind that Unraid runs entirely from RAM, which is a different situation from TrueNAS.

This is a good point, though it was running fine through all my weeks of trying to get the preclears to work properly (I believe those problems were related to other issues; certainly no crashes), and it didn't start crashing until I began configuring a VM hosted on the server.

Edited by Alyred

I just think XMP is flaky. I have had tons of issues (blue screens, random reboots, and hard locks, to name a few) on my own and client machines. I've had them run great for weeks to over a year, then bam, erratic PC behavior, on both Windows and Linux PCs. Turning off XMP always fixes the problems. You're not trying to get the most FPS out of a server anyway, so it's really not needed.

Edited by skaterpunk0187

  • Author

The new server rebooted again just a few minutes ago. RAM was set to 2666 MHz (not overclocked), and the BIOS setting for Power Supply Idle Control was set to Typical. What should I try next?

 

Diagnostics attached. I still have my syslog mirrored to flash and can upload that too if necessary.

 

 

moissanite-diagnostics-20230716-0043.zip

  • Community Expert
1 minute ago, Alyred said:

I still have my syslog mirrored to flash and can upload that too if necessary

 

That is the only file that can show what was happening leading up to the reboot.

 

Do you have your server set to automatically boot if power is applied?  Do you have a UPS?  A reboot (as opposed to a crash) is normally triggered by something external to Unraid or is hardware related.
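Since the mirrored syslog is the only record of what happened right before each reboot, a small script can pull out the last lines logged before every new boot and flag kernel or hardware-error messages. This Python sketch assumes that each boot is marked by the kernel's "Linux version ..." banner and that the patterns in SUSPICIOUS are worth flagging; both, and the function names, are illustrative assumptions rather than anything Unraid provides.

```python
import re
import sys
from collections import deque
from pathlib import Path

# Assumption: the kernel's "Linux version ..." banner marks the start of each boot.
BOOT_MARKER = re.compile(r"kernel: Linux version ")
# Assumption: messages worth flagging; extend as needed.
SUSPICIOUS = re.compile(r"mce: \[Hardware Error\]|Kernel panic|Call Trace|segfault")


def lines_before_boots(lines, context=20):
    """Yield (boot_line, last `context` lines logged before that boot)."""
    window = deque(maxlen=context)
    for line in lines:
        if BOOT_MARKER.search(line):
            if window:
                yield line, list(window)
            window.clear()
        else:
            window.append(line)


def suspicious_lines(lines):
    """Return every line that matches one of the SUSPICIOUS patterns."""
    return [line for line in lines if SUSPICIOUS.search(line)]


def main(path):
    lines = Path(path).read_text(errors="replace").splitlines()
    for boot, before in lines_before_boots(lines):
        print(f"=== last lines before: {boot}")
        for line in before:
            print("   ", line)
    print("=== possible hardware/kernel errors")
    for line in suspicious_lines(lines):
        print(line)


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Run it as, for example, `python3 scan_syslog.py syslog.txt` (the file name is hypothetical). If the lines just before a boot show a normal shutdown sequence, the restart was probably requested; if the log simply stops mid-activity, that points more toward a hard reset such as a machine check, power event or watchdog.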

  • Author
50 minutes ago, itimpi said:

 

That is the only file that can show what was happening leading up to the reboot.

 

Do you have your server set to automatically boot if power is applied?  Do you have a UPS?  A reboot (as opposed to a crash) is normally triggered by something external to Unraid or is hardware related.

The server is running on a UPS that doesn't seem to be causing issues otherwise; there's another computer plugged into it but sleeping for the time being as I'm using the monitor and keyboard while I'm building the server. It hasn't restarted or been interrupted.

 

I believe it's set to stay off if power has been interrupted. That's how I usually set it, because I don't want machines powering on when the UPS comes back online after a power outage or anything like that. I don't know of anything else that might be causing a restart; after a day or two of running without issue, I just come back to find the array offline and the uptime counter reset.

 

Here's the syslog. It covers both the recent restarts.

 

I'm currently running a memtest on it because why not.

 

syslog.txt

  • Author

Where would I look specifically for the logs and errors of the VM system itself, rather than those of the individual VMs?

 

The problems/crashes only started when I created and began running a VM, and then I noticed something while looking around: somehow, my System share was still stuck on my array. I thought I had moved it off long ago, but the entire thing was still there. So I stopped the VM and Docker services, forced mover to finish, then removed the secondary storage so the share was exclusive to my SATA cache pool. I've restarted my VM and container; let's see how it goes.

I'm wondering whether something trying to access the System share while the array disk was spun down, or some timeout related to that, might have caused the reboot/crash.

 

Memtest ran for 14 hours with zero errors. I also double-checked my BIOS settings. At one point I had tried to use the Curve Optimizer to slightly reduce the CPU voltage (-10), but then turned off PBO altogether, which made the Curve Optimizer settings disappear. I had assumed they would have been reset, but maybe not, so I re-enabled PBO long enough to set the offset back to 0 and then disabled it again.

Currently, Global C-States is still enabled, and I left Power Supply Idle Control set to Typical.

I'm going to let it run for a bit and see whether moving the System share to the SATA pool helps, now that mover isn't touching it and it isn't on a possibly spun-down array drive.
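To confirm that the System share has actually left the array, it helps to look at each disk and pool mount directly instead of through the merged user-share view. This Python sketch assumes the usual Unraid layout (array disks under /mnt/diskN, pools under /mnt/<pool-name>, merged views at /mnt/user and /mnt/user0) and that the share folder is named "system"; the function name is hypothetical.

```python
import sys
from pathlib import Path

# Assumed Unraid layout: array disks at /mnt/diskN, pools at /mnt/<pool-name>,
# and the merged user-share views at /mnt/user and /mnt/user0 (skipped here).
MERGED_VIEWS = {"user", "user0"}


def share_locations(share, mnt=Path("/mnt")):
    """Map each disk/pool that holds a top-level `share` folder to its file count."""
    found = {}
    if not mnt.is_dir():
        return found
    for mount in sorted(mnt.iterdir()):
        if mount.name in MERGED_VIEWS or not mount.is_dir():
            continue
        folder = mount / share
        if folder.is_dir():
            # Counting files walks the folder, so a spun-down array disk will spin up.
            found[mount.name] = sum(1 for p in folder.rglob("*") if p.is_file())
    return found


if __name__ == "__main__" and len(sys.argv) > 1:
    for name, count in share_locations(sys.argv[1]).items():
        print(f"/mnt/{name}/{sys.argv[1]}: {count} file(s)")
```

For example, `python3 share_locations.py system` (hypothetical file name) should list only the SATA cache pool once mover has finished; any /mnt/diskN entry means part of the share is still on the array.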
