Debian 8.4 VM: Uhhuh. NMI received for unknown reason 20 on CPU



Hi team,

I just created my first VM with Debian 8.4 on unRAID 6.2.0 beta20 and I'm getting these constant kernel messages:

 

-Message from syslogd@debian at Apr  8 14:39:18 ...

kernel:[ 1580.252051] Dazed and confused, but trying to continue

 

-Message from syslogd@debian at Apr  8 14:39:48 ...

kernel:[ 1610.252055] Uhhuh. NMI received for unknown reason 20 on CPU 0.

 

-Message from syslogd@debian at Apr  8 14:39:48 ...

kernel:[ 1610.252055] Do you have a strange power saving mode enabled?

 

-Message from syslogd@debian at Apr  8 14:39:48 ...

kernel:[ 1610.252055] Dazed and confused, but trying to continue
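For context on the two hex digits in the message: as far as I can tell from the x86 Linux source, that value is the byte read from I/O port 0x61, and the kernel only treats bit 0x80 (SERR) and bit 0x40 (IOCHK) as known NMI sources. The Python sketch below classifies a reason code on that basis. The constant names mirror the kernel headers as I remember them and `classify_nmi_reason` is a made-up helper name, so take it as an illustration and not a reference.

```python
# Bit masks as I recall them from the Linux x86 headers (mach_traps.h).
# This is an assumption on my part, not something stated in this thread.
NMI_REASON_SERR = 0x80   # PCI system error
NMI_REASON_IOCHK = 0x40  # I/O channel check


def classify_nmi_reason(reason_hex):
    """Classify the two hex digits printed in the 'unknown reason' message."""
    reason = int(reason_hex, 16)
    if reason & NMI_REASON_SERR:
        return "SERR"
    if reason & NMI_REASON_IOCHK:
        return "IOCHK"
    # Neither known bit is set, so the kernel has nothing to attribute it to.
    return "unknown"


print(classify_nmi_reason("20"))  # prints: unknown
```

With reason 20 neither bit is set, which would explain why the kernel calls it unknown. It does not by itself say what sent the NMI.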

 

Apparently, from my searches, they don't mean anything bad, but then why are they appearing?

 

Are these a result of the settings I chose for the VM or what?

 

Here are my logs, thanks!

tower-diagnostics-20160408-1443.zip

Link to comment
  • 5 months later...
  • 3 weeks later...

I'm seeing the same problem under unRAID 6.2 on a newly created Debian VM. I doubt we all have a hardware problem.

 

After a reboot and cleaning up the syslog:

 

Oct 24 21:26:17 SFTP systemd[1]: Startup finished in 4.195s (kernel) + 4.839s (userspace) = 9.035s.

Oct 24 21:27:03 SFTP kernel: [  55.366799] random: nonblocking pool is initialized

Oct 24 21:33:58 SFTP kernel: [  469.621005] Uhhuh. NMI received for unknown reason 00 on CPU 0.

pinky-diagnostics-20161024-2147.zip

Link to comment
  • 2 weeks later...

Has anyone come up with a solution to this? Researching on Google suggests setting nmi_watchdog to 0 - but that did not help in my case.
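To confirm the nmi_watchdog change actually took effect inside the VM, the current value can be read back from procfs. Here is a minimal Python sketch, assuming the guest kernel exposes `/proc/sys/kernel/nmi_watchdog` (it may not if the lockup detector isn't built in). The helper names `parse_flag` and `read_nmi_watchdog` are my own.

```python
from pathlib import Path

# Usual procfs location of the kernel.nmi_watchdog sysctl on Linux.
NMI_WATCHDOG_PATH = "/proc/sys/kernel/nmi_watchdog"


def parse_flag(text):
    """Turn the file contents ("0" or "1") into an int, or None if unparsable."""
    try:
        return int(text.strip())
    except ValueError:
        return None


def read_nmi_watchdog(path=NMI_WATCHDOG_PATH):
    """Return the current setting, or None if the file can't be read."""
    try:
        return parse_flag(Path(path).read_text())
    except OSError:
        return None


state = read_nmi_watchdog()
if state is None:
    print("nmi_watchdog setting not available on this system")
else:
    print("nmi_watchdog is", "enabled" if state else "disabled")
```

My understanding is that this setting only controls the kernel's own watchdog NMIs, so if the messages keep coming with it at 0, they are presumably coming from somewhere else. That would match the report above that it did not help.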

 

I am a new unRAID user and have always had this issue. I thought it was bad hardware, but after much testing and swapping, all seems well. Only the annoying messages remain: I cannot suppress them, and they mess up my Vim editing.

Link to comment
  • 1 month later...

Any progress with this? I'm positive I don't have bad hardware or any type of conflict.  I'm thinking there's something inside Debian Jessie triggering these messages.

 

Can we start posting our specs and maybe find a commonality?

 

Here's mine, attached. I've also uploaded the same to support.  I'm hoping someone can see the problem and pinpoint it for us to get a fix.

 

Thanks!

 

P.S. - My suspicion is that it's because I have IPMI enabled.  I will turn it off in the BIOS and report back when I take my server down for maintenance this weekend [new year]

 

Can't disable IPMI. Maybe it's the ACPI settings? I have OS Aware turned on, but if I disable it, I would not be able to use my 2nd CPU socket.

unraid-spec.txt

Link to comment

I got lucky and ran into a Server Fault post about someone having the same issue on Supermicro motherboards. I followed the recommended solution: find the JWD jumper (the jumper for the Watch Dog feature) and leave it open.  By default, it was on pins 2+3, which lets the system generate a non-maskable interrupt (NMI) when it detects an application hanging.  With the jumper on pins 1+2, the system would reset any time an application hung.  I left my jumper on pin 1 only, thus leaving it open.  The Watch Dog setting in the BIOS was left alone (enabled by default), and I am not receiving any NMI messages while working inside the VM.

 

I'll update if it comes back, but it has been a couple of hours. Typically, I start seeing the NMI messages after around 10-15 minutes of shell/terminal work.

 

ref: http://serverfault.com/questions/695650/supermicro-bmc-watchdog-caused-reboots

 

Update 1/20/2017 - Didn't fix it.  Still getting these errors.  Would someone please help me :'(. It's frustrating trying to set up a VM and seeing the messages overwrite my Vim sessions.

Link to comment
  • 4 weeks later...

I just ran into this myself.  I have been running multiple Windows Server VMs (with Titan-Xs and 1080s passed through) serving games and applications to SteamLink boxes throughout my home without issue (other than using Windows itself).  I decided to run a trial with SteamOS, and the messages popped up immediately after I passed a 1080 through to the VM and booted it.  I had both the GFX and Audio of the 1080 passed through.  After passing through only the GFX, the messages went away.  I did not debug any further as I'm running into other issues...

 

Update:

I take it back, it wasn't the Audio pass-through.  I had 8 of my 32 threads (I have a dual Xeon server) assigned contiguously to the VM.  I also changed the assignment to split the threads between the physical CPUs, and that appears to be what stopped the messages.  It has been over an hour; before, I was seeing them a couple of times a minute.

 

Update:

And now it's back.  Nothing in my server IPMI log.....

 

-e-

Link to comment
  • 2 weeks later...

I too have this issue, running a Debian 8 VM on cores 0/1 of my 6700k i7. I hadn't seen this on earlier VMs, but today I have been spending a lot more time inside the CLI - so it's possible the issue has always been there and I had missed it.

 

This VM is CLI only, running Unraid 6.3.0-rc9

Link to comment
  • 1 month later...

I am also experiencing this.  I didn't have the issue on my Ubuntu vm, but I created a CentOS 7 vm and the issue starts up 10 minutes after booting up the vm.  I have tried the suggestions above and just like everybody else, they don't fix the issue.  Really hoping someone discovers the solution to this.  I have a supermicro motherboard and AMD processor.

Link to comment

I have had this on my CentOS 7 VM (server), with the same NMI CPU syslog messages. They did not show up on my Ubuntu server VM. What I did was change the CPU from passthrough to a QEMU emulated CPU in the VM settings (edit). No errors are showing now. So I'm not sure if it is a problem with CPU passthrough and some instability caused by it. Do we blame KVM/QEMU, or is it a bug in a CentOS package? I doubt we all have hardware issues.

 

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 6
model           : 6
model name      : QEMU Virtual CPU version 2.5+
stepping        : 3
microcode       : 0x1
cpu MHz         : 3392.260
cache size      : 4096 KB
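To double-check from inside the guest which CPU model the VM ended up with, the `model name` field of `/proc/cpuinfo` can be parsed. A small Python sketch follows, using the output above as sample input. The function names are mine, and the "QEMU Virtual CPU" substring check is only an assumption based on the model name shown in this post.

```python
# Sample taken from the /proc/cpuinfo excerpt in this post.
SAMPLE_CPUINFO = """\
processor       : 1
vendor_id       : GenuineIntel
cpu family      : 6
model           : 6
model name      : QEMU Virtual CPU version 2.5+
stepping        : 3
"""


def model_names(cpuinfo_text):
    """Collect the 'model name' value of every processor entry."""
    names = []
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Exact key match, so the separate 'model' field is not picked up.
        if sep and key.strip() == "model name":
            names.append(value.strip())
    return names


def uses_emulated_cpu(cpuinfo_text):
    """True if any processor reports a QEMU virtual CPU model name."""
    return any("QEMU Virtual CPU" in name for name in model_names(cpuinfo_text))


print(model_names(SAMPLE_CPUINFO))
print(uses_emulated_cpu(SAMPLE_CPUINFO))
```

On a real guest, the text would come from reading `/proc/cpuinfo` instead of the sample string. With CPU passthrough, I would expect the host's model name to show up there.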

Edited by vader75
Link to comment
9 hours ago, vader75 said:

I have had this on my CentOS 7 VM (server), with the same NMI CPU syslog messages. They did not show up on my Ubuntu server VM. What I did was change the CPU from passthrough to a QEMU emulated CPU in the VM settings (edit). No errors are showing now. So I'm not sure if it is a problem with CPU passthrough and some instability caused by it. Do we blame KVM/QEMU, or is it a bug in a CentOS package? I doubt we all have hardware issues.

 

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 6
model           : 6
model name      : QEMU Virtual CPU version 2.5+
stepping        : 3
microcode       : 0x1
cpu MHz         : 3392.260
cache size      : 4096 KB

 

This seemed to fix the issue for me as well.
For the sake of complete records: I am running Turnkey Linux Core (Debian based); https://www.turnkeylinux.org.

Link to comment

Follow-up: I apparently spoke too soon. The issue has returned.

 

I'm still reasonably certain that this isn't a hardware problem, but I have spare components I can swap in (despite the current hardware having passed every test I've run). It will be about a week or two before I'm able to get everything swapped and tested, but I will report back once done in case it matters.

Link to comment

Getting the same on my UnRAID Debian Linux VM...anyone have a solution?

 

Message from syslogd@localhost at Apr 15 11:07:46 ...
 kernel:[ 3172.589419] Uhhuh. NMI received for unknown reason 31 on CPU 0.

Message from syslogd@localhost at Apr 15 11:07:46 ...
 kernel:[ 3172.589419] Do you have a strange power saving mode enabled?

Message from syslogd@localhost at Apr 15 11:07:46 ...
 kernel:[ 3172.589419] Dazed and confused, but trying to continue

Message from syslogd@localhost at Apr 15 11:08:16 ...
 kernel:[ 3202.617042] Uhhuh. NMI received for unknown reason 21 on CPU 0.

Message from syslogd@localhost at Apr 15 11:08:16 ...
 kernel:[ 3202.617047] Do you have a strange power saving mode enabled?
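The excerpts in this thread look like they repeat roughly every 30 seconds, which might hint at a periodic source and not at random hardware faults. Here is a rough Python sketch for checking that on your own logs. It pulls the kernel uptime stamp and reason code out of each "Uhhuh" line and computes the gaps. The regex is written against the lines pasted above, and the helper names are my own.

```python
import re

# Written against lines such as:
#   kernel:[ 3172.589419] Uhhuh. NMI received for unknown reason 31 on CPU 0.
NMI_LINE = re.compile(
    r"\[\s*(?P<uptime>\d+\.\d+)\]\s+Uhhuh\. NMI received for unknown reason "
    r"(?P<reason>[0-9a-fA-F]{2}) on CPU (?P<cpu>\d+)"
)

# Two lines copied from the log above.
SAMPLE_LOG = """\
 kernel:[ 3172.589419] Uhhuh. NMI received for unknown reason 31 on CPU 0.
 kernel:[ 3172.589419] Do you have a strange power saving mode enabled?
 kernel:[ 3202.617042] Uhhuh. NMI received for unknown reason 21 on CPU 0.
"""


def parse_nmi_events(log_text):
    """Return (uptime_seconds, reason_hex, cpu) for every NMI line found."""
    events = []
    for line in log_text.splitlines():
        match = NMI_LINE.search(line)
        if match:
            events.append(
                (float(match["uptime"]), match["reason"].lower(), int(match["cpu"]))
            )
    return events


def gaps_between(events):
    """Seconds between consecutive NMI events, rounded to milliseconds."""
    uptimes = [uptime for uptime, _, _ in events]
    return [round(later - earlier, 3) for earlier, later in zip(uptimes, uptimes[1:])]


events = parse_nmi_events(SAMPLE_LOG)
print(events)
print(gaps_between(events))  # prints: [30.028]
```

If the gaps on your system are also close to a fixed interval, that would be worth mentioning when posting diagnostics.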

 

Link to comment
  • 2 weeks later...
Quote

 kernel:[  585.812028] Uhhuh. NMI received for unknown reason 31 on CPU 0.

Message from syslogd@debian at Apr 26 16:31:11 ...
 kernel:[  585.812028] Do you have a strange power saving mode enabled?

Message from syslogd@debian at Apr 26 16:31:11 ...
 kernel:[  585.812028] Dazed and confused, but trying to continue

Message from syslogd@debian at Apr 26 16:31:41 ...
 kernel:[  615.812030] Uhhuh. NMI received for unknown reason 21 on CPU 0.

Message from syslogd@debian at Apr 26 16:31:41 ...
 kernel:[  615.812030] Do you have a strange power saving mode enabled?

Message from syslogd@debian at Apr 26 16:31:41 ...
 kernel:[  615.812030] Dazed and confused, but trying to continue

 

Also affected,

Asrock X99 extreme4 // 32gb ECC // GTX 750 passthrough (other vm)

 

The affected vm is a debian jessie vm ( 3.16.0-4-amd64 ). The problems weren't there until I disabled ipv6 in the vm. 

 

sysctl.conf

net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
net.ipv6.conf.eth0.disable_ipv6 = 1

Not sure if it's related. I'll try reverting the sysctl settings and see if it goes away.
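When reverting settings like these, it can help to list exactly which interfaces have IPv6 disabled in the config before and after the change. A minimal Python sketch that parses sysctl.conf-style text follows. The sample is the snippet above, the function names are hypothetical, and it only reads the file contents, not the live kernel values.

```python
# The sysctl.conf lines quoted in this post.
SAMPLE_SYSCTL = """\
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
net.ipv6.conf.eth0.disable_ipv6 = 1
"""


def parse_sysctl_conf(text):
    """Parse 'key = value' lines, skipping blanks and '#' or ';' comments."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def ipv6_disabled_interfaces(settings):
    """Names under net.ipv6.conf.* whose disable_ipv6 is set to 1."""
    prefix, suffix = "net.ipv6.conf.", ".disable_ipv6"
    return sorted(
        key[len(prefix):-len(suffix)]
        for key, value in settings.items()
        if key.startswith(prefix) and key.endswith(suffix) and value == "1"
    )


print(ipv6_disabled_interfaces(parse_sysctl_conf(SAMPLE_SYSCTL)))
```

Running the same check after the revert should give an empty list if all four lines were removed or set back to 0.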

 

 

Any ideas on where to start debugging errors like this?

Edited by duketwo
Link to comment
  • 1 month later...
  • 4 weeks later...
