Kernel panic - not syncing: Timeout: Not all CPUs entered broadcast exception handler (X9DRi-LN4F)


geo10


Hi guys,

 

I am having problems setting up Windows 10 VMs on Unraid. System info:

Motherboard: Supermicro X9DRi-LN4F+

CPUs: Intel Xeon E5-2690 (v1)

 

Whenever I fire up the Windows VM and start to tinker with it, I get an error saying:

Kernel panic - not syncing: Timeout: Not all CPUs entered broadcast exception handler

 

What I have tried:

I did some digging and tried booting Unraid with the kernel option max_cstate=1; this didn't help.

I have set the CPU energy-saving mode to maximum performance.

I ran MemTest86 for 10 passes and stopped it there.
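A quick way to confirm whether a C-state limit actually reached the kernel is to inspect `/proc/cmdline` on the running host. The sketch below is a hypothetical helper (not an Unraid tool) that pulls every `*max_cstate` parameter out of a command line string. As far as I know, the spellings the kernel documents are `intel_idle.max_cstate` and `processor.max_cstate`, so a bare `max_cstate=1` may not be picked up by either driver; treat that as something to double-check against the kernel parameter documentation.

```python
# Hypothetical helper: report C-state limits found on a kernel command line.
# Assumption: the two documented spellings are the module-qualified ones below.
KNOWN_CSTATE_PARAMS = {"intel_idle.max_cstate", "processor.max_cstate"}


def cstate_params(cmdline: str) -> dict:
    """Map every '<prefix.>max_cstate=<value>' token to its value."""
    found = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        # Match both the bare 'max_cstate' and module-qualified forms.
        if sep and key.split(".")[-1] == "max_cstate":
            found[key] = value
    return found


def unrecognized(params: dict) -> list:
    """Return parameter names outside the two documented kernel spellings."""
    return sorted(k for k in params if k not in KNOWN_CSTATE_PARAMS)


def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the live command line (only meaningful on the Linux host)."""
    with open(path) as f:
        return f.read()
```

On the host, `unrecognized(cstate_params(read_cmdline()))` lists any spelling that neither driver is expected to consume. If the intel_idle driver is in use, `/sys/module/intel_idle/parameters/max_cstate` should also show the value it actually applied.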

 

I have read this might be because of a kernel bug...

 

Any input is appreciated. I would love to use Unraid, as it has a very comfy ecosystem built around it.

 

Thanks,

geo10

 

EDIT: For context, the system halts with this error after the Windows VM is started; the time until failure varies from 5 to 20 minutes.

EDIT 2: I have also flashed the newest BIOS on the board, but that didn't help. Running Unraid without the VM doesn't produce the error, so I am still hoping it's some BIOS/software-related issue.

On 11/24/2019 at 8:44 PM, geo10 said:

Thanks,

geo10

 

I knew this was going to be an edge case and probably not many people face it, but if someone has successfully solved it I would appreciate some input, so BUMP for awareness.

 

+ Additional context:

 

What I have tried since then: I changed the SATA mode to IDE and successfully ran the AIDA64 stress test in the VM for half an hour; I had to turn it off as it was about to go to sleep.

The next day I changed the SATA mode to AHCI and disabled aggressive energy management in the BIOS. The AIDA64 test ran for 40 minutes, then the host crashed with the usual error message.

I then unplugged the HDD and SSD and moved them to SATA2 ports on the motherboard. The test ran for 1 hour and 10 minutes, then crashed with the usual error.

So I have changed the SATA mode back to IDE, with the HDD and SSD staying on the SATA2 ports, and the test has now been running for 37 minutes.

 

Also, timekeeping in the VM is strange: the seconds counter in AIDA64 sometimes skips a second or updates too fast or too slow, so it's odd at best.

 

Any gurus out there please drop me a clue.


IDE instead of AHCI is not recommended. Are you overclocking? Don't. Have you done memtest?

 

You need to get us more information. Go to Tools - Diagnostics and attach the complete diagnostics zip file to your next post. Also, if you are on 6.7.2 or higher, go to Settings and Enable Syslog Server so you can save syslog to give us after a crash.


Hey,

 

Thanks for contributing to this.

 

I will link the diagnostics file tomorrow. I will also enable the syslog server tomorrow when I have time to take a look.

Here is what I have tried:

Tried 6.6.7: same error after about 2 hours.

Tried 6.8 rc7: same error after about an hour and a half.

I also tried Windows 10 as the host, running a VM in VirtualBox; that setup ran for more than 3 hours straight without crashing, so I suspect this has something to do with the underlying kernel.

 

 

I have read this thread where the OP fixed his issue with rolling back to 6.2. I wouldn't mind rolling back to 6.2 as long as I get a working copy of Unraid...

 

11 minutes ago, geo10 said:

I have read this thread where the OP fixed his issue with rolling back to 6.2. I wouldn't mind rolling back to 6.2 as long as I get a working copy of Unraid...

That thread is nearly 3 years old, and 6.2 isn't supported as I mentioned on your other thread. I just reviewed that entire thread, and that user was trying to run on a marginal (at best) PSU. And the whole thing about 6.2 was as compared to 6.3, not compared to any more recent version. And here is a much more recent post from that same user which includes diagnostics that show that user is currently running 6.7.2:

 

https://forums.unraid.net/topic/84900-parity-errors-but-no-unclean-shutown/?do=findComment&comment=787857

 

So I think you might as well forget about 6.2 as the solution to your problems. There are lots of threads where a user thinks they have the exact same problem as another user (and they think that is a good reason to hijack the thread for some reason), but further diagnosis reveals their problems were completely different. If you do want to try a slightly older (and arguably still supported) version see the DOWNLOAD page linked at the top of this page.


Hey,

 

Thanks for coming back!

Sorry, I didn't directly say whether I am overclocking: I am running Xeon CPUs, so I am not overclocking, nor do I intend to. In my first post I mentioned that I ran MemTest86 for 10 passes, but since I am running ECC RAM I would be surprised if anything popped up there.

 

Let me attach the system diagnostics file. I have also enabled syslog server and if it crashes next time I will come back with the logs.

 

Thanks

tower-diagnostics-20191130-0852.zip


I just started the server and fired up a Windows VM. I enabled syslog mirroring to the flash drive, but I don't think it contains anything useful (I also enabled the local syslog server, but that failed to even create a file...).

 

When I get back home I will try to set up a remote syslog server. Meanwhile, here is the attached syslog file:

syslog


Okay, so I have set up the syslog server and mirroring of the logs to the flash drive.

Both logs cover the crash: in the syslog server log the crash happened at about 15:30, which is why you see the server restarting there... There are hardly any log entries before that, so I don't think this will help much.

 

I will also include the logs from the flash drive; they are roughly the same, just with additional entries before that. They also show that no log entries were written after 14:55... but the server only crashed at about 15:30...

I hope you guys can find anything in there.

 

syslog-192.168.0.152.log syslog
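The 14:55 vs. 15:30 discrepancy is easy to check mechanically. Below is a minimal sketch, assuming the files use the traditional `Mon DD HH:MM:SS` syslog prefix (the exact format Unraid's syslog server writes is an assumption here), that finds the longest silence between consecutive entries. A long gap right before a reboot would suggest the host stopped logging well before it went down, or that the final messages never reached the disk or the network.

```python
from datetime import datetime, timedelta


def largest_gap(lines, year):
    """Return (gap, last_entry_before, first_entry_after) for the longest
    silence between consecutive syslog lines.

    Assumes BSD-style timestamps such as 'Nov 30 14:55:01' at the start of
    each line; that format carries no year, so it is passed in explicitly.
    Lines without a parsable timestamp are skipped.
    """
    stamps = []
    for line in lines:
        try:
            stamps.append(
                datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
            )
        except ValueError:
            continue
    best = (timedelta(0), None, None)
    # Compare each timestamp with the one before it and keep the widest gap.
    for prev, cur in zip(stamps, stamps[1:]):
        if cur - prev > best[0]:
            best = (cur - prev, prev, cur)
    return best
```

Usage would be along the lines of `with open("syslog-192.168.0.152.log") as f: print(largest_gap(f, 2019))`. Month abbreviations are parsed with `%b`, so this assumes English month names in the log.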


I have tested this with Ubuntu 19 as the host, but a couple of minutes after I started a VM on it, it crashed again, and sadly the Ubuntu system logs did not persist after the crash (?????), so again I have no log files.

 

Since it didn't crash with Windows, I think this is still software-related. I'm firing up a Windows Hyper-V VM right now to check whether it crashes or not.

 

Any input is appreciated.

  • 3 months later...

I am having the same issue with my X9DRi-LN4F+ system (attached are the diagnostics from when the system rebooted, as well as a screenshot from the KVM view moments after starting the VM).

 

Here is the XML configuration for said VM:

 

<?xml version='1.0' encoding='UTF-8'?>
<domain type='kvm'>
  <name>Windows 10</name>
  <uuid>65af8239-80de-8ddd-35e7-a0dbf8de19c5</uuid>
  <description>Gaming Platform</description>
  <metadata>
    <vmtemplate xmlns="unraid" name="Windows 10" icon="windows.png" os="windows10"/>
  </metadata>
  <memory unit='KiB'>16777216</memory>
  <currentMemory unit='KiB'>16777216</currentMemory>
  <memoryBacking>
    <nosharepages/>
  </memoryBacking>
  <vcpu placement='static'>8</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='16'/>
    <vcpupin vcpu='1' cpuset='36'/>
    <vcpupin vcpu='2' cpuset='17'/>
    <vcpupin vcpu='3' cpuset='37'/>
    <vcpupin vcpu='4' cpuset='18'/>
    <vcpupin vcpu='5' cpuset='38'/>
    <vcpupin vcpu='6' cpuset='19'/>
    <vcpupin vcpu='7' cpuset='39'/>
  </cputune>
  <os>
    <type arch='x86_64' machine='pc-i440fx-4.2'>hvm</type>
    <loader readonly='yes' type='pflash'>/usr/share/qemu/ovmf-x64/OVMF_CODE-pure-efi.fd</loader>
    <nvram>/etc/libvirt/qemu/nvram/65af8239-80de-8ddd-35e7-a0dbf8de19c5_VARS-pure-efi.fd</nvram>
  </os>
  <features>
    <acpi/>
    <apic/>
    <hyperv>
      <relaxed state='on'/>
      <vapic state='on'/>
      <spinlocks state='on' retries='8191'/>
      <vendor_id state='on' value='none'/>
    </hyperv>
  </features>
  <cpu mode='host-passthrough' check='none'>
    <topology sockets='1' cores='4' threads='2'/>
  </cpu>
  <clock offset='localtime'>
    <timer name='hypervclock' present='yes'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/local/sbin/qemu</emulator>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='writeback'/>
      <source dev='/dev/disk/by-id/ata-WDC_WDS200T2B0A-00SM50_20053B800394'/>
      <target dev='hdc' bus='sata'/>
      <boot order='1'/>
      <address type='drive' controller='0' bus='0' target='0' unit='2'/>
    </disk>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <source file='/mnt/user/ISOs/Win10_1909_English_x64.iso'/>
      <target dev='hda' bus='ide'/>
      <readonly/>
      <boot order='2'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <source file='/mnt/user/ISOs/virtio-win-0.1.173-2.iso'/>
      <target dev='hdb' bus='ide'/>
      <readonly/>
      <address type='drive' controller='0' bus='0' target='0' unit='1'/>
    </disk>
    <controller type='usb' index='0' model='ich9-ehci1'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x7'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci1'>
      <master startport='0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0' multifunction='on'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci2'>
      <master startport='2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x1'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci3'>
      <master startport='4'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pci-root'/>
    <controller type='ide' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
    </controller>
    <controller type='sata' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </controller>
    <controller type='virtio-serial' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </controller>
    <interface type='bridge'>
      <mac address='52:54:00:95:eb:d8'/>
      <source bridge='br0'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </interface>
    <serial type='pty'>
      <target type='isa-serial' port='0'>
        <model name='isa-serial'/>
      </target>
    </serial>
    <console type='pty'>
      <target type='serial' port='0'/>
    </console>
    <channel type='unix'>
      <target type='virtio' name='org.qemu.guest_agent.0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <input type='tablet' bus='usb'>
      <address type='usb' bus='0' port='1'/>
    </input>
    <input type='mouse' bus='ps2'/>
    <input type='keyboard' bus='ps2'/>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x83' slot='0x00' function='0x0'/>
      </source>
      <rom file='/mnt/user/domains/Gigabyte.GTX670.2048.120914.rom'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0' multifunction='on'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x83' slot='0x00' function='0x1'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x1'/>
    </hostdev>
    <memballoon model='none'/>
  </devices>
</domain>
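To sanity-check the CPU pinning in this XML, it helps to see which host CPUs end up sharing a guest core. The sketch below parses the domain XML with Python's standard `xml.etree.ElementTree`, and it assumes QEMU's usual ordering in which consecutive vCPU IDs fill the threads of one guest core before moving to the next (so with `threads='2'`, vCPUs 0 and 1 form guest core 0). Whether each resulting pair, e.g. host CPUs 16 and 36, really is a hyperthread pair on the host still has to be compared against `/sys/devices/system/cpu/cpu16/topology/thread_siblings_list` on the server itself. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def guest_core_groups(domain_xml: str) -> dict:
    """Group host cpusets by the guest core their vCPUs are presented as.

    Assumes vCPU n belongs to guest core n // threads, where 'threads'
    comes from <cpu><topology threads='...'/> (defaulting to 1).
    """
    root = ET.fromstring(domain_xml)
    topology = root.find("./cpu/topology")
    threads = int(topology.get("threads", "1")) if topology is not None else 1
    # Collect (vcpu, cpuset) pairs from <cputune><vcpupin .../></cputune>.
    pins = sorted(
        (int(pin.get("vcpu")), pin.get("cpuset"))
        for pin in root.iterfind("./cputune/vcpupin")
    )
    groups = {}
    for vcpu, cpuset in pins:
        groups.setdefault(vcpu // threads, []).append(cpuset)
    return groups
```

With the pinning and `cores='4' threads='2'` topology above, guest core 0 would map to host CPUs 16 and 36, core 1 to 17 and 37, and so on up to 19 and 39.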

tower-diagnostics-20200304-0627.zip

 

Screen Shot 2020-03-04 at 6.17.11 AM.png

EDIT:

 

On reboot, the GPU was not accessible from the Unraid UI. I removed the GPU and sound card entries, and the VM started without issue. Now to figure out what happened to my GPU, and why passing it through in Unraid causes a kernel panic.
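Removing the passthrough entries by hand can also be scripted. Below is a minimal sketch, again using the standard-library `ElementTree`, that drops every PCI `<hostdev>` whose source address is on a given host bus; `0x83` is the bus of the GTX 670 and its audio function in the XML above, and the function name is hypothetical. It only rewrites XML text; applying the result (for example through the Unraid VM editor or `virsh edit`) is a separate step.

```python
import xml.etree.ElementTree as ET


def strip_pci_hostdevs(domain_xml: str, host_bus: str) -> str:
    """Return the domain XML without PCI <hostdev> entries on host_bus."""
    root = ET.fromstring(domain_xml)
    devices = root.find("devices")
    if devices is None:
        return domain_xml
    # Iterate over a copy of the list so removal does not disturb the loop.
    for hostdev in list(devices.findall("hostdev")):
        source_addr = hostdev.find("./source/address")
        if (
            hostdev.get("type") == "pci"
            and source_addr is not None
            and source_addr.get("bus") == host_bus
        ):
            devices.remove(hostdev)
    return ET.tostring(root, encoding="unicode")
```

Matching on the whole bus removes both functions (video and HDMI audio) of the card in one pass, which mirrors removing the GPU and sound card entries together.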

  • 2 years later...

Hi,

 

May I ask what GPU or cards you have installed? 

 

We are trying to troubleshoot the same error with a K80, a Quadro 4000, and one LSI HBA card.

 

The system fails as soon as we start adding drives or put Unraid to any real use. The system will just die.

1 hour ago, neural said:

Hi,

 

May I ask what GPU or cards you have installed? 

 

We are trying to troubleshoot this same error with a K80, Quadro 4000 and one LSI HBA Card.

 

The system fails when we start to add any drives to the system or any real use of Unraid. The system will just die. 

I only had this issue with my GTX 670. GPU passthrough has worked for me with a Quadro RTX 4000 and an RTX 3080 FE.
