VMs just started crashing - Logs attached


All of my VMs just started crashing today. AFAIK nothing has changed with the VMs or Unraid. The rest of Unraid and multiple Docker containers are unaffected. I tried a VM I haven't started for months (so it's on a much older version of Windows 10) and it crashed the same way. One of them is using a RAW image, the other qcow2. Hardware is an i7-13700K with 64GB RAM.

I have tried searching the forums and Google, and rebooted the Unraid server. I tried updating the VirtIO drivers, expanding the vdisk to make sure it's not out of space, and disabling Windows Defender. I also changed which CPUs were assigned so that only the P-cores were assigned to the VM, but it didn't make a difference.

It seems to be OK as long as I just boot it and leave it idle. As soon as I run something that uses the CPU, it crashes. In particular I am running an old Java 8 based application that I use for my business. I doubt the particular app or Java is the root cause because this app is ancient and hasn't changed since 2017. I've been running this app on VMs on Unraid for years, and on this server for about 6 months, and it has run flawlessly until today.

I just created a new VM from scratch using a freshly downloaded Windows 10 ISO. I didn't install any Windows updates and only installed the aforementioned Java app. It crashed the same way as soon as I tried to run the Java program.

I'm not trying to pass through a GPU or sound device. All of my VMs are vanilla Windows 10 installs.

The last thing in the VM log when it crashed is the following, then a register dump. I don't think the spice-server bug is the issue because I've seen it on VMs that didn't crash. Syslog, VM log, and VM XML are attached.

I tried running the same Java app on a Linux VM and it froze but didn't crash to the point of doing a register dump. I could still connect with VNC, but everything was frozen and there was no CPU usage according to Unraid.

char device redirected to /dev/pts/1 (label charserial0)
qxl_send_events: spice-server bug: guest stopped, ignoring
error: kvm run failed Invalid argument

voyager-syslog-20230407-2319.zip UnraidVMlog.txt UnraidVM.xml

Edited by WalkerJ

You have a lot of split lock detections in your log. You can try setting split_lock_detect to off, so that "split-lock operations are not detected and nothing is done when they occur."

Just modify the append line of the syslinux configuration like this:

append initrd=/bzroot split_lock_detect=off

 

Not sure if this will make a difference.
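
To confirm after the reboot that the option actually reached the kernel, a minimal Python sketch like the one below could help. It assumes a Python 3 interpreter is available on the host and reads the standard Linux /proc/cmdline file; the helper names `kernel_param` and `running_cmdline` are made up for illustration.

```python
from pathlib import Path


def kernel_param(cmdline, name):
    """Look up one option on a kernel command line.

    Returns the value for `name=value`, an empty string for a bare
    flag, and None when the option is not present at all.
    """
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == name:
            return value if sep else ""
    return None


def running_cmdline(path="/proc/cmdline"):
    """Return the command line of the running kernel, or '' if unreadable."""
    p = Path(path)
    return p.read_text() if p.exists() else ""


if __name__ == "__main__":
    value = kernel_param(running_cmdline(), "split_lock_detect")
    if value is None:
        print("split_lock_detect is not set on the running kernel")
    else:
        print("split_lock_detect =", value)
```

If the option does not show up, the append line was probably edited in a boot entry other than the one that was actually booted.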

7 hours ago, ghost82 said:

You have a lot of split lock detections in your log. You can try setting split_lock_detect to off, so that "split-lock operations are not detected and nothing is done when they occur."

Just modify the append line of the syslinux configuration like this:

append initrd=/bzroot split_lock_detect=off

 

Not sure if this will make a difference.

 

I tried this and it got rid of the split lock messages in the log, but the VMs still crashed.
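
For anyone wanting to check the same thing on their own logs, here is a rough Python sketch that counts split lock messages and the "kvm run failed" line quoted above. The search patterns are assumptions (a loose match on "split lock" or "split_lock", plus the error string from the VM log), and `count_markers` is just an illustrative name.

```python
import re

# Loose patterns: these are assumptions, not exact kernel or QEMU formats.
PATTERNS = {
    "split_lock": re.compile(r"split[ _]lock", re.IGNORECASE),
    "kvm_run_failed": re.compile(r"kvm run failed", re.IGNORECASE),
}


def count_markers(lines):
    """Count how many lines match each pattern in PATTERNS."""
    counts = {name: 0 for name in PATTERNS}
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


if __name__ == "__main__":
    # The three VM log lines quoted in the first post.
    sample = [
        "char device redirected to /dev/pts/1 (label charserial0)",
        "qxl_send_events: spice-server bug: guest stopped, ignoring",
        "error: kvm run failed Invalid argument",
    ]
    print(count_markers(sample))
```

To run it against a real file, pass an open file object instead of the sample list, for example `count_markers(open("UnraidVMlog.txt", errors="replace"))`. A zero split lock count together with a nonzero "kvm run failed" count would match what is described here: the messages are gone but the crash remains.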

 


If I were you, I would first work out whether it crashes because of the host or because of the guest.

In the past I found it useful to analyze the Windows crash dumps with a utility called WhoCrashed, which showed that I had an issue with a network driver.

Maybe the dumps will reveal something useful...

10 hours ago, ghost82 said:

If I were you, I would first work out whether it crashes because of the host or because of the guest.

In the past I found it useful to analyze the Windows crash dumps with a utility called WhoCrashed, which showed that I had an issue with a network driver.

Maybe the dumps will reveal something useful...

 

It's crashing with both Windows and Linux guests (Ubuntu & Clear Linux). Some are old, others I just created from scratch. I don't think it's the guest, but I'm not sure where to go next.

 

I tried undervolting the CPU and setting a CPU power limit, but it didn't change anything. I also ran Memtest and there were no errors. 

 

I also tried limiting a VM to only 2 cores and enabling hugepages.

 

Update... I tried restoring a libvirt backup and moving the VM IMG to a different disk (from cache to array) - same result. 

 

I also tried running Prime95 instead of the app I usually use in these VMs. It crashed just the same with Prime95, so it's not something specific to the apps I'm using in the VM.

 

Update 2.... I updated Unraid to 6.12.0-rc2 and get the same result. 

 

Edited by WalkerJ