[6.3.3] rebooting win10 VM causes frozen host


Description:
Rebooting a Windows 10 VM makes the unRAID host machine 100% inaccessible. The web GUI, SSH, FTP, SMB shares, the console, you name it: everything times out.

Settings that do NOT affect reproduction:

Running unRAID in safe mode and/or GUI mode.

 

How to reproduce:

Create a Windows 10 VM from the template (see attached), using the same primary vDisk.

Note: Windows will report an activation error at this point.

Disable the Windows sleep settings in the VM to prevent the VM from pausing.

Leave the VM running for at least 24 hours at the Windows 10 desktop (logged in).

With the mouse, click Start -> Power -> Restart. (Shut down does not cause the timeout.)

About 1 out of 2 times, the VM restart causes the host to time out.

When this happens, you are left with no choice but to press and hold the power button.

After the unRAID host reboots, SSH into the server and cd into /etc/libvirt/qemu/nvram/

ls -l

Verify that the NVRAM file of the VM that triggered the crash is now 0 bytes.

{UUID}_VARS-pure-efi.fd will be zero bytes, where {UUID} is the UUID of that Windows 10 VM.

Try to start the VM and you will get an execution error saying something about QEMU pflash not being allowed to be zero bytes (pc system firmware cannot have zero size).
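The check in the last few steps can be scripted. This is only a minimal sketch: the function name `list_empty_nvram` is made up for illustration, and it assumes the NVRAM files live in the directory given above and follow the `{UUID}_VARS-pure-efi.fd` naming from this post.

```shell
# Print every UEFI variable store in the given directory that is exactly 0 bytes.
# The default directory is the one named in this post; pass another path to override it.
list_empty_nvram() {
    local nvram_dir="${1:-/etc/libvirt/qemu/nvram}"
    # -size 0c matches files whose size is exactly zero bytes
    find "$nvram_dir" -maxdepth 1 -type f -name '*_VARS-pure-efi.fd' -size 0c
}
```

Calling `list_empty_nvram` with no arguments after the forced reboot should print the path of any VM whose NVRAM file was truncated. No output means no zero-byte files were found.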

 

Temp Fix:

Create a new VM with the same specs, giving it the same virtual hard drive and ISO.

The problem with the temp fix:

It costs a new Windows 10 license every time unRAID generates a new UUID. It's not as simple as taking the file of a working UUID and renaming it to the old UUID whose file was corrupted to zero bytes. That causes a BSOD with BAD_SYSTEM_CONFIG_INFO.

<uuid>UUID-HERE</uuid>

Other information:

I am forced to press and hold the power button of the physical server because it does not respond at all. When the unRAID OS comes back up, it is unable to start the VM that caused the timeout because /etc/libvirt/qemu/nvram/(UNIQUE VM ID)_VARS-pure-efi.fd is zero bytes. Starting the VM then produces an execution error saying something about pflash not being allowed to be zero bytes.

PLEASE NOTE:

I am very frustrated with unRAID at this point! I've lost 3 Windows 10 Pro licenses to this issue. A digital entitlement license does not help for Windows 10 because you can only transfer it once.

 

After rebooting the VM, you will see the guest boot Windows and then freeze 1-10 seconds in.

The host times out about 30 seconds after the guest freezes.

This only occurs when rebooting a Windows 10 VM. I've run my VM for 14 days just fine until a reboot.

 

 

Any help would be greatly appreciated,

 

Thank you,

Kevin

 

win10 template.PNG

tower-diagnostics-20170516-0633.zip

Link to post

One thing to note right off the bat is the sheer number of call traces showing up in your logs due to btrfs errors. Until those are resolved, I don't think we should even waste time worrying about the VM because the two may be related. Prior to creating the VM, did this system have any stability issues? How about hard shutdowns? Is the data currently on the cache pool of vital importance?

 

If it were my system, I'd want to start clean with a new cache pool. This means wiping the filesystem off each of the devices participating in the pool. To do this, you will need to stop the array and log in to the server via SSH. From there, identify your cache disks by their sdX identifier (you can see this on the Main tab). For each disk, type the following command:


WARNING: THIS WILL COMPLETELY WIPE THE FILESYSTEM OFF THE DEVICE AND DATA RECOVERY WILL BE NEARLY IMPOSSIBLE ONCE DONE, SO MAKE SURE YOU GET THE DEVICE LETTER CORRECT!!

wipefs -a /dev/sdX

Replace X with the letter for your device.
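Since getting the letter wrong is unrecoverable, it may help to preview the wipe first. The sketch below is not part of the original advice: `preview_wipe` is a hypothetical helper name, and it assumes the cache disks show up as whole-disk /dev/sdX devices as described above. It relies on `lsblk` to show which disk the name refers to and on `wipefs --no-act`, which reports what would be erased without writing to the device.

```shell
# Dry run: show which disk /dev/sdX is and which signatures wipefs would erase.
# Nothing is written to the device.
preview_wipe() {
    local dev="$1"
    # Accept only whole-disk names such as /dev/sdb or /dev/sdab (no partitions)
    case "$dev" in
        /dev/sd[a-z]|/dev/sd[a-z][a-z]) ;;
        *) echo "refusing: '$dev' is not of the form /dev/sdX" >&2; return 1 ;;
    esac
    # The name must refer to an existing block device
    [ -b "$dev" ] || { echo "refusing: $dev is not a block device" >&2; return 1; }
    # Compare size, model, and serial against the disk listed on the Main tab
    lsblk -o NAME,SIZE,MODEL,SERIAL,FSTYPE "$dev"
    # --no-act performs everything except the actual write
    wipefs --no-act --all "$dev"
}
```

Run `preview_wipe /dev/sdX` for each cache disk and run the real `wipefs -a` only once the model and serial match the device you intend to wipe.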

 

Once this is done, refresh the webGui on the Main tab and the cache devices should show up with a blue indicator instead of green.  If so, you can start the array and format the cache pool once again.

 

From there, try creating another VM and see if the issues persist.  This could be hardware related or a kernel bug, but without more info and further testing/troubleshooting, we won't know.

Link to post
  • 2 weeks later...
On 5/17/2017 at 2:00 PM, jonp said:

One thing to note right off the bat is the sheer number of call traces showing up in your logs due to btrfs errors. [...]

Hello jonp,

Thank you for the response. I have a couple of questions before I perform this wipefs command.

 

Is there a way to perform a UUID backup of a VM?

  •     I ask because Windows 10 activation seems to be linked to the UUID of the VM. I have backed up the vDisk.img of my Windows 10 VM with its currently active license.

If so, which other files should I back up before wiping my cache pool clean?

Is it possible to create a new VM with a previous UUID from before a cache pool wipe?

 

Thank you,

Kevin

Link to post

You should save the XML, which includes the UUID. You can do this by using the Edit XML option and then copying and pasting the contents into a file that you save. When you later create a new VM, you can copy/paste the UUID into the XML for the new VM, or (probably easier) create the new VM from the saved XML and then make any changes you want.
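As an alternative to copy/paste, the XML can also be saved from an SSH session with `virsh dumpxml`, and the UUID read back out of the saved file. This is a rough sketch only: the VM name, the backup path, and the helper name `extract_uuid` are placeholders chosen for illustration, not values from this thread.

```shell
# Print the UUID stored in a saved libvirt domain XML file.
extract_uuid() {
    # Prints the contents of the first <uuid>...</uuid> element found in the file
    sed -n 's:.*<uuid>\(.*\)</uuid>.*:\1:p' "$1" | head -n 1
}

# Example usage (hypothetical VM name and backup location):
#   virsh dumpxml "Windows 10" > /boot/vm-backup/win10.xml
#   extract_uuid /boot/vm-backup/win10.xml
```

Keep the saved file somewhere that is not on the cache pool so it survives the wipe.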

Link to post
13 hours ago, itimpi said:

You should save the XML, which includes the UUID. You can do this by using the Edit XML option and then copying and pasting the contents into a file that you save. When you later create a new VM, you can copy/paste the UUID into the XML for the new VM, or (probably easier) create the new VM from the saved XML and then make any changes you want.

THANK YOU :D

I solved this issue exactly as you said: I copied and pasted the XML, ran wipefs on all cache drives, and then created a custom VM from that saved XML.

 

Thanks again lime-tech community! :)

 
