Win 10 VM hangs when UPS goes to battery...


harshl


Having a strange issue with a new build.

 

I have a Threadripper 1950x on an ASRock Professional Gaming X399 MB running UNRAID 6.6.6. Passing through a Vega 56 (Video only, not the audio portion so resets work correctly), two of the onboard USB controllers and the onboard audio. VM works great for the most part.

 

Now a bit of background, I have a Brother all in one laser printer. Each time the printer comes out of standby, it puts enough load on the electrical circuit for my UPS to go to battery for about 2 seconds, then flip back to utility power.

 

The issue I am having is that each time this occurs and the UPS kicks to battery and back, my Win 10 VM locks up hard. I have to force stop it and reboot the entire server since the PCI devices didn't get released cleanly.

I don't see anything in the log except that the UPS did kick to battery and back:

Dec 19 15:57:04 TheCube kernel: mdcmd (40): spindown 1
Dec 19 18:50:33 TheCube apcupsd[22841]: Power failure.
Dec 19 18:50:35 TheCube apcupsd[22841]: Power is back. UPS running on mains.
Dec 19 18:53:00 TheCube apcupsd[22841]: Power failure.
Dec 19 18:53:02 TheCube apcupsd[22841]: Power is back. UPS running on mains.
Dec 19 20:27:49 TheCube emhttpd: req (5): csrf_token=****************&title=System+Log&cmd=%2FwebGui%2Fscripts%2Ftail_log&arg1=syslog
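
For anyone who wants to line these transfers up against the VM hangs, here is a minimal Python sketch that pairs each "Power failure" with the following "Power is back" and reports how long the UPS was on battery. It assumes the syslog format shown above; the function name `battery_events` and the `year` argument are my own (syslog lines carry no year), so treat it as an illustration rather than a finished tool.

```python
import re
from datetime import datetime

# Matches apcupsd lines in the syslog format shown above (an assumption:
# other syslog configurations may lay the line out differently).
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+apcupsd\[\d+\]:\s+(?P<msg>.*)$"
)


def battery_events(lines, year=2018):
    """Return a list of (start_time, seconds_on_battery) tuples."""
    events = []
    started = None
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not an apcupsd line
        ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
        msg = m["msg"]
        if msg.startswith("Power failure"):
            started = ts
        elif msg.startswith("Power is back") and started is not None:
            events.append((started, (ts - started).total_seconds()))
            started = None
    return events
```

Feeding it the lines of the syslog (for example `battery_events(open("syslog"))`, where the path is whatever your system uses) gives one entry per transfer to battery.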

Any thoughts on why this may happen or how I might resolve it, outside of trying to resolve the electrical sag? The UPS is a sinewave UPS that should be delivering clean power at all times, and nothing in UNRAID seems to be affected outside of the VM; dockers and everything else continue running.

 

I am going to try disabling the UPS daemon temporarily to see if that makes any difference, but I suspect it won't. Even if it does, it isn't a long-term fix, but it may point to something I can submit a bug report on.

 

Diagnostics attached. Thanks for any ideas!

-Landon

diagnostics-20181219-2149.zip


I would suggest you temporarily plug the Unraid server directly into mains, then check whether the UPS power flip still causes the VM to hang. This would identify whether the problem is on the hardware or the software side.

 

If the Brother AIO causes the power flip, you also need to troubleshoot that, otherwise the UPS battery life will be shortened. Please confirm that the UPS and the Brother are both earthed; if that still doesn't solve it, you may need to adjust the UPS sensitivity setting.
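
To see what the UPS currently reports before changing anything, the output of `apcaccess status` can be checked for the sensitivity and transfer-voltage fields. Below is a rough Python sketch; the field names (`SENSE`, `LOTRANS`, `HITRANS`, `NUMXFERS`, `LASTXFER`) are the ones apcupsd uses as far as I know, but not every UPS model reports all of them, and the helper names are made up for this example.

```python
import subprocess

# Fields related to transfer-to-battery behaviour. Assumption: your UPS
# model reports them; missing fields simply come back as None.
FIELDS = ("SENSE", "LOTRANS", "HITRANS", "NUMXFERS", "LASTXFER")


def parse_apcaccess(text):
    """Turn 'KEY   : value' lines into a dict."""
    status = {}
    for line in text.splitlines():
        # Split on the first colon only, so values such as dates stay whole.
        key, sep, value = line.partition(":")
        if sep:
            status[key.strip()] = value.strip()
    return status


def transfer_settings(status):
    """Pick out the fields that matter for sensitivity troubleshooting."""
    return {name: status.get(name) for name in FIELDS}


def read_apcaccess():
    """Run apcaccess on the host (requires a running apcupsd)."""
    result = subprocess.run(
        ["apcaccess", "status"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

On the server, `transfer_settings(parse_apcaccess(read_apcaccess()))` would show the current sensitivity and how many transfers to battery have been counted.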

Edited by Benson
14 hours ago, Benson said:

I would suggest you temporarily plug the Unraid server directly into mains, then check whether the UPS power flip still causes the VM to hang. This would identify whether the problem is on the hardware or the software side.

If the Brother AIO causes the power flip, you also need to troubleshoot that, otherwise the UPS battery life will be shortened. Please confirm that the UPS and the Brother are both earthed; if that still doesn't solve it, you may need to adjust the UPS sensitivity setting.

To be clear, the Brother printer is not plugged into the UPS, just into the same breaker/circuit in the house.

 

Tomorrow, I will test pulling the power on the unit and also letting the Brother printer cause the sag and see if there is any difference with and without the UPS daemon running.

Thanks!


Well, I was finally able to test again a bit yesterday. Running or not running the UPS daemon does not make a difference. Here are the logs for the VM after it happened and I force stopped it last time. The USB devices stay up, but it looks like it loses the video card for whatever reason.

 

Dec 25 19:36:48 TheCube kernel: br0: port 2(vnet0) entered disabled state
Dec 25 19:36:48 TheCube avahi-daemon[6418]: Interface vnet0.IPv6 no longer relevant for mDNS.
Dec 25 19:36:48 TheCube avahi-daemon[6418]: Leaving mDNS multicast group on interface vnet0.IPv6 with address fe80::fc54:ff:fe65:b60d.
Dec 25 19:36:48 TheCube kernel: device vnet0 left promiscuous mode
Dec 25 19:36:48 TheCube kernel: br0: port 2(vnet0) entered disabled state
Dec 25 19:36:48 TheCube avahi-daemon[6418]: Withdrawing address record for fe80::fc54:ff:fe65:b60d on vnet0.
Dec 25 19:36:49 TheCube kernel: AMD-Vi: Completion-Wait loop timed out
Dec 25 19:36:49 TheCube kernel: AMD-Vi: Completion-Wait loop timed out
Dec 25 19:36:49 TheCube kernel: AMD-Vi: Completion-Wait loop timed out
Dec 25 19:36:50 TheCube kernel: iommu ivhd0: AMD-Vi: Event logged [
Dec 25 19:36:50 TheCube kernel: iommu ivhd0: IOTLB_INV_TIMEOUT device=0b:00.0 address=0x000000107ec5c7a0]
Dec 25 19:36:50 TheCube kernel: iommu ivhd0: AMD-Vi: Event logged [
Dec 25 19:36:50 TheCube kernel: iommu ivhd0: IOTLB_INV_TIMEOUT device=0b:00.0 address=0x000000107ec5c7c0]
Dec 25 19:36:50 TheCube kernel: iommu ivhd0: AMD-Vi: Event logged [
Dec 25 19:36:50 TheCube kernel: iommu ivhd0: IOTLB_INV_TIMEOUT device=0b:00.0 address=0x000000107ec5c7e0]
Dec 25 19:36:50 TheCube kernel: vfio-pci 0000:0b:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=none
Dec 25 19:36:51 TheCube kernel: vfio-pci 0000:0b:00.0: Refused to change power state, currently in D3

Kind of looks to me like it has something to do with power management, but I really don't know.
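
In case it helps anyone compare logs, here is a small Python sketch that scans syslog lines for the two symptoms above: the AMD-Vi `IOTLB_INV_TIMEOUT` events and the vfio-pci "Refused to change power state, currently in D3" message. It groups them by PCI address. The regular expressions are written against the lines in this post only, and `summarize` is just a name picked for the example.

```python
import re
from collections import Counter

# Patterns based on the log lines quoted above; other kernel versions may
# word these messages differently (assumption).
IOTLB_RE = re.compile(
    r"IOTLB_INV_TIMEOUT device=(?P<dev>[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])"
)
D3_RE = re.compile(
    r"vfio-pci (?:[0-9a-f]{4}:)?(?P<dev>[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"Refused to change power state, currently in D3"
)


def summarize(lines):
    """Return (timeouts per device, set of devices stuck in D3)."""
    timeouts = Counter()
    stuck_in_d3 = set()
    for line in lines:
        m = IOTLB_RE.search(line)
        if m:
            timeouts[m["dev"]] += 1
        m = D3_RE.search(line)
        if m:
            stuck_in_d3.add(m["dev"])
    return timeouts, stuck_in_d3
```

Run against the excerpt above, both results point at the same device, `0b:00.0`, which is the passed-through video card.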

 

Simple for me to reproduce if anyone wants to see anything else at the time this occurs. It happens 100% of the time.

 

Thanks for any ideas!

15 hours ago, harshl said:

Refused to change power state, currently in D3

That's the error message you get when the reset bug happens. Can you please check if your VM power settings are set to performance and not to balanced. To me it looks like your VM goes into a standby or sleep state. This could happen because it's idling for too long or the UPS gives a signal to the controller software; both could be the reason. Make sure to disable any UPS software inside the VM and disable sleep.

 

And why the hell are you connecting your laser printer to a UPS? To shut it down safely during a power outage? These devices pull a lot of power when heating up or during a print job. You don't really need to connect it to a UPS. The main reason for a UPS is to safely shut down your server and prevent a hard reset.

8 hours ago, bastl said:

Can you please check if your VM power settings are set to performance and not to balanced.

I followed SpaceInvaderOne's video for optimizing the VM. It is for sure set to Performance. I will check to see if any other Sleep settings are enabled that I can find.

 

8 hours ago, bastl said:

This could happen because it's idling for too long or the UPS gives a signal to the controller software; both could be the reason.

I can have the VM sit idle for days and this won't occur, but the second someone prints, it happens. The USB for the UPS is not passed through to the guest, it is only seen by the host and is recognized and queried properly by UNRAID. Shouldn't be any way for the VM to communicate with it.

 

8 hours ago, bastl said:

And why the hell are you connecting your laser printer to a UPS?

I'm not. As I clarified in my previous post, the printer is not plugged into the UPS; it is simply on the same electrical circuit (breaker) and causes enough sag on the circuit for the UPS to flip to battery for two seconds, then back to utility. It isn't even enough sag to shut anything down. The XBOX, TV and other computers in the room all keep functioning just fine, but the UPS is sensitive enough that it kicks over briefly.

 

Appreciate the pointers @bastl, hopefully I will find something. In the meantime, I have ordered an extension cable so I can run the printer power to a nearby dedicated circuit so it won't affect this machine anymore, but I sure would like to understand why this happens in the first place, because it shouldn't.

 

Thanks again!

Edited by harshl
Added context.

@harshl  Sorry, I misread something. I thought your printer was connected to the UPS. As mentioned earlier, your UPS might be set too sensitive to power fluctuations. I remember somewhat similar behaviour in a small office a couple of years ago. From time to time, the server they had was randomly shut down by the UPS. It was in a fairly old building with really old wiring. After months of searching for the cause, the issue was found: the neighbour sometimes used welding equipment in the basement, which made the UPS think the power wasn't clean enough and triggered a server shutdown.

 

Btw, don't assume that the settings you made in Windows stay in place until you change them yourself. With nearly every Windows update (though not all) I have to reapply the MSI fix. The same goes for the power settings. Twice now, the power plan has been changed back from high performance to balanced. This happened on a test VM that I occasionally start up, update, and shut down again, as well as on my main VM.
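
A quick way to notice when the plan has been flipped back is to check the active scheme inside the Windows guest after each update. Here is a minimal Python sketch that parses the output of `powercfg /getactivescheme` and compares the GUID with the built-in High performance plan. The function names are just for this example, and it assumes the stock High performance plan is in use rather than a custom or vendor plan with a different GUID.

```python
import re
import subprocess

# GUID of the built-in Windows "High performance" plan. A custom or
# vendor-supplied plan will have a different GUID (assumption: stock plan).
HIGH_PERFORMANCE = "8c5e7fda-e8bf-4a96-9a85-a6e23a8c635c"

GUID_RE = re.compile(r"[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}")


def active_scheme_guid(powercfg_output):
    """Extract the scheme GUID from 'powercfg /getactivescheme' output."""
    m = GUID_RE.search(powercfg_output)
    return m.group(0).lower() if m else None


def is_high_performance(powercfg_output):
    """True if the active plan is the built-in High performance plan."""
    return active_scheme_guid(powercfg_output) == HIGH_PERFORMANCE


def query_powercfg():
    """Run powercfg inside the Windows guest and return its output."""
    result = subprocess.run(
        ["powercfg", "/getactivescheme"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Comparing the GUID rather than the plan name keeps the check independent of the Windows display language. Inside the guest, `is_high_performance(query_powercfg())` returns `False` if an update has switched the plan back to Balanced.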
