VM Stopped Working, Marking Clocksource "TSC" as unstable because skew is too large



Hello.

Every time I start a game on each of my 2 virtual machines at the same time, the complete system freezes and hangs with the following error. I have to restart the entire system to clear it.

The system runs perfectly if I only play on 1 VM.

 

Jul 18 16:39:00 Tower kernel: clocksource: timekeeping watchdog on CPU2: Marking clocksource 'tsc' as unstable because the skew is too large:
Jul 18 16:39:00 Tower kernel: clocksource: 'hpet' wd_now: f98f602d wd_last: f9338bb7 mask: ffffffff
Jul 18 16:39:00 Tower kernel: clocksource: 'tsc' cs_now: 58701fecccb cs_last: 586cfe2c152 mask: ffffffffffffffff
Jul 18 16:39:01 Tower kernel: clocksource: Switched to clocksource hpet
Jul 18 16:39:24 Tower kernel: hrtimer: interrupt took 420323087 ns
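
For anyone trying to read the watchdog lines above: the counter values can be turned into elapsed time. This is only a rough sketch, not something the kernel prints. It assumes the HPET ticks at the common 14.31818 MHz and that the TSC ticks at the CPU's nominal 4.0 GHz; neither frequency appears in the log, and the helper name `counter_delta` is made up for illustration.

```python
# Rough interpretation of the watchdog log lines above.
# ASSUMPTIONS (not stated in the log): HPET runs at 14.31818 MHz,
# TSC runs at the CPU's nominal 4.0 GHz.
HPET_HZ = 14_318_180
TSC_HZ = 4_000_000_000


def counter_delta(now_hex: str, last_hex: str, mask_hex: str) -> int:
    """Ticks elapsed between two counter readings, honouring wrap-around."""
    mask = int(mask_hex, 16)
    return (int(now_hex, 16) - int(last_hex, 16)) & mask


# Values copied from the 'hpet' and 'tsc' log lines.
hpet_ticks = counter_delta("f98f602d", "f9338bb7", "ffffffff")
tsc_ticks = counter_delta("58701fecccb", "586cfe2c152", "ffffffffffffffff")

hpet_seconds = hpet_ticks / HPET_HZ
tsc_seconds = tsc_ticks / TSC_HZ
skew_seconds = hpet_seconds - tsc_seconds

print(f"hpet: {hpet_ticks} ticks = {hpet_seconds:.3f} s")
print(f"tsc:  {tsc_ticks} ticks = {tsc_seconds:.3f} s")
print(f"skew: {skew_seconds:.3f} s")
```

Under those assumptions the HPET saw about 0.42 s pass between the two watchdog checks (close to the 420323087 ns in the hrtimer line), while the TSC accounted for only about half of that. That would fit a host that stalled for a moment rather than a slowly drifting clock, but it is a guess based on assumed frequencies.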

My Hardware:

M/B: Asrock 970 Extreme 4

CPU: AMD FX-8350 @ 4 GHz (no overclock)

RAM: 32 GB DDR3 @ 1333 MHz

Graphics card, host: GT 520

Graphics card, VM1 (slot 1): GTX 970

Graphics card, VM2 (slot 2): GTX 770

4 disks: 1 SSD (~120 GB) + 1 HDD (~3 TB) for VM1, 1 SSD (~120 GB) + 1 HDD (~3 TB) for VM2

 

I searched the forum for this error, and the only really "helpful" suggestion was to buy a new motherboard / CPU.

But how can I rule out the motherboard / CPU as the source of the error?

 

 

Sometimes this message also appears in the log:

Jul 18 16:45:42 Tower ntpd[1768]: kernel reports TIME_ERROR: 0x41: Clock Unsynchronized

 

tower-syslog-20170718-1702.zip

tower-diagnostics-20170718-1704.zip

Link to comment
  • 2 weeks later...
On 20.7.2017 at 8:28 AM, brando56894 said:

I had this happen multiple times in FreeNAS 10 while using bhyve as my hypervisor and running Linux in a VM. We never did find a solution. I think it just means that your hardware isn't powerful enough. I have a SuperMicro x10SDV-F0 which has a Xeon-D 1540 (16 cores at 2 GHz).

Do you think my CPU isn't powerful enough, or my motherboard? The error reads as if my CPU gets overclocked by something and then freezes.

Link to comment

8 cores at 4 GHz is pretty powerful and a lot more than what I currently have (16 cores at 2 GHz). Have you monitored your load averages and temperature periodically? It could be getting too hot. Also, what size is your PSU? It may not be able to deliver enough power under very high load.

 

Edit: After looking at your hardware again, the PSU could well be your issue, since you're running not one but two very power-hungry GPUs, along with 4 drives and a powerful CPU. Running a single game on a "bare metal" OS like Windows already requires a lot of power (depending on what game you're playing). Add two virtual machines to that, each running a game, and you have a recipe for a very power-hungry beast. Come to think of it, 8 cores at 4 GHz might not be enough for that heavy a load. What games crash the system? Is it something very intensive like The Witcher 3, GTA V, Watch Dogs 2, etc.? What resolution are you playing them at? We need to know all of these things to help narrow it down.

 

I think your system may simply just be overloaded, and your CPU can't keep up.

 

I'm waiting on my Asus X99-W to come in so I can swap that along with a water-cooled Xeon E5-1650 (6 cores at 3.6 GHz) in place of my current CPU and motherboard to see if the same thing still happens with much better hardware. 

Link to comment
  • 1 month later...
On 1.8.2017 at 3:30 AM, brando56894 said:

8 cores at 4 GHz is pretty powerful and a lot more than what I currently have (16 cores at 2 GHz). Have you monitored your load averages and temperature periodically? It could be getting too hot. Also, what size is your PSU? It may not be able to deliver enough power under very high load.

 


Hmm... I have an 850 W PSU in that machine. I will test how much power it draws under heavy load, and I will look at the CPU temperatures, but I don't think it gets too hot under heavy load.

I tested it with Windows 10 on both guests and games like Guild Wars 2, Arma 3, and Star Wars Battlefront 2.

Anything new on your setup?

 

Link to comment
  • 3 months later...

I got the same message, but I am not playing games. I am only running 3 lightly loaded VMs and 2-3 Docker containers such as Plex and CrashPlan.

I have 32 cores in total on dual Xeons at 2.6 GHz with 64 GB of memory, and a 1600 W Platinum PSU.

When I got the message, a parity check was also running (following tonight's crash).

So it doesn't look to me like heavy CPU use.

What I do have lately is that my system hangs once every few days. I am trying to isolate the issue. The logs don't say much; they don't capture the situation as it happens.

I don't know much about how to debug this.

Link to comment

Hi,

my system draws around 600 W under heavy load, so it looks like the PSU is not the problem either.

I thought this message only occurs on AMD CPUs... I have not found a solution yet. I only found a forum where people had the same problem on a Linux system and "fixed" it by editing the OS code to remove the message.

If you know more, please write it here.

Link to comment

Rather than replacing the motherboard/cpu, which seems like a rather drastic solution, is it possible to change the clock source in the go file itself?


It looks like Linux allows you to choose which clock source the kernel uses. Having it start with hpet or jiffies instead of tsc might be an option?

 

see https://www.kernel.org/doc/html/v4.10/admin-guide/kernel-parameters.html

 

Is it possible to explicitly set clocksource=hpet for the people having this issue?
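
As a side note for anyone trying this: on a running Linux host, the active and available clock sources can be read from sysfs, which is a quick way to see whether hpet is even offered before changing anything. A minimal sketch, assuming Python is available on the machine (a stock Unraid install may not include it); the function name `read_clocksources` is made up for illustration.

```python
from pathlib import Path
from typing import List, Tuple

# Standard sysfs location of the clocksource controls on Linux.
CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def read_clocksources(base: Path = CLOCKSOURCE_DIR) -> Tuple[str, List[str]]:
    """Return (current clocksource, list of available clocksources)."""
    current = (base / "current_clocksource").read_text().strip()
    available = (base / "available_clocksource").read_text().split()
    return current, available


if __name__ == "__main__":
    if CLOCKSOURCE_DIR.is_dir():
        current, available = read_clocksources()
        print(f"current:   {current}")
        print(f"available: {', '.join(available)}")
    else:
        print("clocksource sysfs directory not found (not a Linux host?)")
```

Writing one of the available names into `current_clocksource` as root switches the clock source at runtime, so that might be something the go file could do, though it is untested here.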

Link to comment
On 5.12.2017 at 5:24 PM, SnickySnacks said:

Rather than replacing the motherboard/cpu, which seems like a rather drastic solution, is it possible to change the clock source in the go file itself?



How can I test this?

Do I have to change this for the whole Unraid server or only for each VM? The XML definition of one of my VMs looks like this:

<domain type='kvm'>
  <name>Windows 10-770</name>
  <uuid>e39c2f3c-1bba-0d69-1a0f-1dbbcba80125</uuid>
  <metadata>
    <vmtemplate xmlns="unraid" name="Windows 10" icon="windows.png" os="windows10"/>
  </metadata>
  <memory unit='KiB'>8388608</memory>
  <currentMemory unit='KiB'>8388608</currentMemory>
  <memoryBacking>
    <nosharepages/>
  </memoryBacking>
  <vcpu placement='static'>4</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='4'/>
    <vcpupin vcpu='1' cpuset='5'/>
    <vcpupin vcpu='2' cpuset='6'/>
    <vcpupin vcpu='3' cpuset='7'/>
  </cputune>
  <os>
    <type arch='x86_64' machine='pc-i440fx-2.9'>hvm</type>
    <loader readonly='yes' type='pflash'>/usr/share/qemu/ovmf-x64/OVMF_CODE-pure-efi.fd</loader>
    <nvram>/etc/libvirt/qemu/nvram/e39c2f3c-1bba-0d69-1a0f-1dbbcba80125_VARS-pure-efi.fd</nvram>
  </os>
  <features>
    <acpi/>
    <apic/>
    <hyperv>
      <relaxed state='on'/>
      <vapic state='on'/>
      <spinlocks state='on' retries='8191'/>
      <vendor_id state='on' value='none'/>
    </hyperv>
  </features>
  <cpu mode='host-passthrough' check='none'>
    <topology sockets='1' cores='4' threads='1'/>
  </cpu>
  <clock offset='localtime'>
    <timer name='hypervclock' present='yes'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/local/sbin/qemu</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='writeback'/>
      <source file='/mnt/user/domains/Windows 10-770/vdisk1.img'/>
      <target dev='hdc' bus='virtio'/>
      <boot order='1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </disk>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='writeback'/>
      <source file='/mnt/disk4/Windows 10-770/vdisk2.img'/>
      <target dev='hdd' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </disk>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <source file='/mnt/user/isos/de_windows_10_pro_10240_x64_dvd.iso'/>
      <target dev='hda' bus='ide'/>
      <readonly/>
      <boot order='2'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <source file='/mnt/user/isos/virtio-win-0.1.126-2.iso'/>
      <target dev='hdb' bus='ide'/>
      <readonly/>
      <address type='drive' controller='0' bus='0' target='0' unit='1'/>
    </disk>
    <controller type='usb' index='0' model='ich9-ehci1'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x7'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci1'>
      <master startport='0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0' multifunction='on'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci2'>
      <master startport='2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x1'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci3'>
      <master startport='4'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pci-root'/>
    <controller type='ide' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
    </controller>
    <controller type='virtio-serial' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </controller>
    <interface type='bridge'>
      <mac address='52:54:00:eb:f3:da'/>
      <source bridge='br0'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </interface>
    <serial type='pty'>
      <target port='0'/>
    </serial>
    <console type='pty'>
      <target type='serial' port='0'/>
    </console>
    <channel type='unix'>
      <target type='virtio' name='org.qemu.guest_agent.0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <input type='mouse' bus='ps2'/>
    <input type='keyboard' bus='ps2'/>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x00' slot='0x14' function='0x2'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='usb' managed='no'>
      <source>
        <vendor id='0x046a'/>
        <product id='0x010d'/>
      </source>
      <address type='usb' bus='0' port='1'/>
    </hostdev>
    <hostdev mode='subsystem' type='usb' managed='no'>
      <source>
        <vendor id='0x18f8'/>
        <product id='0x0f97'/>
      </source>
      <address type='usb' bus='0' port='2'/>
    </hostdev>
    <memballoon model='virtio'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x09' function='0x0'/>
    </memballoon>
  </devices>
</domain>
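
For orientation: the `<clock>` section in the definition above configures the timers that are presented to the guest, which is a separate matter from the clock source the host kernel uses. The sketch below only lists those guest timers; it uses a shortened copy of the XML, and the function name `guest_timers` is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the domain XML above; only the <clock> section is kept.
domain_xml = """<domain type='kvm'>
  <clock offset='localtime'>
    <timer name='hypervclock' present='yes'/>
    <timer name='hpet' present='no'/>
  </clock>
</domain>"""


def guest_timers(xml_text: str) -> dict:
    """Map each <timer> name in the domain's <clock> to its 'present' value."""
    root = ET.fromstring(xml_text)
    return {t.get("name"): t.get("present") for t in root.findall("./clock/timer")}


print(guest_timers(domain_xml))
```

Here the guest is given the Hyper-V clock and no emulated HPET. The "Marking clocksource 'tsc' as unstable" message comes from the host log, so changing these guest timers would presumably not affect it.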

 

Link to comment

I haven't tested this, as I don't have this issue.

But given that the error is logged by the host kernel (the "Tower kernel:" lines), you'd likely need to set it for Unraid itself, not for each VM individually.

To test it, now that I'm looking at it, edit this file on your USB flash drive:

 

/syslinux/syslinux.cfg

and either create a new entry or add to the append line of an existing one

clocksource=hpet

 

So like

label unRAID OS
  menu default
  kernel /bzimage
  append initrd=/bzroot

 

would become

label unRAID OS
  menu default
  kernel /bzimage
  append initrd=/bzroot clocksource=hpet

 

It might work, do nothing, or keep the system from booting, but it's simple enough to undo either way.


It's worth a try, at least.
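
If you'd rather script the change than edit the file by hand, here is a minimal sketch of the same edit. It only transforms the text of a syslinux.cfg and does not write anything to the flash drive; the function name `add_kernel_param` is made up for illustration, and the example config is the entry shown above. Keep a backup of the original file before replacing it.

```python
def add_kernel_param(cfg_text: str, label: str, param: str) -> str:
    """Append `param` to the `append` line of the given syslinux label."""
    out = []
    in_label = False
    for line in cfg_text.splitlines():
        stripped = line.strip()
        if stripped.lower().startswith("label "):
            # Track whether we are inside the boot entry we want to change.
            in_label = stripped[6:].strip() == label
        elif in_label and stripped.lower().startswith("append "):
            # Add the parameter only once, so the function can be re-run safely.
            if param not in stripped.split():
                line = f"{line.rstrip()} {param}"
        out.append(line)
    return "\n".join(out) + "\n"


example = """label unRAID OS
  menu default
  kernel /bzimage
  append initrd=/bzroot
"""

print(add_kernel_param(example, "unRAID OS", "clocksource=hpet"))
```

Other boot entries in the file are left alone, so an unmodified entry stays available to boot from if the new parameter causes trouble.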

 

Link to comment

Soooo... I changed the config and it looks like it's working!

I tested it for around 1 hour under full load. The only message that still appears is this one (but it's only an info message):

 

Dec 7 09:39:15 Tower ntpd[1762]: kernel reports TIME_ERROR: 0x41: Clock Unsynchronized

Now I can try to get more performance out of my virtual machines. Thanks!

Link to comment
  • 3 years later...
On 12/7/2017 at 7:09 PM, elbro_dark said:

looks like its working

 

May I ask if this is still working?

This error hits me in one of two identical VMs. In my case the switch goes from clocksource tsc to clocksource acpi_pm. A running parity check now reports 4 days instead of 15 hours.

 

According to the old bug report, there are additional kernel parameters that might help:

 

tsc=reliable

 

https://bugzilla.kernel.org/show_bug.cgi?id=203183
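
Whichever parameter is tried (clocksource=hpet from earlier in the thread or tsc=reliable here), it helps to confirm after the reboot that the kernel actually received it. A minimal sketch that reads /proc/cmdline; the function name `parse_cmdline` is made up for illustration, and the parsing is simplistic (it does not handle quoted values containing spaces).

```python
from pathlib import Path
from typing import Dict


def parse_cmdline(cmdline: str) -> Dict[str, str]:
    """Split a kernel command line into {name: value}; bare flags map to ''."""
    params = {}
    for token in cmdline.split():
        name, _, value = token.partition("=")
        params[name] = value
    return params


if __name__ == "__main__":
    proc = Path("/proc/cmdline")
    if proc.exists():
        params = parse_cmdline(proc.read_text())
        for name in ("clocksource", "tsc"):
            print(f"{name}: {params.get(name, '(not set)')}")
    else:
        print("/proc/cmdline not found (not a Linux host?)")
```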

 

Link to comment
