elbro_dark Posted July 18, 2017

Hello. Every time I start two games on my virtual machines, the whole system freezes and hangs with the following error, and I have to restart the entire system to clear it. The system runs perfectly if I only play on one VM.

Jul 18 16:39:00 Tower kernel: clocksource: timekeeping watchdog on CPU2: Marking clocksource 'tsc' as unstable because the skew is too large:
Jul 18 16:39:00 Tower kernel: clocksource: 'hpet' wd_now: f98f602d wd_last: f9338bb7 mask: ffffffff
Jul 18 16:39:00 Tower kernel: clocksource: 'tsc' cs_now: 58701fecccb cs_last: 586cfe2c152 mask: ffffffffffffffff
Jul 18 16:39:01 Tower kernel: clocksource: Switched to clocksource hpet
Jul 18 16:39:24 Tower kernel: hrtimer: interrupt took 420323087 ns

My hardware:
M/B: ASRock 970 Extreme4
CPU: AMD FX-8350 @ 4 GHz (no overclock)
RAM: 32 GB DDR3 @ 1333 MHz
Graphics card, host: GT 520
Graphics card, VM1 (slot 1): GTX 970
Graphics card, VM2 (slot 2): GTX 770
4 disks: 1 SSD ~120 GB + ~3 TB HDD for VM1, 1 SSD ~120 GB + ~3 TB HDD for VM2

I searched the forum for this error, and the only really "helpful" suggestion was to buy a new motherboard/CPU. But how can I rule out the motherboard or CPU as the source of the error?

Sometimes this message also appears in the log:

Jul 18 16:45:42 Tower ntpd[1768]: kernel reports TIME_ERROR: 0x41: Clock Unsynchronized

Attachments: tower-syslog-20170718-1702.zip, tower-diagnostics-20170718-1704.zip
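For anyone trying to read the watchdog lines in the post above: each line gives two raw counter readings and a mask, and the kernel compares how far each counter advanced between the two readings. Below is a minimal sketch of that arithmetic. The function name `counter_deltas` is made up for illustration, and the two frequencies are assumptions (14.318180 MHz is a common HPET rate but depends on the hardware, and 4 GHz is simply the CPU's nominal clock), so the resulting seconds are only a rough guide.

```python
import re

# Assumed counter frequencies; neither value comes from the log itself.
HPET_HZ = 14_318_180        # common HPET rate, hardware-dependent
TSC_HZ = 4_000_000_000      # assumes the TSC ticks at the nominal 4 GHz

# Matches both the watchdog line (wd_now/wd_last) and the clocksource line (cs_now/cs_last).
LINE_RE = re.compile(
    r"clocksource: '(?P<name>\w+)' (?:wd|cs)_now: (?P<now>[0-9a-f]+) "
    r"(?:wd|cs)_last: (?P<last>[0-9a-f]+) mask: (?P<mask>[0-9a-f]+)"
)

LOG = """\
Jul 18 16:39:00 Tower kernel: clocksource: 'hpet' wd_now: f98f602d wd_last: f9338bb7 mask: ffffffff
Jul 18 16:39:00 Tower kernel: clocksource: 'tsc' cs_now: 58701fecccb cs_last: 586cfe2c152 mask: ffffffffffffffff
"""


def counter_deltas(log_text):
    """Return {clocksource name: ticks elapsed between the two readings}."""
    deltas = {}
    for m in LINE_RE.finditer(log_text):
        now, last, mask = (int(m[key], 16) for key in ("now", "last", "mask"))
        # Masked subtraction stays correct even if the counter wrapped around.
        deltas[m["name"]] = (now - last) & mask
    return deltas


if __name__ == "__main__":
    deltas = counter_deltas(LOG)
    print("hpet elapsed (s, assumed rate):", deltas["hpet"] / HPET_HZ)
    print("tsc elapsed (s, assumed rate):", deltas["tsc"] / TSC_HZ)
```

If the two elapsed times disagree by more than the kernel's tolerance, the clocksource under test (here `tsc`) is marked unstable and the kernel falls back to the watchdog source, which is the switch to `hpet` seen in the log.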
brando56894 Posted July 20, 2017

I had this happen multiple times in FreeNAS 10 while using bhyve as my hypervisor and running Linux in a VM. We never did find a solution. I think it just means that your hardware isn't powerful enough. I have a SuperMicro x10SDV-F0, which has a Xeon-D 1540 (16 cores at 2 GHz).
elbro_dark Posted July 31, 2017 Author

On 20.7.2017 at 8:28 AM, brando56894 said: I think it just means that your hardware isn't powerful enough.

Do you think it is my CPU or my motherboard that isn't powerful enough? The error reads as if something overclocks my CPU and it then freezes.
brando56894 Posted August 1, 2017 (edited)

8 cores at 4 GHz is pretty powerful and a lot more than what I currently have (16 cores at 2 GHz). Have you monitored your load averages and temperatures periodically? It could be getting too hot. Also, what size is your PSU? It may not be supplying enough power under very high load.

Edit: After looking at your hardware again, this could well be your issue, since you're running not one but two very power-hungry GPUs, along with 4 drives and a powerful CPU. Running a single game on a "bare metal" OS like Windows requires a lot of power (depending on what game you're playing); add two layers of virtualization to that AND two games running at once, and you have a recipe for a very power-hungry beast. Come to think of it, 8 cores at 4 GHz might not be enough for that heavy a load.

What games crash the system? Is it something very intensive like The Witcher 3, GTA V, Watch Dogs 2, etc.? What resolution are you playing them at? We need to know all of these things to help narrow it down. I think your system may simply be overloaded, and your CPU can't keep up.

I'm waiting on my Asus X99-W to come in so I can swap that, along with a water-cooled Xeon E5-1650 (6 cores at 3.6 GHz), in place of my current CPU and motherboard to see if the same thing still happens with much better hardware.

Edited August 1, 2017 by brando56894
elbro_dark Posted September 4, 2017 Author

On 1.8.2017 at 3:30 AM, brando56894 said: Have you monitored your load averages and temperature periodically? It could be getting too hot, also what size is your PSU? What games crash the system?

Hmm... I have an 850 W PSU in that machine. I will test how much power it draws under heavy load, and I will check the CPU temperatures, but I don't think it gets too hot under heavy load.

I tested it with Windows 10 on both guests and games like Guild Wars 2, Arma 3, and Star Wars Battlefront 2.

Anything new on your setup?
dadarara Posted December 5, 2017

I got the same message, but I am not playing games. I am only running 3 VMs that are not heavily loaded, plus 2-3 Docker containers like Plex and CrashPlan. I have 32 cores in total on dual Xeons at 2.6 GHz with 64 GB of memory, and a 1600 W Platinum PSU.

When I got the message, a parity check was also running (following tonight's crash), so it doesn't look to me like heavy CPU use is the cause.

What I do have lately is that my system hangs once every few days. I am trying to isolate the issue. The logs don't say much, because they don't capture the situation in real time. I don't know much about how to debug this.
elbro_dark Posted December 5, 2017 Author

Hi, my system draws around 600 W under heavy load, so it looks like the PSU is not the problem either. I thought this message only appeared with AMD CPUs...

I have not found a solution yet. I only found a forum where people had the same problem on a Linux system and "fixed" it by patching the OS code to remove this message.

If you learn more, please write it here.
SnickySnacks Posted December 5, 2017

Rather than replacing the motherboard/CPU, which seems like a rather drastic solution, is it possible to change the clock source in the go file itself? It looks like Linux allows you to choose which clock source the kernel uses. Having it start with hpet or jiffies instead of tsc might be an option.

See https://www.kernel.org/doc/html/v4.10/admin-guide/kernel-parameters.html

Is it possible to explicitly set clocksource=hpet for the people having this issue?
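Before changing anything, it may help to check which clock source the host kernel is currently using and which ones it offers. On Linux this is exposed through sysfs under `/sys/devices/system/clocksource/clocksource0/`. The sketch below just reads those two files; the function name `read_clocksources` is made up for illustration, and it assumes it is run on the Unraid host itself, not inside a VM.

```python
from pathlib import Path

# Standard Linux sysfs location for the system clocksource.
SYSFS_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def read_clocksources(sysfs_dir=SYSFS_DIR):
    """Return (current clocksource, list of available clocksources)."""
    base = Path(sysfs_dir)
    current = (base / "current_clocksource").read_text().strip()
    available = (base / "available_clocksource").read_text().split()
    return current, available


if __name__ == "__main__":
    current, available = read_clocksources()
    print("current:  ", current)
    print("available:", ", ".join(available))
```

If `hpet` does not show up in the available list, booting with clocksource=hpet is unlikely to have the intended effect.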
elbro_dark Posted December 6, 2017 Author

On 5.12.2017 at 5:24 PM, SnickySnacks said: Is it possible to explicitly set clocksource=hpet for the people having this issue?

How can I test this? Do I have to change the whole Unraid server, or only each VM? The XML definition for one of my VMs looks like this:

<domain type='kvm'>
  <name>Windows 10-770</name>
  <uuid>e39c2f3c-1bba-0d69-1a0f-1dbbcba80125</uuid>
  <metadata>
    <vmtemplate xmlns="unraid" name="Windows 10" icon="windows.png" os="windows10"/>
  </metadata>
  <memory unit='KiB'>8388608</memory>
  <currentMemory unit='KiB'>8388608</currentMemory>
  <memoryBacking>
    <nosharepages/>
  </memoryBacking>
  <vcpu placement='static'>4</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='4'/>
    <vcpupin vcpu='1' cpuset='5'/>
    <vcpupin vcpu='2' cpuset='6'/>
    <vcpupin vcpu='3' cpuset='7'/>
  </cputune>
  <os>
    <type arch='x86_64' machine='pc-i440fx-2.9'>hvm</type>
    <loader readonly='yes' type='pflash'>/usr/share/qemu/ovmf-x64/OVMF_CODE-pure-efi.fd</loader>
    <nvram>/etc/libvirt/qemu/nvram/e39c2f3c-1bba-0d69-1a0f-1dbbcba80125_VARS-pure-efi.fd</nvram>
  </os>
  <features>
    <acpi/>
    <apic/>
    <hyperv>
      <relaxed state='on'/>
      <vapic state='on'/>
      <spinlocks state='on' retries='8191'/>
      <vendor_id state='on' value='none'/>
    </hyperv>
  </features>
  <cpu mode='host-passthrough' check='none'>
    <topology sockets='1' cores='4' threads='1'/>
  </cpu>
  <clock offset='localtime'>
    <timer name='hypervclock' present='yes'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/local/sbin/qemu</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='writeback'/>
      <source file='/mnt/user/domains/Windows 10-770/vdisk1.img'/>
      <target dev='hdc' bus='virtio'/>
      <boot order='1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </disk>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='writeback'/>
      <source file='/mnt/disk4/Windows 10-770/vdisk2.img'/>
      <target dev='hdd' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </disk>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <source file='/mnt/user/isos/de_windows_10_pro_10240_x64_dvd.iso'/>
      <target dev='hda' bus='ide'/>
      <readonly/>
      <boot order='2'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <source file='/mnt/user/isos/virtio-win-0.1.126-2.iso'/>
      <target dev='hdb' bus='ide'/>
      <readonly/>
      <address type='drive' controller='0' bus='0' target='0' unit='1'/>
    </disk>
    <controller type='usb' index='0' model='ich9-ehci1'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x7'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci1'>
      <master startport='0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0' multifunction='on'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci2'>
      <master startport='2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x1'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci3'>
      <master startport='4'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pci-root'/>
    <controller type='ide' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
    </controller>
    <controller type='virtio-serial' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </controller>
    <interface type='bridge'>
      <mac address='52:54:00:eb:f3:da'/>
      <source bridge='br0'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </interface>
    <serial type='pty'>
      <target port='0'/>
    </serial>
    <console type='pty'>
      <target type='serial' port='0'/>
    </console>
    <channel type='unix'>
      <target type='virtio' name='org.qemu.guest_agent.0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <input type='mouse' bus='ps2'/>
    <input type='keyboard' bus='ps2'/>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x00' slot='0x14' function='0x2'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='usb' managed='no'>
      <source>
        <vendor id='0x046a'/>
        <product id='0x010d'/>
      </source>
      <address type='usb' bus='0' port='1'/>
    </hostdev>
    <hostdev mode='subsystem' type='usb' managed='no'>
      <source>
        <vendor id='0x18f8'/>
        <product id='0x0f97'/>
      </source>
      <address type='usb' bus='0' port='2'/>
    </hostdev>
    <memballoon model='virtio'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x09' function='0x0'/>
    </memballoon>
  </devices>
</domain>
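On the "server or VM" question: the `<clock>` element in the VM XML only controls which timers are exposed to the guest, which is a separate thing from the clock source the host kernel uses. A small sketch for listing a guest's timer settings is below. It uses only the `<clock>` fragment from the XML above, and `guest_timers` is a made-up helper name, not part of libvirt.

```python
import xml.etree.ElementTree as ET

# Trimmed-down copy of the domain XML from the post; only <clock> is kept.
DOMAIN_XML = """\
<domain type='kvm'>
  <name>Windows 10-770</name>
  <clock offset='localtime'>
    <timer name='hypervclock' present='yes'/>
    <timer name='hpet' present='no'/>
  </clock>
</domain>
"""


def guest_timers(domain_xml):
    """Return {timer name: value of its 'present' attribute} for one domain."""
    root = ET.fromstring(domain_xml)
    return {t.get("name"): t.get("present") for t in root.findall("./clock/timer")}


if __name__ == "__main__":
    for name, present in guest_timers(DOMAIN_XML).items():
        print(f"{name}: present={present}")
```

Here the guest has `hpet` disabled and the Hyper-V clock enabled, and none of that changes which clock source the host picked at boot.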
SnickySnacks Posted December 7, 2017

I haven't tested this, as I don't have this issue. But given that the error is generated by the Tower kernel (the host), it's likely you'd need to set it for Unraid, not for each VM individually.

Now that I am looking at it, what you'd need to do to test is edit the file /syslinux/syslinux.cfg on your USB stick and either create a new entry or add clocksource=hpet to the append line of an existing one.

So an entry like

label unRAID OS
  menu default
  kernel /bzimage
  append initrd=/bzroot

would become

label unRAID OS
  menu default
  kernel /bzimage
  append initrd=/bzroot clocksource=hpet

It might work, do nothing, or not boot, but it's simple enough to undo either way. It's worth a try, at least.
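The edit described above can also be sketched as a small script, in case someone wants to apply it the same way on more than one stick. This is only an illustration that works on the text of the file: `add_kernel_param` is a made-up helper, the second menu entry in `SAMPLE` is hypothetical, and the real path to syslinux.cfg on your flash drive is left for you to fill in. Keep a backup copy of the original file.

```python
def add_kernel_param(cfg_text, label="unRAID OS", param="clocksource=hpet"):
    """Append `param` to the append line of the entry named `label`.

    Other entries are left alone, and nothing is added if the parameter
    is already present, so running it twice is harmless.
    """
    out = []
    in_target = False
    for line in cfg_text.splitlines():
        words = line.split()
        if words and words[0].lower() == "label":
            # Everything after the keyword is the entry's name.
            in_target = line.strip()[len(words[0]):].strip() == label
        elif (in_target and words and words[0].lower() == "append"
              and param not in words[1:]):
            line = f"{line.rstrip()} {param}"
        out.append(line)
    return "\n".join(out) + "\n"


# First entry is the one from the post; the second is a hypothetical extra entry.
SAMPLE = """\
label unRAID OS
  menu default
  kernel /bzimage
  append initrd=/bzroot
label Hypothetical Other Entry
  kernel /bzimage
  append initrd=/bzroot
"""

if __name__ == "__main__":
    print(add_kernel_param(SAMPLE), end="")
```

Creating a separate menu entry with the extra parameter, as suggested above, is the more cautious route, because the unmodified entry stays available if the new one fails to boot.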
elbro_dark Posted December 7, 2017 Author

We will see. I will test it today.
elbro_dark Posted December 7, 2017 Author (edited)

So... I changed the config, and it looks like it's working! I tested it for around 1 hour under full load, and the only message that appears is this one (but it is only informational):

Dec 7 09:39:15 Tower ntpd[1762]: kernel reports TIME_ERROR: 0x41: Clock Unsynchronized

Now I can try to get more performance out of my virtual machines. Thanks!

Edited December 7, 2017 by elbro_dark
hawihoney Posted April 15, 2021

On 12/7/2017 at 7:09 PM, elbro_dark said: looks like its working

May I ask if this is still working? This error hits me in one of two identical VMs. In my case the switch goes from clocksource tsc to clocksource acpi_pm. A running parity check now reports 4 days instead of 15 hours.

According to an old bug report, there are additional kernel parameters that might help, such as tsc=reliable:

https://bugzilla.kernel.org/show_bug.cgi?id=203183
elbro_dark Posted April 15, 2021 Author

Sorry, I have since changed my hardware to a dual Xeon X5680, because the FX-8350 did not have enough cores.