VM crashing continuously...



Hey guys, odd issue with VMs. I have been running two Unraid servers for around 2 years now without issue. One is solely storage for my client data (TITAN), the other is my personal media server (ASGARD). Both have compatible hardware for VMs. ASGARD is running an i7 and has been running a Win 10 VM for 2 years without issue; I use it strictly for compressing files. About a month ago, the VM started randomly shutting off for no apparent reason.

Sometimes it will come back on, sometimes it will just come up as failed when I try to start it. When it does come back on, it will last anywhere from 5 minutes to 5 hours before it shuts down again with a complete crash.

On a whim, thinking maybe the SSD was failing, I pulled the VM SSD (set up using Unassigned Devices) from ASGARD and moved it over to TITAN, pointed a VM to it, and booted it up. Runs perfectly. Ran it for a week, compressed files, no issue whatsoever. So... I'm at a loss.

What the hell would cause the VM to do this on one machine but not the other? I'm also not sure what info would help here, as I've only posted on forums a handful of times, so if you need me to upload logs or screenshots, let me know. Any help would be greatly appreciated, as TITAN is supposed to be for client data only and I don't want to use it as my compression machine and mix files.

Edited by Malachi89

Usually this is something filling up, like the SSD or something else...  I had something like this when a repeated motherboard error kept filling up my syslog file; once it was full, the VMs would just lock up with no notification...

 

Most likely something similar is happening here...

 

If that doesn't help, we need diagnostics files, and the VM XML file...
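A quick way to test the "something filling up" theory is to check how full the relevant filesystems are. Here is a minimal Python sketch, assuming Python is available on the server and that the syslog lives under `/var/log`; the paths, the 90% threshold, and the function names are placeholders for illustration, not anything specific to Unraid.

```python
import os
import shutil


def percent_used(path):
    """Return how full the filesystem containing `path` is, as a percentage."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    if usage.total == 0:
        return 0.0
    return 100.0 * usage.used / usage.total


def nearly_full(paths, threshold=90.0):
    """Return (path, percent) pairs for filesystems at or above `threshold`."""
    results = []
    for path in paths:
        if not os.path.exists(path):
            continue  # skip paths that do not exist on this machine
        pct = percent_used(path)
        if pct >= threshold:
            results.append((path, pct))
    return results


if __name__ == "__main__":
    # Assumed paths; adjust to wherever your logs and vdisk actually live.
    for path, pct in nearly_full(["/var/log", "/"], threshold=90.0):
        print(f"{path} is {pct:.0f}% full")
```

If nothing is printed, none of the listed filesystems is above the threshold, and the diagnostics are the next thing to look at.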


Your system is having several issues...  Libvirt keeps losing access to files:

2019-02-17 05:45:16.964+0000: 9822: info : libvirt version: 4.7.0
2019-02-17 05:45:16.964+0000: 9822: info : hostname: Asgard
2019-02-17 05:45:16.964+0000: 9822: error : qemuOpenFileAs:3143 : Failed to open file '/mnt/disks/KINGSTON_SV300S37A120G_50026B774609E873/Windows 10/vdisk1.img': No such file or directory
2019-02-17 05:45:16.964+0000: 9823: error : qemuOpenFileAs:3143 : Failed to open file '/mnt/disks/KINGSTON_SV300S37A120G_50026B774609E873/Windows 10/vdisk1.img': No such file or directory
2019-02-17 05:45:16.972+0000: 9821: error : qemuOpenFileAs:3143 : Failed to open file '/mnt/disks/KINGSTON_SV300S37A120G_50026B774609E873/Windows 10/vdisk1.img': No such file or directory
2019-02-17 05:45:16.973+0000: 9821: error : qemuOpenFileAs:3143 : Failed to open file '/mnt/disks/KINGSTON_SV300S37A120G_50026B774609E873/Windows 10/vdisk1.img': No such file or directory
2019-02-17 05:45:16.974+0000: 9821: error : qemuOpenFileAs:3143 : Failed to open file '/mnt/disks/KINGSTON_SV300S37A120G_50026B774609E873/Windows 10/vdisk1.img': No such file or directory
2019-02-17 05:45:16.975+0000: 9823: error : qemuOpenFileAs:3143 : Failed to open file '/mnt/disks/KINGSTON_SV300S37A120G_50026B774609E873/Windows 10/vdisk1.img': No such file or directory
2019-02-17 05:45:19.244+0000: 9824: error : virStorageFileReportBrokenChain:4776 : Cannot access storage file '/mnt/disks/KINGSTON_SV300S37A120G_50026B774609E873/Windows 10/vdisk1.img': No such file or directory
2019-02-17 05:45:24.100+0000: 9821: error : qemuOpenFileAs:3143 : Failed to open file '/mnt/disks/KINGSTON_SV300S37A120G_50026B774609E873/Windows 10/vdisk1.img': No such file or directory
2019-02-17 05:45:24.108+0000: 9825: error : qemuOpenFileAs:3143 : Failed to open file '/mnt/disks/KINGSTON_SV300S37A120G_50026B774609E873/Windows 10/vdisk1.img': No such file or directory
2019-02-17 05:45:24.110+0000: 9825: error : qemuOpenFileAs:3143 : Failed to open file '/mnt/disks/KINGSTON_SV300S37A120G_50026B774609E873/Windows 10/vdisk1.img': No such file or directory
2019-02-17 05:45:24.111+0000: 9825: error : qemuOpenFileAs:3143 : Failed to open file '/mnt/disks/KINGSTON_SV300S37A120G_50026B774609E873/Windows 10/vdisk1.img': No such file or directory
2019-02-17 05:45:24.112+0000: 9822: error : qemuOpenFileAs:3143 : Failed to open file '/mnt/disks/KINGSTON_SV300S37A120G_50026B774609E873/Windows 10/vdisk1.img': No such file or directory

and your system keeps overheating:

.......
.......
Feb 17 16:15:08 Asgard kernel: CPU6: Package temperature/speed normal
Feb 17 16:15:08 Asgard kernel: CPU7: Package temperature/speed normal
Feb 17 16:15:08 Asgard kernel: CPU5: Package temperature/speed normal
Feb 17 16:15:08 Asgard kernel: CPU3: Package temperature/speed normal
Feb 17 16:15:08 Asgard kernel: CPU1: Package temperature/speed normal
Feb 17 16:15:08 Asgard kernel: CPU2: Package temperature/speed normal
Feb 17 16:15:08 Asgard kernel: CPU1: Core temperature above threshold, cpu clock throttled (total events = 2144810)
Feb 17 16:15:08 Asgard kernel: CPU5: Core temperature above threshold, cpu clock throttled (total events = 2144810)
Feb 17 16:15:08 Asgard kernel: CPU1: Core temperature/speed normal
Feb 17 16:15:08 Asgard kernel: CPU5: Core temperature/speed normal
Feb 17 16:20:28 Asgard nginx: 2019/02/17 16:20:28 [error] 8876#8876: *203316 readv() failed (104: Connection reset by peer) while reading upstream, client: 192.168.1.2, server: , request: "POST /webGui/include/DeviceList.php HTTP/1.1", upstream: "fastcgi://unix:/var/run/php5-fpm.sock:", host: "192.168.1.106", referrer: "http://192.168.1.106/Main"
Feb 17 16:20:49 Asgard nginx: 2019/02/17 16:20:49 [error] 8876#8876: *203381 readv() failed (104: Connection reset by peer) while reading upstream, client: 192.168.1.2, server: , request: "POST /plugins/unassigned.devices/UnassignedDevices.php HTTP/1.1", upstream: "fastcgi://unix:/var/run/php5-fpm.sock:", host: "192.168.1.106", referrer: "http://192.168.1.106/Main"
Feb 17 16:21:10 Asgard nginx: 2019/02/17 16:21:10 [error] 8876#8876: *203455 readv() failed (104: Connection reset by peer) while reading upstream, client: 192.168.1.2, server: , request: "POST /webGui/include/DeviceList.php HTTP/1.1", upstream: "fastcgi://unix:/var/run/php5-fpm.sock:", host: "192.168.1.106", referrer: "http://192.168.1.106/Main"
Feb 17 16:21:51 Asgard nginx: 2019/02/17 16:21:51 [error] 8876#8876: *203589 readv() failed (104: Connection reset by peer) while reading upstream, client: 192.168.1.2, server: , request: "POST /webGui/include/DeviceList.php HTTP/1.1", upstream: "fastcgi://unix:/var/run/php5-fpm.sock:", host: "192.168.1.106", referrer: "http://192.168.1.106/Main"
Feb 17 16:22:33 Asgard nginx: 2019/02/17 16:22:33 [error] 8876#8876: *203724 readv() failed (104: Connection reset by peer) while reading upstream, client: 192.168.1.2, server: , request: "POST /webGui/include/DeviceList.php HTTP/1.1", upstream: "fastcgi://unix:/var/run/php5-fpm.sock:", host: "192.168.1.106", referrer: "http://192.168.1.106/Main"
Feb 17 16:23:57 Asgard nginx: 2019/02/17 16:23:57 [error] 8876#8876: *203990 readv() failed (104: Connection reset by peer) while reading upstream, client: 192.168.1.2, server: , request: "POST /plugins/unassigned.devices/UnassignedDevices.php HTTP/1.1", upstream: "fastcgi://unix:/var/run/php5-fpm.sock:", host: "192.168.1.106", referrer: "http://192.168.1.106/Main"

Notice where it shows:

.....cpu clock throttled (total events = 2144810)

Most likely your log files are filling up with these CPU throttling messages, and this eventually kills LibVirt...  Either way, check your hardware... Once you get the logs to stop screaming at you, you will probably fix the issues with LibVirt and your VMs....
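Both patterns in the excerpts above can be pulled out of the logs mechanically rather than by eye. This is a minimal Python sketch, assuming the line formats match what is quoted in this post; the function names `missing_files` and `throttle_totals` are made up for illustration, and the regexes are only modeled on these excerpts, not on any official libvirt or kernel log grammar.

```python
import re
from collections import Counter

# Patterns modeled on the log excerpts quoted in this thread.
MISSING_FILE = re.compile(
    r"(?:Failed to open file|Cannot access storage file) "
    r"'([^']+)': No such file or directory"
)
THROTTLED = re.compile(
    r"kernel: (CPU\d+): (?:Core|Package) temperature above threshold, "
    r"cpu clock throttled \(total events = (\d+)\)"
)


def missing_files(lines):
    """Count 'No such file or directory' errors per path in libvirt log lines."""
    counts = Counter()
    for line in lines:
        match = MISSING_FILE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def throttle_totals(lines):
    """Return the highest 'total events' value reported for each CPU."""
    totals = {}
    for line in lines:
        match = THROTTLED.search(line)
        if match:
            cpu, events = match.group(1), int(match.group(2))
            totals[cpu] = max(events, totals.get(cpu, 0))
    return totals


if __name__ == "__main__":
    sample = [
        "Feb 17 16:15:08 Asgard kernel: CPU1: Core temperature above "
        "threshold, cpu clock throttled (total events = 2144810)",
        "Feb 17 16:15:08 Asgard kernel: CPU1: Core temperature/speed normal",
    ]
    print(throttle_totals(sample))
```

A path that shows up in `missing_files` means the vdisk location disappeared while libvirt was looking for it, and a large count in `throttle_totals` means the CPU has been throttling for a long time.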

 

 

 

 

Also, to GHunter, I am not able to find the XML files in the diagnostics...  Does it matter if they have the "Anonymous" version of the diagnostics? or am I just going blind?

On 2/17/2019 at 1:38 PM, GHunter said:

 

The VM XML is included in the diagnostics file so no need to post it separately

 

5 minutes ago, Malachi89 said:

Interesting! Guess I should learn to read these diagnostics haha. Been running fine for years so not sure why it’s overheating now, but I’ll toss a bigger CPU cooler in there and see if that helps and go from there!

 

It could also just be your thermal grease drying out, or the cooler having been bumped and losing its thermal contact... There are several possibilities...  If it is a thermal grease issue, you might want to look into this; it is a new product and not really mentioned everywhere yet...:

https://www.amazon.com/gp/product/B07CKVW18G/ref=ox_sc_act_title_3?smid=A23NVCSO4PYH3S&psc=1

 

Edit: it could be something like a new video card overheating the case as well, again there are many options for what is causing this...  I would not jump straight to buying something to fix it...

Edited by Warrentheo

In all seriousness, it definitely needs an aftermarket cooler LOL. It's an i7 at 4.0 GHz and it gets worked hard with just a stock cooler. I did go out and buy a Cooler Master 212 EVO, which was on sale for about $25 at my local computer store. I will run that for a couple of days with the VM and see how it goes. I also double-checked everything else and found that one of my front fans had stopped working, so I replaced it with a spare I had at home. Fingers crossed this does the trick.
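To tell whether the new cooler and fan actually did the trick, one option is to compare the kernel's throttle counters before and after a compression run instead of waiting for a crash. This is a rough Python sketch, assuming an Intel CPU on a Linux kernel that exposes `thermal_throttle/core_throttle_count` under `/sys/devices/system/cpu/cpuN/` (not every kernel or CPU does); the function names are made up for illustration.

```python
import glob
import os


def read_throttle_counts(root="/sys/devices/system/cpu"):
    """Map each cpuN directory name to its core_throttle_count, where present."""
    counts = {}
    pattern = os.path.join(
        root, "cpu[0-9]*", "thermal_throttle", "core_throttle_count"
    )
    for path in glob.glob(pattern):
        # .../cpuN/thermal_throttle/core_throttle_count -> "cpuN"
        cpu = os.path.basename(os.path.dirname(os.path.dirname(path)))
        try:
            with open(path) as handle:
                counts[cpu] = int(handle.read().strip())
        except (OSError, ValueError):
            continue  # unreadable or unexpected content: skip this CPU
    return counts


def new_events(before, after):
    """Return how much each CPU's counter grew between two snapshots."""
    return {
        cpu: after[cpu] - before[cpu]
        for cpu in after
        if cpu in before and after[cpu] > before[cpu]
    }


if __name__ == "__main__":
    # Prints an empty dict if the counters are not exposed on this machine.
    print(read_throttle_counts())
```

Take one snapshot with `read_throttle_counts()` before starting the VM workload and another afterwards; if `new_events(before, after)` comes back empty, the CPU did not throttle during the run.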
