Malachi89

VM crashing continuously...


Hey guys, odd issue with VMs. I have been running two Unraid servers for around 2 years now without issue. One is solely storage for my client data (TITAN), the other is my personal media server (ASGARD). ASGARD is running an i7 and has hosted a Win 10 VM for those 2 years, which I use strictly for compressing files. Both have compatible hardware for VMs. About a month ago, the VM on ASGARD started randomly shutting off for no apparent reason.

Sometimes it will come back on, sometimes it will just come up as failed when I try to start it. When it does come back on, it lasts anywhere from 5 minutes to 5 hours before it shuts down again with a complete crash.

On a whim, thinking maybe the SSD was going, I pulled the VM SSD (set up using Unassigned Devices) from ASGARD, moved it over to TITAN, pointed a VM at it, and booted it up. Runs perfectly. Ran it for a week, compressed files, no issue whatsoever. So... I'm at a loss.

What the hell would cause the VM to do this on one machine but not the other? I am also unsure what kind of info would help here, as I've only posted on forums a handful of times, so if you need me to upload logs or screenshots, let me know. Any help would be greatly appreciated, as TITAN is supposed to be for client data only and I don't want to use it as my compression machine and mix files.

Edited by Malachi89


Usually this is something filling up, like the SSD or something else...  I had something like this when a repeated motherboard error was filling up my syslog file; once it was full, the VMs would just lock up with no notification...

 

Most likely something similar is happening here...

 

If that doesn't help, we need diagnostics files, and the VM XML file...
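A quick way to rule out the "something filling up" theory is to check how full the relevant filesystems are. Below is a minimal Python sketch of that check. The paths in it (`/var/log` for the logs, `/mnt/disks` for the Unassigned Devices mount) and the 90% threshold are assumptions for illustration, so point it at wherever your logs and vdisk actually live.

```python
import shutil


def usage_percent(path):
    """Return the percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0  # some pseudo filesystems report a size of zero
    return 100.0 * usage.used / usage.total


def nearly_full(paths, threshold=90.0):
    """Return {path: percent_used} for every path at or above the threshold."""
    report = {}
    for path in paths:
        try:
            percent = usage_percent(path)
        except OSError:
            continue  # path does not exist on this machine, skip it
        if percent >= threshold:
            report[path] = round(percent, 1)
    return report


if __name__ == "__main__":
    # Assumed locations, not taken from the diagnostics: adjust as needed.
    print(nearly_full(["/var/log", "/mnt/disks"], threshold=90.0))
```

An empty result means none of the listed locations is at or above the threshold, which would point away from a full disk or full log filesystem as the cause.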


Yeah I had read that can happen and deliberately made the VM image slightly smaller to help prevent that. 

 

I have the diag file but how do I get the VM XML?

Edited by Malachi89

On 2/12/2019 at 5:29 PM, Malachi89 said:

Yeah I had read that can happen and deliberately made the VM image slightly smaller to help prevent that. 

 

I have the diag file but how do I get the VM XML?

 

The VM XML is included in the diagnostics file, so there is no need to post it separately.


Your system is having several issues...  Libvirt keeps losing access to files:

2019-02-17 05:45:16.964+0000: 9822: info : libvirt version: 4.7.0
2019-02-17 05:45:16.964+0000: 9822: info : hostname: Asgard
2019-02-17 05:45:16.964+0000: 9822: error : qemuOpenFileAs:3143 : Failed to open file '/mnt/disks/KINGSTON_SV300S37A120G_50026B774609E873/Windows 10/vdisk1.img': No such file or directory
2019-02-17 05:45:16.964+0000: 9823: error : qemuOpenFileAs:3143 : Failed to open file '/mnt/disks/KINGSTON_SV300S37A120G_50026B774609E873/Windows 10/vdisk1.img': No such file or directory
2019-02-17 05:45:16.972+0000: 9821: error : qemuOpenFileAs:3143 : Failed to open file '/mnt/disks/KINGSTON_SV300S37A120G_50026B774609E873/Windows 10/vdisk1.img': No such file or directory
2019-02-17 05:45:16.973+0000: 9821: error : qemuOpenFileAs:3143 : Failed to open file '/mnt/disks/KINGSTON_SV300S37A120G_50026B774609E873/Windows 10/vdisk1.img': No such file or directory
2019-02-17 05:45:16.974+0000: 9821: error : qemuOpenFileAs:3143 : Failed to open file '/mnt/disks/KINGSTON_SV300S37A120G_50026B774609E873/Windows 10/vdisk1.img': No such file or directory
2019-02-17 05:45:16.975+0000: 9823: error : qemuOpenFileAs:3143 : Failed to open file '/mnt/disks/KINGSTON_SV300S37A120G_50026B774609E873/Windows 10/vdisk1.img': No such file or directory
2019-02-17 05:45:19.244+0000: 9824: error : virStorageFileReportBrokenChain:4776 : Cannot access storage file '/mnt/disks/KINGSTON_SV300S37A120G_50026B774609E873/Windows 10/vdisk1.img': No such file or directory
2019-02-17 05:45:24.100+0000: 9821: error : qemuOpenFileAs:3143 : Failed to open file '/mnt/disks/KINGSTON_SV300S37A120G_50026B774609E873/Windows 10/vdisk1.img': No such file or directory
2019-02-17 05:45:24.108+0000: 9825: error : qemuOpenFileAs:3143 : Failed to open file '/mnt/disks/KINGSTON_SV300S37A120G_50026B774609E873/Windows 10/vdisk1.img': No such file or directory
2019-02-17 05:45:24.110+0000: 9825: error : qemuOpenFileAs:3143 : Failed to open file '/mnt/disks/KINGSTON_SV300S37A120G_50026B774609E873/Windows 10/vdisk1.img': No such file or directory
2019-02-17 05:45:24.111+0000: 9825: error : qemuOpenFileAs:3143 : Failed to open file '/mnt/disks/KINGSTON_SV300S37A120G_50026B774609E873/Windows 10/vdisk1.img': No such file or directory
2019-02-17 05:45:24.112+0000: 9822: error : qemuOpenFileAs:3143 : Failed to open file '/mnt/disks/KINGSTON_SV300S37A120G_50026B774609E873/Windows 10/vdisk1.img': No such file or directory

and your system keeps overheating:

.......
.......
Feb 17 16:15:08 Asgard kernel: CPU6: Package temperature/speed normal
Feb 17 16:15:08 Asgard kernel: CPU7: Package temperature/speed normal
Feb 17 16:15:08 Asgard kernel: CPU5: Package temperature/speed normal
Feb 17 16:15:08 Asgard kernel: CPU3: Package temperature/speed normal
Feb 17 16:15:08 Asgard kernel: CPU1: Package temperature/speed normal
Feb 17 16:15:08 Asgard kernel: CPU2: Package temperature/speed normal
Feb 17 16:15:08 Asgard kernel: CPU1: Core temperature above threshold, cpu clock throttled (total events = 2144810)
Feb 17 16:15:08 Asgard kernel: CPU5: Core temperature above threshold, cpu clock throttled (total events = 2144810)
Feb 17 16:15:08 Asgard kernel: CPU1: Core temperature/speed normal
Feb 17 16:15:08 Asgard kernel: CPU5: Core temperature/speed normal
Feb 17 16:20:28 Asgard nginx: 2019/02/17 16:20:28 [error] 8876#8876: *203316 readv() failed (104: Connection reset by peer) while reading upstream, client: 192.168.1.2, server: , request: "POST /webGui/include/DeviceList.php HTTP/1.1", upstream: "fastcgi://unix:/var/run/php5-fpm.sock:", host: "192.168.1.106", referrer: "http://192.168.1.106/Main"
Feb 17 16:20:49 Asgard nginx: 2019/02/17 16:20:49 [error] 8876#8876: *203381 readv() failed (104: Connection reset by peer) while reading upstream, client: 192.168.1.2, server: , request: "POST /plugins/unassigned.devices/UnassignedDevices.php HTTP/1.1", upstream: "fastcgi://unix:/var/run/php5-fpm.sock:", host: "192.168.1.106", referrer: "http://192.168.1.106/Main"
Feb 17 16:21:10 Asgard nginx: 2019/02/17 16:21:10 [error] 8876#8876: *203455 readv() failed (104: Connection reset by peer) while reading upstream, client: 192.168.1.2, server: , request: "POST /webGui/include/DeviceList.php HTTP/1.1", upstream: "fastcgi://unix:/var/run/php5-fpm.sock:", host: "192.168.1.106", referrer: "http://192.168.1.106/Main"
Feb 17 16:21:51 Asgard nginx: 2019/02/17 16:21:51 [error] 8876#8876: *203589 readv() failed (104: Connection reset by peer) while reading upstream, client: 192.168.1.2, server: , request: "POST /webGui/include/DeviceList.php HTTP/1.1", upstream: "fastcgi://unix:/var/run/php5-fpm.sock:", host: "192.168.1.106", referrer: "http://192.168.1.106/Main"
Feb 17 16:22:33 Asgard nginx: 2019/02/17 16:22:33 [error] 8876#8876: *203724 readv() failed (104: Connection reset by peer) while reading upstream, client: 192.168.1.2, server: , request: "POST /webGui/include/DeviceList.php HTTP/1.1", upstream: "fastcgi://unix:/var/run/php5-fpm.sock:", host: "192.168.1.106", referrer: "http://192.168.1.106/Main"
Feb 17 16:23:57 Asgard nginx: 2019/02/17 16:23:57 [error] 8876#8876: *203990 readv() failed (104: Connection reset by peer) while reading upstream, client: 192.168.1.2, server: , request: "POST /plugins/unassigned.devices/UnassignedDevices.php HTTP/1.1", upstream: "fastcgi://unix:/var/run/php5-fpm.sock:", host: "192.168.1.106", referrer: "http://192.168.1.106/Main"

Notice where it shows

.....cpu clock throttled (total events = 2144810)

Most likely your log files are filling up with these CPU throttling messages, and this eventually kills libvirt...  Either way, check your hardware... Once you get the logs to stop screaming at you, you will probably fix the issues with libvirt and your VMs as well....
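If you want to pull these two signals out of the logs yourself, here is a rough Python sketch. It is only pattern matching against the message formats pasted above (the kernel throttle lines and the libvirt "No such file or directory" lines); the function names are made up for this example, and how you feed it lines (for instance from the syslog inside the diagnostics zip) is left to you.

```python
import re

# Matches kernel lines such as:
#   "... kernel: CPU1: Core temperature above threshold, cpu clock throttled (total events = 2144810)"
THROTTLE_RE = re.compile(
    r"kernel: CPU(\d+): (?:Core|Package) temperature above threshold, "
    r"cpu clock throttled \(total events = (\d+)\)"
)

# Matches libvirt lines that report a path it could not open.
MISSING_RE = re.compile(
    r"(?:Failed to open file|Cannot access storage file) '([^']+)': "
    r"No such file or directory"
)


def throttle_counts(lines):
    """Map CPU number -> highest 'total events' value seen for that CPU."""
    counts = {}
    for line in lines:
        match = THROTTLE_RE.search(line)
        if match:
            cpu, total = int(match.group(1)), int(match.group(2))
            counts[cpu] = max(counts.get(cpu, 0), total)
    return counts


def missing_files(lines):
    """Return the set of paths that libvirt reported as missing."""
    found = set()
    for line in lines:
        match = MISSING_RE.search(line)
        if match:
            found.add(match.group(1))
    return found


if __name__ == "__main__":
    sample = [
        "Feb 17 16:15:08 Asgard kernel: CPU1: Core temperature above threshold, "
        "cpu clock throttled (total events = 2144810)",
        "Feb 17 16:15:08 Asgard kernel: CPU1: Core temperature/speed normal",
    ]
    print(throttle_counts(sample))
    print(missing_files(sample))
```

A throttle counter in the millions, like the one in this syslog, is the part worth worrying about; a handful of events under heavy load would be far less alarming.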

 

 

 

 

Also, to GHunter: I am not able to find the XML files in the diagnostics...  Does it matter if they posted the "Anonymous" version of the diagnostics? Or am I just going blind?

On 2/17/2019 at 1:38 PM, GHunter said:

 

The VM XML is included in the diagnostics file so no need to post it separately

 


Interesting! Guess I should learn to read these diagnostics haha. Been running fine for years so not sure why it’s overheating now, but I’ll toss a bigger CPU cooler in there and see if that helps and go from there!

 

Edited by Malachi89

5 minutes ago, Malachi89 said:

Interesting! Guess I should learn to read these diagnostics haha. Been running fine for years so not sure why it’s overheating now, but I’ll toss a bigger CPU cooler in there and see if that helps and go from there!

 

It could also just be your thermal grease drying out, or the cooler having been bumped and losing good thermal contact... There are several possibilities...  If it does turn out to be the thermal grease, you might want to look into this; it is a new product and not widely mentioned yet:

https://www.amazon.com/gp/product/B07CKVW18G/ref=ox_sc_act_title_3?smid=A23NVCSO4PYH3S&psc=1

 

Edit: it could be something like a new video card overheating the case as well, again there are many options for what is causing this...  I would not jump straight to buying something to fix it...
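Before spending money, it may help to watch the temperatures while the VM is compressing and compare before and after any change (reseating the cooler, new paste, and so on). The Python sketch below reads the Linux kernel's thermal zones from sysfs, where each `temp` file holds a value in millidegrees Celsius. Whether your Unraid box exposes zones under `/sys/class/thermal`, and which zone corresponds to the CPU package, is an assumption you would need to verify on your own hardware.

```python
from pathlib import Path


def read_zone_temps(base="/sys/class/thermal"):
    """Return {zone_name: degrees_celsius} for every readable thermal zone."""
    temps = {}
    for zone in sorted(Path(base).glob("thermal_zone*")):
        try:
            raw = (zone / "temp").read_text().strip()
            temps[zone.name] = int(raw) / 1000.0  # sysfs reports millidegrees C
        except (OSError, ValueError):
            continue  # zone has no readable temperature, skip it
    return temps


if __name__ == "__main__":
    for name, celsius in read_zone_temps().items():
        print(f"{name}: {celsius:.1f} C")
```

Running it once at idle and once under load gives a rough baseline to compare against after each hardware change.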

Edited by Warrentheo

