1812 Posted November 29, 2018 (edited)

The original topic was called "QEMU eating vdisks making them non-bootable? Or just bad luck?" but I think I've found the culprit, or a culprit... or I don't know. See the update below.

I've had this happen on 2 different servers over the past 3-4 days. A couple of days ago I was using a VM on one server, and the VM just stopped working (like a hard power off) while the server stayed on. After that, neither of the VMs on that machine would boot, even after a server reboot.

This morning I woke up to find my other main server pegged at 100% CPU usage. I was able to get into the terminal and run pftop, which showed qemu was the culprit.

-------- UPDATE

So after recreating the firewall VM, I told appdata backup to run. It did, shutting down the dockers and doing its thing, but it returned errors and never restarted the dockers. Then the array went to undefined and showed:

So I had to do a hard reset of the entire server. When my firewall VM tried to boot, it was stuck at the shell. Even recreating the XML didn't help; I had to rebuild the entire VM. I've got to rebuild the other 2 VMs on the first server this morning... just curious if anyone else has had anything like this pop up on 6.6.5?

-----

After rebuilding the firewall VM, I ran appdata backup. It stopped all the dockers, did some things, then reported an error. The array then went to an undefined state: all discs now show under unassigned... and the dashboard shows no icons. But the firewall is still running....

I attempted to get the diagnostics.zip, but clicking download only reloads the page. However, I was able to copy the entire syslog from the webgui: ----------> syslog.rtf

I also noticed that unassigned devices does not auto-mount SMB folders anymore. It shows an error in the log, but manually clicking mount will connect. Same issue of not auto-mounting disks in unassigned devices. Rebooting the server clears the issue, but I'm not sure for how long.

I've disabled appdata backup for the moment. Any thoughts, anyone?

Edited November 29, 2018 by 1812
John_M Posted December 1, 2018

How is it if you boot into Safe Mode?
Squid Posted December 1, 2018

On 11/29/2018 at 9:04 AM, 1812 said: "but returned errors,"

Diagnostics after this happens again would be a plus.
1812 Posted December 1, 2018

2 hours ago, Squid said: "diagnostics after this happens again would be a plus"

I know it would be helpful, but it ends up locking up everything and won't download them.

As a side update, I am starting to think that it isn't an appdata backup issue but rather a docker one. It hung for quite a long time after restarting Krusader last night. After it finally reloaded, I attempted a docker scrub and it locked up..... I'll have more time to investigate this coming week. Maybe a corrupted image is causing issues when appdata backup runs and cycles the dockers off/on?

There is a copy of the syslog in the original post, but I'll repost it here:

On 11/29/2018 at 9:04 AM, 1812 said: "syslog.rtf"

9 hours ago, John_M said: "How is it if you boot into Safe Mode?"

The server boots fine, but no issues are observed because nothing is loaded in terms of plugins, dockers, VMs, etc... For the moment, as long as I don't touch it or mess with the dockers that have already auto-started, it seems stable and the running dockers are operational.
Squid Posted December 1, 2018

Then, when you're in the mood to play around, put Fix Common Problems into troubleshooting mode and then do a backup.

You have something else going on than the high I/O being generated from the reads on your cache drive. I can guarantee that the backup plugin will not stop your array (ever) and does not stop VMs (ever).
John_M Posted December 1, 2018

19 minutes ago, 1812 said: "server boots fine but no issues observed because nothing is loaded in terms of plugins, dockers, vms, etc..."

Which is a good thing. You have a stable platform, and by selectively switching the higher level functions off and on you should be able to isolate the culprit with relative ease.
1812 Posted December 2, 2018

23 hours ago, Squid said: "Then, when you're in the mood to play around, put fix common problems into troubleshooting mode and then do a backup You have something else going on that the high I/O being generated from the reads on your cache drive. I can guarantee that the backup plugin will not stop your array (ever), does not stop VMs (ever)."

Will do and report back in a few days. I often forget about the troubleshooting mode in FCP... thanks!
1812 Posted December 3, 2018 (edited)

On 12/1/2018 at 10:10 AM, Squid said: "I can guarantee that the backup plugin will not stop your array (ever), does not stop VMs (ever)."

OK. So I manually ran the appdata backup after putting Fix Common Problems in troubleshooting mode. It ran the backup and said it was done with errors. The dockers never restarted. Then errors started pouring into the syslog. I also attached a video showing how wonky it was afterward (at the bottom of this post), which also shows it not downloading diagnostics. The array status kept changing from green to orange. undefined.mov

It was all very exciting... lol... It did not stop the firewall this time. A reboot returned it to normal operation (I wasn't previously able to reboot via the webgui but could this time). There was also something that I assume was a popup that never rendered correctly on the right side; it said undefined:undefined undefined undefined.

I was able to get to the syslog, and while it wouldn't download, I could copy/paste it into a document. It is here: errors.rtf

I looked in the appdata backup location for the tar to "verify errors occurred" as indicated by the logs, but only found 3 older backups from mid November and no new backups. The libvirt backup has a timestamp that coincides with this backup, so it was copied over. There were no other browser tabs open at the time either.

FCP syslog tail: FCPsyslog_tail.txt

What looks like an auto-generated diag file from shutdown: brahms1-diagnostics-20181203-1247.zip

And the movie: server error.mov

Pretty neat, huh?

Edited December 3, 2018 by 1812
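
Since the webgui refused to download diagnostics both times, one possible workaround is to snapshot the log from a terminal before rebooting, so it survives a hard reset. Below is a minimal Python sketch of that idea, not something anyone in the thread tested. It assumes the live log sits at /var/log/syslog, that /boot/logs on the flash drive is writable and persists across reboots, and that a Python 3 interpreter is installed on the server (a stock install may not have one). The helper name snapshot_log is made up for illustration.

```python
import shutil
import time
from pathlib import Path


def snapshot_log(src="/var/log/syslog", dest_dir="/boot/logs"):
    """Copy a log file to persistent storage under a timestamped name.

    Both default paths are assumptions about the server layout;
    pass different ones if your system differs.
    """
    src_path = Path(src)
    dest = Path(dest_dir)
    # Create the destination folder if it does not exist yet.
    dest.mkdir(parents=True, exist_ok=True)
    # A timestamp in the name keeps earlier snapshots from being overwritten.
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = dest / f"{src_path.name}-{stamp}.txt"
    shutil.copyfile(src_path, target)
    return target
```

Calling snapshot_log() with no arguments uses the assumed default paths and returns the path of the copy, which could then be attached to a post in place of a copy/paste from the webgui.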