goinsnoopin Posted May 6, 2019 (edited)

Unraid 6.6.7

Here is what I am experiencing:

VMs kept pausing (going to "resume"). I attempted to shut down the VMs by hitting resume and then quickly hitting stop. One of the two VMs shut down. When I did the same for the second VM, Unraid crashed: no webgui, no ssh/telnet access. The server is headless, so I don't have a keyboard or monitor (I will in future), and I was unable to get diagnostics for this original crash. That said, I know my cache drive was overutilized, as I had this happen once before (VMs going to resume) when a docker log filled the cache drive. I also know a couple of the array disks had little free space, and I know that can sometimes be an issue.

Due to the hard crash, I power cycled Unraid. Unraid came up and a parity check started. The two VMs that were running during the crash were no longer listed in my VM tab (only my two other VMs, which were not running at the time, were listed). A handful of dockers were running; the rest would not start. I decided to reboot Unraid via the webgui to see if they would return. It came back exactly the same way: only 2 VMs and a handful of dockers. Also note that it says the parity check was canceled, so I probably should not have rebooted while the parity check was in progress.

I deleted docker.img and recreated 10 or so of my key dockers from templates. That worked with no issues. Went to bed.

Got up this morning, read some forums and decided to run btrfs balance on the cache drive. While this was running, I was looking into the VM issue. I realized I had an older libvirt.img in another location and switched to this image. I went to the VM tab and saw a bunch of older VMs I had from, let's say, a year ago or so. All VMs were stopped. I then went to my VM XML file backups, copied the XML for my main Windows 10 VM, switched to XML mode, pasted the backup XML, unchecked "start on creation" and created the new VM.

I immediately got a disk 3 read error, followed by a notification that disk 3 was disabled. I went to the btrfs balance and saw it was still running, so I canceled it, stopped the array, went to settings to turn off array auto start, rebooted the server and started the array in maintenance mode. After rebooting in maintenance mode, disk 3 shows up and has a SMART report, and looking at the SMART stats I don't see any issues. I am currently running an extended SMART test on this disk and am waiting for results.

I know this is a lot... any advice on how to proceed?

tower-diagnostics-20190506-0648.zip

Edited May 8, 2019 by goinsnoopin
witalit Posted May 6, 2019 (edited)
I was completely off the mark so removing this post.
Edited May 6, 2019 by witalit
witalit Posted May 6, 2019
Reading back over your post, I noticed you already recreated docker.img, so what I just posted is possibly useless.
JorgeB Posted May 6, 2019
Disk3 and another unassigned 1TB disk dropped offline, likely a controller issue since both are on the same controller. Reboot and post new diags. The docker image errors were because the cache was full.
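Since a full cache pool is the suspected trigger for both the paused VMs and the docker.img errors, here is a minimal Python sketch for keeping an eye on pool usage. It assumes the cache is mounted at /mnt/cache (the usual Unraid location) and uses an arbitrary 90% warning threshold; on btrfs, `btrfs filesystem usage` gives a more complete picture than the generic statvfs numbers used here.

```python
import os
import shutil


def usage_percent(path: str) -> float:
    """Percentage of the filesystem holding `path` that is in use."""
    total, used, _free = shutil.disk_usage(path)
    return 100.0 * used / total


def check_pool(path: str, warn_at: float = 90.0) -> str:
    """One-line status string; warn_at=90 is an arbitrary example threshold."""
    pct = usage_percent(path)
    status = "WARNING" if pct >= warn_at else "ok"
    return f"{status}: {path} is {pct:.1f}% full"


if __name__ == "__main__":
    # /mnt/cache is assumed to be the cache mount; "/" is a fallback so this runs anywhere.
    for mount in ("/mnt/cache", "/"):
        if os.path.exists(mount):
            print(check_pool(mount))
```

Running it from a cron job or a user script would flag a filling cache before VMs start pausing.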
witalit Posted May 6, 2019
johnnie.black said: Disk3 and another unassigned 1TB disk dropped offline, likely a controller issue since both are in the same, reboot and post new diags. Docker image errors were because cache was full.
Thanks johnnie, I was awaiting your reply; at least it helps me with scanning these log files.
goinsnoopin Posted May 6, 2019
Here are the current diagnostics. The disk 3 extended SMART test just completed successfully with no errors. Thanks, guys, for providing feedback!
tower-diagnostics-20190506-0946.zip
JorgeB Posted May 6, 2019
SMART for disk3 looks fine. It was likely a controller problem since two disks dropped simultaneously; if the emulated disk is mounting correctly, you can rebuild on top of the old disk.
JorgeB Posted May 6, 2019
On second thought, judging by the way the disks got disabled, you were likely passing through the SATA controller to one of the VMs.
goinsnoopin Posted May 6, 2019
Johnnie... I am not passing any SATA controllers to VMs. The second disk that crashed at the same time is an unassigned disk holding an image file that is mounted in one of the running VMs as storage for my son's video games, i.e. a D: drive. The Win10 VM's C: drive is on the SSD cache drive. Does this second thought change your suggestion of rebuilding on the same disk? 😄
JorgeB Posted May 6, 2019
goinsnoopin said: ..I am not passing any sata controllers to VMs.
I was trying to avoid going through the VM diags, but yes, you are:

-device vfio-pci,host=07:00.0,bus=root.1,addr=00.2,multifunction=on

This is the previously mentioned 2-port ASMedia controller.
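The `-device vfio-pci,host=07:00.0,...` argument is what libvirt hands to QEMU for a PCI `<hostdev>` entry in the VM's XML. As a minimal sketch using only the Python standard library, the function below lists the host PCI addresses a domain definition passes through, so a backup XML can be checked against the SATA controller addresses before it is used. The sample XML is a hypothetical excerpt for illustration, not the actual Bobby_Steam definition, and `passthrough_addresses` is an illustrative helper name.

```python
import xml.etree.ElementTree as ET


def passthrough_addresses(domain_xml: str) -> list:
    """Return host PCI addresses ("bus:slot.function") of every <hostdev type='pci'>."""
    root = ET.fromstring(domain_xml)
    found = []
    for hostdev in root.iter("hostdev"):
        if hostdev.get("type") != "pci":
            continue  # skip USB and other hostdev types
        addr = hostdev.find("source/address")
        if addr is None:
            continue
        # libvirt writes these attributes as hex strings such as '0x07'
        bus = int(addr.get("bus"), 16)
        slot = int(addr.get("slot"), 16)
        func = int(addr.get("function"), 16)
        found.append(f"{bus:02x}:{slot:02x}.{func:x}")
    return found


# Hypothetical excerpt of a domain definition, for illustration only.
SAMPLE = """<domain type='kvm'>
  <name>Bobby_Steam</name>
  <devices>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x07' slot='0x00' function='0x0'/>
      </source>
    </hostdev>
  </devices>
</domain>"""

# The two onboard ASMedia SATA controllers, per the lspci output quoted in this thread.
SATA_CONTROLLERS = {"07:00.0", "09:00.0"}

if __name__ == "__main__":
    risky = set(passthrough_addresses(SAMPLE)) & SATA_CONTROLLERS
    print("SATA controllers passed through:", sorted(risky))
```

Any non-empty result means that VM would take the controller (and its disks) away from the array when started.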
goinsnoopin Posted May 6, 2019
Johnnie... I am confused. I have 3 PCIe cards that I pass through: two Nvidia GPUs (one to each VM) and one USB3 PCIe card to one of the VMs. I have no SATA PCIe cards in my case... however, my motherboard (ASRock Z97 Extreme6) does have an ASMedia ASM1061 SATA controller with 4 ports. I haven't made changes to this in forever... Back to my original post: I did restore an old libvirt.img, and the diagnostics were run with this old libvirt.img. Could that be the issue?
JorgeB Posted May 6, 2019
goinsnoopin said: however my motherboard ASRock Z97 Extreme6 does have an ASMedia ASM1061 sata 4 ports.
It has two 2-port ASMedia controllers, and one of them is being used by a VM. This can happen if hardware was added or removed in a way that changed the PCI assignments, e.g. 07:00.0 could have been the USB controller before. You can also see that the kernel driver in use for the 1st ASMedia controller is vfio-pci, confirming it's being used by QEMU:

07:00.0 SATA controller [0106]: ASMedia Technology Inc. ASM1062 Serial ATA Controller [1b21:0612] (rev 02)
    Subsystem: ASRock Incorporation Motherboard [1849:0612]
    Kernel driver in use: vfio-pci
    Kernel modules: ahci
08:00.0 USB controller [0c03]: Fresco Logic FL1100 USB 3.0 Host Controller [1b73:1100] (rev 10)
    Subsystem: Fresco Logic FL1100 USB 3.0 Host Controller [1b73:1100]
    Kernel driver in use: vfio-pci
09:00.0 SATA controller [0106]: ASMedia Technology Inc. ASM1062 Serial ATA Controller [1b21:0612] (rev 02)
    Subsystem: ASRock Incorporation Motherboard [1849:0612]
    Kernel driver in use: ahci
    Kernel modules: ahci
0a:00.0 USB controller [0c03]: ASMedia Technology Inc. ASM1042A USB 3.0 Host Controller [1b21:1142]
    Subsystem: ASRock Incorporation ASM1042A USB 3.0 Host Controller [1849:1142]
    Kernel driver in use: xhci_hcd
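The check described above (a SATA controller whose "Kernel driver in use" is vfio-pci instead of ahci) can be automated. This minimal sketch assumes the `lspci -nnk` output format quoted above, with the bracketed class code on each device line; the helper name `devices_on_vfio` is hypothetical.

```python
import re

# Device line, e.g. "07:00.0 SATA controller [0106]: ASMedia ..."
DEVICE_RE = re.compile(r"^([0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\s+(.+?)\s+\[[0-9a-f]{4}\]:")
# Indented detail line, e.g. "\tKernel driver in use: vfio-pci"
DRIVER_RE = re.compile(r"^\s+Kernel driver in use:\s+(\S+)")


def devices_on_vfio(lspci_nnk: str, class_name: str = "SATA controller") -> list:
    """Return addresses of `class_name` devices currently bound to vfio-pci."""
    hits = []
    addr = dev_class = None
    for line in lspci_nnk.splitlines():
        dev = DEVICE_RE.match(line)
        if dev:
            # Remember which device the following indented lines belong to.
            addr, dev_class = dev.group(1), dev.group(2)
            continue
        drv = DRIVER_RE.match(line)
        if drv and dev_class == class_name and drv.group(1) == "vfio-pci":
            hits.append(addr)
    return hits


# Trimmed copy of the output quoted above.
SAMPLE = """07:00.0 SATA controller [0106]: ASMedia Technology Inc. ASM1062 Serial ATA Controller [1b21:0612] (rev 02)
\tSubsystem: ASRock Incorporation Motherboard [1849:0612]
\tKernel driver in use: vfio-pci
\tKernel modules: ahci
08:00.0 USB controller [0c03]: Fresco Logic FL1100 USB 3.0 Host Controller [1b73:1100] (rev 10)
\tKernel driver in use: vfio-pci
09:00.0 SATA controller [0106]: ASMedia Technology Inc. ASM1062 Serial ATA Controller [1b21:0612] (rev 02)
\tKernel driver in use: ahci
"""

if __name__ == "__main__":
    print(devices_on_vfio(SAMPLE))
```

On the server itself the input would come from running `lspci -nnk`; an empty list for the SATA class means no storage controller is held by VFIO.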
goinsnoopin Posted May 6, 2019
I think my plan is to verify the emulated disk3 by starting the array, then delete the libvirt image and create the VMs from the backup XML files... but I will double check the PCI assignments, as hardware changes have given the devices different addresses since the backup. Then, if all looks good, I will rebuild disk3.
JorgeB Posted May 6, 2019
Just make sure the Bobby_Steam VM is no longer passing through that controller, or the same will happen again.
goinsnoopin Posted May 6, 2019
Disk 3 emulated contents look fine. I deleted libvirt.img, tried to start my original Win10 VM and got an execution error:

Operation failed: unable to find any master var store for loader: /usr/share/qemu/ovmf-x64/OVMF_CODE-pure-efi.fd

Unassigned disk sdg no longer shows up; I believe that was the drive that failed. libvirt.img was not on the cache drive; it was in a share called system, and looking at the emulated contents, it was on disk3. Any suggestions? Should I rebuild disk3 to get back to a normal state, then deal with the VMs?
tower-diagnostics-20190506-1326.zip
JorgeB Posted May 6, 2019
The SATA controller is still being used by the Bobby_Steam VM. As soon as the VM was started it dropped both disks again, and it will keep doing so until you stop using the controller in the VM.
goinsnoopin Posted May 6, 2019 (edited)
I am not using the Bobby_Steam VM. I removed all VM definitions in VM Manager, then created a new one called Kitchen PC. However, you are right, both drives dropped??? That forced disk3 to be unassigned on reboot. I assume I should rebuild disk 3 and not touch any VM stuff until the rebuild is complete?
Edited May 6, 2019 by goinsnoopin
JorgeB Posted May 6, 2019
goinsnoopin said: I am not using Bobby_steam VM
There was an attempt to start it, and that is enough.
goinsnoopin Posted May 6, 2019
Don't understand that??? I don't think I attempted to start any VM other than Kitchen PC. At this point I want all VMs gone... I will create new ones. Is removing them from the webgui enough, or are there other fragments elsewhere? Disk3 became unassigned in the last reboot, so I assume I should assign it, rebuild the data and not touch any VMs until it is rebuilt?
JorgeB Posted May 6, 2019
goinsnoopin said: Don't think I attempted to start any other than Kitchen PC
Maybe it's set to auto start?
goinsnoopin Posted May 6, 2019
Quick step back: before the crash, libvirt.img was in a share that was allowed to be on any disk, and it looked like this data was on disk3. Reading the forums, most people have libvirt.img on the cache drive. I then went to my cache drive and saw a libvirt folder containing a libvirt.img, so I went to VM settings and changed my path to point to the libvirt.img on the cache drive. Once I did this, a bunch of old VMs like the Bobby_Steam VM showed up. I haven't used these VMs in years. They may have been set to autostart, but I can't tell, as today I deleted the libvirt.img on the cache drive and then removed all VMs from the webgui. So the only VM in my webgui is called KitchenPC, and this is the one that won't start. Do I have some corruption? Maybe I should create the VM via the form instead of copying the backup XML? I did review the template and it looked good; it passed the correct USB controller. Should I rebuild disk 3 first?
JorgeB Posted May 6, 2019
goinsnoopin said: Should I rebuild disk 3 first?
You can, but you'll need to reboot, since disk3 (and the unassigned disk) dropped offline because of the VM, and you must not attempt to start that VM again next time, or the same will happen again.
goinsnoopin Posted May 7, 2019
Johnnie, how is libvirt.img created? Is it getting VM information from the flash drive? If I delete libvirt.img and then turn VMs back on, it keeps auto-creating my 6 old VMs, and Bobby_Steam is one of them, with the wrong SATA controller passed through... and it's set to auto start. Any ideas on how to break this cycle? I have rebuilt disk3 two times now. I even verified via a telnet session that libvirt.img was gone by checking each disk. At one point I tried to edit the Bobby_Steam VM to change the PCI address to the USB card... I hit Update, it changed to "Updating" and never completed.
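On the question of where the VMs keep coming from: as far as I know, Unraid mounts libvirt.img at /etc/libvirt, and libvirt records autostart by placing one symlink per domain in /etc/libvirt/qemu/autostart/ that points at that domain's XML. Assuming that layout, this minimal sketch lists which domains would be started at boot from whichever image is currently mounted; `autostart_domains` is a hypothetical helper name.

```python
from pathlib import Path


def autostart_domains(autostart_dir: str = "/etc/libvirt/qemu/autostart") -> list:
    """Names of domains libvirt will start at boot (one <name>.xml symlink each)."""
    d = Path(autostart_dir)
    if not d.is_dir():
        return []  # directory missing, e.g. libvirt not running
    return sorted(p.stem for p in d.iterdir() if p.is_symlink() and p.suffix == ".xml")


if __name__ == "__main__":
    for name in autostart_domains():
        print("autostart:", name)
```

If Bobby_Steam shows up in that list, `virsh autostart --disable Bobby_Steam` is libvirt's standard command for clearing the flag.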
JorgeB Posted May 7, 2019
You likely have more than one libvirt.img; a new libvirt.img would not have any VMs. You can also just delete that VM (or disable auto-start).
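To hunt down stray copies, here is a minimal sketch that walks the /mnt tree a few levels deep and reports every libvirt.img. It assumes the images live near the top of each disk or pool (for example under a system share), hence the arbitrary depth limit. Expect each file to appear twice, once under its /mnt/diskN or /mnt/cache path and once under /mnt/user, since /mnt/user is the merged view of the array and cache.

```python
import os


def find_images(root: str = "/mnt", name: str = "libvirt.img", max_depth: int = 4) -> list:
    """Return every file called `name` within `max_depth` directory levels of `root`."""
    hits = []
    base_depth = root.rstrip(os.sep).count(os.sep)
    for dirpath, dirnames, filenames in os.walk(root):
        depth = dirpath.rstrip(os.sep).count(os.sep) - base_depth
        if depth >= max_depth:
            dirnames.clear()  # stop descending; keeps the scan out of deep media trees
        if name in filenames:
            hits.append(os.path.join(dirpath, name))
    return sorted(hits)


if __name__ == "__main__":
    for path in find_images():
        print(path)
```

Clearing `dirnames` in place works because `os.walk` is top-down by default, so it prunes the walk rather than just filtering results.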
goinsnoopin Posted May 7, 2019
Johnnie, I truly have tried both of those. If I delete the VMs (definitions only, as I want to keep the disks), they disappear from the Unraid VM webgui; however, on reboot they reappear and the offending one auto-starts... passing through the incorrect SATA controller and then disabling the disk. I have used the PCi.ids to list the devices for passthrough. Is there an opposite to that command to prevent this SATA controller from being passed to the VM while I figure this out? At this point I'm wondering if there is some corruption on the flash drive from the original hard crash... I'm contemplating pulling the USB flash drive and doing a clean install.
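To confirm after each boot whether the controller at 07:00.0 is still owned by ahci or has been taken for passthrough, the driver binding is visible in sysfs: /sys/bus/pci/devices/&lt;address&gt;/driver is a symlink to the bound driver. This is a minimal sketch assuming the full sysfs address form (0000:07:00.0); `bound_driver` is a hypothetical helper. With libvirt-managed hostdevs, starting a VM typically detaches the device from its host driver and binds it to vfio-pci, which would explain why a start attempt alone is enough to drop the disks.

```python
import os
from typing import Optional


def bound_driver(pci_addr: str, sysfs: str = "/sys/bus/pci/devices") -> Optional[str]:
    """Return the name of the kernel driver bound to a PCI device, or None if unbound."""
    link = os.path.join(sysfs, pci_addr, "driver")
    if not os.path.islink(link):
        return None  # device absent or no driver bound
    # The symlink target is the driver's directory, e.g. .../drivers/ahci
    return os.path.basename(os.path.realpath(link))


if __name__ == "__main__":
    # The two onboard ASMedia SATA controllers from the lspci output in this thread.
    for addr in ("0000:07:00.0", "0000:09:00.0"):
        print(addr, bound_driver(addr) or "no driver bound")
```

On a healthy boot both controllers should report `ahci`; `vfio-pci` on either one means something has claimed it for a VM.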