Caennanu Posted October 1, 2022

G'day all,

Recently I set up a virtual machine on Unraid using a Coral TPU. The TPU had to be passed through to the VM by editing the XML, as it wasn't listed as a passable device. That all worked fine and dandy: I set up my new VM with the TPU, installed Shinobi, tested it for a while, and it works great.

So, time to dismantle the old VM, which was using GPU acceleration for object detection. Removing the GPU from the system, however, has a bigger impact than I anticipated. When booting the Unraid machine everything seems to work just fine, except that the VMs do not actually boot. While trying to analyse what is happening, I noticed the GUI freezing. SSH'ing into the system also fails, and the command line interface on the physical machine doubles nearly every keystroke, making it impossible to do anything. The system becomes unresponsive.

After a couple of reboots I managed to catch the logs. It seems it's trying to find the TPU, which makes sense. But probably due to the hardware swapping, it has changed IDs...

So I turn the system off and disable automatically starting the array (via disk.cfg on the USB), so that I can boot and start the array manually without the VMs starting automatically after the array comes up... this does not seem to be the case...

So the questions are:
- Is there a way to edit the VM file to temporarily remove the TPU until I know the new ID, so I can re-add it?
- Is there a way to actually stop the VMs from auto-starting? (Would be nice to know in general whether this can be done via CLI or by editing files.)

And before you ask: with a system that is almost unresponsive, attaching logs will be a challenge.

Unraid version: I'm on RC4.
Caennanu Posted October 1, 2022 (Author)

After reading some threads on the forum with similar issues, I haven't been able to find a resolution yet, other than perhaps renaming the images. Luckily I don't have the images on a disk that is part of the array (only their backups), so if needed I could just pull that disk and edit the XMLs from the GUI. I would still prefer a more direct solution, though.
Caennanu Posted October 2, 2022 (Author)

Update: pulling the disk with the VMs on it also causes the system to crash. I can't pull logs from Unraid, but IPMI now shows an MCE I haven't had before...

    CPU 2: Machine Check: 0 Bank 17: dc2040000000011b
    TSC 0 ADDR 28b880 SYND 149901000a800500 IPID 9600450f00
    PROCESSOR 2:830f10 TIME 1664694930 SOCKET 0 APIC 4 microcode 8301055
Caennanu Posted October 2, 2022 (Author)

Update 2: after re-arranging hardware in the PCIe slots, I've managed to give the TPU its 'configured' ID again. The system now boots, and the VM manager starts. I would love to know how to change this ID when a device has been added manually to a VM's XML, because this isn't particularly user friendly.
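For finding where a device lands after a hardware change, one option is to scan sysfs for its vendor ID. This is a hedged sketch, not from the thread: 0x1ac1 is Global Unichip Corp., the vendor ID Coral PCIe modules report, but verify with lspci -nn on your own machine.

```shell
# Print the PCI address (e.g. 0000:01:00.0) of any device whose vendor ID is
# 0x1ac1 (Global Unichip Corp. / Coral). Prints nothing if no such device
# is present; exits 0 either way.
for dev in /sys/bus/pci/devices/*; do
    if [ "$(cat "$dev/vendor" 2>/dev/null)" = "0x1ac1" ]; then
        basename "$dev"
    fi
done
```

The printed address maps directly onto the domain/bus/slot/function fields of the XML `<source>` element.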
SimonF Posted October 2, 2022

35 minutes ago, Caennanu said:

    Update 2: after re-arranging hardware in the PCIe slots, I've managed to give the TPU its 'configured' ID again. The system now boots, and the VM manager starts. I would love to know how to change this ID when a device has been added manually to a VM's XML, because this isn't particularly user friendly.

The XML files are stored in the following path, but VM Manager has to be active for it to be mounted. To stop a VM from autostarting, you have to remove the symlink in the autostart directory.

    root@computenode:/etc/libvirt/qemu# ls
    Debian.xml  HA.xml  Linux.xml  Linux2.xml  Linux3.xml  Linux4.xml  Linux5.xml
    Ubuntu.xml  Unraid-VM.xml  Windows\ 10.xml  Windows\ 10\ test\ 7.xml  Windows\ 11.xml
    rpi.xml  autostart/  networks/  nvram/  swtpm/
    root@computenode:/etc/libvirt/qemu# ls autostart
    HA.xml@
    root@computenode:/etc/libvirt/qemu#

What were you looking to change?
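The autostart mechanism described above can be demonstrated with a mock directory. This is an illustrative sketch, not from the thread: /tmp/qemu stands in for /etc/libvirt/qemu, and "HA" is the example VM from the listing; on a live system you would use virsh instead of touching the symlinks directly.

```shell
# libvirt marks a VM for autostart with a symlink to its definition inside
# the autostart/ directory; removing the symlink disables autostart.
mkdir -p /tmp/qemu/autostart
touch /tmp/qemu/HA.xml
ln -sf ../HA.xml /tmp/qemu/autostart/HA.xml   # what "virsh autostart HA" creates
rm /tmp/qemu/autostart/HA.xml                 # what "virsh autostart HA --disable" removes
ls /tmp/qemu/autostart | wc -l                # prints 0: nothing left to autostart
```

Note that deleting the symlink leaves the VM definition itself (HA.xml) untouched; only the autostart flag is cleared.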
Caennanu Posted October 2, 2022 (Author, Solution)

@SimonF Thanks for the reply.

The thing is, I couldn't start the VM manager without making Unraid unresponsive. That is why I was looking for a way to at least disable the auto-starting of the VMs within the manager.

The reason I wanted to turn it off is that one VM is non-default. I had added the lines below to the XML to load up the Coral TPU, which wasn't a passable PCIe device. Since I moved hardware around, or rather removed some, it seemed that the ID of the Coral TPU was different, and it was.

    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
      </source>
      <alias name='hostdev0'/>
      <address type='pci' domain='0x0000' bus='0x06' slot='0x00' function='0x0'/>
    </hostdev>

The bus in question changed from 0x03 to 0x01, and I had no way to edit this. That seemed the most logical cause of Unraid becoming unstable.
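A minimal sketch of fixing the stale host address by hand, assuming the libvirt tree is reachable. The sed range is restricted to the `<source>` element, so only the host-side bus changes while the guest-side `<address bus='0x06'>` stays untouched. The scratch file and paths are illustrative; on a live system you would run the sed against the VM's XML under /etc/libvirt/qemu/, or use `virsh edit <vm>`.

```shell
# Recreate the <hostdev> snippet from the post in a scratch file, then rewrite
# only the bus inside <source> from the old host address (0x03) to the new
# one (0x01).
cat > /tmp/hostdev.xml <<'EOF'
<hostdev mode='subsystem' type='pci' managed='yes'>
  <driver name='vfio'/>
  <source>
    <address domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
  </source>
  <alias name='hostdev0'/>
  <address type='pci' domain='0x0000' bus='0x06' slot='0x00' function='0x0'/>
</hostdev>
EOF
sed -i "/<source>/,/<\/source>/ s/bus='0x03'/bus='0x01'/" /tmp/hostdev.xml
grep "bus=" /tmp/hostdev.xml   # bus='0x01' inside <source>, bus='0x06' outside
```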
SimonF Posted October 2, 2022

7 hours ago, Caennanu said:

    @SimonF Thanks for the reply. The thing is, i couldn't start the VM manager without making unraid unresponsive. [...] The bus in question changed from 0x03 to 0x01. And i had no way to edit this. And that seemed the most logical cause of unraid becomming unstable.

The file system will need to be mounted so you can update it. I'm not sure of the process to mount the libvirt image without starting libvirt.
Caennanu Posted October 3, 2022 (Author)

@SimonF Makes sense, in a way. Still, I believe there should be a way to turn off auto-start of VMs when starting the VM manager, specifically for issues like the one I experienced. We shall see; maybe I should log a bug report, or a feature request to add a toggle to the VM manager menu with something like 'Disable VM auto-start'.
SimonF Posted October 4, 2022

On 10/3/2022 at 9:42 AM, Caennanu said:

    @SimonF Makes sence in a way. Still i believe there should be a way to turn off auto start of VM's when starting VM manager. [...]

Raise a feature request. Do you know which device was trying to be mapped when the system was crashing?
Caennanu Posted October 6, 2022 (Author)

On 10/4/2022 at 8:53 PM, SimonF said:

    Do you know which device was trying to be mapped when the system was crashing.

I do not know which device it was trying to map at the time, as I fail to see the logic of the allocation. The only thing I found is that the TPU was allocated on bus 00, while it was trying to allocate it at 03. Whether anything was connected / assigned to 03 at the time, I do not know. If anything was, it was one of the following: the LSI HBA, the Adaptec HBA, the dual 10GB SFP NIC, or the GT710.