MrScopi Posted July 27, 2022
I am struggling to fix an error that came up recently. I noticed that I was receiving BTRFS errors on my cache drive, and eventually figured out that they only appear after I start my Win10 VM. According to the syslog, the "SATA link down" errors start immediately after the VM boots, and eventually the drive is assigned a new letter (sdh > sdl) and BTRFS starts to panic. I've attached the syslog and diagnostics. Thanks for any help in solving this, I appreciate it.
tower-diagnostics-20220727-1453.zip tower-syslog-20220727-1946.zip
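For anyone wanting to check the same pattern in their own syslog, here is a minimal Python sketch that counts "SATA link down" messages per ATA port. It assumes the kernel logs them in the usual libata form (`ataN: SATA link down ...`); the function name and the sample lines are made up for illustration and are not taken from the attached syslog.

```python
import re
from collections import Counter

# Assumed message shape: "... kernel: ata5: SATA link down (SStatus 0 SControl 300)"
LINK_DOWN = re.compile(r"\b(ata\d+): SATA link down")

def link_down_counts(lines):
    """Count 'SATA link down' messages per ATA port (ata1, ata2, ...)."""
    counts = Counter()
    for line in lines:
        match = LINK_DOWN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

# Hypothetical sample lines, only to show the expected input format.
sample_lines = [
    "Jul 27 19:46:10 Tower kernel: ata5: SATA link down (SStatus 0 SControl 300)",
    "Jul 27 19:46:12 Tower kernel: ata5: hard resetting link",
    "Jul 27 19:46:20 Tower kernel: ata5: SATA link down (SStatus 0 SControl 300)",
    "Jul 27 19:47:01 Tower kernel: ata3: SATA link down (SStatus 0 SControl 300)",
]

counts = link_down_counts(sample_lines)
for port, n in counts.most_common():
    print(port, n)
```

To run it against a real log, pass an open file instead of the sample list, for example `link_down_counts(open("syslog.txt", errors="replace"))`. If only one port shows up, that points at a single drive, cable, or power lead rather than the whole controller.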
MrScopi (Author) Posted July 27, 2022
<?xml version='1.0' encoding='UTF-8'?>
<domain type='kvm'>
  <name>Windows 10</name>
  <uuid>a3334b0f-b3f9-ab09-4cf5-543986bafac6</uuid>
  <metadata>
    <vmtemplate xmlns="unraid" name="Windows 10" icon="windows.png" os="windows10"/>
  </metadata>
  <memory unit='KiB'>17301504</memory>
  <currentMemory unit='KiB'>8388608</currentMemory>
  <memoryBacking>
    <nosharepages/>
  </memoryBacking>
  <vcpu placement='static'>8</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='4'/>
    <vcpupin vcpu='1' cpuset='16'/>
    <vcpupin vcpu='2' cpuset='5'/>
    <vcpupin vcpu='3' cpuset='17'/>
    <vcpupin vcpu='4' cpuset='6'/>
    <vcpupin vcpu='5' cpuset='18'/>
    <vcpupin vcpu='6' cpuset='7'/>
    <vcpupin vcpu='7' cpuset='19'/>
  </cputune>
  <os>
    <type arch='x86_64' machine='pc-i440fx-5.1'>hvm</type>
    <loader readonly='yes' type='pflash'>/usr/share/qemu/ovmf-x64/OVMF_CODE-pure-efi.fd</loader>
    <nvram>/etc/libvirt/qemu/nvram/a3334b0f-b3f9-ab09-4cf5-543986bafac6_VARS-pure-efi.fd</nvram>
  </os>
  <features>
    <acpi/>
    <apic/>
    <hyperv mode='custom'>
      <relaxed state='on'/>
      <vapic state='on'/>
      <spinlocks state='on' retries='8191'/>
      <vendor_id state='on' value='none'/>
    </hyperv>
    <kvm>
      <hidden state='on'/>
    </kvm>
  </features>
  <cpu mode='host-passthrough' check='none' migratable='on'>
    <topology sockets='1' dies='1' cores='4' threads='2'/>
    <cache mode='passthrough'/>
    <feature policy='require' name='topoext'/>
  </cpu>
  <clock offset='localtime'>
    <timer name='hypervclock' present='yes'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/local/sbin/qemu</emulator>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='writeback'/>
      <source dev='/dev/disk/by-id/ata-Samsung_SSD_850_PRO_1TB_S3D2NX0J900251Y'/>
      <target dev='hdc' bus='sata'/>
      <boot order='1'/>
      <address type='drive' controller='0' bus='0' target='0' unit='2'/>
    </disk>
    <controller type='usb' index='0' model='qemu-xhci' ports='15'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </controller>
    <controller type='virtio-serial' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </controller>
    <controller type='pci' index='0' model='pci-root'/>
    <controller type='sata' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </controller>
    <interface type='bridge'>
      <mac address='52:54:00:c0:0c:52'/>
      <source bridge='br0'/>
      <model type='virtio-net'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </interface>
    <serial type='pty'>
      <target type='isa-serial' port='0'>
        <model name='isa-serial'/>
      </target>
    </serial>
    <console type='pty'>
      <target type='serial' port='0'/>
    </console>
    <channel type='unix'>
      <target type='virtio' name='org.qemu.guest_agent.0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <input type='tablet' bus='usb'>
      <address type='usb' bus='0' port='3'/>
    </input>
    <input type='mouse' bus='ps2'/>
    <input type='keyboard' bus='ps2'/>
    <audio id='1' type='none'/>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x0a' slot='0x00' function='0x0'/>
      </source>
      <rom file='/mnt/user/isos/NVIDIA.RTX3070Ti.8192.210425.trimmed.rom'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x0a' slot='0x00' function='0x1'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x07' slot='0x00' function='0x0'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x07' slot='0x00' function='0x1'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x09' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x07' slot='0x00' function='0x3'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x0a' function='0x0'/>
    </hostdev>
    <memballoon model='none'/>
  </devices>
</domain>
This is the XML of the VM in question.
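When a VM start coincides with a host drive dropping out, it can help to list exactly which host devices the VM claims. The following is a rough Python sketch, not anything from the thread, that pulls the passed-through PCI addresses and block disks out of a libvirt domain XML using the standard library. The function name is hypothetical, and the embedded XML is a trimmed fragment of the definition posted above.

```python
import xml.etree.ElementTree as ET

def passthrough_devices(domain_xml):
    """Return (pci_addresses, block_disks) that a libvirt domain takes from the host."""
    root = ET.fromstring(domain_xml)
    pci = []
    for hostdev in root.findall("./devices/hostdev[@type='pci']"):
        addr = hostdev.find("./source/address")
        bus = int(addr.get("bus"), 16)        # attribute values look like '0x0a'
        slot = int(addr.get("slot"), 16)
        func = int(addr.get("function"), 16)
        pci.append(f"{bus:02x}:{slot:02x}.{func:x}")  # bus:slot.function, as lspci prints it
    disks = [
        src.get("dev")
        for src in root.findall("./devices/disk[@type='block']/source")
    ]
    return pci, disks

# Trimmed fragment of the VM definition from the post above.
sample_xml = """
<domain type='kvm'>
  <devices>
    <disk type='block' device='disk'>
      <source dev='/dev/disk/by-id/ata-Samsung_SSD_850_PRO_1TB_S3D2NX0J900251Y'/>
    </disk>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source><address domain='0x0000' bus='0x0a' slot='0x00' function='0x0'/></source>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source><address domain='0x0000' bus='0x07' slot='0x00' function='0x3'/></source>
    </hostdev>
  </devices>
</domain>
"""

pci_addresses, block_disks = passthrough_devices(sample_xml)
print(pci_addresses)
print(block_disks)
```

The addresses come out in the same bus:slot.function form that `lspci` uses on the host, so each one can be looked up there to confirm that none of the passed-through devices is the SATA controller the cache drive hangs off.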
JorgeB Posted July 28, 2022
Very strange, since the cache device is on a controller with other devices and those are not affected. Do you have any other free SATA port you could use for the cache device?
MrScopi (Author) Posted July 28, 2022
Yeah, that's what I couldn't figure out either. And it was very tied to starting the VM. I do have other ports, so I'll try swapping those around tonight.
MrScopi (Author) Posted July 29, 2022
Is there an easy way to reset the KVM settings to default without nuking all my VM setups? I recently migrated from a dual Xeon Supermicro board to the current AMD setup, and I'm wondering if there are some vestigial settings that I may be fighting. Changing ports and cables did not affect the error. It also occurs when I boot with the GPU passthrough disabled. I have another VM (Ubuntu server) that runs normally. One difference between them is that my Windows VM runs on a disk passed through via Unassigned Devices, while the Ubuntu VM does not. Thanks again for the help.
JorgeB Posted July 29, 2022
You can try creating a new VM but pointing to the existing vdisk.
MrScopi (Author) Posted July 29, 2022
7 hours ago, JorgeB said: You can try creating a new VM but pointing to the existing vdisk.
Tried this, and the error still occurred.
JorgeB Posted July 29, 2022
Can't think of what the problem could be.
MrScopi (Author) Posted July 29, 2022
Thanks for taking a look, I'll continue trying to solve this.
MrScopi (Author) Posted July 30, 2022
Manjaro VM using passthrough: works fine
Fresh install Windows VM using VM image: works fine
Fresh install Windows VM using passthrough: TBD
Signs are pointing to *something* about the install itself that is messed up and grabbing that drive.
MrScopi (Author) Posted August 9, 2022
OK, so a fresh NVMe install of Win10 is able to run without triggering the cache errors. However, a few days later the disconnects keep happening, and btrfs gets corrupted (thank god for appdata backups...). Is it possibly power/PSU related? I had 10 SATA drives, and a few were powered by splitters. Perhaps only btrfs was sensitive enough to have issues with the SATA link cutting out intermittently? I've moved some files around and bought a new NVMe drive, so I now have an NVMe cache and an NVMe Windows install. Hopefully that resolves this issue. Leaving the update for whoever stumbles across this thread.
JorgeB Posted August 9, 2022
5 hours ago, MrScopi said: However, a few days later the disconnects keep happening
Do you mean the NVMe device drops offline? That would be a different issue.
MrScopi (Author) Posted August 9, 2022
No, the disconnects are on SATA connections, not NVMe.
JorgeB Posted August 9, 2022
Still rather strange. Are you sure there are no disconnects without the VM running?
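One way to answer that question from the logs is to check whether any "SATA link down" message falls outside the periods when the VM was running. The Python sketch below is only an illustration: it assumes the traditional syslog timestamp at the start of each line (which carries no year, so the year is passed in), and the function names, sample lines, and VM time window are all hypothetical.

```python
from datetime import datetime

def parse_stamp(line, year):
    """Parse the leading 'Mon DD HH:MM:SS' syslog timestamp (first 15 characters)."""
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")

def link_downs_outside(lines, vm_windows, year):
    """Return timestamps of 'SATA link down' lines that fall outside every VM window."""
    outside = []
    for line in lines:
        if "SATA link down" not in line:
            continue
        when = parse_stamp(line, year)
        if not any(start <= when <= end for start, end in vm_windows):
            outside.append(when)
    return outside

# Hypothetical data: one disconnect before the VM started, one while it was running.
sample_log = [
    "Jul 27 19:30:05 Tower kernel: ata5: SATA link down (SStatus 0 SControl 300)",
    "Jul 27 19:46:10 Tower kernel: ata5: SATA link down (SStatus 0 SControl 300)",
]
vm_running = [(datetime(2022, 7, 27, 19, 45, 0), datetime(2022, 7, 27, 20, 30, 0))]

outside = link_downs_outside(sample_log, vm_running, year=2022)
print(outside)
```

An empty result would support the idea that the disconnects only happen while the VM is up; any entries would suggest the drive also drops on its own, which fits a cabling or power problem better than a VM configuration issue.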
MrScopi (Author) Posted August 9, 2022
I think I wasn't clear in my writing, so I'll try to list the order here:
1. Fresh install of Win10 on NVMe (old install was on SATA SSD)
2. No immediate disconnects
3. Three or so days later, I see my Docker apps are malfunctioning, and syslog shows SATA disconnects of the SATA cache drive
4. I now have Win10 on an NVMe and cache on an NVMe, and physically disconnected the SATA drive
My current hypothesis for the original problem is that booting up Win10 (which was then on a passed-through SATA SSD) drew enough SATA power to cause disconnects on the SATA cache drive. That didn't bother the xfs array, but it did bother the btrfs cache drive. I now have one fewer SATA drive plugged in, and have moved to NVMe for cache and VM. We'll see if stability improves.
MrScopi (Author) Posted August 11, 2022
Final planned update. Running the original VM on the original SATA drive, but now with an NVMe cache, and there are no errors. I'm chalking it up to SATA power being overtaxed. Guess I won't be adding more drives any time soon! Thanks for the help Jorge, I appreciate the insights provided here.