Astryl Posted November 17, 2021
AMD Threadripper with 590X GPU passthrough. The VM was stable for well over a year. Now, suddenly, when I start it I can no longer get into Windows. It either boot loops at the "performing recovery" screen or freezes while loading on the TianoCore splash screen.
Attempted fixes:
- Four different vBIOS ROMs.
- New VM XML.
- New vDisk / fresh VM. This worked temporarily, until Windows loaded and immediately bricked at a black screen.
- Bound and unbound the GPU at vfio.
I'm at my wits' end. This previously worked perfectly without issue.
orbital-diagnostics-20211117-1802.zip
Edited November 19, 2021 by Astryl
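One of the attempted fixes was binding the GPU at vfio. A quick way to confirm which host driver each function actually ended up on is to read the `driver` symlink under sysfs. The sketch below is a minimal example under that assumption; `bound_driver` is a hypothetical helper, and the two addresses are the ones passed through later in this thread's QEMU command line.

```python
import os
from pathlib import Path
from typing import Optional


def bound_driver(pci_addr: str, sysfs_root: str = "/sys/bus/pci/devices") -> Optional[str]:
    """Return the name of the kernel driver bound to pci_addr, or None if unbound.

    Hypothetical helper: relies on the standard sysfs layout, where
    /sys/bus/pci/devices/<addr>/driver is a symlink to the driver's directory.
    """
    link = Path(sysfs_root) / pci_addr / "driver"
    if not link.is_symlink():
        # Device missing, or present but not bound to any driver.
        return None
    # The last path component of the symlink target is the driver name.
    return Path(os.readlink(link)).name


if __name__ == "__main__":
    # GPU functions passed through to this VM (taken from its QEMU command line).
    for addr in ("0000:4a:00.0", "0000:4a:00.1"):
        drv = bound_driver(addr)
        status = "OK" if drv == "vfio-pci" else "not bound to vfio-pci"
        print(f"{addr}: {drv or 'no driver'} ({status})")
```

For passthrough, both functions would normally be expected to show `vfio-pci` before the VM starts.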
Astryl (author) Posted November 18, 2021
Nov 18 02:07:01 ORBITAL kernel: Plex Media Serv[40895]: segfault at 14e338ecb018 ip 000014e33d56caa3 sp 000014e33736d3c0 error 4 in Plex Media Server[14e33ccc5000+bae000]
Nov 18 02:07:01 ORBITAL kernel: Code: 8b 45 08 49 8b 4d 20 48 89 ca 48 09 c2 0f 84 1a 02 00 00 48 39 c8 75 0e 49 8b 4d 18 49 3b 4d 30 0f 84 07 02 00 00 49 8b 4d 00 <83> 79 08 ff 0f 84 a5 01 00 00 41 8b 4e 10 83 f9 01 75 05 41 8b 0e
Nov 18 02:07:09 ORBITAL kernel: BTRFS error (device sdc1): bad tree block start, want 1228455936 have 0
Nov 18 02:07:09 ORBITAL kernel: BTRFS error (device sdc1): bad tree block start, want 1228455936 have 0
Nov 18 02:07:09 ORBITAL kernel: BTRFS error (device sdc1): bad tree block start, want 1228455936 have 0
Nov 18 02:07:09 ORBITAL kernel: BTRFS error (device sdc1): bad tree block start, want 1228455936 have 0
Nov 18 02:07:09 ORBITAL kernel: BTRFS error (device sdc1): bad tree block start, want 1228455936 have 0
Nov 18 02:07:09 ORBITAL kernel: BTRFS error (device sdc1): bad tree block start, want 1228455936 have 0
Nov 18 02:07:09 ORBITAL kernel: BTRFS error (device sdc1): bad tree block start, want 1228455936 have 0
Nov 18 02:07:09 ORBITAL kernel: BTRFS error (device sdc1): bad tree block start, want 1228455936 have 0
Nov 18 02:07:09 ORBITAL kernel: BTRFS error (device sdc1): bad tree block start, want 1228455936 have 0
Nov 18 02:07:09 ORBITAL kernel: BTRFS error (device sdc1): bad tree block start, want 1228455936 have 0
Nov 18 02:07:09 ORBITAL kernel: BTRFS info (device sdc1): failed to delete reference to Plex Media Server.4.log, inode 53909236 parent 268
Nov 18 02:07:09 ORBITAL kernel: BTRFS: error (device sdc1) in __btrfs_unlink_inode:4034: errno=-5 IO failure
Nov 18 02:07:09 ORBITAL kernel: BTRFS info (device sdc1): forced readonly
Nov 18 02:07:09 ORBITAL kernel: BTRFS: error (device sdc1) in btrfs_rename:9598: errno=-5 IO failure
Uh... is my cache pool dying? Both drives report good health...
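When a kernel log like the one above scrolls past, it can help to tally BTRFS errors per device and check whether the filesystem was forced read-only. This is a minimal sketch, assuming the syslog has been saved to a plain text file; the helper names are hypothetical and the regex only covers the two error formats seen in this excerpt. It summarizes the log; it does not tell you whether the cause is failing hardware or filesystem corruption.

```python
import re
import sys
from collections import Counter

# Matches kernel lines in the two formats seen in this log, e.g.
#   "BTRFS error (device sdc1): bad tree block start, ..."
#   "BTRFS: error (device sdc1) in btrfs_rename:9598: errno=-5 IO failure"
BTRFS_ERR = re.compile(r"BTRFS:? error \(device (?P<dev>[^)]+)\)")


def count_btrfs_errors(log_text: str) -> Counter:
    """Count BTRFS error lines per block device in a syslog excerpt."""
    counts = Counter()
    for line in log_text.splitlines():
        m = BTRFS_ERR.search(line)
        if m:
            counts[m.group("dev")] += 1
    return counts


def was_forced_readonly(log_text: str, device: str) -> bool:
    """True if the log shows BTRFS forcing `device` read-only."""
    return f"BTRFS info (device {device}): forced readonly" in log_text


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python btrfs_summary.py /path/to/saved-syslog.txt (hypothetical file name)
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        text = f.read()
    for dev, n in count_btrfs_errors(text).most_common():
        ro = " (forced read-only)" if was_forced_readonly(text, dev) else ""
        print(f"{dev}: {n} error line(s){ro}")
```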
JorgeB Posted November 18, 2021
24 minutes ago, Astryl said: Is my cache pool dying?
Looks more like filesystem corruption, but those errors are not in the posted diags.
Astryl (author) Posted November 18, 2021
1 hour ago, JorgeB said: Looks more like filesystem corruption, but those errors are not in the posted diags.
Yeah, looking back at it, this is from today and may be related to the UD (Unassigned Devices) SSD that holds my Plex media metadata, not to the other issue.
Short of moving my GPU to a different slot, does anyone have any insight here? Last night I tried the following additional steps:
- Changed the PCIe override setting from "Both" to "Multifunction".
- Added "video=efifb:off" to my syslinux.cfg.
It still freezes at the same place on both the new and the old vDisk.
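Adding `video=efifb:off` means putting it on the kernel `append` line(s) of syslinux.cfg, which on Unraid typically lives at `/boot/syslinux/syslinux.cfg` (an assumption; check your own flash drive). A minimal sketch of doing that idempotently in Python, with a hypothetical helper name; it only touches lines that start with `append ` and skips any that already carry the parameter:

```python
PARAM = "video=efifb:off"


def add_kernel_param(cfg_text: str, param: str = PARAM) -> str:
    """Append `param` to every syslinux 'append' line that does not already have it.

    Hypothetical helper for illustration; preserves indentation and line endings.
    """
    out = []
    for line in cfg_text.splitlines(keepends=True):
        stripped = line.lstrip()
        if stripped.startswith("append ") and param not in stripped.split():
            body = line.rstrip("\r\n")
            newline = line[len(body):]
            # Kernel parameter order does not matter here, so add it at the end.
            line = f"{body} {param}{newline}"
        out.append(line)
    return "".join(out)
```

Running it twice yields the same text, so it is safe to re-apply. Back up syslinux.cfg before writing the result over it, since a broken boot config can leave the server unbootable.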
Astryl (author) Posted November 18, 2021
-m 65536 \
-object '{"qom-type":"memory-backend-ram","id":"pc.ram","size":68719476736}' \
-overcommit mem-lock=off \
-smp 1,sockets=1,dies=1,cores=1,threads=1 \
-uuid a1b3e671-4ac9-86b0-dfd8-b927bd0d0dc2 \
-display none \
-no-user-config \
-nodefaults \
-chardev socket,id=charmonitor,fd=33,server=on,wait=off \
-mon chardev=charmonitor,id=monitor,mode=control \
-rtc base=localtime \
-no-hpet \
-no-shutdown \
-boot strict=on \
-device pcie-root-port,port=0x8,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x1 \
-device pcie-root-port,port=0x9,chassis=2,id=pci.2,bus=pcie.0,addr=0x1.0x1 \
-device pcie-root-port,port=0xa,chassis=3,id=pci.3,bus=pcie.0,addr=0x1.0x2 \
-device pcie-root-port,port=0xb,chassis=4,id=pci.4,bus=pcie.0,addr=0x1.0x3 \
-device pcie-root-port,port=0xc,chassis=5,id=pci.5,bus=pcie.0,addr=0x1.0x4 \
-device nec-usb-xhci,p2=15,p3=15,id=usb,bus=pcie.0,addr=0x7 \
-device virtio-serial-pci,id=virtio-serial0,bus=pci.2,addr=0x0 \
-blockdev '{"driver":"file","filename":"/mnt/user/domains/Arcana/vdisk1.img","node-name":"libvirt-1-storage","cache":{"direct":false,"no-flush":false},"auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-1-format","read-only":false,"cache":{"direct":false,"no-flush":false},"driver":"raw","file":"libvirt-1-storage"}' \
-device ide-hd,bus=ide.2,drive=libvirt-1-format,id=sata0-0-2,bootindex=1,write-cache=on \
-netdev tap,fd=35,id=hostnet0 \
-device virtio-net,netdev=hostnet0,id=net0,mac=52:54:00:78:9b:47,bus=pci.1,addr=0x0 \
-chardev pty,id=charserial0 \
-device isa-serial,chardev=charserial0,id=serial0 \
-chardev socket,id=charchannel0,fd=36,server=on,wait=off \
-device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 \
-audiodev id=audio1,driver=none \
-device vfio-pci,host=0000:4a:00.0,id=hostdev0,bus=pci.3,addr=0x0 \
-device vfio-pci,host=0000:4a:00.1,id=hostdev1,bus=pci.4,addr=0x0 \
-device usb-host,hostdevice=/dev/bus/usb/003/003,id=hostdev2,bus=usb.0,port=1 \
-device usb-host,hostdevice=/dev/bus/usb/007/002,id=hostdev3,bus=usb.0,port=2 \
-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
-msg timestamp=on
char device redirected to /dev/pts/3 (label charserial0)
2021-11-18T20:12:23.537143Z qemu-system-x86_64: vfio: Cannot reset device 0000:4a:00.1, no available reset mechanism.
2021-11-18T20:12:23.542112Z qemu-system-x86_64: vfio: Cannot reset device 0000:4a:00.1, no available reset mechanism.
I played around with it some more. It seems the GPU isn't resetting, even though I already have the AMD reset app installed from CA. I'm not sure what changed in the last 72 hours to suddenly make this an issue.
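The QEMU log complains that 0000:4a:00.1, the second function passed through alongside the GPU, has "no available reset mechanism". On the host you can check what the kernel itself reports for each function: the sysfs `reset` attribute is only created when the kernel has a reset method for the device, and newer kernels (5.15 and later) also list the methods in `reset_method`. A minimal sketch with a hypothetical helper; it reflects only what the kernel exposes and may not show resets provided by add-on modules.

```python
from pathlib import Path


def reset_info(pci_addr: str, sysfs_root: str = "/sys/bus/pci/devices") -> dict:
    """Report what sysfs exposes about resetting one PCI function (hypothetical helper)."""
    dev = Path(sysfs_root) / pci_addr
    info = {
        # The 'reset' attribute only exists when the kernel has a reset method.
        "has_reset_attr": (dev / "reset").exists(),
        # Filled in only on kernels that expose 'reset_method' (5.15+).
        "reset_methods": None,
    }
    methods = dev / "reset_method"
    if methods.exists():
        info["reset_methods"] = methods.read_text().split()
    return info


if __name__ == "__main__":
    # The two functions passed through in this VM's QEMU command line.
    for addr in ("0000:4a:00.0", "0000:4a:00.1"):
        print(addr, reset_info(addr))
```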
Astryl (author) Posted November 18, 2021
Updating the VM's machine type to Q35-5.1 fixed this issue.
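In libvirt's domain XML, the machine type is the `machine` attribute on `<os><type>`, and QEMU's Q35 5.1 machine is named `pc-q35-5.1`. The usual route is the Unraid VM editor or `virsh edit`; the sketch below just shows which attribute changes, using a hypothetical helper. Note that ElementTree drops XML comments and may rewrite namespace prefixes (for example in Unraid's `<metadata>` block), so review its output before using it on a real domain definition.

```python
import xml.etree.ElementTree as ET


def set_machine_type(domain_xml: str, machine: str) -> str:
    """Return domain XML with <os><type machine="..."> set to `machine` (hypothetical helper)."""
    root = ET.fromstring(domain_xml)
    os_type = root.find("./os/type")
    if os_type is None:
        raise ValueError("domain XML has no <os><type> element")
    os_type.set("machine", machine)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Illustrative input; the previous machine type here is an assumption.
    xml = ("<domain type='kvm'><name>Arcana</name>"
           "<os><type arch='x86_64' machine='pc-q35-4.2'>hvm</type></os></domain>")
    print(set_machine_type(xml, "pc-q35-5.1"))
```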