Hey guys,
I'm starting to lose my mind: I have a reset bug with an AMD 7900 XTX.
After a host restart it works perfectly once, but as soon as I reboot the Windows VM, the VM won't boot again.
I have tried ACS, blacklisting the amdgpu driver, Q35 vs. i440fx, dumping the ROM, disabling the D0 state, the AMD reset bug plugin from the store, and removing the device followed by a PCI rescan.
I'm out of ideas.
The VM log shows this (the same messages also appear on the first start after boot, when it works):
2023-11-19T01:47:15.893928Z qemu-system-x86_64: VFIO_MAP_DMA failed: Invalid argument
2023-11-19T01:47:15.893968Z qemu-system-x86_64: vfio_dma_map(0x14a7edc56c00, 0x380000000000, 0x10000000, 0x14a7dce00000) = -22 (Invalid argument)
2023-11-19T01:47:15.894085Z qemu-system-x86_64: VFIO_MAP_DMA failed: Invalid argument
2023-11-19T01:47:15.894090Z qemu-system-x86_64: vfio_dma_map(0x14a7edc56c00, 0x380010000000, 0x200000, 0x14a7dcc00000) = -22 (Invalid argument)
2023-11-19T01:47:15.901186Z qemu-system-x86_64: VFIO_MAP_DMA failed: Invalid argument
2023-11-19T01:47:15.901197Z qemu-system-x86_64: vfio_dma_map(0x14a7edc56c00, 0x380000000000, 0x10000000, 0x14a7dce00000) = -22 (Invalid argument)
2023-11-19T01:47:15.901336Z qemu-system-x86_64: VFIO_MAP_DMA failed: Invalid argument
2023-11-19T01:47:15.901341Z qemu-system-x86_64: vfio_dma_map(0x14a7edc56c00, 0x380010000000, 0x200000, 0x14a7dcc00000) = -22 (Invalid argument)
2023-11-19T01:47:15.907046Z qemu-system-x86_64: VFIO_MAP_DMA failed: Invalid argument
2023-11-19T01:47:15.907054Z qemu-system-x86_64: vfio_dma_map(0x14a7edc56c00, 0x380000000000, 0x10000000, 0x14a7dce00000) = -22 (Invalid argument)
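To make these lines easier to read, here is a minimal Python sketch that pulls the address and size out of a vfio_dma_map line. It assumes the arguments are (container, iova, size, vaddr); the helper names parse_dma_map and human_size are made up for illustration and are not part of QEMU.

```python
import re

# Matches the argument list and return code of a QEMU
# "vfio_dma_map(...) = rc" log line.
# Assumption: the arguments are (container, iova, size, vaddr).
DMA_MAP_RE = re.compile(
    r"vfio_dma_map\((0x[0-9a-f]+), (0x[0-9a-f]+), (0x[0-9a-f]+), (0x[0-9a-f]+)\)"
    r" = (-?\d+)"
)


def parse_dma_map(line):
    """Return iova, size and return code of one log line, or None if no match."""
    m = DMA_MAP_RE.search(line)
    if m is None:
        return None
    _container, iova, size, _vaddr, rc = m.groups()
    return {"iova": int(iova, 16), "size": int(size, 16), "rc": int(rc)}


def human_size(n):
    """Format a byte count using binary units (B, KiB, MiB, GiB)."""
    for unit in ("B", "KiB", "MiB", "GiB"):
        if n < 1024 or unit == "GiB":
            return f"{n:g} {unit}"
        n /= 1024


if __name__ == "__main__":
    line = ("2023-11-19T01:47:15.894090Z qemu-system-x86_64: "
            "vfio_dma_map(0x14a7edc56c00, 0x380010000000, 0x200000, "
            "0x14a7dcc00000) = -22 (Invalid argument)")
    entry = parse_dma_map(line)
    print(hex(entry["iova"]), human_size(entry["size"]), entry["rc"])
```

Read this way, the two failing sizes are 0x10000000 (256 MiB) and 0x200000 (2 MiB), which are the same sizes as BAR 0 and BAR 2 of 0000:03:00.0 in the dmesg output.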
There are no errors in dmesg. This is the output after I remove and rescan the GPU:
[185956.423209] pci 0000:03:00.0: Removing from iommu group 22
[185956.423323] pci 0000:03:00.1: Removing from iommu group 23
[185956.447322] pci 0000:03:00.0: [1002:744c] type 00 class 0x030000
[185956.447337] pci 0000:03:00.0: reg 0x10: [mem 0x6130000000-0x613fffffff 64bit pref]
[185956.447346] pci 0000:03:00.0: reg 0x18: [mem 0x6140000000-0x61401fffff 64bit pref]
[185956.447352] pci 0000:03:00.0: reg 0x20: [io 0x4000-0x40ff]
[185956.447358] pci 0000:03:00.0: reg 0x24: [mem 0x86f00000-0x86ffffff]
[185956.447364] pci 0000:03:00.0: reg 0x30: [mem 0x87000000-0x8701ffff pref]
[185956.447438] pci 0000:03:00.0: PME# supported from D1 D2 D3hot D3cold
[185956.447564] pci 0000:03:00.0: Adding to iommu group 22
[185956.447570] pci 0000:03:00.0: vgaarb: bridge control possible
[185956.447571] pci 0000:03:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[185956.447595] pci 0000:03:00.1: [1002:ab30] type 00 class 0x040300
[185956.447606] pci 0000:03:00.1: reg 0x10: [mem 0x87020000-0x87023fff]
[185956.447682] pci 0000:03:00.1: PME# supported from D1 D2 D3hot D3cold
[185956.447751] pci 0000:03:00.1: Adding to iommu group 23
[185956.471547] pci 0000:03:00.0: BAR 0: assigned [mem 0x6130000000-0x613fffffff 64bit pref]
[185956.471561] pci 0000:03:00.0: BAR 2: assigned [mem 0x6140000000-0x61401fffff 64bit pref]
[185956.471567] pci 0000:03:00.0: BAR 5: assigned [mem 0x86f00000-0x86ffffff]
[185956.471569] pci 0000:03:00.0: BAR 6: assigned [mem 0x87000000-0x8701ffff pref]
[185956.471570] pci 0000:03:00.1: BAR 0: assigned [mem 0x87020000-0x87023fff]
[185956.471572] pci 0000:03:00.0: BAR 4: assigned [io 0x4000-0x40ff]
[185956.471630] pci 0000:03:00.1: D0 power state depends on 0000:03:00.0
[185982.931308] vfio-pci 0000:03:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=none
[185982.958294] br0: port 6(vnet9) entered blocking state
[185982.958297] br0: port 6(vnet9) entered disabled state
[185982.958322] device vnet9 entered promiscuous mode
[185982.958393] br0: port 6(vnet9) entered blocking state
[185982.958394] br0: port 6(vnet9) entered forwarding state
[185999.148982] vfio-pci 0000:03:00.0: vfio_ecap_init: hiding ecap 0x19@0x270
[185999.148989] vfio-pci 0000:03:00.0: vfio_ecap_init: hiding ecap 0x1b@0x2d0
[185999.148992] vfio-pci 0000:03:00.0: vfio_ecap_init: hiding ecap 0x26@0x410
[185999.148993] vfio-pci 0000:03:00.0: vfio_ecap_init: hiding ecap 0x27@0x450
[186030.378009] br0: port 6(vnet9) entered disabled state
[186030.378189] device vnet9 left promiscuous mode
[186030.378191] br0: port 6(vnet9) entered disabled state
[186031.039678] vfio-pci 0000:03:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=none
No libvirt or vfio errors.
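For reference, the remove and rescan goes through the standard sysfs PCI files. This is only a rough Python sketch of it: the device addresses are taken from the dmesg output above, the function names plan_remove_rescan and apply_writes are made up for illustration, and it defaults to a dry run that only prints what would be written.

```python
from pathlib import Path

SYSFS_PCI = Path("/sys/bus/pci")


def plan_remove_rescan(functions):
    """Return the (path, value) sysfs writes for removing the given PCI
    functions and then rescanning the bus."""
    writes = [(SYSFS_PCI / "devices" / bdf / "remove", "1") for bdf in functions]
    writes.append((SYSFS_PCI / "rescan", "1"))
    return writes


def apply_writes(writes, dry_run=True):
    """Print the writes, or perform them when dry_run is False (needs root,
    and the VM should be stopped first)."""
    for path, value in writes:
        if dry_run:
            print(f"echo {value} > {path}")
        else:
            path.write_text(value)


if __name__ == "__main__":
    # GPU and its HDMI audio function, as listed in dmesg.
    apply_writes(plan_remove_rescan(["0000:03:00.0", "0000:03:00.1"]))
```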
What else should I check or try? I'm getting desperate, because I can't reboot the host every time.
If you have any ideas, please help me out.
Thanks