
ghost82

Members
  • Posts: 2,726
  • Joined
  • Last visited
  • Days Won: 19

Everything posted by ghost82

  1. I'm wondering whether it's a good idea to use only one kernel, and in particular one of the 5.19.x series. I'm seeing lots of kernel panics in users' logs, especially related to amd gpus passed through, because of the kernel, for example:

     Oct 7 12:40:42 Tower kernel: amdgpu 0000:0e:00.0: amdgpu: AGP: 267894784M 0x0000008400000000 - 0x0000FFFFFFFFFFFF
     Oct 7 12:40:42 Tower kernel: [drm] Detected VRAM RAM=8176M, BAR=256M
     Oct 7 12:40:42 Tower kernel: [drm] RAM width 256bits GDDR6
     Oct 7 12:40:42 Tower kernel: [drm] amdgpu: 8176M of VRAM memory ready
     Oct 7 12:40:42 Tower kernel: [drm] amdgpu: 32122M of GTT memory ready.
     Oct 7 12:40:42 Tower kernel: [drm] GART: num cpu pages 131072, num gpu pages 131072
     Oct 7 12:40:42 Tower kernel: [drm] PCIE GART of 512M enabled (table at 0x00000080001E8000).
     Oct 7 12:40:42 Tower kernel: amdgpu 0000:0e:00.0: amdgpu: PSP runtime database doesn't exist
     Oct 7 12:40:42 Tower kernel: amdgpu 0000:0e:00.0: amdgpu: PSP runtime database doesn't exist
     Oct 7 12:40:42 Tower kernel: [drm] Found VCN firmware Version ENC: 1.17 DEC: 5 VEP: 0 Revision: 2
     Oct 7 12:40:42 Tower kernel: amdgpu 0000:0e:00.0: amdgpu: Will use PSP to load VCN firmware
     Oct 7 12:40:44 Tower kernel: [drm] failed to load ucode SMC(0x2C)
     Oct 7 12:40:44 Tower kernel: [drm] psp gfx command LOAD_IP_FW(0x6) failed and response status is (0x0)
     Oct 7 12:40:44 Tower kernel: [drm:psp_load_smu_fw [amdgpu]] *ERROR* PSP load smu failed!
     Oct 7 12:40:44 Tower kernel: [drm:psp_v11_0_ring_destroy [amdgpu]] *ERROR* Fail to stop psp ring
     Oct 7 12:40:44 Tower kernel: [drm:psp_hw_init [amdgpu]] *ERROR* PSP firmware loading failed
     Oct 7 12:40:44 Tower kernel: [drm:amdgpu_device_fw_loading [amdgpu]] *ERROR* hw_init of IP block <psp> failed -22
     Oct 7 12:40:44 Tower kernel: amdgpu 0000:0e:00.0: amdgpu: amdgpu_device_ip_init failed
     Oct 7 12:40:44 Tower kernel: amdgpu 0000:0e:00.0: amdgpu: Fatal error during GPU init
     Oct 7 12:40:44 Tower kernel: amdgpu 0000:0e:00.0: amdgpu: amdgpu: finishing device.
     Oct 7 12:40:44 Tower kernel: amdgpu: probe of 0000:0e:00.0 failed with error -22
     Oct 7 12:40:44 Tower kernel: BUG: kernel NULL pointer dereference, address: 0000000000000090
     Oct 7 12:40:44 Tower kernel: #PF: supervisor write access in kernel mode
     Oct 7 12:40:44 Tower kernel: #PF: error_code(0x0002) - not-present page
     Oct 7 12:40:44 Tower kernel: PGD 0 P4D 0
     Oct 7 12:40:44 Tower kernel: Oops: 0002 [#1] PREEMPT SMP NOPTI
     Oct 7 12:40:44 Tower kernel: CPU: 12 PID: 7040 Comm: rpc-libvirtd Tainted: G W 5.19.14-Unraid #1
     Oct 7 12:40:44 Tower kernel: Hardware name: Gigabyte Technology Co., Ltd. X570 AORUS PRO/X570 AORUS PRO, BIOS F36d 07/20/2022
     Oct 7 12:40:44 Tower kernel: RIP: 0010:drm_sched_fini+0x42/0x90 [gpu_sched]
     Oct 7 12:40:44 Tower kernel: Code: e8 6c 74 fb e0 48 8d ab 98 00 00 00 4c 8d 63 f8 48 85 ed 74 29 48 89 ef e8 b5 f3 71 e1 48 8b 45 10 48 8d 55 10 48 39 d0 74 0c <c6> 80 90 00 00 00 01 48 8b 00 eb ef 48 89 ef e8 6d f4 71 e1 48 83
     Oct 7 12:40:44 Tower kernel: RSP: 0018:ffffc90003ac3ba0 EFLAGS: 00010217
     Oct 7 12:40:44 Tower kernel: RAX: 0000000000000000 RBX: ffff888f9ce29630 RCX: ffff88816dbb81c0
     Oct 7 12:40:44 Tower kernel: RDX: ffff888f9ce296d8 RSI: ffff88816dbb81e8 RDI: ffff888f9ce296c8
     Oct 7 12:40:44 Tower kernel: RBP: ffff888f9ce296c8 R08: ffff888f9cf5e700 R09: 0000000000400031
     Oct 7 12:40:44 Tower kernel: R10: ffff888f9cf5e700 R11: ffff88816c661520 R12: ffff888f9ce29628
     Oct 7 12:40:44 Tower kernel: R13: ffff888f9ce20000 R14: ffff88810145c370 R15: ffff888100f47a60
     Oct 7 12:40:44 Tower kernel: FS: 0000148f4929b6c0(0000) GS:ffff88900eb00000(0000) knlGS:0000000000000000
     Oct 7 12:40:44 Tower kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
     Oct 7 12:40:44 Tower kernel: CR2: 0000000000000090 CR3: 000000015dfca000 CR4: 0000000000350ee0
     Oct 7 12:40:44 Tower kernel: Call Trace:
     Oct 7 12:40:44 Tower kernel: <TASK>
     Oct 7 12:40:44 Tower kernel: amdgpu_fence_driver_sw_fini+0x35/0x7c [amdgpu]
     Oct 7 12:40:44 Tower kernel: amdgpu_device_fini_sw+0x27/0x2bf [amdgpu]
     Oct 7 12:40:44 Tower kernel: amdgpu_driver_release_kms+0x12/0x25 [amdgpu]
     Oct 7 12:40:44 Tower kernel: drm_dev_put+0x31/0x62 [drm]
     Oct 7 12:40:44 Tower kernel: release_nodes+0x3d/0x5c
     Oct 7 12:40:44 Tower kernel: devres_release_all+0x91/0xb9
     Oct 7 12:40:44 Tower kernel: device_unbind_cleanup+0xe/0x61
     Oct 7 12:40:44 Tower kernel: really_probe+0x268/0x273
     Oct 7 12:40:44 Tower kernel: __driver_probe_device+0x8d/0xbd
     Oct 7 12:40:44 Tower kernel: driver_probe_device+0x1f/0x77
     Oct 7 12:40:44 Tower kernel: __device_attach_driver+0x83/0x97
     Oct 7 12:40:44 Tower kernel: ? driver_allows_async_probing+0x58/0x58
     Oct 7 12:40:44 Tower kernel: bus_for_each_drv+0x82/0xad
     Oct 7 12:40:44 Tower kernel: __device_attach+0xb9/0x154
     Oct 7 12:40:44 Tower kernel: bus_rescan_devices_helper+0x3a/0x69
     Oct 7 12:40:44 Tower kernel: drivers_probe_store+0x34/0x50
     Oct 7 12:40:44 Tower kernel: kernfs_fop_write_iter+0x134/0x17f
     Oct 7 12:40:44 Tower kernel: new_sync_write+0x7c/0xbb
     Oct 7 12:40:44 Tower kernel: vfs_write+0xda/0x129
     Oct 7 12:40:44 Tower kernel: ksys_write+0x76/0xc2
     Oct 7 12:40:44 Tower kernel: do_syscall_64+0x68/0x81
     Oct 7 12:40:44 Tower kernel: entry_SYSCALL_64_after_hwframe+0x63/0xcd
     Oct 7 12:40:44 Tower kernel: RIP: 0033:0x148f4aa8542f
     Oct 7 12:40:44 Tower kernel: Code: 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 19 2d f8 ff 48 8b 54 24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 31 44 89 c7 48 89 44 24 08 e8 6c 2d f8 ff 48
     Oct 7 12:40:44 Tower kernel: RSP: 002b:0000148f4929a670 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
     Oct 7 12:40:44 Tower kernel: RAX: ffffffffffffffda RBX: 000000000000001c RCX: 0000148f4aa8542f
     Oct 7 12:40:44 Tower kernel: RDX: 000000000000000c RSI: 0000148f3807a4e0 RDI: 000000000000001c
     Oct 7 12:40:44 Tower kernel: RBP: 000000000000000c R08: 0000000000000000 R09: 0000148f3c001c00
     Oct 7 12:40:44 Tower kernel: R10: 0000000000000000 R11: 0000000000000293 R12: 0000148f3807a4e0
     Oct 7 12:40:44 Tower kernel: R13: 000000000000001c R14: 0000000000000000 R15: 0000148f4b2275c0
     Oct 7 12:40:44 Tower kernel: </TASK>
     Oct 7 12:40:44 Tower kernel: Modules linked in: wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libchacha cmac cifs asn1_decoder cifs_arc4 cifs_md4 dns_resolver xt_CHECKSUM ipt_REJECT nf_reject_ipv4 ip6table_mangle ip6table_nat iptable_mangle vhost_net tun vhost vhost_iotlb tap veth xt_nat xt_tcpudp xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xt_addrtype iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter hfsplus cdrom xfs nfsd auth_rpcgss oid_registry lockd grace sunrpc md_mod ip6table_filter ip6_tables iptable_filter ip_tables x_tables bridge stp llc bonding tls ixgbe xfrm_algo mdio igb ipv6 gigabyte_wmi wmi_bmof mxm_wmi edac_mce_amd edac_core amdgpu kvm_amd kvm gpu_sched drm_ttm_helper ttm drm_display_helper drm_kms_helper drm crct10dif_pclmul crc32_pclmul backlight agpgart syscopyarea crc32c_intel sysfillrect ghash_clmulni_intel aesni_intel crypto_simd cryptd rapl k10temp
     Oct 7 12:40:44 Tower kernel: i2c_piix4 ccp sysimgblt fb_sys_fops ahci libahci i2c_algo_bit sata_sil24 nvme i2c_core nvme_core thermal wmi tpm_crb tpm_tis tpm_tis_core tpm button acpi_cpufreq unix [last unloaded: xfrm_algo]
     Oct 7 12:40:44 Tower kernel: CR2: 0000000000000090
     Oct 7 12:40:44 Tower kernel: ---[ end trace 0000000000000000 ]---
     Oct 7 12:40:44 Tower kernel: RIP: 0010:drm_sched_fini+0x42/0x90 [gpu_sched]
     Oct 7 12:40:44 Tower kernel: Code: e8 6c 74 fb e0 48 8d ab 98 00 00 00 4c 8d 63 f8 48 85 ed 74 29 48 89 ef e8 b5 f3 71 e1 48 8b 45 10 48 8d 55 10 48 39 d0 74 0c <c6> 80 90 00 00 00 01 48 8b 00 eb ef 48 89 ef e8 6d f4 71 e1 48 83
     Oct 7 12:40:44 Tower kernel: RSP: 0018:ffffc90003ac3ba0 EFLAGS: 00010217
     Oct 7 12:40:44 Tower kernel: RAX: 0000000000000000 RBX: ffff888f9ce29630 RCX: ffff88816dbb81c0
     Oct 7 12:40:44 Tower kernel: RDX: ffff888f9ce296d8 RSI: ffff88816dbb81e8 RDI: ffff888f9ce296c8
     Oct 7 12:40:44 Tower kernel: RBP: ffff888f9ce296c8 R08: ffff888f9cf5e700 R09: 0000000000400031
     Oct 7 12:40:44 Tower kernel: R10: ffff888f9cf5e700 R11: ffff88816c661520 R12: ffff888f9ce29628
     Oct 7 12:40:44 Tower kernel: R13: ffff888f9ce20000 R14: ffff88810145c370 R15: ffff888100f47a60
     Oct 7 12:40:44 Tower kernel: FS: 0000148f4929b6c0(0000) GS:ffff88900eb00000(0000) knlGS:0000000000000000
     Oct 7 12:40:44 Tower kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
     Oct 7 12:40:44 Tower kernel: CR2: 0000000000000090 CR3: 000000015dfca000 CR4: 0000000000350ee0

     I was also experiencing some minor issues on other boxes with kernels 5.18.x and 5.19.x and decided to skip these kernels. All good with 5.15.x or 5.10.x; 5.15.x will be supported till the end of 2023, 5.10.x till the end of 2026. Is it too much effort to let users choose between 2 kernel versions? Maybe support the latest one and the 5.10.x LTS?
  2. Add a hostdev block: set the source address of the device you want to pass through and the target address of the passed-through device in the vm. Attach the device in the target guest to bus 0 for machine type i440fx; for machine type q35, attach the device to a bus other than 0 (bus 0 in q35 is like a "built-in device", which is not your case; for q35, if the bus is different from 0, check that you have a pcie-root-port with an index number equal to that of the target bus).

     For a q35 machine the block will be something like this:

     <hostdev mode='subsystem' type='pci' managed='yes'>
       <driver name='vfio'/>
       <source>
         <address domain='0x0000' bus='0x41' slot='0x00' function='0x0'/>
       </source>
       <address type='pci' domain='0x0000' bus='0x08' slot='0x00' function='0x0'/>
     </hostdev>

     1. Your source address is 41:00.0 (bus=41, slot=0, function=0)
     2. Target address is 08:00.0 (bus=8, slot=0, function=0)
     3. Check that a pcie-root-port with index=8 exists:

     <controller type='pci' index='8' model='pcie-root-port'>
       <model name='pcie-root-port'/>
       <target chassis='8' port='0xd' hotplug='off'/>
       <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x7'/>
     </controller>

     -----

     For i440fx the block will be something like this:

     <hostdev mode='subsystem' type='pci' managed='yes'>
       <driver name='vfio'/>
       <source>
         <address domain='0x0000' bus='0x41' slot='0x00' function='0x0'/>
       </source>
       <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
     </hostdev>

     1. Your source address is 41:00.0 (bus=41, slot=0, function=0), the same obviously
     2. Target address is 00:05.0 (bus=0, slot=5, function=0): i440fx has only bus 0

     If you get errors like "double address in use", or something similar, check that the target address is not already in use by something else; in that case change the bus number (for q35) or the slot number (for i440fx). You can double check both addresses from the shell, see the sketch below.
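     If you want to verify the addresses before editing, something like this should work from the unraid shell (the vm name "Windows 10" and the 41:00.0 address are just examples matching the blocks above, adapt them to your case):

     # Confirm the source address as the host sees it:
     lspci -nn -s 41:00.0

     # List the pci target addresses already assigned in the guest XML,
     # so you do not pick a bus/slot that is already taken:
     virsh dumpxml "Windows 10" | grep "address type='pci'"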
  3. Use the attached vbios for the Fighter version. That reddit post seems to describe a different issue; yours looks like the reset bug, but it shouldn't be, since I've never found a 6000 series gpu with that bug, and my concern is about the mb bios... Powercolor.RX6600XT.8192.210701_1.rom
  4. It is also possible to power down the whole server from the virtual machine, if that is of interest, with a qemu hook. This, for example:

     #!/bin/bash
     if [[ $1 == "Monterey" ]] && [[ $2 == "stopped" ]]
     then
       shutdown -h now
     fi

     Basically, when the virtual machine named Monterey is stopped (shut down), the whole server shuts down too with the command shutdown -h now. Obviously, to boot the server again you need physical access. Out of interest, you can also autostart a vm when the server boots. A sketch of the complete hook file is below.

     Consider a vm as a real pc, so yes, this is possible; depending on what you want, a vm may not even be strictly necessary, for example pihole is available as a plugin for unraid. Things to check are vt-x and vt-d support; vt-d is highly recommended too, to be able to pass hardware through to the vm. If I understand correctly, for this to be possible you need the keyboard/mouse and gpu passed through to the vm: it will then work like a desktop pc. If they are in another room and have another device in their hands (a client, another pc?), you can access the vm with remote desktop from the client and manage everything from there. Depending on the use, a gpu passed through to the vm is recommended for graphics acceleration even if the vm is accessed remotely. Good alternatives to remote desktop are moonlight, parsec, nomachine. It depends on how you configure the server; for them it could be the same as using a real pc; as I wrote, it is possible to autostart the vm on server boot and shut the server down on vm shutdown. If the vm is configured with keyboard/mouse and gpu passthrough and attached to a monitor, they will not notice they're running a virtual machine.
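     For reference, a minimal sketch of the complete hook, assuming a stock libvirt layout where the hook script lives at /etc/libvirt/hooks/qemu (path and vm name are assumptions, adapt to your setup):

     #!/bin/bash
     # /etc/libvirt/hooks/qemu (assumed stock libvirt hook path)
     # libvirt calls this script as: qemu <guest name> <operation> <sub-operation> ...
     GUEST="$1"
     OPERATION="$2"

     # When the vm named "Monterey" (hypothetical) reaches the "stopped"
     # state, power off the whole host as well.
     if [[ "$GUEST" == "Monterey" ]] && [[ "$OPERATION" == "stopped" ]]; then
         shutdown -h now
     fi

     Remember to make the hook executable (chmod +x /etc/libvirt/hooks/qemu); if the file is newly created you may need to restart libvirt once so it gets picked up.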
  5. You could also check whether there are beeps from the internal speaker, if you have one, and check the bios post code on the motherboard, that 2-digit red display: sometimes it can point in the right direction as to why it's happening.
  6. Here:

     <hostdev mode='subsystem' type='pci' managed='yes'>
       <driver name='vfio'/>
       <source>
         <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
       </source>
       <boot order='1'/>
       <alias name='hostdev2'/>
       <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
     </hostdev>
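     One thing worth knowing if you copy this block: libvirt refuses to mix per-device <boot order='1'/> elements with an os-level <boot dev='...'/> entry, so if the vm fails to define, remove the boot line under <os>. A quick way to check (vm name is just an example):

     # If this prints a <boot dev='...'/> line inside <os>, remove it before
     # using per-device <boot order='...'/> elements:
     virsh dumpxml "Windows 10" | grep -A6 "<os"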
  7. Do it the right way:

     1. Bind audio and video to vfio.
     2. Set the gpu as multifunction and pass a rom file. Your settings are wrong; change from this:

     <hostdev mode='subsystem' type='pci' managed='yes'>
       <driver name='vfio'/>
       <source>
         <address domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
       </source>
       <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x0' multifunction='on'/>
     </hostdev>
     <hostdev mode='subsystem' type='pci' managed='yes'>
       <driver name='vfio'/>
       <source>
         <address domain='0x0000' bus='0x03' slot='0x00' function='0x1'/>
       </source>
       <address type='pci' domain='0x0000' bus='0x06' slot='0x00' function='0x1'/>
     </hostdev>

     to this:

     <hostdev mode='subsystem' type='pci' managed='yes'>
       <driver name='vfio'/>
       <source>
         <address domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
       </source>
       <rom file='/path/to/Powercolor.RX6600XT.8192.210701.rom'/>
       <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x0' multifunction='on'/>
     </hostdev>
     <hostdev mode='subsystem' type='pci' managed='yes'>
       <driver name='vfio'/>
       <source>
         <address domain='0x0000' bus='0x03' slot='0x00' function='0x1'/>
       </source>
       <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x1'/>
     </hostdev>

     Replace /path/to/Powercolor.RX6600XT.8192.210701.rom with the correct path. Check that your rx6600xt is the Red Devil version (see the check below).
     3. Reboot and try.

     -----

     If you still get errors like:
     Refused to change power state from D0 to D3hot
     Unable to change power state from D3cold to D0
     you have something wrong in your bios configuration, or the bios is simply bugged: in that case I suggest checking the vfio group on reddit to see if anyone else has this issue with that motherboard, and contacting Gigabyte directly, since your motherboard is still supported and you run the latest F7 bios available. Powercolor.RX6600XT.8192.210701.rom
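     To confirm which board partner variant the card is, something like this should do it from the host shell; the Subsystem line usually identifies the exact model (the 03:00.0 address is taken from your XML, adapt if needed):

     # The "Subsystem:" line normally reveals the board partner variant
     # (e.g. the specific PowerColor model) of the card:
     lspci -vnn -s 03:00.0 | grep -i subsystem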
  8. Yes, it could be a private ticket which only you and the official support team can see.
  9. Can you check the link? Cannot find anything.
  10. Nothing more to advise, sorry; all I know is that my networks are available as soon as I start any vm: virtio, e1000, vmxnet3...
  11. Nice that we found the solution, never give up! Still a mystery why the benchmark apps overcome that limit... PS: no problem for the ordered ssd, one more ssd will find its use for sure
  12. Is it an hp z420? --> update: yes it is, you wrote it. If so, I'm reading that you should have in your bios some settings named "performance profile" or "power regulator settings": check if there is a dynamic setting there and switch it to static high performance, if that option exists. Check all the other power settings in the bios, especially those related to pcie (if any...). I'm reading that some hp servers have power saving modes in the bios that can throttle pcie devices. You can also check what the host kernel is doing with pcie link power management, see below.
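     A quick check from the unraid shell of whether pcie link power management (ASPM) is active, which can also throttle the link; the 05:00.0 address is only an example, use your gpu's host address:

     # Current ASPM policy of the kernel (e.g. default/performance/powersave):
     cat /sys/module/pcie_aspm/parameters/policy

     # Per-device link status; look for "ASPM Disabled" vs "ASPM L0s/L1 Enabled":
     lspci -vv -s 05:00.0 | grep -i aspm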
  13. I think the important thing is to make trim work for the virtual disk. virtio, and in particular virtio-blk, should support trim; try to add discard='unmap':

     <disk type='file' device='disk'>
       <driver name='qemu' type='raw' cache='writeback' discard='unmap'/>
       <source file='/mnt/user/domains/Windows 10/vdisk1.img' index='2'/>
       <backingStore/>
       <target dev='hdc' bus='virtio'/>
       <boot order='1'/>
       <alias name='virtio-disk2'/>
       <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
     </disk>

     Check the available size of the real disk in unraid, then copy a big file onto the virtual disk from inside the vm. Check the available size of the real disk again: it should have decreased. Delete the copied file from inside the vm. Check the available size of the real disk once more: if it increases again, trim is working correctly. This is what JorgeB pointed out in his reply. You can run the same check directly on the image file, see below.
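     The same thing can be checked on the sparse image file itself from the unraid shell; the path matches the XML above but is still just an example:

     # Allocated size vs apparent size: if discard/trim works, the allocated
     # size (first column of ls -ls) shrinks after deleting files inside the
     # guest, while the apparent size (file length) stays fixed.
     ls -lsh "/mnt/user/domains/Windows 10/vdisk1.img"
     du -h "/mnt/user/domains/Windows 10/vdisk1.img"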
  14. True, sorry: virtio is not compatible with rotation_rate
  15. My guess is that this shouldn't be related to vms... what about disabling bonding (bond0) and using only the br0 bridge?
  16. That makes things more difficult, because apparently the gpu works well. Try the advice given above, and read it all again because I edited my posts.
  17. This is my gpu, audio and video parts passed through:

     <hostdev mode='subsystem' type='pci' managed='yes'>
       <driver name='vfio'/>
       <source>
         <address domain='0x0000' bus='0x06' slot='0x00' function='0x0'/>
       </source>
       <rom file='/opt/gpu-bios/6900xt.rom'/>
       <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x0' multifunction='on'/>
     </hostdev>
     <hostdev mode='subsystem' type='pci' managed='yes'>
       <driver name='vfio'/>
       <source>
         <address domain='0x0000' bus='0x06' slot='0x00' function='0x1'/>
       </source>
       <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x1'/>
     </hostdev>

     Real hardware addresses (seen by the host) are:
     video at 06:00.0
     audio at 06:00.1
     --> add multifunction='on' in the address line, so that the addresses in the guest can be set to:
     video: 03:00.0
     audio: 03:00.1
     Same bus (03), same slot (00), different functions (0 and 1); see the check below.

     From what I'm reading this is normal if the gpu isn't under load. Are all the cables connected to the power port(s) of the gpu, and does your psu have enough power for everything? It should just exit or crash, not limit performance; I was talking about bad programming, if this is the case..
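     To see both functions of the card on the host at a glance (the 06:00 slot is my card's address, substitute yours):

     # Both functions of a multifunction device share bus and slot; this
     # lists everything at bus 06, slot 00 (video at .0 and audio at .1):
     lspci -nn -s 06:00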
  18. What about other software, like benchmark software or other games? Could it simply be that that simulator is not programmed well enough to run in a vm? As far as the vm settings go, I can only suggest to:
     1. Use the q35 machine type instead of i440fx, for better pcie compatibility.
     2. Put the gpu in a multifunction device: video and audio parts on the same bus, same slot, different functions.
     3. Check for irq conflicts inside the guest and use the msi fix to switch from irqs to msi(x) if there are conflicts (see the sketch after this list).
     4. Use ovmf instead of seabios (but you need to convert the disk or reinstall windows), so that the gpu will use the uefi vbios instead of the legacy one.
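     You can also check from the host side whether the guest has actually switched the card to msi(x); assuming the gpu sits at 06:00.0 on the host (an example address):

     # While the vm is running, check the MSI/MSI-X capability of the
     # physical gpu; "Enable+" means the guest is using message-signaled
     # interrupts instead of legacy irqs:
     lspci -vv -s 06:00.0 | grep -iE 'msi(-x)?:'

     # vfio's interrupt lines show up here too; msi entries confirm the same:
     grep vfio /proc/interrupts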
  19. I read that the fenvi t919, or better the BCM94360CD chipset, can be problematic with some smbios values, like the mac pro 7,1 you are using, the imac pro 1,1, or anything newer than the imac 15,1. The ideal smbios for this chipset should be that of the imac 15,1 (monterey not officially supported for this model). Someone reported that the smbios of the imac 17,1, which is officially supported by monterey, still works with that chipset.
  20. mmm... if the tv was on before starting the vm and it didn't make any difference, I'm not confident the dummy plug will change things. Did you try to force dgpu acceleration for that app/game in the windows settings? For example: https://pureinfotech.com/set-gpu-app-windows-10/
  21. I would say this, for sure! How many developers abandoned their projects because of continuous apple changes, made not for any real advantage for the end user, but just to lock their systems down further... you have no idea how many wireless dongles I changed in 5 years, just because they stopped working in minor os revisions. Want to buy apple? Just use it with its own software and take into account that apple software support ends very fast; apple's business is selling hardware, remember...
  22. Try to change to this:

     <disk type='file' device='disk'>
       <driver name='qemu' type='raw' cache='writeback'/>
       <source file='/mnt/user/domains/Windows 10/vdisk1.img' index='2'/>
       <backingStore/>
       <target dev='hdc' bus='virtio' rotation_rate='1'/>
       <boot order='1'/>
       <alias name='virtio-disk2'/>
       <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
     </disk>

     to force the disk to be seen as an ssd.
  23. This could be the issue: when the os is running headless, I think graphics acceleration will be disabled and everything will be loaded onto the cpu. If you have a monitor around, just plug it into the gpu output and turn it on, then stream via moonlight and see if there's any difference. If it makes a difference, you can think about buying a dummy hdmi plug (or whatever connection your gpu has) to simulate an attached monitor.
  24. Enable remote access to the mac os vm, just to be sure you only have a black screen while the guest os is actually booting; if it is booting, you need to modify the config.plist of opencore and add a boot-arg so the nvidia drivers will be used.