December 31, 2017

Hi everyone, I'm running into an issue and wanted to share it, mainly to contribute to the 6.4 QA, but if anyone has suggestions, that'd be nice too.

Short Description

I upgraded from rc10b to rc18f this morning and can no longer run nested virtualization.

Long Description

I have specific lab requirements that depend on nested virtualization (Cisco VIRL, if anyone is interested). I run several VMs on UNRAID but only use nested virtualization on one of them (the others appear to be operating fine after the upgrade). The guest hypervisor OS is Ubuntu 14.04 LTS (3.19.0-74-generic) running KVM/QEMU version 2.2.0. The only passthrough from the UNRAID system is CPU host-passthrough; I pin and isolate this guest's vCPUs from the others. The physical hardware is an AMD Threadripper 1950X on an ASUS Zenith Extreme X399 board (if the rest of the peripherals are relevant, let me know).

After upgrading from rc10b to rc18f, the guest hypervisor consistently crashes whenever a nested guest tries to start. Unfortunately, I don't know which release between rc10b and rc18f introduced the issue, since I jumped straight from 10b to 18f. On UNRAID, the logs in /var/log/libvirt/* and /var/log/syslog weren't much help.
Logs on the guest hypervisor provide some information, but not enough for me to identify the root cause or a fix. Here are logs from two of the crashes. Some lines were cut short (the guest hypervisor's name is 'virl'):

Dec 31 10:23:28 virl kernel: [ 398.872252] audit_printk_skb: 135 callbacks suppressed
Dec 31 10:23:28 virl kernel: [ 398.872255] audit: type=1400 audit(1514737408.461:76): apparmor="STATUS" operation="profile_load" profile="unconfined" name="libvirt
Dec 31 10:23:28 virl kernel: [ 398.872401] audit: type=1400 audit(1514737408.461:77): apparmor="STATUS" operation="profile_load" profile="unconfined" name="qemu_br
Dec 31 10:23:32 virl kernel: [ 402.671699] BUG: unable to handle kernel paging request at ffff9008bf81eea0
Dec 31 10:23:32 virl kernel: [ 402.671703] IP: [<ffffffff811a7542>] handle_mm_fault+0x132/0x10e0
Dec 31 10:23:32 virl kernel: [ 402.671709] PGD 0
Dec 31 10:23:32 virl kernel: [ 402.671710] Oops: 0000 [#1] SMP
Dec 31 10:23:32 virl kernel: [ 402.671712] Modules linked in: xt_REDIRECT nf_nat_redirect xt_mark vxlan ip6_udp_tunnel udp_tunnel xt_comment iptable_raw xt_CHECKSU
Dec 31 10:23:32 virl kernel: [ 402.671739] CPU: 15 PID: 3044 Comm: kvm.real Tainted: G OE 3.19.0-74-generic #82~14.04.1-Ubuntu
Dec 31 10:23:32 virl kernel: [ 402.671741] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014

Here's another sample (I captured more in my console this time):

Dec 31 10:34:48 virl kernel: [ 42.351596] init: plymouth-stop pre-start process (17226) terminated with status 1
Dec 31 10:36:42 virl kernel: [ 156.341944] audit_printk_skb: 135 callbacks suppressed
Dec 31 10:36:42 virl kernel: [ 156.341947] audit: type=1400 audit(1514738202.678:69): apparmor="STATUS" operation="profile_load" profile="unconfined" name="libvir
Dec 31 10:36:42 virl kernel: [ 156.342073] audit: type=1400 audit(1514738202.678:70): apparmor="STATUS" operation="profile_load" profile="unconfined" name="qemu_b
Dec 31 10:36:46 virl kernel: [ 159.996795] BUG: Bad page map in process kvm.real pte:ffff8808c986c429 pmd:8c986c067
Dec 31 10:36:46 virl kernel: [ 159.996799] addr:0000564a43abf080 vm_flags:08100073 anon_vma:ffff8800bba5e870 mapping: (null) index:564a43abf
Dec 31 10:36:46 virl kernel: [ 159.996803] CPU: 3 PID: 1917 Comm: kvm.real Tainted: G OE 3.19.0-74-generic #82~14.04.1-Ubuntu
Dec 31 10:36:46 virl kernel: [ 159.996804] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
Dec 31 10:36:46 virl kernel: [ 159.996805] 0000000000000000 ffff8808e96c7ce8 ffffffff817b61b3 0000564a43abf080
Dec 31 10:36:46 virl kernel: [ 159.996807] ffff880a0ef69080 ffff8808e96c7d38 ffffffff811a37ca ffff8808c986c429
Dec 31 10:36:46 virl kernel: [ 159.996808] 0000000564a43abf ffff880a0e76c000 0000000000000000 ffff8808c986c429
Dec 31 10:36:46 virl kernel: [ 159.996810] Call Trace:
Dec 31 10:36:46 virl kernel: [ 159.996816] [<ffffffff817b61b3>] dump_stack+0x63/0x81
Dec 31 10:36:46 virl kernel: [ 159.996818] [<ffffffff811a37ca>] print_bad_pte+0x1aa/0x250
Dec 31 10:36:46 virl kernel: [ 159.996820] [<ffffffff811a463e>] vm_normal_page+0x8e/0xa0
Dec 31 10:36:46 virl kernel: [ 159.996822] [<ffffffff811a7ba2>] handle_mm_fault+0x792/0x10e0
Dec 31 10:36:46 virl kernel: [ 159.996824] [<ffffffff81202e70>] ? poll_select_copy_remaining+0x130/0x130
Dec 31 10:36:46 virl kernel: [ 159.996827] [<ffffffff81062d64>] __do_page_fault+0x1c4/0x5a0
Dec 31 10:36:46 virl kernel: [ 159.996830] [<ffffffff810f263a>] ? do_futex+0x10a/0x630
Dec 31 10:36:46 virl kernel: [ 159.996832] [<ffffffff810e470e>] ? ktime_get_ts64+0x4e/0xf0
Dec 31 10:36:46 virl kernel: [ 159.996834] [<ffffffff81202e41>] ? poll_select_copy_remaining+0x101/0x130
Dec 31 10:36:46 virl kernel: [ 159.996835] [<ffffffff81063171>] do_page_fault+0x31/0x70
Dec 31 10:36:46 virl kernel: [ 159.996837] [<ffffffff817bfe28>] page_fault+0x28/0x30
Dec 31 10:36:46 virl kernel: [ 159.996838] Disabling lock debugging due to kernel taint
Dec 31 10:36:46 virl kernel: [ 159.996840] kvm.real: Corrupted page table at address 564a43abf080
Dec 31 10:36:46 virl kernel: [ 159.996841] PGD 90bee0067 PUD 915452067 PMD 8c986c067 PTE ffff8808c986c429
Dec 31 10:36:46 virl kernel: [ 159.996843] Bad pagetable: 000d [#1] SMP
Dec 31 10:36:46 virl kernel: [ 159.996845] Modules linked in: xt_REDIRECT nf_nat_redirect xt_mark vxlan ip6_udp_tunnel udp_tunnel xt_comment iptable_raw xt_CHECKS
Dec 31 10:36:46 virl kernel: [ 159.996880] CPU: 3 PID: 1917 Comm: kvm.real Tainted: G B OE 3.19.0-74-generic #82~14.04.1-Ubuntu
Dec 31 10:36:46 virl kernel: [ 159.996881] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
Dec 31 10:36:46 virl kernel: [ 159.996882] task: ffff8808e979a740 ti: ffff8808e96c4000 task.ti: ffff8808e96c4000
Dec 31 10:36:46 virl kernel: [ 159.996883] RIP: 0033:[<00007f0061219404>] [<00007f0061219404>] 0x7f0061219404
Dec 31 10:36:46 virl kernel: [ 159.996886] RSP: 002b:00007ffd9c4c9ab0 EFLAGS: 00010202
Dec 31 10:36:46 virl kernel: [ 159.996887] RAX: 0000564a43abf070 RBX: 0000000000000001 RCX: 0000000000000000
Dec 31 10:36:46 virl kernel: [ 159.996887] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000564a43abf070
Dec 31 10:36:46 virl kernel: [ 159.996888] RBP: 00007ffd9c4c9ae4 R08: 0000564a431bea00 R09: 0000000000000000
Dec 31 10:36:46 virl kernel: [ 159.996889] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000020230d00
Dec 31 10:36:46 virl kernel: [ 159.996889] R13: 0000000000000001 R14: 000000000000000f R15: 0000564a43ab1330
Dec 31 10:36:46 virl kernel: [ 159.996891] FS: 00007f006a925980(0000) GS:ffff880a3fc60000(0000) knlGS:0000000000000000
Dec 31 10:36:46 virl kernel: [ 159.996892] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec 31 10:36:46 virl kernel: [ 159.996893] CR2: 0000564a43abf080 CR3: 000000090208e000 CR4: 00000000003407e0
Dec 31 10:36:46 virl kernel: [ 159.996895]
Dec 31 10:36:46 virl kernel: [ 159.996896] RIP [<00007f0061219404>] 0x7f0061219404
Dec 31 10:36:46 virl kernel: [ 159.996897] RSP <00007ffd9c4c9ab0>
Dec 31 10:36:46 virl kernel: [ 159.996899] ---[ end trace e5ed6cb101eeea59 ]---

For now, I'll revert back to UNRAID rc10b. Thanks!
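
For anyone triaging similar guest logs, here is a minimal Python sketch that pulls only the fault lines out of a syslog excerpt like the ones in this post. The marker list and the names `FAULT_MARKERS`, `LINE_RE`, and `find_faults` are my own assumptions for illustration, not part of any tool; the markers are simply the strings that appear in the excerpts above and are probably not exhaustive.

```python
import re

# Assumption: fault markers copied from the log excerpts in this thread.
FAULT_MARKERS = ("BUG:", "Oops:", "Bad pagetable:", "Corrupted page table")

# Matches "<Mon DD HH:MM:SS> <host> kernel: [ <uptime>] <message>".
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3} +\d+ [\d:]{8}) (?P<host>\S+) kernel: "
    r"\[\s*(?P<uptime>[\d.]+)\] (?P<msg>.*)$"
)


def find_faults(log_text):
    """Return (uptime_seconds, message) for each kernel line with a fault marker."""
    hits = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match and any(marker in match.group("msg") for marker in FAULT_MARKERS):
            hits.append((float(match.group("uptime")), match.group("msg")))
    return hits


if __name__ == "__main__":
    sample = (
        "Dec 31 10:23:28 virl kernel: [ 398.872252] audit_printk_skb: 135 callbacks suppressed\n"
        "Dec 31 10:23:32 virl kernel: [ 402.671699] BUG: unable to handle kernel paging request at ffff9008bf81eea0\n"
        "Dec 31 10:23:32 virl kernel: [ 402.671710] Oops: 0000 [#1] SMP\n"
    )
    for uptime, msg in find_faults(sample):
        print(f"{uptime:12.6f}  {msg}")
```

Run against the two excerpts above, this kind of filter should surface the paging-request BUG in the first crash and the bad page map / corrupted page table lines in the second, which makes it easier to compare crashes across release candidates.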
January 1, 2018 (Author)

Yeah, that's kind of what I was thinking too. That's why I thought it was important to note my jump from 10b to 18f: the kernel changes in 15e or 16b would probably cause the same issues for me. I suppose I could test that if the dev folks would find it helpful. Otherwise I'll stick with 10b for the time being. Thanks for the reply.
January 2, 2018

On 12/31/2017 at 7:08 PM, realies said: Most likely due to the experimental AMD kernel patches.

No such patches are in -rc18f.
January 3, 2018

On 12/31/2017 at 1:55 PM, zblue.h said: For now, I'll revert back to UNRAID rc10b.

Does rc14 still work? That's the last release using the 4.13.x kernel before we moved to 4.14.x.
January 4, 2018 (Author)

Eschultz, great point. I can certainly try upgrading from 10b to 14. I'll do that and follow up shortly. Thanks for the support!
January 4, 2018 (Author)

On second thought, my KVM is down, so I'll wait until I'm home before I try to jump to 14.
January 5, 2018 (Author)

I upgraded from 10b to 14 and can confirm that nested virtualization still works, as expected.
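
As a side note for anyone repeating this test between releases, here is a rough Python sketch of a basic sanity check: whether the CPU virtualization flag is visible (inside the guest hypervisor, with host-passthrough) and whether the `nested` module parameter is on (on the host). `/proc/cpuinfo` and `/sys/module/kvm_amd/parameters/nested` are the usual Linux locations; the function names are hypothetical. This only confirms that the feature is exposed, so it would not by itself catch a crash like the one reported above.

```python
import os


def has_virt_extensions(cpuinfo_text):
    """True if any 'flags' line lists svm (AMD-V) or vmx (Intel VT-x)."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags") and ":" in line:
            flags = line.split(":", 1)[1].split()
            if "svm" in flags or "vmx" in flags:
                return True
    return False


def nested_enabled(param_text):
    """Interpret the 'nested' module parameter (reported as 1/0 or Y/N by kernel version)."""
    return param_text.strip() in ("1", "Y", "y")


if __name__ == "__main__":
    # Run inside the guest hypervisor: is the virtualization flag passed through?
    with open("/proc/cpuinfo") as fh:
        print("virt extensions visible:", has_virt_extensions(fh.read()))

    # Run on the host: is nested virtualization enabled in kvm_amd?
    nested_path = "/sys/module/kvm_amd/parameters/nested"
    if os.path.exists(nested_path):
        with open(nested_path) as fh:
            print("kvm_amd nested:", nested_enabled(fh.read()))
    else:
        print("kvm_amd module parameter not found on this machine")
```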