• [6.8.1 / 6.8.2] vm boot loops after update


    Keksgesicht
    • Solved Minor

    I ran 6.8.0 for the last few months without any problems. I have now upgraded to 6.8.2, and the VMs with a passed-through GPU are boot looping. When I roll back to 6.8.0, the problem is gone. 6.8.1 shows the same behavior as 6.8.2.

     

    Am I doing something wrong or outdated with my VMs, or is this a known (or unknown) problem?

    cookietower-diagnostics-20200206-2256.zip





    OK, I removed the kernel parameter that drops the framebuffer (without it I couldn't install drivers inside the VM when the host boots in EFI mode), switched from EFI to legacy boot mode, and even started the server in "safe mode" (no plugins running) on version 6.8.2. The problem still occurs, and it is not limited to the primary GPU. With VNC I didn't notice any problems. But when I roll back to 6.8.0, passthrough works again. What has changed in libvirt with the last updates?

    cookietower-diagnostics-20200207-1842.zip

    Edited by Keksgesicht
    retested

    Hi there,

     

    Took a look at your logs and unfortunately nothing is jumping out at me to indicate what is causing the issue.  So just to be clear, when on 6.8.1 or 6.8.2 and you start a VM that has a GPU assigned, the monitor lights up, but you get caught in a "boot loop."  What part is looping?  Do you ever see Windows trying to load or does it always stay on Tianocore or something like that?

     

    The toughest part in diagnosing this is your use of AMD hardware for the CPU.  AMD is notorious for having problems with VFIO/GPU pass through, though there are many like yourself who have been able to make it work.  None of our Intel systems in the lab are exhibiting the behavior you describe, but I'd still like to help you get to the bottom of it.

     

    Both QEMU and Libvirt were upgraded with 6.8.2.  On 6.8.0 libvirt version was 5.8 and QEMU was 4.1.1.  On 6.8.2 Libvirt is at 5.10 and QEMU at 4.2.  If anything was likely to cause the issue, it'd be the QEMU update, but even that isn't a guarantee.

     

    I also see a lot of messages in your libvirt log about files missing.  You might want to check that too to ensure that your paths to your VMs didn't get messed up.

     

    Anything else special about your VMs?  What if you try to create a new VM on 6.8.2 with GPU pass through.  Does that work or does just starting it get stuck in a boot loop as well?


    These are normal Windows 10 (version 1909) installations with EFI boot mode and the Q35 chipset. The primary VM has one USB controller passed through. I noticed the missing files in the logs myself and traced them to one of my unused VM XML templates.

    The VMs show the Windows boot logo. About 10 seconds later the image freezes, the screen goes black, the GPU fans stop for a second, and then the Windows logo appears again. This keeps going until I destroy the VM. The second VM only goes black after 10 seconds, but the monitor backlight stays on. So the OS is still alive? Windows never tried to enter rescue mode.

    I have heard about problems with AMD CPUs, but I only had to add amd_iommu=on to the kernel parameters. With UnRaid 6.7.2 and 6.8.0 none of this was a problem. Maybe a libvirt or QEMU developer knows something about my problem.
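
When comparing a working and a non-working release, it can help to confirm that the kernel parameter is still in effect after the upgrade. A minimal Python sketch, assuming a Linux host where the active parameters can be read from /proc/cmdline; the function name `has_kernel_param` and the example command line are hypothetical and not taken from the diagnostics:

```python
def has_kernel_param(cmdline, name, value=None):
    """Return True if `name` (optionally with `value`) appears in a kernel command line."""
    for token in cmdline.split():
        # Parameters look like "key=value" or a bare "key".
        key, _, val = token.partition("=")
        if key == name and (value is None or val == value):
            return True
    return False


# Hypothetical example string; on a live host, read the real one with:
#   open("/proc/cmdline").read()
example_cmdline = "BOOT_IMAGE=/bzimage initrd=/bzroot amd_iommu=on"

print(has_kernel_param(example_cmdline, "amd_iommu", "on"))
print(has_kernel_param(example_cmdline, "intel_iommu"))
```

The check only confirms that the parameter was passed to the kernel; it says nothing about whether passthrough itself works.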

    I will test your ideas later :D

    Did something change in a prerelease version that I could test?
    Or should I upload my 6.8.0 diagnostics for comparison?

     

    Edited by Keksgesicht

    OK, I created new XML templates and copied my custom changes, such as the iothreads assignment and the USB controller passthrough, into them. Now it's working. Maybe something in the NVRAM was not what the newer version of QEMU expected.
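
To narrow down which setting in the old definition triggered the loop, the old and new VM XML can be compared line by line. A minimal Python sketch, assuming both definitions have been saved as text (for example with `virsh dumpxml`); the helper `diff_domain_xml` and the two XML snippets are hypothetical, with illustrative values only, and are not taken from the diagnostics:

```python
import difflib


def diff_domain_xml(old_xml, new_xml):
    """Return only the removed ('-') and added ('+') lines between two domain XML texts."""
    # Strip indentation so that pure whitespace changes are not reported.
    old_lines = [line.strip() for line in old_xml.strip().splitlines()]
    new_lines = [line.strip() for line in new_xml.strip().splitlines()]
    diff = difflib.unified_diff(old_lines, new_lines, "old", "new", n=0, lineterm="")
    # Drop the "---"/"+++" file headers and the "@@" hunk markers.
    return [
        line for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("---", "+++"))
    ]


# Hypothetical snippets standing in for the old and the freshly created definition.
old_xml = """
<domain type='kvm'>
  <os>
    <type arch='x86_64' machine='pc-q35-4.1'>hvm</type>
  </os>
</domain>
"""
new_xml = """
<domain type='kvm'>
  <os>
    <type arch='x86_64' machine='pc-q35-4.2'>hvm</type>
  </os>
</domain>
"""

for change in diff_domain_xml(old_xml, new_xml):
    print(change)
```

Anything that shows up only in the old definition is a candidate to re-add one change at a time until the boot loop returns.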

     

    I also want to add: when I wrote my first comment, I had only tested VNC on my Ubuntu VM. So maybe the problem is related to Windows?

     

    I hope my problem is solved and I only have to re-enable MSI inside the VMs.


    I'm very glad to hear you got things up and running again.  Still not 100% certain what caused this, but please let us know if you continue to have issues.




