• 6.7.2 Starting VM with passthrough GPU crashes unraid


    armbrust
    • Minor

    Moving from 6.6.7 to any of 6.7.0, 6.7.1, or 6.7.2 produces the same issue.

    Everything works correctly, except starting a VM that has a GPU passed through.  When starting this VM the system crashes.

     

    I've attached diagnostics from both versions (6.6.7 and 6.7.2), just before starting the VM.   Nothing was changed in the configuration between runs.   There is another VM running fine in both cases.  It has nothing passed through.

     

    Also attached is the xml config of the problem VM.

     

    I tailed the syslog in both versions when starting the VM, and they look the same.  In both there is some sort of DMA fault, but in 6.6.7 it works fine.

     

    This is a tail of the syslog when starting the problem VM in 6.7.2:

    Jul  5 09:33:23 Tower kernel: vfio-pci 0000:0a:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=io+mem
    Jul  5 09:33:23 Tower kernel: br0: port 3(vnet2) entered blocking state
    Jul  5 09:33:23 Tower kernel: br0: port 3(vnet2) entered disabled state
    Jul  5 09:33:23 Tower kernel: device vnet2 entered promiscuous mode
    Jul  5 09:33:23 Tower kernel: br0: port 3(vnet2) entered blocking state
    Jul  5 09:33:23 Tower kernel: br0: port 3(vnet2) entered forwarding state
    Jul  5 09:33:24 Tower avahi-daemon[7313]: Joining mDNS multicast group on interface vnet2.IPv6 with address fe80::fc54:ff:fe13:8859.
    Jul  5 09:33:24 Tower avahi-daemon[7313]: New relevant interface vnet2.IPv6 for mDNS.
    Jul  5 09:33:24 Tower avahi-daemon[7313]: Registering new address record for fe80::fc54:ff:fe13:8859 on vnet2.*.
    Jul  5 09:33:24 Tower kernel: vfio_ecap_init: 0000:0a:00.0 hiding ecap 0x19@0x900
    Jul  5 09:33:25 Tower kernel: vfio-pci 0000:00:1a.7: enabling device (0000 -> 0002)
    Jul  5 09:33:25 Tower kernel: vfio_cap_init: 0000:00:1a.7 hiding cap 0xa
    Jul  5 09:33:28 Tower kernel: vfio-pci 0000:0a:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=io+mem
    Jul  5 09:33:30 Tower kernel: DMAR: DRHD: handling fault status reg 2
    Jul  5 09:33:30 Tower kernel: DMAR: [DMA Read] Request device [00:1a.7] fault addr eb000 [fault reason 06] PTE Read access is not set
    Jul  5 09:33:30 Tower nginx: 2019/07/05 09:33:30 [crit] 7479#7479: *2093 connect() to unix:/var/tmp/Letsencrypt.sock failed (2: No such file or directory) while connecting to upstream, client: 192.168.1.101, server: , request: "GET /dockerterminal/Letsencrypt/ws HTTP/1.1", upstream: "http://unix:/var/tmp/Letsencrypt.sock:/ws", host: "tower"

    Here is the tail of the syslog on startup of the same VM in 6.6.7 for comparison.

     

    Jul  5 09:51:28 Tower kernel: vfio-pci 0000:0a:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=io+mem
    Jul  5 09:51:28 Tower kernel: br0: port 3(vnet2) entered blocking state
    Jul  5 09:51:28 Tower kernel: br0: port 3(vnet2) entered disabled state
    Jul  5 09:51:28 Tower kernel: device vnet2 entered promiscuous mode
    Jul  5 09:51:28 Tower kernel: br0: port 3(vnet2) entered blocking state
    Jul  5 09:51:28 Tower kernel: br0: port 3(vnet2) entered forwarding state
    Jul  5 09:51:29 Tower avahi-daemon[6629]: Joining mDNS multicast group on interface vnet2.IPv6 with address fe80::fc54:ff:fe13:8859.
    Jul  5 09:51:29 Tower avahi-daemon[6629]: New relevant interface vnet2.IPv6 for mDNS.
    Jul  5 09:51:29 Tower avahi-daemon[6629]: Registering new address record for fe80::fc54:ff:fe13:8859 on vnet2.*.
    Jul  5 09:51:29 Tower kernel: vfio_ecap_init: 0000:0a:00.0 hiding ecap 0x19@0x900
    Jul  5 09:51:30 Tower kernel: vfio-pci 0000:00:1a.7: enabling device (0000 -> 0002)
    Jul  5 09:51:30 Tower kernel: vfio_cap_init: 0000:00:1a.7 hiding cap 0xa
    Jul  5 09:51:32 Tower kernel: vfio-pci 0000:0a:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=io+mem
    Jul  5 09:51:34 Tower kernel: DMAR: DRHD: handling fault status reg 2
    Jul  5 09:51:34 Tower kernel: DMAR: [DMA Read] Request device [00:1a.7] fault addr eb000 [fault reason 06] PTE Read access is not set
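Both tails end with the same DMAR fault naming device 00:1a.7, which the log also shows being enabled by vfio-pci a few seconds earlier. As a rough sketch for comparing logs between versions, the faulting devices can be pulled out of a syslog with a small shell function. The name dmar_fault_devices is made up for illustration, and the pattern assumes only the line format seen in the two tails above.

```shell
# Print the PCI address of every device named in a DMAR DMA fault line.
# Reads the files given as arguments, or standard input if there are none.
dmar_fault_devices() {
  sed -n 's/.*DMAR: \[DMA [A-Za-z]*] Request device \[\([0-9a-f:.]*\)].*/\1/p' "$@" | sort -u
}
```

Running `dmar_fault_devices /var/log/syslog` and then `lspci -nns 00:1a.7` should show what the faulting device actually is, which may help tell whether the fault comes from the GPU or from something else passed to the VM.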

    I've tried with and without "iommu=pt" in syslinux config.
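One thing worth ruling out is whether the option actually reached the kernel on a given boot. A minimal sketch, assuming the running command line is readable from /proc/cmdline; the helper name has_kernel_param is hypothetical:

```shell
# Succeed if a parameter appears as a whole word in a kernel command line.
# $1 = parameter to look for
# $2 = command line to search (defaults to the running kernel's)
has_kernel_param() {
  cmdline=${2-$(cat /proc/cmdline)}
  case " $cmdline " in
    *" $1 "*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `has_kernel_param iommu=pt && echo present` run after a reboot confirms that the edited syslinux entry is the one that was booted.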

     

    Anybody have any ideas?  Thanks

     

     

    tower-diagnostics-6.7.2-20190705-1324.zip tower-diagnostics-6.6.7-20190705-0918.zip Problem VM Config.xml




    User Feedback

    Recommended Comments

    Just a forewarning: not all hardware is compatible with passthrough, even if the manufacturer says that it is (i.e. includes IOMMU in the BIOS); that does not mean it actually works correctly.

     

    In your case, you are attempting to use passthrough on a motherboard that was first sold at least 8 years ago.  First suggestion is to check for updates to the BIOS (it is dated 2011)

    Link to comment

    Thanks for the comment, I appreciate the time.

     

    Agreed that this is old hardware.   But pass through has been working well for me prior to 6.7.x.   Something has changed from 6.x.x ->  6.7.x.

     

    Unfortunately I do have the latest BIOS.   

     

    I'm hoping there is a workaround for 6.7.x, as I don't want to get left behind in versions and don't want to buy new hardware.

    Link to comment

    A couple other things I tried without success:

     

    Bind the GPU to the vfio driver:   append vfio-pci.ids=10de:1d01,10de:0fb8  

    Append pcie_aspm=off video=vesafb:off,efifb:off  in syslinux config
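To confirm that the vfio-pci.ids binding took effect, the driver that currently owns a device can be read from sysfs. This is only a sketch: pci_driver is a made-up helper name, its optional second argument exists only so it can be pointed at a different sysfs root, and 0000:0a:00.0 is the GPU address taken from the syslog above.

```shell
# Print the name of the kernel driver bound to a PCI device.
# $1 = full PCI address (e.g. 0000:0a:00.0)
# $2 = sysfs root (defaults to /sys)
# Returns non-zero if the device has no driver bound.
pci_driver() {
  link=$(readlink "${2:-/sys}/bus/pci/devices/$1/driver") || return 1
  basename "$link"
}
```

If the binding worked, `pci_driver 0000:0a:00.0` should print vfio-pci before the VM is started.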

    Link to comment
        <hostdev mode='subsystem' type='pci' managed='yes' xvga='yes'>
          <driver name='vfio'/>
          <source>
            <address domain='0x0000' bus='0x0a' slot='0x00' function='0x0'/>
          </source>
          <alias name='hostdev0'/>
          <rom file='/mnt/user/domains/msi1030.dump'/>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
        </hostdev>

    Try removing

    xvga='yes'

    from the XML line for your card...

    From what I have read, this setting is no longer supported, so this is my best guess...

     

    Next best guess is to ask where you got the Video BIOS file for the card, try re-dumping it directly from the UnRaid commandline for this specific card if it was not gotten from there in the first place...

     

    Third best option, try using the Nvidia plugin for UnRaid (Read the documentation for it before you do)

     

    And last option is "Create" a new VM with the same options as the current one, and just point it to the same drives and image files as the current one, this resets all the settings and sometimes fixes some of these types of issues...
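For the re-dump suggestion, a rough sketch of the usual sysfs approach follows. It assumes the card exposes a rom attribute under /sys/bus/pci/devices and is not in use when the dump is taken. dump_vbios and is_option_rom are hypothetical helper names. The second helper only checks that the file starts with the 0x55 0xAA signature that PCI option ROMs begin with, so it is a quick sanity test rather than proof that the dump is usable.

```shell
# Dump a card's ROM through sysfs.
# $1 = full PCI address (e.g. 0000:0a:00.0), $2 = output file
dump_vbios() {
  dev=/sys/bus/pci/devices/$1
  echo 1 > "$dev/rom" || return 1   # enable reads of the ROM
  cat "$dev/rom" > "$2"
  status=$?
  echo 0 > "$dev/rom"               # disable reads again
  return "$status"
}

# Succeed if the file begins with the PCI option ROM signature 55 AA.
is_option_rom() {
  [ "$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')" = "55aa" ]
}
```

A call such as `dump_vbios 0000:0a:00.0 /mnt/user/domains/msi1030.rom && is_option_rom /mnt/user/domains/msi1030.rom` would dump and check in one step; the output filename here is just an example.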

     

    This is what the line for my EVGA GTX1070 passthrough reads like:

        <hostdev mode='subsystem' type='pci' managed='yes'>
          <driver name='vfio'/>
          <source>
            <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
          </source>
          <alias name='hostdev0'/>
          <rom file='/mnt/cache/system/EVGA_GTX1070_FTW_DT_DUMPFROMUNRAIDCOMMANDLINE.rom'/>
          <address type='pci' domain='0x0000' bus='0x05' slot='0x01' function='0x0'/>
        </hostdev>

     

    Edit: This is minor, but renaming your VBIOS file from ".dump" to ".rom" lets it show up in the VM Creation WebUI for UnRaid...

    Edited by Warrentheo
    Link to comment

    Thanks Warrentheo for the ideas..   removing xvga='yes' prevents it from crashing, but also prevents the VM from booting in either 6.7.2 or 6.6.1.

     

    I did dump the ROM from this card, but I'll try to do it again. The problem is that, according to Spaceinvader One's video, you need to run it in a VM first.  I do have another GPU in the system; perhaps I can somehow make it the primary.

     

    Creating a new VM using the same image has the same results as the existing VM.

     

    Link to comment



