
VM - NVIDIA P40, Windows 10 Pro, Code 12



Currently having issues getting an NVIDIA P40 (24 GB VRAM) to work in a Windows 10 Pro VM: Device Manager reports Code 12 (This device cannot find enough free resources that it can use). I'm looking to use the card for OobaBooga and Stable Diffusion. VRAM is king right now, and with 24 GB the P40 is ideal, since it costs about 1/8 as much as a 3090 or 4090, which also have 24 GB of VRAM. That said, I'm sure I'd have the same issue with either of those cards, given that their VRAM is larger than average. I've listed the steps I've tried and the resource information below. If you have any ideas, let me know.

 

Host specs:

Motherboard: Supermicro X10DRG-Q Version 3.2, BIOS dated: Fri 22 Nov 2019 12:00:00 AM PST (Latest bios)
Processor: 2x Intel® Xeon® CPU E5-2699 v4 @ 2.20GHz (22 cores / 44 threads each)
Memory: 512 GB DDR4-2133 ECC


Based on research in other posts, I've tried the following:

  • Host BIOS: enable ReBAR support

  • Host BIOS: enable 4G Decoding

    • Enabled

  • Enable & boot the custom kernel syslinux configuration (near the beginning of this thread)

    • Done: applied the patch from that post by copying the new bzimage to the boot USB and rebooting Unraid

  •  Boot Unraid in UEFI Mode

    • Done

  •  VM must use UEFI BIOS

    • Done

  •  VM must have the top line of the XML changed from <domain type='kvm'> to:

    <domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>

     

  •  VM must have the following added (after the </devices> line, before the </domain> line):

      <qemu:commandline>
        <qemu:arg value='-fw_cfg'/>
        <qemu:arg value='opt/ovmf/X-PciMmio64Mb,string=65536'/>
      </qemu:commandline>

     

  • Settings > VM Manager > PCIe ACS override: Both
  • Unbound and re-bound the P40 to "VFIO at boot" (Tools > Devices > PCI Devices and IOMMU Groups).
  • Modifying resource 0 doesn't seem to work: I get permission errors even as root and with the card unbound.
    • cd /sys/bus/pci/devices/0000:84:00.0/
      echo 13 > resource0_resize
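For reference, here is a small POSIX sh sketch of how the number written to `resourceN_resize` is interpreted. It assumes the encoding described in the Linux kernel's `sysfs-bus-pci` ABI notes as I understand them: writing `n` requests a BAR of 2^n MB (so `echo 13` asks for 8 GB), and reading the file returns a bitmask where bit n means 2^n MB is supported. The same notes say these files only exist for BARs covered by the device's Resizable BAR capability, so a failure here may simply mean the card doesn't advertise one. The function names are made up for illustration.

```sh
#!/bin/sh
# Sketch: interpret values used with
#   /sys/bus/pci/devices/<BDF>/resourceN_resize
# Assumption: index n means a BAR size of 2^n MB (0 = 1 MB, 13 = 8 GB, 15 = 32 GB).

# Convert a resize index to a size in MB, e.g. 13 -> 8192.
rebar_index_to_mb() {
  echo $(( 1 << $1 ))
}

# Decode a supported-sizes bitmask (decimal or 0x-prefixed hex) into a
# space-separated list of sizes in MB; bit n set means 2^n MB is supported.
decode_rebar_mask() {
  local mask bit sizes
  mask=$(( $1 ))
  bit=0
  sizes=""
  while [ "$bit" -lt 32 ]; do
    if [ $(( (mask >> bit) & 1 )) -eq 1 ]; then
      sizes="$sizes $(rebar_index_to_mb "$bit")"
    fi
    bit=$(( bit + 1 ))
  done
  echo "${sizes# }"
}
```

For example, `rebar_index_to_mb 13` prints `8192`. Note that in the lspci output further down, the 32G prefetchable region is Region 1, while Region 0 is the 16M non-prefetchable one. If a `resource1_resize` file exists, `decode_rebar_mask "0x$(cat resource1_resize)"` would list the supported sizes, assuming the file prints the mask as bare hex.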

       

XML of VM:
 

<?xml version='1.0' encoding='UTF-8'?>
<domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
  <name>Windows 10</name>
  <uuid>266d38e0-94cb-3316-02d5-f8c7f3aaff82</uuid>
  <description>Win10DevBox</description>
  <metadata>
    <vmtemplate xmlns="unraid" name="Windows 10" icon="windows.png" os="windows10"/>
  </metadata>
  <memory unit='KiB'>133169152</memory>
  <currentMemory unit='KiB'>37224448</currentMemory>
  <memoryBacking>
    <source type='memfd'/>
    <access mode='shared'/>
  </memoryBacking>
  <vcpu placement='static'>26</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='1'/>
    <vcpupin vcpu='1' cpuset='45'/>
    <vcpupin vcpu='2' cpuset='2'/>
    <vcpupin vcpu='3' cpuset='46'/>
    <vcpupin vcpu='4' cpuset='3'/>
    <vcpupin vcpu='5' cpuset='47'/>
    <vcpupin vcpu='6' cpuset='4'/>
    <vcpupin vcpu='7' cpuset='48'/>
    <vcpupin vcpu='8' cpuset='5'/>
    <vcpupin vcpu='9' cpuset='49'/>
    <vcpupin vcpu='10' cpuset='6'/>
    <vcpupin vcpu='11' cpuset='50'/>
    <vcpupin vcpu='12' cpuset='7'/>
    <vcpupin vcpu='13' cpuset='51'/>
    <vcpupin vcpu='14' cpuset='8'/>
    <vcpupin vcpu='15' cpuset='52'/>
    <vcpupin vcpu='16' cpuset='9'/>
    <vcpupin vcpu='17' cpuset='53'/>
    <vcpupin vcpu='18' cpuset='10'/>
    <vcpupin vcpu='19' cpuset='54'/>
    <vcpupin vcpu='20' cpuset='11'/>
    <vcpupin vcpu='21' cpuset='55'/>
    <vcpupin vcpu='22' cpuset='12'/>
    <vcpupin vcpu='23' cpuset='56'/>
    <vcpupin vcpu='24' cpuset='13'/>
    <vcpupin vcpu='25' cpuset='57'/>
  </cputune>
  <os>
    <type arch='x86_64' machine='pc-q35-7.1'>hvm</type>
    <loader readonly='yes' type='pflash'>/usr/share/qemu/ovmf-x64/OVMF_CODE-pure-efi.fd</loader>
    <nvram>/etc/libvirt/qemu/nvram/266d38e0-94cb-3316-02d5-f8c7f3aaff82_VARS-pure-efi.fd</nvram>
  </os>
  <features>
    <acpi/>
    <apic/>
    <hyperv mode='custom'>
      <relaxed state='on'/>
      <vapic state='on'/>
      <spinlocks state='on' retries='8191'/>
      <vendor_id state='on' value='none'/>
    </hyperv>
  </features>
  <cpu mode='host-passthrough' check='none' migratable='on'>
    <topology sockets='1' dies='1' cores='13' threads='2'/>
    <cache mode='passthrough'/>
  </cpu>
  <clock offset='localtime'>
    <timer name='hypervclock' present='yes'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/local/sbin/qemu</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='writeback'/>
      <source file='/mnt/user/domains/Windows 10/vdisk1.img'/>
      <target dev='hdc' bus='virtio'/>
      <boot order='1'/>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
    </disk>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <source file='/mnt/user/isos/Windows10x64.iso'/>
      <target dev='hda' bus='sata'/>
      <readonly/>
      <boot order='2'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <source file='/mnt/user/isos/virtio-win-0.1.229.iso'/>
      <target dev='hdb' bus='sata'/>
      <readonly/>
      <address type='drive' controller='0' bus='0' target='0' unit='1'/>
    </disk>
    <controller type='usb' index='0' model='ich9-ehci1'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x7'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci1'>
      <master startport='0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0' multifunction='on'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci2'>
      <master startport='2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x1'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci3'>
      <master startport='4'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x2'/>
    </controller>
    <controller type='sata' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x1f' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pcie-root'/>
    <controller type='pci' index='1' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='1' port='0x10'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0' multifunction='on'/>
    </controller>
    <controller type='pci' index='2' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='2' port='0x11'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x1'/>
    </controller>
    <controller type='pci' index='3' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='3' port='0x12'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x2'/>
    </controller>
    <controller type='pci' index='4' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='4' port='0x13'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x3'/>
    </controller>
    <controller type='pci' index='5' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='5' port='0x14'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x4'/>
    </controller>
    <controller type='pci' index='6' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='6' port='0x15'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x5'/>
    </controller>
    <controller type='virtio-serial' index='0'>
      <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
    </controller>
    <filesystem type='mount' accessmode='passthrough'>
      <driver type='virtiofs' queue='1024'/>
      <binary path='/usr/libexec/virtiofsd' xattr='on'>
        <cache mode='always'/>
        <sandbox mode='chroot'/>
        <lock posix='on' flock='on'/>
      </binary>
      <source dir='/mnt/user/VMs'/>
      <target dir='VMs'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </filesystem>
    <interface type='bridge'>
      <mac address='52:54:00:41:31:cb'/>
      <source bridge='br0'/>
      <model type='virtio-net'/>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
    </interface>
    <serial type='pty'>
      <target type='isa-serial' port='0'>
        <model name='isa-serial'/>
      </target>
    </serial>
    <console type='pty'>
      <target type='serial' port='0'/>
    </console>
    <channel type='unix'>
      <target type='virtio' name='org.qemu.guest_agent.0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <input type='tablet' bus='usb'>
      <address type='usb' bus='0' port='1'/>
    </input>
    <input type='mouse' bus='ps2'/>
    <input type='keyboard' bus='ps2'/>
    <graphics type='vnc' port='-1' autoport='yes' websocket='-1' listen='0.0.0.0' keymap='en-us'>
      <listen type='address' address='0.0.0.0'/>
    </graphics>
    <audio id='1' type='none'/>
    <video>
      <model type='qxl' ram='65536' vram='65536' vgamem='16384' heads='1' primary='yes'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0'/>
    </video>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x84' slot='0x00' function='0x0'/>
      </source>
      <rom file='/mnt/user/isos/vbios/NVIDIA.TeslaP40.24576.161020.rom'/>
      <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x0'/>
    </hostdev>
    <memballoon model='none'/>
  </devices>
  <qemu:commandline>
    <qemu:arg value='-fw_cfg'/>
    <qemu:arg value='opt/ovmf/X-PciMmio64Mb,string=65536'/>
  </qemu:commandline>
</domain>
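Before digging further, it may be worth confirming that the saved definition still contains both parts of the MMIO fix, since the `<qemu:commandline>` block relies on the `xmlns:qemu` namespace declared on `<domain>`. A minimal sh sketch (the function name is hypothetical) that text-matches the XML, for example the output of `virsh dumpxml "Windows 10"`; it is plain grep matching, not an XML parser.

```sh
#!/bin/sh
# Sketch: text-match a libvirt domain XML on stdin (e.g. from
# `virsh dumpxml "Windows 10"`) for the two parts of the MMIO fix.
# This is plain grep matching, not real XML parsing.
check_qemu_mmio_xml() {
  local xml
  xml=$(cat)
  # The qemu namespace has to be declared on <domain> for <qemu:commandline>.
  if ! printf '%s\n' "$xml" |
      grep -Eq "xmlns:qemu=['\"]http://libvirt\.org/schemas/domain/qemu/1\.0['\"]"; then
    echo "missing qemu namespace"
    return 1
  fi
  # The fw_cfg argument that sets the 64-bit MMIO window size for OVMF.
  if ! printf '%s\n' "$xml" | grep -q 'opt/ovmf/X-PciMmio64Mb,string='; then
    echo "missing X-PciMmio64Mb fw_cfg argument"
    return 1
  fi
  echo "ok"
}
```

Usage: `virsh dumpxml "Windows 10" | check_qemu_mmio_xml` prints `ok` when both pieces are present.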

 



lspci output for the P40 once bound to "VFIO at boot":


 

84:00.0 3D controller: NVIDIA Corporation GP102GL [Tesla P40] (rev a1)
        Subsystem: NVIDIA Corporation GP102GL [Tesla P40]
        Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Interrupt: pin A routed to IRQ 11
        NUMA node: 1
        IOMMU group: 112
        Region 0: Memory at fa000000 (32-bit, non-prefetchable) [disabled] [size=16M]
        Region 1: Memory at 387000000000 (64-bit, prefetchable) [disabled] [size=32G]
        Region 3: Memory at 387800000000 (64-bit, prefetchable) [disabled] [size=32M]
        Capabilities: [60] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D3 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
        Capabilities: [78] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 <64us
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 25W
                DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
                        RlxdOrd- ExtTag+ PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 256 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend-
                LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM not supported
                        ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
                LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 8GT/s, Width x16
                        TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range AB, TimeoutDis+ NROPrPrP- LTR+
                         10BitTagComp- 10BitTagReq- OBFF Via message, ExtFmt- EETLPPrefix-
                         EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
                         FRS- TPHComp- ExtTPHComp-
                         AtomicOpsCap: 32bit- 64bit- 128bitCAS-
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- 10BitTagReq- OBFF Disabled,
                         AtomicOpsCtl: ReqEn-
                LnkCap2: Supported Link Speeds: 2.5-8GT/s, Crosslink- Retimer- 2Retimers- DRS-
                LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+ EqualizationPhase1+
                         EqualizationPhase2+ EqualizationPhase3+ LinkEqualizationRequest-
                         Retimer- 2Retimers- CrosslinkRes: unsupported
        Capabilities: [100 v1] Virtual Channel
                Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
                Arb:    Fixed- WRR32- WRR64- WRR128-
                Ctrl:   ArbSelect=Fixed
                Status: InProgress-
                VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                        Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
                        Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=01
                        Status: NegoPending- InProgress-
        Capabilities: [250 v1] Latency Tolerance Reporting
                Max snoop latency: 0ns
                Max no snoop latency: 0ns
        Capabilities: [128 v1] Power Budgeting <?>
        Capabilities: [420 v2] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
                AERCap: First Error Pointer: 00, ECRCGenCap- ECRCGenEn- ECRCChkCap- ECRCChkEn-
                        MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
                HeaderLog: 00000000 00000000 00000000 00000000
        Capabilities: [600 v1] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
        Capabilities: [900 v1] Secondary PCI Express
                LnkCtl3: LnkEquIntrruptEn- PerformEqu-
                LaneErrStat: 0
        Kernel driver in use: vfio-pci
        Kernel modules: nvidia_drm, nvidia
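The region sizes above give a rough lower bound on how much 64-bit MMIO space the guest firmware has to find for this card: Region 1 (32G) plus Region 3 (32M). A sh sketch (illustrative function names) that sums the 64-bit regions from `lspci -vvv -s 84:00.0` output and compares the total with the `X-PciMmio64Mb` value, which is in MB. The sum ignores alignment padding, so the real requirement can be larger.

```sh
#!/bin/sh
# Convert an lspci size string (e.g. 16M, 32G) to whole MB; K sizes round up.
size_to_mb() {
  local s num unit
  s=$1
  num=${s%?}          # everything but the last character
  unit=${s#"$num"}    # the last character (K, M, G or T)
  case $unit in
    K) echo $(( (num + 1023) / 1024 )) ;;
    M) echo "$num" ;;
    G) echo $(( num * 1024 )) ;;
    T) echo $(( num * 1024 * 1024 )) ;;
    *) echo "unrecognised size: $s" >&2; return 1 ;;
  esac
}

# Sum the sizes (MB) of all 64-bit "Region" lines in lspci -vvv output on stdin.
sum_64bit_bars_mb() {
  local total size mb
  total=0
  for size in $(grep 'Region .*(64-bit' |
      sed -n 's/.*\[size=\([0-9][0-9]*[KMGT]\)].*/\1/p'); do
    mb=$(size_to_mb "$size") || return 1
    total=$(( total + mb ))
  done
  echo "$total"
}

# Succeed if a window of $1 MB is at least $2 MB (alignment not considered).
mmio_window_fits() {
  [ "$2" -le "$1" ]
}
```

For this card the sum is 32800 MB, which fits inside the 65536 MB (64 GiB) window requested via `X-PciMmio64Mb`.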

 

 

VM boot log (it doesn't show any errors related to the resize):

driver":"virtserialport","bus":"virtio-serial0.0","nr":1,"chardev":"charchannel0","id":"channel0","name":"org.qemu.guest_agent.0"}' \
-device '{"driver":"usb-tablet","id":"input0","bus":"usb.0","port":"1"}' \
-audiodev '{"id":"audio1","driver":"none"}' \
-vnc 0.0.0.0:0,websocket=5700,audiodev=audio1 \
-k en-us \
-device '{"driver":"qxl-vga","id":"video0","max_outputs":1,"ram_size":67108864,"vram_size":67108864,"vram64_size_mb":0,"vgamem_mb":16,"bus":"pcie.0","addr":"0x1"}' \
-device '{"driver":"vfio-pci","host":"0000:84:00.0","id":"hostdev0","bus":"pci.5","addr":"0x0","romfile":"/mnt/user/isos/vbios/NVIDIA.TeslaP40.24576.161020.rom"}' \
-fw_cfg opt/ovmf/X-PciMmio64Mb,string=65536 \
-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
-msg timestamp=on
char device redirected to /dev/pts/1 (label charserial0)
qxl_send_events: spice-server bug: guest stopped, ignoring
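The `-fw_cfg opt/ovmf/X-PciMmio64Mb,string=65536` line above shows the setting did reach QEMU. A small sh sketch (hypothetical function name) that pulls that value out of a VM log, so it can be re-checked after each XML edit. The log path in the usage line is an assumption: libvirt typically writes to `/var/log/libvirt/qemu/<VM name>.log`.

```sh
#!/bin/sh
# Sketch: print each X-PciMmio64Mb value found in QEMU/libvirt log text on
# stdin, in MB and in GiB (MB / 1024).
mmio64_from_log() {
  sed -n 's/.*X-PciMmio64Mb,string=\([0-9][0-9]*\).*/\1/p' |
    while read -r mb; do
      echo "$mb MB ($(( mb / 1024 )) GiB)"
    done
}
```

Usage: `mmio64_from_log < "/var/log/libvirt/qemu/Windows 10.log"`; for the log above it would print `65536 MB (64 GiB)`.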

 



A few posts I've looked through:
 

 

 

5 minutes ago, Wes1 said:

Currently having issues getting an NVIDIA P40 (24 GB VRAM) to work in a Windows 10 Pro VM: Device Manager reports Code 12 (This device cannot find enough free resources that it can use). I'm looking to use the card for OobaBooga and Stable Diffusion. VRAM is king right now, and with 24 GB the P40 is ideal, since it costs about 1/8 as much as a 3090 or 4090, which also have 24 GB of VRAM. That said, I'm sure I'd have the same issue with either of those cards, given that their VRAM is larger than average. I've listed the steps I've tried and the resource information below. If you have any ideas, let me know.

 

Did the fix in this thread not work?

 

2 minutes ago, SimonF said:

Did the fix in this thread not work?

 


I wasn't sure which specific fix was being referenced. I assumed it was the block below; is that not the case?

  <qemu:commandline>
    <qemu:arg value='-fw_cfg'/>
    <qemu:arg value='opt/ovmf/X-PciMmio64Mb,string=65536'/>
  </qemu:commandline>

 

2 minutes ago, Wes1 said:


I wasn't sure which specific fix was being referenced. I assumed it was the block below; is that not the case?


 

Yes, that's to increase the MMIO window. Also, which version of Unraid are you running?
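If the window ever needs resizing, one rough way to pick the `X-PciMmio64Mb` value (in MB) is to round the total of the card's 64-bit BAR sizes up to the next power of two. This is an assumed heuristic, not a documented OVMF rule, and the function names are hypothetical. For the P40 (32G + 32M = 32800 MB, per the lspci output) it lands on the 65536 already in use.

```sh
#!/bin/sh
# Heuristic (an assumption, not a documented OVMF rule): round the MB needed by
# the card's 64-bit BARs up to the next power of two, so one large naturally
# aligned BAR plus the smaller ones still fit inside the window.
suggest_mmio64_mb() {
  local need win
  need=$1
  win=1
  while [ "$win" -lt "$need" ]; do
    win=$(( win * 2 ))
  done
  echo "$win"
}

# Print the matching <qemu:arg> line for the domain XML.
mmio64_qemu_arg() {
  echo "<qemu:arg value='opt/ovmf/X-PciMmio64Mb,string=$1'/>"
}
```

Usage: `mmio64_qemu_arg "$(suggest_mmio64_mb 32800)"` prints the same `<qemu:arg value='opt/ovmf/X-PciMmio64Mb,string=65536'/>` line used in the XML.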

7 minutes ago, SimonF said:

Have you asked Supermicro if ReBAR is supported? Like you, I cannot find any info; I saw some feedback for the X12 which said no.

Yeah, this might be the crux of the issue. Which sucks, because with the NVIDIA 40 series and potentially more VRAM on all cards, we'll run into this issue on any board without ReBAR support when trying to pass through the GPU.
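Whether the card itself advertises Resizable BAR can be checked from `lspci -vvv` output; the capability list in the P40's lspci output earlier in the thread has no Resizable BAR entry. A sh sketch (hypothetical function name) that looks for that capability name. It assumes lspci labels it "Physical Resizable BAR" (or just "Resizable BAR" in older versions), and lspci should be run as root so extended capabilities are listed.

```sh
#!/bin/sh
# Sketch: report whether `lspci -vvv -s <BDF>` output on stdin lists a
# Resizable BAR capability (matched by name only).
rebar_status() {
  if grep -q 'Resizable BAR'; then
    echo "Resizable BAR capability listed"
  else
    echo "no Resizable BAR capability listed"
  fi
}
```

Usage: `lspci -vvv -s 84:00.0 | rebar_status`.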


I see 2 issues in your VM config:

 

<memory unit='KiB'>133169152</memory>

<currentMemory unit='KiB'>37224448</currentMemory>

 

Those two memory values should be the same. That could be the cause of your resource problem.
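A sh sketch (hypothetical helper names) that pulls both values out of the domain XML, converts KiB to GiB, and reports whether they match; for the XML above that is 127.0 GiB vs 35.5 GiB. It text-matches the `unit='KiB'` form shown in the config rather than using an XML parser.

```sh
#!/bin/sh
# Sketch: compare <memory> and <currentMemory> (both in KiB) in domain XML on stdin.

# Print the KiB value of the named element (memory or currentMemory).
xml_kib() {
  sed -n "s/.*<$1 unit='KiB'>\([0-9]*\)<\/$1>.*/\1/p"
}

# Convert KiB to GiB with one decimal place (integer math, truncated).
kib_to_gib() {
  local tenths
  tenths=$(( $1 * 10 / 1048576 ))
  echo "$(( tenths / 10 )).$(( tenths % 10 ))"
}

check_memory_match() {
  local xml mem cur
  xml=$(cat)
  mem=$(printf '%s\n' "$xml" | xml_kib memory)
  cur=$(printf '%s\n' "$xml" | xml_kib currentMemory)
  if [ -z "$mem" ] || [ -z "$cur" ]; then
    echo "memory values not found"
    return 1
  fi
  if [ "$mem" = "$cur" ]; then
    echo "match: $(kib_to_gib "$mem") GiB"
  else
    echo "mismatch: memory=$(kib_to_gib "$mem") GiB currentMemory=$(kib_to_gib "$cur") GiB"
  fi
}
```

Usage: `virsh dumpxml "Windows 10" | check_memory_match`.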

 

<vendor_id state='on' value='none'/>

 

You should enter a vendor ID, something like '1234567890ab', instead of 'none'.
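If you'd rather not type one, a sh sketch that generates 12 random hex characters for `<vendor_id value='...'/>` and checks the length. The 12-character limit comes from the libvirt domain XML documentation for the Hyper-V `vendor_id` feature; the helper names are hypothetical.

```sh
#!/bin/sh
# Generate 12 random hex characters (6 random bytes) for a Hyper-V vendor_id.
random_vendor_id() {
  od -An -N6 -tx1 /dev/urandom | tr -d ' \n'
}

# Succeed if the value is 1 to 12 characters long.
valid_vendor_id() {
  [ "${#1}" -ge 1 ] && [ "${#1}" -le 12 ]
}
```

Usage: `echo "<vendor_id state='on' value='$(random_vendor_id)'/>"` prints a line to paste into the `<hyperv>` block.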

Edited by Jumbo_Erdnuesse
