• [6.8.0-RC6] CPU Pinning Editor removes VM template if an XML error occurs


    jbartlett
    • Minor

    I'm running RC5 instead of RC6 because of the USB device passthrough issue mentioned in another bug report.

     

    I had a VM (Cam 1) with a custom CPU block with 16 CPUs pinned. This enables hyper-threading in the VM, since the Threadripper CPU itself doesn't really support it there.

     

      <cpu mode='custom' match='exact' check='full'>
        <model fallback='forbid'>EPYC</model>
        <topology sockets='1' cores='14' threads='2'/>
        <cache level='3' mode='emulate'/>
        <feature policy='require' name='topoext'/>
        <feature policy='disable' name='monitor'/>
        <feature policy='require' name='hypervisor'/>
        <feature policy='disable' name='svm'/>
        <feature policy='disable' name='x2apic'/>
        <numa>
          <cell id='0' cpus='0-13' memory='6291456' unit='KiB'/>
          <cell id='1' cpus='14-27' memory='6291456' unit='KiB'/>
        </numa>
      </cpu>

    I cloned the hard drive of another Windows 10 VM and used "Settings > CPU Pinning" to remove four CPUs (2 CPU/2 HT) from "Cam 1" and assign them to "Cam 1 PiP". The VM "Cam 1 PiP" used a standard web GUI form editor template. When I clicked the Apply button, I got back an error message akin to "CPU topology doesn't match maximum vcpu count", and then the "Cam 1" VM was gone from the list of configured VMs.

     

    I recreated the "Cam 1" XML but wasn't able to duplicate the issue.

     

    Recommend restoring the original XML if the CPU Pinning edit results in invalid XML.
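The safeguard could also run before the edit is committed. A minimal sketch (hypothetical helper names, not the actual webGui code, assuming libvirt-style domain XML) that compares the declared <vcpu> count against the CPUs listed in the <numa> cells and flags the mismatch that triggered this bug:

```python
import xml.etree.ElementTree as ET

def parse_cpu_spec(spec):
    """Expand a libvirt cpus attribute like '0-13' or '0,2,4-6' into a set of ints."""
    cpus = set()
    for part in spec.split(','):
        if '-' in part:
            lo, hi = part.split('-')
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus

def numa_matches_vcpu(domain_xml):
    """True if the <numa> cells do not declare more CPUs than <vcpu> allows."""
    root = ET.fromstring(domain_xml)
    vcpu = int(root.findtext('vcpu'))
    declared = set()
    for cell in root.findall('./cpu/numa/cell'):
        declared |= parse_cpu_spec(cell.get('cpus'))
    return len(declared) <= vcpu
```

An editor that ran this check on the edited XML could, on failure, re-define the domain from the original XML instead of leaving the VM undefined.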

    vm1-diagnostics-20191119-0743.zip





    Recommended Comments

    I'm able to reproduce it now. I had missed a custom configuration that causes the error to happen. I've updated the entry above to include the cpu/numa tree.

     

    The CPU Pinning editor preserves the CPU block, but the presence of the numa tags causes an invalid CPU assignment error when removing a pinned CPU: "internal error: Number of CPUs in <numa> exceeds the <vcpu> count"

     

    At this point, the Cam 1 VM no longer existed.

     

    Since the GUI editor doesn't understand the hardware NUMA assignments I'm duplicating inside the VM, it can't properly edit this cpu/numa tree. Recommend checking whether this XML tree exists and, if so, not allowing an edit via the CPU Pinning page.
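That guard could be as simple as probing for the tree before allowing the form-based edit. A sketch (the function name is hypothetical, not part of the actual webGui):

```python
import xml.etree.ElementTree as ET

def has_custom_numa(domain_xml):
    """True if the domain XML carries a hand-written <cpu>/<numa> tree
    that a form-based CPU Pinning editor cannot safely rewrite."""
    root = ET.fromstring(domain_xml)
    return root.find('./cpu/numa') is not None
```

The CPU Pinning page could call this on load and render the pinning controls read-only when it returns True.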

     

    I've attached the full VM XML.

    Cam1VM.xml


    I guess support could be added by removing the same number of CPUs from the different NUMA IDs. Remove one CPU: take one off ID=0. Remove two: take one off ID=0 and one off ID=1. Etc.
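That round-robin idea could look like the following sketch: remove CPUs one at a time, cycling through the cells so each cell shrinks by roughly the same amount. This is a hypothetical helper operating on plain lists rather than the real domain XML:

```python
def remove_cpus_round_robin(cells, count):
    """Remove `count` CPUs, taking one from each NUMA cell in turn.

    `cells` is a list of per-cell CPU lists; the highest-numbered CPU
    in each cell is dropped first so contiguous ranges stay contiguous.
    """
    cells = [list(c) for c in cells]  # don't mutate the caller's lists
    i = 0
    while count > 0 and any(cells):
        cell = cells[i % len(cells)]
        if cell:
            cell.pop()  # drop the last CPU in this cell
            count -= 1
        i += 1
    return cells
```

For the configuration above (cells 0-13 and 14-27), removing four CPUs this way leaves 12 CPUs in each cell instead of unbalancing one of them.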


    @jbartlett Is there any benefit to emulating a virtual multi-node topology? I've never used more than a full physical node for a single VM, so I never tweaked it like this.

    4 hours ago, bastl said:

    I never used more than a full physical node for a single VM so I never tweaked it like this.

    Same physical node has better latency, but in terms of pure speed, more cores beat fewer cores. So it depends on the workload. Gaming should stay on one node; transcoding wants as many cores as it can get.


    @testdasi I know it always depends on the workload. My question was whether there is a benefit to "tricking" the VM into thinking it runs on multiple nodes

        <numa>
          <cell id='0' cpus='0-13' memory='6291456' unit='KiB'/>
          <cell id='1' cpus='14-27' memory='6291456' unit='KiB'/>
        </numa>

    compared to a VM with the same core count without this tweak. How will Windows react to this? Does it change anything?

     

    Without these lines, Unraid will automatically assign RAM to the VM. With this setting, you're forcing it into a dual-node config with 6GB each. Or am I wrong?


    It's the following block (not in the <cpu> XML tree) that causes the RAM to be split between physical NUMA nodes. My understanding is that the <numa> block only has the guest OS thinking the RAM is split between the virtual nodes.

    <numatune>
      <memory mode='interleave' nodeset='0,2'/>
    </numatune>

    I got at least a 19% improvement in memory operations (read/write/copy) at the cost of higher latency. This showed up as a 2% increase in CPU load in my use case.

    5 hours ago, bastl said:

    @testdasi I know it always depends on the workload. My question was if there is a benefit to "trick" VM into thinking it runs on multiple nodes

    Some programs are NUMA-aware in that they'll prioritize their threads on one node versus another. If the node assignment matches the physical server, you will see a benefit. I did on mine, but I don't recall the percentages.

    26 minutes ago, jbartlett said:

    <numatune>
      <memory mode='interleave' nodeset='0,2'/>
    </numatune>

    OK, this setting I also looked into back then while working through all the RedHat optimisation guides, but I never used it because I never handed over more than one node to a single VM.


    I benchmarked (AIDA64) my VM with the cpu/numa block and without, with the physical RAM assigned to one NUMA node. The read/write/copy speeds were comparable, as expected, but there was a 0.4 ns decrease in memory latency using the numa block.

     

    Unless the physical RAM is split up too, the only advantage of the cpu/numa block is matching the physical CPU/NUMA configuration, which does provide a measurable improvement in benchmarks.


