Comments posted by thenonsense

  1. It sounds like you've got gigabit internet (or at least on that scale). Can you do some file transfers between your NAS and a local VM? Between the NAS and another device on the network? Post your local speed results.

     

    I'll do the same this weekend and post my findings, but I'm living nicely on just 100Mb speeds to the rest of the world. I'm also getting those speeds on all my Unraid resources. I'm sitting on 6.7.0 stable as well.
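
    For a number worth posting, here's roughly what I'd run: just a timed copy in Python. The SRC/DST paths are placeholders you'd point at a big local file and a mounted NAS/VM share.

```python
#!/usr/bin/env python3
# Rough local throughput check: time a copy of a large file to a mounted
# share and report MB/s. SRC/DST are placeholders -- adjust before running.
import os
import shutil
import time

SRC = "/tmp/testfile.bin"               # large local test file (several GB ideally)
DST = "/mnt/remote/share/testfile.bin"  # destination on the NAS / VM share

size = os.path.getsize(SRC)
start = time.monotonic()
shutil.copyfile(SRC, DST)
elapsed = time.monotonic() - start
print(f"copied {size / 1e9:.2f} GB in {elapsed:.1f} s -> {size / elapsed / 1e6:.1f} MB/s")
```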

  2. I begrudgingly admit that after some testing (remember this was randomly occurring, at a rate of about 75%) I was no longer able to replicate the issue, so it looks like it was hardware related. Not sure why the issue occurred in 6.6 but not after downgrading back to 6.5.3; still, I did notice some LITERAL burns on the dead motherboard. I'll bump to 6.7.0-rc2 and verify full functionality as well, but for the sake of stable releases, it looks like we're good.

     

    Thanks guys.  I've been bested.  I can now chalk another weird bug up to hw-failure.

     

    In other news, anyone want to help office-space an x399 board in SoCal?

  3. Took a bit, but I finally did some testing.

    Using UMA, the time to POST is faster, but the time to login is still an eon. A single core is still pinned at 100% usage.

    Using NUMA, the time to POST is slower (probably because it has to work harder to find enough contiguous room for the VMs), but otherwise unchanged.

     

    Conclusion: NUMA is not the cause. UMA is faster by about 5-10 seconds to POST, but Windows still takes several minutes to load.
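
    For anyone who wants to reproduce the single-core observation, this is roughly what I watch during a slow boot (untested-as-pasted sketch; it needs the psutil package, and the 120 s / 2 s numbers are arbitrary choices):

```python
#!/usr/bin/env python3
# Log per-core CPU usage while the VM is POSTing to confirm that one core
# sits at ~100% while the rest idle. Requires: pip install psutil
import time

import psutil

DURATION = 120  # seconds to watch (roughly the slow-POST window)
INTERVAL = 2    # seconds between samples

end = time.time() + DURATION
while time.time() < end:
    per_core = psutil.cpu_percent(interval=INTERVAL, percpu=True)
    pinned = [i for i, pct in enumerate(per_core) if pct > 90]
    print(f"max={max(per_core):5.1f}%  cores >90%: {pinned}")
```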

     

  4. Just now, 1812 said:

    Issue does not exist for me and hasn’t since they incorporated the fix in a previous version.

    I'm not seeing a build in your signature/profile; can you tell us what you're running?

    Two people have confirmed it on ThreadRippers.  

     

    This bears a strong resemblance to the issue patched in 6.5.3, as mentioned earlier, and re-hashed here:

    On 10/31/2018 at 8:36 AM, limetech said:
    On 10/31/2018 at 4:17 AM, bastl said:

    For example creating a fresh windows VM with a GPU passthrough sometimes the VM shows this behaviour on boot if i give more than 1 core to the VM.

    We have noticed this anomaly with multiple windows versions and multiple CPU families.  Very strange.

    Regarding this:

    2 minutes ago, eschultz said:

    I used to have slow boot issues a while ago before I updated the BIOS on my MSI X399 gaming pro carbon ac motherboard.  Those AGESA versions make all the difference.  Which motherboard are you using?

    I'm on an Aorus Gaming 7 X399, BIOS F11e, AGESA 1.1.0.1a. Assuming you bumped on 11/15, you'd be running 1.1.0.2.
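
    (If anyone else wants to report theirs without rebooting into the BIOS setup, something like this reads the same board/BIOS info dmidecode shows, straight from sysfs; exact field availability can vary by board.)

```python
#!/usr/bin/env python3
# Print motherboard and BIOS details from sysfs (same data dmidecode reports).
# Field availability can vary slightly between boards/kernels.
from pathlib import Path

dmi = Path("/sys/class/dmi/id")
for field in ("board_vendor", "board_name", "bios_version", "bios_date"):
    path = dmi / field
    value = path.read_text().strip() if path.exists() else "n/a"
    print(f"{field}: {value}")
```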

  5. This is still an issue. Incorrect core allocations and NUMA nodes are not the cause.

     

    On 10/31/2018 at 8:36 AM, limetech said:

    We have noticed this anomaly with multiple windows versions and multiple CPU families.  Very strange.

    It seems you've also noticed this on other platforms.  I noted the fix applied to 6.5.3 outside of this bug report, but didn't include it here:
     

    "In terms of code changes, this is a very minor release; however, we changed a significant linux kernel CONFIG setting that changes the kernel preemption model.  This change should not have any deleterious effect on your server, and in fact may improve performance in some areas, certainly in VM startup (see below).  This change has been thoroughly tested - thank you! to all who participated in the 6.5.3-rc series testing.

     

    Background: several users have reported, and we have verified, that as the number of cores assigned to a VM increases, the POST time required to start a VM increases seemingly exponentially with OVMF and at least one GPU/PCI device passed through.  Complicating matters, the issue only appears for certain Intel CPU families.  It took a lot of work by @eschultz in consultation with a couple linux kernel developers to figure out what was causing this issue.  It turns out that QEMU makes heavy use of a function associated with kernel CONFIG_PREEMPT_VOLUNTARY=yes to handle locking/unlocking of critical sections during VM startup.  Using our previous kernel setting CONFIG_PREEMPT=yes makes this function a NO-OP and thus introduces serious, unnecessary locking delays as CPU cores are initialized.  For core counts around 4-8 this delay is not that noticeable, but as the core count increases, VM start can take several minutes(!)."

     

    From the 6.5.3 release notes here: 

     

    I asked this in the 6.6.2 announcement as well: do you think this is related, given that the issue arose after the move from 6.5.3?
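
    For reference, you can check which preemption model the running kernel was actually built with. This sketch assumes the kernel exposes its config at /proc/config.gz (it has on the stock kernels I've looked at, but that's an assumption):

```python
#!/usr/bin/env python3
# Print the CONFIG_PREEMPT* options the running kernel was built with,
# assuming the kernel exposes its config at /proc/config.gz.
import gzip

with gzip.open("/proc/config.gz", "rt") as cfg:
    for line in cfg:
        if line.startswith("CONFIG_PREEMPT"):
            print(line.strip())
```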

  6. On 10/31/2018 at 3:14 AM, tjb_altf4 said:

    CPU pairings look off, spread across both numa nodes and not using any thread pairings, this may be impacting performance.

    I did the research, talked to Aorus, birthed the post discussing core and CCX assignments for Threadripper.  The pairings are correct.  The CCXs are correct.  The dies are correct.  Numactl confirms the CCXs and dies.  Good try though.

     

    I'm working off the theory that the 100% usage on one core is due to some race condition: on typical (fast) boots, CPU usage on one core always seems to spike for a short period, but all cores see usage by the time the VM POSTs. That, combined with the randomness of whether a given boot is slow or fast, screams race condition 101.
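
    For completeness, this is roughly how I cross-check the pairings against what the kernel itself reports; it's the same information numactl --hardware and lscpu give, just in one list, and it assumes the standard sysfs topology paths.

```python
#!/usr/bin/env python3
# Dump hyperthread sibling pairs and NUMA node for each CPU from sysfs so the
# VM pinning can be compared against the kernel's view of the topology.
import glob
import os

cpu_dirs = sorted(glob.glob("/sys/devices/system/cpu/cpu[0-9]*"),
                  key=lambda p: int(p.rsplit("cpu", 1)[1]))
for cpu_dir in cpu_dirs:
    cpu = os.path.basename(cpu_dir)
    with open(os.path.join(cpu_dir, "topology", "thread_siblings_list")) as f:
        siblings = f.read().strip()
    nodes = [os.path.basename(n) for n in glob.glob(os.path.join(cpu_dir, "node[0-9]*"))]
    print(f"{cpu}: siblings={siblings}  numa={','.join(nodes) or 'n/a'}")
```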

  7. I did retest.  This bug report exists for the purpose of the conversation going on here:

    The bug is now logged as persistent from 6.6.0-rc1 through 6.6.2.
