• [ 6.6.0-rc1 ] VM performance on threadripper 2990wx


    SpaceInvaderOne
    • Solved

    I am having some problems with this version on a server with a Threadripper 2990WX, 64 GB of RAM, a GTX 1080 GPU and a Gigabyte X399 Aorus Gaming 7 motherboard with BIOS version F10 (AGESA 1.1.0.0).

     

    On 6.5.3 my VM boot times are very fast and everything works as expected.

     

    1. Ubuntu MATE with 32 cores (64 threads), 32 GB RAM, passed through GTX 1080

    time from clicking start to desktop: 43 seconds

    2. Windows 10 with 16 cores (32 threads), 32 GB RAM, passed through NVMe drive, GTX 1080, USB controller

    time from clicking start to desktop: 33 seconds

    3. Windows 10 with 6 cores (12 threads), 4 GB RAM, passed through GTX 1080

    time from clicking start to boot: 25 seconds

     

    On 6.6.0-rc1

    1. Ubuntu MATE with 32 cores (64 threads), 32 GB RAM, passed through GTX 1080

    time from clicking start to desktop: 4 minutes 5 seconds

    2. Windows 10 with 16 cores (32 threads), 32 GB RAM, passed through NVMe drive, GTX 1080, USB controller

    time from clicking start to desktop: force-stopped the VM after 10 minutes

    3. Windows 10 with 6 cores (12 threads), 4 GB RAM, passed through GTX 1080

    time from clicking start to boot: 2 minutes 19 seconds
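
To put the boot times listed here in proportion, this is a minimal shell sketch that turns each before/after pair into a slowdown factor. The `slowdown` helper is hypothetical (it is not part of Unraid); the inputs are the timings from this post converted to seconds (4 min 5 s = 245 s, 2 min 19 s = 139 s). VM 2 was stopped after 10 minutes, so its figure is only a lower bound.

```shell
#!/bin/sh
# slowdown BEFORE_SECONDS AFTER_SECONDS
# Prints how many times longer the second boot took than the first.
slowdown() {
  awk -v before="$1" -v after="$2" 'BEGIN { printf "%.2fx\n", after / before }'
}

slowdown 43 245   # VM 1, Ubuntu MATE: 5.70x
slowdown 33 600   # VM 2, Windows 10: at least 18.18x (never reached the desktop)
slowdown 25 139   # VM 3, Windows 10: 5.56x
```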

     

    Also, benchmarks are suffering about a 50% drop in single-thread scores and a 25% drop in multi-core scores (using Geekbench on the Ubuntu VM).

    Is anyone else having issues with Threadripper and VMs?

     

    ultramagnus-diagnostics-20180904-2110.zip




    User Feedback

    Recommended Comments



    1 hour ago, Dazog said:

    Not many people have the 32 core variant, including developers of these applications. Shrugs.

    QFT!  Luckily we have a threadripper and threadripper 2 as well and we too see some issues with stuttering (but not the boot time issues documented here).  I know we're looking into it.

     

    Not to start an AMD vs. Intel war here, but just an FYI, Intel hardware doesn't show these same issues as AMD.  I generally believe that AMD has a huge opportunity in the market if they can dedicate more time and resources to supporting Linux and virtualization better.  Right now it seems like it's a crapshoot when you buy AMD CPUs or GPUs on whether or not you're going to have a good experience.  Intel + NVIDIA is practically guaranteed to work at this point.

    Link to comment
    6 minutes ago, jonp said:

    QFT!  Luckily we have a threadripper and threadripper 2 as well and we too see some issues with stuttering (but not the boot time issues documented here).  I know we're looking into it.

     

    Not to start an AMD vs. Intel war here, but just an FYI, Intel hardware doesn't show these same issues as AMD.  I generally believe that AMD has a huge opportunity in the market if they can dedicate more time and resources to supporting Linux and virtualization better.  Right now it seems like it's a crapshoot when you buy AMD CPUs or GPUs on whether or not you're going to have a good experience.  Intel + NVIDIA is practically guaranteed to work at this point.

    Well when you disrupt a monopoly things take time.

     

    I'd rather have a year of EPYC/Threadripper issues that eventually get sorted than have Intel decide I don't need 32 cores in my unraid server for 10x less money ;)


    6.6 is shipping on the latest and greatest kernel, which has a lot better support than 4.14; it's just that most of the software wasn't worried about normal folks running 32-core machines en masse.

     

    Most corporations would just pay for and/or modify the Linux kernel themselves for one-off massive core counts :)

    Link to comment
    25 minutes ago, jonp said:

    Luckily we have a threadripper and threadripper 2 as well and we too see some issues with stuttering (but not the boot time issues documented here).  I know we're looking into it.

     

    This provides a nice insight into our release process.  In the past we would hold back stable releases until everything that worked in a prior stable release, but for some reason quit working correctly in current development, was fixed.

     

    Clearly if this is something we did, meaning a bug in s/w we write, then probably we would hold back the release.  But in this case it's not so clear where the issue is.  Therefore, we will probably be releasing 6.6.0 stable even if this issue persists.  This is because 6.6.0 includes lots of updates and security fixes which need to be published and this particular issue, though extremely annoying, does not affect a huge percentage of the user base.

     

    Link to comment

    Hi everyone!

     

    It looks like the AGESA 1.1.0.1a Gigabyte BIOS updates are now out:

     

    https://www.gigabyte.com/Motherboard/X399-DESIGNARE-EX-rev-10#support-dl-bios

    https://www.gigabyte.com/Motherboard/X399-AORUS-Gaming-7-rev-10#support-dl-bios

     

    I hope this solves the issue as I have a Gigabyte X399 Designare and 2950X arriving today.

     

    (I was in the process of messaging Gigabyte when the update popped up, must be a sign 😂🤞).

     

    Cheers,

    Tom 🙂

    Edited by tombonez
    Link to comment
    3 hours ago, limetech said:

    Just deleted someone's post by accident, which appeared in topic right where this one did... sorry!

    Probably best to leave the moderating to those like @trurl who know what they're doing. Quick, someone remove Tom's admin status! 🤣

    Edited by CHBMB
    Link to comment
    14 hours ago, tombonez said:

    Hi everyone!

     

    It looks like the AGESA 1.1.0.1a Gigabyte BIOS updates are now out:

     

    https://www.gigabyte.com/Motherboard/X399-DESIGNARE-EX-rev-10#support-dl-bios

    https://www.gigabyte.com/Motherboard/X399-AORUS-Gaming-7-rev-10#support-dl-bios

     

    I hope this solves the issue as I have a Gigabyte X399 Designare and 2950X arriving today.

     

    (I was in the process of messaging Gigabyte when the update popped up, must be a sign 😂🤞).

     

    Cheers,

    Tom 🙂

    Well, I just changed my motherboard for an ASRock board and swapped it out today. Then I read this post and saw there was a new bios for the Gigabyte boards! Typical.

    So I decided to quickly swap back to the Gigabyte board to try the new bios.

    Well, the good news is VMs seem to start fine now, no problems there anymore, so AGESA 1.1.0.1a seems to have fixed this issue. IOMMU groups are not as good as before but fine using the ACS override patch.

    Unfortunately, I couldn't run any benchmarks on the VMs, as removing the Enermax liquid cooler caused it to fail... again. Wow, I am so fed up with Enermax. I had this Enermax 360 for only about 3 weeks. It was a replacement after the previous one also failed (again after swapping the CPU). I know the first revision had a fault whereby the coolant caused corrosion and the parts would block the pump. So it looks like my replacement was old stock.

    The supplier doesn't even have any more in stock, so I have contacted Enermax directly and they have told me to ship it to Germany to be replaced with a revision 2. Huh, I don't know how much that will cost or how long I will be without it. I am seriously thinking of just using air cooling and going with a Noctua NH-U14S TR4, then selling the Enermax when I get the replacement. I have had 3 AIO coolers fail in the last year... Anyway, I will stop moaning about my first world problems!

    Link to comment
    12 minutes ago, SpaceInvaderOne said:

    Well, I just changed my motherboard for an ASRock board and swapped it out today. Then I read this post and saw there was a new bios for the Gigabyte boards! Typical.

     

    I am seriously thinking of just using air cooling and going with a Noctua NH U14s TR4 then selling the Enermax when I get the replacement.

     

    hey spaceinvaderone,

     

    would you consider telling us the name/type of the mainboards you use for your Threadripper CPUs?

     

    well, i can recommend the Noctua CPU air coolers, at least here i've used several of them, all excellent!

    Link to comment

    I can also confirm that my Gigabyte X399 Designare with Threadripper 2950x now performs as well on 6.6.0 rc4 as it did on 6.5.3 with the new BIOS version F11e.

     

    @SpaceInvaderOne - LOL, I did the same thing and bought the ASRock Taichi to work around this issue, installed it last night and saw the updates to the thread this morning. I prefer the Gigabyte board so this worked out OK.
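
For anyone checking which BIOS build a board is actually running before and after the update, here is a minimal sketch. It assumes a Linux host that exposes DMI data under `/sys/class/dmi/id/`; the `bios_version` helper name is hypothetical. It reports only the vendor's version string (for example F10 or F11e), not the AGESA version.

```shell
#!/bin/sh
# bios_version [FILE]
# Prints the firmware version string the kernel read from DMI.
# FILE defaults to the standard sysfs location; the argument exists
# only so the function can be pointed at a different file.
bios_version() {
  file="${1:-/sys/class/dmi/id/bios_version}"
  if [ ! -r "$file" ]; then
    echo "cannot read $file" >&2
    return 1
  fi
  cat "$file"
}
```

Running `bios_version` with no arguments prints the string for the current machine.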

    Edited by rinseaid
    Clarity around unRAID versions
    Link to comment
    54 minutes ago, SpaceInvaderOne said:

    Well, I just changed my motherboard for an ASRock board and swapped it out today. Then I read this post and saw there was a new bios for the Gigabyte boards! Typical.

    So I decided to quickly swap back to the Gigabyte board to try the new bios.

    Well, the good news is VMs seem to start fine now, no problems there anymore, so AGESA 1.1.0.1a seems to have fixed this issue. IOMMU groups are not as good as before but fine using the ACS override patch.

    Unfortunately, I couldn't run any benchmarks on the VMs, as removing the Enermax liquid cooler caused it to fail... again. Wow, I am so fed up with Enermax. I had this Enermax 360 for only about 3 weeks. It was a replacement after the previous one also failed (again after swapping the CPU). I know the first revision had a fault whereby the coolant caused corrosion and the parts would block the pump. So it looks like my replacement was old stock.

    The supplier doesn't even have any more in stock, so I have contacted Enermax directly and they have told me to ship it to Germany to be replaced with a revision 2. Huh, I don't know how much that will cost or how long I will be without it. I am seriously thinking of just using air cooling and going with a Noctua NH-U14S TR4, then selling the Enermax when I get the replacement. I have had 3 AIO coolers fail in the last year... Anyway, I will stop moaning about my first world problems!

    Really sorry about the Enermax thing. I took a risk getting a Rev 2 version and am really hoping they did fix it. On the IOMMU groups, there may be a setting in the BIOS: "Enumerate all IOMMU in IVRS". Mine had this, and without it my IOMMU groupings were terrible. They still aren't nearly as good as on an ASRock, but the ASRock won't let you use an LSI 9201-16i card.
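
To compare IOMMU groupings before and after a BIOS update or a settings change like the one mentioned here, a minimal sketch along these lines can dump the groups. It assumes a Linux host with the IOMMU enabled, where the kernel exposes groups under `/sys/kernel/iommu_groups`; the `list_iommu_groups` name is hypothetical. It prints only PCI addresses; `lspci -nns <address>` can then describe an individual device.

```shell
#!/bin/sh
# list_iommu_groups [ROOT]
# Prints one "group N: <pci address>" line per device, in numeric group order.
# ROOT defaults to the kernel's sysfs directory; the argument exists only
# so the function can be pointed at another directory tree.
list_iommu_groups() {
  root="${1:-/sys/kernel/iommu_groups}"
  if [ ! -d "$root" ]; then
    echo "no IOMMU groups under $root" >&2
    return 1
  fi
  # Group directories are named with plain integers, so a numeric sort is safe.
  for group in $(ls "$root" | sort -n); do
    for dev in "$root/$group"/devices/*; do
      [ -e "$dev" ] || continue   # skip groups with no devices
      echo "group $group: ${dev##*/}"
    done
  done
}
```

Saving the output to a file before and after the change and diffing the two files shows exactly which devices moved between groups.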

    Link to comment
    4 hours ago, SpaceInvaderOne said:

    I am seriously thinking of just using air cooling and going with a Noctua NH U14s TR4 then selling the Enermax when I get the replacement. I have had 3 aio coolers fail in the last year...Anyway, I will stop moaning about my first world problems!

    I can vouch for the Noctua. It performs at about the same level as a 360 rad all-in-one cooler. One small thing to note is that you may have to raise the fan to clear the RAM (especially the slots nearest to the socket), which makes the height a bit taller than spec.

     

    Will be curious what your benchmark figures are. I'm already seeing higher all-core turbo. :D 

    ~# cat /proc/cpuinfo | grep "MHz"
    cpu MHz         : 3838.883
    cpu MHz         : 3842.082
    cpu MHz         : 3839.437
    cpu MHz         : 3838.451
    cpu MHz         : 3841.782
    cpu MHz         : 3842.523
    cpu MHz         : 3842.181
    cpu MHz         : 3835.253
    cpu MHz         : 3843.495
    cpu MHz         : 3834.948
    cpu MHz         : 3840.813
    cpu MHz         : 3841.923
    cpu MHz         : 3841.034
    cpu MHz         : 3841.805
    cpu MHz         : 3835.424
    cpu MHz         : 3841.688
    cpu MHz         : 3833.602
    cpu MHz         : 3842.614
    cpu MHz         : 3842.236
    cpu MHz         : 3842.521
    cpu MHz         : 3843.175
    cpu MHz         : 3842.522
    cpu MHz         : 3841.415
    cpu MHz         : 3842.517
    cpu MHz         : 3831.458
    cpu MHz         : 3842.367
    cpu MHz         : 3842.650
    cpu MHz         : 3842.180
    cpu MHz         : 3842.218
    cpu MHz         : 3841.945
    cpu MHz         : 3842.148
    cpu MHz         : 3840.602
    cpu MHz         : 3832.557
    cpu MHz         : 3842.336
    cpu MHz         : 3842.060
    cpu MHz         : 3840.882
    cpu MHz         : 3841.774
    cpu MHz         : 3840.777
    cpu MHz         : 3842.270
    cpu MHz         : 3842.064
    cpu MHz         : 3842.156
    cpu MHz         : 3835.171
    cpu MHz         : 3841.963
    cpu MHz         : 3840.519
    cpu MHz         : 3839.358
    cpu MHz         : 3833.257
    cpu MHz         : 3830.856
    cpu MHz         : 3840.741
    cpu MHz         : 3834.879
    cpu MHz         : 3842.435
    cpu MHz         : 3841.519
    cpu MHz         : 3840.938
    cpu MHz         : 3842.043
    cpu MHz         : 3840.830
    cpu MHz         : 3841.720
    cpu MHz         : 3837.862
    cpu MHz         : 3841.364
    cpu MHz         : 3840.644
    cpu MHz         : 3824.251
    cpu MHz         : 3840.582
    cpu MHz         : 3842.038
    cpu MHz         : 3840.441
    cpu MHz         : 3841.091
    cpu MHz         : 3840.340
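
A 64-line listing like this is easier to compare between runs when condensed to a single line. This is a minimal sketch, assuming `/proc/cpuinfo` contains `cpu MHz` lines as in the listing here (true on x86 Linux); the `mhz_summary` helper name is hypothetical.

```shell
#!/bin/sh
# mhz_summary
# Reads /proc/cpuinfo-style text on stdin and prints the number of logical
# CPUs seen plus the minimum, average and maximum reported clock in MHz.
mhz_summary() {
  awk -F': *' '
    /cpu MHz/ {
      v = $2 + 0
      if (n == 0 || v < min) min = v
      if (n == 0 || v > max) max = v
      sum += v
      n++
    }
    END {
      if (n == 0) { print "no cpu MHz lines found"; exit 1 }
      printf "cpus=%d min=%.0f avg=%.0f max=%.0f\n", n, min, sum / n, max
    }'
}
```

Usage: `mhz_summary < /proc/cpuinfo`. The values are instantaneous samples, so it is worth running it a few times while the machine is under all-core load.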

     

    Edited by testdasi
    Link to comment





  • Status Definitions

     

    Open = Under consideration.

     

    Solved = The issue has been resolved.

     

    Solved version = The issue has been resolved in the indicated release version.

     

    Closed = Feedback or opinion better posted on our forum for discussion. Also for reports we cannot reproduce or need more information. In this case just add a comment and we will review it again.

     

    Retest = Please retest in latest release.


    Priority Definitions

     

    Minor = Something not working correctly.

     

    Urgent = Server crash, data loss, or other showstopper.

     

    Annoyance = Doesn't affect functionality but should be fixed.

     

    Other = Announcement or other non-issue.