The Black Bear - Threadripper 2990WX build



2 hours ago, testdasi said:

Interesting. Was the xml in the post part of your unraid VM xml?

I may try it when I have some time. Wonder if I can actually have the best of both worlds by using Process Lasso to restrict games to the same node while having my VM cross NUMA nodes for things that need more cores than low latency.

<cpu mode='custom' match='exact' check='partial'>
  <model fallback='allow'>EPYC-IBPB</model>
  <topology sockets='1' cores='8' threads='2'/>
  <feature policy='require' name='topoext'/>
</cpu>

That is the XML, and yes, it's Unraid-only stuff. Apparently there are patches for QEMU 3.0 that fix a lot of this and make SMT work properly for Threadripper, but I don't think we can apply them on Unraid. We will have to wait for Unraid to apply them, and the problem is that some of them aren't "official" while others are.

Edited by Jerky_san
Link to comment
  • 1 month later...
On 9/2/2018 at 6:56 PM, testdasi said:

Thanks to the magic of KVM, I now have MacOS running on an old Surface 3. 😁

MacOS on Surface 3~01.jpg

Just saw this. Does this mean you're running a MacOS VM on your Threadripper unRAID box? If so, what's the performance like? And what software are you using on the Surface 3 for remote access to the MacOS VM?

Link to comment
On 12/30/2018 at 12:31 PM, sonofdbn said:

Just saw this. Does this mean you're running a MacOS VM on your Threadripper unRAID box? If so, what's the performance like? And what software are you using on the Surface 3 for remote access to the MacOS VM?

Yes, I'm running a MacOS VM and remote into it using NoMachine. Performance-wise, it's OK for general use (I only assign 4 cores + a GT 710), with a bit of lag when watching YouTube. It was more a fun experience than an actual need for me.

Link to comment
  • 2 months later...

I just did a build like yours. Is there anything special that needs to be done in the BIOS to optimize for Unraid?

Also, I'm having issues trying to pass through my onboard sound card or a PCI sound card to a Windows 10 VM. Did you have any issues like this?



Link to comment
  • 2 months later...

Some long overdue updates:

 

As nobody has noticed, I have been whining a lot about being unable to update unRAID beyond 6.5.3 due to terrible, inexplicable lag on my main workstation VM, which rendered it (no pun intended) useless. How terrible? Think running Windows 10 on a Pentium MMX, if that's even possible. So I spent a lot of time, on and off, scientifically eliminating all possible causes and eventually arrived at the only remaining possibility: the ACS multifunction override. So I spent another month rejigging stuff to eliminate the need for it and voila, I just managed to update to 6.7.0 and my workstation VM is now lag free.

 

Along the way:

  • Discovered that the bottom right M.2 slot (the 2280 size), for whatever the F reason, is in the same IOMMU group as (a) both wired LAN ports, (b) wireless LAN, (c) SATA controller and (d) the middle PCIe slot. Hence, it practically cannot be passed through via the PCIe method (need ACS multifunction override which lags - see above).
  • My Toshiba 3TB HDD used for temp space died, likely due to being overworked. Writing TBs weekly will kill a HDD just as much as a SSD (probably even more so considering the 4K-ness of it).
  • Made (a series of) mistakes rejigging stuff and, long story short, accidentally ran a pre-clear on my SM951 to about 10%. It promptly developed reallocated sectors and unrecoverable errors, so I have decided to run the SM951 into the ground (i.e. use it for temp space). It's an AHCI PCIe M.2 (i.e. basically a glorified SATA controller from before NVMe was a thing; I was an early adopter of M.2 PCIe SSDs, you see) so it's not too bad.
  • Added a Samsung PM983 3.84TB NVMe SSD. This is a 22110 form factor and it runs hot: on average 10 degrees hotter than the 970 EVO. Both show up with an identical ID, probably because they use the same controller, so I reckon the extra heat is due to the extra capacity.
  • My workstation VM now only runs on PCIe-passed-through NVMe (well except the boot vdisk). The Crucial MX300 2TB is now cache (it just had its first reallocated sector a few weeks ago). The Samsung 850 EVO 2TB is used as a NAS disk (basically an overflow for the workstation VM).
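For anyone wanting to check which devices share an IOMMU group before planning a passthrough, here is a minimal Python sketch. It assumes a Linux host that exposes the standard sysfs tree at /sys/kernel/iommu_groups; the function name list_iommu_groups is just an illustrative choice, not something from Unraid.

```python
import os


def list_iommu_groups(root="/sys/kernel/iommu_groups"):
    """Return {group_number: [PCI addresses]} read from the sysfs IOMMU tree."""
    groups = {}
    if not os.path.isdir(root):
        return groups  # IOMMU disabled, or not a Linux host
    for name in os.listdir(root):
        devices_dir = os.path.join(root, name, "devices")
        if name.isdigit() and os.path.isdir(devices_dir):
            # Each entry under devices/ is named after a PCI address
            groups[int(name)] = sorted(os.listdir(devices_dir))
    return groups


if __name__ == "__main__":
    for group, devices in sorted(list_iommu_groups().items()):
        print(f"IOMMU group {group}: {' '.join(devices)}")
```

Any group that lists more than one device (as with the M.2 slot, LAN ports, SATA controller and middle PCIe slot above) has to be passed through as a whole unless the ACS override is used.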
Link to comment
9 hours ago, testdasi said:

Discovered that the bottom right M.2 slot (the 2280 size), for whatever the F reason, is in the same IOMMU group as (a) both wired LAN ports, (b) wireless LAN, (c) SATA controller and (d) the middle PCIe slot. Hence, it practically cannot be passed through via the PCIe method (need ACS multifunction override which lags - see above).

On almost all X399 boards with 3 M.2 slots, one slot is connected to the CPU via the chipset, and that is why all these other devices are in the same group: they are all attached to the chipset.

 

9 hours ago, testdasi said:

Added a Samsung PM983 3.84TB NVMe SSD. This is a 22110 form factor and it runs hot: on average 10 degrees hotter than the 970 EVO. Both show up with an identical ID, probably because they use the same controller, so I reckon the extra heat is due to the extra capacity.

Which slot is used? The one near the GPU? If so, it's normal that the temperature is higher.

Link to comment

I've put one of these in my case, blowing on the WD Black NVMe drive (and other components), which runs quite hot. Unfortunately, I forgot it was in there and that it needs to be powered back on after a power-off, and now the WD Black didn't show up after my last reboot. 😁

 

https://www.amazon.com/Fancii-Personal-Portable-Whisper-Technology/dp/B072DSHKCH/

 

I'm also really glad I re-read this post; the tips on the slow/fast cores come in quite handy now that I'm working on a new Ryzen build with a different MB. I'll be taking in four Brio webcams and then outputting them over NDI, so I'd like to squeeze every bit of speed out of the Win VM.

Edited by jbartlett
Link to comment
16 hours ago, bastl said:

Which slot is used? The one near the GPU? If so, it's normal that the temperature is higher.

This PM983 just runs hot. I tested both upper (under the GPU) and lower slots and the lower one is actually hotter by 3-4 degrees. My hypothesis is that the GPU fan creates air flow in just the right direction which helps the SSD dissipate more heat than it receives from the GPU.

 

2 hours ago, jbartlett said:

I've put one of these in my case, blowing on the WD Black NVMe drive (and other components), which runs quite hot. Unfortunately, I forgot it was in there and that it needs to be powered back on after a power-off, and now the WD Black didn't show up after my last reboot. 😁

 

https://www.amazon.com/Fancii-Personal-Portable-Whisper-Technology/dp/B072DSHKCH/

 

I'm also really glad I re-read this post; the tips on the slow/fast cores come in quite handy now that I'm working on a new Ryzen build with a different MB. I'll be taking in four Brio webcams and then outputting them over NDI, so I'd like to squeeze every bit of speed out of the Win VM.

That is a cool idea! Will see if I can get something similar on this side of the pond.

Link to comment
On 11/7/2018 at 8:19 PM, Jerky_san said:

I was wondering, do you have a Plex server, and if you do, does your main VM get really slow when people are streaming from it? My main VM has all of its own cores completely dedicated to it, but the minute someone starts watching something from Plex my system's performance nosedives with massive stuttering. I can't figure out why either and it's starting to drive me a little crazy.

 

If you look at CPU utilization with htop, nothing is really crossing 50%. Memory is 60% used. The VM has its own dedicated SSD. It makes zero sense why the stuttering and crap starts occurring. It usually begins with voice distortion on Discord and turns into a full-blown mess. Even games like Factorio that are basically just CPU and memory bound turn to crap, with me getting less than 20fps.

No problem at all with GPU pass through. 

Link to comment
4 hours ago, J05u said:

I don't know what magic you're doing with VMs. For me, VMs are a never-ending headache: once I get one working, I create a second one with another GPU to use, and as a result neither works. I want to kill myself

No magic mate. Just a lot of trial and error. And the zen acceptance that some stuff is impossible to do (e.g. passing through the middle PCIe 2.0 slot to a VM due to IOMMU group needing ACS multifunction override which lags on unRAID 6.6.0+).

Link to comment

Some more updates:

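1. Updated to the F12e BIOS and found out it changes the CPU core numbering! Previously it was a 0 + 1 numbering scheme (each core's two threads numbered consecutively). F12e changed it to a 0 + 32 numbering scheme (a core's second thread is its first thread's number plus 32), if I can trust numactl.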

numactl --hardware
available: 4 nodes (0-3)
node 0 cpus: 0 1 2 3 4 5 6 7 32 33 34 35 36 37 38 39
node 0 size: 48268 MB
node 0 free: 570 MB
node 1 cpus: 16 17 18 19 20 21 22 23 48 49 50 51 52 53 54 55
node 1 size: 0 MB
node 1 free: 0 MB
node 2 cpus: 8 9 10 11 12 13 14 15 40 41 42 43 44 45 46 47
node 2 size: 48355 MB
node 2 free: 352 MB
node 3 cpus: 24 25 26 27 28 29 30 31 56 57 58 59 60 61 62 63
node 3 size: 0 MB
node 3 free: 0 MB
node distances:
node   0   1   2   3
  0:  10  16  16  16
  1:  16  10  16  16
  2:  16  16  10  16
  3:  16  16  16  10
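
To turn that output into core pairs for pinning, here is a small Python sketch. It assumes the "0 + 32" scheme really does mean thread N pairs with thread N + 32 on this 64-thread CPU; the helper names parse_numa_cpus and sibling_pairs are made up for illustration. The kernel's own view can be cross-checked in /sys/devices/system/cpu/cpuN/topology/thread_siblings_list.

```python
import re


def parse_numa_cpus(text):
    """Map NUMA node number -> list of CPU ids from `numactl --hardware` output."""
    nodes = {}
    for line in text.splitlines():
        match = re.match(r"node (\d+) cpus:(.*)", line.strip())
        if match:
            nodes[int(match.group(1))] = [int(c) for c in match.group(2).split()]
    return nodes


def sibling_pairs(cpus, offset=32):
    """Pair each CPU with cpu + offset when both sit in the same node.

    Assumption: offset=32 matches the "0 + 32" numbering on a 64-thread CPU.
    """
    present = set(cpus)
    return [(c, c + offset) for c in sorted(cpus) if c + offset in present]


# Two of the "cpus" lines from the output above
SAMPLE = """\
node 0 cpus: 0 1 2 3 4 5 6 7 32 33 34 35 36 37 38 39
node 1 cpus: 16 17 18 19 20 21 22 23 48 49 50 51 52 53 54 55
"""

if __name__ == "__main__":
    for node, cpus in parse_numa_cpus(SAMPLE).items():
        print(f"node {node}: {sibling_pairs(cpus)}")
```

Note that nodes 1 and 3 report 0 MB in the output above, so only the pairs on nodes 0 and 2 have local memory.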

 

 

2. On the bright side, it gave me the motivation to reoptimise my core assignment and VM/docker usage. Shamelessly forked binhex's privoxy VPN and consol's VNC Web Browser to make some changes to make those dockers work better for my very niche purposes. That helps kill off 3 VMs.

 

3. Bought an M.2 -> PCIe adapter to see if I could move the PM983 to the vacant bottom PCIe slot (so I can add more M.2 drives) and then found out that slot is in the same group as the middle PCIe slot, the bottom right M.2 slot and the LAN ports. #facepalm. So I had to revert, having wasted 2 hours of my life.

Edited by testdasi
Link to comment
3 hours ago, jbartlett said:

Might be a good idea to retry your benchmarks. The new BIOS seems to improve processing

I reran my benchmark but saw no major difference vs F11e with pure CPU work load. If anything it's about 2% slower, could be due to Windows security patches.

On the bright side, that means I can trust numactl core numbering.

 

I did notice GPU-assisted encoding was significantly better. 4K encoding is 15% better, 1080p a whopping 37% better! While I'm happy with that, I don't think it's BIOS related (or maybe it is, I don't know).

 

It does seem to run about 1-2 degrees cooler, similar to the post you quoted.

Link to comment
  • 2 weeks later...
On 6/6/2019 at 5:30 AM, testdasi said:

If anything it's about 2% slower

I got a 2% increase in speed by disabling the CPU mitigations. Could be coincidence since I'm testing a different MB. My main rig has the same MB as you; I'll see about doing a benchmark when I get around to updating its BIOS. Currently, it's in "It works, don't frack with it" status.

Link to comment

Some major updates:

  • Many thanks to @DZMM for helping me with rclone. Rclone has allowed me to move most of my content up to the cloud, run without parity (better speed) and reduce my local storage needs (got rid of the 8TB Archive and 6TB Black, and may get rid of the 8TB Iron Wolf in the future). My array now only serves as local backup and offline storage for some short-term content in case the Internet is down.
  • Plex works perfectly fine with cloud storage, even on my old 150/150 connection. Nobody in the household noticed my move to the cloud. It was that transparent. I won't tell anyone and see how long it takes for someone to notice.
  • My upgrade to 1Gb/1Gb is more of a quality of life addition since I can do my offsite backup to the cloud more quickly. Will get rid of Crashplan as my offsite backup service. While they were great in the past, their recent move to business-only has exposed their shortcomings vs other business-level solutions. Speed and ease of restoration are the 2 biggest sore points.
  • My workstation VM is "upgraded" from i440fx to Q35. I'm delighted to announce that the GPU fans no longer stop while idling without MSI Afterburner! :D That means my PM983 can idle under 45 degrees regularly. I think once Unraid moves on to a more recent QEMU, the default Win10 template should use Q35, which has better support for PCIe. At the moment, I have to manually add the code below to the XML; otherwise, my PCIe runs at x1.
  <qemu:commandline>
    <qemu:arg value='-global'/>
    <qemu:arg value='pcie-root-port.speed=8'/>
    <qemu:arg value='-global'/>
    <qemu:arg value='pcie-root-port.width=16'/>
  </qemu:commandline>
  • The latest Linux kernel + F12 BIOS seem to make disabling Global C State Control less stable. For extremely strange reasons, that manifests as out of memory errors if I try to reserve more than 50% of RAM all at once (e.g. start my workstation VM). So if you disabled Global C State Control in the past and now seem to have some instability, maybe try enabling that with the latest BIOS.

 

Edited by testdasi
Link to comment
Just curious how much did you move to the cloud?


Link to comment
1 hour ago, ijuarez said:

Just curious how much did you move to the cloud?

About 25TB.

 

Took me 10 days, 2 of which on 150Mbps, 3 on 250Mbps (because my old router can't handle gigabit) and the rest on 1Gbps.

Average speed is about 30MB/s, but given that 2/3 of the object count is my offsite backup, which is full of tiny files in the KB range, that's not too shabby.

 

Compare that with Crashplan, which last took 10 days to back up 200GB worth of files in the MB range, and I'd say it's a huge improvement.
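
For the curious, the arithmetic behind those numbers can be sketched in a few lines of Python. This assumes decimal units (1TB = 10^12 bytes) and that the upload ran for the full 10 days of wall-clock time; the function name is just illustrative.

```python
def average_rate_mb_per_s(total_bytes, days):
    """Average throughput in MB/s (decimal megabytes) over a number of days."""
    return total_bytes / (days * 86_400) / 1e6


TB = 1e12
GB = 1e9

rclone_rate = average_rate_mb_per_s(25 * TB, 10)      # the 25TB rclone upload
crashplan_rate = average_rate_mb_per_s(200 * GB, 10)  # the 200GB Crashplan backup

print(f"rclone:    {rclone_rate:.1f} MB/s")
print(f"Crashplan: {crashplan_rate:.2f} MB/s")
print(f"ratio:     {rclone_rate / crashplan_rate:.0f}x")
```

That works out to roughly 29MB/s against roughly 0.23MB/s, i.e. about 125 times faster on average.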

Link to comment
5 minutes ago, testdasi said:

About 25TB.

 

Took me 10 days, 2 of which on 150Mbps, 3 on 250Mbps (because my old router can't handle gigabit) and the rest on 1Gbps.

Average speed is about 30MB/s, but given that 2/3 of the object count is my offsite backup, which is full of tiny files in the KB range, that's not too shabby.

 

Compare that with Crashplan, which last took 10 days to back up 200GB worth of files in the MB range, and I'd say it's a huge improvement.

Wow, how much is that costing you?

Link to comment
On 6/27/2019 at 10:21 AM, testdasi said:

Some major updates:

  • Many thanks to @DZMM for helping me with rclone. Rclone has allowed me to move most of my content up to the cloud, run without parity (better speed) and reduce my local storage needs (got rid of the 8TB Archive and 6TB Black, and may get rid of the 8TB Iron Wolf in the future). My array now only serves as local backup and offline storage for some short-term content in case the Internet is down.

I've just followed your lead and removed my parity drive as well - can't believe I didn't consider this before as all my media is in the cloud and my personal files are backed up there as well, so I have limited local storage needs now.

 

On 6/27/2019 at 10:21 AM, testdasi said:

My workstation VM is "upgraded" from i440fx to Q35. I'm delighted to announce that the GPU fans no longer stop while idling without MSI Afterburner! :D That means my PM983 can idle under 45 degrees regularly. I think once Unraid moves on to a more recent QEMU, the default Win10 template should use Q35, which has better support for PCIe. At the moment, I have to manually add the code below to the XML; otherwise, my PCIe runs at x1.

Hmm, this is one for me to investigate, as I have 3 W10 VMs on 24/7 running i440fx.

Link to comment
2 hours ago, DZMM said:

Hmm, this is one for me to investigate, as I have 3 W10 VMs on 24/7 running i440fx.

I think Unraid defaulting to i440fx is a legacy thing, because Q35 used to be rubbish with Windows some years ago with the old QEMU.

From Unraid 6.6 onwards though, the progress made in QEMU was good enough to make Q35 at least on par with i440fx and potentially even superior with regards to PCIe passthrough. I remember seeing posts in the forum about some GPUs that would not pass through without Q35.

 

A few tips for you:

  • Q35 doesn't like SeaBIOS + Windows, so make sure you are on OVMF (i.e. booting UEFI) before switching.
  • Start a new template. I think the i440fx PCI -> Q35 PCIe move is way too complicated for the Unraid GUI to handle.
  • Don't forget that last bit of code to add manually to the XML. It cost me an hour wondering why the F my 1070 didn't work.
  <qemu:commandline>
    <qemu:arg value='-global'/>
    <qemu:arg value='pcie-root-port.speed=8'/>
    <qemu:arg value='-global'/>
    <qemu:arg value='pcie-root-port.width=16'/>
  </qemu:commandline>
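
If you want to double-check that the override actually made it into a VM's XML, here is a minimal Python sketch. It relies on the libvirt rule that the qemu: prefix must be declared on the domain element as xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'; the function name qemu_globals and the sample domain name "Workstation" are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace libvirt uses for QEMU command-line passthrough elements
QEMU_NS = "http://libvirt.org/schemas/domain/qemu/1.0"


def qemu_globals(domain_xml):
    """Return the property=value strings passed via `-global` in <qemu:commandline>."""
    root = ET.fromstring(domain_xml)
    args = [a.get("value") for a in root.iter(f"{{{QEMU_NS}}}arg")]
    # Each `-global` flag is followed by its property=value argument
    return [val for flag, val in zip(args, args[1:]) if flag == "-global"]


# Hypothetical, heavily trimmed domain XML for illustration only
SAMPLE = """\
<domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
  <name>Workstation</name>
  <qemu:commandline>
    <qemu:arg value='-global'/>
    <qemu:arg value='pcie-root-port.speed=8'/>
    <qemu:arg value='-global'/>
    <qemu:arg value='pcie-root-port.width=16'/>
  </qemu:commandline>
</domain>
"""

if __name__ == "__main__":
    found = qemu_globals(SAMPLE)
    for wanted in ("pcie-root-port.speed=8", "pcie-root-port.width=16"):
        print(wanted, "present" if wanted in found else "MISSING")
```

This only checks that the arguments are in the XML; whether the guest actually negotiates x16 still has to be verified inside the VM.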

 

Link to comment
