Everything posted by testdasi

  1. What cores did you pin? 4-7 + 12-15? If so, try:
       • Pinning 2-3, 6-7, 10-11, 14-15
       • Keeping 4-7 + 12-15 but putting your GPU on a different slot
     If I understand the diagram correctly (https://en.wikichip.org/w/images/7/75/amd_zen_octa-core_die_shot_(annotated).png), each die has 2 CCXs, each with 16 PCIe lanes connected to it. So it's quite possible that the bottleneck isn't CPU or GPU but the latency of constantly jumping over a CCX to get to the GPU. Threadripper is quite different from Ryzen in terms of optimisation. Try spreading your pins out across both dies and all 4 CCXs (something like 3,7,11,15,19,23,27,31). Alternatively, pin only the cores that are connected directly to your GPU. The latter will give you the best theoretical performance but does involve a bit of trial and error. The former at least reduces some of the jumping and is easy to do and test. A rough sketch of the spread-out pinning is below.
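     For illustration, spreading the guest vCPUs across both dies could be tested live with virsh along these lines (a sketch only: "Windows10" and the vCPU count are placeholders, and the host core numbers assume the layout numactl reports on your board):

        # pin guest vCPUs 0-7 onto host cores 3,7,11,15,19,23,27,31 (spread across CCXs)
        VM="Windows10"
        HOST_CPUS=(3 7 11 15 19 23 27 31)
        for i in "${!HOST_CPUS[@]}"; do
            virsh vcpupin "$VM" "$i" "${HOST_CPUS[$i]}" --live
        done
        # print the resulting pinning to verify
        virsh vcpupin "$VM"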
  2. I have "reliably" caused "Out of memory" crashes under the following conditions:
       • Pin dies without direct memory access to some dockers
       • Put those dockers under high load (like running 3 simultaneous Handbrake dockers transcoding H265)
       • Leave about 1%-2% of memory free (about 80% occupied by VMs / dockers, 18-19% by RAM cache)
     Under these conditions, after about 10-15 minutes, some processes are automatically killed by unRAID with an Out of Memory error. That's unexpected in this scenario because there's constantly 1%-2% of memory totally free (manually monitored) + an 18% "buffer" (RAM cache). Spreading the pinned cores across all 4 dies does not lead to this Out of Memory error, so I suspect this has something to do with Threadripper 2 optimisation - probably a kernel problem and not actually an unRAID bug. At this point, it's more an annoyance for me but I'm sure there's someone out there who might be caught off guard.
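     If you want to poke at this yourself, Docker's standard cpuset flags can force a container onto specific cores and memory nodes (a sketch only - the image name, core range and node number below are placeholders for whatever numactl --hardware shows on your box):

        # keep a transcode container on cores that have local memory,
        # and only let it allocate RAM from NUMA node 0
        docker run -d --name handbrake-test \
          --cpuset-cpus="8-15" \
          --cpuset-mems="0" \
          some/handbrake-image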
  3. +1 The ability to shut down a VM and automatically start a different one without touching the GUI is extremely useful for me. I have multiple templates of my workstation VM (using 8 cores, 16 cores, 24 cores) that I'd like to switch between depending on what my current work is and what is running on the server. Adobe Lightroom lags like crazy with 24 cores; Adobe Premiere scales a lot better.
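     In the meantime it can be scripted with virsh (a sketch, assuming the templates are separate VMs with placeholder names):

        #!/bin/bash
        # swap the running workstation template from the command line
        virsh shutdown "Workstation-24core"
        # wait for the guest to actually power off before starting the other one
        while virsh domstate "Workstation-24core" | grep -q running; do sleep 2; done
        virsh start "Workstation-8core"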
  4. Can you? Sure. Should you? Unlikely. The underlined statement suggests you have a 2nd machine that can definitely benefit from having a fast NVMe SSD. Using your NVMe for unRAID cache (and only cache) is a massive waste and, given the 256GB size, also a massive inconvenience because it will fill up super quickly.
  5. So apparently my test 6 happens to be testing all the slow cores (no direct memory access). Will need to retest the fast cores once my data migration is done.

        ~# numactl --hardware
        available: 4 nodes (0-3)
        node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
        node 0 size: 48208 MB
        node 0 free: 350 MB
        node 1 cpus: 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
        node 1 size: 0 MB
        node 1 free: 0 MB
        node 2 cpus: 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
        node 2 size: 48354 MB
        node 2 free: 4680 MB
        node 3 cpus: 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63
        node 3 size: 0 MB
        node 3 free: 0 MB
        node distances:
        node   0   1   2   3
          0:  10  16  16  16
          1:  16  10  16  16
          2:  16  16  10  16
          3:  16  16  16  10
  6. Apparently there are already commands to tell which core is on which die. @bastl @Jcloud Perhaps you guys can try to see what shows up?

        ~# numactl --hardware
        available: 4 nodes (0-3)
        node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
        node 0 size: 48208 MB
        node 0 free: 350 MB
        node 1 cpus: 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
        node 1 size: 0 MB
        node 1 free: 0 MB
        node 2 cpus: 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
        node 2 size: 48354 MB
        node 2 free: 4680 MB
        node 3 cpus: 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63
        node 3 size: 0 MB
        node 3 free: 0 MB
        node distances:
        node   0   1   2   3
          0:  10  16  16  16
          1:  16  10  16  16
          2:  16  16  10  16
          3:  16  16  16  10

     Apparently I can even check which VM is using how much RAM connected to which node:

        ~# numastat qemu

        Per-node process memory usage (in MBs)
        PID                          Node 0          Node 1          Node 2
        -----------------------  --------------- --------------- ---------------
        33117 (qemu-system-x86)          1751.71            0.00         2442.32
        33297 (qemu-system-x86)          2840.03            0.00         1326.58
        82938 (qemu-system-x86)         28445.78            0.00        20757.30
        91591 (qemu-system-x86)           182.21            0.00         8052.15
        -----------------------  --------------- --------------- ---------------
        Total                           33219.73            0.00        32578.35

        PID                          Node 3           Total
        -----------------------  --------------- ---------------
        33117 (qemu-system-x86)             0.00         4194.02
        33297 (qemu-system-x86)             0.00         4166.61
        82938 (qemu-system-x86)             0.00        49203.09
        91591 (qemu-system-x86)             0.00         8234.37
        -----------------------  --------------- ---------------
        Total                               0.00        65798.09
  7. Shameless plug: I already did some testing in my build topic.
  8. I asked the question a short while ago about isolating core 0 and what happens. Theoretically unRAID should avoid the core, but now that I think about it, I don't think that's the case. This draws from my experience with isolating cores and then assigning the isolated cores to a docker. The docker would end up using ONE of the cores at 100%. A docker is part of what you would call "unRAID" (since it's part of the host). That means isolation actually doesn't prevent the host from using the core. My hypothesis is that a process doesn't know whether a core is isolated until it starts and checks the isolation list and/or is told "you naughty process, you can't use this". But since it already holds the core, it will continue to do whatever it wants to do until done, like it doesn't care. It is, however, prevented from using any other isolated core. So until this is fully resolved, the old advice to keep core 0 (and its SMT sister) free would still be in effect. That is complicated by the inconsistency in core pair display in different environments. My 2990WX shows 0 paired with 1 (so not 0 paired with 32 as yours). Your Zenith X399 must be doing something very different.
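     The kernel does expose both the isolation list and the SMT pairing, so the discrepancy can be checked directly (the PID below is just a placeholder):

        # which cores the kernel considers isolated (whatever isolcpus= was set to)
        cat /sys/devices/system/cpu/isolated
        # which logical CPU is the SMT sibling of which (e.g. "0-1" vs "0,32")
        cat /sys/devices/system/cpu/cpu0/topology/thread_siblings_list
        # which cores a given process is actually allowed to run on
        taskset -cp 12345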
  9. So apparently, the 2990WX all-core turbo is 3.4GHz. Note: this was on F10 BIOS.

        # grep MHz /proc/cpuinfo
        cpu MHz : 3315.662
        cpu MHz : 3302.891
        cpu MHz : 3382.593
        cpu MHz : 3384.594
        cpu MHz : 3389.368
        cpu MHz : 3389.600
        cpu MHz : 3391.838
        cpu MHz : 3390.623
        cpu MHz : 3392.705
        cpu MHz : 3397.049
        cpu MHz : 3389.122
        cpu MHz : 3384.777
        cpu MHz : 3393.248
        cpu MHz : 3393.420
        cpu MHz : 3393.441
        cpu MHz : 3393.442
        cpu MHz : 3386.566
        cpu MHz : 3378.696
        cpu MHz : 3393.268
        cpu MHz : 3392.793
        cpu MHz : 3388.878
        cpu MHz : 3392.872
        cpu MHz : 3393.441
        cpu MHz : 3393.330
        cpu MHz : 3393.136
        cpu MHz : 3391.281
        cpu MHz : 3393.417
        cpu MHz : 3393.139
        cpu MHz : 3391.659
        cpu MHz : 3393.042
        cpu MHz : 3392.735
        cpu MHz : 3390.230
        cpu MHz : 3390.927
        cpu MHz : 3399.651
        cpu MHz : 3393.443
        cpu MHz : 3393.257
        cpu MHz : 3398.353
        cpu MHz : 3393.405
        cpu MHz : 3393.446
        cpu MHz : 3393.409
        cpu MHz : 3393.484
        cpu MHz : 3392.372
        cpu MHz : 3393.442
        cpu MHz : 3393.443
        cpu MHz : 3393.363
        cpu MHz : 3392.820
        cpu MHz : 3393.443
        cpu MHz : 3393.308
        cpu MHz : 3392.475
        cpu MHz : 3393.030
        cpu MHz : 3375.652
        cpu MHz : 3363.026
        cpu MHz : 3393.333
        cpu MHz : 3393.370
        cpu MHz : 3393.444
        cpu MHz : 3393.236
        cpu MHz : 3388.671
        cpu MHz : 3392.779
        cpu MHz : 3391.320
        cpu MHz : 3393.352
        cpu MHz : 3393.198
        cpu MHz : 3393.226
        cpu MHz : 3393.170
        cpu MHz : 3392.884
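     If you don't fancy eyeballing 64 lines, awk can summarise the same readings (the field numbers match the "cpu MHz : xxxx" layout of /proc/cpuinfo):

        grep MHz /proc/cpuinfo | awk '{sum+=$4; if ($4>max) max=$4} END {printf "avg %.0f MHz, max %.0f MHz across %d threads\n", sum/NR, max, NR}'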
  10. I think the red dots are due to the forum feature to pop up a "quote selection" button. Doesn't affect Safari apparently.
  11. Problems:
       • The case has 8 expansion slots and your 4 GPUs will occupy all 8, so you won't have space for the USB card. You either need a new case or some creative modding.
       • The 2nd GPU will cover the middle PCIe slot, so you will need creative use of PCIe extender(s) to make it work.
       • The Taichi X399 middle slot is PCIe x1 (albeit with an open end), so I'm not sure your 4-controller PCIe x4 USB card is going to work in that slot. Theoretically it will just be slower.
     Theoretically, you need a case with at least 10 expansion slots: GPU1 (x2) - extender in - USB - GPU2 (x2) - GPU3 (x2) - GPU4 via extender out (x2). It's still not going to be easy to (a) get an extender to stretch over 5+ slots across 2 big GPUs, and (b) your GPU3 + GPU4 will completely cover all the ports at the bottom of the board, so access is going to be a massive pain. Also perhaps consider a Gigabyte board since it has a full-length middle slot. I would also recommend opting for compact GPUs, but it looks like you guys are reusing your existing stuff so that's not an option.
     You might want to think simpler. There's no need for a separate USB controller if hot-plugging isn't a requirement: you can pass through individual USB devices in unRAID. If you all use exactly the same model of peripherals, it's going to be a massive pain to identify things and edit the XML, but it should still work. The motherboard has 2 separate USB 3.0 controllers that can be passed through to VMs (in addition to a shared USB 3.1 controller that can be used for individual USB device passthrough). So if you can live with just 2 VMs having dedicated controllers and 2 VMs with no hot-plug (preferably the 2 with distinctly different peripherals), then that simplifies things. In short, take the USB controller out of the question and the build might just work.
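     To see which USB controllers a board actually exposes (and whether they sit in their own IOMMU groups for passthrough), something like this works from the unRAID console:

        # list the USB controllers with their PCI addresses and IDs
        lspci -nn | grep -i usb
        # show the IOMMU group each one landed in; a controller alone in its
        # group is the easy passthrough candidate
        for d in /sys/kernel/iommu_groups/*/devices/*; do
            g=${d#*/iommu_groups/}; g=${g%%/*}
            printf 'group %s\t%s\n' "$g" "$(lspci -nns ${d##*/})"
        done | grep -i usb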
  12. Do you know which AGESA version the proper fix was in? That would probably help the TR peeps know for sure the minimum BIOS to use.
  13. All of my USB 2.0, internal USB 3.0 and 3.1 gen2 ports show up under the same USB 3.1 controller, so I'm guessing there's no chance for me to pass through the 3.1 controller since it is shared with the unRAID stick. Funnily enough, I don't have any ASMedia device! Only AMD - and all my USB ports work, so perhaps it's a 2nd-gen TR thing. Not a big deal for me since I only need 2 USB 3.0 controllers, for my main Win and Mac VMs.
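     For anyone wanting to check the same thing on their own board, the USB tree view makes the controller sharing obvious (standard commands, nothing unRAID-specific):

        # one root hub per controller; the unRAID flash drive will show up under one of them
        lsusb -t
        # cross-reference each bus against the PCI controller behind it
        readlink /sys/bus/usb/devices/usb*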
  14. Theoretically (based on the reference below), unRAID can support vmdk, but mine doesn't work:
       • Directly editing it into the template -> disk not showing up
       • Converting using qemu-img leads to an all-blank raw img file, so naturally it doesn't work
     When I use the latest qemu-img for Windows, it converts the vmdk to raw correctly (in fact I'm already using it), so I suspect an old version is being used by unRAID. Reference:
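     For reference, the conversion itself is just the standard qemu-img syntax (file names below are placeholders):

        # convert a VMware disk to a raw image the VM template can point at
        qemu-img convert -p -f vmdk -O raw source-disk.vmdk vdisk1.img
        # sanity-check the result before booting from it
        qemu-img info vdisk1.img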
  15. Thanks to the magic of KVM, I now have MacOS running on an old Surface 3. 😁
  16. For an NVMe drive, you need to pass it through via PCIe passthrough for best performance. It's not a SATA device.
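     The usual steps look something like this (the PCI address and vendor:device ID below are placeholders - check your own with lspci):

        # find the NVMe controller's PCI address and its vendor:device ID
        lspci -nn | grep -i nvme
        #   e.g. 01:00.0 Non-Volatile memory controller [0108]: ... [144d:a808]
        # then stub it for passthrough by adding the ID to the syslinux append line:
        #   vfio-pci.ids=144d:a808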
  17. Memory interleaving may be the difference, because it relates to the Threadripper design. A Threadripper CPU is essentially equivalent to a dual-CPU / quad-CPU setup in the server world, which leads to the UMA / NUMA distinction. When the CPU is in UMA mode, memory is interleaved and exposed to both dies, with priority on throughput. In NUMA mode, there's no interleaving and each die accesses its own memory bus first and then the other die's, i.e. priority on better latency. In other words, UMA treats the CPU as one unit and NUMA treats each die as its own CPU. For the 1950X, UMA / NUMA can be selected. For the 2990WX, for the same reasons that you mentioned, only NUMA mode is available. So when it comes to pairing logical cores to physical cores, it might be done incorrectly in UMA if the numbering is based on NUMA. It also makes sense why the 2990WX has a different numbering scheme, since NUMA is its only option. Of course, that's just my hypothesis since I can't turn on interleaving on my 2990WX to test.
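     A quick way to see which mode a board is actually running in is the same numactl output as in my earlier posts: with NUMA each die reports its own memory size, while with interleaving the memory should show up pooled together.

        numactl --hardware | grep -E 'available|size'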
  18. So I updated to the latest template. Is there any way to pass additional parameters to openvpn? My idea is to use remote-random to allow the docker to pick a random server at every restart. The section of config below is deleted every time the docker starts and replaced with the server in the template. Only the remote-random line remains, so I'm guessing something was set up to remove any lines starting with "remote ".

        remote-random
        remote de-berlin.privateinternetaccess.com 1197
        remote de-frankfurt.privateinternetaccess.com 1197
        remote czech.privateinternetaccess.com 1197
        remote france.privateinternetaccess.com 1197
        remote ro.privateinternetaccess.com 1197
        remote spain.privateinternetaccess.com 1197
        remote swiss.privateinternetaccess.com 1197
        remote sweden.privateinternetaccess.com 1197
  19. @bastl: One thing I can think of - did you have memory interleaving on or off? Unfortunately I can only wish that I had a 1950X lying around. I ain't Linus.
  20. So here is a quick summary of my test results. I use barebone SMT-off as the base (since it's the fastest). The % below is how much slower than base, so lower is better. All tests were done on Windows, barebone / VM. SMT is on for the VMs. Nothing else was running during the tests (except for (8)).
       (1) Barebone SMT on: 52% <-- yes, SLOWER!
       (2) VM 1-7, 17-23, 33-39, 49-55 (28 logical cores): 33%
       (3) VM all odd numbers except 1, 17, 33, 49 (28 logical cores): 36%
       (4) VM first 32 except 0, 8, 16, 24 (28 logical cores): 29%
       (5) VM all odd numbers (32 logical cores): 30%
       (6) VM last 32 (32 logical cores): 34%
       (7) VM all odd numbers except 1, 9, 17, 25, 33, 41, 49, 57 (24 logical cores): 20%
       (8) VM same as (7) but with 3 simultaneous transcodes on the even logical cores using dockers (24 logical cores): 56%
     My conclusions:
       • (1), (7) and (8) say Windows is badly optimised for Threadripper 2 but Linux is much better.
       • (3) - (7) seem to confirm what I was guessing: each 8 logical cores represent 4 physical cores and thus 1 CCX, and spreading things evenly across more CCXs improves performance. 3-3-3-3-3-3-3-3 is faster than 3-4-3-4-3-4-3-4!
       • Linux is actually great with SMT optimisation, so I'll stick to my weird-and-wonderful config moving forward.
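     If anyone wants to double-check which logical cores sit on the same CCX, the L3 cache sharing list gives it away (each distinct line in the output is one CCX):

        cat /sys/devices/system/cpu/cpu*/cache/index3/shared_cpu_list | sort -u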
  21. That looks similar to mine without ACS override, so I'm guessing it's some kind of default setting in the Threadripper BIOS.