SpaceInvaderOne Posted July 16, 2017 (edited)
Hi, guys. This is a series of 3 videos about tuning the unRAID server. It is a guide for the server as a whole, but it has a lot of information for VMs, so I thought this forum section was the best place to post it. Some of the topics are:
- CPU governor and enabling turbo boost
- About vCPUs and hyperthreading
- How VMs and Docker containers affect each other's performance
- Pinning cores to Docker containers
- Using the same container with different profiles
- Allocating resources to Docker containers
- Decreasing latency in VMs
- Using emulatorpin
- Isolating CPU cores
- Setting extra profiles in syslinux for isolated cores
- Checking whether cores have been correctly isolated
- Disabling hyperthreading
- Having unRAID manage vCPUs as opposed to vCPU pinning
Hope the videos are interesting.
Part 1
Part 2
Part 3
Edited July 16, 2017 by gridrunner
dlandon Posted July 17, 2017
These videos are very good. I actually learned some things I didn't know. You did an excellent job with the CPU pinning and assignment. As you said, it is as much art as science. There is no "one size fits all" solution; if there were, LT would just do things that way. The only thing I might comment on is the caching on a VM vdisk. From everything I've read, this is the best setup for performance on a vdisk:
<driver name='qemu' type='raw' cache='directsync' io='native'/>
I appreciate your compliments on the Tips & Tweaks plugin. I did it because I thought it appropriate to keep people off the command line, where it is too easy to make mistakes, and give users an easy way to make adjustments. I had no idea the Performance governor and Turbo could make that much difference.
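For readers wanting the context, that driver line sits inside the vdisk's <disk> stanza in the VM's XML. A minimal sketch — the source path and target device here are purely illustrative, so match them to your own VM:

```xml
<disk type='file' device='disk'>
  <!-- dlandon's suggested cache/io combination -->
  <driver name='qemu' type='raw' cache='directsync' io='native'/>
  <!-- illustrative vdisk path and virtio target; use your VM's actual values -->
  <source file='/mnt/user/domains/Windows10/vdisk1.img'/>
  <target dev='hdc' bus='virtio'/>
</disk>
```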
SpaceInvaderOne Posted July 20, 2017 (edited)
On 17/07/2017 at 9:52 PM, dlandon said:
The only thing I might comment on is the caching on a VM vdisk. From everything I've read, this is the best setup for performance on a vdisk:
<driver name='qemu' type='raw' cache='directsync' io='native'/>
Thanks @dlandon, it means a lot that you think they are good. I have always used cache='none' as it's meant to be equivalent to the host's disk performance-wise. But with you saying that, I think it may be better to use directsync, as it will use both O_DSYNC and O_DIRECT when interacting with the vdisk, whereas 'none' uses the disk write cache. I have read that io='native' is definitely better than io='threads' and works only with O_DIRECT. Both cache='none' and cache='directsync' use O_DIRECT, so both would benefit from that setting. Yeah, I really like the Tips and Tweaks plugin; I was very happy when I found it. I am surprised turbo isn't the default unRAID setting for Intel CPUs, especially with how many people use gaming VMs on unRAID. I have been thinking of making a script that checks which VMs are running from virsh; if my gaming VM is running, it enables turbo and the performance governor, and when it's not running, it goes back to powersave.
Edited July 20, 2017 by gridrunner
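A hedged sketch of that script idea: poll the running VMs and pick a governor accordingly. The VM name "GamingVM", and the commented-out sysfs writes, are assumptions — rename the VM and verify the paths on your own server before using anything like this:

```shell
#!/bin/bash
# Sketch: switch CPU governor based on whether the gaming VM is running.
# GAMING_VM is an assumed name -- change it to match your VM.

GAMING_VM="GamingVM"

profile_for() {
  # $1 = newline-separated list of running VM names, as printed by `virsh list --name`
  if echo "$1" | grep -qx "$GAMING_VM"; then
    echo "performance"
  else
    echo "powersave"
  fi
}

apply_profile() {
  governor=$(profile_for "$1")
  echo "would set governor: $governor"
  # On a live server you would uncomment something like:
  # for g in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
  #   echo "$governor" > "$g"
  # done
  # [ "$governor" = "performance" ] && echo 0 > /sys/devices/system/cpu/intel_pstate/no_turbo
}

# From cron or the User Scripts plugin you would call:
# apply_profile "$(virsh list --name)"
# Demo with a canned VM list:
apply_profile "$(printf 'GamingVM\nUbuntuVM\n')"   # prints: would set governor: performance
```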
dlandon Posted July 20, 2017
23 minutes ago, gridrunner said:
I have always used cache='none' as it's meant to be equivalent to the host's disk performance-wise. ... I am surprised turbo isn't the default unRAID setting for Intel CPUs.
You have obviously done more research on the vdisk caching than I have. It sounds like 'none' works fine and is equivalent. I believe that Linux does set Turbo mode on by default. I don't know why I did it that way, but T&T defaults Turbo to off. Maybe I need to re-think that. Probably what I should do is display the current Turbo setting in the status under the 'Governor:'.
SpaceInvaderOne Posted July 20, 2017
10 minutes ago, dlandon said:
I believe that Linux does set Turbo mode on by default. I don't know why I did it that way, but T&T defaults Turbo to off. Maybe I need to re-think that.
Ah, OK. I thought turbo was set off by default on unRAID. I just ran cat /sys/devices/system/cpu/intel_pstate/no_turbo on my test server and see it's on. Yes, I think Tips and Tweaks should have turbo set to on by default, leaving the user able to turn it off if they want.
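One gotcha with that sysfs file is that the flag is inverted: no_turbo reads 0 when turbo is actually enabled. A small sketch to translate it (the helper name is made up; on a live box you would feed it the sysfs value):

```shell
# Translate intel_pstate's inverted no_turbo flag into a readable state.
turbo_state() {
  case "$1" in
    0) echo "turbo enabled"  ;;  # 0 means turbo is NOT disabled
    1) echo "turbo disabled" ;;
    *) echo "unknown" ;;
  esac
}

# On the server itself:
# turbo_state "$(cat /sys/devices/system/cpu/intel_pstate/no_turbo)"
turbo_state 0   # → turbo enabled
```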
DZMM Posted August 4, 2017
Thanks for the videos. Quick question re emulator pinning. I have a 14-core CPU and I have 4 VMs running pretty much 24/7, spread between 13 of the cores, with each one's emulator pinned to the first core (emulator pin 0; VM1 cores 1,2; VM2 cores 3,4,5,6; VM3 cores 7,8,9,10; VM4 cores 11,12,13). In your video you suggest pinning all free cores. Do you think I should pin 0,3-13 for VM1, 0-2,7-13 for VM2, etc.?
SpaceInvaderOne Posted August 12, 2017 (edited)
On 04/08/2017 at 8:30 PM, DZMM said:
Quick question re emulator pinning. ... In your video you suggest pinning all free cores. Do you think I should pin 0,3-13 for VM1, 0-2,7-13 for VM2, etc.?
Sorry @DZMM for this late reply to your question. I suggest that when using emulator pin, you pin to a core that is free from use by the VM. I also have a 14-core CPU; thread pairing for my 14-core Xeon looks like this, so I guess this is how your VMs are pinned at present:
cpu 0 <===> cpu 14   emulator pin to first core for all VMs
cpu 1 <===> cpu 15   vm1
cpu 2 <===> cpu 16   vm1
cpu 3 <===> cpu 17   vm2
cpu 4 <===> cpu 18   vm2
cpu 5 <===> cpu 19   vm2
cpu 6 <===> cpu 20   vm2
cpu 7 <===> cpu 21   vm3
cpu 8 <===> cpu 22   vm3
cpu 9 <===> cpu 23   vm3
cpu 10 <===> cpu 24  vm3
cpu 11 <===> cpu 25  vm4
cpu 12 <===> cpu 26  vm4
cpu 13 <===> cpu 27  vm4
So yes, you could emulator pin to core 0 (0,14) for all VMs. But I would only emulator pin if you are having latency issues with a particular VM. Are you having issues with all of them? What type are the VMs - Windows, Linux, etc. - and what are they used for? Are you running any Docker containers on the server at the same time as these VMs? Core 0 is favoured by unRAID, so it is possible you could max out core 0 if things are happening on the server while 4 VMs' emulator processes are pinned to core 0 (0,14), thereby adversely affecting the VMs. If you have isolated the VMs' cores in syslinux from unRAID, that would make this problem worse, because unRAID would only be able to use core 0, so any Docker containers it spins up - unless they were manually pinned to other cores - would sit on core 0.
Do you really need 4 cores for 2 of your VMs? Could the 4-core VMs use 3 cores? Then you could have something like this:
cpu 0 <===> cpu 14   unRAID / dockers
cpu 1 <===> cpu 15   emulator pin all VMs / dockers
cpu 2 <===> cpu 16   emulator pin all VMs / dockers
---------vms------------------------
cpu 3 <===> cpu 17   vm1
cpu 4 <===> cpu 18   vm1
cpu 5 <===> cpu 19   vm2
cpu 6 <===> cpu 20   vm2
cpu 7 <===> cpu 21   vm2
cpu 8 <===> cpu 22   vm3
cpu 9 <===> cpu 23   vm3
cpu 10 <===> cpu 24  vm3
cpu 11 <===> cpu 25  vm4
cpu 12 <===> cpu 26  vm4
cpu 13 <===> cpu 27  vm4
If you really need the 4 cores for those VMs, how about the other 2 VMs (the 3-core and the 2-core)? Are they using a lot of horsepower, and/or is low latency essential? If not, they could share cores whilst the 4-core VMs have their own, i.e.:
cpu 0 <===> cpu 14   unRAID
cpu 1 <===> cpu 15   emulator pin vm2 and vm3
cpu 2 <===> cpu 16   emulator pin vm2 and vm3
-----low cpu usage vms-------
cpu 3 <===> cpu 17   vm1 & vm4
cpu 4 <===> cpu 18   vm1 & vm4
cpu 5 <===> cpu 19   vm1 & vm4
------important low-latency 4-core vms---- (maybe isolate the cores below in syslinux config for exclusive use?)
cpu 6 <===> cpu 20   vm2
cpu 7 <===> cpu 21   vm2
cpu 8 <===> cpu 22   vm2
cpu 9 <===> cpu 23   vm2
cpu 10 <===> cpu 24  vm3
cpu 11 <===> cpu 25  vm3
cpu 12 <===> cpu 26  vm3
cpu 13 <===> cpu 27  vm3
Anyway, I am not sure if I have answered your question or just rambled on. But really, only emulator pin the VMs that need low latency. Don't be worried about sharing cores between VMs that aren't doing lots of number crunching or that don't need low latency as in a gaming VM. When all your VMs are running, look at the load on each core in the unRAID dash; that way you can see what's going on across the cores that have been pinned.
Edited August 12, 2017 by gridrunner
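For anyone following along, the emulator pinning discussed above is set with an <emulatorpin> element inside the VM XML's <cputune> section. A minimal sketch for a 2-vCPU VM on the 1/15 thread pair with its emulator on the 0/14 pair — the cpuset numbers are just the example layout from this thread, not a recommendation:

```xml
<cputune>
  <!-- vCPUs pinned to a core and its hyperthread sibling (example layout) -->
  <vcpupin vcpu='0' cpuset='1'/>
  <vcpupin vcpu='1' cpuset='15'/>
  <!-- QEMU emulator threads kept off the VM's own cores -->
  <emulatorpin cpuset='0,14'/>
</cputune>
```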
DZMM Posted August 12, 2017
Thanks gridrunner - that helped me decide what to do. My 4 VMs are:
VM1 & 2: Used by my young kids. They don't do much heavy duty, but their gaming is increasing. They could probably get away with fewer than 3 cores, but my overall CPU usage rarely goes >50%, so for now I'm indulging them.
VM3: My daily driver. I'm giving myself 4 cores because I'm a bloke, even though I probably don't need them.
VM4: pfsense. At the moment the CPU usage is also low, but it might pick up as my line speed goes up (hopefully soon, as I'm on 19/1).
I've adjusted my cores to:
cpu 0 <===> cpu 14   unRAID
cpu 1 <===> cpu 15   emulator pin for VMs
cpu 2 <===> cpu 16   vm1
cpu 3 <===> cpu 17   vm1
cpu 4 <===> cpu 18   vm1
cpu 5 <===> cpu 19   vm2
cpu 6 <===> cpu 20   vm2
cpu 7 <===> cpu 21   vm2
cpu 8 <===> cpu 22   vm3
cpu 9 <===> cpu 23   vm3
cpu 10 <===> cpu 24  vm3
cpu 11 <===> cpu 25  vm3
cpu 12 <===> cpu 26  vm4
cpu 13 <===> cpu 27  vm4
I think you're right that I was taxing core 0 too hard, so moving the emulators to another core will hopefully help. I've pinned all of them there, as the usage on core 1 never seems to max out, so I don't think there's a lot of overhead. My Docker usage can get high thanks mainly to Plex, but even when transcoding, unRAID seems to intelligently spread the usage across a number of cores. It seems to touch the higher cores last, so that's why I've put VMs 3 & 4 on the higher cores. In the future I might pin dockers, but I honestly think unRAID does a good job of spreading the load and I don't want to restrict its options. I haven't really had a situation where a docker has impacted VM usage yet, although I do occasionally get a blip on video playback that could be down to sabnzbd activity - hopefully having two cores that aren't used by VMs will help unRAID better balance docker usage.
TinkerToyTech Posted August 12, 2017
I too would like to thank @gridrunner for all of his videos. What I have found is that after watching them 3-5 times, if I want to fully implement what you spell out in a given video - say 5 tips, each with 5 steps - that is at least 25 pauses, rewinds, zooms, etc. to be sure I have gotten it perfect. I'm a total noob when it comes to Linux; I had some exposure to BSD Unix about 30 years ago, but I was mainly playing Hack, Rogue, Hunt, Advent et al. and reading/posting in the newsgroups. @gridrunner, have you joined Patreon, or do you have a public-facing PayPal? If so, I have a tip for you.
allanp81 Posted August 13, 2017
On 17/07/2017 at 9:52 PM, dlandon said:
From everything I've read, this is the best setup for performance on a vdisk:
<driver name='qemu' type='raw' cache='directsync' io='native'/>
For me personally, I see massive latency spikes if I change it to anything suggested in this thread. Leaving it on cache='writeback' gives me greens the whole time.
NotYetRated Posted January 16, 2018
Awesome videos. Hoping the tips here will help me with my 2 VMs experiencing audio/video desync, as well as the random audio static-type issues I have been having. Sorry if I missed this, but how are people checking latency on their VMs?
SSD Posted February 18, 2018
Hey - had a thought as I was setting up my new 12-core server. My two main sources of CPU stress come from my Plex Docker and from my Windows VM, and when one is working hardest, the other is normally pretty quiet. So I prefer not to limit cores, which allows the task needing the cores to have them all available. From prior experience, if I let them share all their cores, I have issues with the Windows VM slowing while heavy transcoding is happening in Plex. To address this, I had excluded the VM's first core from Plex, and that helped a lot, but the VM was still a bit sluggish. (This was on my old 4-core server that I experimented with.) Holding back a full core from Plex was sort of a big deal, but it couldn't be avoided because otherwise the VM was unusable. But as I was thinking, it seems both are using cores in the same order. What if I could let Plex continue to use cores from the lowest core up, but let the VM use cores from the last core down, as shown below?
Default:
Plex: 0, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
VM: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
New:
Plex: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
VM: 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1
They should interfere with each other less. I still have the first core used by each excluded from the other, at least for now. I'm hoping that will mean less interference in general, but I'm not 100% sure. The nature of Plex may be to transcode using all available cores uniformly, but this can't hurt, I don't think. I have not done much transcoding yet, but I think this might be worth trying to allow cores to be shared most effectively. Btw - the core mappings have to be manually edited in the XML, from:
<vcpupin vcpu='0' cpuset='1'/>
<vcpupin vcpu='1' cpuset='13'/>
<vcpupin vcpu='2' cpuset='2'/>
<vcpupin vcpu='3' cpuset='14'/>
<vcpupin vcpu='4' cpuset='3'/>
<vcpupin vcpu='5' cpuset='15'/>
etc.
to:
<vcpupin vcpu='0' cpuset='11'/>
<vcpupin vcpu='1' cpuset='23'/>
<vcpupin vcpu='2' cpuset='10'/>
<vcpupin vcpu='3' cpuset='22'/>
<vcpupin vcpu='4' cpuset='9'/>
<vcpupin vcpu='5' cpuset='21'/>
etc.
Apologies if this is covered in the video. I didn't go back and rewatch, but I don't remember seeing it.
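That reversed mapping can be generated rather than typed by hand. A hedged sketch for a 12-core/24-thread part where thread N pairs with N+12 — the function name and the pairing assumption are mine, so verify your sibling pairs (e.g. with `lscpu -e`) before pasting the output into the XML:

```shell
# Emit <vcpupin> lines that hand out cores from the highest downward,
# pairing core N with its assumed hyperthread sibling N + total_cores.
reversed_vcpupins() {
  vcpus=$1; total_cores=$2
  vcpu=0; core=$((total_cores - 1))
  while [ "$vcpu" -lt "$vcpus" ]; do
    echo "<vcpupin vcpu='$vcpu' cpuset='$core'/>"
    echo "<vcpupin vcpu='$((vcpu + 1))' cpuset='$((core + total_cores))'/>"
    vcpu=$((vcpu + 2)); core=$((core - 1))
  done
}

# 6 vCPUs on a 12-core/24-thread CPU reproduces the edited XML above:
reversed_vcpupins 6 12
```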
DZMM Posted February 18, 2018
48 minutes ago, SSD said:
What if I could let Plex continue to use cores from the lowest core up, but let the VM use cores from the last core down? ... They should interfere with each other less.
I kind of do this with my VMs - my kids' VMs are on the lower cores, and mine and the pfsense VM are on the highest cores. I don't know if it helps, but when I'm bored and watching the cores' activity, the higher cores do seem to get less use.
Chamzamzoo Posted March 16, 2018
I have had a great time setting up my unRAID server and going through all your videos, Gridrunner; they have made the setup and tweaking process a joy. All the tips work great for me except isolcpus, which dramatically reduced performance in my case (by dropping 1 core, it seems). Has anyone seen this before? I have three cores and 6 threads pinned to my gaming VM, which gives good benchmarks in AIDA64. When I put isolcpus on, the score plummeted by 30-50% and only two cores were used while running the test (before, it used all three that were pinned). This is a Threadripper system, so it has two memory controllers and other semi-exotic features. Perhaps the memory score is going down because it's losing access to a core (less access to memory), which is borne out by the CPU usage I'm seeing. But why isolcpus loses me a core, I do not know. With those CPUs isolated as follows:
label unRAID OS
  menu default
  kernel /bzimage
  append isolcpus=1-3,9-11 initrd=/bzroot
the AIDA64 cache and memory score drops, and a peek at system status while the test is running shows that only 2 cores are being used. The same test without isolcpus used all 3 cores and got a better score. Any thoughts?
On 16/01/2018 at 11:24 PM, NotYetRated said:
Sorry if I missed this, but how are people checking latency on their VMs?
I also would like to know what test others are running here; it's not clear to me whether there are any popular testing tools that this forum generally uses.
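When debugging a setup like this, one sanity check is to expand the isolcpus= value into individual CPU numbers and compare it against both the VM's <vcpupin> entries and what the kernel reports in /sys/devices/system/cpu/isolated. A small sketch — the helper name is made up, and it assumes the usual comma/range syntax:

```shell
# Expand an isolcpus value like "1-3,9-11" into the individual CPU numbers,
# so it can be compared against the VM's pinned cpuset values.
expand_isolcpus() {
  echo "$1" | tr ',' '\n' | while IFS=- read -r lo hi; do
    seq "$lo" "${hi:-$lo}"   # single values have no "hi", so reuse "lo"
  done | xargs
}

expand_isolcpus "1-3,9-11"   # → 1 2 3 9 10 11
# On the server, compare against what the kernel actually isolated:
# cat /sys/devices/system/cpu/isolated
```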
steve1977 Posted October 3, 2019
This is a very interesting thread. No clue how I could have missed it earlier. Not sure whether my question should be in a separate thread, but let me start here - happy to open a new one if that fits better. I noticed that Intel has implemented several versions of Turbo Boost: Turbo Boost, Turbo Boost 2.0 and even Turbo Boost 3.0. I am planning to upgrade my 7800X to the newly released 10980XE, but I want to make sure that my gaming VM would benefit from the maximum single-core boost of 4.8GHz. I'd need this particularly for emulator games that rely on single-core performance. Any thoughts on whether Turbo Boost 3.0 would be supported within a Windows VM? https://blogs.forbes.com/antonyleather/files/2019/10/intel-cascade-lake-x-2.png
steve1977 Posted December 3, 2019
On 10/3/2019 at 4:39 PM, steve1977 said:
Any thoughts on whether Turbo Boost 3.0 would be supported within a Windows VM?
Any thoughts?
Koenig Posted June 8, 2020
I just built a server with a Threadripper, and I'm letting unRAID, dockers and some VMs that don't use many resources (mail server, Home Assistant, XP, Tvheadend and various others) share my first 8 cores; then I have two other VMs that need all the performance they can get out of their assigned cores. My question is: is it better to pin the emulators of those last-mentioned VMs to just a single core of the first eight, or to all eight and let unRAID handle the load balancing?