• 6.8.3 disk activities do not respect isolcpus


    testdasi
    • Minor

    My syslinux boot has:

    isolcpus=32-63

     

    Originally I discovered that when I run a btrfs scrub (Main -> diskx -> scrub), it doesn't respect isolcpus, i.e. it uses the cores that are supposed to be isolated (see attached screenshot).

    After a few tries, I found that other activities, e.g. copying files between disks (a simple cp command on the console), do not respect isolcpus either.

    This looks to be system-independent, as I can reproduce it even with Unraid running as a VM.
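
A quick way to confirm what the kernel actually isolated, and what a given process is allowed to run on, is to compare the isolated list with the process's affinity mask. The following is a minimal Python sketch, not something from the original report. It assumes a Linux kernel that exposes `/sys/devices/system/cpu/isolated`; the helper names are made up for illustration, and the stride form of CPU lists is not handled.

```python
import os
from pathlib import Path


def parse_cpu_list(text):
    """Expand a kernel CPU list such as '32-63' or '4-11,16-23' into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # an empty file means no CPUs are listed
        lo, _, hi = part.partition("-")
        cpus.update(range(int(lo), int(hi or lo) + 1))
    return cpus


def isolated_cpus(path="/sys/devices/system/cpu/isolated"):
    """Return the CPUs the running kernel reports as isolated (empty set if none)."""
    p = Path(path)
    return parse_cpu_list(p.read_text()) if p.exists() else set()


def isolated_cpus_allowed_for(pid=0):
    """Isolated CPUs that the given process (0 means this one) may still be scheduled on."""
    return isolated_cpus() & os.sched_getaffinity(pid)
```

With `isolcpus=32-63` in effect, `isolated_cpus()` should list CPUs 32 to 63. If `isolated_cpus_allowed_for(pid)` comes back empty for the `cp` process while the isolated cores still show load, the load is probably not coming from that process itself; that is an inference, not something this thread confirms.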

     

    BTRFS scrub.PNG







    21 hours ago, Jeffarese said:

    I created a post about this bug. I can reproduce too with R9 3900x.

    Have you tried adding nohz_full and rcu_nocbs to your syslinux?

    It seems to have worked on my server but I'd like to double check on a different config just to be sure it isn't a placebo effect.

    Link to comment
    1 hour ago, testdasi said:

    Have you tried adding nohz_full and rcu_nocbs to your syslinux?

    It seems to have worked on my server but I'd like to double check on a different config just to be sure it isn't a placebo effect.

     

    I'm going to try this today on my 3900X, as I have a fair bit of activity on the isolated CPU cores even with the VMs shut down. I'll report back on whether this has any impact on my 3900X.

     

    isolcpus=4-12,16-23 nohz_full=4-12,16-23 rcu_nocbs=4-12,16-23

     

    Just applied this and rebooted. Cores 6-11 plus the HT 16-23 cores have 0% usage now with my gaming VM shut down. Before, they would spike and hover around 3 to 5%.

     

    With the VM running but idle, I'm sitting with core 6 at about 3 to 5% and the rest at 0-1%. I wonder how much this will affect gaming performance, as I was getting a little stuttering while gaming.

     

    Thank you for this tip!
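
Since a misspelled or mismatched boot parameter does not stop the system from booting, it is easy to miss one. The sketch below is an illustration only (the function names and the similarity cutoff are assumptions, not part of any Unraid tooling): it checks a kernel command line for the three parameters discussed here, using the spelling `rcu_nocbs` from the kernel documentation, and warns when the CPU lists differ. The stride form of CPU lists is not handled.

```python
import difflib

EXPECTED = ("isolcpus", "nohz_full", "rcu_nocbs")


def parse_cpu_list(value):
    """Expand '4-11,16-23' into a set of ints, ignoring non-numeric flags."""
    cpus = set()
    for part in value.strip().split(","):
        if not part[:1].isdigit():
            continue  # skips flags such as 'domain' and empty items
        lo, _, hi = part.partition("-")
        cpus.update(range(int(lo), int(hi or lo) + 1))
    return cpus


def check_cmdline(cmdline):
    """Return warnings about missing, misspelled, or mismatched CPU-list parameters."""
    params = dict(tok.split("=", 1) for tok in cmdline.split() if "=" in tok)
    warnings = []
    # Flag keys that are nearly, but not exactly, one of the expected names.
    for key in params:
        close = difflib.get_close_matches(key, EXPECTED, n=1, cutoff=0.8)
        if close and key != close[0]:
            warnings.append(f"'{key}' looks misspelled; did you mean '{close[0]}'?")
    # Compare every other expected parameter against the isolcpus set.
    base = parse_cpu_list(params.get("isolcpus", ""))
    for key in EXPECTED[1:]:
        if key not in params:
            warnings.append(f"'{key}' is missing")
        elif parse_cpu_list(params[key]) != base:
            warnings.append(f"'{key}' does not cover the same CPUs as 'isolcpus'")
    return warnings
```

On a running system the string to check can be read from `/proc/cmdline`; an empty list means the three parameters are present and agree with each other, nothing more.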

    Link to comment

    I haven't tried, but it seems it would fix my issues, since it solved them for Chess, who has the same CPU as me.

     

    What do those two settings do? Any downside?

    Link to comment
    1 hour ago, Jeffarese said:

    What do those two settings do? Any downside?

     

    This is getting a bit technical, but if you want more info on these settings, here is NO_HZ. It is about stopping the scheduling-clock tick on CPUs that are idle or have only one runnable task.

     

    https://www.kernel.org/doc/Documentation/timers/NO_HZ.txt

     

    And the other is about Read-Copy-Update (RCU) callback offloading, often mentioned as a Ryzen fix:

     

    https://utcc.utoronto.ca/~cks/space/blog/linux/KernelRcuNocbsMeaning

     

    I gamed (streaming using Moonlight) last night for hours and had a constant 60 FPS with no drops. This is with Fallout 4 (yes, it's a little old) and the frame rate is limited to 60, but up to now I was getting some stuttering off and on. Not constant, but annoying. I still find the game stutters a little when you first start it up, but after a few minutes of gameplay it's gone. I'll test out The Witcher 3 a little over the next few days to see if the stuttering is gone in that game as well, but I think this has helped.

    Link to comment

     I tested it with:

    isolcpus=5-11,17-23 nohz_full=5-11,17-23 rcu_nocbs=5-11,17-23

     

    But still getting the same activity on CPU-9.

     

    Any idea?

    Edited by Jeffarese
    Link to comment

    I'm still getting some activity on HT cores 18 and 19, with the VM powered down as well. When I first applied this and rebooted, it seemed to be respecting it, but after a few days of running I noticed higher CPU usage than I was seeing the first few days. Going to reboot and see if I get the same thing.

     

    Update after reboot: CPU usage on these two threads has gone down to about 0% with the VM idle. I'll keep monitoring, but I expect I will need to reboot the system whenever the process that is not obeying the exclusions kicks in.

    Edited by Chess
    Update
    Link to comment

    That kind of defeats the purpose... I can't be rebooting my server.

    I would expect isolcpus to work as expected.

     

    Also, my cpu usage kicks in just after reboot :(

    Link to comment

    Is this bug going to be fixed?

    This is pretty much a deal breaker to use Unraid as main server with a Windows gaming VM.

    Link to comment
    12 minutes ago, Jeffarese said:

    Is this bug going to be fixed?

    This is pretty much a deal breaker to use Unraid as main server with a Windows gaming VM.

    If it's a bug and the additional settings in syslinux.cfg don't help you out, then it's a bug in Linux itself and not in Unraid.

    Link to comment

    I totally understand that and it makes sense, but I don't understand how such an obvious bug has not been reported elsewhere (that I found).

    Link to comment
    On 3/6/2020 at 3:13 AM, Jeffarese said:

    That kind of defeats the purpose... I can't be rebooting my server.

    I would expect isolcpus to work as expected.

    I feel the same way. Going to test out the 6.9.0 Beta 1 to see if things improve with the new kernel that is in 6.9.x and do some testing. Right now I'm not seeing any load on the isolated cores, but that does not mean much as I just restarted.

    Link to comment

    I can report that the syslinux fix above didn't work, unfortunately.

     

    It seemed to have worked, but I just did a large cp and the process is hitting an isolated core.
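
To see what is actually landing on the isolated cores during a large copy, one option is to look at which CPU each task last ran on. The sketch below is a hedged illustration, not a tool from this thread: it relies on field 39 (`processor`) of `/proc/<pid>/stat` as described in the proc(5) man page, the function names are invented, and it only samples one entry per PID (kernel threads such as kworkers have their own PIDs, but the extra threads of a user process are not scanned).

```python
import os


def last_cpu(stat_line):
    """Return the CPU a task last ran on (field 39 of /proc/<pid>/stat)."""
    # Field 2 (the command name) is wrapped in parentheses and may itself
    # contain spaces or parentheses, so split after the LAST ')'.
    rest = stat_line.rsplit(")", 1)[1].split()
    return int(rest[36])  # rest[0] is field 3, so field 39 is rest[36]


def task_name(stat_line):
    """Return the command name stored between the outermost parentheses."""
    return stat_line[stat_line.index("(") + 1 : stat_line.rindex(")")]


def tasks_last_seen_on(cpus, proc="/proc"):
    """List (pid, name, cpu) for tasks whose last-run CPU is in `cpus`."""
    hits = []
    for pid in sorted(filter(str.isdigit, os.listdir(proc)), key=int):
        try:
            with open(os.path.join(proc, pid, "stat")) as f:
                line = f.read()
        except OSError:
            continue  # the task exited while we were scanning
        cpu = last_cpu(line)
        if cpu in cpus:
            hits.append((int(pid), task_name(line), cpu))
    return hits
```

For the original report's `isolcpus=32-63`, calling `tasks_last_seen_on(set(range(32, 64)))` a few times while the copy runs gives a rough list of candidates. It is a snapshot, so short-lived bursts can be missed.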

    Link to comment
    56 minutes ago, testdasi said:

    I can report that the syslinux fix above didn't work, unfortunately.

     

    It seemed to have worked, but I just did a large cp and the process is hitting an isolated core.

    Yeah, it didn't work at all for me either as I stated in my previous post. Let's see if the 6.9.0 beta with 5.x kernel solves it 🤞

    Link to comment
    5 hours ago, Jeffarese said:

    You're a brave man :P

     

    I'm eager to hear your experience on it, hope it fixes it!

     

    Haha, not normally, but I need this to work as it has replaced my game-streaming system. So right now, with the VM shut down, I have 0s across the board on the CPUs that I have dedicated to the VM. But my Unraid is pretty quiet right now, so I'll have to see. I do run Plex inside the VM as I use the video card to do some transcoding, so I can't leave it offline to watch the cores, but I'll report back. I did game last night and the night before on the beta, with a lot less stuttering in Fallout 4.

    Link to comment

    Follow up. Played Fallout 4 most of the night last night, and the skips were mostly gone. Just tested again and it's as smooth as glass on ultra. Even hit my plex server on the same VM with a little transcoding and did not notice anything in game.

     

    I'm not saying it's fixed, and I'd wait till they get a few more betas out before jumping on the 6.9.x Unraid betas, but for now I'm going to stick with it and hope that there are no major bugs that I get hit with.

    Link to comment

    In my case I would have noticed instantly if it worked.

     

    I think my server might have more activity than yours and it's super easy to spot if the isolated cores are working.

     

    However, I'm not able to test this beta, I need the stability :(

    Link to comment

    Just want to add that the issue itself is not fixed.

    The mitigation appears to be a pure performance improvement, but unfortunately it is not a resolution.

    Link to comment

    @testdasi you are correct. I still have usage on these isolated cores, but I do have less stuttering in the games I play. That being said, I've not had much chance to play any games over the last little while.

    Link to comment

    Any idea if this is likely to be resolved in 6.9.0 if it includes the 5.6 kernel? They don't directly address it but mention something about managed interrupts.

     

    image.png.5df2c0b9fb44eb2a6d8ebd6edbc889f1.png

     

    Is CPUAffinity a thing in Unraid? This thread talks about it and it sounds like it would be a better solution than isolcpus.
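
On the interrupt side of that question, one thing that can be inspected on a Linux system is which IRQs are still allowed to fire on the isolated cores. This is a rough Python sketch under the assumption that `/proc/irq/<n>/smp_affinity_list` is available; the function names are hypothetical, and whether IRQ placement explains the disk activity seen in this thread is not established.

```python
import os


def parse_cpu_list(text):
    """Expand a CPU list such as '0-3,40' into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part[:1].isdigit():
            continue  # skip empty items
        lo, _, hi = part.partition("-")
        cpus.update(range(int(lo), int(hi or lo) + 1))
    return cpus


def irqs_allowed_on(cpus, irq_root="/proc/irq"):
    """Return {irq: overlapping_cpus} for IRQs whose affinity includes any of `cpus`."""
    result = {}
    # Only numeric entries are IRQ directories; other entries are skipped.
    for name in sorted(filter(str.isdigit, os.listdir(irq_root)), key=int):
        try:
            with open(os.path.join(irq_root, name, "smp_affinity_list")) as f:
                allowed = parse_cpu_list(f.read())
        except OSError:
            continue  # IRQ vanished or is not readable
        overlap = allowed & cpus
        if overlap:
            result[int(name)] = overlap
    return result
```

An IRQ showing up here only means it is permitted on an isolated core, not that it is firing there; the per-CPU counters in `/proc/interrupts` would be needed to confirm actual activity.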

     

    Link to comment

    Wanted to come back and give an update now that I'm on 6.9.0 beta 22. There is a bug in qemu on Ryzen 2 CPUs that needs a manual edit to fix a BSOD, and it took me a while to get the proper edits into my VM config to get it to run, but after doing that I'm now running at a locked 60 fps with no stuttering in The Witcher 3.

     

    Win10 thinks my CPU is an EPYC CPU, but it has made things more stable. I'll post what I did if anyone needs this info.

    Link to comment
    On 6/22/2020 at 5:36 PM, Chess said:

    Wanted to come back and give an update now that I'm on 6.9.0 beta 22. There is a bug in qemu on Ryzen 2 CPUs that needs a manual edit to fix a BSOD, and it took me a while to get the proper edits into my VM config to get it to run, but after doing that I'm now running at a locked 60 fps with no stuttering in The Witcher 3.

     

    Win10 thinks my CPU is an EPYC CPU, but it has made things more stable. I'll post what I did if anyone needs this info.

    Could you share what you have done?

    Link to comment




