• [6.8.3-6.9.2] Huge CPU Load and IO-Wait when copying to Array


    windowslucker
    • Urgent

    Since I started using Unraid with version 6.8.3, I've had a strange issue when copying files directly to the array. CPU load climbs to nearly 100%, and both the web UI and access to my shares via SMB become extremely slow and buggy. The web UIs of my Docker containers also become inaccessible. The whole server is basically unusable until the copy finishes. VMs are not affected, though.

    Checking top shows a very high wa (I/O wait) value, up to 70-90%. Once copying has finished, the value drops within a couple of seconds and everything returns to normal.

    This only happens when copying directly to the array without using the cache drive, or when Mover is running.
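For anyone reproducing the measurement without watching top: the wa figure is the share of CPU time spent waiting on I/O, which Linux exposes as the fifth counter on the aggregate cpu line of /proc/stat. The following is a minimal sketch, assuming a Linux host such as Unraid; the function names are illustrative and not part of any Unraid tool.

```python
import time


def read_cpu_times(stat_text):
    """Return the counters from the aggregate 'cpu' line of /proc/stat text."""
    # The line looks like: "cpu  user nice system idle iowait irq softirq steal guest guest_nice"
    for line in stat_text.splitlines():
        if line.startswith("cpu "):
            return [int(value) for value in line.split()[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def iowait_percent(before, after):
    """Percentage of CPU time spent in iowait between two samples."""
    # Sum only user..steal (the first 8 counters): guest time is already
    # included in user/nice, so adding it again would double-count.
    deltas = [a - b for a, b in zip(after[:8], before[:8])]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total  # index 4 is the iowait counter


def sample(interval=1.0, path="/proc/stat"):
    """Take two readings `interval` seconds apart and return the iowait share."""
    with open(path) as f:
        before = read_cpu_times(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = read_cpu_times(f.read())
    return iowait_percent(before, after)
```

Calling sample() repeatedly on the server console (for example in a loop that prints the result once per second) while a copy to the array is running should give figures roughly comparable to the wa column in top.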

     

    I'm currently using an LSI/Dell SAS 9207-8i in IT mode, but the same issue occurred with my previous PCIe-to-SATA controller, as well as when the drives were connected directly to the motherboard's SATA ports.

    Over the last year I've gone through different HDDs as well as a CPU and motherboard swap (from desktop drives to NAS drives, and from an Intel i7 4770 to a Ryzen 5 2600 on a Gigabyte B450M DS3H motherboard). The issue has stayed the same throughout all of these changes.

    top.jpeg

    srv-diagnostics-20210526-1044.zip




    User Feedback

    Recommended Comments

    Are you copying from a VM to the array, or from the array to a VM? Or is a VM writing to a passed-through virtual disk?

    I ask because I see high QEMU utilization in your top window.

    3 hours ago, hawihoney said:

    Are you copying from a VM to the array or from the array to a VM? Or does a VM write to an passed thru virtual disk?

     

    I ask because I do see high QEMU utilization in your top window.

     

     

    No VMs are involved in this case. I'm just copying from my PC to an SMB share.

    Also, the problem persists regardless of VMs or Docker containers. Even if I switch them all off, I still get the same high CPU utilization and iowait while copying to the array.

    Edited by windowslucker
    16 minutes ago, windowslucker said:

    No VMs are involved in this case. Just copying from my PC to a SMB share.

     

    OK, but qemu-system-x86 is the QEMU PC system emulator, and it is the process showing that huge CPU load in the screenshot.

     

    High CPU load from QEMU combined with high I/O wait points directly to data being written through this emulator, since QEMU has to emulate the virtual disks and their reads/writes. If no VMs are involved, I'm out, because I don't understand why QEMU processes would be running without active VMs.

     

    I do see the shfs process as well, and the two can make each other worse. High I/O wait from the QEMU emulator, combined with parallel SMB access, can bring a system nearly to a halt.

     

    This is based on my own experience, but if there are no VMs running, I can't help you any further, because then I don't understand what's going on.
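To check which processes are actually stuck waiting on I/O (QEMU, shfs or something else), one common approach is to look for processes in uninterruptible sleep, shown as state D in top and ps. The following is a minimal sketch, assuming Linux's /proc layout; the helper names are hypothetical.

```python
import os


def parse_stat(stat_text):
    """Split a /proc/<pid>/stat line into (pid, command name, state)."""
    # The format is "pid (comm) state ...". comm may itself contain spaces or
    # parentheses, so split on the LAST ')' instead of on whitespace.
    left, right = stat_text.rsplit(")", 1)
    pid_str, comm = left.split(" (", 1)
    state = right.split()[0]
    return int(pid_str), comm, state


def blocked_processes(proc="/proc"):
    """Return (pid, comm) for every process currently in state D."""
    result = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc, entry, "stat"), errors="replace") as f:
                pid, comm, state = parse_stat(f.read())
        except (FileNotFoundError, ProcessLookupError, PermissionError):
            continue  # the process exited in the meantime or is not readable
        if state == "D":
            result.append((pid, comm))
    return result
```

Running blocked_processes() a few times during a slow copy shows which processes keep reappearing in state D; one that shows up repeatedly is a likely place to look further.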

     

    25 minutes ago, hawihoney said:

    I'm out because I don't understand running QEMU processes without active VMs

     

    Well, I may have been unclear. The VMs were running in the screenshot above, but they weren't actively used or copied from. They were more or less sitting idle at that time.

    48 minutes ago, windowslucker said:

    The VMs are active in the screenshot above but weren't actively used or copied from.

     

    Please have a look at the screenshot below. This is my running system with two active VMs, with a 20 GB copy going from the bare-metal host to one of the VMs.

     

    Now compare the CPU usage of my system with your screenshot. My VMs are heavily in use and your VMs are not actively used? I guess that the VM's are 


    I don't know what happened. My post was submitted without any further action on my part, and I can't edit it any longer.

     

    ---

     

    Now compare the CPU usage of my system with your screenshot. My VMs are heavily in use, and your VMs are not actively used? I guess the VMs are busy with themselves.

     

    I would check what the VMs are doing and what they are writing to, whether there is enough RAM for the VMs, etc.

     

    This may be blocking SMB.

     

    Clipboard01.jpg

    21 hours ago, hawihoney said:

    My VMs are heavily in use and your VMs are not actively used? I guess that the VM's are busy with itself.

     

    I just ran one test with VM Manager disabled under Settings -> VM Manager, and another with VM Manager enabled but without starting any VMs. So no VMs were running in either test.

    The results were the same in both tests.


    The only difference from the problem described in my first post was that the CPU load didn't skyrocket as it did earlier, but instead rose slowly and gradually over about 3 minutes. The outcome, however, is the same: unresponsive Docker containers and huge CPU load.

    cpu7.PNG

     

    This test was done by copying three 10 GB files from my desktop computer to an Unraid share. The share's configuration can be seen in the screenshot below.

     

    Unbenannt.PNG

     

    I've also recorded the main part of the copy, where the unresponsive Docker containers, the disabled VMs and the huge CPU load can be observed.

    The video can be found here: Unraid Bug Report CPU/IOWAIT

     

    Edited by windowslucker
    xxxliqu1dxxx

    Try copying straight to disk shares to confirm the issue is not present there. This seems related to all the other shfs tickets logged on this forum. I've been using disk shares because of this issue.

    4 hours ago, xxxliqu1dxxx said:

    Try copying straight to disk shares to confirm the issue is not present

     

    I'm not sure if I understand you correctly. Do you mean I should try copying straight to e.g. /mnt/disk1/folderxy instead of /mnt/user/folderxy?

    11 minutes ago, windowslucker said:

     

    I'm not sure if I understand you correctly. Do you mean I should try copying straight to e.g. /mnt/disk1/folderxy instead of /mnt/user/folderxy?

    Exactly. You should see lower CPU usage. I just want you to test this to confirm it's the same problem mentioned in several other posts: copying to the array through the user shares causes a lot of shfs problems, including unusually high CPU load. Copying straight to the disk share does not show the same behavior, which further points to a problem in shfs that I hope Limetech will fix one day.
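The same comparison can also be made on the server itself, without SMB in the path, by timing an identical write through the user share (/mnt/user, which is served by shfs) and directly to a disk share (/mnt/disk1). The following is a minimal sketch, assuming the share is set not to use the cache drive so both writes land on the array; the folder names are the hypothetical examples from this thread, and the 1 GiB default size is an arbitrary choice.

```python
import os
import time


def timed_write(directory, size_mib=1024, chunk_mib=4, name="copytest.bin"):
    """Write roughly size_mib MiB into directory, fsync it, delete it, return MiB/s."""
    path = os.path.join(directory, name)
    # Random data rather than zeros, so filesystem compression cannot skew the result.
    chunk = os.urandom(chunk_mib * 1024 * 1024)
    written = 0
    start = time.monotonic()
    with open(path, "wb") as f:
        while written < size_mib:
            f.write(chunk)
            written += chunk_mib
        f.flush()
        os.fsync(f.fileno())  # count the time until the data has actually reached the disk
    elapsed = time.monotonic() - start
    os.remove(path)  # clean up the test file
    return written / elapsed if elapsed > 0 else float("inf")


def compare(targets, size_mib=1024):
    """Run timed_write() for each target directory and return {directory: MiB/s}."""
    return {target: timed_write(target, size_mib) for target in targets}
```

For example, compare(["/mnt/user/folderxy", "/mnt/disk1/folderxy"]) returns the throughput for each path; watching top in a second session during the run shows whether the wa spike and the CPU load follow only the /mnt/user write.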

    Edited by xxxliqu1dxxx
    typo
    1 hour ago, xxxliqu1dxxx said:

    Exactly. You will see the CPU usage will be lower. I just want you to test this to confirm it's the same problem that was mentioned in several other posts - copying to the array causes a lot of shfs problems, which involves unusually high CPU loads. Copying straight to the disk share does not exhibit the same behavior which further reinforces the problem in shfs which I hope limetech will fix one day.

    Hi, sorry if I was not clear, but I'm asking whether you could run the test and post your results. That way you can be confident the problem is with uploading to the array rather than to the disk shares, and additional information will be documented for this shfs issue.

    xxxliqu1dxxx said:

    I would ask if you could test and post your results

     

    I just did another test copying directly to the disk share, as you suggested. VMs were switched off again. Since I couldn't think of a way to copy to a disk share over SMB, the only difference from my prior tests is that I used WinSCP to copy the files to /mnt/disk1/copyTestShare.

    As far as I can tell, the issue stayed the same: inaccessible Docker containers and high CPU load. The Unraid web interface stayed accessible, though.

     

    CPU.PNG

     

     

    Also, WinSCP had some odd disconnects while copying. The message in the top-right window says: "The remote computer didn't send any data for more than 15 seconds." I think it has something to do with Unraid freezing, but of course I'm not completely sure that's really the case.

    disconnect.PNG

     

    Again, I recorded everything and uploaded it to YouTube for you to review.

     

     

    xxxliqu1dxxx

    You can use disk shares via SMB. You just need to enable that.

    image.png

    Make sure it is set to Yes there. You may have to stop the array first. Then try again with SMB.

    1 minute ago, xxxliqu1dxxx said:

    You can use disk shares via SMB. You just need to enable that.

    image.png

    Make sure you have yes configured there. You may have to stop the array first. And then, try again with SMB.

    This is Settings \ Global Share Settings in case you are wondering.



  • Status Definitions

    Open = Under consideration.
    Solved = The issue has been resolved.
    Solved version = The issue has been resolved in the indicated release version.
    Closed = Feedback or opinion better posted on our forum for discussion. Also used for reports we cannot reproduce or that need more information. In this case, just add a comment and we will review it again.
    Retest = Please retest in the latest release.

    Priority Definitions

    Minor = Something not working correctly.
    Urgent = Server crash, data loss, or other showstopper.
    Annoyance = Doesn't affect functionality but should be fixed.
    Other = Announcement or other non-issue.