• [6.7.0-rc2] Reading all disks when writing to a single one


    hawihoney
    • Retest Minor

    After upgrading from 6.6.6 stable to 6.7.0-rc2, I see unusual reads whenever I write to a single disk.

     

    For example, I copy to \\tower2\disk21 from my Windows 10 machine over SMB. During the whole copy, all other disks are spun up and read at low speed. In the attached picture a 40GB file is being written. disk21 and parity/parity2 show the usual write activity, but the other disks are spun up and read as well.

     

    After the file is written, the reads on the other disks stop as well.

     

    Diagnostics and image attached.

     

    *** Edit: The Main page shows that same read activity for the flash drive as well. Forgot to mention that.

     

     

    tower2-diagnostics-20190127-1031.zip

    Clipboard01.jpg




    User Feedback

    Recommended Comments



    Looking OK to me; the small reads no longer show up when the mover is running.

    image.png.60ec849ded032b7212748a87d4b06933.png

    I do see all the disks that the share is on spin up. Is that normal, or should it spin up only the disk it's writing to?

    Link to comment
    24 minutes ago, SimonF said:

    Looking OK to me; the small reads no longer show up when the mover is running.

    image.png.60ec849ded032b7212748a87d4b06933.png

    I do see all the disks that the share is on spin up. Is that normal, or should it spin up only the disk it's writing to?

    I think that is normal, because UnRAID has to check that the file does not already exist on another drive belonging to the share (although I could be wrong :( )

     

    It is likely that using the Folder Caching plugin will help here. I know that some people have recently had problems with that plugin, but that could well have been due to the same kernel bug that was causing these small writes. I guess we will have to see what others find out.

    Link to comment
    Quote

    Definitely something else going on. This is not normal behavior.

    That's possible, indeed. That would mean two different problems showing the same symptoms. Interesting.

     

    Here are the results - just in case:

     

    Using MC, I copy a 52GB file from the bare metal server to the VM server. The source is a folder on an Unassigned Devices RAID1 (BTRFS). The target is a folder in the VM. The disk holding that remote folder is mounted on the bare metal server as follows. I use my own mount and unmount scripts in User Scripts because I need to wait for the VMs to start before mounting, and Unassigned Devices seems to hit a race condition when showing 48 SMB mount points on the Main page:

     

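    # Create the local mount point (it must match the target path of the mount command)
    mkdir -p /mnt/hawi
    mkdir -p /mnt/hawi/192.168.178.101_disk20
    # Mount the disk20 SMB share of the VM on the bare metal server
    mount -t cifs -o rw,nounix,iocharset=utf8,_netdev,file_mode=0777,dir_mode=0777,vers=3.0,username=hawi,password=******** '//192.168.178.101/disk20' '/mnt/hawi/192.168.178.101_disk20'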
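
    The post says the mount script has to wait for the VMs to start, but does not show how. A minimal sketch of one way to do that is a generic retry helper. The name wait_for, the attempt count, the delay and the ping probe are all assumptions for illustration, not taken from the actual script:

```shell
# Hypothetical helper: retry a command until it succeeds or attempts run out.
# Usage: wait_for <attempts> <delay_seconds> <command> [args...]
wait_for() {
    wf_attempts=$1
    wf_delay=$2
    shift 2
    wf_i=1
    while [ "$wf_i" -le "$wf_attempts" ]; do
        # Run the probe command; its output is discarded.
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        # Do not sleep after the final failed attempt.
        if [ "$wf_i" -lt "$wf_attempts" ]; then
            sleep "$wf_delay"
        fi
        wf_i=$((wf_i + 1))
    done
    return 1
}

# Example (assumed values): probe the VM up to 60 times, 5 seconds apart,
# and only then create the mount point and mount the share.
# if wait_for 60 5 ping -c 1 192.168.178.101; then
#     mkdir -p /mnt/hawi/192.168.178.101_disk20
#     # ... mount command as shown in the post ...
# fi
```

    A ping only shows that the VM's network stack is up. Probing the SMB service itself would be a stricter check, at the cost of depending on extra tools.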
    

    MC.jpg.0769f4842a20b28fb7af99e42aec81c4.jpg

     

    The start looked promising. RAM usage on the target did not get beyond 12-13%, and CPU was around 30%. At 32% of the copy (15GB copied; total RAM is 16GB) the problem started. MC halted, and those 8 threads showed 100% CPU on the source server (the host of the VM). The same happened on the target server. The last value I saw was 87% RAM usage on the target server. The small read requests on the other disks started. At first they came in chunks (disk1-disk5, disk6-disk10, ...). A short time later all disks were reading small amounts. One picture has red marks and a blue mark. The blue mark shows the disk that is not part of the User Share; it shows 50% of the read amount of the other disks.

     

     

    I know that Unraid in a VM is not supported, but I think this shows a problem. My first guess (see two pages back) was SMBD, because that process is holding that amount of CPU and virtual RAM when the problems show up.

     

    tower-diagnostics-20190223-0833.zip

    towervm01-diagnostics-20190223-0833.zip

    Source server showing the CPUs mapped to VM.jpg

    Target server, first small read requests in chunks.jpg

    Target server, now reading all disks.jpg

    Target server, problems start with 8 percent RAM usage.jpg

    Edited by hawihoney
    Link to comment
    20 minutes ago, hawihoney said:

    Using MC, I copy a 52GB file from the bare metal server to the VM server. The source is a folder on an Unassigned Devices RAID1 (BTRFS). The target is a folder in the VM.

    Have you tried a 'standard' test? Neither UD nor Unraid as a VM is part of a stock Unraid installation.

     

    This is copying an 80 GB file to the encrypted array on my server:

    image.png.950802cc4ed82f955a5e1d326b989f23.png

    Copy speed is pretty constant and saturates the 1 Gbps link. During the whole process only parity and the designated disk show read/write activity. CPU usage is between 15% and 20%.

    Edited by bonienl
    Link to comment
    50 minutes ago, itimpi said:

    I think that is normal, because UnRAID has to check that the file does not already exist on another drive belonging to the share (although I could be wrong :( )

    You are not wrong. If a share exists on multiple disks, the mover needs to retrieve information from all those disks in order to determine where to place the new file(s).
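
    To illustrate why such a lookup touches every disk of a share, here is a minimal sketch that checks each data disk for an existing copy of a path. It is not how Unraid implements user shares; the function name, the /mnt/disk* layout as an argument, and the example share and file names are assumptions for illustration:

```shell
# list_disks_with_path <array_root> <relative_path>
# Prints every <array_root>/disk*/<relative_path> that exists, one per line.
list_disks_with_path() {
    lp_root=$1
    lp_rel=$2
    for lp_disk in "$lp_root"/disk*; do
        # An unmatched glob stays literal, so confirm it is a directory.
        [ -d "$lp_disk" ] || continue
        # This existence check is what needs directory data from each disk.
        if [ -e "$lp_disk/$lp_rel" ]; then
            printf '%s\n' "$lp_disk/$lp_rel"
        fi
    done
}

# Example (assumed share and file names):
# list_disks_with_path /mnt "Movies/film.mkv"
```

    Each existence check needs directory information from that disk, which is why a spun-down disk may have to spin up unless the directory entries are already cached in RAM.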

    Edited by bonienl
    Link to comment

    I can't go back to two bare metal servers. Over the last few weeks I rebuilt my systems into one bare metal server and several JBOD chassis. Each JBOD chassis is driven by its own 9300-8e in the bare metal server. All chassis are connected through SAS cables, not Ethernet.

    So please close this thread. My problem is still there, but I can't retest it in an officially supported environment.

     

    I decided to swap the contents. Big files on Main Server, small files in VMs on JBODs.

     

    Link to comment
    6 minutes ago, hawihoney said:

    I can't go back to two bare metal servers. Over the last few weeks I rebuilt my systems into one bare metal server and several JBOD chassis. Each JBOD chassis is driven by its own 9300-8e in the bare metal server. All chassis are connected through SAS cables, not Ethernet.

    So please close this thread. My problem is still there, but I can't retest it in an officially supported environment.

     

    I decided to swap the contents. Big files on Main Server, small files in VMs on JBODs.

     

     

    Wouldn't it be better to switch the servers and copy from the Unraid VM to the bare metal one? That way the writing happens on the supported Unraid.

    Link to comment
    Quote

    Wouldn't it be better to switch the servers and copy from the Unraid VM to the bare metal one? That way the writing happens on the supported Unraid.

    OK, I will run a test and copy a big file from the VM to bare metal. But I think that will not show the real problem. The bare metal server has 128 GB RAM. IMHO part of the problem shows up whenever the size of the copied file exceeds the available RAM of the target machine.

     

    Perhaps copying over SAS cables to SMB mount points is too fast for the target server in a VM. Perhaps something caches too much. In one of my first posts here I pointed at SMBD. This process is eating up all available RAM and CPU on the target side. Perhaps that only happens within a VM. So many questions ;-)

     

    Will test tomorrow and report back. But I bet this will work.

     

    Link to comment

    I had an Unraid server with 4GB RAM copying 10, 20, 30 GB of files via SMB and never had a problem; it was always around 100MB/s. So it might not be a general problem.

     

    Unraid SMB -> Windows 10 Client.

    Edited by nuhll
    Link to comment

    Thanks everyone for retesting.  This one was a doozy.

     

    8 hours ago, hawihoney said:

    The start looked promising. RAM usage on the target did not get beyond 12-13%, and CPU was around 30%. At 32% of the copy (15GB copied; total RAM is 16GB) the problem started. MC halted, and those 8 threads showed 100% CPU on the source server (the host of the VM). The same happened on the target server. The last value I saw was 87% RAM usage on the target server. The small read requests on the other disks started. At first they came in chunks (disk1-disk5, disk6-disk10, ...). A short time later all disks were reading small amounts. One picture has red marks and a blue mark. The blue mark shows the disk that is not part of the User Share; it shows 50% of the read amount of the other disks.

    For your particular use case, perhaps consider experimenting with kernel virtual memory tuning:

    https://discuss.aerospike.com/t/tuning-kernel-memory-for-performance/4195
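
    The linked article is not reproduced here. As an assumption about which settings are relevant when a copy is larger than the target's RAM, the standard Linux writeback thresholds vm.dirty_background_ratio and vm.dirty_ratio are a plausible starting point. The sketch below only reads them and does a rough estimate; the function names and the example ratio of 20 are illustrative, and the kernel applies these percentages to available memory rather than total RAM, so the result is an upper bound at best:

```shell
# dirty_limit_mb <memory_mb> <ratio_percent>
# Rough estimate (integer MB) of how much dirty page cache a percentage allows.
dirty_limit_mb() {
    echo $(( $1 * $2 / 100 ))
}

# Print the current writeback thresholds, if the files exist on this system.
show_dirty_settings() {
    for ds_name in dirty_background_ratio dirty_ratio; do
        if [ -r "/proc/sys/vm/$ds_name" ]; then
            printf 'vm.%s = %s\n' "$ds_name" "$(cat "/proc/sys/vm/$ds_name")"
        fi
    done
}

# Example: the 16GB VM from this thread with an assumed dirty_ratio of 20.
# dirty_limit_mb 16384 20
# show_dirty_settings
```

    Lowering these values makes the kernel start writing back earlier and in smaller batches. Whether that helps in this particular setup would have to be tested.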

     

    Link to comment

    I cannot comment on other cases mentioned in this report; however, the behavior I noticed with rc4 seems to have been resolved with rc5. 

     

    When doing large file writes/DVR records/backups, I see only the parity disk and the target data disk spun up if they were not previously active.  I no longer see (at least in my limited testing) all disks spun up.

    Link to comment





  • Status Definitions

     

    Open = Under consideration.

     

    Solved = The issue has been resolved.

     

    Solved version = The issue has been resolved in the indicated release version.

     

    Closed = Feedback or opinion better posted on our forum for discussion. Also for reports we cannot reproduce or need more information. In this case just add a comment and we will review it again.

     

    Retest = Please retest in latest release.


    Priority Definitions

     

    Minor = Something not working correctly.

     

    Urgent = Server crash, data loss, or other showstopper.

     

    Annoyance = Doesn't affect functionality but should be fixed.

     

    Other = Announcement or other non-issue.