• [6.7.0-rc2] Reading all disks when writing to a single one


    hawihoney
    • Retest Minor

    After upgrade from 6.6.6 stable to 6.7.0-rc2 I see unusual reads whenever I write to a single disk.

     

    For example, I copy to \\tower2\disk21 from my Windows 10 machine (SMB). During the whole copy all other disks are spun up and read at low speed. In the example shown in the picture a 40GB file is written. disk21 and parity/parity2 show the usual write activity, but the other disks are spun up and read as well.

     

    After the file is written, the reads on the other disks stop as well.

     

    Diagnostics and image attached.

     

    *** Edit: The Main page shows that same read activity for the flash drive as well. Forgot to mention that.

     

     

    tower2-diagnostics-20190127-1031.zip

    Clipboard01.jpg




    User Feedback

    Recommended Comments



    3 hours ago, hawihoney said:

    The small read requests to the other disks always end with the read or write request to the single disk.

    I don't know what this means, please rephrase.

    Link to comment

    Please have a look at my Screenshots above. The 44 MB/s represent the activity I started. The xx KB/s are the activities I'm complaining about. If my job stops, the xx KB/s read requests stop immediately as well.

     

    Can't explain it better.

     

    Edited by hawihoney
    Link to comment
    On 1/28/2019 at 10:20 PM, limetech said:

    I don't know what this means, please rephrase.

    I think it means:

    When writing a file to a single disk you get some read requests from all the other disks, even the boot flash drive.

     

    This is a problem because it wakes up all disks in the system (they start spinning). On Unraid 6.6.6 you don't get these reads from other disks; activity is limited to the disk you write to plus parity, leaving the rest of the disks stopped (not spinning).

     

    So in practice all my disks are now spinning all the time, whereas before it was mostly one or two disks spinning (none at night or during work hours).

    I run some dockers, and these make small reads/writes to the cache disk (SSD). That keeps the rest of my disks from going to sleep, because Unraid now reads from all disks when you write to just one.

    Edited by Handl3vogn
    Link to comment
    On 1/28/2019 at 9:59 AM, hawihoney said:

    Thanks, will do.

     

    ***Edit*** Wait, it can't be that easy. The small read requests to the other disks always end with the read or write request to the single disk.

    Did you ever get around to trying this?

    Link to comment

    Oh, I went back to 6.6.6. This weekend I will build a new machine; I bought an additional license today. I will set it up with 6.7.0-rc2 and report then.

     

    Link to comment

    @Tom: To make it faster I took an existing 6.6.6 machine.

     

    I stopped the array, switched User Shares to Off, started the array and after a minute or so I clicked Spin Down.

     

    Then, I thought, it would be better to test with 6.6.6 first before testing with 6.7.0-rc2. I opened Explorer on Windows and wrote \\tower2\disk21\test followed by Enter.

     

    On Tower2 all disks started to spin up. Then I copied a big file from my Windows machine to that particular disk. The small reads, I'm complaining about in my first post, did not happen.

     

    Upgrade to 6.7.0-rc2 and reboot:

     

    I opened Explorer on Windows and wrote \\tower2\disk21\test followed by Enter.

     

    On Tower2 only disk21 started to spin up. Then I copied a big file from my Windows machine to that particular disk. The small reads, I'm complaining about in my first post, did not happen.

     

    So, no additional reads with User Shares switched off. I will have a look at it a little bit.

     

    Don't know why all disks spun up on 6.6.6 when accessing disk21. I've never seen that before. I'm using this combo Windows, Total Commander, individual Disk Shares all day and night. I would ignore that for now.

     

    ***Edit*** 10 seconds after sending this post, all disks spun up. There are read requests on all disks while that copy is still under way. That's the difference between 6.6.6 and 6.7.0-rc2. And before someone asks: no plugins, no User Shares, no Cache Dirs, whatever...

     

    Edited by hawihoney
    Link to comment

    It takes a few minutes of copying before I get disk spin-up or read requests too.

     

    If I copy a large file everything is fine for around 15GB (around the same time my transfer speed goes from 119MB/s to around 70MB/s); after that I get reads from all disks.

    Link to comment

    Wondering if Windows is pinging each of the network shares once in a while, easy to test:

    If you have User Shares disabled, only export the target disk share. Say you are copying to disk5: go to Shares and, for all disks except disk5, set Export to 'No'. Then in Windows Explorer verify that the only share you see there is disk5. Now spin down all the disks and start I/O to the share. Do the other disks still spin up?

    Link to comment

    Argh, I just replied in a lengthy post and hit submit; the page greyed out and did not return. Everything's gone.

     

    Let's try again.

     

    I did a different test that comes close to your idea. I took two Unraid servers.

     

    On server1 (6.6.6) I added disk21 from server2 (6.7.0-rc2) via Unassigned Devices. Resulting SMB share was 192.168.178.34_disk21. Now I copied from different software/tools on server1 to that share. The result was identical to the Windows Explorer experience. I took MakeMKV docker, MKVToolNix docker, MC on root console. The result was always the same: The other disks on server2 spin up too and show small read requests.

     

    Two IMHO very important things I need to add:

     

    1.) I can confirm the user above and his post. Spin up of disk21 and the two parity disks on server2 came with a delay of 30 to 90 seconds. The tools are writing but the disks remain sleeping. I don't have any caching running, no Turbo Write.

     

    2.) When the tools report "writing complete" the disks are still writing. Again, it's for around 30 to 90 seconds. I'm curious if there's a write cache somewhere in the system since 6.7.0-rcx.
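
One way to look into point 2 from the server console is to watch how much written data is still waiting in RAM. The following is only a minimal sketch, assuming a Linux console with Python available; it parses the `Dirty` and `Writeback` fields of `/proc/meminfo`. The helper names and the sample numbers are made up for illustration, and a non-zero value only shows that buffered writes exist, not why the behaviour differs between releases.

```python
import re


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> integer value (kB)."""
    fields = {}
    for line in text.splitlines():
        match = re.match(r"^(\S+):\s+(\d+)", line)
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields


def pending_writeback_kb(fields):
    """Data still held in RAM that has not reached the disks yet (Dirty + Writeback)."""
    return fields.get("Dirty", 0) + fields.get("Writeback", 0)


def read_pending_writeback_kb(path="/proc/meminfo"):
    """Read the live value on a Linux system."""
    with open(path) as handle:
        return pending_writeback_kb(parse_meminfo(handle.read()))


# Hypothetical sample, not taken from the servers in this thread.
SAMPLE = """MemTotal:       16384000 kB
Dirty:            524288 kB
Writeback:          2048 kB
"""
```

Calling `read_pending_writeback_kb()` every few seconds during and after a copy would show whether the value is still draining while the disks keep writing after the tool reports "writing complete".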

     

    My hardware:

     

    Server1: 1x Supermicro X9Dri-F, 2x Intel 2609 v2, 64GB RAM, LSI 9300-8i connected to Supermicro BPN-SAS2-EL1 backplane (both ports = 8 lanes), 2x PCIe x4 adapter cards holding 1x Samsung 970 EVO M.2 each. Both M.2 building the cache pool. Several dockers, 2x Unraid VMs to test my upcoming new builds. I try to work around the single array (28+2 drive) limit.

     

    Server2: 1x Supermicro X9Dri-F, 1x Intel 2609 v2, 32GB RAM, LSI 9300-8i connected to Supermicro BPN-SAS2-EL1 backplane (both ports = 8 lanes). No plugins, no docker, no User Shares, no VMs.

     

     

     

    Edited by hawihoney
    Link to comment

    May I ask back a question?

     

    Are you sure that the meaning of "Tunable (md_write_method): Auto" did not change with 6.7.0? Perhaps the read requests are the result of a different handling now (from "read/modify/write" to "reconstruct write").

     

    Just an idea.

     

    Link to comment
    3 hours ago, hawihoney said:

    May I ask back a question?

     

    Are you sure that the meaning of "Tunable (md_write_method): Auto" did not change with 6.7.0? Perhaps the read requests are the result of a different handling now (from "read/modify/write" to "reconstruct write").

     

    Just an idea.

     

    No, good guess though.

     

    Pretty sure this is due to inodes getting ejected from RAM because of the big file transfer; I need to devise a test method to prove it.
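
As a rough illustration only (this is not the test method mentioned above), the kernel's inode counters could be sampled before and during a large copy. The sketch below assumes a Linux console with Python available and reads `/proc/sys/fs/inode-nr`, which holds two numbers: allocated inodes and free inodes. The function names and sample values are hypothetical. A sharp drop in allocated inodes around the time the small reads begin would be consistent with the hypothesis, but would not prove it.

```python
def parse_inode_nr(text):
    """Parse the contents of /proc/sys/fs/inode-nr: '<nr_inodes> <nr_free_inodes>'."""
    nr_inodes, nr_free = (int(value) for value in text.split()[:2])
    return {"allocated": nr_inodes, "free": nr_free, "in_use": nr_inodes - nr_free}


def allocated_drop(before, after):
    """Number of inodes that left the cache between two samples (negative if it grew)."""
    return before["allocated"] - after["allocated"]


def read_inode_nr(path="/proc/sys/fs/inode-nr"):
    """Take a live sample on a Linux system."""
    with open(path) as handle:
        return parse_inode_nr(handle.read())


# Hypothetical samples for illustration, not measured on any server in this thread.
BEFORE_COPY = parse_inode_nr("250000\t12000\n")
DURING_COPY = parse_inode_nr("90000\t3000\n")
```

Taking one sample with `read_inode_nr()` after spinning the disks down and another once the extra reads appear gives two dicts that can be compared with `allocated_drop()`.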

    Link to comment

    I can confirm that this is happening on RC3 as well.

     

    To the previous poster. What's special with the disks that don't show any read activity? If it happens here, all disks show these tiny read requests. 

     

     

    ***EDIT*** Please have a look at the screenshot. The top two drives are parity. I'm writing to disk20. Interestingly disk17 has no read request - all other drives show these read requests.

     

    I'm writing to disk20 to a folder. On this server this folder is part of a User Share. This User Share does not exist on disk17. Seems that the read requests are only happening to drives that are part of the same User Share.

     

    Clipboard01.png

    Edited by hawihoney
    Link to comment
    56 minutes ago, hawihoney said:

    I can confirm that this is happening on RC3 as well.

     

    To the previous poster. What's special with the disks that don't show any read activity? If it happens here, all disks show these tiny read requests. 

     

     

    ***EDIT*** Please have a look at the screenshot. The top two drives are parity. I'm writing to disk20. Interestingly disk17 has no read request - all other drives show these read requests.

     

    I'm writing to disk20 to a folder. On this server this folder is part of a User Share. This User Share does not exist on disk17. Seems that the read requests are only happening to drives that are part of the same User Share.

     

    Clipboard01.png

    Nothing that I can see. I am seeing read activity while Plex is playing back a movie on Disk1, but the share does not span all the disks it's reading. Reads stop on the other disks when playback stops.

     

     image.png.4b96c1bf4e41fe448b3599539e093a63.png

    Edited by SimonF
    Link to comment

    All of these screenshots are showing these "writes" in terms of speed, but I have to wonder over what amount of time that calculation is made and whether or not it isn't just a result of averaging over some longer period that actually did include writes. Some of these screenshots show a write speed next to a temperature '*' that indicates the disk is spun down.

     

    There is a little icon at upper right above the column labels that lets you toggle between speed and counts. I wonder if it might be more instructive to observe the absolute counts instead of the speed.
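
The same idea of looking at absolute counts can also be tried from the console, independent of the web UI. This is a minimal sketch, assuming a Linux console with Python available: it parses `/proc/diskstats`, where the fourth field is reads completed and the eighth is writes completed, and subtracts two snapshots. The function names and sample numbers are invented for illustration, and device names such as `sdb` still have to be mapped to array slots by hand.

```python
def parse_diskstats(text):
    """Map device name -> (reads_completed, writes_completed) from /proc/diskstats text."""
    counts = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 8:
            continue  # skip blank or malformed lines
        counts[parts[2]] = (int(parts[3]), int(parts[7]))
    return counts


def count_delta(before, after):
    """Per-device (reads, writes) completed between two snapshots."""
    return {
        name: (after[name][0] - before[name][0], after[name][1] - before[name][1])
        for name in after
        if name in before
    }


def snapshot(path="/proc/diskstats"):
    """Take a live snapshot on a Linux system."""
    with open(path) as handle:
        return parse_diskstats(handle.read())


# Hypothetical snapshots: sda is being written to, sdb only sees a few reads.
BEFORE = "8 0 sda 100 0 800 50 10 0 80 5 0 40 55\n8 16 sdb 200 0 1600 90 0 0 0 0 0 60 90\n"
AFTER = "8 0 sda 100 0 800 50 510 0 4080 300 0 200 350\n8 16 sdb 212 0 1696 95 0 0 0 0 0 62 95\n"
```

Two calls to `snapshot()` a known number of seconds apart, passed to `count_delta()`, give exact request counts per device instead of an averaged speed.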

    Link to comment

    As I wrote several times: I write to a single disk. And during that write activity all other disks show that read activity. The read activities stop immediately if write activity to the single disk ends. The disk LEDs reflect these read activities as well.

     

    So: All disks are working when I write to a single disk.

     

    Link to comment

    No encryption. Everything's XFS. A target unRAID test server had no cache, no plugins, no Dockers, no VMs, no User Shares. Just SMB connection to the disk shares.

     

    Link to comment

    Hello!

     

    I have the same or a similar problem. One thing I might add is that once the file transfer has stopped, the read activity keeps going for a while.

    I am copying a large file to a user share over a 10GbE network, bypassing the cache. The drives are all formatted as encrypted XFS. I do have turbo write and cache dirs etc., but it does not appear to be affected by them.

    1.PNG

    2.PNG

    3_10GbE_connection.PNG

    Link to comment

    That's not the same, I think. You do use encryption. My original post is without encryption. In fact I don't even see much difference between your two posts. One shows faster results, but the pattern of the read and write requests is nearly the same between 6.6.6 and 6.7.0.

     

    I'm not talking about transfer speeds. I'm talking about these small read requests shown in the screenshot of my original post. This is not true for 6.6.6.

     

    Edited by hawihoney
    Link to comment

    Update:

     

    Happening on RC4 as well.

     

     

    Don't know if this is related or just coincidence (both source and target unRAID server on 6.7.0-rc4):

     

    The target unRAID server has 16GB RAM. During a copy of a 15GB file from the source unRAID server to the target unRAID server the read requests, I'm complaining about, started right before copy ended. And these read requests ended together with the write request.

     

    BTW, RC4 writes noticeably faster to my unencrypted XFS array. Something between 10 and 15%.

     

    Edited by hawihoney
    Link to comment

    Just a follow up. Please see attached picture.

     

    I copy a file to disk20. This copy leads to write activity on disk20 and two parity disks (red color). This is correct.

     

    disk20 is part of a User share. This User share includes all but disk17 (blue color). The copy to disk20 does not touch disk17 - no read, no write. That's correct.

     

    All other disks, that are part of this user share (green color), show these small read requests. That's wrong.

     

    In a previous test I switched off User Shares completely. Then all disks (including disk17) show these wrong read requests.

     

    I can't explain it better. My knowledge of the English language ends here.

     

    Unbenannt.png

    Edited by hawihoney
    Link to comment

    I can now reproduce this. The missing piece was that, for me at least, it only starts happening after a large transfer: after a fresh boot on my test server I start seeing the reads on all disks at around the 30% mark of a 30GB transfer, and it's easily reproducible after a reboot. With v6.6.6 I can complete the same transfer without the reads, so maybe this can help LT do the same.

     

    P.S.: test server has 8GB of RAM, since the RAM amount can likely affect if/when this happens.
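
The figures in this report can be put side by side with a little arithmetic. The sketch below only restates numbers from the post (8GB RAM, reads starting at around 30% of a 30GB transfer); the helper names are hypothetical, and comparing the onset point with the RAM size is an assumption about what might matter, not a confirmed cause.

```python
def onset_gb(transfer_gb, fraction_done):
    """Amount of data (GB) already written when the extra reads were first noticed."""
    return transfer_gb * fraction_done


def onset_vs_ram(transfer_gb, fraction_done, ram_gb):
    """Ratio of data written at onset to installed RAM (above 1 means more than RAM size)."""
    return onset_gb(transfer_gb, fraction_done) / ram_gb


# Figures reported for the test server above; 0.30 stands for "around the 30% mark".
REPORT = {"transfer_gb": 30, "fraction_done": 0.30, "ram_gb": 8}
```

For this report, `onset_gb(30, 0.30)` is about 9GB, slightly more than the 8GB of installed RAM, which fits the earlier observation of a 16GB server showing the reads near the end of a 15GB copy.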

    Link to comment





  • Status Definitions

     

    Open = Under consideration.

     

    Solved = The issue has been resolved.

     

    Solved version = The issue has been resolved in the indicated release version.

     

    Closed = Feedback or opinion better posted on our forum for discussion. Also for reports we cannot reproduce or need more information. In this case just add a comment and we will review it again.

     

    Retest = Please retest in latest release.


    Priority Definitions

     

    Minor = Something not working correctly.

     

    Urgent = Server crash, data loss, or other showstopper.

     

    Annoyance = Doesn't affect functionality but should be fixed.

     

    Other = Announcement or other non-issue.