• [6.5.3] Mover appears to be non-atomic and loses writes


    11rcombs
    • Urgent

    I wrote a very basic test program to verify this:

    #include <stdint.h>
    #include <unistd.h>
    #include <fcntl.h>

    int main() {
        uint32_t val = 0;
        /* O_CREAT requires a third (mode) argument */
        int fd0 = open("test1", O_RDWR | O_CREAT, 0644);
        write(fd0, &val, sizeof(val));
        close(fd0);
        while (val < 100000) {
            uint32_t val2 = 0;
            int fd1 = open("test1", O_RDWR | O_CREAT, 0644);
            int fd2 = open("test2", O_RDWR | O_CREAT, 0644);
            /* read the counter back; if a write was lost, val2 lags val */
            read(fd1, &val2, sizeof(val2));
            close(fd1);
            val++;
            val2++;
            fd1 = open("test1", O_RDWR | O_CREAT, 0644);
            write(fd1, &val2, sizeof(val2));
            write(fd2, &val, sizeof(val));
            close(fd1);
            close(fd2);
        }
        return 0;
    }

    On a normal disk, the resulting "test1" and "test2" files will always contain identical 4-byte integers (100000). We can see this with a hex dump:

    $ cat test1 test2 | hexdump -C
    00000000  a0 86 01 00 a0 86 01 00                           |........|

    However, if the move program is run on test1 while the test program is running, the two counters can desynchronize:

    $ echo /mnt/cache/[path]/test1 | move -d 2
    $ cat test1 test2 | hexdump -C
    00000000  9f 86 01 00 a0 86 01 00                           |........|
    

    Note that the two files now differ by 1 (99999 vs. 100000): exactly one write to test1 was lost.

    Losing writes can result in a lot of unexpected behavior. I think it might be responsible for corruption I've seen in files downloaded by Transmission, as well as in sqlite databases (I saw corruption in my Plex Media Server database last night that appears consistent with lost writes, and it happened around the same time I ran the mover script).

     

    I'm not sure what the best solution for this problem is, as I'm not familiar with the internals of the mover program or the shfs ioctl it uses. One route could be to copy to a tmp file on the destination drive, then, while holding an internal lock on the file as exposed by FUSE, verify that the source hasn't changed since the copy started, and only then let the copy take the place of the source. Alternately, while a file is being moved, shfs could expose it to userspace such that reads come from the source file but writes go to both the source and the destination.
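
    To make the first option concrete, here's a rough sketch of the copy-verify-commit flow (a hypothetical helper, not actual mover/shfs code; the FUSE-internal lock itself is omitted):

    #include <stdio.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/stat.h>
    #include <sys/sendfile.h>

    /* Copy src to a temp file on the destination disk, then only commit
     * if the source is provably unchanged. Without the FUSE-internal
     * lock described above, a small window still remains between the
     * re-check and the final unlink. */
    int move_if_unchanged(const char *src, const char *tmp, const char *dst) {
        struct stat before, after;
        int in = open(src, O_RDONLY);
        if (in < 0 || fstat(in, &before) < 0) return -1;

        int out = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (out < 0) { close(in); return -1; }

        off_t off = 0;
        while (off < before.st_size)
            if (sendfile(out, in, &off, before.st_size - off) <= 0) break;
        close(out);

        /* Re-check the source; if anything changed mid-copy, abort and
         * leave the file for a later mover pass. */
        if (fstat(in, &after) < 0 ||
            after.st_size != before.st_size ||
            after.st_mtim.tv_sec != before.st_mtim.tv_sec ||
            after.st_mtim.tv_nsec != before.st_mtim.tv_nsec) {
            close(in);
            unlink(tmp);
            return -1;
        }
        close(in);
        if (rename(tmp, dst) < 0) return -1;  /* tmp and dst on same disk */
        return unlink(src);
    }

    The mtime/size check is only best-effort (a writer could modify and restore the timestamp within the window), which is why the lock is the part that actually matters.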




    User Feedback

    Recommended Comments

    I noticed that the move program does attempt to avoid moving files that are currently open, by calling the `/usr/local/sbin/in_use` shell script (which in turn calls fuser). This prevents most issues, but misses the case where the file isn't opened for writing until _after_ the check completes; hence the race condition.

    Link to comment

    This might explain some of the intermittent errors I've noticed with my files too, namely ones written to cache initially...

     

    Does the mover program implement checksumming? Something like rsync --checksum?

    Link to comment

    No more than ever before, but that's not really saying much. As far as I can tell, this issue has always been around (both with the current C move program and the previous rsync/shell approach).

    Link to comment
    8 hours ago, testdasi said:

    Wow, does this mean the mover is not to be trusted until the bug is fixed?

    This is one of those issues that, while genuine, is very hard to reproduce in practice. The timing window for hitting this error is very narrow, so it's probably not an issue for most people in typical usage. It definitely will not be an issue if mover runs at its default time of early morning, when nobody is likely to be using the server.

    Link to comment
    3 hours ago, itimpi said:

    This is one of those issues that, while genuine, is very hard to reproduce in practice. The timing window for hitting this error is very narrow, so it's probably not an issue for most people in typical usage. It definitely will not be an issue if mover runs at its default time of early morning, when nobody is likely to be using the server.

    Thanks. That's good to know. I have checksum-checked many recently moved files and there has been no problem, so I guess I just have to ensure nothing touches the files that are being moved.

     

    On second thought though, I have had a few strangely corrupted files in the past that were perhaps due to the Plex database being refreshed while the mover was running. I'll have to keep that in mind.

    Edited by testdasi
    Link to comment

    After your test run, does file test1 appear on both cache disk and array disk?  Probably does.

     

    Maybe we can solve this by having shfs keep track of open files internally. This would also let us eliminate the very expensive (in terms of overhead) call to 'fuser' for every file.

    Link to comment
    5 minutes ago, limetech said:

    After your test run, does file test1 appear on both cache disk and array disk?  Probably does.

    I'm actually only seeing test1 on the array disk after the test.

    It looks like the procedure goes like this:

    1. move runs in_use to check if the file has open handles
    2. having passed, move calls ioctl 0xe008e001 to invoke the move routine in shfs
    3. shfs open()s the file on both /mnt/cache and /mnt/disk[…]
    4. shfs calls sendfile() to instruct the kernel to copy the file's contents from cache to array with low overhead
    5. shfs close()s both files
    6. shfs unlink()s the original on /mnt/cache

     

    During the period where two copies of the file exist (starting when it's open()ed on /mnt/disk[…] and ending when it's unlink()ed from /mnt/cache), if you open() the file on /mnt/user, shfs will give you the one on /mnt/cache. This means that any write to a point in the file before the position sendfile() has already copied (or to any position, if the write happens after the copy completes) will be dropped.
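
    To make the window concrete, the sequence presumably boils down to something like this (a reconstruction from the strace, not actual shfs source):

    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/stat.h>
    #include <sys/sendfile.h>

    /* Reconstruction of the observed syscall sequence (not actual shfs
     * code). Any open() of the file via /mnt/user during the window is
     * routed to the cache copy, so writes land on a file that is about
     * to be unlinked. */
    static int shfs_move_sketch(const char *cache_path, const char *disk_path) {
        int src = open(cache_path, O_RDONLY);
        if (src < 0) return -1;

        struct stat st;
        if (fstat(src, &st) < 0) { close(src); return -1; }

        int dst = open(disk_path, O_WRONLY | O_CREAT | O_TRUNC, st.st_mode);
        if (dst < 0) { close(src); return -1; }

        /* --- race window opens: two copies of the file now exist --- */
        off_t off = 0;
        while (off < st.st_size)
            if (sendfile(dst, src, &off, st.st_size - off) <= 0) break;

        close(dst);
        close(src);
        unlink(cache_path);   /* --- race window closes --- */
        return 0;
    }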

     

    Important life lesson I learned while researching this: do not pipe `strace -fp [shfs PID]` to a file on the user filesystem.

    5 minutes ago, limetech said:

    Maybe we can solve this by having shfs keep track of open files internally. This would also let us eliminate the very expensive (in terms of overhead) call to 'fuser' for every file.

    You could probably cut a fair bit of the cost by calling fork()+execve() yourself (using the full path to fuser and losetup, and presumably calling getxattr directly instead of shelling out to getfattr), instead of doing what I assume is a call to system(), which expensively invokes /bin/sh, which in turn invokes /bin/bash to run in_use. But yeah, doing actual locking here would solve the issue properly.
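
    Something like this, roughly (the fuser path and flags here are assumptions on my part):

    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    /* Run fuser directly instead of system("in_use ...") -- avoids
     * spawning /bin/sh. Returns 1 if fuser reports the file open,
     * 0 if not, -1 on error. */
    static int file_in_use(const char *path) {
        pid_t pid = fork();
        if (pid < 0) return -1;
        if (pid == 0) {
            /* child: fuser -s exits 0 if the file is open, nonzero otherwise */
            char *argv[] = { "fuser", "-s", (char *)path, NULL };
            execv("/usr/bin/fuser", argv);
            _exit(127);  /* execv failed */
        }
        int status;
        if (waitpid(pid, &status, 0) < 0) return -1;
        return WIFEXITED(status) && WEXITSTATUS(status) == 0;
    }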

    Link to comment

    getfattr is only called by the 'in_use' script to ask shfs which actual device the file lives on (and thus where it will show up in 'fuser' output). If shfs keeps track of open files, 'in_use' would not be necessary. However, the danger with this is that a file could be opened via a disk share, and shfs would not know about it.

     

    That is, suppose we are moving /mnt/cache/share/test1 to the array. If a process has /mnt/user/share/test1 open, we can detect this and skip the file. But if a process bypassed shfs and opened /mnt/cache/share/test1 directly, shfs would not know about it (though in the current implementation 'in_use' would catch this). This might be problematic. There is no way to ask the kernel directly whether a particular file is open, AFAIK.

     

    Edit: the other complication here is that vdisk images mapped via loop devices do NOT show up in 'fuser' output; those have to be checked using 'losetup -j'. The VM Manager opens those files directly, that is, not via shfs.

    Link to comment

    Yeah, there's no great way around that (short of mounting the filesystem with mandatory locks enabled and using those, but not a lot of software expects to handle that on Linux), and the only way the kernel provides to find out whether a file is open at a given time is indeed iterating /proc, the way fuser and lsof do. You could keep the fuser check as a best-effort attempt to avoid clobbering writes made directly to /mnt/cache (it's not getting _worse_, at least), but still lock within shfs to guarantee no writes are dropped when going through /mnt/user.
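
    For reference, the /proc iteration amounts to roughly this (simplified; real fuser also checks cwd, root, exe, and mmap'd files):

    #include <dirent.h>
    #include <stdio.h>
    #include <sys/stat.h>

    /* Scan /proc/[pid]/fd/* and compare device+inode against the target
     * file -- roughly what fuser does internally. Returns 1 if open,
     * 0 if not, -1 on error. */
    static int file_is_open(const char *path) {
        struct stat target;
        if (stat(path, &target) < 0) return -1;

        DIR *proc = opendir("/proc");
        if (!proc) return -1;

        struct dirent *p;
        int found = 0;
        while (!found && (p = readdir(proc))) {
            if (p->d_name[0] < '0' || p->d_name[0] > '9') continue;  /* pids only */
            char fddir[64];
            snprintf(fddir, sizeof(fddir), "/proc/%s/fd", p->d_name);
            DIR *fds = opendir(fddir);
            if (!fds) continue;  /* process exited, or no permission */
            struct dirent *f;
            while ((f = readdir(fds))) {
                char link[128];
                struct stat st;
                snprintf(link, sizeof(link), "%s/%s", fddir, f->d_name);
                /* stat() follows the fd symlink to the real file */
                if (stat(link, &st) == 0 &&
                    st.st_dev == target.st_dev && st.st_ino == target.st_ino) {
                    found = 1;
                    break;
                }
            }
            closedir(fds);
        }
        closedir(proc);
        return found;
    }

    And of course this is inherently racy: the file can be opened the instant after the scan returns, which is the same TOCTOU window the in_use check has.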

    Link to comment
    7 hours ago, 11rcombs said:

    Ping on this.

    It's not slipping through the cracks, but it will not be included in any 6.6.x release.

    Link to comment

    Thank you for your testing and for bringing this to my attention. To elaborate: what I'm doing is adding code to 'shfs' that keeps track of files opened via shfs, for both normal I/O and 'mover'. If a file being moved gets opened, the move will be cancelled, the partial target file deleted, and the original source file opened (the file is thus skipped in this move cycle). Some caveats will apply; for example, if a file is opened outside 'shfs' (such as via a disk share), the mover will not be able to detect this.
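
    Purely as an illustration of that cancellation behavior (this is not the actual shfs code; the table and all names here are invented), the bookkeeping could look something like:

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <string.h>
    #include <unistd.h>

    /* A shared table maps paths being moved to a cancel flag; the FUSE
     * open handler sets the flag, and the move loop checks it between
     * copy chunks. Illustrative sketch only. */
    struct move_entry {
        char path[4096];
        atomic_bool cancel;   /* set by the FUSE open handler */
        atomic_bool active;
    };

    static struct move_entry moves[64];  /* hypothetical fixed-size table */

    /* Called from the FUSE open handler: cancel any in-flight move. */
    void shfs_note_open(const char *path) {
        for (int i = 0; i < 64; i++)
            if (atomic_load(&moves[i].active) && strcmp(moves[i].path, path) == 0)
                atomic_store(&moves[i].cancel, true);
    }

    /* Called by the mover between copy chunks. On cancel: delete the
     * partial target and leave the source in place for a later pass. */
    int shfs_move_should_abort(struct move_entry *e, const char *partial_target) {
        if (atomic_load(&e->cancel)) {
            unlink(partial_target);
            return 1;
        }
        return 0;
    }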

     

    Collectively these changes need to undergo a beta process and then a prerelease, hence they will not get into the current 6.6.x bug-fix cycle.

    Link to comment
    2 hours ago, limetech said:

    if a file is opened outside 'shfs' (such as via a disk share), the mover will not be able to detect this.

    Since we are still fighting issues with some Docker containers that appear when using /mnt/user/appdata and are corrected by switching to /mnt/cache/appdata, I believe this is not an insignificant issue.

     

    The problem with /mnt/user/appdata seems to have a loose correlation with enabling direct I/O, at least in some instances.

    Link to comment

    Was this solved?

     

    I ask because I'm considering adding something to my cache that could easily get opened after the mover has started to move it.

     

    I was looking at the original mover script with rsync and it’s obvious it would not handle this.

     

    The latest mover script seems to use something else that is UnRAID internal, so I hope it is fixed.

    Edited by bbqdt
    Link to comment

    Seems there is also an issue with deletions.

     

    If I'm running mover on a large dir with lots of nested dirs and files and rm -rf it, stuff breaks: the rm command fails and mover seems to freak out a little.

    Link to comment



