Posts posted by WeeboTech

  1. I did the ftw across 3 drives of data, storing all stat structures in a .gdbm file.

     

    fetched 0 records, deleted 0 records, stored 1151935 records

     

    real    71m26.353s

    user    0m17.040s

    sys    2m4.010s

     

    root@unRAID:~# ls -l --si /tmp/statcache.gdbm

    -rw-rw-r-- 1 root root 179M 2014-08-16 19:38 /tmp/statcache.gdbm

     

     

    The issue with this approach is that scanning through all the keys to find a match can take time as the number of files increases.

     

     

    Here's an example.

     

     

    Using a bash loadable library, I'm able to access the gdbm file at the bash level directly.

    A single key lookup is pretty fast.

     

    root@unRAID:/mnt/disk1/home/rcotrone/src.slacky# enable -f ./bash/bash-4.1/examples/loadables/gdbm gdbm

    root@unRAID:/mnt/disk1/home/rcotrone/src.slacky# gdbm

    gdbm: usage: gdbm [-euikvr] [-KVW array] file [key | key value ...]

     

    time gdbm /tmp/statcache.gdbm /mnt/disk3/Music/music.mp3/Jazz/Various\ Artists/The\ Art\ Of\ Electro\ Swing/01\ Tape\ Five\ -\ Madame\ Coquette\ \(Feat.\ Yuliet\ Topaz\).mp3

    <binary data here>

     

    real    0m0.003s

    user    0m0.010s

    sys    0m0.000s

     

     

    Yet traversing all the keys of 1 million files takes time.

     

    root@unRAID:/mnt/disk1/home/rcotrone/src.slacky# time gdbm -k /tmp/statcache.gdbm | grep 'Jazzy Lounge - The Electro Swing Session' | wc -l

    56

     

    real    0m11.225s

    user    0m14.280s

    sys    0m2.290s

     

    root@unRAID:/mnt/disk1/home/rcotrone/src.slacky# time gdbm -k /tmp/statcache.gdbm | wc -l                     

    1,151,935

     

    real    0m7.236s

    user    0m3.930s

    sys    0m8.350s

     

     

    This might be faster with an SQLite table or an mmap'ed file.

     

    Point is, there's quite a bit that would have to go into this to cache the stat data outside of the kernel.

    So kernel patches may be better than doing this at an application level. I don't know for sure.

     

    When the next unRAID release is available with SQLite compiled into PHP, we can build a table with filenames and some stat data for a browser-based locate function. We'll also be able to store MD5s in there too.
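    To make the idea concrete, here is a rough sketch of what such an SQLite stat table could look like. The schema, column names, and paths are my own assumptions, not anything Tom has committed to:

```python
import os, sqlite3

# Hypothetical schema for a browser-based locate function: full path as the
# key plus a few struct-stat fields, with room for an MD5 column later.
con = sqlite3.connect(":memory:")   # the real table would live on disk
con.execute("""CREATE TABLE statcache (
                   path  TEXT PRIMARY KEY,
                   size  INTEGER,
                   mtime INTEGER,
                   md5   TEXT)""")

def store(path, st):
    con.execute("INSERT OR REPLACE INTO statcache (path, size, mtime) VALUES (?, ?, ?)",
                (path, st.st_size, int(st.st_mtime)))

store("/mnt/disk3/Music/example.mp3", os.stat("."))   # stand-in stat data

# A single SQL query replaces the 11-second pipe over every gdbm key.
rows = con.execute("SELECT path FROM statcache WHERE path LIKE ?",
                   ("%Music%",)).fetchall()
```

    A leading-wildcard LIKE still scans the table, but it stays inside one process instead of piping a million keys through grep.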

  2. My data must already be cached. It should have taken much, much longer.

     

    time /mnt/disk1/home/rcotrone/src.slacky/ftwcache/ftwstatcache /tmp/disk3.gdbm /mnt/disk3

     

     

    files processed: 324600, stores: 298901, duplicates: 0, errors: 0

    fetched 0 records, deleted 0 records, stored 298965 records

     

    real    2m22.192s

    user    0m2.830s

    sys    0m8.660s

     

    root@unRAID:~# ls -l /tmp/disk3.gdbm

    -rw-rw-r-- 1 root root 50897023 2014-08-16 18:24 /tmp/disk3.gdbm

     

    root@unRAID:~# ls -l --si /tmp/disk3.gdbm

    -rw-rw-r-- 1 root root 51M 2014-08-16 18:24 /tmp/disk3.gdbm

     

     

    The gdbm contains the full path as the key and the struct stat as the data.
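    A minimal stand-in for that layout, using Python's dbm module (gdbm-backed where available) and packing a few struct stat fields as the value; the field selection here is illustrative, not the full structure:

```python
import dbm, os, struct, tempfile

# Value layout: a few fields packed from struct stat; a real cache
# would store the whole structure.
FMT = "<QQQQ"   # st_dev, st_ino, st_size, st_mtime (seconds)

def pack_stat(st):
    return struct.pack(FMT, st.st_dev, st.st_ino, st.st_size, int(st.st_mtime))

dbfile = os.path.join(tempfile.mkdtemp(), "statcache")
st = os.stat(".")
with dbm.open(dbfile, "c") as db:                # gdbm-backed where available
    db[b"/mnt/disk3/somefile"] = pack_stat(st)   # key = full path

with dbm.open(dbfile, "r") as db:
    dev, ino, size, mtime = struct.unpack(FMT, db[b"/mnt/disk3/somefile"])
```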

  3. I did not know that; however, surely a swap file would reduce the risk of the entries needing to be dropped, achieving the same net result, albeit from a less direct/elegant angle?

     

    If this was an efficient way to do what you are trying to do, the kernel developers would have done it.

     

    Is it faster to swap pages in and walk through them, or just go out to disk and re-read the information?

     

     

    What could possibly be done is have the usershare/fuse shfs use some kind of mmap'ed file that is on a cache disk.

    This cached/mmap file could contain all the stat information for all visited files.

     

     

    The downside is that you would be duplicating what the kernel does.

     

    The upside is that you can keep the information longer outside of real ram requirements and connect some kind of inotify so that when files are opened/closed/added/removed the mmap cache for that directory is updated.

     

    If the device being reviewed is spun down, use the data in the stat cache rather than what is actually on the disk.

     

    A lot more work for the fuse layer.

     

    The data can be cached in a mmap file or a .gdbm file. I'm testing how long it takes to store all the stat blocks in a gdbm file now.
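    The spun-down fallback could be sketched roughly like this. This is purely illustrative: the real logic would live in the fuse/shfs layer, and the spun-down check here is just a stub:

```python
import os

stat_cache = {}   # path -> os.stat_result; the real thing would be an
                  # mmap'ed file on the cache disk, kept fresh via inotify

def disk_spun_down(path):
    # Stub: the real check would query the disk's power state.
    return False

def cached_stat(path):
    if disk_spun_down(path) and path in stat_cache:
        return stat_cache[path]    # serve from cache, don't spin the disk up
    st = os.stat(path)             # otherwise hit the disk and refresh the cache
    stat_cache[path] = st
    return st

st = cached_stat(".")
```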

  4. parity check AND the mover. Joe's version also monitors the mover.

     

    At one time I tried increasing the dentry queue size, it helped, but since I had so many files, it would cause the system to crash with an OOM.

     

    Last time I looked (back in the 2.6 kernel) those 'tunables' relating to inode aging didn't do anything.

     

    When I looked I saw they did, just not very obviously.

     

    In my early tests I set it so they were last to be ejected.

    It helped with directory sweeps, but again, I had so many files, I had all kinds of OOM crashes.

     

    I think with 64bit this will be less of a problem.

     

    FWIW, here is my test on one of my mp3 disks.

    I should reiterate: this is only one of them. I have many of them.

    Plus I have tons of source code files I've collected over the years.

    You can quickly see how I had millions and millions of files.

     

    root@unRAID:/mnt/disk3# time find /mnt/disk3 -type f -print > /mnt/disk3/filelist.txt
    
    real    32m46.917s
    user    0m4.030s
    sys     0m33.110s
    
    2nd test immediately after.
    root@unRAID:/mnt/disk3# time find /mnt/disk3 -type f -print > /mnt/disk3/filelist.txt
    
    real    3m15.613s
    user    0m0.890s
    sys     0m5.520s
    
    root@unRAID:/mnt/disk3# wc -l /mnt/disk3/filelist.txt
    307013 /mnt/disk3/filelist.txt
    

     

    I'm hoping the move to 64bit will be better for me.
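    For reference, the same sweep can be expressed as a small script; a minimal equivalent of the `find -type f | wc -l` test above, demonstrated on a throwaway tree rather than /mnt/disk3:

```python
import os, tempfile

def count_files(root):
    # Walk the tree and count regular files, like `find root -type f | wc -l`.
    # Every directory touched pulls dentries/inodes into the kernel caches,
    # which is why the second sweep in the timings above ran ~10x faster.
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        total += len(filenames)
    return total

# Demonstrated on a small throwaway tree.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "a", "b"))
for name in ("x.mp3", "y.mp3", os.path.join("a", "b", "z.mp3")):
    open(os.path.join(root, name), "w").close()
print(count_files(root))   # -> 3
```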

  5. Tom's symlink approach could resolve this nicely, since the symlink will reference the original file directly.

    A caveat may be slower access to large user shares with lots of files.

     

    Since a symlink is actually a type of file that references the original file, it has to be stat'ed, opened, readlink'ed, and closed before any other filesystem operation.

     

    There's a code example here.

    http://linux.die.net/man/2/readlink

     

    On the pro side, when traversing a user share with ftw or find as in cache_dirs, the original file and the usershare file(symlink) will be in the caches.

     

    On top of that, if this reduces the NFS stale handle issue, that's two issues handled.
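    The extra steps a symlink costs can be seen from user space; a small sketch (Python rather than the C of the man page, with a stand-in name for the usershare view):

```python
import os, tempfile

d = tempfile.mkdtemp()
target = os.path.join(d, "realfile")
open(target, "w").close()

link = os.path.join(d, "usershare_view")   # stand-in for the shfs symlink
os.symlink(target, link)

# A symlink is itself a small file holding the target path: it gets
# lstat'ed, the path is fetched with readlink, and only then can the
# real file be opened.
st_link = os.lstat(link)                   # the symlink itself
st_real = os.stat(link)                    # follows the link to the original
print(os.readlink(link))                   # the stored original path
print(st_link.st_ino != st_real.st_ino)    # -> True: two distinct inodes
```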

  6. Brian => Now that I did that test on a "real" system (instead of the test one I haven't yet added drives to) ... is there any other known impact except for losing the data from the file that was clobbered?

     

    Just wondering if there's anything I should check  :)

    [I guess I'll run a parity check just for grins to be sure parity was maintained okay in the process.]

     

    I doubt it would disrupt parity, but certainly no harm doing a parity check. You might also check the filesystem to make sure this didn't confuse RFS.

     

     

    I don't think it's at this layer, guys. With cp and mv, the disk-to-usershare protection is bypassed.

    The application makes the mistake because of the data returned by FUSE/usershare.

    According to the application, each file is different enough to allow the overwrite.

    I.e. an open/create/truncate, then an open/read. By the time the read succeeds, the open/create has already truncated the file.

    With mv, it looks like the file is unlinked before the open/read, so the file can disappear.
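    That open/create/truncate-then-read sequence is easy to reproduce on a single path; a minimal stand-in demonstration (the real failure involves the disk-share and user-share views of the same file, which look distinct to the application):

```python
import os, tempfile

path = os.path.join(tempfile.mkdtemp(), "datafile.zip")
with open(path, "w") as f:
    f.write("original contents")

# cp opens the destination with O_CREAT|O_TRUNC, so the data is gone
# before the source read ever happens.
dest = open(path, "w")     # the create/truncate: file is now empty
src = open(path, "r")      # the read side opens too late
print(repr(src.read()))    # -> ''
src.close()
dest.close()
```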

  7. There is another solution which I've wanted to explore because it helps with NFS stale file handles.  What I would do is make a change in shfs that made every single file look like a symlink!  So if you did an 'ls' on /mnt/user/someshare then what you'd see is all directories normal, but all files are symlinks to the "real" file on a disk share.  this would be completely transparent to SMB and NFS, and would eliminate need to 'cache' FUSE nodes corresponding to files for sake of stopping stale file handle errors.  I dunno, might throw this in as an experimental option...

     

    I like this option.

     

    Will the symlink point to /mnt/disk1/someshare, and does that require the disk share to be exposed, or is it only at the operating system level?

     

    This could be a good solution.

    Might break something for someone, so having the option of current or new symlink might be needed for a while.

  8. A few suggestions related to the user share solution:

    1 - When enabling user shares, pop up a one-time informational message explaining the user share feature and this potential gotcha. Users should be warned that copying files from disk shares to user shares, and from user shares to disk shares (I think the same thing could happen), should be avoided.

    2 - Add additional info on the user share configuration screen that makes the feature and options more clear. Include a read-only field that indicates what disks are REALLY in the share (for read purposes), and clearly label that the "include" and "exclude" fields only limit writes of new files.

    3 - If a user excludes a disk that is part of the user share, pop up a warning that in order to truly exclude a disk, you would need to rename its root level folder. I am not even sure if that is enough. Would you have to bounce the array afterwards?

     

    Is it possible to eliminate a disk from the group by turning off the implied folders in a user share, by some option set PER user share?

     

    While you can have include/exclude, there still is the implied addition to a user share.

     

    The warnings are a good idea. Maybe not a pop up but text directly on the user share configuration page.

    Something that cannot be ignored.

  9. Tom said he was going to explore something with Samba to see if the file is opened once or multiple times. That could have a big impact on a solution.

     

    http://lime-technology.com/forum/index.php?topic=34480.msg321241#msg321241

     

    There is no Samba fix in the world that will fix this.

     

    I'm aware of the root cause. I'm aware that samba is not the cause and samba cannot fix this.

     

    The point of the research is: if file locking is used in any manner, and samba keeps opening and closing the file for chunks, any file locking or pid file lookup in FUSE/usersharefs via fuser will be useless.

     

    File locking or open file lookup to see who has the file open could be one preventative solution.

     

    cp and mv use the stat information; FUSE cannot present it consistently, therefore FUSE on open needs to do something more intelligent, which may mean checking for a file lock, an open file in use, or some other method. I.e. priority of disk share, explicit vs implicit include/exclude.

     

    There are a couple of ways this can go; in any case the usershare/FUSE needs an adjustment, and that may depend on how Samba accesses the files, as per Tom's concern.
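    A sketch of the lock-lookup idea using flock (illustrative only; whether Samba's open pattern would keep such a lock held long enough is exactly the open question above):

```python
import fcntl, os, tempfile

path = os.path.join(tempfile.mkdtemp(), "shared.dat")
open(path, "w").close()

def file_in_use(path):
    # Probe with a non-blocking exclusive flock; failure to acquire it
    # means some other fd currently holds a lock on the file.
    probe = open(path, "r")
    try:
        fcntl.flock(probe, fcntl.LOCK_EX | fcntl.LOCK_NB)
        fcntl.flock(probe, fcntl.LOCK_UN)
        return False
    except BlockingIOError:
        return True
    finally:
        probe.close()

writer = open(path, "r+")
fcntl.flock(writer, fcntl.LOCK_EX)   # simulate a writer holding the file
print(file_in_use(path))             # -> True
writer.close()                       # closing the fd releases the lock
print(file_in_use(path))             # -> False
```

    Note this only catches cooperating openers that actually take flock locks, which is the crux of the Samba question.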

  10. Here's a command line example of the problem. Note: I am not root; root access is not the issue.

     

    rcotrone@unRAID:/mnt/disk1/TMP$ ls -l datafile.zip
    -rw-rw-rw- 1 rcotrone users 282096032 2014-08-10 14:34 datafile.zip
    
    rcotrone@unRAID:/mnt/disk1/TMP$ cp datafile.zip datafile.zip
    cp: `datafile.zip' and `datafile.zip' are the same file
    
    rcotrone@unRAID:/mnt/disk1/TMP$ mv datafile.zip datafile.zip 
    mv: `datafile.zip' and `datafile.zip' are the same file
    
    rcotrone@unRAID:/mnt/disk1/TMP$ cd /mnt/user/TMP
    
    rcotrone@unRAID:/mnt/user/TMP$ ls -l datafile.zip
    -rw-rw-rw- 1 rcotrone users 282096032 2014-08-10 14:34 datafile.zip
    
    rcotrone@unRAID:/mnt/user/TMP$ cp datafile.zip datafile.zip
    cp: `datafile.zip' and `datafile.zip' are the same file
    
    rcotrone@unRAID:/mnt/user/TMP$ mv datafile.zip datafile.zip
    mv: `datafile.zip' and `datafile.zip' are the same file
    
    rcotrone@unRAID:/mnt/disk1/TMP$ cp datafile.zip /mnt/user/TMP/datafile.zip
    
    rcotrone@unRAID:/mnt/disk1/TMP$ ls -l datafile.zip
    -rw-rw-rw- 1 rcotrone users 0 2014-08-10 14:35 datafile.zip
    
    rcotrone@unRAID:/mnt/disk1/TMP$ mv -v datafile.zip /mnt/user/TMP/datafile.zip
    `datafile.zip' -> `/mnt/user/TMP/datafile.zip'
    mv: cannot open `datafile.zip' for reading: No such file or directory
    
    rcotrone@unRAID:/mnt/disk1/TMP$ ls -l datafile.zip
    /bin/ls: cannot access datafile.zip: No such file or directory
    
    rcotrone@unRAID:/mnt/disk1/TMP$ cp VPS_048.zip datafile.zip
    
    The device and inodes are different which is part of the issue.
    
    rcotrone@unRAID:/mnt/disk1/TMP$ stat datafile.zip 
      File: `datafile.zip'
      Size: 282096032       Blocks: 551514     IO Block: 4096   regular file
    Device: 901h/2305d      Inode: 718782      Links: 1
    Access: (0666/-rw-rw-rw-)  Uid: ( 1000/rcotrone)   Gid: (  100/   users)
    Access: 2014-08-10 14:37:49.000000000 -0400
    Modify: 2014-08-10 14:37:50.000000000 -0400
    Change: 2014-08-10 14:37:50.000000000 -0400
    
    rcotrone@unRAID:/mnt/disk1/TMP$ stat /mnt/user/TMP/datafile.zip 
      File: `/mnt/user/TMP/datafile.zip'
      Size: 282096032       Blocks: 551514     IO Block: 131072 regular file
    Device: 10h/16d Inode: 165509      Links: 1
    Access: (0666/-rw-rw-rw-)  Uid: ( 1000/rcotrone)   Gid: (  100/   users)
    Access: 2014-08-10 14:37:49.000000000 -0400
    Modify: 2014-08-10 14:37:50.000000000 -0400
    Change: 2014-08-10 14:37:50.000000000 -0400
    

     

    This test should probably be done with a samba mount and/or an NFS mount.

    Anyway, the protection is at the application level; FUSE obfuscates any potential application protection.

     

    At the very least, if a matching share directory is 'excluded' or not explicitly 'included', perhaps there should be some priority adjustment so that files having the same name cannot be overwritten via an open create.

     

    I.e. it takes an explicit definition in order for a user share file to be the target of an open create.
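    The guard cp and mv rely on boils down to comparing st_dev/st_ino pairs; a sketch of that check (temp files stand in for the share paths above):

```python
import os, tempfile

def same_file(a, b):
    # The guard cp/mv rely on: identical device and inode numbers.
    sa, sb = os.stat(a), os.stat(b)
    return (sa.st_dev, sa.st_ino) == (sb.st_dev, sb.st_ino)

d = tempfile.mkdtemp()
p1 = os.path.join(d, "datafile.zip")
p2 = os.path.join(d, "other.zip")
open(p1, "w").close()
open(p2, "w").close()

print(same_file(p1, p1))   # -> True: cp refuses to copy a file onto itself
print(same_file(p1, p2))   # -> False
# Through /mnt/user vs /mnt/disk1 the same bytes report different
# (st_dev, st_ino) pairs, so this guard never fires and the clobber proceeds.
```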

  11. The issue isn't whether the docker apps are well behaved or not.

    cp and mv protect you from overwriting the same file by checking the stat information.

     

    The same device/inode prevents an overwrite.

    These tools fail to protect because the usershare reports different device/inodes.

     

    These tools are well behaved; the core issue is the disconnect from one filesystem to the other.

    The fuse/usershare layer needs an improvement or a general warning.

     

    If the basic tools can protect you, then any solution we come up with should at least try to do the same.

    To turn your back on warning people, or on attempting to provide a potential solution, is irresponsible.

     

    We only know how many people have been bitten by this issue because bjp999 has been voicing it.

     

    If midnight commander is included to operate at the command line via telnet, can that be a victim of the issue?

     

    FWIW, I'm not expecting a full, complete solution in the next release, but we should continue to discuss and try to come up with ideas on the best way to protect people from themselves with the tools that are included in the operating system.

     

    People are going to make mistakes, but the operating system should not add to those without warning.

  12. One thing I'd like to add. We are not and cannot be responsible for what someone does to their system via command line. Root access completely eliminates our ability to do that.  My view is that the solution must work for users but only outside the context of a command line method.

     

     

    I disagree. cp or mv or some derivative via perl can be used by other programs and tools.

    Scripts, plugins and/or docker apps could possibly do this.

     

     

     

     

  13. Knowing it is coming from a disk share and is going back to the same directory as the source would make this about the same level of difficulty as some of the options I listed.

     

    It all depends on where the program is running and if the source / destination are the same inode and/or device id.

     

    With a disk mount and a user share they vary, as can be seen in the stat information I posted previously.

    This is why cp and mv fail; rsync succeeds because it uses hidden tmp files.

     

    My thought is to do something more intelligent and intercept the open in fuse. 

     

    I.e. inspect the type of open: if it's a read/write, then open it in place.

    Or do some kind of lock or fcntl lock lookup on the pids to see if the file is in use.

    If so, then write to a temporary file and rename it after close, or version the files (i.e. mimic rsync).

    Or version the files when they are opened as output with truncate.
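    The write-to-temp-then-rename behaviour (what rsync does with its hidden .tmpname) can be sketched like this; the function name and temp-name convention are my own:

```python
import os, tempfile

def safe_write(path, data):
    # Write to a hidden temp name in the same directory, then atomically
    # rename over the destination, mimicking rsync's .tmpname behaviour.
    d, name = os.path.split(path)
    fd, tmp = tempfile.mkstemp(prefix="." + name + ".", dir=d)
    try:
        with os.fdopen(fd, "w") as f:
            f.write(data)
        os.replace(tmp, path)   # atomic within one filesystem
    except BaseException:
        os.unlink(tmp)
        raise

target = os.path.join(tempfile.mkdtemp(), "datafile.zip")
safe_write(target, "new contents")
print(open(target).read())   # -> new contents
```

    Readers of the old path never see a half-written file; they see either the old contents or, after the rename, the complete new ones.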

  14. WeeboTech's suggestion to require specific inclusion and not automatically include disks based on top level folder names would work as long as the user specifically removed a disk from the share before doing any copies from it to the share.  But could still cause the same issue if a user simply copied from a disk share to the user share without removing the disk from the share first.

     

    Perhaps the simplest "fix" would be to take a cue from Windows (and other OS's that handle it the same way) ... if a write (create) to a user share specifies a filename that already exists in the share, just append a suffix to the filename.    i.e. if I copy a file, say Test.Doc,  from a folder to the same folder in Windows, the name is automatically changed to Test - Copy.Doc.    If I copy it again, it will automatically be called Test - Copy (2).doc;  etc.    This would eliminate the data loss ... at the "cost" of renaming a few files after you had copied them and then deleted the originals.    And if you simply renamed the top level folder so it was no longer part of the share then the file wouldn't be renamed.

     

    If I remember correctly, there was the topic of versioning. So the file could be versioned as the example defines.

    But then how do you handle files that are updated incrementally? There are going to be continual versions.

    Which means every write to update will require copying the file and writing to the new file.

     

    It all depends on how versioning is used. On VMS, new files are made as filename;version;

    however, you can also set a version limit, so anything over 5 versions starts removing the earlier ones, leaving only the latest 5 available.

     

    I'm not educated on FUSE, so I can only speculate that the intercepted open call could handle all the renaming and pruning.
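    The VMS-style filename;version scheme with a retention limit could look roughly like this; the naming and prune policy are illustrative, not a FUSE implementation:

```python
import os, tempfile

def version_and_prune(path, keep=5):
    # Rename the current file to "path;N" (next free version number),
    # then drop versions beyond the retention limit, oldest first.
    d, name = os.path.split(path)
    versions = sorted(
        int(f.rsplit(";", 1)[1])
        for f in os.listdir(d)
        if f.startswith(name + ";")
    )
    if os.path.exists(path):
        n = versions[-1] + 1 if versions else 1
        os.rename(path, "%s;%d" % (path, n))
        versions.append(n)
    for old in versions[:-keep]:        # empty slice when under the limit
        os.unlink("%s;%d" % (path, old))

d = tempfile.mkdtemp()
p = os.path.join(d, "Test.Doc")
for i in range(7):                      # seven incremental updates
    with open(p, "w") as f:
        f.write("rev %d" % i)
    version_and_prune(p, keep=5)
print(sorted(os.listdir(d)))            # only the latest 5 versions remain
```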

     

  15.  

     

    There is only ONE way that a user would know to avoid this issue - reading MY posts on the forum. An unknowing "intelligent user" would go into user shares, remove a disk from the share, and then copy data from that disk share to the user share. Perfectly logical. It's what any of us would reasonably do. And we'd lose all our copied data. (I could easily imagine a customer doing this, losing his kids' pictures, finding out LimeTech knew this and didn't warn him, and going after LimeTech for the data loss.) It is not intelligence that is needed to avoid the issue!

     

     

    This could be resolved by not allowing implicit usershares to be created from top level directories (by some option).

    I.e. usershare configurations must specifically name the disks included in the usershare.

    Thus, when you take a disk out of a usershare, FUSE/usershare no longer sees it.

     

     

    This could also be resolved with a user renaming the directory first, then doing the move.

    Either way a user needs to be educated.

     

     

    With locks there may be a way to grab the pid of each process holding a file open, thus determining if an open read/open write with truncate is being done by the same process and catching the condition.

    I would assume FUSE intercepts the open, and at that point can do the lookup if the open is a write that will truncate.

  16. 2. Detecting an overwriting of the same inode may not be technically possible due to the obfuscation that happens within the user shares

     

     

    This 'may' be possible with file locks.

     

    3. An option of detecting an attempt to copy a file from a disk share to a user share and omitting the disk share from the possible destination has been suggested and hopefully Tom will comment on whether that or something similar is possible.

     

     

    If there was an option that could be turned on temporarily so that every new open for write would cause the user share to re-balance the destination, i.e. write to a new location avoiding the current disk, write to the new name, and remove the old one.

     

     

    Frankly, the safest way to move files locally via console is with rsync, as it creates a .tmpname before removing the old file.

  17. The issue I see is that a disk file and a user share file do not share the same device and inode, which is why cp allows clobbering the file.

     

    root@unRAID:/mnt/disk1/Music/DJMUSIC# ls -l /mnt/user/Music/DJMUSIC/VirtualDJ\ Local\ Database\ v6.xml /mnt/disk1/Music/DJMUSIC/VirtualDJ\ Local\ Database\ v6.xml 
    -rw-rw---- 1 rcotrone users 22487065 2012-09-01 16:42 /mnt/disk1/Music/DJMUSIC/VirtualDJ\ Local\ Database\ v6.xml
    -rw-rw---- 1 rcotrone users 22487065 2012-09-01 16:42 /mnt/user/Music/DJMUSIC/VirtualDJ\ Local\ Database\ v6.xml
    
    root@unRAID:/mnt/disk1/Music/DJMUSIC# stat /mnt/user/Music/DJMUSIC/VirtualDJ\ Local\ Database\ v6.xml /mnt/disk1/Music/DJMUSIC/VirtualDJ\ Local\ Database\ v6.xml       
      File: `/mnt/user/Music/DJMUSIC/VirtualDJ Local Database v6.xml'
      Size: 22487065        Blocks: 43970      IO Block: 131072 regular file
    Device: 10h/16d Inode: 164905      Links: 1
    Access: (0660/-rw-rw----)  Uid: ( 1000/rcotrone)   Gid: (  100/   users)
    Access: 2013-03-20 12:19:03.000000000 -0400
    Modify: 2012-09-01 16:42:03.000000000 -0400
    Change: 2013-03-20 08:20:47.000000000 -0400
    
      File: `/mnt/disk1/Music/DJMUSIC/VirtualDJ Local Database v6.xml'
      Size: 22487065        Blocks: 43970      IO Block: 4096   regular file
    Device: 901h/2305d      Inode: 147702      Links: 1
    Access: (0660/-rw-rw----)  Uid: ( 1000/rcotrone)   Gid: (  100/   users)
    Access: 2013-03-20 12:19:03.000000000 -0400
    Modify: 2012-09-01 16:42:03.000000000 -0400
    Change: 2013-03-20 08:20:47.000000000 -0400
    

  18. The proposed new logic totally breaks how I use my server.

    I organize my files read/write on the disk shares for speed.

    I read some folders via the user shares for organizational consolidation.

     

    For example, I have to capture streamed music to the fastest disk share, then reorganize it after tagging via the user share.

     

    If someone was moving files from a disk share to a movie share, making the matching disk directories on the user share readonly would prevent them from being clobbered.

     

    I.E. disabling writes to a specific disk that is participating in the user share.

     

    The other option is to use rsync, which will copy to a .tmpname then move it into place. 

     

    Is there some way the user share layer can know a file is being opened for read and for read/write, and prevent the clobbering?

    Maybe there's a way to turn on a no clobber switch temporarily if the file being read is the file that is going to be opened for writing.

     

    Perhaps the user share versions the file during the open for write (renames it), allows the open for write on a new file, then removes the old file when it is closed. This means extra space is used for the duration of the write (as in rsync).

  19. What about something like unionfs or aufs ?

    What for?  BTW I would probably never add those filesystems.

     

    The feature that's wanted is a way to minimize what's stored on the flash.

     

    union or aufs would allow two filesystems to look like one, thus minimizing what is stored on the flash while it still looks like it's on the flash.

     

    With something like unionfs or aufs and root on tmpfs, the whole root tree can be made to have a transient ramdisk version and/or a full operating system version.

     

    I tested it years ago; it worked for me. I'm sure they've come a long way since then. It was just a quick idea.

  20. Maybe it's not as simple as it sounds, but I haven't seen discussion of that.

     

    It's not that simple. The dentry table is an internal kernel table that cannot be swapped out.

     

    The biggest possible advantage would be to have the rootfs changed to tmpfs which could swap out unused parts of the ram filesystem. That's a pretty big internal change to unRAID and I do not think that is going to happen.

     

    Having swap space allows a memory-hungry program to swap out, or allows /var/log to be swapped out.

    However, in a normal system this is not the case.
