Posts posted by WeeboTech
-
My data must already be cached. It should have taken much much longer.
time /mnt/disk1/home/rcotrone/src.slacky/ftwcache/ftwstatcache /tmp/disk3.gdbm /mnt/disk3
files processed: 324600, stores: 298901, duplicates: 0, errors: 0
fetched 0 records, deleted 0 records, stored 298965 records
real 2m22.192s
user 0m2.830s
sys 0m8.660s
root@unRAID:~# ls -l /tmp/disk3.gdbm
-rw-rw-r-- 1 root root 50897023 2014-08-16 18:24 /tmp/disk3.gdbm
root@unRAID:~# ls -l --si /tmp/disk3.gdbm
-rw-rw-r-- 1 root root 51M 2014-08-16 18:24 /tmp/disk3.gdbm
The gdbm file is keyed by the full path, with a struct stat as the data.
-
I did not know that. However, surely a swap file would reduce the risk of the entries needing to be dropped, achieving the same net result, albeit from a less direct/elegant angle?
If this was an efficient way to do what you are trying to do, the kernel developers would have done it.
Is it faster to swap pages in and walk through them, or just go out to disk and re-read the information?
What could possibly be done is have the usershare/fuse shfs use some kind of mmap'ed file that is on a cache disk.
This cached/mmap file could contain all the stat information for all visited files.
The downside is that you would be duplicating what the kernel does.
The upside is that you can keep the information longer outside of real ram requirements and connect some kind of inotify so that when files are opened/closed/added/removed the mmap cache for that directory is updated.
If the device being reviewed is spun down, use the data in the stat cache rather than what is actually on the disk.
A lot more work for the fuse layer.
The data can be cached in a mmap file or a .gdbm file. I'm testing how long it takes to store all the stat blocks in a gdbm file now.
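As a rough sketch of the idea (using Python's dbm module as a stand-in rather than the actual ftwstatcache tool; the packing format and filenames are made up for illustration), caching stat data keyed by full path might look like this:

```python
import dbm, os, struct, tempfile

# Pack a few os.stat() fields into a fixed-size record: dev, ino, size, mtime.
FIELDS = "<QQQd"

def store_stat(db, path):
    # Key is the full path; value is the packed stat data.
    st = os.stat(path)
    db[path.encode()] = struct.pack(FIELDS, st.st_dev, st.st_ino,
                                    st.st_size, st.st_mtime)

def load_stat(db, path):
    # Returns (st_dev, st_ino, st_size, st_mtime) from the cache.
    return struct.unpack(FIELDS, db[path.encode()])

# Demo: stat one throwaway file into a throwaway cache and read it back.
sample = os.path.join(tempfile.mkdtemp(), "sample.bin")
with open(sample, "wb") as f:
    f.write(b"x" * 1234)

cache_path = os.path.join(tempfile.mkdtemp(), "statcache")
with dbm.open(cache_path, "c") as db:
    store_stat(db, sample)
    cached = load_stat(db, sample)

live = os.stat(sample)
assert cached[2] == live.st_size == 1234  # cached size matches a live stat
```

A real cache would store the whole struct stat and be filled by an ftw/os.walk sweep, but the key/value shape is the same.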
-
A page file will not help, the structures are memory resident kernel structures and do not get swapped out.
Instead they get dropped when there is memory pressure.
-
parity check AND the mover. Joe's version also monitors the mover.
At one time I tried increasing the dentry queue size, it helped, but since I had so many files, it would cause the system to crash with an OOM.
Last time I looked (back in the 2.6 kernel) those tunables relating to inode aging didn't seem to do anything. When I looked more closely I saw they did, just not very obviously.
In my early tests I set it so they were last to be ejected.
It helped with directory sweeps, but again, I had so many files, I had all kinds of OOM crashes.
I think with 64bit this will be less of a problem.
FWIW, here is my test on one of my mp3 disks.
I should reiterate: that was only one of them. I have many of them.
Plus I have tons of source code files I've collected over the years.
You can quickly see how I had millions and millions of files.
root@unRAID:/mnt/disk3# time find /mnt/disk3 -type f -print > /mnt/disk3/filelist.txt
real 32m46.917s
user 0m4.030s
sys 0m33.110s
2nd test immediately after:
root@unRAID:/mnt/disk3# time find /mnt/disk3 -type f -print > /mnt/disk3/filelist.txt
real 3m15.613s
user 0m0.890s
sys 0m5.520s
root@unRAID:/mnt/disk3# wc -l /mnt/disk3/filelist.txt
307013 /mnt/disk3/filelist.txt
I'm hoping the move to 64bit will be better for me.
-
Tom's symlink approach could resolve this nicely, since the symlink will reference the original file directly.
A caveat may be slower access to large user shares with lots of files.
Since a symlink is actually a type of file that references the original file, it has to be stat'ed, opened, read with readlink(), and closed before any other filesystem operation.
There's a code example here.
http://linux.die.net/man/2/readlink
On the pro side, when traversing a user share with ftw or find as in cache_dirs, the original file and the usershare file(symlink) will be in the caches.
On top of that, if this reduces the NFS stale handle issue, that's two issues handled.
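The stat/readlink/open round trip described above can be sketched as follows (throwaway temp paths, not shfs internals):

```python
import os, tempfile

# Create a real file and a symlink to it in a scratch directory.
d = tempfile.mkdtemp()
real = os.path.join(d, "datafile.bin")
link = os.path.join(d, "datafile.link")
with open(real, "wb") as f:
    f.write(b"payload")
os.symlink(real, link)

assert os.path.islink(link)      # lstat() sees a symlink, not a regular file
target = os.readlink(link)       # readlink() returns the path of the original
assert target == real

with open(link, "rb") as f:      # open() follows the link to the real file
    data = f.read()
assert data == b"payload"
```

Each extra step (lstat, readlink) is the per-file overhead the caveat about large user shares refers to.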
-
Brian => Now that I did that test on a "real" system (instead of the test one I haven't yet added drives to) ... is there any other known impact except for losing the data from the file that was clobbered?
Just wondering if there's anything I should check
[I guess I'll run a parity check just for grins to be sure parity was maintained okay in the process.]
I doubt it would disrupt parity, but certainly no harm doing a parity check. You might also check the filesystem to make sure this didn't confuse RFS.
I don't think it's at this layer guys. With cp and mv, the disk to usershare protection is bypassed.
The application makes the mistake because of the data returned by FUSE/usershare.
According to the application, each file is different enough to allow the overwrite.
I.e., an open/create with truncate, then an open for read. By the time the read starts, the open/create has already truncated the file.
With mv, it looks like the file is unlinked before the open for read, so the file can disappear.
-
There is another solution which I've wanted to explore because it helps with NFS stale file handles. What I would do is make a change in shfs that made every single file look like a symlink! So if you did an 'ls' on /mnt/user/someshare then what you'd see is all directories normal, but all files are symlinks to the "real" file on a disk share. this would be completely transparent to SMB and NFS, and would eliminate need to 'cache' FUSE nodes corresponding to files for sake of stopping stale file handle errors. I dunno, might throw this in as an experimental option...
I like this option.
Will the symlink point to /mnt/disk1/someshare, and does that require the disk share to be exposed, or is it only at the operating system level?
This could be a good solution.
Might break something for someone, so having the option of current or new symlink might be needed for a while.
-
A few suggestions related to the user share solution:
1 - When enabling user shares, pop up a one time informational message explaining the user share feature and this potential gotcha. Users should be warned that copying files from disk shares to user shares, and user shares to disk shares (i think the same thing could happen) should be avoided.
2 - Add additional info on the user share configuration screen that makes the feature and options clearer. Include a read-only field that indicates which disks are REALLY in the share (for read purposes), and clearly label that the "include" and "exclude" fields only limit writes of new files.
3 - If a user excludes a disk that is part of the user share, pop up a warning that in order to truly exclude a disk, you would need to rename its root level folder. I am not even sure if that is enough. Would you have to bounce the array afterwards?
Is it possible to eliminate a disk from the group by turning off the implied folders, via some option per user share?
While you can have include/exclude, there still is the implied addition to a user share.
The warnings are a good idea. Maybe not a pop up but text directly on the user share configuration page.
Something that cannot be ignored.
-
Tom said he was going to explore something with Samba to see if the file is opened once or multiple times. That could have a big impact on a solution.
http://lime-technology.com/forum/index.php?topic=34480.msg321241#msg321241
There is no Samba fix in the world that will fix this.
I'm aware of the root cause. I'm aware that samba is not the cause and samba cannot fix this.
Point of research is, if file locking is used in any manner, and samba keeps opening and closing the file for chunks, any file locking or pid file lookup in FUSE/usersharefs via fuser will be useless.
File locking or open file lookup to see who has the file open could be one preventative solution.
cp and mv use the stat information; FUSE cannot rely on that alone, therefore FUSE on open needs to do something more intelligent, which may mean checking for a file lock, an open file in use, or some other method, e.g. priority of disk share, explicit vs. implicit include/exclude.
There are a couple of ways this can go; in any case the usershare/FUSE layer needs an adjustment, and that may depend on how Samba accesses the files, as per Tom's concern.
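A minimal sketch of the kind of "is this file already open" check being suggested, using an advisory flock; how shfs would actually hook this into its open handler is an open question, and the choice of lock type is an assumption:

```python
import fcntl, os, tempfile

# Hypothetical illustration, not shfs/FUSE code: an advisory lock lets a
# would-be truncating writer detect that a reader still has the file open.
path = os.path.join(tempfile.mkdtemp(), "shared.dat")
open(path, "wb").close()

reader = open(path, "rb")
fcntl.flock(reader, fcntl.LOCK_SH)   # reader holds a shared lock

writer = open(path, "r+b")
in_use = False
try:
    # A truncating writer would want an exclusive lock; LOCK_NB makes the
    # attempt fail immediately instead of blocking while the reader is active.
    fcntl.flock(writer, fcntl.LOCK_EX | fcntl.LOCK_NB)
except BlockingIOError:
    in_use = True   # the FUSE layer could deny or redirect the open here

assert in_use
reader.close()
writer.close()
```

This only works if every participant takes the locks, which is exactly why whether Samba opens the file once or many times matters.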
-
Here's a command line example of the problem. Note, I am not root. root access is not the issue.
rcotrone@unRAID:/mnt/disk1/TMP$ ls -l datafile.zip
-rw-rw-rw- 1 rcotrone users 282096032 2014-08-10 14:34 datafile.zip
rcotrone@unRAID:/mnt/disk1/TMP$ cp datafile.zip datafile.zip
cp: `datafile.zip' and `datafile.zip' are the same file
rcotrone@unRAID:/mnt/disk1/TMP$ mv datafile.zip datafile.zip
mv: `datafile.zip' and `datafile.zip' are the same file
rcotrone@unRAID:/mnt/disk1/TMP$ cd /mnt/user/TMP
rcotrone@unRAID:/mnt/user/TMP$ ls -l datafile.zip
-rw-rw-rw- 1 rcotrone users 282096032 2014-08-10 14:34 datafile.zip
rcotrone@unRAID:/mnt/user/TMP$ cp datafile.zip datafile.zip
cp: `datafile.zip' and `datafile.zip' are the same file
rcotrone@unRAID:/mnt/user/TMP$ mv datafile.zip datafile.zip
mv: `datafile.zip' and `datafile.zip' are the same file
rcotrone@unRAID:/mnt/disk1/TMP$ cp datafile.zip /mnt/user/TMP/datafile.zip
rcotrone@unRAID:/mnt/disk1/TMP$ ls -l datafile.zip
-rw-rw-rw- 1 rcotrone users 0 2014-08-10 14:35 datafile.zip
rcotrone@unRAID:/mnt/disk1/TMP$ mv -v datafile.zip /mnt/user/TMP/datafile.zip
`datafile.zip' -> `/mnt/user/TMP/datafile.zip'
mv: cannot open `datafile.zip' for reading: No such file or directory
rcotrone@unRAID:/mnt/disk1/TMP$ ls -l datafile.zip
/bin/ls: cannot access datafile.zip: No such file or directory
rcotrone@unRAID:/mnt/disk1/TMP$ cp VPS_048.zip datafile.zip
The device and inodes are different, which is part of the issue.
rcotrone@unRAID:/mnt/disk1/TMP$ stat datafile.zip
  File: `datafile.zip'
  Size: 282096032 Blocks: 551514 IO Block: 4096 regular file
Device: 901h/2305d Inode: 718782 Links: 1
Access: (0666/-rw-rw-rw-) Uid: ( 1000/rcotrone) Gid: ( 100/ users)
Access: 2014-08-10 14:37:49.000000000 -0400
Modify: 2014-08-10 14:37:50.000000000 -0400
Change: 2014-08-10 14:37:50.000000000 -0400
rcotrone@unRAID:/mnt/disk1/TMP$ stat /mnt/user/TMP/datafile.zip
  File: `/mnt/user/TMP/datafile.zip'
  Size: 282096032 Blocks: 551514 IO Block: 131072 regular file
Device: 10h/16d Inode: 165509 Links: 1
Access: (0666/-rw-rw-rw-) Uid: ( 1000/rcotrone) Gid: ( 100/ users)
Access: 2014-08-10 14:37:49.000000000 -0400
Modify: 2014-08-10 14:37:50.000000000 -0400
Change: 2014-08-10 14:37:50.000000000 -0400
This test should probably be done with a samba mount and/or a NFS mount.
Anyway, the protection is at the application level, FUSE obfuscates any potential application protection.
At the very least, if a matching share directory is 'excluded' or not explicitly 'included', perhaps there should be some priority adjustment so that files having the same name cannot be overwritten via an open/create.
I.E. It takes an explicit definition in order for a user share file to be the target of an open create.
-
If so, then it should be fairly easy to resolve this like Windows does, by simply appending a suffix to the new filename.
Tom said he was going to explore something with Samba to see if the file is opened once or multiple times. That could have a big impact on a solution.
http://lime-technology.com/forum/index.php?topic=34480.msg321241#msg321241
-
The issue isn't whether the docker apps are well behaved or not.
cp and mv protect you from overwriting the same file by checking the stat information.
same device/inode prevents an overwrite.
These tools fail to protect because the usershare reports different device/inodes.
These tools are well behaved; the core problem is the disconnect between one filesystem and the other.
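The protection cp/mv rely on boils down to comparing st_dev and st_ino; a hard link in a temp directory demonstrates the check that the user share defeats by reporting a different device/inode:

```python
import os, tempfile

d = tempfile.mkdtemp()
a = os.path.join(d, "a")
b = os.path.join(d, "b")
open(a, "wb").close()
os.link(a, b)   # hard link: two names, one underlying inode

def same_file(p1, p2):
    # The same-file test cp and mv perform before allowing an overwrite.
    s1, s2 = os.stat(p1), os.stat(p2)
    return (s1.st_dev, s1.st_ino) == (s2.st_dev, s2.st_ino)

assert same_file(a, b)   # cp/mv would refuse: "are the same file"
```

Through FUSE, /mnt/disk1/... and /mnt/user/... report different (st_dev, st_ino) pairs for the same physical file (see the stat output earlier in the thread), so this check never fires.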
The fuse/usershare layer needs an improvement or a general warning.
If the basic tools can protect you, then any solution we come up with should at least try to do the same.
To turn your back and not warn people or attempt to provide a potential solution would be irresponsible.
We only know how many people have been bitten by this issue because bjp999 has been voicing it.
If midnight commander is included to operate at the command line via telnet, can that be victim of the issue?
FWIW, I'm not expecting a full, complete solution in the next release, but we should continue to discuss and try to come up with ideas on the best way to protect people from themselves with the tools that are included in the operating system.
People are going to make mistakes, but the operating system should not add to those without warning.
-
One thing I'd like to add. We are not and cannot be responsible for what someone does to their system via command line. Root access completely eliminates our ability to do that. My view is that the solution must work for users but only outside the context of a command line method.
I disagree. cp or mv or some derivative via perl can be used by other programs and tools.
scripts, plugins and/or docker apps could possibly do this.
-
Knowing it is coming from a disk share and is going back to the same directory as the source would make this about the same level of difficulty as some of the options I listed.
It all depends on where the program is running and if the source / destination are the same inode and/or device id.
With disk mount and user share, they vary as can be seen in the stat information I posted previously.
This is why cp and mv fail, rsync succeeds because it uses hidden tmp files.
My thought is to do something more intelligent and intercept the open in fuse.
I.E. Inspect the type of open, If it's a read/write, then open it in place.
Or do some kind of lock or fcntl lock look up at the pids to see if the file is in use.
If so, then write to a temporary file and rename it after close, or version the files (i.e. mimic rsync).
Or version the files when they are opened for output with truncate.
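The rsync-style approach mentioned above (write to a hidden temp file in the same directory, then rename into place) can be sketched like this; safe_overwrite is a hypothetical helper, not rsync's actual code:

```python
import os, tempfile

def safe_overwrite(path, data):
    # Create the temp file in the target directory so the rename stays on
    # the same filesystem, which is what makes os.replace() atomic.
    d = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(prefix=".tmp.", dir=d)
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    os.replace(tmp, path)   # atomic swap: readers never see a truncated file

target = os.path.join(tempfile.mkdtemp(), "file.bin")
safe_overwrite(target, b"v1")
safe_overwrite(target, b"v2")
with open(target, "rb") as f:
    final = f.read()
assert final == b"v2"
```

The cost, as noted, is extra space for the duration of the write, since old and new copies coexist until the rename.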
-
WeeboTech's suggestion to require specific inclusion and not automatically include disks based on top level folder names would work as long as the user specifically removed a disk from the share before doing any copies from it to the share. But could still cause the same issue if a user simply copied from a disk share to the user share without removing the disk from the share first.
Perhaps the simplest "fix" would be to take a cue from Windows (and other OS's that handle it the same way) ... if a write (create) to a user share specifies a filename that already exists in the share, just append a suffix to the filename. i.e. if I copy a file, say Test.Doc, from a folder to the same folder in Windows, the name is automatically changed to Test - Copy.Doc. If I copy it again, it will automatically be called Test - Copy (2).doc; etc. This would eliminate the data loss ... at the "cost" of renaming a few files after you had copied them and then deleted the originals. And if you simply renamed the top level folder so it was no longer part of the share then the file wouldn't be renamed.
If I remember correctly, there was the topic of versioning. So the file could be versioned as the example describes.
But then how do you handle files that are updated incrementally? There will be continual versions.
Which means every write to update will require copying the file and writing to the new file.
It all depends on how versioning is used. On VMS, new files are made with filename;version,
and you can also set a limit on versions, so anything over 5 versions starts removing the earlier versions so only the latest 5 versions are available.
I'm not educated on FUSE, so I can only speculate, that the intercepted open call can handle all the renaming and pruning.
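The VMS-style versioning with a pruning limit could be sketched as follows; the filename;version naming and the limit of 5 come from the description above, everything else is an assumption for illustration:

```python
import os, tempfile

LIMIT = 5   # keep only the newest 5 versions, as in the VMS example

def versioned_write(d, name, data):
    # Collect existing version numbers for this name ("name;1", "name;2", ...).
    versions = sorted(int(f.rsplit(";", 1)[1]) for f in os.listdir(d)
                      if f.startswith(name + ";"))
    new = versions[-1] + 1 if versions else 1
    with open(os.path.join(d, "%s;%d" % (name, new)), "wb") as f:
        f.write(data)
    # Prune: counting the file just written, keep only the newest LIMIT.
    if len(versions) >= LIMIT:
        for old in versions[:len(versions) + 1 - LIMIT]:
            os.remove(os.path.join(d, "%s;%d" % (name, old)))

d = tempfile.mkdtemp()
for i in range(7):
    versioned_write(d, "capture.mp3", b"take %d" % i)

kept = sorted(int(f.rsplit(";", 1)[1]) for f in os.listdir(d))
assert kept == [3, 4, 5, 6, 7]   # 7 writes, only the latest 5 survive
```

An intercepted open in FUSE could do the same renaming and pruning on each truncating open, if that's where the hook ends up living.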
-
There is only ONE way that a user would know to avoid this issue - reading MY posts on the forum. An unknowing "intelligent user" would go into user shares, remove a disk from the share, and then copy data from that disk share to the user share. Perfectly logical. It's what any of us would reasonably do. And we'd lose all our copied data. (I could easily imagine a customer doing this, losing his kids' pictures, finding out LimeTech knew this and didn't warn him, and going after LimeTech for the data loss.) It is not intelligence that is needed to avoid the issue!
This could be resolved by not allowing implicit usershares to be created at top-level directories (via some option).
I.e. usershare configurations must specifically set which disks are included in the usershare.
Thus when you take a disk out of a usershare, FUSE/usershare no longer sees it.
This could also be resolved with a user renaming the directory first, then doing the move.
Either way a user needs to be educated.
With locks there may be a way to grab the pid of each process holding a file open, thus determining if an open read/open write with truncate is being done by the same process and catching the condition.
I would assume FUSE intercepts the open, and at that point can do the lookup if the open is a write that will truncate.
-
2. Detecting an overwriting of the same inode may not be technically possible due to the obfuscation that happens within the user shares
This 'may' be possible with file locks.
3. An option of detecting an attempt to copy a file from a disk share to a user share and omitting the disk share from the possible destination has been suggested and hopefully Tom will comment on whether that or something similar is possible.
If there were an option that could be turned on temporarily so that every new open for write caused the user share to re-balance the destination (i.e. write to a new location avoiding the current disk, write under the new name, and remove the old one), that would help here.
Frankly, the safest way to move files locally via console is with rsync as it creates a .tmpname before removing the old file.
-
The issue I see is that a disk file and a user share file do not share the same device and inode, which is why cp allows clobbering the file.
root@unRAID:/mnt/disk1/Music/DJMUSIC# ls -l /mnt/user/Music/DJMUSIC/VirtualDJ\ Local\ Database\ v6.xml /mnt/disk1/Music/DJMUSIC/VirtualDJ\ Local\ Database\ v6.xml
-rw-rw---- 1 rcotrone users 22487065 2012-09-01 16:42 /mnt/disk1/Music/DJMUSIC/VirtualDJ Local Database v6.xml
-rw-rw---- 1 rcotrone users 22487065 2012-09-01 16:42 /mnt/user/Music/DJMUSIC/VirtualDJ Local Database v6.xml
root@unRAID:/mnt/disk1/Music/DJMUSIC# stat /mnt/user/Music/DJMUSIC/VirtualDJ\ Local\ Database\ v6.xml /mnt/disk1/Music/DJMUSIC/VirtualDJ\ Local\ Database\ v6.xml
  File: `/mnt/user/Music/DJMUSIC/VirtualDJ Local Database v6.xml'
  Size: 22487065 Blocks: 43970 IO Block: 131072 regular file
Device: 10h/16d Inode: 164905 Links: 1
Access: (0660/-rw-rw----) Uid: ( 1000/rcotrone) Gid: ( 100/ users)
Access: 2013-03-20 12:19:03.000000000 -0400
Modify: 2012-09-01 16:42:03.000000000 -0400
Change: 2013-03-20 08:20:47.000000000 -0400
  File: `/mnt/disk1/Music/DJMUSIC/VirtualDJ Local Database v6.xml'
  Size: 22487065 Blocks: 43970 IO Block: 4096 regular file
Device: 901h/2305d Inode: 147702 Links: 1
Access: (0660/-rw-rw----) Uid: ( 1000/rcotrone) Gid: ( 100/ users)
Access: 2013-03-20 12:19:03.000000000 -0400
Modify: 2012-09-01 16:42:03.000000000 -0400
Change: 2013-03-20 08:20:47.000000000 -0400
-
The proposed new logic totally breaks how I use my server.
I organize my files read/write on the disk shares for speed.
I read some folders via the user shares for organizational consolidation.
For example, I have to capture the streamed music to the fastest disk share, then reorganize it after tagging onto the user share.
If someone was moving files from a disk share to a movie share, making the matching disk directories on the user share readonly would prevent them from being clobbered.
I.E. disabling writes to a specific disk that is participating in the user share.
The other option is to use rsync, which will copy to a .tmpname then move it into place.
Is there some way the user share layer can know a file is being opened for read and for read/write, and prevent the clobbering?
Maybe there's a way to turn on a no clobber switch temporarily if the file being read is the file that is going to be opened for writing.
Perhaps the user share versions the file during the open for write (renames it), allows the open for write on a new file, then removes the old file when it is closed. This means extra space is used for the duration of the write (as with rsync).
-
What about something like unionfs or aufs ?
What for? BTW I would probably never add those filesystems.
The feature that's wanted is a way to minimize what's stored on the flash. unionfs or aufs would allow two filesystems to look like one, thus minimizing what is stored on the flash while it still looks like it's on the flash.
With something like unionfs or aufs and root on tmpfs, the whole root tree can be made to have a transient ramdisk version and/or a full operating system version.
I tested it years ago; it worked for me. I'm sure they've come a long way since then. It was just a quick idea.
-
Maybe it's not as simple as it sounds, but I haven't seen discussion of that.
It's not that simple. The dentry table is an internal kernel table that cannot be swapped out.
The biggest possible advantage would be to have the rootfs changed to tmpfs which could swap out unused parts of the ram filesystem. That's a pretty big internal change to unRAID and I do not think that is going to happen.
Having swap space allows memory-hungry programs to swap out, or allows /var/log to be swapped out.
However, in a normal system this is not the case.
-
Part of it is also how many disks used, buffering (md driver settings) and kernel tunings.
They all come into play. More disks, higher buffering, less low memory.
64 Bit unRAID should alleviate that.
-
Maybe we need to fork this off and speak to limetech about an alternative location on the cache drive.
I did the ftw across 3 drives of data, storing all stat structures in a .gdbm file.
fetched 0 records, deleted 0 records, stored 1151935 records
real 71m26.353s
user 0m17.040s
sys 2m4.010s
root@unRAID:~# ls -l --si /tmp/statcache.gdbm
-rw-rw-r-- 1 root root 179M 2014-08-16 19:38 /tmp/statcache.gdbm
The issue with this approach is that scanning through all the keys to find a match can take time as the number of files increases.
Here's an example.
Using a bash loadable builtin, I'm able to access the gdbm file directly at the bash level.
A single key look up is pretty fast.
root@unRAID:/mnt/disk1/home/rcotrone/src.slacky# enable -f ./bash/bash-4.1/examples/loadables/gdbm gdbm
root@unRAID:/mnt/disk1/home/rcotrone/src.slacky# gdbm
gdbm: usage: gdbm [-euikvr] [-KVW array] file [key | key value ...]
time gdbm /tmp/statcache.gdbm /mnt/disk3/Music/music.mp3/Jazz/Various\ Artists/The\ Art\ Of\ Electro\ Swing/01\ Tape\ Five\ -\ Madame\ Coquette\ \(Feat.\ Yuliet\ Topaz\).mp3
<binary data here>
real 0m0.003s
user 0m0.010s
sys 0m0.000s
Yet traversing all the keys of 1 million files takes time.
root@unRAID:/mnt/disk1/home/rcotrone/src.slacky# time gdbm -k /tmp/statcache.gdbm | grep 'Jazzy Lounge - The Electro Swing Session' | wc -l
56
real 0m11.225s
user 0m14.280s
sys 0m2.290s
root@unRAID:/mnt/disk1/home/rcotrone/src.slacky# time gdbm -k /tmp/statcache.gdbm | wc -l
1,151,935
real 0m7.236s
user 0m3.930s
sys 0m8.350s
This might be faster with an sqlite table or mmap'ed file.
Point is, there's quite a bit that would have to go into this to cache the stat data outside of the kernel.
So kernel patches may be better than doing this at an application level. I don't know for sure.
When the next unRAID release is available with SQLite compiled into PHP, we can build a table with filenames and some stat data for a browser-based locate function. We'll also be able to store md5s in there too.
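A sketch of what such a SQLite table might look like (hypothetical schema, not anything shipped in unRAID); exact-path lookups use the primary-key index, and a locate-style substring search becomes a single SQL scan instead of piping every gdbm key through grep:

```python
import os, sqlite3, tempfile

# Hypothetical table: path plus a little stat data, with room for an md5.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE filestat (
                    path  TEXT PRIMARY KEY,
                    size  INTEGER,
                    mtime REAL,
                    md5   TEXT)""")

def index_tree(root):
    # Walk the tree once, caching stat data the way an ftw sweep would.
    for dirpath, _, files in os.walk(root):
        for name in files:
            p = os.path.join(dirpath, name)
            st = os.stat(p)
            conn.execute("INSERT OR REPLACE INTO filestat VALUES (?,?,?,NULL)",
                         (p, st.st_size, st.st_mtime))

# Tiny demo tree standing in for a real disk share.
d = tempfile.mkdtemp()
for name in ("a.mp3", "b.mp3", "notes.txt"):
    open(os.path.join(d, name), "wb").close()
index_tree(d)

# Locate-style query: one scan of the table rather than a key walk.
hits = conn.execute("SELECT COUNT(*) FROM filestat WHERE path LIKE '%.mp3'"
                    ).fetchone()[0]
assert hits == 2
```

An md5 column could be filled in lazily by the mover or a background scan, since it's far more expensive to compute than the stat fields.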