Accelerator drives


jonp


So what's the general consensus... keep this simple for the interim (potentially until it is formally accepted) and just use solely a max file size?

 

For me at least this is surprisingly predictable, grabbing almost exactly the files I would have defined myself, regardless of how complicated we could make it.

 

However, I am running it on a user share of similar data, and potentially we should be looking to do this at the disk level across all shares? Obviously that adds the extra complication of needing to check user share status, both in terms of "is it shared" but also ignoring cache-only shares. Perhaps this is a complication we don't need, at least until it is formally picked up?

 

 

Freddie, I expect I know the answer to this, but I can see that the date on some folders after the move is the time of the move and not that of the source folder as expected. I suspected this was caused when the file was more than one folder deep and only the deepest folder was getting the correct date, but that doesn't seem to be the case. I am not sure this makes 100% sense, and I have no clue what the correct action would be when the intermediate destination folders already exist (update timestamps or ignore), but I thought I would feed back. I would have thought that if a folder doesn't exist it should get the timestamp of the source folder?

Link to comment

You can use c (bytes) instead and then do the math in the shell itself.

 

-size -4097c would move files of 1 block (4096 bytes or less).
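For example, a quick dry run with find shows exactly what that limit would catch (a sketch; /mnt/disk1 is a placeholder path, with the shell doing the byte math):

find /mnt/disk1 -type f -size -$((4*1024 + 1))c
# find's -size -Nc matches files strictly smaller than N bytes,
# so -4097c means 4096 bytes or less, i.e. at most one 4 KiB block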

 

Thanks, I will likely do that.

 

You seem to like values around 4096c, and now you bring up the concept of blocks. Are you wanting to move small files onto a filesystem with a smaller block size?

Link to comment

So what's the general consensus... keep this simple for the interim (potentially until it is formally accepted) and just use solely a max file size?

 

I say keep it simple for now. But I don't see how the whole concept of Accelerator drives is going to benefit me all that much. For video management I use Kodi and all the metadata is stored in a shared database and thumbnails are cached on local clients.

 

NAS, have you actually moved files onto an accelerator drive? Have you realized any of the benefits you are looking for?

 

Freddie, I expect I know the answer to this, but I can see that the date on some folders after the move is the time of the move and not that of the source folder as expected. I suspected this was caused when the file was more than one folder deep and only the deepest folder was getting the correct date, but that doesn't seem to be the case. I am not sure this makes 100% sense, and I have no clue what the correct action would be when the intermediate destination folders already exist (update timestamps or ignore), but I thought I would feed back. I would have thought that if a folder doesn't exist it should get the timestamp of the source folder?

 

I really haven't paid much attention to the timestamps. My expectation is that if a folder doesn't exist on the destination disk, rsync will preserve the times. But if the directory already exists on the destination disk and its contents are changed, the filesystem will update the directory's modified time.
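A quick way to check that behavior (a sketch; the paths are placeholders, and -a is an assumption about how the copy is done, since diskmv's exact rsync options may differ):

rsync -a /mnt/disk1/share/somedir /mnt/disk2/share/
# -a implies -t (preserve modification times); compare the two mtimes:
stat -c '%y  %n' /mnt/disk1/share/somedir /mnt/disk2/share/somedir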

Link to comment

You can use c (bytes) instead and then do the math in the shell itself.

 

-size -4097c would move files of 1 block (4096 bytes or less).

 

Thanks, I will likely do that.

 

You seem to like values around 4096c, and now you bring up the concept of blocks. Are you wanting to move small files onto a filesystem with a smaller block size?

 

 

Currently, block sizes in filesystems are typically 4096 bytes. Advanced Format drives write in blocks of 4096, and I believe SSDs do as well.

Moving only files smaller than 1024 bytes or so seems like a waste of space.
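The block granularity is easy to demonstrate (a sketch; assumes a filesystem with 4096-byte blocks, such as ext4 or XFS):

printf x > tiny.txt
stat -c 'size=%s bytes, allocated=%b blocks of %B bytes' tiny.txt
# on a 4096-byte-block filesystem this reports 8 blocks of 512 bytes:
# a 1-byte file still consumes a full 4 KiB of disk space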

 

It can really be anything a user chooses based on spare space of an accelerator drive.

 

 

When reviewing my md5sum/folder.hash files on a huge filesystem, they range from 2k to 8k.

So with that in mind, if I wanted to consolidate all hash files somewhere else in the same type of tree, I might elect to move files meeting a certain naming standard that are 32K or less (28K was the largest I've seen).
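Something like this would preview that selection (a sketch; the path and name pattern are placeholders):

find /mnt/disk1 -type f -name 'folder.hash' -size -33k
# find's k unit is 1024 bytes and sizes are rounded up,
# so -size -33k matches files of 32 KiB or less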

 

 

At this point I don't see myself needing to accelerate these types of files. However, I might want to consolidate them and/or have a way to back them up. In the future I might have a need for the folder.par2 files to exist on a separate cache drive just for management purposes. I've not gotten that far with it.

Link to comment

One of the problems we are going to have is quantifying just how beneficial this is in a scientific way.

 

Personally I am not too bothered with that, as the logic is sound enough that there are no real downsides to putting all your small files on your super fast solid state drive. The SAN guys have been doing something similar for as long as I can remember, and whilst they do it at the block level far more dynamically than we do, the concept is the same.

 

Once I have completed moving all the files, I suspect I will have used no more than a few GB of the SSD. At this point I am going to move the next level of accelerator data across, which for me is kids' stuff. I have a few shows/movies that the kids watch over and over. I can tell what these are using the Kodi play count, since we don't have filesystem stats to use (which is a shame, as they could really help us expand this feature).

 

If I have any space left, then I will start considering my third tier of accelerator data, which is recently aired TV. I am thinking that a FILO-type system keeping the stuff most likely to be watched on the accelerator drive might be nice. Think of it like a second-tier cache drive.

 

Will post some rough space vs file count stats as I go.

Link to comment

Freddie, would it be possible to change diskmv to default to 4k but also allow the file size limit to be set at the command line? It feels a bit ugly for users to be editing the script long term.

 

Stats update:

 

Using the current default diskmv size cap resulted in 19,109 files being moved from my spinners onto the accelerator drive, for a whopping 79MB of disk space.

Link to comment

Freddie, would it be possible to change diskmv to default to 4k but also allow the file size limit to be set at the command line? It feels a bit ugly for users to be editing the script long term.

 

That is my plan, except I was going to increase the default size limit. I was also going to work on adding an option to specify file name extensions (the first cut will probably be hard-coded like the size limit is currently).
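For what it's worth, here is a minimal sketch of such option parsing in a shell script (the flag names and defaults are illustrative assumptions, not diskmv's actual interface):

# hypothetical getopts loop for a size limit (-s, in kB) and extensions (-e)
SIZE_KB=100
EXTENSIONS=""
while getopts "s:e:" opt; do
  case $opt in
    s) SIZE_KB=$OPTARG ;;
    e) EXTENSIONS=$OPTARG ;;
    *) echo "usage: $0 [-s size_kb] [-e ext,ext]" >&2; exit 1 ;;
  esac
done
shift $((OPTIND - 1))   # remaining args: share, source disk, destination disk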

 

I would still like to hear about any benefits you see from the accelerator drive, doesn't need to be quantitative. Are you spinning up drives less frequently? Are you seeing less lag in media players?

Link to comment

I don't think I will "feel" any different, other than the occasional shorter pauses as things spin up. This should be more marked once I up the file size ceiling beyond subtitle (srt) file size.

 

Time permitting I will do another run changing to 100k tomorrow.

Link to comment

(the first cut will probably be hard-coded like the size limit is currently)

 

For the future, you can specify the size in a variable with a default and also let it be overridden on the command line.

 

SIZE=${SIZE:=4096}

 

 

Equates to SIZE=4096 if nothing is in the environment.

If you run the script with

 

SIZE=1024000 ./diskmvscript

 

SIZE will be overridden by the value passed in from the environment.

 

It's a quick way to provide defaults with overrides while developing.

 

Link to comment

Thanks, that's a good tip.

 

I went ahead and upped the ceiling to 100kB. This is based on a best-guesstimate figure from some ad hoc "find" testing.
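For anyone wanting to repeat that kind of testing, something like this counts the candidate files and totals their size (a sketch; the path is a placeholder):

find /mnt/disk1 -type f -size -100k -printf '%s\n' |
  awk '{ n++; s += $1 } END { printf "%d files, %.1f MB\n", n, s/1024/1024 }'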

 

I am still quite impressed: the stats are now 53,748 files consuming only 1.2GB.

 

To go beyond this I am in danger of picking up stuff I don't want, so really at this point I need to be thinking in terms of "up to xxMB but only if filetype X, Y or Z".
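That combined filter is straightforward to express with find (a sketch; the size cap and extension list are placeholders for whatever a user would choose):

find /mnt/disk1 -type f -size -10M \( -iname '*.nfo' -o -iname '*.srt' -o -iname '*.jpg' \)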

 

Link to comment

The master branch of diskmv on GitHub now has an argument for the small file option. The user is required to specify the max size in kilobytes; there is no default max size limit. There is also a new option to specify file name extensions.

 

https://github.com/trinapicot/unraid-diskmv

 

example:

root@tower:~# diskmv -e 'txt,n?o,ha sh' -s 4 /mnt/user/dup/test/ cache disk2
Running in test mode, no files will be moved.
Moving /mnt/cache/dup/test into /mnt/disk2/dup/test
./dup/test/xyz zyx/exttest.ha sh
./dup/test/xyz zyx/exttest.nfo
./dup/test/xyz zyx/exttest.txt
./dup/test/exttest.nfo
./dup/test/exttest.txt
diskmv finished
... but it ran in test mode

 

 

Link to comment

It would seem that either the upgrade to v6b14b or an array restart has resulted in this no longer working at all.

 

A disk excluded from the array in global settings is now completely excluded. Before, it was still part of the array but not used in any write actions.

 

This screws us 100%

 

LT/jonp can you look into this?

Link to comment

I would still like to hear about any benefits you see from the accelerator drive, doesn't need to be quantitative. Are you spinning up drives less frequently? Are you seeing less lag in media players?

 

I am prepared to comment on this now. It is unscientific, but I have always found that certain usage patterns would starve cache_dirs and result in a wide set of disk spin-ups. On the face of it, populating the Accelerator drive reduces the impact of this considerably. The Accelerator drive is spun up a lot, but being an SSD I don't really care about this, as the spinners are down more than ever (as in almost always).

 

Unfortunately the only way to use the Accelerator drive now is as part of the normal array, which appears to be due to a change in the way globally excluded drives are handled post v6b14b.

 

I am convinced more than ever we want this as an official feature.

Link to comment

Suggestion #1: -e should be case insensitive.

Suggestion #2: a summary at the end of the count and size moved would be very helpful

 

Suggestion #1: sure

Suggestion #2: probably not any time soon

 

It would seem that either the upgrade to v6b14b or an array restart has resulted in this no longer working at all.

 

A disk excluded from the array in global settings is now completely excluded. Before, it was still part of the array but not used in any write actions.

 

That's how I expected the global settings to work, based on my recollection of some explanations limetech posted in the user share copy bug discussions. I think there is a good chance the array restart caused the change in behavior, not the version upgrade.

 

Unfortunately the only way to use the Accelerator drive now is as part of the normal array, which appears to be due to a change in the way globally excluded drives are handled post v6b14b.

 

Can you make it work by excluding the Accelerator drive in the individual user share settings?

Link to comment

I will need to test properly.

 

I really need to get a new USB key and get a test rig going again... hmm, time for VMware I think, although I was hoping for some real LT feedback on what actually happens on the back end rather than trying to derive it from observation.

Link to comment

It would seem that either the upgrade to v6b14b or an array restart has resulted in this no longer working at all.

 

A disk excluded from the array in global settings is now completely excluded. Before, it was still part of the array but not used in any write actions.

 

This screws us 100%

 

LT/jonp can you look into this?

 

There is a "global" user share include/exclude mask configured on Settings/Global Share Settings page.  These masks define which disks will be put in the 'shfs' union.  For example if you select Disk 1 and Disk 2 to be excluded, then any shares/files on those disks will be completely invisible via user shares.  These settings may only be changed when the array is Stopped (though in certain beta releases there is a bug which lets you change these settings with array started, but these settings will not take place until next array stop/start).

 

Next there are per-user-share include/exclude masks, configured by clicking the named share link on the Shares page. These masks define which disks are eligible to have new objects (directories and files) created on them, with two exceptions:

1. If the share is configured to use the cache, and the cache has available space, the new object will be created there.

2. If the directory where the new object is to be created is beyond the share's 'split level', then the new object will be created on the disk which contains the parent directory.

 

These settings take effect immediately when applied (no need to stop/start the array).

 

Other details:

- A 'cache only' share forces new objects to always be created on the cache disk/pool. If there is not enough space to create a new object there, the operation fails with an "out of space" error.

- The 'mover' follows these same rules when moving objects off the cache disk/pool to the unraid array.

- Note that these rules apply to new object creation. A file truncate followed by a write/append, or an append, or simply opening a file for write, all operate on the file on whatever disk it currently exists on.

 

Why both include and exclude? It just lets you look at this function in two different ways. Sometimes you want to partition your set of disks so that, say, Disk1 is for "Backup", Disks 2-4 are for "Photos", and all remaining disks are for "Videos". In this case it's convenient to:

- set included mask to "Disk1" for Backup

- set included mask to "Disk2, Disk3, Disk4" for Photos

- set excluded mask to "Disk1, Disk2, Disk3, Disk4" for Videos.

 

If an 'included' mask is left blank it means "select all disks".

If an 'excluded' mask is left blank it means "select no disks".

 

Finally, while on the subject: if you want to move all files off a particular disk, moving them to other disks via the user share file system, the only safe way to do it is to first remove the source disk from user shares using the Global Share Settings included/excluded masks.

Link to comment

Finally, while on the subject: if you want to move all files off a particular disk, moving them to other disks via the user share file system, the only safe way to do it is to first remove the source disk from user shares using the Global Share Settings included/excluded masks.

I think I understand what you mean by moving them to other disks via the user share file system, but if you remove the source disk from the global share settings, then by definition its files are no longer shown in the user share file system.

 

Am I correct in restating what you said if I say...

 

The only safe way to mix user shares and disk shares in a move operation is if the disk share in question is excluded in the Global Share Settings?

 

I was under the impression that moving files is currently completely safe as long as the source and destination are either both user shares or both disk shares; the risk occurs when you move from a disk share to a user share or vice versa.

Link to comment

... I was under the impression that moving files is currently completely safe as long as the source and destination are either both user shares or both disk shares; the risk occurs when you move from a disk share to a user share or vice versa.

 

No, the only "safe" way other than using the Global exclude as Tom noted is to ensure BOTH the source and destination are NOT user shares (i.e. both should be disk shares).    This is the "user share copy bug", which can result in significant data loss.  It's arguable whether it's really a "bug" or just a consequence of the structure of the share unions ... but nevertheless it's something to be avoided.

 

Tom has looked into this quite a bit, and his suggestion to simply use the Global excludes is the only safe way to do moves using the user share system.

 

 

 

Link to comment

... I was under the impression that moving files is currently completely safe as long as the source and destination are either both user shares or both disk shares; the risk occurs when you move from a disk share to a user share or vice versa.

 

No, the only "safe" way other than using the Global exclude as Tom noted is to ensure BOTH the source and destination are NOT user shares (i.e. both should be disk shares).    This is the "user share copy bug", which can result in significant data loss.  It's arguable whether it's really a "bug" or just a consequence of the structure of the share unions ... but nevertheless it's something to be avoided.

So if I understand you correctly, Gary, you just told me that I can lose files if I move them using only user shares and not involving disk shares at all? If so, that's definitely a new bug that I was not aware of. I've been moving files around in the user share structure for as long as I've had unraid, and never had an issue.

 

Could you explain to me under what circumstances I would expect to see data loss when moving data using only the user shares?

Link to comment

When the source and destination are the same file. It will read a little bit of the source file, then create the destination file by zeroing it out, which clobbers the source file.
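The truncate-first behavior is easy to demonstrate safely with plain shell redirection (a sketch; use a throwaway file):

echo 'important data' > scratch.txt
cat scratch.txt > scratch.txt
# the shell truncates scratch.txt to zero bytes before cat even runs,
# so the data is already gone; GNU cat then notices input == output
# and aborts, but too late. The user share copy bug clobbers files
# in an analogous way.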

 

Examples of UNSAFE copies:

 

mv /mnt/user/Movies/MovieName /mnt/user/Movies/MovieName

mv /mnt/disk#/Movies/MovieName /mnt/user/Movies/MovieName

Link to comment

When the source and destination are the same file. It will read a little bit of the source file, then create the destination file by zeroing it out, which clobbers the source file.

 

Examples of UNSAFE copies:

 

mv /mnt/user/Movies/MovieName /mnt/user/Movies/MovieName

mv /mnt/disk#/Movies/MovieName /mnt/user/Movies/MovieName

That is exactly what I understood as well. It's only dangerous if you mix /mnt/disk and /mnt/user paths.
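For contrast, a move that keeps both ends on disk shares avoids the overlap entirely (a sketch; the paths are placeholders):

mv /mnt/disk1/Movies/MovieName /mnt/disk2/Movies/MovieName
# both paths bypass the shfs union, so the destination can never
# resolve to the same underlying file as the source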
Link to comment
