Cache-disk-like deletes


NAS


Cache disks offer a few benefits, the two biggest of which are write speed and predictable, efficient spinning-disk spin-up, i.e. a whole day's worth of writes will only cause a single array spin-up.

 

I wonder if, optionally, we could do the same with deletes, so that an array file delete only actually happens at the same time the mover script runs, and thus within the same single array spin-up.

 

Obviously the devil is in the details, on things like how we make a file look deleted when it really isn't "yet", and perhaps automatically bypassing this when it's obvious a user needs the disk space.
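One way to make a file "look deleted" without spinning up the array for a real unlink is to rename it into a hidden per-disk recycle directory (a rename stays on the same filesystem, so nothing is copied), then empty those directories when the mover runs. A minimal sketch, assuming a hypothetical `.recycle` directory name and hypothetical `deferred_delete`/`purge_on_mover` helpers:

```python
import os
import shutil

RECYCLE_DIR = ".recycle"  # hypothetical hidden recycle directory name


def deferred_delete(path):
    """Instead of unlinking, move the file into a hidden recycle
    directory next to it; the real delete happens later."""
    parent = os.path.dirname(path)
    recycle = os.path.join(parent, RECYCLE_DIR)
    os.makedirs(recycle, exist_ok=True)
    # rename() stays on the same filesystem, so no data is rewritten
    os.rename(path, os.path.join(recycle, os.path.basename(path)))


def purge_on_mover(share_root):
    """Called when the mover runs: empty every recycle directory
    while the disks are already spun up anyway."""
    for dirpath, dirnames, _ in os.walk(share_root):
        if RECYCLE_DIR in dirnames:
            shutil.rmtree(os.path.join(dirpath, RECYCLE_DIR))
            dirnames.remove(RECYCLE_DIR)  # don't descend into it
```

Until `purge_on_mover` runs, anything "deleted" this way is still recoverable from the `.recycle` directory, which also gives the one-day undo window mentioned below.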

 

An obvious side benefit is that you also get a one-day window to recover any accidentally deleted files. That in itself is a big deal.

 

Thoughts?



Essentially what you're looking for is a recycle bin that empties automatically at the same time that the mover operates.  This is an interesting idea.


With user shares it's possible: a .gdbm file could cache the list of files queued for deletion.

If a file is in the .gdbm file, the user share would skip it on directory reads.

This keeps the data out of RAM, keeps the disks spun down, and gives a highly accessible, rapid lookup, unless you are traversing the whole user share, in which case every file needs to be checked against the recycle cache.

If this cache were kept in RAM it would be very fast as well.
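The idea above can be sketched with Python's stdlib `dbm` module (which wraps gdbm where available): mark a path pending-delete by inserting its key, and filter directory listings against the store. The function names and the `b"1"` marker value are illustrative assumptions, not anything from the actual user-share code:

```python
import dbm  # stdlib key-value store; backed by gdbm where available


def mark_deleted(db_path, file_path):
    """Record a path as pending-delete in the on-disk key store."""
    with dbm.open(db_path, "c") as db:
        db[file_path.encode()] = b"1"  # value is just a presence marker


def visible_entries(db_path, entries):
    """Filter a directory listing, hiding anything queued for delete."""
    with dbm.open(db_path, "c") as db:
        return [e for e in entries if e.encode() not in db]
```

Each visibility check is a single hashed key lookup, so ordinary directory reads stay cheap; only a full-share traversal pays the cost of one lookup per file.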

 

I can tell you this much.

 

When traversing 300,000 files in a pre-cached filesystem, the lookups add time, but it's still feasible.

 

Case in point.

ftw64 traverses my disk 3, which has 300,000 files, in 1 second (pre-cached).

ftw64 traversing disk 3 while also checking each file for changes against the matching .gdbm stat cache: 4 seconds.

That's 300,000 lookups and memcmp calls on a key (the file path) and its stat struct.
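The traverse-and-compare step described above can be sketched as: walk the tree, pack the stat fields of interest into bytes, and compare against the cached record (the byte comparison playing the role of memcmp on a stat struct). Which fields to pack, and the cache layout, are assumptions for illustration:

```python
import dbm
import os
import struct


def pack_stat(st):
    """Pack the stat fields we care about into bytes, so the
    comparison is a single byte-string compare (like memcmp)."""
    return struct.pack("qqq", st.st_size, st.st_mtime_ns, st.st_ino)


def changed_files(root, cache_path):
    """Walk a tree, return paths whose stat no longer matches the
    cache, and refresh the cache as we go."""
    changed = []
    with dbm.open(cache_path, "c") as cache:
        for dirpath, _, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                key = path.encode()
                packed = pack_stat(os.stat(path))
                old = cache[key] if key in cache else None
                if old != packed:  # one key lookup + one byte compare
                    changed.append(path)
                    cache[key] = packed
    return changed
```

The first pass reports everything (cold cache); subsequent passes report only files whose size, mtime, or inode changed.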

 

While my goal is much different from a delete queue, it shows that lookups in the .gdbm table are very fast.

I tried redoing it with an in-memory linked list and could not achieve the same speed.

That's right: sequentially searching a memory-based linked list, over and over, for each of 300,000 files' status took longer than the on-disk .gdbm lookups.

 

Mostly because I was doing a sequential search. When it came down to it, implementing a hash-oriented lookup was more effort than I needed to prove or disprove viability and compare speed.
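That result makes sense: checking each of n files against an n-entry list sequentially is O(n²) comparisons overall, while a hashed table (gdbm, or a hash set in memory) makes each check an O(1) probe. A trivial sketch of the hashed approach, with hypothetical paths as the pending-delete index:

```python
# Hypothetical pending-delete index. A set is a hash table, so each
# membership probe is O(1) on average; scanning a linked list for each
# of n files makes the whole pass O(n^2) instead of O(n).
pending_delete = {"/mnt/disk3/movies/old.mkv", "/mnt/disk3/tmp/junk.dat"}


def is_pending(path):
    """One hash probe instead of a sequential scan."""
    return path in pending_delete
```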

 

There's also sqlite (albeit slower), or simply a flat file holding the list of files that were requested to be unlinked.
