Cache-disk-like deletes


NAS


Cache disks offer a few benefits, the two biggest of which are write speed and predictable, efficient spinning-disk spin-up, i.e. a whole day's worth of writes will only cause a single array spin-up.

 

I wonder if, optionally, we could do the same with deletes, so that an array file delete only actually happens at the same time the mover script runs, and thus within the same single array spin-up.

 

Obviously the devil is in the details, on things like how we make a file look deleted when it really isn't "yet", and perhaps automatically bypassing this when it's obvious a user needs the disk space.
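One way to make a file "look deleted" without spinning up the array for a real unlink is to rename it into a hidden per-disk recycle directory (a rename stays on the same filesystem, so nothing is copied), then empty those directories when the mover runs. A minimal sketch, assuming a hypothetical `.recycle` directory name and hypothetical `deferred_delete`/`purge_on_mover` helpers:

```python
import os
import shutil

RECYCLE_DIR = ".recycle"  # hypothetical hidden recycle directory name


def deferred_delete(path):
    """Instead of unlinking, move the file into a hidden recycle
    directory next to it; the real delete happens later."""
    parent = os.path.dirname(path)
    recycle = os.path.join(parent, RECYCLE_DIR)
    os.makedirs(recycle, exist_ok=True)
    # rename() stays on the same filesystem, so no data is rewritten
    os.rename(path, os.path.join(recycle, os.path.basename(path)))


def purge_on_mover(share_root):
    """Called when the mover runs: empty every recycle directory
    while the disks are already spun up anyway."""
    for dirpath, dirnames, _ in os.walk(share_root):
        if RECYCLE_DIR in dirnames:
            shutil.rmtree(os.path.join(dirpath, RECYCLE_DIR))
            dirnames.remove(RECYCLE_DIR)  # don't descend into it
```

Until `purge_on_mover` runs, anything "deleted" this way is still recoverable from the `.recycle` directory, which also gives the one-day undo window mentioned below.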

 

An obvious side benefit is that you also get a one-day window to recover any accidentally deleted files. That in itself is a big deal.

 

Thoughts?



Essentially what you're looking for is a recycle bin that empties automatically at the same time that the mover operates.  This is an interesting idea.


With user shares it's possible: a .gdbm file could cache the list of files queued for deletion.

If a file is in the .gdbm file, the user share would skip it on directory reads.

This keeps the data out of RAM, keeps the disks spun down, and gives a highly accessible, rapid lookup, unless you are traversing the whole user share, in which case every file needs to be checked against the recycle cache.

If this cache were kept in RAM it would be very fast as well.
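The idea above can be sketched with Python's stdlib `dbm` module (which wraps gdbm where available): mark a path pending-delete by inserting its key, and filter directory listings against the store. The function names and the `b"1"` marker value are illustrative assumptions, not anything from the actual user-share code:

```python
import dbm  # stdlib key-value store; backed by gdbm where available


def mark_deleted(db_path, file_path):
    """Record a path as pending-delete in the on-disk key store."""
    with dbm.open(db_path, "c") as db:
        db[file_path.encode()] = b"1"  # value is just a presence marker


def visible_entries(db_path, entries):
    """Filter a directory listing, hiding anything queued for delete."""
    with dbm.open(db_path, "c") as db:
        return [e for e in entries if e.encode() not in db]
```

Each visibility check is a single hashed key lookup, so ordinary directory reads stay cheap; only a full-share traversal pays the cost of one lookup per file.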

 

I can tell you this much.

 

When traversing 300,000 files in a pre-cached filesystem, the lookups add time, but it's still feasible.

 

Case in point.

ftw64 traverses my disk 3, which has 300,000 files, in 1 second (pre-cached).

ftw64 traversing disk 3 while also checking each file for changes against the matching .gdbm stat cache: 4 seconds.

That's 300,000 lookups and memcmp calls on a key (the file path) and its stat struct.
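The traverse-and-compare step described above can be sketched as: walk the tree, pack the stat fields of interest into bytes, and compare against the cached record (the byte comparison playing the role of memcmp on a stat struct). Which fields to pack, and the cache layout, are assumptions for illustration:

```python
import dbm
import os
import struct


def pack_stat(st):
    """Pack the stat fields we care about into bytes, so the
    comparison is a single byte-string compare (like memcmp)."""
    return struct.pack("qqq", st.st_size, st.st_mtime_ns, st.st_ino)


def changed_files(root, cache_path):
    """Walk a tree, return paths whose stat no longer matches the
    cache, and refresh the cache as we go."""
    changed = []
    with dbm.open(cache_path, "c") as cache:
        for dirpath, _, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                key = path.encode()
                packed = pack_stat(os.stat(path))
                old = cache[key] if key in cache else None
                if old != packed:  # one key lookup + one byte compare
                    changed.append(path)
                    cache[key] = packed
    return changed
```

The first pass reports everything (cold cache); subsequent passes report only files whose size, mtime, or inode changed.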

 

While my goal is much different from a delete queue, it shows that lookups in the .gdbm table are very fast.

I tried redoing it with an in-memory linked list and could not achieve the same speed.

That's right: sequentially searching a memory-based linked list, over and over, for each of 300,000 files' status took longer than the on-disk .gdbm lookups.

 

Mostly because I was doing a sequential search. When it came down to it, implementing a hash-oriented lookup was more effort than I needed to prove or disprove viability and compare speed.
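That result makes sense: checking each of n files against an n-entry list sequentially is O(n²) comparisons overall, while a hashed table (gdbm, or a hash set in memory) makes each check an O(1) probe. A trivial sketch of the hashed approach, with hypothetical paths as the pending-delete index:

```python
# Hypothetical pending-delete index. A set is a hash table, so each
# membership probe is O(1) on average; scanning a linked list for each
# of n files makes the whole pass O(n^2) instead of O(n).
pending_delete = {"/mnt/disk3/movies/old.mkv", "/mnt/disk3/tmp/junk.dat"}


def is_pending(path):
    """One hash probe instead of a sequential scan."""
    return path in pending_delete
```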

 

There's also sqlite (albeit slower), or simply a flat file holding the list of files that were requested to be unlinked.
