NAS Posted October 5, 2014

Cache disks offer a few benefits, the two biggest of which are write speed and predictable, efficient disk spin-up, i.e. a whole day's worth of writes causes only a single array spin-up. I wonder if we could optionally do the same with deletes, so that an array file delete only actually happens when the mover script runs, and therefore within that same single spin-up. Obviously the devil is in the detail on things like how to make a file look deleted when it really isn't "yet", and perhaps auto-bypassing this when it is obvious a user needs the disk space. An obvious side benefit is that you also get a one-day window to recover any accidentally deleted files. That in itself is a big deal. Thoughts?
jonp Posted October 5, 2014

(quoting NAS's post above)

Essentially what you're looking for is a recycle bin that empties automatically at the same time the mover operates. This is an interesting idea.
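A minimal sketch of that recycle-bin idea, assuming a hypothetical queue file kept on the cache disk so that recording a delete never touches an array disk. The path, function names, and mover hook are all invented for illustration; this is not how Unraid's mover is actually implemented:

```python
import os

# Hypothetical queue file; in practice this would live on the cache
# disk so that queuing a delete never spins up an array disk.
QUEUE_FILE = "/tmp/pending-deletes.txt"

def queue_delete(path: str) -> None:
    """Record the file as deleted without actually unlinking it."""
    with open(QUEUE_FILE, "a") as q:
        q.write(path + "\n")

def run_mover() -> None:
    """At mover time the array is spinning up anyway, so perform the
    real unlinks in that same window, then clear the queue."""
    if not os.path.exists(QUEUE_FILE):
        return
    with open(QUEUE_FILE) as q:
        for line in q:
            path = line.strip()
            if path and os.path.exists(path):
                os.remove(path)
    os.remove(QUEUE_FILE)
```

Until `run_mover()` fires, every queued file is still on disk, which is exactly where the one-day undelete window comes from: recovery is just removing the entry from the queue.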
WeeboTech Posted October 5, 2014

With the user share it's possible: use a .gdbm file to cache the files marked for deletion. If a file is in the .gdbm file, the user share should skip it on directory read. This keeps the data out of RAM, leaves the disks spun down, and gives a highly accessible, rapid lookup. Unless you are traversing the whole user share, that is; then every file needs to be checked against the recycle cache. If the cache were kept in RAM it would be very fast too.

I can tell you this much: traversing 300,000 files in a pre-cached filesystem takes more time, but it's still feasible. Case in point: ftw64 traverses my disk 3, which has 300,000 files, in 1 second (precached). ftw64 traversing disk 3 while also checking for changes against the matching .gdbm stat cache takes 4 seconds. That's 300,000 lookups and memcmp's on a key (file) and stat struct.

While my goal is much different from a delete queue, it shows that lookup in the .gdbm table is very fast. I tried redoing it with an in-memory linked list and could not achieve the same speed. That's right, sequentially searching a memory-based linked list over and over for each of 300,000 files' status took longer, mostly because I was doing a sequential search. When it came down to it, implementing a hash-oriented lookup was more effort than I needed just to prove or disprove viability and compare speed. We also have sqlite (albeit slower), or a flat file listing the files that were requested to be unlinked.
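WeeboTech's keyed-lookup approach can be sketched with Python's stdlib `dbm` module (which wraps gdbm where available, matching the .gdbm files described above). The database path and function names are invented for illustration; the point is that hiding a queued file costs one keyed lookup per directory entry rather than a linear scan of the queue:

```python
import dbm
import os

# Hypothetical location for the delete-queue database; in practice
# this would sit on the cache disk.
DB_PATH = "/tmp/delete-queue.db"

def mark_deleted(path: str) -> None:
    """Add a file's full path to the delete-queue database."""
    with dbm.open(DB_PATH, "c") as db:
        db[path.encode()] = b"1"

def visible_entries(directory: str) -> list:
    """Directory listing that skips queued files: one keyed lookup
    per entry, the same pattern as checking the .gdbm stat cache."""
    with dbm.open(DB_PATH, "c") as db:
        return [
            name
            for name in sorted(os.listdir(directory))
            if os.path.join(directory, name).encode() not in db
        ]
```

The hashed on-disk table is what makes the per-entry check cheap; as noted above, a sequential in-memory list re-scanned for each of 300,000 files is actually slower.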
drumstyx Posted January 13, 2020

Necroing this thread to put in a vote for this. It's not a *huge* deal, but it would be very nice. It basically closes the loop on all writes happening only when the mover runs, with the parity drives staying spun down 23 hours a day.