ljm42 Posted February 5, 2016

I recently defragmented my XFS array drives and I thought I'd document the process.

Note: defragmenting takes a long time, so if you decide to do this you should either be working directly on the console (i.e. with a keyboard and monitor attached directly to the system) or via SSH and screen (from the NerdPack plugin). If someone has success using Shell in a Box let me know; otherwise I recommend you avoid it.

First, I disabled my monthly parity check and Dynamix File Integrity cron jobs, just to ensure they wouldn't slow things down. I left Cache Dirs running.

Then I ran xfs_db on each drive to see whether it was fragmented. Here are the commands I typed for disk3, along with some comments and the results:

    # The -r means read-only, so for this step no changes are made
    # do not use /dev/sdXX
    # for 6.12 and newer use /dev/mdXp1
    # for 6.11 and older use /dev/mdX
    root@Tower:~# xfs_db -r /dev/md3
    # "frag" shows overall fragmentation
    xfs_db> frag
    actual 2031, ideal 1419, fragmentation factor 30.13%
    # "frag -d" shows directory fragmentation
    xfs_db> frag -d
    actual 29, ideal 29, fragmentation factor 0.00%
    # "frag -f" shows file fragmentation
    xfs_db> frag -f
    actual 1317, ideal 705, fragmentation factor 46.47%
    xfs_db> quit

I'm not sure what the cutoff should be, but I decided 46% file fragmentation was past it. So I ran this to defrag it:

    xfs_fsr -v /dev/md3

This is a 4TB drive with 3TB of data (mostly movies). It ran for nearly 48 hours! When I checked the results afterwards, fragmentation was improved, although not as much as I expected:

    root@Tower:~# xfs_db -r /dev/md3
    xfs_db> frag
    actual 1608, ideal 1419, fragmentation factor 11.75%
    xfs_db> frag -d
    actual 29, ideal 29, fragmentation factor 0.00%
    xfs_db> frag -f
    actual 894, ideal 705, fragmentation factor 21.14%
    xfs_db> quit

I ran it a second time, but the results were almost identical.
Afterwards:

I ran a parity check and there were no errors, confirming that parity stayed in sync through this process.
I used the "check" feature of the Dynamix File Integrity plugin and confirmed that the files were all fine.
Crashplan did not detect any changes in the files either.

So the process worked, although it didn't reduce fragmentation as much as I expected it to.

Open questions:

How often does the fragmentation level need to be checked?
What fragmentation % warrants a defrag? At what point does fragmentation actually become an issue? I'm guessing somewhere around 80-90%?

Anyway, hopefully this will help someone.
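For anyone wondering where the percentage comes from: every figure posted in this thread is consistent with the fragmentation factor being (actual - ideal) / actual, where both numbers are extent counts. The following is a minimal sketch of that arithmetic; `frag_factor` is a hypothetical helper name, not something that ships with xfsprogs.

```shell
#!/bin/sh
# frag_factor is a hypothetical helper (not part of xfsprogs).
# It recomputes the "fragmentation factor" from the two extent counts
# that xfs_db prints: (actual - ideal) / actual, as a percentage.
frag_factor() {
    awk -v actual="$1" -v ideal="$2" 'BEGIN {
        if (actual <= 0) { print "0.00"; exit }
        printf "%.2f\n", (actual - ideal) * 100 / actual
    }'
}

# Figures from the disk3 run above.
frag_factor 1317 705   # file fragmentation before the defrag
frag_factor 894 705    # file fragmentation after the defrag
```

Read this way, 46.47% appears to mean that about 46% of the file extents on the disk are surplus, which is not necessarily the same thing as 46% of the files being fragmented.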
ljm42 Posted February 5, 2016

A few other details...

I did have a minor issue related to the Dynamix File Integrity plugin, but it was easily solved. When the defrag first started moving files, I got warnings related to the temp files it creates:

    /bin/md5sum: /mnt/disk3/.fsr/ag3/tmp14052: No such file or directory
    stat: cannot stat '/mnt/disk3/.fsr/ag3/tmp14052' : No such file or directory
    stat: cannot stat '/mnt/disk3/.fsr/ag3/tmp14052' : No such file or directory
    setfattr: /mnt/disk3/.fsr/ag3/tmp14052 : No such file or directory
    getfattr: /mnt/disk3/.fsr/ag3/tmp14052 : No such file or directory

Luckily, xfs_fsr creates its temp directory in the root of the disk, so it looks like a user share. To solve this I just went to the File Integrity Settings page and told it to exclude the .fsr directory (you'll have to start the defrag before you have the option to exclude the directory).

I made a request for the File Integrity plugin to ignore this temp directory by default: https://lime-technology.com/forum/index.php?topic=44989.msg442250#msg442250

--

One of my drives had 7% directory fragmentation and no file fragmentation. I ran xfs_fsr but it had no effect. It is possible that Cache Dirs prevented changes to the directory structure, but I haven't looked into it.

Update 2/14 - I disabled Cache Dirs and re-ran it; it had no effect on directory fragmentation.
CHBMB Posted February 5, 2016

For a file archival system with the write-once, read-many usage pattern that a lot of users' media collections have, I suspect it's not a massive issue in everyday use. But I can see it being a possible issue when the array isn't used that way.

Interesting though, and I'll be following this thread keenly for my own education. Thanks for posting.
wgstarks Posted February 5, 2016

How did you exclude the .fsr directory in File Integrity? The only exclusion options I have to select from are my user shares. I don't see how to manually enter an exclusion.
ljm42 Posted February 5, 2016

wgstarks said:
How did you exclude the .fsr directory in File Integrity? The only exclusion options I have to select from are my user shares. I don't see how to manually enter an exclusion.

Once you start the defrag, it will create the .fsr folder. Since it is in the root of the disk, it will look like a user share and you'll be able to exclude it. I modified the description to hopefully make that clearer.
ljm42 Posted February 5, 2016

CHBMB said:
For a file archival system with the write-once, read-many usage pattern that a lot of users' media collections have, I suspect it's not a massive issue in everyday use. But I can see it being a possible issue when the array isn't used that way. Interesting though, and I'll be following this thread keenly for my own education. Thanks for posting.

Agreed, it may be that this isn't really an issue that we need to be concerned with. But I figured it was worth investigating now that defrag is an option. For science!

I was a little surprised that almost half of the files on my disk3 were fragmented. Even so, a single fragment in a 2 hour movie is not going to cause any problems.
BRiT Posted February 7, 2016

Of my 4 data drives, only 1 was highly fragmented, and only in the directory structure portion. That 4TB disk is 53% used at 2TB and had 17% directory fragmentation but only 6% file fragmentation. This drive is only used for TV show episodes.

I started the defrag late Thursday night (2016-02-04) and it's still going. I hope it finishes tonight.
BRiT Posted February 7, 2016

So my drive is still defragging, well over 48 hours I believe. I wonder if it has anything to do with also running cache_dirs and not shutting it down before the defrag process.
ljm42 Posted February 7, 2016

I wasn't able to find a "% complete" anywhere, so I don't know how to estimate how much longer it will run. It sounds like yours is running longer than mine did, but with such a small sample size I'm not sure what that proves.

One thing I did a couple of times was start another shell and re-run the xfs_db command while xfs_fsr was running, so I could see what progress it had made. It doesn't really tell you how much time is left though, since in my case it didn't take the fragmentation down to zero.

I have a theory that Cache Dirs might prevent the directories from defragmenting, but I still haven't disabled it to try. I'm curious to see how yours ends up.
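The "re-run xfs_db from another shell" check can be reduced to a single number per reading. The following is a rough sketch, assuming the output format shown in this thread; `frag_percent` is a hypothetical helper name, and the demonstration feeds it a line copied from the first post so it runs without touching any disk.

```shell
#!/bin/sh
# frag_percent is a hypothetical helper: it reads xfs_db "frag" output on
# stdin and prints just the percentage number(s), one per line.
frag_percent() {
    sed -n 's/.*fragmentation factor \([0-9.]*\)%.*/\1/p'
}

# Demonstration with a line copied from this thread (no disk access needed).
echo 'actual 2031, ideal 1419, fragmentation factor 30.13%' | frag_percent

# On a live system the input would come from xfs_db instead, for example:
#   xfs_db -r -c 'frag -f' /dev/md3 | frag_percent
# and a loop such as
#   while sleep 600; do xfs_db -r -c 'frag -f' /dev/md3 | frag_percent; done
# would print a reading every ten minutes until interrupted.
```

As noted above, a falling number shows progress but not time remaining, since the run may level off well above zero.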
BRiT Posted February 7, 2016

Before stats:

    #xfs_db -r /dev/md2
    xfs_db> frag
    actual 17087, ideal 15986, fragmentation factor 6.44%
    xfs_db> frag -d
    actual 565, ideal 473, fragmentation factor 16.28%
    xfs_db> frag -f
    actual 16522, ideal 15513, fragmentation factor 6.11%

Still-in-process stats:

    #xfs_db -r /dev/md2
    xfs_db> frag
    actual 16297, ideal 15986, fragmentation factor 1.91%
    xfs_db> frag -d
    actual 565, ideal 473, fragmentation factor 16.28%
    xfs_db> frag -f
    actual 15732, ideal 15513, fragmentation factor 1.39%
BRiT Posted February 7, 2016

Doing some more reading on this, it seems you can defrag a particular file if desired. Also, you can give it a duration to run and it will only run for that long, but it will produce a checkpoint file in /var/tmp/ so it can resume from that point the next time it's kicked off.

I think that it only defrags files, so it might not do anything at all on directories, but I'm not certain.

http://archive09.linux.com/feature/141404

From the article:

You can run xfs_fsr in two ways; either pass it a duration and it will loop through all your XFS filesystems, attempting to optimize the most fragmented files on each filesystem until that duration has passed, or you can explicitly defragment a specific XFS filesystem or file on an XFS filesystem. When you run xfs_fsr with a duration and it runs out of time, it stores information about what it was doing to a file in /var/tmp so that it can continue from the same point the next time it is executed with a duration. This way you can have a cron job perform a little bit of optimization every day when your machine is experiencing a period of low activity.

To optimize a file, xfs_fsr creates a new copy of an existing fragmented file with fewer extents (fragments) than the original one had. Once the file contents are copied to the new file, the filesystem metadata is updated so that the new file replaces the old one. This implies that you need to have enough free space on the filesystem to store another copy of anything that you want to defragment. The free space issue extends to disk quotas as well; you cannot defragment a file if storing another complete copy of that file would exceed the disk quota of the user that owns that file.
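Since the article says xfs_fsr needs room for a second copy of whatever it is defragmenting, a quick free-space check before targeting one large file may be worthwhile. This is only a sketch: `has_room` is a hypothetical helper, and it compares the file's size against the space df reports as available, ignoring quotas.

```shell
#!/bin/sh
# has_room is a hypothetical helper: it succeeds if the filesystem holding
# FILE has at least as much free space as FILE's own size, which the
# article quoted above says xfs_fsr needs for its temporary copy.
has_room() {
    file="$1"
    # File size in bytes, rounded up to whole 1024-byte blocks.
    size_kb=$(( ($(wc -c < "$file") + 1023) / 1024 ))
    # df -P prints one portable-format line per filesystem; with -k,
    # column 4 is the available space in 1024-byte blocks.
    free_kb=$(df -Pk "$file" | awk 'NR == 2 { print $4 }')
    [ "$free_kb" -ge "$size_kb" ]
}

# Intended use on the server (not executed here; the path is made up):
#   has_room /mnt/disk3/Movies/example.mkv &&
#       xfs_fsr -v /mnt/disk3/Movies/example.mkv
```

On a nearly full disk this would skip the largest files, which may be one reason a run stops short of 0% fragmentation, though that is a guess rather than something confirmed in this thread.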
ljm42 Posted February 7, 2016

By default, xfs_fsr will work on all your XFS drives for two hours (or a duration you specify) before stopping. I think the idea is that you could put it in a cron job and have it spend a few hours a day keeping things defragmented.

The problem is that "all your XFS drives" includes SSDs, and I don't want it to defrag my SSD. It is possible to pass it a file that lists only the drives you want it to defrag, but I figured it would be easier to pass a single drive on the command line. But when you do that, it ignores any duration you try to pass it.

If you need to stop it, I've read that it is safe to Ctrl-C, but I haven't tried it.

I haven't found a way to defrag just directories.
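If the duration really is ignored when a single drive is given, one possible workaround is to let timeout(1) from coreutils deliver the Ctrl-C for you after a fixed time. This is an untested idea rather than a recommendation: it rests on the same "I've read that it is safe" assumption about interrupting xfs_fsr, and `run_for` is a hypothetical wrapper name.

```shell
#!/bin/sh
# run_for is a hypothetical wrapper around timeout(1) from coreutils.
# After LIMIT seconds it sends SIGINT, the same signal Ctrl-C delivers,
# to the wrapped command.
run_for() {
    limit="$1"
    shift
    timeout -s INT "$limit" "$@"
}

# Intended use on the server (not executed here): stop after 6 hours.
#   run_for 21600 xfs_fsr -v /dev/md3
```

With GNU timeout, an exit status of 124 indicates the time limit was reached, so a script can tell "stopped by the limit" apart from "finished on its own".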
Adam64 Posted February 7, 2016

Thanks for posting this ljm42 and thanks to all for the discussion. I tested my drives and one of them was >80%, so defragging now. Thanks!
trurl Posted February 7, 2016

ljm42 said:
By default, xfs_fsr will work on all your XFS drives for two hours (or a duration you specify) before stopping. I think the idea is that you could put it in a cron job and have it spend a few hours a day keeping things defragmented. The problem is that "all your XFS drives" includes SSDs, and I don't want it to defrag my SSD. It is possible to pass it a file that lists only the drives you want it to defrag, but I figured it would be easier to pass a single drive on the command line. But when you do that, it ignores any duration you try to pass it. If you need to stop it, I've read that it is safe to Ctrl-C, but I haven't tried it. I haven't found a way to defrag just directories.

But this brings up the question: If you don't specify a drive, what does it think "all your XFS drives" means, all the sd or all the md?
ljm42 Posted February 8, 2016

trurl said:
But this brings up the question: If you don't specify a drive, what does it think "all your XFS drives" means, all the sd or all the md?

When you run it in fully automatic mode, it reads the list of drives from /etc/mtab and pulls everything that specifies XFS. Here is part of mine:

    /dev/md1 /mnt/disk1 xfs rw,noatime,nodiratime 0 0
    /dev/md2 /mnt/disk2 xfs rw,noatime,nodiratime 0 0
    /dev/md3 /mnt/disk3 xfs rw,noatime,nodiratime 0 0
    /dev/sdg1 /mnt/cache xfs rw,noatime,nodiratime 0 0

I confirmed that it will let me defrag the SSD cache drive (sdg1), although I cancelled it:

    root@Tower:~# xfs_db -r /dev/sdg1
    xfs_db> frag
    actual 153832, ideal 126418, fragmentation factor 17.82%
    xfs_db> quit
    root@Tower:~# xfs_fsr /dev/sdg1
    /mnt/cache start inode=0
    ^C
    root@Tower:~#

But when I try to run it on the sdXX versions of the hard drives, xfs_db works and xfs_fsr doesn't:

    root@Tower:~# xfs_db -r /dev/sdh1
    xfs_db> frag
    actual 902201, ideal 900454, fragmentation factor 0.19%
    xfs_db> quit
    root@Tower:~# xfs_fsr /dev/sdh1
    /dev/sdh1: not fsys dev, dir, or reg file, ignoring
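Given the mtab layout above, the "pass it a file that lists only the drives you want" route could look roughly like the following. It is a sketch only: `array_xfs_only` is a hypothetical filter name, the /tmp/fsr-mtab path is an arbitrary choice, and the -m (alternate mtab file) and -t (seconds) options are taken from the xfs_fsr man page rather than tested on Unraid.

```shell
#!/bin/sh
# array_xfs_only is a hypothetical filter: it reads mtab-format lines on
# stdin and keeps only XFS filesystems whose device is /dev/md*, dropping
# entries such as an SSD cache on /dev/sdg1.
array_xfs_only() {
    awk '$3 == "xfs" && $1 ~ /^\/dev\/md[0-9]/'
}

# Intended use on the server (not executed here); the list file path is
# an arbitrary choice:
#   array_xfs_only < /etc/mtab > /tmp/fsr-mtab
#   xfs_fsr -m /tmp/fsr-mtab -t 7200
```

Run this way, xfs_fsr stays in its timed multi-filesystem mode, so the duration should be honoured while the SSD is left alone.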
ljm42 Posted February 8, 2016

Adam64 said:
Thanks for posting this ljm42 and thanks to all for the discussion. I tested my drives and one of them was >80%, so defragging now. Thanks!

Thanks! I'd be interested in hearing how much it improves (and how long it takes).
Adam64 Posted February 8, 2016

ljm42 said:
Thanks! I'd be interested in hearing how much it improves (and how long it takes).

I had it run for 6 hours and it went from 80%ish to 0.44%. Very cool.
BRiT Posted February 8, 2016

Mine's still running.

    # xfs_db -c frag -r /dev/md2
    actual 16154, ideal 15986, fragmentation factor 1.04%
    # xfs_db -c 'frag -f' -r /dev/md2
    actual 15589, ideal 15513, fragmentation factor 0.49%
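The one-line `xfs_db -c` form above lends itself to checking several disks in a loop. The device name differs by Unraid version (per the first post: /dev/mdXp1 on 6.12 and newer, /dev/mdX on 6.11 and older), so this sketch picks whichever node exists. `array_device` and the `DEV_ROOT` variable are hypothetical names introduced only for illustration.

```shell
#!/bin/sh
# array_device is a hypothetical helper: given an array slot number it
# prints <root>/mdNp1 if that node exists (6.12 and newer, per the first
# post), otherwise <root>/mdN (6.11 and older).
# DEV_ROOT is an assumption added only so the function can be tried
# against a scratch directory; it defaults to /dev.
array_device() {
    root="${DEV_ROOT:-/dev}"
    if [ -e "$root/md${1}p1" ]; then
        echo "$root/md${1}p1"
    else
        echo "$root/md${1}"
    fi
}

# Intended use on the server (not executed here):
#   for n in 1 2 3; do
#       echo "disk$n: $(xfs_db -r -c 'frag -f' "$(array_device "$n")")"
#   done
```

The -r flag keeps each check read-only, as in the original commands.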
Adam64 Posted February 8, 2016

I started on my second drive that was 75% fragmented and it apparently ignored my -t 21400 as it's been running for over 12 hours. I'm concerned that it may be locked up now, as the GUI is unresponsive and I can't access SMB shares nor open another terminal to it.

Has anyone else seen this? Any thoughts on how to recover?
ljm42 Posted February 8, 2016

Adam64 said:
I started on my second drive that was 75% fragmented and it apparently ignored my -t 21400 as it's been running for over 12 hours. I'm concerned that it may be locked up now, as the GUI is unresponsive and I can't access SMB shares nor open another terminal to it. Has anyone else seen this? Any thoughts on how to recover?

In my experience, if you specify a drive it ignores any duration you pass it. And 12 hours isn't anything to worry about, based on what other people are seeing.

But if the system has actually locked up, that is new. Usually the best way to troubleshoot that is to look on the console for any messages. If you started the defrag from the console you can try Ctrl-C to cancel it. Depending on what you see on the console, you can either continue waiting or go ahead and power cycle.
Adam64 Posted February 8, 2016

ljm42 said:
In my experience, if you specify a drive it ignores any duration you pass it. And 12 hours isn't anything to worry about, based on what other people are seeing. But if the system has actually locked up, that is new. Usually the best way to troubleshoot that is to look on the console for any messages. If you started the defrag from the console you can try Ctrl-C to cancel it. Depending on what you see on the console, you can either continue waiting or go ahead and power cycle.

I started it from PuTTY, which seems to have hung up, so then I hooked up a monitor and keyboard to the box, but no joy -- blank screen and no reaction to a keypress. The HD light is flashing occasionally, but I would expect it to be more solid if it were defragging.

Is there any magic to hooking up a monitor and keyboard? HDMI monitor and USB keyboard.
Adam64 Posted February 8, 2016

Well... had to cycle power. BRB after my 10 hour parity check.
ljm42 Posted February 8, 2016

Adam64 said:
I started it from PuTTY, which seems to have hung up, so then I hooked up a monitor and keyboard to the box, but no joy -- blank screen and no reaction to a keypress. The HD light is flashing occasionally, but I would expect it to be more solid if it were defragging. Is there any magic to hooking up a monitor and keyboard? HDMI monitor and USB keyboard.

Not that I know of.

If you do go ahead and reboot, I'd recommend keeping the monitor plugged in so that if it crashes again you'll have a chance at figuring out why.
BRiT Posted February 8, 2016

Mine finished sometime late yesterday after I called it a night... It completely defragmented files. I'm talking 0% fragmentation! Directory fragmentation remained the same as before.

Before stats:

    #xfs_db -r /dev/md2
    xfs_db> frag
    actual 17087, ideal 15986, fragmentation factor 6.44%
    xfs_db> frag -f
    actual 16522, ideal 15513, fragmentation factor 6.11%
    xfs_db> frag -d
    actual 565, ideal 473, fragmentation factor 16.28%

After stats:

    # xfs_db -c frag -r /dev/md2
    actual 16078, ideal 15986, fragmentation factor 0.57%
    # xfs_db -c 'frag -f' -r /dev/md2
    actual 15513, ideal 15513, fragmentation factor 0.00%
    # xfs_db -c 'frag -d' -r /dev/md2
    actual 565, ideal 473, fragmentation factor 16.28%
wgstarks Posted February 8, 2016

I just started a drive defrag in console. Just out of curiosity, how do I know when the process is complete? I can tell it's running from the jump in CPU activity, but that's about it.