Defrag XFS array drives


ljm42


I recently defragmented my XFS array drives and I thought I'd document the process.

 

Note: defragmenting takes a long time, so if you decide to do this you should either be working directly on the console (i.e. with a keyboard and monitor attached directly to the system) or via SSH and screen (from the NerdPack plugin).  If someone has success using Shell in a Box let me know, otherwise I recommend you avoid it.

 

 

First, I disabled my monthly parity check and Dynamix File Integrity cron jobs, just to ensure they wouldn't slow things down.  I left Cache Dirs running.

 

Then I ran xfs_db on each drive to see whether they were fragmented.  Here are the commands I typed for disk3, along with some comments and the results:

 

# The -r means read-only, so for this step no changes are made
# do not use /dev/sdXX
# for 6.12 and newer use /dev/mdXp1
# for 6.11 and older use /dev/mdX
root@Tower:~# xfs_db -r /dev/md3

# "frag" shows overall fragmentation
xfs_db> frag
actual 2031, ideal 1419, fragmentation factor 30.13%

# "frag -d" shows directory fragmentation
xfs_db> frag -d
actual 29, ideal 29, fragmentation factor 0.00%

# "frag -f" shows file fragmentation
xfs_db> frag -f
actual 1317, ideal 705, fragmentation factor 46.47%

xfs_db> quit
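
For reference, the percentages in this output are consistent with (actual - ideal) / actual, where "actual" and "ideal" are extent counts. Here is a minimal sketch that recomputes the factor from one line of frag output. The function name frag_factor is made up for illustration, and the formula is inferred from the numbers shown here, so treat it as an assumption rather than a statement about how xfs_db is implemented.

```shell
#!/bin/sh
# frag_factor: hypothetical helper. Reads one "actual N, ideal M, ..." line
# on stdin and prints (actual - ideal) / actual as a percentage.
frag_factor() {
    awk '{
        gsub(",", "")                 # strip commas so $2 and $4 are plain numbers
        actual = $2; ideal = $4
        if (actual > 0) printf "%.2f%%\n", (actual - ideal) * 100 / actual
        else            print "0.00%"
    }'
}

# Example with the file fragmentation line from disk3:
# echo "actual 1317, ideal 705, fragmentation factor 46.47%" | frag_factor   # prints 46.47%
```

Read this way, 46.47% means that 612 of the 1317 file extents are "extra" ones beyond the ideal layout, not that 46% of the data is out of place.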
 

 

I'm not sure what the cutoff should be, but I decided 46% file fragmentation was past it.  So I ran this to defrag it:

 

xfs_fsr -v /dev/md3
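
If you want to script this across Unraid versions, the device naming rule from the comments earlier (mdXp1 for 6.12 and newer, mdX for 6.11 and older) could be wrapped in a small helper. This is only a sketch: md_device is a hypothetical name, and the version string is passed in by hand because how you look it up on your system is not covered here.

```shell
#!/bin/sh
# md_device: hypothetical helper that maps an array disk number and an Unraid
# version string to the device name described in the comments above.
md_device() (
    disk="$1"
    version="$2"
    major="${version%%.*}"      # "6.12.4" -> "6"
    rest="${version#*.}"        # "6.12.4" -> "12.4"
    minor="${rest%%.*}"         # "12.4"   -> "12"
    if [ "$major" -gt 6 ] || { [ "$major" -eq 6 ] && [ "$minor" -ge 12 ]; }; then
        echo "/dev/md${disk}p1"
    else
        echo "/dev/md${disk}"
    fi
)

# md_device 3 6.11.5    # prints /dev/md3
# md_device 3 6.12.4    # prints /dev/md3p1
# xfs_db -r "$(md_device 3 6.12.4)"
```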
 

 

This is a 4TB drive with 3TB of data (mostly movies).  It ran for nearly 48 hours!  When I checked the results afterwards, fragmentation was improved although not as much as I expected:

 

root@Tower:~# xfs_db -r /dev/md3
xfs_db> frag
actual 1608, ideal 1419, fragmentation factor 11.75%
xfs_db> frag -d
actual 29, ideal 29, fragmentation factor 0.00%
xfs_db> frag -f
actual 894, ideal 705, fragmentation factor 21.14%
xfs_db> quit
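
One way to put a number on the improvement is to look at how many of the above-ideal extents were eliminated, rather than comparing the two percentages directly. The sketch below does that arithmetic; excess_removed is a hypothetical helper name and this metric is just one reasonable way to read the numbers, not something xfs_db reports.

```shell
#!/bin/sh
# excess_removed: hypothetical helper. Arguments: ideal, actual-before, actual-after.
# Prints the share of above-ideal extents that the defrag eliminated.
excess_removed() {
    awk -v ideal="$1" -v before="$2" -v after="$3" 'BEGIN {
        excess = before - ideal
        if (excess <= 0) { print "0.0%"; exit }
        printf "%.1f%%\n", (before - after) * 100 / excess
    }'
}

# disk3 file extents: ideal 705, 1317 before, 894 after
# excess_removed 705 1317 894    # prints 69.1%
```

So the run removed 423 of the 612 extra file extents, even though the fragmentation factor only dropped from 46.47% to 21.14%.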
 

 

I ran it a second time, but the results were almost identical.

 

Afterwards:

  • I ran a parity check and there were no errors, confirming that parity is not affected by this process.
  • I used the "check" feature of the Dynamix File Integrity plugin and confirmed that the files were all fine.
  • Crashplan did not detect any changes in the files either.

So the process worked, although it didn't reduce fragmentation as much as I expected it to. 

 

Open questions:

  • How often does the fragmentation level need to be checked?
  • What fragmentation % warrants a defrag?
  • At what point does fragmentation actually become an issue?  I'm guessing somewhere around 80-90%?

 

Anyway, hopefully this will help someone.

 


A few other details...

 

I did have a minor issue related to the Dynamix File Integrity plugin, but it was easily solved.  When the defrag first started moving files, I got warnings related to the temp files it creates:

 

/bin/md5sum: /mnt/disk3/.fsr/ag3/tmp14052: No such file or directory
stat: cannot stat '/mnt/disk3/.fsr/ag3/tmp14052' : No such file or directory
stat: cannot stat '/mnt/disk3/.fsr/ag3/tmp14052' : No such file or directory
setfattr: /mnt/disk3/.fsr/ag3/tmp14052 : No such file or directory
getfattr: /mnt/disk3/.fsr/ag3/tmp14052 : No such file or directory

 

Luckily, xfs_fsr creates its temp directory in the root of the disk, so it looks like a user share.  To solve this I just went to the File Integrity Settings page and told it to exclude the .fsr directory (you'll have to start the defrag before you have the option to exclude the directory).

 

I made a request for the File Integrity plugin to ignore this temp directory by default:

  https://lime-technology.com/forum/index.php?topic=44989.msg442250#msg442250

 

--

 

One of my drives had 7% directory fragmentation and no file fragmentation.  I ran xfs_fsr but it had no effect.  It is possible that Cache Dirs prevented changes to the directory structure, but I haven't looked into it.

 

Update 2/14 - I disabled Cache Dirs and re-ran xfs_fsr; it had no effect on directory fragmentation.


For a file archival system used in the write-once, read-many way that a lot of users' media collections are, I suspect it's not a massive issue in everyday use.

 

But I can see it being a possible issue when the above isn't the way the array is being used. 

 

Interesting though and I'll be following this thread keenly for my own education.  Thanks for posting.


How did you exclude the .fsr directory in File Integrity? The only exclusion options I have to select from are my user shares. I don't see how to manually enter an exclusion.

 

Once you start the defrag, it will create the .fsr folder.  Since it is in the root, it will look like a user share and you'll be able to exclude it.  I modified the description to hopefully make that more clear.


Agreed, it may be that this isn't really an issue that we need to be concerned with.  But I figured it was worth investigating now that defrag is an option. For science!

 

I was a little surprised that almost half of the files on my disk3 were fragmented.  Even so, a single fragment in a 2 hour movie is not going to cause any problems :)


Of my 4 data drives, only 1 of them was highly fragmented but only in the directory structure portion. That 4TB disk is 53% used at 2TB and had 17% directory fragmentation but only 6% file fragmentation. This drive is only used for TV Show episodes.

 

I started the defrag late Thursday night (2016-02-04) and it's still going. I hope it finishes tonight.

 

 


I wasn't able to find a "% complete" anywhere, so I don't know how to estimate how much longer it will run.  It sounds like yours is running longer than mine did, but with such a small sample size I'm not sure what that proves.

 

One thing I did a couple of times was start another shell and re-run the xfs_db command while xfs_fsr was running, so I could see what progress it had made.  It doesn't really tell you how much time is left though, since in my case it didn't take the fragmentation down to zero.
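
For anyone who wants to automate that check, here is a rough sketch of a polling loop. It relies on the non-interactive form of xfs_db (-c to pass a command, -r for read-only); the function names frag_percent and watch_frag and the 600 second default are made up for illustration. It shows a trend only and is not a time estimate.

```shell
#!/bin/sh
# frag_percent: print just the file fragmentation percentage for a device.
# Uses "xfs_db -r -c 'frag -f' DEVICE"; -r keeps the check read-only.
frag_percent() {
    xfs_db -r -c 'frag -f' "$1" | awk '{ print $NF }'
}

# watch_frag: hypothetical loop that logs a timestamped reading every
# INTERVAL seconds (default 600) until interrupted with Ctrl-C.
watch_frag() (
    device="$1"
    interval="${2:-600}"
    while :; do
        printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$(frag_percent "$device")"
        sleep "$interval"
    done
)

# Usage, from a second shell while xfs_fsr is running:
# watch_frag /dev/md3 600
```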

 

I have a theory that Cache Dirs might prevent the directories from defragmenting, but I still haven't disabled it to try.  I'm curious to see how yours ends up.


Before Stats:

#xfs_db -r /dev/md2

xfs_db> frag

actual 17087, ideal 15986, fragmentation factor 6.44%

xfs_db> frag -d

actual 565, ideal 473, fragmentation factor 16.28%

xfs_db> frag -f

actual 16522, ideal 15513, fragmentation factor 6.11%

 

 

Still in process stats:

#xfs_db -r /dev/md2

xfs_db> frag

actual 16297, ideal 15986, fragmentation factor 1.91%

xfs_db> frag -d

actual 565, ideal 473, fragmentation factor 16.28%

xfs_db> frag -f

actual 15732, ideal 15513, fragmentation factor 1.39%

 


Doing some more reading on this, and it seems you can defrag a particular file if desired. Also, you can give it a duration to run and it will only run for that long, but will produce a checkpoint file in /var/tmp/ so it can resume from that point the next time it's kicked off.

 

I think that it only defrags files, so it might not do anything at all on directories, but I'm not certain.

 

http://archive09.linux.com/feature/141404

You can run xfs_fsr in two ways; either pass it a duration and it will loop through all your XFS filesystems, attempting to optimize the most fragmented files on each filesystem until that duration has passed, or you can explicitly defragment a specific XFS filesystem or file on an XFS filesystem. When you run xfs_fsr with a duration and it runs out of time, it stores information about what it was doing to a file in /var/tmp so that it can continue from the same point the next time it is executed with a duration. This way you can have a cron job perform a little bit of optimization every day when your machine is experiencing a period of low activity.

 

To optimize a file, xfs_fsr creates a new copy of an existing fragmented file with fewer extents (fragments) than the original one had. Once the file contents are copied to the new file, the filesystem metadata is updated so that the new file replaces the old one. This implies that you need to have enough free space on the filesystem to store another copy of anything that you want to defragment. The free space issue extends to disk quotas as well; you cannot defragment a file if storing another complete copy of that file would exceed the disk quota of the user that owns that file.
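
Since the article says a full second copy of the file has to fit on the same filesystem, a pre-check before defragmenting a single large file might look like the sketch below. The helper names are hypothetical, the stat and df options are the GNU coreutils forms, and quotas are not considered at all.

```shell
#!/bin/sh
# fits_in_free_space: succeed if FILE_BYTES ($1) is strictly less than FREE_BYTES ($2).
fits_in_free_space() {
    [ "$1" -lt "$2" ]
}

# can_defrag_file: hypothetical pre-check for a single file.
# stat -c %s prints the file size in bytes; df --output=avail -B1 prints the
# available bytes on the filesystem that holds the file (second output line).
can_defrag_file() (
    file="$1"
    size=$(stat -c %s "$file") || exit 1
    free=$(df --output=avail -B1 "$file" | awk 'NR == 2 { print $1 }')
    fits_in_free_space "$size" "$free"
)

# Usage (the path is only an example):
# can_defrag_file /mnt/disk3/Movies/example.mkv && xfs_fsr -v /mnt/disk3/Movies/example.mkv
```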


By default, xfs_fsr will work on all your XFS drives for two hours (or a duration you specify) before stopping.  I think the idea is that you could put it in a cron job and have it spend a few hours a day keeping things defragmented.

 

The problem is that "all your XFS drives" includes SSDs, and I don't want it to defrag my SSD.  It is possible to pass it a file that lists only the drives you want it to defrag, but I figured it would be easier to pass a single drive on the command line.

 

But then when you do that, it ignores any duration you try to pass it.  If you need to stop it, I've read that it is safe to ctrl-c but I haven't tried it.

 

I haven't found a way to defrag just directories.
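
For the "pass it a file that lists only the drives you want" route, a minimal sketch is below. It assumes mtab lines in the usual "device mountpoint fstype options" layout and that array disks show up as /dev/md* devices; array_xfs_only and /tmp/fsr_mtab are made-up names. The -m (alternate mtab file) and -t (seconds) options are from the xfs_fsr man page, but the combination is untested on Unraid here.

```shell
#!/bin/sh
# array_xfs_only: hypothetical filter. Reads mtab-style lines on stdin and
# keeps only XFS filesystems whose device is an array device (/dev/md*),
# which leaves out an XFS cache SSD mounted from /dev/sdX1.
array_xfs_only() {
    awk '$3 == "xfs" && $1 ~ /^\/dev\/md/'
}

# Usage sketch (untested; /tmp/fsr_mtab is an arbitrary file name):
# array_xfs_only < /etc/mtab > /tmp/fsr_mtab
# xfs_fsr -v -m /tmp/fsr_mtab -t 7200
```

Because no device is given on the xfs_fsr command line in this mode, the -t duration should apply, which is what would make a nightly cron job practical.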

But this brings up the question: If you don't specify a drive, what does it think "all your XFS drives" means, all the sd or all the md?

When you run it in fully automatic mode, it reads the list of drives from /etc/mtab and pulls everything that specifies XFS. Here is part of mine:

 

/dev/md1 /mnt/disk1 xfs rw,noatime,nodiratime 0 0
/dev/md2 /mnt/disk2 xfs rw,noatime,nodiratime 0 0
/dev/md3 /mnt/disk3 xfs rw,noatime,nodiratime 0 0
/dev/sdg1 /mnt/cache xfs rw,noatime,nodiratime 0 0

 

I confirmed that it will let me defrag the SSD cache drive (sdg1), although I cancelled it:

 

root@Tower:~# xfs_db -r /dev/sdg1
xfs_db> frag
actual 153832, ideal 126418, fragmentation factor 17.82%
xfs_db> quit
root@Tower:~# xfs_fsr /dev/sdg1
/mnt/cache start inode=0
^C
root@Tower:~# 

 

 

But when I try to run it on the sdXX versions of the hard drives, xfs_db works and xfs_fsr doesn't:

 

root@Tower:~# xfs_db -r /dev/sdh1
xfs_db> frag
actual 902201, ideal 900454, fragmentation factor 0.19%
xfs_db> quit
root@Tower:~# xfs_fsr /dev/sdh1
/dev/sdh1: not fsys dev, dir, or reg file, ignoring

 

 


Thanks for posting this ljm42 and thanks to all for the discussion.  I tested my drives and one of them was >80%, so defragging now.

 

Thanks!

 

Thanks!  I'd be interested in hearing how much it improves (and how long it takes)

 

I had it run for 6 hours and it went from 80%ish to 0.44%.  Very cool.


I started on my second drive that was 75% fragmented and it apparently ignored my -t 21400 as it's been running for over 12 hours.  I'm concerned that it may be locked up now as the GUI is unresponsive and I can't access SMB shares nor open another terminal to it.  Has anyone else seen this?  Any thoughts on how to recover?


In my experience, if you specify a drive it ignores any duration you pass it.  And 12 hours isn't anything to worry about, based on what other people are seeing.

 

But if the system has actually locked up, that is new.  Usually the best way to troubleshoot that is to look on the console for any messages.  If you started the defrag from the console you can try ctrl-c to cancel it.

 

Depending on what you see on the console you can either continue waiting or go ahead and power cycle.


I started it from PuTTY, which seems to have hung, so then I hooked up a monitor and keyboard to the box, but no joy -- blank screen and no reaction to a keypress.  The HD light is flashing occasionally, but I would expect it to be more solid if it were defragging.

 

Is there any magic to hooking up a monitor and keyboard?  HDMI monitor and USB keyboard.

 


Not that I know of  :(  If you do go ahead and reboot, I'd recommend keeping the monitor plugged in so if it crashes again you'll have a chance at figuring out why.


Mine finished sometime late yesterday after I called it a night... It completely defragmented files. I'm talking 0% fragmentation! Directory fragmentation remained the same as before.

 

 

Before Stats:

#xfs_db -r /dev/md2

xfs_db> frag

actual 17087, ideal 15986, fragmentation factor 6.44%

xfs_db> frag -f

actual 16522, ideal 15513, fragmentation factor 6.11%

xfs_db> frag -d

actual 565, ideal 473, fragmentation factor 16.28%

 

After stats:

# xfs_db -c frag -r /dev/md2

actual 16078, ideal 15986, fragmentation factor 0.57%

# xfs_db -c 'frag -f' -r /dev/md2

actual 15513, ideal 15513, fragmentation factor 0.00%

# xfs_db -c 'frag -d' -r /dev/md2

actual 565, ideal 473, fragmentation factor 16.28%

 

 

