Fixed slow cache disk performance



Scenario:

Increasingly slow write performance of SSD cache drive

File transfers are fast for the first few seconds, then the speed trails off and the transfer often pauses before restarting.  Example: copying a directory with a ~3G file and a couple of smaller ones over Gigabit LAN to the cache:

 

BMYGWnV.png

 

This was actually a relatively fast example.  Usually much worse.

 

240G SSD cache drive, formatted with ReiserFS many years ago

~50G of this used for Plex to store library information - many thousands of small files

~50GB for VM images, docker, other plugins

Rest used for caching of new files

 

Solution:

The thousands/millions of small files in the Plex library, together with several years of operation, seem to have made the drive slower (fragmentation?).  It seems ReiserFS does not support TRIM, so the only choice is to move to another filesystem such as XFS.  The solution here may be obvious to some, but I thought it might be of benefit to others.

 

1) Use the web interface to stop anything that stores data on the cache drive, e.g. Docker, KVM, and any other plugins.

2) Before backing up the current cache drive, find which device it is

 

root@TOWER:/# df | grep /mnt/cache
/dev/sdf1       244191092    96114892  148076200  40% /mnt/cache
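If you want to script this lookup instead of reading the device off by eye, here is a minimal sketch. The helper name `device_for_mount` is made up for illustration; it simply picks the first column of the `df` line whose last column is the mount point.

```shell
# Hypothetical helper: read `df` output on stdin and print the device
# whose mount point (the last column) equals the first argument.
device_for_mount() {
    awk -v mp="$1" '$NF == mp { print $1 }'
}

# Example (not run here):
#   df | device_for_mount /mnt/cache
```

Matching on the whole last column avoids accidentally picking up other mount points that merely contain the string `/mnt/cache`, which a plain `grep` could do.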

 

3) Back up all data on the cache drive to a disk in the array with at least as much free space as the size of the cache disk

 

dd if=/dev/sdf1 of=/mnt/disk1/cache.img bs=1M
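Since the image is as large as the whole partition, not just the used space, it may be worth checking the target disk has room before starting the copy. A rough sketch follows; `enough_space` is a hypothetical helper, and the commented usage assumes the `/dev/sdf1` and `/mnt/disk1` names from this post plus GNU `df` and `blockdev`.

```shell
# Hypothetical helper: succeed only if the free bytes on the target are
# at least the number of bytes the image will need.
enough_space() {
    needed_bytes=$1
    free_bytes=$2
    [ "$free_bytes" -ge "$needed_bytes" ]
}

# Example (not run here):
#   needed=$(blockdev --getsize64 /dev/sdf1)
#   free=$(df -B1 --output=avail /mnt/disk1 | tail -n 1 | tr -d ' ')
#   enough_space "$needed" "$free" || echo "disk1 is too small for the image"
```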

 

4) Stop the array.  If you are connected via SSH, make sure your shell is not inside a directory on the array, otherwise the web interface could hang.  On the Main tab of the web interface, click on the cache disk and change the filesystem type to XFS.
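For the SSH caveat in step 4, a small sketch of a check you could run before stopping the array. It assumes all array and cache mounts live under `/mnt` (as `/mnt/cache` and `/mnt/disk1` do in this post); `under_mnt` is a made-up helper name.

```shell
# Hypothetical helper: succeed if the given path is /mnt or lies below it.
under_mnt() {
    case "$1" in
        /mnt|/mnt/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example (not run here): leave the array before stopping it
#   under_mnt "$PWD" && cd /
```

This only covers your own shell's working directory; other processes holding files open on the array are not detected.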

 

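5) Verify the image is OK by mounting it.  Note that /mnt/disk1 is only mounted while the array is started, so if the image cannot be found, do this after starting the array in step 6 but before formatting.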

 

mkdir /mnt/oldcache
mount /mnt/disk1/cache.img /mnt/oldcache
ls -al /mnt/oldcache 

 

6) Start the array.  On the Main tab, tick the box and click the button to format the cache disk.

 

7) Copy back old data

rsync -aAXS --info=progress2 /mnt/oldcache/ /mnt/cache
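Before restarting anything in the next step, the copy could be spot-checked. Below is a minimal sketch using `diff -rq`; `trees_match` is a hypothetical helper, and it only compares file names and contents, so it says nothing about ownership, permissions, ACLs or extended attributes.

```shell
# Hypothetical helper: succeed if both directory trees contain the same
# file names with identical contents. Ownership, permissions, ACLs and
# extended attributes are NOT compared.
trees_match() {
    diff -rq "$1" "$2" > /dev/null
}

# Example (not run here):
#   trees_match /mnt/oldcache /mnt/cache && echo "copy looks complete"
```

With a Plex library of many thousands of small files this will take a while, and special files such as sockets may be reported as differences even when the copy is fine.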

 

8) Restart Docker, KVM, and other plugins and verify the new cache works OK.  If it does, unmount and delete the cache backup

 

umount /mnt/oldcache
rm /mnt/disk1/cache.img
rmdir /mnt/oldcache

 

16pYxJp.png

 

The Dynamix TRIM plugin can then be used to (hopefully) keep the drive running fast:

https://lime-technology.com/forum/index.php?topic=36543.0
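To check by hand whether TRIM is available at all, one option is to look at what the kernel reports for the disk. This is a sketch under assumptions: it reads `queue/discard_max_bytes` from sysfs and treats 0 (or a missing file) as "no discard support". `supports_trim` and the `SYSFS_ROOT` override are made up for illustration, the latter only so the function can be tried against a fake directory tree.

```shell
# Hypothetical helper: succeed if the kernel reports a non-zero discard
# limit for the given disk name (e.g. sdf, without /dev/ and without a
# partition number). SYSFS_ROOT is an illustrative override; it
# defaults to the real /sys.
supports_trim() {
    disk=$1
    limit_file="${SYSFS_ROOT:-/sys}/block/$disk/queue/discard_max_bytes"
    [ -r "$limit_file" ] && [ "$(cat "$limit_file")" -gt 0 ]
}

# Example (not run here): trim the cache once by hand if supported
#   supports_trim sdf && fstrim -v /mnt/cache
```

`fstrim -v` prints how many bytes were trimmed, which is a quick way to confirm the new filesystem passes discards through to the SSD.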

Link to comment

(1) Fragmentation only affects HDD speed. It doesn't affect SSDs.

The slowdown of an SSD over time has more to do with the lack of TRIM support. Wikipedia has a pretty good explanation as to why.

 

(2) I have no doubt your procedure will improve speed, for two reasons: (1) it involves a format (as far as I know, unRAID format does overwrite all data with zero). This is the same method that people used to restore SSD performance before TRIM became the standard (or where it's not possible, e.g. in a RAID0 arrangement). And (2), as you mentioned, TRIM is supported in XFS, which will prevent future degradation.

 

(3) Slow transfer speed over the network can be due to other reasons besides a slow SSD, e.g. NIC offloading functionality. I think it might be better to explore such fixes before going down any route involving formatting (which is easy if you know what you are doing, but a mistake can lead to irreparable damage).
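One way to separate a slow SSD from a slow network is to time a write made locally on the server, with no LAN involved. Here is a minimal sketch, assuming GNU `dd`; `local_write_test` is a hypothetical helper, and writing zeros is only a rough stand-in for real file data.

```shell
# Hypothetical helper: write <size_mb> MiB of zeros into <dir>, flush the
# data to disk, print dd's summary line (which includes the throughput),
# and remove the test file afterwards.
local_write_test() {
    dir=$1
    size_mb=$2
    testfile="$dir/write_test.$$"
    dd if=/dev/zero of="$testfile" bs=1M count="$size_mb" conv=fdatasync 2>&1 | tail -n 1
    rm -f "$testfile"
}

# Example (not run here): roughly the size of the file in the first post
#   local_write_test /mnt/cache 3000
```

If the local figure is also far below what the SSD should manage, the network is probably not the bottleneck.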

Link to comment

In my case I had exhausted all other options including NIC offloading on the Windows box the screenshots were from.  I figured others with ReiserFS cache drives may be in the same situation, and this post was designed to help people avoid irreparable damage.

 

Perhaps the unRAID web interface could mention that ReiserFS is not a suitable filesystem for cache disks.

Link to comment
(1) it involves a format (as far as I know, unRAID format does overwrite all data with zero).
No. unRAID formats the disk exactly like any other Linux distro: it writes a new table of contents with only the root entry. Data beyond that point is untouched.

 

Overwriting with zero is a function of either preclear, or clearing when adding a disk to an already parity protected array, and is never needed with a cache device. After the disk is cleared and added to the array, then you format it. A freshly formatted disk isn't cleared any more, so can't be added to the parity protected array without updating parity.

 

Formatting will take care of fragmentation, but since, like you said, fragmentation isn't the issue with SSDs, it won't help. Changing the format to an SSD-aware filesystem type and enabling SSD maintenance with that filesystem is what is needed.

Link to comment

Out of curiosity what SSD are you using? 

 

I've found that the write speed of TLC SSDs degrades much more without TRIM, including Samsung 750/850 EVO and 850 Pro: writes get as low as 25MB/s. MLC models hold up much better, maintaining close to normal speed. TRIM should be used, but it's not always possible, for example when the SSD is part of the array.

 

Link to comment

Formatting will take care of fragmentation, but since, like you said, fragmentation isn't the issue with SSDs, it won't help. Changing the format to an SSD-aware filesystem type and enabling SSD maintenance with that filesystem is what is needed.

Fragmentation isn't the only thing. It's marking the cells that are empty as empty. A zero-ing format will by default mark all cells that are empty as empty (since they are all zero-ed).

I tested this a long time ago, when SSDs first came out and a 64GB model was "top range". A non-quick format improved a saturated SSD's performance; I only found out why later, while researching TRIM.

Link to comment

Out of curiosity what SSD are you using? 

 

EVO 840

 

Fragmentation isn't the only thing. It's marking the cells that are empty as empty. A zero-ing format will by default mark all cells that are empty as empty (since they are all zero-ed).

I tested this a long time ago, when SSDs first came out and a 64GB model was "top range". A non-quick format improved a saturated SSD's performance; I only found out why later, while researching TRIM.

 

I don't think unRAID did a "non-quick format", as it completed in seconds.  Yet the performance increase was also instant: back to maxing out gigabit Ethernet before I issued a TRIM command.  Would this suggest there was more wrong with the original ReiserFS filesystem than simply needing to be TRIMmed?

Link to comment
