Guide: How To Use Rclone To Mount Cloud Drives And Play Files


DZMM


Has anyone noticed the performance of mergerfs going down really badly?

 

I noticed my mount suddenly stopped saturating gigabit.

So I did a dd test from the console, which yielded 45MB/s on the mergerfs mount and 379MB/s writing directly to the same UD drive (the one mergerfs was writing to).
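For anyone wanting to reproduce the comparison, a rough sketch of the two dd runs (paths taken from the mount command further down; the test file names, size and sync flag are just illustrative, not necessarily the exact flags I used):

# write 4GB through the mergerfs mount
dd if=/dev/zero of=/mnt/cache/backup/mergerfs/dd_merger.test bs=1M count=4096 conv=fdatasync

# same write directly to the underlying UD drive
dd if=/dev/zero of=/mnt/disks/ssd860evo_2J/backup/dd_direct.test bs=1M count=4096 conv=fdatasync

# clean up the test files afterwards
rm /mnt/cache/backup/mergerfs/dd_merger.test /mnt/disks/ssd860evo_2J/backup/dd_direct.test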

 

I don't remember the mount ever being that slow.

Edited by testdasi
  • Like 1
Link to comment
10 hours ago, trapexit said:

mergerfs hasn't changed in any significant way in 5+ months. In fact, I only just released a new version a couple of weeks ago, and it had no default behavior changes from the release prior.

@trapexit: Can you think of any reasons that would cause writes to mergerfs mounts to be very slow? Or maybe suggest some tweaks to improve write speed?

It seems the slowdown only affects writes to the mergerfs mount. Writing directly to the source folder is more than 5 times faster.

My mount command is below. I tried various combinations of options but none offered any improvement.

mergerfs /mnt/disks/ssd860evo_2J/backup:/mnt/disks/ssd860evo_6T/backup /mnt/cache/backup/mergerfs -o rw,async_read=false,use_ino,allow_other,func.getattr=newest,category.action=all,category.create=mfs,cache.files=partial,dropcacheonclose=true,minfreespace=32G

@DZMM: are you able to do a quick write test to see what you get?

 

Edited by testdasi
Link to comment
32 minutes ago, testdasi said:

@DZMM: are you able to do a quick write test to see what you get?

It's slower for me as well.  I did a few tests (just copied a big tar file over SSH) to my 'local' mergerfs mount:

  1. Array-2-Array (Disk 2--> Disk 1): 108MB/s
  2. Array-2-SSD (Disk 2--> MX500): 188MB/s
  3. Array-2-Mergerfs (Disk 2--> Disk 1): 84MB/s

1 and 3 should be roughly the same. 
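For reference, these were just large single-file copies; a rough sketch of that kind of test (the tar file name is a placeholder) would be:

# test 1: array to array
rsync --progress /mnt/disk2/big.tar /mnt/disk1/

# test 3: array to the mergerfs mount
rsync --progress /mnt/disk2/big.tar /mnt/user/mount_mergerfs/local/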

 

Separately, I need to work out what's wrong with my mergerfs command and fix it, as #3 was supposed to write to my SSD, not my array:

mergerfs /mnt/disks/ud_mx500/local:/mnt/user/local /mnt/user/mount_mergerfs/local -o rw,async_read=false,use_ino,allow_other,func.getattr=newest,category.action=all,category.create=lus,cache.files=partial,dropcacheonclose=true,moveonenospc=true,minfreespace=150G

 

Link to comment

Outside of what is described in the performance section of the docs, I really don't have anything else to add without additional information.

 

https://github.com/trapexit/mergerfs#performance

 

Clearly something has changed in your setup, because mergerfs really hasn't. You'll need to narrow that down. Has the UnRAID kernel changed in this time? Hardware? There are a lot of things that affect performance.

Link to comment
1 hour ago, DZMM said:

It's slower for me as well.  I did a few tests (just copied a big tar file over SSH) to my 'local' mergerfs mount:

  1. Array-2-Array (Disk 2--> Disk 1): 108MB/s
  2. Array-2-SSD (Disk 2--> MX500): 188MB/s
  3. Array-2-Mergerfs (Disk 2--> Disk 1): 84MB/s

1 and 3 should be roughly the same. 

 

Separately, I need to work out what's wrong with my mergerfs command and fix it, as #3 was supposed to write to my SSD, not my array:

 

Have you updated to 6.8.3?

Link to comment

No - I'm on 6.8.2 DVB.  I changed my mergerfs command to: 

 

mergerfs /mnt/disks/ud_mx500/local:/mnt/user/local /mnt/user/mount_mergerfs/local -o rw,async_read=false,use_ino,allow_other,func.getattr=newest,category.action=all,category.create=ff,cache.files=partial,dropcacheonclose=true,moveonenospc=true,minfreespace=100G

 

and retested.  The copy is now going to the SSD as desired, and now I get:

 

2. Array-2-SSD (Disk 2--> MX500): 184MB/s

3. Array-2-Mergerfs (Disk 2--> MX500): 182MB/s

 

I think this was a better test than before, as my Disk 1 is doing lots of other things, which I think is why there was such a slowdown before. 

 

Maybe something's changed in 6.8.3??

Edited by DZMM
  • Thanks 1
Link to comment
27 minutes ago, trapexit said:

Why are you setting async_read=false? You appear to be using local filesystems.

Thanks for spotting that - changed:

 

mergerfs /mnt/disks/ud_mx500/local:/mnt/user/local /mnt/user/mount_mergerfs/local -o rw,async_read=true,use_ino,allow_other,func.getattr=newest,category.action=all,category.create=ff,cache.files=partial,dropcacheonclose=true,moveonenospc=true,minfreespace=100G

While you're here, can I check that the command above is doing what I want, please?  I want files to go to my MX500 SSD as long as there's 100GB free; if not, write to the array (/mnt/user/local).  Have I got it right?  #4 on your GitHub confused me a bit about setting minfreespace to the size of the largest cache drive.

 

Quote

2. The 'cache' pool should have the cache drives listed first.

3. The best create policies to use for the 'cache' pool would probably be ff, epff, lfs, or eplfs. The latter two under the assumption that the cache drive(s) are far smaller than the backing drives. If using path preserving policies remember that you'll need to manually create the core directories of those paths you wish to be cached. Be sure the permissions are in sync. Use mergerfs.fsck to check / correct them. You could also tag the slow drives as =NC though that'd mean if the cache drives fill you'd get "out of space" errors.

4. Enable moveonenospc and set minfreespace appropriately. Perhaps setting minfreespace to the size of the largest cache drive.

 

Link to comment

minfreespace is a global value, so sure, that setup would work. Though it means that when both pools fill to the point of having only 100GB free, it will return that it's out of space.

 

I could be clearer there, but the idea is that you have 2 pools and are regularly syncing files over to the slower drives. Setting minfreespace on the "slow" drive pool to the size of the largest cache drive means that, worst case, you should be able to fit a full drive's worth of content onto the slow pool. I'll update the docs to make it clearer.

 

  • Thanks 1
Link to comment

So after a lot of testing - all to a Crucial MX300 SSD (8GB test file at 1M block size):

  • I noticed the dd test the way it's recommended on GitHub acts rather strangely. It reduces write speed a lot (down to 50MB/s on the mergerfs mount, 170MB/s direct), but the actual write is done in bursts at 500MB/s with long waits in between. I redid the test on an NVMe and observed similar behaviour. dsync appears to be the parameter causing this (see the dd sketch after this list).
    • I can improve dd-with-dsync performance to 100MB/s by adding cache.writeback (which requires cache.files NOT to be =off).
    • cache.files=off is 50MB/s regardless of any other tuning.
  • Now testing over the network (SMB) from an NVMe (passed through to a VM) to the mergerfs mount:
    • cache.files=off: 150-170MB/s
    • cache.files=off + threads=16: 190-250MB/s
    • cache.files=partial: 60-90MB/s
    • cache.files=partial + cache.writeback: around 125MB/s
  • Note 1: Adding threads=16 doesn't improve performance with cache.files=partial, regardless of writeback.
  • I did a 64GB dd test without dsync (64GB so there's less RAM-cache impact, since file size > free RAM) and all 4 sets of parameters yielded similar average speeds (which match the average actual write speed).
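A sketch of the two kinds of dd run being compared (the mount path is illustrative; 8GB with dsync, 64GB without):

# synchronous variant (oflag=dsync) - the one that collapses to ~50MB/s on the mount
dd if=/dev/zero of=/mnt/user/mount_mergerfs/test/dd.test bs=1M count=8192 oflag=dsync

# plain buffered 64GB write, big enough that the page cache can't hide the real speed
dd if=/dev/zero of=/mnt/user/mount_mergerfs/test/dd.test bs=1M count=65536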

So I'm sticking to cache.files=off + threads=16 for my all-local mounts.
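Applied to the mount command I posted at the start, that would look roughly like this (only cache.files and threads changed):

mergerfs /mnt/disks/ssd860evo_2J/backup:/mnt/disks/ssd860evo_6T/backup /mnt/cache/backup/mergerfs -o rw,async_read=false,use_ino,allow_other,func.getattr=newest,category.action=all,category.create=mfs,cache.files=off,threads=16,dropcacheonclose=true,minfreespace=32G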

Edited by testdasi
  • Like 1
Link to comment

Caching can lead to weird issues on some systems depending on the workload, which is why I normally suggest turning it off. I don't know what kernel you're running, but there has been some funny stuff lately.

 

I can remove that example. These tests are not trivial, especially on an in-use system. There are a lot of variables involved.

  • Like 1
Link to comment

The other thing that can slow down data transfers when using file caching is extended attributes. Did anyone try disabling them with xattr=nosys?

 

The kernel unfortunately doesn't cache positive or negative responses for getxattr or listxattr, so mergerfs can get hammered with security.capability requests on *every single write*. It completely destroys performance on most systems.
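It's just another option appended to the -o list, e.g. taking the command DZMM posted above:

mergerfs /mnt/disks/ud_mx500/local:/mnt/user/local /mnt/user/mount_mergerfs/local -o rw,async_read=true,use_ino,allow_other,func.getattr=newest,category.action=all,category.create=ff,cache.files=partial,dropcacheonclose=true,moveonenospc=true,minfreespace=100G,xattr=nosys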

 

https://github.com/trapexit/mergerfs#xattr

 

Perhaps I can go into more detail in the docs as to why this might be a problem, but it is mentioned in the performance testing section.

  • Like 1
Link to comment

@DZMM, @jrdnlc

Decided to play around with the discord notifications.

Got something working and put in a pull request.

 

Had to rewrite and clean up a bit of it before adding it to your script, but it appears to be working as intended (reports transfer count, rate, amount, etc.). I'm sure there is room for improvement - displaying transferred file names would be awesome!

 

Couldn't get error reporting working, but I didn't go digging too deep. Just scrapped that bit rather than pulling my hair out over trivial things.
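If anyone wants to roll their own in the meantime, one common way to send this kind of notification from a script is a curl POST to a Discord webhook - a minimal, hypothetical sketch (webhook URL and message are placeholders, not the actual pull-request code):

# post a simple message to a Discord channel via its webhook
DISCORD_WEBHOOK="https://discord.com/api/webhooks/XXXX/YYYY"   # placeholder URL
curl -s -H "Content-Type: application/json" \
     -d '{"username":"rclone_upload","content":"Upload finished: 12 files, 34.5G at 98.7MB/s"}' \
     "$DISCORD_WEBHOOK"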

 

Credit to no5tyle or SenPaiBox or whoever wrote it. 

[Screenshot: example Discord notification output]

Edited by watchmeexplode5
  • Like 2
Link to comment

I'm trying to set up the service accounts for uploading. How does this work when you have multiple team drives for different storage, say 1 for media, 1 for backups, 1 for cloud storage, etc.? Would you then create multiple upload scripts, each with its own project and SAs?

Link to comment
14 minutes ago, Kaizac said:

I'm trying to set up the service accounts for uploading. How does this work when you have multiple team drives for different storage, say 1 for media, 1 for backups, 1 for cloud storage, etc.? Would you then create multiple upload scripts, each with its own project and SAs?

You only need 1 project.  SAs are associated with your Google account, so they can be shared between team drives if you want to.

 

Of the 500 or so I created, I assign 16 to each upload script (sa_tdrive1.json --> sa_tdrive16.json, sa_cloud1.json --> sa_cloud16.json, etc.).  I don't need that many, but it means I've got enough to saturate a gigabit line if I need to.  All you have to do is rename the files, so you might as well assign 16 to each script.
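The upload script rotates through them for you; a rough sketch of the idea (the SA directory, counter file and remote name here are hypothetical, just to show the mechanism):

# pick the next service account on each run and wrap around after 16
SA_DIR="/mnt/user/appdata/other/rclone/service_accounts"   # hypothetical location
COUNT=$(cat "$SA_DIR/counter" 2>/dev/null || echo 1)

rclone move /mnt/user/local/gdrive_media_vfs gdrive_media_vfs: \
    --drive-service-account-file="$SA_DIR/sa_tdrive$COUNT.json" \
    --drive-stop-on-upload-limit

# increment and wrap the counter ready for the next run
echo $(( COUNT % 16 + 1 )) > "$SA_DIR/counter"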

 

If you want to reduce the number of scripts, you could do what I've done:

 

1. I've added the additional rclone mounts as extra mergerfs locations, so that I only have one master mergerfs share for, say, teamdrive1, td2, etc. - saves a bit of RAM

2. I have one upload script moving local files to teamdrive1 - saves a bit of RAM and makes it easier to manage bandwidth

3. Overnight I do a server-side move from td1-->td2, td1-->td3, etc. for the relevant folders - limited RAM use and no bandwidth hit as it's done server-side

4. All files are still accessible via the mergerfs share in #1 - files are just picked up from their respective rclone mounts, rather than from local or the td1 mount

 

 

Edited by DZMM
Link to comment
45 minutes ago, DZMM said:

You only need 1 project.  SAs are associated with your Google account, so they can be shared between team drives if you want to.

 

Of the 500 or so I created, I assign 16 to each upload script (sa_tdrive1.json --> sa_tdrive16.json, sa_cloud1.json --> sa_cloud16.json, etc.).  I don't need that many, but it means I've got enough to saturate a gigabit line if I need to.  All you have to do is rename the files, so you might as well assign 16 to each script.

 

If you want to reduce the number of scripts, you could do what I've done:

 

1. I've added the additional rclone mounts as extra mergerfs locations, so that I only have one master mergerfs share for, say, teamdrive1, td2, etc. - saves a bit of RAM

2. I have one upload script moving local files to teamdrive1 - saves a bit of RAM and makes it easier to manage bandwidth

3. Overnight I do a server-side move from td1-->td2, td1-->td3, etc. for the relevant folders - limited RAM use and no bandwidth hit as it's done server-side

4. All files are still accessible via the mergerfs share in #1 - files are just picked up from their respective rclone mounts, rather than from local or the td1 mount

 

 

I have no idea how you manage to do all those 4 steps. Care to share some parts of those scripts/mergerfs commands?

Link to comment
1 hour ago, Kaizac said:

1. I've added the additional rclone mounts as extra mergerfs locations, so that I only have one master mergerfs share for, say, teamdrive1, td2, etc. - saves a bit of RAM

In the script.  Mount your other tdrives as normal, but enter 'ignore' for the mergerfs location so that you don't get a corresponding mergerfs mount.  Then, for the merged mount, add the extra locations:

 

# OPTIONAL SETTINGS

# Add extra paths to mergerfs mount in addition to LocalFilesShare
LocalFilesShare2="/mnt/user/mount_rclone/gdrive_media_vfs"
LocalFilesShare3="/mnt/user/mount_rclone/backup_vfs"
LocalFilesShare4="ignore"

Above you can see I've added my music and backup teamdrives to my main plex teamdrive.

 

Then run your upload as usual against this mergerfs mount.
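Under the hood the effect is just extra branches on the one mergerfs mount - roughly something like this (a sketch only; the exact paths and options depend on your settings in the script):

mergerfs /mnt/user/local/gdrive_media_vfs:/mnt/user/mount_rclone/gdrive_media_vfs:/mnt/user/mount_rclone/backup_vfs /mnt/user/mount_mergerfs/gdrive_media_vfs -o rw,use_ino,allow_other,func.getattr=newest,category.action=all,category.create=ff,cache.files=partial,dropcacheonclose=true,moveonenospc=true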

 

1 hour ago, Kaizac said:

3. Overnight I do a server-side move from td1-->td2, td1-->td3, etc. for the relevant folders - limited RAM use and no bandwidth hit as it's done server-side

4. All files are still accessible via the mergerfs share in #1 - files are just picked up from their respective rclone mounts, rather than from local or the td1 mount

 

I worked out what the gobbledygook name was in my crypts, e.g. if 'music' is crazy_folder_name, it should match up in both team drives if you've used the same passwords.  You have to use the encrypted remotes to do it all server-side:

rclone move tdrive:crypt/crazy_folder_name gdrive:crypt/crazy_folder_name --user-agent="transfer" -vv --buffer-size 512M --drive-chunk-size 512M --tpslimit 8 --checkers 8 --transfers 4 --order-by modtime,ascending --exclude *fuse_hidden* --exclude *_HIDDEN --exclude .recycle** --exclude .Recycle.Bin/** --exclude *.backup~* --exclude *.partial~* --drive-stop-on-upload-limit --delete-empty-src-dirs

I probably don't need all the options - I just copied them from the main script.  The transfer flies at an insane speed and is over in seconds.  You do need --drive-stop-on-upload-limit so that it abides by the 750GB/day limit.  If you need to move more than that daily, then rotate service accounts by just repeating the command:

 

rclone move tdrive:crypt/crazy_folder_name gdrive:crypt/crazy_folder_name --drive-service-account-file=$ServiceAccountDirectory/SA1.json

rclone move tdrive:crypt/crazy_folder_name gdrive:crypt/crazy_folder_name --drive-service-account-file=$ServiceAccountDirectory/SA2.json

rclone move tdrive:crypt/crazy_folder_name gdrive:crypt/crazy_folder_name --drive-service-account-file=$ServiceAccountDirectory/SA3.json

etc etc

Add as many as you need.
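If there's a lot to shift, a simple loop over the SA files saves pasting the command over and over (a sketch - adjust the range to however many SAs you've got):

# rotate through service accounts; each run stops cleanly when it hits the daily limit
for i in $(seq 1 5); do
    rclone move tdrive:crypt/crazy_folder_name gdrive:crypt/crazy_folder_name \
        --drive-stop-on-upload-limit \
        --drive-service-account-file=$ServiceAccountDirectory/SA$i.json
done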

Edited by DZMM
Link to comment

I've tried finding the final consensus in this topic, but it's becoming a bit too large to find things easily. I've created 100 service accounts now and added them to my team drives.

 

How should I now set up my rclone remote? I should only need 2, right (1 drive and 1 crypt of that drive)? And should I set it up with its own client ID/secret when using SAs? According to your GitHub it seems like I just create a remote with rclone's own ID and secret, so there's nothing to define on my side.

Link to comment
1 hour ago, DZMM said:

That doesn't answer my question, unfortunately. In your readme you mention this:

 

Quote

Or, like this if using service accounts:

 

[gdrive]
type = drive
scope = drive
team_drive = TEAM DRIVE ID
server_side_across_configs = true

 

[gdrive_media_vfs]
type = crypt
remote = gdrive:crypt
filename_encryption = standard
directory_name_encryption = true
password = PASSWORD1
password2 = PASSWORD2

 

If you need help doing this, please consult the forum thread above.

It is advisable to create your own client_id to avoid API bans. More Details

So it seems that in your example you don't configure your client ID and secret, but then later on you mention that you do need them.
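For reference, if I do create my own client ID, I assume it just goes into the [gdrive] remote alongside the other keys, something like this (values are placeholders):

[gdrive]
type = drive
client_id = YOUR_CLIENT_ID.apps.googleusercontent.com
client_secret = YOUR_CLIENT_SECRET
scope = drive
team_drive = TEAM DRIVE ID
server_side_across_configs = true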

Link to comment
