Problems with the cache, mover, appdata, docker and symlinks.



Hi,

 

Running 6.9.1.

 

I have some repeatable behavior that I can't explain and I would appreciate your input.

 

I typically run my appdata, system and domains shares with cache set to prefer.

 

I recently decided to move all these shares back to the array in preparation for some work I am doing, so I followed the wisdom of the FAQ, stopped Docker and VMs in Settings, changed my shares to Cache:Yes and ran the mover.

 

It ran for a while and moved almost all the data from the cache to the array. However, it did not totally empty the cache.

 

There are a few "files" remaining. The cache reports it has ~3.5G still in use.

 

I have had a poke around, and the issues seem to be related to files belonging to the following dockers:

 

letsencrypt
binhex-krusader
swag

 

My first observation is that the space calculation seems weird, as there are far fewer "files" in there than would account for 3.5 GB.

 

If I take swag as my example, I notice that this docker makes use of symlinks.

 

So the problem path for swag is a broken symlink, i.e. one whose target does not exist on the cache.

 

root@cosmos:/mnt/cache/appdata/swag/keys# ls -l
total 0
lrwxrwxrwx 1 nobody users 38 Mar 15 21:31 letsencrypt -> ../etc/letsencrypt/live/obfuscated-domain.co.uk

 

 

When I go and poke into the other problem dockers they also all seem to have broken symlinks.

 

So these 3 "problem" dockers all make use of relative symlinks in their docker volumes.
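A minimal sketch for checking whether the links in a container's appdata folder are stored as relative or absolute paths. The function name `classify_symlinks` is made up for illustration, and the directory passed in is whatever you want to inspect (for example /mnt/cache/appdata/swag).

```shell
# List every symlink under a directory and label its stored target as
# relative or absolute. classify_symlinks is a hypothetical helper name.
classify_symlinks() (
    find "$1" -type l | while IFS= read -r link; do
        target=$(readlink "$link")   # the stored target text, not resolved
        case "$target" in
            /*) kind=absolute ;;
            *)  kind=relative ;;
        esac
        printf '%s\t%s -> %s\n' "$kind" "$link" "$target"
    done
)

# Example call (path is only an example):
# classify_symlinks /mnt/cache/appdata/swag
```

A relative target such as ../etc/letsencrypt/live/... is resolved from the directory that holds the link, so whether it resolves depends on which mount point the link is viewed through.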

 

So my question is this. Does the mover copy symlinks as links? Or does it follow links?

It looks to me like maybe it is trying to follow links, and failing because the target has already been moved.

This is backed up by the fact that the target that the symlink points to exists on the array. (Not that that really matters: a broken symlink should be copied as a broken symlink anyway, even if the target does not exist.)
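To illustrate the expectation that a dangling link should still be copied as a link, here is a small sketch using `cp -a` in throwaway temp directories. It is a stand-in for a link-preserving copy, not the mover itself; the helper names and the example link target are invented for the demo.

```shell
# Copy a tree without following symlinks (cp -a does not dereference them).
copy_preserving_links() {
    cp -a "$1/." "$2/"
}

# True if the copy is still a symlink with the same stored target text.
link_survived() {
    [ -L "$2" ] && [ "$(readlink "$1")" = "$(readlink "$2")" ]
}

src=$(mktemp -d)
dst=$(mktemp -d)
mkdir -p "$src/keys"
# Dangling on purpose: ../etc/... does not exist under $src
ln -s ../etc/letsencrypt/live/example "$src/keys/letsencrypt"

copy_preserving_links "$src" "$dst"
link_survived "$src/keys/letsencrypt" "$dst/keys/letsencrypt" \
    && echo "copied as a link"
```

The copied link is just as broken as the original, which is the point: a copy that preserves links never needs the target to exist.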

 

 

It has broken the dockers involved here (not that I really care, I can sort them out), but I am worried that the mover seems to be struggling with relative symlinks between the cache and the array.

If I use the Unbalance plugin, or use rsync myself, then the move works as I would expect.

 

Anyone else seeing this or anything like it?

 

Thanks in advance for your time and efforts.

 

Diags attached for your grepping pleasure.

 

UPDATE: Exactly the same behaviour when using the mover to get back onto the cache: same 3 dockers, same issues with symlinks.

 

 

cosmos-diagnostics-20210319-0912.zip

Edited by wuftymerguftyguff
UPDATE WITH FURTHER DETAILS
Link to comment
On 3/21/2021 at 5:37 PM, wuftymerguftyguff said:

Well, as there were no replies, there wasn’t much to be distracted from!

At 6.9.1 does mover deal with symlinks in shares properly at all? Maybe it is not just a docker thing?

 

It looks like your thread was hijacked.

 

Yes, the mover treats symlinks as links. Remember though that in the case of your example

 

root@cosmos:/mnt/cache/appdata/swag/keys# ls -l
total 0
lrwxrwxrwx 1 nobody users 38 Mar 15 21:31 letsencrypt -> ../etc/letsencrypt/live/obfuscated-domain.co.uk

 

the object to which your symlink points has already been moved to the array so it will not be seen at /mnt/cache/appdata/swag/etc/letsencrypt/live/obfuscated-domain.co.uk

but at

/mnt/diskN/appdata/swag/etc/letsencrypt/live/obfuscated-domain.co.uk

where diskN is the array disk it was moved to. If you run

 

cd  /mnt/user/appdata/swag/keys
ls -l

 

your symlink should no longer show as broken.
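A rough sketch of that check done in bulk: for every dangling link under a cache-side directory, test whether the same relative path resolves when viewed under another root such as the matching /mnt/user directory. The function name and both arguments are illustrative.

```shell
# For each dangling symlink under $1, report whether the same relative
# path resolves when looked at under $2. Hypothetical helper.
check_links_via_root() (
    cache_dir=$1    # e.g. /mnt/cache/appdata
    other_dir=$2    # e.g. /mnt/user/appdata
    # "test -e" follows links, so it fails for links whose target is missing
    find "$cache_dir" -type l ! -exec test -e {} \; -print |
    while IFS= read -r link; do
        rel=${link#"$cache_dir"/}
        if [ -e "$other_dir/$rel" ]; then
            echo "resolves: $other_dir/$rel"
        else
            echo "broken:   $other_dir/$rel"
        fi
    done
)

# Example call (paths are only examples):
# check_links_via_root /mnt/cache/appdata /mnt/user/appdata
```

Links reported as "resolves" are only broken from the point of view of the single device, not from the user share.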

 

Edited by John_M
typo
Link to comment
2 hours ago, wuftymerguftyguff said:

VMs and dockers were stopped, the mover should have moved everything shouldn’t it?

 

It should have moved the symlink as a symlink shouldn’t it?

 

Yes it should and yes it should. You have file system corruption on Disk 1 though and that's probably the cause of the problem. You need to stop the array and restart it in Maintenance mode and run a file system check on Disk 1.

 

Mar 18 21:36:58 cosmos kernel: XFS (md1): Metadata corruption detected at xfs_dinode_verify+0xa3/0x581 [xfs], inode 0x600df19f dinode
Mar 18 21:36:58 cosmos kernel: XFS (md1): Unmount and run xfs_repair
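A quick way to pull lines like these out of a log yourself, sketched as a small function. It assumes the message format shown above and takes the log file as an argument; /var/log/syslog in the example call is an assumption about where the log lives.

```shell
# Print each md device that the kernel flagged for XFS metadata corruption.
# xfs_corruption_devices is a hypothetical helper name.
xfs_corruption_devices() {
    grep 'XFS (md[0-9]*): Metadata corruption detected' "$1" |
        sed 's/.*XFS (\(md[0-9]*\)).*/\1/' |
        sort -u
}

# Example call (log path is an assumption):
# xfs_corruption_devices /var/log/syslog
```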

 

Link to comment

OK, thanks for the spot.  I had certainly not noticed that.

My appdata was back on the cache (with my manual interventions)

 

I restarted in Maintenance mode and ran

 

xfs_repair -v /dev/md1

 

This fixed some dir entries and completed.

 

I restarted the array normally.

 

Rebooted to clear out my logs.

 

I have now set about trying to recreate my problem by setting the appdata share to "Yes", and the mover is running and draining my cache to disk 3 by the look of things.

 

I will update this thread when it completes.

 

Thanks again for your attention.

Link to comment

Update,

 

The move of the appdata share from the cache completed and left LOTS of symlinks and dirs behind in the cache.

This is really starting to look like the mover is not behaving as I understand that it should.

I now have the problem causing issues for 4 dockers (plex, letsencrypt, swag and binhex-krusader).

 

The ONLY things remaining in the cache for appdata after the mover has finished are broken symlinks and the directories containing them. 

 

This is a count of broken links under appdata in the cache:

 

root@cosmos:/mnt/cache/appdata# find . -xtype l | wc -l
33442

 

This is a count of things that are NOT a directory or a broken link under appdata in the cache.

 

root@cosmos:/mnt/cache/appdata# find . ! -xtype l,d | wc -l
0
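The two find counts above can be combined into one summary. This is a minimal sketch assuming GNU find (for `-xtype`); the function name is invented, and the two type tests are spelled out separately rather than using the comma list.

```shell
# Summarise what is left under a directory: dangling links, directories,
# and everything else. leftover_summary is a hypothetical helper name.
leftover_summary() (
    cd "$1" || exit 1
    # -xtype l is true only for symlinks whose target cannot be resolved
    broken=$(find . -xtype l | wc -l)
    dirs=$(find . -type d | wc -l)
    # anything that is neither a dangling link nor (a link to) a directory
    other=$(find . ! -xtype l ! -xtype d | wc -l)
    echo "broken_links=$((broken)) dirs=$((dirs)) other=$((other))"
)

# Example call (path is only an example):
# leftover_summary /mnt/cache/appdata
```

The directory count includes the starting directory itself. An "other" count of 0 means that nothing except dangling links and directories remains.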

 

It looks like the mover is NOT treating these links as links and copying them as is. It seems to be trying to follow the links, and where the target has already been moved this fails, so the links and their directory structure remain on the source.

 

Updated diags attached.

 

As this appears to be a generic problem relating to the mover and symlinks, not anything to do with docker and appdata specifically, I will work on a simpler test case that we can use from now on. That will allow me to stop messing with my docker and VM workload.
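One possible shape for such a test case, sketched as a function that builds a tiny tree with relative links to a directory, to a file, and to nothing. All names here (the function, the directory layout, the "linktest" share in the example call) are made up for illustration.

```shell
# Build a small directory tree containing relative symlinks, loosely
# modelled on swag's keys/letsencrypt link. Hypothetical helper.
make_symlink_testcase() (
    root=$1
    mkdir -p "$root/etc/live/example" "$root/keys" || exit 1
    echo "dummy cert" > "$root/etc/live/example/cert.pem"
    # relative link to a directory
    ln -s ../etc/live/example "$root/keys/letsencrypt"
    # relative link to a file
    ln -s ../etc/live/example/cert.pem "$root/keys/cert.pem"
    # link that dangles from the start
    ln -s ../etc/missing "$root/keys/dangling"
)

# Example call on a scratch share (share name is hypothetical):
# make_symlink_testcase /mnt/cache/linktest
```

Running the mover on a share holding only this tree, then looking at what is left on the source, would exercise the same cases without touching any container data.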

 

 

 

cosmos-diagnostics-20210324-0946.zip

Link to comment
On 3/24/2021 at 2:52 AM, wuftymerguftyguff said:

Update,

 

The move of the appdata share from the cache completed and left LOTS of symlinks and dirs behind in the cache.

 


 

The 'mover' does indeed move symlinks correctly; at least, I ran a test script and it worked. It will not move any files that are "in use", meaning opened by some process. I suggest you turn on mover logging (Settings/Scheduler) and see what shows up in the system log.
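As a rough way to see whether anything under a share is still open, here is a sketch that walks the file descriptors listed in /proc on Linux. It is not how the mover decides what is "in use"; the function name is invented, and it only sees processes whose descriptors the current user is allowed to read (root sees all).

```shell
# Print "PID path" for every open file descriptor that points below the
# given directory. open_files_under is a hypothetical helper name.
open_files_under() (
    dir=$1
    for fd in /proc/[0-9]*/fd/*; do
        # descriptors can vanish or be unreadable; skip those quietly
        target=$(readlink "$fd" 2>/dev/null) || continue
        case "$target" in
            "$dir"/*)
                pid=${fd#/proc/}
                pid=${pid%%/*}
                echo "$pid $target"
                ;;
        esac
    done
)

# Example call (path is only an example):
# open_files_under /mnt/cache/appdata
```

An empty result suggests that nothing visible to you has a file open below that directory.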

 

Link to comment

Hi,

 

Updated diagnostics attached. (with mover logging in place)

 

I emptied my cache again by stopping all VMs and dockers and their respective services, getting out of all the user shares, setting the appdata share from Cache:Prefer to Cache:Yes and running the mover.

 

I had a look at the mover script. It seems to be using find in depth mode and then handing this off to move, which is not a script, so I can't see what it is really doing (without work).
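For reference, this is what depth mode means for find in general: a directory's contents are listed before the directory itself. The sketch below only demonstrates that ordering in a temp directory; it is not taken from the mover script, and the function name is made up.

```shell
# List a tree in depth-first order: children before their parent directory.
depth_order() {
    find "$1" -depth
}

demo=$(mktemp -d)
mkdir -p "$demo/a/b"
touch "$demo/a/b/file"
depth_order "$demo"
# prints $demo/a/b/file first, then $demo/a/b, $demo/a and finally $demo
```

That ordering is what lets a move empty a directory before the directory itself is dealt with, so anything that fails to move leaves its parent directories behind as well.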

 

Once again all that remains under appdata is broken symlinks.

 

The dockers still work when they access things via the /mnt/user mount but fail otherwise.

 

If I copy things manually using rsync, the links move and everything is happy.

 

I think that as a test case you could try using a default install of binhex-krusader docker.  This one seems to have the problem every single time, whether moving from cache to array or back again.

 

Thanks again for your time

 

Jeff

 

 

cosmos-diagnostics-20210327-2335.zip

Link to comment
