wuftymerguftyguff Posted March 19, 2021 (edited)

Hi,

Running 6.9.1. I have some repeatable behavior that I can't explain and I would appreciate your input.

I typically run my appdata, system and domains shares with cache set to Prefer. I recently decided to move all these shares back to the array in preparation for some work I am doing, so I followed the wisdom of the FAQ: I stopped Docker and VMs in Settings, changed my shares to Cache: Yes and ran the mover.

It ran for a while and moved almost all the data from the cache to the array. However, it did not totally empty the cache. There are a few "files" remaining, and the cache reports it has ~3.5G still in use.

I have had a poke around, and the issues seem to be related to files belonging to the following dockers:

letsencrypt
binhex-krusader
swag

My first observation is that the space calculation seems weird, as there is a lot less than 3.5GB of "files" in there.

If I take swag as my example, I notice that this docker makes use of symlinks. The problem path for swag is a broken symlink whose target does not exist:

root@cosmos:/mnt/cache/appdata/swag/keys# ls -l
total 0
lrwxrwxrwx 1 nobody users 38 Mar 15 21:31 letsencrypt -> ../etc/letsencrypt/live/obfuscated-domain.co.uk

When I poke into the other problem dockers, they also all seem to have broken symlinks. So these 3 "problem" dockers all make use of relative symlinks in their docker volumes.

So my question is this: does the mover copy symlinks as links, or does it follow links?

It looks to me like maybe it is trying to follow links, and failing because the target has already been moved. This is backed up by the fact that the target that the symlink points to exists on the array (not that that really matters; a broken symlink should be copied as a broken symlink anyway, even if the target does not exist).

It has broken the dockers involved here (not that I really care, I can sort them out), but I am worried that the mover seems to be struggling with relative symlinks between the cache and the array.

If I use the Unbalance plugin, or use rsync myself, then the move works as I would expect.

Anyone else seeing this or anything like it?

Thanks in advance for your time and efforts. Diags attached for your grepping pleasure.

UPDATE: Exactly the same behaviour when using the mover to get back onto the cache: same 3 dockers, same issues with symlinks.

cosmos-diagnostics-20210319-0912.zip

Edited March 19, 2021 by wuftymerguftyguff: update with further details
ChatNoir Posted March 21, 2021

Now I am afraid that we got derailed from the initial questions of @wuftymerguftyguff about mover, symlinks and some dockers. I cannot help on this, but hopefully this will get the topic back on track and on top.
wuftymerguftyguff Posted March 21, 2021 Author

Well, as there were no replies, there wasn’t much to be distracted from!

At 6.9.1 does mover deal with symlinks in shares properly at all? Maybe it’s not just a docker thing?
John_M Posted March 23, 2021 (edited)

On 3/21/2021 at 5:37 PM, wuftymerguftyguff said:
Well, as there were no replies, there wasn’t much to be distracted from! At 6.9.1 does mover deal with symlinks in shares properly at all? Maybe it’s not just a docker thing?

It looks like your thread was hijacked. Yes, the mover treats symlinks as links. Remember though that in the case of your example

root@cosmos:/mnt/cache/appdata/swag/keys# ls -l
total 0
lrwxrwxrwx 1 nobody users 38 Mar 15 21:31 letsencrypt -> ../etc/letsencrypt/live/obfuscated-domain.co.uk

the object to which your symlink points has already been moved to the array, so it will not be seen at

/mnt/cache/appdata/swag/etc/letsencrypt/live/obfuscated-domain.co.uk

but at

/mnt/diskN/appdata/swag/etc/letsencrypt/live/obfuscated-domain.co.uk

where diskN is the array disk to which it was moved. If you

cd /mnt/user/appdata/swag/keys
ls -l

your symlink should no longer show as broken.

Edited March 23, 2021 by John_M: typo
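The point about relative symlinks can be tried out away from a live server. The following is only a minimal sketch: it uses a temporary directory with hypothetical `cache` and `disk1` subtrees as stand-ins for `/mnt/cache` and `/mnt/diskN`, a made-up `example-domain` target, and a helper `link_state` that is not part of Unraid. It does not reproduce the `/mnt/user` merged view, only the rule that a relative link is resolved from the directory the link itself sits in.

```shell
#!/bin/sh
# Hypothetical stand-ins for /mnt/cache and /mnt/diskN (assumption: plain
# directories in a temp dir, not real Unraid mounts).
demo_root=$(mktemp -d)
cache="$demo_root/cache/appdata/swag"
disk="$demo_root/disk1/appdata/swag"

# The link's directory stays on the "cache"; the target lives only on the "disk".
mkdir -p "$cache/keys" "$disk/etc/letsencrypt/live/example-domain"
ln -s ../etc/letsencrypt/live/example-domain "$cache/keys/letsencrypt"

# Print "ok" if the path resolves to something, "broken" otherwise.
# [ -e ] follows symlinks, so a dangling link reports "broken".
link_state() {
    if [ -e "$1" ]; then
        echo ok
    else
        echo broken
    fi
}

# Seen from the "cache" tree, the target is missing.
link_state "$cache/keys/letsencrypt"   # prints: broken

# Move the link itself (mv does not follow it) into the tree holding the target.
mkdir -p "$disk/keys"
mv "$cache/keys/letsencrypt" "$disk/keys/letsencrypt"
link_state "$disk/keys/letsencrypt"    # prints: ok

# Clean up when done: rm -rf "$demo_root"
```

The link's stored text never changes; only the directory it is resolved from does, which is why the same link can look broken under one mount point and fine under another.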
wuftymerguftyguff Posted March 23, 2021 Author

Hi,

Thanks for your attention. My problem is that I was trying to empty the cache. VMs and dockers were stopped, the mover should have moved everything, shouldn’t it? It should have moved the symlink as a symlink, shouldn’t it?

Or do I have a fundamental gap in my understanding here?
John_M Posted March 23, 2021

2 hours ago, wuftymerguftyguff said:
VMs and dockers were stopped, the mover should have moved everything shouldn’t it? It should have moved the symlink as a symlink shouldn’t it?

Yes it should and yes it should. You have file system corruption on Disk 1 though, and that's probably the cause of the problem. You need to stop the array, restart it in Maintenance mode and run a file system check on Disk 1.

Mar 18 21:36:58 cosmos kernel: XFS (md1): Metadata corruption detected at xfs_dinode_verify+0xa3/0x581 [xfs], inode 0x600df19f dinode
Mar 18 21:36:58 cosmos kernel: XFS (md1): Unmount and run xfs_repair
wuftymerguftyguff Posted March 23, 2021 Author

OK, thanks for the spot. I had certainly not noticed that.

My appdata was back on the cache (with my manual interventions). I restarted in Maintenance mode and ran

xfs_repair -v /dev/md0

This fixed some directory entries and completed. I restarted the array normally, then rebooted to clear out my logs.

I have now set about trying to recreate my problem by setting the appdata share to "Yes". The mover is running and is draining my cache to drive 3 by the look of things. I will update this thread when it completes.

Thanks again for your attention.
wuftymerguftyguff Posted March 24, 2021 Author

Update:

The move of the appdata share from the cache completed and left LOTS of symlinks and dirs behind in the cache. This is really starting to look like the mover is not behaving as I understand that it should. I now have the problem causing issues for 4 dockers (plex, letsencrypt, swag and binhex-krusader).

The ONLY things remaining in the cache for appdata after the mover has finished are broken symlinks and the directories containing them.

This is a count of broken links under appdata in the cache:

root@cosmos:/mnt/cache/appdata# find . -xtype l | wc -l
33442

This is a count of things that are NOT a directory or a broken link under appdata in the cache:

root@cosmos:/mnt/cache/appdata# find . ! -xtype l,d | wc -l
0

It looks like the mover is NOT treating these links as links and copying them as is. It seems to be trying to follow the links, and in the case where the target has already been moved this fails, and the links and structure remain on the source.

Updated diags attached.

As this appears to be a generic problem relating to the mover and symlinks, not anything to do with docker and appdata specifically, I will work on a simpler test case that we can use from now on. That will allow me to stop messing with my docker and VM workload.

cosmos-diagnostics-20210324-0946.zip
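The two counts above depend on GNU find's `-xtype` test. As a hedged alternative for systems where that is unavailable, here is a sketch that finds dangling links by asking `test -e` about each symlink. The function names `count_broken_links` and `count_regular_files` and the demo tree are made up for illustration. The second count is deliberately simpler than `! -xtype l,d`: it counts regular files only.

```shell
#!/bin/sh
# Count symlinks whose target cannot be resolved.
# test -e follows the link, so it fails for dangling links only.
# (Assumes no newlines in file names, since the count is line based.)
count_broken_links() {
    find "$1" -type l ! -exec test -e {} \; -print | wc -l | tr -d ' '
}

# Count regular files (symlinks and directories are not included).
count_regular_files() {
    find "$1" -type f | wc -l | tr -d ' '
}

# Hypothetical demo tree: one file, one valid link, one dangling link.
demo=$(mktemp -d)
mkdir -p "$demo/appdata/keys" "$demo/appdata/etc/live"
echo data > "$demo/appdata/etc/live/cert.pem"
ln -s ../etc/live "$demo/appdata/keys/good"
ln -s ../etc/gone "$demo/appdata/keys/bad"

echo "broken links:  $(count_broken_links "$demo/appdata")"
echo "regular files: $(count_regular_files "$demo/appdata")"

# Clean up when done: rm -rf "$demo"
```

Pointing `count_broken_links` at a real path such as the cache's appdata directory would give a figure comparable to the first count above, at the cost of one `test` process per symlink.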
John_M Posted March 24, 2021

You could enable Mover logging on the Settings -> Scheduler page, in the Mover Settings section. It's quite verbose so it's off by default. Run the Mover and grab new diagnostics.
trurl Posted March 26, 2021

On 3/23/2021 at 3:44 PM, John_M said:
It looks like your thread was hijacked.

hijack split
limetech Posted March 26, 2021

On 3/24/2021 at 2:52 AM, wuftymerguftyguff said:
Update: The move of the appdata share from the cache completed and left LOTS of symlinks and dirs behind in the cache. This is really starting to look like the mover is not behaving as I understand that it should. I now have the problem causing issues for 4 dockers (plex, letsencrypt, swag and binhex-krusader). The ONLY things remaining in the cache for appdata after the mover has finished are broken symlinks and the directories containing them.
This is a count of broken links under appdata in the cache:
root@cosmos:/mnt/cache/appdata# find . -xtype l | wc -l
33442
This is a count of things that are NOT a directory or a broken link under appdata in the cache:
root@cosmos:/mnt/cache/appdata# find . ! -xtype l,d | wc -l
0
It looks like the mover is NOT treating these links as links and copying them as is. It seems to be trying to follow the links, and in the case where the target has already been moved this fails, and the links and structure remain on the source. Updated diags attached. As this appears to be a generic problem relating to the mover and symlinks, not anything to do with docker and appdata specifically, I will work on a simpler test case that we can use from now on. That will allow me to stop messing with my docker and VM workload.
cosmos-diagnostics-20210324-0946.zip

The 'mover' does indeed move symlinks correctly; at least, I ran a test script and it worked. It will not move any files that are "in use", meaning opened by some process. I suggest you turn on mover logging (Settings/Scheduler) and see what shows up in the system log.
wuftymerguftyguff Posted March 27, 2021 Author

Hi,

Updated diagnostics attached (with mover logging in place).

I emptied my cache again by stopping all VMs and dockers and their respective services, getting out of all the user shares, setting appdata from Cache: Prefer to Cache: Yes and running the mover.

I had a look at the mover script. It seems to be using find in depth mode and then handing the results off to move, which is not a script, so I can't see what it is really doing (without work).

Once again, all that remains under appdata is broken symlinks. The dockers still work when they access things via the /mnt/user mount but fail otherwise.

If I copy things manually using rsync, the links move and everything is happy.

I think that as a test case you could try using a default install of the binhex-krusader docker. This one seems to have the problem every single time, whether moving from cache to array or back again.

Thanks again for your time

Jeff

cosmos-diagnostics-20210327-2335.zip
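For anyone wanting a smaller test case than a full docker install, the rsync observation can be checked in isolation. This is a sketch under stated assumptions, not a reproduction of the mover: it copies a throwaway tree with `rsync -a` (archive mode, which copies symlinks as symlinks) and falls back to `cp -RP` if rsync is not installed. The directory layout and the `copy_tree` helper are hypothetical.

```shell
#!/bin/sh
# Throwaway source and destination trees (hypothetical layout).
src=$(mktemp -d)
dst=$(mktemp -d)
mkdir -p "$src/share/etc/live" "$src/share/keys"
echo data > "$src/share/etc/live/cert.pem"
ln -s ../etc/live "$src/share/keys/live"          # valid relative link
ln -s ../etc/missing "$src/share/keys/dangling"   # deliberately broken link

# Copy the contents of $1 into $2 without following symlinks.
copy_tree() {
    if command -v rsync >/dev/null 2>&1; then
        rsync -a "$1/" "$2/"      # -a includes -l: keep symlinks as symlinks
    else
        cp -RP "$1/." "$2/"       # fallback: -P means never follow symlinks
    fi
}

copy_tree "$src/share" "$dst"

# Both links should arrive as links with their original relative targets.
readlink "$dst/keys/live"       # prints: ../etc/live
readlink "$dst/keys/dangling"   # prints: ../etc/missing

# Clean up when done: rm -rf "$src" "$dst"
```

The dangling link is copied as a dangling link, which is the behaviour the first post argued the mover should also show, whether or not the target exists at the destination.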