Cessquill Posted December 6, 2017

Hi - is there a known phenomenon where a cache drive might report being full one minute and 160 GB free the next? Specifically when writing a lot of small files?

I have been restoring my appdata folder, which contains a lot of small Plex files (about 1,000,000). My appdata folder totals about 20 GB. I was initially trying to do this manually from a Windows box (since it was only the Plex docker I needed to repair), but copying the appdata files from there kept returning a "drive full" message every couple of minutes. A few seconds later, there was plenty of space free. In the end I restored from the CA Backup plugin, which seemed to work, but it generated a couple of the same errors.

I'm running the latest RC and have a Crucial 250 GB SSD cache; I can look into diagnostics when I get back if it's not a schoolboy error. I suspect it's a known condition arising from a perfect storm of actions, since there's no way a drive can physically fill up and empty so quickly.
BRiT Posted December 6, 2017

If it's btrfs, then it's possibly due to housekeeping items. Look up btrfs balance.
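For reference, a hedged sketch of what that housekeeping looks like - the mount point and the usage threshold below are examples, not taken from this system. `btrfs filesystem usage` shows how much space is sitting in allocated-but-underused chunks, and a filtered balance compacts them:

```shell
# Example commands only - run against your own pool's mount point.
btrfs filesystem usage /mnt/cache        # allocated vs. actually-used space

# Rewrite data chunks that are less than 75% full, returning the slack to
# unallocated space. The 75 threshold is an arbitrary example value.
btrfs balance start -dusage=75 /mnt/cache

btrfs balance status /mnt/cache          # progress of a running balance
```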
Cessquill Posted December 7, 2017 Author

20 hours ago, BRiT said: "If its btrfs then its possible it might be due to housekeeping items. Lookup btrfs balance."

Ahh - it's BTRFS, yes. Good call... it appears that although the size is 250 GB in the description, it's 75.5 GB in the report.
JorgeB Posted December 7, 2017

That looks wrong; post your diagnostics.
Cessquill Posted December 7, 2017 Author

Thanks - see attached. I'm guessing there are going to be a lot of docker errors due to no space, but we'll see. Downloaded via mobile connected over VPN, so I hope it's OK.

I was thinking of switching to XFS after doing a bit of reading. I recently changed to a Supermicro 846 chassis with an expander backplane, but the cache is still on the same SATA port on the mobo.

unraid1-diagnostics-20171207-0942.zip

Edited April 8 by Cessquill: removed diagnostics file for safety
JorgeB Posted December 7, 2017

The diagnostics zip is empty.
Cessquill Posted December 7, 2017 Author

D'oh! Second try. There will also be UPS errors in the log - the machine's out of the rack at the moment. Also, all dockers have disappeared. Currently backing up the cache drive to the array.

unraid1-diagnostics-20171207-1311.zip
JorgeB Posted December 7, 2017

This happened before on array disks - first time I've seen it on the cache disk - but this should fix it: copy a large file locally from a disk to cache, e.g. use midnight commander (mc) and copy a large (5 GB or so) file from /mnt/disk1 to /mnt/cache. After that, check; the GUI should start displaying both the correct size and free space.
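The same fix from the command line, as a sketch - the file name is a placeholder, and any single multi-gigabyte file will do. The idea is that one big local write makes btrfs allocate fresh space, which tends to snap the reported free space back to reality:

```shell
# Placeholder file name - substitute any ~5 GB file on an array disk.
cp /mnt/disk1/some-large-file.bin /mnt/cache/

df -h /mnt/cache                       # free space should now read correctly
rm /mnt/cache/some-large-file.bin      # the copy was only needed temporarily
```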
Cessquill Posted December 7, 2017 Author

Thank you - I'll try that. mc is currently busy copying stuff off there; once it's done I'll go the other way. I suspect it's because of the lots of very small files (copying Plex appdata).

I did try clicking the cache drive's balance button, but nothing seemed to happen - going to read up on it later. Is this likely to be a condition of BTRFS? I'm quite happy to switch to XFS if it's preferred for a single cache.
JorgeB Posted December 7, 2017

That issue is not balance related; it's just space being incorrectly reported, and it will only affect Samba transfers, i.e. locally you'll have no issues. The problem comes from here:

Filesystem      Size  Used Avail Use% Mounted on
/dev/sdb1       233G   71G     0 100% /mnt/cache

As you can see, the total size is correctly reported here, but not the free space - btrfs has some quirks, including free space reporting sometimes. If you don't plan to use a pool and don't care about snapshots and/or checksums, you might as well use xfs.
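A quick way to watch the same numbers the GUI and Samba rely on - the helper name is made up for illustration, and `/` stands in for the real mount point:

```shell
# Print the size/used/avail/use% that df reports for a given mount point -
# the same "Avail" column that btrfs is mis-reporting as 0 above.
check_free() {
    df -h --output=size,used,avail,pcent "$1" | tail -n 1
}

check_free /    # on the server this would be /mnt/cache
```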
Cessquill Posted December 7, 2017 Author

Starting to make sense now, thank you. It initially worked, and the drive reported the correct size again. A few minutes later it broke again. Then fine, then broken, then fine again, and finally broken - all within the space of about 10 minutes.

I started in maintenance mode, and running a check found a few issues, which I repaired. I'm now in the position where I'm trying to delete something from the cache drive in order to free up space to copy a file back onto it. mc gives me the message...

Cannot delete file "...test file..."
No space left on device (28)

I get the same when I try to overwrite the file on that drive. Heading home shortly, so I'll be able to connect to it properly.
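For what it's worth, a commonly cited workaround for btrfs refusing to delete files when full is to truncate the file to zero length first: the truncate releases the file's data extents without needing the new metadata allocation that a plain rm can trip over. A minimal sketch, with a throwaway temp file standing in for the stuck file on /mnt/cache:

```shell
# Demonstration with a temp file; on the server the target would be the
# file on /mnt/cache that rm refuses to remove.
f=$(mktemp)
echo "test data" > "$f"

truncate -s 0 "$f"   # release the file's data extents first
rm "$f"              # the unlink itself now needs almost no new space
```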
JorgeB Posted December 7, 2017

btrfs fsck should really only be used as a last resort; post current diags.
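Since btrfs check came up, a hedged reference for the safer diagnostic steps first - the mount point and device below are the ones from this thread, so verify them against your own system before running anything:

```shell
# Verify checksums on the mounted pool - non-destructive.
btrfs scrub start /mnt/cache
btrfs scrub status /mnt/cache

# Read-only check with the array in maintenance mode - reports problems
# without writing anything to the device.
btrfs check --readonly /dev/sdb1

# 'btrfs check --repair' is the true last resort the reply above warns about.
```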
Cessquill Posted December 7, 2017 Author

Thank you!

unraid1-diagnostics-20171207-1811.zip
JorgeB Posted December 7, 2017

You have a stale browser window open spamming the log with wrong csrf_token errors. Please close it, reboot, try to delete a file so you get the error, and then grab new diags.
Cessquill Posted December 7, 2017 Author

Really sorry about that - I'd left the laptop on at home with the browser open. Rebooted (which, although clean, triggered a parity check) and got the new file...

unraid1-diagnostics-20171207-2143.zip
JorgeB Posted December 7, 2017

The only thing wrong I see for now in the log is a corrupt docker image; that's an easy fix - delete and re-create it.

The SSD is still reporting the wrong free space. Did you try the local copy using mc, and are you still having issues deleting cache files?
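A hedged sketch of the delete-and-recreate - the image path is the common Unraid default and is an assumption here, so check Settings -> Docker for the actual location; the whole procedure is normally done from that page rather than the shell:

```shell
/etc/rc.d/rc.docker stop       # stop the Docker service first
rm /mnt/cache/docker.img       # assumed path - confirm yours in Settings -> Docker
# Then re-enable Docker from Settings -> Docker; Unraid builds a fresh
# docker.img at the configured size, and your templates re-add the containers.
```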
Cessquill Posted December 7, 2017 Author

I can recreate the docker.img file (well, I can when it'll let me). Unfortunately I can't copy to or delete from the cache drive, and I think that's also preventing me from ditching the img file. Also, if I go to the docker tab, I get:

Warning: stream_socket_client(): unable to connect to unix:///var/run/docker.sock (Connection refused) in /usr/local/emhttp/plugins/dynamix.docker.manager/include/DockerClient.php on line 672
Couldn't create socket: [111] Connection refused
Warning: Invalid argument supplied for foreach() in /usr/local/emhttp/plugins/dynamix.docker.manager/include/DockerClient.php on line 852
(the same pair of warnings then repeats)

(which I assume is related)
JorgeB Posted December 7, 2017

Not being able to copy to cache from Windows is expected because of the wrong free space being reported; not being able to delete files is not. Can you post new diags (or just the syslog, if you prefer) after attempting to e.g. delete the docker image?
Cessquill Posted December 7, 2017 Author

The copying/deleting was all from within mc - I'm just using Windows to PuTTY onto it. Just rebooted; trying to copy, delete, and use the docker settings to clear the img file... (thanks for your time with this)

unraid1-diagnostics-20171207-2318.zip
JorgeB Posted December 7, 2017

I still don't see anything wrong in the logs except a corrupt docker image, but if you can't delete it, it's probably best to just back up anything important still on cache and re-format it.
Cessquill Posted December 8, 2017 Author

Thanks for your time - I'd done quite a bit of reading and had a few ideas, but ultimately, yes, it's much easier to wipe and start again. I'm guessing it's something to do with metadata and the fact that a lot of small files had been copied on there (but that's about where my knowledge drops off).

Stopped the array, changed the file system, started the array, formatted the cache, restored appdata, and copied the other files back onto the drive. Everything started back up straight away - even the docker.img file was fine once there was space. When I get some time over the weekend, I'll clear the img file and reinstall, but it's sorted for now. Thanks!
pwm Posted December 8, 2017

35 minutes ago, Cessquill said: "All started back straight away - even the docker.img file was fine once there was space. When I get some time over the weekend, I'll clear the img file and reinstall, but it's sorted for now."

No reason to fix what isn't broken. BTRFS with copy-on-write means any attempt to modify an existing file will fail its writes if BTRFS doesn't have space to create new instances of the changed data blocks. So even a perfectly fine docker.img is expected to fail while BTRFS is full - whereas modifications to existing files would succeed on a file system that doesn't use CoW.