drumstyx Posted December 30, 2022 (edited)

Long story short: I upgraded to 3 Gbps fiber internet and accidentally filled my 2 TB cache pool (dual 2 TB NVMe drives) with the automated download processes. The filesystem went read-only, so I rebooted, and of course it came back read-only again. On top of this, a disk in my array happened to fail a couple of days ago and I didn't notice, so I'm currently rebuilding onto a warm spare I had in there to replace it.

The problem is that I now can't run mover, and even after copying data off manually, I can't delete anything to free up space. The global reserve is 512 MiB and reports 0 used, but the filesystem still isn't happy. The rebuild means I can't go into maintenance mode right now. I'll wait it out if I have to, but it'd be great if I could delete even a single file from the cache to free things up a bit. Scrub doesn't work (read-only), balancing doesn't work (read-only)... it seems like I'm stuck?

It's frustrating because I went with dual btrfs drives on the cache specifically to prevent issues like this -- and the suggestions I see on here boil down to "best bet is to back up the cache, reformat, and restore."

Edited December 30, 2022 by drumstyx
JorgeB Posted December 30, 2022

Please post the diagnostics.

30 minutes ago, drumstyx said: "because I went dual btrfs on the cache to prevent issues like this"

raid1 protects against a device failure, not against filesystem issues, since both copies are affected at the same time.
drumstyx Posted December 30, 2022 (Author)

Right, but why is there a filesystem issue just because it's "full"? Isn't the global reserve supposed to prevent exactly this? I admit I didn't have minimum free space configured on the cache, so that part is my fault, but filling a filesystem shouldn't result in an unrecoverable read-only state. If I accept the risk of some corrupted data in whatever was last written, is there any way to just say "to hell with integrity, delete this file"?
JorgeB Posted December 30, 2022

If the filesystem went read-only, there's a filesystem issue; the diagnostics might show what happened.
drumstyx Posted December 30, 2022 (Author)

Fair enough -- I've attached the diagnostics, though I'm not quite sure where in them to look for this issue. As far as I can tell, there have been no dropouts or errors on the cache other than failures to write due to insufficient space.

datnas-diagnostics-20221230-1334.zip
JorgeB Posted December 31, 2022 (Solution)

One of your NVMe devices is slightly larger than the other. That can cause issues with a full filesystem: btrfs still sees free space on the larger device, but because the pool is raid1 it cannot mirror new writes to the other one. Try disabling all services that use the pool (VMs, Docker containers) before array start, then start the array; without any new writes, the filesystem might not go immediately read-only.
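For anyone hitting the same wall, the mismatch described above can be modeled simply: in a two-device btrfs raid1 pool, every chunk must be mirrored on both devices, so usable data capacity is bounded by the smaller device, and the size difference shows up as "free" space that can never actually be allocated. A minimal sketch of that accounting (a simplified two-device model with hypothetical drive sizes; real btrfs also reserves space for metadata and system chunks):

```python
def raid1_usable_bytes(dev_a: int, dev_b: int) -> int:
    """Usable data capacity of a two-device btrfs raid1 pool.

    Every chunk is mirrored on both devices, so capacity is
    bounded by the smaller device (simplified model).
    """
    return min(dev_a, dev_b)


def stranded_bytes(dev_a: int, dev_b: int) -> int:
    """Space on the larger device that raid1 can never mirror."""
    return abs(dev_a - dev_b)


# Hypothetical sizes: two "2 TB" NVMe drives from different vendors
# that differ slightly in raw capacity.
a = 2_000_398_934_016
b = 2_048_408_248_320
print(raid1_usable_bytes(a, b))  # bounded by the smaller drive
print(stranded_bytes(a, b))      # reported as free, but unusable for raid1
```

This is why the pool can look like it has free space in `btrfs filesystem usage` output while every mirrored write still fails with ENOSPC.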
drumstyx Posted December 31, 2022 (Author)

Hooray! You did it! What a weird problem, lol.

Now my first-world problem is that my internet is actually faster than my SATA drives (not to mention the disk shelf they're in is a DS4246, so even turbo write peaks at about 85 MB/s), so I can't rely on the mover to keep up with data intake. Maybe I can throttle nzbget once cache usage passes a certain threshold...
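That throttling idea can be sketched as a small script: poll cache usage and push a speed limit to NZBGet. The usage thresholds below are a hypothetical policy invented for illustration, and the endpoint URL, port, and credentials are assumptions you'd adapt to your own setup; NZBGet's JSON-RPC API does expose a `rate` method that takes the download limit in KB/s (0 meaning unlimited), but double-check against your installed version's API docs.

```python
import shutil


def choose_rate_kbps(used_fraction: float) -> int:
    """Map cache usage to an NZBGet speed limit in KB/s.

    Hypothetical policy: full speed below 70% usage, then
    progressively harder throttling as the cache fills.
    0 means 'unlimited' in NZBGet's rate() convention.
    """
    if used_fraction < 0.70:
        return 0            # unlimited
    if used_fraction < 0.85:
        return 100_000      # ~100 MB/s, roughly what the array absorbs
    if used_fraction < 0.95:
        return 20_000       # back way off while mover catches up
    return 1_000            # nearly paused


def cache_used_fraction(path: str = "/mnt/cache") -> float:
    """Fraction of the cache filesystem currently in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


if __name__ == "__main__":
    # Assumed NZBGet JSON-RPC endpoint; adjust host, port, and
    # credentials for your install (tegbzn6789 is NZBGet's default
    # control password and should be changed anyway).
    import json
    import urllib.request

    limit = choose_rate_kbps(cache_used_fraction())
    req = urllib.request.Request(
        "http://nzbget:tegbzn6789@localhost:6789/jsonrpc",
        data=json.dumps({"method": "rate", "params": [limit]}).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)
```

Run it from cron (or the User Scripts plugin) every minute or so, and the download rate steps down before the pool fills instead of after.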