Cache Pool Corruption

April 19, 2023

Hi All,

I've run into a bit of an issue that I think is stemming from some corruption errors in my cache.

Initial symptom was that the cache wasn't being emptied when mover was running. I followed this guide to get everything off of the cache: Backing up the Pool to the Array

During the move it looked as though the pool still wasn't emptying. In fact it was emptying, it just wasn't giving me more free space. Please see the 2 attached screenshots, taken a few hours apart. You'll note that I have 2 x 1 TB drives in a BTRFS RAID 1. Initially 418 GB was used. Later 284 GB was used, but the free space had only gone up by about 10 GB. Both drives are identical in size, so there are no funny calculations on the BTRFS side.

After some sleuthing I found this thread, BTRFS Pool Too Many Profiles, because I got the same error message it mentions. I've also now run the dev stats command and got this:

Quote

root@Radon:~# btrfs dev stats /mnt/cache
[/dev/nvme0n1p1].write_io_errs 0
[/dev/nvme0n1p1].read_io_errs 0
[/dev/nvme0n1p1].flush_io_errs 0
[/dev/nvme0n1p1].corruption_errs 862164
[/dev/nvme0n1p1].generation_errs 0
[/dev/sdc1].write_io_errs 0
[/dev/sdc1].read_io_errs 0
[/dev/sdc1].flush_io_errs 0
[/dev/sdc1].corruption_errs 0
[/dev/sdc1].generation_errs 0
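
For anyone who wants to check this output automatically, here is a minimal Python sketch that parses text in the format shown above and lists only the devices with non-zero counters. The function names and the idea of pasting the output into a string are my own assumptions for illustration, not part of btrfs or Unraid.

```python
import re

# Output copied from the post above.
SAMPLE = """\
[/dev/nvme0n1p1].write_io_errs 0
[/dev/nvme0n1p1].read_io_errs 0
[/dev/nvme0n1p1].flush_io_errs 0
[/dev/nvme0n1p1].corruption_errs 862164
[/dev/nvme0n1p1].generation_errs 0
[/dev/sdc1].write_io_errs 0
[/dev/sdc1].read_io_errs 0
[/dev/sdc1].flush_io_errs 0
[/dev/sdc1].corruption_errs 0
[/dev/sdc1].generation_errs 0
"""

# Matches lines such as "[/dev/sdc1].read_io_errs 0".
LINE = re.compile(r"^\[(?P<dev>[^\]]+)\]\.(?P<counter>\w+)\s+(?P<value>\d+)$")


def parse_dev_stats(text):
    """Return {device: {counter: value}} for every line that matches."""
    stats = {}
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if m:
            stats.setdefault(m["dev"], {})[m["counter"]] = int(m["value"])
    return stats


def devices_with_errors(stats):
    """Keep only devices with at least one non-zero counter."""
    return {
        dev: {name: count for name, count in counters.items() if count}
        for dev, counters in stats.items()
        if any(counters.values())
    }


if __name__ == "__main__":
    for dev, counters in devices_with_errors(parse_dev_stats(SAMPLE)).items():
        print(dev, counters)
```

With the output above, only the NVMe partition is reported, and only for corruption_errs.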

I had some problems a while ago with bad RAM corrupting my cache. The sdc drive was added after that corruption was solved (or so I thought).

So my question is: if I were to remove the NVMe drive, zero it, and then re-add it, would that solve my issue? Or is there more at play here that I haven't yet found? Do I need to do anything to be able to safely remove the NVMe drive?

Results of the pool balance are below for reference (I'm not 100% sure how to tell whether it is balanced and whether I can remove a drive):

Quote

Data, single: total=867.49GiB, used=445.59GiB
Data, RAID1: total=60.97GiB, used=40.22GiB
System, single: total=4.00MiB, used=80.00KiB
System, RAID1: total=32.00MiB, used=64.00KiB
Metadata, single: total=2.01GiB, used=428.72MiB
Metadata, RAID1: total=1.00GiB, used=152.41MiB
GlobalReserve, single: total=512.00MiB, used=0.00B
No balance found on '/mnt/cache'
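
One thing this output does show is that Data, System and Metadata each appear with two profiles (single and RAID1), so the pool is not fully RAID1. That appears to be what the "too many profiles" message refers to. A rough Python sketch for spotting this in the output is below. The function names are hypothetical, and it only reads the text format shown above.

```python
import re
from collections import defaultdict

# Allocation lines copied from the post above.
SAMPLE = """\
Data, single: total=867.49GiB, used=445.59GiB
Data, RAID1: total=60.97GiB, used=40.22GiB
System, single: total=4.00MiB, used=80.00KiB
System, RAID1: total=32.00MiB, used=64.00KiB
Metadata, single: total=2.01GiB, used=428.72MiB
Metadata, RAID1: total=1.00GiB, used=152.41MiB
GlobalReserve, single: total=512.00MiB, used=0.00B
No balance found on '/mnt/cache'
"""


def profiles_by_type(text):
    """Map each block group type (Data, Metadata, ...) to its profiles."""
    profiles = defaultdict(list)
    for line in text.splitlines():
        m = re.match(r"^(\w+), (\w+): total=", line.strip())
        if m:
            profiles[m.group(1)].append(m.group(2))
    return dict(profiles)


def mixed_types(profiles):
    """Return the block group types that use more than one profile."""
    return sorted(t for t, p in profiles.items() if len(set(p)) > 1)


if __name__ == "__main__":
    print(mixed_types(profiles_by_type(SAMPLE)))
```

Here most of the data (867.49GiB allocated) sits in the single profile, so it exists on only one device. That is worth knowing before pulling a drive.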

radon-diagnostics-20230419-1456.zip


April 19, 2023

Community Expert
Solution

With all that corruption, the best way forward would be to re-format the pool. Is all the data you want, or can recover, already backed up?


April 19, 2023

Author

No, unfortunately not. My Docker image is still there, along with some Nextcloud and swag data. How do I force it to move off? I have a spare SATA SSD and SATA port that I can use, if that helps.


April 19, 2023

Community Expert

The Docker image can easily be recreated. For the rest, you can use btrfs restore to restore, ignoring the corrupt files. Of course the files will still be corrupted, so some might not work correctly afterwards.


April 19, 2023

Author

Hey JorgeB,

Thanks heaps for the super quick replies yesterday. I've been trying to do the restore like you said, but I've run into some problems. I've tried multiple commands; these are the errors I'm getting:

Quote

root@Radon:/mnt# btrfs restore -v /dev/sdc1 /mnt/disk1/restore
ERROR: /dev/sdc1 is currently mounted, cannot continue
root@Radon:/mnt# btrfs restore -v /dev/cache /mnt/disk1/restore
ERROR: mount check: cannot open /dev/cache: No such file or directory
ERROR: could not check mount status: No such file or directory
root@Radon:/mnt# btrfs restore -vi /dev/sdc1 /mnt/disk1/restore
ERROR: /dev/sdc1 is currently mounted, cannot continue

root@Radon:/mnt# btrfs restore -vi /dev/nvme0n11 /mnt/disk1/restore
ERROR: mount check: cannot open /dev/nvme0n11: No such file or directory
ERROR: could not check mount status: No such file or directory
root@Radon:/mnt# btrfs restore -vi /dev/nvme0n1 /mnt/disk1/restore
No valid Btrfs found on /dev/nvme0n1
Could not open root, trying backup super
No valid Btrfs found on /dev/nvme0n1
Could not open root, trying backup super
No valid Btrfs found on /dev/nvme0n1
Could not open root, trying backup super

So it won't work if the array is started, it won't work if the array is stopped, and it also won't work if the array is in maintenance mode. I am super confused as to what I need to do here ...


April 20, 2023

Community Expert

9 hours ago, Rattus said:

root@Radon:/mnt# btrfs restore -v /dev/sdc1 /mnt/disk1/restore
ERROR: /dev/sdc1 is currently mounted, cannot continue
root@Radon:/mnt# btrfs restore -v /dev/cache /mnt/disk1/restore
ERROR: mount check: cannot open /dev/cache: No such file or directory
ERROR: could not check mount status: No such file or directory
root@Radon:/mnt# btrfs restore -vi /dev/sdc1 /mnt/disk1/restore
ERROR: /dev/sdc1 is currently mounted, cannot continue

It must be unmounted.

9 hours ago, Rattus said:

root@Radon:/mnt# btrfs restore -vi /dev/nvme0n11 /mnt/disk1/restore
ERROR: mount check: cannot open /dev/nvme0n11: No such file or directory
ERROR: could not check mount status: No such file or directory
root@Radon:/mnt# btrfs restore -vi /dev/nvme0n1 /mnt/disk1/restore
No valid Btrfs found on /dev/nvme0n1
Could not open root, trying backup super
No valid Btrfs found on /dev/nvme0n1
Could not open root, trying backup super
No valid Btrfs found on /dev/nvme0n1
Could not open root, trying backup super

Wrong device, should be /dev/nvme0n1p1
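
The two points above (the filesystem must be unmounted, and the argument must be the partition rather than the whole disk) can be checked before running btrfs restore. Below is a minimal Python sketch that looks a device up in text formatted like /proc/mounts. The sample mount table and the function name are made up for illustration. Note that for a multi-device pool the mount table normally names only one member device, so an empty result for another member does not prove the pool is unmounted.

```python
# Illustrative mount table in /proc/mounts format (not taken from the thread).
SAMPLE_MOUNTS = """\
/dev/sdc1 /mnt/cache btrfs rw,noatime 0 0
/dev/md1 /mnt/disk1 xfs rw,noatime 0 0
"""


def mount_points(device, mounts_text):
    """Return every mount point listed for `device` in the given text."""
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Field 0 is the device, field 1 is the mount point.
        if len(fields) >= 2 and fields[0] == device:
            points.append(fields[1])
    return points


if __name__ == "__main__":
    # On a live system, read the real table instead of the sample:
    #   with open("/proc/mounts") as f: text = f.read()
    print(mount_points("/dev/sdc1", SAMPLE_MOUNTS))
```

If the device shows a mount point, stop the array (or otherwise unmount the pool) before attempting the restore.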


April 21, 2023

Author

Hello Again JorgeB and all,

So it's been a process, but:

- BTRFS restore moved all of the leftover files from the cache to disk 1, under /mnt/disk1/restore

- I've now got the pool back up, Pool is formatted to BTRFS and showing all available storage (see screenshot attached)

I followed this method from Squid with some tweaks to reformat the pool. I had to remove the SATA Drive, reduce the pool size to one, then format as per Squid, then re-add the drive.

Now the dev stats command shows no errors:

Quote

[/dev/nvme0n1p1].write_io_errs 0
[/dev/nvme0n1p1].read_io_errs 0
[/dev/nvme0n1p1].flush_io_errs 0
[/dev/nvme0n1p1].corruption_errs 0
[/dev/nvme0n1p1].generation_errs 0
[/dev/sdc1].write_io_errs 0
[/dev/sdc1].read_io_errs 0
[/dev/sdc1].flush_io_errs 0
[/dev/sdc1].corruption_errs 0
[/dev/sdc1].generation_errs 0

My question now is, what is the safest way to move the files from the restore folder I created back to the drive so that they are recognised as never actually leaving the cache? Is that even what I am supposed to do here?

Is it just the BTRFS Restore in reverse?


April 21, 2023

Community Expert

You can just copy the files to their original locations, but as mentioned, some files will still be corrupt, and that can cause issues down the line.
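
As a rough illustration of "copy the files to their original locations", here is a small Python sketch using shutil.copytree. It assumes the restore folder mirrors the pool's top-level layout, which the thread does not confirm, and the paths in the usage comment are only examples. It does nothing to repair files that were already corrupt.

```python
import shutil
from pathlib import Path


def copy_back(restore_dir, pool_dir):
    """Copy the tree under restore_dir into pool_dir, keeping relative paths.

    Existing files with the same name in pool_dir are overwritten.
    Returns the relative paths of the files that were in restore_dir.
    """
    restore_dir, pool_dir = Path(restore_dir), Path(pool_dir)
    # dirs_exist_ok (Python 3.8+) lets the copy merge into existing folders.
    shutil.copytree(restore_dir, pool_dir, dirs_exist_ok=True)
    return sorted(
        p.relative_to(restore_dir).as_posix()
        for p in restore_dir.rglob("*")
        if p.is_file()
    )


if __name__ == "__main__":
    # Example paths only; adjust to the actual restore folder and pool mount.
    for name in copy_back("/mnt/disk1/restore", "/mnt/cache"):
        print("copied", name)
```

Stopping Docker and any services that use these folders before copying is probably wise, so that nothing writes to them mid-copy.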


Solved by JorgeB
