Cache Pool Corruption


Solved by JorgeB


Hi All,

 

I've run into an issue that I think stems from corruption errors in my cache pool.

 

The initial symptom was that the cache wasn't being emptied when mover was running. I followed this guide to get everything off the cache: Backing up the Pool to the Array

 

During the move, I thought the pool still wasn't emptying. In fact it was emptying; it just wasn't giving me more free space. Please see the two attached screenshots, taken a few hours apart. Note that I have 2 x 1 TB drives in a BTRFS RAID1. Initially 418 GB was used. Later 284 GB was used, but the free space had only gone up by about 10 GB. Both drives are identical in size, so there are no funny calculations on the BTRFS side.

 

After some sleuthing I found this thread, BTRFS Pool Too Many Profiles, because I got the same error message it mentions. I've also now run the dev stats command and got this:

 

Quote

root@Radon:~# btrfs dev stats /mnt/cache
[/dev/nvme0n1p1].write_io_errs    0
[/dev/nvme0n1p1].read_io_errs     0
[/dev/nvme0n1p1].flush_io_errs    0
[/dev/nvme0n1p1].corruption_errs  862164
[/dev/nvme0n1p1].generation_errs  0
[/dev/sdc1].write_io_errs    0
[/dev/sdc1].read_io_errs     0
[/dev/sdc1].flush_io_errs    0
[/dev/sdc1].corruption_errs  0
[/dev/sdc1].generation_errs  0
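
For anyone who wants to check these counters from a script, here is a minimal Python sketch that parses `btrfs dev stats` output in the format shown above and lists the devices with non-zero counters. The function names `parse_dev_stats` and `devices_with_errors` are hypothetical, and the sample text is copied from the output above rather than read from a live system.

```python
import re
from collections import defaultdict

# Matches lines such as "[/dev/nvme0n1p1].corruption_errs  862164"
STAT_LINE = re.compile(r"^\[(?P<dev>[^\]]+)\]\.(?P<name>\w+)\s+(?P<value>\d+)$")


def parse_dev_stats(text):
    """Return {device: {counter_name: value}} from `btrfs dev stats` text."""
    stats = defaultdict(dict)
    for line in text.splitlines():
        m = STAT_LINE.match(line.strip())
        if m:
            stats[m["dev"]][m["name"]] = int(m["value"])
    return dict(stats)


def devices_with_errors(stats):
    """Keep only devices that have at least one non-zero counter."""
    return {
        dev: {name: value for name, value in counters.items() if value}
        for dev, counters in stats.items()
        if any(counters.values())
    }


# Sample copied from the post above (not read from a real pool).
SAMPLE = """\
[/dev/nvme0n1p1].write_io_errs    0
[/dev/nvme0n1p1].read_io_errs     0
[/dev/nvme0n1p1].flush_io_errs    0
[/dev/nvme0n1p1].corruption_errs  862164
[/dev/nvme0n1p1].generation_errs  0
[/dev/sdc1].write_io_errs    0
[/dev/sdc1].read_io_errs     0
[/dev/sdc1].flush_io_errs    0
[/dev/sdc1].corruption_errs  0
[/dev/sdc1].generation_errs  0
"""

if __name__ == "__main__":
    print(devices_with_errors(parse_dev_stats(SAMPLE)))
```

On the sample, only the NVMe partition is reported, with its `corruption_errs` counter. These counters are cumulative, so a non-zero value may reflect errors from an earlier incident rather than a current fault.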

 

I had some problems a while ago with bad RAM corrupting my cache. The sdc drive was added after this corruption was solved (or so I thought).

 

So my question is: if I were to remove the NVMe drive, zero it, and then re-add it, would that solve my issue? Or is there more at play here that I haven't yet found? Do I need to do anything to be able to safely remove the NVMe drive?

 

Results of the pool balance are below for reference (I'm not 100% sure how to tell whether the pool is balanced and whether I can safely remove a drive):

 

Quote

Data, single: total=867.49GiB, used=445.59GiB

Data, RAID1: total=60.97GiB, used=40.22GiB

System, single: total=4.00MiB, used=80.00KiB

System, RAID1: total=32.00MiB, used=64.00KiB

Metadata, single: total=2.01GiB, used=428.72MiB

Metadata, RAID1: total=1.00GiB, used=152.41MiB

GlobalReserve, single: total=512.00MiB, used=0.00B

 

No balance found on '/mnt/cache'
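
One way to read the listing above: Data, System, and Metadata each appear twice, once as `single` and once as `RAID1`, which is consistent with the "too many profiles" message. A pool fully converted to RAID1 would be expected to show one profile per type (GlobalReserve is always reported as `single`). The following is a minimal Python sketch that flags this condition. The function names `profiles_by_type` and `mixed_profiles` are hypothetical, and the sample is copied from the output above.

```python
import re
from collections import defaultdict

# Matches lines such as "Data, RAID1: total=60.97GiB, used=40.22GiB"
PROFILE_LINE = re.compile(
    r"^(?P<kind>\w+), (?P<profile>\w+): total=(?P<total>[^,\s]+), used=(?P<used>\S+)$"
)


def profiles_by_type(text):
    """Return {block_group_type: [profile, ...]} in the order listed."""
    result = defaultdict(list)
    for line in text.splitlines():
        m = PROFILE_LINE.match(line.strip())
        if m:
            result[m["kind"]].append(m["profile"])
    return dict(result)


def mixed_profiles(profiles):
    """Keep only block-group types that are stored under more than one profile."""
    return {kind: names for kind, names in profiles.items() if len(set(names)) > 1}


# Sample copied from the post above (not read from a real pool).
SAMPLE = """\
Data, single: total=867.49GiB, used=445.59GiB
Data, RAID1: total=60.97GiB, used=40.22GiB
System, single: total=4.00MiB, used=80.00KiB
System, RAID1: total=32.00MiB, used=64.00KiB
Metadata, single: total=2.01GiB, used=428.72MiB
Metadata, RAID1: total=1.00GiB, used=152.41MiB
GlobalReserve, single: total=512.00MiB, used=0.00B
"""

if __name__ == "__main__":
    for kind, names in mixed_profiles(profiles_by_type(SAMPLE)).items():
        print(f"{kind}: {', '.join(names)}")
```

In the sample, most of the data sits in `single` chunks, meaning it has only one copy, so removing a drive in this state would not be safe for that data.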

 

 

Screenshot 2023-04-19 at 07.39.43.png

Screenshot 2023-04-19 at 14.15.26.png

radon-diagnostics-20230419-1456.zip


Hey JorgeB,

Thanks heaps for the super quick replies yesterday. I've been trying to do the restore like you said, but I've run into some problems. I've tried multiple commands; these are the errors I'm getting:

Quote

root@Radon:/mnt# btrfs restore -v /dev/sdc1 /mnt/disk1/restore
ERROR: /dev/sdc1 is currently mounted, cannot continue
root@Radon:/mnt# btrfs restore -v /dev/cache /mnt/disk1/restore
ERROR: mount check: cannot open /dev/cache: No such file or directory
ERROR: could not check mount status: No such file or directory
root@Radon:/mnt# btrfs restore -vi /dev/sdc1 /mnt/disk1/restore
ERROR: /dev/sdc1 is currently mounted, cannot continue

root@Radon:/mnt# btrfs restore -vi /dev/nvme0n11 /mnt/disk1/restore
ERROR: mount check: cannot open /dev/nvme0n11: No such file or directory
ERROR: could not check mount status: No such file or directory
root@Radon:/mnt# btrfs restore -vi /dev/nvme0n1 /mnt/disk1/restore
No valid Btrfs found on /dev/nvme0n1
Could not open root, trying backup super
No valid Btrfs found on /dev/nvme0n1
Could not open root, trying backup super
No valid Btrfs found on /dev/nvme0n1
Could not open root, trying backup super

 

So it won't work if the array is started, it won't work if the array is stopped, and it also won't work if the array is in maintenance mode. I am super confused as to what I need to do here ... :(

9 hours ago, Rattus said:

root@Radon:/mnt# btrfs restore -v /dev/sdc1 /mnt/disk1/restore
ERROR: /dev/sdc1 is currently mounted, cannot continue
root@Radon:/mnt# btrfs restore -v /dev/cache /mnt/disk1/restore
ERROR: mount check: cannot open /dev/cache: No such file or directory
ERROR: could not check mount status: No such file or directory
root@Radon:/mnt# btrfs restore -vi /dev/sdc1 /mnt/disk1/restore
ERROR: /dev/sdc1 is currently mounted, cannot continue

It must be unmounted.

 

9 hours ago, Rattus said:

root@Radon:/mnt# btrfs restore -vi /dev/nvme0n11 /mnt/disk1/restore
ERROR: mount check: cannot open /dev/nvme0n11: No such file or directory
ERROR: could not check mount status: No such file or directory
root@Radon:/mnt# btrfs restore -vi /dev/nvme0n1 /mnt/disk1/restore
No valid Btrfs found on /dev/nvme0n1
Could not open root, trying backup super
No valid Btrfs found on /dev/nvme0n1
Could not open root, trying backup super
No valid Btrfs found on /dev/nvme0n1
Could not open root, trying backup super

Wrong device, should be /dev/nvme0n1p1
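
As a rough illustration of the "must be unmounted" point, here is a minimal Python sketch that checks whether a mount point such as `/mnt/cache` appears in `/proc/mounts`-style text before `btrfs restore` is attempted. The function names and the sample mount table are hypothetical. The check is by mount point rather than by device, on the assumption that a multi-device pool may list only one of its member devices in the mount table.

```python
def mounted_filesystems(mounts_text):
    """Map mount point -> (device, fstype) from /proc/mounts-style text."""
    table = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            device, mount_point, fstype = fields[:3]
            table[mount_point] = (device, fstype)
    return table


def pool_is_mounted(mount_point, mounts_text):
    """True if something is mounted at mount_point."""
    return mount_point in mounted_filesystems(mounts_text)


# Hypothetical mount table for illustration only; on a live system the
# text would come from reading /proc/mounts instead.
SAMPLE_MOUNTS = """\
rootfs / rootfs rw 0 0
/dev/md1 /mnt/disk1 xfs rw,noatime 0 0
/dev/sdc1 /mnt/cache btrfs rw,noatime,ssd 0 0
"""

if __name__ == "__main__":
    if pool_is_mounted("/mnt/cache", SAMPLE_MOUNTS):
        print("/mnt/cache is mounted; unmount the pool before running btrfs restore")
    else:
        print("/mnt/cache is not mounted")
```

Note that the device passed to `btrfs restore` is the partition (`/dev/nvme0n1p1` or `/dev/sdc1`), not the whole disk (`/dev/nvme0n1`).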


Hello Again JorgeB and all,

 

So it's been a process, but:

- BTRFS restore moved all of the leftover files from the cache to disk 1 under /mnt/disk1/restore

- I've now got the pool back up. The pool is formatted as BTRFS and shows all available storage (see screenshot attached)

 

I followed this method from Squid, with some tweaks, to reformat the pool. I had to remove the SATA drive, reduce the pool size to one, format as per Squid, and then re-add the drive.

 

Now the dev stats command shows no errors:

 

Quote

[/dev/nvme0n1p1].write_io_errs 0
[/dev/nvme0n1p1].read_io_errs 0
[/dev/nvme0n1p1].flush_io_errs 0
[/dev/nvme0n1p1].corruption_errs 0
[/dev/nvme0n1p1].generation_errs 0
[/dev/sdc1].write_io_errs 0
[/dev/sdc1].read_io_errs 0
[/dev/sdc1].flush_io_errs 0
[/dev/sdc1].corruption_errs 0
[/dev/sdc1].generation_errs 0

 

My question now is: what is the safest way to move the files from the restore folder I created back to the pool, so that they are recognised as never having left the cache? Is that even what I am supposed to do here?

 

Is it just the BTRFS Restore in reverse?
