What to do next? - Corrupt BTRFS Raid1 Cache Pool


May 6, 2023

Hey All,

I have two 2TB NVMe drives in a btrfs RAID1 cache pool and am getting errors.

I've tried running a scrub, but it seems to abort immediately.

Since I have a RAID1 setup, can I simply remove the problematic drive and replace it later this week when a new one arrives?

Diagnostics are attached.

diagnostics-20230506-1013.zip


Solved by JorgeB


May 6, 2023

Community Expert
Solution

This shows that one of the NVMe devices dropped offline in the past:

May  6 09:47:03 Sunshine kernel: BTRFS info (device nvme0n1p1): bdev /dev/nvme1n1p1 errs: wr 25028584, rd 1469323, flush 64083, corrupt 119757, gen 0

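A scrub should correct this, but you have at least one share set to NOCOW, and that is a problem: NOCOW data has no btrfs checksums, so when a device drops and comes back, a scrub cannot tell which copy is good and cannot repair it. I recommend saving what you can from the pool and then recreating it with all shares set to COW. Also see here for more info.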
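The counters in the kernel log line quoted above are btrfs's cumulative per-device error statistics (the same ones `btrfs device stats` reports). Very large write and flush error counts on one member, with the other member clean, are the usual sign that a device dropped off the bus while the pool kept running. The following is a minimal sketch for pulling those counters out of a syslog line; it assumes the line follows exactly the format quoted above, and the function names `parse_btrfs_errs` and `nonzero_counters` are hypothetical.

```python
import re

# Matches the per-device error counters btrfs prints, e.g.
# "bdev /dev/nvme1n1p1 errs: wr 25028584, rd 1469323, flush 64083, corrupt 119757, gen 0"
# ASSUMPTION: the log line uses exactly this field order and wording.
ERRS_RE = re.compile(
    r"bdev (?P<dev>\S+) errs: wr (?P<wr>\d+), rd (?P<rd>\d+), "
    r"flush (?P<flush>\d+), corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)

FIELDS = ("wr", "rd", "flush", "corrupt", "gen")


def parse_btrfs_errs(line: str):
    """Return (device, counters dict) for a btrfs 'errs:' log line, or None."""
    m = ERRS_RE.search(line)
    if m is None:
        return None
    counters = {name: int(m.group(name)) for name in FIELDS}
    return m.group("dev"), counters


def nonzero_counters(counters):
    """Names of the counters that are above zero, in log order."""
    return [name for name, value in counters.items() if value > 0]


if __name__ == "__main__":
    line = ("May  6 09:47:03 Sunshine kernel: BTRFS info (device nvme0n1p1): "
            "bdev /dev/nvme1n1p1 errs: wr 25028584, rd 1469323, flush 64083, "
            "corrupt 119757, gen 0")
    dev, counters = parse_btrfs_errs(line)
    print(dev, nonzero_counters(counters))
```

Any nonzero counter is worth investigating; the counters persist across reboots until they are explicitly reset, so compare them over time rather than reading a single snapshot as current damage.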


May 6, 2023

Author

Thanks @JorgeB, a short time ago I realised the data on the pool was lost. Thankfully I did actually have recent backups.

I've just finished restoring everything, minus one of the NVMe drives, which I think is dead.

It is my "Domains" share that is set to NOCOW because, as per the Unraid GUI help it says "We recommend this setting for shares used to store vdisk images, including the Docker loopback image file. This setting has no effect on non-btrfs file systems."

As my domains share is used solely for vdisks, I presumed this was the right approach?

Should I set it back to Auto? And if so, would I need to empty the share, recreate it and copy files back so they inherit the COW attribute?


May 6, 2023

Community Expert

46 minutes ago, Congles said:

It is my "Domains" share that is set to NOCOW because, as per the Unraid GUI help it says "We recommend this setting for shares used to store vdisk images, including the Docker loopback image file. This setting has no effect on non-btrfs file systems."

NOCOW should only be used on a single-device btrfs filesystem. For raid1, always use COW: it might have a small performance impact, but it won't corrupt the pool if a device drops and comes back.


May 6, 2023

Author

Much appreciated insight.

If I delete my domains share, recreate it with the correct settings and copy the files back, will that be sufficient to fix this issue going forward?


May 6, 2023

Community Expert

Yep
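On btrfs, NOCOW is the per-inode "C" attribute (the one `chattr +C` sets and `lsattr` shows). New files inherit it from the directory they are created in, and chattr(1) notes it is only reliable when set on empty files, which is why recreating the share and copying the vdisks back as new files is the way to get rid of it. To double-check the result afterwards, the sketch below reads the inode flags with the `FS_IOC_GETFLAGS` ioctl and tests the `FS_NOCOW_FL` bit. It is a minimal, Linux-only sketch: the ioctl number `0x80086601` is an assumption valid for 64-bit platforms, and the script and helper names are hypothetical.

```python
import fcntl
import os
import struct
import sys

# Inode flag behind `chattr +C` / lsattr 'C' (value from linux/fs.h).
FS_NOCOW_FL = 0x00800000

# ASSUMPTION: FS_IOC_GETFLAGS as encoded on 64-bit Linux (x86_64, arm64).
# The number differs on 32-bit platforms.
FS_IOC_GETFLAGS = 0x80086601


def has_nocow(flags: int) -> bool:
    """True if the NOCOW bit is set in an inode flags word."""
    return bool(flags & FS_NOCOW_FL)


def get_inode_flags(path: str) -> int:
    """Read the inode flags of a file or directory via FS_IOC_GETFLAGS."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # The kernel fills a 4-byte int; ioctl returns a buffer of the same size.
        buf = fcntl.ioctl(fd, FS_IOC_GETFLAGS, struct.pack("I", 0))
    finally:
        os.close(fd)
    return struct.unpack("I", buf)[0]


if __name__ == "__main__":
    for path in sys.argv[1:]:
        try:
            flags = get_inode_flags(path)
        except OSError as exc:
            # e.g. the filesystem does not support inode flags
            print(f"{path}: cannot read flags ({exc})")
            continue
        print(f"{path}: {'NOCOW' if has_nocow(flags) else 'COW'}")
```

Usage would be something like `python3 check_nocow.py /mnt/cache/domains /mnt/cache/domains/*/*.img`, where the script name and paths are illustrative. Both the share directory and each vdisk should report COW once the share has been recreated with COW enabled.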


May 6, 2023

Author

perfect, thanks!

