Power failure leading to corrupted cache drive(s?)

July 19, 2024

I've dug through the forums and found several similar issues, but since I'm only moderately experienced with Linux, I figured I should post here before taking the nuclear option.

Last night we had a power outage in my building and today I see my dockers are "running" but not working.

I had appdata on my mirrored cache drives, and it looks like I have the dreaded error filling my logs:

Jul 18 16:50:26 STORAGE kernel: BTRFS error (device sdf1: state EA): bad tree block start, mirror 2 want 271073280 have 0

sdf is one of my cache pool SSDs, and sdg is the other.
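To see whether only one pool member is throwing these errors, the kernel messages can be tallied per device. The following is a minimal sketch, not an Unraid tool: `btrfs_error_devices` is a hypothetical helper name, and it assumes the log lines follow the same `BTRFS error (device ...)` pattern as the one quoted above.

```shell
# Hypothetical helper: read syslog text on stdin and print each device
# named in a "BTRFS error (device ...)" line, followed by its error count.
btrfs_error_devices() {
    grep 'BTRFS error (device ' \
        | sed -E 's/.*BTRFS error \(device ([^:) ]+).*/\1/' \
        | sort | uniq -c | awk '{print $2, $1}'
}
```

It could be run as `btrfs_error_devices < /var/log/syslog`, assuming the syslog lives at that path on the server.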

So before I completely blow away my cache drives and rebuild all my dockers, I'm hoping there may be something else I can do to make this easier. (Though when I look at the cache drive, many of the appdata\dockername\ folders are empty, possibly due to more than one restart, as the power cycled several times over an hour. Yes, I know I should get a UPS. Lesson learned.)

Diagnostics attached.

Thanks in advance.

storage-diagnostics-20240718-1755.zip

Solved by JorgeB

July 19, 2024

Community Expert

The syslog has rotated; reboot and post new diags after array start.

July 19, 2024

Author

Rebooted and array started. Diags attached... TIA!

storage-diagnostics-20240719-0830.zip

July 19, 2024

Community Expert

Try running a scrub, but that doesn't look recoverable. Can you still access the data?

July 19, 2024

Author

Yes, I can still access the data after the reboot.

Unfortunately, I can't scrub because the pool is mounted read-only:

root@STORAGE:~# btrfs scrub start -B /mnt/cache/
ERROR: scrubbing /mnt/cache/ failed for device id 1: ret=-1, errno=30 (Read-only file system)
ERROR: scrubbing /mnt/cache/ failed for device id 2: ret=-1, errno=30 (Read-only file system)
scrub canceled for 769a2235-054f-413d-bb43-68b51c7171de
Scrub started:    Fri Jul 19 10:02:26 2024
Status:           aborted
Duration:         0:00:00
Total to scrub:   0.00B
Rate:             0.00B/s
Error summary:    no errors found
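The `errno=30` in that output is the Linux "Read-only file system" error (EROFS), so the scrub never started. A quick way to confirm the mount state before attempting a scrub is to look for the `ro` option in the mounts table. This is a minimal sketch; `is_readonly_mount` is a hypothetical helper name, and the mount point has to be given without a trailing slash, as it appears in `/proc/mounts`.

```shell
# Hypothetical helper: succeed (exit 0) if the mount point is listed with
# the "ro" option in a mounts table, fail (exit 1) otherwise.
# The second argument defaults to /proc/mounts.
is_readonly_mount() {
    mountpoint=$1
    mounts_file=${2:-/proc/mounts}
    awk -v mp="$mountpoint" '
        $2 == mp { opts = $4 }          # remember options of the matching mount
        END {
            n = split(opts, o, ",")
            for (i = 1; i <= n; i++) if (o[i] == "ro") exit 0
            exit 1
        }' "$mounts_file"
}
```

For example, `is_readonly_mount /mnt/cache && echo "pool is read-only, scrub will fail"`.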

Any other suggestions?

(On a side note: I never had this type of problem in the past. Is btrfs really ready for prime time, or should I rebuild the cache as ext4?)

July 19, 2024

Community Expert
Solution

I would recommend backing up the pool and recreating.
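Since the pool is still readable, one way to do the backup by hand is to copy everything off and compare the copy against the source before reformatting. This is only a sketch under assumptions: `backup_pool` is a hypothetical helper name, the destination is whatever array path has enough free space, and reads from a damaged filesystem may still fail partway through.

```shell
# Hypothetical helper: copy a directory tree to a destination, then
# verify the copy. Returns non-zero if the copy or the comparison fails.
backup_pool() {
    src=$1
    dst=$2
    mkdir -p "$dst" || return 1
    # -a preserves ownership, permissions, timestamps and symlinks;
    # "$src"/. copies the contents, including hidden files.
    cp -a "$src"/. "$dst"/ || return 1
    # diff -r returns non-zero if any file differs or is missing.
    diff -r "$src" "$dst" >/dev/null
}
```

A call might look like `backup_pool /mnt/cache /mnt/disk1/cache_backup`, where the destination path is an assumption. Only recreate the pool once this returns success.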

20 minutes ago, ffaat said:

is btrfs really ready for prime time?

Yes, though you can now also use zfs, and zfs is better at recovering from a dropped device, so now I usually recommend it.

July 19, 2024

Author

woot!

Thanks @JorgeB. I was able to use mover to get everything onto the array, reformat the drives as a ZFS mirror, and then restore everything back to the cache.

Any tips on setting up a regular docker.img and appdata backup routine?

-Rob A.

July 20, 2024

Community Expert

The Docker image can easily be recreated; for appdata, you can use the appdata-backup plugin.
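For anyone curious what a manual appdata backup might look like, a dated tar archive is one simple approach. This is a rough sketch and not how the plugin itself is implemented: `backup_appdata` is a hypothetical helper name, the paths are assumptions, and containers should be stopped first so files are not changing mid-archive.

```shell
# Hypothetical helper: archive an appdata directory into a timestamped
# .tar.gz inside a destination directory and print the archive path.
backup_appdata() {
    src=$1
    dest_dir=$2
    stamp=$(date +%Y%m%d-%H%M%S)
    archive="$dest_dir/appdata-$stamp.tar.gz"
    mkdir -p "$dest_dir" || return 1
    # -C switches to the parent directory so the archive stores
    # relative paths (appdata/...) instead of absolute ones.
    tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1
    printf '%s\n' "$archive"
}
```

For example, `backup_appdata /mnt/cache/appdata /mnt/user/backups`, where the destination share is an assumption.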
