
Power failure leading to corrupted cache drive(s?)



I've dug through the forums and found various similar issues, but since I'm only moderately experienced with Linux, I figured I should post here before taking the nuclear option.

 

Last night we had a power outage in my building, and today my dockers show as "running" but aren't working.

 

I had appdata on my mirrored cache drives, and it looks like I have the dreaded error filling my logs:

 

Jul 18 16:50:26 STORAGE kernel: BTRFS error (device sdf1: state EA): bad tree block start, mirror 2 want 271073280 have 0

 

sdf is one of my cache pool SSDs, and sdg is the other.
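To check whether both pool members are logging errors, or only one, the kernel's BTRFS error lines could be tallied per device. This is a minimal sketch, not an Unraid tool: the function name `count_btrfs_errors` is hypothetical, and the log path in the usage line (`/var/log/syslog`) is an assumption to adjust for your system.

```shell
#!/usr/bin/env bash
# Hypothetical helper: tally "BTRFS error (device X...)" lines per device.
# Reads log text on stdin and prints "<device> <count>" pairs, sorted by device.
count_btrfs_errors() {
  # Keep only the device name: it ends at ':' (e.g. "sdf1: state EA") or ')'.
  grep -o 'BTRFS error (device [^:)]*' \
    | sed 's/^BTRFS error (device //' \
    | sort | uniq -c \
    | awk '{print $2, $1}'
}

# Usage (assumed log location):
# count_btrfs_errors < /var/log/syslog
```

If only one device shows up, that points at one member of the mirror; errors on both (as with sdf and sdg here) suggest the damage was written to the whole pool.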

 

So before I completely blow away my cache drives and rebuild all my dockers, I'm hoping there may be something else I can do to make this easier. (Though when I look at the cache drive, many of the appdata/dockername/ folders are empty, possibly due to more than one restart, since the power cycled several times over an hour. Yes, I know I should get a UPS... lesson learned.)

 

Diagnostics attached.

 

Thanks in advance.

storage-diagnostics-20240718-1755.zip

Solved by JorgeB

  • Community Expert

The syslog has rotated; reboot and post new diagnostics after the array starts.

  • Community Expert

Try running a scrub, but that doesn't look recoverable. Can you still access the data?

  • Author

Yes I can still access the data after the reboot.

 

Unfortunately I can't scrub, as it is mounted read-only:

root@STORAGE:~# btrfs scrub start -B /mnt/cache/
ERROR: scrubbing /mnt/cache/ failed for device id 1: ret=-1, errno=30 (Read-only file system)
ERROR: scrubbing /mnt/cache/ failed for device id 2: ret=-1, errno=30 (Read-only file system)
scrub canceled for 769a2235-054f-413d-bb43-68b51c7171de
Scrub started:    Fri Jul 19 10:02:26 2024
Status:           aborted
Duration:         0:00:00
Total to scrub:   0.00B
Rate:             0.00B/s
Error summary:    no errors found
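The `errno=30` in that output is EROFS: a scrub needs to write repairs, so it is refused on a read-only mount. A pre-check along these lines could catch that before starting a scrub. This is a sketch under assumptions: `has_ro_option` and `scrub_if_writable` are hypothetical names, it relies on util-linux `findmnt` being installed, and it assumes the pool is mounted at `/mnt/cache`. `findmnt`'s OPTIONS column combines the per-mount and filesystem options, so a forced read-only state should appear as `ro` there, but verify that on your system.

```shell
#!/usr/bin/env bash
# Return 0 if a comma-separated mount-option string contains the exact option "ro".
# Wrapping in commas avoids false matches such as "errors=remount-ro".
has_ro_option() {
  case ",$1," in
    *,ro,*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical pre-check: only start a scrub when the pool is mounted read-write.
scrub_if_writable() {
  local mnt="${1:-/mnt/cache}"
  local opts
  opts=$(findmnt -no OPTIONS "$mnt") || { echo "not mounted: $mnt" >&2; return 2; }
  if has_ro_option "$opts"; then
    echo "$mnt is mounted read-only; scrub would fail with errno 30 (EROFS)" >&2
    return 1
  fi
  btrfs scrub start -B "$mnt"
}
```

When btrfs has switched itself to read-only because of metadata errors like the ones above, remounting read-write is unlikely to help, which is why copying the data off is the safer next step.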

 

Any other suggestions?

 

(On a side note, I never had this type of problem in the past. Is btrfs really ready for prime time, or should I rebuild the cache as ext4?)

  • Community Expert
  • Solution

I would recommend backing up the pool and recreating.
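For anyone taking the backup by hand instead of with mover (which is what the poster ended up using), a plain recursive copy followed by a comparison is one option. This is a minimal sketch: `backup_pool` is a hypothetical name, and the paths in the usage line are assumptions, not Unraid defaults. Because the pool is read-only, reading from it is still possible; a file that cannot be read because of corruption will make `cp` fail, which is the signal you want.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy a (possibly read-only) pool to a backup location,
# then verify the copy matches the source.
backup_pool() {
  local src="$1" dest="$2"
  mkdir -p "$dest" || return 1
  # -a preserves permissions, ownership, timestamps and symlinks;
  # "$src/." copies the directory's contents rather than the directory itself.
  cp -a "$src/." "$dest/" || return 1
  # diff -r exits non-zero if any file differs or exists on only one side.
  diff -r "$src" "$dest" >/dev/null
}

# Usage (assumed paths):
# backup_pool /mnt/cache /mnt/disk1/cache_backup && echo "backup verified"
```

Only after the comparison passes would the pool be wiped and recreated.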

 

ffaat said:

is btrfs really ready for prime time?

 

Yes, though you can now also use ZFS, which is better at recovering from a dropped device, so these days I usually recommend it.

  • Author

woot!

 

Thanks @JorgeB. I was able to use mover to get everything onto the array, reformat the drives as a ZFS mirror, and then restore everything back to the cache.

 

Any tips on setting up a regular docker.img and appdata backup routine?

 

-Rob A.

  • Community Expert

The Docker image can easily be recreated; for appdata you can use the appdata-backup plugin.
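If you'd rather script it yourself, a dated archive plus a retention limit covers the basics. This is a rough sketch, not how the plugin works: `backup_appdata` is a hypothetical name, and the source/destination paths and the default of 7 kept archives are assumptions. For a consistent copy it's generally advisable to stop the containers first, since databases written mid-backup may not restore cleanly; this sketch does not do that.

```shell
#!/usr/bin/env bash
# Hypothetical appdata backup: archive SRC into DEST as appdata-YYYYmmdd-HHMMSS.tar.gz
# and keep only the newest KEEP archives.
backup_appdata() {
  local src="$1" dest="$2" keep="${3:-7}"
  mkdir -p "$dest" || return 1
  local stamp
  stamp=$(date +%Y%m%d-%H%M%S)
  # -C makes paths inside the archive relative to SRC's parent (e.g. "appdata/...").
  tar -czf "$dest/appdata-$stamp.tar.gz" -C "$(dirname "$src")" "$(basename "$src")" || return 1
  # Zero-padded timestamps sort chronologically, so newest-first order
  # lets tail skip the $keep newest and hand the rest to rm.
  printf '%s\n' "$dest"/appdata-*.tar.gz | sort -r | tail -n +"$((keep + 1))" \
    | while IFS= read -r old; do
        rm -f -- "$old"
      done
}

# Usage (assumed paths), e.g. from a nightly cron job or User Scripts entry:
# backup_appdata /mnt/cache/appdata /mnt/user/backups/appdata 7
```

Keeping the archives on the array rather than on the same pool is the point: a pool failure like the one in this thread would otherwise take the backups with it.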
