Cache disk unmountable - Corrupt Leaf

Greywall · August 30, 2020

I have had this Unraid server running for 3-4 years without any issues. The config and hardware have been static since the initial build and setup. Early this morning I ran into an "Unmountable: No file system" error on disk 1 of a 2-disk cache pool...

Restarted the array in maintenance mode and attempted a btrfs check, but got the following in the Web UI:


bad key ordering 126 127
bad key ordering 126 127
bad key ordering 126 127
bad key ordering 126 127
ERROR: failed to read block groups: Operation not permitted
ERROR: cannot open file system
Opening filesystem to check...

Attempted mounting the filesystem per the FAQ post but got error:

root@Homeserver:/# mount -o ro,notreelog,nologreplay /dev/sdb1 /x
mount: /x: wrong fs type, bad option, bad superblock on /dev/sdb1, missing codepage or helper program, or other error.

Attempted doing an xfs_repair, but got issues with finding superblock:

root@Homeserver:/# xfs_repair -v /dev/sdb1
Phase 1 - find and verify superblock...
bad primary superblock - bad magic number !!!

attempting to find secondary superblock...
...........Sorry, could not find valid secondary superblock
Exiting now.
root@Homeserver:/#

Rebooted server and ran diagnostics (attached). The following is from syslog after reboot:

Aug 30 10:20:42 Homeserver emhttpd: shcmd (40): mkdir -p /mnt/cache
Aug 30 10:20:42 Homeserver emhttpd: cache uuid: b4197f0b-68d1-4496-9837-6e480fcf1bbc
Aug 30 10:20:42 Homeserver emhttpd: cache TotDevices: 2
Aug 30 10:20:42 Homeserver emhttpd: cache NumDevices: 2
Aug 30 10:20:42 Homeserver emhttpd: cache NumFound: 2
Aug 30 10:20:42 Homeserver emhttpd: cache NumMissing: 0
Aug 30 10:20:42 Homeserver emhttpd: cache NumMisplaced: 0
Aug 30 10:20:42 Homeserver emhttpd: cache NumExtra: 0
Aug 30 10:20:42 Homeserver emhttpd: cache LuksState: 0
Aug 30 10:20:42 Homeserver emhttpd: shcmd (41): mount -t btrfs -o noatime,nodiratime -U b4197f0b-68d1-4496-9837-6e480fcf1bbc /mnt/cache
Aug 30 10:20:42 Homeserver kernel: BTRFS info (device sdb1): disk space caching is enabled
Aug 30 10:20:42 Homeserver kernel: BTRFS info (device sdb1): has skinny extents
Aug 30 10:20:42 Homeserver kernel: BTRFS critical (device sdb1): corrupt leaf: root=2 block=243354976256 slot=127, bad key order, prev (576460982172839936 168 4096) current (229869420544 168 4096)
### [PREVIOUS LINE REPEATED 1 TIMES] ###
Aug 30 10:20:42 Homeserver kernel: BTRFS error (device sdb1): failed to read block groups: -5
Aug 30 10:20:42 Homeserver root: mount: /mnt/cache: wrong fs type, bad option, bad superblock on /dev/sdc1, missing codepage or helper program, or other error.
Aug 30 10:20:42 Homeserver emhttpd: shcmd (41): exit status: 32
Aug 30 10:20:42 Homeserver emhttpd: /mnt/cache mount error: No file system
Aug 30 10:20:42 Homeserver emhttpd: shcmd (42): umount /mnt/cache
Aug 30 10:20:42 Homeserver kernel: BTRFS error (device sdb1): open_ctree failed
Aug 30 10:20:42 Homeserver root: umount: /mnt/cache: not mounted.
Aug 30 10:20:42 Homeserver emhttpd: shcmd (42): exit status: 32
Aug 30 10:20:42 Homeserver emhttpd: shcmd (43): rmdir /mnt/cache
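
The two keys in the "corrupt leaf" line above can be compared arithmetically. The following is a rough sketch, assuming the two extents named in the log were adjacent on disk (the log does not confirm this). In the extent tree (root=2), key type 168 is EXTENT_ITEM and the third field is the extent length, so an intact "prev" key would be expected at "current" minus 4096. The helper name differing_bits is hypothetical.

```shell
# differing_bits A B: print the positions (0 = least significant) of every bit
# that differs between two non-negative 64-bit integers A and B.
differing_bits() {
    x=$(( $1 ^ $2 ))
    pos=0
    out=""
    while [ "$x" -gt 0 ]; do
        if [ $(( x & 1 )) -eq 1 ]; then
            out="${out:+$out }$pos"
        fi
        x=$(( x >> 1 ))
        pos=$(( pos + 1 ))
    done
    echo "$out"
}

prev=576460982172839936   # "prev" objectid from the syslog line
curr=229869420544         # "current" objectid from the syslog line
len=4096                  # key offset; for key type 168 this is the extent length

# ASSUMPTION: the extents were adjacent, so an intact prev would be curr - len.
expected=$(( curr - len ))
echo "expected prev:  $expected"                              # prints 229869416448
echo "differing bits: $(differing_bits "$prev" "$expected")"  # prints 59
```

Under that assumption the logged value and the expected value differ in exactly one bit (bit 59). That would be consistent with a single bit flip, for example from faulty RAM, rather than with random on-disk damage. It is a hint, not a diagnosis; a memory test would be a reasonable follow-up.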

Guessing the disk is beyond repair and I would need to do something like:

remove bad disk from pool
attempt to wipe/format/salvage disk
attempt to use what's on cache pool disk 2 to restore content to disk 1?
rejoin disks into pool?

Before I start going down these paths, I thought it would be best to consult the sages on the forums and avoid any (more) bad moves I could make.

Thanks in advance!

Mark.

homeserver-diagnostics-20200830-1044.zip

trurl · August 30, 2020

1 hour ago, Greywall said:

Attempted doing an xfs_repair, but got issues with finding superblock:

Not surprising since it is not XFS
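
Checking the filesystem type first avoids running the wrong repair tool. A minimal sketch, assuming blkid is available on the server; the helper name check_cmd_for is hypothetical, and the device name is the one used in this thread. Both suggested commands are the read-only or no-modify variants, so nothing is written to the disk.

```shell
# check_cmd_for TYPE: print a read-only check command suited to the filesystem type.
check_cmd_for() {
    case "$1" in
        btrfs) echo "btrfs check --readonly" ;;
        xfs)   echo "xfs_repair -n" ;;
        *)     echo "unknown" ;;
    esac
}

dev=/dev/sdb1   # device name taken from the thread
if [ -b "$dev" ]; then
    # blkid reads the on-disk signature and reports the filesystem type.
    fstype=$(blkid -o value -s TYPE "$dev")
    echo "$dev is $fstype; suggested check: $(check_cmd_for "$fstype") $dev"
else
    echo "$dev is not a block device on this machine"
fi
```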

trurl · August 30, 2020

3 hours ago, Greywall said:
Attempted mounting the filesystem per the FAQ post but got error:
root@Homeserver:/# mount -o ro,notreelog,nologreplay /dev/sdb1 /x
mount: /x: wrong fs type, bad option, bad superblock on /dev/sdb1, missing codepage or helper program, or other error.

That looks like the 3rd mount command mentioned in the FAQ. Did you try the first two mount commands at that link?

Of course, the author of that FAQ, @johnnie.black, is the expert. I have a btrfs raid1 cache pool but have never had to deal with these problems.
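
For readers following along, here is a sketch of trying several read-only mount option sets in sequence. The FAQ itself is not quoted in this thread, so the first two option sets below are assumptions (usebackuproot and degraded are real btrfs mount options, but they may not match the FAQ exactly); only the third appears in this thread. The function name try_ro_mounts and the DRY_RUN switch are hypothetical.

```shell
# With DRY_RUN=1 (the default here) the commands are only printed, not run.
DRY_RUN=${DRY_RUN:-1}

# try_ro_mounts DEV MNT: attempt each read-only option set in turn and stop
# at the first one that mounts successfully.
try_ro_mounts() {
    dev=$1
    mnt=$2
    # ASSUMPTION: the first two option sets are guesses, not taken from the thread.
    for opts in \
        "ro,usebackuproot" \
        "ro,degraded,usebackuproot" \
        "ro,notreelog,nologreplay"
    do
        if [ "$DRY_RUN" = 1 ]; then
            echo "mount -o $opts $dev $mnt"
        elif mount -o "$opts" "$dev" "$mnt"; then
            echo "mounted with: $opts"
            return 0
        fi
    done
    # In a dry run this succeeds; in a real run it means every attempt failed.
    [ "$DRY_RUN" = 1 ]
}

try_ro_mounts /dev/sdb1 /x
```

All option sets include ro, so a successful mount would not write to the damaged filesystem.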

Greywall · August 30, 2020

Yeah, I'm kinda grasping at straws... not really sure about what's the best thing to do at this point. Been googling stuff and trying them out.

Is my general thought process correct?

remove bad disk from pool
attempt to wipe/format/salvage disk
attempt to use what's on cache pool disk 2 to restore content to disk 1?
rejoin disks into pool?

Greywall · August 31, 2020

6 hours ago, trurl said:

That looks like the 3rd mount command mentioned in the FAQ. Did you try the 1st 2 mount commands at that link?

Sorry, missed this question earlier. Yep, all three mount commands resulted in the same error.

JorgeB · August 31, 2020

You can try btrfs restore. If that doesn't help, you can ask for help on IRC or the mailing list, both of which are also mentioned in the same FAQ entry.
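
A minimal sketch of what a btrfs restore attempt might look like. btrfs restore reads from the unmountable device and copies files to another location. The destination path is hypothetical and must be on a different, healthy filesystem with enough free space; the helper name restore_cmd is also hypothetical. The script only prints the command lines so they can be reviewed before being run by hand.

```shell
# restore_cmd DEV DEST [dry]: print a btrfs restore command line.
#   -D  dry run, only lists what would be recovered
#   -v  verbose
#   -i  ignore errors and keep going
restore_cmd() {
    if [ "${3:-}" = "dry" ]; then
        echo "btrfs restore -D -v $1 $2"
    else
        echo "btrfs restore -v -i $1 $2"
    fi
}

dev=/dev/sdb1             # pool member named in the thread
dest=/mnt/disk1/restore   # ASSUMPTION: hypothetical destination on an array disk

restore_cmd "$dev" "$dest" dry   # step 1: see what is recoverable
restore_cmd "$dev" "$dest"       # step 2: copy the files out
```

If the first pool member yields nothing, the same commands can be pointed at the other device in the pool.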
