BTRFS Errors, forced readonly, needs some direction


cpxazn

I upgraded to 6.11.1 recently, but I don't think that is the cause of my issue. After the upgrade everything ran fine for about 1.5 days, but today I found a few dockers unresponsive and had trouble creating new files on shares. I'm seeing quite a few btrfs errors in the syslog, and I'm unable to run scrub or balance successfully as they error out within a minute of starting. Currently I'm running a memtest to rule out memory, but I suspect it may be a drive problem. If it does end up being a drive problem, how can I tell which drive needs replacing? And once I figure that out, can I just update the drive assignment to point to a new empty drive and run a balance?
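In case it helps, here's roughly what I was planning to run to pin down and then swap a failing pool member. The mount point and device names are only placeholders, not taken from my diagnostics:

# Per-device error counters can help point at the failing member.
btrfs device stats /mnt/cache

# SMART data for each member is also worth a look.
smartctl -a /dev/sdd

# If one device does turn out to be bad, btrfs can copy its data
# directly onto a new, empty disk of at least the same size.
btrfs replace start /dev/sdd1 /dev/sde1 /mnt/cache
btrfs replace status /mnt/cache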

daniel-nas-diagnostics-20221011-1814.zip


So the memtest completed without any issues. Running btrfs check on all 5 drives returns similar errors:

Opening filesystem to check...
Checking filesystem on /dev/sdd1
UUID: 827b699d-5ce4-4987-a44c-9bb0a4055ada
[1/7] checking root items
[2/7] checking extents
ref mismatch on [7393788952576 544768] extent item 0, found 1
data backref 7393788952576 parent 7446032334848 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 7393788952576 parent 7446032334848 owner 0 offset 0 found 1 wanted 0 back 0x15a0bf30
backpointer mismatch on [7393788952576 544768]
extent item 7393935491072 has multiple extent items
ref mismatch on [7393935491072 262144] extent item 1, found 2
backref disk bytenr does not match extent record, bytenr=7393935491072, ref bytenr=7393935683584
backref bytes do not match extent backref, bytenr=7393935491072, ref bytes=262144, backref bytes=12288
backpointer mismatch on [7393935491072 262144]
ERROR: errors found in extent allocation tree or chunk allocation
[3/7] checking free space tree
there is no free space entry for 7393788952576-7393789497344
cache appears valid but isn't 7393781088256
[4/7] checking fs roots
[5/7] checking only csums items (without verifying data)
there are no extents for csum range 7393788952576-7393789497344
csum exists for 7393624522752-7393878589440 but there is no extent record
ERROR: errors found in csum tree
[6/7] checking root refs
[7/7] checking quota groups skipped (not enabled on this FS)
found 2946302922752 bytes used, error(s) found
total csum bytes: 2824465988
total tree bytes: 4455694336
total fs tree bytes: 918994944
total extent tree bytes: 245547008
btree space waste bytes: 636103196
file data blocks allocated: 3249394266112
 referenced 2938461696000

 

Currently I am running a btrfs restore to back everything up before running a check --repair, but I don't know if I have to run restore multiple times, once for each drive in the array, to get all the data. I couldn't find anything on Google about how the restore command works when you have a multi-device array.
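For reference, this is roughly the restore invocation I'm using. I'm assuming restore only needs to be pointed at one member device of the filesystem (with the other members visible to a device scan) rather than a separate run per drive, and /mnt/disks/backup is just where I happened to mount a spare disk:

# Make sure all pool members are registered, then restore from one of them.
btrfs device scan
btrfs restore -v /dev/sdd1 /mnt/disks/backup/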

8 hours ago, JorgeB said:
read time tree block corruption detected

This suggests some old, previously undetected corruption that is now being detected by the newer kernel. If you have difficulties backing it up now, you can downgrade to the previous Unraid release and it should mount; then back up the pool and re-format it.
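A minimal sketch of that backup step, assuming the pool is mounted at /mnt/cache and a spare disk at /mnt/disks/backup (both paths are assumptions):

# Copy everything off the pool before re-formatting; adjust the paths
# to wherever the pool and the backup destination are actually mounted.
rsync -avh --progress /mnt/cache/ /mnt/disks/backup/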

 

I backed up and reformatted the pool, but every time I start the array I still get btrfs errors. Scrub and btrfs check do not return any errors, and all SMART health checks are passing for the cache drives. Not exactly sure what's causing these errors; the loop3 messages in the log below look like they come from the libvirt image (see the commands after the log).

 

Oct 12 12:53:47 daniel-nas root: mount: /etc/libvirt: wrong fs type, bad option, bad superblock on /dev/loop3, missing codepage or helper program, or other error.
Oct 12 12:53:47 daniel-nas root:        dmesg(1) may have more information after failed mount system call.
Oct 12 12:53:47 daniel-nas root: mount error
Oct 12 12:53:47 daniel-nas  emhttpd: shcmd (33653): exit status: 1
Oct 12 12:53:47 daniel-nas  emhttpd: nothing to sync
Oct 12 12:53:47 daniel-nas kernel: BTRFS error (device loop3): bad fsid on block 22036480
Oct 12 12:53:47 daniel-nas kernel: BTRFS error (device loop3): bad fsid on block 22036480
Oct 12 12:53:47 daniel-nas kernel: BTRFS error (device loop3): failed to read chunk root
Oct 12 12:53:47 daniel-nas kernel: BTRFS error (device loop3): open_ctree failed
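If I'm reading the log right, the loop3 errors come from the libvirt image rather than from the pool devices themselves. Something like this should confirm which file backs loop3 and let me check that image directly; the path is an assumption based on the default Unraid location:

# Show which file backs each loop device.
losetup -a

# Check the image file directly (path assumed to be the default location).
btrfs check /mnt/user/system/libvirt/libvirt.img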

 

daniel-nas-diagnostics-20221012-1257.zip

