BTRFS cache is suddenly read-only



Good Morning, 

 

I saw some Docker issues this AM and probed around to find that my cache drive has suddenly become a read-only file system. How did this happen, and how can I fix it? I've searched around a bit trying to repair this but haven't had any luck. 

 

I know next to nothing about BTRFS but am decent with Linux in general. The only thing that stood out to me was the output of running 

btrfs fi df /mnt/cache

The numbers on the Data line make it look like the drive is nearly full. In other posts where users hit the "my BTRFS drive is full but the UI isn't reporting it" issue, the size/used discrepancy showed up in the btrfs fi show /mnt/cache command. 

 

Does that make any difference? Am I seeing the same issue? In the threads I've found, removing some data and rebalancing was the fix, but I can't rebalance due to the "read-only file system" issue. I can't even create an empty file on my cache disk:

root@jonas:~# touch /mnt/cache/Logs/testwrite.log
touch: cannot touch '/mnt/cache/Logs/testwrite.log': Read-only file system

 

Output of the btrfs fi commands: 

root@jonas:~# btrfs fi show /mnt/cache
Label: none  uuid: 06942e55-3e85-4a3d-a70a-f5d321bea2a3
        Total devices 1 FS bytes used 217.21GiB
        devid    1 size 465.76GiB used 224.02GiB path /dev/sde1


root@jonas:~# btrfs fi df /mnt/cache
Data, single: total=222.01GiB, used=216.62GiB
System, single: total=4.00MiB, used=48.00KiB
Metadata, single: total=2.01GiB, used=606.70MiB
GlobalReserve, single: total=341.06MiB, used=0.00B


root@jonas:~# btrfs balance start -dusage=5 /mnt/cache
ERROR: error during balancing '/mnt/cache': Read-only file system
There may be more info in syslog - try dmesg | tail
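As the error message suggests, dmesg usually records exactly why btrfs forced the filesystem read-only. A quick way to pull just the relevant lines (the grep pattern is just an illustration):

```shell
# Show the most recent btrfs kernel messages; the abort that forced
# read-only mode is usually near the end of these lines.
dmesg | grep -i 'btrfs' | tail -n 20
```

A forced-readonly event typically logs something like "BTRFS error (device sde1): ..." shortly before the filesystem flips to read-only, which points at the underlying cause (I/O failure, transid mismatch, etc.).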

 

Thanks for your help!!

 

45 minutes ago, JorgeB said:

Cache filesystem is corrupt; best bet is to back up and re-format. There are some recovery options here if needed.

 

 

thanks for the quick reply. 

 

Dang... Do you mean just backup and re-format the cache drive only? Would the best process for this be to: 

1. image the drive just in case

2. set cache=no on all shares

3. run the mover to ensure all data is moved to array

4. run appdatabackup 

5. format the cache

6. restore appdata

7. restore cache setting on file shares
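For steps 1 and 5, a rough command-line sketch; the device and destination paths are examples based on this thread, not a recommendation to skip the UI, and the `[ -b ... ]` guards simply skip each command if the device isn't present:

```shell
# 1. Image the cache device just in case. The filesystem is already
#    read-only, so a raw copy from the live device is safe:
if [ -b /dev/sde ]; then
    dd if=/dev/sde of=/mnt/disk1/cache_backup.img bs=1M status=progress
fi

# 5. After the mover and appdata backup finish, clearing the old
#    filesystem signature lets Unraid re-format the device cleanly:
if [ -b /dev/sde1 ]; then
    wipefs -a /dev/sde1
fi
```

Steps 2, 3, 6, and 7 (share cache settings, mover, appdata restore) are done through the Unraid UI as listed above.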

 

 

Is there anything I need to do to prevent this in the future? I haven't had any power failures or unclean shutdowns. 

Would it be worthwhile to switch to XFS as others have done? 

23 minutes ago, JohnnyCache said:

Do you mean just backup and re-format the cache drive only?

Yes.

 

23 minutes ago, JohnnyCache said:

Would it be worthwhile to switch to XFS as others have done? 

Probably best, since you're running a single-device pool, if you don't need any of the btrfs features. In my experience most btrfs issues are caused by hardware, but XFS is usually more tolerant when there are issues.

1 hour ago, JorgeB said:

[...]

in my experience most btrfs issues are caused by hardware, still xfs is usually more tolerant if there are any issues.

 

 

Is there anything specific I should be looking for? The SSD is relatively new and does not get a ton of use. 

Or is it more that btrfs is expecting better ECC on the drive itself (thinking an enterprise SSD vs the off-the-shelf EVO I have)?

32 minutes ago, trurl said:

 

 

Yessss, thank you for sharing this. I'd guessed some of the steps, but only as far as stopping the array and backing up my cache drive. I'll work through these steps tomorrow. 

Thank you everyone for your help!

 

 

I'm still curious whether btrfs relies on enterprise hardware, or what could have been done to prevent this corruption in the first place. Can anyone share those details?

 

7 hours ago, JorgeB said:

Could be a RAM issue; without ECC RAM you can get a bit flipped at any time, and one flipped in the wrong place can really corrupt a btrfs filesystem.

 

 

Thanks for the tip, that makes a ton of sense. It could also explain why several others had the same issue, and why they never had it again after moving to XFS. 
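On the ECC question: you can check what the board actually reports without opening the case. A quick sketch using dmidecode (exact output wording varies by BIOS, and it needs root):

```shell
# "Error Correction Type: None" means the RAM is running without
# ECC, regardless of what the modules themselves support. The echo
# fallback just covers systems where nothing is reported.
dmidecode -t memory | grep -i 'error correction' || echo "no ECC info reported"
```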

 

  • 1 month later...

Hello,

 

I am having a similar issue I was hoping someone could help me track down. The logs show an error, then a "tree first key mismatch", and the cache drives are placed in read-only mode. I have just installed 2 brand-new 1 TB SSDs.

 

Dec 18 03:11:48 Tower kernel: ------------[ cut here ]------------
Dec 18 03:11:48 Tower kernel: BTRFS: Transaction aborted (error -117)
Dec 18 03:11:48 Tower kernel: WARNING: CPU: 2 PID: 18715 at fs/btrfs/inode.c:2730 btrfs_finish_ordered_io+0x38b/0x623
Dec 18 03:11:48 Tower kernel: Modules linked in: ipt_REJECT nf_reject_ipv4 xt_nat xt_tcpudp veth xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter xfs md_mod nvidia_drm(PO) nvidia_modeset(PO) drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops nvidia(PO) drm backlight agpgart corefreqk(O) ip6table_filter ip6_tables iptable_filter ip_tables x_tables bonding edac_mce_amd kvm_amd ccp kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel igb aesni_intel mxm_wmi wmi_bmof crypto_simd cryptd i2c_piix4 i2c_algo_bit input_leds i2c_core led_class k10temp fam15h_power glue_helper ahci wmi libahci button
Dec 18 03:11:48 Tower kernel: CPU: 2 PID: 18715 Comm: kworker/u16:5 Tainted: P           O      5.10.28-Unraid #1
Dec 18 03:11:48 Tower kernel: Hardware name: To be filled by O.E.M. To be filled by O.E.M./SABERTOOTH 990FX R3.0, BIOS 0212 07/18/2016
Dec 18 03:11:48 Tower kernel: Workqueue: btrfs-endio-write btrfs_work_helper
Dec 18 03:11:48 Tower kernel: RIP: 0010:btrfs_finish_ordered_io+0x38b/0x623
Dec 18 03:11:48 Tower kernel: Code: 8d b0 40 0a 00 00 e8 46 9b ff ff 84 c0 75 1d 41 83 fc fb 74 17 41 83 fc e2 74 11 44 89 e6 48 c7 c7 fc f1 d8 81 e8 d0 96 47 00 <0f> 0b 44 89 e1 ba aa 0a 00 00 e9 80 00 00 00 48 8d 45 58 48 89 44
Dec 18 03:11:48 Tower kernel: RSP: 0018:ffffc90001f87d90 EFLAGS: 00010286
Dec 18 03:11:48 Tower kernel: RAX: 0000000000000000 RBX: 0000000000002000 RCX: 0000000000000027
Dec 18 03:11:48 Tower kernel: RDX: 00000000ffffefff RSI: 0000000000000001 RDI: ffff88882ec98920
Dec 18 03:11:48 Tower kernel: RBP: ffff888128dbfb18 R08: 0000000000000000 R09: 00000000ffffefff
Dec 18 03:11:48 Tower kernel: R10: ffffc90001f87bc0 R11: ffffc90001f87bb8 R12: 00000000ffffff8b
Dec 18 03:11:48 Tower kernel: R13: ffff8881363b9af8 R14: ffff88821ff59410 R15: 0000000000000000
Dec 18 03:11:48 Tower kernel: FS:  0000000000000000(0000) GS:ffff88882ec80000(0000) knlGS:0000000000000000
Dec 18 03:11:48 Tower kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec 18 03:11:48 Tower kernel: CR2: 000014ff0f488000 CR3: 00000001b7e74000 CR4: 00000000000406e0
Dec 18 03:11:48 Tower kernel: Call Trace:
Dec 18 03:11:48 Tower kernel: btrfs_work_helper+0xe4/0x1e1
Dec 18 03:11:48 Tower kernel: process_one_work+0x13c/0x1d5
Dec 18 03:11:48 Tower kernel: worker_thread+0x18b/0x22f
Dec 18 03:11:48 Tower kernel: ? process_scheduled_works+0x27/0x27
Dec 18 03:11:48 Tower kernel: kthread+0xe5/0xea
Dec 18 03:11:48 Tower kernel: ? __kthread_bind_mask+0x57/0x57
Dec 18 03:11:48 Tower kernel: ret_from_fork+0x22/0x30
Dec 18 03:11:48 Tower kernel: ---[ end trace 11f1661a615b5b6d ]---
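For reference, errno -117 is EUCLEAN ("Structure needs cleaning"), i.e. btrfs detected on-disk metadata corruption, which matches the "tree first key mismatch" messages. Two things that may help narrow it down (device path is a placeholder, not your actual pool device):

```shell
# List the "tree first key mismatch" lines to see which metadata
# trees are affected (no output just means none logged since boot):
dmesg | grep -i 'tree first key mismatch' || true

# With the array stopped (pool unmounted), a read-only check is safe
# to run; it reports problems without modifying anything. /dev/sdX1
# is a placeholder for the real pool device:
if [ -b /dev/sdX1 ]; then
    btrfs check --readonly /dev/sdX1
fi
```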

 
