BTRFS cache is suddenly read-only



Good Morning, 

 

I saw some Docker issues this AM and probed around to find that my cache drive has suddenly become a read-only file system. How did this happen, and how can I fix it? I've searched around a bit trying to repair this but haven't had any luck. 

 

I know next to nothing about BTRFS but am decent with Linux in general. The only thing that stood out to me was the output of running 

btrfs fi df /mnt/cache

The numbers on the Data line make it look like the drive is nearly full. In other posts where users hit the "my BTRFS drive is full but the UI isn't reporting it" issue, the size/used discrepancy showed up in the btrfs fi show /mnt/cache command. 

 

Does that make any difference? Am I seeing the same issue? In the threads I've found, removing some data and rebalancing was the fix, but I can't rebalance due to the "read-only file system" issue. I can't even create an empty file on my cache disk:

root@jonas:~# touch /mnt/cache/Logs/testwrite.log
touch: cannot touch '/mnt/cache/Logs/testwrite.log': Read-only file system

 

Output of the btrfs fi commands: 

root@jonas:~# btrfs fi show /mnt/cache
Label: none  uuid: 06942e55-3e85-4a3d-a70a-f5d321bea2a3
        Total devices 1 FS bytes used 217.21GiB
        devid    1 size 465.76GiB used 224.02GiB path /dev/sde1


root@jonas:~# btrfs fi df /mnt/cache
Data, single: total=222.01GiB, used=216.62GiB
System, single: total=4.00MiB, used=48.00KiB
Metadata, single: total=2.01GiB, used=606.70MiB
GlobalReserve, single: total=341.06MiB, used=0.00B


root@jonas:~# btrfs balance start -dusage=5 /mnt/cache
ERROR: error during balancing '/mnt/cache': Read-only file system
There may be more info in syslog - try dmesg | tail
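As the error message suggests, dmesg usually records exactly why btrfs forced the filesystem read-only. A quick way to pull just the relevant lines (the grep pattern is just an illustration):

```shell
# Show the most recent btrfs kernel messages; the abort that forced
# read-only mode is usually near the end of these lines.
dmesg | grep -i 'btrfs' | tail -n 20
```

A forced-readonly event typically logs something like "BTRFS error (device sde1): ..." shortly before the filesystem flips to read-only, which points at the underlying cause (I/O failure, transid mismatch, etc.).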

 

Thanks for your help!!

 

45 minutes ago, JorgeB said:

Cache filesystem is corrupt; best bet is to back up and re-format. There are some recovery options here if needed.

 

 

thanks for the quick reply. 

 

Dang... Do you mean just backup and re-format the cache drive only? Would the best process for this be to: 

1. image the drive just in case

2. set cache=no on all shares

3. run the mover to ensure all data is moved to array

4. run appdatabackup 

5. format the cache

6. restore appdata

7. restore cache setting on file shares
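For steps 1 and 5, a rough command-line sketch; the device and destination paths are examples based on this thread, not a recommendation to skip the UI, and the `[ -b ... ]` guards simply skip each command if the device isn't present:

```shell
# 1. Image the cache device just in case. The filesystem is already
#    read-only, so a raw copy from the live device is safe:
if [ -b /dev/sde ]; then
    dd if=/dev/sde of=/mnt/disk1/cache_backup.img bs=1M status=progress
fi

# 5. After the mover and appdata backup finish, clearing the old
#    filesystem signature lets Unraid re-format the device cleanly:
if [ -b /dev/sde1 ]; then
    wipefs -a /dev/sde1
fi
```

Steps 2, 3, 6, and 7 (share cache settings, mover, appdata restore) are done through the Unraid UI as listed above.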

 

 

Is there anything I need to do to prevent this in the future? I haven't had any power failures or unclean shutdowns. 

Would it be worthwhile to switch to XFS as others have done? 

23 minutes ago, JohnnyCache said:

Do you mean just backup and re-format the cache drive only?

Yes.

 

23 minutes ago, JohnnyCache said:

Would it be worthwhile to switch to XFS as others have done? 

Probably best, since you're running a single-device pool, if you don't need any of the btrfs features. In my experience most btrfs issues are caused by hardware, but XFS is usually more tolerant when there are issues.

1 hour ago, JorgeB said:

[...]

in my experience most btrfs issues are caused by hardware, still xfs is usually more tolerant if there are any issues.

 

 

Is there anything specific I should be looking for? The SSD is relatively new and does not get a ton of use. 

Or is it more that btrfs is expecting better ECC on the drive itself (thinking an enterprise SSD vs the off-the-shelf EVO I have)?

32 minutes ago, trurl said:

 

 

Yessss, thank you for sharing this. I'd guessed some of the steps, but only as far as stopping the array and backing up my cache drive. I'll work through these steps tomorrow. 

Thank you everyone for your help!

 

 

I'm still curious whether btrfs relies on enterprise hardware, or what could have been done to prevent this corruption in the first place. Can anyone share those details?

 

7 hours ago, JorgeB said:

Could be a RAM issue; without ECC RAM you can get a bit flipped at any time, and one flipped in the wrong place can really corrupt a btrfs filesystem.

 

 

Thanks for the tip, that makes a ton of sense. It could also explain why several others had the same issue, and why they never had it again after moving to XFS. 
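On the ECC question: you can check what the board actually reports without opening the case. A quick sketch using dmidecode (exact output wording varies by BIOS, and it needs root):

```shell
# "Error Correction Type: None" means the RAM is running without
# ECC, regardless of what the modules themselves support. The echo
# fallback just covers systems where nothing is reported.
dmidecode -t memory | grep -i 'error correction' || echo "no ECC info reported"
```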

 

  • 1 month later...

Hello,

 

I am having a similar issue I was hoping someone could help me track down. The logs show an error, then a "tree first key mismatch", and the cache drives are placed in read-only mode. I have just installed 2 brand-new 1 TB SSDs.

 

Dec 18 03:11:48 Tower kernel: ------------[ cut here ]------------
Dec 18 03:11:48 Tower kernel: BTRFS: Transaction aborted (error -117)
Dec 18 03:11:48 Tower kernel: WARNING: CPU: 2 PID: 18715 at fs/btrfs/inode.c:2730 btrfs_finish_ordered_io+0x38b/0x623
Dec 18 03:11:48 Tower kernel: Modules linked in: ipt_REJECT nf_reject_ipv4 xt_nat xt_tcpudp veth xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter xfs md_mod nvidia_drm(PO) nvidia_modeset(PO) drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops nvidia(PO) drm backlight agpgart corefreqk(O) ip6table_filter ip6_tables iptable_filter ip_tables x_tables bonding edac_mce_amd kvm_amd ccp kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel igb aesni_intel mxm_wmi wmi_bmof crypto_simd cryptd i2c_piix4 i2c_algo_bit input_leds i2c_core led_class k10temp fam15h_power glue_helper ahci wmi libahci button
Dec 18 03:11:48 Tower kernel: CPU: 2 PID: 18715 Comm: kworker/u16:5 Tainted: P           O      5.10.28-Unraid #1
Dec 18 03:11:48 Tower kernel: Hardware name: To be filled by O.E.M. To be filled by O.E.M./SABERTOOTH 990FX R3.0, BIOS 0212 07/18/2016
Dec 18 03:11:48 Tower kernel: Workqueue: btrfs-endio-write btrfs_work_helper
Dec 18 03:11:48 Tower kernel: RIP: 0010:btrfs_finish_ordered_io+0x38b/0x623
Dec 18 03:11:48 Tower kernel: Code: 8d b0 40 0a 00 00 e8 46 9b ff ff 84 c0 75 1d 41 83 fc fb 74 17 41 83 fc e2 74 11 44 89 e6 48 c7 c7 fc f1 d8 81 e8 d0 96 47 00 <0f> 0b 44 89 e1 ba aa 0a 00 00 e9 80 00 00 00 48 8d 45 58 48 89 44
Dec 18 03:11:48 Tower kernel: RSP: 0018:ffffc90001f87d90 EFLAGS: 00010286
Dec 18 03:11:48 Tower kernel: RAX: 0000000000000000 RBX: 0000000000002000 RCX: 0000000000000027
Dec 18 03:11:48 Tower kernel: RDX: 00000000ffffefff RSI: 0000000000000001 RDI: ffff88882ec98920
Dec 18 03:11:48 Tower kernel: RBP: ffff888128dbfb18 R08: 0000000000000000 R09: 00000000ffffefff
Dec 18 03:11:48 Tower kernel: R10: ffffc90001f87bc0 R11: ffffc90001f87bb8 R12: 00000000ffffff8b
Dec 18 03:11:48 Tower kernel: R13: ffff8881363b9af8 R14: ffff88821ff59410 R15: 0000000000000000
Dec 18 03:11:48 Tower kernel: FS:  0000000000000000(0000) GS:ffff88882ec80000(0000) knlGS:0000000000000000
Dec 18 03:11:48 Tower kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec 18 03:11:48 Tower kernel: CR2: 000014ff0f488000 CR3: 00000001b7e74000 CR4: 00000000000406e0
Dec 18 03:11:48 Tower kernel: Call Trace:
Dec 18 03:11:48 Tower kernel: btrfs_work_helper+0xe4/0x1e1
Dec 18 03:11:48 Tower kernel: process_one_work+0x13c/0x1d5
Dec 18 03:11:48 Tower kernel: worker_thread+0x18b/0x22f
Dec 18 03:11:48 Tower kernel: ? process_scheduled_works+0x27/0x27
Dec 18 03:11:48 Tower kernel: kthread+0xe5/0xea
Dec 18 03:11:48 Tower kernel: ? __kthread_bind_mask+0x57/0x57
Dec 18 03:11:48 Tower kernel: ret_from_fork+0x22/0x30
Dec 18 03:11:48 Tower kernel: ---[ end trace 11f1661a615b5b6d ]---
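For reference, errno -117 is EUCLEAN ("Structure needs cleaning"), i.e. btrfs detected on-disk metadata corruption, which matches the "tree first key mismatch" messages. Two things that may help narrow it down (device path is a placeholder, not your actual pool device):

```shell
# List the "tree first key mismatch" lines to see which metadata
# trees are affected (no output just means none logged since boot):
dmesg | grep -i 'tree first key mismatch' || true

# With the array stopped (pool unmounted), a read-only check is safe
# to run; it reports problems without modifying anything. /dev/sdX1
# is a placeholder for the real pool device:
if [ -b /dev/sdX1 ]; then
    btrfs check --readonly /dev/sdX1
fi
```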

 
