JohnnyCache, posted November 5, 2021 (edited November 6, 2021):

Good morning,

I saw some Docker issues this morning and probed around to find that my cache drive is suddenly a read-only file system. How did this happen, and how can I fix it?

I've searched around a bit to try to repair this but have not had any luck. I know next to nothing about BTRFS but am decent with Linux in general. The only thing that stood out to me was the output of btrfs fi df /mnt/cache: the numbers on Data make it look like the drive is nearly full. In other posts where users were faced with the "my BTRFS drive is full but not reporting that in the UI" issue, the size/used mismatch was identified in the btrfs fi show /mnt/cache output. Does that make any difference? Am I seeing the same issue?

In the threads I've found, removing some data and rebalancing was the fix. But I can't rebalance because of the "read-only file system" issue. I can't even create an empty file on my cache disk:

    root@jonas:~# touch /mnt/cache/Logs/testwrite.log
    touch: cannot touch '/mnt/cache/Logs/testwrite.log': Read-only file system

btrfs fi outputs:

    root@jonas:~# btrfs fi show /mnt/cache
    Label: none  uuid: 06942e55-3e85-4a3d-a70a-f5d321bea2a3
        Total devices 1 FS bytes used 217.21GiB
        devid 1 size 465.76GiB used 224.02GiB path /dev/sde1

    root@jonas:~# btrfs fi df /mnt/cache
    Data, single: total=222.01GiB, used=216.62GiB
    System, single: total=4.00MiB, used=48.00KiB
    Metadata, single: total=2.01GiB, used=606.70MiB
    GlobalReserve, single: total=341.06MiB, used=0.00B

    root@jonas:~# btrfs balance start -dusage=5 /mnt/cache
    ERROR: error during balancing '/mnt/cache': Read-only file system
    There may be more info in syslog - try dmesg | tail

Thanks for your help!!
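On the "is the drive actually full?" question, the arithmetic can be sketched in shell. In `btrfs fi show`, the per-device `used` figure is the space already allocated to chunks, so `size - used` is what is still unallocated; the "full but not reporting it" situation is the one where that difference approaches zero. The helper name `unallocated_gib` is hypothetical, and the sketch assumes both figures are printed in GiB, as they are in the post above.

```shell
#!/bin/sh
# Read `btrfs filesystem show` output on stdin and print, for each device
# line, how much of the device is not yet allocated to chunks.
# Assumption: "size" and "used" are both reported in GiB.
unallocated_gib() {
    awk '$1 == "devid" {
        size = $4; used = $6
        sub(/GiB$/, "", size)   # strip the unit suffix
        sub(/GiB$/, "", used)
        printf "%s unallocated=%.2fGiB\n", $8, size - used
    }'
}
```

Piping the `devid` line from the post through this function gives roughly 241.74 GiB unallocated on /dev/sde1, which suggests the device is not out of chunk space and that the read-only state has another cause.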
JorgeB, posted November 5, 2021:

The cache filesystem is corrupt. Your best bet is to back up and re-format; there are some recovery options here if needed.
JohnnyCache, posted November 5, 2021:

45 minutes ago, JorgeB said: "Cache filesystem is corrupt, best bet is to backup and re-format, there some recovery options here if needed."

Thanks for the quick reply. Dang... Do you mean just back up and re-format the cache drive only? Would the best process for this be to:

1. image the drive just in case
2. set cache=no on all shares
3. run the mover to ensure all data is moved to the array
4. run appdatabackup
5. format the cache
6. restore appdata
7. restore the cache setting on file shares

Is there anything I need to do to prevent this in the future? I haven't had any power failures or unclean shutdowns. Would it be worthwhile to switch to XFS as others have done?
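For the "back up first" part of the plan, a minimal sketch of copying data off a filesystem that can still be read, then confirming the copy matches. This is a plain `cp`/`diff` illustration, not an Unraid procedure; the function name `backup_and_verify` and the destination path in the usage note are hypothetical.

```shell
#!/bin/sh
# Copy the contents of a source directory to a destination and verify it.
# Returns 0 only if the copy succeeded and both trees compare equal.
backup_and_verify() {
    src=$1
    dst=$2
    mkdir -p "$dst" || return 1
    # "src/." copies the directory contents, including hidden files,
    # while preserving permissions and timestamps (-a).
    cp -a "$src"/. "$dst"/ || return 1
    # diff -r exits non-zero if any file differs or is missing.
    diff -r "$src" "$dst" >/dev/null
}
```

Usage would look something like `backup_and_verify /mnt/cache /mnt/disk1/cache_backup`, where the second path is an assumed location on the array with enough free space.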
JorgeB, posted November 5, 2021:

23 minutes ago, JohnnyCache said: "Do you mean just backup and re-format the cache drive only?"

Yes.

23 minutes ago, JohnnyCache said: "Would it be worthwhile to switch to XFS as others have done?"

Probably best, since you're running a single-device pool, provided you don't need any of the btrfs features. In my experience most btrfs issues are caused by hardware, but xfs is usually more tolerant if there are any issues.
JonathanM, posted November 5, 2021:

1 hour ago, JohnnyCache said: "2. set cache=no on all shares 3. run the mover to ensure all data is moved to array"

cache=no disables the mover; cache=yes is what you want. Turn on the help beside the setting for a more thorough explanation.
JohnnyCache, posted November 5, 2021 (edited November 5, 2021):

1 hour ago, JorgeB said: "[...] in my experience most btrfs issues are caused by hardware, still xfs is usually more tolerant if there are any issues."

Is there anything specific I should be looking for? The SSD is relatively new and does not get a ton of use. Or is it more that btrfs is expecting better ECC on the drive itself (thinking an enterprise SSD vs. the off-the-shelf EVO I have)?
JohnnyCache, posted November 5, 2021:

Last question; this should get me going: is there a guide available on how to convert a cache disk from btrfs to XFS?
trurl, posted November 6, 2021:

https://wiki.unraid.net/Manual/Storage_Management#Reformatting_a_cache_drive
JohnnyCache, posted November 6, 2021:

32 minutes ago, trurl said: "https://wiki.unraid.net/Manual/Storage_Management#Reformatting_a_cache_drive"

Yessss, thank you for sharing this. I'd guessed some of the steps, but only as far as stopping the array and backing up my cache drive. I'll work through these steps tomorrow. Thank you everyone for your help!

I'm still curious whether btrfs relies on enterprise hardware, or what could have been done to prevent this corruption to begin with. Can anyone share those details?
JorgeB, posted November 6, 2021:

11 hours ago, JohnnyCache said: "Is there anything specific I should be looking for?"

It could be a RAM issue. Without ECC RAM you can get a bit flipped at any time, and one in the wrong place can really corrupt a btrfs filesystem.
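RAM itself has to be checked with a memory tester outside the running system, but btrfs also keeps per-device error counters that can hint at hardware trouble. As a complementary sketch (not something suggested in the thread), the filter below reads the output of `btrfs device stats` and prints only counters that are not zero. The function name `nonzero_dev_stats` is hypothetical; the counter names such as `corruption_errs` are the ones that command prints.

```shell
#!/bin/sh
# Read `btrfs device stats <mountpoint>` output on stdin, where each line
# looks like "[/dev/sde1].corruption_errs  0", and print only the
# counters whose value is not zero.
nonzero_dev_stats() {
    awk 'NF == 2 && $2 + 0 != 0 { print $1, $2 }'
}
```

A typical invocation would be `btrfs device stats /mnt/cache | nonzero_dev_stats`; empty output means every counter is still zero, which does not by itself rule out a RAM fault.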
JohnnyCache, posted November 6, 2021:

7 hours ago, JorgeB said: "Could be a RAM issue, without ECC RAM you can get a bit flipped anytime, and one in the wrong place can really corrupt a btrfs filesystem."

Thanks for the tip, that makes a ton of sense. It could also explain why several others had the same issue, and why they never had it again after moving to xfs.
eth4ck1e, posted December 18, 2021:

Hello, I am having a similar issue that I was hoping someone could help me track down. The logs show an error and then a "tree first key mismatch", and the cache drives are placed in read-only mode. I have just installed 2 brand new 1 TB SSDs.

    Dec 18 03:11:48 Tower kernel: ------------[ cut here ]------------
    Dec 18 03:11:48 Tower kernel: BTRFS: Transaction aborted (error -117)
    Dec 18 03:11:48 Tower kernel: WARNING: CPU: 2 PID: 18715 at fs/btrfs/inode.c:2730 btrfs_finish_ordered_io+0x38b/0x623
    Dec 18 03:11:48 Tower kernel: Modules linked in: ipt_REJECT nf_reject_ipv4 xt_nat xt_tcpudp veth xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter xfs md_mod nvidia_drm(PO) nvidia_modeset(PO) drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops nvidia(PO) drm backlight agpgart corefreqk(O) ip6table_filter ip6_tables iptable_filter ip_tables x_tables bonding edac_mce_amd kvm_amd ccp kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel igb aesni_intel mxm_wmi wmi_bmof crypto_simd cryptd i2c_piix4 i2c_algo_bit input_leds i2c_core led_class k10temp fam15h_power glue_helper ahci wmi libahci button
    Dec 18 03:11:48 Tower kernel: CPU: 2 PID: 18715 Comm: kworker/u16:5 Tainted: P O 5.10.28-Unraid #1
    Dec 18 03:11:48 Tower kernel: Hardware name: To be filled by O.E.M. To be filled by O.E.M./SABERTOOTH 990FX R3.0, BIOS 0212 07/18/2016
    Dec 18 03:11:48 Tower kernel: Workqueue: btrfs-endio-write btrfs_work_helper
    Dec 18 03:11:48 Tower kernel: RIP: 0010:btrfs_finish_ordered_io+0x38b/0x623
    Dec 18 03:11:48 Tower kernel: Code: 8d b0 40 0a 00 00 e8 46 9b ff ff 84 c0 75 1d 41 83 fc fb 74 17 41 83 fc e2 74 11 44 89 e6 48 c7 c7 fc f1 d8 81 e8 d0 96 47 00 <0f> 0b 44 89 e1 ba aa 0a 00 00 e9 80 00 00 00 48 8d 45 58 48 89 44
    Dec 18 03:11:48 Tower kernel: RSP: 0018:ffffc90001f87d90 EFLAGS: 00010286
    Dec 18 03:11:48 Tower kernel: RAX: 0000000000000000 RBX: 0000000000002000 RCX: 0000000000000027
    Dec 18 03:11:48 Tower kernel: RDX: 00000000ffffefff RSI: 0000000000000001 RDI: ffff88882ec98920
    Dec 18 03:11:48 Tower kernel: RBP: ffff888128dbfb18 R08: 0000000000000000 R09: 00000000ffffefff
    Dec 18 03:11:48 Tower kernel: R10: ffffc90001f87bc0 R11: ffffc90001f87bb8 R12: 00000000ffffff8b
    Dec 18 03:11:48 Tower kernel: R13: ffff8881363b9af8 R14: ffff88821ff59410 R15: 0000000000000000
    Dec 18 03:11:48 Tower kernel: FS: 0000000000000000(0000) GS:ffff88882ec80000(0000) knlGS:0000000000000000
    Dec 18 03:11:48 Tower kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    Dec 18 03:11:48 Tower kernel: CR2: 000014ff0f488000 CR3: 00000001b7e74000 CR4: 00000000000406e0
    Dec 18 03:11:48 Tower kernel: Call Trace:
    Dec 18 03:11:48 Tower kernel: btrfs_work_helper+0xe4/0x1e1
    Dec 18 03:11:48 Tower kernel: process_one_work+0x13c/0x1d5
    Dec 18 03:11:48 Tower kernel: worker_thread+0x18b/0x22f
    Dec 18 03:11:48 Tower kernel: ? process_scheduled_works+0x27/0x27
    Dec 18 03:11:48 Tower kernel: kthread+0xe5/0xea
    Dec 18 03:11:48 Tower kernel: ? __kthread_bind_mask+0x57/0x57
    Dec 18 03:11:48 Tower kernel: ret_from_fork+0x22/0x30
    Dec 18 03:11:48 Tower kernel: ---[ end trace 11f1661a615b5b6d ]---
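To read the trace above: the number in "Transaction aborted (error -117)" is a negated Linux errno, and 117 is EUCLEAN ("Structure needs cleaning"), which btrfs returns when it detects on-disk corruption. The sketch below pulls the code out of a log line and also shows that a 32-bit register value such as the `ffffff8b` seen in R12 decodes to the same -117. Both function names are hypothetical, and reading R12 as the error code is an inference from the matching value, not something the log states.

```shell
#!/bin/sh
# Extract the error code from a "Transaction aborted (error N)" log line
# read on stdin. Prints nothing if no such line is present.
abort_code() {
    sed -n 's/.*Transaction aborted (error \(-\{0,1\}[0-9][0-9]*\)).*/\1/p'
}

# Interpret the low 32 bits of a hex value (given without the 0x prefix)
# as a signed two's-complement integer.
to_signed32() {
    v=$(( 0x$1 & 0xffffffff ))
    if [ "$v" -ge 2147483648 ]; then
        v=$(( v - 4294967296 ))
    fi
    echo "$v"
}
```

For example, `to_signed32 00000000ffffff8b` yields -117, since 0xffffff8b is 4294967179 and 4294967179 - 4294967296 = -117.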
trurl, posted December 18, 2021:

Attach diagnostics to your NEXT post in this thread.
JohnnyCache, posted December 20, 2021:

There may be specific solutions for different causes of this issue, but I'd like to share that I've had no issues since I followed the instructions to reformat my cache to xfs.