BTRFS cache pool went read only then unmountable


trashman

My cache pool went read-only after some Docker issues that led me to reboot. This happened to me once before, and I solved it by moving all my shares to the array, formatting the drives, then moving the shares back. This time, before I could move my appdata share to the array, I started getting this message in the Main tab:

 

Unmountable disk present:
Cache • Samsung_SSD_970_EVO_1TB_S5H9NS0NB61321J (nvme0n1)
Cache 2 • Samsung_SSD_970_EVO_Plus_500GB_S4P2NG0M218576P (nvme1n1)

 

And the following message when I try to mount manually from the console:

 

$ mount /dev/nvme0n1p1 /mnt/tmp
mount: /mnt/tmp: wrong fs type, bad option, bad superblock on /dev/nvme0n1p1, missing codepage or helper program, or other error.

 

This is the second time the cache pool has randomly crapped out on me by going read-only, so I'd love to hear any steps I should take to keep that from happening again.

 

My diagnostics are attached.

 

Thank you for any help!

trashnet-diagnostics-20210226-1311.zip

Feb 26 09:58:52 Trashnet kernel: BTRFS info (device nvme0n1p1): bdev /dev/nvme0n1p1 errs: wr 0, rd 0, flush 0, corrupt 12722, gen 0
Feb 26 09:58:52 Trashnet kernel: BTRFS info (device nvme0n1p1): bdev /dev/nvme1n1p1 errs: wr 0, rd 0, flush 0, corrupt 17619, gen 0

 

Both devices are detecting a lot of data corruption, start by running memtest.
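
The number to look at in those lines is the `corrupt` counter for each device. As a rough sketch, assuming the syslog lines have exactly the format shown above, the per-device counts can be pulled out like this (the function name `btrfs_corrupt_counts` is made up for illustration):

```shell
# Hypothetical helper: reads kernel log lines on stdin and prints
# "<device> <corrupt count>" for every btrfs error-counter line.
# Assumes the format "bdev /dev/... errs: wr N, rd N, flush N, corrupt N, gen N".
btrfs_corrupt_counts() {
    sed -n 's/.*bdev \([^ ]*\) errs:.* corrupt \([0-9]*\),.*/\1 \2/p'
}
```

It can be fed a saved syslog, for example `btrfs_corrupt_counts < /var/log/syslog` (that path is an assumption about where your log lives). On a pool that still mounts, `btrfs device stats` on the mount point reports the same counters directly.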

17 minutes ago, JorgeB said:



Feb 26 09:58:52 Trashnet kernel: BTRFS info (device nvme0n1p1): bdev /dev/nvme0n1p1 errs: wr 0, rd 0, flush 0, corrupt 12722, gen 0
Feb 26 09:58:52 Trashnet kernel: BTRFS info (device nvme0n1p1): bdev /dev/nvme1n1p1 errs: wr 0, rd 0, flush 0, corrupt 17619, gen 0

 

Both devices are detecting a lot of data corruption, start by running memtest.

 

Running it right now and it is finding what looks like a lot of errors (see attached).

E7151B44-DC82-4DC4-98BE-10A295546E74.jpeg

14 minutes ago, trashman said:

So to clarify, errors in my RAM are causing the issues with my NVMe drives?

 

Almost certainly! 

 

The symptoms that RAM issues cause are unpredictable, but file system corruption is not an uncommon one.

 

The only acceptable number of errors when running memtest is 0 :( 

1 minute ago, JorgeB said:

And btrfs is much more sensitive to RAM errors than, for example, xfs, so problems are much more likely to be detected there first, and a couple of bit flips in the wrong place can destroy a btrfs filesystem.

Thanks guys!

 

So two followup questions:

 

1. Should I start using XFS for my cache pool, or is BTRFS fine if your RAM isn’t shitting the bed?

2. Should I expect the drives to start functioning again after I identify and remove/replace the faulty DIMM? If not, is there any way to recover the appdata folder before reformatting?

Just now, trashman said:

 

1. Should I start using XFS for my cache pool, or is BTRFS fine if your RAM isn’t shitting the bed?

It's up to you. btrfs works great with good hardware and, unlike xfs, it will easily detect data corruption, but if you don't need any of its features and plan to use a single device, you can switch to xfs.

 

3 minutes ago, trashman said:

2. Should I expect the drives to start functioning again after I identify and remove/replace the faulty DIMM? If not, is there any way to recover the appdata folder before reformatting?

The data will remain corrupt, and possibly the filesystem too, so it might be better to recreate it.
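
If you still want to try getting appdata off before reformatting, a minimal sketch of a two-step attempt is below, to be run only once the RAM is sorted out. It just prints the commands so they can be reviewed first. `recovery_plan` is a made-up helper name, `/mnt/tmp` is the mount point from the first post, and `/mnt/disk1/restore` is an assumed destination on the array; adjust both to your system.

```shell
# Sketch only: prints the recovery commands instead of running them.
# Arguments: <pool device> <destination directory for recovered files>.
recovery_plan() {
    local dev="$1" dest="$2"
    # Step 1: try a read-only mount that falls back to an older tree root.
    echo "mkdir -p /mnt/tmp"
    echo "mount -o ro,usebackuproot $dev /mnt/tmp"
    # Step 2: if the mount still fails, copy files off the unmounted device.
    echo "btrfs restore -v $dev $dest"
}

# Example values; the destination path is an assumption.
recovery_plan /dev/nvme0n1p1 /mnt/disk1/restore
```

Both pool devices need to be present for either step, and anything recovered this way should be treated as suspect, since files written while the RAM was bad may be corrupt even if they copy cleanly.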
