trashman, posted February 26, 2021 (edited)

My cache pool went read-only after some Docker issues that led me to reboot. This happened to me previously, and I solved it by moving all my shares to the array, formatting the drives, then moving the shares back. This time, before I could move my appdata share to the array, I started getting this message in the Main tab:

Unmountable disk present:
Cache • Samsung_SSD_970_EVO_1TB_S5H9NS0NB61321J (nvme0n1)
Cache 2 • Samsung_SSD_970_EVO_Plus_500GB_S4P2NG0M218576P (nvme1n1)

And I get the following message when I try to mount manually from the console:

$ mount /dev/nvme0n1p1 /mnt/tmp
mount: /mnt/tmp: wrong fs type, bad option, bad superblock on /dev/nvme0n1p1, missing codepage or helper program, or other error.

This is the second time the cache pool has randomly crapped out on me by going read-only, so I'd love to hear any steps I should take to keep that from happening again. My diagnostics are attached. Thank you for any help!

Attachment: trashnet-diagnostics-20210226-1311.zip

Edited February 26, 2021 by trashman: Fix mount command

JorgeB, posted February 26, 2021

Feb 26 09:58:52 Trashnet kernel: BTRFS info (device nvme0n1p1): bdev /dev/nvme0n1p1 errs: wr 0, rd 0, flush 0, corrupt 12722, gen 0
Feb 26 09:58:52 Trashnet kernel: BTRFS info (device nvme0n1p1): bdev /dev/nvme1n1p1 errs: wr 0, rd 0, flush 0, corrupt 17619, gen 0

Both devices are detecting a lot of data corruption; start by running memtest.

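For anyone who wants to pull these counters out of a syslog without reading it by eye, here is a minimal sketch in Python. It assumes the log lines have exactly the shape quoted above; the function name `parse_btrfs_errs` and the `LOG` sample are made up for illustration and are not part of Unraid or btrfs-progs.

```python
import re

# Pattern for the per-device counter line as it appears in the quoted syslog.
# Assumption: the field order (wr, rd, flush, corrupt, gen) matches the lines above.
BDEV_RE = re.compile(
    r"bdev (?P<dev>\S+) errs: wr (?P<wr>\d+), rd (?P<rd>\d+), "
    r"flush (?P<flush>\d+), corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)


def parse_btrfs_errs(log_text):
    """Return {device: {counter_name: int}} for every bdev error line found."""
    stats = {}
    for match in BDEV_RE.finditer(log_text):
        fields = match.groupdict()
        dev = fields.pop("dev")
        stats[dev] = {name: int(value) for name, value in fields.items()}
    return stats


# The two lines quoted in the post above, used as sample input.
LOG = """\
Feb 26 09:58:52 Trashnet kernel: BTRFS info (device nvme0n1p1): bdev /dev/nvme0n1p1 errs: wr 0, rd 0, flush 0, corrupt 12722, gen 0
Feb 26 09:58:52 Trashnet kernel: BTRFS info (device nvme0n1p1): bdev /dev/nvme1n1p1 errs: wr 0, rd 0, flush 0, corrupt 17619, gen 0
"""

for dev, counters in parse_btrfs_errs(LOG).items():
    # Show only the counters that are non-zero for each device.
    nonzero = {name: count for name, count in counters.items() if count}
    print(dev, nonzero if nonzero else "clean")
```

On the sample above, only the `corrupt` counter is non-zero for both devices, which is the pattern that pointed away from the drives themselves and toward something upstream such as RAM.
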
trashman (author), posted February 26, 2021

17 minutes ago, JorgeB said: "Both devices are detecting a lot of data corruption, start by running memtest."

Running it right now, and it is finding what looks like a lot of errors (see attached).

JorgeB, posted February 26, 2021

5 minutes ago, trashman said: "it is finding what looks like a lot of errors"

Yep, try with just one DIMM at a time to find the problem one.

trashman (author), posted February 26, 2021

1 minute ago, JorgeB said: "Yep, try with just one dimm to find the problem one."

So to clarify, corruption in my RAM is causing the issues with my NVMe drives?

itimpi, posted February 26, 2021

14 minutes ago, trashman said: "So to clarify, corruptions with my RAM are causing the issues with my NVME drives?"

Almost certainly! RAM issues are unpredictable in the symptoms they cause, but file system corruption is a common one. The only acceptable number of errors when running memtest is 0.

JorgeB, posted February 26, 2021

And btrfs is much more sensitive to RAM errors than, for example, xfs, so problems are much more likely to be detected there first. A couple of bit flips in the wrong place can destroy a btrfs filesystem.

trashman (author), posted February 26, 2021

1 minute ago, JorgeB said: "And btrfs is much more sensitive to RAM errors than for example xfs, so much more likely for problems to be detected there first, and a couple of bit flips in the wrong place can destroy a btrfs filesystem."

Thanks guys! So two follow-up questions:
1. Should I start using XFS for my cache pool, or is BTRFS fine if your RAM isn't shitting the bed?
2. Should I expect the drives to start functioning again after I identify and remove/replace the faulty DIMM? If not, is there any way to recover the appdata folder before reformatting?

JorgeB, posted February 26, 2021

Just now, trashman said: "1. Should I start using XFS for my cache pool, or is BTRFS fine if your RAM isn't shitting the bed?"

It's up to you. btrfs works great with good hardware, and unlike xfs it will easily detect data corruption, but if you don't need any of its features and plan to use a single device, you can switch to xfs.

3 minutes ago, trashman said: "2. Should I expect the drives to start functioning again after I identify and remove/replace the faulty DIMM? If not is there anyway to recover the appdata folder before reformatting?"

The data will remain corrupt, and possibly the filesystem too, so it might be better to recreate it.

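To illustrate why a checksumming filesystem notices this kind of damage, here is a small Python sketch of the general idea: store a checksum with a block, then recompute it on read. This is not btrfs code. btrfs uses crc32c by default, while this sketch uses the plain CRC-32 from Python's `zlib` as a stand-in, and the helper `flip_bit` and the 4 KiB sample block are assumptions made for the example.

```python
import zlib


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with exactly one bit inverted (simulated RAM error)."""
    buf = bytearray(data)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)


# A 4 KiB sample block and the checksum recorded when it was "written".
block = bytes(range(256)) * 16
stored = zlib.crc32(block)

# Simulate a single bit flipping somewhere in the block after the checksum was taken.
damaged = flip_bit(block, 12345)

# On "read", the checksum is recomputed and compared against the stored value.
print("intact block verifies:", zlib.crc32(block) == stored)
print("damaged block verifies:", zlib.crc32(damaged) == stored)
```

A filesystem that keeps no data checksums would hand back the damaged block without complaint, which is why the same bad RAM can go unnoticed there for longer.
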
trurl, posted February 26, 2021

21 minutes ago, itimpi said: "The only acceptable number of errors when running memtest is 0"

If memory isn't perfect, nothing else can be trusted. Everything goes through RAM: your data, applications, the OS, everything.