Cache: read only file system


November 24, 2023

I replaced my cache drive about a week ago and also added a second drive to make it a RAID1 pool.

Since then, roughly every two days the Docker containers and VMs lock up, and any attempt to write to the cache drive fails with a message that the file system is read-only.

I've tried running a balance and a scrub. The scrub reports no errors, yet the problem keeps recurring.

The only way to bring it back to life is to reboot, but it soon happens again.

What have I missed? Or could the new SSDs just be faulty?

Diagnostics attached.

lisa-diagnostics-20231124-0959.zip


November 24, 2023

Author

A second diagnostics download, taken immediately after rebooting, in case it's of any use:

lisa-diagnostics-20231124-1015.zip


November 24, 2023

Community Expert

write time tree block corruption detected

This error in the first diagnostics suggests a RAM issue. Btrfs has also been detecting considerable data corruption:

Nov 24 10:04:31 LISA kernel: BTRFS info (device nvme0n1p1): bdev /dev/nvme1n1p1 errs: wr 0, rd 0, flush 0, corrupt 973, gen 0

Start by running memtest.


November 24, 2023

Author

Thanks, I'll run a memtest now.

I saw those messages about nvme1, which is what made me suspect a bad SSD. I also see messages about multiple uncorrected fatal errors being received, "frozen state error detected", and "device recovery successful".


November 24, 2023

Community Expert

geekypenguin said:

I saw those messages about nvme1 which is what made me suspect a bad ssd?

For now it's only a filesystem issue, not a device problem.


November 24, 2023

Author

The first two passes of memtest returned zero errors. I'll keep it running a bit longer to make sure none materialise.


November 24, 2023

Community Expert

If no errors are found, I would reset the stats and try with just one stick of RAM; if more errors come up, try the other one. That will basically rule out a RAM issue. See here for how to reset the stats and monitor the pool.
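Monitoring the pool essentially means watching the per-device btrfs error counters and flagging any that become non-zero (`btrfs device stats -z` prints and then zeroes them). Below is a minimal Python sketch of such a check, assuming the usual `btrfs device stats <mountpoint>` output format (lines like `[/dev/nvme1n1p1].corruption_errs 973`) and an Unraid-style `/mnt/cache` mount point. The function names are hypothetical, and this is not necessarily how the linked user script works.

```python
import re
import subprocess

# Matches lines such as "[/dev/nvme1n1p1].corruption_errs  973"
STAT_LINE = re.compile(r"^\[(?P<dev>[^\]]+)\]\.(?P<counter>\w+)\s+(?P<value>\d+)\s*$")


def nonzero_counters(stats_text):
    """Return {(device, counter): value} for every counter that is not zero."""
    found = {}
    for line in stats_text.splitlines():
        m = STAT_LINE.match(line.strip())
        if m and int(m.group("value")) != 0:
            found[(m.group("dev"), m.group("counter"))] = int(m.group("value"))
    return found


def read_stats(mountpoint="/mnt/cache"):
    """Run `btrfs device stats` on the pool (assumed mount point) and return its text."""
    result = subprocess.run(
        ["btrfs", "device", "stats", mountpoint],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Usage on the server would be something like `errors = nonzero_counters(read_stats())`, raising an alert when the result is non-empty. Parsing the text output keeps the check independent of any particular btrfs-progs exit-code behaviour.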


November 24, 2023

Author

Thanks for your help.

I removed one RAM stick, reset the stats and configured the user script as suggested.

I'll let you know how it gets on.


November 27, 2023

Author

Sorry it's taken a few days to respond. There's been a lot to work through, and while I still don't know the cause, I reached a point where I had to stop and revert to a known good configuration.

First, I hit the known macvlan kernel issue in 6.12, which complicated things.

With each stick of RAM on its own, I was still getting data corruption errors, always on the same disk. I was also getting corruption of my docker.img, which caused Docker containers to crash without the cache going read-only.

As the NVMe drives were new, I got a warranty replacement for the NVMe drive with all the errors and attempted to rebuild the cache pool onto the second drive, but was flooded with "nvme frozen state error detected, reset controller" and similar messages for the replacement drive. A few bug reports suggested adding ```nvme_core.default_ps_max_latency_us=0 pcie_aspm=off``` to the boot config, but this didn't help either.
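When boot parameters like these appear not to help, it is worth confirming that they actually reached the running kernel, which exposes its command line in `/proc/cmdline`. The following is a minimal Python sketch of that check. The helper names are hypothetical, the tokenizer deliberately ignores quoted values containing spaces, and it only verifies that the parameters were passed, not whether they address the controller resets.

```python
from pathlib import Path

# The two parameters suggested in the bug reports, with their expected values.
WANTED = {
    "nvme_core.default_ps_max_latency_us": "0",
    "pcie_aspm": "off",
}


def parse_cmdline(cmdline):
    """Split a kernel command line into {name: value}; bare flags map to None.

    Simplification: quoted values containing spaces are not handled.
    """
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def missing_params(cmdline, wanted=WANTED):
    """Return the wanted parameters that are absent or set to a different value."""
    params = parse_cmdline(cmdline)
    return {k: v for k, v in wanted.items() if params.get(k) != v}


def check_running_kernel():
    # /proc/cmdline holds the parameters the running kernel was booted with.
    return missing_params(Path("/proc/cmdline").read_text())
```

An empty result from `check_running_kernel()` means both parameters were applied. The effective NVMe setting can also be read directly from `/sys/module/nvme_core/parameters/default_ps_max_latency_us`.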

This is unfortunately where I had to stop. I've removed the second cache drive and reverted to single-drive mode for my cache, which has been working fine for a few days now with all the RAM re-installed.

I'm not sure where else to go from here, to be honest. I suppose I can stay like this with no redundancy on my cache, but I would like to get to the bottom of it.


November 27, 2023

Community Expert

geekypenguin said:

"nvme frozen state error detected, reset controller"

These are hardware/firmware-related errors. The easiest option would be to try a different NVMe device or a different board.


November 27, 2023

Author

Would a BIOS update be worth trying before I go spending money?

Edited November 28, 2023 by geekypenguin
