Btrfs Issues

August 2, 2023

Hi,

I have been experiencing unexplained shutdowns with increasing frequency recently. Right now, I get about 2 days of uptime before I have to do a hard reset. I thought this was an issue with my macvlan setup (I will be making a separate post about that) and I have been trying to troubleshoot, but it's difficult because once the server freezes I can't pull diagnostics, even locally.

However, yesterday Common Problems alerted me that Docker could not write to the cache. This has happened once before, but a restart fixed the issue. It also reported, separately from Docker, that it could not write to the cache, which is new. When I looked at the syslog, I saw many of the errors shown below. Does this mean one of my cache drives (a 6-month-old Samsung 980 Pro M.2 SSD) is failing? I have also included a copy of the diagnostics that I just pulled.

It seems odd to me that the drive would fail so soon. Also, it is part of a pool of two 1 TB drives, and my dashboard shows it as a 1 TB pool. I am assuming this means the drives are configured in RAID 1 (or the btrfs equivalent), so if I lose one of those drives I will not lose any data. Am I correct in this assumption? I tried to back up the important items on the cache drive, but my download speed was in the kb/s range for some reason, and that was for the whole system, not just the cache drive.
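
The capacity reasoning above can be sanity-checked with a little arithmetic. This is only a rough sketch: it assumes the btrfs raid1 profile (two copies of every block, each on a different device) and ignores metadata overhead. The function name is made up for illustration; the real profile of a pool should be confirmed with `btrfs filesystem usage` on the mount point.

```python
def raid1_usable_tb(device_sizes_tb):
    """Approximate usable space for a btrfs raid1 pool (two copies per block).

    Space is limited either by half the total (every block is stored twice)
    or by how much the other devices can mirror from the largest one.
    """
    total = sum(device_sizes_tb)
    largest = max(device_sizes_tb)
    return min(total / 2, total - largest)


def single_usable_tb(device_sizes_tb):
    """Approximate usable space if the data profile were 'single' (no redundancy)."""
    return sum(device_sizes_tb)


pool = [1.0, 1.0]  # two 1 TB devices, as described in the post
print("raid1 :", raid1_usable_tb(pool), "TB")
print("single:", single_usable_tb(pool), "TB")
```

A dashboard figure of 1 TB for two 1 TB devices is consistent with raid1 and not with a non-redundant profile, which would show roughly 2 TB.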

Any help would be appreciated.

BttrFS.txt nancenas-diagnostics-20230802-1322.zip

August 2, 2023

Community Expert

14 minutes ago, rnance said:

I saw many of the errors shown below

Those are checksum errors, not device errors, and they are the result of one of the devices (nvme0n1) dropping offline in the past:

Aug  2 10:50:24 NanceNAS kernel: BTRFS info (device nvme1n1p1): bdev /dev/nvme0n1p1 errs: wr 1470168, rd 4095, flush 123185, corrupt 206206, gen 0
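
For anyone who wants to track these counters over time, the line above can be parsed mechanically. The following is a minimal sketch, assuming the kernel message keeps exactly the `bdev ... errs: wr N, rd N, flush N, corrupt N, gen N` layout shown here; the function name `parse_btrfs_errs` is hypothetical and not part of any tool.

```python
import re

# Matches the per-device error summary that btrfs logs, as seen in this thread.
ERRS_RE = re.compile(
    r"bdev (?P<dev>\S+) errs: "
    r"wr (?P<wr>\d+), rd (?P<rd>\d+), flush (?P<flush>\d+), "
    r"corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)


def parse_btrfs_errs(line):
    """Return (device, {counter: value}) or None if the line does not match."""
    match = ERRS_RE.search(line)
    if match is None:
        return None
    counters = {
        name: int(value)
        for name, value in match.groupdict().items()
        if name != "dev"
    }
    return match.group("dev"), counters


sample = (
    "Aug  2 10:50:24 NanceNAS kernel: BTRFS info (device nvme1n1p1): "
    "bdev /dev/nvme0n1p1 errs: wr 1470168, rd 4095, flush 123185, "
    "corrupt 206206, gen 0"
)
device, counters = parse_btrfs_errs(sample)
print(device)
for name, value in counters.items():
    print(f"  {name}: {value}")
```

The same five counters are what `btrfs device stats` reports for each device in a mounted pool, so any non-zero value there after a scrub is worth a closer look.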

Run a correcting scrub and check that all errors were corrected, then see here for better pool monitoring in case it happens again.

As for the crashing issues, enable the syslog server and post the resulting log after a crash.

August 2, 2023

Author

38 minutes ago, JorgeB said:
Those are checksum errors, not device errors, and they are the result of one of the devices (nvme0n1) dropping offline in the past:
Aug  2 10:50:24 NanceNAS kernel: BTRFS info (device nvme1n1p1): bdev /dev/nvme0n1p1 errs: wr 1470168, rd 4095, flush 123185, corrupt 206206, gen 0
Run a correcting scrub and check that all errors were corrected, then see here for better pool monitoring in case it happens again.

As for the crashing issues, enable the syslog server and post the resulting log after a crash.

OK, I'll give that a shot. Would this explain the inability to write to the cache?

August 3, 2023

Community Expert

It could, if the filesystem went read-only.
