August 2, 2023

Hi, I have been experiencing unexplained shutdowns with increasing frequency recently. Right now, I get about 2 days of uptime before I have to do a hard reset. I thought this was an issue with my macvlan setup (I will be making a separate post about that) and I have been trying to troubleshoot it. However, it's difficult because once the server freezes I can't pull diagnostics, even locally.

Yesterday, however, Common Problems alerted me that Docker could not write to the cache. This has happened once before, but a restart fixed the issue. It also said that it could not write to the cache (separately from Docker), which is new. When I looked at the syslog, I saw many of the errors shown below. Does this mean one of my cache drives is failing (a 6-month-old Samsung 980 Pro M.2 SSD)? I have also included a copy of the diagnostics that I just pulled.

It seems odd to me that the drive would fail so soon. Also, it is part of a pool of two 1 TB drives, and on my dashboard it shows up as a 1 TB pool. I am assuming this means the drives are configured in RAID 1 (or the btrfs equivalent), so if I lose one of those drives I will not lose any data. Am I correct in this assumption?

I tried to back up the important items on the cache drive, but my download speed was in the kb/s range for some reason. That applies to the whole system, not just the cache drive.

Any help would be appreciated.

Attachments: BttrFS.txt, nancenas-diagnostics-20230802-1322.zip
August 2, 2023, Community Expert

14 minutes ago, rnance said: "I see many of the errors shown below"

Those are checksum errors, not device errors, and they are the result of one of the devices (nvme0n1) dropping offline in the past:

Aug 2 10:50:24 NanceNAS kernel: BTRFS info (device nvme1n1p1): bdev /dev/nvme0n1p1 errs: wr 1470168, rd 4095, flush 123185, corrupt 206206, gen 0

Run a correcting scrub and check that all errors were corrected, then see here for better pool monitoring in case it happens again.

As for the crashing issues, enable the syslog server and post the resulting log after a crash.
August 2, 2023, Author

38 minutes ago, JorgeB said: "Those are checksum errors, not device errors, and they are the result of one of the devices (nvme0n1) dropping offline in the past:

Aug 2 10:50:24 NanceNAS kernel: BTRFS info (device nvme1n1p1): bdev /dev/nvme0n1p1 errs: wr 1470168, rd 4095, flush 123185, corrupt 206206, gen 0

Run a correcting scrub and check that all errors were corrected, then see here for better pool monitoring in case it happens again. As for the crashing issues, enable the syslog server and post the resulting log after a crash."

OK, I'll give that a shot. Would this also explain the inability to write to the cache?
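
For anyone reading the kernel line quoted in this thread: the fields after "errs:" are btrfs's per-device counters for write, read, flush, corruption, and generation errors, and they persist until they are reset, so nonzero values can reflect a past event rather than a current fault. As a rough sketch only, the Python below pulls those counters out of syslog text so they are easier to compare over time. The function names `parse_bdev_errs` and `devices_with_errors` are made up for this illustration, and the pattern is assumed from the single log line posted above, not from any documented log format.

```python
import re

# Pattern assumed from the one "bdev ... errs:" line quoted in this thread.
BDEV_ERRS = re.compile(
    r"bdev (?P<device>\S+) errs: "
    r"wr (?P<wr>\d+), rd (?P<rd>\d+), flush (?P<flush>\d+), "
    r"corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)


def parse_bdev_errs(line):
    """Return (device, counters) for a btrfs 'bdev ... errs:' line, else None."""
    match = BDEV_ERRS.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    device = fields.pop("device")
    # Remaining fields are the five counters, converted to integers.
    return device, {name: int(value) for name, value in fields.items()}


def devices_with_errors(lines):
    """Map device -> latest counters, keeping only devices with a nonzero counter."""
    latest = {}
    for line in lines:
        parsed = parse_bdev_errs(line)
        if parsed is not None:
            device, counters = parsed
            latest[device] = counters  # later lines overwrite earlier ones
    return {dev: c for dev, c in latest.items() if any(c.values())}


if __name__ == "__main__":
    sample = (
        "Aug 2 10:50:24 NanceNAS kernel: BTRFS info (device nvme1n1p1): "
        "bdev /dev/nvme0n1p1 errs: wr 1470168, rd 4095, flush 123185, "
        "corrupt 206206, gen 0"
    )
    print(devices_with_errors([sample]))
```

Running the same check on the syslog again after the scrub would show whether the counters are still climbing, which is the situation the pool-monitoring advice above is meant to catch.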