August 11, 2023
Hi, I'm getting a lot of errors on my NVMe cache pool. Is one of my drives failing? The errors seem isolated to just one drive. I saw elsewhere on the forums that this could be a RAM issue, but I would have thought that if it were RAM I would be seeing errors from both drives. The offending drive is nvme1, the Seagate FireCuda.
tower-diagnostics-20230811-0105.zip
August 11, 2023 · Community Expert
Diags show that one of the NVMe devices dropped offline in the past:
    Aug 10 19:43:05 Tower kernel: BTRFS info (device nvme1n1p1): bdev /dev/nvme1n1p1 errs: wr 69135141, rd 10897984, flush 41279, corrupt 67436803, gen 96154
Run a correcting scrub and post the results; also see here for better pool monitoring.
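As a rough illustration of what that kernel line contains: btrfs keeps five per-device error counters (write, read, flush, corruption, generation) and prints them to the kernel log, for example when the filesystem is mounted. The minimal Python sketch below pulls them out of a log line like the one quoted above. The regular expression assumes exactly this `bdev ... errs:` format, and `parse_btrfs_errs` is a hypothetical helper, not part of any Unraid or btrfs tool.

```python
import re
from typing import Optional

# Matches the per-device counters in a line such as
# "bdev /dev/nvme1n1p1 errs: wr 69135141, rd 10897984, flush 41279, ..."
_ERRS_RE = re.compile(
    r"bdev (?P<dev>\S+) errs: "
    r"wr (?P<wr>\d+), rd (?P<rd>\d+), flush (?P<flush>\d+), "
    r"corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)


def parse_btrfs_errs(line: str) -> Optional[dict]:
    """Return the device and its five error counters, or None if no match."""
    m = _ERRS_RE.search(line)
    if m is None:
        return None
    counters = {k: int(m.group(k)) for k in ("wr", "rd", "flush", "corrupt", "gen")}
    return {"device": m.group("dev"), "counters": counters}


if __name__ == "__main__":
    log_line = (
        "Aug 10 19:43:05 Tower kernel: BTRFS info (device nvme1n1p1): "
        "bdev /dev/nvme1n1p1 errs: wr 69135141, rd 10897984, flush 41279, "
        "corrupt 67436803, gen 96154"
    )
    result = parse_btrfs_errs(log_line)
    print(result)
    # True if any counter is non-zero, i.e. this device has recorded errors.
    print(any(v > 0 for v in result["counters"].values()))
```

All five counters being large and non-zero fits the explanation above that the device dropped offline at some point, rather than a single isolated read error.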
August 11, 2023 · Author
(Replying to JorgeB's post above.) I've run a scrub several times now and the issue keeps coming back. I'm currently running memtest on the server, so I can't run a new scrub at the moment.
August 11, 2023 · Author
I've noticed this in the drive's SMART report. Is this not an issue?
    Error Information (NVMe Log 0x01, 16 of 63 entries)
    Num   ErrCount  SQId   CmdId  Status  PELoc  LBA  NSID  VS
      0       2279     0  0x500a  0x4004  0x028    0     0   -
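To make that entry easier to read, here is a minimal Python sketch that splits the Status and PELoc values into their NVMe fields. It assumes the Status column is the raw 16-bit value from the Error Information log page (bit 0 is the phase tag, bits 15:1 are the status field), and it names only a few generic status codes; `decode_status` and `decode_pe_location` are hypothetical helpers. Decoding the entry shows what the controller rejected, not whether the drive is healthy.

```python
# A few Generic Command Status codes (Status Code Type 0) from the NVMe
# base specification; this is not a complete list.
GENERIC_STATUS = {
    0x00: "Successful Completion",
    0x01: "Invalid Command Opcode",
    0x02: "Invalid Field in Command",
    0x03: "Command ID Conflict",
    0x04: "Data Transfer Error",
}


def decode_status(raw: int) -> dict:
    """Split a 16-bit error-log Status value into its NVMe sub-fields.

    Assumes bit 0 is the phase tag and bits 15:1 are the status field.
    """
    status = raw >> 1             # drop the phase tag
    sc = status & 0xFF            # Status Code
    sct = (status >> 8) & 0x7     # Status Code Type (0 = generic)
    crd = (status >> 11) & 0x3    # Command Retry Delay
    more = (status >> 13) & 0x1   # More: extra info available in the error log
    dnr = (status >> 14) & 0x1    # Do Not Retry
    if sct == 0:
        name = GENERIC_STATUS.get(sc, "unlisted generic status")
    else:
        name = "non-generic status code type"
    return {"sct": sct, "sc": sc, "crd": crd, "more": more, "dnr": dnr, "name": name}


def decode_pe_location(raw: int) -> tuple:
    """Parameter Error Location: bits 7:0 = byte in the command, bits 10:8 = bit."""
    return raw & 0xFF, (raw >> 8) & 0x7


print(decode_status(0x4004))
print(decode_pe_location(0x028))
```

Under those assumptions, Status 0x4004 decodes to status code type 0 (generic), status code 0x02 "Invalid Field in Command", with the More bit set, and PELoc 0x028 points at byte 40 of the command, which is where CDW10 starts in a 64-byte submission queue entry. SQId 0 is the admin submission queue, so this entry refers to an admin command rather than a read or write on an I/O queue.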
August 11, 2023 · Author
Here's a recent scrub:
    UUID:             0cdbaac7-0611-4b09-844d-2f02f46768a0
    Scrub started:    Fri Aug 11 10:40:31 2023
    Status:           finished
    Duration:         0:08:08
    Total to scrub:   1.51TiB
    Rate:             3.17GiB/s
    Error summary:    no errors found
I'm not surprised it hasn't shown any errors: I ran a scrub with repair before I shut down for the memtest (which passed, at least after one run). I'll keep an eye on the server today. I've installed the script you recommended, and I'll update the post if anything happens.
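The scrub summary can also be checked mechanically. This Python sketch parses the fields from the output quoted above (the column spacing is reconstructed, and the parser ignores it) and confirms that the reported rate is consistent with the size and duration: 1.51 TiB is about 1546 GiB, and 0:08:08 is 488 seconds, which gives roughly 3.17 GiB/s. The helper names are hypothetical.

```python
import re

# The scrub status from the post, laid out one field per line.
SCRUB_OUTPUT = """\
UUID:             0cdbaac7-0611-4b09-844d-2f02f46768a0
Scrub started:    Fri Aug 11 10:40:31 2023
Status:           finished
Duration:         0:08:08
Total to scrub:   1.51TiB
Rate:             3.17GiB/s
Error summary:    no errors found
"""

# Binary size units expressed in GiB.
_UNITS_IN_GIB = {"KiB": 1 / 1024**2, "MiB": 1 / 1024, "GiB": 1.0, "TiB": 1024.0}


def parse_scrub_status(text: str) -> dict:
    """Map each 'Key: value' line to a dict entry, splitting at the first colon."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def duration_seconds(hms: str) -> int:
    """Convert an H:MM:SS duration string to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def to_gib(size: str) -> float:
    """Convert a size such as '1.51TiB' to GiB."""
    m = re.fullmatch(r"([\d.]+)(KiB|MiB|GiB|TiB)", size)
    if m is None:
        raise ValueError(f"unrecognised size: {size!r}")
    return float(m.group(1)) * _UNITS_IN_GIB[m.group(2)]


fields = parse_scrub_status(SCRUB_OUTPUT)
clean = fields["Error summary"] == "no errors found"
rate = to_gib(fields["Total to scrub"]) / duration_seconds(fields["Duration"])
print(clean, round(rate, 2))  # prints: True 3.17
```

A clean scrub only says the copies agreed at that moment; as noted in the thread, it does not rule out the device dropping again later.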
August 11, 2023 · Community Expert
Those errors in the log are not RAM-related; the device is just out of sync. If it drops offline again, it will become out of sync again. See the link I posted above: with that monitoring in place, you will be notified if it drops.
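For illustration only, and not the script linked in the thread, the following Python sketch shows the general idea behind that kind of monitoring: run `btrfs device stats` against the pool and flag any counter that is not zero. It assumes btrfs-progs is installed, the check runs as root, and the pool is mounted at `/mnt/cache` (an assumed path); `nonzero_counters` and `check_pool` are hypothetical names.

```python
import re
import subprocess
import sys

# One line per counter, e.g. "[/dev/nvme1n1p1].write_io_errs    69135141"
_STAT_RE = re.compile(r"^\[(?P<dev>[^\]]+)\]\.(?P<name>\w+)\s+(?P<value>\d+)$")


def nonzero_counters(stats_output: str) -> list:
    """Return (device, counter, value) for every counter that is not zero."""
    hits = []
    for line in stats_output.splitlines():
        m = _STAT_RE.match(line.strip())
        if m and int(m.group("value")) != 0:
            hits.append((m.group("dev"), m.group("name"), int(m.group("value"))))
    return hits


def check_pool(mountpoint: str = "/mnt/cache") -> int:
    """Run `btrfs device stats` on the pool; return 1 if any counter is non-zero."""
    # Requires btrfs-progs and root; /mnt/cache is an assumed mount point.
    out = subprocess.run(
        ["btrfs", "device", "stats", mountpoint],
        capture_output=True, text=True, check=True,
    ).stdout
    hits = nonzero_counters(out)
    for dev, name, value in hits:
        print(f"WARNING: {dev} {name} = {value}", file=sys.stderr)
    return 1 if hits else 0
```

Calling `check_pool()` from a scheduled job and alerting on a return value of 1 would give a basic notification. The counters are cumulative, so after the problem has been dealt with they need resetting (for example with `btrfs device stats -z`), or the check will keep firing.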