May 10, 2024

Hello, I am getting lots of btrfs errors from one of my two SSD cache disks. Example messages are below, and I have attached diagnostics. I ran memtest for about an hour without any errors. I would appreciate any help confirming whether the disk itself has likely gone bad, or whether it is something else.

May 10 09:57:53 lily kernel: BTRFS info (device sde1): read error corrected: ino 0 off 582228443136 (dev /dev/sdd1 sector 632576)
May 10 09:57:53 lily kernel: BTRFS info (device sde1): read error corrected: ino 0 off 582228447232 (dev /dev/sdd1 sector 632584)
May 10 09:57:53 lily kernel: BTRFS info (device sde1): read error corrected: ino 0 off 582228451328 (dev /dev/sdd1 sector 632592)
May 10 09:57:53 lily kernel: BTRFS info (device sde1): read error corrected: ino 0 off 582228455424 (dev /dev/sdd1 sector 632600)
May 10 10:03:17 lily kernel: BTRFS error (device sde1): bdev /dev/sdd1 errs: wr 176644, rd 111, flush 20656, corrupt 272231, gen 0
May 10 10:03:17 lily kernel: BTRFS warning (device sde1): csum failed root 5 ino 263 off 17238224896 csum 0x8941f998 expected csum 0xc5cc3b53 mirror 1
May 10 10:03:17 lily kernel: BTRFS error (device sde1): bdev /dev/sdd1 errs: wr 176644, rd 111, flush 20656, corrupt 272232, gen 0
May 10 10:03:17 lily kernel: BTRFS warning (device sde1): csum failed root 5 ino 263 off 17238228992 csum 0x8941f998 expected csum 0x269d1fed mirror 1
May 10 10:03:17 lily kernel: BTRFS error (device sde1): bdev /dev/sdd1 errs: wr 176644, rd 111, flush 20656, corrupt 272233, gen 0
May 10 10:03:17 lily kernel: BTRFS warning (device sde1): csum failed root 5 ino 263 off 17238233088 csum 0x8941f998 expected csum 0x4a44f6b8 mirror 1
May 10 10:03:17 lily kernel: BTRFS error (device sde1): bdev /dev/sdd1 errs: wr 176644, rd 111, flush 20656, corrupt 272234, gen 0

Attachment: lily-diagnostics-20240510-0959.zip
May 10, 2024, Community Expert

May 10 09:57:11 lily kernel: BTRFS info (device sde1): bdev /dev/sdd1 errs: wr 176644, rd 111, flush 20656, corrupt 36360, gen 0

One of the devices dropped offline in the past. Run a correcting scrub and make sure all errors are corrected. I also recommend seeing here for better pool monitoring.
May 10, 2024, Author, Solution

48 minutes ago, JorgeB said:
May 10 09:57:11 lily kernel: BTRFS info (device sde1): bdev /dev/sdd1 errs: wr 176644, rd 111, flush 20656, corrupt 36360, gen 0
One of the devices dropped offline in the past, run a correcting scrub and make sure all errors are corrected. Also recommend seeing here for better pool monitoring.

@JorgeB, thanks so much for the feedback and for the pointer to the pool monitoring FAQ. I will implement the monitoring suggestion there.

I ran "btrfs dev stats /mnt/cache", and one cache disk is showing errors:

[/dev/sdd1].write_io_errs 176644
[/dev/sdd1].read_io_errs 111
[/dev/sdd1].flush_io_errs 20656
[/dev/sdd1].corruption_errs 277249
[/dev/sdd1].generation_errs 0

I tried a different cable and zeroed the errors using "btrfs dev stats -z /mnt/cache", but new errors were still being generated. I then moved the disk to a different SATA port (I have an open one) and was still getting errors from that same disk. So I installed a new cache disk, and btrfs is running now. The old cache disk is still in the machine but is not assigned to the pool, and I'm not getting any new errors.
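For anyone reading along, here is a minimal sketch of how the "btrfs dev stats" output above could be filtered so that only non-zero counters are shown. The helper name nonzero_stats is hypothetical (it is not part of btrfs-progs), and the sketch assumes each output line ends with the counter value, as in the listing above.

```shell
#!/bin/sh
# Read `btrfs dev stats` output on stdin and print only the counters
# whose value is greater than zero.
# nonzero_stats is a hypothetical helper name, not part of btrfs-progs.
nonzero_stats() {
    # Each line looks like: [/dev/sdd1].write_io_errs 176644
    # so the counter value is the last whitespace-separated field.
    awk '$NF + 0 > 0 { print }'
}

# Example (assumes the pool is mounted at /mnt/cache, as in this thread):
#   btrfs dev stats /mnt/cache | nonzero_stats
```

With the counters posted above, this would list every line except generation_errs. An empty result means all counters are zero.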
May 11, 2024, Community Expert

15 hours ago, wmcneil said:
and new errors were still being generated

Read and write errors or only corruption errors? The latter would be normal until the mirror is synced.
May 11, 2024, Author

40 minutes ago, JorgeB said:
Read and write errors or only corruption errors? The latter would be normal until the mirror is synced.

It was only corruption errors, so it's possible it was a bad cable. In the scenario where only a cable is replaced, what command or GUI operation forces the mirror to sync? (Do you just start a scrub?)
May 11, 2024, Community Expert

9 minutes ago, wmcneil said:
It was only corruption errors

Those are normal until a scrub is done to bring the device back in sync. The device should be OK.
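As a rough command-line sketch of the scrub step discussed above: the helper name resync_mirror is hypothetical, and /mnt/cache is the mount point used earlier in this thread. It assumes the pool is mounted read-write, so the scrub can repair bad copies from the good mirror, and it must be run as root.

```shell
#!/bin/sh
# resync_mirror is a hypothetical helper name; /mnt/cache in the usage line
# is the mount point used in this thread.
resync_mirror() {
    mnt="$1"
    # Scrub in the foreground (-B) so the next step runs only after it finishes.
    # A scrub on a read-write mount repairs bad copies from the good mirror.
    btrfs scrub start -B "$mnt" || return 1
    # Show the per-device error counters after the scrub.
    btrfs device stats "$mnt"
}

# Usage: resync_mirror /mnt/cache
```

Once the scrub reports that all errors were corrected, the counters can be zeroed with "btrfs dev stats -z /mnt/cache" (the command used earlier in the thread), so that any new errors stand out.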