mlapaglia Posted February 24, 2020

After moving to my new server (Aorus X570 Pro with a 3900X), my cache is throwing lots of BTRFS errors. I reformatted the cache drive and restored my appdata but am still getting this issue. The cache drives are two NVMe drives attached to the motherboard. I've tried reseating them.

tower-diagnostics-20200224-0156.zip
JorgeB Posted February 24, 2020

There are problems writing to one of the cache devices (cache2); this is a hardware issue:

Feb 23 22:48:58 Tower kernel: BTRFS error (device nvme0n1p1): bdev /dev/nvme1n1p1 errs: wr 1, rd 0, flush 0, corrupt 0, gen 0
Feb 23 22:48:58 Tower kernel: BTRFS error (device nvme0n1p1): bdev /dev/nvme1n1p1 errs: wr 2, rd 0, flush 0, corrupt 0, gen 0
Feb 23 22:48:58 Tower kernel: BTRFS error (device nvme0n1p1): bdev /dev/nvme1n1p1 errs: wr 3, rd 0, flush 0, corrupt 0, gen 0
Feb 23 22:48:58 Tower kernel: BTRFS error (device nvme0n1p1): bdev /dev/nvme1n1p1 errs: wr 4, rd 0, flush 0, corrupt 0, gen 0
Feb 23 22:48:58 Tower kernel: BTRFS error (device nvme0n1p1): bdev /dev/nvme1n1p1 errs: wr 5, rd 0, flush 0, corrupt 0, gen 0
Feb 23 22:48:58 Tower kernel: BTRFS error (device nvme0n1p1): bdev /dev/nvme1n1p1 errs: wr 6, rd 0, flush 0, corrupt 0, gen 0
Feb 23 22:48:58 Tower kernel: BTRFS error (device nvme0n1p1): bdev /dev/nvme1n1p1 errs: wr 7, rd 0, flush 0, corrupt 0, gen 0
Feb 23 22:48:58 Tower kernel: BTRFS error (device nvme0n1p1): bdev /dev/nvme1n1p1 errs: wr 8, rd 0, flush 0, corrupt 0, gen 0
Feb 23 22:48:58 Tower kernel: BTRFS error (device nvme0n1p1): bdev /dev/nvme1n1p1 errs: wr 9, rd 0, flush 0, corrupt 0, gen 0
Feb 23 22:48:58 Tower kernel: BTRFS error (device nvme0n1p1): bdev /dev/nvme1n1p1 errs: wr 10, rd 0, flush 0, corrupt 0, gen 0
Feb 23 22:48:58 Tower kernel: BTRFS warning (device nvme0n1p1): lost page write due to IO error on /dev/nvme1n1p1
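For anyone skimming a long log for lines like these, here is a rough sketch that pulls out the most recent write-error count per device. The helper name `latest_wr_errs` is made up for illustration, and it assumes the kernel message format shown in this post; other kernel versions may word the message differently.

```shell
# Hypothetical helper: read kernel log text on stdin and print
# "<device> <latest wr count>" for each device named after "bdev".
latest_wr_errs() {
    awk '/BTRFS error/ && /errs: wr/ {
        for (i = 1; i <= NF; i++) {
            if ($i == "bdev") dev = $(i + 1)            # e.g. /dev/nvme1n1p1
            if ($i == "wr") { n = $(i + 1); sub(/,/, "", n) }  # strip trailing comma
        }
        last[dev] = n                                    # later lines overwrite earlier ones
    }
    END { for (d in last) print d, last[d] }'
}

# Usage (assumption: the messages are still in the kernel ring buffer):
#   dmesg | latest_wr_errs
```

A count that keeps climbing between runs would point to an ongoing problem rather than leftover history.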
mlapaglia Posted February 24, 2020

Thanks @johnnie.black. I swapped the drives on the motherboard. It looks like the problem is with the NVMe drive, since I am now seeing errors about nvme1n1p1?

tower-diagnostics-20200224-0816.zip
JorgeB Posted February 24, 2020

The errors you're seeing now are because both devices are online and one of them has old data that is being corrected as it's being read. Run a scrub and check that there are no uncorrectable errors; also see here for better pool monitoring.
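From the command line, a scrub could be run roughly as follows. This is only a sketch: it assumes the pool is mounted at /mnt/cache (the path used later in this thread), and `run_scrub` is a hypothetical wrapper name, not an Unraid or btrfs-progs command.

```shell
# Hypothetical wrapper around the standard btrfs scrub subcommands.
run_scrub() {
    pool="${1:-/mnt/cache}"      # assumed mount point; pass another path to override
    # -B keeps the scrub in the foreground so it returns only when finished.
    btrfs scrub start -B "$pool"
    # Print the summary, which includes corrected and uncorrectable error counts.
    btrfs scrub status "$pool"
}

# Usage (as root):
#   run_scrub /mnt/cache
```

The thing to look for in the summary is that any errors found were corrected and none are reported as uncorrectable.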
mlapaglia Posted February 24, 2020

OK, I booted up with the `624` drive inserted by itself and immediately got an IO error, and the drive was put into read-only mode. I put `552` in by itself and it passes the scrub check with no errors. The btrfs command gives this, though:

root@Tower:~# btrfs dev stats /mnt/cache
[/dev/nvme0n1p1].write_io_errs 6455
[/dev/nvme0n1p1].read_io_errs 5533
[/dev/nvme0n1p1].flush_io_errs 22
[/dev/nvme0n1p1].corruption_errs 0
[/dev/nvme0n1p1].generation_errs 0

It was like this right after boot up, and the values haven't changed in 15 minutes. VMs and Docker are running off of this drive, and they all appear to be functioning. It looks like it is re-balancing now since the other drive was removed.

Should I try to format the `624` drive and put it back into the array? Is anything like an "IO Error" indicative of a hardware error?

Edited February 24, 2020 by mlapaglia
JorgeB Posted February 24, 2020

The error counters are for the life of the filesystem; see the link above for how to reset them.
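As a rough illustration of working with these counters: `btrfs dev stats` accepts `-z` to print the counters and then reset them to zero. The small filter below, `nonzero_stats`, is a hypothetical helper (not part of btrfs-progs) that shows only counters that are not zero, which makes it easier to tell whether new errors appear after a reset. The /mnt/cache path is the one used earlier in this thread.

```shell
# Hypothetical helper: read `btrfs dev stats` output on stdin and
# print only the lines whose counter value is non-zero.
nonzero_stats() {
    awk '$2 + 0 != 0 { print }'
}

# Usage (as root):
#   btrfs dev stats /mnt/cache | nonzero_stats   # show only counters with errors
#   btrfs dev stats -z /mnt/cache                # print the counters, then reset them to zero
```

If the filter prints nothing some time after a reset, no new errors have been recorded since then.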
mlapaglia Posted February 25, 2020

So this might have something to do with the board and RAID1. I formatted the suspect drive as btrfs separately from the cache and used it as an unassigned device, then copied the entire appdata folder over to it for testing. It copied without any IO errors. Since there were no issues, I put the drive back into the cache array. It formatted and set up the RAID1 without any errors. After a restart, though, it started throwing IO errors again until I removed the second drive.
JorgeB Posted February 25, 2020

Look for a BIOS update; this looks like a hardware problem or compatibility issue.