eribob Posted January 30 (edited)

Hi!

One of my NVMe drives has suddenly started giving me a lot of BTRFS errors. See the attached syslog.

Jan 30 07:35:27 MONSTERSERVERN kernel: I/O error, dev loop2, sector 37325840 op 0x1:(WRITE) flags 0x100000 phys_seg 4 prio class 2
Jan 30 07:35:27 MONSTERSERVERN kernel: I/O error, dev loop2, sector 37300584 op 0x1:(WRITE) flags 0x100000 phys_seg 1 prio class 2
Jan 30 07:35:27 MONSTERSERVERN kernel: loop: Write error at byte offset 16593756160, length 4096.
Jan 30 07:35:27 MONSTERSERVERN kernel: I/O error, dev loop2, sector 32409680 op 0x1:(WRITE) flags 0x100000 phys_seg 1 prio class 2
Jan 30 07:35:27 MONSTERSERVERN kernel: loop: Write error at byte offset 19110830080, length 4096.
Jan 30 07:35:27 MONSTERSERVERN kernel: I/O error, dev loop2, sector 37325840 op 0x1:(WRITE) flags 0x100000 phys_seg 4 prio class 2
Jan 30 07:35:30 MONSTERSERVERN kernel: btrfs_dev_stat_inc_and_print: 330006 callbacks suppressed
[...]
Jan 30 07:35:30 MONSTERSERVERN kernel: BTRFS error (device nvme0n1p1: state EA): bdev /dev/nvme0n1p1 errs: wr 172, rd 2403805, flush 0, corrupt 0, gen 0
Jan 30 07:35:30 MONSTERSERVERN kernel: BTRFS error (device nvme0n1p1: state EA): bdev /dev/nvme0n1p1 errs: wr 172, rd 2403807, flush 0, corrupt 0, gen 0
Jan 30 07:35:30 MONSTERSERVERN kernel: BTRFS error (device nvme0n1p1: state EA): bdev /dev/nvme0n1p1 errs: wr 172, rd 2403809, flush 0, corrupt 0, gen 0
[...]
Jan 30 07:35:37 MONSTERSERVERN kernel: I/O error, dev loop2, sector 37430928 op 0x0:(READ) flags 0x80700 phys_seg 4 prio class 2
[...]
Jan 30 09:28:14 MONSTERSERVERN kernel: nvme0n1: I/O Cmd(0x2) @ LBA 1066408400, 8 blocks, I/O Error (sct 0x3 / sc 0x71)
Jan 30 09:28:14 MONSTERSERVERN kernel: I/O error, dev nvme0n1, sector 1178887112 op 0x0:(READ) flags 0x80700 phys_seg 2 prio class 2
[...]
Jan 30 09:28:14 MONSTERSERVERN kernel: BTRFS error (device nvme0n1p1): bdev /dev/nvme0n1p1 errs: wr 0, rd 1, flush 0, corrupt 0, gen 0
Jan 30 09:28:14 MONSTERSERVERN kernel: BTRFS error (device nvme0n1p1): bdev /dev/nvme0n1p1 errs: wr 0, rd 3, flush 0, corrupt 0, gen 0
Jan 30 09:28:14 MONSTERSERVERN kernel: BTRFS error (device nvme0n1p1): bdev /dev/nvme0n1p1 errs: wr 0, rd 4, flush 0, corrupt 0, gen 0
Jan 30 09:28:14 MONSTERSERVERN kernel: BTRFS error (device nvme0n1p1): bdev /dev/nvme0n1p1 errs: wr 1, rd 4, flush 0, corrupt 0, gen 0
Jan 30 09:28:14 MONSTERSERVERN kernel: BTRFS error (device nvme0n1p1): bdev /dev/nvme0n1p1 errs: wr 2, rd 4, flush 0, corrupt 0, gen 0
Jan 30 09:28:14 MONSTERSERVERN kernel: BTRFS error (device nvme0n1p1): bdev /dev/nvme0n1p1 errs: wr 3, rd 4, flush 0, corrupt 0, gen 0
Jan 30 09:28:14 MONSTERSERVERN kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 0, rd 1, flush 0, corrupt 0, gen 0
Jan 30 09:28:14 MONSTERSERVERN kernel: BTRFS error (device nvme0n1p1): bdev /dev/nvme0n1p1 errs: wr 3, rd 6, flush 0, corrupt 0, gen 0

Do you think this means that the drive has gone bad, or are the errors caused by a problem with my RAM? I have non-ECC DDR4 and recently applied an XMP profile to run it at its rated speed (3200 MHz). Maybe that was too stressful for the RAM? Previously I ran it at 2133 MHz for stability.

My cache pool and another NVMe drive also use BTRFS, so I want to know whether there is a risk that they might fail as well. Is there a way to recover the data on the drive, or should I just format it?

Thank you in advance!

monsterservern-syslog-20240130-0635.zip
monsterservern-diagnostics-20240130-0938.zip

Edited January 30 by eribob
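For reading excerpts like the one above, a small script can pull out the last BTRFS error counters logged for each device. This is a minimal sketch, assuming the kernel lines keep the "bdev <device> errs: wr N, rd N, flush N, corrupt N, gen N" format shown in the excerpt; the function name latest_btrfs_counters is hypothetical.

```python
import re

# Matches kernel lines such as:
#   BTRFS error (device nvme0n1p1: state EA): bdev /dev/nvme0n1p1 errs: wr 172, rd 2403805, flush 0, corrupt 0, gen 0
ERRS_RE = re.compile(
    r"bdev (?P<dev>\S+) errs: wr (?P<wr>\d+), rd (?P<rd>\d+), "
    r"flush (?P<flush>\d+), corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)


def latest_btrfs_counters(lines):
    """Return {device: {counter: value}} using the last matching line per device."""
    latest = {}
    for line in lines:
        m = ERRS_RE.search(line)
        if m:
            # Later lines overwrite earlier ones, so the last logged value wins.
            latest[m.group("dev")] = {
                k: int(v) for k, v in m.groupdict().items() if k != "dev"
            }
    return latest


if __name__ == "__main__":
    sample = [
        "Jan 30 07:35:30 MONSTERSERVERN kernel: BTRFS error (device nvme0n1p1: state EA): bdev /dev/nvme0n1p1 errs: wr 172, rd 2403805, flush 0, corrupt 0, gen 0",
        "Jan 30 07:35:30 MONSTERSERVERN kernel: BTRFS error (device nvme0n1p1: state EA): bdev /dev/nvme0n1p1 errs: wr 172, rd 2403809, flush 0, corrupt 0, gen 0",
        "Jan 30 09:28:14 MONSTERSERVERN kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 0, rd 1, flush 0, corrupt 0, gen 0",
    ]
    for dev, counts in latest_btrfs_counters(sample).items():
        print(dev, counts)
```

In the excerpt, the nvme0n1p1 counters drop from wr 172 / rd 2403809 at 07:35 to single digits at 09:28, so this only reports the last value logged per device, not a running total.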
JorgeB Posted January 30

Jan 30 09:28:14 MONSTERSERVERN kernel: nvme0n1: I/O Cmd(0x2) @ LBA 1066408400, 8 blocks, I/O Error (sct 0x3 / sc 0x71)
Jan 30 09:28:14 MONSTERSERVERN kernel: I/O error, dev nvme0n1, sector 1178887112 op 0x0:(READ) flags 0x80700 phys_seg 2 prio class 2
Jan 30 09:28:14 MONSTERSERVERN kernel: nvme0n1: detected capacity change from 3907029168 to 0

The NVMe device dropped offline. Try a different M.2 slot if available; if the problem stays the same, it may be a device issue.
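The diagnosis above rests on the last log line: a capacity change to 0 means the kernel no longer sees the disk. The sketch below scans log lines for that message and converts the previous size to bytes. It assumes the figure is in 512-byte sectors, in which case 3907029168 sectors works out to about 2.0 TB; dropped_devices is a hypothetical helper name.

```python
import re

# Matches lines such as:
#   nvme0n1: detected capacity change from 3907029168 to 0
CAPACITY_RE = re.compile(
    r"(?P<disk>\S+): detected capacity change from (?P<old>\d+) to (?P<new>\d+)"
)

SECTOR_BYTES = 512  # assumption: capacity is reported in 512-byte sectors


def dropped_devices(lines):
    """Yield (disk, former_size_in_bytes) for every capacity change to 0."""
    for line in lines:
        m = CAPACITY_RE.search(line)
        if m and int(m.group("new")) == 0:
            yield m.group("disk"), int(m.group("old")) * SECTOR_BYTES


if __name__ == "__main__":
    log = [
        "Jan 30 09:28:14 MONSTERSERVERN kernel: nvme0n1: detected capacity change from 3907029168 to 0",
    ]
    for disk, size in dropped_devices(log):
        print(f"{disk} dropped offline (was {size / 1e12:.2f} TB)")
```

Run against the line quoted above, this prints "nvme0n1 dropped offline (was 2.00 TB)".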
eribob Posted January 30 (edited)

Thank you for the reply!

24 minutes ago, JorgeB said:
NVMe device dropped offline, try a different m.2 slot if available, if the same it may be a device issue.

So that would mean that the NVMe slot on my motherboard suddenly stopped working? The drive has been in it for 1-2 years without issues, so doesn't it sound more likely that the drive is the problem? Moving the NVMe drive is not trivial, hehe; I have to disassemble the server...

I also tried a suggestion from another thread:

append initrd=/bzroot nvme_core.default_ps_max_latency_us=0 pcie_aspm=off

But this did not help; if anything, I got even more problems afterwards. Maybe that was just a coincidence, though. Can these boot parameters make things worse in some circumstances?

Edited January 30 by eribob
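An append line only takes effect after a reboot, so one quick check is whether the parameters actually appear on the running kernel's command line in /proc/cmdline. For context, pcie_aspm=off turns off PCIe Active State Power Management and nvme_core.default_ps_max_latency_us=0 disables NVMe autonomous power state transitions. This is a minimal sketch: parse_cmdline and missing_params are hypothetical names, the parsing is a plain whitespace split that ignores quoted values, and it only confirms the parameters were applied, not whether they help.

```python
from pathlib import Path

# Parameters taken from the append line quoted above.
WANTED = {
    "nvme_core.default_ps_max_latency_us": "0",
    "pcie_aspm": "off",
}


def parse_cmdline(cmdline):
    """Split a kernel command line into a dict; bare flags map to None."""
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")  # split on the first '=' only
        params[key] = value if sep else None
    return params


def missing_params(cmdline, wanted=WANTED):
    """Return the wanted parameters that are absent or set to another value."""
    params = parse_cmdline(cmdline)
    return {k: v for k, v in wanted.items() if params.get(k) != v}


if __name__ == "__main__":
    cmdline = Path("/proc/cmdline")
    if cmdline.exists():  # only present on Linux
        print(missing_params(cmdline.read_text()) or "all parameters active")
```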
Michael_P Posted January 30

23 minutes ago, eribob said:
So that would mean that my NVMe slot on the motherboard suddenly stopped working

No, it means it's either a bad drive, a bad connection, or a bad slot; moving it to another slot is a diagnostic step. My money is on a bad drive.
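Whichever way the slot test goes, the persistent BTRFS error counters show whether errors keep accumulating on this drive and whether the cache pool or the other NVMe pool have recorded any. "btrfs device stats <mountpoint>" prints one counter per line in the form "[/dev/xxx].write_io_errs 0"; the sketch below runs it and keeps only the non-zero counters. It needs root and btrfs-progs, the helper names are hypothetical, and the /mnt/cache mount point is a placeholder assumption.

```python
import os
import re
import shutil
import subprocess

# Matches lines printed by `btrfs device stats <mountpoint>`, e.g.
#   [/dev/nvme0n1p1].read_io_errs    0
STAT_RE = re.compile(r"^\[(?P<dev>[^\]]+)\]\.(?P<name>\w+)\s+(?P<value>\d+)\s*$")


def nonzero_stats(output):
    """Return {device: {counter: value}} for counters that are not zero."""
    result = {}
    for line in output.splitlines():
        m = STAT_RE.match(line.strip())
        if m and int(m.group("value")) != 0:
            result.setdefault(m.group("dev"), {})[m.group("name")] = int(m.group("value"))
    return result


def check_pool(mountpoint):
    """Run `btrfs device stats` on a mounted pool and return non-zero counters."""
    out = subprocess.run(
        ["btrfs", "device", "stats", mountpoint],
        capture_output=True, text=True, check=True,
    ).stdout
    return nonzero_stats(out)


if __name__ == "__main__":
    # Hypothetical mount points; adjust to the pools on your server.
    for mp in ["/mnt/cache"]:
        if shutil.which("btrfs") and os.path.ismount(mp):
            try:
                print(mp, check_pool(mp) or "no errors recorded")
            except subprocess.CalledProcessError as exc:
                print(mp, "btrfs device stats failed:", exc.stderr.strip())
```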