February 3, 2023

Need some help verifying the cause of the errors and the high read/write activity on one drive in the cache pool. I just installed two new 2TB WD Blue drives and the syslog is filling up with these messages:

Feb 3 07:27:05 nas2-jag kernel: BTRFS warning (device nvme1n1p1): direct IO failed ino 325 rw 0,0 sector 0x29b6fbf8 len 4096 err no 10
Feb 3 07:27:05 nas2-jag kernel: BTRFS warning (device nvme1n1p1): direct IO failed ino 325 rw 0,0 sector 0x29b6fc00 len 4096 err no 10
Feb 3 07:27:05 nas2-jag kernel: BTRFS warning (device nvme1n1p1): direct IO failed ino 325 rw 0,0 sector 0x29b6fc08 len 4096 err no 10
Feb 3 07:27:05 nas2-jag kernel: BTRFS warning (device nvme1n1p1): direct IO failed ino 325 rw 0,0 sector 0x29b6fc10 len 4096 err no 10
Feb 3 07:27:05 nas2-jag kernel: BTRFS warning (device nvme1n1p1): direct IO failed ino 325 rw 0,0 sector 0x29d7d390 len 4096 err no 10
Feb 3 07:27:05 nas2-jag kernel: BTRFS warning (device nvme1n1p1): direct IO failed ino 325 rw 0,0 sector 0x29d7d398 len 4096 err no 10
Feb 3 07:27:05 nas2-jag kernel: BTRFS warning (device nvme1n1p1): direct IO failed ino 325 rw 0,0 sector 0x29d7d3a0 len 4096 err no 10
Feb 3 07:27:05 nas2-jag kernel: BTRFS warning (device nvme1n1p1): direct IO failed ino 325 rw 0,0 sector 0x29d7d3a8 len 4096 err no 10
Feb 3 07:27:05 nas2-jag kernel: BTRFS warning (device nvme1n1p1): direct IO failed ino 325 rw 0,0 sector 0x29d7d3b0 len 4096 err no 10
Feb 3 07:27:05 nas2-jag kernel: BTRFS warning (device nvme1n1p1): direct IO failed ino 325 rw 0,0 sector 0x29d7d3b8 len 4096 err no 10
Feb 3 07:27:09 nas2-jag kernel: BTRFS error (device nvme1n1p1): error writing primary super block to device 1
Feb 3 07:27:10 nas2-jag kernel: btrfs_dev_stat_print_on_error: 953 callbacks suppressed
Feb 3 07:27:10 nas2-jag kernel: BTRFS error (device nvme1n1p1): bdev /dev/nvme0n1p1 errs: wr 1272078845, rd 36520810, flush 2929354, corrupt 8123, gen 0
Feb 3 07:27:10 nas2-jag kernel: BTRFS error (device nvme1n1p1): bdev /dev/nvme0n1p1 errs: wr 1272078846, rd 36520810, flush 2929354, corrupt 8123, gen 0
Feb 3 07:27:10 nas2-jag kernel: BTRFS error (device nvme1n1p1): bdev /dev/nvme0n1p1 errs: wr 1272078847, rd 36520810, flush 2929354, corrupt 8123, gen 0
Feb 3 07:27:10 nas2-jag kernel: BTRFS error (device nvme1n1p1): bdev /dev/nvme0n1p1 errs: wr 1272078848, rd 36520810, flush 2929354, corrupt 8123, gen 0

nas2-jag-diagnostics-20230203-0720.zip
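When reading log spam like this, the "bdev ... errs:" lines are the useful ones: the "(device nvme1n1p1)" prefix identifies the filesystem, while the bdev field names the pool member the counters belong to (here /dev/nvme0n1p1). The counters are cumulative, so the last line per device is its current total. A minimal Python sketch that pulls those totals out, assuming the line format shown above (other kernel versions may word it differently); latest_dev_stats and SAMPLE_LINES are hypothetical names, and on a real server you would feed it the syslog lines instead of the sample.

```python
import re

# Matches the per-device counter part of the kernel lines above, e.g.
# "bdev /dev/nvme0n1p1 errs: wr 1272078845, rd 36520810, flush 2929354, corrupt 8123, gen 0"
DEV_STAT_RE = re.compile(
    r"bdev (?P<dev>\S+) errs: wr (?P<wr>\d+), rd (?P<rd>\d+), "
    r"flush (?P<flush>\d+), corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)


def latest_dev_stats(lines):
    """Return {device: {counter: value}}, keeping the last matching line per device."""
    stats = {}
    for line in lines:
        m = DEV_STAT_RE.search(line)
        if m:
            counters = {k: int(v) for k, v in m.groupdict().items() if k != "dev"}
            stats[m.group("dev")] = counters  # later lines overwrite earlier ones
    return stats


# Two of the lines from the post, used as sample input.
SAMPLE_LINES = [
    "Feb 3 07:27:10 nas2-jag kernel: BTRFS error (device nvme1n1p1): bdev /dev/nvme0n1p1 "
    "errs: wr 1272078845, rd 36520810, flush 2929354, corrupt 8123, gen 0",
    "Feb 3 07:27:10 nas2-jag kernel: BTRFS error (device nvme1n1p1): bdev /dev/nvme0n1p1 "
    "errs: wr 1272078848, rd 36520810, flush 2929354, corrupt 8123, gen 0",
]

if __name__ == "__main__":
    for dev, counters in latest_dev_stats(SAMPLE_LINES).items():
        # Only show counters that are not zero.
        nonzero = {k: v for k, v in counters.items() if v}
        print(dev, nonzero)
```

Lines that carry no counters, such as the super block error, are simply skipped.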
February 4, 2023 (Community Expert)

One of the NVMe devices dropped offline. This can sometimes help with that: on the main GUI page click on the flash drive, scroll down to "Syslinux Configuration", make sure it's set to "menu view" (top right) and add this to your default boot option, after "append initrd=/bzroot":

nvme_core.default_ps_max_latency_us=0 pcie_aspm=off

e.g.:

append initrd=/bzroot nvme_core.default_ps_max_latency_us=0 pcie_aspm=off

Then power cycle the server (a reboot might not be enough) to see if the device comes back online. If it does, run a scrub, and see here for better pool monitoring.
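The GUI edit is the normal way to do this. Purely to illustrate what the change amounts to, here is a minimal Python sketch of the same string edit plus a check that the parameters actually reached the kernel after the power cycle. add_boot_params and params_active are hypothetical helpers: they only handle strings and do not write to the flash drive. On the running server the command line to check would be the contents of /proc/cmdline.

```python
# Kernel parameters suggested in the post above.
NVME_PARAMS = ("nvme_core.default_ps_max_latency_us=0", "pcie_aspm=off")


def add_boot_params(append_line, params=NVME_PARAMS):
    """Hypothetical helper: add any missing params to a syslinux 'append' line.

    Existing tokens keep their order and running it twice changes nothing.
    Whitespace between tokens is normalised to single spaces.
    """
    tokens = append_line.split()
    if not tokens or tokens[0] != "append":
        raise ValueError("expected a line starting with 'append'")
    for param in params:
        if param not in tokens:
            tokens.append(param)
    return " ".join(tokens)


def params_active(cmdline, params=NVME_PARAMS):
    """Hypothetical helper: True if every param appears in a kernel command line."""
    present = set(cmdline.split())
    return all(param in present for param in params)


if __name__ == "__main__":
    print(add_boot_params("append initrd=/bzroot"))
    # On the server, pass the text of /proc/cmdline instead of this sample.
    print(params_active("initrd=/bzroot pcie_aspm=off"))
```

Making the edit idempotent means re-running it on an already edited line won't duplicate the parameters.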
February 4, 2023 (Author)

Thanks for the reply, but I don't think it helped. What do you think?

Scrub started: Sat Feb 4 00:36:05 2023
Status: finished
Duration: 0:22:36
Total to scrub: 670.17GiB
Rate: 157.24MiB/s
Error summary: read=12875666 super=3
Corrected: 0
Uncorrectable: 12875666
Unverified: 0

nas2-jag-diagnostics-20230204-1049.zip
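The key numbers in a scrub report are Status and the Corrected/Uncorrectable totals. Here Corrected is 0 and Uncorrectable equals the read error count, so none of the failed reads could be repaired. A minimal Python sketch that extracts those fields, assuming the btrfs scrub status layout pasted in this thread (other btrfs-progs versions may lay it out differently); parse_scrub_status, scrub_is_clean and SAMPLE are hypothetical names.

```python
import re

# The scrub report from the post, used as sample input.
SAMPLE = """\
Scrub started:    Sat Feb  4 00:36:05 2023
Status:           finished
Duration:         0:22:36
Total to scrub:   670.17GiB
Rate:             157.24MiB/s
Error summary:    read=12875666 super=3
  Corrected:      0
  Uncorrectable:  12875666
  Unverified:     0
"""


def parse_scrub_status(text):
    """Pull out Status, the Error summary counters and the Corrected/Uncorrectable/Unverified totals."""
    result = {"status": None, "errors": {}, "totals": {}}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Status:"):
            result["status"] = line.split(":", 1)[1].strip()
        elif line.startswith("Error summary:"):
            # key=value pairs such as "read=12875666 super=3"; stays empty if there are none.
            summary = line.split(":", 1)[1]
            result["errors"] = {k: int(v) for k, v in re.findall(r"(\w+)=(\d+)", summary)}
        else:
            m = re.match(r"(Corrected|Uncorrectable|Unverified):\s*(\d+)$", line)
            if m:
                result["totals"][m.group(1).lower()] = int(m.group(2))
    return result


def scrub_is_clean(parsed):
    """True only for a finished scrub with no uncorrectable errors."""
    return parsed["status"] == "finished" and parsed["totals"].get("uncorrectable", 0) == 0


if __name__ == "__main__":
    parsed = parse_scrub_status(SAMPLE)
    print(parsed["errors"], parsed["totals"], scrub_is_clean(parsed))
```

Requiring Status to be "finished" also flags a run like the later "aborted" scrub in this thread as not clean, even before looking at the error counts.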
February 5, 2023 (Community Expert, accepted solution)

The earlier diags didn't show the beginning of the problem because of all the spam. The NVMe device is not dropping offline, it's being passed through to a VM. Disable that and run another scrub.
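One way to spot this from the console is to check which kernel driver each PCI device is bound to: a device reserved for passthrough is normally bound to vfio-pci instead of its regular driver (nvme for an NVMe SSD), so the host can't use it. A minimal Python sketch, assuming the standard Linux sysfs layout under /sys/bus/pci/devices; vfio_bound_devices is a hypothetical helper that only reports bindings and changes nothing. Matching a PCI address to a specific drive is left to the GUI or lspci.

```python
import os
from pathlib import Path


def vfio_bound_devices(sysfs_root="/sys/bus/pci/devices"):
    """List PCI addresses whose current driver is vfio-pci (reserved for passthrough)."""
    root = Path(sysfs_root)
    if not root.is_dir():
        return []
    bound = []
    for dev in sorted(root.iterdir()):
        driver_link = dev / "driver"
        # 'driver' is a symlink to .../drivers/<name>; it is absent when no driver is bound.
        if driver_link.is_symlink():
            if os.path.basename(os.readlink(driver_link)) == "vfio-pci":
                bound.append(dev.name)
    return bound


if __name__ == "__main__":
    for addr in vfio_bound_devices():
        print(addr, "is bound to vfio-pci")
```

The sysfs root is a parameter so the function can be pointed at a test directory instead of the live system.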
February 6, 2023 (Author)

Thank you!!! ... Totally missed that. I removed it from the VM and rebooted last night. After that it started syncing, and this morning I ran the scrub without errors.
August 1, 2024

I'm having a similar issue after adding another SSD to the pool for RAID1 redundancy. Diags attached. Any idea what's going wrong here? A btrfs scrub gives me thousands of uncorrectable errors:

UUID: f2dc86c3-83b5-42ae-b0c3-c1448598d741
Scrub started: Thu Aug 1 17:05:04 2024
Status: aborted
Duration: 0:00:09
Total to scrub: 615.78GiB
Rate: 230.30MiB/s
Error summary: read=189382 super=3
Corrected: 0
Uncorrectable: 189382
Unverified: 0

diags.zip

Edited August 1, 2024 by 0weavern
August 1, 2024 (Community Expert)

You are having issues with cache_apps2. Replace its cables and post new diags after starting the array.