Ferdaze Posted February 3, 2023

I need some help verifying the cause of these errors and of the high read/write activity on one drive in the cache pool. I just installed two new 2TB WD Blue drives and the syslog is filling up with these entries:

Feb 3 07:27:05 nas2-jag kernel: BTRFS warning (device nvme1n1p1): direct IO failed ino 325 rw 0,0 sector 0x29b6fbf8 len 4096 err no 10
Feb 3 07:27:05 nas2-jag kernel: BTRFS warning (device nvme1n1p1): direct IO failed ino 325 rw 0,0 sector 0x29b6fc00 len 4096 err no 10
Feb 3 07:27:05 nas2-jag kernel: BTRFS warning (device nvme1n1p1): direct IO failed ino 325 rw 0,0 sector 0x29b6fc08 len 4096 err no 10
Feb 3 07:27:05 nas2-jag kernel: BTRFS warning (device nvme1n1p1): direct IO failed ino 325 rw 0,0 sector 0x29b6fc10 len 4096 err no 10
Feb 3 07:27:05 nas2-jag kernel: BTRFS warning (device nvme1n1p1): direct IO failed ino 325 rw 0,0 sector 0x29d7d390 len 4096 err no 10
Feb 3 07:27:05 nas2-jag kernel: BTRFS warning (device nvme1n1p1): direct IO failed ino 325 rw 0,0 sector 0x29d7d398 len 4096 err no 10
Feb 3 07:27:05 nas2-jag kernel: BTRFS warning (device nvme1n1p1): direct IO failed ino 325 rw 0,0 sector 0x29d7d3a0 len 4096 err no 10
Feb 3 07:27:05 nas2-jag kernel: BTRFS warning (device nvme1n1p1): direct IO failed ino 325 rw 0,0 sector 0x29d7d3a8 len 4096 err no 10
Feb 3 07:27:05 nas2-jag kernel: BTRFS warning (device nvme1n1p1): direct IO failed ino 325 rw 0,0 sector 0x29d7d3b0 len 4096 err no 10
Feb 3 07:27:05 nas2-jag kernel: BTRFS warning (device nvme1n1p1): direct IO failed ino 325 rw 0,0 sector 0x29d7d3b8 len 4096 err no 10
Feb 3 07:27:09 nas2-jag kernel: BTRFS error (device nvme1n1p1): error writing primary super block to device 1
Feb 3 07:27:10 nas2-jag kernel: btrfs_dev_stat_print_on_error: 953 callbacks suppressed
Feb 3 07:27:10 nas2-jag kernel: BTRFS error (device nvme1n1p1): bdev /dev/nvme0n1p1 errs: wr 1272078845, rd 36520810, flush 2929354, corrupt 8123, gen 0
Feb 3 07:27:10 nas2-jag kernel: BTRFS error (device nvme1n1p1): bdev /dev/nvme0n1p1 errs: wr 1272078846, rd 36520810, flush 2929354, corrupt 8123, gen 0
Feb 3 07:27:10 nas2-jag kernel: BTRFS error (device nvme1n1p1): bdev /dev/nvme0n1p1 errs: wr 1272078847, rd 36520810, flush 2929354, corrupt 8123, gen 0
Feb 3 07:27:10 nas2-jag kernel: BTRFS error (device nvme1n1p1): bdev /dev/nvme0n1p1 errs: wr 1272078848, rd 36520810, flush 2929354, corrupt 8123, gen 0

nas2-jag-diagnostics-20230203-0720.zip
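When the syslog is flooded like this, it can help to reduce it to the most recent error counters per pool member. The sketch below is a minimal illustration, assuming the `bdev ... errs: wr N, rd N, flush N, corrupt N, gen N` line format shown above; `latest_dev_stats` is a hypothetical helper name, not part of btrfs or Unraid.

```python
import re

# Matches the per-device counter lines btrfs prints, e.g.
# "BTRFS error (device nvme1n1p1): bdev /dev/nvme0n1p1 errs: wr 1, rd 2, flush 3, corrupt 4, gen 0"
DEV_STAT_RE = re.compile(
    r"bdev (?P<bdev>\S+) errs: wr (?P<wr>\d+), rd (?P<rd>\d+), "
    r"flush (?P<flush>\d+), corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)


def latest_dev_stats(lines):
    """Return {bdev: {counter: value}}, keeping the last matching line per device."""
    stats = {}
    for line in lines:
        m = DEV_STAT_RE.search(line)
        if m:
            counters = {k: int(v) for k, v in m.groupdict().items() if k != "bdev"}
            stats[m.group("bdev")] = counters  # later lines overwrite earlier ones
    return stats


if __name__ == "__main__":
    sample = [
        "Feb 3 07:27:10 nas2-jag kernel: BTRFS error (device nvme1n1p1): bdev /dev/nvme0n1p1 errs: wr 1272078845, rd 36520810, flush 2929354, corrupt 8123, gen 0",
        "Feb 3 07:27:10 nas2-jag kernel: BTRFS error (device nvme1n1p1): bdev /dev/nvme0n1p1 errs: wr 1272078848, rd 36520810, flush 2929354, corrupt 8123, gen 0",
    ]
    print(latest_dev_stats(sample))
```

Applied to the four `bdev` lines in the post, it reports only `/dev/nvme0n1p1`, the member whose counters keep climbing. The `(device nvme1n1p1)` prefix generally identifies the filesystem as a whole, so it is not necessarily the member that is failing.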
JorgeB Posted February 4, 2023

One of the NVMe devices dropped offline. This can sometimes help with that: on the main GUI page, click on the flash drive, scroll down to "Syslinux Configuration", make sure it's set to "menu view" (top right) and add this to your default boot option, after "append initrd=/bzroot":

nvme_core.default_ps_max_latency_us=0 pcie_aspm=off

e.g.:

append initrd=/bzroot nvme_core.default_ps_max_latency_us=0 pcie_aspm=off

Then power cycle the server (a reboot might not be enough) and see if the device comes back online. If it does, run a scrub, and see here for better pool monitoring.
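The Syslinux change described above amounts to appending two tokens to the `append` line of the default boot entry. The sketch below illustrates that edit as a text transformation, assuming a `syslinux.cfg` layout in which the default entry has a `menu default` line before its `append` line; `add_kernel_params` is a hypothetical name, and the GUI editor remains the route the post recommends.

```python
DEFAULT_PARAMS = ("nvme_core.default_ps_max_latency_us=0", "pcie_aspm=off")


def add_kernel_params(cfg_text, params=DEFAULT_PARAMS):
    """Append kernel params to the 'append' line of the entry marked 'menu default'.

    Only exact duplicates are skipped; a conflicting value such as
    'pcie_aspm=force' would be left in place alongside the new token.
    """
    lines = cfg_text.splitlines()
    in_default = False
    for i, line in enumerate(lines):
        stripped = line.strip()
        lowered = stripped.lower()
        if lowered.startswith("label "):
            in_default = False  # a new boot entry starts here
        elif lowered == "menu default":
            in_default = True  # we are inside the default boot entry
        elif in_default and lowered.startswith("append "):
            indent = line[: len(line) - len(line.lstrip())]
            tokens = stripped.split()
            tokens += [p for p in params if p not in tokens]
            lines[i] = indent + " ".join(tokens)
            return "\n".join(lines) + "\n"
    raise ValueError("no 'append' line found in the default boot entry")


if __name__ == "__main__":
    sample = (
        "label Unraid OS\n"
        "  menu default\n"
        "  kernel /bzimage\n"
        "  append initrd=/bzroot\n"
    )
    # Last line becomes:
    #   append initrd=/bzroot nvme_core.default_ps_max_latency_us=0 pcie_aspm=off
    print(add_kernel_params(sample), end="")
```

Running it twice produces the same file, so re-applying the change does not stack duplicate parameters. Keep a copy of the original config before editing the flash drive by hand.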
Ferdaze Posted February 4, 2023

Thanks for the reply, but I don't think it helped. What do you think?

Scrub started: Sat Feb 4 00:36:05 2023
Status: finished
Duration: 0:22:36
Total to scrub: 670.17GiB
Rate: 157.24MiB/s
Error summary: read=12875666 super=3
  Corrected: 0
  Uncorrectable: 12875666
  Unverified: 0

nas2-jag-diagnostics-20230204-1049.zip
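A scrub report like this can also be checked programmatically. The sketch below assumes the `btrfs scrub status` text layout shown in the post (`Error summary:` followed by `name=count` pairs, then `Corrected`, `Uncorrectable` and `Unverified` lines); `parse_scrub_status` and `scrub_is_clean` are hypothetical helper names. Treating a run as clean only when it finished with no error categories and zero uncorrectable errors is a judgment call, not a btrfs rule.

```python
def parse_scrub_status(text):
    """Extract status, error-summary counters and totals from scrub status text."""
    result = {"errors": {}}
    for raw in text.splitlines():
        key, sep, value = raw.strip().partition(":")
        if not sep:
            continue  # not a "Key: value" line
        key, value = key.strip(), value.strip()
        if key == "Status":
            result["status"] = value
        elif key == "Error summary":
            # e.g. "read=12875666 super=3"; words without '=' (such as
            # "no errors found") are ignored
            for item in value.split():
                name, eq, count = item.partition("=")
                if eq:
                    result["errors"][name] = int(count)
        elif key in ("Corrected", "Uncorrectable", "Unverified"):
            result[key.lower()] = int(value)
    return result


def scrub_is_clean(summary):
    """True if the scrub finished with no error categories and nothing uncorrectable."""
    return (
        summary.get("status") == "finished"
        and not summary["errors"]
        and summary.get("uncorrectable", 0) == 0
    )


if __name__ == "__main__":
    report = (
        "Status: finished\n"
        "Error summary: read=12875666 super=3\n"
        "  Corrected: 0\n"
        "  Uncorrectable: 12875666\n"
    )
    summary = parse_scrub_status(report)
    print(summary, scrub_is_clean(summary))
```

For the report in the post, every read error is also counted as uncorrectable, so the sketch flags it as not clean.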
JorgeB Posted February 5, 2023 (Solution)

The earlier diags didn't show the beginning of the problem because of all the spam. The NVMe device is not dropping offline; it's being passed through to a VM. Disable that and run another scrub.
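One way to confirm this kind of diagnosis from the host is to check which kernel driver owns the NVMe controller: a PCI device is bound to one driver at a time, so a controller bound to `vfio-pci` for passthrough is not available to the host's `nvme` driver or to the pool. The sketch below reads the standard Linux sysfs `driver` symlink. It assumes the passthrough uses `vfio-pci` (typical for KVM/libvirt), the names `bound_driver` and `is_passed_through` are hypothetical, and the PCI address is a placeholder.

```python
import os


def bound_driver(pci_addr, sysfs_root="/sys"):
    """Return the kernel driver bound to a PCI device, or None if none is bound.

    /sys/bus/pci/devices/<addr>/driver is a symlink into
    /sys/bus/pci/drivers/<driver-name>; it is absent when no driver is bound.
    """
    link = os.path.join(sysfs_root, "bus", "pci", "devices", pci_addr, "driver")
    if not os.path.islink(link):
        return None
    return os.path.basename(os.readlink(link))


def is_passed_through(pci_addr, sysfs_root="/sys"):
    """Heuristic: treat a device bound to vfio-pci as reserved for a VM."""
    return bound_driver(pci_addr, sysfs_root) == "vfio-pci"


if __name__ == "__main__":
    # Placeholder address; list real ones (with domain) using `lspci -D`.
    addr = "0000:01:00.0"
    print(addr, "->", bound_driver(addr) or "no driver bound")
```

A pool member's controller would normally show `nvme` here; `vfio-pci` points to the passthrough configuration rather than to failing hardware.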
Ferdaze Posted February 6, 2023

Thank you!!! Totally missed that. I removed it from the VM and rebooted last night. After that it started syncing, and this morning I ran the scrub with no errors.
0weavern Posted August 1

I'm having a similar issue after adding another SSD to the pool for RAID1 redundancy. Diags attached. Any idea what's going wrong here? A BTRFS scrub gives me thousands of uncorrectable errors:

UUID: f2dc86c3-83b5-42ae-b0c3-c1448598d741
Scrub started: Thu Aug 1 17:05:04 2024
Status: aborted
Duration: 0:00:09
Total to scrub: 615.78GiB
Rate: 230.30MiB/s
Error summary: read=189382 super=3
  Corrected: 0
  Uncorrectable: 189382
  Unverified: 0

diags.zip

Edited August 1 by 0weavern
JorgeB Posted August 1

You are having issues with cache_apps2. Replace its cables and post new diags after array start.