BTRFS errors on raid1 cache

February 3, 2023

Need some assistance verifying the cause of the errors and the high read/write activity on one drive in the cache. I just installed two new 2 TB WD Blue drives and am seeing the syslog fill up with these entries.

Feb 3 07:27:05 nas2-jag kernel: BTRFS warning (device nvme1n1p1): direct IO failed ino 325 rw 0,0 sector 0x29b6fbf8 len 4096 err no 10
Feb 3 07:27:05 nas2-jag kernel: BTRFS warning (device nvme1n1p1): direct IO failed ino 325 rw 0,0 sector 0x29b6fc00 len 4096 err no 10
Feb 3 07:27:05 nas2-jag kernel: BTRFS warning (device nvme1n1p1): direct IO failed ino 325 rw 0,0 sector 0x29b6fc08 len 4096 err no 10
Feb 3 07:27:05 nas2-jag kernel: BTRFS warning (device nvme1n1p1): direct IO failed ino 325 rw 0,0 sector 0x29b6fc10 len 4096 err no 10
Feb 3 07:27:05 nas2-jag kernel: BTRFS warning (device nvme1n1p1): direct IO failed ino 325 rw 0,0 sector 0x29d7d390 len 4096 err no 10
Feb 3 07:27:05 nas2-jag kernel: BTRFS warning (device nvme1n1p1): direct IO failed ino 325 rw 0,0 sector 0x29d7d398 len 4096 err no 10
Feb 3 07:27:05 nas2-jag kernel: BTRFS warning (device nvme1n1p1): direct IO failed ino 325 rw 0,0 sector 0x29d7d3a0 len 4096 err no 10
Feb 3 07:27:05 nas2-jag kernel: BTRFS warning (device nvme1n1p1): direct IO failed ino 325 rw 0,0 sector 0x29d7d3a8 len 4096 err no 10
Feb 3 07:27:05 nas2-jag kernel: BTRFS warning (device nvme1n1p1): direct IO failed ino 325 rw 0,0 sector 0x29d7d3b0 len 4096 err no 10
Feb 3 07:27:05 nas2-jag kernel: BTRFS warning (device nvme1n1p1): direct IO failed ino 325 rw 0,0 sector 0x29d7d3b8 len 4096 err no 10
Feb 3 07:27:09 nas2-jag kernel: BTRFS error (device nvme1n1p1): error writing primary super block to device 1
Feb 3 07:27:10 nas2-jag kernel: btrfs_dev_stat_print_on_error: 953 callbacks suppressed
Feb 3 07:27:10 nas2-jag kernel: BTRFS error (device nvme1n1p1): bdev /dev/nvme0n1p1 errs: wr 1272078845, rd 36520810, flush 2929354, corrupt 8123, gen 0
Feb 3 07:27:10 nas2-jag kernel: BTRFS error (device nvme1n1p1): bdev /dev/nvme0n1p1 errs: wr 1272078846, rd 36520810, flush 2929354, corrupt 8123, gen 0
Feb 3 07:27:10 nas2-jag kernel: BTRFS error (device nvme1n1p1): bdev /dev/nvme0n1p1 errs: wr 1272078847, rd 36520810, flush 2929354, corrupt 8123, gen 0
Feb 3 07:27:10 nas2-jag kernel: BTRFS error (device nvme1n1p1): bdev /dev/nvme0n1p1 errs: wr 1272078848, rd 36520810, flush 2929354, corrupt 8123, gen 0
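
A note for anyone reading these lines: the message is tagged with the filesystem's device (nvme1n1p1), but the counters after "bdev" belong to the member that is actually failing (here /dev/nvme0n1p1). Below is a minimal Python sketch that pulls the latest counters per member device out of syslog lines in the format shown above. The function names and the regular expression are my own illustration, not part of any btrfs tool.

```python
import re

# Matches the per-device counter lines btrfs writes to the kernel log,
# in the format seen in the syslog excerpt above.
DEV_STATS_RE = re.compile(
    r"BTRFS error \(device (?P<fs_dev>[^)]+)\): bdev (?P<bdev>\S+) errs: "
    r"wr (?P<wr>\d+), rd (?P<rd>\d+), flush (?P<flush>\d+), "
    r"corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)

COUNTER_NAMES = ("wr", "rd", "flush", "corrupt", "gen")


def parse_dev_stats(line):
    """Return (bdev, counters) for a matching log line, or None."""
    match = DEV_STATS_RE.search(line)
    if match is None:
        return None
    counters = {name: int(match.group(name)) for name in COUNTER_NAMES}
    return match.group("bdev"), counters


def latest_stats(lines):
    """Keep the most recent counters seen for each member device."""
    latest = {}
    for line in lines:
        parsed = parse_dev_stats(line)
        if parsed is not None:
            bdev, counters = parsed
            latest[bdev] = counters
    return latest


# Two of the lines from the post, used as sample input.
SAMPLE_LINES = [
    "Feb 3 07:27:10 nas2-jag kernel: BTRFS error (device nvme1n1p1): "
    "bdev /dev/nvme0n1p1 errs: wr 1272078845, rd 36520810, flush 2929354, corrupt 8123, gen 0",
    "Feb 3 07:27:10 nas2-jag kernel: BTRFS error (device nvme1n1p1): "
    "bdev /dev/nvme0n1p1 errs: wr 1272078848, rd 36520810, flush 2929354, corrupt 8123, gen 0",
]
```

Calling `latest_stats(SAMPLE_LINES)` gives one entry keyed by `/dev/nvme0n1p1`, which makes it easier to see that only one pool member is accumulating errors.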

nas2-jag-diagnostics-20230203-0720.zip

Solved by JorgeB

February 5, 2023

February 4, 2023

Community Expert

One of the NVMe devices dropped offline. The following can sometimes help with that: on the main GUI page, click on the flash drive, scroll down to "Syslinux Configuration", make sure it's set to "menu view" (top right), and add this to your default boot option, after "append initrd=/bzroot":

nvme_core.default_ps_max_latency_us=0 pcie_aspm=off

e.g.:

append initrd=/bzroot nvme_core.default_ps_max_latency_us=0 pcie_aspm=off

Then power cycle the server (a reboot might not be enough) to see if the device comes back online. If it does, run a scrub, and see here for better pool monitoring.
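
After the power cycle it can be worth confirming that both options actually reached the kernel. A minimal Python sketch, assuming the running kernel's command line is read from /proc/cmdline (standard on Linux); the helper names are hypothetical and only for illustration:

```python
# The two options from the append line above.
REQUIRED_PARAMS = {
    "nvme_core.default_ps_max_latency_us": "0",
    "pcie_aspm": "off",
}


def parse_cmdline(cmdline):
    """Split a kernel command line into a {key: value} dict."""
    params = {}
    for token in cmdline.split():
        key, _, value = token.partition("=")
        params[key] = value
    return params


def missing_params(cmdline, required=REQUIRED_PARAMS):
    """Return the required key=value options that are absent or set differently."""
    params = parse_cmdline(cmdline)
    return [
        f"{key}={value}"
        for key, value in required.items()
        if params.get(key) != value
    ]


def check_running_kernel(path="/proc/cmdline"):
    """Check the options against the currently booted kernel."""
    with open(path) as handle:
        return missing_params(handle.read())
```

An empty list from `check_running_kernel()` means both options are active; anything else suggests the edit landed on a different boot entry than the one that was booted.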

February 4, 2023

Author

Thanks for the reply but I don't think it helped. What do you think?

Scrub started:    Sat Feb  4 00:36:05 2023
Status:           finished
Duration:         0:22:36
Total to scrub:   670.17GiB
Rate:             157.24MiB/s
Error summary:    read=12875666 super=3
  Corrected:      0
  Uncorrectable:  12875666
  Unverified:     0
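
For anyone who wants to check this kind of result from a script, here is a minimal Python sketch that parses scrub status text in the layout pasted above and reports whether anything was repaired. The parsing is an assumption based only on this output; other btrfs-progs versions may lay the fields out differently.

```python
def parse_scrub_status(text):
    """Turn 'Key:   value' lines into a dict of stripped strings."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def repair_summary(text):
    """Extract the status, error counters, and repair counts from scrub output."""
    fields = parse_scrub_status(text)
    errors = {}
    for token in fields.get("Error summary", "").split():
        name, sep, count = token.partition("=")
        if sep and count.isdigit():
            errors[name] = int(count)
    return {
        "status": fields.get("Status"),
        "errors": errors,
        "corrected": int(fields.get("Corrected", "0")),
        "uncorrectable": int(fields.get("Uncorrectable", "0")),
    }


def nothing_repaired(summary):
    """True when errors were found but none of them could be corrected."""
    return summary["uncorrectable"] > 0 and summary["corrected"] == 0


# The scrub result from the post, used as sample input.
SAMPLE_SCRUB = """\
Scrub started:    Sat Feb  4 00:36:05 2023
Status:           finished
Duration:         0:22:36
Total to scrub:   670.17GiB
Rate:             157.24MiB/s
Error summary:    read=12875666 super=3
  Corrected:      0
  Uncorrectable:  12875666
  Unverified:     0
"""
```

With `SAMPLE_SCRUB`, `nothing_repaired(repair_summary(SAMPLE_SCRUB))` is true: every read error was counted as uncorrectable and none were corrected.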

nas2-jag-diagnostics-20230204-1049.zip

February 5, 2023

Community Expert
Solution

The earlier diags didn't show the beginning of the problem because of all the log spam. The NVMe device is not dropping offline; it's being passed through to a VM. Disable that and run another scrub.
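
One way to spot this situation from the host is to look at which kernel driver each NVMe controller is bound to. The Python sketch below walks the standard sysfs PCI tree; it assumes that a passed-through device shows up bound to vfio-pci instead of nvme, which is typical for PCI passthrough but may not match every setup. The function names are hypothetical.

```python
import os

# PCI class code prefix for non-volatile memory controllers
# (mass storage 0x01, subclass 0x08), as reported in sysfs "class" files.
NVME_CLASS_PREFIX = "0x0108"


def nvme_driver_bindings(sysfs_root="/sys/bus/pci/devices"):
    """Map each NVMe controller's PCI address to its bound driver name (or None)."""
    bindings = {}
    for address in sorted(os.listdir(sysfs_root)):
        device_dir = os.path.join(sysfs_root, address)
        try:
            with open(os.path.join(device_dir, "class")) as handle:
                pci_class = handle.read().strip()
        except OSError:
            continue  # not a device directory we can read
        if not pci_class.startswith(NVME_CLASS_PREFIX):
            continue
        driver_link = os.path.join(device_dir, "driver")
        if os.path.islink(driver_link):
            # The symlink target ends in the driver's name, e.g. ".../drivers/nvme".
            bindings[address] = os.path.basename(os.readlink(driver_link))
        else:
            bindings[address] = None  # no driver bound at all
    return bindings


def passed_through(bindings):
    """Assumption: controllers bound to vfio-pci are reserved for a VM."""
    return [addr for addr, driver in bindings.items() if driver == "vfio-pci"]
```

If `passed_through(nvme_driver_bindings())` lists the address of a pool member, the host cannot use that device while the binding is in place, which would fit the errors in this thread.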

February 6, 2023

Author

Thank you!!! I totally missed that. I removed it from the VM and rebooted last night. After that it started syncing, and this morning I ran the scrub without errors.

1 year later...

August 1, 2024

I'm having a similar issue after adding another SSD to the pool for RAID1 redundancy. Diags attached. Any idea what's going wrong here? A BTRFS scrub gives me thousands of uncorrectable errors:

UUID:             f2dc86c3-83b5-42ae-b0c3-c1448598d741
Scrub started:    Thu Aug 1 17:05:04 2024
Status:           aborted
Duration:         0:00:09
Total to scrub:   615.78GiB
Rate:             230.30MiB/s
Error summary:    read=189382 super=3
  Corrected:      0
  Uncorrectable:  189382
  Unverified:     0

diags.zip

Edited August 1, 2024 by 0weavern

August 1, 2024

Community Expert

You are having issues with cache_apps2. Replace its cables and post new diags after array start.
