BTRFS issues: read error corrected


Daxxio


Hello Fellow Unraiders,

 

Today the Unraid error log showed a lot of BTRFS errors, so I searched for what causes them and what can be done about them. Unfortunately, I did not find the cause.

Does anyone know what the best thing to do is at this point? Diagnostics are attached. The drives are two SAMSUNG_MZVLB256HAHQ NVMe SSDs.

Unraid version 6.8.1 

 

I found this command:

root@Unraid:~# btrfs device stats /mnt/cache
[/dev/nvme1n1p1].write_io_errs    31955
[/dev/nvme1n1p1].read_io_errs     4654
[/dev/nvme1n1p1].flush_io_errs    2150
[/dev/nvme1n1p1].corruption_errs  0
[/dev/nvme1n1p1].generation_errs  0
[/dev/nvme0n1p1].write_io_errs    4107117
[/dev/nvme0n1p1].read_io_errs     1036814
[/dev/nvme0n1p1].flush_io_errs    128904
[/dev/nvme0n1p1].corruption_errs  0
[/dev/nvme0n1p1].generation_errs  0
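
With ten counters per pool it is easy to miss which ones are actually non-zero. Here is a minimal sketch of a filter for output like the above. The function name `nonzero_btrfs_stats` is made up for illustration and is not part of btrfs-progs; it only assumes the two-column "counter value" layout shown in this post.

```shell
# Hypothetical helper (not part of btrfs-progs): reads `btrfs device stats`
# output on stdin and prints only the counters whose value is not zero.
nonzero_btrfs_stats() {
    # Each line is "<[device].counter> <value>"; keep lines with a non-zero value.
    awk 'NF == 2 && $2 + 0 != 0 { print $1, $2 }'
}

# Example, run on the server itself (path taken from the post above):
#   btrfs device stats /mnt/cache | nonzero_btrfs_stats
```

For the output in this post, that would leave only the write, read and flush I/O error counters for both devices and drop the corruption and generation counters, which are all 0.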

And here is part of the log:

Feb 24 16:07:53 Unraid kernel: BTRFS info (device nvme1n1p1): read error corrected: ino 0 off 615257444352 (dev /dev/nvme0n1p1 sector 228489672)
Feb 24 16:07:53 Unraid kernel: BTRFS info (device nvme1n1p1): read error corrected: ino 0 off 615257448448 (dev /dev/nvme0n1p1 sector 228489680)
Feb 24 16:07:53 Unraid kernel: BTRFS info (device nvme1n1p1): read error corrected: ino 0 off 615257452544 (dev /dev/nvme0n1p1 sector 228489688)
Feb 24 16:09:17 Unraid kernel: BTRFS warning (device loop2): csum failed root 5 ino 7348103 off 5263360 csum 0x4a11041d expected csum 0xf96124e9 mirror 1
Feb 24 16:10:21 Unraid kernel: BTRFS error (device nvme1n1p1): parent transid verify failed on 614962642944 wanted 19507514 found 19505513
Feb 24 16:10:21 Unraid kernel: BTRFS info (device nvme1n1p1): read error corrected: ino 0 off 614962642944 (dev /dev/nvme0n1p1 sector 227913888)
Feb 24 16:10:21 Unraid kernel: BTRFS info (device nvme1n1p1): read error corrected: ino 0 off 614962647040 (dev /dev/nvme0n1p1 sector 227913896)
Feb 24 16:10:21 Unraid kernel: BTRFS info (device nvme1n1p1): read error corrected: ino 0 off 614962651136 (dev /dev/nvme0n1p1 sector 227913904)
Feb 24 16:10:21 Unraid kernel: BTRFS info (device nvme1n1p1): read error corrected: ino 0 off 614962655232 (dev /dev/nvme0n1p1 sector 227913912)
Feb 24 16:11:20 Unraid kernel: BTRFS error (device nvme1n1p1): parent transid verify failed on 614829047808 wanted 19507387 found 19505396
Feb 24 16:11:20 Unraid kernel: BTRFS info (device nvme1n1p1): read error corrected: ino 0 off 614829047808 (dev /dev/nvme0n1p1 sector 227652960)
Feb 24 16:11:20 Unraid kernel: BTRFS info (device nvme1n1p1): read error corrected: ino 0 off 614829051904 (dev /dev/nvme0n1p1 sector 227652968)
Feb 24 16:11:20 Unraid kernel: BTRFS info (device nvme1n1p1): read error corrected: ino 0 off 614829056000 (dev /dev/nvme0n1p1 sector 227652976)
Feb 24 16:11:20 Unraid kernel: BTRFS info (device nvme1n1p1): read error corrected: ino 0 off 614829060096 (dev /dev/nvme0n1p1 sector 227652984)
Feb 24 16:13:21 Unraid kernel: BTRFS error (device nvme1n1p1): parent transid verify failed on 614831276032 wanted 19507389 found 19505396
Feb 24 16:13:21 Unraid kernel: BTRFS info (device nvme1n1p1): read error corrected: ino 0 off 614831276032 (dev /dev/nvme0n1p1 sector 227657312)
Feb 24 16:13:21 Unraid kernel: BTRFS info (device nvme1n1p1): read error corrected: ino 0 off 614831280128 (dev /dev/nvme0n1p1 sector 227657320)
Feb 24 16:13:21 Unraid kernel: BTRFS info (device nvme1n1p1): read error corrected: ino 0 off 614831284224 (dev /dev/nvme0n1p1 sector 227657328)
Feb 24 16:13:21 Unraid kernel: BTRFS info (device nvme1n1p1): read error corrected: ino 0 off 614831288320 (dev /dev/nvme0n1p1 sector 227657336)
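
To see which physical device the corrected reads point at, the log can be tallied per device. This is a rough sketch that assumes the "(dev /dev/... sector N)" format shown above. The function name `count_corrected_reads` is hypothetical, and `/var/log/syslog` in the example comment is assumed to be where the log lives.

```shell
# Hypothetical helper: reads kernel log lines on stdin and counts
# "read error corrected" messages per underlying device.
count_corrected_reads() {
    awk '/read error corrected/ {
        # The device path is the field right after the "(dev" token.
        for (i = 1; i < NF; i++)
            if ($i == "(dev") count[$(i + 1)]++
    }
    END { for (dev in count) print dev, count[dev] }' | sort
}

# Example (log path is an assumption):
#   count_corrected_reads < /var/log/syslog
```

In the excerpt above, every corrected read refers to /dev/nvme0n1p1, which is also the device with the far higher error counters in the stats output.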

 

unraid-diagnostics-20200224-1616.zip

Link to comment

Looking at the stats, both devices are having issues, most likely dropping offline. Once a device comes back online, btrfs brings the filesystem up to date, hence the "read error corrected" messages. See here for more info on how to better monitor the pool.

 

Since no cables are involved, it's not likely a connection issue. Look for a BIOS update. Also, some NVMe devices have issues with power states on Linux, so try this: on the main GUI page click on the flash device, scroll down to "Syslinux Configuration", make sure it's set to "menu view" (top right), and add this to your default boot option, after "append":

 

nvme_core.default_ps_max_latency_us=0

 

Reboot and see if it makes a difference.
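
After the reboot it is worth confirming that the option actually reached the kernel. Here is a minimal sketch, assuming the running kernel's command line can be read from `/proc/cmdline`. The function name `has_boot_param` is made up for illustration.

```shell
# Hypothetical helper: succeeds if the given parameter appears as a whole
# word on the kernel command line. The second argument is optional and
# defaults to /proc/cmdline.
has_boot_param() {
    # Put each parameter on its own line, then look for an exact match.
    tr ' ' '\n' < "${2:-/proc/cmdline}" | grep -qxF -- "$1"
}

# Example:
#   has_boot_param nvme_core.default_ps_max_latency_us=0 && echo "option is active"
```

The check matches the option wherever it sits on the line. On kernels where the nvme_core module exposes the parameter, reading /sys/module/nvme_core/parameters/default_ps_max_latency_us should also show the value in use.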

 

 

Link to comment

Thanks a lot Johnnie! I will look into it. The NVMe drive also goes offline if I copy huge files (40 GB) back and forth.

I'll report back when I'm done.

 

EDIT: Performed a BIOS update, appended the line you mentioned and cleared the btrfs stats. Will monitor if the problem still exists.

 

EDIT2: I still had some errors after doing that, but it was better. In the end I formatted my cache drives and made a new btrfs cache pool.

After that I still had a lot of "PCIe Bus Error: severity=Corrected, type=Physical Layer" errors. That seems to be resolved by booting Unraid like this:

 

Unraid OS

kernel /bzimage
append initrd=/bzroot nvme_core.default_ps_max_latency_us=0 pcie_aspm=off
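
To judge whether a boot option like this helps, the corrected PCIe errors can be counted before and after the change. This is a rough sketch that matches only the message text quoted in this post. The function name `count_pcie_corrected` is hypothetical, and the log path in the example comment is an assumption.

```shell
# Hypothetical helper: reads log lines on stdin and prints how many of them
# report a corrected PCIe bus error.
count_pcie_corrected() {
    # grep -c prints 0 but exits non-zero when nothing matches, so mask that.
    grep -c 'PCIe Bus Error: severity=Corrected' || true
}

# Example (log path is an assumption):
#   count_pcie_corrected < /var/log/syslog
```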

 

Edited by Daxxio
update
Link to comment
  • 2 years later...
On 2/26/2020 at 3:30 PM, Daxxio said:

After that I still had a lot of "PCIe Bus Error: severity=Corrected, type=Physical Layer" errors. That seems to be resolved by booting Unraid like this:

 

Unraid OS

kernel /bzimage
append initrd=/bzroot nvme_core.default_ps_max_latency_us=0 pcie_aspm=off

So, I am having similar issues, with a couple of my cache pools dropping drives without apparent cause (3-month-old NVMe drives that do not show any SMART errors).

I am a little confused about where the "nvme_core.default_ps_max_latency_us=0" string should go. I have seen posts (including this one) with it both ways: between "append" and "initrd=/bzroot", and after "initrd=/bzroot". Which is officially correct? Or does it even matter where it sits in the command?

Thanks

Edited by miicar
Clarity
Link to comment
On 1/25/2023 at 3:00 AM, JorgeB said:

both :)

 

So I will assume it was just my s*** 3-month-old Adata NVMe drives that were causing the issues, then. I have since replaced them with Samsung 980 Pro NVMe drives, so I hope that issue is dealt with. Anyone wanna buy lightly used Adata drives? *smh*

Link to comment
