[SOLVED] BTRFS CSUM error



Unraid v6.8.3

 

void-diagnostics-20200521-0651.zip <--- diagnostics before any troubleshooting.

 

I'm receiving the following error from my cache drive. It is always the same inode number:

 

BTRFS warning (device sdg1): csum failed root 5 ino 156381873 off 143360 csum 0xf58f6015 expected csum 0xf58f6055 mirror 1

I ran a find on the inode # and it is an Emby poster:

 

find /mnt/cache -inum 156381873
/mnt/cache/appdata/EmbyServer/data/collections/Toy Story Collection [boxset]/poster.jpg
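The lookup above can be scripted so the inode number is pulled straight from the kernel message. This is a minimal sketch, assuming the message format shown in this post; the function name `csum_inode` is made up for illustration, and the `/mnt/cache` mount point is the one from this thread.

```shell
# Read kernel log lines on stdin and print the inode number from each
# "csum failed ... ino N ..." message. Other lines are ignored.
csum_inode() {
    sed -n -E 's/.*csum failed root [0-9]+ ino ([0-9]+) .*/\1/p'
}

# Example use (not run here): resolve every affected inode to a path.
# -xdev keeps find on the one filesystem, since inode numbers are only
# unique within a single filesystem.
#
#   dmesg | csum_inode | sort -u | while read -r ino; do
#       find /mnt/cache -xdev -inum "$ino"
#   done
```

`btrfs inspect-internal inode-resolve 156381873 /mnt/cache` should give the same answer without walking the whole tree, assuming your btrfs-progs version includes that subcommand.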

 

I just finished a scrub:

May 21 07:08:25 VOID ool www[20615]: /usr/local/emhttp/plugins/dynamix/scripts/btrfs_scrub 'start' '/mnt/cache' '-r'

May 21 07:10:32 VOID kernel: BTRFS warning (device sdg1): checksum error at logical 1627265449984 on dev /dev/sdg1, physical 57454903296, root 5, inode 156381873, offset 143360, length 4096, links 1 (path: appdata/EmbyServer/data/collections/Toy Story Collection [boxset]/poster.jpg)

May 21 07:10:32 VOID kernel: BTRFS error (device sdg1): bdev /dev/sdg1 errs: wr 0, rd 0, flush 0, corrupt 2, gen 0


UUID:             cc9f1614-fc5d-406a-8ee7-58a5651dc9ae
Scrub started:    Thu May 21 07:08:25 2020
Status:           finished
Duration:         0:03:08
Total to scrub:   75.15GiB
Rate:             409.40MiB/s
Error summary:    csum=1
  Corrected:      0
  Uncorrectable:  0
  Unverified:     0


Should I attempt to repair the corrupted block with BTRFS Scrub? Or should I just delete the affected file and let it be regenerated?

 

I plan on running a memtest later to rule out bad RAM, though I think if it were bad RAM I would have more than just one bad file after this error has been going on for over a week.

 

According to SMART, the SSD appears to have at least one pending or reallocated sector and one reserve block used: void-smart-20200521-0711.zip

 

I wanted to check how much data I have written to this cache drive, so I found a calculator. The result seems wildly out of bounds for a 3 1/2 year old SSD; there is no way I have written 300TB through this drive.

image.thumb.png.05bf14a9a65d039c340e1ae79b52437a.png

 

BTW, why aren't warnings like this picked up by FCP (Fix Common Problems)? It would be nice to have BTRFS errors reported (a notification generated) for those with cache devices.
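Until something like that exists, one rough way to surface these yourself is to scan the syslog for the BTRFS error-counter lines. The sketch below assumes the line format shown in the scrub log above; the function name `btrfs_error_counters` is hypothetical, and how you turn its output into a notification is left open.

```shell
# Read syslog lines on stdin and print one summary line per BTRFS
# error-counter message: "<device> wr=N rd=N flush=N corrupt=N gen=N".
btrfs_error_counters() {
    sed -n -E 's/.*BTRFS error \(device ([^)]+)\): bdev [^ ]+ errs: wr ([0-9]+), rd ([0-9]+), flush ([0-9]+), corrupt ([0-9]+), gen ([0-9]+).*/\1 wr=\2 rd=\3 flush=\4 corrupt=\5 gen=\6/p'
}

# Example use (not run here), assuming the syslog lives at /var/log/syslog:
#
#   btrfs_error_counters < /var/log/syslog | tail -n 1
```

`btrfs device stats /mnt/cache` reports the same counters directly from the filesystem, so a scheduled job could check that instead of parsing logs.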

 

Link to comment

It means that block doesn't have the checksum it should have, i.e., the data is corrupt. You can fix it by deleting the file or overwriting it.

 

The most likely cause is the SSD: there's a bad block that was reallocated:

183 Runtime_Bad_Block       PO--C-   099   099   010    -    1

 

The SSD firmware shouldn't reallocate a block containing data without being able to write it correctly to another place, but it's known to happen; it happened to me a few years ago with a Sandisk SSD. Another possibility is a one-time RAM bit flip. If the RAM were bad in general, it would likely cause more issues.
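As a side check, you can compare the two checksum values from the warning to see how far apart they are. This is just shell arithmetic on the logged numbers; the function name `csum_diff` is made up for illustration, and the result is an observation about the logged values, not a diagnosis of the SSD or the RAM.

```shell
# Print the XOR of two checksum values and the number of bits that differ.
csum_diff() {
    local x=$(( $1 ^ $2 ))
    local v=$x
    local bits=0
    while [ "$v" -ne 0 ]; do
        bits=$(( bits + (v & 1) ))   # count the lowest bit
        v=$(( v >> 1 ))              # then shift it out
    done
    printf '0x%08x %d\n' "$x" "$bits"
}

# Values from the warning in the first post:
csum_diff 0xf58f6015 0xf58f6055   # prints "0x00000040 1"
```

For the warning in the first post, the computed and expected checksums differ in exactly one bit.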

 

 

Link to comment
OK, cool, so I don't necessarily need to do a scrub repair? Neat, I'll just delete the file then.

 

Thanks for the reassurance 😃

Link to comment
21 minutes ago, johnnie.black said:

Not a bad idea to run a scrub to check for more corruption, but since it's a single device pool it can only detect corruption, not fix it.

Yeah, I ran a second scrub after deleting the corrupted file and it reports no further errors:

 


UUID:             cc9f1614-fc5d-406a-8ee7-58a5651dc9ae
Scrub started:    Thu May 21 07:58:40 2020
Status:           finished
Duration:         0:02:48
Total to scrub:   75.17GiB
Rate:             458.17MiB/s
Error summary:    no errors found

Thanks for reminding me that scrub can't repair anything without a multi-device pool; I forgot that was the case.
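If you want to check scrub results from a script rather than by eye, the summary line can be tested directly. A minimal sketch, assuming the `btrfs scrub status` output format shown in this thread (other btrfs-progs versions may print it differently); the function name `scrub_is_clean` is hypothetical.

```shell
# Read `btrfs scrub status` output on stdin, print the text of the
# "Error summary:" line, and return 0 only if it says "no errors found".
scrub_is_clean() {
    local summary
    summary=$(sed -n -E 's/^Error summary:[[:space:]]*(.*)$/\1/p')
    printf '%s\n' "$summary"
    [ "$summary" = "no errors found" ]
}

# Example use (not run here):
#
#   btrfs scrub status /mnt/cache | scrub_is_clean || echo "scrub found errors"
```

On a single-device pool like this one, a non-clean result still means finding and replacing the affected files by hand, as discussed above.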

Link to comment
  • 1 year later...
  • 2 months later...

Hello

 

I have a similar error on my system:

 

Jan 12 11:11:32 Tower kernel: BTRFS warning (device loop2): csum failed root 5 ino 70328 off 2191360 csum 0x9fca66a3 expected csum 0xa9d3c4a6 mirror 1

Jan 12 11:11:32 Tower kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 0, rd 0, flush 0, corrupt 3792, gen 0

Jan 12 11:16:15 Tower kernel: BTRFS warning (device loop2): csum failed root 5 ino 70328 off 2191360 csum 0x9fca66a3 expected csum 0xa9d3c4a6 mirror 1

Jan 12 11:16:15 Tower kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 0, rd 0, flush 0, corrupt 3793, gen 0

 

I tried to find the corrupt file with the find command, but nothing was found.

 

find /mnt/user -inum 70328
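Note that the device in these messages is loop2, a loop device, so inode 70328 belongs to the filesystem inside whatever image file backs that loop device. Inode numbers are only meaningful within one filesystem, which may be why searching /mnt/user turns up nothing. A small sketch for finding the backing image, assuming a Linux sysfs layout; the function name `loop_backing_file` and the `SYSFS_ROOT` override are made up for illustration.

```shell
# Print the image file that backs a loop device, e.g. "loop2" or "/dev/loop2".
# SYSFS_ROOT is a hypothetical override, only there to make testing easier.
loop_backing_file() {
    local dev="${1#/dev/}"
    local sysfs="${SYSFS_ROOT:-/sys}"
    cat "$sysfs/block/$dev/loop/backing_file"
}

# Example use (not run here):
#
#   loop_backing_file loop2
```

`losetup /dev/loop2` should report the same backing file if util-linux is available.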

 

Any recommendations on how to solve this?

 

Thanks!

Link to comment

The error is gone after re-creating the docker.img. Thank you very much for your help!

 

The RAM is fairly new, roughly 8 months old. How can I run a memtest?

I'm using the ASRock Fatal1ty B450 board.

 

I had a system crash one month ago after trying to pass through some hardware to a VM, so I did a hard reset of the system.

Could that have been the root cause of this issue as well?

 

Link to comment
  • 2 years later...

Thank you so much, I love this forum. Even when I don't find the specific thing I'm looking for, I always find a way to my goal.

In this case the "find -inum" command pointed me to this file:

/tmp/pkg/538087d6d87660382fef5ac4ab400402587f3f1d97e2d97cd5cc502f37c11e01/@vmngr/libvirt/build/Release/obj.target/virt/src/hypervisor-node.o

That's how I found out that my libvirt.img was corrupted. After deleting and recreating it, my issue was gone.
Unfortunately my VM is now gone too, but that's OK.

At least the "BTRFS warning" issue is solved!

Thanks again!
