weirdcrap Posted May 21, 2020 (edited)

Unraid v6.8.3

void-diagnostics-20200521-0651.zip <--- diagnostics before any troubleshooting.

I'm receiving the following error from my cache drive; it is always the same inode #:

BTRFS warning (device sdg1): csum failed root 5 ino 156381873 off 143360 csum 0xf58f6015 expected csum 0xf58f6055 mirror 1

I ran a find on the inode # and it is an Emby poster:

find /mnt/cache -inum 156381873
/mnt/cache/appdata/EmbyServer/data/collections/Toy Story Collection [boxset]/poster.jpg

I just finished a scrub:

May 21 07:08:25 VOID ool www[20615]: /usr/local/emhttp/plugins/dynamix/scripts/btrfs_scrub 'start' '/mnt/cache' '-r'
May 21 07:10:32 VOID kernel: BTRFS warning (device sdg1): checksum error at logical 1627265449984 on dev /dev/sdg1, physical 57454903296, root 5, inode 156381873, offset 143360, length 4096, links 1 (path: appdata/EmbyServer/data/collections/Toy Story Collection [boxset]/poster.jpg)
May 21 07:10:32 VOID kernel: BTRFS error (device sdg1): bdev /dev/sdg1 errs: wr 0, rd 0, flush 0, corrupt 2, gen 0

UUID: cc9f1614-fc5d-406a-8ee7-58a5651dc9ae
Scrub started: Thu May 21 07:08:25 2020
Status: finished
Duration: 0:03:08
Total to scrub: 75.15GiB
Rate: 409.40MiB/s
Error summary: csum=1
Corrected: 0
Uncorrectable: 0
Unverified: 0

Should I attempt to repair the corrupted block with BTRFS scrub? Or should I just delete the affected file and let it be regenerated?

I plan on running a memtest later to ensure it isn't bad RAM, though I think if it were bad RAM I would have more than just one bad file after this error has been going on for over a week.

According to SMART, the SSD appears to have at least one pending/reallocated sector (a reserve block has been used): void-smart-20200521-0711.zip

I wanted to check how much data I have written to this cache drive, so I found a calculator. The result seems wildly out of bounds for a 3 1/2 year old SSD; there is no way I have written 300TB through this drive.
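Any TB-written figure from such a calculator depends on the unit the drive's firmware uses for its write counter, and that unit varies by vendor. An attribute named Total_LBAs_Written commonly counts 512-byte sectors, while smartctl's drive database uses names such as Host_Writes_32MiB or Lifetime_Writes_GiB where the firmware counts in larger units. The sketch below is a hypothetical helper, not the calculator from the post, and its raw value is made up; it only shows how far apart the results are for the same counter under different unit assumptions.

```python
# Hypothetical helper: convert a raw SMART write counter into terabytes.
# ASSUMPTION: the unit of the counter is vendor-specific; check smartctl's
# attribute name or the drive's documentation before trusting any result.
UNIT_BYTES = {
    "512-byte sectors": 512,        # e.g. an attribute named Total_LBAs_Written
    "32 MiB": 32 * 1024 ** 2,       # e.g. an attribute named Host_Writes_32MiB
    "1 GiB": 1024 ** 3,             # e.g. an attribute named Lifetime_Writes_GiB
}


def tb_written(raw_value: int, unit: str) -> float:
    """Return decimal terabytes (10**12 bytes) for a counter measured in `unit`."""
    return raw_value * UNIT_BYTES[unit] / 10 ** 12


if __name__ == "__main__":
    raw = 20_000_000  # made-up raw value, NOT taken from the thread
    for unit in UNIT_BYTES:
        print(f"{unit:>16}: {tb_written(raw, unit):,.2f} TB")
```

The same made-up raw value comes out at roughly 0.01 TB, 671 TB or 21,475 TB depending on the assumed unit, so an implausible total is a reason to double-check the unit before concluding the drive is worn.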
BTW, why aren't warnings like this picked up by FCP (Fix Common Problems)? It would be nice to have BTRFS errors reported (a notification generated) for those with cache devices.

Edited May 21, 2020 by weirdcrap: added more info
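FCP's own checks aren't described in this thread. As a hypothetical sketch of the kind of check being asked for, the code below scans syslog lines for the two BTRFS messages quoted here, counts "csum failed" lines per (device, subvolume root, inode), and keeps the latest per-device error counters. The class and function names are made up, and the regexes assume exactly the message wording shown in this thread; other kernel versions may phrase these messages differently.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Tuple

# Regexes modelled on the kernel messages quoted in this thread (ASSUMPTION:
# the wording may differ on other kernel versions).
CSUM_RE = re.compile(
    r"BTRFS warning \(device (?P<dev>[^)]+)\): csum failed root (?P<root>\d+) "
    r"ino (?P<ino>\d+) off (?P<off>\d+) csum (?P<csum>0x[0-9a-f]+) "
    r"expected csum (?P<expected>0x[0-9a-f]+) mirror (?P<mirror>\d+)"
)
ERRS_RE = re.compile(
    r"BTRFS error \(device (?P<dev>[^)]+)\): bdev \S+ errs: wr (?P<wr>\d+), "
    r"rd (?P<rd>\d+), flush (?P<flush>\d+), corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)
COUNTERS = ("wr", "rd", "flush", "corrupt", "gen")


@dataclass
class BtrfsReport:
    # (device, subvolume root, inode) -> number of "csum failed" lines seen
    csum_failures: Dict[Tuple[str, int, int], int] = field(default_factory=dict)
    # device -> most recent per-device error counters
    counters: Dict[str, Dict[str, int]] = field(default_factory=dict)

    def has_problems(self) -> bool:
        """True if any checksum failure or any nonzero error counter was seen."""
        return bool(self.csum_failures) or any(
            any(v > 0 for v in c.values()) for c in self.counters.values()
        )


def scan_syslog(lines) -> BtrfsReport:
    """Hypothetical checker: collect BTRFS checksum failures and error counters."""
    report = BtrfsReport()
    for line in lines:
        m = CSUM_RE.search(line)
        if m:
            key = (m["dev"], int(m["root"]), int(m["ino"]))
            report.csum_failures[key] = report.csum_failures.get(key, 0) + 1
            continue
        m = ERRS_RE.search(line)
        if m:
            report.counters[m["dev"]] = {k: int(m[k]) for k in COUNTERS}
    return report


if __name__ == "__main__":
    sample = [
        "May 21 07:10:32 VOID kernel: BTRFS error (device sdg1): bdev /dev/sdg1 "
        "errs: wr 0, rd 0, flush 0, corrupt 2, gen 0",
        "BTRFS warning (device sdg1): csum failed root 5 ino 156381873 off 143360 "
        "csum 0xf58f6015 expected csum 0xf58f6055 mirror 1",
    ]
    report = scan_syslog(sample)
    if report.has_problems():
        for (dev, root, ino), n in report.csum_failures.items():
            print(f"{dev}: {n} csum failure(s) in root {root}, inode {ino}")
        for dev, c in report.counters.items():
            print(f"{dev}: {c}")
```

Keying on the subvolume root as well as the inode matters because btrfs inode numbers are only unique within one subvolume.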
JorgeB Posted May 21, 2020

It means that block doesn't have the checksum it should have, i.e., the data is corrupt. You can fix it by deleting the file or overwriting it. The most likely cause would be the SSD: there's a bad block that was reallocated:

183 Runtime_Bad_Block PO--C- 099 099 010 - 1

The SSD firmware shouldn't reallocate a block containing data without being able to write it correctly to another place, but it's known to happen; it happened to me a few years ago with a SanDisk SSD. Another possibility would be a one-time RAM bit flip; if it were bad RAM in general it would likely cause more issues.
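For reading attribute lines like the one quoted above, here is a minimal sketch that splits a `smartctl -x` attribute row into its columns. It assumes the layout shown (ID#, ATTRIBUTE_NAME, FLAGS, VALUE, WORST, THRESH, FAIL, RAW_VALUE) with numeric normalized columns; the helper names are hypothetical. For Runtime_Bad_Block it reports a raw count of 1 while the normalized value (099) stays far above the threshold (010), which fits the reading above: one block was retired, but the attribute is not failing.

```python
from typing import NamedTuple


class SmartAttr(NamedTuple):
    id: int
    name: str
    flags: str
    value: int    # normalized current value
    worst: int    # normalized worst value seen
    thresh: int   # normalized failure threshold; 0 means it cannot fail
    fail: str     # "-" if the attribute has never failed
    raw: str      # raw value, kept as text because its format varies


def parse_attr_line(line: str) -> SmartAttr:
    """Parse one attribute row as printed by `smartctl -x`.

    Columns: ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
    """
    parts = line.split(None, 7)  # RAW_VALUE may itself contain spaces
    if len(parts) != 8:
        raise ValueError(f"unexpected attribute line: {line!r}")
    id_, name, flags, value, worst, thresh, fail, raw = parts
    return SmartAttr(int(id_), name, flags, int(value), int(worst),
                     int(thresh), fail, raw.strip())


def is_failing(attr: SmartAttr) -> bool:
    """Failing when the normalized value is at or below a nonzero threshold."""
    return attr.thresh > 0 and attr.value <= attr.thresh


if __name__ == "__main__":
    attr = parse_attr_line("183 Runtime_Bad_Block PO--C- 099 099 010 - 1")
    print(attr.name, "raw:", attr.raw, "failing:", is_failing(attr))
```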
weirdcrap Posted May 21, 2020 (author)

johnnie.black said: "It means that block doesn't have the checksum it should have, i.e., the data is corrupt. You can fix it by deleting the file or overwriting it. The most likely cause would be the SSD: there's a bad block that was reallocated: 183 Runtime_Bad_Block PO--C- 099 099 010 - 1. The SSD firmware shouldn't reallocate a block containing data without being able to write it correctly to another place, but it's known to happen; it happened to me a few years ago with a SanDisk SSD. Another possibility would be a one-time RAM bit flip; if it were bad RAM in general it would likely cause more issues."

OK, cool, so I don't necessarily need to do a scrub repair? Neat, I'll just delete the file then. Thanks for the reassurance 😃
JorgeB Posted May 21, 2020

Not a bad idea to run a scrub to check for more corruption, but since it's a single-device pool it can only detect corruption, not fix it.
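A toy model (not btrfs code) of why a single-device pool can only detect corruption: a scrub verifies each copy of a block against its stored checksum and can only repair from a copy that still matches, which requires a second copy such as a RAID1 pool provides. Here zlib.crc32 stands in for btrfs's default crc32c, and the function names are hypothetical.

```python
import zlib
from typing import Optional, Sequence, Tuple


def checksum(block: bytes) -> int:
    # Stand-in only: btrfs uses crc32c by default; zlib.crc32 is plain CRC-32.
    return zlib.crc32(block)


def scrub_block(copies: Sequence[bytes], expected: int) -> Tuple[str, Optional[bytes]]:
    """Toy model of scrubbing one block stored in len(copies) places.

    Returns ("ok", data), ("corrected", good_copy) or ("uncorrectable", None).
    A real scrub would also rewrite the bad copies from the good one.
    """
    if not copies:
        raise ValueError("need at least one copy")
    good = [c for c in copies if checksum(c) == expected]
    if len(good) == len(copies):
        return "ok", copies[0]
    if good:
        return "corrected", good[0]
    return "uncorrectable", None


if __name__ == "__main__":
    data = b"poster.jpg block" * 256
    expected = checksum(data)
    flipped = bytes([data[0] ^ 0x01]) + data[1:]  # simulate one flipped bit

    print(scrub_block([flipped], expected))           # one copy: detect only
    print(scrub_block([flipped, data], expected)[0])  # two copies, e.g. RAID1
```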
weirdcrap Posted May 21, 2020 (author, edited)

johnnie.black said: "Not a bad idea to run a scrub to check for more corruption, but since it's a single-device pool it can only detect corruption, not fix it."

Yeah, I ran a second scrub after deleting the corrupted file and it reports no further errors:

UUID: cc9f1614-fc5d-406a-8ee7-58a5651dc9ae
Scrub started: Thu May 21 07:58:40 2020
Status: finished
Duration: 0:02:48
Total to scrub: 75.17GiB
Rate: 458.17MiB/s
Error summary: no errors found

Thanks for reminding me that scrub can't repair anything without a redundant (multi-device) pool; I forgot that was the case.

Edited May 21, 2020 by weirdcrap
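To check scrub results like the two reports in this thread from a script, a minimal sketch that pulls the counters out of the "Error summary" section of `btrfs scrub status` output. The function name is hypothetical, and it assumes the wording shown in this thread, which may differ between btrfs-progs versions.

```python
import re
from typing import Dict


def scrub_error_summary(status_text: str) -> Dict[str, int]:
    """Extract error counters from `btrfs scrub status` output.

    Returns {} when the summary says "no errors found"; otherwise a dict such
    as {"csum": 1, "Corrected": 0, "Uncorrectable": 0, "Unverified": 0}.
    ASSUMPTION: wording matches the btrfs-progs output quoted in this thread.
    """
    m = re.search(r"Error summary:\s*(.*)", status_text)
    if m is None:
        raise ValueError("no 'Error summary:' line in scrub status output")
    summary = m.group(1).strip()
    if summary.startswith("no errors found"):
        return {}
    # Per-type error counts, e.g. "csum=1".
    counts = {k: int(v) for k, v in re.findall(r"(\w+)=(\d+)", summary)}
    # Repair outcome counters that follow the summary line.
    for k, v in re.findall(r"\b(Corrected|Uncorrectable|Unverified):\s*(\d+)", status_text):
        counts[k] = int(v)
    return counts


if __name__ == "__main__":
    first = "Status: finished\nError summary: csum=1\nCorrected: 0\nUncorrectable: 0\nUnverified: 0"
    second = "Status: finished\nError summary: no errors found"
    print(scrub_error_summary(first))
    print(scrub_error_summary(second) or "no errors found")
```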
zamri Posted October 23, 2021

I learned that if there's a checksum error and the affected file(s) are included in a snapshot, btrfs send will fail.
JorgeB Posted October 24, 2021

zamri said: "btrfs send will fail"

That's normal: btrfs aborts any file operation with an I/O error if data corruption is detected, so you don't unknowingly copy corrupt data.
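Because btrfs refuses to return data that fails its checksum (the read fails with an I/O error, as described above), simply reading every file is one way to list the affected files on a single-device pool. A minimal sketch with a hypothetical function name; a scrub remains the more complete check, since it also verifies metadata, and files already in the page cache may be served from memory without re-reading the on-disk copy.

```python
import errno
import os
from typing import List


def find_unreadable_files(root: str, chunk_size: int = 1 << 20) -> List[str]:
    """Read every regular file under `root`; return the paths that fail with EIO.

    On btrfs a checksum mismatch with no good copy surfaces as EIO on read,
    so this lists files whose data is currently unreadable. Other errors
    (e.g. permission denied) are skipped, not reported.
    """
    unreadable = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # skip symlinks, sockets, device nodes, ...
            try:
                with open(path, "rb") as f:
                    while f.read(chunk_size):  # read to EOF, discarding data
                        pass
            except OSError as exc:
                if exc.errno == errno.EIO:
                    unreadable.append(path)
    return unreadable
```

For example, `find_unreadable_files("/mnt/cache/appdata")` reads everything under that share once, so on a large pool it takes about as long as reading all of that data.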
snowy00 Posted January 12, 2022

Hello, I have a similar error on my system:

Jan 12 11:11:32 Tower kernel: BTRFS warning (device loop2): csum failed root 5 ino 70328 off 2191360 csum 0x9fca66a3 expected csum 0xa9d3c4a6 mirror 1
Jan 12 11:11:32 Tower kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 0, rd 0, flush 0, corrupt 3792, gen 0
Jan 12 11:16:15 Tower kernel: BTRFS warning (device loop2): csum failed root 5 ino 70328 off 2191360 csum 0x9fca66a3 expected csum 0xa9d3c4a6 mirror 1
Jan 12 11:16:15 Tower kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 0, rd 0, flush 0, corrupt 3793, gen 0

I tried to find the corrupt file with the find command, but nothing was found:

find /mnt/user -inum 70328

Any recommendations on how to solve that? Thanks!
JorgeB Posted January 12, 2022

Without the diagnostics I can't say for sure, but loop2 is usually the docker image; if so, just delete and recreate it.
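To confirm which image a loopN device in the log belongs to without posting diagnostics: the kernel exposes the backing file of each attached loop device in sysfs, and `losetup -l` shows the same information. A minimal sketch; the function name is hypothetical, and the `sysfs_block` parameter exists only so it can be pointed at another directory for testing.

```python
import os
from typing import Optional


def loop_backing_file(device: str, sysfs_block: str = "/sys/block") -> Optional[str]:
    """Return the image file behind a loop device such as "loop2" or "/dev/loop2".

    Reads <sysfs_block>/<loopN>/loop/backing_file, which the kernel provides
    for attached loop devices; returns None if that file does not exist.
    """
    name = os.path.basename(device)
    path = os.path.join(sysfs_block, name, "loop", "backing_file")
    try:
        with open(path) as f:
            return f.read().strip()
    except FileNotFoundError:
        return None


if __name__ == "__main__":
    print(loop_backing_file("loop2"))
```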
snowy00 Posted January 12, 2022

Here is my diagnostics file: tower-diagnostics-20220112-1136.zip
snowy00 Posted January 12, 2022

JorgeB said: "Without the diagnostics I can't say for sure, but loop2 is usually the docker image; if so, just delete and recreate it."

Yes, you are right, that is the docker image.
JorgeB Posted January 12, 2022

Just re-create it then: https://forums.unraid.net/topic/57181-docker-faq/?do=findComment&comment=564309
JorgeB Posted January 12, 2022

Also note that this is usually the result of bad RAM, so it's a good idea to run memtest.
snowy00 Posted January 12, 2022 (edited)

The error is gone after re-creating the docker.img, thank you very much for your help!

The RAM is new, roughly 8 months old. How can I run memtest? I'm using an ASRock Fatal1ty B450 board.

About a month ago I had a system crash after trying to pass through some hardware to a VM, so I did a hard reset of the system. Could that have been the root cause of this issue as well?

Edited January 12, 2022 by snowy00
JorgeB Posted January 12, 2022

snowy00 said: "Could that have been the root cause of this issue as well?"

No, that won't cause checksum errors. You can run memtest from Unraid's boot menu (Legacy/CSM boot only).
scissabob Posted February 2

Thank you so much, I love this forum... Even if I don't find the specific thing, I always find a way to my goal. In this case the "find -inum" command pointed me to this file:

/tmp/pkg/538087d6d87660382fef5ac4ab400402587f3f1d97e2d97cd5cc502f37c11e01/@vmngr/libvirt/build/Release/obj.target/virt/src/hypervisor-node.o

That's how I found out that my libvirt.img was corrupted. After deleting and re-creating it, my issue was gone. Unfortunately my VM is now gone too, but that's OK. At least the "BTRFS warning" issue is solved! Thanks again!
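Inode numbers are only unique within one filesystem, and on btrfs within one subvolume (the "root 5" in these log lines is the subvolume ID, 5 being the top-level one). A search therefore works best when it starts on the filesystem that logged the error, such as the cache mount for sdg1 earlier in the thread, or wherever a loop device's image is mounted for loopN; searching from / or across several mounts can also return unrelated files that happen to share the number. A minimal Python sketch of the same search restricted to one filesystem; the function name is hypothetical.

```python
import os
from typing import List


def find_by_inode(top: str, inode: int) -> List[str]:
    """Return paths under `top` whose inode number equals `inode`.

    Roughly `find TOP -xdev -inum INODE`: subdirectories on a different
    st_dev are pruned, so matches from other filesystems (which reuse inode
    numbers) are excluded. On btrfs each subvolume usually reports its own
    st_dev, so this also stays inside the subvolume that `top` is in.
    """
    top_dev = os.lstat(top).st_dev
    matches = []
    for dirpath, dirnames, filenames in os.walk(top):
        # Prune subdirectories that live on another device (like -xdev).
        dirnames[:] = [
            d for d in dirnames
            if os.lstat(os.path.join(dirpath, d)).st_dev == top_dev
        ]
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.lstat(path).st_ino == inode:  # lstat: don't follow symlinks
                matches.append(path)
    return matches


if __name__ == "__main__":
    import sys
    if len(sys.argv) == 3:
        print("\n".join(find_by_inode(sys.argv[1], int(sys.argv[2]))))
```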