BTRFS warning: csum failed


ryoko227

Recommended Posts

This and multiple variants of this error are getting spammed across my system log since updating to 6.6.0:

BTRFS warning (device sdb1): csum failed root 5 ino 274 off 62905892864 csum 0x2d8eafc5 expected csum 0xea417956 mirror 1

The device in question is a single SSD mounted outside the array via the go file at startup.
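For reference, mounting a device outside the array from the go file looks roughly like this. This is a sketch, not the poster's actual go file: the mount point `/mnt/disks/vmssd` is a hypothetical name, and `/dev/sdb1` is taken from the log line above.

```shell
# /boot/config/go (excerpt) - Unraid's startup script
# Mount the standalone SSD outside the array at boot.
# /mnt/disks/vmssd is an assumed mount point; adjust to your setup.
mkdir -p /mnt/disks/vmssd
mount -t btrfs /dev/sdb1 /mnt/disks/vmssd
```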

 

Running stats on the device results in this:

[/dev/sdb1].write_io_errs    0
[/dev/sdb1].read_io_errs     0
[/dev/sdb1].flush_io_errs    0
[/dev/sdb1].corruption_errs  0
[/dev/sdb1].generation_errs  0

Scrubbing the device also gives 0 errors.

scrub started at Thu Sep 27 14:24:13 2018 and finished after 00:05:14
total bytes scrubbed: 153.88GiB with 0 errors
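The stats and scrub output above come from the standard btrfs-progs commands; a sketch, assuming the drive is mounted at a hypothetical `/mnt/disks/vmssd`:

```shell
# Print per-device error counters (the all-zero output shown above)
btrfs device stats /dev/sdb1

# Start a scrub and block until it finishes (-B), then check the result;
# scrub operates on the mounted filesystem, not the raw device
btrfs scrub start -B /mnt/disks/vmssd
btrfs scrub status /mnt/disks/vmssd
```

Note that `device stats` counters persist across reboots until reset with `btrfs device stats -z`, so zeros here mean the errors were never recorded against the device itself.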

I have copied all of the data off and reformatted the device. (This did have the added benefit of the UD plug-in now showing the FS, temp, and capacity.)

However, the errors continue.

 

 

I've read quite a few posts related to this, but still haven't found a solution on my own.

I admittedly don't know enough about the btrfs file system to understand which file(s)/metadata are causing the error.

Any help would be greatly appreciated.

Thank you in advance o/

 

 

yes-mediaserver-diagnostics-20180927-1436.zip


Should I try running memtest or something to troubleshoot it further? I'm thinking that might help with trying to sort out which piece of hardware might be causing it. I'm also going to check and see if there is a new BIOS for this specific motherboard as well. Thank you as always for your help, johnnie!

 

EDIT-

Updated the BIOS, as it specifically listed increased memory compatibility. Also took the time to clean the dust out and reseat everything. Still popping the error. I'll run memtest on it later, when people aren't using it, and see if that identifies the issue.

  • 4 weeks later...

Just a follow-up to this, since I thought it was weird, and also so I can check back later when I forget what I did, www

 

So I never got around to running the memtest, partially out of laziness, mostly out of not wanting to stay after hours.

I noticed yesterday, that the mounting point for the drive no longer seemed to have any files or folders in it.

So I tried copying the backups over just to see, and it said the disk was full...

 

I decided I would try reformatting the drive (again, just to see), so I pulled the mount commands from the go file and rebooted.

Drive showed up totally fine in Unassigned Devices, could mount with no issues, and all the files and folders were still there.

So, I set it to automount, changed my VM settings to point to this new mounting point, and profit.

EDIT - Even with multiple reboots nothing has changed... weird

 

Everything seems to be running fine and I have had 0 checksum errors since.

Keep in mind, I did the 6.6.3 update right before all of this. So TBH, I don't know what "fixed" it, or even what was ultimately wrong with it.

 

I know I should still run the memtest to verify.

  • 1 year later...
7 hours ago, Marshalleq said:

I've heard more than one person say BTRFS often corrupts.

I won't argue that btrfs doesn't have its bugs, but I've been using it for a long time, as well as following development on the mailing list, and I've never heard of any data-checksum-related bug; that feature is pretty much bulletproof. That is, a checksum error means the data doesn't match the checksum stored at write time. In Unraid this happens most often with raid-based pools when one of the members drops offline and then comes back online: the old data will be stale and fail checksums, and a scrub will bring it up to date. But if this is happening on a single-device filesystem, you can be pretty sure data corruption occurred, or there's a hardware problem, like bad RAM.
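The stale-member recovery described above can be sketched as follows; `/mnt/cache` is Unraid's usual pool mount point and is an assumption here:

```shell
# After a raid pool member dropped offline and came back, a scrub
# reads every block, verifies checksums, and rewrites stale copies
# from the good mirror
btrfs scrub start -B /mnt/cache

# The status output reports how many errors were found and corrected
btrfs scrub status /mnt/cache
```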

 

7 hours ago, Marshalleq said:

It's only BTRFS complaining though, not XFS

Well, XFS would never complain, since it doesn't checksum data; it will happily feed you corrupted data.

 

I also use ZFS for a couple of servers, no doubt more stable than btrfs, but it's not perfect, and not as flexible.

 

 

4 hours ago, johnnie.black said:

I won't argue that btrfs doesn't have its bugs, if this is happening on a single device filesystem then you can be pretty sure data corruption occurred, or there's a hardware problem, like bad RAM.

It's a BTRFS mirror.  I wouldn't have chosen BTRFS if I had any other choice.  Over the few years I've been using it I've had 3, maybe 4, issues, all different.  I think there was one other where I needed to format and start again, and two related to the mirror not being created as advertised - that's probably an Unraid bug more than a btrfs bug, though.

 

4 hours ago, johnnie.black said:

Well, XFS would never complain, since it doesn't ckecskum data, it will happy feed you corrupted data.

The context was provided to indicate it likely wasn't a whole-system issue impacting disks, as one other person on here seemed to think when I dug back into the archives.

 

I'll just invoke the mover and reformat it.  I'm half tempted to get rid of the mirror altogether, as I think a single disk will have fewer issues than BTRFS will.  But provided it doesn't actually corrupt my data, I'll keep the mirror.  Though the jury's out on that one.


Thanks, but it has impacted both BTRFS on the mirror and the btrfs docker image.  I assume it must have been caused when I had to force-restart the box the other day due to an Nvidia lockup.  I don't really expect otherwise, but btrfs does seem to be more picky about such things, and I would have thought a scrub or repair would have sorted it.  But no.


Thanks, the saga continues.  Fixed all this up, but now I'm getting "transport endpoint is not connected" on /mnt/user.  Also lots of segfaults when docker tries to run, and the shares have all disappeared.  Fun times.  I officially dislike BTRFS - even though I recognise it may not be entirely the fault of BTRFS - I have to blame something until I know better! :D

On 2/26/2020 at 7:37 PM, Marshalleq said:

 I officially dislike BTRFS - even though I recognise it may not be entirely the fault of BTRFS - I have to blame something until I know better!

Just for any future reader: as suspected, this wasn't a btrfs problem:

 

https://forums.unraid.net/topic/41333-zfs-plugin-for-unraid/?do=findComment&comment=828132

On 2/29/2020 at 7:21 PM, Marshalleq said:

Edit 2: Memtest confirms I have faulty, or possibly misconfigured memory.  There goes my morning....


Yep, your gut was right, it was a faulty memory stick.  Through this exercise I have learnt the following:

 

I don't know how long the memory was faulty - my assumption is many months; it was at about 79GB, so it may not always have been in use.  Also that:

 

  1. All file systems were impacted
  2. The BTRFS file system had to be formatted to recover, because it wouldn't rebalance.
  3. The BTRFS docker image had to be deleted and recreated (or an older version restored from backup), because it wouldn't repair.
  4. ZFS clearly pointed me directly at the corrupted file (on my single-disk ZFS volume) so I could restore it, which was nice.
  5. The ZFS mirror healed with a simple scrub.
  6. XFS of course just ran an fsck-type thing, so something could still be lingering, but that data is not very important - that's why it's on XFS.  I'd like something more robust, but short of memory errors like this and cold reboots, it's probably pretty safe for what it is.
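For items 4 and 5, the standard ZFS commands behave as described; a sketch with a placeholder pool name ("tank" is hypothetical, not the poster's pool):

```shell
# Item 4: on a single-disk vdev, "status -v" lists the exact files
# with unrecoverable checksum errors, so you know what to restore
zpool status -v tank

# Item 5: on a mirror, a scrub reads both sides and repairs bad
# copies from the good one; status reports "scrub repaired ..." when done
zpool scrub tank
zpool status tank
```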

I do believe BTRFS would also point me at the corrupted file; maybe it did, my memory on that is struggling.

 

It has been a good exercise.  It's possible someone with more knowledge of BTRFS could have fixed it, though a corrupted image file didn't sound very fixable to me so I elected to start again because I don't trust it based on past experience.  I could be completely wrong, but that's where I landed.

