BTRFS checksum failure questions


I got a BTRFS checksum failure on one of my array disks (btrfs dev stats shows "corrupt 1"). It's a single-disk BTRFS filesystem. This is the same HDD that randomly logged a single UDMA CRC error a couple of months ago. Is it possible the corruption occurred due to that UDMA CRC but was only reported now? It's in an old, infrequently accessed file that is being seeded through torrent, so maybe it was never accessed until now. I'm currently running a scrub.

 

Does BTRFS only calc/report checksum errors on access? I'm trying to figure out whether this is something more than that random UDMA error (if so, I may need to replace the drive). SMART is healthy except for the UDMA CRC error count = 1.

 

What I can't explain, however, is that the array has completed scheduled parity checks since that UDMA error, and I believe they reported no errors or corrections.

 

[edit] By the way, I am running ECC RAM and there are no errors in the BIOS log.

3 hours ago, harvany said:

Is it possible the corruption occurred due to that UDMA CRC but was only reported now?

Unlikely. After a UDMA CRC error the transfer should be retried, but I can't say it's impossible.

 

3 hours ago, harvany said:

Does BTRFS only calc/report checksum errors on access?

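Checksums are calculated when a block is written and checked every time it's read/accessed, so corruption in a file that is never read stays unreported until something (an access or a scrub) reads it.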
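To make the write-time/read-time behaviour concrete, here is a toy sketch in Python. It is not how BTRFS is implemented: the `ChecksummedStore` class and its method names are made up for illustration, and it uses `zlib.crc32` as a stand-in checksum. The point is only that a block damaged after it was written raises no error until something reads it back.

```python
import zlib


class ChecksummedStore:
    """Toy block store (hypothetical, illustration only):
    checksum on write, verify on every read."""

    def __init__(self):
        # block number -> (mutable data, checksum recorded at write time)
        self._blocks = {}

    def write(self, block_no, data):
        # The checksum is computed once, when the block is written.
        self._blocks[block_no] = (bytearray(data), zlib.crc32(data))

    def read(self, block_no):
        # The checksum is re-computed and compared on every read.
        data, stored = self._blocks[block_no]
        if zlib.crc32(data) != stored:
            raise OSError(f"checksum mismatch on block {block_no}")
        return bytes(data)

    def corrupt(self, block_no, offset=0):
        # Simulate silent corruption: flip one bit, leave the checksum alone.
        self._blocks[block_no][0][offset] ^= 0x01

    def scrub(self):
        # A scrub reads every block, so it finds damage in blocks
        # that nothing has accessed since they were corrupted.
        bad = []
        for block_no in sorted(self._blocks):
            try:
                self.read(block_no)
            except OSError:
                bad.append(block_no)
        return bad


store = ChecksummedStore()
store.write(0, b"frequently read file")
store.write(1, b"old file nobody touches")
store.corrupt(1)       # nothing is reported at this point
store.read(0)          # healthy block reads fine
print(store.scrub())   # the damaged block is only found when it is read
```

In this model the time of corruption and the time of the report are decoupled, which is why an old, rarely read file can surface an error long after the damage happened.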

 

 


I ran a scrub on all my disks. The only errors were on this same disk: 5 corruption errors in total (across 3 files). Also, the extended SMART test on the disk came back clean (reallocated, pending, and uncorrectable all 0).

 

The 3 files were all modified a year ago within hours of each other and inode numbers are just 10 apart or so. On the other hand, the single UDMA CRC error appeared 6 months ago, so even more likely to be unrelated per your comment.

 

Do you think this was non-hardware BTRFS corruption, either during writing or afterwards? Or some physical bitrot that didn't show up in SMART? Both would be pretty rare, but I can't think of other explanations, especially given that this is a single-disk BTRFS filesystem with no ECC errors in the BIOS log and no errors from the extended SMART test.

 

[edit] Actually, physical bitrot should have resulted in a CRC error from the SMART test, so this is probably some random BTRFS software corruption.
