harvany Posted April 30, 2021 (edited)
I got a BTRFS checksum failure on one of my array disks (btrfs dev stats shows "corrupt 1"). It's a single-disk BTRFS filesystem. This is the same HDD that randomly logged a single UDMA CRC error a couple of months ago. Is it possible the corruption occurred due to that UDMA CRC error but was only reported now? It's in an old, infrequently accessed file that is being seeded through torrent, so maybe it was never read until now.
I'm currently running a scrub. Does BTRFS only calculate/report checksum errors on access? I'm trying to figure out whether this is more than that random UDMA error (if so, I may need to replace the drive). SMART is healthy except for UDMA CRC error count = 1. What I can't explain, however, is that I believe the array has completed its scheduled parity checks since that UDMA error without any errors or corrections.
[edit] By the way, I am running ECC RAM and there are no errors in the BIOS log.
Edited April 30, 2021 by harvany (addendum)
JorgeB Posted April 30, 2021
3 hours ago, harvany said: Is it possible the corruption occurred due to that UDMA CRC error but was only reported now?
Unlikely; after those errors the transfer should be retried, but I can't say it's impossible.
3 hours ago, harvany said: Does BTRFS only calculate/report checksum errors on access?
Checksums are calculated when a block is written and verified every time it is read/accessed.
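A toy model may help show why an error in a rarely read file can surface long after it happened. It assumes one CRC-32C checksum per block (CRC-32C is the btrfs default data checksum); the `ToyDisk` class, its methods, and the block contents are hypothetical and do not reflect how btrfs actually stores checksums.

```python
# Toy model of write-time checksums verified on read, assuming CRC-32C per block.
# ToyDisk and its layout are hypothetical; this is not btrfs code.

def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0x82F63B78
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF


class ToyDisk:
    def __init__(self):
        self.blocks = {}   # block number -> bytes currently on "disk"
        self.csums = {}    # block number -> checksum recorded at write time
        self.corrupt = 0   # loosely analogous to the "corrupt" counter in dev stats

    def write(self, blockno: int, data: bytes) -> None:
        # The checksum is calculated once, when the block is written.
        self.blocks[blockno] = data
        self.csums[blockno] = crc32c(data)

    def read(self, blockno: int) -> bytes:
        # The checksum is verified on every read; a mismatch is only noticed here.
        data = self.blocks[blockno]
        if crc32c(data) != self.csums[blockno]:
            self.corrupt += 1
            raise OSError(f"checksum mismatch on block {blockno}")
        return data

    def scrub(self) -> list:
        # A scrub reads every block, so it also checks data nobody has accessed.
        bad = []
        for blockno in sorted(self.blocks):
            try:
                self.read(blockno)
            except OSError:
                bad.append(blockno)
        return bad


disk = ToyDisk()
disk.write(0, b"hot file")
disk.write(1, b"old torrent piece")
disk.blocks[1] = b"old torrent piecE"  # silent single-bit flip after the write
print(disk.read(0))   # b'hot file'
print(disk.corrupt)   # 0: block 1 has not been read yet
print(disk.scrub())   # [1]
print(disk.corrupt)   # 1
```

In this model nothing is reported until the damaged block is read, either by an application or by a scrub, which matches the cold, seeded file described in the first post.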
harvany (Author) Posted May 1, 2021 (edited)
I ran a scrub on all my disks. The only errors were on this same disk: 5 corruption errors in total (across 3 files). The extended SMART test on the disk also came back clean (reallocated, pending, and uncorrectable counts are all 0). The 3 files were all modified a year ago, within hours of each other, and their inode numbers are only about 10 apart. On the other hand, the single UDMA CRC error appeared 6 months ago, so per your comment it's even more likely to be unrelated.
Do you think this was non-hardware BTRFS corruption, either during writing or afterwards? Or some physical bitrot that didn't show up in SMART? Both would be pretty rare, but I can't think of other explanations, especially given that this is a single-disk BTRFS filesystem, there are no ECC errors in the BIOS log, and the extended SMART test found no errors.
[edit] Actually, physical bitrot should have resulted in a CRC error from the SMART test, so it's probably some random BTRFS software corruption.
Edited May 1, 2021 by harvany
JorgeB Posted May 2, 2021
18 hours ago, harvany said: Both would be pretty rare
They are, but I've never heard of unexplained btrfs data corruption. Were there never any parity sync errors detected since those files were first written?
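One way to see why the parity history is a useful clue is a simplified model that assumes single XOR parity across the data disks. If bits change on a data disk after parity was updated, a later parity check should report a mismatch; if the data was already wrong when it was written, parity was computed from the bad data and the check passes. The disk contents below are made up, and this is an illustrative sketch, not Unraid's actual parity code.

```python
# Simplified single-parity model, assuming byte-wise XOR parity.
# Disk contents are hypothetical; this is not Unraid's implementation.
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity_check(data_disks, parity):
    """True if the stored parity still matches what the data disks hold now."""
    return xor_blocks(data_disks) == parity


good = [b"\x10\x20", b"\x0f\x0f", b"\xaa\x55"]

# Case 1: data changes on disk after parity was computed (e.g. media bitrot).
disks = list(good)
parity = xor_blocks(disks)
disks[1] = b"\x0f\x0e"               # one flipped bit on disk 1
print(parity_check(disks, parity))   # False: a parity check would flag it

# Case 2: data was already wrong when written, so parity was computed from it.
disks = list(good)
disks[1] = b"\x0f\x0e"
parity = xor_blocks(disks)
print(parity_check(disks, parity))   # True: parity agrees with the bad data
```

Under these assumptions, clean parity checks since the files were written point away from the data changing on the platter afterwards, though the model ignores details such as correcting versus non-correcting checks.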