Bitrot: Can bits on a hard disk unintentionally change?



Can bits on a hard disk unintentionally change after the data has been written to disk?

 

If yes:

How does consumer level hardware handle this?

How does Unraid handle this?

Will btrfs detect and easily solve this problem?

Does Unraid have tools to manage this?

 

Edited by Jaybau
Refine my question.
Link to comment

If hard disk ECC were all that was needed, then why do we have checksums via Btrfs, ZFS, Dynamix File Integrity, and SnapRAID?

There's even an Unraid Wiki document for checking disk/filesystems (https://wiki.unraid.net/index.php/Check_Disk_Filesystems).

 

This leads me to believe bit errors can silently occur, a problem that neither hard drive ECC nor Unraid handles, and one that requires additional maintenance to prevent, detect, and correct (hence the checksum tools mentioned, which are provided by third parties).


Link to comment
14 minutes ago, Jaybau said:

why do we have checksums via Btrfs, ZFS, Dynamix File Integrity, SnapRAID?

There's even an Unraid Wiki document for checking disk/filesystems

These problems are not typically the result of bitrot. There have been several recent examples on this forum where bad RAM caused corruption, and there are other causes. Filesystems can become corrupt from incomplete writes due to power loss, for example. And of course, malware can leave data not as it should be.

Link to comment
10 minutes ago, trurl said:

These problems are not typically the result of bitrot. There have been several recent examples on this forum where bad RAM caused corruption, and there are other causes. Filesystems can become corrupt from incomplete writes due to power loss, for example. And of course, malware can leave data not as it should be.

 

To be more clear regarding my original question:  Can bits on a hard disk unintentionally change after the data has been written to disk?


Link to comment
13 minutes ago, Jaybau said:

 

To be more clear regarding my original question:  Can bits on a hard disk unintentionally change after the data has been written to disk?


Yes, but it's unlikely (a random cosmic ray flipping a bit on the drive, for instance), and even then the drive would most likely report a read error. Data "decaying" in any reasonable amount of time is really unlikely.
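To illustrate why a single flipped bit would normally be caught rather than silently returned, here is a toy Hamming(7,4) code in Python. It is only a stand-in under stated assumptions: real drives use much stronger codes (such as LDPC) over whole sectors, and this is not how any drive firmware actually works. It just shows the principle that extra parity bits let the reader locate and correct one flipped bit.

```python
# Toy Hamming(7,4) code: a stand-in for the far stronger ECC real drives use.
# It can locate and correct any single flipped bit in a 7-bit codeword.

def encode(d: list[int]) -> list[int]:
    """Encode 4 data bits into 7 bits (positions 1..7, parity at 1, 2 and 4)."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4  # covers positions 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # covers positions 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]

def decode(c: list[int]) -> tuple[list[int], int]:
    """Correct up to one flipped bit; return (data bits, 1-based error position or 0)."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]  # re-check positions 1, 3, 5, 7
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]  # re-check positions 2, 3, 6, 7
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]  # re-check positions 4, 5, 6, 7
    syndrome = s1 + 2 * s2 + 4 * s4  # equals the position of the flipped bit
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]], syndrome

data = [1, 0, 1, 1]
word = encode(data)
word[4] ^= 1  # simulate a "cosmic ray" flipping the bit at position 5
fixed, pos = decode(word)
print(fixed, pos)  # [1, 0, 1, 1] 5
```

A toy code like this would miscorrect two flips in the same codeword. Real drive ECC tolerates far more, and when a sector is beyond what it can correct, the drive typically reports a read error instead of returning the data.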

Link to comment

When people define bitrot, they are mostly concerned about data decaying/degrading (rotting) over time (decades).

 

These two comments address the question of how "rotting" could be possible, and address the probability/risks:

1)  Random cosmic ray blasts a bit on the drive.

2)  Data "decaying" in any reasonable amount of time, really unlikely.

 

Hard disk ECC is not 100% reliable. For example:

Seagate IronWolf drives are rated at 1 unrecoverable read error per 10E14 bits read.

Seagate IronWolf Pro drives are rated at 1 unrecoverable read error per 10E15 bits read.

 

I do not know what the above really means for a home data hoarder. 1 per 10E14 = 1 URE per 12.5 terabytes read? 1 per 10E15 = 1 URE per 125 terabytes read? Is that the probability within the MTBR time frame?
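The arithmetic in the question can be checked with a short Python sketch. It reads "1 per 10E14" the way the 12.5 TB figure does, as one error per 10^14 bits read; note the rating is expressed per bits read, not per unit of time. The probability function adds an assumption that is not in the spec: every bit fails independently at exactly the rated maximum rate. Treat the result as a rough estimate based on the worst-case rating, not a prediction. Function names are hypothetical.

```python
import math

def bytes_per_ure(bits_per_error: float) -> float:
    """'1 per 10E14' read as one unrecoverable read error per 10**14 bits read."""
    return bits_per_error / 8

def p_at_least_one_ure(tb_read: float, bits_per_error: float) -> float:
    """Chance of >= 1 URE when reading tb_read terabytes (10**12 bytes).

    ASSUMPTION: independent bit errors at exactly the rated maximum rate.
    """
    bits_read = tb_read * 1e12 * 8
    p_bit = 1 / bits_per_error
    # 1 - (1 - p)^n, computed with log1p/expm1 to avoid rounding to 0 or 1
    return -math.expm1(bits_read * math.log1p(-p_bit))

for label, spec in [("IronWolf (1 per 10E14)", 1e14), ("IronWolf Pro (1 per 10E15)", 1e15)]:
    print(f"{label}: one URE per {bytes_per_ure(spec) / 1e12:.1f} TB read; "
          f"P(>=1 URE reading 12 TB) = {p_at_least_one_ure(12, spec):.0%}")
```

Under that assumption, reading 12 TB gives roughly a 62% chance of at least one URE on a 10E14 drive and about 9% on a 10E15 drive.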

 

I'm not sure how many bits would be rotted if such an event occurred (one bit over hundreds of years?).

Nor do I know how many rotted/decayed bits would make a difference, especially since the source is not perfect (movies, audio, photos, my eyes, my ears, headphones, TV display).

I'm not sure what 1 bit off is going to look or sound like (I presume the file will still be read as normal).
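A small Python sketch of what "1 bit off" means at the byte level: flipping one bit leaves the data the same length and fully readable with a single byte changed, while a stored checksum (SHA-256 here, an arbitrary choice) no longer matches. The helper `flip_bit` and the sample data are hypothetical. Whether a real photo, song, or movie shows a visible or audible glitch depends on the file format and on which bit flips, which this sketch does not model.

```python
import hashlib

def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with one bit inverted (simulated silent corruption)."""
    buf = bytearray(data)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)

original = b"Hello, this stands in for a photo, song, or movie file."
corrupted = flip_bit(original, 3)  # flip bit 3 of the first byte

# The data still "reads" normally: same length, only one byte differs.
print(len(original) == len(corrupted))                   # True
print(sum(a != b for a, b in zip(original, corrupted)))  # 1
print(corrupted[:5])                                     # b'@ello'

# A checksum recorded while the data was good no longer matches.
print(hashlib.sha256(original).hexdigest() == hashlib.sha256(corrupted).hexdigest())  # False
```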

 

Link to comment

That's not rot, that's an error. UREs would still be logged (in theory) as errors by the drive and can be recognized immediately if the OS monitors for them.

 

The theory is that this pretty much guarantees the death of a normal RAID implementation, since drives are now well above ~12 TB: a read failure during, for example, a RAID 5 rebuild will drop a disk, and the array will be lost. It's only theory, though, as the MTBURE is not set in stone. It's a guess at the chance, and even then the drive is likely to recover from the error anyway.
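The RAID 5 argument can be sketched the same way. Rebuilding one failed drive means reading every surviving drive in full, so the amount of data read grows with drive size. The function name and example array sizes are hypothetical, and the model assumes independent bit errors at the rated 10E14 maximum while ignoring retries and the drive recovering from the error, which is exactly why it is only theory.

```python
import math

def p_rebuild_hits_ure(n_drives: int, drive_tb: float, bits_per_error: float = 1e14) -> float:
    """Chance that a RAID 5 rebuild hits at least one URE.

    A rebuild reads all surviving drives in full: (n_drives - 1) * drive_tb TB.
    ASSUMPTION: independent bit errors at exactly the rated worst-case rate.
    """
    bits_read = (n_drives - 1) * drive_tb * 1e12 * 8
    return -math.expm1(bits_read * math.log1p(-1 / bits_per_error))

for size_tb in (4, 12, 18):
    print(f"4 x {size_tb} TB RAID 5, 10E14 drives: "
          f"{p_rebuild_hits_ure(4, size_tb):.0%} chance of a URE during rebuild")
```

Whether a single URE actually fails the rebuild may also depend on how the RAID implementation handles read errors, which this sketch does not model.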

 

IMHO, bit rot and UREs are WAAAAAAY less important to worry about than just keeping backups of your important data and verifying those backups.
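One way to verify backups is to compare content hashes of the original files against their copies. The sketch below assumes both trees are reachable as local paths; `verify_backup`, `file_sha256`, and the commented example paths are hypothetical, not part of Unraid or any plugin. A mismatch only says the two copies differ, not which one is correct; that is where checksums recorded while the files were known to be good (the approach of the checksum tools mentioned earlier in the thread) help.

```python
import hashlib
from pathlib import Path

def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large media files need not fit in RAM."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_backup(source: Path, backup: Path) -> list[str]:
    """List relative paths that are missing from, or differ in, the backup."""
    problems = []
    for src in sorted(p for p in source.rglob("*") if p.is_file()):
        rel = src.relative_to(source)
        dst = backup / rel
        if not dst.is_file():
            problems.append(f"missing: {rel}")
        elif file_sha256(src) != file_sha256(dst):
            problems.append(f"differs: {rel}")
    return problems

# Hypothetical usage; point these at your own share and backup location:
# for line in verify_backup(Path("/mnt/user/photos"), Path("/mnt/backup/photos")):
#     print(line)
```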

Link to comment
On 4/22/2022 at 12:22 AM, Jaybau said:

Will btrfs detect and easily solve this problem?

It will detect it, but since each array disk is an individual filesystem, it can't fix it; you can restore the affected file(s) from backups. For redundant pools, it can both detect and fix it.
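A conceptual Python model of the difference between detecting and repairing: each block carries a checksum recorded at write time. With only one copy (like a single array disk), a mismatch can only be reported; with a second copy (like a redundant pool), the bad copy can be rewritten from the good one. This is not btrfs code. btrfs defaults to crc32c checksums, and `zlib.crc32` stands in for that here; `Block`, `write_block`, and `scrub` are hypothetical names.

```python
import zlib
from dataclasses import dataclass

# Conceptual model only, NOT btrfs code: zlib.crc32 stands in for crc32c.

@dataclass
class Block:
    data: bytes
    checksum: int  # recorded when the block was written

def write_block(data: bytes, copies: int) -> list[Block]:
    """Store `copies` identical copies, each with its write-time checksum."""
    return [Block(data, zlib.crc32(data)) for _ in range(copies)]

def scrub(replicas: list[Block]) -> str:
    """Verify every copy; repair bad copies from a good one if any exists."""
    good = [b for b in replicas if zlib.crc32(b.data) == b.checksum]
    bad = [b for b in replicas if zlib.crc32(b.data) != b.checksum]
    if not bad:
        return "ok"
    if not good:
        return "error detected, cannot repair: restore from backup"
    for b in bad:
        b.data = good[0].data
    return f"repaired {len(bad)} of {len(replicas)} copies from a good copy"

single = write_block(b"family photo", copies=1)  # like one array disk
mirror = write_block(b"family photo", copies=2)  # like a redundant pool
single[0].data = b"family phoTo"  # simulate silent corruption
mirror[0].data = b"family phoTo"
print(scrub(single))  # error detected, cannot repair: restore from backup
print(scrub(mirror))  # repaired 1 of 2 copies from a good copy
```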

 

As mentioned, bitrot is way down the list of things to worry about; data corruption due to bad RAM or other hardware issues is much more common.

Link to comment
