Should I worry about bit rot on my new server?


ph0ton


Hi guys,

 

I am researching my options wrt. a home server. I have read lots of pages about hardware on your (very) friendly forums and tried out Unraid on a small test server, and it seems very straightforward and easy to use. As I understand it, Unraid works by making realtime parity calculations between the data drives and a parity drive. A clever share layer is placed on top of the drive partitions, so shares can span multiple drives. It is also able to use an SSD to give burst write performance to the array.
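To make the parity idea concrete, here is a minimal sketch of single-parity protection using XOR. It is an illustration only, not Unraid's actual code; the block contents and the `xor_blocks` helper are made up for the example.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Hypothetical contents of the same block on three data drives.
data_drives = [b"movie-01", b"photo-02", b"music-03"]

# The parity drive stores the XOR of that block across all data drives.
parity = xor_blocks(data_drives)

# Simulate losing drive 1 and rebuilding it from parity plus the survivors.
survivors = [d for i, d in enumerate(data_drives) if i != 1]
rebuilt = xor_blocks(survivors + [parity])
```

Because every write to a data drive also has to update the parity block, parity stays current at all times, which is what "realtime" means here.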

 

I was fairly convinced that I was going to use Unraid. However, the old saying "the more I learn, the less I know" seems to have struck me hard. I have 3 options to choose from:

 

Unraid:

+ Burn USB & go

+ Power efficient data access (only the parity drive and the required data drive spin up during access)

+ Core docker and VM features for easy server application management

+ Very active community

- No bit rot / silent error protection

- Some of the software features seem a little "homemade" (e.g. the cache is working on file level, not on a block/sector level - Please lecture me if I'm wrong )

 

Debian / Snapraid install:

+ Debian is very well supported. Almost all issues can be googled (and I am not intimidated by the command line)

+ Mature package system

+ Snapraid seems fairly simple to set up, although it is not exactly a turnkey solution (you need to do some home cooking on a maintenance script to get the ball rolling).

+ I can use OpenMediaVault for the day-to-day management.

+ Snapraid offers bit rot protection and recovery.

- Snapraid is maintained by one man, although his software seems very mature

- Snapraid does not offer realtime protection. If a disk fails, and you have deleted a lot of files after the last sync, then you might not be able to recover all data from the failed drive.
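The snapshot limitation can be shown with a small sketch. It assumes plain XOR parity and made-up block contents, and it models a deletion as the old blocks no longer being available; it is not Snapraid's real on-disk format.

```python
def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

# Hypothetical block contents on three data disks at the time of the last sync.
disk_a, disk_b, disk_c = b"AAAA", b"BBBB", b"CCCC"
snapshot_parity = xor_blocks([disk_a, disk_b, disk_c])

# After the sync, files on disk B are deleted and the space is reused.
disk_b_now = b"zzzz"

# Disk A then fails before the next sync. Rebuilding it needs B as it was at sync time.
rebuilt_with_stale_parity = xor_blocks([snapshot_parity, disk_b_now, disk_c])
rebuilt_if_b_unchanged = xor_blocks([snapshot_parity, disk_b, disk_c])
```

Only `rebuilt_if_b_unchanged` matches the lost disk; the other result is garbage, because the parity no longer describes what is currently on disk B.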

 

Freenas:

+ Huge user base (or that's what iXsystems wants you to think)

+ The user interface seems reasonably easy to use.

+ Apps and plugins + docker and VMs

+ ZFS is very advanced, including bit rot protection.

- Huge upfront costs: each RAID level requires a specific set of disks, plus one GB of RAM per TB of disk space

- Expanding the array is costly, since vdevs can't be extended with new disks (although pools can). Upgrade is disk in - resilver - repeat

- All disks need to spin when accessing your pool.

- (I was going to write something about the way people talk to each other on the freenas forums, but I deleted it)

 

I am fairly certain that I won't be using Freenas (artillery guns to hit wasps). Snapraid was a favourite, but the file deletion / recoverability limitation is an issue for me. In both cases it is mostly the bit rot protection which keeps them on the list. I have never experienced bit rot myself, or at least I have never noticed it in my files, e.g. a zip archive which won't unzip.

 

TL;DR: So, my main questions are:

  • Is bit rot a real thing?
  • Why doesn't Unraid support checksum correction of silent errors?
  • Is e.g. SMART data monitoring enough?
  • Is it possible to combine snapraid with unraid (e.g. using the unassigned disk plugin to mount a disk for the snapraid parity file)?

 

 

Sorry for the long post :)

 

B.R.

Kevin

 

Link to comment

1)    Is bit rot a real thing? Yes, it is real. Monthly parity checks are recommended.

 

2)    Why doesn't Unraid support checksum correction of silent errors? unRaid is not a filesystem, but you can use btrfs to get that if desired.

 

3)    Is e.g. SMART data monitoring enough? No, but what else is available? SMART reports problems; it does not predict them.

 

4)    Is it possible to combine snapraid with unraid (e.g. using the unassigned disk plugin to mount a disk for the snapraid parity file)? This might be possible, but how is this helpful? Are you seeking triple parity?

 

 

unRaid is a solid solution for home media servers. If your use case is outside that, it may not be a good fit. Write performance is limited, and use of a cache drive to improve write performance puts data outside the array.

 

 

For many, this simple question decides between non-realtime and realtime: is it acceptable to be without until you have time to troubleshoot and complete the repair?

Link to comment
Some of the software features seem a little "homemade" (e.g. the cache is working on file level, not on a block/sector level)

Not really sure how to respond to this.  Maybe read about the cache drive here?

  https://lime-technology.com/network-attached-storage/

 

One of the big benefits of unRAID is that all the disks (other than parity) use standard filesystems.  You wouldn't want the cache drive to do something weird at the block/sector level.

 

Is bitrot real?
Many of us are running the Dynamix file Integrity plugin:

  https://lime-technology.com/forum/index.php?topic=44989.0

which creates checksum values on files so you can tell if they have been modified.  Although I have had some false positives (e.g. iTunes modifies an mp3 without changing the timestamp), I don't think anyone has identified an actual instance of bitrot.
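As a rough illustration of the checksum approach, the sketch below builds a manifest of SHA-256 hashes and modification times, then flags files whose contents changed while the timestamp did not. This is not the plugin's actual implementation; the function names and the "suspect" label are assumptions made for the example.

```python
import hashlib
import os
import tempfile

def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large media files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_manifest(paths):
    """Record checksum and modification time for each file."""
    return {p: (sha256_of(p), os.stat(p).st_mtime_ns) for p in paths}

def verify(manifest):
    """Classify each file as ok, modified (mtime changed), or suspect."""
    report = {}
    for path, (old_hash, old_mtime) in manifest.items():
        if sha256_of(path) == old_hash:
            report[path] = "ok"
        elif os.stat(path).st_mtime_ns != old_mtime:
            report[path] = "modified"   # normal edit: contents and timestamp changed
        else:
            report[path] = "suspect"    # contents changed but timestamp did not
    return report

# Demo on a throwaway file.
with tempfile.TemporaryDirectory() as tmp:
    song = os.path.join(tmp, "song.mp3")
    with open(song, "wb") as fh:
        fh.write(b"original audio bytes")
    manifest = build_manifest([song])
    before = verify(manifest)

    # Change the contents, then put the old timestamp back (as some taggers do).
    old = os.stat(song)
    with open(song, "wb") as fh:
        fh.write(b"original audio bytez")
    os.utime(song, ns=(old.st_atime_ns, old.st_mtime_ns))
    after = verify(manifest)
```

A "suspect" result cannot tell bit rot apart from a program that rewrote the file and kept the old timestamp, which is exactly the false positive described above.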

 

If they did, the solution would be to restore the affected file from backup.  Having a parity drive does not absolve you of making backups, see https://lime-technology.com/forum/index.php?topic=31020.5

 

Why doesn't Unraid support checksum correction of silent errors?
See https://lime-technology.com/forum/index.php?topic=47875.msg460754#msg460754

 

Is e.g. SMART data monitoring enough?
Be sure to enable notifications, then unRAID will alert you if it detects any problems with your drives.

 

Is it possible to combine snapraid with unraid?
  AFAIK, nobody does this.
Link to comment

Thank you for the responses. First off, let me apologize for the use of "homemade"; it raised eyebrows, and that wasn't my intention.

 

If they did, the solution would be to restore the affected file from backup.  Having a parity drive does not absolve you of making backups, see https://lime-technology.com/forum/index.php?topic=31020.5

I completely agree. A lot of people think that disk redundancy is the same as backup. I have my most valued files backed up to two different off-site backups. No file integrity checking though, as I was completely oblivious to this until I started preparing to retire my current Synology NAS. I have never used disk redundancy, but in the coming months I will move my movie collection onto the new server and I don't want to redo rips if I lose a disk.

 

Some of the software features seem a little "homemade" (e.g. the cache is working on file level, not on a block/sector level)

Not really sure how to respond to this.  Maybe read about the cache drive here?

  https://lime-technology.com/network-attached-storage/

One of the big benefits of unRAID is that all the disks (other than parity) use standard filesystems.  You wouldn't want the cache drive to do something weird at the block/sector level.

Maybe I'm just surprised that it works well - K.I.S.S. I guess :). My initial thought was that such a cache would be implemented as a block-level write FIFO, since that would reduce the time window in which the data isn't protected by the array, and a write bigger than the cache disk wouldn't fail; the write speed would just drop to array speed. I hope you don't take my curiosity for criticism.

 

2)    Why doesn't Unraid support checksum correction of silent errors? unRaid is not a filesystem, but you can use btrfs to get that if desired.

I do understand this distinction. Snapraid isn't a filesystem either, but it does checksumming as well as parity calculations. I was simply wondering why Unraid doesn't employ a similar scheme. It seems that disk recovery is more important than silent errors:

 

So parity correction over silent error detection. Seems to make sense.
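The difference between the two schemes can be sketched as follows, assuming single XOR parity and per-block SHA-256 checksums with made-up block contents. It is a toy model, not how Snapraid or Unraid store anything.

```python
import hashlib

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

# Hypothetical block contents on three data disks, plus parity and per-block checksums.
disks = {"disk1": b"alpha---", "disk2": b"bravo---", "disk3": b"charlie-"}
parity = xor_blocks(list(disks.values()))
checksums = {name: hashlib.sha256(block).hexdigest() for name, block in disks.items()}

# Silently flip one bit on disk2; the drive reports no error.
corrupted = bytearray(disks["disk2"])
corrupted[0] ^= 0x01
disks["disk2"] = bytes(corrupted)

# Parity alone: the check fails, but nothing says which disk (or parity) is wrong.
parity_mismatch = xor_blocks(list(disks.values())) != parity

# With checksums: find the bad disk, then rebuild its block from parity and the others.
bad = [n for n, b in disks.items() if hashlib.sha256(b).hexdigest() != checksums[n]]
good_blocks = [b for n, b in disks.items() if n not in bad]
repaired = xor_blocks(good_blocks + [parity])
```

With parity only, a mismatch is detectable but not attributable. When a disk fails outright, the failed disk is known, so parity alone is enough to rebuild it.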

 

 

Link to comment

The main use-case for the cache drive is actually very simple.  One of the downsides to unRAID is that because it doesn't do striping, writes to the parity-protected array are slower than writes to a standard disk.  To mitigate this, you can set it up to write to a cache drive first (at full speed), and the system will then automatically move the files from the cache drive to the array in the middle of the night, when it doesn't take any of your time.  (There is talk of a "continuous move" feature that would move files from the cache drive to the array immediately, but it hasn't been implemented yet.)
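Conceptually, the nightly move is a file-level copy from one mount point to another. The sketch below shows the idea with temporary directories standing in for the cache drive and the array; the function name and layout are hypothetical, and unRAID's real mover is not shown here.

```python
import os
import shutil
import tempfile

def move_cache_to_array(cache_root, array_root):
    """Move every file under cache_root to the same relative path under array_root."""
    moved = []
    for dirpath, _dirnames, filenames in os.walk(cache_root):
        for name in filenames:
            src = os.path.join(dirpath, name)
            rel = os.path.relpath(src, cache_root)
            dst = os.path.join(array_root, rel)
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            shutil.move(src, dst)
            moved.append(rel)
    return sorted(moved)

# Demo with temporary directories standing in for the cache drive and the array.
with tempfile.TemporaryDirectory() as cache, tempfile.TemporaryDirectory() as array:
    os.makedirs(os.path.join(cache, "Movies"))
    with open(os.path.join(cache, "Movies", "rip.mkv"), "wb") as fh:
        fh.write(b"fast write landed on the cache first")
    moved = move_cache_to_array(cache, array)
    on_array = os.path.exists(os.path.join(array, "Movies", "rip.mkv"))
    left_on_cache = os.path.exists(os.path.join(cache, "Movies", "rip.mkv"))
```

Working on whole files keeps every disk readable as a standard filesystem, at the cost of the files being outside parity protection until the move has run.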

 

This does mean that your data is vulnerable for a day or so.  If that concerns you, you can set up a btrfs cache pool, so that your cache drive has redundancy as well (I haven't done this).

 

 

To give you an idea of speeds... I just used Teracopy to copy a 5GB media file directly to the array from my main Win 10 desktop (hard-wired, gigabit LAN).  The numbers were bouncing around a bit, but it looked like 45 MB/s was the average speed. When I copied to my cache drive instead, the speed was 85 MB/s. This is what I normally do.
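For a sense of what those speeds mean in wall-clock time, here is the arithmetic, assuming steady average speeds and 1 GB taken as 1000 MB:

```python
def transfer_seconds(size_gb, speed_mb_per_s):
    """Time to copy size_gb at a steady speed, taking 1 GB as 1000 MB."""
    return size_gb * 1000 / speed_mb_per_s

direct_to_array = transfer_seconds(5, 45)   # about 111 s
via_cache = transfer_seconds(5, 85)         # about 59 s
```

So the 5GB copy takes a little under two minutes straight to the array and about one minute via the cache drive.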

 

 

You'll also store your docker and VM images on the cache drive.  Ideally you will use an SSD for this, to maximize their speed.  And again, you can use a cache pool if you are worried about your docker/VM files being unprotected.

 

I hope that helps!
