Corrupted disk recovery?

Hard drives are tremendously reliable. For this reason, the vast majority of unRAIDers reading this post will never have one fail. We run parity checks, look at SMART reports, and replace failing disks. But if a drive were really to fail in my server, and my server crashed as a result, and I rebooted to find a red ball and a dead data drive, I would be forced to do a failed drive rebuild. And I'd expect unRAID to rebuild the disk flawlessly. And guess what? It can't.

 

Why not? I can upsize a disk and it works perfectly. The problem is that, in order to work, a parity drive needs to be in perfect sync with its surviving data drives. After a clean shutdown it is. But after a crash it may not be, particularly if I was in the middle of writing data to the array, possibly writing multiple streams to multiple disks. So it is possible, even likely, that parity may be a little off after a crash. So what does that mean? It means that after the recovery, which might appear to go perfectly, some file or set of files on my recovered drive is a little off. Maybe it is in the middle of an mp3 file, and creates a non-audible change. Maybe in the middle of a movie, creating a brief flicker. So maybe I don't much care. But it could be worse. It could be that my media player will display an error I've never seen before the next time we play Suzy's 4th grade piano recital video. Maybe a few videos are affected. The opportunity for corruption gets much worse if I have a data drive fail while building parity on a new disk. I'd be forced to go back to my old parity drive, which I am holding in reserve for just such an event. But if I was using the array while parity was building, even without trying to write anything, a media player, plugins, and even Windows may have been writing tiny little files to the array. And the old parity sees none of these writes. So rebuilding with old parity will create a drive riddled with bullet holes of corruption. It's certainly better than losing the whole disk, but far from the results I want.
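To make the stale-parity problem concrete, here is a toy sketch in Python. It assumes plain single-parity XOR across three 4-byte "disks"; the disk contents and the `xor_blocks` helper are made up for illustration and are not unRAID code. It shows a rebuild that is exact when parity is in sync, and one that silently produces a wrong byte when a write reached a data disk but the matching parity update was lost in a crash.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte position by byte position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three toy "data disks" and the parity computed from them (illustrative data).
disk1 = b"AAAA"
disk2 = b"BBBB"
disk3 = b"CCCC"
parity = xor_blocks([disk1, disk2, disk3])

# Clean case: parity is in sync, so a failed disk2 is rebuilt exactly.
rebuilt_clean = xor_blocks([parity, disk1, disk3])

# Crash case: a write changed one byte on disk1, but the server crashed
# before the matching parity update landed. Then disk2 fails.
disk1_after_crash = b"AAZA"
rebuilt_stale = xor_blocks([parity, disk1_after_crash, disk3])

print(rebuilt_clean == disk2)  # True
print(rebuilt_stale == disk2)  # False: one byte of the rebuilt disk is wrong
```

Note that the damage lands on the rebuilt disk (disk2), not on the disk that was being written, and nothing in the rebuild itself reports it.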

 

So we ask ourselves - how can I figure out which files, if any, were corrupted when I do a rebuild? The answer is, I can't. Unless I have zip or rar files, whose integrity can be checked, I am out of luck. All I can do is hope. But there are ways to prepare for this exact scenario. And that is the purpose of this post.

 

There are two options that I know of, and a third that may be coming to the rescue. I will explain them below:

 

1 - Keep md5sums for all of your files. This is a little time consuming, but not that hard. A Linux guy could probably write a cryptic command in minutes that would capture these on an entire array, that would run for days and produce a 30K text file full of the magic numbers that could be used to detect corruption later. This is a great PLUGIN opportunity (keep a database of md5s of files and allow it to compare to current disk contents). But DETECT is the key word here. If there is a problem, you could not correct it. You would have to recover from backup (if you had one), recreate from the original media, or view / listen to the file to see if the corruption is noticeable. But at least you would know the (hopefully) small universe of affected files, and not be looking at 3T of files and wondering if any could be corrupted.
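As a rough sketch of what such a checksum tool could look like, here is a small Python version. The function names (`md5_of_file`, `build_manifest`, `find_suspect_files`) are hypothetical, not an existing plugin, and it assumes the manifest is built while the disk is known to be good and is stored somewhere other than the disk it describes.

```python
import hashlib
import os

def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 of a file, reading it in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_manifest(root):
    """Walk root and map each file's path (relative to root) to its MD5."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            manifest[os.path.relpath(full_path, root)] = md5_of_file(full_path)
    return manifest

def find_suspect_files(root, manifest):
    """Return the relative paths that are missing or whose MD5 has changed."""
    suspects = []
    for rel_path, expected in sorted(manifest.items()):
        full_path = os.path.join(root, rel_path)
        if not os.path.isfile(full_path) or md5_of_file(full_path) != expected:
            suspects.append(rel_path)
    return suspects
```

In use, you would call `build_manifest` on a disk's mount point (for example `/mnt/disk1`, used here only as an illustrative path), save the result off that disk, and run `find_suspect_files` against the saved manifest after a rebuild. It only detects: files legitimately edited since the manifest was built will also show up, and nothing here can repair a file.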

 

2 - Create PAR2 sets. I won't explain PAR2 here, but you can search on PAR2 and quickpar and learn about what it is and how it works. Bottom line, PAR2 files will allow corruptions to be DETECTED and, if there are not too many of them, corruptions to be CORRECTED. The problem is that PAR2 sets are based on a fixed set of files. You can't easily add one more file without recreating the whole thing. The PAR2 tools also don't have the ability to traverse directories. And PAR2 takes quite a while, much longer than md5, to do its magic. But PAR2 is pretty useful if you have a disk full of media that you plan to keep forever. If they are all in one flat directory you can easily run par2cmdline to produce the blocks. If your files span folders, with some effort you could create a flat directory full of links to the media files (maybe ignoring the fanart) and unleash par2 on that. It might take days or even a week or more to complete, but if the disk is static, it would run once and you'd be able to detect any corrupt file, and correct minor corruption on that set of files forever. You never have to redo it. In case it doesn't come across, I am a fan of this option. And I have created these for some of my disks. I plan to recreate them for all of my disks over the coming weeks.
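The "flat directory full of links" idea could be sketched like this in Python. Everything here is an assumption for illustration: the `build_flat_link_dir` name, the extension list, and the `__` separator used to keep link names unique. It assumes a filesystem that supports symlinks, and that the link directory is new and lives outside the tree being scanned.

```python
import os

# Assumption: which extensions count as "media"; fanart (.jpg etc.) is skipped.
MEDIA_EXTENSIONS = {".mkv", ".mp4", ".avi", ".mp3", ".flac"}

def build_flat_link_dir(source_root, link_dir, extensions=MEDIA_EXTENSIONS):
    """Create one symlink per media file under source_root inside link_dir.

    Link names are the file's relative path with separators replaced by "__",
    so two files with the same name in different folders do not collide.
    """
    source_root = os.path.abspath(source_root)
    os.makedirs(link_dir, exist_ok=True)
    created = []
    for dirpath, _dirnames, filenames in os.walk(source_root):
        for name in sorted(filenames):
            if os.path.splitext(name)[1].lower() not in extensions:
                continue
            target = os.path.join(dirpath, name)
            rel_path = os.path.relpath(target, source_root)
            link_path = os.path.join(link_dir, rel_path.replace(os.sep, "__"))
            os.symlink(target, link_path)  # raises if the link already exists
            created.append(link_path)
    return created
```

After that, par2cmdline can be pointed at the link directory, for example with something like `par2 create -r5 disk1.par2 *` run inside it, where the 5% redundancy and the `disk1.par2` name are only placeholders. It is worth confirming on a small test folder that your par2 build reads through the links before committing to a multi-day run.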

 

3 - P+Q / diagonal parity. I am not sure, but I believe that these technologies may be able to help. Maybe others know more and can elaborate. Two problems I see: 1 - unRAID doesn't support them yet, and 2 - these volumes are subject to the same type of corruption as other drives, and I'm not sure they solve this problem. Also, they are going to slow down the array and/or cause all drives to spin up to do a write. Many users will opt to not implement P+Q parity. Although it might be able to help protect against 2 simultaneous drive failures, drive failures are extremely rare, so dual drive failures may be much less likely than theft, fire, failed fans, exploding PSUs, dogs peeing on our servers, meteors, and other events that are as likely to take out the whole server as just 2 drives. So if P+Q/diagonal parity is painful in terms of performance or has side effects, many will pass on using it.

 

By and large we are a pretty anal group. We realize that the risk of data loss is low, and we know how to see failures coming and head them off. But we want one additional level of protection against drive failure. unRAID is a great solution. Although unRAID may be able to perfectly rebuild a drive if the array is shut down cleanly, rebuilding a truly failed drive that crashes the server leaves its owner with a queasy feeling of not knowing if any data was corrupted. There are solutions to help, but they have to be done proactively. I hope this helps educate you about the risks, and provides some ideas of how to verify that a recovery was truly 100% effective, and to identify (or even fix) corruptions.

 

Hope this generates some discussion of other options to detect or correct drive corruptions.

Where's the emoticon for applause?

A few related but somewhat random thoughts ...

 

One key point that I've made numerous times:  UnRAID (nor any other RAID system) is NOT a backup.    If your data is important enough that you keep it on a fault-tolerant server, it's likely also important enough that you should have a backup copy of it.  I've written a good bit about it over time, and summarized a lot of my thoughts here:  http://lime-technology.com/forum/index.php?topic=31020.0

 

As for detecting corrupted files -- I DO maintain checksums (MD5's, although the utility I use also supports SHA1's) for all of my media files, so it's trivial to determine if anything's been corrupted.  I use the excellent Corz Checksum utility running from Windows [http://corz.org/windows/software/checksum/ ]    ... and of course if I was to find a corrupted file, I'd simply restore it from my backups  :)

 

A dual-fault-tolerant approach (P+Q or diagonal parity) would indeed provide better protection.  With dual drive fault tolerance, the likelihood of data loss before you rebuild a failed drive is MUCH lower, since a 2nd drive failure during that process would not impact the rebuild.    Clearly there are "costs" associated with this approach.  In addition to the dollar cost of changing the data drive count to N-2 (from the current N-1),  there may be an additional licensing cost (diagonal parity is the easiest approach to implement, but is patented and requires a licensing fee -- which I'd certainly expect Tom to pass along);  and there's the "cost" of additional spinups for writes (and marginally slower writes).    Not everyone is willing to absorb these costs -- that's why folks still buy RAID controllers that don't support RAID-6 and many folks who have RAID-6 capable controllers still build RAID-5 arrays.

 

One other key feature that I think UnRAID needs even more than dual-fault tolerance:  e-mail notification as a default feature.    I think many folks don't realize it when they have a drive fail -- and in many cases probably don't know it until there's a 2nd data-loss-inducing failure (which finally causes an actual error message when they try to access the array).    This would ensure that current users could start a rebuild very soon after a failure ... hopefully before they have a 2nd failure that would cause data loss (which would, of course, be eliminated if we later get dual-fault tolerance).

 

Having said all that, I do agree that there needs to be a more reliable way to do drive rebuilds.  I don't think the "bad parity due to crash" is a very likely scenario -- I think most drive failures will be properly reported when a write error occurs on the drive ... or if the drive totally fails it will simply be red-balled immediately, but not cause a system crash.  I think the reason things get worse than that is largely because there's no built-in notification of the event, so the failure isn't known about until something else happens.    If the failure was reported immediately via e-mail, and the user then replaced the bad drive and initiated a rebuild, I think the scenario you've outlined would be far less likely.

 

... as for resolving that scenario if/when it does happen => there's really nothing you can do if the parity drive isn't correct.  You simply MUST have some means of determining which files have been corrupted if you want to ensure everything's good.    Checksums (MD5 or SHA1), error correction sets (PAR2), or simply comparing everything with the backups are what we currently have available with the current feature-set in UnRAID.  An eventual dual-failure tolerance capability would simply eliminate this problem altogether  :)

 

 

  • Author

A few comments -

 

It is true that unRAID is not a backup. But backups for tens of terabytes are quite expensive, especially offsite backups. For many, it is just not practical to have a real backup solution for all that data. Often the cost is just time. Re-ripping a bunch of movies, for example. But I don't think that the fact that we should all be backing up our data should be held up as a reason not to make unRAID as good as possible at maintaining data integrity.

 

The couple of times I've seen a person report a drive failure (and these were all years ago), unRAID crashed and could not be shut down cleanly. And users were not certain if there was corruption or not.

 

You are smart to have the discipline to do the checksums!!! Everyone should be running a plugin that does this automatically behind the scenes.

... Often the cost is just time.

 

True.  I've seen many cases, however, where folks who felt that way changed their mind big-time after actually incurring a catastrophic loss where they were actually going to have to reconstruct all their data.    And while the only cost may be time, time is in fact a commodity that's worth something.  With modern drives costing $40/TB or less, it's certainly not very expensive to maintain a set of backups -- it's just a matter of the discipline to do it.  In fact, you can use older, smaller drives that you're replacing for the backups and likely not have to buy a lot of new "just for backup" drives.

 

Granted, if you've built up a 20-30TB collection without ever backing up, the "catch-up" cost to back everything up is notable ... but if you back up as you go along, it's a modest incremental cost.    Between my 2 servers I've got nearly 40TB of media and other data ... and it's all backed up very well  :)  [better than it needs to be in most cases]

 

 

Archived

This topic is now archived and is closed to further replies.
