
New build, need to figure out some components...


joeyke87

Recommended Posts


Many of us feel KYThrill's analysis of wear is incorrect.  You will not wear out a drive 50% in 40 hours.  If your drive fails in that time frame, you do not want it in your array.

 

Now, all that said, all drives will eventually fail.  I just expect it to be many, many years.  I've probably had 1 disk failure every 5 years in my household.  (But then I've got a LOT of disks spinning here, and some have been spinning for more than 5 years.)

 

 

 

Well, I don't like being misrepresented.  I never said the drive was worn out.  I said it would throw a URE.  A URE may be completely recoverable or it could result in something worse.  I've stated that since the beginning of this thread.  It is random.  A file could have a single corrupted bit and still be opened perfectly.  Or you may lose one file, or you may lose one sector, or you could lose the partition table.  It all depends on where the bad bit occurs and how the drive addresses that.  And if unRAID rebuilds on the fly from an error, you may never see anything has happened until you run a SMART report and find a reallocated sector.

 

But in my above example, let's say you pre-clear, build parity, populate, and run two parity checks over two months.  Statistically (with the information provided by the manufacturer) you will encounter a URE with a 2TB 10^14 drive.  You can't predict what the effect of that URE will be.  But let's say the drive/unRAID handles the URE transparently.  Well, you will have another 6-7 months before you potentially encounter another URE.  That isn't a worn out drive.
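
Roughly, the arithmetic behind that looks like this (treating the spec as a flat 1-in-10^14 chance per bit read, and assuming about six full-drive passes to cover the preclear, parity build, fill, and checks - both are simplifications of mine):

import math

DRIVE_BITS = 2e12 * 8        # 2 TB expressed in bits (~1.6e13)
URE_RATE = 1e-14             # rated unrecoverable error rate per bit read
full_passes = 6              # preclear + parity build + fill + a couple of checks (assumed)
bits_read = DRIVE_BITS * full_passes

# Poisson approximation for "at least one URE" over that many bits
p_at_least_one = 1 - math.exp(-URE_RATE * bits_read)
print(f"~{p_at_least_one:.0%} chance of at least one URE over {bits_read:.1e} bits read")
print(f"~{1 - math.exp(-1e-15 * bits_read):.0%} with a 10^15-rated drive")

With those assumptions it prints roughly 62% for the 10^14 drive and about 9% for the 10^15 drive.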

 

What I am advocating is minimizing risk.  If HDD A (2TB Seagate LP) says that statistically you will encounter one of these situations just pre-clearing, building parity, and filling once with data, and HDD B (2TB WD Green) says that it will take 10 times longer, why would you ever expose yourself to the added risk?  Brand loyalty?

 

No matter how you decide to define a URE (reading one spot a trillion times, or disregarding it because you believe it doesn't apply when you are filling a disk), your risk of a URE is always 10X greater with a 10^14 drive than with a 10^15 drive.

 

You will minimize your risk by buying disks rated for longer life (mechanical life, higher load/unload cycles, higher URE rating).  Risk is also minimized, to a point, by using a larger number of smaller drives vs. a smaller number of larger drives.  However, there is a point where you can have too many drives and end up increasing your risk.

Link to comment

It's not like the URE happens automatically at a certain number of I/Os.  It is a probability.  And there is zero evidence of UREs happening and causing any type of problem.  Otherwise we'd be seeing 1 or 2 parity errors every few parity checks.  We have NEVER seen this type of unexplained parity sync error.

 

Run preclear, keep your drive temps in the 20s or 30s, don't buy large batches of disks at the same time, and run monthly parity checks. Those are the best ways to have a healthy array.

 

Sure we have.  There have been several people posting who get parity errors on every check or every couple of checks.  I think you've even posted in response to some of those.  Naturally, everyone blames a cable, a PSU, the motherboard, or the memory.  And most of the time the OP returns, tries a few suggestions that don't work, and then they disappear.  I don't know if that means the last suggestion fixed the problem, or they just gave up and switched to WHS.

 

Sometimes it is explained as a failed parity disk with reallocated sectors, but we never know if that disk is 6 months old or 6 years old.  What caused the reallocation?  A URE is certainly a possibility for the cause.

 

I mean there are 21 pages of hits for "failed parity" and 37 pages of hits for "parity error".  At 30 posts per page, it is obviously a well-discussed topic.  Of course, sometimes it is a bad cable, a bad PSU, or a bad MB.  I'm not suggesting that every parity failure or parity error is the result of a URE, but odds are that some of them are.  And my experience is that there are probably 10X more parity errors/failures in the real world than what are discussed on these forums.  Most people just swap in their spare drive and go about their business and don't come here to post.

Link to comment

Microsoft agrees that statistically, URE is significant:

 

Vendor drive specifications predict an uncorrectable bit error rate (UER) every 10^15 to 10^16 bits read for SCSI and 10^13 to 10^15 bits read for various PATA and SATA drives. But, that is just the drive error rate. Architects must add the error rate for the disk controller, the cables, the PCI bus, the memory, and the processor, so observed uncorrectable bit errors will be more frequent than the drive level fault rate. Architects must also recognize that any of these errors can be masked by retry or other error correction strategies in the controller or operating system software. When you consider that the specified drive failure rates are more than ten times too optimistic, the actual uncorrectable error rate could be even worse than this one-in-10TB estimate.

 

However, Microsoft did do testing (on 17 250GB-400GB drives) to try to verify the vendor specs.  They set up four JBOD arrays (17 disks total, WD and Seagate).  They randomly wrote 10GB at a time with a random data pattern to each array.  Then they read it back and calculated a checksum.  The drives used were rated 10^14 and 10^15 for unrecoverable errors.  They predicted there would be 112 drive errors if the drive ratings were correct.

 

Their conclusion was a little unclear, as they stated that from a programmer perspective, they only encountered 5 errors.  But then they stated that from a drive perspective, they encountered 30 errors.  They don't really explain the difference in perspective.  However, their ultimate conclusion was that it occurred less often than the drive spec would indicate.  They only lost the entire 10GB file three times out of 65000 total reads.  The data was recovered using various methods on the other errors.

 

The capacity of all the drives used was 4.6TB.  So statistically, each bit had been written to 140 times.

 

They also performed a test on 8 drives where they wrote a 100GB file to each drive, and then did a repeated read-only checksum on the file.  So this test read the same bits over and over again.  They had read 756TB of data (94.5 TB per drive) and not encountered any errors (each bit had been read 945 times).
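
As a rough sanity check, the expected error count for a given amount of reading is just bits read times the rated error rate.  Round numbers here, not the paper's exact parameters:

def expected_ures(terabytes_read, rate_per_bit):
    # expected errors = bits read x rated unrecoverable error rate per bit
    return terabytes_read * 1e12 * 8 * rate_per_bit

print(expected_ures(650, 1e-14))   # ~52 for roughly 650TB of reads at 1-in-10^14
print(expected_ures(756, 1e-15))   # ~6 for the 756TB read-only test at 1-in-10^15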

 

This leads me to believe that the unrecoverable errors may be somewhat random in nature.  In one instance, you have a drive where bits were read 945 times and there were zero errors.  Then on other drives, you read each bit 140 times and had 30 errors on 17 drives (or average two errors per drive). So it isn't reading the same bit over and over that causes an error.  It is some other mechanism.

 

It's nice to know that errors may not occur as often as drive ratings would indicate.  They may be rating them like my employer rates some of its products.  They are made in batches and the ratings given reflect the lowest yield for the batch.  So every product sold out of the batch is guaranteed to meet the rating, but many out of the batch are much better and could be given a higher rating if we wanted to test them individually.  Maybe the HDD ratings are the minimum for any drive they sell, but most drives will be much higher (fewer errors).

Link to comment

Sure we have.  There have been several people posting who get parity errors on every check or every couple of checks.  I think you've even posted in response to some of those.  Naturally, everyone blames a cable, a PSU, the motherboard, or the memory.  And most of the time the OP returns, tries a few suggestions that don't work, and then they disappear.  I don't know if that means the last suggestion fixed the problem, or they just gave up and switched to WHS.

Other than 2 (possibly 3) hard disks that did not error, but did return occasional incorrect data those random parity errors have been memory. or the nforce4 chipset on the motherboard.  As you now know, unRAID in combination with the SMART firmware handle UNC errors just fine.  Many people will see them, and those un-readable sectors are quietly re-allocated by the drive when the correct data is re-generated by parity in combination with the other disks and re-written to the drive with the media error.  Parity errors are not usually bad cables, but bad cables, or a bad/under-rated power supply can certainly cause disk errors.

 

Of course the posters disappear once their problem is resolved... Most sane people do not hang around here. ;)

Link to comment

I mean there are 21 pages of hits for "failed parity" and 37 pages of hits for "parity error".  At 30 posts per page, it is obviously a well-discussed topic.  Of course, sometimes it is a bad cable, a bad PSU, or a bad MB.  I'm not suggesting that every parity failure or parity error is the result of a URE, but odds are that some of them are.  And my experience is that there are probably 10X more parity errors/failures in the real world than what are discussed on these forums.  Most people just swap in their spare drive and go about their business and don't come here to post.

 

When people post about parity and other errors, they are usually posting about more than one in a blue moon.  And research finds a cause (usually a power failure / hard shutdown, bad cabling, or a bad drive with increasing reallocated sectors).  Occasionally there have been people who have had problems that could not be isolated, and the motherboard has been blamed.  But even in these situations, errors consistently occurred.  Even if it is hard to isolate the cause, it is not hard to make them happen.

 

A URE error (as I understand from this thread) would present very differently.  Here is the kind of post we'd expect with this type of error.  I challenge anyone to find such a post - ever.  This is made up.

 

--------------------------

 

I've been running my array for months with no problems.  A few days ago my monthly parity check kicked off and reported one parity error.  Surprised, I ran SMART tests on each disk and saw no reallocated sectors or anything else unusual.  My syslog also showed nothing unusual except the single parity error.  The next night I ran another parity check and got another parity error - at exactly the same spot.  So I thought I must have a problem on one of my disks at that specific sector, and decided to run a third parity check the following night.  Completely clean.  I have since run 3 more in a row, all completely clean.  I now believe that the first parity check must have misread something, and resulted in unRAID falsely "correcting" parity, and the second check put things right again.  Can anyone help me isolate which disk is returning faulty results or help me diagnose further?

 

--------------------------

 

As I said, this is completely made up.  I've never seen one, or anything like it.

 

Sometimes it is explained as a failed parity disk with reallocated sectors, but we never know if that disk is 6 months old or 6 years old.  What caused the reallocation?  A URE is certainly a possibility for the cause.

 

This sounds like a very different definition of a URE.  A URE can now take down a drive or create reallocated sectors?  If the drive is detecting the problem, it is not a problem for the user.  If that's what a URE is, then we're debating about nothing.

 

The following are some misc comments from previous posts in this thread ...

 

Microsoft agrees that statistically, URE is significant:

 

Could you please provide a reference to the article you were apparently quoting about a test Microsoft ran?

 

But in my above example, let's say you pre-clear, build parity, populate, and run two parity checks over two months.  Statistically (with the information provided by the manufacturer) you will encounter a URE with a 2TB 10^14 drive.  You can't predict what the effect of that URE will be.  But let's say the drive/unRAID handles the URE transparently.  Well, you will have another 6-7 months before you potentially encounter another URE.  That isn't a worn out drive.

 

Please be careful with how you say things like this.  These UREs happen based on a probability.  If one occurs, another could occur the same day or 20 years from now - anytime.  You can't say "you will have another 6-7 months before you potentially encounter another URE".  I enjoy debates like this one, but you lose credibility with this type of inaccuracy.

 

I didn't realize unRAID attempted to reconstruct data on the fly.  I just thought that if it encountered an error, you had to run a parity sync or rebuild to correct it.  Nice to know that it does it on the fly.  It sounds more like a traditional RAID array in this regard (RAID would read the bad bit from parity on another disk, attempt to write it to the bad location, and if unsuccessful, would reallocate, and then pass the new "good" data on to whatever process was calling it).

 

unRAID ONLY does this if it encounters a read error on a drive.  In such a situation it will read the values of all of the other disks at the corresponding offset and reconstruct the data.  And it will also write that data back to the problematic drive to let SMART correct the problems.  I can only remember a small handful of cases where users saw this happen, but it does happen and it does work.  If this is a symptom of a URE, no problem.
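
For anyone curious, the reconstruction itself is just XOR arithmetic.  Here is a toy sketch of the principle (my own illustration, not unRAID's actual code):

# Minimal illustration of single-parity reconstruction (the principle, not unRAID's code).
from functools import reduce

data_blocks = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]   # blocks at one offset, one per data disk
parity = bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*data_blocks))

# Pretend disk 1 returned a read error: rebuild its block from parity plus the other disks.
survivors = [data_blocks[0], data_blocks[2], parity]
rebuilt = bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*survivors))
assert rebuilt == data_blocks[1]   # the unreadable block is recovered exactly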

 

Yep, miscorrection is rare, but it sucks.  Supposedly it can occur every 10^21 bits, but that is much less frequent than a URE (6-7 orders of magnitude less frequent).

 

What is "miscorrection"?  If UREs present some symptom (reallocated sectors, read errors, internal retries by the drive, etc.) they are not a worry.  But if one of them lets wrong data come back to the OS with no symptoms, those are the scary ones.  If the chances of that are 1 in 10^21 bits, I think I can live with that.

Link to comment

I don't have a copy of the MS report that you are speaking of, but it seems as though you might be misinterpreting what it is saying.

 

Microsoft agrees that statistically, URE is significant

I don't agree that it is statistically significant.  In fact the data that you cite is problematic.  You quote MS...

When you consider that the specified drive failure rates are more than ten times too optimistic, the actual uncorrectable error rate could be even worse than this one-in-10TB estimate.
What basis does MS have to say that drive failure rates are more than 10 times too optimistic?  Both tests conducted by MS yielded an actual URE rate that was much lower than the vendor's predicted URE rate.  This is supported by their ultimate conclusion.

However, their ultimate conclusion was that it occurred less often than the drive spec would indicate.
 

 

Their conclusion was a little unclear, as they stated that from a programmer perspective, they only encountered 5 errors.  But then they stated that from a drive perspective, they encountered 30 errors.  They don't really explain the difference in perspective.

Seems like there is a simple explanation for their findings.  The programmer perspective they speak of is how often the drive cannot provide an accurate response to a query from an external source (i.e. a program request for data).  The drive perspective would be internal read errors that occurred, some of which were masked at the programmer level because the read was probably retried and accomplished successfully.  So my interpretation of the data is that the drives experienced 30 raw read errors, 5 of which could not be corrected and became UREs.

 

The capacity of all the drives used was 4.6TB.  So statistically, each bit had been written to 140 times.
ON AVERAGE each bit had been written to 140 times; however, the test used 10GB random writes in a random pattern.  So STATISTICALLY speaking there would be many bits that experienced very few reads (i.e. <10) and likewise many others that had much more than 140 reads.  Heck, statistically there could be some bits with over 100,000 reads.  Perhaps the errors only occurred on the bits that experienced an unusually high number of reads.

 

In one instance, you have a drive where bits were read 945 times and there were zero errors.
Correct.  Based on the test it is a fact that every bit was read 945 times.

 

Then on other drives, you read each bit 140 times and had 30 errors on 17 drives (or average two errors per drive).
Wrong.  As I said above, on average each bit was read 140 times; however, we know that due to the randomness of the reads some bits would experience much more than 140 reads.  In fact there are many that probably exceeded 945 reads (the frequency seen in the second test).  And some that were even higher.

 

So it isn't reading the same bit over and over that causes an error.  It is some other mechanism.
I disagree. There is insufficient evidence to support such a conclusion, but I am willing to agree that it might not be the only contributor.  I'm surprised that the almighty MS did not have a better DOE (Design of Experiment).  They ran two tests and based on the outcome of both cannot determine if there is a correlation between the number of times a specific bit is read and UREs.  That's lousy engineering.  No wonder their products are so overpriced and rife with defects. They are probably wasting lots of money doing stupid things like this instead of making their products more robust.
Link to comment

When people post about parity and other errors, they are usually posting about more than one in a blue moon.  And research finds a cause (usually a power failure / hard shutdown, bad cabling, or a bad drive with increasing reallocated sectors).  Occasionally there have been people who have had problems that could not be isolated, and the motherboard has been blamed.  But even in these situations, errors consistently occurred.  Even if it is hard to isolate the cause, it is not hard to make them happen.

 

A URE error (as I understand from this thread) would present very differently.  Here is the kind of post we'd expect with this type of error.  I challenge anyone to find such a post - ever.  This is made up.

 

--------------------------

 

I've been running my array for months with no problems.  A few days ago my monthly parity check kicked off and reported one parity error.  Surprised, I ran SMART tests on each disk and saw no reallocated sectors or anything else unusual.  My syslog also showed nothing unusual except the single parity error.  The next night I ran another parity check and got another parity error - at exactly the same spot.  So I thought I must have a problem on one of my disks at that specific sector, and decided to run a third parity check the following night.  Completely clean.  I have since run 3 more in a row, all completely clean.  I now believe that the first parity check must have misread something, and resulted in unRAID falsely "correcting" parity, and the second check put things right again.  Can anyone help me isolate which disk is returning faulty results or help me diagnose further?

 

--------------------------

 

As I said, this is completely made up.  I've never seen one, or anything like it.

 

See, you are misunderstanding URE and that wouldn't be an example of what you would see.  Let's look at WD drives.  A WD drive is trying to read a bit of data and it encounters an error.  The first thing it will do is kick in its ECC and try to correct the error.  If it is successful, you do not get a URE. If it fails once, it will keep trying several times.  If all attempts fail, the drive will then log a URE error and pass that on to the system where it is installed.  The WD drive will then mark that sector as bad, and reallocate the data from the sector.  Now, if you have something like unRAID, it will attempt to recover that unreadable bit from parity and write it to the new reallocated sector.  If successful, you lose nothing and the whole process is transparent to you.  On your next smartctl, you would see a reallocated sector in SMART.

 

If you didn't have unRAID, you could lose that bit and potentially whatever file/folder it is attached to.  Or other software you can run may completely correct the error and you end up losing nothing.  But back to unRAID, if the error was a read from your parity drive, I'm not certain what would happen.  Would unRAID rebuild parity on the fly for that one bit?  I thought it required a complete parity rebuild if the parity drive encounters a URE, but maybe it doesn't.  Joe?

 

If it does not and must rebuild the entire parity drive, then you end up with a situation where parity is bad and needs a rebuild.  But your other drives, statistically, are primed to encounter a URE too.  There is no valid parity (you started a rebuild, invalidating the old parity) from which to correct their URE errors.  So you potentially lose whatever that bit is attached to.  There may be other manual or software based methods to try to correct the problem, but it will be a hassle and in the end, may not work.

 

Now other drives do not automatically mark it as a bad sector.  They will internally try to determine if the bit is really bad, and can determine it is not and won't reallocate the sector.  But my understanding on WD drives is a URE will always result in a reallocated sector.

 

Please be careful with how you say things like this.  These UREs happen based on a probability.  If one occurs, another could occur the same day or 20 years from now - anytime.  You can't say "you will have another 6-7 months before you potentially encounter another URE".  I enjoy debates like this one, but you lose credibility with this type of inaccuracy.

 

It is not inaccurate.  It is inaccurate to assume that in more cases than not, the statistics will not play out as defined (unless there is a fault in determining the statistics).  If I say that 50% of the time a coin toss will come up heads, it is inaccurate to assume that every time I flip it, it will be tails.  Sure, the next time I flip it, it could be tails.  I could flip tails 10 times in a row.  But statistically, as N approaches infinity, the odds are 50/50.  To say I lose credibility by saying that after 100,000 coin flips, 50% of them will be heads, because it is possible all 100,000 could have been tails, is what I find inaccurate.  And that seems to be what you are suggesting.

 

True, you could have back to back URE's, but the probability of that happening is less than the probability of having them spaced further apart.  The drive rating is a probability, and in the absence of contrary evidence, you must assume that the probability is your most likely scenario and define your risk based on the most likely scenario.  It is more likely you will have URE errors far apart (manufacturers say 10^14 or 10^15 reads) than you will have them back to back.

 

What is "miscorrection"?  If UREs present some symptom (reallocated sectors, read errors, internal retries by the drive, etc.) they are not a worry.  But if one of them lets wrong data come back to the OS with no symptoms, those are the scary ones.  If the chances of that are 1 in 10^21 bits, I think I can live with that.

 

Miscorrection is when a drive thinks it has recovered an error (read or write) with its ECC.  However, the drive actually corrected to the wrong value.  So now you have corrupted data on your drive, and your drive believes that it is the correct data and lets the wrong data come back.  The problem can be compounded if the error is because of a bad sector.  The drive thinks that the error is corrected (but it isn't), so the drive does not mark that sector as bad.  Then you potentially can lose everything in that sector because it did not get reallocated.  But yes, drive manufacturers do not publish that spec, but verify the probability as 10^21 bits.

Link to comment

What basis does MS have to say that drive failure rates are more than 10 times too optimistic?

 

It was based on the number of read errors Microsoft encountered in their web applications over the course of a year.  They had 10 times more read errors than the drive spec would have predicted.  So this data was the basis for performing the test.

 

Seems like there is a simple explanation for their findings.  The programmer perspective they speak of is how often the drive cannot provide an accurate response to a query from an external source (i.e. a program request for data).  The drive perspective would be internal read errors that occurred, some of which were masked at the programmer level because the read was probably retried and accomplished successfully.  So my interpretation of the data is that the drives experienced 30 raw read errors, 5 of which could not be corrected and became UREs.

 

Well, the paper stated "30 unrecoverable read errors" from the drive perspective.  So I don't think they were raw read errors.  What I think it means (this is just my stab at it, as the paper didn't say) is that they had 30 UREs on the drive (internal).  They then used either software-based or manual methods external to the drive to try and recover the error.  In 25 of the 30 cases, these external methods (no idea what they were) worked and the data was recovered.  In 5 cases, it could not be recovered.  In 3 of those 5, the entire data file was lost due to the corrupted bit.

 

ON AVERAGE each bit had been written to 140 times; however, the test used 10GB random writes in a random pattern.  So STATISTICALLY speaking there would be many bits that experienced very few reads (i.e. <10) and likewise many others that had much more than 140 reads.  Heck, statistically there could be some bits with over 100,000 reads.  Perhaps the errors only occurred on the bits that experienced an unusually high number of reads.

 

I don't interpret it that way.  The way I read it is that they created these 10GB files from a random ordering of bits (i.e., no two files end up being the same) on the host system.  Then they checksum that file, and then copy it to the JBOD (so it will be a large sequential transfer).  Read and calculate the checksum again.  The JBOD should write one file, and then the next, and the next, until the first drive is full.  Then data goes to the second drive, etc.  Then when the JBOD is full, it goes back and starts overwriting.  In this method every bit would be written roughly the same number of times.  The only bits that may get more reads would be in the Master File Table or some other drive system related location.

 

I think writing 650TB would have taken way too long if they were random writes to bits spread all over the JBOD, which is what I think you are suggesting.  Even a good drive would only do 1 MB/s for small random writes, and poor drives might be .3-.4 MB/s.  At 1 MB/s, it would take 20 years to write this randomly to a JBOD (still 5 years if you assume the file is spread across 4 disks and all 4 disks wrote at the same time).
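
The arithmetic behind that 20-year figure, using my own round numbers for small-random-write throughput:

total_mb = 650e6                  # ~650 TB expressed in MB
seconds_per_year = 3600 * 24 * 365
for rate_mb_s in (1.0, 0.35):     # "good" vs "poor" drive for tiny random writes (assumed)
    years = total_mb / rate_mb_s / seconds_per_year
    print(f"{rate_mb_s} MB/s -> about {years:.0f} years")
# 1 MB/s comes out to about 21 years; split across 4 disks writing in parallel, still ~5 years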

 

I think the method I described above is the one that was used.

 

I disagree. There is insufficient evidence to support such a conclusion, but I am willing to agree that it might not be the only contributor.  I'm surprised that the almighty MS did not have a better DOE (Design of Experiment).  They ran two tests and based on the outcome of both cannot determine if there is a correlation between the numer of times a specific bit is read and UREs.  That's lousy engineering.  No wonder their products are so overpriced and rife with defects. They are probably wasting lots of money doing stupid things like this instead of making their products more robust.

 

I agree their experiment was lacking.  First, they didn't explain it very well.  It also probably wasn't the best design.  For example, one JBOD was 1TB and contained disks with a 10^15 URE rating.  The other three JBODs were 1.2 TB and contained 10^14 drives.  They read/wrote the 10GB file to each of the 10^14 JBODs a little under 11000 times.  So in theory, you should read/write to the 10^15 drives 10X that number, but they only read/wrote to them 35,000 times.  So of course none of those drives should have statistically had a URE.

 

Then they also didn't disclose on which drives the 30 URE's occurred.  For all we know, all 30 could have been on one bad drive, or only half the drives, etc.  They don't say, so we will never know. 

Link to comment

See, you are misunderstanding URE and that wouldn't be an example of what you would see.  Let's look at WD drives.  A WD drive is trying to read a bit of data and it encounters an error.  The first thing it will do is kick in its ECC and try to correct the error.

All modern drives do this.  The "bits" written to the tracks are NOT what we are saving, but a bit pattern of ones and zeros that can be used in combination with error correction bits at the end of the sector to verify it. If the bits read do not agree with the error correcting checksum on the sector, it is re-read.  This is a "raw" read error.  All drives have them; some report them to us, others do not.
 If it is successful, you do not get a URE. If it fails once, it will keep trying several times.  If all attempts fail, the drive will then log a URE error and pass that on to the system where it is installed.
That is a "read" error (as far as the OS is concerned).
 The WD drive will then mark that sector as bad, and reallocate the data from the sector.
It will mark the sector as bad and pending re-allocation.  It does not know if the sector is bad, or if it was just written poorly.  It will NOT re-allocate the sector until it is written to, since it currently has NO idea what the contents should be.  When a write occurs to that sector it will first try to write to the original sector and see if it can be re-read.  If it can, no re-allocation occurs.  If it cannot write to the original sector successfully, it is re-allocated to one from its pool of spare sectors.
 Now, if you have something like unRAID, it will attempt to recover that unreadable bit from parity and write it to the new reallocated sector.
It will read the corresponding sectors from the parity disk AND ALL THE OTHER DATA disks to re-construct the sector it could not read.  It is not just the parity drive, all the other drives must be present and working.
 If successful, you lose nothing and the whole process is transparent to you.  On your next smartctl, you would see a reallocated sector in SMART.
Only if the original sector was un-writable.  In many cases it is not re-allocated.

If you didn't have unRAID, you could lose that bit and potentially whatever file/folder it is attached to.

True... but the un-readable sector may have nothing on it that matters to the OS.  It might be a movie or piece of music, and all you might hear is a click, or see some pixels on the screen that would not be there.  It might be affiliated with some feature in a program you do not use, and will never use.
 Or other software you can run may completely correct the error and you end up losing nothing.
True.
 But back to unRAID, if the error was a read from your parity drive, I'm not certain what would happen.  Would unRAID rebuild parity on the fly for that one bit?  I thought it required a complete parity rebuild if the parity drive encounters a URE, but maybe it doesn't.  Joe?
When reading disks they are all treated identically (according to Tom at Lime-tech).  So, a read of the parity disk that returns a "read" failure will result in the data that should be on it being re-constructed from all the other disks.  In this case, they are all the data disks.  That data is then written back to the disk that failed the "read".  So, yes, the parity disk is fixed if it reports a "read" failure.  (The process actually works on a "stripe" basis; I think a stripe is 128k.)  Therefore, the read of 128k fails, and 128k is re-constructed and subsequently written to the disk.

If it does not and must rebuild the entire parity drive, then you end up with a situation where parity is bad and needs a rebuild.  But your other drives, statistically, are primed to encounter a URE too.  There is no valid parity (you started a rebuild, invalidating the old parity) from which to correct their URE errors.  So you potentially lose whatever that bit is attached to.  There may be other manual or software based methods to try to correct the problem, but it will be a hassle and in the end, may not work.

As I said, it does not work that way.  Parity is re-written to the parity drive and its sector re-allocated if needed by the SMART firmware on the disk.  Read failures on the parity drive are treated identically to read failures on the data drives.

Now other drives do not automatically mark it as a bad sector.  They will internally try to determine if the bit is really bad, and can determine it is not and won't reallocate the sector.  But my understanding on WD drives is a URE will always result in a reallocated sector.

It is possible that WD would have to pay license fees to use that reconstruction logic and elect not to, but instead to deplete their pool of spare sectors instead.  In any case, it is still pending re-allocation until next written.

What is "miscorrection"?  If UREs present some symptom (reallocated sectors, read errors, internal retries by the drive, etc.) they are not a worry.  But if one of them lets wrong data come back to the OS with no symptoms, those are the scary ones.  If the chances of that are 1 in 10^21 bits, I think I can live with that.

The internal correction of a bit to a wrong value is not what we've seen in the few times where a disk reported bad data, but no errors.  If it had been internally corrected on the disk it would have been a single parity error and the user might not have noticed too much.  Instead it has been a few disks where the data returned to the OS gets corrupted, but randomly, as if the internal memory cache in the disk sometimes flips a bit when nobody is looking and in between the various checksums that are performed.  The ERC when reading the data from the disk platter into the disk's cache was correct.  The checksum used when communicating the data to the disk controller across the SATA cable was correct, but the internal cache memory was flaky.  No error was detected or reported by the disk.

 

So far, this class of error has been a real disk problem.  In these cases we've seen no disk errors, no SMART data reflecting an issue, but repeated reads of the same file end up with different MD5 checksums, and repeated parity checks end up with repeated random parity errors.

 

The first disk we saw with this behavior also did so in an entirely different PC as well, so it was a bad disk and not power related.  (It could be one specific disk that is more sensitive to noise on the 12 Volt or 5 Volt power supply line, so it is possible for a disk to act fine in one PC and not in another, or fine until an additional load is placed on the power supply.)

 

These disks that have flaky internal electronics, but no errors, are the disks that will bring a grown IT person to tears.

 

Joe L.

Link to comment

Who knew I would get a discussion like this; doesn't matter, it's very interesting but also a bit frustrating for me.

 

The only question I have left: is it okay to buy 2TB drives (the good ones) or should I really go for 1TB max? In the end, having 29TB or 58TB is a big difference. If I would go for the first option, then I would already need to fill up 9 of the 30 slots. But if that means that the system will work really better, then I would really consider this.

 

So I could use any advice on this; the other components are ready to order. Btw, I also got a reply from Supermicro:

 

Suppose this should work due to it has slots for these cards, however we did not test such with Asus board.

 

So again, it should work.......

Link to comment

Who knew I would get a discussion like this; doesn't matter, it's very interesting but also a bit frustrating for me.

 

The only question I have left: is it okay to buy 2TB drives (the good ones) or should I really go for 1TB max? In the end, having 29TB or 58TB is a big difference. If I would go for the first option, then I would already need to fill up 9 of the 30 slots. But if that means that the system will work really better, then I would really consider this.

 

So I could use any advice on this; the other components are ready to order. Btw, I also got a reply from Supermicro:

 

Suppose this should work due to it has slots for these cards, however we did not test such with Asus board.

 

So again, it should work.......

I would purchase the 2TB drives.  I would stay away from the "advanced format" drives unless they have the "jumper" that will make them more efficient in the unRAID server.

 

Joe L.

(unless there is a really good deal on a 1.5TB drive.  The smaller ones just do not make sense today cost-wise.  I have a mixture of 2TB and 1.5 TB drives in my newer server, and multiple IDE drives from 400Gig - 750Gig in my older one... along with a pair of SATA 1TB drives... the biggest available just a few years ago.)

Link to comment

Talk about a thread highjack  :P

 

I think the majority of the people here would recommend 2 TB drives.  Your best choices at the moment are Samsung F3 (which are NOT advanced format), and WD Green EARS (which ARE advanced format and require a jumper on pins 7-8).  You could also take a gamble on WD Green EADS which traditionally are not advanced format (so no jumper needed), however, we've had a few reports of advanced format models of the EADS drives.  So you would just have to see what you got, and use a jumper or not as appropriate.

 

I recommend against Seagate LPs for the simple fact that upgrading the firmware is a pain in the ass.  KYThrill says they also have lower URE ratings, so that just doubly confirms my recommendation against them.

 

Personally, I would just go for the WD Green EARS and expect to add jumpers because that is the cheapest option.

 

And again, do not buy more than one drive from the same vendor at the same time.

Link to comment

Finally placed the complete order:

 

2 x Icy Dock 5 Bays SATA II Enclosure

1 x AMD Sempron 140 Single Core 2.7GHz 

1 x Corsair HX850W Power Supply 

1 x Kingston 2GB 1333MHz DDR3 Non-ECC CL9 DIMM (Kit of 2)

1 x Asus M4A89GTD PRO

4 x Noctua NF-P12

1 x Western Digital Caviar Green 2TB, 64 MB

 

So for now all I can do is wait; I will post an update once everything is installed (may take a couple of days). All I have left to do is buy 4 more drives from different vendors. Then I can start with 1 parity drive and 4 data drives, all on the onboard SATA. Last thing is buying the Supermicro cards soon. (Oh, and some jumpers I guess.)

 

Thanks everybody for all the input  :)

Link to comment

Even though I flashed the firmware on my Seagate 2T LPs, I'm not convinced it is required. Most people will have no problem flashing the firmware if they have the right tools. If you can attach the drive directly to the motherboard and have a CDROM to boot from with no other drives attached, most of the time you will have no problems. I know Rajahal never could flash his Seagate drive(s), but a lot of others have had success. I like WDs, Seagates, and Samsungs (F3). The Hitachi is great for a parity drive.

Link to comment

I have seen some analyses of failure rates on drives and the resulting (im)probability of losing data. Has anyone looked at the probability of a heavily loaded box with many drives attached suffering a catastrophic failure such that the whole thing goes up in smoke? It would not be a first.

 

RGL 

 

 

Link to comment

You might want to check with local shops for the jumpers, since shipping something so cheap will likely be more expensive than the item itself.  I found a 100 pack for $12 locally.

This is so right.  I forgot to remove the jumper from the WD20EARS drive I just RMA'd (oops), so now I'm out of jumpers.  I found a 24 pack of jumpers online here for $1.29 and they wanted $6 to ship it to me.  Now I'm looking for a local source of supply.
Link to comment

Who knew I would get a discussion like this; doesn't matter, it's very interesting but also a bit frustrating for me.

 

The only question I have left: is it okay to buy 2TB drives (the good ones) or should I really go for 1TB max? In the end, having 29TB or 58TB is a big difference. If I would go for the first option, then I would already need to fill up 9 of the 30 slots. But if that means that the system will work really better, then I would really consider this.

 

So I could use any advice on this; the other components are ready to order. Btw, I also got a reply from Supermicro:

 

Suppose this should work due to it has slots for these cards, however we did not test such with Asus board.

 

So again, it should work.......

I would purchase the 2TB drives.  I would stay away from the "advanced format" drives unless they have the "jumper" that will make them more efficient in the unRAID server.

 

Joe L.

(unless there is a really good deal on a 1.5TB drive.  The smaller ones just do not make sense today cost-wise.   I have a mixture of 2TB and 1.5 TB drives in my newer server, and multiple IDE drives from 400Gig - 750Gig in my older one... along with a pair of SATA 1TB drives... the biggest available just a few years ago.)

 

Like I said, it really depends on how much space you need, which you didn't specify originally.  Too many new drives can be worse than any URE problems because new drives represent the potential for a total drive failure.  You can withstand one total drive failure, but not two.  Remember, the industry trend seems to be that 3.1% of new drives fail within three months.  So 12 drives means a 10% chance of having two failed drives within a quarter (5% chance of a double failure within a year).  Again, there are much smaller odds that both fail at the same time, assuming you buy different brands from different vendors.  But if you buy all the drives from the same place, you probably have a 10% chance of a double failure.

 

I would never build a new unRAID with more than 12 new drives.  So at 1TB per pop, if you see yourself needing more than 11TB in the next quarter, then you will need some 2TB drives, but stay away from Seagates.  Samsung F2s (which have some SMART firmware bugs I think, but nothing major, and a lower load/unload cycle rating), F3s, and WD EADS are your best options.  And most people seem to be having good luck with jumpered EARS drives, but I'm a little iffy on them as they are too new a technology.  All these have a 10^15 URE rating.

 

I also didn't realize unRAID could rebuild a parity error on the fly.  My understanding of RAID is that if the array encountered the equivalent of a URE (UBE), the array would be degraded and the entire array needs to be rebuilt.  I thought this was the same for unRAID, but apparently it can just partially rebuild.  RAID4 is really not used much, so I don't fully understand how it operates (no experience), and Joe says unRAID is similar to RAID4.  In that scenario, your risk is now much less, as your rebuild is minimal.

 

This is what I've always done in RAID, and now with unRAID.  First, no build with more than 12 new drives.  I always start with smaller drives (1TB), add until I hit my 12 drive max, and then start replacing with bigger drives as needed.  I also sell my drives and replace them with new ones around the 2 year mark.  Typically, the drives are still reporting a clean SMART, still have a year or more of factory warranty, and so I can still get a decent price for them on eBay (list them with a SMART report).  8% of drives fail when they are 2 years old, so I try to get rid of my drives at their 2 year birthday.  So if you have the same 12 disks from your build, you have a 63% chance of a drive failing between 2-3 years of age.  Then you keep those same drives (replace the failed one) and 9% of drives fail when they are 3, so between years 3-4, you are looking at a 67% chance of another drive failure.
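
Those percentages follow from the per-drive rates if you assume the failures are independent; quick math:

def p_any_failure(n_drives, p_per_drive):
    # chance that at least one of n independent drives fails in the period
    return 1 - (1 - p_per_drive) ** n_drives

print(f"{p_any_failure(12, 0.08):.0%}")   # ~63% chance of a failure in years 2-3
print(f"{p_any_failure(12, 0.09):.0%}")   # ~68% chance of a failure in years 3-4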

 

Here is a good paper by Google on failure rates:

http://docs.google.com/viewer?a=v&q=cache:q7nqEvSwhZIJ:labs.google.com/papers/disk_failures.pdf+3%25+hard+drive+failure&hl=en&gl=us&pid=bl&srcid=ADGEEShvzuvEYXdh1yhO0lkXfTbeRY56hp7UC0GJysAZRtxmUu_ftFJlKMx08F_72uDEXlI4tiFJme5OLduNdpGRYeMnrOzabAUUpmf5Z_idY2urJldpF9ONfykBSbHIow18LW4k7EdO&sig=AHIEtbRzYROfTfJxBF4S0K9uX_31Dwra2A

 

Notice that heavy utilization of a drive increases the 3 month failure rate to 10%.  Likewise, light utilization increases it to 4%.

 

I also disagree that it is too expensive to buy 1TB.  I recently added a 1TB Samsung for $.0467/GB.  The current deals on EARS drives have been $95 for 2TB, or $.0475/GB.  So my 1TB addition was actually cheaper per GB.  As with anything, you just have to keep an eye out for deals.  It is true that currently I'm not seeing anything better than $.055/GB for 1TB models.  New Egg recently had (may still have) 1.5TB models for $70, or $.0467/GB.

Link to comment

So KYThrill, what 1 TB drives do you buy?  The EADS models?  Because there are 1 TB EARS out now as well.  I think there are even 500 GB EARS.

 

While the price per GB of a 1 TB may be cheaper or equivalent to a 2 TB, the overhead cost is still higher since you will require twice as many SATA ports, drive bays, etc and you will use twice as much power.

Link to comment

I also didn't realize unRAID could rebuild a parity error on the fly.  My understanding of RAID is that if the array encountered the equivalent of a URE (UBE), the array would be degraded and the entire array needs to be rebuilt.  I thought this was the same for unRAID, but apparently it can just partially rebuild.  RAID4 is really not used much, so I don't fully understand how it operates (no experience), and Joe says unRAID is similar to RAID4.  In that scenario, your risk is now much less, as your rebuild is minimal.

That is not exactly what I said.  I said a "read" error on the parity drive during a parity check is treated exactly like a read error on any other drive at any time.

 

The post where Tom @ limetech described it is here:

http://lime-technology.com/forum/index.php?topic=6948.msg73899;topicseen#msg73899

 

He said:

The driver operates on disks in terms of "columns", that is, each disk is a column, and it so happens that column 0 is parity, but during a parity check the driver has no knowledge really of this.  It just issues reads for each enabled disk in the column, waits for them all to finish, and then xor's all the data together to see if the result is 0.  If a read fails, then there's a different code path where it instead xors all the data together except for the failing column; it then writes this resultant data to the failing column disk.  So whether the column that failed is parity or a data disk, makes no difference.  The applicable code is on lines 1171-1230 in unraid.c.

 

This would then indicate a regular parity check will self-heal those un-readable sectors, as long as the SMART firmware on the disk does its proper task of re-allocating the sector when written after a URE occurs and it reports a "read" failure to the OS.

 

The driver does not check "parity" on every read of every sector... It could not do that and still allow the other disks to spin down, or maintain any kind of performance.  The logic described above is only used if a drive, parity OR data, reports a "read" failure.

 

Joe L.

Link to comment
I've actually scanned through that paper before.  I wish they would have published the data by manufacturer so we would know which drives performed the best.  But they purposefully decided to mask the manufacturer info from the results (most likely to avoid any legal liability) and only do analysis on the aggregate data.  This time I read it in more detail.  There is some really interesting stuff in there.

 

Before being put into production, all disk drives go through a short burn-in process, which consists of a combination of read/write stress tests designed to catch many of the most common assembly, configuration, or component-level problems. The data shown here do not include the fallout from this phase, but instead begin when the systems are officially commissioned for use. Therefore our data should be consistent with what a regular end-user should see, since most equipment manufacturers put their systems through similar tests before shipment.
I believe that most unRAID users purchase OEM drives which means that the equipment manufacturer's stress test would not be performed.  So if you're using an OEM drive and are hoping to see similar failure rates then you'd better be doing some burn-in testing.  Even more of a reason to run preclear on a new OEM drive.

 

One of our key findings has been the lack of a consistent pattern of higher failure rates for higher temperature drives or for those drives at higher utilization levels.
So their findings indicate that high utilization is not correlated to higher failure rates.  Doesn't this disagree with your point KYThrill?  I believe you are trying to make the point that the more you use (i.e. read from) a drive the sooner you are likely to encounter an error which will eventually lead to a failure.
Link to comment
One of our key findings has been the lack of a consistent pattern of higher failure rates for higher temperature drives or for those drives at higher utilization levels.
So their findings indicate that high utilization is not correlated to higher failure rates.  Doesn't this disagree with your point KYThrill?  I believe you are trying to make the point that the more you use (i.e. read from) a drive the sooner you are likely to encounter an error which will eventually lead to a failure.

 

My experience, from more than 30 years working in the IT industry, is that most drive failures are age-related, not use-related.

 

This concentration on read errors seems to be ignoring any write activity.  If a failure really was going to depend on the number of reads, being, in some way, wear related (this itself is illogical - can I read a single sector 10^15 times before I would expect it to exhibit a hard error, as long as I don't read any other sectors???  If I spread those reads over two sectors, does a failure become more, or less, likely???), does the number of writes have no effect on the likelihood of a read error?

Link to comment

See, you are misunderstanding URE and that wouldn't be an example of what you would see.  Let's look at WD drives.  A WD drive is trying to read a bit of data and it encounters an error.  The first thing it will do is kick in its ECC and try to correct the error.  If it is successful, you do not get a URE. If it fails once, it will keep trying several times.  If all attempts fail, the drive will then log a URE error and pass that on to the system where it is installed.  The WD drive will then mark that sector as bad, and reallocate the data from the sector.  Now, if you have something like unRAID, it will attempt to recover that unreadable bit from parity and write it to the new reallocated sector.  If successful, you lose nothing and the whole process is transparent to you.  On your next smartctl, you would see a reallocated sector in SMART.

 

Thanks for clearing that up.  Now that you've defined it clearly, I am much less concerned.  So a URE is basically a bad read (or write) that is typically handled by the drive itself via retries, but could result in an unrecoverable read that gets back to the OS.

 

If you didn't have unRAID, you could lose that bit and potentially whatever file/folder it is attached to.  Or other software you can run may completely correct the error and you end up losing nothing.  But back to unRAID, if the error was a read from your parity drive, I'm not certain what would happen.  Would unRAID rebuild parity on the fly for that one bit?  I thought it required a complete parity rebuild if the parity drive encounters a URE, but maybe it doesn't.  Joe?

 

Understood - I've had this happen before.  Spinrite is a tool I used to use periodically when I had such an error.  It did a pretty good job of reading all or most of the bits in a bad sector and using SMART to reallocate the sector.

 

As far as correcting parity on a read, the only time parity is normally read (unless the array is simulating a disk, in which case correction can't happen) is, ironically, when a disk is being written to.  unRAID first reads the block about to be overwritten from the data disk and the corresponding block from parity.  With these two blocks and the knowledge of the block of data it wants to write, unRAID can recompute the new parity block and then write the new data and the new parity block to the respective disks.  So if the parity block could not be read due to a bad sector (unhandled URE), it could read the corresponding data blocks from all of the other drives and determine what parity should have been.
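
In other words, a normal write only needs the old data block and the old parity block; the arithmetic is just XOR again.  A sketch of the principle (my own illustration, not the actual driver code):

# Read-modify-write parity update: new_parity = old_parity XOR old_data XOR new_data.
def update_parity(old_parity: bytes, old_data: bytes, new_data: bytes) -> bytes:
    return bytes(p ^ o ^ n for p, o, n in zip(old_parity, old_data, new_data))

# Example with one other "disk" in the array: parity currently covers other_disk plus old_data.
other_disk = bytes([0x0f, 0xf0])
old_data   = bytes([0x12, 0x34])
new_data   = bytes([0x56, 0x78])
old_parity = bytes(a ^ b for a, b in zip(other_disk, old_data))
new_parity = update_parity(old_parity, old_data, new_data)
assert new_parity == bytes(a ^ b for a, b in zip(other_disk, new_data))   # parity stays consistent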

 

If it does not and must rebuild the entire parity drive, then you end up with a situation where parity is bad and needs a rebuild.  But your other drives, statistically, are primed to encounter a URE too.  There is no valid parity (you started a rebuild, invalidating the old parity) from which to correct their URE errors.  So you potentially lose whatever that bit is attached to.  There may be other manual or software based methods to try to correct the problem, but it will be a hassle and in the end, may not work.

 

Now other drives do not automatically mark it as a bad sector.  They will internally try to determine if the bit is really bad, and can determine it is not and won't reallocate the sector.  But my understanding on WD drives is a URE will always result in a reallocated sector.

 

 

It does, so this is not relevant.

 

Please be careful with how you say things like this.  These UREs happen based on a probability.  If one occurs, another could occur the same day or 20 years from now - anytime.  You can't say "you will have another 6-7 months before you potentially encounter another URE".  I enjoy debates like this one, but you lose credibility with this type of inaccuracy.

 

It is not inaccurate.  It is inaccurate to assume that in more cases than not, the statistics will not play out as defined (unless there is a fault in determining the statistics).  If I say that 50% of the time a coin toss will come up heads, it is inaccurate to assume that every time I flip it, it will be tails.  Sure, the next time I flip it, it could be tails.  I could flip tails 10 times in a row.  But statistically, as N approaches infinity, the odds are 50/50.  To say I lose credibility by saying that after 100,000 coin flips, 50% of them will be heads, because it is possible all 100,000 could have been tails, is what I find inaccurate.  And that seems to be what you are suggesting.

 

But you are not saying that AFTER 100,000 flips 50% would have been heads.  With such a large number of samples that is going to be very close to true some humongous percentage of the time.  You are trying to say that AFTER an unlikely event occurs, the chances of that unlikely event occurring again will be reduced for some period of time.  This is just not true.

 

If you flip a coin 10 times, 100 times, or 1,000,000 times, no matter what the results, the chance of the next flip being heads is still 50%.  Same is true with your UREs.  Each read has an equal chance of producing a URE - a 1 in 10^14 chance.  You would never get a free ride of "6-7 months" after you got one.  The idea that you are somehow "using up" something based on incrementing the number of trials is just not true.  If I ran a preclear that resulted in 10^14 I/Os and got not a single URE, the chance of the next I/O producing a URE is still 1 in 10^14 - the same as if I hadn't run a preclear at all.
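
A toy calculation makes the "no free ride" point concrete (using an exaggerated error rate of my own choosing so the numbers aren't microscopic):

p = 1e-3           # exaggerated per-read error rate, purely for illustration
n_clean = 1000     # reads already completed without a single error ("a clean preclear")
p_clean_run = (1 - p) ** n_clean                 # P(the clean run happened)
p_clean_run_then_error = p_clean_run * p         # P(clean run AND an error on the very next read)
print(p_clean_run_then_error / p_clean_run, p)   # conditional rate still equals p: no credit for the clean run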

 

True, you could have back to back URE's, but the probability of that happening is less than the probability of having them spaced further apart.

 

The chances of having 2 UREs in 2 trials is not as great as the chances of having 2 UREs in 10,000,000 trials.  But the chance of any one trial producing a URE is exactly the same - be it the 1st, 2nd, 99th, or millionth - regardless of what has happened in the past.

 

The drive rating is a probability, and in the absence of contrary evidence, you must assume that the probability is your most likely scenario and define your risk based on the most likely scenario.  It is more likely you will have URE errors far apart (manufacturers say 10^14 or 10^15 reads) than you will have them back to back.

 

It is not likely at all that your failures are going to follow any sort of predictable pattern.  True, over the long term the average number of I/Os between UREs will approach a theoretically computed number, but the next URE could come sooner or later - you have no idea.

 

Miscorrection is when a drive thinks it has recovered an error (read or write) with its ECC.  However, the drive actually corrected to the wrong value.  So now you have corrupted data on your drive, and your drive believes that it is the correct data and lets the wrong data come back.  The problem can be compounded if the error is because of a bad sector.  The drive thinks that the error is corrected (but it isn't), so the drive does not mark that sector as bad.  Then you potentially can lose everything in that sector because it did not get reallocated.  But yes, drive manufacturers do not publish that spec, but verify the probability as 10^21 bits.

 

This is the truly scary scenario that I thought you were saying happened 1 in 10^14 times.  1 in 10^21 is not nearly as scary.  That is ten million times less likely than 1 in 10^14.  It is extremely unlikely that any of us would ever see one!  But I still wonder if 1 in 10^21 is actually correct.
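
To put 1 in 10^21 in perspective, with round numbers of my own choosing for the workload:

tb_read_per_year = 200     # assumed: monthly parity checks on a mid-size array plus normal use
years = 10
bits_read = tb_read_per_year * years * 1e12 * 8
print(bits_read * 1e-21)   # ~1.6e-05 expected silent miscorrections over the whole decade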

 

Final thought on the topic - we all know that weak spots on drives can and do fail.  Drives can and do try to fix these failures by performing retries and otherwise looking out for failed or failing sectors, and use SMART to reallocate them.  But unRAID provides the additional protection that even if sectors start to fail, it can correct them and keep your data safe.

 

None of this thread has any impact on whether you should run preclears or not.  If you run one or don't run one, the next I/O has the same chance of producing a URE.  Remember that UREs are not the only force at play.  The fact that drives have a greater failure rate early in their lifespan should encourage users to give them a thorough test before trusting them with your data.  Before preclear it was fairly common to have users reporting SMART errors and drive failures shortly after putting drives in service.  I have yet to see anyone say they ran preclear and shortly after the drive failed.

Link to comment
So their findings indicate that high utilization is not correlated to higher failure rates.  Doesn't this disagree with your point KYThrill?  I believe you are trying to make the point that the more you use (i.e. read from) a drive the sooner you are likely to encounter an error which will eventually lead to a failure.

 

No. Not at all.  First, I never claimed the failure rate was higher or lower.  Failure rate does not equal URE rate.  I merely stated what the published rate of URE was for some drives and that you are statistically likely to encounter a URE on each drive just in the process of launching your unRAID, if you use 10^14 2TB drives.

 

Second, the process I described (pre-clear, build parity, fill disks with data) would not be considered high utilization.  The Google paper doesn't include their definition of high utilization.  However, I think a better example of high utilization would be the Microsoft test I mentioned, where they read 94.5 TB from the drive in the form of a 100GB file.  Those were 250GB drives, so they did a large read (40% of the drive) from the drive 945 times using a process that ran 24/7 (it took a month to complete the test).  I would consider that high utilization.

 

The Google data is compiled from the hard drives used in their office PCs, their web servers, etc.  Basically, Google has a massive base of hard drives that are managed by their IT staff.  They decided to do a study, tracked the life and performance of this massive number of HDDs under their control, and then reported their findings.  So "high utilization" would be from Google's perspective.  So I would guess their web servers are probably what they are calling "high utilization".

Link to comment

Archived

This topic is now archived and is closed to further replies.

