Reallocated sectors on Parity



Parity took longer than normal and the dashboard shows a thumbs down. I've read some posts saying 4 reallocated sectors isn't bad and that I should watch it after the next parity check to see if it goes up. Is that the general consensus?

 

Parity attached to port: sdh

ID#  ATTRIBUTE NAME            FLAG    VALUE  WORST  THRESH  TYPE      UPDATED  FAILED  RAW VALUE
  1  Raw Read Error Rate       0x002f  200    200    051     Pre-fail  Always   Never   11776
  3  Spin Up Time              0x0027  168    141    021     Pre-fail  Always   Never   8566
  4  Start Stop Count          0x0032  098    098    000     Old age   Always   Never   2874
  5  Reallocated Sector Ct     0x0033  200    200    140     Pre-fail  Always   Never   4
  7  Seek Error Rate           0x002e  200    200    000     Old age   Always   Never   0
  9  Power On Hours            0x0032  059    059    000     Old age   Always   Never   30005
 10  Spin Retry Count          0x0032  100    100    000     Old age   Always   Never   0
 11  Calibration Retry Count   0x0032  100    100    000     Old age   Always   Never   0
 12  Power Cycle Count         0x0032  100    100    000     Old age   Always   Never   314
192  Power-Off Retract Count   0x0032  200    200    000     Old age   Always   Never   144
193  Load Cycle Count          0x0032  196    196    000     Old age   Always   Never   14189
194  Temperature Celsius       0x0022  121    106    000     Old age   Always   Never   31
196  Reallocated Event Count   0x0032  196    196    000     Old age   Always   Never   4
197  Current Pending Sector    0x0032  066    066    000     Old age   Always   Never   65527
198  Offline Uncorrectable     0x0030  200    200    000     Old age   Offline  Never   0
199  UDMA CRC Error Count      0x0032  200    197    000     Old age   Always   Never   11
200  Multi Zone Error Rate     0x0008  200    200    000     Old age   Offline  Never   0
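
For anyone wanting to reproduce a report like this, it comes from smartctl at the console; a minimal sketch, assuming the drive really is still at sdh (device letters can change between boots):

    smartctl -a /dev/sdh    # full SMART report: health, attributes, error and self-test logs
    smartctl -A /dev/sdh    # just the attribute table shown above

unRAID runs the console as root, so no sudo is needed.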

Link to comment

It is not usually good to have any, but I usually keep an eye on it, and if it remains constant I don't worry about it.  I have a drive that has had 16 reallocated sectors for about 5 years; the count has remained the same and I haven't encountered any other errors on the drive.

 

You may want to run short and long SMART self-tests to see if they identify any other errors with the drive.
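
For example (a sketch, again assuming the drive is still at sdh):

    smartctl -t short /dev/sdh     # short self-test, usually about two minutes
    smartctl -t long /dev/sdh      # extended self-test, reads the whole surface; can take hours
    smartctl -l selftest /dev/sdh  # view the results once a test completes

The long test is the one most likely to turn up additional bad sectors, since it reads every sector.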

Link to comment

I personally would consider the Current Pending Sector count (#197) a bigger matter of concern.  This is the information on #197 from the S.M.A.R.T. article on Wikipedia:

 

"Count of "unstable" sectors (waiting to be remapped, because of unrecoverable read errors). If an unstable sector is subsequently read successfully, the sector is remapped and this value is decreased. Read errors on a sector will not remap the sector immediately (since the correct value cannot be read and so the value to remap is not known, and also it might become readable later); instead, the drive firmware remembers that the sector needs to be remapped, and will remap it the next time it's written.  ...  "

 

I don't know if your drive even has enough spare sectors to remap that many sectors marked for reallocation!

 

I personally would replace that drive ASAP.  You could then test it by running at least three preclear cycles on it (if you want to see if it might be recoverable). 

Link to comment

I think this is the most pending sectors I have ever seen.

 

Interesting that it is so close to 64K but just barely (9) under. If you add the 4 actual reallocations, it is even closer (5 under).
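
Checking that arithmetic in plain shell (nothing drive-specific here):

    echo $((65536 - 65527))        # 9 -- the pending count is 9 short of 2^16
    echo $((65536 - 65527 - 4))    # 5 -- adding the 4 reallocated leaves it 5 short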

 

Maybe just a coincidence but I find it interesting.

 

As the parity drive, you are really not risking any data if it crapped out (although your ability to recover if another drive failed would be impacted). Although replacing the drive is very likely the right answer, unless you have one on hand I'd likely do a parity check and see what the SMART report looks like afterwards. Pending sectors sometimes clear on their own, and other times don't cause any read errors that the OS sees.

 

I've heard it said that a read will not reallocate a sector, but that has not been my experience. If the drive has difficulty reading but is successful it may reallocate the sector without a write. My guess is the parity check would see quite a lot of new reallocated sectors and fewer pending. But that is just an educated guess.

 

Final point - it is interesting that the normalized values are showing 66 and the failure threshold is very far away at 0. I wonder how many pending sectors there have to be before the manufacturer claims the drive is "FAILING NOW"?
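
You can ask the drive directly for its verdict; a quick check, again assuming sdh:

    smartctl -H /dev/sdh    # overall-health self-assessment: PASSED or FAILED

An attribute only trips when its normalized VALUE falls to or below THRESH, and the report above shows a THRESH of 000 for #197 - so on this firmware the pending count may never be enough to fail the drive on its own.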

Link to comment

Clearly the drive has issues.    I'm not sure that pending sector count is real ... it seems more like an error in the reporting, since I doubt there are that many spare sectors available.    This may be a firmware bug in the S.M.A.R.T. implementation on the drive => but in any event it's clearly BAD and indicates the drive should be replaced.

 

Link to comment

Clearly the drive has issues.    I'm not sure that pending sector count is real ... it seems more like an error in the reporting, since I doubt there are that many spare sectors available.

 

I doubt the disk is going to list that number based on how many spare sectors it has. It likely will just try reallocating them until it runs out. No reason to limit that number based on how many spares are available as in that case it wouldn't be a true value for that field.

Link to comment

Clearly the drive has issues.    I'm not sure that pending sector count is real ... it seems more like an error in the reporting, since I doubt there are that many spare sectors available.

 

I doubt the disk is going to list that number based on how many spare sectors it has. It likely will just try reallocating them until it runs out. No reason to limit that number based on how many spares are available as in that case it wouldn't be a true value for that field.

 

Still find it interesting that with that many pending sectors the drive doesn't report it is failing now, or at least darn close.

Link to comment

Clearly the drive has issues.    I'm not sure that pending sector count is real ... it seems more like an error in the reporting, since I doubt there are that many spare sectors available.

 

I doubt the disk is going to list that number based on how many spare sectors it has. It likely will just try reallocating them until it runs out. No reason to limit that number based on how many spares are available as in that case it wouldn't be a true value for that field.

 

If the number exceeds the actual number of spare sectors, S.M.A.R.T.  should FAIL the drive.    I do NOT think that's a "real" number.    ... and if by chance it is then there's still something wrong with the S.M.A.R.T. firmware since it still passes the drive.

 

Link to comment

Pending sectors are often successfully read and then the count is decreased and no reallocation occurs. A power issue could have been the cause or the disk is bad.

 

Generally agree. 64K pending sectors sounds very suspect. A firmware bug? Maybe, but it certainly seems a little arrogant to insist it must be. More likely an unusual event (like a power issue) caused the firmware to pend a large block of sectors. I can't help but feel these will mostly clear given an opportunity (like a parity check or preclear). The parity check is safer, because as bad as this looks, there is still a pretty good chance parity would be usable to do a reconstruction if another disk were to fail. Pull it out to do a preclear and you absolutely lose any ability to reconstruct.

 

I haven't seen thousands, but have seen over a hundred pending sectors clear with a parity check with no actual reallocations and no apparent ill-effects. But even if they all clear, if the actual reallocated sectors starts to climb the disk's days are numbered anyway.
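
If you go that route, a rough way to watch the interesting counters during the check (device letter assumed):

    watch -n 600 "smartctl -A /dev/sdh | grep -E '^ *(5|196|197|198) '"

That re-reads attributes 5, 196, 197 and 198 every ten minutes, so you can see whether pendings are clearing, reallocating, or still climbing.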

Link to comment

A number this close to 2^16 makes me wonder if somehow the drive firmware mistakenly decremented the pending sector count until it "went negative".

 

Possible but unlikely IMO. Unless this is some off brand (like Samsung or Toshiba) with which we've had little or no experience, I highly doubt a bug like this would just happen in normal operation of the drive. The drive would only decrement the count if a sector that was previously marked pending were found to be good. Still think it is more likely that some "event" (power or otherwise) occurred that caused the drive to appear to fail a block of reads and mark them all pending. Might have been a 32Meg block (64K sectors = 32Meg).
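
(The 32Meg figure checks out, assuming 512-byte sectors:

    echo $((65536 * 512 / 1024 / 1024))    # 32 -- 64K sectors x 512 bytes = 32 MiB

)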

 

I'm kind of curious what brand and model drive this is.

Link to comment

Wow, a lot of valuable insight here from all the experts. Thanks a ton, guys.

 

Couple other bits of information to add. This last parity check took an unusual 4.5 days to complete, which is very odd to me. At times it would get very low disk write speeds, and at other times very high (for me) at 11 MB/s.

 

So the consensus is to rerun parity?

Link to comment

... So the consensus is to rerun parity?

 

I'd replace the drive before doing anything else.  You've got at least 3 1/2 years use out of it (at 24/7 operation -- more if it's not always on) ... and it's clearly got issues => so bite the bullet and get a new drive.

 

Link to comment

... So the consensus is to rerun parity?

 

I'd replace the drive before doing anything else.  You've got at least 3 1/2 years use out of it (at 24/7 operation -- more if it's not always on) ... and it's clearly got issues => so bite the bullet and get a new drive.

 

+1    I would replace the drive and figure out later whether it can be recovered.  If it can be, you could keep it as a spare.  But I know I would always be worried that it would fail again...

Link to comment

Wow, a lot of valuable insight here from all the experts. Thanks a ton, guys.

 

Couple other bits of information to add. This last parity check took an unusual 4.5 days to complete, which is very odd to me. At times it would get very low disk write speeds, and at other times very high (for me) at 11 MB/s.

 

So the consensus is to rerun parity?

 

Before doing anything else, I'd suggest getting SMART reports on all drives in the array to make sure you don't have any other disks with signs of failure. You should also post a syslog taken while the drive was experiencing the very slow I/O.
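
Something like this will grab them all in one go (a sketch - the glob assumes all your array drives show up as /dev/sd?):

    for d in /dev/sd?; do
      smartctl -a "$d" > "smart_$(basename "$d").txt"    # one report file per drive
    done

Then post the reports for any drive showing non-zero reallocated or pending counts.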

 

If indeed the parity drive is the only one with problems and is causing the very slow parity checks and writes, it is the last nail in the coffin for that drive. What brand is it?

 

I was interested in what the parity check would do to the SMART attributes - not really thinking that it would vindicate the drive, but just in an attempt to better understand the behavior. This is an unusual drive situation and I thought we'd learn something. But this was under the apparently mistaken impression that the drive was otherwise behaving pretty normally. Now seeing that there are very slow I/Os, it certainly starts to look like the pending sectors could very well be real and the source of the slowdowns.

 

If you are holding off buying a replacement, I would proceed with your purchase. Run SMART reports on all drives and post them. Don't do any parity checks or rebuild parity until we've had a look.

 

BTW, 11 MB/sec write is very slow. You should be getting 2x, 3x, or even faster.
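
If you want a quick spot-check of raw read speed on the parity drive (the timings are rough, and sdh is an assumption):

    hdparm -t /dev/sdh    # times buffered sequential reads; run it two or three times

A healthy drive should report sequential reads far above 11 MB/sec.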

 

 

Link to comment

I wonder if a SMART long test would cause the drive to correct its numbers, since it should cause every sector to be examined.  At least, that seems like how a decent programmer would do it, but I'm not convinced that the best programmers work on the SMART part of the firmware.  There are too many silly or inconsistent behaviors.  That Pending sector number looks like a bug to me.

Link to comment

... That Pending sector number looks like a bug to me.

 

Agree ... as I noted earlier.  May have been caused by some electrical interference or perhaps a power shutdown in the midst of an operation, but nevertheless it seems likely it's a bug.

 

Certainly won't hurt to run a long S.M.A.R.T.  test to see if that clears it up => but with a drive of that age I'd still be inclined to simply replace it.

 

Link to comment

Couple other bits of information to add. This last parity check took an unusual 4.5 days to complete. which is very odd to me. At times it would get very low disk write speed, and other times very high (for me) at 11Mb/s.

 

RobJ / garycase - not sure if you saw this. A 4.5 day parity check certainly indicates a problem, and 11 MB/sec also looks bad. My first concern is whether he has another disk with issues. If he does, I'd rather not risk a parity failure with a long SMART test. I have requested SMART reports from his other disks before proceeding.

Link to comment

A long S.M.A.R.T.  test isn't likely to cause any issues, and MAY clear the (likely erroneous) pending sector count.

 

But I agree there's an even BETTER thing to do first ==> be sure your backups are up-to-date.    If you don't have backups, then NOW is a good time to correct that oversight ... at least copy anything you'd be upset about losing in the process of resolving this.

 

... but with the status shown for the parity drive, getting it replaced is high on the list of need-to-get-dones => otherwise you likely won't be able to recover from any other failures.

 

Link to comment
1 month later...

I am seeing a similar issue just after upgrading to v6. Not sure if this is a coincidence, but I do know the drive was showing fine on v5. This is my parity drive as well.

 

ID#  ATTRIBUTE NAME            FLAG    VALUE  WORST  THRESH  TYPE      UPDATED  FAILED  RAW VALUE
  1  Raw Read Error Rate       0x002f  200    200    051     Pre-fail  Always   -       2
  3  Spin Up Time              0x0027  119    116    021     Pre-fail  Always   -       7025
  4  Start Stop Count          0x0032  097    097    000     Old age   Always   -       3746
  5  Reallocated Sector Ct     0x0033  200    200    140     Pre-fail  Always   -       1
  7  Seek Error Rate           0x002e  200    200    000     Old age   Always   -       0
  9  Power On Hours            0x0032  069    069    000     Old age   Always   -       22768
 10  Spin Retry Count          0x0032  100    100    000     Old age   Always   -       0
 11  Calibration Retry Count   0x0032  100    100    000     Old age   Always   -       0
 12  Power Cycle Count         0x0032  100    100    000     Old age   Always   -       169
192  Power-Off Retract Count   0x0032  200    200    000     Old age   Always   -       142
193  Load Cycle Count          0x0032  186    186    000     Old age   Always   -       43355
194  Temperature Celsius       0x0022  107    103    000     Old age   Always   -       40
196  Reallocated Event Count   0x0032  199    199    000     Old age   Always   -       1
197  Current Pending Sector    0x0032  200    200    000     Old age   Always   -       0
198  Offline Uncorrectable     0x0030  200    200    000     Old age   Offline  -       0
199  UDMA CRC Error Count      0x0032  200    200    000     Old age   Always   -       2
200  Multi Zone Error Rate     0x0008  200    200    000     Old age   Offline  -       0

 

I went ahead and ordered a new drive, but I won't be able to install it for over a week. I leave Monday for a trip :( I just hope all will be fine while I am gone and it holds out until I get home on the 10th!
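
If you want a baseline to compare against when you get back, save a dated report before you leave (sdX is a placeholder for your parity device):

    smartctl -a /dev/sdX > smart_parity_$(date +%F).txt    # dated snapshot for later comparison

Any growth in attributes 5 or 197 while you're away will then be easy to spot.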

Link to comment
