
Drives, sector counts, replacements, and parity


pyrater


Can someone confirm / shed some light on this for me? The way I understand it is:

 

If the hard drive tries to write to a sector and is unable to do so, it just moves the data to the next sector (hence "reallocated"). The big problem is when it tries to remap a sector and cannot read it (pending sector count).

 

"When the hard drive finds a read/write/verification error, it marks this sector as "reallocated" and transfers data to a special reserved area (spare area). This process is also known as remapping, and "reallocated"sectors are called remaps." 

 

"Current Pending Sector Count indicates the current count of unstable sectors (waiting for remapping). The raw value of this attribute indicates the total number of sectors waiting for remapping"

 

So basically, if you get reallocated sector counts and they stop or "normalize", then the drive is fine and just starting to show some age/wear. Whereas it's a pending failure when you get a lot (100+) of reallocated AND pending sector counts... correct???

 

Basically:

Reallocated = I found issues and fixed them

Pending = I have issues I can't fix or haven't fixed yet.

 

So when do you pull the drive? Isn't the point of parity to be able to handle a drive failure? Do you pull it when it fails, or before... why? The data is going to the parity and the drive, so there is no risk of loss, correct?

 

For example, I have dual parity, so I can handle 3 drives dying at once before I lose data...?

1 hour ago, pyrater said:

So when do you pull the drive? Isn't the point of parity to be able to handle a drive failure?

Keep in mind that not all drive failures are predictable by SMART data or otherwise. Sometimes a drive just quits out of the blue. When that happens, would you rather know the rest of your drives are perfect and able to recreate the failed drive? Dual parity helps, but what happens if a drive fails that you didn't anticipate, and you have a couple of marginal drives with pending sectors still in the array?

1 hour ago, pyrater said:

Can someone confirm / shed some light on this for me? The way I understand it is:

 

If the hard drive tries to write to a sector and is unable to do so, it just moves the data to the next sector (hence "reallocated"). The big problem is when it tries to remap a sector and cannot read it (pending sector count).

 

Not exactly. Each drive has a set of spare sectors (~5000 I think). If the drive detects a bad sector on a read, it will mark that sector as pending. The next time that sector is written, the drive will examine the sector one last time, and if it is still bad or marginal, it will remap it. Let's say sector #123456 is remapped. The next time that sector is requested to be read or written, the spare sector will be used. The original sector #123456 is now inaccessible. This is how it works, more or less.
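As a toy model (my own simplification in Python, not any vendor's firmware logic), the pending-then-reallocated flow looks roughly like this:

class Drive:
    def __init__(self, spare_sectors=5000):
        self.pending = set()       # sectors flagged on a failed read
        self.remap_table = {}      # original LBA -> spare sector number
        self.spares_left = spare_sectors

    def read(self, lba, media_error=False):
        if lba in self.remap_table:
            return f"read spare {self.remap_table[lba]}"
        if media_error:
            self.pending.add(lba)  # Current_Pending_Sector goes up
            return "read error"
        return f"read {lba}"

    def write(self, lba, still_bad=False):
        if lba in self.pending:
            self.pending.discard(lba)            # pending count drops either way
            if still_bad and self.spares_left:   # sector really is bad: remap it
                self.spares_left -= 1
                self.remap_table[lba] = self.spares_left  # Reallocated_Sector_Ct goes up
                return f"write remapped to spare {self.remap_table[lba]}"
        return f"write {self.remap_table.get(lba, lba)}"

d = Drive()
d.read(123456, media_error=True)   # read fails -> sector goes pending
d.write(123456, still_bad=True)    # re-test on write fails -> reallocated
print(d.read(123456))              # later reads are silently redirected to the spare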

 

1 hour ago, pyrater said:

 

"When the hard drive finds a read/write/verification error, it marks this sector as "reallocated" and transfers data to a special reserved area (spare area). This process is also known as remapping, and "reallocated"sectors are called remaps." 

 

"Current Pending Sector Count indicates the current count of unstable sectors (waiting for remapping). The raw value of this attribute indicates the total number of sectors waiting for remapping"

 

So basically, if you get reallocated sector counts and they stop or "normalize", then the drive is fine and just starting to show some age/wear. Whereas it's a pending failure when you get a lot (100+) of reallocated AND pending sector counts... correct???

 

Basically:

Reallocated = I found issues and fixed them

Pending = I have issues I can't fix or haven't fixed yet.

 

So when do you pull the drive? Isn't the point of parity to be able to handle a drive failure? Do you pull it when it fails, or before... why? The data is going to the parity and the drive, so there is no risk of loss, correct?

 

 

unRAID will kick a disk on a failed write. Parity will continue to be accurate. So a rebuild will be accurate.
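To illustrate why the rebuild stays accurate: single parity is just a byte-wise XOR across all the data drives, so any one missing drive can be recomputed from the survivors plus parity. A minimal sketch in Python (not unRAID's actual code, just the idea):

from functools import reduce

def compute_parity(drives):
    # XOR the corresponding bytes of every drive together
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*drives))

def rebuild_missing(surviving_drives, parity):
    # the missing drive is the XOR of everything that's left, parity included
    return compute_parity(surviving_drives + [parity])

disk1 = b"hello world!"
disk2 = b"media files "
disk3 = b"more media.."
parity = compute_parity([disk1, disk2, disk3])

# disk2 "fails"; rebuild it from disk1, disk3 and the parity drive
assert rebuild_missing([disk1, disk3], parity) == disk2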

 

Generally you want to pull a disk when it is showing signs of failure. Waiting until actual failure creates a risk that if another drive should happen to fail, your recovery is dependent on a marginal drive. I suggest pulling marginal drives and using them for backups, keeping the array full of nice healthy disks.
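If you want to automate that "is this drive getting marginal?" check, something like the sketch below works, assuming smartctl is installed; the attribute names are standard but the thresholds are my own, not anything official:

import subprocess

# raw-value thresholds I'd personally start worrying at (an assumption, not official guidance)
WATCH = {"Reallocated_Sector_Ct": 100, "Current_Pending_Sector": 1}

def smart_raw_values(device):
    out = subprocess.run(["smartctl", "-A", device],
                         capture_output=True, text=True).stdout
    values = {}
    for line in out.splitlines():
        fields = line.split()
        # attribute table rows start with the numeric ID; RAW_VALUE is the 10th column
        if len(fields) >= 10 and fields[0].isdigit():
            try:
                values[fields[1]] = int(fields[9])
            except ValueError:
                pass   # some raw values aren't plain integers (e.g. temperature)
    return values

def marginal_attributes(device):
    raw = smart_raw_values(device)
    return [(name, raw[name]) for name, limit in WATCH.items()
            if raw.get(name, 0) >= limit]

print(marginal_attributes("/dev/sdb"))   # e.g. [('Current_Pending_Sector', 8)]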

 

 

1 hour ago, pyrater said:

For example, I have dual parity, so I can handle 3 drives dying at once before I lose data...?

 

If you get to this point, you would lose up to 3 drives of data. Not a good idea. Second parity is really designed to protect you from a failure while trying to rebuild a failed disk. Third parity would then step in if you are trying to rebuild a disk and have 2 other disks fail. Very, very unlikely. (Third parity feature does not exist.)
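For the curious, the reason two parity drives can cover any two lost data drives is that P and Q give you two independent equations per byte position. A toy sketch in Python over GF(256) (the general RAID-6 idea, not unRAID's actual implementation):

# build GF(256) exp/log tables using the common polynomial 0x11d
GF_EXP = [0] * 512
GF_LOG = [0] * 256
x = 1
for i in range(255):
    GF_EXP[i] = x
    GF_LOG[x] = i
    x <<= 1
    if x & 0x100:
        x ^= 0x11d
for i in range(255, 512):
    GF_EXP[i] = GF_EXP[i - 255]

def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else GF_EXP[GF_LOG[a] + GF_LOG[b]]

def gf_div(a, b):
    return 0 if a == 0 else GF_EXP[(GF_LOG[a] - GF_LOG[b]) % 255]

def parity(data):
    # P = XOR of all data bytes; Q = XOR of data[i] * g^i in GF(256)
    p = q = 0
    for i, d in enumerate(data):
        p ^= d
        q ^= gf_mul(GF_EXP[i], d)
    return p, q

def rebuild_two(data, lost_x, lost_y, p, q):
    # recover data[lost_x] and data[lost_y] from the survivors plus P and Q
    a, b = p, q
    for i, d in enumerate(data):
        if i not in (lost_x, lost_y):
            a ^= d                     # a ends up as d_x ^ d_y
            b ^= gf_mul(GF_EXP[i], d)  # b ends up as g^x*d_x ^ g^y*d_y
    gx, gy = GF_EXP[lost_x], GF_EXP[lost_y]
    d_x = gf_div(b ^ gf_mul(gy, a), gx ^ gy)
    return d_x, a ^ d_x

data = [0x41, 0x7f, 0x03, 0xcc]        # one byte per "data drive"
p, q = parity(data)
assert rebuild_two(data, 1, 3, p, q) == (data[1], data[3])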

3 hours ago, SSD said:

If the drive detects a bad sector on a read, it will mark that sector as pending.

I think this varies depending on manufacturer - and possibly also depending on what the sector looks like.

 

I have seen some drives count up the pending number on a read.

And other drives don't count up the pending number, but directly count up the reallocated count.

 

So it isn't clear when/if the drive requires a write - or will give the sector a second chance - before reallocating.

3 hours ago, SSD said:

Third parity feature does not exist

Third parity doesn't exist for unRAID. But the concept exists. ZFS can do triple parity. And SnapRAID ("lazy" RAID - it only refreshes parity on manual request) can produce up to six parity drives.

 

But it quickly becomes important to use enterprise-class hardware for really high-end solutions - large numbers of parity drives aren't too useful if their contents can't be trusted because of bit errors while computing the data to write to the disks.


