Careful community support -or- Preclear and Current Pending Sectors


RobJ

Recommended Posts

I was somewhat alarmed by a certain support thread, with a confusing situation that only seemed to get worse, and conflicting advice being given, some of it possibly wrong, possibly damaging.  The thread involved a user finding Current Pending Sectors and using Preclear.  I'd like to make a few comments for the helpful members of our community who jump in to advise those with issues.

 

It's great to see more people reaching out to help others, and I really don't want to discourage that at all, but we as a community (I believe) have some responsibility to be very careful when we are dealing with other users' data.  We need to be as careful as if it were our own data.  Some situations can be very confusing, and if advice is flying in from all sides, some of it can be conflicting, and perhaps even irresponsible.  Hopefully a veteran user will step in, try to slow things down, and make sure the situation is fully understood.  When we make sure we have all the facts, a good strategy usually becomes self-evident.

 

Concerning Preclear: it's a terrific tool if used correctly.  But we need to remember that Preclear is a sledgehammer that completely wipes out data.  When it is used for the tasks it was designed for, it does a great job.  But in my view, it should NEVER NEVER be used on a live data disk.  Not until we are absolutely sure all possible data has been recovered from the disk, or is safely backed up elsewhere, and the disk has been taken out of service with nothing of value left on it.  And that means we have a very clear idea of the status of that disk, the status of all data on it, and the status of the entire array.  Please be slow to prescribe Preclear without a full understanding of the situation.

 

In my view, Current Pending Sectors by themselves are a relatively minor issue, not a major issue, unless we make it a major issue by removing a drive with un-backed-up data from the array, by clearing the drive, or by losing another array drive.  Current pending sectors carry a possible future risk to data, but there is no data loss yet.  They are like termites, and you don't take a sledgehammer to a termite problem.  The structure is fine, there is no loss yet, they aren't even an immediate threat, just something you will have to take care of, with wise, well-informed planning, in order to avoid future damage.  The only time it is an urgent and imminent threat is when subsequent SMART reports show that the number is steadily increasing, and then of course it becomes a major issue requiring immediate attention.  I would still want to try simpler, low-impact solutions first before using Preclear, unless the drive clearly needs thorough testing.

 

We all want to help, and that is great.  And in a volunteer community like this, it is normal for help and support to be a very disorganized process, completely uncoordinated.  But there have been a few cases where I really think we can do better.  We almost turned a normal support issue into a disaster.

 

 

Just a crazy idea, for a simpler way to deal with current pending sectors: would it not be possible to run a single dd pass over the entire drive, using dd to read every sector, which should be harmless?  If any sector could not be read, doesn't that cause UnRAID to rewrite the sector from the virtual copy, thereby forcing the drive's firmware to test the sector and either return it to service or remap it?  That should clear all Current Pending Sectors, theoretically.  Or have I forgotten something simple?  Easy enough to test.
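For the record, a read-only pass like that is just dd with the output thrown away.  A rough sketch (mdX is a placeholder for the disk's unRAID device; reading through /dev/mdX rather than /dev/sdX is what would give unRAID the chance to reconstruct and re-write an unreadable sector, which is exactly the open question here):

# Read every sector and discard the data; nothing is deliberately written to the drive.
# conv=noerror lets dd keep going past read errors so the whole disk gets scanned.
dd if=/dev/mdX of=/dev/null bs=1M conv=noerror

Afterwards, compare the Current Pending Sector count in a fresh SMART report against one taken before the read.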

Link to comment

I would not consider current pending sectors to be a minor issue.

If you've ever had to recover from a failed drive and had the recovery fail because of a current pending sector, you would realize how dastardly this issue is.

 

I would consider it an intermediate issue.

There is no current data loss yet, but the chance of future data loss is higher than normal:

a. When trying to get data off the drive.

b. When trying to recover from another failed drive.

I would never suggest using preclear on a live drive.

I would suggest a dd read or, better yet, badblocks, to read and log every sector it cannot read.

I would also suggest a review of the syslog after the badblocks run, and a SMART long test.
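Roughly, as a sketch (sdX is a placeholder for the suspect drive; badblocks' default mode is a read-only scan, so nothing is written):

# Read-only surface scan; unreadable blocks are logged to the output file.
badblocks -sv -o /boot/badblocks_sdX.txt /dev/sdX
# Then check the kernel log for read errors and retries the scan may not have seen.
grep -i 'sdX' /var/log/syslog | tail -50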

 

If there are current pending sectors and they are growing, consider it impending danger of failure. If the SMART long test says read failure at an LBA, that is also impending danger of failure.

 

At that point I would rsync data off the drive or swap out the drive for a replacement.
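Something along these lines, assuming the suspect drive is still mounted as disk3 and there is room on another disk (the paths are just placeholders):

# Copy everything off the suspect disk, preserving attributes; safe to re-run,
# it will only re-copy whatever failed or was missed the first time.
rsync -av /mnt/disk3/ /mnt/disk5/rescued_from_disk3/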

 

 

As far as a read causing unRAID to re-write the failed sector, I'm unsure of that.

I had always thought a correcting parity check would fix the read failure, but I've wondered if it fixed it on the parity drive or on the data drive, thus causing remapping.

 

In any case, badblocks is a good way to read the drive and get an indication of read failures.

It also requires checking the syslog, since the kernel does its own retries before badblocks sees the error.  However, if badblocks is reporting errors, you are pretty sure to have bad sectors on the drive and need to take action.

 

Link to comment

Current pending sectors... are sectors waiting to be rewritten; failure to do so means it will red ball the drive on the next write. I don't like to tempt the bit-gods that way, especially if the pending sectors just continue to increase. Just an indicator that failure is pending.  I wouldn't suggest preclear until the drive has been removed from the array, and the data is safe.

Link to comment

Current pending sectors... are sectors waiting to be rewritten; failure to do so means it will red ball the drive on the next write.

Not true.  The next "write" to that sector should re-allocate it if it cannot be re-written in place in the same sector.
I don't like to tempt the bit-gods that way, especially if the pending sectors just continue to increase. Just an indicator that failure is pending.
That is true: if the numbers are constantly increasing, it is an indication of an eventual failure.
  I wouldn't suggest preclear until the drive has been removed from the array, and the data is safe.

Me either.  Destroying your only copy of data, trusting you can re-construct it, is an entirely unnecessary risk.

 

It is my understanding that if unRAID gets a "read failure" from the OS it will reconstruct the un-readable sector, return the reconstructed data to the program requesting it, and ALSO write the sector back to the disk to allow the SMART firmware to re-allocate it (if needed).

 

The first step I would perform on a disk assigned to the array that has a sector pending re-allocation is a full read of it.  If my understanding is correct, it should re-construct and re-write what it cannot read.  I'm not sure if this same logic is applied during a parity check (correcting or non-correcting), but it would be interesting to test.
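That full read is easy enough to script; a sketch, assuming the disk is assigned as disk2 so its unRAID device is /dev/md2, with sdX as its raw device name (if the reconstruct-on-read behaviour works as described, the syslog should show the reconstruction and the pending count should drop):

# SMART counters before the read, for comparison afterwards.
smartctl -A /dev/sdX | egrep -i 'Pending|Reallocated'
# Full read through the unRAID-managed device, discarding the data.
dd if=/dev/md2 of=/dev/null bs=1M conv=noerror
# Same counters again; Current_Pending_Sector should have gone down if sectors were re-written.
smartctl -A /dev/sdX | egrep -i 'Pending|Reallocated'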

 

Joe L.

Link to comment

Current pending sectors... are sectors waiting to be rewritten; failure to do so means it will red ball the drive on the next write.

Not true.  The next "write" to that sector should re-allocate it if it cannot be re-written in place in the same sector.

Joe L.

Agreed, Joe.  I was being lazy in my response... and I don't have much faith in drives with reallocated sectors.

Link to comment

I would not consider current pending sectors to be a minor issue.

If you've ever had to recover from a failed drive and had the recovery fail because of a current pending sector, you would realize how dastardly this issue is.

 

I would consider it an intermediate issue.

There is no current data loss yet, but the chance of future data loss is higher than normal.

Yeah, I probably 'over-minimized' it; 'intermediate issue' is a better term.  Just the fact that it does carry a risk to data makes it more than a minor issue, even though in an UnRAID array the chances are very high that all data will be recovered.

 

I would never suggest using preclear on a live drive.

I would suggest a dd read or, better yet, badblocks, to read and log every sector it cannot read.

I would also suggest a review of the syslog after the badblocks run, and a SMART long test.

It's really embarrassing, but I STILL haven't gotten around to playing with badblocks, and therefore don't even think of it, even in situations like the above where it could be very useful.  Mea culpa.

 

As far as a read causing unRAID to re-write the failed sector, I'm unsure of that.

Needs to be tested.  I *think* Tom has said it worked this way, and it really makes sense that it should.

Link to comment

I would never suggest using preclear on a live drive.

I would suggest a dd read or, better yet, badblocks, to read and log every sector it cannot read.

I would also suggest a review of the syslog after the badblocks run, and a SMART long test.

It's really embarrassing, but I STILL haven't gotten around to playing with badblocks, and therefore don't even think of it, even in situations like the above where it could be very useful.  Mea culpa.

 

 

It's a pretty useful tool.  It's not foolproof: if the operating system is able to retry and get back a failing sector, badblocks is none the wiser.  If the operating system returns an indication of a failed sector, you have a clear indication there is impending danger.  A few tweaks to the software to report MB/s and it would be even more useful.

Currently it can write any single-byte pattern to the drive.

So when you want to retire the drive you can do the DoD erase.
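Something like this, as a sketch (destructive, obviously; sdX is a placeholder, and each -t option adds one full write-and-verify pass with that pattern):

# Three destructive passes -- zeros, ones, then a random pattern -- wiping everything on the drive.
badblocks -wsv -t 0x00 -t 0xff -t random /dev/sdX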

 

 

It also has a read/rewrite mode if chosen.  It reads the sector, tests it, and rewrites it.  I've forced drives to reallocate bad sectors doing this, but it takes a heck of a long time (and preserves the data).  You can also give it a range of blocks to work on.
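That's the -n (non-destructive read-write) mode; a sketch, with sdX as a placeholder (never run it on a mounted or in-use disk):

# Each block is read, overwritten with test patterns, verified, and then the
# original data is written back -- slow, but it preserves the contents.
badblocks -nsv -o /boot/badblocks_nd_sdX.txt /dev/sdX
# Limiting it to a range: the trailing arguments are last-block then first-block.
badblocks -nsv /dev/sdX 2000000 1000000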

 

 

I had hoped to add a MB/s reporting mechanism and a way to vary how often it is reported.  It currently reports status every second.  If it were changed to another time frame and used newlines instead of backspaces, it could be used in scripts or piped into the syslog, thus lending itself to use in a web-based context.

 

 

It's not the be-all end-all of tests, but if you take a SMART log before, a SMART log after, and compare them along with the syslog after a badblocks run, you have a lot of good information to make a decision on the drive's health.

Link to comment

I'm quoting it here just for completeness of the discussion.

Thanks mbryanr for pointing it out.

 

btw, read this thread on parity checks etc.  From my <limited> understanding, a correcting parity check with a corresponding read failure on a data disk would first attempt to reallocate the sector that could not be read before updating the parity.

 

See Joe L's post #14

http://lime-technology.com/forum/index.php?topic=20006.msg178551#msg178551

 

Maybe someone with a better understanding than me could explain what happens when there is a read failure during a correcting parity check.

I have no idea if the re-construction comes first, or the parity calc...  Perhaps Tom @ lime-tech can respond to what happens if there is a read failure of a data disk during a "correcting" parity check?  Does parity think there are zeros on the data disk, or does parity use the re-constructed data?

 

Joe L.

 

In a "correcting" parity check, for a given block address, data is read from all data drives and the parity drive.  That is, a read I/O operation is sent to each drive.  When the read I/O's complete we look at the completion statuses.

 

Normally, each I/O completion will say "success".  In this case the xor engine xor's all the data blocks together and compares it with the content of the parity block.  If they are equal, then everything's fine, move on to the next block.  If they are not equal, then this is logged in the system log as "incorrect parity", and the block we generated by xor'ing all the data blocks is written to the parity drive. (This is the same thing that happens in a normal parity sync except the parity drive is not read and there's no comparison - it just is written with the calculated parity.)

 

If one of the read I/O's returns "failure" however, then what we do is xor together all the data from each drive except the one that failed.  The resultant block is the data that we should have read from the disk that failed, if the I/O to that disk didn't fail.  After creating this block, we then write it to the same Logical Block Address to the disk that previously returned an I/O error reading that block.
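Just to make that arithmetic concrete, here is a toy illustration of the reconstruction (single bytes standing in for whole blocks; this is not unRAID code, just the xor property it relies on):

# Parity is the xor of all data blocks.
d1=$(( 0xA5 )); d2=$(( 0x3C )); d3=$(( 0xF0 ))
parity=$(( d1 ^ d2 ^ d3 ))
# If d2 becomes unreadable, xor parity with the surviving blocks to rebuild it.
rebuilt=$(( parity ^ d1 ^ d3 ))
printf 'd2=0x%02X  rebuilt=0x%02X\n' "$d2" "$rebuilt"   # both print 0x3C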

 

If that "write" of calculated data to the failed disk itself fails, well then we disable the data drive and terminate the parity check.

 

What the drive firmware should do is this: "Hey I just got sent data for a sector that I know I couldn't read before, so what the heck, let's write it back to the same sector and immediately re-read to see if it now works, because I know that sometimes re-writing the data fixes the magnetic structure on the disk and I won't have to use one of my limited reallocation sectors. Then if my verify fails, I'll write the data to a reallocated sector."  Or the firmware could say, "Screw it, let's just write the data to the sector again and forget about verifying."  Or the firmware could say, "Forget about writing to the same sector and doing a verify because that's slow, let's just reallocate the sector."  What a particular drive does, you will have to find out who the firmware designer is for the company and ask them.

 

If 2 or more of the read I/O operations sent to the data and parity disks for the same LBA fail, the parity check is immediately terminated.

Link to comment

I would never suggest using preclear on a live drive.

I would suggest a dd read or, better yet, badblocks, to read and log every sector it cannot read.

I would also suggest a review of the syslog after the badblocks run, and a SMART long test.

It's really embarrassing, but I STILL haven't gotten around to playing with badblocks, and therefore don't even think of it, even in situations like the above where it could be very useful.  Mea culpa.

I have played with badblocks, but just to zero a drive with the full four-pattern write test.

 

This is the script I've used.  It will completely zero a drive... It is NOT a preclear replacement as it does not write the preclear signature.  It takes a LONG time on a 2TB drive... (days, not hours)

 

You must edit the top line to have the name of the disk it is run on.

There is NO check to see if the disk is already assigned to the array.  There is nothing to keep you from securely wiping your data away, and destroying any chance of recovery.  There is nothing to keep you from shooting yourself in the foot.

That being said, it does write progress to a log file /tmp/badblocks_progress_sdX and also logs the actual bad blocks to /boot/badblocks_out_sdX.txt

My goal (eventually) is to allow it to be run from a GUI, and for it to monitor the /tmp/badblocks_progress_sdX file.

You can do that in another telnet session by typing

while true; do sleep 10; cat /tmp/badblocks_progress_sdX; done

 

Again, you've been warned: this is a destructive process on a disk.  It will completely erase anything on it.  If you give it the device name of a disk with data by mistake, sorry... you will have cleared your data...  It finishes by writing all zeros... but NOT the preclear signature... so follow it with a single pass of the preclear_disk.sh program if the drive is destined for the server.

Joe L.

 

device=/dev/sdX
disk=`basename $device`

# Destructive write test: four patterns, each written to every block and verified.
# Raw badblocks output (the unreadable block numbers) goes to /boot; the awk below
# turns badblocks' in-place progress updates into a small status file under /tmp.
badblocks -c 1024 -b 65536 -vsw -o /boot/badblocks_out_$disk.txt $device 2>&1 |
awk 'BEGIN {
            RS="^H";            # record separator is a literal backspace (Ctrl-H),
                                # since badblocks redraws its progress in place
            mode="";
            pattern="";
            progress="/tmp/badblocks_progress_'$disk'"
    }
    /Testing with/ {
      if ( $15 == "pattern" ) pattern=$16
      if ( $4 == "pattern" ) pattern=$5
      mode="writing"
    }
    /Reading and comparing/ {
      mode="reading"
    }
    /Pass completed/ {
      mode=""
      print "Pass Completed: " $0 > progress
    }
    {
    if ( $1 == "done" && mode == "writing" ) {
      print "WritinG pattern " pattern, $6, $7, $8, $9 > progress
    }
    if ( $2 == "done," && mode == "writing" ) {
      print "Writing pattern " pattern, $1, $2, $3, $4 > progress
    }
    if ( $1 == "done" && mode == "reading" ) {
      print "VerifyinG pattern " pattern, $5, $6, $7, $8 > progress
    }
    if ( $2 == "done," && mode == "reading" ) {
      print "Verifying pattern " pattern, $1, $2, $3, $4 > progress
    }
    close(progress)
    }'

Link to comment

Here's something I was thinking about since reading these discussions on badblocks and it not always detecting there was an error. Can the unRAID driver actually detect and correct the disk errors when badblocks is unable to detect them? I recall reading bb threads before but can't recall if I asked this or not.

Link to comment

Here's something I was thinking about since reading these discussions on badblocks and it not always detecting there was an error. Can the unRAID driver actually detect and correct the disk errors when badblocks is unable to detect them? I recall reading bb threads before but can't recall if I asked this or not.

 

 

If the kernel and/or drive retries and eventually succeeds, badblocks, unRAID, or dd are probably none the wiser.

Link to comment

Here's something I was thinking about since reading these discussions on badblocks and it not always detecting there was an error. Can the unRAID driver actually detect and correct the disk errors when badblocks is unable to detect them? I recall reading bb threads before but can't recall if I asked this or not.

 

 

If the kernel and/or drive retries and eventually succeeds, badblocks, unRAID, or dd are probably none the wiser.

The more difficult question to answer is exactly what happens when a disk with a sector marked as un-readable is asked to return the same sector again.  Does it attempt to read it once more, or does it just fail the read, returning nothing, knowing it is marked as un-readable and no subsequent "write" has occurred?  I would think that is entirely up to the specific disk's firmware.

 

 

Link to comment

So, basically, if badblocks wasn't catching disk issues because they weren't being reported back to the program, then unRAID could also miss them and not correct them.  The parity might get updated instead of the array disk being re-written like it should have been.

 

 

It means the firmware and kernel retried and successfully received a correct data block. If the read had failed altogether, it would have been reported back to the unRAID driver, or possibly badblocks, depending on whether it was reading the physical drive (/dev/sd?) or the unRAID device (/dev/md?).

 

 

badblocks isn't the be-all end-all of tests, but it's pretty reliable in that it will read all sectors sequentially and report which sectors are unreadable. It's also good at testing sectors and/or refreshing them with a read/write/re-write.  I would only use the last mode as a purely last-ditch effort after I had copied off the usable data. But in my experience, it has returned bad sectors to reliable service and/or reallocated them.

 

 

A SMART long test and a review of pending sectors is ultimately the most reliable test, since it uses the drive's own firmware. If a SMART long test reports READ FAILURE AT LBA #, then you know the sector cannot be read and you should get your data off immediately, then refresh the format either with badblocks in read/write/rewrite mode or write mode, or do a preclear.
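For reference, a sketch of running a long test and reading the result back (sdX is a placeholder; the test runs inside the drive, so the command returns immediately and you check back after the estimated time it prints):

smartctl -t long /dev/sdX        # start the long self-test (the drive must stay spun up)
smartctl -l selftest /dev/sdX    # afterwards: status and, on failure, the LBA of the first error
smartctl -A /dev/sdX             # attribute table, including Current_Pending_Sector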

 

 

If there were a command-line way to turn off drive spin down, we could automate the SMART long tests, once a month per drive.
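For what it's worth, hdparm can disable the drive's own standby timer (unRAID's spin-down setting would also need to be set to never for the test to finish); a hypothetical cron sketch, assuming hdparm and smartctl are on the box and sdX keeps the same device name:

hdparm -S 0 /dev/sdX          # disable the drive's internal standby (spin-down) timer
smartctl -t long /dev/sdX     # then kick off the long self-test
# e.g. a monthly cron entry (hypothetical paths):
# 0 3 1 * * /usr/sbin/hdparm -S 0 /dev/sdX && /usr/sbin/smartctl -t long /dev/sdX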

Link to comment
