Disk Behavior - Trust or No Trust?


mifronte

Recently I was moving media files between my disks to consolidate and free up disks. That is when unRAID flagged one disk, which turned out to have developed 7 pending sectors (attribute 197). After I moved all files off the disk I ran the SMART self-tests. It passed the short offline self-test but failed the extended offline self-test with a read error. I ran the extended self-test again, and again it failed with a read error at the exact same LBA. So I removed the disk from the array and started preclear_bjp.sh on it to fully test it.

 

After the first run, the 7 pending sectors were moved to offline uncorrectable (attribute 198), the pending sector count went down to 0, and the reallocated sector count (attribute 5) remained at 0. unRAID then showed the disk with the green thumbs-up. On the second run, the pending sector count went to 5 during the pre-read, dropped to 1 during the post-read (with no changes in attribute 198 or 5), and is now at 28 at 96% into the post-read of a 2TB disk.

 

So the pending sector count is fluctuating, but the reallocated sector count remains unchanged at 0 and offline uncorrectable is constant at 7. So what is going on with this disk? Why is the pending sector count increasing and decreasing while neither the reallocated count nor offline uncorrectable changes?
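For anyone following along, the three counters discussed here can be pulled out of a saved attribute table with a few lines of Python. This is only a sketch: it assumes text laid out like the attribute table printed by `smartctl -A`, the function name `raw_values` is made up, and the VALUE/WORST/THRESH columns in `SAMPLE` are invented for illustration. Only the raw counts (0, 28 and 7) mirror the numbers in this post.

```python
# Illustrative table in the layout of `smartctl -A`; only the RAW_VALUE
# column reflects numbers quoted in the post, the rest is made up.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       28
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       7
"""


def raw_values(table, wanted=(5, 197, 198)):
    """Return {attribute_id: raw_value} for the attribute IDs in `wanted`."""
    result = {}
    for line in table.splitlines():
        fields = line.split()
        # Data rows have at least 10 columns and start with a numeric ID.
        if len(fields) >= 10 and fields[0].isdigit():
            attr_id = int(fields[0])
            if attr_id in wanted:
                result[attr_id] = int(fields[9])
    return result


print(raw_values(SAMPLE))
```

Saving one such table after each preclear step would make it easy to see which of the three counters actually moved.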

 

Most importantly, should I trust this disk or trash it?

 

Diagnostics attached and /dev/sdg is the troubled drive.

diagnostics-20180220-1308.zip

Link to comment

The disk just started the post-read of the 3rd preclear run and reallocated sector count is at 0, pending sector is at 0, uncorrectable is still at 7.  At the end of the post-read of the 2nd run, the pending sector went up to as high as 138, but it looked like no sectors were remapped or marked uncorrectable.

 

As of now the disk looks as healthy as can be. I will wait until near the end of the 3rd run's post-read to see if the pending sector count goes back up, which could mean that part of the disk may be wacky. This is very odd, since I would think that if the pending sector count goes to 0, then reallocated or uncorrectable should go up, right?

Edited by mifronte
Link to comment
8 hours ago, mifronte said:

The disk just started the post-read of the 3rd preclear run and reallocated sector count is at 0, pending sector is at 0, uncorrectable is still at 7.  At the end of the post-read of the 2nd run, the pending sector went up to as high as 138, but it looked like no sectors were remapped or marked uncorrectable.

 

As of now the disk looks as healthy as can be. I will wait until near the end of the 3rd run's post-read to see if the pending sector count goes back up, which could mean that part of the disk may be wacky. This is very odd, since I would think that if the pending sector count goes to 0, then reallocated or uncorrectable should go up, right?

 

When a sector is marked pending, the next write to that sector will check it and could decide it is OK (pending goes away) or decide to remap it (pending goes away but reallocated increments).
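To make that bookkeeping concrete, here is a toy model of the two outcomes. It is a simplification for illustration, not how any particular firmware works, and the names `SmartCounters` and `write_to_pending_sector` are made up.

```python
from dataclasses import dataclass


@dataclass
class SmartCounters:
    pending: int = 0      # attribute 197, sectors waiting for a verdict
    reallocated: int = 0  # attribute 5, sectors remapped to spares


def write_to_pending_sector(counters, sector_is_ok):
    """Model one write landing on a pending sector.

    Either way the sector stops being pending; the reallocated count
    only goes up when the drive decides the sector is bad and remaps it.
    """
    if counters.pending == 0:
        return counters  # nothing pending, nothing changes
    remapped = 0 if sector_is_ok else 1
    return SmartCounters(
        pending=counters.pending - 1,
        reallocated=counters.reallocated + remapped,
    )


state = SmartCounters(pending=2)
state = write_to_pending_sector(state, sector_is_ok=True)   # cleared
state = write_to_pending_sector(state, sector_is_ok=False)  # remapped
print(state)
```

In this model a zeroing pass can bring pending to 0 without reallocated moving at all, which matches the first outcome described above.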

 

SMART is designed to treat all errors as media errors, but media errors are only one possibility. With some of the symptoms you are mentioning, it seems the drive has other issues.

 

I have seen disks with a small number of pending sectors that neither get worse nor remap. Not sure if it is a bug in the drive's firmware or what. But I have not replaced such drives. I've also seen a few drives that mark a bunch of sectors pending initially, and then they all clear and the drive seems none the worse for wear. You might try running an extended SMART test. If it passes and the SMART attributes are stable, you might consider keeping it. But with the symptoms you've reported, it looks like it would need to be returned.

Link to comment

Ideally, the first thing you should have done is rebuild to another known good disk. Then there would be no need to move the data off the disk (moving files to other disks in the array is just more writes to the original worrisome disk as well as others), and your array would be protected again and you could do whatever you wanted to test the original disk.

Link to comment

In my mind, I have a hard time trying to justify hanging onto a disk when I have any question about its stability. The only reason that I might even consider it is if the cost (or value) of the data stored there is less than the cost of a new disk. The reason I would be testing the disk is to make sure that the issues that flagged it in the first place were truly a function of the disk and not some other hardware/software/environment issue. As soon as I determined that the disk had any sort of issue, I would be (1) looking for my sledgehammer to destroy it for trashing or (2) looking for a carton to return it with an RMA for replacement under warranty!

Link to comment
2 minutes ago, Frank1940 said:

In my mind, I have a hard time trying to justify hanging onto a disk when I have any question about its stability. The only reason that I might even consider it is if the cost (or value) of the data stored there is less than the cost of a new disk. The reason I would be testing the disk is to make sure that the issues that flagged it in the first place were truly a function of the disk and not some other hardware/software/environment issue. As soon as I determined that the disk had any sort of issue, I would be (1) looking for my sledgehammer to destroy it for trashing or (2) looking for a carton to return it with an RMA for replacement under warranty!

 

"Any doubt" is a very high standard. For me it's all about whether the problem is getting worse. If the SMART attributes are stable across several parity checks, I can live with a few issues in the SMART attributes. But I will say that I have seen fewer and fewer such issues with drives in the past several years.

 

As for what to do with a disk that is giving problems: often such drives work fine as backup drives, where their powered-on time goes down drastically, and that is far better than nothing. It is best to get such drives out of the array quickly to leave useful life for this purpose, rather than wait for things to get worse and worse until the drive is worthless and the sledgehammer is warranted.
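The "is it getting worse" test can be written down as a small check over attribute snapshots, say one taken after each parity check. This is a sketch under assumptions: each snapshot is a plain dict of attribute ID to raw value collected however you like, and `worsening_attributes` is a made-up name. It only flags increases, so a pending count that drops back to 0 is not counted against the disk.

```python
# Attributes watched in this thread: reallocated (5), pending (197),
# offline uncorrectable (198).
WATCHED = (5, 197, 198)


def worsening_attributes(snapshots):
    """Return the watched attribute IDs whose raw value ever increased
    from one snapshot to the next. An empty list means "stable"."""
    worse = []
    for attr in WATCHED:
        values = [snap.get(attr, 0) for snap in snapshots]
        if any(later > earlier for earlier, later in zip(values, values[1:])):
            worse.append(attr)
    return worse


# Hypothetical history: pending clears, uncorrectable rises once.
history = [
    {5: 0, 197: 7, 198: 0},
    {5: 0, 197: 0, 198: 7},
    {5: 0, 197: 0, 198: 7},
]
print(worsening_attributes(history))
```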

Link to comment

Since the data on the disk were all archived movies and my array was only 50% full, my guess is that the disk has not seen a write for 7 years. unRAID only flagged the disk when I started moving files to free the disk for other use. Otherwise the disk always showed up as healthy. Normally with a bad disk, I do replace it and rebuild the array, however in this case, the disk was flagged after I moved the files.

 

I will run preclear a few more times and probably will not put it back into the array. Does anyone know the commands to perform a secure erase, or does preclear take care of it?

Link to comment
4 minutes ago, mifronte said:

will run preclear a few more times and probably will not put it back into the array. Does anyone know the commands to perform a secure erase, or does preclear take care of it?

A pre-clear writes zeroes to the whole drive.    It might be possible with special (expensive) forensic equipment to still recover the data but the pre-clear is a simple method sufficient in the vast majority of cases.
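If you want to convince yourself that a zeroing pass really left nothing behind, a read-back check is simple to sketch. The helper below is hypothetical (the name `is_all_zero` is made up) and is shown against an ordinary file; pointing it at a whole disk device would need root and would take hours on a 2TB drive.

```python
def is_all_zero(path, chunk_size=1024 * 1024):
    """Return True if every byte readable from `path` is zero."""
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                return True  # reached the end without a non-zero byte
            if chunk.count(0) != len(chunk):
                return False
```

Note that this only proves the readable sectors are zero. Sectors the drive has already remapped are no longer reachable this way.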

Link to comment

 

9 minutes ago, trurl said:

Not clear to me how this should have changed what you "normally" do.

 

It doesn't. I am just saying that at the time I initiated the batch file move, there was no indication that the disk was bad. Only after the batch move did the disk get flagged by unRAID. By that time, all the files had already been moved. I am sure one or more files may be corrupted as a result and I will not know which, but at worst the corrupted files will just show up as an anomaly during the playback of a movie and I will just have to replace the corrupted movie from the original disc.

Link to comment
18 minutes ago, Frank1940 said:

It seems to me that the preclear plugin has an option that will write 'garbage' to the disk.  Following that operation with a zero write operation should do the job.  

 

Has the preclear plugin been cleared to work with unRAID 6.4.1?  I was under the impression that it was incompatible with the latest unRAID.

Link to comment

Well, the disk completed its 3rd run of preclear and the pending sector count is back up to 60, but the reallocated sector count and uncorrectable are stable and haven't changed. I am getting a lot of read errors in the unRAID log. It appears there is an area on the disk that is giving read errors at around the 95% mark of the preclear post-read step.

 

Anyone know where in the unRAID GUI I can pull up just the log entries for a disk? I remember once accidentally finding the feature, but now I can't seem to recall how I did it.

Link to comment
8 minutes ago, mifronte said:

 

Has the preclear plugin been cleared to work with unRAID 6.4.1?  I was under the impression that it was incompatible with the latest unRAID.

 

See here and, basically, the remainder of the thread.

 

     https://lime-technology.com/forums/topic/54648-preclear-plugin/?page=72&tab=comments#comment-632655

 

If you are the cautious type, someone ran a diff on both .plg files and listed the differences. He also ran MD5 checksums so that you can see that you are getting the same file that he looked at.
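If you would rather repeat both checks yourself, something along these lines should do. It is a sketch using only the Python standard library; the function names `file_md5` and `changed_lines` are made up, and you would supply the paths to wherever you saved the two .plg files.

```python
import difflib
import hashlib


def file_md5(path, chunk_size=65536):
    """Return the MD5 hex digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def changed_lines(old_text, new_text):
    """Return only the removed ("- ") and added ("+ ") lines between two texts."""
    delta = difflib.ndiff(old_text.splitlines(), new_text.splitlines())
    return [line for line in delta if line.startswith(("- ", "+ "))]
```

Compare the digest you compute with the one posted in that thread. If they match, the differences you see should be the same ones he listed.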

 

Link to comment
2 minutes ago, mifronte said:

Anyone know where in the unRAID GUI I can pull up just the log entries for a disk? I remember once accidentally finding the feature, but now I can't seem to recall how I did it.

In Main in the Identification column right before the disk serial is an icon that looks maybe like a printer. Mouseover it.

 

Of course, syslog doesn't survive a reboot so there won't be anything from before.

Link to comment

After 3 runs of preclear here is what the disk SMART attributes were doing:

0 sectors were pending re-allocation before the start of the preclear.
37 sectors were pending re-allocation after pre-read in cycle 1 of 3.
0 sectors were pending re-allocation after zero of disk in cycle 1 of 3.
100 sectors were pending re-allocation after post-read in cycle 1 of 3.
0 sectors were pending re-allocation after zero of disk in cycle 2 of 3.
60 sectors were pending re-allocation after post-read in cycle 2 of 3.
0 sectors were pending re-allocation after zero of disk in cycle 3 of 3.
96 sectors are pending re-allocation at the end of the preclear, a change of 96 in the number of sectors pending re-allocation.
0 sectors had been re-allocated before the start of the preclear.
1 sector is re-allocated at the end of the preclear,
a change of 1 in the number of sectors re-allocated. 
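To look at just the pending-sector trend in a report like this, the lines can be parsed into (stage, count) pairs. This is a rough sketch that assumes the exact wording shown above; `pending_history` is a made-up helper, and `REPORT` holds a few of the lines copied from this post.

```python
import re

# Lines copied from the preclear report in this post.
REPORT = """\
0 sectors were pending re-allocation before the start of the preclear.
37 sectors were pending re-allocation after pre-read in cycle 1 of 3.
0 sectors were pending re-allocation after zero of disk in cycle 1 of 3.
100 sectors were pending re-allocation after post-read in cycle 1 of 3.
60 sectors were pending re-allocation after post-read in cycle 2 of 3.
96 sectors are pending re-allocation at the end of the preclear, a change of 96 in the number of sectors pending re-allocation.
"""

# Captures the count and the stage description up to the first "." or ",".
PENDING_RE = re.compile(r"^(\d+) sectors? (?:were|are) pending re-allocation (.*?)[.,]")


def pending_history(report):
    """Return [(stage, pending_count), ...] in report order."""
    history = []
    for line in report.splitlines():
        match = PENDING_RE.match(line.strip())
        if match:
            history.append((match.group(2), int(match.group(1))))
    return history


for stage, count in pending_history(REPORT):
    print(f"{count:4d}  {stage}")
```

Laid out this way the pattern stands out: every zeroing pass brings the count to 0 and every read pass brings it back up.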

Disk failed SMART extended self-test.  Looks like a drive for the trash bin.

 

I was thinking of doing a secure erase using ATA commands as specified here at the unRAID command line.  Is that a good idea?

Link to comment
