[Solved] Jumped in with both feet, maybe got burned



So I built my first unRAID NAS last weekend.  I started with just two 2TB drives.  One was from a broken external enclosure (bad USB port) and the other was new.  I had a full 2TB drive on my desktop that I was going to use for parity after copying my data to the NAS. 

 

I precleared the two drives with 1 pass each.  I created the user shares and started copying my data over the network.  That's when I realized the Ethernet card in my desktop was only 10/100.  I stopped the copy, put the desktop's 2TB NTFS drive in the NAS, and used the cp command to copy all of my stuff off of it. 
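
For anyone curious, the copy was nothing fancy; just a read-only mount of the NTFS drive and a cp into a user share.  This is only a rough sketch of what I did, and the device name, mount point, and share name are just examples from my setup (depending on the unRAID version you may need ntfs-3g instead of the kernel ntfs driver):

mkdir -p /mnt/ntfs
mount -r -t ntfs /dev/sdd1 /mnt/ntfs     # old desktop drive, mounted read-only
cp -av /mnt/ntfs/. /mnt/user/Media/      # copy everything into the Media user share
umount /mnt/ntfs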

 

Now I am preclearing the 2TB from my desktop and am going to install it as a parity drive.  I also bought two more 2TB drives for a larger array. 

 

While watching this preclear I decided to do some more reading about various options and noticed the paragraph below from Joe L. in the preclear post:

 

Any sectors pending re-allocation AFTER a preclear  are particularly bad.  Any un-readable sectors identified in the pre-read phase should have been re-allocated in the zeroing (writing) phase.  Any remaining after the preclear would have been identified in the post-read phase. (indicating what was written could not be read back)  An additional pre-clear should be performed, and if the numbers do not stabilize (additional non-readable sectors are found) then the disk should be returned as defective.

 

I vaguely remembered something about re-allocation when the first preclear finished so I looked back into the logs and this is what I saw:

 

========================================================================1.14

== invoked as: ./preclear_disk.sh /dev/sdb

== ST2000DM001-9YN164  W1E1HT5D

== Disk /dev/sdb has been successfully precleared

== with a starting sector of 63

== Ran 1 cycle

==

== Using :Read block size = 8388608 Bytes

== Last Cycle's Pre Read Time  : 3:58:57 (139 MB/s)

== Last Cycle's Zeroing time  : 3:28:41 (159 MB/s)

== Last Cycle's Post Read Time : 10:39:50 (52 MB/s)

== Last Cycle's Total Time    : 18:08:28

==

== Total Elapsed Time 18:08:29

==

== Disk Start Temperature: 32C

==

== Current Disk Temperature: 37C,

==

============================================================================

** Changed attributes in files: /tmp/smart_start_sdb  /tmp/smart_finish_sdb

                ATTRIBUTE   NEW_VAL OLD_VAL FAILURE_THRESHOLD STATUS      RAW_VALUE
      Raw_Read_Error_Rate =   119     100            6        ok          209368312
         Spin_Retry_Count =   100     100           97        near_thresh 0
         End-to-End_Error =   100     100           99        near_thresh 0
       Reported_Uncorrect =    88     100            0        ok          12
  Airflow_Temperature_Cel =    63      68           45        near_thresh 37
      Temperature_Celsius =    37      32            0        ok          37
No SMART attributes are FAILING_NOW

 

0 sectors were pending re-allocation before the start of the preclear.

0 sectors were pending re-allocation after pre-read in cycle 1 of 1.

0 sectors were pending re-allocation after zero of disk in cycle 1 of 1.

16 sectors are pending re-allocation at the end of the preclear,

    a change of 16 in the number of sectors pending re-allocation.

0 sectors had been re-allocated before the start of the preclear.

0 sectors are re-allocated at the end of the preclear,

    the number of sectors re-allocated did not change.

============================================================================
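
(Side note for anyone reading later: the two counters the script is talking about can also be pulled straight from smartctl, which is how I kept an eye on the drive afterwards.  /dev/sdb is just the device name on my box:)

smartctl -A /dev/sdb | egrep 'Current_Pending_Sector|Reallocated_Sector_Ct'
# Current_Pending_Sector = sectors waiting to be re-allocated (the 16 above)
# Reallocated_Sector_Ct  = sectors the drive has already remapped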

 

So now I sit with data on this disk and I have already precleared the source.  I have a few questions.

 

1:  Is this something to be concerned about?  The Status looks kind of scary on the ones that say near_thresh.

 

2:  Should I preclear this disk again and see if it settles out, or should I RMA it?  Either way, how do I copy the data from this disk to another?  Currently I have 4x2TB data disks in my array (with data, no parity), plus 1x2TB (for parity) and 1x500GB (for cache), both of which I am in the process of preclearing.

 

3:  Should I run multiple passes of preclear on a used disk?  I thought that was only for new disks, as a stress test.  If I've been using a disk for a while now, doesn't that indicate it is functioning well?

 

4:  What is the likelihood of losing data in all of this?

 

Thanks in advance,

 

Leo


So now I sit with data on this disk and I have already precleared the source.  I have a few questions.

 

1:  Is this something to be concerned about?  The Status looks kind of scary on the ones that say near_thresh.

Yes. This is why a parity drive should be installed from the start. Or at least use TeraCopy.

2:  Should I preclear this disk again and see if it settles out, or should I RMA it?  Either way, how do I copy the data from this disk to another?  Currently I have 4x2TB data disks in my array (with data, no parity), plus 1x2TB (for parity) and 1x500GB (for cache), both of which I am in the process of preclearing.

Yes. See here: http://lime-technology.com/wiki/index.php/Troubleshooting#Resolving_a_Pending_Sector
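
If I remember that wiki page right, the gist is to find the failing LBA from a SMART self-test and then overwrite that sector so the drive can remap or clear it.  Very rough sketch only; the LBA below is a placeholder, and you should follow the wiki exactly, since writing to the wrong place destroys data:

smartctl -t long /dev/sdb                              # long self-test to locate the bad sector
smartctl -l selftest /dev/sdb                          # the log reports the LBA of the first error
dd if=/dev/zero of=/dev/sdb bs=512 count=1 seek=LBA    # overwrite that one sector (replace LBA with the real number)
smartctl -A /dev/sdb | grep Current_Pending_Sector     # confirm the pending count dropped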

3:  Should I run multiple passes of preclear on a used disk?  I thought that was only for new disks, as a stress test.  If I've been using a disk for a while now, doesn't that indicate it is functioning well?

Multiple passes should be performed until there are no pending sectors.
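
If I remember the script's options right, you can ask for several cycles in one go (otherwise just run it once per pass and compare the reports each time):

./preclear_disk.sh -c 3 /dev/sdb    # three back-to-back preclear cycles on the suspect disk
# keep the disk only if the pending count drops to 0 and stays there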

4:  What is the likelihood of losing data in all of this?

Very high.

I built a server last weekend also, and the main lesson for a new server (IMHO) is: never move data to the server until all the bugs are worked through.  Copy your data, keeping the originals where they are, until you are 100% sure everything is working correctly.  Even before I moved to unRAID, I always waited a while before moving data onto a new drive, and never without having that data backed up somewhere else.

 

As for using older drives: they can very often seem fine under normal desktop usage simply because nothing is checking them.  A setup like unRAID is built to check things that Windows or Macs are not checking unless you install an app like HDTune to test the drive.
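
If you want to spot-check a drive yourself on unRAID (or any Linux box), smartctl covers roughly the same ground as HDTune.  /dev/sdb is just an example device:

smartctl -H /dev/sdb            # overall health verdict
smartctl -t short /dev/sdb      # quick (~2 minute) self-test
smartctl -l selftest /dev/sdb   # read the self-test results when it finishes
smartctl -A /dev/sdb            # full attribute table (pending/reallocated sectors, etc.)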


OK, so let me clarify a little: I did wipe my source disk right away, and yes, that was probably not smart, but it was not my only backup. 

 

So, after reading the post about resolving the pending sectors from dgaschk and doing some further reading about shrinking the array in the wiki, I copied all of my data off of the disk in question and ran the preclear script two more times; both times it came back with no pending re-allocations.  Then I added the drive back in, reconfigured my shares properly, installed the parity drive and the cache, and it is up and running now.  It doesn't appear that I lost anything in this process.  Guess I got lucky. 
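
For reference, the copy-off step was basically just an rsync from the suspect disk to one of the other data disks, then a quick check that nothing was missed.  The disk numbers are from my setup; yours will differ:

rsync -av /mnt/disk4/ /mnt/disk1/from_disk4/    # copy everything, preserving attributes
diff -rq /mnt/disk4/ /mnt/disk1/from_disk4/     # sanity check: lists anything that differs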

 

Even though it said that zero sectors were pending re-allocation, I am still a little concerned about the near_thresh values in the report, shown again below.  Is this something to be concerned about?  The drive is under warranty and I can RMA it if necessary.

 

ATTRIBUTE   NEW_VAL OLD_VAL FAILURE_THRESHOLD STATUS RAW_VALUE
      Raw_Read_Error_Rate =   114     119            6        ok          80310528
         Spin_Retry_Count =   100     100           97        near_thresh 0
         End-to-End_Error =   100     100           99        near_thresh 0
  Airflow_Temperature_Cel =    66      67           45        near_thresh 34
      Temperature_Celsius =    34      33            0        ok          34
No SMART attributes are FAILING_NOW

