
Preclear.sh results - Questions about your results? Post them here.


Hi, I ran another 4 preclears on disks formerly used in a Windows RAID-5. The script reports some differences between the pre- and post-clear SMART data. Can you have a look at the Seek_Error_Rate message and comment on whether it is something to worry about? Thanks, Guzzi

 

===========================================================================

=                unRAID server Pre-Clear disk /dev/sdb

=                       cycle 1 of 1

= Disk Pre-Clear-Read completed                                 DONE

= Step 1 of 10 - Copying zeros to first 2048k bytes             DONE

= Step 2 of 10 - Copying zeros to remainder of disk to clear it DONE

= Step 3 of 10 - Disk is now cleared from MBR onward.           DONE

= Step 4 of 10 - Clearing MBR bytes for partition 2,3 & 4       DONE

= Step 5 of 10 - Clearing MBR code area                         DONE

= Step 6 of 10 - Setting MBR signature bytes                    DONE

= Step 7 of 10 - Setting partition 1 to precleared state        DONE

= Step 8 of 10 - Notifying kernel we changed the partitioning   DONE

= Step 9 of 10 - Creating the /dev/disk/by* entries             DONE

= Step 10 of 10 - Testing if the clear has been successful.     DONE

= Disk Post-Clear-Read completed                                DONE

Elapsed Time:  21:41:19

============================================================================

==

== Disk /dev/sdb has been successfully precleared

==

============================================================================

S.M.A.R.T. error count differences detected after pre-clear

note, some 'raw' values may change, but not be an indication of a problem

58c58

<   7 Seek_Error_Rate         0x000e   100   253   051    Old_age   Always       -       0

---

>   7 Seek_Error_Rate         0x000e   200   200   051    Old_age   Always       -       0

63c63

< 193 Load_Cycle_Count        0x0032   176   176   000    Old_age   Always       -       72598

---

> 193 Load_Cycle_Count        0x0032   176   176   000    Old_age   Always       -       72599

 

 

 

===========================================================================

=                unRAID server Pre-Clear disk /dev/sdc

=                       cycle 1 of 1

= Disk Pre-Clear-Read completed                                 DONE

= Step 1 of 10 - Copying zeros to first 2048k bytes             DONE

= Step 2 of 10 - Copying zeros to remainder of disk to clear it DONE

= Step 3 of 10 - Disk is now cleared from MBR onward.           DONE

= Step 4 of 10 - Clearing MBR bytes for partition 2,3 & 4       DONE

= Step 5 of 10 - Clearing MBR code area                         DONE

= Step 6 of 10 - Setting MBR signature bytes                    DONE

= Step 7 of 10 - Setting partition 1 to precleared state        DONE

= Step 8 of 10 - Notifying kernel we changed the partitioning   DONE

= Step 9 of 10 - Creating the /dev/disk/by* entries             DONE

= Step 10 of 10 - Testing if the clear has been successful.     DONE

= Disk Post-Clear-Read completed                                DONE

Elapsed Time:  23:20:10

============================================================================

==

== Disk /dev/sdc has been successfully precleared

==

============================================================================

S.M.A.R.T. error count differences detected after pre-clear

note, some 'raw' values may change, but not be an indication of a problem

58c58

<   7 Seek_Error_Rate         0x000e   100   253   051    Old_age   Always       -       0

---

>   7 Seek_Error_Rate         0x000e   200   200   051    Old_age   Always       -       0

63c63

< 193 Load_Cycle_Count        0x0032   176   176   000    Old_age   Always       -       73100

---

> 193 Load_Cycle_Count        0x0032   176   176   000    Old_age   Always       -       73101

 

===========================================================================

=                unRAID server Pre-Clear disk /dev/sdd

=                       cycle 1 of 1

= Disk Pre-Clear-Read completed                                 DONE

= Step 1 of 10 - Copying zeros to first 2048k bytes             DONE

= Step 2 of 10 - Copying zeros to remainder of disk to clear it DONE

= Step 3 of 10 - Disk is now cleared from MBR onward.           DONE

= Step 4 of 10 - Clearing MBR bytes for partition 2,3 & 4       DONE

= Step 5 of 10 - Clearing MBR code area                         DONE

= Step 6 of 10 - Setting MBR signature bytes                    DONE

= Step 7 of 10 - Setting partition 1 to precleared state        DONE

= Step 8 of 10 - Notifying kernel we changed the partitioning   DONE

= Step 9 of 10 - Creating the /dev/disk/by* entries             DONE

= Step 10 of 10 - Testing if the clear has been successful.     DONE

= Disk Post-Clear-Read completed                                DONE

Elapsed Time:  26:25:24

============================================================================

==

== Disk /dev/sdd has been successfully precleared

==

============================================================================

S.M.A.R.T. error count differences detected after pre-clear

note, some 'raw' values may change, but not be an indication of a problem

58c58

<   7 Seek_Error_Rate         0x000e   100   253   051    Old_age   Always       -       0

---

>   7 Seek_Error_Rate         0x000e   200   200   051    Old_age   Always       -       0

63c63

< 193 Load_Cycle_Count        0x0032   173   173   000    Old_age   Always       -       81301

---

> 193 Load_Cycle_Count        0x0032   173   173   000    Old_age   Always       -       81306

 

===========================================================================

=                unRAID server Pre-Clear disk /dev/sde

=                      cycle 1 of 1

= Disk Pre-Clear-Read completed                                DONE

= Step 1 of 10 - Copying zeros to first 2048k bytes            DONE

= Step 2 of 10 - Copying zeros to remainder of disk to clear it DONE

= Step 3 of 10 - Disk is now cleared from MBR onward.          DONE

= Step 4 of 10 - Clearing MBR bytes for partition 2,3 & 4      DONE

= Step 5 of 10 - Clearing MBR code area                        DONE

= Step 6 of 10 - Setting MBR signature bytes                    DONE

= Step 7 of 10 - Setting partition 1 to precleared state        DONE

= Step 8 of 10 - Notifying kernel we changed the partitioning  DONE

= Step 9 of 10 - Creating the /dev/disk/by* entries            DONE

= Step 10 of 10 - Testing if the clear has been successful.    DONE

= Disk Post-Clear-Read completed                                DONE

Elapsed Time:  25:00:20

============================================================================

==

== Disk /dev/sde has been successfully precleared

==

============================================================================

S.M.A.R.T. error count differences detected after pre-clear

note, some 'raw' values may change, but not be an indication of a problem

19,20c19,20

< Offline data collection status:  (0x82)      Offline data collection activity

<                                      was completed without error.

---

> Offline data collection status:  (0x84)      Offline data collection activity

>                                      was suspended by an interrupting command from host.

63c63

< 193 Load_Cycle_Count        0x0032   176   176   000    Old_age   Always       -       72723

---

> 193 Load_Cycle_Count        0x0032   176   176   000    Old_age   Always       -       72724

============================================================================

 

 


For that particular model of disk, the "Load_Cycle_Count" raw value seems to be incremented by one each time the disk heads are taken out of the "parked" position.

 

Of the three columns of values, the first is the current value, the second is the worst value (the lowest ever encountered), and the third is the failure threshold.   For "Load_Cycle_Count" the threshold is 000, so the disk will be considered "failed" when the current value reaches zero.  At that point, the firmware will consider the disk "worn out."

 

(I'm guessing here... but the math almost looks like it might work something like this) 

Let's theorize that for that disk, the "current value" starts when the disk is brand new at 200.

Let's also guess that every 3000 "load-cycles" will decrement the "Current" value by 1.   

 

With those guesses, the numbers come out close... If I were a manufacturer, I'd use powers of 2 and divide by 1024, but you get the idea. Each manufacturer has its own internal algorithm...

 

Using my math,  you have over 500,000 more head-load cycles before the overall wear and tear on the drive is expected to be an issue.  Of course, this is entirely a prediction by the manufacturer.
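
The guess above can be written out as a small Python sketch. Everything in it is an assumption taken from that guess: the starting value of 200 and the 3000 cycles per point are not published by the manufacturer, and the function names are made up for illustration.

```python
# Hypothetical model from the guess above: the normalized value starts
# at 200 and drops by 1 for every 3000 load cycles. Both are assumptions.
START_VALUE = 200
CYCLES_PER_POINT = 3000


def predicted_value(raw_load_cycles: int) -> int:
    """Normalized value predicted by the guessed model."""
    return START_VALUE - raw_load_cycles // CYCLES_PER_POINT


def cycles_remaining(current_value: int, threshold: int = 0) -> int:
    """Approximate load cycles left before VALUE reaches the threshold."""
    return (current_value - threshold) * CYCLES_PER_POINT


if __name__ == "__main__":
    # Raw counts and reported values copied from the preclear reports above.
    reports = [("sdb", 72598, 176), ("sdc", 73100, 176),
               ("sdd", 81301, 173), ("sde", 72723, 176)]
    for disk, raw, reported in reports:
        print(disk, "predicted", predicted_value(raw), "reported", reported)
    print("cycles remaining at 176:", cycles_remaining(176))
```

With the raw counts posted above, the guessed model reproduces the reported values of 176 and 173, and a current value of 176 leaves roughly 528,000 cycles before it would reach zero.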

 

Will the drive be "defective" at that point... no, not necessarily, but it will have been subjected to some wear....

 

The same can be said of the "Seek_Error_Rate."   The worst value encountered so far is "200" and the threshold to fail is 51.  I'd say it is doing just fine.

 

Joe L.


Joe, thanks for your answer. To be honest, I do not fully understand those values, though I follow your theory... Are there standard values to look at when checking drive health? The idea of checking the drives is good, but interpreting the values is difficult (at least for me ;-)). Is it correct that sector reallocation is the thing to focus on?


Joe, thanks for your answer. To be honest, I do not fully understand those values, though I follow your theory... Are there standard values to look at when checking drive health?

The three columns of values are all internal to the drive.  Each manufacturer has their own "normal" values and threshold for failures.  The "raw" value is also only known to the manufacturer.   All we can go by is "trends" ... and, of course, if a value meets its threshold, the drive will then fail the SMART test. (but may still be perfectly normal... just reached an end-of-life wear limit thought to be appropriate by the manufacturer.)

The idea of checking the drives is good, but interpreting the values is difficult (at least for me ;-)). Is it correct that sector reallocation is the thing to focus on?

Interpreting is difficult for everybody...   For the most part, re-allocated sectors are the one thing we know we can focus on... but even there, according to Seagate, modern drives have thousands of spare sectors.  If 6 are re-allocated when you first exercise the drive, in my mind it is fine, unless the number of re-allocated sectors keeps increasing each time you use the disk.  Then it is replacement time.
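
The "watch the trend" advice can be sketched in Python as below. The function name, the labels it returns, and the sample readings are hypothetical; the raw counts would have to be collected by hand from successive SMART reports.

```python
def reallocation_trend(raw_counts):
    """Classify a series of Reallocated_Sector_Ct raw values, oldest first.

    A one-time jump followed by a flat count is treated differently from
    a count that grows at every check, which is the replacement signal
    described above.
    """
    if len(raw_counts) < 2:
        return "not enough data"
    steps = list(zip(raw_counts, raw_counts[1:]))
    increases = sum(1 for older, newer in steps if newer > older)
    if increases == 0:
        return "stable"
    if increases == len(steps):
        return "growing every check"
    return "grew at least once"


if __name__ == "__main__":
    # Hypothetical readings taken after successive preclear cycles.
    print(reallocation_trend([6, 6, 6]))
    print(reallocation_trend([6, 9, 15]))
```

The labels are deliberately coarse; where to draw the line for replacing a drive is still a judgment call.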

 

Look here for a good summary of what and how to interpret what you are seeing:

http://en.wikipedia.org/wiki/S.M.A.R.T.

 

Joe L.


Thanks for the info. I did some reading; lots of details. Hmmm, I'm not sure I wasn't happier overall before thinking about my HDs - just installing and being surprised if something fails ;-) - just kidding. I like the concept of the preclear script very much: reading and writing the whole HD once before using it in production IS a help in discovering problems in advance. At least I found 2 hard disks behaving strangely; I will have a closer look at them after migrating to the healthy drives.

I'm not sure, if I wasn't happier in total before thinking about my HDs - just installing and being surprised, if something fails

Don't you mean... just installing and being surprised when (not if) something fails  :( :( :(

 

On MS-Windows, these disk issues are not visible.  We happily go along until the OS will no longer boot, or we cannot open the critical document we need...  Now, I'm sure many of those issues are bugs in programs... but some might not be... some are disk sectors that become unreadable.

 

Remember, there are only two types of disk drives.

 

No, not IDE and SATA.

 

The two disk drive types are:

1. Those disks that have already failed.

2. Those disks that have not YET failed, but will... it's just a matter of time.

 

Joe L.


I'm not sure, if I wasn't happier in total before thinking about my HDs - just installing and being surprised, if something fails

Don't you mean... just installing and being surprised when (not if) something fails  :( :( :(

[...]

Yes, you're absolutely right - but you noticed my smiley also, didn't you ...

It IS a positive thing to get that extended information, and I appreciate it. As you might have seen from my last posts, at least 2 drives of my former Windows RAID-5 do not behave well, and I am more than happy to identify them and throw them out of my box. It's just the thing, that I didn't expect all that extra trouble - my initial plan was just move drives from windows to unraid, move data and finished ;-)

It's just the thing, that I didn't expect all that extra trouble - my initial plan was just move drives from windows to unraid, move data and finished ;-)

Well... sorry about causing you extra "trouble" but then I figure you might want to avoid extra issues that can be uncovered before you move your files...   The cost of a few new drives is small compared to the amount of time and effort needed otherwise.

 

I hope your data transfer goes smoothly once you have a set of disks to move it to.   From what you've said, your RAID-5 array would have had to deal with the defects on those two old disks at some point... and it might not have been as easy to swap in a new larger drive.

 

Joe L.


Ran Preclear on a new WD 750 Green drive, and this is the before & after:

 

Before:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   253   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   199   199   021    Pre-fail  Always       -       5050
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       8
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   100   253   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       0
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       6
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       4
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       8
194 Temperature_Celsius     0x0022   126   119   000    Old_age   Always       -       24
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0

 

After:

 

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   199   199   021    Pre-fail  Always       -       5050
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       8
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   100   253   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       21
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       6
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       4
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       8
194 Temperature_Celsius     0x0022   113   111   000    Old_age   Always       -       37
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

 

 

The only one that concerns me is the Raw_Read_Error_Rate, looks like the VALUE doubled, but the WORST went down, although the RAW_VALUE is still 0?  Anybody see any reason I shouldn't put this drive in my unraid box?

 


From what I've read, the "VALUE" is the current value.  The "WORST" is frequently initialized at the factory with a starting value of 253, and the "THRESH" is a value where (when reached) that parameter will be considered as failed.  So, for your drive, the "VALUE" would need to go down to 51 for the RAW_READ_ERROR_RATE parameter to be an issue. 

 

Since you have just put the drive into service, you should see the RAW_READ_ERROR_RATE "VALUE" stay pretty stable over time.  If, at some point in the future, you find it getting closer to the THRESH value, that would indicate a problem that is getting worse.  At that point, you can replace the drive proactively.

 

All that has happened in the pre-clear process is that the "VALUE" and "WORST" are now changed from the initialized factory values to those that are reflecting how your drive is actually performing.
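
As a rough illustration of that comparison, the Python sketch below parses a few attribute rows copied from the before and after reports in this thread, lists which columns changed, and shows how far VALUE sits above THRESH. The function names are made up for this example, and it is not how the preclear script compares reports (the output posted earlier in the thread looks like plain diff output).

```python
BEFORE = """
  1 Raw_Read_Error_Rate     0x002f   100   253   051    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0022   126   119   000    Old_age   Always       -       24
"""

AFTER = """
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       21
194 Temperature_Celsius     0x0022   113   111   000    Old_age   Always       -       37
"""


def parse_attributes(report: str) -> dict:
    """Map attribute name -> VALUE/WORST/THRESH (ints) and RAW_VALUE (str)."""
    attrs = {}
    for line in report.splitlines():
        parts = line.split()
        # Attribute rows have at least 10 columns and start with a numeric ID.
        if len(parts) < 10 or not parts[0].isdigit():
            continue
        attrs[parts[1]] = {
            "VALUE": int(parts[3]),
            "WORST": int(parts[4]),
            "THRESH": int(parts[5]),
            "RAW_VALUE": parts[9],
        }
    return attrs


def changed_fields(before: dict, after: dict) -> dict:
    """For each attribute present in both, list fields as (old, new) pairs."""
    changes = {}
    for name, old in before.items():
        new = after.get(name)
        if new is None:
            continue
        diff = {f: (old[f], new[f]) for f in old if old[f] != new[f]}
        if diff:
            changes[name] = diff
    return changes


def margin(attr: dict) -> int:
    """Points left before VALUE reaches THRESH."""
    return attr["VALUE"] - attr["THRESH"]


if __name__ == "__main__":
    before, after = parse_attributes(BEFORE), parse_attributes(AFTER)
    for name, diff in changed_fields(before, after).items():
        print(name, diff, "margin now:", margin(after[name]))
```

For Raw_Read_Error_Rate only VALUE and WORST change while the raw value stays at 0, which matches the explanation that the factory placeholders were replaced by measured values.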

 

As far as the "raw" value, it may never change from 0 for that drive.  The internal method used to calculate most parameters is only known by the drive manufacturer.

 

I see no reason why you should not add the drive to your array. Looks pretty good to me.

 

Joe L.


It's just the thing, that I didn't expect all that extra trouble - my initial plan was just move drives from windows to unraid, move data and finished ;-)

Well... sorry about causing you extra "trouble" but then I figure you might want to avoid extra issues that can be uncovered before you move your files...   The cost of a few new drives is small compared to the amount of time and effort needed otherwise.

 

I hope your data transfer goes smoothly once you have a set of disks to move it to.   From what you've said, your RAID-5 array would have had to deal with the defects on those two old disks at some point... and it might not have been as easy to swap in a new larger drive.

 

Joe L.

I appreciate the help and the abilities of your tools - I didn't complain, just reported back my experience. Please don't misunderstand me - I am happy to discover the problems in advance instead of having the trouble later and yes, you're completely right - the price of a new disk is nothing compared to trouble of a machine and the data on it - that's why I replaced the failing drives quickly with new ones...

 


I did not misunderstand... I just wanted to save you, and any others reading this thread, from problems that can be avoided.   I've spent many hours loading my array, and I'm sure every unRAID owner's experience is much the same.   I appreciate feedback... good and bad...  Most important, I learn from everyone's experience... there is just no way for me to duplicate everyone's hardware and the errors they experience.  If the script needs improvement, I'm the first to admit it.   I saw the "smileys" in your previous post and understood their meaning.

 

I know your plans for a quick migration of data were put aside when the old disks you intended to use did not test well... but with new replacements in place it should be much better. I am hoping by now you are starting your data migration.  

 

Joe L.


Hi,

 

I think this should be the right thread to post my question.  :)

 

I just purchased two 1.5 TB Samsung SATA drives and tried to use the preclear.sh script to prepare them. What I normally do is connect the HD to the external SATA port on the system and run the preclear.sh script one disk at a time. Unfortunately, both of them returned the same unsuccessful result, shown below.

 

===========================================================================

=                unRAID server Pre-Clear disk /dev/sdk

=                      cycle 1 of 1

= Disk Pre-Clear-Read completed                                DONE

= Step 1 of 10 - Copying zeros to first 2048k bytes            DONE

= Step 2 of 10 - Copying zeros to remainder of disk to clear it DONE

= Step 3 of 10 - Disk is now cleared from MBR onward.          DONE

= Step 4 of 10 - Clearing MBR bytes for partition 2,3 & 4      DONE

= Step 5 of 10 - Clearing MBR code area                        DONE

= Step 6 of 10 - Setting MBR signature bytes                    DONE

= Step 7 of 10 - Setting partition 1 to precleared state        DONE

= Step 8 of 10 - Notifying kernel we changed the partitioning  DONE

= Step 9 of 10 - Creating the /dev/disk/by* entries            DONE

= Step 10 of 10 - Testing if the clear has been successful.    DONE

=

Elapsed Time:  7:56:59

============================================================================

==

== SORRY: Disk /dev/sdk MBR could NOT be precleared

==

============================================================================

0+0 records in

0+0 records out

0 bytes (0 B) copied, 2.8617e-05 s, 0.0 kB/s

0000000

 

The only difference is that when I ran the first HD, the syslog filled up with messages like the following, to about 600MB. I then deleted the syslog and used the touch command to create a new one. I then ran preclear.sh on my second drive, and the syslog remained at size 0.

 

Aug 20 03:37:23 Tower kernel: end_request: I/O error, dev sdk, sector 32563752

Aug 20 03:37:23 Tower kernel: sd 9:0:0:0: [sdk] Result: hostbyte=0x04 driverbyte=0x00

Aug 20 03:37:23 Tower kernel: end_request: I/O error, dev sdk, sector 32579816

Aug 20 03:37:23 Tower kernel: sd 9:0:0:0: [sdk] Result: hostbyte=0x04 driverbyte=0x00

Aug 20 03:37:23 Tower kernel: end_request: I/O error, dev sdk, sector 32595880

Aug 20 03:37:23 Tower kernel: sd 9:0:0:0: [sdk] Result: hostbyte=0x04 driverbyte=0x00

Aug 20 03:37:23 Tower kernel: end_request: I/O error, dev sdk, sector 32611944

 

Does the message "SORRY: Disk /dev/sdk MBR could NOT be precleared" imply that both of my HDs are defective?

 

I'd appreciate it if anyone could chime in on what I should do next.

 

--Tom


That message indicates that the cleared "signature" expected in the first 512 bytes of the disk was not found when the disk was read after being written to.
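
For readers wondering what such a check might look like, here is a hedged Python sketch based only on the step list printed by the script (zeroed code area, zeroed entries for partitions 2, 3 and 4, signature bytes set) and the standard MBR layout. It is not the script's actual test: the exact partition 1 values that make up the precleared signature are not shown in this thread, so they are left unchecked, and `check_mbr` is a hypothetical name.

```python
SECTOR_SIZE = 512
CODE_AREA_END = 446            # boot code occupies bytes 0..445
PARTITION_TABLE_START = 446    # four 16-byte partition entries follow
ENTRY_SIZE = 16
SIGNATURE = b"\x55\xaa"        # standard MBR signature at offsets 510-511


def check_mbr(sector: bytes) -> list:
    """Return a list of problems found; an empty list means all checks passed."""
    if len(sector) != SECTOR_SIZE:
        return ["expected exactly 512 bytes"]
    problems = []
    if any(sector[:CODE_AREA_END]):
        problems.append("code area is not zeroed")
    for number in (2, 3, 4):
        start = PARTITION_TABLE_START + (number - 1) * ENTRY_SIZE
        if any(sector[start:start + ENTRY_SIZE]):
            problems.append(f"partition entry {number} is not zeroed")
    if sector[510:512] != SIGNATURE:
        problems.append("missing 0x55 0xAA signature")
    return problems


if __name__ == "__main__":
    # Build a made-up sector: partition 1 holds placeholder bytes because
    # the real precleared values are not given in this thread.
    sector = bytearray(SECTOR_SIZE)
    sector[446:462] = b"\x01" * ENTRY_SIZE
    sector[510:512] = SIGNATURE
    print(check_mbr(bytes(sector)))
    print(check_mbr(b""))
```

An empty read, like the "0+0 records in" shown in the failed report above, is rejected by the length check before any signature bytes are examined.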

 

Your error messages seem to indicate that reading the drive is failing.

 

I'd stop the array, power down, and check the cabling.  Odds are either it is not seated properly, or one of the cables to the drive is defective.

 

You can use the

preclear_disk.sh -t /dev/sdk

command to test if the pre-clear was successful. (With all the errors, it might not have been)

It will run in a few seconds and let you know if the disk is cleared.

 

It also sounds as if you are hot-plugging the external disks... DO NOT...  the SATA drives may be hot-pluggable, but unRAID is NOT. You could cause yourself all kinds of grief.

 

(I apologize if you had both plugged in at the same time, but it sounds as if you had one disk connected, and then the other.)

 

The syslog filling with disk errors is not a good sign...
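
To get a feel for how many errors are involved without paging through a 600MB file, a short Python sketch like this one could tally the kernel's I/O error lines per device. The regular expression is written against the lines quoted in this thread only; other kernels may word the message differently, and `summarize_io_errors` is a hypothetical helper.

```python
import re
from collections import Counter

# Matches lines such as:
#   "... kernel: end_request: I/O error, dev sdk, sector 32563752"
IO_ERROR = re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)")

SAMPLE = """\
Aug 20 03:37:23 Tower kernel: end_request: I/O error, dev sdk, sector 32563752
Aug 20 03:37:23 Tower kernel: sd 9:0:0:0: [sdk] Result: hostbyte=0x04 driverbyte=0x00
Aug 20 03:37:23 Tower kernel: end_request: I/O error, dev sdk, sector 32579816
Aug 20 03:37:23 Tower kernel: sd 9:0:0:0: [sdk] Result: hostbyte=0x04 driverbyte=0x00
Aug 20 03:37:23 Tower kernel: end_request: I/O error, dev sdk, sector 32595880
Aug 20 03:37:23 Tower kernel: sd 9:0:0:0: [sdk] Result: hostbyte=0x04 driverbyte=0x00
Aug 20 03:37:23 Tower kernel: end_request: I/O error, dev sdk, sector 32611944
"""


def summarize_io_errors(lines):
    """Count I/O error lines per device and collect the failing sectors."""
    counts = Counter()
    sectors = {}
    for line in lines:
        match = IO_ERROR.search(line)
        if match:
            dev, sector = match.group(1), int(match.group(2))
            counts[dev] += 1
            sectors.setdefault(dev, []).append(sector)
    return counts, sectors


if __name__ == "__main__":
    counts, sectors = summarize_io_errors(SAMPLE.splitlines())
    for dev, count in counts.items():
        print(dev, count, "errors, sectors",
              min(sectors[dev]), "to", max(sectors[dev]))
```

For a real log, the same function can be fed an open file object instead of the sample text, since it only iterates over lines.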

 

I'd run

smartctl -d ata -a /dev/sdk

on the drive, to see what the full SMART report says.

 

Joe L.


Hi Joe,

 

Thanks for the analysis. Please see the smartctl output below; it looks fine to me.

 

I did hot-plug the external SATA cable. Thanks for pointing that out, since I did not know I was not supposed to do that.  I am using PuTTY to connect to the unRAID server. Whenever I execute the command "preclear_disk.sh -t /dev/sdk", the session terminates right away. I think I will go ahead and stop the array, restart the server, and then run preclear_disk.sh again.

 

Thanks,

--Tom

 

 

------------------------------------------------------------------------

root@Tower:~# smartctl -d ata -a /dev/sdk

smartctl version 5.38 [i486-slackware-linux-gnu] Copyright © 2002-8 Bruce Allen

Home page is http://smartmontools.sourceforge.net/

 

=== START OF INFORMATION SECTION ===

Device Model:    SAMSUNG HD154UI

Serial Number:    S1Y6J1KS744099

Firmware Version: 1AG01118

User Capacity:    1,500,301,910,016 bytes

Device is:        In smartctl database [for details use: -P show]

ATA Version is:  8

ATA Standard is:  ATA-8-ACS revision 3b

Local Time is:    Fri Aug 21 12:44:04 2009 Local time zone must be set--see zic m

 

==> WARNING: May need -F samsung or -F samsung2 enabled; see manual for details.

 

SMART support is: Available - device has SMART capability.

SMART support is: Enabled

 

=== START OF READ SMART DATA SECTION ===

SMART overall-health self-assessment test result: PASSED

 

General SMART Values:

Offline data collection status:  (0x00) Offline data collection activity

                                        was never started.

                                        Auto Offline Data Collection: Disabled.

Self-test execution status:      (  0) The previous self-test routine completed

                                        without error or no self-test has ever

                                        been run.

Total time to complete Offline

data collection:                (19393) seconds.

Offline data collection

capabilities:                    (0x7b) SMART execute Offline immediate.

                                        Auto Offline data collection on/off support.

                                        Suspend Offline collection upon new

                                        command.

                                        Offline surface scan supported.

                                        Self-test supported.

                                        Conveyance Self-test supported.

                                        Selective Self-test supported.

SMART capabilities:            (0x0003) Saves SMART data before entering

                                        power-saving mode.

                                        Supports SMART auto save timer.

Error logging capability:        (0x01) Error logging supported.

                                        General Purpose Logging supported.

Short self-test routine

recommended polling time:        (  2) minutes.

Extended self-test routine

recommended polling time:        ( 255) minutes.

Conveyance self-test routine

recommended polling time:        (  34) minutes.

SCT capabilities:              (0x003f) SCT Status supported.

                                        SCT Feature Control supported.

                                        SCT Data Table supported.

 

SMART Attributes Data Structure revision number: 16

Vendor Specific SMART Attributes with Thresholds:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   100   100   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0007   071   071   011    Pre-fail  Always       -       9640
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       4
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   253   253   051    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0025   100   100   015    Pre-fail  Offline      -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       10
 10 Spin_Retry_Count        0x0033   100   100   051    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x0012   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       4
 13 Read_Soft_Error_Rate    0x000e   100   100   000    Old_age   Always       -       0
183 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
184 Unknown_Attribute       0x0033   100   100   000    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   075   075   000    Old_age   Always       -       25 (Lifetime Min/Max 25/26)
194 Temperature_Celsius     0x0022   075   075   000    Old_age   Always       -       25 (Lifetime Min/Max 25/27)
195 Hardware_ECC_Recovered  0x001a   100   100   000    Old_age   Always       -       223195953
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   253   253   000    Old_age   Always       -       5
200 Multi_Zone_Error_Rate   0x000a   253   253   000    Old_age   Always       -       0
201 Soft_Read_Error_Rate    0x000a   100   100   000    Old_age   Always       -       0

 

SMART Error Log Version: 1

No Errors Logged

 

SMART Self-test log structure revision number 1

No self-tests have been logged.  [To run self-tests, use: smartctl -t]

 

 

SMART Selective self-test log data structure revision number 1

SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS

    1        0        0  Not_testing

    2        0        0  Not_testing

    3        0        0  Not_testing

    4        0        0  Not_testing

    5        0        0  Not_testing

Selective self-test flags (0x0):

  After scanning selected spans, do NOT read-scan remainder of disk.

If Selective self-test is pending on power-up, resume after 0 minute delay.

 

 


I think I will go ahead stop the array and restart the server, and then run the preclear_disk.sh again.

 

It sounds as if the out-of-memory kernel process is killing processes on your server.

 

Deleting the syslog does not free the space it uses if a process still has an open file descriptor writing to it.

 

The blocks are freed only once there are no more references to the file, and an open file descriptor is a reference.

 

Some programs actually take advantage of this behavior: they create a temp file, open it for reading and writing, then delete it.

Until the file descriptors are closed, the temp file is still readable and writable by that program...  The memory (and space) is automatically freed when the program exits.
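
The delete-while-open behavior described above can be tried out with a minimal Python sketch. It assumes a Linux/Unix system, and the `unlink_demo_` prefix and the function name are made up for illustration; it has nothing to do with how syslogd itself is written.

```python
import os
import tempfile


def demo_unlinked_file():
    """Create a temp file, delete its name, and keep using the open descriptor."""
    fd, path = tempfile.mkstemp(prefix="unlink_demo_")  # hypothetical prefix
    os.unlink(path)                       # the directory entry is gone...
    name_exists = os.path.exists(path)    # ...so the path no longer resolves
    os.write(fd, b"still here")           # but the open descriptor still works
    os.lseek(fd, 0, os.SEEK_SET)          # rewind to read back what was written
    data = os.read(fd, 100)
    size = os.fstat(fd).st_size           # space is still allocated to the file
    os.close(fd)                          # last reference closed: blocks are freed
    return name_exists, data, size


if __name__ == "__main__":
    print(demo_unlinked_file())
```

The same thing happens with a deleted syslog: the space only comes back once the process holding the descriptor closes it or is restarted.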

 

To stop the old syslog process, and restart it, type the following:

/etc/rc.d/rc.syslog restart

 

It should free up the memory and you should then see the new syslog file you created start to be used.


As far as the hot-plug causing harm...  Look through this thread

 

After a hot plug, and a reboot when it did not work as expected, the user accidentally started a "parity check" with a drive that was not mounted.  It ran for a minute or two before he stopped it.

It read "zeros" from the un-mounted drive and changed parity accordingly... Later, when a replacement drive was installed, those zeros were written to it instead of the normal file-system structures.  Basically, he had wiped his data, from both parity and the drive.  That hot-plug initiated actions that resulted in one of the few cases I know of where unRAID lost data.

 

All that said, stop your array, reboot, and you'll probably be fine.  Oh yeah... don't hot-plug... always stop the array and power down.

 

Joe L.


Hi Joe,

 

Thanks again for the explanations. Those are good notes that I will keep to maintain my unRAID server.

 

I am still waiting for the preclear_disk.sh script to finish and will post an update later on.

 

--Tom


Hi Joe,

 

The preclear_disk.sh script finally completed one cycle and below is the result. I suppose it is OK, right?

 

Thanks,

--Tom

 

 

 

===========================================================================

=                unRAID server Pre-Clear disk /dev/sdj

=                      cycle 1 of 1

= Disk Pre-Clear-Read completed                                DONE

= Step 1 of 10 - Copying zeros to first 2048k bytes            DONE

= Step 2 of 10 - Copying zeros to remainder of disk to clear it DONE

= Step 3 of 10 - Disk is now cleared from MBR onward.          DONE

= Step 4 of 10 - Clearing MBR bytes for partition 2,3 & 4      DONE

= Step 5 of 10 - Clearing MBR code area                        DONE

= Step 6 of 10 - Setting MBR signature bytes                    DONE

= Step 7 of 10 - Setting partition 1 to precleared state        DONE

= Step 8 of 10 - Notifying kernel we changed the partitioning  DONE

= Step 9 of 10 - Creating the /dev/disk/by* entries            DONE

= Step 10 of 10 - Testing if the clear has been successful.    DONE

= Disk Post-Clear-Read completed                                DONE

Elapsed Time:  14:36:38

============================================================================

==

== Disk /dev/sdj has been successfully precleared

==

============================================================================

S.M.A.R.T. error count differences detected after pre-clear

note, some 'raw' values may change, but not be an indication of a problem

71c71

< 190 Airflow_Temperature_Cel 0x0022  075  075  000    Old_age  Always      -      25 (Lifetime Min/Max 25/26)

---

> 190 Airflow_Temperature_Cel 0x0022  072  072  000    Old_age  Always      -      28 (Lifetime Min/Max 25/28)

77c77

< 200 Multi_Zone_Error_Rate  0x000a  253  253  000    Old_age  Always      -      0

---

> 200 Multi_Zone_Error_Rate  0x000a  100  100  000    Old_age  Always      -      0

============================================================================

 



After 14+ hours your disk temperature went from 25C to 28C.     I'd say that in itself is not too serious  ;)  but it does say you have some serious fans. ;)

 

The S.M.A.R.T. wiki here indicates that attribute 200 is

200 C8 Write Error Rate / Multi-Zone Error Rate

The total number of errors when writing a sector.

You started with the default initialized value of 253, and after a full 14+ hour pre-clear cycle, it has a normalized value of 100.  The failure threshold is 0.  You are nowhere close to the failure threshold value, so unless it changes over time, you are fine there too.
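
The comparison above (normalized value against failure threshold) can be sketched in Python. This is only an illustration: the function names are hypothetical, the column positions are assumed to match the smartctl attribute table shown earlier in this thread, and the "failed" rule follows the smartctl manual's description that an attribute has failed when its normalized value is less than or equal to its threshold.

```python
def parse_attribute(line):
    """Split one smartctl attribute row into named columns.

    Accepts rows copied from the diff output, i.e. prefixed with '<' or '>'.
    """
    # Split into at most 10 fields so a RAW_VALUE with spaces stays whole.
    fields = line.lstrip("<> ").split(None, 9)
    return {
        "id": int(fields[0]),
        "name": fields[1],
        "value": int(fields[3]),    # normalized VALUE column
        "worst": int(fields[4]),    # WORST column
        "thresh": int(fields[5]),   # THRESH column
        "raw": fields[9],           # RAW_VALUE column, kept as text
    }


def failure_margin(attr):
    """How far the normalized value sits above the failure threshold."""
    return attr["value"] - attr["thresh"]


def has_failed(attr):
    """True when the normalized value is at or below the threshold."""
    return attr["value"] <= attr["thresh"]


if __name__ == "__main__":
    row = "> 200 Multi_Zone_Error_Rate   0x000a   100   100   000    Old_age   Always       -       0"
    attr = parse_attribute(row)
    print(attr["name"], failure_margin(attr), has_failed(attr))
```

A drop from 253 to 100 looks dramatic in the diff, but what matters is the margin to the threshold, and whether that margin keeps shrinking on later runs.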

 

Joe L.


I just ran 2 disks through a single cycle.  One disk was fine, the other not so much.  Do you agree that this might be an RMA candidate?  I'm running a second cycle to be sure.

 

S.M.A.R.T. error count differences detected after pre-clear

note, some 'raw' values may change, but not be an indication of a problem

57c57

< 1 Raw_Read_Error_Rate 0x000f 100 100 051 Pre-fail Always - 0

---

> 1 Raw_Read_Error_Rate 0x000f 099 099 051 Pre-fail Always - 5005

66c66

< 13 Read_Soft_Error_Rate 0x000e 100 100 000 Old_age Always - 0

---

> 13 Read_Soft_Error_Rate 0x000e 099 099 000 Old_age Always - 4648

69c69

< 187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0

---

> 187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 4952

71c71

< 190 Airflow_Temperature_Cel 0x0022 070 070 000 Old_age Always - 30 (Lifetime Min/Max 30/30)

---

> 190 Airflow_Temperature_Cel 0x0022 068 067 000 Old_age Always - 32 (Lifetime Min/Max 30/33)

74c74

< 197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0

---

> 197 Current_Pending_Sector 0x0012 092 092 000 Old_age Always - 331

78c78

< 201 Soft_Read_Error_Rate 0x000a 253 253 000 Old_age Always - 0

---

> 201 Soft_Read_Error_Rate 0x000a 097 097 000 Old_age Always - 228

============================================================================

 

 


I just ran 2 disks through a single cycle.  One disk was fine, the other not so much.  Do you agree that this might be an RMA candidate?  I'm running a second cycle to be sure.

 

S.M.A.R.T. error count differences detected after pre-clear

note, some 'raw' values may change, but not be an indication of a problem

57c57

< 1 Raw_Read_Error_Rate 0x000f 100 100 051 Pre-fail Always - 0

---

> 1 Raw_Read_Error_Rate 0x000f 099 099 051 Pre-fail Always - 5005

66c66

< 13 Read_Soft_Error_Rate 0x000e 100 100 000 Old_age Always - 0

---

> 13 Read_Soft_Error_Rate 0x000e 099 099 000 Old_age Always - 4648

69c69

< 187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0

---

> 187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 4952

71c71

< 190 Airflow_Temperature_Cel 0x0022 070 070 000 Old_age Always - 30 (Lifetime Min/Max 30/30)

---

> 190 Airflow_Temperature_Cel 0x0022 068 067 000 Old_age Always - 32 (Lifetime Min/Max 30/33)

74c74

< 197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0

---

> 197 Current_Pending_Sector 0x0012 092 092 000 Old_age Always - 331   <-- This looks like a good candidate for an RMA to me.

78c78

< 201 Soft_Read_Error_Rate 0x000a 253 253 000 Old_age Always - 0       

---

> 201 Soft_Read_Error_Rate 0x000a 097 097 000 Old_age Always - 228

============================================================================

I see an RMA in your future.  >:(

 

Not sure why the 331 sectors were not re-allocated already, unless the failures were in the post-read, and no subsequent "write" has happened since then to those sectors.
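
For anyone who wants to scan a pre/post diff like the one above automatically, here is a rough Python sketch. The list of attributes in `WATCHED` is my own assumption about which raw counters matter most, not something preclear_disk.sh defines, and the function name is hypothetical. It only looks at the leading integer of the raw value.

```python
import re

# Assumption: raw counters whose increase usually deserves a closer look.
WATCHED = {
    "Reallocated_Sector_Ct",
    "Reported_Uncorrect",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
}

# '<' or '>', ID, name, flag, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, raw
ROW = re.compile(
    r"^([<>])\s+(\d+)\s+(\S+)\s+0x[0-9a-fA-F]{4}"
    r"\s+\d+\s+\d+\s+\d+\s+\S+\s+\S+\s+\S+\s+(\d+)"
)


def raw_increases(diff_text, watched=WATCHED):
    """Return {name: (before, after)} for watched attributes whose raw value rose."""
    before, after = {}, {}
    for line in diff_text.splitlines():
        match = ROW.match(line.strip())
        if not match:
            continue  # skips '57c57', '---' and blank lines
        side, _attr_id, name, raw = match.groups()
        target = before if side == "<" else after
        target[name] = int(raw)
    return {
        name: (before[name], after[name])
        for name in after
        if name in watched and name in before and after[name] > before[name]
    }


if __name__ == "__main__":
    sample = """
74c74
< 197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
---
> 197 Current_Pending_Sector 0x0012 092 092 000 Old_age Always - 331
"""
    print(raw_increases(sample))
```

Temperature rows also change between the two reports, which is why the sketch filters on a short list of names instead of flagging every difference.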

 

Joe L.


Hi all,

 

First post, so do excuse me if my questions have been asked before...

 

I am currently trying out the free version (before I put my money down) and running it on an old HP P4 1.9 GHz server. I have 3 new Seagate Barracuda ES 750 GB HDDs which I am preclearing. So far the SMART reports look fine, although my temps are a bit high as I live awfully near the Equator (ambient temp is 30+...bah).

 

As it's an old server, I am running the 3 HDDs off a PCI SATA card. When preclearing a single drive, I am getting about 30-35 MB/s. It is the same for 2 drives, and drops to 15 MB/s with all 3 drives. In terms of hours, that's 13 for 1-2 drives and 23 for 3 drives. Based on what I have read in the threads, a single drive alone should take about 10 hours. Not sure if the line below that says "Write cache: disabled" is normal..  :(

 

[sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA

 

I have attached the syslog when preclearing all 3 drives below. /dev/hda is a CF card in an IDE adaptor so you can ignore the errors.

 

 

The other thing I am concerned about is that one of the hdds (/dev/sdc) makes a squeaking sound when starting up. No more weird sounds after that. I am told that this is normal for Seagate hdds and nothing to worry about. Still a bit worried as the other 2 do not make the same sound starting up. So should I be concerned?  ???

 

Thanks for everyone's attention. If this is the wrong thread, do let me know and I will start a new one instead.

 

 


I am currently trying out the free version (before I put my money down) and running it on an old HP P4 1.9 GHz server. I have 3 new Seagate Barracuda ES 750 GB HDDs which I am preclearing. So far the SMART reports look fine, although my temps are a bit high as I live awfully near the Equator (ambient temp is 30+...bah).

 

As it's an old server, I am running the 3 HDDs off a PCI SATA card. When preclearing a single drive, I am getting about 30-35 MB/s. It is the same for 2 drives, and drops to 15 MB/s with all 3 drives. In terms of hours, that's 13 for 1-2 drives and 23 for 3 drives. Based on what I have read in the threads, a single drive alone should take about 10 hours. Not sure if the line below that says "Write cache: disabled" is normal..  :(

 

[sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA

 

That is not normal; both read and write caching are usually enabled.  I don't have any ideas as to what to make of it though.  You are using a board based on VIA chipsets, which are usually problematic.  I can't say it won't work, but I have seen little success with VIA-based boards here.

 

I know you said the 3 drives are "Seagate Barracuda ES 750Gb" drives, but they don't look like anything I have ever seen before.  Neither the Linux kernel nor smartctl 5.38 was able to identify the manufacturer.  They are identified by SMART as (with different serial numbers):

Device Model:     GB0750C4414
Serial Number:    5QD51Y9T
Firmware Version: HPG4

Perhaps someone with experience with the ES series of drives can help here.

 

The other thing I am concerned about is that one of the hdds (/dev/sdc) makes a squeaking sound when starting up. No more weird sounds after that. I am told that this is normal for Seagate hdds and nothing to worry about. Still a bit worried as the other 2 do not make the same sound starting up. So should I be concerned?  ???

 

Squeaking sounds are definitely not normal either; I don't think I have ever heard a hard drive squeak.  Are you positive that the squeaks are coming from the drive, and not a fan?


That is not normal; both read and write caching are usually enabled.  I don't have any ideas as to what to make of it though.  You are using a board based on VIA chipsets, which are usually problematic.  I can't say it won't work, but I have seen little success with VIA-based boards here.

 

Sounds bad. Guess I have to find out more on this. Read speeds are 70+ MB/s using hdparm -tT, so no problems there. It's the write speed that is pathetic. Although once unRAID is up, write speeds will probably be limited more by the network speed.
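
As a back-of-the-envelope check on those numbers, the sketch below estimates the length of one pre-clear cycle. It rests on several assumptions that are mine, not measured: the cycle is treated as three full passes over the disk (pre-read, zero write, post-read, as in the step listing the script prints), the drive is taken as 750 × 10^9 bytes, the read passes run at the hdparm figure of about 70 MB/s, the write pass at the middle of the 30-35 MB/s range, and speeds stay constant across the whole platter (in practice they drop toward the inner tracks). The function name is hypothetical.

```python
def estimate_preclear_hours(size_bytes, read_mb_s, write_mb_s):
    """Rough duration of one pre-clear cycle: two read passes plus one write pass."""
    megabytes = size_bytes / 1e6                 # 1 MB taken as 10**6 bytes
    read_seconds = 2 * megabytes / read_mb_s     # pre-read and post-read
    write_seconds = megabytes / write_mb_s       # zeroing pass
    return (read_seconds + write_seconds) / 3600.0


if __name__ == "__main__":
    hours = estimate_preclear_hours(750e9, read_mb_s=70.0, write_mb_s=32.5)
    print(f"estimated cycle time: {hours:.1f} hours")
```

With those inputs the estimate lands a little over 12 hours, which is in the same range as the 13 hours observed, so the slow write pass alone could plausibly explain the gap to the 10 hours others report.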

 

I know you said the 3 drives are "Seagate Barracuda ES 750Gb" drives, but they don't look like anything I have ever seen before.  Neither the Linux kernel nor smartctl 5.38 was able to identify the manufacturer.  They are identified by SMART as (with different serial numbers):

Device Model:     GB0750C4414
Serial Number:    5QD51Y9T
Firmware Version: HPG4

Perhaps someone with experience with the ES series of drives can help here.

 

These are actually OEM Seagate drives for HP so you wouldn't find the model number on Seagate's website.

 

Squeaking sounds are definitely not normal either; I don't think I have ever heard a hard drive squeak.  Are you positive that the squeaks are coming from the drive, and not a fan?

 

Yup. Pretty sure, as I put my ear to the HDD when pressing the power-on button. Everything else appears to be okay, so I am a bit reluctant to exchange it for a new one which may or may not be better (if the squeaking sound is not a major problem).

 

 
