[Solved] Hardware_ECC_Recovered increasing rapidly during disk upgrade



I am in the process of upgrading some disks in my tower, which has involved several parity checks and rebuilds. I don't normally check SMART reports, but out of boredom this morning I did, and on a couple of 1.5TB drives (SAMSUNG HD154UI) I noticed that the Hardware_ECC_Recovered value is increasing rapidly during the drive upgrade process.

 

Here's what I've done during the last few days. I'm running 4.7 Pro.

 

1. Replaced the motherboard of the server with an HP MicroServer board but kept the MV8 controller that I had on the old Celeron board.

2. Did a parity check with 0 errors. All 8 drives run from the MV8 card. Only the cache drive runs from a spare SATA port on the motherboard.

3. Moved data from 2 old 1TB Samsung drives and removed them from the array. Rebuilt parity with 0 errors.

4. Another parity check to verify the rebuilt parity, again 0 errors.

5. Replaced the first 1TB Hitachi drive with a new 2TB WD EARS drive (precleared), starting the upgrade process.

6. Decided to run some SMART reports to check the drives for reallocated sectors and such. This is when I noticed the 2 x SAMSUNG HD154UI reporting a high Hardware_ECC_Recovered value, which is increasing rapidly as the upgrade process continues.

 

I don't have an old report from before the upgrade to compare these figures with. The other drives in the array don't report this parameter in their SMART reports. The drive upgrade process is going to last at least another 4 hours. Should I put back the old drive that I'm currently upgrading and run some tests, or can I safely let it finish?

 

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   100   100   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0007   071   071   011    Pre-fail  Always       -       9510
  4 Start_Stop_Count        0x0032   099   099   000    Old_age   Always       -       1003
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   100   100   051    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0025   100   100   015    Pre-fail  Offline      -       0
  9 Power_On_Hours          0x0032   098   098   000    Old_age   Always       -       10618
 10 Spin_Retry_Count        0x0033   100   100   051    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x0012   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       69
 13 Read_Soft_Error_Rate    0x000e   100   100   000    Old_age   Always       -       0
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0033   100   100   000    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   085   073   000    Old_age   Always       -       15 (Lifetime Min/Max 10/15)
194 Temperature_Celsius     0x0022   084   066   000    Old_age   Always       -       16 (Lifetime Min/Max 10/17)
195 Hardware_ECC_Recovered  0x001a   100   100   000    Old_age   Always       -       165567083
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   100   100   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x000a   100   100   000    Old_age   Always       -       0
201 Soft_Read_Error_Rate    0x000a   100   100   000    Old_age   Always       -       0

 

 

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   100   100   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0007   070   070   011    Pre-fail  Always       -       9710
  4 Start_Stop_Count        0x0032   099   099   000    Old_age   Always       -       1112
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   100   100   051    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0025   100   100   015    Pre-fail  Offline      -       0
  9 Power_On_Hours          0x0032   098   098   000    Old_age   Always       -       10636
 10 Spin_Retry_Count        0x0033   100   100   051    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x0012   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       70
 13 Read_Soft_Error_Rate    0x000e   100   100   000    Old_age   Always       -       0
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0033   100   100   000    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   084   066   000    Old_age   Always       -       16 (Lifetime Min/Max 11/16)
194 Temperature_Celsius     0x0022   083   060   000    Old_age   Always       -       17 (Lifetime Min/Max 11/17)
195 Hardware_ECC_Recovered  0x001a   100   100   000    Old_age   Always       -       109349427
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   100   100   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x000a   100   100   000    Old_age   Always       -       0
201 Soft_Read_Error_Rate    0x000a   100   100   000    Old_age   Always       -       0

Full reports and syslog attached.

 

Any thoughts are greatly appreciated.

 

Thanks!

smart_report_hdd_1.txt

smart_report_hdd_2.txt

syslog-2013-01-09.txt

Link to comment

The rebuild process moved past the 1.5TB mark and the 2 Samsung drives stopped incrementing Hardware_ECC_Recovered, which has now reached 483,000,419 and 355,541,041. At this point I'm convinced this happened during this rebuild alone, and I can't wait for it to finish rebuilding so that I can power down and reboot the server. It may be that the value will reset to 0 and that this is some kind of internal counter similar to what Seagate drives seem to use.
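
For anyone wanting to put a number on "increasing rapidly", here is a rough sketch that subtracts the first raw readings from the later ones. The labels hdd_1 and hdd_2 are only illustrative, and pairing each later figure with a drive in the order quoted is an assumption, since the post does not say which value belongs to which drive.

```python
# Raw Hardware_ECC_Recovered readings quoted in this thread.
# Assumption: the later figures are paired with the drives in the order
# they were written; the thread itself does not confirm this.
first_reading = {"hdd_1": 165_567_083, "hdd_2": 109_349_427}
later_reading = {"hdd_1": 483_000_419, "hdd_2": 355_541_041}


def raw_increase(before: dict, after: dict) -> dict:
    """Return the per-drive growth of a raw SMART counter between two readings."""
    return {drive: after[drive] - before[drive] for drive in before}


increase = raw_increase(first_reading, later_reading)
for drive, delta in increase.items():
    print(f"{drive}: +{delta:,}")
```

This only describes the raw number. It says nothing about what the manufacturer actually encodes in that field.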

Link to comment

All drives use hardware error correction and all have errors; some report them, some do not.

 

It is only an issue if the "normalized" value starts moving towards its associated failure threshold.

In your case, the "normalized" value has not budged from its initial value of 100.
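
The check described above can be sketched in a few lines of Python: parse the attribute table from a saved SMART report and list any attribute whose normalized VALUE has reached its THRESH. The function names and the regular expression are illustrative only, and treating a threshold of 000 as "no failure threshold" is an assumption of this sketch.

```python
import re

# One row of the attribute table as it appears in the reports in this thread.
ROW = re.compile(
    r"^\s*(?P<id>\d+)\s+(?P<name>\S+)\s+(?P<flag>0x[0-9a-fA-F]+)\s+"
    r"(?P<value>\d+)\s+(?P<worst>\d+)\s+(?P<thresh>\d+)\s+"
    r"(?P<type>\S+)\s+(?P<updated>\S+)\s+(?P<when_failed>\S+)\s+(?P<raw>.+?)\s*$"
)

# A few rows copied from the first report above.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
190 Airflow_Temperature_Cel 0x0022   085   073   000    Old_age   Always       -       15 (Lifetime Min/Max 10/15)
195 Hardware_ECC_Recovered  0x001a   100   100   000    Old_age   Always       -       165567083
"""


def parse_attributes(report: str) -> dict:
    """Map attribute name to its normalized value, worst, threshold and raw text."""
    attrs = {}
    for line in report.splitlines():
        m = ROW.match(line)
        if m:  # the header line and any non-table text simply do not match
            attrs[m["name"]] = {
                "value": int(m["value"]),
                "worst": int(m["worst"]),
                "thresh": int(m["thresh"]),
                "raw": m["raw"],
            }
    return attrs


def at_or_below_threshold(attrs: dict) -> list:
    """Names of attributes whose normalized value has reached the threshold.

    Assumption: a threshold of 0 means the attribute has no failure threshold.
    """
    return [
        name
        for name, a in attrs.items()
        if a["thresh"] > 0 and a["value"] <= a["thresh"]
    ]


attrs = parse_attributes(SAMPLE)
print(attrs["Hardware_ECC_Recovered"])
print("at or below threshold:", at_or_below_threshold(attrs))
```

Run against the rows shown here, nothing is flagged: Hardware_ECC_Recovered sits at a normalized 100 with a threshold of 000, whatever the raw number is doing.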

 

The meaning of the "raw" value is known only to the manufacturer. For this parameter, I doubt it is an actual count.

 

The drive is fine.

 

Joe L.
