Is my array about to die?

July 19, 2012

I had a power failure yesterday while my UPS was being used elsewhere for the day (Murphy's back). The parity sync completed and there were 640-odd errors. The file server was being used a lot that day, so I figured that might be the problem. It says the parity is valid, but I feel like I should run the test again. Before doing that, I decided to have a look at the SMART history reports in unMENU. This is where my heart sank. The first drive gave this report:

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   219   159   021    Pre-fail  Always       -       4025
  4 Start_Stop_Count        0x0032   099   099   000    Old_age   Always       -       1520
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   100   253   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   091   091   000    Old_age   Always       -       6738
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       272
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       35
193 Load_Cycle_Count        0x0032   159   159   000    Old_age   Always       -       124295
194 Temperature_Celsius     0x0022   129   114   000    Old_age   Always       -       21
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

So I'm thinking it's time to replace the drive. Thing is, I went through every other drive in the array and they are all giving roughly the same errors! One of the drives was installed in the last month. I feel like it can't be possible that all the drives are suddenly failing, unless there was a spike in electricity before the lights tripped. Could this cause all the drives to start failing? unRAID is happy at the moment, but I need to make sure the drives are OK, and I don't want to run the parity sync before doing so. Could it be that unMENU is not giving me the right information? Is there another way I can check these drives?

Thanks for taking the time to read my questions.

Edit : Forgot to add my syslog. Done now.

syslog-2012-07-19.txt

July 19, 2012

I don't see any errors in that SMART report - which item is concerning you?

July 19, 2012

Author

All the bits that say Pre-fail and Old_age? They are all above threshold, e.g. Raw_Read_Error_Rate has a value of 200 and a threshold of 51.

Or do I not know how to read these things?

July 19, 2012

Nah. My understanding is that the ones you have to really watch out for are:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0

And they're all reading at "0", which indicates a healthy disk.
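
For anyone wanting to run the same check across several drives, here is a rough Python sketch that pulls the raw values of those four attributes out of a pasted report. It assumes the usual ten-column smartctl attribute layout (ID, name, flag, value, worst, thresh, type, updated, when_failed, raw); `WATCH_IDS` and `watched_raw_values` are names made up for this illustration, not part of any tool.

```python
# Attribute IDs singled out above: 5, 196, 197 and 198.
WATCH_IDS = {5, 196, 197, 198}


def watched_raw_values(report):
    """Return {attribute_name: raw_value} for the watched attribute IDs."""
    found = {}
    for line in report.splitlines():
        fields = line.split()
        # An attribute row has at least 10 columns and starts with a numeric ID;
        # this skips the header row and any blank lines.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        if int(fields[0]) in WATCH_IDS:
            found[fields[1]] = int(fields[9])  # column 10 is RAW_VALUE
    return found


# Sample rows copied from the first report in this thread.
report = """\
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
9 Power_On_Hours 0x0032 091 091 000 Old_age Always - 6738
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 200 200 000 Old_age Offline - 0
"""

values = watched_raw_values(report)
nonzero = {name: raw for name, raw in values.items() if raw != 0}
print(values)
print("needs a closer look:", nonzero if nonzero else "nothing")
```

A non-zero raw value is a prompt to keep an eye on that drive, not proof that it is failing.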

July 19, 2012

Author

Ok. So I was reading it wrong! Thanks for your help. I have checked all the drives against the 4 you mentioned to look out for.

1st drive

Reallocated_sector_ct - 2

reallocated_event_ct - 1

current_pending_sector - 1

offline_uncorrectable - 0

2nd drive

Reallocated_sector_ct - 27 (worrying?)

reallocated_event_ct - 0

current_pending_sector - 0

offline_uncorrectable - 0

The first drive has 6000 hours powered on and the 2nd drive is nearly at 14000 hours. If the reallocated sectors are at 27, will it quickly reach failing point or does it still have life in it? Also, I noticed some of my drives reporting Load_Cycle_Count values in the 100,000s. Is that normal? They are Western Digital EARS 2TB drives.

Safe to run my parity sync? Or could it send the 2nd drive over the edge?

July 19, 2012

Author

On top of that, my Seagate drive is reporting the following:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   118   099   006    Pre-fail  Always       -       170984514
  3 Spin_Up_Time            0x0003   100   100   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   099   099   020    Old_age   Always       -       1909
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   058   049   030    Pre-fail  Always       -       107391830176
  9 Power_On_Hours          0x0032   092   092   000    Old_age   Always       -       7126
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       215
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   099   000    Old_age   Always       -       8590065668
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   081   050   045    Old_age   Always       -       19 (Min/Max 19/22)
194 Temperature_Celsius     0x0022   019   050   000    Old_age   Always       -       19 (0 9 0 0)
195 Hardware_ECC_Recovered  0x001a   054   026   000    Old_age   Always       -       170984514
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       114941914780915
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       935982931
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       3182183045

Obviously Head_Flying_Hours can't be 114941914780915, unless these new hard drives are breaking the speed of light and causing the head to time travel.

July 19, 2012

This drive looks fine.

July 19, 2012

Author

Thanks. But how can it be showing such a high value for head hours?

July 19, 2012

Ok. So I was reading it wrong! Thanks for your help. I have checked all the drives against the 4 you mentioned to look out for.

1st drive

Reallocated_sector_ct - 2

reallocated_event_ct - 1

current_pending_sector - 1

offline_uncorrectable - 0

2nd drive

Reallocated_sector_ct - 27 (worrying?)

reallocated_event_ct - 0

current_pending_sector - 0

offline_uncorrectable - 0

The first drive has 6000 hours powered on and the 2nd drive is nearly at 14000 hours. If the reallocated sectors are at 27, will it quickly reach failing point or does it still have life in it? Also, I noticed some of my drives reporting Load_Cycle_Count values in the 100,000s. Is that normal? They are Western Digital EARS 2TB drives.

Safe to run my parity sync? Or could it send the 2nd drive over the edge?

The first disk needs to be rebuilt. That should clear the pending sector. The second disk is OK as long as its reallocated sector count does not increase.

July 19, 2012

Thanks. But how can it be showing such a high value for head hours?

The raw value has meaning only to the manufacturer for most attributes.

July 19, 2012

Author

Thank you. So should I run the parity sync and then rebuild? Also how do I rebuild the 1st disk? Are you saying to do that because there is a sector that has not been moved?

July 19, 2012

All the bits that say pre-fail and old_age?

Those are categories of parameters. ALL parameters belong to one of those categories. None are failing at this time.

They are all above threshold, e.g. Raw_Read_Error_Rate has a value of 200 and a threshold of 51.
Or do I not know how to read these things?

The latter. You don't know how to read the report. I see nothing wrong with the SMART report.

The normalized value is expected to be ABOVE the associated failure threshold. When the normalized value drops to the threshold or below, the drive is considered to have failed by the SMART firmware. When that happens, the line will say "FAILING_NOW" in the "WHEN_FAILED" column. (All your lines simply have a "-" in that column, indicating no failure.)
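
To make that rule concrete, here is a small Python sketch of the comparison. It assumes the ten-column smartctl attribute layout and treats an attribute as failed when its normalized value is at or below a non-zero threshold. `attribute_status` is a hypothetical helper, and the second sample line is invented for illustration; it does not come from any drive in this thread.

```python
def attribute_status(line):
    """Classify one smartctl attribute row as ok, failing, or previously failed."""
    fields = line.split()
    name = fields[1]
    value = int(fields[3])    # normalized VALUE column
    thresh = int(fields[5])   # THRESH column
    when_failed = fields[8]   # "-" unless the firmware has flagged a failure

    if when_failed != "-":
        return f"{name}: {when_failed}"
    # A threshold of 000 means the attribute never trips a failure on its own.
    if thresh > 0 and value <= thresh:
        return f"{name}: at or below threshold"
    return f"{name}: ok"


# Row taken from the first report in this thread: value 200, threshold 051.
healthy = "1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 0"
# Invented row showing what a tripped attribute would look like (assumption).
tripped = "5 Reallocated_Sector_Ct 0x0033 130 130 140 Pre-fail Always FAILING_NOW 2100"

print(attribute_status(healthy))
print(attribute_status(tripped))
```

In other words, a high normalized value is good news; 200 against a threshold of 51 leaves a wide margin.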

July 19, 2012

Author

Just finishing my parity sync. So if no errors are found, should I still rebuild Drive 2?

July 19, 2012

Thanks. But how can it be showing such a high value for head hours?

Most raw values are not meant for human interpretation. Only the manufacturer knows how to read most of those. (and they are not telling)

Apparently, only some of the bits in that value represent the hours.

114941914780915 hours = roughly 13.1 billion years. (And somehow, I doubt your disk is that old.)
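
As a quick illustration of the "only some of the bits" point, the Python sketch below splits that raw value into pieces. The split is an assumption: it guesses that the hour count sits in the low 32 bits of the raw field, which is how this attribute is commonly interpreted on Seagate drives, but the manufacturer does not document it.

```python
RAW = 114941914780915  # Head_Flying_Hours raw value from the Seagate report

# Read naively as a plain hour count (365-day years).
years = RAW / (24 * 365)

# Assumed layout: low 32 bits hold the hours, the upper bits hold something else.
low32 = RAW & 0xFFFFFFFF
upper = RAW >> 32

print(f"as plain hours: about {years / 1e9:.1f} billion years")
print(f"raw value in hex: {RAW:#x}")
print(f"low 32 bits: {low32}, upper bits: {upper}")
```

Under that assumption the low 32 bits come to 5363, which would be a believable figure next to the drive's Power_On_Hours of 7126.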

July 19, 2012

Thank you. So should I run the parity sync and then rebuild? Also how do I rebuild the 1st disk? Are you saying to do that because there is a sector that has not been moved?

The 1st disk has an unreadable sector, the pending sector. This will cause an issue if a different disk fails and needs to be rebuilt. In order to restore the pending sector it needs to be rewritten. Rebuilding the disk will write all sectors and resolve the pending sector. To rebuild a disk:

1. Stop the array and un-assign the disk.

2. Start the array and then stop the array.

3. Re-assign the disk and start the array.

The disk will rebuild.

Just finishing my parity sync. So if no errors are found, should I still rebuild Drive 2?

Is Drive 2 the 1st disk referred to above?
