March 22, 2010

Do I need to worry about that?

I recently set up smarthistory, along with the cron job JL mentioned so that it runs every day, and a couple of days ago I looked at the charts and noticed this:

Disk 0: *ERROR* - Current_Pending_Sector it is now 10 (error threshold is 5)

So I ran long SMART reports on every drive in my array that day to see whether that number would increase or the sectors would change to reallocated, and to make sure nothing was up with the other drives. Nothing changed. Here is the full report for my parity drive:

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   159   149   021    Pre-fail  Always       -       9033
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       214
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       531
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       33
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       18
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       2534
194 Temperature_Celsius     0x0022   125   110   000    Old_age   Always       -       27
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       10
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   198   000    Old_age   Offline      -       0

Then I also ran a parity check and that went fine as well, no errors:

Mar 20 16:28:03 Tower kernel: mdcmd (64551): check CORRECT
Mar 20 16:28:03 Tower kernel: md: recovery thread woken up ...
Mar 20 16:28:03 Tower kernel: md: recovery thread checking parity...
Mar 20 16:28:03 Tower kernel: md: using 1152k window, over a total of 1953514552 blocks.
Mar 20 17:41:42 Tower kernel: mdcmd (64981): spinup 0
Mar 20 17:41:42 Tower kernel:
Mar 20 22:29:46 Tower kernel: mdcmd (66634): spinup 0
Mar 20 22:29:46 Tower kernel:
Mar 21 05:13:22 Tower kernel: md: sync done. time=45648sec rate=42795K/sec
Mar 21 05:13:22 Tower kernel: md: recovery thread sync completion status: 0

But pending sectors is still 10. Should I just monitor it to make sure it doesn't increase, or should I get a better drive into the parity slot? Or should I try the re-store button to make it write parity all over again and see if that helps?

Thanks!
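The alert quoted above boils down to comparing one raw value against a limit. Here is a minimal Python sketch of that check, assuming the report text uses the same ten-column layout as the table in this post. `parse_smart_attributes` and `pending_alert` are hypothetical helper names, not part of smarthistory, and the limit of 5 is simply the figure quoted in the alert (whether the real tool uses "greater than" or "at least" is an assumption here).

```python
def parse_smart_attributes(report):
    """Return {attribute_name: raw_value} from smartctl-style attribute rows."""
    attrs = {}
    for line in report.splitlines():
        fields = line.split()
        # Data rows start with a numeric ID and have at least ten columns;
        # the header row starts with "ID#" and is skipped by isdigit().
        if len(fields) >= 10 and fields[0].isdigit():
            try:
                attrs[fields[1]] = int(fields[9])
            except ValueError:
                pass  # raw value is not a plain integer; ignore the row
    return attrs


def pending_alert(attrs, limit=5):
    """True when Current_Pending_Sector exceeds the limit (5 is an assumed default)."""
    return attrs.get("Current_Pending_Sector", 0) > limit


# Three rows copied from the report in this post.
sample = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       10
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
"""

attrs = parse_smart_attributes(sample)
print(attrs["Current_Pending_Sector"], pending_alert(attrs))
```

Only the raw value is compared; the normalized VALUE/WORST/THRESH columns are still at 200/200/000 for this attribute, which is why the drive itself does not flag a failure.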
March 22, 2010

The sectors pending re-allocation will not be re-allocated until they are written to. (Basically, the disk has determined they are not readable; therefore, it has no way to move them, since it does not know what data to move.)

A standard parity check should attempt to read the sectors and then correct the parity disk by writing to it if it returns the wrong data. I would try a simple parity "Check" first. If that does not re-allocate them, then you can try the button labeled "restore" to force a complete re-write. I doubt that will be needed. (You'll be without parity protection during the complete re-calc, so you don't want to do that if other disks have problems.)

Joe L.
March 22, 2010 (Author)

Joe L. wrote:
A standard parity check should attempt to read the sectors and then correct the parity disk by writing to it if it returns the wrong data. I would try a simple parity "Check" first.

Thanks Joe. But I just did a parity check (see above), albeit a "check correct", and it did not help. Does the "check, but not correct" version do anything different that would change these pending sectors? Or are you saying to do another parity check before trying restore?
March 22, 2010

No, the CORRECT version is the one you wanted. NOCORRECT would not write back the corrected value if it found one.

I have no answer as to why the "check" did not find the errors, other than that they might be in an area that is not checked. unRAID uses sector 63 onward for the disk partitions. Sector 0 holds the MBR and partition table on the data disks but is unused on the parity disk. Sectors 1 through 62 are also unused, since unRAID starts its partitions at sector 63. unRAID does not read or write sectors 0 through 62 on the parity disk when checking parity.
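The sector layout described in this reply can be expressed as a one-line range test. This is a sketch that takes the figures in the post at face value (partitions start at sector 63, sectors 0 through 62 are never touched by a parity check); `covered_by_parity_check` is a hypothetical name, and the end of the partition is deliberately ignored.

```python
PARTITION_START = 63  # first sector of the unRAID partition, per the post above


def covered_by_parity_check(lba, partition_start=PARTITION_START):
    """True if the sector lies in the region the post says a parity check reads."""
    if lba < 0:
        raise ValueError("sector numbers are non-negative")
    return lba >= partition_start


for lba in (0, 62, 63, 1_000_000):
    print(lba, covered_by_parity_check(lba))
```

A pending sector at an address below 63 would therefore never be read, or rewritten, by a parity check, which is one possible explanation for a count that does not move.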
March 22, 2010 (Author)

OK thanks, I guess I'll try a restore. I'd prefer my parity drive to be as healthy as possible. My other drives are working correctly.
March 22, 2010

Let's see what happens. I'll be curious.
March 23, 2010 (Author)

Thanks Joe, parity rebuild is done. Looks like it worked! And they must have become readable/writable since they are no longer pending and not even reallocated!

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   159   149   021    Pre-fail  Always       -       9033
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       217
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       552
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       33
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       18
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       2552
194 Temperature_Celsius     0x0022   131   110   000    Old_age   Always       -       21
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   198   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged
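Comparing two reports by eye is error-prone, so here is a small Python sketch that diffs raw values between snapshots. The two dictionaries hold only figures copied from the first report in this thread and from the one just above; `changed_attributes` is a hypothetical helper, not part of any SMART tool.

```python
# Raw values from the report in the first post (before the rebuild).
before = {
    "Start_Stop_Count": 214,
    "Reallocated_Sector_Ct": 0,
    "Power_On_Hours": 531,
    "Load_Cycle_Count": 2534,
    "Temperature_Celsius": 27,
    "Reallocated_Event_Count": 0,
    "Current_Pending_Sector": 10,
}

# Raw values from the report taken after the parity rebuild.
after = {
    "Start_Stop_Count": 217,
    "Reallocated_Sector_Ct": 0,
    "Power_On_Hours": 552,
    "Load_Cycle_Count": 2552,
    "Temperature_Celsius": 21,
    "Reallocated_Event_Count": 0,
    "Current_Pending_Sector": 0,
}


def changed_attributes(old, new):
    """Return {name: (old_raw, new_raw)} for attributes whose raw value differs."""
    return {
        name: (old[name], new[name])
        for name in old
        if name in new and old[name] != new[name]
    }


for name, (old_raw, new_raw) in changed_attributes(before, after).items():
    print(f"{name}: {old_raw} -> {new_raw}")
```

The two reallocation counters do not appear in the output because they stayed at 0, which is the point made in the post: the pending sectors cleared without any being remapped.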
March 23, 2010

That is good news indeed. The SMART firmware on the disks first attempts to re-write the sectors it could not read in their original locations on the disk. If that is successful, and they can be re-read from those original locations, it does not need to re-allocate them.

Joe L.
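The behaviour described in this reply can be written down as a toy state model. This is only an illustration of the description given in the thread, not actual drive firmware logic: `SectorCounters` and `write_pending_sectors` are hypothetical names, and the per-sector re-read outcome is supplied by the caller because real firmware decides it internally.

```python
from dataclasses import dataclass


@dataclass
class SectorCounters:
    pending: int      # Current_Pending_Sector raw value
    reallocated: int  # Reallocated_Sector_Ct raw value


def write_pending_sectors(counters, reread_ok):
    """Model one write to every pending sector.

    reread_ok[i] says whether sector i could be read back from its original
    location after the write (an assumed input for this sketch).
    """
    if len(reread_ok) != counters.pending:
        raise ValueError("need one outcome per pending sector")
    failed = sum(1 for ok in reread_ok if not ok)
    # Every written sector leaves the pending list; only the ones that
    # still fail verification are remapped to spare sectors.
    return SectorCounters(pending=0, reallocated=counters.reallocated + failed)


# The case reported in this thread: 10 pending sectors, all rewritten in place.
print(write_pending_sectors(SectorCounters(pending=10, reallocated=0), [True] * 10))
```

With all ten re-reads succeeding, both counters end at 0, matching the report posted after the rebuild. Any `False` in the list would move that sector into the reallocated count instead.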
March 23, 2010

That drive is not that old. Having 10 pending sectors reported that then go away without reallocation is a sign for caution and review. Writes will usually cause reallocation of pending sectors. On some drives, writes are verified with a read during early operation, and this verification is switched off once the drive passes a certain age. You should do a full read test to be sure.

I would suggest keeping an eye on this drive with periodic SMART review and monthly parity checks. I would also suggest a SMART long test during an idle period and/or a parity NOCORRECT check, just to ensure everything is good.
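The checks suggested here map onto standard smartctl invocations. Below is a minimal Python sketch that builds the command lines: `-t long` starts an extended self-test, `-l selftest` shows the self-test log, and `-A` prints the attribute table. The device path `/dev/sdX` is a placeholder, the helper names are hypothetical, and running smartctl generally requires root.

```python
import subprocess


def smartctl_command(action, device):
    """Build a smartctl argument list for one of three monitoring actions."""
    options = {
        "start_long_test": ["-t", "long"],   # extended (long) self-test
        "selftest_log": ["-l", "selftest"],  # results of past self-tests
        "attributes": ["-A"],                # vendor attribute table
    }
    if action not in options:
        raise ValueError(f"unknown action: {action}")
    return ["smartctl", *options[action], device]


def run(action, device):
    """Run the command and return its stdout (defined but not called here)."""
    # check=False: smartctl's exit status is a bitmask, so a nonzero value
    # does not necessarily mean the command itself failed.
    result = subprocess.run(
        smartctl_command(action, device),
        capture_output=True, text=True, check=False,
    )
    return result.stdout


# /dev/sdX is a placeholder; substitute the real parity device.
print(" ".join(smartctl_command("start_long_test", "/dev/sdX")))
```

The long test runs inside the drive in the background, so the results have to be collected later with the `selftest_log` action once the drive has been idle long enough to finish.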
March 25, 2010 (Author)

Thanks, I did a parity check after the rebuild to verify everything was OK, and it is still the same with no errors reported. But I'm still gonna keep an eye on it, especially since I have 19 other drives that this one drive is parity for! I have the monthly parity check installed and actually run it more often than that currently.

Parity check:

Mar 24 08:45:39 Tower kernel: mdcmd (16658): check CORRECT
Mar 24 08:45:39 Tower kernel: md: recovery thread woken up ...
Mar 24 08:45:39 Tower kernel: md: recovery thread checking parity...
Mar 24 08:45:40 Tower kernel: md: using 1152k window, over a total of 1953514552 blocks.
Mar 24 21:51:37 Tower kernel: md: sync done. time=46887sec rate=41664K/sec
Mar 24 21:51:37 Tower kernel: md: recovery thread sync completion status: 0

Here's the SMART for the drive now:

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   159   149   021    Pre-fail  Always       -       9033
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       223
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       611
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       33
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       18
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       2708
194 Temperature_Celsius     0x0022   126   110   000    Old_age   Always       -       26
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   198   000    Old_age   Offline      -       0
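As a sanity check on the parity log, the reported rate can be recomputed from the block count and elapsed time. This sketch assumes each "block" in the md message is 1 KiB, which is an assumption on my part but is consistent with the rate being printed in K/sec; `parse_parity_log` is a hypothetical helper and the regular expressions are written only against the two log lines shown in this thread.

```python
import re

# Two lines copied from the March 24 parity check in this post.
log = """\
Mar 24 08:45:40 Tower kernel: md: using 1152k window, over a total of 1953514552 blocks.
Mar 24 21:51:37 Tower kernel: md: sync done. time=46887sec rate=41664K/sec
"""


def parse_parity_log(text):
    """Return (total_blocks, seconds, reported_rate) from md log lines."""
    total = re.search(r"total of (\d+) blocks", text)
    done = re.search(r"time=(\d+)sec rate=(\d+)K/sec", text)
    if total is None or done is None:
        raise ValueError("expected md 'total of' and 'sync done' lines")
    return int(total.group(1)), int(done.group(1)), int(done.group(2))


total_blocks, seconds, reported_rate = parse_parity_log(log)
computed_rate = total_blocks // seconds  # blocks per second, rounded down

print(computed_rate, reported_rate)
print(round(seconds / 3600, 1), "hours")
```

For this run the integer division gives 41664, the same figure the kernel logged. The March 20 check works out the same way: 1953514552 divided by 45648 seconds, rounded down, is the logged 42795K/sec.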