"Current pending sector" value of 2... is my disk dying?



I only just installed unRAID recently, and now it's reporting that one of my 4TB array disks has a "current pending sector" value of 2.

I can run the SMART short self-test with no errors, but when I run the extended test it reports "Errors occurred - check your SMART report".

 

Unfortunately I hadn't yet set up my parity drive. I have the 8TB drive sitting on my desk but I didn't bother to install it yet. Didn't think I'd get a failure so soon after setting up unRAID...

 

It seems I can still read data from the array, as far as I can tell.

 

What is my best course of action? Is it too late to set up the parity drive? Should I just copy over data from the drive with the error onto the 8TB and then get a new drive as parity? Any help would be appreciated.

 

Thanks!

 

 

 

 

SMART report below:

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Red
Device Model:     WDC WD40EFRX-68WT0N0
Serial Number:    WD-WCC4EFUL6CVL
LU WWN Device Id: 5 0014ee 26015376b
Firmware Version: 82.00A82
User Capacity:    4,000,787,030,016 bytes [4.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Mon Oct 21 17:02:23 2019 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM feature is:   Unavailable
Rd look-ahead is: Enabled
Write cache is:   Enabled
DSN feature is:   Unavailable
ATA Security is:  Disabled, frozen [SEC2]
Wt Cache Reorder: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values: Offline data collection status: (0x00) Offline data collection activity was never started. Auto Offline Data Collection: Disabled. Self-test execution status: ( 121) The previous self-test completed having the read element of the test failed. Total time to complete Offline data collection: (54000) seconds. Offline data collection capabilities: (0x7b) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. Offline surface scan supported. Self-test supported. Conveyance Self-test supported. Selective Self-test supported. SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer. Error logging capability: (0x01) Error logging supported. General Purpose Logging supported. Short self-test routine recommended polling time: ( 2) minutes. Extended self-test routine recommended polling time: ( 540) minutes. Conveyance self-test routine recommended polling time: ( 5) minutes. SCT capabilities: (0x703d) SCT Status supported. SCT Error Recovery Control supported. SCT Feature Control supported. SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR-K   200   200   051    -    6
  3 Spin_Up_Time            POS--K   241   178   021    -    4925
  4 Start_Stop_Count        -O--CK   073   073   000    -    27039
  5 Reallocated_Sector_Ct   PO--CK   200   200   140    -    0
  7 Seek_Error_Rate         -OSR-K   200   200   000    -    0
  9 Power_On_Hours          -O--CK   056   056   000    -    32704
 10 Spin_Retry_Count        -O--CK   100   100   000    -    0
 11 Calibration_Retry_Count -O--CK   100   100   000    -    0
 12 Power_Cycle_Count       -O--CK   100   100   000    -    609
192 Power-Off_Retract_Count -O--CK   200   200   000    -    489
193 Load_Cycle_Count        -O--CK   173   173   000    -    82313
194 Temperature_Celsius     -O---K   115   100   000    -    37
196 Reallocated_Event_Count -O--CK   200   200   000    -    0
197 Current_Pending_Sector  -O--CK   200   200   000    -    2
198 Offline_Uncorrectable   ----CK   100   253   000    -    0
199 UDMA_CRC_Error_Count    -O--CK   200   200   000    -    0
200 Multi_Zone_Error_Rate   ---R--   200   200   000    -    5
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

General Purpose Log Directory Version 1 SMART Log Directory Version 1 [multi-sector log support] Address Access R/W Size Description 0x00 GPL,SL R/O 1 Log Directory 0x01 SL R/O 1 Summary SMART error log 0x02 SL R/O 5 Comprehensive SMART error log 0x03 GPL R/O 6 Ext. Comprehensive SMART error log 0x06 SL R/O 1 SMART self-test log 0x07 GPL R/O 1 Extended self-test log 0x09 SL R/W 1 Selective self-test log 0x10 GPL R/O 1 NCQ Command Error log 0x11 GPL R/O 1 SATA Phy Event Counters log 0x21 GPL R/O 1 Write stream error log 0x22 GPL R/O 1 Read stream error log 0x80-0x9f GPL,SL R/W 16 Host vendor specific log 0xa0-0xa7 GPL,SL VS 16 Device vendor specific log 0xa8-0xb7 GPL,SL VS 1 Device vendor specific log 0xbd GPL,SL VS 1 Device vendor specific log 0xc0 GPL,SL VS 1 Device vendor specific log 0xc1 GPL VS 93 Device vendor specific log 0xe0 GPL,SL R/W 1 SCT Command/Status 0xe1 GPL,SL R/W 1 SCT Data Transfer

SMART Extended Comprehensive Error Log Version: 1 (6 sectors) Device Error Count: 2 CR = Command Register FEATR = Features Register COUNT = Count (was: Sector Count) Register LBA_48 = Upper bytes of LBA High/Mid/Low Registers ] ATA-8 LH = LBA High (was: Cylinder High) Register ] LBA LM = LBA Mid (was: Cylinder Low) Register ] Register LL = LBA Low (was: Sector Number) Register ] DV = Device (was: Device/Head) Register DC = Device Control Register ER = Error register ST = Status register Powered_Up_Time is measured from power on, and printed as DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes, SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 2 [1] occurred at disk power-on lifetime: 32594 hours (1358 days + 2 hours) When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were: ER -- ST COUNT LBA_48 LH LM LL DV DC -- -- -- == -- == == == -- -- -- -- -- 40 -- 51 00 40 00 00 07 f7 37 70 e7 00 Error: UNC 64 sectors at LBA = 0x07f73770 = 133642096

Commands leading to the command that caused the error were: CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name -- == -- == -- == == == -- -- -- -- -- --------------- -------------------- c8 00 00 00 40 00 00 07 f7 37 50 e7 08 2d+07:46:04.173 READ DMA c8 00 00 00 40 00 00 07 f7 37 90 e7 08 2d+07:46:04.161 READ DMA c8 00 00 00 80 00 00 07 f7 36 d0 e7 08 2d+07:46:04.161 READ DMA c8 00 00 00 80 00 00 07 f7 36 50 e7 08 2d+07:46:04.161 READ DMA c8 00 00 00 40 00 00 07 f7 36 10 e7 08 2d+07:46:04.161 READ DMA

Error 1 [0] occurred at disk power-on lifetime: 32594 hours (1358 days + 2 hours) When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were: ER -- ST COUNT LBA_48 LH LM LL DV DC -- -- -- == -- == == == -- -- -- -- -- 40 -- 51 00 00 00 00 07 f7 2a f0 e7 00 Error: UNC at LBA = 0x07f72af0 = 133638896

Commands leading to the command that caused the error were: CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name -- == -- == -- == == == -- -- -- -- -- --------------- -------------------- c8 00 00 00 00 00 00 07 f7 2a 10 e7 08 2d+07:46:00.287 READ DMA c8 00 00 00 00 00 00 07 f7 29 10 e7 08 2d+07:46:00.286 READ DMA c8 00 00 00 00 00 00 07 f7 28 10 e7 08 2d+07:46:00.285 READ DMA c8 00 00 00 00 00 00 07 f7 27 10 e7 08 2d+07:46:00.284 READ DMA c8 00 00 00 00 00 00 07 f7 26 10 e7 08 2d+07:46:00.284 READ DMA

SMART Extended Self-test Log Version: 1 (1 sectors)
Num  Test_Description    Status                   Remaining  LifeTime(hours)  LBA_of_first_error
  1  Extended offline    Completed: read failure        90%            32700  133638896
  2  Short offline       Completed without error        00%            32698  -
  3  Extended offline    Completed: read failure        90%            32697  133638896
  4  Extended offline    Completed: read failure        90%            32697  133638896
  5  Extended offline    Completed: read failure        10%            32636  133638896
  6  Short offline       Completed without error        00%            32629  -

SMART Selective self-test log data structure revision number 1 SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS 1 0 0 Not_testing 2 0 0 Not_testing 3 0 0 Not_testing 4 0 0 Not_testing 5 0 0 Not_testing Selective self-test flags (0x0): After scanning selected spans, do NOT read-scan remainder of disk. If Selective self-test is pending on power-up, resume after 0 minute delay.

SCT Status Version: 3 SCT Version (vendor specific): 258 (0x0102) Device State: Active (0) Current Temperature: 37 Celsius Power Cycle Min/Max Temperature: 31/41 Celsius Lifetime Min/Max Temperature: 12/52 Celsius Under/Over Temperature Limit Count: 0/0 Vendor specific: 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

SCT Temperature History Version: 2 Temperature Sampling Period: 1 minute Temperature Logging Interval: 1 minute Min/Max recommended Temperature: 0/60 Celsius Min/Max Temperature Limit: -41/85 Celsius Temperature History Size (Index): 478 (431)

Index Estimated Time Temperature Celsius 432 2019-10-21 09:05 37 ****************** ... ..( 64 skipped). .. ****************** 19 2019-10-21 10:10 37 ****************** 20 2019-10-21 10:11 35 **************** 21 2019-10-21 10:12 36 ***************** ... ..( 2 skipped). .. ***************** 24 2019-10-21 10:15 36 ***************** 25 2019-10-21 10:16 37 ****************** ... ..( 69 skipped). .. ****************** 95 2019-10-21 11:26 37 ****************** 96 2019-10-21 11:27 38 ******************* ... ..( 15 skipped). .. ******************* 112 2019-10-21 11:43 38 ******************* 113 2019-10-21 11:44 37 ****************** ... ..( 14 skipped). .. ****************** 128 2019-10-21 11:59 37 ****************** 129 2019-10-21 12:00 36 ***************** ... ..( 38 skipped). .. ***************** 168 2019-10-21 12:39 36 ***************** 169 2019-10-21 12:40 37 ****************** ... ..( 17 skipped). .. ****************** 187 2019-10-21 12:58 37 ****************** 188 2019-10-21 12:59 38 ******************* ... ..( 43 skipped). .. ******************* 232 2019-10-21 13:43 38 ******************* 233 2019-10-21 13:44 37 ****************** ... ..(197 skipped). .. ****************** 431 2019-10-21 17:02 37 ******************

SCT Error Recovery Control: Read: 70 (7.0 seconds) Write: 70 (7.0 seconds)

Device Statistics (GP/SMART Log 0x04) not supported

Pending Defects log (GP Log 0x0c) not supported

SATA Phy Event Counters (GP Log 0x11) ID Size Value Description 0x0001 2 0 Command failed due to ICRC error 0x0002 2 0 R_ERR response for data FIS 0x0003 2 0 R_ERR response for device-to-host data FIS 0x0004 2 0 R_ERR response for host-to-device data FIS 0x0005 2 0 R_ERR response for non-data FIS 0x0006 2 0 R_ERR response for device-to-host non-data FIS 0x0007 2 0 R_ERR response for host-to-device non-data FIS 0x0008 2 0 Device-to-host non-data FIS retries 0x0009 2 4 Transition from drive PhyRdy to drive PhyNRdy 0x000a 2 4 Device-to-host register FISes sent due to a COMRESET 0x000b 2 0 CRC errors within host-to-device FIS 0x000f 2 0 R_ERR response for host-to-device data FIS, CRC 0x0012 2 0 R_ERR response for host-to-device non-data FIS, CRC 0x8000 4 598776 Vendor specific


Please use the Diagnostics file (Tools >>> Diagnostics) rather than attempting to copy and paste into a message. Current pending sectors can be anywhere on the disk and are not necessarily in an area where data is currently stored. From what little I can see, this is not a new disk...

 

If you want more info about SMART attributes, you can start here:

 

    https://en.wikipedia.org/wiki/S.M.A.R.T.

 

EDIT: be sure to attach the Diagnostics to a new post or we will never know you responded!

 

Edited by Frank1940
Diagnostics file attached. Let me know what you recommend; I'm frankly still a bit confused by what the SMART attributes all mean, and at what point you absolutely have to replace a disk.

 

You are correct, the disk is not new. I had used it in a Windows build for a number of years before unRAID.

Diagnostics.zip


SMART is a manufacturer-devised spec, and each manufacturer implements it a bit differently. (None of them has ever published exactly what each parameter measures, although most of them are obvious, e.g. Power_On_Hours.) It provides some 'peeks' into the overall health of a hard disk. So you are not alone in being troubled by what you see. Even experts disagree about what to do with a disk that shows some questionable issues.

 

You can safely ignore most of the attributes. The ones that most of us think are important are the ones that Unraid will monitor for you. You can find them listed near the bottom of the Settings >>> Disk Settings page. (I question whether attribute 199 belongs on this list, as it is not a disk-related failure. It counts the number of times that data transferred over the SATA cable was found to be in error. This is usually a connection or cable issue.)
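
To make the "watch only a handful of attributes" idea concrete, here is a minimal Python sketch that parses a few attribute rows copied from the report earlier in this thread and flags any watched attribute with a non-zero raw value. The watch list `WATCHED` and the helper names are illustrative assumptions, not Unraid's actual defaults or code.

```python
import re

# Attribute rows copied from the SMART report posted earlier in this thread.
REPORT = """\
  5 Reallocated_Sector_Ct   PO--CK   200   200   140    -    0
  9 Power_On_Hours          -O--CK   056   056   000    -    32704
197 Current_Pending_Sector  -O--CK   200   200   000    -    2
198 Offline_Uncorrectable   ----CK   100   253   000    -    0
199 UDMA_CRC_Error_Count    -O--CK   200   200   000    -    0
"""

# Illustrative watch list (an assumption, not necessarily Unraid's defaults).
WATCHED = {5, 197, 198, 199}

# ID, name, flags, VALUE, WORST, THRESH, FAIL, RAW_VALUE
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+\S+\s+\d+\s+\d+\s+\d+\s+\S+\s+(\d+)\s*$"
)

def parse_attributes(text):
    """Return {attribute_id: (name, raw_value)} for every row that parses."""
    attrs = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            attrs[int(match.group(1))] = (match.group(2), int(match.group(3)))
    return attrs

def flagged(attrs, watched=WATCHED):
    """Keep only watched attributes whose raw value is non-zero."""
    return {i: v for i, v in attrs.items() if i in watched and v[1] > 0}

attrs = parse_attributes(REPORT)
print(flagged(attrs))
```

On this disk the only watched attribute that gets flagged is 197, Current_Pending_Sector, with a raw value of 2.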

 

The disk in question has two pending sectors on it. What this means is that there are two sectors on this disk that could not be read! They may or may not contain current data. Under Windows, if they do not, everything would be fine: you could read all the data on the disk. However, when Unraid is rebuilding a failed disk, it has to be able to read every sector on every other disk in the array. If it cannot read these two sectors on the disk in question, the rebuild fails!
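
A small Python sketch of why one unreadable sector matters during a rebuild. It assumes single parity is a plain XOR across the data disks, which is a simplification for illustration; the function names and the tiny 4-byte "sectors" are made up for the example.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def rebuild_sector(surviving, parity):
    """Rebuild one sector of a failed disk from the surviving disks and parity.

    `surviving` is a list of sectors; None marks a sector that could not be read.
    """
    if any(sector is None for sector in surviving):
        raise IOError("unreadable sector on a surviving disk; cannot rebuild")
    return xor_blocks(surviving + [parity])

# Three data disks, one sector each, plus the parity computed from them.
disk1 = b"\x10\x20\x30\x40"
disk2 = b"\x01\x02\x03\x04"
disk3 = b"\xaa\xbb\xcc\xdd"
parity = xor_blocks([disk1, disk2, disk3])

# disk3 fails: its sector comes back from disk1, disk2 and parity.
print(rebuild_sector([disk1, disk2], parity) == disk3)

# disk3 fails while disk1 has a pending (unreadable) sector at this position.
try:
    rebuild_sector([None, disk2], parity)
except IOError as err:
    print("rebuild failed:", err)
```

The second case is the situation described above: the pending sector may not matter for everyday reads, but it blocks the reconstruction of that sector on the failed disk.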

 

I would be replacing that disk because of the two pending sectors. I would be doing it because I would not want to take a chance of not being able to rebuild another disk!   @johnnie.black says to replace that disk because of the extended testing results.  (He is far more knowledgeable of these things than I am.)

 

If you really wanted to see if this disk might be usable, you could run a minimum of three preclear cycles on it. On the first cycle, the Current_Pending_Sector count should go to zero and stay there, and the Reallocated_Sector_Ct should go to '2' and not increase. Of course, none of the other preclear-monitored attributes should change, except for Power_On_Hours.

 

OH, for the preclear tester/utility look at the Apps tab.  It is available both as a plugin and Docker app.


Great, thank you. So the course of action I plan to take is:

  • Add new disk to the array
  • Copy all the data from the questionable disk onto the new one
  • Remove the questionable disk from the array, run preclear 3 times
  • If the reallocated sector count does not increase... add the disk back into the array??
    • If the count does increase... just trash the disk?

Does that sound right?


Basically, yes. What you actually do depends on your risk-tolerance level. Some folks tolerate only the lowest possible exposure to risk, while others will bet a fortune on the color of the next car that turns the corner. You appear to be closer to the latter, as you don't have a Parity disk. (If you were the former, you would already have two Parity disks!)


When you run the preclear, the "Current pending sector" count of two should go to zero and not increase if the disk has any hope. Since this is a Western Digital drive, if the two sectors pending reallocation are readable, the "Current pending sector" count and the "Reallocated event count" will both be zero after the preclear. If the two sectors pending reallocation are not readable, then your "Current pending sector" count should go to zero and your "Reallocated event count" should go to two.

 

IMHO, if either of the above happens and you run three preclears and another extended SMART test, then you could probably use the disk again. However, if the extended SMART test fails, or your "Current pending sector" count or "Reallocated event count" goes higher than two, then the disk should be discarded.
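
For what it's worth, the rule of thumb above can be written down as a short Python sketch. The `SmartSnapshot` type and `verdict` function are hypothetical names for illustration, and the thresholds simply encode the opinion in this post rather than any official guidance.

```python
from dataclasses import dataclass

@dataclass
class SmartSnapshot:
    pending: int               # "Current pending sector" raw value
    reallocated: int           # "Reallocated event count" raw value
    extended_test_passed: bool = True

def verdict(before, after):
    """Compare SMART values from before the preclears with values taken
    after three preclears and a fresh extended self-test."""
    if not after.extended_test_passed:
        return "discard"
    if after.pending != 0:
        # Pending sectors should clear on the first preclear and stay at zero.
        return "discard"
    if after.reallocated - before.reallocated > before.pending:
        # More sectors reallocated than were pending: the damage is spreading.
        return "discard"
    return "probably reusable"

before = SmartSnapshot(pending=2, reallocated=0)

print(verdict(before, SmartSnapshot(pending=0, reallocated=0)))  # sectors were readable
print(verdict(before, SmartSnapshot(pending=0, reallocated=2)))  # sectors were remapped
print(verdict(before, SmartSnapshot(pending=0, reallocated=3)))  # count kept growing
print(verdict(before, SmartSnapshot(pending=0, reallocated=0,
                                    extended_test_passed=False)))
```

The first two cases come out as "probably reusable" and the last two as "discard", which matches the two acceptable outcomes and the failure conditions described above.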

 

One of my 8TB REDs with less than a year's worth of hours suddenly had two pending sectors. I replaced the drive and rebuilt the array (in your case, without parity, all you can do is try to copy the data first). I ran a preclear, and the "Current pending sector" count and "Reallocated event count" both read zero afterwards. I then ran another two preclears and an extended SMART test. There was no new "Current pending sector" count or "Reallocated event count", so I returned the disk to its original location in the array. The drive has worked flawlessly for many months since then.

 

Best,

craigr

Edited by craigr
There's no sensitive or critical data on the disk, just some media I could reacquire if needed. I will try the preclear method and see if it works.
