
[SOLVED] Red Ball - Extended (Long) SMART Report



**EDIT**: syslog attached

My disk7 (sdh) shows a red ball after I restarted unRAID. I followed the Troubleshooting section on obtaining SMART reports: first the short report, which didn't seem to show anything, then the long report. Any idea why it went red, or what's wrong with the drive?

It's about a 2-year-old drive, so I didn't expect this yet. I wish I had logged it, but I think this disk went red once before after a restart, and when I restarted again it went back to green. The Troubleshooting documentation says a red ball "is NEVER a fluke", so I'm unsure what to do.


SMART REPORT:

```
smartctl 6.2 2013-07-26 r3841 [i686-linux-3.9.11p-unRAID] (local build)
Copyright © 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar Green (AF)
Device Model:     WDC WD20EARS-00MVWB0
Firmware Version: 51.0AB51
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 2.6, 3.0 Gb/s
Local Time is:    Tue Aug 12 13:29:17 2014 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84) Offline data collection activity
                                        was suspended by an interrupting command from host.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (39000) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.

Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 376) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x3035) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   219   156   021    Pre-fail  Always       -       4033
  4 Start_Stop_Count        0x0032   096   096   000    Old_age   Always       -       4628
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   076   076   000    Old_age   Always       -       17982
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       1943
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       97
193 Load_Cycle_Count        0x0032   173   173   000    Old_age   Always       -       81372
194 Temperature_Celsius     0x0022   118   083   000    Old_age   Always       -       32
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     17964         -
# 2  Short offline       Completed without error       00%     17958         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
```

syslog_RedBall_after_reboot.txt.zip


Looking at the SMART reports, there is nothing obviously wrong with the drive. A full syslog might show why the drive was red-balled.
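As a rough illustration of what "nothing obviously wrong" means here, the Python sketch below parses the attribute rows of a `smartctl -a` report and flags any attribute at or below its threshold, plus any non-zero raw count for IDs 5, 196, 197, 198 and 199. Treating those five IDs as the warning signs is a common rule of thumb and an assumption of this sketch, not unRAID's own logic; `parse_attributes` and `suspicious` are hypothetical helper names.

```python
# Minimal sketch: parse the attribute table printed by `smartctl -a` and flag
# the attributes most often associated with a failing disk.
# WATCHED is a rule-of-thumb assumption, not unRAID's own red-ball logic.

WATCHED = {5, 196, 197, 198, 199}  # reallocation, pending, uncorrectable, CRC counters


def parse_attributes(report: str) -> dict[int, dict]:
    """Return {attribute_id: fields} for each row of the attribute table."""
    attrs = {}
    for line in report.splitlines():
        fields = line.split()
        # Attribute rows have at least 10 columns and start with a numeric ID.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attrs[int(fields[0])] = {
            "name": fields[1],
            "value": int(fields[3]),
            "thresh": int(fields[5]),
            "when_failed": fields[8],
            "raw": fields[9],
        }
    return attrs


def suspicious(attrs: dict[int, dict]) -> list[str]:
    """List attributes at/below threshold, or watched ones with a non-zero raw count."""
    problems = []
    for attr_id, a in sorted(attrs.items()):
        if a["when_failed"] != "-" or (a["thresh"] > 0 and a["value"] <= a["thresh"]):
            problems.append(f"{a['name']}: at or below threshold (WHEN_FAILED={a['when_failed']})")
        elif attr_id in WATCHED and a["raw"].isdigit() and int(a["raw"]) > 0:
            problems.append(f"{a['name']}: raw count {a['raw']}")
    return problems


# A few rows copied from the report above.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   076   076   000    Old_age   Always       -       17982
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
"""

if __name__ == "__main__":
    attrs = parse_attributes(SAMPLE)
    print(suspicious(attrs) or "nothing obviously wrong")
    hours = int(attrs[9]["raw"])
    print(f"powered on for about {hours / (24 * 365):.1f} years")
```

Against the rows above it reports nothing, and a Power_On_Hours raw value of 17982 works out to about 2.05 years (17982 / 8760), consistent with the drive's stated age.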

 

A drive being red-balled in unRAID does not necessarily mean that there is something wrong with the drive. What it means is that at least one write to the drive has failed, and unRAID will stop using it until the issue has been rectified. It is quite possible that the cause is a temporary glitch, such as a slightly loose connection or a power glitch.
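Because the red ball reflects a failed write rather than a SMART verdict, the syslog is where the cause usually shows up. Below is a minimal Python sketch for pulling out the relevant lines, assuming the kernel logs the disk as `sdh` (and possibly under its ATA port name, shown here as a hypothetical `ata8`, since the thread does not say which port the disk is on). The keyword list and the script name are also assumptions to tune against a real log.

```python
import re
import sys

# Assumed keywords: words that typically appear in kernel messages about failed
# I/O, link resets and timeouts. Adjust them to match your own syslog.
KEYWORDS = re.compile(r"error|exception|reset|timeout|failed", re.IGNORECASE)


def disk_errors(lines, tags):
    """Yield the lines that mention one of `tags` and contain an error keyword."""
    # (?!\w) keeps "sdh" from matching "sdhx" or "sdh1", and "ata8" from
    # matching "ata80"; pass partitions as separate tags if you need them.
    tag_re = re.compile(r"\b(?:" + "|".join(map(re.escape, tags)) + r")(?!\w)")
    for line in lines:
        if tag_re.search(line) and KEYWORDS.search(line):
            yield line.rstrip("\n")


if __name__ == "__main__" and len(sys.argv) >= 3:
    # Hypothetical usage: python scan_syslog.py syslog.txt sdh ata8
    path, *tags = sys.argv[1:]
    with open(path, errors="replace") as fh:
        for hit in disk_errors(fh, tags):
            print(hit)
```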

 

The normal course of action is to check that all cables are firmly pushed in and then take action to recover the drive. Are you running with a parity drive, and do you have a spare drive that could replace the red-balled one if needed? The answers determine the best course of action for recovery.


I do have a disk precleared with one pass (same 2 TB size) that I want to add for array expansion, but I'm apprehensive about swapping it in for the red-balled disk and rebuilding the array with the help of the parity disk.

 

I have ~1.2 TB of files (not backed up anywhere else) on my cache drive that need to go onto the array, and whenever I invoke the mover script, unRAID becomes completely unresponsive and crashes. Even the monitor plugged into the server goes blank, and the keyboard can't revive it so I can grab a syslog. I'm fairly certain the crash happened because I invoked the mover while adding a large directory of tarballs (yes, stupid). I deleted that partial directory from the cache disk, but I'm holding off on moving the important files from the cache until the red-balled disk goes back to green.


The red-balled disk will not go back to green until you take some sort of recovery action!

 

As you have a spare drive, and assuming your parity is good, I would recommend the following to minimise any chance of data loss:

  • Stop the array and set the red-balled drive to unassigned.
  • Start the array. It should start and report that a disk is missing.
  • Stop the array and assign the spare 2TB disk as a replacement for the red-balled disk. At this point unRAID should tell you that it will rebuild the failed disk onto the new one. Put the 'failed' disk somewhere safe for the time being: if the rebuild fails for any reason, it can still be used for data recovery.
  • Assuming the rebuild works OK, you can then run a pre-clear on the previously red-balled disk to see whether (as is commonly the case) it is actually fine and the red ball was caused by an external factor. If the disk passes, it is available to add to the array as a new data disk.

Ideally you should avoid adding any new data to the array while recovery is in progress (although in theory it should work). The rebuild will not touch your cache disk, so any data currently stored there will be unaffected.
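The rebuild in the steps above works because a single parity disk stores, for every byte position, the XOR of that position across all data disks, so any one missing disk can be recomputed from parity and the remaining disks. This toy Python model (three-byte "disks", hypothetical helper names) is a sketch of the idea only; a real rebuild applies the same arithmetic across every sector of the replacement disk.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte position by byte position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving_data, parity):
    """Recover a missing data block from parity plus every surviving data block."""
    return xor_blocks([parity, *surviving_data])


if __name__ == "__main__":
    # Toy "disks": three data disks of three bytes each.
    disks = [b"\x01\x02\x03", b"\x10\x20\x30", b"\xaa\xbb\xcc"]
    parity = xor_blocks(disks)          # what the parity disk stores
    failed = 1                          # pretend disk index 1 was red-balled
    survivors = [d for i, d in enumerate(disks) if i != failed]
    rebuilt = rebuild_missing(survivors, parity)
    print(rebuilt == disks[failed])     # True
```

It also shows why keeping the old disk aside is sensible: with single parity, if a second disk cannot be read during the rebuild, one XOR equation cannot recover two unknown blocks.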

 

Do you have any plugins installed? If so, it may be worth booting in 'Safe Mode', which stops plugins from loading, while you go through the recovery process, to rule out any issue caused by a plugin.

 

As to why you were getting a crash when invoking the mover, that is not clear. You should have been able to do what you described without a crash, so there may be some other underlying problem at the hardware level.

 


I am now in the process of rebuilding the array with the new disk (plugins disabled).

 

Is a follow-up parity check the way to confirm all files are recovered perfectly?

If you run a non-correcting parity check after the rebuild and no errors are reported, then you should be OK.
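In a toy XOR model of single parity, a parity check recomputes the XOR of the data disks and compares it with what the parity disk holds. The Python sketch below (hypothetical function name, byte strings standing in for disks) shows the difference between a non-correcting check, which only reports mismatching offsets, and a correcting one, which also rewrites parity.

```python
def parity_check(data_disks, parity, correct=False):
    """Return offsets where parity differs from the XOR of the data disks.

    correct=False models a non-correcting check: parity is only read.
    correct=True rewrites the mismatching parity bytes (parity must then
    be a mutable bytearray).
    """
    mismatches = []
    for offset, column in enumerate(zip(*data_disks)):
        expected = 0
        for byte in column:
            expected ^= byte
        if parity[offset] != expected:
            mismatches.append(offset)
            if correct:
                parity[offset] = expected
    return mismatches


if __name__ == "__main__":
    data = [bytes([1, 2, 3]), bytes([4, 8, 16])]
    parity = bytearray([5, 10, 0])                 # last byte is deliberately wrong
    print(parity_check(data, parity))              # [2]; parity is left unchanged
    print(parity_check(data, parity, correct=True), list(parity))  # [2] [5, 10, 19]
```

A clean result only says that parity agrees with the data currently on the disks, not that the files themselves are unchanged.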

 

Technically this does not check that the files are fine, merely that they are in the same state as they were when the disk red-balled. The only way to be 100% certain that files have not been changed in any way since they were first placed on the disks is to calculate CRC checksums for them and compare those against the same checksums of your backups. However, most people do not bother to do this and make the (reasonably safe) assumption that the data is unchanged if no issues have been reported.
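A minimal Python sketch of that checksum approach: build a manifest of per-file digests, save it, and later compare a fresh manifest against it (or against one built from the backups). It uses SHA-256 from `hashlib` in place of the CRC mentioned above; `zlib.crc32` would work the same way for this purpose. The script name, the `save`/`check` modes and the `/mnt/disk7` path are hypothetical.

```python
import hashlib
import json
import sys
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in 1 MiB chunks so large files are never loaded whole."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare(before, after):
    """Return (changed, missing) file lists between two manifests."""
    changed = sorted(k for k in before.keys() & after.keys() if before[k] != after[k])
    missing = sorted(before.keys() - after.keys())
    return changed, missing


if __name__ == "__main__" and len(sys.argv) == 4:
    # Hypothetical usage:
    #   python manifest.py save  /mnt/disk7 disk7.json
    #   python manifest.py check /mnt/disk7 disk7.json
    mode, root, manifest_file = sys.argv[1:4]
    current = build_manifest(root)
    if mode == "save":
        Path(manifest_file).write_text(json.dumps(current, indent=1))
    else:
        saved = json.loads(Path(manifest_file).read_text())
        changed, missing = compare(saved, current)
        print("changed:", changed)
        print("missing:", missing)
```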

