AndrewT, posted August 12, 2014

**EDIT**: syslog attached.

My disk7 (sdh) shows a red ball after I restarted unRAID. I followed the Troubleshooting section on obtaining SMART reports: first the short test, which didn't seem to show anything, then the long one. Any idea why it went red or what's wrong with the drive? It's about two years old, so I didn't expect this just yet.

I wish I had logged it, but I think this disk went red once before after a restart, and when I simply restarted again it went back to green. The Troubleshooting documentation states that when a disk turns red "it is NEVER a fluke", so I'm unsure what to do.

SMART REPORT:

smartctl 6.2 2013-07-26 r3841 [i686-linux-3.9.11p-unRAID] (local build)
Copyright © 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar Green (AF)
Device Model:     WDC WD20EARS-00MVWB0
Firmware Version: 51.0AB51
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 2.6, 3.0 Gb/s
Local Time is:    Tue Aug 12 13:29:17 2014 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84) Offline data collection activity
                                        was suspended by an interrupting command from host.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (39000) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 376) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x3035) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   219   156   021    Pre-fail  Always       -       4033
  4 Start_Stop_Count        0x0032   096   096   000    Old_age   Always       -       4628
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   076   076   000    Old_age   Always       -       17982
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       1943
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       97
193 Load_Cycle_Count        0x0032   173   173   000    Old_age   Always       -       81372
194 Temperature_Celsius     0x0022   118   083   000    Old_age   Always       -       32
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%            17964  -
# 2  Short offline       Completed without error       00%            17958  -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

Attachment: syslog_RedBall_after_reboot.txt.zip
itimpi, posted August 12, 2014

Looking at the SMART reports, nothing is obviously wrong with the drive. A full syslog might provide further information as to why the drive was red-balled.

A drive being red-balled in unRAID does not necessarily mean that there is something wrong with the drive. What it means is that at least one write to the drive has failed, and unRAID will stop using it until the issue has been rectified. It is quite possible that it was a temporary glitch, such as a slightly loose connection or a power glitch. The normal action is to check that all cables are firmly pushed in and then take action to recover the drive.

Are you running with a parity drive, and do you have a spare drive that could replace the red-balled one if needed? The answer determines the best course of action for recovery.
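To make "nothing obviously wrong" a little more concrete, here is a minimal Python sketch, assuming the plain-text attribute table that `smartctl -A` (or `-a`) prints. It flags any attribute whose normalized VALUE is at or below its THRESH (smartctl's own failure criterion), plus a small set of attributes commonly watched on failing disks if their raw value is non-zero. That watch list and the function name `flag_attributes` are assumptions for illustration, not anything unRAID itself uses.

```python
import re

# Attributes commonly watched as early warning signs (an assumed rule of thumb).
WATCHED = {
    "Reallocated_Sector_Ct",
    "Reallocated_Event_Count",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
    "UDMA_CRC_Error_Count",
}

# One attribute row: ID, name, flag, VALUE, WORST, THRESH, TYPE, UPDATED,
# WHEN_FAILED, then the leading digits of RAW_VALUE.
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+(0x[0-9a-fA-F]+)\s+(\d+)\s+(\d+)\s+(\d+)\s+"
    r"(\S+)\s+(\S+)\s+(\S+)\s+(\d+)"
)


def flag_attributes(smart_text):
    """Return (name, reason) pairs for attribute rows that look worrying."""
    findings = []
    for line in smart_text.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # header, log sections and other non-attribute lines
        name = m.group(2)
        value, thresh = int(m.group(4)), int(m.group(6))
        raw = int(m.group(10))
        if thresh > 0 and value <= thresh:
            # Normalized value has reached the vendor threshold.
            findings.append((name, f"normalized value {value} <= threshold {thresh}"))
        elif name in WATCHED and raw > 0:
            # Non-zero raw count on an attribute that should normally stay at 0.
            findings.append((name, f"raw value {raw}"))
    return findings
```

Run over the attribute table in the report above, this returns an empty list, which agrees with the reply. A clean result cannot rule out the loose-cable or power problems mentioned here, though, since a single failed write may leave no trace in the SMART attributes.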
AndrewT, posted August 13, 2014

I do have a 1-pass precleared disk (same 2 TB size) that I want to add for array expansion, but I'm apprehensive about swapping it in for the red-balled disk and rebuilding the array from parity. I have ~1.2 TB of files (not backed up anywhere else) on my cache drive that need to go onto the array, and whenever I invoke the mover script unRAID becomes completely unresponsive or crashes. Even the monitor plugged into the server goes blank, and the keyboard can't revive it, so I can't grab a syslog. I'm fairly certain the crash happened because I invoked the mover while adding a large directory of tarballs (yes, stupid). I deleted that partial directory from the cache disk, but I'm holding off on moving the important files from the cache until the red-balled disk goes back to green.
itimpi, posted August 13, 2014

The red-balled disk will not go back to green until you take some sort of recovery action!

As you have a spare drive, then assuming you have good parity, I would recommend the following to minimise any chance of data loss:

1. Stop the array and set the red-balled drive to unassigned.
2. Start the array. It should start and report that a disk is missing.
3. Stop the array and assign the spare 2TB disk as a replacement for the red-balled disk. At this point unRAID should tell you that it will rebuild the failed disk onto the new one.
4. Put the 'failed' disk somewhere safe for the time being in case anything goes wrong with the rebuild; it can be used for data recovery if the rebuild fails for any reason.
5. Assuming the rebuild works OK, run a pre-clear on the previously red-balled disk to see whether in practice (as is commonly the case) it is actually fine and the red ball was caused by an external factor. If the disk passes, it is available to add to the array as a new data disk.

Ideally you should avoid adding any new data to the array while recovery is in progress (although in theory it should work). The rebuild will not touch your cache disk, so any data currently stored there will be unaffected.

Do you have any plugins installed? If you do, it may be worth running in 'Safe Mode' to stop plugins loading while you go through the recovery process, to avoid any issues a plugin might cause.

As to why you were getting a crash when invoking mover, that is not clear. You should have been able to do what you described without a crash, so there may be some other underlying problem at the hardware level.
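The rebuild step works because, with single parity, the parity disk holds the bitwise XOR of the corresponding bytes on every data disk, so any one missing disk can be recomputed from parity plus the surviving disks. The toy Python sketch below models each disk as a few bytes; the disk contents and helper names are hypothetical, and a real rebuild applies the same arithmetic block by block across the whole drive.

```python
from functools import reduce


def xor_bytes(a, b):
    """Bitwise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def compute_parity(disks):
    """Parity = XOR of the same position on every data disk."""
    return reduce(xor_bytes, disks)


def rebuild_missing(parity, surviving):
    """XOR parity with every surviving disk to recover the one that is missing."""
    return reduce(xor_bytes, surviving, parity)


# Hypothetical 4-byte "disks"; disk7 plays the red-balled drive.
disk1 = b"\x10\x20\x30\x40"
disk2 = b"\x0f\x0e\x0d\x0c"
disk7 = b"\xaa\xbb\xcc\xdd"

parity = compute_parity([disk1, disk2, disk7])
rebuilt = rebuild_missing(parity, [disk1, disk2])
assert rebuilt == disk7  # contents recovered from parity + surviving disks
```

This also shows why keeping the old disk aside is sensible: single parity can recover exactly one missing disk, so if another disk fails during the rebuild, the original drive may be the only remaining copy of its data.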
AndrewT, posted August 13, 2014

I am now in the process of rebuilding the array with the new disk (plugins disabled). Is a follow-up parity check the way to confirm that all files were recovered perfectly?
itimpi, posted August 13, 2014

If you run a non-correcting parity check after the rebuild and no errors are reported, then you should be OK. Technically this does not check that the files are fine, merely that they are in the same state as they were when the disk red-balled. The only way to be 100% certain that files have not changed in any way since they were first placed on the disks is to calculate checksums (such as CRCs) for them and compare them with the same checksums of your backups. However, most people do not bother to do this and make the (reasonably safe) assumption that the data is unchanged if no issues have been reported.
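For the checksum approach, a minimal Python sketch might look like the following. It walks a directory, records a SHA-256 digest per file (swapped in for the CRC mentioned in the reply; any checksum works the same way), and compares that manifest against one built from a backup copy. The function names are hypothetical, as are the paths in the usage line below: `/mnt/disk7` assumes unRAID's per-disk mount naming, and the backup path is a placeholder.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of one file, read in 1 MiB chunks so large files don't fill memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = file_digest(full)
    return manifest


def compare_manifests(current, reference):
    """Return sorted (missing, changed) relative paths versus the reference."""
    missing = sorted(p for p in reference if p not in current)
    changed = sorted(
        p for p in reference if p in current and current[p] != reference[p]
    )
    return missing, changed
```

For example, `compare_manifests(build_manifest("/mnt/disk7"), build_manifest("/path/to/backup"))` returns two lists that should both be empty if nothing was lost or altered. Without a backup, the same manifest only helps if it was recorded while the data was known to be good.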