March 19, 2011

Hi,

I've been running unRAID for quite some time (C2SEA board + one 1430SA and 10 disks including parity). I moved to 4.7 about a week ago. Yesterday I added a second 1430SA board in order to have more drives, and I switched the default alignment to 4K from now on. I precleared two 2TB WD EARS drives (no jumper) with option -A.

I then replaced the existing 1.5TB drive with the first new 2TB drive, and parity reconstruction went well. Next I replaced a small 500GB drive with the second 2TB drive. Everything was fine and the new disk was reconstructing. At about 25% of the process I could no longer access my machine remotely. I ran to the console: error messages were raining down and I could not interrupt them, so I hit the power button. After restarting, the new disk says "Reconstructing" but it doesn't progress. In my kernel.log (see attached) I noticed an error message with a call trace pointing to drivers/ata/libata-sff.c.

[EDIT] I just looked at my kernel.log again and it is flooded with:

    handle_stripe read error: 3861400/8, count: 1
    md: disk8 read error

What should I do from here?

Thanks
alphazo

I noticed that the temperature for disk8 was... 0°C, even though it was showing a green light. I looked at the wiring and it could be a SATA cable not properly inserted in one of the 1430SA boards. My SATA cables have locks, but the stupid 1430SA board doesn't lock any SATA cable securely. I restarted and the reconstruction is in progress. Stay tuned... I will post my results.

kernel.txt
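To check whether the errors flooding a log all point at the same disk, a small script can tally them. This is a minimal Python sketch, not an unRAID tool: it assumes the log lines follow the two formats quoted above, and the function name tally_read_errors is hypothetical. Treating the first number in the handle_stripe line as a sector is also an assumption (the second number matching disk8 in the excerpt suggests it is the disk index).

```python
import re
from collections import Counter

# Matches lines such as "md: disk8 read error" (format taken from the excerpt above).
DISK_ERR = re.compile(r"md: disk(\d+) read error")
# Matches lines such as "handle_stripe read error: 3861400/8, count: 1".
# ASSUMPTION: the first number is a sector offset; this is not documented here.
STRIPE_ERR = re.compile(r"handle_stripe read error: (\d+)/(\d+), count: (\d+)")


def tally_read_errors(lines):
    """Return (read errors per disk number, set of numbers from handle_stripe lines).

    `lines` can be any iterable of strings, e.g. an open log file.
    """
    per_disk = Counter()
    sectors = set()
    for line in lines:
        m = DISK_ERR.search(line)
        if m:
            per_disk[int(m.group(1))] += 1
        m = STRIPE_ERR.search(line)
        if m:
            sectors.add(int(m.group(1)))
    return per_disk, sectors
```

Usage: with open("kernel.txt", errors="replace") as f: per_disk, sectors = tally_read_errors(f). If every error names the same disk, as with disk8 here, the problem is confined to that drive or its cable and controller port rather than spread across the array.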
March 20, 2011

Post the results of smartctl -a -d ata /dev/sdc

You may need to power cycle the drive and try again, so save your entire syslog first.
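Saving the syslog before a power cycle can be scripted as a timestamped copy. This is a minimal Python sketch assuming the syslog is at /var/log/syslog and does not survive a reboot, and that the flash drive is mounted at /boot; both are assumptions about the setup, and the /boot/logs directory and the save_syslog name are hypothetical.

```python
import shutil
import time
from pathlib import Path

# ASSUMED locations: adjust if your system keeps the log or the flash drive elsewhere.
DEFAULT_SRC = Path("/var/log/syslog")
DEFAULT_DEST_DIR = Path("/boot/logs")  # hypothetical directory on the flash drive


def save_syslog(src=DEFAULT_SRC, dest_dir=DEFAULT_DEST_DIR):
    """Copy the syslog to a timestamped file in dest_dir and return the new path."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = dest_dir / f"syslog-{stamp}.txt"
    shutil.copy2(src, dest)  # copy2 also preserves the file's modification time
    return dest
```

Writing to persistent storage rather than somewhere under /var keeps the copy available after the power cycle.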
March 21, 2011

Disk reconstruction went well. I then did a full parity check and no problems showed up, so I guess I'm safe now. Here is the result of the smartctl command:

root@babylon:~# smartctl -a -d ata /dev/sdc
smartctl 5.39.1 2010-01-28 r3054 [i486-slackware-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD15EARS-00Z5B1
Serial Number:    WD-WMAVU2365339
Firmware Version: 80.00A80
User Capacity:    1,500,301,910,016 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Mon Mar 21 10:15:07 2011 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84) Offline data collection activity
                                        was suspended by an interrupting command from host.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline data collection: (31800) seconds.
Offline data collection capabilities:  (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine recommended polling time:      (   2) minutes.
Extended self-test routine recommended polling time:   ( 255) minutes.
Conveyance self-test routine recommended polling time: (   5) minutes.
SCT capabilities:              (0x3031) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   185   185   021    Pre-fail  Always       -       5725
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       946
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   100   253   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   090   090   000    Old_age   Always       -       7696
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       27
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       12
193 Load_Cycle_Count        0x0032   189   189   000    Old_age   Always       -       33400
194 Temperature_Celsius     0x0022   127   109   000    Old_age   Always       -       23
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%        43         -
# 2  Extended offline    Aborted by host               90%        38         -
# 3  Short offline       Completed without error       00%        37         -
# 4  Short offline       Aborted by host               40%        37         -
# 5  Short offline       Completed without error       00%        37         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
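When reading a report like this one, the raw values of a few attributes matter most: reallocated, pending, and offline-uncorrectable sectors point at the disk surface, while UDMA_CRC_Error_Count points at the cable or controller link. The Python sketch below pulls those raw values out of the attribute table. It assumes the 10-column layout shown above (ID# through RAW_VALUE), the chosen set of attributes is a judgement call rather than a smartmontools rule, and the function names are hypothetical.

```python
# Attributes worth watching when deciding between a failing disk and a bad
# connection. ASSUMPTION: this selection is a judgement call, not a standard.
WATCH = {
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
    "UDMA_CRC_Error_Count",
}


def parse_attributes(text):
    """Return {attribute_name: raw_value} from `smartctl -a` attribute rows.

    Data rows start with a numeric ID and have at least 10 whitespace-separated
    columns; header, self-test, and selective-test rows are skipped.
    """
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[0].isdigit():
            attrs[fields[1]] = " ".join(fields[9:])
    return attrs


def nonzero_watched(attrs):
    """Return the watched attributes whose raw value is not exactly "0"."""
    return {name: raw for name, raw in attrs.items() if name in WATCH and raw != "0"}


def temperature_c(attrs):
    """Return the Temperature_Celsius raw value as an int, or None if absent."""
    raw = attrs.get("Temperature_Celsius")
    return int(raw.split()[0]) if raw else None
```

Run against the output above, nonzero_watched returns an empty dict because all four raw values are 0, and temperature_c returns 23.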