May 23, 2011 I've had unRAID running for a while now with no major issues, but now I'm freaking out! Earlier today I checked and unRAID showed my parity drive flashing red. I did a restart and all hard drives were found by the BIOS, but then unRAID showed the parity drive as blue and said it was a new drive! It's rebuilding parity now, but this has got me really worried. I have attached a syslog; can somebody help diagnose, please? syslog.txt
May 23, 2011 Author OK, the parity drive has gone red again with 2,880 errors. I don't think it's a cable problem, as the cables all seem firmly seated and everything is brand new (the PSU and the 4-way SAS connector). How do I determine for sure that the drive is faulty, and what proof can I provide if I send it back for a warranty claim?
May 23, 2011 Well, probably best to do some testing. Try another cable, try another port, etc. Just because it's brand new doesn't mean it hasn't failed. Try to run a SMART test on the drive. Generally they don't require proof. Tell them you put it in a RAID array and it spat errors. They'll just bork and hand you over a new one!
May 28, 2011 Author Thanks for the help. I decided to plug the HD into another slot and use a different power cable. Next I tried to run a preclear. (I guess I should have done this before I had a problem.) The HD is a WD Green 2TB drive. Stage 1 was showing reads of up to 100MB/s, but it often slowed down, and towards the end it was slowing down a lot. When I went to bed it was up to 95% and had taken about 13 hours. When I woke up I had the results in the attached file. I'm not sure why it has scrolled up; maybe my cat walked on the keyboard. But I can't tell from this what has happened. Can somebody please help?
May 28, 2011 Author From what I had read in the preclear thread, I had expected a SMART report at the end of the preclear. I'm sure this is covered somewhere in the forum, but could somebody please give me instructions on how to run a SMART test? Thanks.
May 28, 2011 Author OK, I'm starting to feel way out of my depth here. I'm using a Supermicro X7SPA-HF-O board and controlling it remotely with IPMI. As you can see from the attachment, once text goes off the top of the screen I can no longer access it. Can somebody help me interpret this, or help me see what the rest of the report said?
May 29, 2011 The easiest thing is to telnet to the server; then you can scroll up to copy and paste the results. Or you can enter "smartctl -d ata -a /dev/sdg > /boot/smart-sdg.txt". The results will be in a file on the flash drive.
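For anyone following along, the self-test and report steps could be wrapped roughly as below. This is only a sketch, assuming the drive is /dev/sdg as in the post above; the function names and the SMARTCTL override variable are made up for illustration, while -t short, -a and -d ata are standard smartctl options.

```shell
#!/bin/sh
# SMARTCTL is a hypothetical override for illustration; defaults to smartctl on the PATH.
SMARTCTL="${SMARTCTL:-smartctl}"

# Ask the drive to start a short self-test (the drive runs it in the background).
start_short_test() {
    "$SMARTCTL" -d ata -t short "$1"
}

# Save the full SMART report, including the self-test log, to a file.
save_smart_report() {
    "$SMARTCTL" -d ata -a "$1" > "$2"
}

# Example (device and output path are the ones used in this thread):
#   start_short_test /dev/sdg
#   ...wait a few minutes for the test to finish...
#   save_smart_report /dev/sdg /boot/smart-sdg.txt
```

Writing the report to /boot puts it on the flash drive, so it can be read from another machine even if the console has scrolled.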
May 29, 2011 Author Thanks for the step-by-step help! I worked out how to telnet in and got the following info. Can somebody interpret it, please?

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD20EARS-00MVWB0
Serial Number:    WD-WCAZA3673880
Firmware Version: 51.0AB51
User Capacity:    2,000,398,934,016 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Sun May 29 13:04:57 2011 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (38400) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 255) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x3035) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   140   140   051    Pre-fail  Always       -       19198
  3 Spin_Up_Time            0x0027   253   167   021    Pre-fail  Always       -       1958
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       277
  5 Reallocated_Sector_Ct   0x0033   181   181   140    Pre-fail  Always       -       372
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   097   097   000    Old_age   Always       -       2740
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       25
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       17
193 Load_Cycle_Count        0x0032   197   197   000    Old_age   Always       -       10121
194 Temperature_Celsius     0x0022   124   109   000    Old_age   Always       -       26
196 Reallocated_Event_Count 0x0032   167   167   000    Old_age   Always       -       33
197 Current_Pending_Sector  0x0032   198   196   000    Old_age   Always       -       919
198 Offline_Uncorrectable   0x0030   200   196   000    Old_age   Offline      -       6
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   128   001   000    Old_age   Offline      -       19463

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
May 29, 2011 The disk has 372 sectors already re-allocated and another 919 pending re-allocation when they are next written. Basically, it is failing, with nearly 1,300 unreadable sectors (372 + 919 = 1,291). Even though it has not yet used up all the spare sectors, it will soon enough. (Typically, on a large drive, there are several thousand spare sectors.) RMA it.
Reallocated_Sector_Ct 0x0033 181 181 140 Pre-fail Always - 372
Current_Pending_Sector 0x0032 198 196 000 Old_age Always - 919
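To pull those two counts out of a saved report without reading the whole table, something like the following might do. It is a minimal sketch, assuming the report was saved with "smartctl -a" and uses the usual ten-column attribute layout with RAW_VALUE in the tenth column; the function name smart_bad_sectors is made up for illustration.

```shell
#!/bin/sh
# Print the reallocated, pending, and combined sector counts from a saved
# smartctl report. Assumes RAW_VALUE is the 10th whitespace-separated field.
smart_bad_sectors() {
    awk '
        $2 == "Reallocated_Sector_Ct"  { realloc = $10 }
        $2 == "Current_Pending_Sector" { pending = $10 }
        END {
            printf "reallocated=%d pending=%d total=%d\n", realloc, pending, realloc + pending
        }
    ' "$1"
}

# Example:
#   smart_bad_sectors /boot/smart-sdg.txt
```

For the report posted in this thread it should print "reallocated=372 pending=919 total=1291". Matching on the attribute name rather than the ID number avoids picking up rows from the selective self-test table, which also start with small integers.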