ives Posted November 11, 2014

Hi, I have a problematic drive (WD 2TB) and have run a long SMART test on it, and was wondering if someone could take a look and see if I need to RMA it. I don't really have much of a clue as to what I'm looking at and would be most grateful if someone could help me. Here's the SMART test:

smartctl 6.2 2013-07-26 r3841 [i686-linux-3.9.11p-unRAID] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar Green (AF, SATA 6Gb/s)
Device Model:     WDC WD20EZRX-00D8PB0
Serial Number:    WD-WCC4M9YLCTRE
LU WWN Device Id: 5 0014ee 20a8f8833
Firmware Version: 80.00A80
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Tue Nov 11 05:50:41 2014 GMT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x80) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      ( 117) The previous self-test completed having
                                        the read element of the test failed.
Total time to complete Offline
data collection:                (26160) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 265) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x7035) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   178   178   051    Pre-fail  Always       -       21327
  3 Spin_Up_Time            0x0027   167   165   021    Pre-fail  Always       -       4650
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       62
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       16
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       225
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       60
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       33
193 Load_Cycle_Count        0x0032   199   199   000    Old_age   Always       -       3614
194 Temperature_Celsius     0x0022   129   120   000    Old_age   Always       -       18
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       22
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   197   197   000    Old_age   Offline      -       1369

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure        50%        215        1501859712
# 2  Extended offline    Completed: read failure        70%        187        946507768
# 3  Short offline       Completed: read failure        10%        186        946507768
# 4  Extended offline    Completed: read failure        70%        186        946507768
# 5  Extended offline    Completed: read failure        70%        185        946507768
# 6  Short offline       Completed without error        00%        160        -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
itimpi Posted November 11, 2014

The things I noticed were:

There are a number of reallocated sectors. This is not an issue in itself, as modern disks are designed to reallocate sectors when needed, but it would be an issue if that number keeps increasing.

There are a number of 'pending sectors'. These indicate sectors that could not be read reliably. You do not want any disk in unRAID to have a non-zero pending sector value because, if a different disk subsequently failed, these sectors might stop the rebuild of the failed disk from being 100% successful.

Pending sectors are only ever cleared by a write operation. If the write succeeds, the pending status is cleared; if it fails, the sector should be reallocated. If the disk in question is not part of the array, the easy solution is to run the pre-clear script against it. If it is part of the array, one way to try to clear these would be to force a rebuild of the disk. However, in that case, to minimise any chance of data loss it is better to do the rebuild onto a spare disk (if you have one). After that has succeeded, the problem disk can be tested with the pre-clear script without any risk of data loss.
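The two counters discussed in this reply are attribute 5 (Reallocated_Sector_Ct) and attribute 197 (Current_Pending_Sector) in the SMART table. A minimal shell sketch for picking them out, assuming `smartctl -A` style output with RAW_VALUE in the tenth column, as in the report posted in this thread. The function name `flag_bad_sectors` and the device `/dev/sdX` are placeholders, and also checking attribute 198 is a choice made for this sketch, not something the thread prescribes.

```shell
#!/bin/sh
# Print SMART attributes 5, 197 and 198 whose RAW_VALUE is non-zero.
# Reads `smartctl -A` style output on stdin; RAW_VALUE is assumed to be
# the tenth whitespace-separated field, as in the report in this thread.
flag_bad_sectors() {
    awk '$1 == 5 || $1 == 197 || $1 == 198 {
             if ($10 + 0 > 0) printf "%s raw=%s\n", $2, $10
         }'
}

# Example against a real device (/dev/sdX is a placeholder):
#   smartctl -A /dev/sdX | flag_bad_sectors
```

Running this at intervals and comparing the numbers is one way to see whether the reallocated count is still growing.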
ives Posted November 11, 2014

Quote: "If the disk in question is not part of the array then the easy solution is to run the pre-clear script against it."

Hi, and thanks for your input. This SMART test is the first thing I've done since a preclear on the drive. The preclear completed all 10 steps, but on the final post-read I lost the network on the PuTTY session. I noticed that on the final read the speed had slowed down to several kB/s. When I tested to see if the drive had been precleared using the "-t" option, it said it had. Because I lost the PuTTY session I didn't get the final SMART test results, hence this SMART test. Bearing this in mind, do you think the drive is worth persevering with?
Lacehim Posted November 11, 2014

Use screen when you pre-clear: http://lime-technology.com/wiki/index.php/Configuration_Tutorial#Preclearing_With_Screen

If you get disconnected you can reattach to the session. Personally, if it's "problematic", put a new one in and bin it or RMA it; they are only a hundred bucks these days. I don't know what problems you had, so maybe you can explain more. I recently had a drive like this on my Windoz machine. It had 55 bad sectors, but it was unrecoverable due to I/O problems and really slow reads, both as a SATA drive and in a USB3 enclosure. Luckily I had backups in place, and I just binned it (or in my case I use them for paperweights!).
ives Posted November 11, 2014

Thanks for the advice. I bought 3 WD 2TB drives from you to go in an unRAID box. Two are performing adequately but one is severely underperforming in speed tests. I've run a 400MB read test on each drive using:

dd of=/dev/null bs=4096 count=102400 if=/dev/sdd

Two of the drives report read speeds of around 150 MB/s, whereas the other one is at 3.9 MB/s. After preclearing it got better but was erratic. Anyway, I've had enough messing about with it. I've RMA-ed it.
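For scale, the dd test above reads 4096 x 102400 = 419,430,400 bytes, which is exactly 400 MiB. A small sketch of what the two reported speeds mean in wall-clock time, assuming the figures are megabytes per second with 1 MB = 1,000,000 bytes (the thread does not say which unit was meant); `read_seconds` is a hypothetical helper name.

```shell
#!/bin/sh
# Seconds needed to read a given number of bytes at a given rate in MB/s.
# Assumes 1 MB = 1,000,000 bytes; awk is used for the floating-point division.
read_seconds() {
    awk -v bytes="$1" -v rate="$2" 'BEGIN { printf "%.1f\n", bytes / (rate * 1000000) }'
}

bytes=$((4096 * 102400))     # bs * count from the dd command
echo "test size: $bytes bytes"
echo "at 150 MB/s: $(read_seconds "$bytes" 150) s"
echo "at 3.9 MB/s: $(read_seconds "$bytes" 3.9) s"
```

Under those assumptions the healthy drives finish the test in under three seconds, while the slow one needs well over a minute and a half for the same 400 MiB.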
WeeboTech Posted November 11, 2014

RMA it. This is enough to RMA it. At 225 hours there are 22 pending sectors and you cannot pass a read test. Don't bother with pre-clear at this point.

...
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       225
...
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       22
...
# 1  Extended offline    Completed: read failure        50%        215        1501859712
# 2  Extended offline    Completed: read failure        70%        187        946507768
# 3  Short offline       Completed: read failure        10%        186        946507768
# 4  Extended offline    Completed: read failure        70%        186        946507768
# 5  Extended offline    Completed: read failure        70%        185        946507768
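The self-test log quoted in this reply fails at two different addresses, not one. A minimal sketch that lists each distinct failing LBA, assuming `smartctl -l selftest` style lines where the LBA is the last field, as in the log posted in this thread; `failed_lbas` and `/dev/sdX` are placeholder names.

```shell
#!/bin/sh
# List each distinct LBA_of_first_error from self-test log lines that
# report a read failure. Reads `smartctl -l selftest` style output on stdin.
# The LBA is taken as the last field, so the "# 1" versus "#10" spacing
# in the first column does not matter.
failed_lbas() {
    awk '/^#/ && /read failure/ { if (!seen[$NF]++) print $NF }'
}

# Example against a real device (/dev/sdX is a placeholder):
#   smartctl -l selftest /dev/sdX | failed_lbas
```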
ives Posted November 13, 2014

OK. So I RMA-ed the drive and got a new one yesterday. I've got 3 drives in the machine, all the same WD 2TB Green drives. I decided to preclear them all again and started them all off at the same time. The 2 old drives finished preclearing in about 16 hours, but the new drive is still preclearing after 25 hours and showing no signs of finishing anytime soon. Here is the current status (note the read speed is 468 kB/s):

unRAID server Pre-Clear disk /dev/sdd
= cycle 1 of 1, partition start on sector 64
= Disk Pre-Clear-Read completed                                 DONE
= Step 1 of 10 - Copying zeros to first 2048k bytes             DONE
= Step 2 of 10 - Copying zeros to remainder of disk to clear it DONE
= Step 3 of 10 - Disk is now cleared from MBR onward.           DONE
= Step 4 of 10 - Clearing MBR bytes for partition 2,3 & 4       DONE
= Step 5 of 10 - Clearing MBR code area                         DONE
= Step 6 of 10 - Setting MBR signature bytes                    DONE
= Step 7 of 10 - Setting partition 1 to precleared state        DONE
= Step 8 of 10 - Notifying kernel we changed the partitioning   DONE
= Step 9 of 10 - Creating the /dev/disk/by* entries             DONE
= Step 10 of 10 - Verifying if the MBR is cleared.              DONE
= Post-Read in progress: 2% complete.
( 40,265,318,400 of 2,000,398,934,016 bytes read ) 468 kB/s
Disk Temperature: 23C, Elapsed Time: 25:57:43

Surely this drive can't be faulty as well? I tried changing the cables and the SATA port on the old drive and it made no difference. Any ideas what might be going on here?
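To put the 468 kB/s figure in perspective, here is a rough estimate of how long the remaining post-read would take if that rate held. This is only a sketch: it assumes 1 kB = 1000 bytes (the preclear script's unit is not stated in the thread) and a constant rate, and `eta_days` is a hypothetical helper name.

```shell
#!/bin/sh
# Days needed to read the remaining bytes at a constant rate.
# Arguments: total bytes, bytes already read, rate in bytes per second.
# awk is used because the byte counts exceed 32-bit shell arithmetic.
eta_days() {
    awk -v total="$1" -v done_bytes="$2" -v rate="$3" \
        'BEGIN { printf "%.1f\n", (total - done_bytes) / rate / 86400 }'
}

# Figures from the preclear status above; 468 kB/s taken as 468000 bytes/s.
echo "remaining post-read: about $(eta_days 2000398934016 40265318400 468000) days"
```

Even allowing for the unit assumption, the answer is on the order of weeks, compared with about 16 hours for a whole preclear on the other two drives.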
WeeboTech Posted November 13, 2014

I would abort the preclear. Swap this drive's position on the motherboard with another drive and test it out. I would probably look at the syslog first to see what messages are coming out. However, you should still do the basic speed test:

dd of=/dev/null bs=4096 count=102400 if=/dev/sd?

where ? = device to test. Do hdparm speed tests also. Did you review the SMART log of the new drive? (Post it.) Perhaps post your syslog too, to see if something else is going on with the motherboard.
ives Posted November 13, 2014

Thanks. How do I get the SMART log of the new drive?
WeeboTech Posted November 13, 2014

Preclear captures the SMART logs in the /boot/preclear_reports folder.
ives Posted November 13, 2014

Thanks. The other reports are there, but not the one for the drive that is still preclearing. If I stop preclear, will it appear?
WeeboTech Posted November 13, 2014

Quote: "thanks. the other reports are there but not the drive that is still preclearing. if I stop preclear, will it appear?"

There should definitely be a starting preclear SMART report for the drive in process. There is a start, rpt and finish file. At least from what I see with this example:

root@unRAIDb:/boot/preclear_reports# ls -l *W1F1H834*
-rwxrwxrwx 1 root root 5054 2013-01-14 18:49 preclear_finish_\ W1F1H834_2013-01-14*
-rwxrwxrwx 1 root root 1889 2013-01-14 18:49 preclear_rpt_\ W1F1H834_2013-01-14*
-rwxrwxrwx 1 root root 5032 2013-01-14 18:49 preclear_start_\ W1F1H834_2013-01-14*
root@unRAIDb:/boot/preclear_reports#

If not, capture one with:

smartctl -a /dev/sd?
dgaschk Posted November 23, 2014

Quote: "thanks. the other reports are there but not the drive that is still preclearing. if I stop preclear, will it appear?"

No. You must wait for pre-clear to finish.
Joe L. Posted November 23, 2014

The initial report (and intermediate SMART reports) are stored in the /tmp directory until preclear finishes and the usual three reports are written to /boot/preclear_reports. If you've not rebooted, you'll find them there in /tmp.