internetfriend Posted February 2, 2012

My server (4.7 plus) does a monthly parity scan. After I ran it, I saw there were errors on disk 0, disk 2, and my cache drive. Here's a snapshot of MyMain, the syslog, and SMART reports for all three. I'm not familiar with sector reallocation; usually my drives just straight up fail. SMART says they've all passed. Should I be worried, or carry on? I'd love it if someone with a better mind for these kinds of things took a peek. I haven't touched the system since I noticed these problems. Should I shut down and reboot so the sectors are reallocated, or...? RMA? Yikes! Thank you!!

MyMain:

syslog: http://dl.dropbox.com/u/519591/unraid/syslog-2012-02-01.txt

Disk0:

smartctl -a -d ata /dev/sdd (parity)
smartctl 5.39.1 2010-01-28 r3054 [i486-slackware-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD20EARS-00MVWB0
Serial Number:    WD-WCAZA1020507
Firmware Version: 51.0AB51
User Capacity:    2,000,398,934,016 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Thu Feb 2 00:09:05 2012 PST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x82) Offline data collection activity was completed without error. Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run.
Total time to complete Offline data collection: (40500) seconds.
Offline data collection capabilities: (0x7b) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. Offline surface scan supported. Self-test supported. Conveyance Self-test supported. Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported. General Purpose Logging supported.
Short self-test routine recommended polling time: ( 2) minutes.
Extended self-test routine recommended polling time: ( 255) minutes.
Conveyance self-test routine recommended polling time: ( 5) minutes.
SCT capabilities: (0x3035) SCT Status supported. SCT Feature Control supported. SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   166   162   021    Pre-fail  Always       -       6691
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       601
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   088   088   000    Old_age   Always       -       8928
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       68
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       17
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       2371
194 Temperature_Celsius     0x0022   122   115   000    Old_age   Always       -       28
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       13
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       6
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       22

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

Disk2:

smartctl -a -d ata /dev/sde (disk2)
smartctl 5.39.1 2010-01-28 r3054 [i486-slackware-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD20EARS-00MVWB0
Serial Number:    WD-WCAZA1050439
Firmware Version: 51.0AB51
User Capacity:    2,000,398,934,016 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Thu Feb 2 00:09:40 2012 PST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x84) Offline data collection activity was suspended by an interrupting command from host. Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run.
Total time to complete Offline data collection: (38100) seconds.
Offline data collection capabilities: (0x7b) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. Offline surface scan supported. Self-test supported. Conveyance Self-test supported. Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported. General Purpose Logging supported.
Short self-test routine recommended polling time: ( 2) minutes.
Extended self-test routine recommended polling time: ( 255) minutes.
Conveyance self-test routine recommended polling time: ( 5) minutes.
SCT capabilities: (0x3035) SCT Status supported. SCT Feature Control supported. SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   198   198   051    Pre-fail  Always       -       2543
  3 Spin_Up_Time            0x0027   165   162   021    Pre-fail  Always       -       6741
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       472
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   088   088   000    Old_age   Always       -       8908
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       61
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       13
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       2198
194 Temperature_Celsius     0x0022   120   113   000    Old_age   Always       -       30
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       123
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       33
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   199   000    Old_age   Offline      -       248

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

Cache:

smartctl -a -d ata /dev/sdc (cache)
smartctl 5.39.1 2010-01-28 r3054 [i486-slackware-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     SAMSUNG SpinPoint T166 series
Device Model:     SAMSUNG HD501LJ
Serial Number:    S0ZFJ1KQ302194
Firmware Version: CR100-12
User Capacity:    500,107,862,016 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 3b
Local Time is:    Thu Feb 2 00:09:42 2012 PST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x82) Offline data collection activity was completed without error. Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run.
Total time to complete Offline data collection: (8779) seconds.
Offline data collection capabilities: (0x5b) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. Offline surface scan supported. Self-test supported. No Conveyance Self-test supported. Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported. General Purpose Logging supported.
Short self-test routine recommended polling time: ( 2) minutes.
Extended self-test routine recommended polling time: ( 150) minutes.
SCT capabilities: (0x003f) SCT Status supported. SCT Feature Control supported. SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   100   100   051    Pre-fail  Always       -       107
  3 Spin_Up_Time            0x0007   100   100   015    Pre-fail  Always       -       7488
  4 Start_Stop_Count        0x0032   097   097   000    Old_age   Always       -       3478
  5 Reallocated_Sector_Ct   0x0033   253   253   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000e   253   253   000    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0024   253   253   000    Old_age   Offline      -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       8508
 10 Spin_Retry_Count        0x0032   253   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0012   253   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       1087
 13 Read_Soft_Error_Rate    0x000e   100   100   000    Old_age   Always       -       712510462
187 Reported_Uncorrect      0x0032   253   253   000    Old_age   Always       -       8912896
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       27
190 Airflow_Temperature_Cel 0x0022   074   055   000    Old_age   Always       -       26
194 Temperature_Celsius     0x0022   157   100   000    Old_age   Always       -       27
195 Hardware_ECC_Recovered  0x001a   100   100   000    Old_age   Always       -       712510462
196 Reallocated_Event_Count 0x0032   253   253   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0030   253   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x000a   100   100   000    Old_age   Always       -       0
201 Soft_Read_Error_Rate    0x000a   100   100   000    Old_age   Always       -       0
202 Data_Address_Mark_Errs  0x0032   253   253   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description  Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline     Completed without error  00%        1115             -
# 2  Short captive     Completed without error  00%        379              -
# 3  Short captive     Completed without error  00%        379              -
# 4  Short captive     Completed without error  00%        379              -
# 5  Short offline     Completed without error  00%        379              -
# 6  Short offline     Completed without error  00%        379              -
# 7  Short offline     Completed without error  00%        0                -

Note: selective self-test log revision number (0) not 1 implies that no selective self-test has ever been run
SMART Selective self-test log data structure revision number 0
Note: revision number not 1 implies that no selective self-test has ever been run
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
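For anyone who wants to pull the interesting numbers out of reports like the ones above without reading every row, here is a minimal Python sketch. It assumes the attribute rows follow the ten-column layout shown in this post; the `WATCHED` list and the function names are illustrative choices, not part of smartctl or unRAID. The sample rows are copied from the disk2 report.

```python
import re

# Attributes this thread keeps coming back to; the watch list is an
# illustrative choice, not an official failure criterion.
WATCHED = {"Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable"}

# One attribute row: ID, name, flag, value, worst, thresh, type, updated,
# when_failed, raw value (assumed to be a plain integer, as in this thread).
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+(0x[0-9a-fA-F]{4})\s+(\d+)\s+(\d+)\s+(\d+)\s+"
    r"(\S+)\s+(\S+)\s+(\S+)\s+(\d+)"
)


def parse_attributes(report):
    """Return {attribute_name: raw_value} for every attribute row found."""
    attrs = {}
    for line in report.splitlines():
        match = ROW.match(line)
        if match:
            attrs[match.group(2)] = int(match.group(10))
    return attrs


def flag_suspect(attrs):
    """Keep only watched attributes whose raw value is above zero."""
    return {name: raw for name, raw in attrs.items() if name in WATCHED and raw > 0}


SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       123
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       33
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
"""

if __name__ == "__main__":
    print(flag_suspect(parse_attributes(SAMPLE)))
```

Note that this looks only at raw values. The overall PASSED verdict is driven by the normalized VALUE/THRESH columns, which is why a drive can report PASSED while still showing pending sectors.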
lionelhutz Posted February 2, 2012

A reboot will do nothing, and this will be tough to fully recover from without some data loss. Every other disk must be healthy to properly recover a failed disk, and you have 2 array disks which have issues. I'd first copy the data off disk2 if possible. You may want to power down and re-seat all the HDD connections first and then do another parity check. Antec power supply or older/cheaper model by any chance?

Peter
internetfriend (Author) Posted February 2, 2012

The server is actually an HP ProLiant ML110 I got for free. I'm unsure of the PSU make, though I wanted to upgrade to a custom setup anyway so I could add more disks (the server is full). I could copy the data off disk2 to a disk off the array; that's not a big deal. I'll do that tonight. What would you recommend I do with the drives? I guess sector reallocation is inevitable sometimes, but am I seeing enough that the drives are in danger? Man, what a pain! Hah.
lionelhutz Posted February 2, 2012

I also have 7 disks, which vary from about 10 months to 4 years old, and have not yet had 1 pending or reallocated sector, so it's not normal to suddenly see that on 3 drives at one time. I would suspect it's a hardware issue: possibly poor cooling during the parity check, or a poor power supply that was overstressed during the parity check. See if you can get the specs off the power supply, or swap it for a better quality supply, even if only for a test. You can either run another parity check and see if the numbers change, or do a preclear on the suspect drives and see if the sectors clear up. Of course, the preclear will break the array and wipe the data off the drives.

Peter
Joe L. Posted February 2, 2012

Quote:
    I also have 7 disks, which vary from about 10 months to 4 years old, and have not yet had 1 pending or reallocated sector, so it's not normal to suddenly see that on 3 drives at one time. I would suspect it's a hardware issue: possibly poor cooling during the parity check, or a poor power supply that was overstressed during the parity check. See if you can get the specs off the power supply, or swap it for a better quality supply, even if only for a test. You can either run another parity check and see if the numbers change, or do a preclear on the suspect drives and see if the sectors clear up. Of course, the preclear will break the array and wipe the data off the drives. Peter

I suspect it is more likely he never looked at the SMART reports before, never precleared the disks, and it is only after loading unMENU and seeing the errors highlighted that he is panicking.

Re-seating connectors is not likely to help. In fact, if one is accidentally dislodged, it could make a manageable case go bad very quickly.

Basically, you need to work from your worst disk to the least bad. The unreadable sectors could be in unused space, or in critical files... no way to know.

I would copy whatever is critical off of disk2 onto other disks in the array that currently have NO errors. Once the files are copied off and safely on other disks, delete them from disk2. If the unreadable sectors are in files, unRAID should reconstruct the contents from parity and the other disks. You just need to play the odds that the unreadable sector on one of the other disks is not in the same place as the bad sector on disk2. (Odds are in your favor.)

Then, copy them back. That should cause disk2 to reallocate the bad sectors (if they were in files).

Then, get a new set of SMART reports, rinse, lather, and repeat with the files on the cache drive.

Lastly, you'll need to re-check parity. That should fix the errors there by writing the unreadable sectors. Don't do that until you fix disk2 though, or it will not be able to reconstruct what is bad there.

Lastly, think about RMAing disk2. And preclear all your drives before putting them in use.

Joe L.
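To make the "reconstruct the contents from parity and the other disks" step concrete, here is a toy Python sketch. It assumes the parity disk holds a bytewise XOR of the data disks, which is the usual single-parity scheme; the thread itself does not spell out the arithmetic, and the disk names and byte values below are made up for illustration. It also shows why Joe L. says you are playing the odds: a second unreadable block at the same position leaves nothing to rebuild from.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def reconstruct(parity, others):
    """Rebuild one missing block from parity plus every other disk's block.

    Returns None if parity or any other block at this position is also
    unreadable (represented here as None).
    """
    if parity is None or any(block is None for block in others):
        return None
    return xor_blocks([parity, *others])


# Hypothetical 4-byte "sectors" at the same position on three data disks.
disk1 = b"\x10\x20\x30\x40"
disk2 = b"\xde\xad\xbe\xef"
disk3 = b"\x01\x02\x03\x04"
parity = xor_blocks([disk1, disk2, disk3])

if __name__ == "__main__":
    # disk2's sector is unreadable: rebuild it from parity, disk1 and disk3.
    print(reconstruct(parity, [disk1, disk3]) == disk2)
    # If disk1 is unreadable at the same position too, nothing can be rebuilt.
    print(reconstruct(parity, [None, disk3]))
```

This is also why the order matters in the advice above: correcting parity before disk2 is fixed would overwrite the information needed to rebuild disk2's bad sectors.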
internetfriend (Author) Posted February 2, 2012

Nah, I'm not that bad, haha. These drives were precleared 3 times in a row before being put into use, along with the other drives in the array. SMART reports were golden pre- and post-preclear, so all these pending sectors are new ones. They were going strong all the way up until the latest parity check, the last being 1/1/12.

OK, so next steps:

1. Copy everything off disk2 to other disks in the array. (I assume it's not a problem to copy it OFF the array too; it just puts whatever disk I copy it to at risk if it's not protected, right?)
2. Copy data back onto disk2 after sacrificing a goat.
3. Check to ensure pending/unrecoverable sectors etc. have not risen since the previous report, else RMA.
4. Do the same as above with the cache drive.
5. Once both cache and disk2 seem OK, recheck parity, which should sort out the parity drive. If reallocated sectors jump dramatically, consider RMA.

Thanks for the help, guys! I'll follow up after disk2 is moved, deleted and added back.
dgaschk Posted February 3, 2012

I'd run pre-clear on disk2 once it's empty in order to force the sector to reallocate or clear. Then add disk2 back to the array and repeat this process on the other drives.
lionelhutz Posted February 3, 2012

Just writing to the disks may not clear the bad sectors. You have to write to the bad sector itself.

In theory, unRAID will reconstruct the sector and write it back to the drive when there is a read error due to a bad sector. So, in theory, just running parity checks would clear the pending sectors. I believe you need to do a correcting parity check for this to happen though, and even then, I've read that it does this in theory but don't recall ever seeing proof that it actually happens.

I'd still be very suspicious of your power supply. I've read a few cases here where pending sectors were occurring on multiple drives and a new power supply cleared the problem. They were not reallocated either; they were cleared from the SMART data, indicating they began to work correctly and could be read again. It might not help, but you've got 3 drives out of 7 acting bad at the same time, which just isn't expected.

Peter
Joe L. Posted February 3, 2012

Quote:
    Just writing to the disks may not clear the bad sectors. You have to write to the bad sector itself. In theory, unRAID will reconstruct the sector and write it back to the drive when there is a read error due to a bad sector. So, in theory, just running parity checks would clear the pending sectors. I believe you need to do a correcting parity check for this to happen though, and even then, I've read that it does this in theory but don't recall ever seeing proof that it actually happens. I'd still be very suspicious of your power supply. I've read a few cases here where pending sectors were occurring on multiple drives and a new power supply cleared the problem. They were not reallocated either; they were cleared from the SMART data, indicating they began to work correctly and could be read again. It might not help, but you've got 3 drives out of 7 acting bad at the same time, which just isn't expected. Peter

I can envision a situation where marginal power could cause writes to sectors to be marginal, to the point where they sometimes cannot be read back. A proper "write" with good power would result in the original sector being used, and not a reallocation.
internetfriend (Author) Posted February 3, 2012

Sounds like I need to fast-track my new build then. I'm currently still in the process of moving everything off disk 2. If I do a preclear, it's going to toast the parity validity (I think). Will unRAID just ask if I want to rebuild from parity, and should I do that, or should I just copy everything back over normally and then build a NEW parity?
Joe L. Posted February 4, 2012

Quote:
    If I do a preclear, it's going to toast the parity validity (I think).

Preclear will only work on drives not assigned to the parity-protected array; therefore, it does not affect parity. It does completely erase all data from the drive being cleared...
internetfriend (Author) Posted February 4, 2012

Sorry, poor choice of words. If I do a preclear on the disk, it's going to involve me taking it out of the array and deleting everything on it, meaning I'd have to recalculate parity.

So I copied everything off, and pending sectors went UP by about 5 sectors on disk2. I'm going to shut down the server, see if the PSU is a standard unit or some proprietary HP thing, and try a swap. From there I'll do a preclear on disk2 and see how it fares. Anything bad about my next steps? Thanks, all!
internetfriend (Author) Posted February 4, 2012

Turns out the PSU isn't proprietary, but it's mounted upside down compared to standard PSUs, so any PSU worth its salt with a fan would blow air against a piece of aluminum. I'm going to order my parts and do an upgrade before I finish this up, and do preclears in the meantime. This thing is a P4 that really is on its last legs. And the PSU is a no-name clunker at 350 watts; on paper it should be fine, but I'm guessing it just can't cut it anymore due to age.
internetfriend (Author) Posted February 6, 2012

Preclear finished on disk2. Funny, because after the zero-write all the sectors were rewritten, but by the end of the test the pending count had jumped to 373. Sounds like it's time to RMA? If I still have a SMART = PASS, will they take the disk regardless? I'm getting a new PSU today and plan to migrate everything into a new machine and do testing from a stronger platform. As long as I reassign drives in the correct order, I shouldn't have any migration issues, right? Results below. Thanks!

Disk Temperature: 30C, Elapsed Time: 38:22:31
========================================================================1.13
== WDC WD20EARS-00MVWB0 WD-WCAZA1050439
== Disk /dev/sde has been successfully precleared
== with a starting sector of 63
============================================================================
** Changed attributes in files: /tmp/smart_start_sde /tmp/smart_finish_sde
                ATTRIBUTE   NEW_VAL  OLD_VAL  FAILURE_THRESHOLD  STATUS  RAW_VALUE
      Raw_Read_Error_Rate =   188      198          51           ok      15935
    Reallocated_Sector_Ct =   199      200         140           ok      30
      Temperature_Celsius =   120      122           0           ok      30
  Reallocated_Event_Count =   171      200           0           ok      29
   Current_Pending_Sector =   199      200           0           ok      373
No SMART attributes are FAILING_NOW

132 sectors were pending re-allocation before the start of the preclear.
211 sectors were pending re-allocation after pre-read in cycle 1 of 1.
0 sectors were pending re-allocation after zero of disk in cycle 1 of 1.
373 sectors are pending re-allocation at the end of the preclear,
    a change of 241 in the number of sectors pending re-allocation.
0 sectors had been re-allocated before the start of the preclear.
30 sectors are re-allocated at the end of the preclear,
    a change of 30 in the number of sectors re-allocated.
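The before/after arithmetic in that preclear summary can be reproduced in a few lines of Python. This is only a sketch: the dictionary keys and the `got_worse` rule of thumb are illustrative (the rule mirrors the "if it goes up, RMA" thinking in this thread, not any vendor criterion), and the counts are the ones printed in the report above.

```python
def sector_changes(before, after):
    """Per-counter difference between two snapshots (after minus before)."""
    return {name: after[name] - before[name] for name in before}


def got_worse(changes):
    """Rule of thumb used in this thread: any counter that rose is a bad sign."""
    return any(delta > 0 for delta in changes.values())


# Counts taken from the preclear summary for /dev/sde (disk2).
before = {"pending": 132, "reallocated": 0}
after = {"pending": 373, "reallocated": 30}

if __name__ == "__main__":
    changes = sector_changes(before, after)
    print(changes)
    print("suspect drive" if got_worse(changes) else "no change for the worse")
```

The intermediate readings are worth noticing too: pending dropped to 0 right after the zero pass and then climbed again during the post-read, which is the pattern that prompted the RMA question.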
lionelhutz Posted February 7, 2012

Any power supply will fit. Just stick it in the way necessary and use the 2 or 3 screw holes that line up. You should be testing the drives after getting them on a better power supply.

Peter
mbryanr Posted February 7, 2012

Quote:
    Sounds like it's time to RMA? If I still have a SMART = PASS, will they take the disk regardless?

Yes. They know as well as we do that SMART = PASS means nothing.
internetfriend (Author) Posted February 7, 2012

New server is assembled: upgraded to a light-duty AMD Athlon 4850e processor and mobo combo in an old CM Stacker, but more importantly a Corsair HX650 PSU. I had a Corsair 620 watt PSU in my desktop I wanted to test with, but it has 3 (!) rails, so while it's good for my 3-disk, 1-CD-ROM desktop, it's not so good for a big server.

I have submitted an RMA request for disk2 since it's the big offender on reallocated sectors. Once I get the new disk, I'll preclear it, load it into the array and copy data onto it. Then I'll tackle the parity drive and the not-too-important cache drive.

Lionelhutz, the issue is that the fitment is upside down in the ML110, so the PSU was unable to draw air in since the inlet was flush against the top of the case. Doesn't matter now though. Anyone want a used HP ML110 that can handle loads of up to 6 disks but not 7?
internetfriend (Author) Posted February 14, 2012

Update: Disk 2 was RMAed, and the new disk was precleared and passed the test, so I added it to the array. There was a data rebuild to it, but since the previous disk was empty, my files aren't on there and potentially corrupted. Right now I'm copying all the files back onto disk 2.

I still have the issue of the parity drive throwing errors; my pending sector count actually went up by 1 when I did the rebuild. How do I go about forcing it to refresh? Will a simple parity check do this, or would I need to pull it out of the array, preclear it, and then introduce it back if it fixes itself?
dgaschk Posted February 14, 2012

A parity check will not affect pending sectors because it only reads. A pre-clear will write every sector and cause pending sectors to be reallocated or successfully written.
internetfriend (Author) Posted February 15, 2012

So after copying about 1.5TB back onto the drive, the parity drive was used heavily and the pending sectors were handled. Here is a revised SMART report. 87 errors are reported on the unRAID page. Should I still RMA this drive? If not, should I just keep doing parity checks until errors are zero? Thanks!

smartctl -a -d ata /dev/sdd (parity)
smartctl 5.39.1 2010-01-28 r3054 [i486-slackware-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD20EARS-00MVWB0
Serial Number:    WD-WCAZA1020507
Firmware Version: 51.0AB51
User Capacity:    2,000,398,934,016 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Wed Feb 15 01:23:53 2012 PST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x84) Offline data collection activity was suspended by an interrupting command from host. Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run.
Total time to complete Offline data collection: (40500) seconds.
Offline data collection capabilities: (0x7b) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. Offline surface scan supported. Self-test supported. Conveyance Self-test supported. Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported. General Purpose Logging supported.
Short self-test routine recommended polling time: ( 2) minutes.
Extended self-test routine recommended polling time: ( 255) minutes.
Conveyance self-test routine recommended polling time: ( 5) minutes.
SCT capabilities: (0x3035) SCT Status supported. SCT Feature Control supported. SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   165   162   021    Pre-fail  Always       -       6733
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       632
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   088   088   000    Old_age   Always       -       9110
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       90
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       28
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       2430
194 Temperature_Celsius     0x0022   122   115   000    Old_age   Always       -       28
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       6
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       23

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description  Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline     Completed: read failure  90%        9082             3511387976

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
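The self-test log in that report gives an `LBA_of_first_error`, and it can be handy to know roughly where on the platter that is. A small Python sketch follows, assuming 512-byte logical sectors (an assumption; the report does not state the sector size) and using the capacity printed in the information section. The function names are illustrative.

```python
LOGICAL_SECTOR_BYTES = 512  # assumption: 512-byte logical sectors


def lba_to_byte_offset(lba, sector_bytes=LOGICAL_SECTOR_BYTES):
    """Byte offset of the start of a logical block."""
    return lba * sector_bytes


def position_fraction(lba, capacity_bytes, sector_bytes=LOGICAL_SECTOR_BYTES):
    """How far into the drive the block sits, as a fraction of capacity."""
    return lba_to_byte_offset(lba, sector_bytes) / capacity_bytes


FIRST_ERROR_LBA = 3511387976      # from the self-test log above
CAPACITY_BYTES = 2000398934016    # "User Capacity" from the same report

if __name__ == "__main__":
    offset = lba_to_byte_offset(FIRST_ERROR_LBA)
    fraction = position_fraction(FIRST_ERROR_LBA, CAPACITY_BYTES)
    print(f"byte offset {offset}, about {fraction:.1%} of the way into the drive")
```

Under that assumption the failing block lies inside the drive's reported capacity, toward the end of the disk, which is consistent with it being a real address rather than a garbled log entry.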
lionelhutz Posted February 15, 2012

If unRAID gets a read error during a parity check, then it's supposed to reconstruct that data and write it back to the drive, causing the sector to be reallocated. Whether this actually happens is another story. It won't happen during a non-correcting parity check though.

Peter
fitbrit Posted February 15, 2012

Quote:
    Antec power supply or older/cheaper model by any chance? Peter

Hi Peter,

May I ask why you specifically asked about the Antec PSU? My current server PSU is an Antec 650W, and I've had a few problems with multiple simultaneous failed disks and other errors. I'm switching the whole server to new hardware shortly, including a better PSU, but I just wondered whether you'd encountered inherent problems with the Antecs. Sorry for the threadjack, OP.
lionelhutz Posted February 15, 2012

Quote:
    Quote:
        Antec power supply or older/cheaper model by any chance? Peter
    Hi Peter, May I ask why you specifically asked about the Antec PSU? My current server PSU is an Antec 650W, and I've had a few problems with multiple simultaneous failed disks and other errors. I'm switching the whole server to new hardware shortly, including a better PSU, but I just wondered whether you'd encountered inherent problems with the Antecs. Sorry for the threadjack, OP.

There have been 2 or 3 others who have reported odd disk issues here that went away after replacing their Antec supply.

Peter
internetfriend (Author) Posted February 15, 2012

So I went hardcore and threw the parity drive into a preclear. Even though the pending sectors took care of themselves, I'm not too happy with having any busted sectors in the first place. If the count goes up after the preclear, I'll RMA. 2 disks down, 1 to go, no data lost (yet!)
fitbrit Posted February 15, 2012

Quote:
    Quote:
        Quote:
            Antec power supply or older/cheaper model by any chance? Peter
        Hi Peter, May I ask why you specifically asked about the Antec PSU? My current server PSU is an Antec 650W, and I've had a few problems with multiple simultaneous failed disks and other errors. I'm switching the whole server to new hardware shortly, including a better PSU, but I just wondered whether you'd encountered inherent problems with the Antecs. Sorry for the threadjack, OP.
    There have been 2 or 3 others who have reported odd disk issues here that went away after replacing their Antec supply. Peter

Thanks for the information. I'll relegate my Antec to HTPC use eventually, where no real data is actually stored.