jbuszkie (Author) Posted April 28, 2015

"Just a caution before you throw out the drive too quick, pending sectors may or may not be bad. The numbers you showed do indicate the drive needs some repair work, but do not mean the drive is bad, until after testing it. Technically, 'pending' means 'pending full testing', which happens once you tell the drive you don't care about the current data stored there. You do that by writing to it, so that the drive now knows you are OK'ing overwriting the current data. It then can thoroughly test the sector to see how safely it saves test patterns, and if good, saves the data you requested to be saved there and unmarks it as pending. If it decides the sector cannot be trusted, then it is remapped to a good spare sector. A sector is marked as a 'current pending sector' when it fails to be read correctly, even after applying the error correction info. That can happen either because of weak or damaged magnetic media under the sector, or because of electrical activity (spikes and outages while writing) that have scrambled too many bits in the sector. I tend to call the latter ones 'soft errors', because the physical sector is completely fine, and testing will prove that. If you have had a power outage or sparking or other serious power issues, then the 64 sectors may just be soft errors, and the drive be perfectly fine, once tested and rebuilt."

So how many of you would risk running without parity for a couple days while I re-preclear the parity drive to check for more errors? I have 3 options:

1. Do nothing and see if it clears itself out.
2. Take the parity out and run preclear on it, and run unprotected for like 36h (plus rebuild time if the disk checks out).
3. Take my new (green) 4T disk and let it rebuild parity on that. When that's done, run preclear on the old parity. If all good, swap parity out again with the old disk, or replace with the returned drive when it eventually arrives.

This means my new drive (slated for something else) will have to not go into its new home for a week or so. What would you guys do?
RobJ Posted April 28, 2015

Just my opinion, but #3 is clearly the best. #1 is not an option, sorry. With those pending sectors, it probably cannot be used to rebuild any other drive that failed, so you don't really have full parity protection now. #3 restores your parity protection the quickest. Once finished, the pressure is off, and you can do whatever you want when you want. But yes, it means a delay in your intended use for the 4TB drive.
jbuszkie (Author) Posted April 28, 2015

Yeah.. That's what I figured. Doing a parity check now. That will be done tomorrow. Then another 26ish hours for the rebuild. Then another 36ish hours for the pre-clear... Yeah.. It's gonna be a while! Thanks for the opinion! Jim
RobJ Posted April 28, 2015

Since you are about to trash that parity info, I can't think of any reason to do a parity check. I'd stop it, replace the drive, and start the rebuild now, save some hours.
jbuszkie (Author) Posted April 28, 2015

Yeah.. makes sense.. But I'll probably wait till the rebuild has finished before I start the preclear on the old parity drive.
SSD Posted April 29, 2015

... Here is some commentary on your drive issues:

» reported_uncorrect=31: Unsure. Not a good sign, but it is the same number as ata_error_count below, so I am assuming they are related.
» high_fly_writes=112: Not a big deal.
» current_pending_sector=64: Not good. See below.
» offline_uncorrectable=64: Related to pending sectors.
» ata_error_count=31: Often related to a cabling problem on the drive.

Preclearing is worth a try, but I wouldn't bet the ranch on it. Also check the cabling to the drive (cabling will not cause pending sectors but may be causing the ata_errors and reported_uncorrect).

Note that most attributes will not reset back to zero. The cure may only be evident if the values do not increase. Pending sectors are the exception: the values DO reduce, and you are looking for that value to go back to zero. Occasionally pending sectors vanish after a parity check, but more often they cause reallocated sectors. If they clear and don't cause reallocations, you're likely ok (I've seen this happen quite a few times, and I chalk it up to some firmware bug).

Drives have "spare" sectors and are able to reallocate bad sectors to them. My experience on the forums is that once sectors start to fail, they continue to fail. Unless the values stabilize such that you can run 3 consecutive parity checks without the values changing, the drive is on the road to failure and should be replaced.

I am curious whether unRAID reported read errors on the drive after a parity check. If so, these pending sectors should have cleared themselves. When a disk reports a read error, unRAID is supposed to write the correct values for that sector back to the disk (it can compute them from the same sector on all of the other drives). If the sector is pending, this should force the drive to remap the sector. But sometimes these pending sectors show up and cause no read errors. I do not understand this phenomenon, but maybe the preclear will cause the drive to resolve these. Pending sectors that result in OS level errors are dangerous as they can affect the ability to use that drive for rebuilding a different failed drive in the future.
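The "3 consecutive parity checks with no change" rule of thumb above can be sketched in a few lines of Python. This is only an illustration: the function names, the `WATCHED` list, and the idea of recording one dictionary of raw SMART values after each parity check are assumptions made for the example, not something unRAID or smartctl provides.

```python
from typing import Dict, List

# Hypothetical choice of attributes to watch; adjust to taste.
WATCHED = ("Current_Pending_Sector", "Reallocated_Sector_Ct", "Offline_Uncorrectable")


def is_stable(snapshots: List[Dict[str, int]], runs: int = 3) -> bool:
    """Return True if the watched raw values are identical across the last `runs` snapshots."""
    if len(snapshots) < runs:
        return False  # not enough history yet to call the drive stable
    recent = snapshots[-runs:]
    first = recent[0]
    return all(
        snap.get(name, 0) == first.get(name, 0)
        for snap in recent[1:]
        for name in WATCHED
    )


def pending_cleared(snapshots: List[Dict[str, int]]) -> bool:
    """Return True if the most recent snapshot shows zero pending sectors."""
    return bool(snapshots) and snapshots[-1].get("Current_Pending_Sector", 0) == 0


# One snapshot per parity check, raw values copied by hand from the SMART report.
history = [
    {"Current_Pending_Sector": 64, "Reallocated_Sector_Ct": 0},
    {"Current_Pending_Sector": 56, "Reallocated_Sector_Ct": 0},
    {"Current_Pending_Sector": 32, "Reallocated_Sector_Ct": 0},
]
print(is_stable(history), pending_cleared(history))
```

With a history like this one, where the pending count keeps moving, both checks come back negative, which matches the advice to keep distrusting the drive until the numbers stop changing.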
jbuszkie (Author) Posted April 29, 2015

I killed the parity check before it finished. I'm about to preclear the drive and will report back its status.. Jim
Fireball3 Posted April 29, 2015

"I do not understand this phenomenon, but maybe the preclear will cause the drive to resolve these."

If you think of the platter of a hard disk, it stores the information by magnetizing the surface in different directions. That means the signal is not simply 1s and 0s with sharp edges. As with every physical measurement (reading), there is a more or less clean curve that has to be evaluated by the controller logic. If the signal read is of poor quality due to degradation of the surface, ambient magnetic influence, or other factors, the result may be a bad sector read. If this sector is "refreshed" via a new write cycle, it is probably readable again.
manny Posted April 29, 2015

"I just completed the second preclear cycle and have Current_Pending_Sector = 2 and Raw_Read_Error_Rate=8, as per the explanation from RobJ above I think this should be ok right? I am planning to replace this drive as the Parity drive. Should I run one more cycle just to be on the safe side?"

"The "Raw_Read_Error_Rate=8" is not a problem, because the raw value for that attribute is meaningless. What's important is the VALUE for it, 200, which is perfect. What IS a problem is the "Current_Pending_Sector = 2". As was stated above, that HAS to be zero. If this SMART report occurred right after a Preclear then that is a bad sign. Preclear it again, and if Current_Pending_Sector stays non-zero, I would not use that drive, it can't be trusted."

Just completed the 3rd round of preclear, Current_Pending_Sector has become zero :-) Shall I go ahead and use this drive?

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 8
3 Spin_Up_Time 0x0027 100 253 021 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 1
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
7 Seek_Error_Rate 0x002e 100 253 000 Old_age Always - 0
9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 100
10 Spin_Retry_Count 0x0032 100 253 000 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 100 253 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 1
192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 0
193 Load_Cycle_Count 0x0032 200 200 000 Old_age Always - 7
194 Temperature_Celsius 0x0022 119 118 000 Old_age Always - 31
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 100 253 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0008 100 253 000 Old_age Offline - 0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

This was the final output from preclear:

================================================================== 1.15
= unRAID server Pre-Clear disk /dev/sdf
= cycle 1 of 1, partition start on sector 1
= Disk Pre-Clear-Read completed DONE
= Step 1 of 10 - Copying zeros to first 2048k bytes DONE
= Step 2 of 10 - Copying zeros to remainder of disk to clear it DONE
= Step 3 of 10 - Disk is now cleared from MBR onward. DONE
= Step 4 of 10 - Clearing MBR bytes for partition 2,3 & 4 DONE
= Step 5 of 10 - Clearing MBR code area DONE
= Step 6 of 10 - Setting MBR signature bytes DONE
= Step 7 of 10 - Setting partition 1 to precleared state DONE
= Step 8 of 10 - Notifying kernel we changed the partitioning DONE
= Step 9 of 10 - Creating the /dev/disk/by* entries DONE
= Step 10 of 10 - Verifying if the MBR is cleared. DONE
= Disk Post-Clear-Read completed DONE
Disk Temperature: 31C, Elapsed Time: 32:43:31
========================================================================1.15
== WDCWD30EFRX-68EUZN0
== Disk /dev/sdf has been successfully precleared
== with a starting sector of 1
============================================================================
** Changed attributes in files: /tmp/smart_start_sdf /tmp/smart_finish_sdf
ATTRIBUTE NEW_VAL OLD_VAL FAILURE_THRESHOLD STATUS RAW_VALUE
Temperature_Celsius = 119 120 0 ok 31
No SMART attributes are FAILING_NOW

2 sectors were pending re-allocation before the start of the preclear.
2 sectors were pending re-allocation after pre-read in cycle 1 of 1.
0 sectors were pending re-allocation after zero of disk in cycle 1 of 1.
0 sectors are pending re-allocation at the end of the preclear, a change of -2 in the number of sectors pending re-allocation.
0 sectors had been re-allocated before the start of the preclear.
0 sectors are re-allocated at the end of the preclear, the number of sectors re-allocated did not change.
RobJ Posted April 29, 2015

If this were the result after your very first Preclear, then the drive is probably fine, but you would want to monitor it for a year. But above, it looks like the pending sectors showed up AFTER a Preclear, and that should never happen. While the drive looks fine at the moment, I'm not ready to trust it, and I would recommend Preclearing it twice more, only trusting it if it stays perfect through both. Seems too coincidental, but is there any chance the drive previously Precleared perfectly, then just after that finished, you detected a power outage or power spike or nearby lightning strike?
manny Posted April 29, 2015

Thanks RobJ. Yes, during the second preclear, the one that reported these errors, I did have a power outage. The server is connected to a UPS, but for a second or two I felt the power go off. Basically, when a power cut happens (which is often) the generator kicks in after 2-3 mins, and during that time the UPS supplies power. We are also having thunderstorms lately ...
RobJ Posted April 30, 2015

In a way, that's good, and we can hope that that explains the surprising pending sectors. Another clean Preclear should prove it.
jbuszkie (Author) Posted April 30, 2015

I'm getting errors on the old parity drive...

Apr 30 09:06:12 Tower kernel: ata1.00: error: { UNC } (Errors)
Apr 30 09:06:12 Tower kernel: ata1.00: configured for UDMA/133 (Drive related)
Apr 30 09:06:12 Tower kernel: sd 1:0:0:0: [sda] Unhandled sense code (Drive related)
Apr 30 09:06:12 Tower kernel: sd 1:0:0:0: [sda] (Drive related)
Apr 30 09:06:12 Tower kernel: sd 1:0:0:0: [sda] (Drive related)
Apr 30 09:06:12 Tower kernel: sd 1:0:0:0: [sda] (Drive related)
Apr 30 09:06:12 Tower kernel: sd 1:0:0:0: [sda] CDB: (Drive related)
Apr 30 09:06:12 Tower kernel: end_request: I/O error, dev sda, sector 2551149104 (Errors)
Apr 30 09:06:12 Tower kernel: Buffer I/O error on device sda, logical block 318893638 (Errors)
Apr 30 09:06:12 Tower kernel: ata1: EH complete (Drive related)
Apr 30 09:23:08 Tower kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0 (Errors)
Apr 30 09:23:08 Tower kernel: ata1.00: irq_stat 0x40000001 (Drive related)
Apr 30 09:23:08 Tower kernel: ata1.00: failed command: READ DMA EXT (Minor Issues)
Apr 30 09:23:08 Tower kernel: ata1.00: cmd 25/00:00:00:b8:1d/00:01:9f:00:00/e0 tag 0 dma 131072 in (Drive related)
Apr 30 09:23:08 Tower kernel: ata1.00: status: { DRDY ERR } (Drive related)
Apr 30 09:23:08 Tower kernel: ata1.00: error: { UNC } (Errors)
Apr 30 09:23:08 Tower kernel: ata1.00: configured for UDMA/133 (Drive related)
Apr 30 09:23:08 Tower kernel: sd 1:0:0:0: [sda] Unhandled sense code (Drive related)
Apr 30 09:23:08 Tower kernel: sd 1:0:0:0: [sda] (Drive related)
Apr 30 09:23:08 Tower kernel: sd 1:0:0:0: [sda] (Drive related)
Apr 30 09:23:08 Tower kernel: sd 1:0:0:0: [sda] (Drive related)
Apr 30 09:23:08 Tower kernel: sd 1:0:0:0: [sda] CDB: (Drive related)
Apr 30 09:23:08 Tower kernel: end_request: I/O error, dev sda, sector 2669525192 (Errors)
Apr 30 09:23:08 Tower kernel: Buffer I/O error on device sda, logical block 333690649 (Errors)
Apr 30 09:23:08 Tower kernel: Buffer I/O error on device sda, logical block 333690650 (Errors)
Apr 30 09:23:08 Tower kernel: Buffer I/O error on device sda, logical block 333690651 (Errors)
Apr 30 09:23:08 Tower kernel: Buffer I/O error on device sda, logical block 333690652 (Errors)
Apr 30 09:23:08 Tower kernel: Buffer I/O error on device sda, logical block 333690653 (Errors)
Apr 30 09:23:08 Tower kernel: Buffer I/O error on device sda, logical block 333690654 (Errors)
Apr 30 09:23:08 Tower kernel: Buffer I/O error on device sda, logical block 333690655 (Errors)
Apr 30 09:23:08 Tower kernel: ata1: EH complete (Drive related)

It's in the post read part of the preclear.. Maybe by this evening it will be done, but probably tomorrow morning.
RobJ Posted May 1, 2015

The UNC flags indicate UNCorrectable sectors. This was only a selection of the associated error lines. There should also be exception lines indicating 'media error'. You've got more bad sectors. Sorry.
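For anyone wanting to pull the failing locations out of a syslog like the one in this thread, here is a minimal Python sketch. It assumes the `end_request: I/O error, dev ..., sector ...` wording shown in the log above, and it assumes 512-byte sectors and 4096-byte buffer blocks. That size assumption is not stated anywhere in the thread; it is just consistent with the sector and logical block numbers that appear together in the log. The function names are made up for the example.

```python
import re

# Matches the kernel's "end_request" I/O error lines as they appear in this thread.
SECTOR_RE = re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)")


def failed_sectors(log_text):
    """Return (device, sector) pairs for every end_request I/O error in the log text."""
    return [(m.group(1), int(m.group(2))) for m in SECTOR_RE.finditer(log_text)]


def sector_to_block(sector, sector_size=512, block_size=4096):
    """Convert a sector number to a buffer block number (sizes are assumptions)."""
    return sector * sector_size // block_size


sample = """\
Apr 30 09:06:12 Tower kernel: end_request: I/O error, dev sda, sector 2551149104 (Errors)
Apr 30 09:06:12 Tower kernel: Buffer I/O error on device sda, logical block 318893638 (Errors)
Apr 30 09:23:08 Tower kernel: end_request: I/O error, dev sda, sector 2669525192 (Errors)
Apr 30 09:23:08 Tower kernel: Buffer I/O error on device sda, logical block 333690649 (Errors)
"""

for device, sector in failed_sectors(sample):
    print(device, sector, sector_to_block(sector))
```

Dividing each reported sector by 8 gives the first logical block reported right after it, so the two kinds of line describe the same spot on the disk rather than separate failures.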
jbuszkie (Author) Posted May 1, 2015

There were more errors like that.. Here are the results of the preclear..

========================================================================1.15
== ST4000VN000-1H4168 Z3013V86
== Disk /dev/sda has been successfully precleared
== with a starting sector of 1
============================================================================
** Changed attributes in files: /tmp/smart_start_sda /tmp/smart_finish_sda
ATTRIBUTE NEW_VAL OLD_VAL FAILURE_THRESHOLD STATUS RAW_VALUE
Raw_Read_Error_Rate = 117 112 6 ok 161315704
Spin_Retry_Count = 100 100 97 near_thresh 0
Unknown_Attribute = 100 100 99 near_thresh 0
Reported_Uncorrect = 18 69 0 near_thresh 82
High_Fly_Writes = 1 1 0 near_thresh 129
Airflow_Temperature_Cel = 74 78 45 ok 26
Temperature_Celsius = 26 22 0 ok 26
No SMART attributes are FAILING_NOW

64 sectors were pending re-allocation before the start of the preclear.
56 sectors were pending re-allocation after pre-read in cycle 1 of 1.
48 sectors were pending re-allocation after zero of disk in cycle 1 of 1.
32 sectors are pending re-allocation at the end of the preclear, a change of -32 in the number of sectors pending re-allocation.
0 sectors had been re-allocated before the start of the preclear.
0 sectors are re-allocated at the end of the preclear, the number of sectors re-allocated did not change.

Half of the pending sectors cleared themselves out. Attached is the full preclear report with the SMART results. I wish it was cut and dried.. none of the pending sectors got reallocated. Should I try to RMA the drive anyway? Should I run another preclear?

Preclear_SDA_results.txt
itimpi Posted May 1, 2015

If the drive is eligible for RMA then I would suggest that you go ahead and do it. Ideally you want any Pending sectors to go immediately to zero on a pre-clear run. The fact they do not suggests some sort of lingering problem with the drive.
RobJ Posted May 1, 2015

I completely concur. You had pending sectors drop on both the preread and the post read (which should never happen), and then not be cleared to zero on the zeroing phase! Something is really wrong with that drive. Another Preclear, even if perfect, would not restore my confidence in it. The only other thing I could suggest is to try the manufacturer's test tool on the drive, more from curiosity than anything else.
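The two warning signs named above (pending sectors dropping during a read phase, and pending sectors not reaching zero after zeroing) can be checked mechanically against the summary lines of a preclear report. The sketch below is a hypothetical helper, not part of the preclear script. It assumes the exact "sectors were pending re-allocation ..." wording seen in the reports posted in this thread, and it treats the "at the end of the preclear" line as the result of the post-read when no separate post-read line is printed.

```python
import re

# Summary lines as printed in the preclear reports quoted in this thread.
LINE_RE = re.compile(
    r"(\d+) sectors? (?:were|was|are|is) pending re-allocation "
    r"(before the start|after pre-read|after zero of disk|after post-read|at the end)"
)

# Phases in which the disk is only being read, not written.
READ_PHASES = ("after pre-read", "after post-read", "at the end")


def review_preclear(report):
    """Return a list of warnings based on the two rules of thumb from this thread."""
    warnings = []
    previous = None
    for count_text, phase in LINE_RE.findall(report):
        count = int(count_text)
        if phase == "after zero of disk" and count != 0:
            warnings.append(f"{count} sectors still pending after zeroing")
        if phase in READ_PHASES and previous is not None and count < previous:
            warnings.append(f"pending count dropped from {previous} to {count} {phase}")
        previous = count
    return warnings


report = """\
64 sectors were pending re-allocation before the start of the preclear.
56 sectors were pending re-allocation after pre-read in cycle 1 of 1.
48 sectors were pending re-allocation after zero of disk in cycle 1 of 1.
32 sectors are pending re-allocation at the end of the preclear, a change of -32 in the number of sectors pending re-allocation.
"""

for warning in review_preclear(report):
    print(warning)
```

Run against the ST4000VN000 report, this flags all three transitions, which is the same reading of the numbers that led to the RMA advice. It deliberately does not flag pending sectors that newly appear during a read phase; that is a separate concern raised earlier in the thread.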
jbuszkie (Author) Posted May 1, 2015

That's what I figured... I assume there's enough ammunition to merit an RMA? Sigh! Looks like my other drive will have to be a parity drive for a couple weeks! grr... Thanks for the input. Jim
FreeMan Posted May 19, 2015

Hey all, been a while since I've stopped by. unRaid is just that good! This isn't even an unRaid issue, but I figure the folks here are pretty helpful, so I thought I'd post this:

ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
1 Raw_Read_Error_Rate POSR-K 001 001 051 NOW 68478
2 Throughput_Performance -OS--K 252 252 000 - 0
3 Spin_Up_Time PO---K 083 074 025 - 5229
4 Start_Stop_Count -O--CK 100 100 000 - 53
5 Reallocated_Sector_Ct PO--CK 252 252 010 - 0
7 Seek_Error_Rate -OSR-K 252 252 051 - 0
8 Seek_Time_Performance --S--K 252 252 015 - 0
9 Power_On_Hours -O--CK 100 100 000 - 15210
10 Spin_Retry_Count -O--CK 252 252 051 - 0
11 Calibration_Retry_Count -O--CK 100 100 000 - 8
12 Power_Cycle_Count -O--CK 100 100 000 - 57
191 G-Sense_Error_Rate -O---K 099 099 000 - 10507
192 Power-Off_Retract_Count -O---K 252 252 000 - 0
194 Temperature_Celsius -O---- 063 058 000 - 37 (Min/Max 16/43)
195 Hardware_ECC_Recovered -O-RCK 100 100 000 - 0
196 Reallocated_Event_Count -O--CK 252 252 000 - 0
197 Current_Pending_Sector -O--CK 097 097 000 - 67
198 Offline_Uncorrectable ----CK 252 252 000 - 0
199 UDMA_CRC_Error_Count -OS-CK 200 200 000 - 0
200 Multi_Zone_Error_Rate -O-R-K 100 100 000 - 3019
223 Load_Retry_Count -O--CK 100 100 000 - 8
225 Load_Cycle_Count -O--CK 100 100 000 - 68

Device Error Count: 58753 (device log contains only the most recent 8 errors)

That's the SMART output of my Windows 7 box system drive. I'm no expert, but I'm thinking that's looking pretty bad at this point. Thoughts? TIA!
RobJ Posted May 19, 2015

Raw_Read_Error_Rate has bottomed out, far below its Threshold. If you can read the drive at all, salvage everything important off it ASAP. Because it's FAILING SMART, it should be an easy RMA, if that's possible, and in warranty. Just curious, what drive model is that? I've never seen an ATA error listing of 8, it's always the last 5 errors, not 8. Plus, it uses 252 for as-yet-unused values, somewhat unusual.
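The "bottomed out, far below its Threshold" reading comes from comparing the normalized VALUE column with the THRESH column, not from the raw value. A small Python sketch of that comparison follows. It assumes rows in the column order shown in the table posted above (ID, name, flags, VALUE, WORST, THRESH, ...), and the helper names are invented for the example.

```python
def parse_row(line):
    """Parse one attribute row in the ID/NAME/FLAGS/VALUE/WORST/THRESH column order."""
    parts = line.split()
    return {
        "id": int(parts[0]),
        "name": parts[1],
        "value": int(parts[3]),   # normalized value, e.g. "001" -> 1
        "worst": int(parts[4]),
        "thresh": int(parts[5]),
    }


def failing_now(rows):
    """Return the names of attributes whose normalized value is at or below threshold."""
    return [row["name"] for row in rows if row["value"] <= row["thresh"]]


table = """\
1 Raw_Read_Error_Rate POSR-K 001 001 051 NOW 68478
5 Reallocated_Sector_Ct PO--CK 252 252 010 - 0
197 Current_Pending_Sector -O--CK 097 097 000 - 67
"""

rows = [parse_row(line) for line in table.splitlines()]
print(failing_now(rows))
```

Only Raw_Read_Error_Rate trips the check here. Current_Pending_Sector has a threshold of 000, so it can never be reported as failing this way even with 67 pending sectors, which is why the raw pending count still has to be read separately.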
FreeMan Posted May 19, 2015

"Just curious, what drive model is that? I've never seen an ATA error listing of 8, it's always the last 5 errors, not 8. Plus, it uses 252 for as-yet-unused values, somewhat unusual."

=== START OF INFORMATION SECTION ===
Model Family: Seagate Samsung Spinpoint F4
Device Model: ST320DM001 HD322GJ
Serial Number: S2BJJ90C816980
LU WWN Device Id: 5 0004cf 208350c31
Firmware Version: 1AR10001
User Capacity: 320,072,933,376 bytes [320 GB]
Sector Size: 512 bytes logical/physical
Rotation Rate: 7200 rpm
Form Factor: 3.5 inches
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA8-ACS T13/1699-D revision 6
SATA Version is: SATA 2.6, 3.0 Gb/s
Local Time is: Mon May 18 19:42:23 2015 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is: Disabled
APM feature is: Disabled
Rd look-ahead is: Enabled
Write cache is: Enabled
ATA Security is: Disabled, frozen [SEC2]
Wt Cache Reorder: Enabled

It's been spinning for more than 2 years and was a cheapie that came with the 'bare bones' kit from Tiger Direct. Time for an upgrade so I'm throwing a shiny new SSD in there.
summeranne Posted May 19, 2015

Is this worrisome? In the final screen it says: 1 sector was pending re-allocation after post-read in cycle 1 of 3. Subsequent cycles it is 0.

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x00) Offline data collection activity was never started. Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run.
Total time to complete Offline data collection: (51420) seconds.
Offline data collection capabilities: (0x7b) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. Offline surface scan supported. Self-test supported. Conveyance Self-test supported. Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported. General Purpose Logging supported.
Short self-test routine recommended polling time: ( 2) minutes.
Extended self-test routine recommended polling time: ( 514) minutes.
Conveyance self-test routine recommended polling time: ( 5) minutes.
SCT capabilities: (0x703d) SCT Status supported. SCT Error Recovery Control supported. SCT Feature Control supported. SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 6
3 Spin_Up_Time 0x0027 182 181 021 Pre-fail Always - 7891
4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 34
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
7 Seek_Error_Rate 0x002e 100 253 000 Old_age Always - 0
9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 282
10 Spin_Retry_Count 0x0032 100 253 000 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 100 253 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 13
192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 6
193 Load_Cycle_Count 0x0032 200 200 000 Old_age Always - 71
194 Temperature_Celsius 0x0022 110 107 000 Old_age Always - 42
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 100 253 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0008 100 253 000 Old_age Offline - 0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
itimpi Posted May 19, 2015

Is this worrisome? In the final screen it says: 1 sector was pending re-allocation after post-read in cycle 1 of 3. Subsequent cycles it is 0.

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x00) Offline data collection activity was never started.
    Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run.
Total time to complete Offline data collection: (51420) seconds.
Offline data collection capabilities: (0x7b) SMART execute Offline immediate.
    Auto Offline data collection on/off support.
    Suspend Offline collection upon new command.
    Offline surface scan supported.
    Self-test supported.
    Conveyance Self-test supported.
    Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode.
    Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
    General Purpose Logging supported.
Short self-test routine recommended polling time: ( 2) minutes.
Extended self-test routine recommended polling time: ( 514) minutes.
Conveyance self-test routine recommended polling time: ( 5) minutes.
SCT capabilities: (0x703d) SCT Status supported.
    SCT Error Recovery Control supported.
    SCT Feature Control supported.
    SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       6
  3 Spin_Up_Time            0x0027   182   181   021    Pre-fail  Always       -       7891
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       34
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   100   253   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       282
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       13
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       6
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       71
194 Temperature_Celsius     0x0022   110   107   000    Old_age   Always       -       42
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

I would expect that the drive is OK, as the pending sectors went back to 0 and there are no reallocated sectors. However, it is probably a good idea to keep an eye on it, just to check it stays there.
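For anyone who would rather script the "keep an eye on it" step than read the table by eye, here is a minimal Python sketch. It assumes the report text has the column layout of the smartctl attribute table pasted in this thread, and it only pulls out the raw values of attributes 5, 197 and 198. The names `watched_raw_values` and `no_bad_sector_evidence` are made up for this illustration, and the "all zero" rule is simply the reasoning from the post above, not a vendor threshold.

```python
import re
from typing import Dict

# Attribute IDs the reasoning above relies on:
# 5 = Reallocated_Sector_Ct, 197 = Current_Pending_Sector, 198 = Offline_Uncorrectable.
WATCHED_IDS = {5, 197, 198}

# One attribute row: ID, NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE.
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]{4}"  # ID, name, flag
    r"\s+\d+\s+\d+\s+\d+"                    # VALUE, WORST, THRESH
    r"\s+\S+\s+\S+\s+\S+"                    # TYPE, UPDATED, WHEN_FAILED
    r"\s+(\d+)"                              # leading integer of RAW_VALUE
)


def watched_raw_values(report: str) -> Dict[str, int]:
    """Return {attribute name: raw value} for the watched attribute IDs."""
    values = {}
    for line in report.splitlines():
        match = ROW.match(line)
        if match and int(match.group(1)) in WATCHED_IDS:
            values[match.group(2)] = int(match.group(3))
    return values


def no_bad_sector_evidence(values: Dict[str, int]) -> bool:
    """True only if at least one watched attribute was found and all are zero."""
    return bool(values) and all(raw == 0 for raw in values.values())


# Rows copied from the report quoted above.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
"""

if __name__ == "__main__":
    found = watched_raw_values(SAMPLE)
    print(found, no_bad_sector_evidence(found))
```

Running the same check against a freshly saved report every so often is one way to confirm the counts stay at zero. A non-zero value is not a verdict by itself, only a prompt to look closer.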
Trap Posted May 20, 2015

Hi all. I'm a new member here and just getting my first unRAID server set up. I've been researching unRAID for years now, but just finally took the plunge to gather all my components and set everything up. Anyways, I ran the preclear script on my first three drives. After talking with some other people, I think I know what to look for but thought I would run it by the members here to see what you thought. I know some say any sectors pending reallocation is bad, but is 2 in one instance cause enough not to use the drive? Any thoughts are appreciated! See my complete results below:

========================================================================1.15
== invoked as: ./preclear_disk.sh -r 65536 -w 65536 -b 2000 -A -c 3 /dev/sdd
== SAMSUNGHD203WI S1UYJ1RZ525375
== Disk /dev/sdd has been successfully precleared
== with a starting sector of 64
== Ran 3 cycles
==
== Using :Read block size = 65536 Bytes
== Last Cycle's Pre Read Time : 8:54:34 (62 MB/s)
== Last Cycle's Zeroing time : 6:39:54 (83 MB/s)
== Last Cycle's Post Read Time : 15:28:01 (35 MB/s)
== Last Cycle's Total Time : 22:09:00
==
== Total Elapsed Time 75:19:39
==
== Disk Start Temperature: 21C
==
== Current Disk Temperature: 25C,
==
============================================================================
No SMART attributes are FAILING_NOW

0 sectors were pending re-allocation before the start of the preclear.
0 sectors were pending re-allocation after pre-read in cycle 1 of 3.
0 sectors were pending re-allocation after zero of disk in cycle 1 of 3.
0 sectors were pending re-allocation after post-read in cycle 1 of 3.
0 sectors were pending re-allocation after zero of disk in cycle 2 of 3.
2 sectors were pending re-allocation after post-read in cycle 2 of 3.
0 sectors were pending re-allocation after zero of disk in cycle 3 of 3.
0 sectors are pending re-allocation at the end of the preclear,
    the number of sectors pending re-allocation did not change.
0 sectors had been re-allocated before the start of the preclear.
0 sectors are re-allocated at the end of the preclear,
    the number of sectors re-allocated did not change.
============================================================================
============================================================================
==
== S.M.A.R.T Initial Report for /dev/sdd
==
Disk: /dev/sdd
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.19.4-unRAID] (local build)
Copyright © 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family: SAMSUNG SpinPoint F3 EG
Device Model: SAMSUNG HD203WI
Serial Number: S1UYJ1RZ525375
LU WWN Device Id: 5 0024e9 003991ccd
Firmware Version: 1AN10003
User Capacity: 2,000,398,934,016 bytes [2.00 TB]
Sector Size: 512 bytes logical/physical
Rotation Rate: 5400 rpm
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA8-ACS T13/1699-D revision 6
SATA Version is: SATA 2.6, 3.0 Gb/s
Local Time is: Sat May 16 12:29:24 2015 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x00) Offline data collection activity was never started.
    Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run.
Total time to complete Offline data collection: (25440) seconds.
Offline data collection capabilities: (0x5b) SMART execute Offline immediate.
    Auto Offline data collection on/off support.
    Suspend Offline collection upon new command.
    Offline surface scan supported.
    Self-test supported.
    No Conveyance Self-test supported.
    Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode.
    Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
    General Purpose Logging supported.
Short self-test routine recommended polling time: ( 2) minutes.
Extended self-test routine recommended polling time: ( 424) minutes.
SCT capabilities: (0x003f) SCT Status supported.
    SCT Error Recovery Control supported.
    SCT Feature Control supported.
    SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   100   051    Pre-fail  Always       -       569
  2 Throughput_Performance  0x0026   252   252   000    Old_age   Always       -       0
  3 Spin_Up_Time            0x0023   060   059   025    Pre-fail  Always       -       12347
  4 Start_Stop_Count        0x0032   097   097   000    Old_age   Always       -       3332
  5 Reallocated_Sector_Ct   0x0033   252   252   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   252   252   051    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0024   252   252   015    Old_age   Offline      -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       13235
 10 Spin_Retry_Count        0x0032   252   252   051    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   252   252   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       1503
191 G-Sense_Error_Rate      0x0022   099   099   000    Old_age   Always       -       13921
192 Power-Off_Retract_Count 0x0022   252   252   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0002   064   051   000    Old_age   Always       -       21 (Min/Max 13/49)
195 Hardware_ECC_Recovered  0x003a   100   100   000    Old_age   Always       -       0
196 Reallocated_Event_Count 0x0032   252   252   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   252   252   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   252   252   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0036   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x002a   100   100   000    Old_age   Always       -       758
223 Load_Retry_Count        0x0032   252   252   000    Old_age   Always       -       0
225 Load_Cycle_Count        0x0032   094   094   000    Old_age   Always       -       65174

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description  Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline     Completed without error  00%        12143            -
# 2  Short offline     Completed without error  00%        10118            -

SMART Selective self-test log data structure revision number 0
Note: revision number not 1 implies that no selective self-test has ever been run
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Completed [00% left] (0-65535)
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
==
============================================================================
============================================================================
==
== S.M.A.R.T Final Report for /dev/sdd
==
Disk: /dev/sdd
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.19.4-unRAID] (local build)
Copyright © 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family: SAMSUNG SpinPoint F3 EG
Device Model: SAMSUNG HD203WI
Serial Number: S1UYJ1RZ525375
LU WWN Device Id: 5 0024e9 003991ccd
Firmware Version: 1AN10003
User Capacity: 2,000,398,934,016 bytes [2.00 TB]
Sector Size: 512 bytes logical/physical
Rotation Rate: 5400 rpm
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA8-ACS T13/1699-D revision 6
SATA Version is: SATA 2.6, 3.0 Gb/s
Local Time is: Tue May 19 15:49:03 2015 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x00) Offline data collection activity was never started.
    Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run.
Total time to complete Offline data collection: (25440) seconds.
Offline data collection capabilities: (0x5b) SMART execute Offline immediate.
    Auto Offline data collection on/off support.
    Suspend Offline collection upon new command.
    Offline surface scan supported.
    Self-test supported.
    No Conveyance Self-test supported.
    Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode.
    Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
    General Purpose Logging supported.
Short self-test routine recommended polling time: ( 2) minutes.
Extended self-test routine recommended polling time: ( 424) minutes.
SCT capabilities: (0x003f) SCT Status supported.
    SCT Error Recovery Control supported.
    SCT Feature Control supported.
    SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   100   051    Pre-fail  Always       -       890
  2 Throughput_Performance  0x0026   252   252   000    Old_age   Always       -       0
  3 Spin_Up_Time            0x0023   060   059   025    Pre-fail  Always       -       12347
  4 Start_Stop_Count        0x0032   097   097   000    Old_age   Always       -       3332
  5 Reallocated_Sector_Ct   0x0033   252   252   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   252   252   051    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0024   252   252   015    Old_age   Offline      -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       13310
 10 Spin_Retry_Count        0x0032   252   252   051    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   252   252   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       1503
191 G-Sense_Error_Rate      0x0022   099   099   000    Old_age   Always       -       13921
192 Power-Off_Retract_Count 0x0022   252   252   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0002   064   051   000    Old_age   Always       -       25 (Min/Max 13/49)
195 Hardware_ECC_Recovered  0x003a   100   100   000    Old_age   Always       -       0
196 Reallocated_Event_Count 0x0032   252   252   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   252   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   252   252   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0036   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x002a   100   100   000    Old_age   Always       -       760
223 Load_Retry_Count        0x0032   252   252   000    Old_age   Always       -       0
225 Load_Cycle_Count        0x0032   094   094   000    Old_age   Always       -       65174

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description  Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline     Completed without error  00%        12143            -
# 2  Short offline     Completed without error  00%        10118            -

SMART Selective self-test log data structure revision number 0
Note: revision number not 1 implies that no selective self-test has ever been run
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Completed [00% left] (0-65535)
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
==
============================================================================
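Comparing the initial and final reports by eye is tedious, so here is a rough Python sketch of how the two could be compared, together with the per-cycle pending counts from the preclear summary. It assumes the text has the same wording and column layout as the output pasted above. The function names (`pending_history`, `attributes`, `changed`) are hypothetical helpers written for this post and are not part of preclear or smartctl.

```python
import re
from typing import Dict, List, Tuple

# "<count> sector(s) were/are/was pending re-allocation <stage>" lines from the summary.
PENDING = re.compile(
    r"^\s*(\d+) sectors? (?:were|are|was) pending re-allocation (.+?)[.,]?\s*$"
)

# One attribute row: ID, NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW.
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]{4}"
    r"\s+(\d+)\s+(\d+)\s+(\d+)"
    r"\s+\S+\s+\S+\s+\S+\s+(\d+)"
)


def pending_history(summary: str) -> List[Tuple[str, int]]:
    """Return (stage, pending count) pairs in the order they were reported."""
    history = []
    for line in summary.splitlines():
        match = PENDING.match(line)
        if match:
            history.append((match.group(2), int(match.group(1))))
    return history


def attributes(report: str) -> Dict[str, Tuple[int, int]]:
    """Map attribute name -> (WORST, leading integer of RAW_VALUE)."""
    table = {}
    for line in report.splitlines():
        match = ROW.match(line)
        if match:
            table[match.group(2)] = (int(match.group(4)), int(match.group(6)))
    return table


def changed(initial: str, final: str) -> Dict[str, Tuple[Tuple[int, int], Tuple[int, int]]]:
    """Attributes whose WORST or raw value differs between the two reports."""
    before, after = attributes(initial), attributes(final)
    return {
        name: (before[name], after[name])
        for name in before
        if name in after and before[name] != after[name]
    }


# Lines copied from the post above (a subset, to keep the example short).
SUMMARY = """\
0 sectors were pending re-allocation after zero of disk in cycle 2 of 3.
2 sectors were pending re-allocation after post-read in cycle 2 of 3.
0 sectors were pending re-allocation after zero of disk in cycle 3 of 3.
0 sectors are pending re-allocation at the end of the preclear,
"""
INITIAL = """\
  1 Raw_Read_Error_Rate     0x002f   100   100   051    Pre-fail  Always       -       569
  5 Reallocated_Sector_Ct   0x0033   252   252   010    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   252   252   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x002a   100   100   000    Old_age   Always       -       758
"""
FINAL = """\
  1 Raw_Read_Error_Rate     0x002f   100   100   051    Pre-fail  Always       -       890
  5 Reallocated_Sector_Ct   0x0033   252   252   010    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   252   100   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x002a   100   100   000    Old_age   Always       -       760
"""

if __name__ == "__main__":
    history = pending_history(SUMMARY)
    print("peak pending:", max(count for _, count in history))
    print("final pending:", history[-1][1])
    for name, (before, after) in changed(INITIAL, FINAL).items():
        print(name, before, "->", after)
```

On these lines the peak pending count is 2 and the final count is 0, with Reallocated_Sector_Ct unchanged at 0. The raw Raw_Read_Error_Rate and Multi_Zone_Error_Rate values rose over the run, and the WORST column for Current_Pending_Sector dropped from 252 to 100, which appears to be the only trace left in the final report of the two sectors seen in cycle 2. What those changes mean for this particular drive is a judgment call the script does not make.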