garycase Posted January 8, 2014 They all look great. Enjoy your server. Is it normal for the Post Read on all his drives to be nearly half the Preread and Zeroing speeds? Yes. In rough terms, the pre-read takes 1/4 of the time for a cycle, the actual pre-clear (the zeroing pass) takes another 1/4, and the post-read takes 1/2 of the overall time.
Superorb Posted January 8, 2014 Share Posted January 8, 2014 Gotcha, thanks guys. Quote Link to comment
Joe L. Posted January 9, 2014 They all look great. Enjoy your server. Is it normal for the Post Read on all his drives to be nearly half the Preread and Zeroing speeds? Yes, perfectly normal. The post-read is where the written zeros are verified, and that verification takes the additional time.
Superorb Posted January 9, 2014 Share Posted January 9, 2014 Yes, perfectly normal .... it is in the post-read where the verification of written zeros is performed. That takes the additional time. Thanks for confirming. Quote Link to comment
Fireball3 Posted January 21, 2014 Share Posted January 21, 2014 RMA'd a Seagate ES2 drive and got back a repaired one. Ran preclear 3 times and ... see the results. What about those failures in the SMART log? preclear_reports.zip Quote Link to comment
RobJ Posted January 23, 2014 RMA'd a Seagate ES2 drive and got back a repaired one. Ran preclear 3 times and ... see the results. What about those failures in the SMART log? My guess is that you received a drive with some problems that had been 'repaired' by clearing the SMART tables, which masked the problems, so the first test runs re-exposed them. The first 2 Preclears seem to have dealt with most of them, and the third looks much better, but I'm not confident you've uncovered ALL of the marginal sectors yet. I'd run 2 or 3 more Preclears, and I'd only feel more confident after at least 2 passes with NO further changes: no more Current Pending sectors at any phase, no additional Uncorrectables, no additional Reallocated sectors. If you're interested and have time, you might also try a full badblocks run with the -w option (a destructive write-mode test). The other possibility is that it's a bad drive and it's going to continue getting worse. I suspect that after another Preclear you will either know it's bad or decide that you aren't willing to trust the drive, even if it starts behaving and produces clean reports.
FreeMan Posted January 23, 2014 Share Posted January 23, 2014 RMA'd a Seagate ES2 drive and got back a repaired one. Ran preclear 3 times and ... see the results. What about those failures in the SMART log? My guess is you received a drive with some problems, that had been 'repaired' by clearing the SMART tables, masking the problems, so the first test runs re-exposed the problems. The first 2 Preclears seemed to have dealt with most of them, and the third looks much better, but I'm not confident you've uncovered ALL of the marginal sectors yet. I'd run 2 or 3 more Preclears, and I'd only feel more confident if I had at least 2 passes with NO further changes, no more Current Pending sectors at any phase, no additional Uncorrectables, no additional Reallocated sectors. If interested and have time, you might also try a full badblocks run with the -w option. The other possibility is that it's a bad drive, and it's going to continue getting worse. I suspect that after another Preclear, you will either know it's bad or may decide that you aren't willing to trust the drive, even if it starts behaving, has clean reports. I would suggest keeping all the pre-clear SMART results, just in case Seagate questions why you want to return a drive they just shipped you. From what I've read around here, they don't normally question it, but returning a just shipped drive might raise an eyebrow somewhere. What's a few K of disk space for a few weeks, just to be on the safe side... Quote Link to comment
Fireball3 Posted January 24, 2014 Thank you for your feedback. Of course I will keep the logs; they are saved on the unRAID website. In fact, the bad blocks don't give me headaches. I'm more worried about these log entries:

Error 418 occurred at disk power-on lifetime: 55 hours (2 days + 7 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 08 ff ff ff 4f 00   1d+02:05:45.236  READ FPDMA QUEUED
  27 00 00 00 00 00 e0 00   1d+02:05:45.209  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  ec 00 00 00 00 00 a0 00   1d+02:05:45.207  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 00   1d+02:05:45.194  SET FEATURES [set transfer mode]
  27 00 00 00 00 00 e0 00   1d+02:05:45.166  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]

Meanwhile I tested a second drive of the same model, and it logs the same errors as this one. I've opened a ticket with Seagate support. I'm curious about their answers.
Fireball3 Posted January 24, 2014 LOL Just received word from Seagate. Claudia (that's her name) tells me that she couldn't open the file attached to the ticket. As she doesn't know what tool I used, she makes clear that only results from SeaTools are valid to her (Seagate). If SeaTools doesn't indicate any problem, the drive is OK. Wow, that's the skill level that we're faced with... (FYI, I attached the preclear_finish...) Then she tells me that if I am referring to bad blocks, there would be no problem because they can be "replaced and repaired" in large quantities. Cool eh!? My drives CAN DO THAT! In my answer I copied the content of the preclear file into the mail, but I guess there won't be an enlightening answer from Seagate on this topic... A perfect start to the weekend.
RobJ Posted January 24, 2014 Share Posted January 24, 2014 In fact, the bad blocks don't give me headaches. I'm more worried about this log entries: Error 418 occurred at disk power-on lifetime: 55 hours (2 days + 7 hours) When the command that caused the error occurred, the device was active or idle. After command completion occurred, registers were: ER ST SC SN CL CH DH -- -- -- -- -- -- -- 40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455 Commands leading to the command that caused the error were: CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name -- -- -- -- -- -- -- -- ---------------- -------------------- 60 00 08 ff ff ff 4f 00 1d+02:05:45.236 READ FPDMA QUEUED 27 00 00 00 00 00 e0 00 1d+02:05:45.209 READ NATIVE MAX ADDRESS EXT [OBS-ACS-3] ec 00 00 00 00 00 a0 00 1d+02:05:45.207 IDENTIFY DEVICE ef 03 46 00 00 00 a0 00 1d+02:05:45.194 SET FEATURES [set transfer mode] 27 00 00 00 00 00 e0 00 1d+02:05:45.166 READ NATIVE MAX ADDRESS EXT [OBS-ACS-3] Meanwhile I tested a second drive - the same model - and it logs the same errors as this one. Those ARE the bad blocks, in more detail and only the last 5. UNC is short for UNCorrectable, so "Error: UNC at LBA = 0x0fffffff = 268435455" roughly means "bad block at 268435455". You probably got a typical response, from any manufacturing rep. But I would expect SeaTools to provide a similar report. Quote Link to comment
UhClem Posted January 24, 2014 Share Posted January 24, 2014 In fact, the bad blocks don't give me headaches. I'm more worried about this log entries: Error 418 occurred at disk power-on lifetime: 55 hours (2 days + 7 hours) When the command that caused the error occurred, the device was active or idle. After command completion occurred, registers were: ER ST SC SN CL CH DH -- -- -- -- -- -- -- 40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455 Commands leading to the command that caused the error were: CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name -- -- -- -- -- -- -- -- ---------------- -------------------- 60 00 08 ff ff ff 4f 00 1d+02:05:45.236 READ FPDMA QUEUED 27 00 00 00 00 00 e0 00 1d+02:05:45.209 READ NATIVE MAX ADDRESS EXT [OBS-ACS-3] ec 00 00 00 00 00 a0 00 1d+02:05:45.207 IDENTIFY DEVICE ef 03 46 00 00 00 a0 00 1d+02:05:45.194 SET FEATURES [set transfer mode] 27 00 00 00 00 00 e0 00 1d+02:05:45.166 READ NATIVE MAX ADDRESS EXT [OBS-ACS-3] Meanwhile I tested a second drive - the same model - and it logs the same errors as this one. Those ARE the bad blocks, in more detail and only the last 5. UNC is short for UNCorrectable, so "Error: UNC at LBA = 0x0fffffff = 268435455" roughly means "bad block at 268435455". Not this time ... Look at the fuller "picture"-- first, always be a little suspicious of numbers that are "all ones" (ie, 0x0fffffff); then look carefully at the preceding commands in the error log for the conclusive clue. "The devil is in the details." --UhClem Quote Link to comment
RobJ Posted January 24, 2014 Share Posted January 24, 2014 Those ARE the bad blocks, in more detail and only the last 5. UNC is short for UNCorrectable, so "Error: UNC at LBA = 0x0fffffff = 268435455" roughly means "bad block at 268435455". Not this time ... Look at the fuller "picture"-- first, always be a little suspicious of numbers that are "all ones" (ie, 0x0fffffff); then look carefully at the preceding commands in the error log for the conclusive clue. --UhClem Oops, you are absolutely right. I didn't recognize that number in decimal. Not sure what to make of it though, need a lot more context, more of the code path to here. If you check his last SMART report, the 5 last errors show alternating like the above, then simple reads, then repeat. If I had to guess (and that is all I can do here), I would say there is a firmware issue. The LBA, even if it is 0x0fffffff, appears to be valid, about at the 137GB point. But in this small context, is probably a mask, and appears to be a part of an internal reset, possibly an internal crash? If my 'guess' is correct, then you cannot trust this drive. UhClem, I'd like to hear your opinion. Quote Link to comment
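As a quick sanity check on the number being discussed: 0x0fffffff is the largest value that fits in 28 bits, i.e. the highest sector a 28-bit LBA field can express. The sketch below only does the arithmetic. The 512-byte sector size is an assumption (it is not stated in the posted log), and the helper name `is_all_ones` is made up for illustration.

```python
SECTOR_BYTES = 512  # assumption: 512-byte logical sectors


def is_all_ones(value: int, bits: int) -> bool:
    """Return True if `value` is the largest number representable in `bits` bits."""
    return value == (1 << bits) - 1


lba = 0x0FFFFFFF
byte_offset = lba * SECTOR_BYTES

print(lba)                                # decimal form, as smartctl prints it
print(is_all_ones(lba, 28))               # is the 28-bit field saturated?
print(round(byte_offset / 1e9, 1), "GB")  # approximate position on the disk
```

Under that assumption the offset works out to roughly 137 GB, which is where the "137GB point" figure comes from, and the saturated 28-bit field is why an all-ones value deserves the suspicion UhClem describes.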
manny Posted January 25, 2014 Hi All, I'm having a problem with my preclear, please help. I already have 2 data drives and 1 parity drive in my unRAID server. I just got the Plus license today and started the preclear for my 4th drive. This is a WD Red NAS drive. I got the following errors when I ran it the first time:

Jan 26 00:55:29 Manitower kernel: ata5.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0
Jan 26 00:55:29 Manitower kernel: ata5.00: irq_stat 0x40000008
Jan 26 00:55:29 Manitower kernel: ata5.00: failed command: READ FPDMA QUEUED
Jan 26 00:55:29 Manitower kernel: ata5.00: cmd 60/08:00:40:c1:da/00:00:03:00:00/40 tag 0 ncq 4096 in
Jan 26 00:55:29 Manitower kernel: res 41/40:00:40:c1:da/00:00:03:00:00/40 Emask 0x409 (media error) <F>
Jan 26 00:55:29 Manitower kernel: ata5.00: status: { DRDY ERR }
Jan 26 00:55:29 Manitower kernel: ata5.00: error: { UNC }
Jan 26 00:55:29 Manitower kernel: ata5.00: configured for UDMA/133
Jan 26 00:55:29 Manitower kernel: ata5: EH complete
Jan 26 00:55:33 Manitower kernel: ata5.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0
Jan 26 00:55:33 Manitower kernel: ata5.00: irq_stat 0x40000008
Jan 26 00:55:33 Manitower kernel: ata5.00: failed command: READ FPDMA QUEUED
Jan 26 00:55:33 Manitower kernel: ata5.00: cmd 60/08:00:40:c1:da/00:00:03:00:00/40 tag 0 ncq 4096 in
Jan 26 00:55:33 Manitower kernel: res 41/40:00:40:c1:da/00:00:03:00:00/40 Emask 0x409 (media error) <F>

Then I replaced the cable, connected the drive to a different SATA port, and ran the preclear again. Again within 1% I started getting errors, and the speed dropped to 15-25 MB/s:

Jan 26 01:10:00 Manitower kernel: ata5.00: irq_stat 0x40000008
Jan 26 01:10:00 Manitower kernel: ata5.00: failed command: READ FPDMA QUEUED
Jan 26 01:10:00 Manitower kernel: ata5.00: cmd 60/40:00:c0:36:a4/00:00:03:00:00/40 tag 0 ncq 32768 in
Jan 26 01:10:00 Manitower kernel: res 41/40:00:c0:36:a4/00:00:03:00:00/40 Emask 0x409 (media error) <F>
Jan 26 01:10:00 Manitower kernel: ata5.00: status: { DRDY ERR }
Jan 26 01:10:00 Manitower kernel: ata5.00: error: { UNC }
Jan 26 01:10:00 Manitower kernel: ata5.00: configured for UDMA/133
Jan 26 01:10:00 Manitower kernel: sd 4:0:0:0: [sdd] Unhandled sense code
Jan 26 01:10:00 Manitower kernel: sd 4:0:0:0: [sdd] Result: hostbyte=0x00 driverbyte=0x08
Jan 26 01:10:00 Manitower kernel: sd 4:0:0:0: [sdd] Sense Key : 0x3 [current] [descriptor]
Jan 26 01:10:00 Manitower kernel: Descriptor sense data with sense descriptors (in hex):
Jan 26 01:10:00 Manitower kernel: 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
Jan 26 01:10:00 Manitower kernel: 03 a4 36 c0
Jan 26 01:10:00 Manitower kernel: sd 4:0:0:0: [sdd] ASC=0x11 ASCQ=0x4
Jan 26 01:10:00 Manitower kernel: sd 4:0:0:0: [sdd] CDB: cdb[0]=0x28: 28 00 03 a4 36 c0 00 00 40 00
Jan 26 01:10:00 Manitower kernel: end_request: I/O error, dev sdd, sector 61093568
Jan 26 01:10:00 Manitower kernel: Buffer I/O error on device sdd, logical block 7636696
Jan 26 01:10:00 Manitower kernel: Buffer I/O error on device sdd, logical block 7636697
Jan 26 01:10:00 Manitower kernel: Buffer I/O error on device sdd, logical block 7636698
Jan 26 01:10:00 Manitower kernel: Buffer I/O error on device sdd, logical block 7636699
Jan 26 01:10:00 Manitower kernel: Buffer I/O error on device sdd, logical block 7636700
Jan 26 01:10:00 Manitower kernel: Buffer I/O error on device sdd, logical block 7636701
Jan 26 01:10:00 Manitower kernel: Buffer I/O error on device sdd, logical block 7636702
Jan 26 01:10:00 Manitower kernel: Buffer I/O error on device sdd, logical block 7636703
Jan 26 01:10:00 Manitower kernel: ata5: EH complete

Can someone please help me? I am a newbie with unRAID and really need some help. The preclears of my last 3 drives went without any issues. This is the command I used: preclear_disk.sh -A -M 4 /dev/sda. I am attaching the SMART report and syslog, please help !!!! syslog.zip SMART_Report.txt
Just Me Posted January 25, 2014 Hey! Sorry to bother you guys. A friend of mine gave me a 3TB hard drive, which has abnormal SMART values. That's why I started a three-cycle preclear to test the drive. Here is the preclear report:

========================================================================1.14
== invoked as: ./preclear_disk.sh -A -c 3 /dev/sdd
== ST3000DM001-9YN166 S1F14GED
== Disk /dev/sdd has been successfully precleared
== with a starting sector of 1
== Ran 3 cycles
==
== Using :Read block size = 8388608 Bytes
== Last Cycle's Pre Read Time : 6:22:11 (130 MB/s)
== Last Cycle's Zeroing time : 5:33:40 (149 MB/s)
== Last Cycle's Post Read Time : 14:56:28 (55 MB/s)
== Last Cycle's Total Time : 20:31:10
==
== Total Elapsed Time 68:28:52
==
== Disk Start Temperature: 26C
==
== Current Disk Temperature: 29C,
==
============================================================================
** Changed attributes in files: /tmp/smart_start_sdd /tmp/smart_finish_sdd
ATTRIBUTE                 NEW_VAL OLD_VAL FAILURE_THRESHOLD STATUS      RAW_VALUE
Raw_Read_Error_Rate     =   114      97           6         ok          60561760
Spin_Retry_Count        =   100     100          97         near_thresh 0
End-to-End_Error        =   100     100          99         near_thresh 0
Reported_Uncorrect      =     1       1           0         near_thresh 3320
Airflow_Temperature_Cel =    71      74          45         In_the_past 29
Temperature_Celsius     =    29      26           0         ok          29
No SMART attributes are FAILING_NOW

0 sectors were pending re-allocation before the start of the preclear.
0 sectors were pending re-allocation after pre-read in cycle 1 of 3.
0 sectors were pending re-allocation after zero of disk in cycle 1 of 3.
0 sectors were pending re-allocation after post-read in cycle 1 of 3.
0 sectors were pending re-allocation after zero of disk in cycle 2 of 3.
0 sectors were pending re-allocation after post-read in cycle 2 of 3.
0 sectors were pending re-allocation after zero of disk in cycle 3 of 3.
0 sectors are pending re-allocation at the end of the preclear,
    the number of sectors pending re-allocation did not change.
2136 sectors had been re-allocated before the start of the preclear.
2144 sectors are re-allocated at the end of the preclear,
    a change of 8 in the number of sectors re-allocated.
============================================================================

I've attached the SMART reports below. It would be nice if someone could take a look at the reports. Would you still trust the hard drive and save data on it? Or is it more a case for disposal? :'( Thanks in advance. preclear_start_SMART.txt preclear_finish_SMART.txt
RobJ Posted January 26, 2014 Share Posted January 26, 2014 Hi All, Having problem with my preclear please help. I already have 2 data drives and 1 parity in my unRaid Server. Just got the plus licensing today was started the preclear for my 4th drive. This is WD Red Nas drive. I got the following error when ran the first time ... Then I replaced the cable and then connected to a different SATA port and ran the preclear. Again within 1% I started getting errors, the speed reduced to 15-25 Mbs I am attaching the SMART report and syslog, please help !!!! You have a series of bad sectors on this drive, very early on it too. The SMART report was not very useful, as it was truncated on the right side at 80 columns, cutting off the RAW numbers. Not sure what did that. Quote Link to comment
RobJ Posted January 26, 2014 Hey! Sorry to bother you guys. A friend of mine gave me a 3TB hard drive, which has abnormal SMART values. That's why I started a three-cycle-preclear to test the drive. ... 2136 sectors had been re-allocated before the start of the preclear. 2144 sectors are re-allocated at the end of the preclear, a change of 8 in the number of sectors re-allocated. ... Would you still trust the hard drive and save data on it? Or is it more a case for disposal? I would run another Preclear or 2 on it and check for additional reallocated sectors. If you can obtain clean results, with no further adverse numbers, then it is usable. With that many reallocated sectors, I know that some users would prefer to reserve the drive for secondary uses, such as holding backups.
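The phase timings in Just Me's preclear report (6:22:11 pre-read, 5:33:40 zeroing, 14:56:28 post-read) can be checked against the rough 1/4, 1/4, 1/2 split mentioned at the top of the thread. This is only a back-of-the-envelope sketch; the helper name `to_seconds` is made up for illustration.

```python
def to_seconds(hms: str) -> int:
    """Convert an 'H:MM:SS' string, as printed in the preclear report, to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# Phase durations copied from the report posted in the thread.
phases = {
    "pre-read": to_seconds("6:22:11"),
    "zeroing": to_seconds("5:33:40"),
    "post-read": to_seconds("14:56:28"),
}
total = sum(phases.values())
fractions = {name: secs / total for name, secs in phases.items()}

for name, frac in fractions.items():
    print(f"{name:9s} {frac:.1%}")
```

For this drive the split comes out at roughly 24%, 21% and 56%, so the slow post-read by itself is consistent with what garycase and Joe L. describe and is not a sign of trouble. The reallocated-sector count is the part to watch.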
manny Posted January 26, 2014 Thanks for the reply RobJ. I am attaching the report again; can you please check it and let me know if I have to RMA the drive? Can you also please explain how you came to this conclusion? I have been reading the SMART report but am still not able to understand it.

root@Manitower:~# smartctl -a /dev/sdd
smartctl 5.40 2010-10-16 r3189 [i486-slackware-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD20EFRX-68AX9N0
Serial Number:    WD-WCC300664700
Firmware Version: 80.00A80
User Capacity:    2,000,398,934,016 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   9
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Sun Jan 26 12:13:44 2014 IST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x00) Offline data collection activity was never started. Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run.
Total time to complete Offline data collection: (27540) seconds.
Offline data collection capabilities: (0x7b) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. Offline surface scan supported. Self-test supported. Conveyance Self-test supported. Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported. General Purpose Logging supported.
Short self-test routine recommended polling time: ( 2) minutes.
Extended self-test routine recommended polling time: ( 255) minutes.
Conveyance self-test routine recommended polling time: ( 5) minutes.
SCT capabilities: (0x70bd) SCT Status supported. SCT Error Recovery Control supported. SCT Feature Control supported. SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   155   081   051    Pre-fail  Always       -       1164
  3 Spin_Up_Time            0x0027   100   253   021    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       5
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   100   253   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       4
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       5
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       3
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       1
194 Temperature_Celsius     0x0022   119   118   000    Old_age   Always       -       28
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       81
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
RobJ Posted January 26, 2014 Share Posted January 26, 2014 Thanks for the reply RobJ, I am attaching the report again, can you please check and let me know if I have RMA the drive. Can you also please how you came this conculsion, I have been reading the the SMART report but still not able to understand it. root@Manitower:~# smartctl -a /dev/sdd ... Device Model: WDC WD20EFRX-68AX9N0 ... 197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 81 ... I was too tired to take much time with it, sorry. This SMART report is intact, thanks. The line above shows 81 Current Pending sectors, a very ominous sign, especially when you only have 5 operational hours on the drive. It means it has already found 81 sectors that are probably bad. As near as I can tell, you have started Preclear 3 times, then aborted it quite early, probably because of the errors and how long it was taking, but these short passes make this even more ominous, in that you shouldn't find even one error over the entire drive, and you found 81 in just the first 1 or 2 percent. What I based my opinion on was the syslog you attached and the 2 syslog excerpts you posted. They all show a series of errors logged by the exception handler. All of them are noted as 'media error' which means a problem with a physical sector on the drive surface. More specifically, the error flag raised for each of those sectors is 'UNC' (short for 'UNCorrectable'), which means the sector was found to be corrupted so much that even the embedded error correction info could not fix it. Because these first Preclear passes are just read passes, we CANNOT conclude for sure that the drive is bad yet, until the drive attempts to fix them, by rewriting them correctly. At that point, the drive will determine if the magnetic media under the sector is good or bad, and either return the sector to service, validly rewritten, or remap it elsewhere (as a reallocated sector). 
The drive MAY be bad, but the SMART report is not showing any mechanical issues so far, so it's possible the magnetic surface is good but has been scrambled somehow. Not likely, but possible. An immediate zeroing pass would probably be a good next step, skipping the Preclear Pre-Read and forcing writes to all sectors. It should rather quickly help you decide whether the drive is worth further effort. The syntax, I believe, would be "preclear_disk.sh -W /dev/sdd".
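For readers who, like manny, find the attribute table hard to read, the row RobJ singles out can be pulled apart field by field. The sketch below is illustrative only. It assumes the ten-column layout shown in the posted smartctl 5.40 output, and the function name `parse_attribute_row` is hypothetical.

```python
def parse_attribute_row(row: str) -> dict:
    """Split one row of smartctl's attribute table into named fields."""
    keys = ["id", "name", "flag", "value", "worst", "thresh",
            "type", "updated", "when_failed", "raw_value"]
    # Split at most 9 times so a RAW_VALUE containing spaces stays in one piece.
    fields = row.split(None, 9)
    return dict(zip(keys, fields))


# Row copied from the SMART report posted in the thread.
row = "197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 81"
attr = parse_attribute_row(row)
pending = int(attr["raw_value"])

print(attr["name"], "raw =", pending, "normalized =", attr["value"],
      "threshold =", attr["thresh"])
```

Note that the normalized VALUE is still 200 against a threshold of 000, which is why the overall self-assessment in the same report says PASSED even though the raw count is 81. The RAW_VALUE column is the one RobJ is reading.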
manny Posted January 26, 2014 Thanks a lot RobJ, this is very informative. Thank you for taking the time to write such a detailed explanation. Meanwhile I took the HDD out of the array, dropped it into my Windows 7 PC and ran HD Tune. Not sure if this was a good idea, but as you predicted it showed a bunch of bad sectors at the very beginning. It has around 0.6% damaged blocks according to HD Tune. Should I go ahead and RMA the drive, or run a zeroing-pass preclear?
Joe L. Posted January 26, 2014 Hi All, Having problem with my preclear please help. ... Jan 26 00:55:29 Manitower kernel: res 41/40:00:40:c1:da/00:00:03:00:00/40 Emask 0x409 (media error) <F> Jan 26 00:55:29 Manitower kernel: ata5.00: status: { DRDY ERR } Jan 26 00:55:29 Manitower kernel: ata5.00: error: { UNC } ... I am attaching the SMART report and syslog, please help !!!! UNC = uncorrectable media error = unreadable sector (the sector contents do not match the associated checksum at the end of the sector). Let the preclear continue.
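The hex string after "cmd" in manny's syslog (for example 60/40:00:c0:36:a4/00:00:03:00:00/40) can be decoded to see which sector the failed read was aimed at. The following is a sketch under an assumed field layout, command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device, which is how I understand libata prints it. The function name `decode_libata_taskfile` is hypothetical, and for NCQ commands such as READ FPDMA QUEUED the sector count is taken from the feature fields.

```python
def decode_libata_taskfile(taskfile: str) -> dict:
    """Decode the hex string that follows 'cmd' in a libata error line.

    Assumed layout:
    command/feature:nsect:lbal:lbam:lbah/
    hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device
    """
    command, low, high, _device = taskfile.split("/")
    feature, _nsect, lbal, lbam, lbah = (int(x, 16) for x in low.split(":"))
    hob_feature, _hob_nsect, hob_lbal, hob_lbam, hob_lbah = (
        int(x, 16) for x in high.split(":"))
    # 48-bit LBA: the "hob" (high order byte) fields carry bits 24..47.
    lba = (hob_lbah << 40 | hob_lbam << 32 | hob_lbal << 24
           | lbah << 16 | lbam << 8 | lbal)
    return {
        "command": int(command, 16),
        "lba": lba,
        # For NCQ reads the transfer length lives in the feature registers.
        "ncq_sectors": hob_feature << 8 | feature,
    }


info = decode_libata_taskfile("60/40:00:c0:36:a4/00:00:03:00:00/40")
print(info["lba"])                # sector targeted by the failed read
print(info["lba"] // 8)           # same position in 4096-byte blocks
print(info["ncq_sectors"] * 512)  # request size in bytes, assuming 512-byte sectors
```

The decoded values agree with the rest of the same log excerpt: sector 61093568 in the end_request line, logical block 7636696 in the first Buffer I/O error, and the "ncq 32768 in" request size. That agreement is the only evidence offered here for the assumed layout.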
manny Posted January 26, 2014 Thanks Joe. It looks like it has a bunch of bad blocks (0.6%). Should I continue to use this drive or RMA it?
Joe L. Posted January 26, 2014 Thanks Joe, looks like it has some bunch of bad blocks (0.6%), should i continue to use this drive or RMA it? RMA it.
aspik Posted January 28, 2014
Hi Joe,

Thank you for the latest check on my drives. I have now precleared my WD AV-GP 2TB disk and have a question about ATA errors. I guess the SMART attributes look OK? There are no sectors pending re-allocation. But the preclear start and finish reports show ATA errors. As far as I could find out, these indicate cable problems, right? Both errors occurred on day 159 of use, and the disk currently has 408 days of power-on time, which would mean the errors occurred quite some time ago (when the disk was used in my HTPC).

What I don't understand is:

SMART Error Log Version: 1
ATA Error Count: 2
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 2 occurred at disk power-on lifetime: 3834 hours (159 days + 18 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 51 01 00 00 00 a0  Error: ABRT

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  b0 d6 01 e0 4f c2 a0 00      00:22:17.056  SMART WRITE LOG
  ec 00 00 00 00 00 a0 00      00:22:17.054  IDENTIFY DEVICE
  ef 02 00 00 00 00 00 00      00:22:16.633  SET FEATURES [Enable write cache]
  ec 00 00 00 00 00 a0 00      00:22:16.630  IDENTIFY DEVICE
  b0 d6 01 e0 4f c2 a0 00      00:22:06.756  SMART WRITE LOG

Error 1 occurred at disk power-on lifetime: 3834 hours (159 days + 18 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 51 01 00 00 00 a0  Error: ABRT

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  b0 d6 01 e0 4f c2 a0 00      00:22:06.756  SMART WRITE LOG
  ec 00 00 00 00 00 a0 00      00:22:06.752  IDENTIFY DEVICE
  ef 02 00 00 00 00 00 00      00:22:06.722  SET FEATURES [Enable write cache]
  ec 00 00 00 00 00 a0 00      00:22:06.719  IDENTIFY DEVICE
  ec 00 00 00 00 00 a0 00      00:21:35.933  IDENTIFY DEVICE

What does "Error: ABRT" mean? The start, finish, and rpt files are attached. Thanks!

WD_AV-GP_2TB.zip
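The "quite some time ago" estimate in the post above can be checked with a little arithmetic. The following is only a minimal sketch: the function name `split_hours` is made up for illustration, and the 408-day figure is taken as a whole number of days from the post, so the result is approximate.

```python
HOURS_PER_DAY = 24

def split_hours(total_hours):
    """Return (days, hours) for a power-on lifetime given in hours."""
    return divmod(total_hours, HOURS_PER_DAY)

error_hours = 3834    # "disk power-on lifetime" from the SMART error log
current_days = 408    # current power-on time as stated in the post (assumed whole days)

error_days, error_rem = split_hours(error_hours)
elapsed_days, elapsed_rem = split_hours(current_days * HOURS_PER_DAY - error_hours)

print(f"errors logged at {error_days} days + {error_rem} hours of power-on time")
print(f"roughly {elapsed_days} days of power-on time have passed since then")
```

Note that this counts powered-on time only, not calendar time, so the errors may be considerably older on the calendar.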
RobJ Posted January 28, 2014
Now I have precleared my WD AV-GP 2TB disk and have a question about ATA Errors. [...] What does mean: Error: ABRT?
I did some research, and found only 2 possibilities - the drive SMART firmware tried to write to the SMART log but SMART was not enabled at that instant (possibly at drive startup), or there is a bug in the SMART firmware on that drive. Not something to worry about, as it only happened once (the second is a retry of the first), and that was a long time ago.
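For readers wondering where "ABRT" comes from: the ER and ST columns in the error log are bit fields, and the label is derived from the bits that are set. The sketch below decodes the two values from the log (ER = 04, ST = 51). The bit tables are a partial list based on the conventional ATA register layout, and the names `ERROR_BITS`, `STATUS_BITS`, and `decode` are illustrative only, not taken from any tool.

```python
# Partial bit tables for the ATA error and status registers (assumed
# from the conventional ATA layout; not every bit is listed here).
ERROR_BITS = {0x80: "ICRC", 0x40: "UNC", 0x10: "IDNF", 0x04: "ABRT"}
STATUS_BITS = {0x80: "BSY", 0x40: "DRDY", 0x20: "DF",
               0x10: "DSC", 0x08: "DRQ", 0x01: "ERR"}

def decode(value, names):
    """List the names of all known bits set in value, highest bit first."""
    return [name for bit, name in sorted(names.items(), reverse=True)
            if value & bit]

er = int("04", 16)   # ER column from the error log
st = int("51", 16)   # ST column from the error log

print("ER:", decode(er, ERROR_BITS))
print("ST:", decode(st, STATUS_BITS))
```

Read this way, ER = 04 has only the ABRT (command aborted) bit set and none of the media-error bits such as UNC, which fits the view that the drive rejected a command rather than failed to read a sector.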
ridley Posted February 6, 2014
I had a red ball on a 2TB WD Red drive this week. The drive was successfully replaced and all is well with the array. However, I have now run Pre-Clear for 4 passes on the drive that had the read errors, with the following results. Does that make it safe to reuse in the array?

================================================================== 1.13
= unRAID server Pre-Clear disk /dev/sda
= cycle 3 of 3, partition start on sector 64
=
= Step 1 of 10 - Copying zeros to first 2048k bytes             DONE
= Step 2 of 10 - Copying zeros to remainder of disk to clear it DONE
= Step 3 of 10 - Disk is now cleared from MBR onward.           DONE
= Step 4 of 10 - Clearing MBR bytes for partition 2,3 & 4       DONE
= Step 5 of 10 - Clearing MBR code area                         DONE
= Step 6 of 10 - Setting MBR signature bytes                    DONE
= Step 7 of 10 - Setting partition 1 to precleared state        DONE
= Step 8 of 10 - Notifying kernel we changed the partitioning   DONE
= Step 9 of 10 - Creating the /dev/disk/by* entries             DONE
= Step 10 of 10 - Verifying if the MBR is cleared.              DONE
= Disk Post-Clear-Read completed                                DONE
Disk Temperature: 19C, Elapsed Time: 53:14:13
========================================================================1.13
== WDC WD20EFRX-68AX9N0 WD-WMC301458571
== Disk /dev/sda has been successfully precleared
== with a starting sector of 64
============================================================================
** Changed attributes in files: /tmp/smart_start_sda /tmp/smart_finish_sda
ATTRIBUTE             NEW_VAL  OLD_VAL  FAILURE_THRESHOLD  STATUS  RAW_VALUE
Temperature_Celsius =     128      127                  0  ok      19
No SMART attributes are FAILING_NOW

0 sectors were pending re-allocation before the start of the preclear.
0 sectors were pending re-allocation after pre-read in cycle 1 of 3.
0 sectors were pending re-allocation after zero of disk in cycle 1 of 3.
0 sectors were pending re-allocation after post-read in cycle 1 of 3.
0 sectors were pending re-allocation after zero of disk in cycle 2 of 3.
0 sectors were pending re-allocation after post-read in cycle 2 of 3.
0 sectors were pending re-allocation after zero of disk in cycle 3 of 3.
0 sectors are pending re-allocation at the end of the preclear,
the number of sectors pending re-allocation did not change.
0 sectors had been re-allocated before the start of the preclear.
0 sectors are re-allocated at the end of the preclear,
the number of sectors re-allocated did not change.
root@Tower:/boot# ****************
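A report like this can be checked mechanically against the rule of thumb given earlier in the thread: no pending sectors at any phase and no additional reallocated sectors. The following is a rough sketch only. It assumes the report wording shown above, and the names `summarize` and `looks_clean` are made up for illustration; it says nothing about other SMART attributes.

```python
import re

# Patterns assume the sentence wording seen in the preclear report above.
PENDING_RE = re.compile(r"(\d+) sectors? (?:were|are) pending re-allocation")
REALLOC_RE = re.compile(r"(\d+) sectors? (?:had been|are) re-allocated")

def summarize(report):
    """Collect pending and reallocated counts in the order they appear."""
    pending = [int(n) for n in PENDING_RE.findall(report)]
    realloc = [int(n) for n in REALLOC_RE.findall(report)]
    return pending, realloc

def looks_clean(report):
    """True if no phase reported pending sectors and the reallocated
    count is the same at the start and the end of the run."""
    pending, realloc = summarize(report)
    if not pending or len(realloc) < 2:
        return False  # report incomplete; do not guess
    return max(pending) == 0 and realloc[0] == realloc[-1]

sample = """\
0 sectors were pending re-allocation before the start of the preclear.
0 sectors were pending re-allocation after pre-read in cycle 1 of 3.
0 sectors are pending re-allocation at the end of the preclear,
the number of sectors pending re-allocation did not change.
0 sectors had been re-allocated before the start of the preclear.
0 sectors are re-allocated at the end of the preclear,
the number of sectors re-allocated did not change.
"""

print(looks_clean(sample))
```

A passing check only means the counters stayed flat during this run; whether to trust a drive that previously showed read errors is still a judgment call.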