July 26, 2012

Hi,

I just got some errors on one of my drives. Should I RMA it? I am on unRAID 4.7.

Here is my syslog: http://pastebin.com/WuF7PBWq. Please see the 5 am entries on July 13th, and again on July 14th. (I had an rsync script scheduled to run at 5 am to copy the latest CrashPlan backups from the cache drive to the array, and the errors happened then. I have since removed CrashPlan and the rsync script.)

I was having issues with unRAID in general: I had had a power outage a few days prior, plugins and unMENU were crashing, etc. Then I noticed that errors were being reported.

Here is the SMART report for that drive:

smartctl -a -d ata /dev/sdf (disk4)
smartctl 5.39.1 2010-01-28 r3054 [i486-slackware-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family: Hitachi Deskstar 7K2000
Device Model: Hitachi HDS722020ALA330
Serial Number: JK1100YAH6M8YT
Firmware Version: JKAOA28A
User Capacity: 2,000,398,934,016 bytes
Device is: In smartctl database [for details use: -P show]
ATA Version is: 8
ATA Standard is: ATA-8-ACS revision 4
Local Time is: Mon Jul 23 20:57:23 2012 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x84) Offline data collection activity was suspended by an interrupting command from host. Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run.
Total time to complete Offline data collection: (22918) seconds.
Offline data collection capabilities: (0x5b) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. Offline surface scan supported. Self-test supported. No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported. General Purpose Logging supported.
Short self-test routine recommended polling time: ( 1) minutes.
Extended self-test routine recommended polling time: ( 255) minutes.
SCT capabilities: (0x003d) SCT Status supported. SCT Feature Control supported. SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   130   130   054    Pre-fail  Offline      -       110
  3 Spin_Up_Time            0x0007   118   118   024    Pre-fail  Always       -       614 (Average 614)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       824
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       19
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   114   114   020    Pre-fail  Offline      -       38
  9 Power_On_Hours          0x0012   098   098   000    Old_age   Always       -       17922
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       104
192 Power-Off_Retract_Count 0x0032   099   099   000    Old_age   Always       -       1229
193 Load_Cycle_Count        0x0012   099   099   000    Old_age   Always       -       1229
194 Temperature_Celsius     0x0002   222   222   000    Old_age   Always       -       27 (Lifetime Min/Max 9/59)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       21
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1
ATA Error Count: 222 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes, SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 222 occurred at disk power-on lifetime: 17691 hours (737 days + 3 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 6a e6 8d 32 00  Error: UNC 106 sectors at LBA = 0x00328de6 = 3313126

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC  Powered_Up_Time   Command/Feature_Name
-- -- -- -- -- -- -- --  ----------------  --------------------
25 00 00 50 8d 32 e0 08  25d+07:55:59.341  READ DMA EXT
ef 10 02 00 00 00 a0 08  25d+07:55:59.339  SET FEATURES [Reserved for Serial ATA]
27 00 00 00 00 00 e0 08  25d+07:55:59.338  READ NATIVE MAX ADDRESS EXT
ec 00 00 00 00 00 a0 08  25d+07:55:59.329  IDENTIFY DEVICE
ef 03 46 00 00 00 a0 08  25d+07:55:59.328  SET FEATURES [set transfer mode]

Error 221 occurred at disk power-on lifetime: 17691 hours (737 days + 3 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 6a e6 8d 32 00  Error: UNC 106 sectors at LBA = 0x00328de6 = 3313126

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC  Powered_Up_Time   Command/Feature_Name
-- -- -- -- -- -- -- --  ----------------  --------------------
25 00 00 50 8d 32 e0 08  25d+07:55:41.743  READ DMA EXT
ef 10 02 00 00 00 a0 08  25d+07:55:41.741  SET FEATURES [Reserved for Serial ATA]
27 00 00 00 00 00 e0 08  25d+07:55:41.740  READ NATIVE MAX ADDRESS EXT
ec 00 00 00 00 00 a0 08  25d+07:55:41.731  IDENTIFY DEVICE
ef 03 46 00 00 00 a0 08  25d+07:55:41.730  SET FEATURES [set transfer mode]

Error 220 occurred at disk power-on lifetime: 17691 hours (737 days + 3 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 6a e6 8d 32 00  Error: UNC 106 sectors at LBA = 0x00328de6 = 3313126

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC  Powered_Up_Time   Command/Feature_Name
-- -- -- -- -- -- -- --  ----------------  --------------------
25 00 00 50 8d 32 ee 08  25d+07:55:24.251  READ DMA EXT
25 00 00 50 89 32 ee 08  25d+07:55:24.227  READ DMA EXT
25 00 00 50 85 32 ee 08  25d+07:55:24.180  READ DMA EXT
35 00 00 50 81 32 ee 08  25d+07:55:22.422  WRITE DMA EXT
35 00 00 50 7d 32 ee 08  25d+07:55:22.409  WRITE DMA EXT

Error 219 occurred at disk power-on lifetime: 17667 hours (736 days + 3 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 96 6a eb c2 00  Error: UNC 150 sectors at LBA = 0x00c2eb6a = 12774250

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC  Powered_Up_Time   Command/Feature_Name
-- -- -- -- -- -- -- --  ----------------  --------------------
25 00 00 00 e8 c2 e0 08  19d+05:29:46.065  READ DMA EXT
ef 10 02 00 00 00 a0 08  19d+05:29:46.063  SET FEATURES [Reserved for Serial ATA]
27 00 00 00 00 00 e0 08  19d+05:29:46.062  READ NATIVE MAX ADDRESS EXT
ec 00 00 00 00 00 a0 08  19d+05:29:46.053  IDENTIFY DEVICE
ef 03 46 00 00 00 a0 08  19d+05:29:46.052  SET FEATURES [set transfer mode]

Error 218 occurred at disk power-on lifetime: 17667 hours (736 days + 3 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 96 6a eb c2 00  Error: UNC 150 sectors at LBA = 0x00c2eb6a = 12774250

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC  Powered_Up_Time   Command/Feature_Name
-- -- -- -- -- -- -- --  ----------------  --------------------
25 00 00 00 e8 c2 e0 08  19d+05:29:20.075  READ DMA EXT
ef 10 02 00 00 00 a0 08  19d+05:29:20.073  SET FEATURES [Reserved for Serial ATA]
27 00 00 00 00 00 e0 08  19d+05:29:20.072  READ NATIVE MAX ADDRESS EXT
ec 00 00 00 00 00 a0 08  19d+05:29:20.062  IDENTIFY DEVICE
ef 03 46 00 00 00 a0 08  19d+05:29:20.061  SET FEATURES [set transfer mode]

SMART Self-test log structure revision number 1
Num  Test_Description  Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline     Completed without error  00%        13691            -

SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
   1        0        0  Not_testing
   2        0        0  Not_testing
   3        0        0  Not_testing
   4        0        0  Not_testing
   5        0        0  Not_testing
Selective self-test flags (0x0): After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

By the way, I have never RMA'd a drive before and am not familiar with the process at all. Hitachi's website shows that my drive's warranty expires on Dec 19th, 2012. Will they accept it with the current errors? I ask because the SMART report says the drive passed even with the errors.

All help is appreciated.

Thanks
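For anyone reading the error log entries above, here is a minimal Python sketch of how the register bytes line up with the text smartctl prints next to them. The function names are hypothetical, and it assumes that only the low 24 bits of the LBA come from the SN, CL and CH bytes, which is all this log format shows; it is an illustration, not how smartctl itself is implemented.

```python
def decode_error_registers(line):
    """Decode the 'ER ST SC SN CL CH DH' hex bytes of one error log entry.

    Assumption: the LBA is rebuilt from SN (low), CL (mid) and CH (high)
    only, so addresses above 24 bits are not handled by this sketch.
    """
    er, st, sc, sn, cl, ch, dh = (int(tok, 16) for tok in line.split())
    return {
        "uncorrectable": bool(er & 0x40),  # bit 6 of the error register is UNC
        "sectors": sc,                     # sector count register
        "lba": (ch << 16) | (cl << 8) | sn,
    }


def hours_to_days(hours):
    """Split power-on hours into (days, remaining hours)."""
    return divmod(hours, 24)


# Register bytes copied from Error 222 in the report above.
entry = decode_error_registers("40 51 6a e6 8d 32 00")
print(entry)
print(hours_to_days(17691))
```

With the bytes from Error 222 this gives 106 sectors at LBA 3313126 and 737 days + 3 hours, matching what the report prints. Errors 220 to 222 all decode to the same LBA, which suggests the same spot on the disk was being retried.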
July 26, 2012

The drive needs to be rebuilt. Unassign the drive and start the array. Then assign the drive again and start the array. The drive will be rebuilt. The Current_Pending_Sector count should go to 0.

Why use rsync when the mover is also running? Stop using rsync and just set the mover to run at 5 AM. There are a lot of duplicates due to using rsync incorrectly.
July 26, 2012 (Author)

Thanks dgaschk,

Those duplicates were intentional. I was not using rsync as an alternative to the mover but to complement it, to work around problems caused by CrashPlan.

When I originally had CrashPlan write the backups to a user share (with the cache drive enabled), CrashPlan would append to the existing files every time it did a backup, thereby skipping the cache drive and writing directly to the array drive. The continuous writes to the array and the parity drive really annoyed me.

So I set it up so that CrashPlan would write the backups to a hidden directory on the cache drive, and every morning at 5 am those files would be rsynced to the array for parity protection. I had to keep duplicates at all times, because if the files were moved by the mover and deleted from the cache drive, CrashPlan would do a full backup of each computer every day, killing my wifi and overworking my laptops.

Here was my rsync script:

rsync -avz /mnt/cache/.crashplan/backupArchives/CrashPlan\ Backups/ /mnt/user/CrashPlan\ Backups

I no longer use CrashPlan, by the way.

So the million dollar question is: do you think this drive is fine overall? I have a feeling it will die right after the warranty is over.
July 26, 2012

The drive's starting to show its age. It should be fine if Current_Pending_Sector stays at zero and Reallocated_Sector_Ct and Reallocated_Event_Count remain stable.
July 30, 2012 (Author)

Thanks Dgaschk,

Unfortunately I am getting more of the same errors. syslog here

Looks like I'll need to RMA soon. I ordered a new HDD to use as a warm spare in the meantime. Once I preclear the new drive, what would you recommend? Should I replace the old one with the new drive and keep preclearing the failing one until it fails? I don't really know what qualifies as a failing drive as far as RMA is concerned. I am guessing it currently would not qualify, as the SMART report still says "PASSED".

Thanks

Smart report:

smartctl -a -d ata /dev/sdf (disk4)
smartctl 5.39.1 2010-01-28 r3054 [i486-slackware-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family: Hitachi Deskstar 7K2000
Device Model: Hitachi HDS722020ALA330
Serial Number: JK1100YAH6M8YT
Firmware Version: JKAOA28A
User Capacity: 2,000,398,934,016 bytes
Device is: In smartctl database [for details use: -P show]
ATA Version is: 8
ATA Standard is: ATA-8-ACS revision 4
Local Time is: Mon Jul 30 00:01:08 2012 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x84) Offline data collection activity was suspended by an interrupting command from host. Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run.
Total time to complete Offline data collection: (22918) seconds.
Offline data collection capabilities: (0x5b) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. Offline surface scan supported. Self-test supported. No Conveyance Self-test supported. Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported. General Purpose Logging supported.
Short self-test routine recommended polling time: ( 1) minutes.
Extended self-test routine recommended polling time: ( 255) minutes.
SCT capabilities: (0x003d) SCT Status supported. SCT Feature Control supported. SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   084   084   016    Pre-fail  Always       -       8519892
  2 Throughput_Performance  0x0005   130   130   054    Pre-fail  Offline      -       110
  3 Spin_Up_Time            0x0007   129   129   024    Pre-fail  Always       -       503 (Average 615)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       855
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       112
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   114   114   020    Pre-fail  Offline      -       38
  9 Power_On_Hours          0x0012   098   098   000    Old_age   Always       -       18069
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       105
192 Power-Off_Retract_Count 0x0032   099   099   000    Old_age   Always       -       1260
193 Load_Cycle_Count        0x0012   099   099   000    Old_age   Always       -       1260
194 Temperature_Celsius     0x0002   250   250   000    Old_age   Always       -       24 (Lifetime Min/Max 9/59)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       116
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       14
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1
ATA Error Count: 326 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes, SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 326 occurred at disk power-on lifetime: 18060 hours (752 days + 12 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 54 0c ca 26 00  Error: UNC 84 sectors at LBA = 0x0026ca0c = 2542092

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC  Powered_Up_Time   Command/Feature_Name
-- -- -- -- -- -- -- --  ----------------  --------------------
25 00 00 60 c6 26 e0 08  19d+18:32:11.815  READ DMA EXT
ef 10 02 00 00 00 a0 08  19d+18:32:11.814  SET FEATURES [Reserved for Serial ATA]
27 00 00 00 00 00 e0 08  19d+18:32:11.813  READ NATIVE MAX ADDRESS EXT
ec 00 00 00 00 00 a0 08  19d+18:32:11.803  IDENTIFY DEVICE
ef 03 46 00 00 00 a0 08  19d+18:32:11.802  SET FEATURES [set transfer mode]

Error 325 occurred at disk power-on lifetime: 18060 hours (752 days + 12 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 57 09 ca 26 00  Error: UNC 87 sectors at LBA = 0x0026ca09 = 2542089

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC  Powered_Up_Time   Command/Feature_Name
-- -- -- -- -- -- -- --  ----------------  --------------------
25 00 00 60 c6 26 e0 08  19d+18:31:54.777  READ DMA EXT
ef 10 02 00 00 00 a0 08  19d+18:31:54.776  SET FEATURES [Reserved for Serial ATA]
27 00 00 00 00 00 e0 08  19d+18:31:54.774  READ NATIVE MAX ADDRESS EXT
ec 00 00 00 00 00 a0 08  19d+18:31:54.765  IDENTIFY DEVICE
ef 03 46 00 00 00 a0 08  19d+18:31:54.764  SET FEATURES [set transfer mode]

Error 324 occurred at disk power-on lifetime: 18060 hours (752 days + 12 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 57 09 ca 26 00  Error: UNC 87 sectors at LBA = 0x0026ca09 = 2542089

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC  Powered_Up_Time   Command/Feature_Name
-- -- -- -- -- -- -- --  ----------------  --------------------
25 00 00 60 c6 26 e9 08  19d+18:31:37.832  READ DMA EXT
25 00 00 60 c2 26 e9 08  19d+18:31:37.749  READ DMA EXT
25 00 00 60 be 26 e9 08  19d+18:31:37.724  READ DMA EXT
25 00 00 60 ba 26 e9 08  19d+18:31:37.691  READ DMA EXT
25 00 00 60 b6 26 e9 08  19d+18:31:37.667  READ DMA EXT

Error 323 occurred at disk power-on lifetime: 18048 hours (752 days + 0 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 ef 61 ac 55 00  Error: UNC 239 sectors at LBA = 0x0055ac61 = 5614689

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC  Powered_Up_Time   Command/Feature_Name
-- -- -- -- -- -- -- --  ----------------  --------------------
25 00 00 50 a9 55 e0 08  16d+18:01:56.693  READ DMA EXT
e5 00 00 00 00 00 00 08  16d+18:01:56.692  CHECK POWER MODE
ef 10 02 00 00 00 a0 08  16d+18:01:56.690  SET FEATURES [Reserved for Serial ATA]
27 00 00 00 00 00 e0 08  16d+18:01:56.689  READ NATIVE MAX ADDRESS EXT
ec 00 00 00 00 00 a0 08  16d+18:01:56.679  IDENTIFY DEVICE

Error 322 occurred at disk power-on lifetime: 18048 hours (752 days + 0 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 cf 81 a5 55 00  Error: UNC 207 sectors at LBA = 0x0055a581 = 5612929

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC  Powered_Up_Time   Command/Feature_Name
-- -- -- -- -- -- -- --  ----------------  --------------------
25 00 00 50 a5 55 e0 08  16d+18:01:34.770  READ DMA EXT
ef 10 02 00 00 00 a0 08  16d+18:01:34.768  SET FEATURES [Reserved for Serial ATA]
27 00 00 00 00 00 e0 08  16d+18:01:34.767  READ NATIVE MAX ADDRESS EXT
ec 00 00 00 00 00 a0 08  16d+18:01:34.758  IDENTIFY DEVICE
ef 03 46 00 00 00 a0 08  16d+18:01:34.757  SET FEATURES [set transfer mode]

SMART Self-test log structure revision number 1
Num  Test_Description  Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline     Completed without error  00%        13691            -

SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
   1        0        0  Not_testing
   2        0        0  Not_testing
   3        0        0  Not_testing
   4        0        0  Not_testing
   5        0        0  Not_testing
Selective self-test flags (0x0): After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
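The advice earlier in the thread was that the drive is fine only while Current_Pending_Sector stays at zero and the two reallocation counters stay stable. A rough Python sketch of that check, comparing the July 23 and July 30 reports, might look like the following. The function names and the three-row excerpts are illustrative assumptions; in practice you would feed it the full saved `smartctl -a` output.

```python
WATCHED = (
    "Reallocated_Sector_Ct",
    "Reallocated_Event_Count",
    "Current_Pending_Sector",
)

# Rows copied from the two reports in this thread (excerpts only).
JULY_23 = """\
5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always - 19
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 21
197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always - 1
"""

JULY_30 = """\
5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always - 112
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 116
197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always - 14
"""


def parse_attributes(report):
    """Map attribute name -> raw value for rows of the attribute table."""
    raw = {}
    for line in report.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID, carry a 0x.... flag in
        # the third column, and keep the raw value in the tenth column.
        if len(fields) >= 10 and fields[0].isdigit() and fields[2].startswith("0x"):
            try:
                raw[fields[1]] = int(fields[9])
            except ValueError:
                continue
    return raw


def growth(before, after, names=WATCHED):
    """Return how much each watched raw value grew between two reports."""
    return {n: after[n] - before[n] for n in names if n in before and n in after}


delta = growth(parse_attributes(JULY_23), parse_attributes(JULY_30))
for name, change in delta.items():
    print(f"{name}: +{change}")
```

On these two reports all three counters grew within a week (reallocated sectors from 19 to 112, pending sectors from 1 to 14), so by the rule of thumb given above the drive is not stable, even though the overall self-assessment still reads PASSED.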