keyman33 Posted December 8, 2011

Hi there,

I've noticed the error count rising on Disk 2 of my array (61k at this point). I don't know exactly when it started, but I *think* it was when I moved some data from Disk1 to Disk2. I don't believe I had any errors at all a month or so ago.

I've read that this might be caused by a temporary read error or parity issue, but I could be mistaken. Reference: http://lime-technology.com/wiki/index.php?title=Troubleshooting#Obtaining_a_SMART_report

When I check the syslog I see "UNC" codes, which potentially point to a bad sector. So I ran Smarthistory, and it shows errors that I can't interpret: "Current_Pending_Sector" and "Offline_Uncorrectable".

I'm at a loss... do I have a drive that's going bad? My other specs are in my signature.

I've attached a partial syslog (the full one was 9MB and would not compress below the upload limit) and a screenshot of the errors in Smarthistory.

Your help is greatly appreciated, as always.

syslog-2011-12-08_partial_3.zip
lionelhutz Posted December 8, 2011

Yes, you should be worried. It very much appears you have a drive going bad. If possible, you should get that drive replaced as soon as possible. Once it has been successfully replaced, you could run some tests on it to see what becomes of it, if you wanted to. Personally, I'd just RMA it and get another one.

Peter
Joe L. Posted December 9, 2011

I agree. It needs to be replaced... ASAP.
keyman33 Posted December 9, 2011

Thanks Joe and Lionelhutz (<--great alias, btw)! Will do so. Dang, I wish HDs weren't so crazy expensive right now.

cheers
Steve
WeeboTech Posted December 9, 2011

Move your data off that drive as soon as possible, and schedule a drive replacement. If any other hard drive fails, the one in question may not be readable enough to reconstruct it.
armbrust Posted December 9, 2011

I have the same situation/question, and I'm guessing the same answer: RMA. I did a short test on the drive. Here is the smartctl output after the short test. I'm running a long test now. Any comments appreciated. Thanks

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   194   194   051    Pre-fail  Always       -       12786
  3 Spin_Up_Time            0x0027   186   163   021    Pre-fail  Always       -       5675
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       836
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   086   086   000    Old_age   Always       -       10705
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       21
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       12
193 Load_Cycle_Count        0x0032   171   171   000    Old_age   Always       -       88272
194 Temperature_Celsius     0x0022   115   107   000    Old_age   Always       -       35
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   197   000    Old_age   Always       -       180
198 Offline_Uncorrectable   0x0030   200   197   000    Old_age   Offline      -       14
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   195   000    Old_age   Offline      -       36

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: read failure       70%     10704         2928410663

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
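For anyone who wants to check a report like the one above without reading every row, here is a minimal Python sketch that pulls the raw values out of a smartctl attribute table and flags the sector-related ones. The choice of attributes 5, 197 and 198 as the ones to watch, the "raw value greater than zero" rule, and the function names are assumptions made for illustration, not an official rule; the sample rows are copied from the report in this post.

```python
from typing import Dict, Tuple

# Attribute IDs treated as sector-health indicators in this sketch
# (an illustrative assumption, not an official list).
WATCHED_IDS = {5, 197, 198}

SAMPLE_REPORT = """\
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   197   000    Old_age   Always       -       180
198 Offline_Uncorrectable   0x0030   200   197   000    Old_age   Offline      -       14
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
"""


def parse_attributes(report: str) -> Dict[int, Tuple[str, int]]:
    """Map attribute ID -> (name, raw value) for each data row of the table."""
    attrs: Dict[int, Tuple[str, int]] = {}
    for line in report.splitlines():
        fields = line.split()
        # A data row has at least 10 columns and starts with a numeric ID.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        try:
            raw = int(fields[9])
        except ValueError:
            continue  # skip rows whose raw value is not a plain integer
        attrs[int(fields[0])] = (fields[1], raw)
    return attrs


def flag_bad_sectors(attrs: Dict[int, Tuple[str, int]]) -> Dict[str, int]:
    """Return the watched attributes whose raw value is non-zero."""
    return {
        attrs[i][0]: attrs[i][1]
        for i in sorted(WATCHED_IDS)
        if i in attrs and attrs[i][1] > 0
    }


if __name__ == "__main__":
    for name, raw in flag_bad_sectors(parse_attributes(SAMPLE_REPORT)).items():
        print(f"{name}: raw value {raw}")
```

On the sample rows this flags Current_Pending_Sector and Offline_Uncorrectable and ignores Reallocated_Sector_Ct, whose raw value is still 0.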
mbryanr Posted December 9, 2011

197 Current_Pending_Sector  0x0032   200   197   000    Old_age   Always       -       180
198 Offline_Uncorrectable   0x0030   200   197   000    Old_age   Offline      -       14

Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: read failure       70%     10704         2928410663

RMA.
armbrust Posted December 9, 2011

Thanks for taking the time to comment. Will do. Now I need to decide whether to buy a replacement drive or move stuff off temporarily. Pondering...
Joe L. Posted December 9, 2011

Even if you move stuff off, if another disk were to fail, you might not be able to reconstruct it, since this disk has unreadable sectors. You can move stuff off, but you still need to RMA the drive (or remove it from your array and re-calculate parity without it).
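To illustrate why an unreadable sector on one disk can block the rebuild of another, here is a toy Python sketch. It assumes simple single-parity XOR across the data disks and models each disk as a single byte; the function names and values are hypothetical and chosen only for the example, and a real array works on whole sectors rather than single bytes.

```python
from functools import reduce
from typing import List, Optional


def xor_all(values: List[int]) -> int:
    """XOR a list of byte values together."""
    return reduce(lambda a, b: a ^ b, values, 0)


def rebuild(sectors: List[Optional[int]], parity: int, failed: int) -> Optional[int]:
    """Rebuild the sector of disk `failed` from parity and the surviving disks.

    A value of None marks a sector that cannot be read. If any surviving
    disk has an unreadable sector at this position, the rebuild is impossible
    and None is returned.
    """
    others = [s for i, s in enumerate(sectors) if i != failed]
    if any(s is None for s in others):
        return None
    return parity ^ xor_all([s for s in others if s is not None])


if __name__ == "__main__":
    data = [0x5A, 0x3C, 0xF0]   # same sector position on three data disks
    parity = xor_all(data)      # what the parity disk stores

    # Disk 0 fails, every other disk is readable: the sector comes back.
    print(rebuild(data, parity, failed=0) == 0x5A)

    # Disk 0 fails while disk 1 has a pending (unreadable) sector here:
    # there is not enough information left to recover disk 0's data.
    print(rebuild([0x5A, None, 0xF0], parity, failed=0))
```

Parity plus all but one disk is enough to recover the missing value, so a second disk with unreadable sectors leaves the affected positions unrecoverable. That is why moving data off the failing drive does not remove the need to replace it or drop it from the array.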
Archived
This topic is now archived and is closed to further replies.