keyman33 Posted December 8, 2011 Hi there, I've noticed the error count rising on Disk 2 of my array (61k at this point). I don't know exactly when it started to go up, but I *think* it was when I moved some data from Disk1 to Disk2. I don't believe I had any errors at all a month or so ago. I've read that this might be caused by a temporary read error/parity issue, but I could be mistaken. Reference: http://lime-technology.com/wiki/index.php?title=Troubleshooting#Obtaining_a_SMART_report When I check the syslog I see "UNC" codes, which potentially point to a bad sector. So I ran Smarthistory, and it shows some errors that I cannot interpret: "Current_Pending_Sector" and "Offline_Uncorrectable". I'm at a loss... do I have a drive that's going bad? My other specs are in my signature. I've attached a partial syslog (the full one was 9MB and would not compress down to the file upload limit) and a screenshot of the Smarthistory errors. Your help is greatly appreciated, as always. Attachment: syslog-2011-12-08_partial_3.zip
lionelhutz Posted December 8, 2011 Yes, you should be worried. It very much appears you have a drive going bad. If possible, you should get that drive replaced as soon as possible. Once it is successfully replaced, you could then run some tests on it to see what becomes of it, if you so desire. Personally, I'd just RMA it and get another one. Peter
Joe L. Posted December 9, 2011 I agree. It needs to be replaced... ASAP.
keyman33 Posted December 9, 2011 Thanks Joe and Lionelhutz (<--great alias, btw)! Will do so. Dang, I wish HDs weren't so crazy expensive right now. Cheers, Steve
WeeboTech Posted December 9, 2011 Move your data off that drive as soon as possible and schedule a drive replacement. Should any other hard drive fail, the one in question may not be readable enough to reconstruct the failed drive.
armbrust Posted December 9, 2011
I have the same situation/question, and I'm guessing the same answer: RMA. I ran a short test on the drive; here is the smartctl output after the short test. I'm running a long test now. Any comments appreciated. Thanks

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   194   194   051    Pre-fail  Always       -       12786
  3 Spin_Up_Time            0x0027   186   163   021    Pre-fail  Always       -       5675
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       836
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   086   086   000    Old_age   Always       -       10705
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       21
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       12
193 Load_Cycle_Count        0x0032   171   171   000    Old_age   Always       -       88272
194 Temperature_Celsius     0x0022   115   107   000    Old_age   Always       -       35
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   197   000    Old_age   Always       -       180
198 Offline_Uncorrectable   0x0030   200   197   000    Old_age   Offline      -       14
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   195   000    Old_age   Offline      -       36

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: read failure       70%     10704         2928410663

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
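For anyone who wants to check a report like the one above without reading every row, here is a minimal Python sketch. It assumes the ten-column attribute layout shown in this post and simply reports the two attributes singled out in this thread (Current_Pending_Sector and Offline_Uncorrectable) when their raw value is non-zero. The names `parse_attributes`, `flagged`, `WATCHED`, and `SAMPLE` are made up for illustration and are not part of smartctl or unRAID; the sample rows are copied from the report above.

```python
import re

# Attributes the replies in this thread point at. This list is an assumption
# for illustration, not an official set of failure criteria.
WATCHED = ("Current_Pending_Sector", "Offline_Uncorrectable")

# One attribute row: ID, name, flag, value, worst, thresh, type, updated,
# when_failed, then the leading digits of the raw value.
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+(0x[0-9a-fA-F]+)\s+(\d+)\s+(\d+)\s+(\d+)"
    r"\s+(\S+)\s+(\S+)\s+(\S+)\s+(\d+)"
)


def parse_attributes(report: str) -> dict:
    """Map attribute name -> raw value for every row that matches ROW."""
    attrs = {}
    for line in report.splitlines():
        match = ROW.match(line)
        if match:
            attrs[match.group(2)] = int(match.group(10))
    return attrs


def flagged(attrs: dict) -> dict:
    """Return the watched attributes whose raw value is greater than zero."""
    return {name: attrs[name] for name in WATCHED if attrs.get(name, 0) > 0}


# Rows copied from the smartctl output posted above.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   197   000    Old_age   Always       -       180
198 Offline_Uncorrectable   0x0030   200   197   000    Old_age   Offline      -       14
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
"""

if __name__ == "__main__":
    print(flagged(parse_attributes(SAMPLE)))
```

Header lines and the self-test log do not match the row pattern, so a full report can be passed in as one string. Whether a given count justifies an RMA is still a judgment call; the script only surfaces the numbers.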
mbryanr Posted December 9, 2011

197 Current_Pending_Sector  0x0032   200   197   000    Old_age   Always       -       180
198 Offline_Uncorrectable   0x0030   200   197   000    Old_age   Offline      -       14

Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: read failure       70%     10704         2928410663

RMA.
armbrust Posted December 9, 2011 Thanks for taking the time to comment. Will do. Now: buy a replacement drive, or move stuff off temporarily? Pondering...
Joe L. Posted December 9, 2011 Even if you move stuff off, if another disk were to fail, you might not be able to reconstruct it, since this disk has unreadable sectors. You can move stuff off, but you still need to RMA the drive (or remove it from your array and recalculate parity without it).
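To illustrate why an unreadable sector on one disk can block the rebuild of a different disk, here is a rough Python sketch. It assumes simple single-parity XOR across the data disks; the helper names (`xor_blocks`, `rebuild`) and the three-byte "disks" are invented for the example and do not come from unRAID itself.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equally sized byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild(surviving, parity):
    """Rebuild the one missing block from the surviving blocks plus parity.

    A surviving block of None stands in for an unreadable sector: every
    surviving disk is needed, so the rebuild cannot proceed without it.
    """
    if any(block is None for block in surviving):
        raise ValueError("cannot rebuild: a surviving disk has an unreadable block")
    return xor_blocks(list(surviving) + [parity])


# Toy data: three data disks and the parity computed from them.
disk1 = b"\x10\x20\x30"
disk2 = b"\x0f\x0e\x0d"
disk3 = b"\xaa\xbb\xcc"
parity = xor_blocks([disk1, disk2, disk3])

if __name__ == "__main__":
    # disk2 fails outright; disk1 and disk3 are fully readable, so it comes back.
    print(rebuild([disk1, disk3], parity) == disk2)
    # disk2 fails while disk1 has an unreadable block: the rebuild is refused.
    try:
        rebuild([None, disk3], parity)
    except ValueError as err:
        print(err)
```

In this toy model, moving files off the failing disk does not help on its own, because the parity was computed over every block of that disk, including the ones that no longer read back.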