Disk Errors Reported - Should I worry?


Hi there,

I've noticed the error count on Disk 2 of my array rising (61k at this point). I don't know exactly when it started to climb, but I *think* it began when I moved some data from Disk1 to Disk2. I don't believe I had any errors at all a month or so ago.

I've read that this might be caused by a temporary read error or a parity issue, but I could be mistaken. Reference: http://lime-technology.com/wiki/index.php?title=Troubleshooting#Obtaining_a_SMART_report

When I check the syslog I see "UNC" codes, which potentially point to a bad sector. So I ran Smarthistory, and it reports some errors I cannot interpret: "Current_Pending_Sector" and "Offline_Uncorrectable".
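
The two attributes named above can also be checked from a script. The sketch below is a hedged example, not part of unRAID or Smarthistory: it assumes the attribute table layout printed by `smartctl -A` (the same columns shown later in this thread) and applies a common rule of thumb that non-zero raw counts for Reallocated_Sector_Ct, Current_Pending_Sector or Offline_Uncorrectable point to failing media. The names `parse_raw_values`, `sector_warnings` and `SECTOR_HEALTH_ATTRS` are hypothetical.

```python
import re
from typing import Dict

# Attributes commonly read as signs of failing media (a rule of thumb,
# not a vendor specification).
SECTOR_HEALTH_ATTRS = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")

# One attribute row: ID, name, flag, value, worst, thresh, type, updated,
# when_failed, raw value. Only the leading integer of RAW_VALUE is kept.
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+(0x[0-9a-fA-F]+)\s+(\d+)\s+(\d+)\s+(\d+)"
    r"\s+(\S+)\s+(\S+)\s+(\S+)\s+(\d+)"
)


def parse_raw_values(smartctl_output: str) -> Dict[str, int]:
    """Map attribute name -> raw value; header and other lines are skipped."""
    values = {}
    for line in smartctl_output.splitlines():
        m = ROW.match(line)
        if m:
            values[m.group(2)] = int(m.group(10))
    return values


def sector_warnings(values: Dict[str, int]) -> Dict[str, int]:
    """Return the sector-health attributes whose raw count is non-zero."""
    return {name: values[name] for name in SECTOR_HEALTH_ATTRS if values.get(name, 0) > 0}


# Rows copied from the smartctl output posted in this thread.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   197   000    Old_age   Always       -       180
198 Offline_Uncorrectable   0x0030   200   197   000    Old_age   Offline      -       14
"""
print(sector_warnings(parse_raw_values(SAMPLE)))
# {'Current_Pending_Sector': 180, 'Offline_Uncorrectable': 14}
```

In practice you would feed it the text from `smartctl -A /dev/sdX`, where `/dev/sdX` stands for whichever device holds Disk 2.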

 

I'm at a loss... do I have a drive that's going bad? My other specs are in my signature.

I've attached a partial syslog (the full log was 9MB and would not compress below the file upload limit) and a screenshot of the Smarthistory errors.

Your help is greatly appreciated, as always.

Screenshot_SmartReport_Disk2_small2.jpg

syslog-2011-12-08_partial_3.zip

Yes, you should be worried. It very much appears you have a drive going bad.

You should get that drive replaced as soon as possible.

Once it's successfully replaced, you could run some tests on it to see what becomes of it, if you so desire. Personally, I'd just RMA it and get another one.

Peter

 

I agree.  It needs to be replaced... ASAP.

I have the same situation/question, and I'm guessing the same answer: RMA.

I ran a short SMART self-test on the drive. Here is the smartctl output after the short test. I'm running a long test now.

Any comments appreciated. Thanks.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   194   194   051    Pre-fail  Always       -       12786
  3 Spin_Up_Time            0x0027   186   163   021    Pre-fail  Always       -       5675
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       836
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   086   086   000    Old_age   Always       -       10705
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       21
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       12
193 Load_Cycle_Count        0x0032   171   171   000    Old_age   Always       -       88272
194 Temperature_Celsius     0x0022   115   107   000    Old_age   Always       -       35
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   197   000    Old_age   Always       -       180
198 Offline_Uncorrectable   0x0030   200   197   000    Old_age   Offline      -       14
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   195   000    Old_age   Offline      -       36

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: read failure       70%     10704         2928410663

SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

197 Current_Pending_Sector  0x0032   200   197   000    Old_age   Always       -       180
198 Offline_Uncorrectable   0x0030   200   197   000    Old_age   Offline      -       14

Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: read failure       70%     10704         2928410663

RMA.

Thanks for taking the time to comment. Will do. Now, do I buy a replacement drive, or move stuff off temporarily? Pondering...

Even if you move stuff off, if another disk were to fail, you might not be able to reconstruct it: rebuilding a failed disk from parity requires reading the matching sectors on every other disk, and this disk has unreadable sectors.

You can move stuff off, but you still need to RMA the drive (or remove it from your array and recalculate parity without it).
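
To make the parity argument concrete, here is a toy Python sketch. It assumes a single XOR parity disk, which is the scheme single-parity arrays such as unRAID's use as far as I know, applied sector by sector across the disks. A lost block equals the parity XORed with the matching blocks from every other disk, so one unreadable sector on a second disk leaves two unknowns and that stripe cannot be rebuilt. The function names and byte values are hypothetical.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def parity_of(blocks: List[bytes]) -> bytes:
    """Single (XOR) parity over the blocks at the same offset on each disk."""
    return reduce(xor_bytes, blocks)


def rebuild(blocks: List[Optional[bytes]], parity: bytes) -> bytes:
    """Rebuild the one unreadable block (None) from parity and the others.

    Single parity can solve for only one unknown per stripe, so this
    raises if more than one block is unavailable.
    """
    missing = [i for i, b in enumerate(blocks) if b is None]
    if len(missing) != 1:
        raise ValueError(f"need exactly one missing block, got {len(missing)}")
    survivors = [b for b in blocks if b is not None]
    return reduce(xor_bytes, survivors, parity)


disks = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]  # made-up sector contents
parity = parity_of(disks)
print(rebuild([disks[0], None, disks[2]], parity) == disks[1])  # True

try:
    # Disk 1 has failed AND disk 2 has an unreadable sector at this offset.
    rebuild([None, None, disks[2]], parity)
except ValueError as err:
    print(err)  # need exactly one missing block, got 2
```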
