Read errors, smart long test won't finish, badblocks says drive is fine


sureguy

Hey all,

 

Disk 2 on my server shows several read errors in the syslog and 256 errors in the GUI. It will not finish a SMART long test, but it does finish a SMART short test.  I ran:

 

badblocks -s -v -o /tmp/badblocks_sdj.txt /dev/sdj

 

against the drive and it returned 0 errors.  The errors in the syslog look like the following:

 

Feb 18 00:06:55 phatstore kernel: md: disk2 read error, sector=913165296
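
For anyone digging through a long syslog for these, a small script can list every affected sector. This is only a minimal sketch: it assumes every line follows exactly the format shown above, and the name `md_read_error_sectors` is made up for illustration.

```python
import re
from collections import Counter

# Matches lines such as:
#   Feb 18 00:06:55 phatstore kernel: md: disk2 read error, sector=913165296
MD_READ_ERROR = re.compile(r"md: (disk\d+) read error, sector=(\d+)")


def md_read_error_sectors(lines):
    """Count read errors per (disk, sector) pair found in syslog lines."""
    counts = Counter()
    for line in lines:
        match = MD_READ_ERROR.search(line)
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts


sample = [
    "Feb 18 00:06:55 phatstore kernel: md: disk2 read error, sector=913165296",
]
for (disk, sector), n in sorted(md_read_error_sectors(sample).items()):
    print(f"{disk} sector {sector}: {n} error(s)")
```

To run it on a real log, pass an open file instead of `sample`, for example `md_read_error_sectors(open("syslog.txt"))` (the file name is a placeholder).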

 

I'm not quite sure where to go from here.  Should I move all the data off disk 2, pause the mover, run badblocks in write mode, then zero the drive and see if it can pass a SMART long test?  Or is there something else I should do?

 

I've attached my syslog (sorry for the length) and the SMART report for the drive.

syslog19feb2015.txt.zip

smartdisk2.txt

Link to comment

Also, if this nearly 3-year-old (2.94 years) drive is still inside its warranty period, I would start an advanced RMA process. This is where they ship you a replacement drive first and then you ship back your drive. You will need to secure it with a credit card, which is charged ONLY IF you don't return the defective drive by the date given, typically 30 days. I have never had issues with this.

 

Never mind, it looks like the drive is no longer under warranty; it ended in September 2014 -- http://wdsupport.wdc.com/warranty/warrantycheck.asp?custtype=end

 

In my experience, once a drive shows READ FAILURE during SMART tests (short or long), it's downhill from there.

Link to comment

If I'm reading the SMART results correctly, the drive itself is aborting the extended tests because of read failures:

 

Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       70%     25746         1239633904
# 2  Extended offline    Completed: read failure       70%     25729         1239633904
# 3  Extended offline    Completed: read failure       70%     25727         942590888
# 4  Short offline       Completed without error       00%     25725         -
# 5  Extended offline    Completed: read failure       60%     25725         1239633904
# 6  Extended offline    Completed without error       00%     13953         -
# 7  Extended offline    Interrupted (host reset)      10%       547         -

 

But it's stopping at different points (which more or less correspond to the sectors where unRAID is reporting the read errors).
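
To make the "different points" easier to see, the self-test log can be grouped by LBA. This is a rough sketch that assumes the column layout of the smartctl output pasted above; the regular expression and the name `failing_lbas` are my own and are not part of smartctl.

```python
import re
from collections import defaultdict

# Self-test log rows copied from the SMART report above.
SELFTEST_LOG = """\
# 1  Extended offline    Completed: read failure       70%     25746         1239633904
# 2  Extended offline    Completed: read failure       70%     25729         1239633904
# 3  Extended offline    Completed: read failure       70%     25727         942590888
# 4  Short offline       Completed without error       00%     25725         -
# 5  Extended offline    Completed: read failure       60%     25725         1239633904
# 6  Extended offline    Completed without error       00%     13953         -
# 7  Extended offline    Interrupted (host reset)      10%       547         -
"""

# Groups: test number, description, status, remaining %, lifetime hours, LBA or "-".
ROW = re.compile(r"^#\s*(\d+)\s+(.+?)\s{2,}(.+?)\s+(\d+)%\s+(\d+)\s+(\d+|-)\s*$")


def failing_lbas(log_text):
    """Map each LBA_of_first_error to the lifetime hours of the tests that hit it."""
    hits = defaultdict(list)
    for line in log_text.splitlines():
        match = ROW.match(line.strip())
        if match and match.group(6) != "-":
            hits[int(match.group(6))].append(int(match.group(5)))
    return dict(hits)


for lba, hours in sorted(failing_lbas(SELFTEST_LOG).items()):
    print(f"LBA {lba}: failed {len(hours)} time(s), at power-on hours {hours}")
```

On the log above this finds only two distinct LBAs: 942590888 (one failure) and 1239633904 (three failures).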

 

The weird thing is that no sectors are reallocated, and none are pending.

 

I suppose one possibility could be power-related: a loose cable or a weak supply causing the aborts / read errors.  You've got nothing to lose by reseating the power cable and trying it again (make sure that spin down is disabled for the drive).

 

You might want to check this out regarding dd and that failure.  http://www.linuxquestions.org/questions/linux-newbie-8/smartctl-read-failure%3B-is-my-hd-failing-920243/
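
One way to check whether a specific LBA from the self-test log is readable is to read just that one sector from the raw device. The sketch below is my own and is not taken from the linked thread. It assumes 512-byte logical sectors (check what your drive reports), needs root to open a device such as /dev/sdj, and `read_sector` is a hypothetical helper name. Without direct I/O the kernel may read a larger chunk around the sector or answer from cache, so treat the result as a hint only.

```python
import os
import sys

SECTOR_SIZE = 512  # assumption: 512-byte logical sectors


def read_sector(device_path, lba, sector_size=SECTOR_SIZE):
    """Read one logical sector; an unreadable sector usually raises OSError (EIO)."""
    fd = os.open(device_path, os.O_RDONLY)
    try:
        # pread reads at an absolute byte offset without moving the file position.
        return os.pread(fd, sector_size, lba * sector_size)
    finally:
        os.close(fd)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Example invocation (as root): python3 readsector.py /dev/sdj 1239633904
    device, lba = sys.argv[1], int(sys.argv[2])
    try:
        data = read_sector(device, lba)
        print(f"LBA {lba}: read {len(data)} bytes OK")
    except OSError as err:
        print(f"LBA {lba}: read failed: {err}")
```

This only reads, so it should not change anything on the drive. A successful read here does not prove the drive is healthy, since marginal sectors can read fine on one attempt and fail on the next.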

 

 

But myself, I wouldn't trust the drive, if only because the LBA of the first error keeps changing and no sectors are showing up as reallocated or pending.

Link to comment

I'd think it was a power issue, but I ran the tests multiple times and it always fails at one of two sectors (and all my other array drives were spun down, though the cache drive may have been active).  After all the data is moved I might throw SpinRite at the drive to see what it says (after I get a new drive, as this one is out of warranty).

 

Thanks for the reply!

Link to comment

Was super happy to discover it's 6 months out of warranty when I checked - thanks for taking the time to check too!  And for the advice.  Much appreciated.

Link to comment
