2 Smart failures?


woolooloo

I just pulled a pair of 250 GB SATA 3.0 Samsung drives out of my desktop, where I had been using them without problems for a couple of years. I figured I'd throw one in the array and use it as a cache drive. However, my new Adaptec SATA controller reported a SMART failure on it, and smartctl said it was failing now on spin-up time. I pulled the drive, stuck the other one into the same slot, and got the exact same error. I put a newer 500GB drive into the slot and there was no error. I put the 250 into another slot on the same controller: same problem.

 

I just ran a short test on one of them and the results are below. I guess I'm asking what the chances are that both of these drives, which have been working fine, are having the exact same failure at the same time. Could it be that this drive model just has a problem with this particular test? I don't want to use it as a cache drive if it is going to fail, but I don't want to toss a perfectly good drive either.

 

Another problem is the Adaptec card sits and waits for you to acknowledge the error before booting up, so I'm not sure if there is a way to get around that or not.

 

=== START OF INFORMATION SECTION ===
Device Model:     SAMSUNG SP2504C
Serial Number:    S09QJ1UA125759
Firmware Version: VT100-33
User Capacity:    250,059,350,016 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  ATA/ATAPI-7 T13 1532D revision 4a
Local Time is:    Fri Nov  7 22:09:18 2008 GMT+5
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
Drive failure expected in less than 24 hours. SAVE ALL DATA.
See vendor-specific Attribute list for failed Attributes.

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (  73) The previous self-test completed having
                                        a test element that failed and the test
                                        element that failed is not known.
Total time to complete Offline
data collection:                 (4838) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        (  80) minutes.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   253   100   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0007   001   001   025    Pre-fail  Always   FAILING_NOW 30208
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       939
  5 Reallocated_Sector_Ct   0x0033   253   253   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   253   253   051    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0025   253   253   015    Pre-fail  Offline      -       0
  9 Power_On_Half_Minutes   0x0032   100   100   000    Old_age   Always       -       141h+13m
 10 Spin_Retry_Count        0x0033   253   253   051    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x0012   253   002   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       460
190 Unknown_Attribute       0x0022   139   127   000    Old_age   Always       -       33
194 Temperature_Celsius     0x0022   139   127   000    Old_age   Always       -       33
195 Hardware_ECC_Recovered  0x001a   100   100   000    Old_age   Always       -       1415
196 Reallocated_Event_Count 0x0032   253   253   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   253   253   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   253   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x000a   253   100   000    Old_age   Always       -       0
201 Soft_Read_Error_Rate    0x000a   253   100   000    Old_age   Always       -       0
202 TA_Increase_Count       0x0032   253   253   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: unknown failure    90%     16947         -
# 2  Short offline       Completed: unknown failure    90%     16947         -
# 3  Short offline       Completed: unknown failure    90%     16947         -

SMART Selective Self-Test Log Data Structure Revision Number (0) should be 1
SMART Selective self-test log data structure revision number 0
Warning: ATA Specification requires selective self-test log data structure revision number = 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

Link to comment

The SMART test is done by the firmware on the drive itself. If it is saying the drive takes too long to spin up to pass its test, that usually indicates the drive is experiencing excessive friction in its bearings. Of course, it could have been reporting this all along, even when brand new, and your old PC never bothered to tell you about it. Perhaps its disk controller did not care.
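
To make the FAILING_NOW flag concrete: in the attribute table, the drive's normalized VALUE for each attribute is compared with its THRESH, and a value at or below a non-zero threshold is reported as failing. Here is a minimal Python sketch of that comparison, assuming the unwrapped smartctl attribute layout; the name `failing_attributes` is made up for this example and is not part of smartmontools.

```python
import re

# One row of the "Vendor Specific SMART Attributes" table:
# ID, name, flag, VALUE, WORST, THRESH, TYPE (remaining columns ignored).
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+(\d+)\s+(\d+)\s+(\d+)\s+(\S+)"
)

def failing_attributes(smartctl_output):
    """Return (id, name, value, thresh, type) for rows with VALUE <= THRESH."""
    failing = []
    for line in smartctl_output.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # header, wrapped, or unrelated line
        attr_id, name, value, _worst, thresh, attr_type = m.groups()
        value, thresh = int(value), int(thresh)
        # A threshold of 0 means the attribute can never trip.
        if thresh > 0 and value <= thresh:
            failing.append((int(attr_id), name, value, thresh, attr_type))
    return failing

# Three rows copied from the output posted above.
sample = """\
  1 Raw_Read_Error_Rate     0x000f   253   100   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0007   001   001   025    Pre-fail  Always   FAILING_NOW 30208
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       939
"""

for attr in failing_attributes(sample):
    print(attr)
```

On the posted data only Spin_Up_Time (value 1 against threshold 25) trips, which matches what smartctl flagged. It says nothing about why the drive reports that value on this controller.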

 

Or, if the 250GB drives are old, they might draw a lot more current from the power supply than the newer 500GB drive. That could be stressing a marginal power supply. I'd put the drive in another PC, then download the manufacturer's diagnostic disk and use it to test the drive there.

 

Disk drives are too cheap these days to trust your data to one that is marginal. 

(Although, sometimes we are very frugal, and want to use them until they crash, taking our data with them...)

 

Joe L.

Link to comment

Thanks Joe. This drive makes 13 on a good-quality 650W power supply. I guess if it has higher current requirements it could be stressing the system when they are all spinning up, or should 650W be OK for this? In any case, I will put it back in my desktop and find some Samsung diagnostic tools as you suggest, though that will put a different PS into play.

 

I guess I could pull half the drives in my unRaid to reduce the load and see if it can power up without the error.

 

I agree with what you are saying about trusting the data to a bad drive though, it just seemed unlikely that both of these drives are bad, but they could be from the same lot, so who knows.

 

Since my unRaid is normally headless (I just installed a head since I installed the new Adaptec SATA card and wanted to make sure it was ok), do you know if there is a way to get automatically notified of SMART failures?
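
One possible approach on a headless box is to run smartctl from a scheduled script and look at its exit status, which is a bitmask documented in the smartctl man page. The sketch below is only a starting point under some assumptions: `smartctl` is on the PATH, `/dev/sda` is a placeholder device name, and how the alert actually reaches you (mail, syslog, something else) is left out. The function names are made up for this example.

```python
import subprocess

# Bit meanings of smartctl's exit status, paraphrased from the man page.
EXIT_BITS = {
    0: "command line did not parse",
    1: "device open failed",
    2: "a SMART command to the disk failed",
    3: "SMART status check returned DISK FAILING",
    4: "prefail attributes found at or below threshold",
    5: "attributes were at or below threshold in the past",
    6: "device error log contains errors",
    7: "self-test log contains errors",
}

def decode_exit_status(status):
    """Translate a smartctl exit status into a list of messages."""
    return [msg for bit, msg in sorted(EXIT_BITS.items()) if status & (1 << bit)]

def check_drive(device):
    """Run smartctl health and attribute checks on one device."""
    result = subprocess.run(
        ["smartctl", "-H", "-A", device],
        capture_output=True,
        text=True,
    )
    return decode_exit_status(result.returncode)

if __name__ == "__main__":
    # Placeholder device; substitute the drives in your own array.
    for problem in check_drive("/dev/sda"):
        print("/dev/sda:", problem)
```

smartmontools also includes the smartd daemon, which can watch drives and send mail on its own; whether it is present and configured on a given unRaid build is something to check before relying on it.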

 

Thanks again.

Link to comment

This problem does seem to point to a PSU issue. Either those drives suck a huge amount of current at spinup, or the PSU could be defective and not delivering its full oomph.
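
For a back-of-envelope check of the spin-up load, the arithmetic is just drives times spin-up current on the 12 V rail, compared with what the rail is rated for. In the Python sketch below, the 2 A per drive and the 40 A rail rating are placeholder assumptions, not specs for these Samsungs or for this PSU; the real figures are on the drive label and the PSU sticker.

```python
def spinup_load(n_drives, amps_per_drive_12v, rail_amps_12v):
    """Return (total amps, total watts, fraction of the 12 V rail used)."""
    total_amps = n_drives * amps_per_drive_12v
    total_watts = total_amps * 12.0
    return total_amps, total_watts, total_amps / rail_amps_12v

# 13 drives spinning up at once; 2.0 A and 40.0 A are assumed values.
amps, watts, fraction = spinup_load(13, 2.0, 40.0)
print(f"{amps:.0f} A on 12 V = {watts:.0f} W, {fraction:.0%} of the rail")
```

With those assumed numbers the simultaneous spin-up comes to 26 A, which is why the 12 V rail rating matters more here than the 650W headline figure.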

 

If it is not the PSU, I am still a bit skeptical that two drives of exactly the same model and vintage would fail with exactly the same symptoms. Samsung drives have a reputation for being a bit quirky, and you could be hitting such a quirk with your controller. (Search the forums and you'll find numerous issues with temperature reporting.) A few years back I had some incompatibility problems between certain motherboards and certain 250G/300G SATA hard drives. (I think these were Maxtor, but maybe it was WD.)

 

Although I agree with the sentiment of not taking unnecessary risks with an iffy drive, I would not just toss these out without doing more research and testing to determine whether they work in other systems. Although 250G is not terribly large by today's standards, it is a useful size to have in a "hand-me-down" PC or a test unRAID server.

 

Link to comment

Had a chance to play with this a bit.  First I pulled about 10 drives out to reduce the load on the PS and still had the same problem.  I really don't think it is related to the PS at this point.

 

I pulled the drive off my new PCIe Adaptec controller and put it on one of my PCI Promise controllers.  Smartctl passed and only showed that that test had failed in the past.  A short test also passed where short tests failed with an unknown error on the other controller.  Any ideas about why switching the controller would make a difference?

 

I'm going to throw one of the drives into a Windows box today and test it out, but I'm guessing at this point the drives are fine.

 

Unrelated - thank you to everyone who helped getting a clean shutdown on power button working, I finally got to test that today and it is great!

Link to comment
