
Disk replacement strategy - how to determine best replacement candidate


nia


Hi.

 

I've done some searching, but have not found exactly what I was looking for.

 

I want to replace one of the three oldest 400GB disks with a 1TB disk. I'm looking for advice as to which is the best to replace.

 

Attached are the smart reports from the candidates.

 

I also had a recent incident. The full log is attached, and part of the error is quoted here:

Jul 15 18:04:33 Tower kernel: ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen

Jul 15 18:04:33 Tower kernel: ata6.00: failed command: SMART

Jul 15 18:04:33 Tower kernel: ata6.00: cmd b0/da:00:00:4f:c2/00:00:00:00:00/00 tag 0

Jul 15 18:04:33 Tower kernel: res 40/00:00:00:00:00/00:00:00:00:00/10 Emask 0x4 (timeout)

Jul 15 18:04:33 Tower kernel: ata6.00: status: { DRDY }

Jul 15 18:04:33 Tower kernel: ata6: hard resetting link

Jul 15 18:04:33 Tower kernel: ata6: SATA link up 1.5 Gbps (SStatus 113 SControl 310)

Jul 15 18:04:33 Tower kernel: ata6.00: configured for UDMA/100

Jul 15 18:04:33 Tower kernel: ata6: EH complete
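
To see at a glance what an excerpt like this says, the lines can be grouped by ATA port. Below is a minimal Python sketch, not anything shipped with unRAID. The function name `summarize_ata_errors` is made up for illustration, and the regular expressions assume the libata message layout quoted above.

```python
import re
from collections import defaultdict

# Patterns assume the libata message layout quoted in this post.
FAILED_CMD = re.compile(r"kernel:\s+(ata\d+)\.\d+: failed command: (.+)$")
RES_LINE = re.compile(r"kernel:\s+res\s.*Emask 0x[0-9a-f]+ \(([^)]+)\)")
RESET = re.compile(r"kernel:\s+(ata\d+): hard resetting link")


def summarize_ata_errors(lines):
    """Return {port: {"failed": [...], "reasons": [...], "resets": n}}."""
    summary = defaultdict(lambda: {"failed": [], "reasons": [], "resets": 0})
    last_port = None
    for line in lines:
        line = line.rstrip()
        m = FAILED_CMD.search(line)
        if m:
            last_port = m.group(1)
            summary[last_port]["failed"].append(m.group(2))
            continue
        m = RESET.search(line)
        if m:
            summary[m.group(1)]["resets"] += 1
            continue
        # The "res ..." line names no port, so it is attributed to the
        # port of the most recent failed command.
        m = RES_LINE.search(line)
        if m and last_port:
            summary[last_port]["reasons"].append(m.group(1))
    return dict(summary)
```

Fed the excerpt above, this should report one failed SMART command on ata6, one timeout, and one link reset. It only says which port had trouble, not which disk sits on that port.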

 

According to other forums, this can be caused by something as simple as a bad SATA cable or a loose connection. However, could it also be a hint as to which disk to replace? And which of my disks does the error point to anyway?

 

Can anyone decipher the SMART reports and the above and suggest what to replace?

 

TIA

 

Niels

sdi.txt

sdh.txt

sdf.txt

ata6-exception.txt

Link to comment

No syslog was attached, so I can't say which disk the kernel has assigned to ata6.
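
For context, libata normally logs one identification line per device at boot, naming the drive on each port, which is why the full syslog matters here. A rough Python sketch of pulling those lines out follows. `ata_port_models` is a made-up name, and the regex assumes the usual `ataN.00: ATA-x: <model>, <firmware>, max ...` layout, so treat it as a starting point rather than a guaranteed match for every kernel version.

```python
import re

# Assumed layout of the boot-time identification line, e.g.
#   "... kernel: ata6.00: ATA-7: <model>, <firmware>, max UDMA/133"
IDENT = re.compile(r"kernel:\s+(ata\d+\.\d+): ATA(?:PI|-\d+): (.+), ([^,]+), max ")


def ata_port_models(lines):
    """Map device ids such as 'ata6.00' to (model, firmware) tuples."""
    found = {}
    for line in lines:
        m = IDENT.search(line)
        if m:
            found[m.group(1)] = (m.group(2).strip(), m.group(3).strip())
    return found
```

The model string found for ata6.00 can then be compared with the model and serial shown in each drive's SMART report to work out which sdX device it is.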

 

First, try reseating the cables. With a bad connection, "timeouts" are to be expected.

 

Next, try the following boot options, especially if you have an old motherboard.

In the syslinux.cfg file, add them to the append line:

append initrd=bzroot acpi=off noapic

 

If that does not help, try replacing the SATA cables.

 

SMART reports all show healthy drives.

 

 

Link to comment

Thanks for the quick and useful response  :) I will take the disk with the most hours logged then.

 

Attached is the syslog - sorry I didn't think it was needed for this  ::)

 

My hardware setup is in the signature. I suspect the cable, as this motherboard is apparently used by others without problems. I had the case open yesterday, so I may have bumped a cable somewhere. I will reseat everything.

syslog-2010-07-17.txt

Link to comment

They are alike on

 

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE

  5 Reallocated_Sector_Ct   0x0033   253   253   010    Pre-fail  Always       -       0

However, they differ here:

One is

197 Current_Pending_Sector  0x0012   253   253   000    Old_age   Always       -       0

 

the other two

197 Current_Pending_Sector  0x0012   253   100   000    Old_age   Always       -       0

What's best on the "Worst": 100 or 253?  ???

Link to comment

They are alike on

 

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE

  5 Reallocated_Sector_Ct   0x0033   253   253   010    Pre-fail  Always       -       0

However, they differ here:

One is

197 Current_Pending_Sector  0x0012   253   253   000    Old_age   Always       -       0

 

the other two

197 Current_Pending_Sector  0x0012   253   100   000    Old_age   Always       -       0

What's best on the "Worst": 100 or 253?  ???

253 is the initial value the manufacturer assigns to that parameter. It will probably be set to 100, like the others, once you get a few hours on the drive.

 

It is not an indication of a problem, but of a brand new drive too new to have "worst" normalized values set.
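
To make the columns concrete: VALUE, WORST and THRESH are vendor-normalized numbers, while RAW_VALUE is the underlying count. Here is a small Python sketch that reads one row of the attribute table as pasted above. The helper names `parse_attr` and `needs_attention` are illustrative only, and the rule inside `needs_attention` is an assumed rule of thumb, not an official criterion.

```python
from typing import NamedTuple


class SmartAttr(NamedTuple):
    attr_id: int
    name: str
    value: int
    worst: int
    thresh: int
    raw: str


def parse_attr(line):
    """Parse one row of the SMART attribute table."""
    f = line.split()
    # Columns: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW...
    return SmartAttr(int(f[0]), f[1], int(f[3]), int(f[4]), int(f[5]),
                     " ".join(f[9:]))


def needs_attention(a):
    """Assumed rule of thumb: threshold crossed, or nonzero sector counts."""
    crossed = a.thresh > 0 and a.value <= a.thresh
    bad_sectors = a.attr_id in (5, 197) and a.raw.split()[0] != "0"
    return crossed or bad_sectors
```

With the rows quoted in this thread, neither the WORST of 253 nor the WORST of 100 trips the check, because the raw counts are 0 and no normalized value has reached its threshold.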

Link to comment

Thanks Joe, but these drives all have 11,000+ hours under their belts, so they are not really new anymore...  ;)

 

However, as none of these has problems as you state, I will take the oldest 14k+ hrs out, and then let unRAID do its magic.

 

I would assume that I simply close the server down, replace the 400G drive with the new 1T drive, power up, and then unRAID will have identified the replacement and ask if I want to rebuild? 

Link to comment

Thanks Joe, but these drives all have 11,000+ hours under their belts, so they are not really new anymore...  ;)

Then the drive manufacturer is not changing the "normalized" value unless the "raw" attribute for re-allocated sectors starts incrementing.  Your examples from those drives look fine.

However, as none of these has problems as you state, I will take the oldest 14k+ hrs out, and then let unRAID do its magic.

 

I would assume that I simply close the server down, replace the 400G drive with the new 1T drive, power up, and then unRAID will have identified the replacement and ask if I want to rebuild? 

That is it exactly.  Remember to press the "Start" button to begin the re-construction of the old contents onto the new drive. 

 

If you are on unRAID 4.5.3 or earlier, DO NOT PRESS THE BUTTON LABELED "restore", as it has nothing to do with re-construction of a replacement disk. From 4.5.4 through 4.5.6 the button no longer exists; it was replaced by an equivalent "initconfig" command-line command (Initialize Configuration and Immediately Invalidate Parity). It is NOT what you want when you are replacing a drive.

Link to comment

That is it exactly.   Remember to press the "Start" button to begin the re-construction of the old contents onto the new drive. 

In preparation, I upgraded to the latest version, 4.5.6, and ran a complete parity check. Now the regeneration is humming along happily, and it looks very convincing so far  :)

 

 

Link to comment

Archived

This topic is now archived and is closed to further replies.
