Disk replacement strategy - how to determine best replacement candidate - Hardware

July 17, 2010

Hi.

I've done some searching, but have not found exactly what I was looking for.

I want to replace one of the three oldest 400GB disks with a 1TB disk. I'm looking for advice as to which is the best to replace.

Attached are the smart reports from the candidates.

I also had a recent incident. The full log is attached, and part of the error is quoted here:
Jul 15 18:04:33 Tower kernel: ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen

Jul 15 18:04:33 Tower kernel: ata6.00: failed command: SMART

Jul 15 18:04:33 Tower kernel: ata6.00: cmd b0/da:00:00:4f:c2/00:00:00:00:00/00 tag 0

Jul 15 18:04:33 Tower kernel: res 40/00:00:00:00:00/00:00:00:00:00/10 Emask 0x4 (timeout)

Jul 15 18:04:33 Tower kernel: ata6.00: status: { DRDY }

Jul 15 18:04:33 Tower kernel: ata6: hard resetting link

Jul 15 18:04:33 Tower kernel: ata6: SATA link up 1.5 Gbps (SStatus 113 SControl 310)

Jul 15 18:04:33 Tower kernel: ata6.00: configured for UDMA/100

Jul 15 18:04:33 Tower kernel: ata6: EH complete

According to other forums, this can be as simple as a bad SATA cable or a loose connection. However, might it also point to which disk to replace? And which of my disks does the error refer to anyway?
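For anyone wanting to summarize such entries from a longer syslog, here is a minimal Python sketch. It uses only the lines quoted above; the function name `summarize` and the regular expression are my own assumptions, not part of unRAID or of any kernel tool. Note that the log names only the port (ata6), so mapping the port to a physical disk still requires the full syslog.

```python
import re
from collections import Counter

# Log lines copied from the post above.
LOG = """\
Jul 15 18:04:33 Tower kernel: ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Jul 15 18:04:33 Tower kernel: ata6.00: failed command: SMART
Jul 15 18:04:33 Tower kernel: res 40/00:00:00:00:00/00:00:00:00:00/10 Emask 0x4 (timeout)
Jul 15 18:04:33 Tower kernel: ata6: hard resetting link
Jul 15 18:04:33 Tower kernel: ata6: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
"""

# Matches "ata6: msg" and "ata6.00: msg"; the device suffix is dropped.
PORT_RE = re.compile(r"kernel: (ata\d+)(?:\.\d+)?: (.*)")


def summarize(log_text):
    """Count exceptions and link resets per ATA port and collect failed commands."""
    events = Counter()
    failed = {}
    for line in log_text.splitlines():
        match = PORT_RE.search(line)
        if not match:
            continue  # e.g. the "res ..." line, which carries no port name
        port, msg = match.groups()
        if msg.startswith("exception"):
            events[(port, "exception")] += 1
        elif msg.startswith("hard resetting link"):
            events[(port, "reset")] += 1
        elif msg.startswith("failed command:"):
            failed.setdefault(port, []).append(msg.split(":", 1)[1].strip())
    return events, failed


events, failed = summarize(LOG)
print(dict(events))
print(failed)
```

A port that shows up repeatedly with timeouts and resets is the one whose cable and connector deserve a look first.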

Can anyone decipher the SMART reports and the above and suggest what to replace?

TIA

Niels

July 17, 2010

No syslog was attached, so I can't say which disk ata6 is.

First, try reseating the cables. With a bad connection, "timeouts" are to be expected.

Next, try the following boot options, especially if you have an old motherboard.

In the syslinux.cfg file, add them after bzroot on the append line:

bzroot acpi=off noapic

If that does not help, try replacing the SATA cables.

SMART reports all show healthy drives.

July 17, 2010

Author

Thanks for the quick and useful response. I will take the disk with the most hours logged, then.

Attached is the syslog - sorry, I didn't think it was needed for this ::)

My hardware setup is in the signature. I suspect the cable, as the MoBo is apparently normally used without problems. I had the case open yesterday, so I may have pushed a cable somewhere - I will reseat everything.

syslog-2010-07-17.txt

July 17, 2010

SAMSUNG_HD401LJ_S0HVJ1KL901268 on Port 2 of the Silicon Image 3114 card.

July 17, 2010

Author

> SAMSUNG_HD401LJ_S0HVJ1KL901268 on Port 2 of the Silicon Image 3114 card.

Thanks - it's also the oldest one, with 14793 hours under its belt. So I'll replace this one AND the cable.

July 17, 2010

In addition to power-on hours, one should also look at the pending sector count (Current_Pending_Sector) and the reallocated sector count (Reallocated_Sector_Ct).
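As a rough illustration of that ordering, here is a minimal Python sketch that puts drives with reallocated or pending sectors first and uses power-on hours only as a tie-breaker. The `Candidate` class, the drive names, and every number except the 14793 hours mentioned in this thread are hypothetical placeholders, and the weighting is just one reasonable choice, not an established rule.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    reallocated: int     # RAW_VALUE of attribute 5, Reallocated_Sector_Ct
    pending: int         # RAW_VALUE of attribute 197, Current_Pending_Sector
    power_on_hours: int  # RAW_VALUE of Power_On_Hours


def replacement_order(candidates):
    """Most urgent first: sector problems outrank age, hours break ties."""
    return sorted(
        candidates,
        key=lambda c: (c.reallocated + c.pending, c.power_on_hours),
        reverse=True,
    )


candidates = [
    Candidate("disk_b", 0, 0, 11500),  # placeholder values
    Candidate("disk_a", 0, 0, 14793),  # hours quoted in this thread
    Candidate("disk_c", 0, 0, 11200),  # placeholder values
]

print([c.name for c in replacement_order(candidates)])
```

With all sector counts at zero the oldest drive sorts first, but a young drive with even a few pending sectors would move ahead of it.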

July 17, 2010

Author

They are alike on

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   253   253   010    Pre-fail  Always       -       0

However, they differ here.

One is

197 Current_Pending_Sector  0x0012   253   253   000    Old_age   Always       -       0

the other two

197 Current_Pending_Sector  0x0012   253   100   000    Old_age   Always       -       0

Which is better in the "Worst" column: 100 or 253?

July 17, 2010

> They are alike on
>
> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
>   5 Reallocated_Sector_Ct   0x0033   253   253   010    Pre-fail  Always       -       0
>
> However, they differ here.
>
> One is
>
> 197 Current_Pending_Sector  0x0012   253   253   000    Old_age   Always       -       0
>
> the other two
>
> 197 Current_Pending_Sector  0x0012   253   100   000    Old_age   Always       -       0
>
> Which is better in the "Worst" column: 100 or 253?

253 is the initial value the manufacturer gives the attribute. It will probably be set to 100, like the others, once you get a few hours on the drive.

It is not an indication of a problem, but of a brand-new drive, too new to have "worst" normalized values set.
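To make the VALUE / WORST / THRESH relationship concrete, here is a minimal Python sketch that parses the three rows quoted in this thread. It assumes the usual smartctl convention that an attribute counts as failed when its normalized value is at or below a non-zero threshold; the helper names (`parse_row`, `failing_now`, `failed_in_past`) are my own, and the parser assumes the 10-column layout shown above with a purely numeric RAW_VALUE.

```python
from typing import NamedTuple


class Attr(NamedTuple):
    attr_id: int
    name: str
    value: int   # current normalized value
    worst: int   # lowest normalized value recorded so far
    thresh: int  # failure threshold set by the manufacturer
    raw: int     # raw counter


def parse_row(line: str) -> Attr:
    """Parse one row: ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE."""
    f = line.split()
    return Attr(int(f[0]), f[1], int(f[3]), int(f[4]), int(f[5]), int(f[9]))


def failing_now(a: Attr) -> bool:
    # A threshold of 0 means the attribute can never trip.
    return a.thresh > 0 and a.value <= a.thresh


def failed_in_past(a: Attr) -> bool:
    return a.thresh > 0 and a.worst <= a.thresh


# Rows copied from the posts above.
ROWS = [
    "5 Reallocated_Sector_Ct 0x0033 253 253 010 Pre-fail Always - 0",
    "197 Current_Pending_Sector 0x0012 253 253 000 Old_age Always - 0",
    "197 Current_Pending_Sector 0x0012 253 100 000 Old_age Always - 0",
]

parsed = [parse_row(r) for r in ROWS]
for a in parsed:
    print(a.name, a.value, a.worst, a.thresh, a.raw,
          failing_now(a), failed_in_past(a))
```

Both WORST values, 253 and 100, sit far above the thresholds here, and the raw counts are all 0, so neither variant points to a problem.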

July 17, 2010

Author

Thanks Joe, but these drives all have 11,000+ hours under their belts, so they are not really new anymore...

However, as none of these has problems, as you state, I will take the oldest one (14k+ hrs) out, and then let unRAID do its magic.

I would assume that I simply shut the server down, replace the 400GB drive with the new 1TB drive, power up, and then unRAID will have identified the replacement and ask if I want to rebuild?

July 17, 2010

> Thanks Joe, but these drives all have 11,000+ hours under their belts, so they are not really new anymore...

Then the drive manufacturer is not changing the "normalized" value unless the "raw" attribute for re-allocated sectors starts incrementing. Your examples from those drives look fine.

> However, as none of these has problems, as you state, I will take the oldest one (14k+ hrs) out, and then let unRAID do its magic.
>
> I would assume that I simply shut the server down, replace the 400GB drive with the new 1TB drive, power up, and then unRAID will have identified the replacement and ask if I want to rebuild?

That is it exactly. Remember to press the "Start" button to begin the re-construction of the old contents onto the new drive.

If you are on unRAID 4.5.3 or earlier, DO NOT PRESS THE BUTTON LABELED "restore", as it has nothing to do with re-construction of a replacement disk. From 4.5.4 through 4.5.6, the button no longer exists; it was replaced by an equivalent "initconfig" command-line command (Initialize Configuration and Immediately Invalidate Parity). It is NOT what you want when you are replacing a drive.

July 18, 2010

Author

> That is it exactly. Remember to press the "Start" button to begin the re-construction of the old contents onto the new drive.

In preparation, I upgraded to the latest version, 4.5.6, and ran a complete parity check. Now the rebuild is humming along happily - looking very convincing so far.
