nia Posted July 17, 2010

Hi. I've done some searching, but have not found exactly what I was looking for. I want to replace one of my three oldest 400GB disks with a 1TB disk, and I'm looking for advice on which one is best to replace. Attached are the SMART reports for the candidates.

I also had a recent incident. The log is attached, and part of the error is quoted here:

Jul 15 18:04:33 Tower kernel: ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Jul 15 18:04:33 Tower kernel: ata6.00: failed command: SMART
Jul 15 18:04:33 Tower kernel: ata6.00: cmd b0/da:00:00:4f:c2/00:00:00:00:00/00 tag 0
Jul 15 18:04:33 Tower kernel: res 40/00:00:00:00:00/00:00:00:00:00/10 Emask 0x4 (timeout)
Jul 15 18:04:33 Tower kernel: ata6.00: status: { DRDY }
Jul 15 18:04:33 Tower kernel: ata6: hard resetting link
Jul 15 18:04:33 Tower kernel: ata6: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Jul 15 18:04:33 Tower kernel: ata6.00: configured for UDMA/100
Jul 15 18:04:33 Tower kernel: ata6: EH complete

According to other forums, this can be as simple as a bad SATA cable or a loose connection. However, might it also point towards which disk to replace? Which of my disks does the error point to anyway?

Can anyone decipher the SMART reports and the log above and suggest what to replace?

TIA
Niels

Attachments: sdi.txt, sdh.txt, sdf.txt, ata6-exception.txt
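
For anyone scanning a longer syslog for the same pattern, here is a minimal Python sketch that tallies libata exceptions per port, the command that failed, and the reason given on the `res` line. The log lines are taken from the excerpt above. The function name `summarize` and the regular expressions are illustrative assumptions, written only against the line shapes shown in this post, and are not part of unRAID or any kernel tooling.

```python
import re
from collections import Counter

# Lines copied from the excerpt quoted in the post above.
LOG = """\
Jul 15 18:04:33 Tower kernel: ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Jul 15 18:04:33 Tower kernel: ata6.00: failed command: SMART
Jul 15 18:04:33 Tower kernel: res 40/00:00:00:00:00/00:00:00:00:00/10 Emask 0x4 (timeout)
Jul 15 18:04:33 Tower kernel: ata6: hard resetting link
Jul 15 18:04:33 Tower kernel: ata6: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
"""

# Patterns are assumptions based only on the line shapes seen above.
EXCEPTION = re.compile(r"kernel:\s+(ata\d+)(?:\.\d+)?: exception ")
FAILED = re.compile(r"kernel:\s+(ata\d+)(?:\.\d+)?: failed command: (.+)$")
REASON = re.compile(r"kernel:\s+res .* Emask 0x[0-9a-f]+ \((\w[\w ]*)\)")


def summarize(log_text):
    """Return (exceptions per port, failed commands per port, reasons)."""
    exceptions, commands, reasons = Counter(), Counter(), Counter()
    for line in log_text.splitlines():
        line = line.rstrip()
        if m := EXCEPTION.search(line):
            exceptions[m.group(1)] += 1
        elif m := FAILED.search(line):
            commands[(m.group(1), m.group(2))] += 1
        elif m := REASON.search(line):
            reasons[m.group(1)] += 1
    return exceptions, commands, reasons


if __name__ == "__main__":
    for counter in summarize(LOG):
        print(dict(counter))
```

A single timeout on one port, as here, fits the cable or connection explanation. Many exceptions on the same port over time would be a stronger reason to look at that drive or its port.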
Kaygee Posted July 17, 2010

No syslog, so I can't say which disk it thinks ATA6 is.

First, try reseating the cables. A bad connection means timeouts are to be expected.

Next, especially if you have an old motherboard, try the following boot options in the syslinux cfg file:

bzroot acpi=off noapic

Then try replacing the SATA cables.

The SMART reports all show healthy drives.
nia Posted July 17, 2010

Thanks for the quick and useful response. I will take the disk with the most hours logged, then.

Attached is the syslog. Sorry, I didn't think it was needed for this. My hardware setup is in my signature.

I suspect the cable, as the motherboard is apparently normally used without problems. I had the case open yesterday, so I may have pushed a cable somewhere. I will reseat everything.

Attachment: syslog-2010-07-17.txt
Kaygee Posted July 17, 2010

SAMSUNG_HD401LJ_S0HVJ1KL901268 on Port 2 of the Silicon Image 3114 card.
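
One way to make this kind of lookup in a syslog is to find the line the kernel prints when it identifies the device on each `ataN` port, which has the general shape `ataN.00: ATA-x: MODEL, FIRMWARE, max MODE`. Below is a minimal Python sketch under that assumption. The sample lines, firmware strings, and the function name `port_models` are hypothetical, since the actual syslog is not quoted in this thread.

```python
import re

# Assumed shape of the kernel's device identification line.
IDENTIFY = re.compile(
    r"kernel:\s+(ata\d+)\.\d+: ATA-\d+: ([^,]+), ([^,]+), max (\S+)"
)


def port_models(log_text):
    """Map each ataN port to the model string the kernel reported for it."""
    models = {}
    for line in log_text.splitlines():
        m = IDENTIFY.search(line)
        if m:
            models[m.group(1)] = m.group(2).strip()
    return models


# Hypothetical sample lines; firmware strings and timestamps are made up.
SAMPLE = """\
Jul 17 09:00:01 Tower kernel: ata5.00: ATA-7: EXAMPLE MODEL A, FW0001, max UDMA/133
Jul 17 09:00:01 Tower kernel: ata6.00: ATA-7: SAMSUNG HD401LJ, FW0002, max UDMA/100
"""

if __name__ == "__main__":
    print(port_models(SAMPLE).get("ata6"))
```

With several drives of the same model, the model string alone does not single out one disk. The serial number has to come from elsewhere in the syslog or from the drive's SMART report.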
nia Posted July 17, 2010

Quote: "SAMSUNG_HD401LJ_S0HVJ1KL901268 on Port 2 of the Silicon Image 3114 card."

Thanks. It's also the oldest one, with 14793 hours under its belt. So I'll replace this one AND the cable.
BRiT Posted July 17, 2010

One should also look at the pending sector and reallocated sector counts, in addition to power-on hours.
nia Posted July 17, 2010

They are all alike on this attribute:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   253   253   010    Pre-fail  Always       -       0

However, they differ here. One is:

197 Current_Pending_Sector  0x0012   253   253   000    Old_age   Always       -       0

The other two are:

197 Current_Pending_Sector  0x0012   253   100   000    Old_age   Always       -       0

Which is better for "WORST": 100 or 253?
Joe L. Posted July 17, 2010

Quote: "What's best on the "Worst": 100 or 253?"

253 is the initial value given to the parameter by the manufacturer. It will probably be set to 100, like the others, once you get a few hours on the drive. It is not an indication of a problem, but of a brand new drive, too new to have "worst" normalized values set.
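
The distinction between normalized values (VALUE, WORST, THRESH) and RAW_VALUE can be checked mechanically. Here is a minimal Python sketch that parses attribute rows like the ones quoted in this thread and lists anything worth a second look: a normalized value at or below its threshold, or a non-zero raw count for attributes 5 and 197. The names `Attr`, `parse_attrs`, and `concerns` are illustrative assumptions, and the two checks are a rule of thumb, not an official health test.

```python
from typing import NamedTuple


class Attr(NamedTuple):
    id: int
    name: str
    value: int
    worst: int
    thresh: int
    raw: int


def parse_attrs(report):
    """Parse SMART attribute rows (10 columns, numeric ID first) into a dict."""
    attrs = {}
    for line in report.splitlines():
        parts = line.split()
        if len(parts) >= 10 and parts[0].isdigit():
            try:
                attrs[int(parts[0])] = Attr(
                    int(parts[0]), parts[1],
                    int(parts[3]), int(parts[4]), int(parts[5]),
                    int(parts[9]),
                )
            except ValueError:
                continue  # skip rows whose raw value is not a plain integer
    return attrs


def concerns(attrs):
    """Return human-readable notes for attributes that look worrying."""
    notes = []
    for a in attrs.values():
        if a.value <= a.thresh:
            notes.append(f"{a.name}: value {a.value} at or below threshold {a.thresh}")
    for attr_id in (5, 197):  # reallocated and pending sector counts
        a = attrs.get(attr_id)
        if a is not None and a.raw > 0:
            notes.append(f"{a.name}: raw count {a.raw}")
    return notes


# Rows as quoted in this thread.
SAMPLE = """\
  5 Reallocated_Sector_Ct 0x0033 253 253 010 Pre-fail Always - 0
197 Current_Pending_Sector 0x0012 253 100 000 Old_age Always - 0
"""

if __name__ == "__main__":
    print(concerns(parse_attrs(SAMPLE)))
```

For the rows quoted here the list comes back empty, which matches the assessment that the drives look healthy. The WORST column of 100 versus 253 does not enter into either check.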
nia Posted July 17, 2010

Thanks Joe, but these drives all have 11,000+ hours under their belts, so they are not really new anymore...

However, as none of them has problems, as you state, I will take the oldest one (14k+ hours) out and then let unRAID do its magic. I assume that I simply shut the server down, replace the 400GB drive with the new 1TB drive, and power up, and that unRAID will then identify the replacement and ask whether I want to rebuild?
Joe L. Posted July 17, 2010

Quote: "Thanks Joe, but these drives all have 11.000+ hours under their belts, so the are not really new anymore..."

Then the drive manufacturer is not changing the "normalized" value unless the "raw" attribute for re-allocated sectors starts incrementing. Your examples from those drives look fine.

Quote: "I would assume that I simply close the server down, replace the 400G drive with the new 1T drive, power up, and then unRAID will have identified the replacement and ask if I want to rebuild?"

That is it exactly. Remember to press the "Start" button to begin the reconstruction of the old contents onto the new drive.

If you are on unRAID 4.5.3 or earlier, DO NOT PRESS THE BUTTON LABELED "restore", as it has nothing to do with reconstruction of a replacement disk. From 4.5.4 through 4.5.6, the button no longer exists and was replaced by an equivalent "initconfig" command-line command (Initialize Configuration and Immediately Invalidate Parity). It is NOT what you want to do when you are replacing a drive.
nia Posted July 18, 2010

Quote: "That is it exactly. Remember to press the "Start" button to begin the re-construction of the old contents onto the new drive."

In preparation, I upgraded to the latest version, 4.5.6, and ran a complete parity check. Now the rebuild is humming along happily and looking very convincing so far.