TODDLT Posted June 23, 2015
Would any of the errors on the attached screenshot worry you?
EDIT - added clipped image of the myMain screen and SMART test of the drive.
I think the ST3000DM001 drives worry me the most. It seems that after they had been out for a while, the complaints about failures about a year in started to grow. I haven't had any issues yet. The high fly writes showed up very quickly but grew very slowly. However, Disk 3 now has 2 new types of errors: runtime bad block = 3 and UDMA CRC error count = 22663. I've never seen either of these before. The other 3 drives of this type only have the high fly write counts.
I'd hate to replace my 4 newest drives, but I'd really hate to see a couple go bad at once. Any thoughts?
Attached: sdd.txt
SSD Posted June 23, 2015
In a word - no.
The high fly writes seem unique to Seagate and may be related to vibration. I have never seen them cause a problem. I don't think I've ever seen them much above what you see here.
The load cycle counts are common with WD drives. I am not sure at what point they start to reach the manufacturer limit. There is a tool - WDIDLE? - that you can use to alter the frequency at which the heads are parked and dramatically reduce the growth. myMain shows them based on a formula with power-on hours. 50K is not a problem, but at least you know they are stacking up, which should prompt you to research how to stop them before they get a lot higher.
The udma_crc_error is usually a sign of a bad cable. Your value is about the highest I've ever seen! If it is continuously increasing with each parity check, you have a bad cable. Once the cable is corrected, the value should stop increasing.
The runtime_bad_block is probably the worst one. I would not worry about a value like 3. If it starts to increase you might have a problem.
With myMain, each of those errors is a hyperlink. If you click on one it will bring up the drive settings page with values to set an attribute warning level at the value reported. If you save it, you will not see that warning again unless the value increases. Do them one at a time. For something like load_cycle_count, you'd actually want to boost the value at which it reports again. Just increase the number to 100000 or something. For the others I would not change it. Once you do them all and save them, the smart view should be "clean" until an attribute increases. (There is a "raw" refresh button that will ignore the overrides and show you all issues on all drives.)
Note that the yellow highlights on these smart issues are telling you they are warnings, and probably not super serious. If they show up in red, the attribute failure is more serious. It can still be overridden, though.
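The warning-level behavior described above (save the reported value, then only see the warning again if the value grows) can be sketched roughly as follows. This is only an illustration of the idea, not myMain's actual code: the function name `visible_warnings`, the dictionary layout, and the later reading of 22700 are all assumptions made up for the example. The two starting values are the ones reported for Disk 3 in this thread.

```python
def visible_warnings(current, acknowledged):
    """Return the attributes whose raw value exceeds the saved warning level.

    current:      {attribute_name: raw value read from the drive now}
    acknowledged: {attribute_name: warning level saved by the user}
    An attribute with no saved level is shown whenever it is above zero.
    (Hypothetical helper, not part of myMain.)
    """
    shown = {}
    for name, value in current.items():
        level = acknowledged.get(name, 0)
        if value > level:
            shown[name] = value
    return shown


# Raw values for Disk 3 as reported in this thread.
disk3 = {"runtime_bad_block": 3, "udma_crc_error_count": 22663}

# Saving the reported values as warning levels hides both warnings.
saved = dict(disk3)
clean = visible_warnings(disk3, saved)

# If the CRC count grows after the next parity check, it shows up again.
# 22700 is a made-up later reading used only for illustration.
later = {"runtime_bad_block": 3, "udma_crc_error_count": 22700}
reappeared = visible_warnings(later, saved)

print(clean)
print(reappeared)
```

Raising the saved level above the current reading, as suggested for load_cycle_count, simply moves the point at which the attribute is reported again.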
TODDLT Posted June 23, 2015
Thanks, I'll shut down and re-seat the cable, then see what happens next time it runs a parity check.
RobJ Posted June 23, 2015
I doubt that re-seating a cable will help with CRC errors. If you are going to the trouble of shutting down and opening the case, I'd REPLACE the cable, not re-seat it. And if that stops the UDMA CRC errors, I'd throw that cable away.
TODDLT Posted June 24, 2015
OK, thanks for that. I was just looking at my drive map trying to remember which drives were on the SAS cables and which ones were on individual SATA cables. Based on the drive position in the case, it is either on an individual cable or on my older SAS cable (maybe 5 years old or so). I don't have a spare SAS cable (and perhaps should remedy that anyway). I'll be opening the case tonight to find out. Thanks for the heads up.
SSD Posted June 24, 2015
Not to disagree with my friend RobJ, but the need to replace the cable has not been established. You may find the connection slightly loose. I've even seen exchanging cables with a neighboring drive fix subtle issues. If I had a spare cable I'd certainly consider swapping it, but I don't know that you need to buy a new one without trying to get the old one working first.
c3 Posted June 24, 2015
I think RobJ's thinking might be that if the cable came loose once, it will do it again. Using a proper latching cable will solve the problem permanently. It's so common a problem with WD that they even have a doc about it.
garycase Posted June 24, 2015
"... Using a proper latching cable ..."
Agree. More than the age of the cable, see if it has latching connectors. If not, I'd replace it with one that does.
RobJ Posted June 24, 2015
I certainly can be wrong, and am always happy to learn from it! My thinking was that CRC errors (especially such a high number, if recent) are indicative of communication errors across the cable, a fully connected cable. (It's rarer, but it could also be poor power type issues.) The errors from a loose connection (power or SATA) tend to be of the disconnection/reconnection type, with PHY change errors in the syslog, numerous resets, slower operation, and even the possibility of a disabled drive, dropped after a longer disconnection.
TODDLT Posted June 25, 2015
I have an HX750 power supply and, based on my understanding of consumption and the drive count I have, I believe I have more than ample headroom with power.
I pulled the case today. I checked all cables (power and SATA) to make sure they were seated, and everything looked fine. The cable was a single latching type, but it had a 90-degree elbow pointing down, and the cable flipped 180 degrees and ran back up. It wasn't a hard pinch but it was pretty tight. I replaced it with a latching type without the elbow.
In fact, I'm going to order some more cables to replace a couple of others with similar configurations, and one of my SAS cables does not have latches, so I'll take care of that too. A few $$ now so I'm not chasing this in the future.
TODDLT Posted July 1, 2015
All non-latching cables were replaced tonight. A parity check will run after midnight, so I'm interested to see if anything new or any higher counts show.
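For anyone wanting to make the "did the counts go up after the parity check?" comparison less manual, here is a minimal sketch that diffs two saved SMART attribute tables. It assumes the reports were saved as text in the usual `smartctl -A` column layout (for example with something like `smartctl -A /dev/sdd > before.txt`; the device name is a guess based on the attachment's file name). The function names and the sample rows are made up for illustration; only the raw values 3 and 22663 come from this thread, and 22700 is a hypothetical later reading.

```python
import re

# One attribute row of `smartctl -A` output has these columns:
# ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
# Capture the ID, the name, and the leading integer of RAW_VALUE.
ROW = re.compile(r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+(?:\s+\S+){6}\s+(\d+)")


def raw_values(report):
    """Map attribute name -> raw value for every attribute row in the text."""
    values = {}
    for line in report.splitlines():
        match = ROW.match(line)
        if match:
            values[match.group(2)] = int(match.group(3))
    return values


def increases(before, after):
    """Return {name: growth} for attributes whose raw value went up."""
    return {
        name: after[name] - before[name]
        for name in after
        if name in before and after[name] > before[name]
    }


# Illustrative rows only; the normalized columns are invented.
BEFORE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
183 Runtime_Bad_Block       0x0032   097   097   000    Old_age   Always       -       3
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       22663
"""

# Case 1: the new cable fixed it, so nothing changed after the parity check.
AFTER_FIXED = BEFORE

# Case 2: hypothetical reading where the CRC count kept climbing.
AFTER_STILL_BAD = BEFORE.replace("22663", "22700")

unchanged = increases(raw_values(BEFORE), raw_values(AFTER_FIXED))
still_growing = increases(raw_values(BEFORE), raw_values(AFTER_STILL_BAD))

print(unchanged)
print(still_growing)
```

An empty result after the parity check would match what was said earlier in the thread: once the cable problem is corrected, the CRC count should stop increasing, even though the old total never resets.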
Archived
This topic is now archived and is closed to further replies.