September 7, 2009 My array seems to have some sort of ailment that is causing it to develop persistent errors. I've had a number of drive failures lately that I think are a legacy of a power problem I had a few weeks ago. It seems that every time I replace a drive, another develops errors shortly after. As can be seen from the attached syslog, all sorts of errors are cropping up. Any insights would be very welcome... Syslog here: http://pastebin.ca/1557136
September 7, 2009 Run smartctl reports on your drives. See the troubleshooting link in my sig for instructions.
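For anyone reading along who wants to collect those reports for several drives at once, here is a minimal sketch in Python. It assumes smartctl (from smartmontools) is installed and on the PATH, and that the drives show up under device names such as /dev/sda; the device names and the helper names `smartctl_command` and `run_reports` are made up for illustration and are not taken from the troubleshooting instructions mentioned above.

```python
import shutil
import subprocess


def smartctl_command(device, full_report=True):
    """Build the argument list for one smartctl call on one device."""
    # -a prints all SMART information for the device;
    # -H prints only the overall health self-assessment.
    flag = "-a" if full_report else "-H"
    return ["smartctl", flag, device]


def run_reports(devices):
    """Run a full report for each device and return {device: report text}."""
    if shutil.which("smartctl") is None:
        raise RuntimeError("smartctl was not found on PATH")
    reports = {}
    for device in devices:
        # smartctl uses non-zero exit codes to flag warnings, so the return
        # code is not treated as an error here; the text is kept either way.
        result = subprocess.run(
            smartctl_command(device), capture_output=True, text=True
        )
        reports[device] = result.stdout
    return reports
```

Calling `run_reports(["/dev/sda", "/dev/sdb"])` (hypothetical device names, usually needs root) returns the raw report text per drive, which can then be pasted into a post or compared between drives.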
September 7, 2009 Author Thanks for that. As far as I can tell from the instructions, the smartctl report from the latest drive to report an error looks OK.
September 7, 2009 My initial thought is that you have a cabling problem from the SATA port to your drive. I would replace the SATA cable with a known good cable and see if the problem resolves.
September 8, 2009 Author Well, I checked the cables and made sure they were all in good order. The array failed again not long after. It still keeps reporting missing disks. It seems to have a problem only with the 750 GB drives, and 4 of them are reporting high raw read error rates. The syslog doesn't look too good either... syslog: http://pastebin.com/m5063b90b
September 10, 2009 Sorry, I started a post yesterday, then was interrupted, then my machine crashed. I'll try to give more detail later, but for now a bit of summary. You have serious problems with the motherboard, heat, or power. I saw no indication of drive issues, that is, there was no evidence of anything physically wrong with the drives. In both syslogs, all of the drives connected to the motherboard SATA ports were rapidly disabled, within 4 minutes of the first one being disabled. In both cases the system ran fine up to the point of this catastrophe, and then all 4 drives were rapidly and completely lost, which seems to rule out bad cables (this time!). All subsequent syslog errors related to those 4 drives can be ignored. There was also a simultaneous problem with 3 drives in one of the syslogs that looked like a simultaneous hotplug event. Two of the drives were on one Promise card and the third was on the other Promise card, so perhaps these 3 drives share a common backplane that vibrated loose, or a power cable to it vibrated loose? More later ...
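The reasoning in the post above (several drives disabled within a few minutes of each other points to a shared cause rather than to individual drives) can be sketched roughly as below. This is only an illustration in Python: it assumes the "drive disabled" times have already been picked out of the syslog by hand as (timestamp, device) pairs, the function name `cluster_events` is made up, and the 4-minute window is simply the figure mentioned in the post.

```python
from datetime import datetime, timedelta


def cluster_events(events, window=timedelta(minutes=4)):
    """Group (timestamp, device) events that fall within `window`
    of the first event in the current group."""
    clusters = []
    for when, device in sorted(events):
        # clusters[-1][0][0] is the timestamp of the first event in the
        # most recent group; start a new group once we are past the window.
        if clusters and when - clusters[-1][0][0] <= window:
            clusters[-1].append((when, device))
        else:
            clusters.append([(when, device)])
    return clusters


def shared_failures(events, window=timedelta(minutes=4)):
    """Return only the groups in which more than one device was affected."""
    return [group for group in cluster_events(events, window) if len(group) > 1]
```

A group containing several drives that sit on the same controller, backplane, or power lead is a hint to look at that shared component first; a drive that only ever appears alone is more likely to have a problem of its own.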
September 11, 2009 Author Thanks for the reply, Rob. Based on your post I think it may well be a faulty motherboard. It seems to be an ongoing issue that was caused originally by this problem: http://lime-technology.com/forum/index.php?topic=4095.0. There are two Promise TX4 cards in the array, one of which is only about 2 weeks old. There is also a new power supply. If you're saying you don't think the drives themselves are faulty, then that pretty much only leaves the motherboard doesn't it?
September 11, 2009 "then that pretty much only leaves the motherboard doesn't it?" Or the memory... or the memory voltage, timing, or clock speed settings.... Premium memory often requires a higher voltage or different timings than the BIOS sets by default. If the memory is defective, or set up incorrectly, nothing else will work properly, no matter what you try. (Of course, if you have verified the BIOS settings for your specific memory modules, run the memory test, and it passes, then the motherboard might be defective, as you said.)
September 11, 2009 A full memory test is always a good idea. Did you have any of these problems before installing this new power supply? It could be defective ... Of course, make sure all power cables are well seated. Try re-seating all of them, including the main motherboard ones. Heat is still a possible suspect, although the fact that you completed both a parity build and a parity check without issue would seem to point away from heat issues. With the system on, and after touching the metal case, try touching the back of a finger to the main motherboard chipsets, or the heat sinks covering them. They should feel warm to hot, but not too hot to touch. Deciding you have a defective motherboard is usually a matter of eliminating every other possible cause first.
September 12, 2009 Author The new power supply was fitted after the problems occurred. Also, the room that the array sits in has an open (shielded) window to the cold Scottish outdoors, which means that even when there is lots of activity the drive temperatures rarely rise above 30 degrees, even in "summer"...