DoeBoye Posted April 18, 2011

"Every time we've encountered a problem with one of our servers in the data center randomly rebooting, it was usually due to a faulty power supply. The voltage drops and the system thinks someone hit the reset button. We start there and usually end there. If that doesn't fix it, memory was the second culprit. Heating has, for us, never been a problem."

+1 for power supply. I know you've heard it a bunch of times already, but that's what I'd swap out first. I know it can be a pain in the ass to start replacing parts until the problem is cured, but sometimes that's the only solution. If you don't want to buy a new one before you know, maybe harvest one from another system just to test.

As far as the CPU is concerned, if there's no problem with the fan, it's possible the heatsink isn't seated properly. Any chance you swapped out the processor or the heatsink and fan lately?
Mailman74 (Author) Posted April 18, 2011

"This syslog appears to be taken after a restart. Try to get one before a restart."

That is weird, because I had the syslog running before my server crashed.
dgaschk Posted April 18, 2011

The syslog shows journal transactions being replayed. This happens after an unclean shutdown. Try running reiserfsck on the drives /dev/md1 through /dev/md3.
SSD Posted April 18, 2011

The PSU takes a ton of abuse in this forum.

If the PSU is severely underpowered, it should power itself down in an overload state when trying to spin up all of your drives on power up, or when doing some dramatic spinups in unRAID (stopping the array or starting a parity check will spin up all drives if they are spun down). If it is not severely underpowered, the ebbs and flows of power demand would likely allow all the drives to power up, albeit a bit more slowly than usual.

If a PSU dies, my experience (and I've had my share) is that they die like a light bulb. There is (or used to be) an internal fuse in the PSU that blows. I've tried replacing that fuse and never had luck reviving a dead one for more than 30 seconds.

I'm not saying yours might not be doing flaky things, but this has just been my experience in building and troubleshooting my own (the first PC I built was an 8MHz 8088). Random power downs when the server is basically idling are not normally a PSU problem.

Cabling problems are the #1 cause of flaky behavior. My first step would be to unplug and replug, securely, every connection. If it keeps happening, my money is on a failing capacitor on the motherboard.
PeterB Posted April 18, 2011

"If a PSU dies, my experience (and I've had my share), is that they die like a light bulb."

I'm not saying that it's the case here, but I've known PSUs to suffer a variety of 'soft' failures. As an example, in one case the ethernet interface was intermittently failing to initialise while, in all other respects, the machine appeared to be operating normally. Replacing the PSU solved the problem with the ethernet interface, and the machine has continued to operate normally for several years.

That machine (still with original mobo/processor), purchased in 2002, is still going strong on its third PSU (the second one did die a 'sudden death' shortly after its arrival in the Philippines).

Oh, and talking of first 'PC' built: mine was in 1978, a 4MHz INS8060 with 256 bytes of memory. Input was from a hexadecimal keypad, and output was on an 8-digit 7-segment display.
DoeBoye Posted April 18, 2011

I like ruling out the PSU first because:

- It's often the culprit.
- A compatible working tester PSU can be found in many other systems, unlike CPUs, RAM, etc. If one is available, it doesn't hurt to swap it out and confirm or rule out the PSU as the issue.
- A flaky PSU can still fail, whether the system is at idle or under load. (Though I agree, it does tend to happen under load. Mind you, the OP *does* state that the issue was usually noticed while recording TV or ripping video.)
Mailman74 (Author) Posted April 18, 2011

This syslog has been running all day.

syslog2.txt
dgaschk Posted April 18, 2011

What do these commands show:

    ntpq -p
    hwclock;date
mbryanr Posted April 18, 2011

set_rtc_mmss: errors
http://nixcraft.com/centos-rhel-fedora/13458-set_rtc_mmss-cant-update-59-0-error.html

What does "set_rtc_mmss: can't update from 54 to 5" mean?

The function set_rtc_mmss() updates minutes and seconds of the CMOS clock from system time. It does not update the hour or date to avoid problems with timezones.[1] The message shown was added to make users and implementers aware of the problem that not all time updates will succeed. Imagine the system time is 17:56:23 while the CMOS clock is already at 18:03:45. Updating just minutes and seconds would set the hardware clock to 18:56:23, a wrong value. The solution for this problem is either to wait a few minutes, or to install a kernel patch that fixes the problem. Normally a wrong time in the hardware clock will not show up until after reboot, or maybe after APM slowed down your system.
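To make the quoted example concrete, here is a minimal Python sketch of why copying only minutes and seconds can go wrong. It is an illustration, not the kernel's code: the function names `naive_mmss_update` and `guarded_mmss_update` and the 30-minute threshold are assumptions made up for this example.

```python
from datetime import datetime
from typing import Optional


def naive_mmss_update(cmos: datetime, system: datetime) -> datetime:
    """Copy only minutes and seconds from system time; hour and date stay as in CMOS."""
    return cmos.replace(minute=system.minute, second=system.second)


def guarded_mmss_update(cmos: datetime, system: datetime,
                        max_minute_gap: int = 30) -> Optional[datetime]:
    """Refuse the update when the minute fields are far apart.

    A large gap suggests the two clocks straddle an hour boundary, so
    copying minutes alone would leave the CMOS clock badly wrong.
    The threshold is a hypothetical value chosen for this sketch.
    """
    if abs(system.minute - cmos.minute) >= max_minute_gap:
        print(f"can't update from {cmos.minute} to {system.minute}")
        return None
    return naive_mmss_update(cmos, system)


if __name__ == "__main__":
    cmos = datetime(2011, 4, 18, 18, 3, 45)     # CMOS clock from the quoted example
    system = datetime(2011, 4, 18, 17, 56, 23)  # system time from the quoted example
    print(naive_mmss_update(cmos, system).time())   # hour stays 18, minutes become 56
    print(guarded_mmss_update(cmos, system))        # update refused, returns None
```

With the times from the quoted text, the unguarded update produces 18:56:23, and the guarded version declines and logs a message of the same shape as the one in the syslog.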
Mailman74 (Author) Posted April 24, 2011

A couple of random shutdowns today and no syslog running. But I noticed that after restarting the server, my PC gets a blue screen of death. I wonder if something in my network is causing this? I have a D-Link DIR-655 router feeding a 5-port and an 8-port gigabit switch that go to each room and the server.