Server stops, blank screen with no indications (SOLVED)


kizer


I'm starting to have issues, so I've upgraded to 6.1.16 hoping to figure out what my problem may be. There have been a few times where my server just becomes unresponsive and I literally have to unplug it. I could have sworn 6.1 added Diagnostics so we can see what failed. Does it save failures on the flash?

Link to comment

Thanks for the response trurl

 

Nothing. I plug in my monitor and there is absolutely nothing. It's like the mobo/video card just isn't responding.

No ping response, nothing on the monitor. What is weird, though: I've upgraded to 6 twice from 5, and both times it had this same issue almost within the first 12 hours. When I'm on 5 it can go for weeks or even months without a problem, then all of a sudden, blam. Oddly enough, it has never had an issue doing a parity check. I know I'm going from a 32-bit to a 64-bit version with several changes under the hood, but it's just so odd.

 

I noticed this AM I was moving a few files onto Disk 2 and all of a sudden, nothing. Oddly enough, the last time it did this I was copying some files off Disk 2. I guess maybe a SMART test of Disk 2 is needed. It just happens to be the most used disk on the server too. Seems like maybe everything should get tested. lol

 

I'm currently running Memtest and it's been going for over 90 minutes with no errors. I also have a twin to this server in a closet, and thought I could pull it out, move all my drives and the SATA controller card over, and see if it has any issues.

 

Without any indication of what it could be, I kinda feel like I'm chasing my shadow, since nothing to date has given me any clues. Last time I pulled my flash stick out, Windows said it would like to repair it. Other than a power loss, unless the stick is dying, I can't think of a reason it would need to.
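One way to get clues out of a hard freeze is to keep a copy of the system log somewhere that survives the power cycle. Below is a minimal sketch, not a built-in feature: it assumes the live log sits at /var/log/syslog and the flash stick is mounted at /boot (both are assumptions about this setup, so adjust the paths), and the helper names `copy_new_lines` and `follow_log` are made up for illustration.

```python
import os
import time


def copy_new_lines(src_path, dst_path, offset):
    """Append bytes written to src_path since `offset` to dst_path.

    Returns the new offset. The destination is fsync'ed so the data is
    pushed to the device even if the machine freezes right afterwards.
    """
    size = os.path.getsize(src_path)
    if size < offset:
        # Source was truncated or rotated; start again from the top.
        offset = 0
    if size == offset:
        return offset
    with open(src_path, "rb") as src, open(dst_path, "ab") as dst:
        src.seek(offset)
        data = src.read(size - offset)
        dst.write(data)
        dst.flush()
        os.fsync(dst.fileno())
    return offset + len(data)


def follow_log(src_path, dst_path, interval=5.0):
    """Poll src_path forever, mirroring newly written data to dst_path."""
    offset = 0
    while True:
        offset = copy_new_lines(src_path, dst_path, offset)
        time.sleep(interval)
```

Calling `follow_log("/var/log/syslog", "/boot/syslog-copy.txt")` (hypothetical destination name) would leave the last few lines before a freeze on the stick. Writing to flash this often adds wear, so it is probably something to run only while chasing the problem.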

Link to comment

Sorry to hear about this kind of "tough to diagnose" problem. I had different, but similarly weird, problems with my first system. It turned out to be the motherboard. Given you have a duplicate system, I'd swap the mobo (vs. the drives) and keep everything else the same. If it works, then you know it's the mobo... My philosophy: eliminating discrete components one at a time is the key.

Link to comment

@Jeffrey it's kinda what I'm thinking. It's been in service for 4 years, and for 3 of those years it's been rock solid. When a system becomes completely unresponsive, even with no video, it tells me that it has to be the motherboard or the video card. Currently I have a daughter card plugged into an expansion port, so I'm also wondering if that could be causing problems. I suppose I could request a new flash drive license, swap over all my hardware (minus the drives) and go from there.

 

I'll check the memory test later on today, by which point it will have been running a good 8+ hours, and see if there are any errors at all.

Link to comment

What really stumps me, though, and it has a few times: when I upgrade to V6 from V5 it has this problem a lot faster than if I just stayed with V5. It's almost like there is something in V6 that accelerates this issue. Don't get me wrong, I'm not saying V6 is the problem, but it just confuses the heck out of me. Lol

Link to comment

Welp, 14 hours and 12 passes of MemTest and 0 errors.

Odd, now my UPS is reporting that it's lost communication.  :o

Dec 8 21:14:19 Tower apcupsd[27250]: Communications with UPS lost.
Dec 8 21:24:19 Tower apcupsd[27250]: Communications with UPS lost.
Dec 8 21:34:19 Tower apcupsd[27250]: Communications with UPS lost.
Dec 8 21:44:20 Tower apcupsd[27250]: Communications with UPS lost.
Dec 8 21:54:20 Tower apcupsd[27250]: Communications with UPS lost.
Dec 8 22:04:20 Tower apcupsd[27250]: Communications with UPS lost.
Dec 8 22:14:20 Tower apcupsd[27250]: Communications with UPS lost.
Dec 8 22:24:20 Tower apcupsd[27250]: Communications with UPS lost.
Dec 8 22:26:48 Tower emhttp: cmd: /usr/local/emhttp/plugins/dynamix/scripts/tail_log syslog
Dec 8 22:34:20 Tower apcupsd[27250]: Communications with UPS lost.
Dec 8 22:44:20 Tower apcupsd[27250]: Communications with UPS lost.
Dec 8 22:54:20 Tower apcupsd[27250]: Communications with UPS lost.
Dec 8 23:04:21 Tower apcupsd[27250]: Communications with UPS lost.
Dec 8 23:14:21 Tower apcupsd[27250]: Communications with UPS lost.
Dec 8 23:24:21 Tower apcupsd[27250]: Communications with UPS lost.
Dec 8 23:34:21 Tower apcupsd[27250]: Communications with UPS lost.
Dec 8 23:44:21 Tower apcupsd[27250]: Communications with UPS lost.
Dec 8 23:54:21 Tower apcupsd[27250]: Communications with UPS lost.
Dec 9 00:04:21 Tower apcupsd[27250]: Communications with UPS lost.
Dec 9 00:14:22 Tower apcupsd[27250]: Communications with UPS lost.
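The timestamps in the log above can be checked for a pattern with a few lines of Python. This is only a sketch: the function names are made up for illustration, the year is an assumed placeholder because these syslog lines don't carry one, and `%b` month parsing assumes an English locale.

```python
from datetime import datetime

LOST_MARKER = "Communications with UPS lost."


def lost_event_times(lines, year=2000):
    """Return datetimes of the apcupsd 'lost' messages, in log order.

    These syslog timestamps omit the year, so `year` is an assumed
    placeholder (2000 is a leap year, so Feb 29 would still parse).
    """
    times = []
    for line in lines:
        if LOST_MARKER not in line:
            continue
        parts = line.split()
        # First three fields look like: "Dec", "8", "21:14:19"
        stamp = f"{year} {parts[0]} {parts[1]} {parts[2]}"
        times.append(datetime.strptime(stamp, "%Y %b %d %H:%M:%S"))
    return times


def gaps_in_seconds(times):
    """Seconds between each consecutive pair of events."""
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


sample = [
    "Dec 8 21:14:19 Tower apcupsd[27250]: Communications with UPS lost.",
    "Dec 8 21:24:19 Tower apcupsd[27250]: Communications with UPS lost.",
    "Dec 8 21:34:19 Tower apcupsd[27250]: Communications with UPS lost.",
    "Dec 8 21:44:20 Tower apcupsd[27250]: Communications with UPS lost.",
]
print(gaps_in_seconds(lost_event_times(sample)))  # [600, 600, 601]
```

A steady gap of about ten minutes would be consistent with one ongoing loss being re-reported on a timer rather than many separate drops, though that is an inference from the timestamps, not something the log itself states.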

 

 

My box is currently doing a parity check again. *Sigh* I swear this thing has to do them constantly when coming back up from whatever this is. Could the lack of communication be a USB bus issue? Possibly a mobo problem?

 

I also pulled my cache drive just in case it's causing any issues.

Link to comment

Ok, this morning I told my system to copy 300 GB or so of files specifically off Disk 2 to my Windows desktop. I chose Disk 2 because I think I've noticed a pattern of my system freezing when I read from Disk 2. I ran a quick SMART test and it shows nothing wrong. Well, I came home and found my expansion card with its lights lit up on the first channel. Oddly enough, the disk I was pulling from isn't even connected to it. Like usual, there's nothing in the on-screen log and no other log file to pull from. The PC screen is black. Disk 2 is still spinning, or at least it feels like it is. The keyboard doesn't respond; simply nothing does.

 

I can only conclude that either Disk 2 is causing this issue or the motherboard is dying. I'm going to let this parity check complete, and then I'm going to hammer the heck out of another drive by copying files and see if the problem repeats. If it does, I'm going to guess it's not Disk 2 and it must be something system-wide like the motherboard, power supply or SATA expansion card. Maybe even my USB stick. Heck, I don't know lol.

[Attached image: IMG_0055.JPG]

Link to comment

Ok, confirmed something today.

Yesterday I tried to transfer 300 GB of data from Disk 2 on my server to my Windows machine. It froze.

Just tried to copy 300 GB from Disk 3 to my Windows machine and it froze.

It's not specific to Disk 2, so I can finally put that to rest.

RAM passed a Memtest, and copies from a couple of drives froze, so I'm going to say it has to be either the mobo, power supply or USB.

So I'll swap out the power supply and mobo and work on a USB swap, since I built a double of this machine for a friend who never claimed it. Spare parts for me..  ;)

 

Sorry everybody for my rambling on the thread. Just looking for anybody with ideas or whoever might have faced this craziness. I'll get it eventually. One thing I don't get, though, is why it never fails on parity checks. It only seems to happen when transferring data. Suppose it could be my on board NIC, but that's a stretch......

Link to comment

... Suppose it could be my on board NIC, but that's a stretch......

You could try a large transfer between drives from the console and eliminate the network completely.
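A local disk-to-disk copy like the one suggested above could be sketched as follows. This is a rough illustration, not a recommended tool: `copy_with_progress` is a made-up name, and any paths passed to it are up to you. It flushes each chunk to the destination and reports progress as it goes, so the last message on the console shows roughly where a freeze hit.

```python
import os
import time


def copy_with_progress(src_path, dst_path, chunk_size=16 * 1024 * 1024,
                       report=print):
    """Copy src_path to dst_path in chunks, reporting progress.

    Each chunk is flushed and fsync'ed so the copy really reaches the
    destination disk instead of sitting in the page cache.
    Returns the number of bytes copied.
    """
    total = os.path.getsize(src_path)
    copied = 0
    start = time.monotonic()
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dst.write(chunk)
            dst.flush()
            os.fsync(dst.fileno())
            copied += len(chunk)
            elapsed = time.monotonic() - start
            report(f"{copied}/{total} bytes after {elapsed:.1f}s")
    return copied
```

Run from the console with a large source file on one data disk and a destination on another, nothing touches the NIC, so a freeze during the copy would point away from the network.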

 

Very true. Glad you guys are throwing things at me. ;) I'll try that and see what explodes.

Pulled the USB stick so I know it's not the stick failing. Lol

 

I'll just swap the Mobo/PowerSupply.

Link to comment

Since I upgraded to v6 I also get this. The pattern for me has been after periods of inactivity, for example if I go away for a weekend. The server becomes unresponsive to telnet/SSH, the GUI, etc.; however, it does respond to ping.

As yet I can't find anything in the logs to diagnose it. It's happened maybe 4 times in 3 months.
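For a hang like this, where ping still answers but the services don't, a small probe running on another machine can at least timestamp when each service stopped responding. A minimal sketch, with assumptions flagged: the function names are made up, and the host name and port list you pass in (for example 22 for SSH, 23 for telnet, 80 for the web GUI) depend on your own setup.

```python
import socket
import time


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name lookup failures.
        return False


def probe_once(host, ports, timeout=3.0):
    """Map each port to True/False for one round of checks."""
    return {port: port_is_open(host, port, timeout) for port in ports}


def watch(host, ports, interval=60.0, log_path="probe.log"):
    """Append one timestamped result line per round until interrupted."""
    while True:
        result = probe_once(host, ports)
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        with open(log_path, "a") as log:
            log.write(f"{stamp} {result}\n")
        time.sleep(interval)
```

Something like `watch("tower", [22, 23, 80])` (hypothetical host name) left running over a weekend would narrow down when the services went quiet, which can then be lined up against whatever the server's own log recorded last.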

Link to comment

Mine doesn't respond to anything at all. Yesterday mine gave me a "CMOS Checksum Error", which usually points to the battery, and it's having issues POSTing, so I now know for a fact that my motherboard is going out.

 

However, like you said, I noticed it was happening more since the upgrade. Both times I've attempted to upgrade it had this issue, but I don't know if something changed that accelerated it or if, in my case, it's just dumb luck.

Link to comment

Welp. Finally figured it out, I believe. Replaced my motherboard and the problem appears to have gone away. Oddly, I swapped some cables around on my SATA expansion card and then it flagged the only disk on it as bad. Unassigned the disk, it rebuilt, and it seems perfectly fine. Not sure what was up with that, but I knew my parity was good so I just took a leap of faith.

Link to comment
