Random Shutdowns Getting Very Frustrating



My system has been suffering from random shutdowns for about the last 4 weeks.  It's been happening on average once a day, but I can go 2 or 3 days without one only to have 3 or 4 shutdowns in one day.  

 

I've got about 2 pages of notes, and over a dozen logs that I've recorded.  Most of the time nothing shows up in the tail of the logs, but sometimes it does show what appears to be a controlled shutdown.

 

Summary of what I've done for troubleshooting:

Disconnected the UPS and plugged the server directly into the wall

Replaced the power supply

Tested, then replaced, then added more memory.

Created a virgin copy of unRAID on a new thumb drive and restored my configs.

Updated Motherboard BIOS

Enabled FCP (Fix Common Problems) Troubleshooting Mode after (almost) every restart

Telnetted in and started "tail -f /var/log/syslog" after (almost) every restart

Gone over all the settings, up and down and all around, looking for anything out of the ordinary

 

Around the time the problem started, I had been trying to set up a share to use as a Time Machine target for my Mac systems, but couldn't get it to work right. I used MC (Midnight Commander) to delete all the files from that share and went back to using it for Carbon Copy Cloner backups instead.

 

The system currently consists of the following:

unRAID 6.4.1 (was on a 6-year-old PNY 8GB thumb drive, now on a 10+-year-old 2GB thumb drive of unknown make)

ASUS M5A78L-M/USB3 Motherboard

AMD Sempron Single-Core 2.8 GHz Socket AM3 CPU

Crucial 2GB (2 x 1GB) 240-Pin DDR3 SDRAM DDR3 1333 (PC3 10600)

A-Tech 2GB kit (1GBx2) DDR3 PC3-10600 DESKTOP Memory Modules (240-pin DIMM, 1333MHz) (these are brand new, purchased as a part of troubleshooting)

Thermaltake Smart 500W 80+ White Continuous Power ATX 12V V2.3/EPS 12V Active PFC Power Supply PS-SPD-0500NPCWUS-W (original power supply was a LEPA N Series N500-SA 500W ATX12V Power Supply)

5x Seagate drives (a mix of models, since I have added more drives and replaced 2 failed drives since building it)

 

This setup had worked flawlessly since I built it almost 4 years ago.

 

The next step I am thinking about is disconnecting the 2 oldest Seagate drives to see if they might be causing serious power drain. (One of the drives is over 10 years old and the other is about 8 years old. The 10-year-old one is always reporting some sort of problem. Neither of them is currently storing any files on the array.)

 

I sure could use some thoughts or suggestions; otherwise, pretty soon I will have built a brand-new system.

 

Thanks  

 

 

 

FCPsyslog_tail 18 feb 1430.txt

FCPsyslog_tail 22 Feb 1100.txt

Edited by geofbennett
Forgot a few things

Looking at your setup, I would not expect there to be any issues with it. You have virtually the most basic system that anyone might be running today.

 

Having said that, I most suspect a hardware issue. RAM, PS, and cooling would be the first things I would look at. Make sure that all of the fans are working and that the case, filters, and CPU heat sink fins are not filled with dirt. Make sure that all connectors are firmly seated in their sockets. Run memtest for a minimum of 24 hours. (Memtest is an option in the unRAID boot menu.) If everything seems all right after those checks and tests, I would start by trying a different PS. (Power supplies fail more often than most people realize.)

 

Also, use 4GB of RAM since you have it. unRAID may boot with less, but the 64-bit version and the additions that have been made to the Linux kernel place greater memory demands on the system than earlier versions did.


Yep, it was purpose-built to be very basic; it only serves as central storage for my media files, backups, and security camera files.

  

The PS was the first thing I replaced. The new one has only been plugged in for a little over a week, and it made no difference.

 

Air circulation and cleanliness: I dusted everything off pretty well while I was replacing the PS. It was not exceptionally dusty, probably less dusty than the last time I replaced a hard drive. I gave everything a good looking-over when I put it back together, and the CPU fan, PS fan, and case fan were all spinning freely.

 

Now let me tell you about my experience with Memtest...

 

The first time I ran MEMTEST, it finished 1 pass and showed no errors in the memory, but the system rebooted on its own at about the 15-minute mark. Since I had 2 memory chips and 4 slots, I tried running MEMTEST with the chips in every possible configuration to isolate which one might be causing the problem, but no matter how I arranged the chips, the system always rebooted during MEMTEST after anywhere from 15 to 30 minutes (without ever showing any errors in the chips). So I ordered some new chips.

 

While waiting for the new memory to arrive, I tried to bring the system back online and discovered that it was no longer recognizing the built-in eth0 port on the motherboard. The port had blinking green lights and my router recognized that it was there, but the server would not communicate.

 

I found a post in the forums that suggested installing a virgin copy of unRAID on a different drive to see if it boots properly, so I gave that a try.

 

Lo and behold, the new install of unRAID on a 10-year-old 2GB thumb drive booted right up, no problems.

 

I ran MEMTEST on my original memory chips, in their original configuration, for about 2 hours without any problems. When the new chips arrived, I installed them in the remaining 2 slots (so now I have 4 chips installed) and started MEMTEST, which this time ran for over 12 hours and completed 6 passes on all 4 chips with 0 errors.

 

I brought the system online and it ran for about another 39 hours without a problem, so I shut it down, moved it from the workbench back into its closet, and brought it back online.

 

That was yesterday. It was up for 24 hours before it shut itself down again. I've had at least 2 shutdowns today, and I noticed that on 1 of them it restarted itself.


Yep, now it's at least trying to restart itself.

 

Moments ago I heard my desktop computer make an alert noise. I rushed to check it and found a warning from my security camera system that it couldn't find the designated storage location.

 

I looked at the telnet log, and nothing had been registered for about 10 minutes.

I checked the web GUI and it showed an uptime of only one minute.

I clicked on Plugins, then FCP, and Troubleshooting Mode was now off.

I clicked on Dashboard, and this time I got a notice that the webpage could not be found.

 

I went to check the system and it was shut down.

 

I pulled the boot drive and found the attached FCP log.

FCPsyslog_tail 22 Feb 1643.txt

2 hours ago, geofbennett said:

I brought the system online and It ran for about another 39 hours without a problem so I shut it down, moved it from the workbench back into it's closet and brought it back online. 

 

 

2 hours ago, geofbennett said:

back into it's closet

 

The server is in a closet? What kind of system temps are you running? Is there airflow in/out of the closet? The fans may all be running just fine, but if all they're doing is pushing 80+° air around, they're not going to be doing much good. It's entirely possible that the CPU is shutting the system down to protect itself.

 

Install the Dynamix System Temperature plugin to keep an eye on your temps.


Installed the Dynamix System Temperature plugin.

The temperature in the closet is still 70F.

CPU temp is 32C (89.6F).

MB temp is 33C (91.4F).

Drives are all reporting 27-29C (80.6-84.2F).

 

The system is oriented so the disks are on the bottom and the MB and PS are on top. The case fan points directly upward, so airflow is from bottom to top.


 

OP, based on the following, I think Johnnie has identified the likely culprit.

 

3 hours ago, geofbennett said:

The first time I ran MEMTEST, it finished 1 pass and showed no errors in the memory, but the system rebooted on it’s own at about the 15 minute mark.  Since I had 2 memory chips and 4 slots I tried running MEMTEST with the chips in every possible configuration to isolate which one might be causing the problem, but no matter how I arranged the chips,  MEMTEST always rebooted the system after anywhere from 15 to 30 minutes (never showing any errors in the chips however).

 

2 hours ago, johnnie.black said:

Based on the symptoms and what you already did a failing motherboard would be my prime suspect, unfortunately not easy to test.

 

 

OP, your system crashes within 15-30 minutes in memtest, you've swapped the PSU, there are no memory errors, and I'm assuming there are no thermal warnings in the BIOS; I think Johnnie has hit on your likely culprit. Is there any bulging or crusty-looking stuff on top of any of the capacitors? If yes, then I would definitely call it a bad motherboard.

If the caps look good and you still don't want to blame the logic board, you could try disconnecting all peripheral hardware so it's just the MLB (main logic board), RAM, CPU, keyboard, and monitor (if the motherboard has onboard video, remove the video card), then run memtest again. If the system crashes with everything disconnected, there's a very high probability of a bad logic board. If the system is rock solid in memtest for, say, 12+ hours, then something you unplugged is causing your crashes (either that part itself, or a worn-out component on the MLB that fails as it interacts with that part). You'll want to reintroduce one component, test, then add another part, test, and repeat until your system goes FUBAR again, i.e., process of elimination.

 

Edited by Jcloud
A clarification, on the bad parts for troubleshooting.
2 hours ago, geofbennett said:

Still frustrated because a total rebuild was not exactly in the budget at the moment

You have your CPU and RAM, so why not just source and buy a new logic board that supports the same RAM/CPU specs? If it was what you wanted before, and you don't have the green for a rebuild now, that seems like the best consumer option to me. A new board should be in the $100-250 range, I'd imagine, versus a motherboard, CPU, and RAM, which is how I interpreted your statement.


The original MB was only $60 4 years ago.  In fact the whole build 4 years ago was only around $300.

 

Buying an older MB now just adds to that cost and leaves me little room to start exploring all the cool new options available to me.

 

I'm just gonna put together a modern bargain system that I can use to start working with VMs and various applications.

9 minutes ago, geofbennett said:

I'm just gonna put together a modern bargain system that I can start working with VMs and various applications.

 

Take your time doing your research. Pay particular attention to exactly what you want to be able to do with your VMs. The closer you want to come to the capabilities of a stand-alone desktop, the higher the hardware requirements (and cost) will be.

27 minutes ago, geofbennett said:

I'm just gonna put together a modern bargain system that I can start working with VMs and various applications.

 

You probably already know of it, but in case you don't, there's a great website for sourcing parts that also filters hardware as you go to make sure everything you buy is compatible. I find it makes builds so much simpler.

 

https://pcpartpicker.com/

 

It has a bunch of different country options. Just change your location from the dropdown in the top right corner :).

 
