[Closed] My UNRAID machine is freezing after some hours


tokra

Here is my unRAID 6.2.4 machine:

> Fractal Design Node 804

> Fractal Design Integra M 550W black

> ASROCK FM2A88M EXTREME4 + R2.0

> AMD A8-7600

> Cooler Master Hyper TX3 EVO

> Patriot 8GB DDR3 1600MHz CL11 Signature Line with cooler

> cache: Kingston HyperX FURY SSD 120GB

> 3x Western Digital RED 3TB

 

I switched to this hardware one week ago. After some hours, usually during the night, my unRAID box freezes completely.

A friend also pointed out that the BIOS usually overclocks when left on its auto setting, so I manually set the APU to 100 MHz (the range is 100-136 MHz, I guess). Maybe the counter also needs to be adjusted?

I am desperate because the system runs really great until I leave it idle, and then it freezes overnight. I noticed it happens after about 5-8 hours. When I wake up the next day, I need to do a hard reset.

I also doubt it's a PSU issue, as the PSU is a brand new Fractal Design Integra M 550W.

I checked the flash drive, but Windows CheckDisk didn't find any errors. I also ran memtest86 for 2 hours without any errors, but someone on this forum advised me to run memtest for at least 24 hours, so that is what I'm doing now.

I reformatted my previous flash drive and put a brand new, clean unRAID installation on it.

Any other hints or ideas? :'(

 

Thank you in advance

tower-diagnostics-20161206-0532.zip

Link to comment

I presume that it ran the 24-hour memtest without locking up or reporting a memory fault.

Can you Telnet or SSH into the server when it is locked up?  If so, type diagnostics on the command line.  That will write the diagnostics file to your flash drive.  (Look in the root or the logs folder.)

You are saying that it only locks up overnight and at no other time.  Does it matter when you start the server?  What happens if you turn it off, leave it off until evening, and then start it up?  If it only locks up during the overnight period, do you have some job scheduled to run during that period?  (And don't forget about Mover!)

Have you tried a quick push of the power button to see if you get a clean shutdown?  (A quick push should start a clean shutdown, while holding it down will force a power-off, which results in an unclean shutdown.)

Attach a monitor to the server and see if you have any messages after it happens.  You may have to take a picture of the monitor screen.  If you do that, make sure it can be easily read.  A blurry or out-of-focus picture may not be enough to figure out what is going on.

You might also provide us with a rundown of all the plugins, Dockers, and VMs that you are running.  On your own, you might shut some of them down before you go to bed and see if you can isolate the issue yourself.

You have to understand that troubleshooting these types of problems is very difficult from afar.  (It can be difficult even when one is seated right in front of it.)  It could be caused by either software or faulty hardware.

 

 

Link to comment

Alright,

The issue is sporadic, and I am not saying it happens strictly overnight.

1) I checked the RAM with memtest86, no problem. I even swapped in the RAM from my gaming PC - DID NOT HELP

2) I also changed the flash drive to a new SanDisk Cruzer 16GB - DID NOT HELP

3) I also tried several versions of unRAID, even with a completely clean installation: 6.1.x, 6.2.4, and now I'm running 6.3.0 RC6 - DID NOT HELP

4) My HW is completely brand new, so I doubt it's a PSU issue - you can see my HW components in the signature of my post

5) When the system freezes, I cannot log in over SSH and cannot even ping it. When I check the monitor, there is only the last message from system startup and the tower login prompt. It is unresponsive, and typing on the keyboard shows nothing in the console

6) I also checked SMART on all 3 of my disks, and they seem OK

7) I am now running "Fix Common Problems - Troubleshooting mode", so if the system freezes there is supposed to be a syslog in /boot/logs/ after restart - I HOPE
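
Once a saved syslog exists after a freeze, the timestamp of its last line gives a rough freeze time, which can then be compared against scheduled jobs such as the mover. Here is a minimal sketch in Python, assuming Python 3 is available somewhere to run it and that the syslog uses the classic "Mon DD HH:MM:SS" stamp without a year (the year has to be supplied by hand). The function names are made up for illustration.

```python
from datetime import datetime


def parse_syslog_time(line, year):
    """Parse the leading 'Mon DD HH:MM:SS' stamp of a syslog line.

    Returns a datetime, or None if the line does not start with a stamp.
    The year is passed in because this stamp format does not carry one.
    """
    parts = line.split(None, 3)
    if len(parts) < 3:
        return None
    try:
        return datetime.strptime(
            f"{year} {parts[0]} {parts[1]} {parts[2]}", "%Y %b %d %H:%M:%S"
        )
    except ValueError:
        return None


def last_logged_time(lines, year):
    """Return the timestamp of the last parseable line, or None."""
    last = None
    for line in lines:
        stamp = parse_syslog_time(line, year)
        if stamp is not None:
            last = stamp
    return last
```

To use it, open the saved syslog from /boot/logs/ and pass the file object and the year to last_logged_time. If the last stamps from several freezes cluster around the same time of day, a scheduled job becomes a more likely suspect.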

 

BIOS is up to date.

 

So what else do you recommend?

I was also thinking of installing Windows on the SSD and running some performance tests...

Does the unRAID kernel support AMD CPUs? Because from what I saw on Lime Tech, all their test machines are Intel only...

Link to comment

If 'Fix Common Problems' doesn't find anything, you should try this from the troubleshooting guide found here:

 

      http://lime-technology.com/forum/index.php?topic=39257.0

 

"First, tell us the exact version of unRAID, the plugins and addons you have loaded, and what hardware you are using.  You can tell us about your hardware either here in the post, or in your signature."

 

      and

 

"* Try starting in Safe Mode, without any plugins or other addons.  Does the problem still occur?  Then repeat testing with and without various plugins and addons."

Remember, most folks are basically lazy and everyone here is a volunteer.  The more information you provide about your system, the more likely they are to jump in and give you some assistance.  (Both of my servers are basic unRAID NAS boxes and I have absolutely no experience with Dockers or VMs.  I only have some very basic knowledge about them that I have picked up by reading the forums.)

Link to comment

So here is my HW:

  • Case: Fractal Design Node 804
  • PSU: Fractal Design Integra M 550W black
  • MB: ASROCK FM2A88M EXTREME4 + R2.0
  • CPU: AMD A8-7600
  • CPU cooler: Cooler Master Hyper TX3 EVO
  • RAM: Patriot 8GB DDR3 1600MHz CL11 Signature Line with cooler (this is what I normally use, but for testing I swapped in 16GB 1333MHz from my gaming PC)
  • Cache disk: Kingston HyperX FURY SSD 120GB
  • Data + Parity disks: 3x Western Digital RED 3TB

 

I am attaching the Fix Common Problems syslog and diagnostics.

In the next post I will attach a picture of how the screen looked after the freeze, along with the HDDs' S.M.A.R.T. status logs.

 

I have installed these dockers:

  • linuxserver/couchpotato
  • limetech/plex
  • linuxserver/sickbeard
  • linuxserver/sickrage
  • linuxserver/sonarr

 

these plugins:

  • CA Auto Update Applications
  • CA Backup / Restore Appdata
  • CA Cleanup Appdata
  • Community Applications
  • Dynamix System Buttons
  • Dynamix System Information
  • Dynamix System Statistics
  • Dynamix System Temperature
  • Dynamix webGui
  • Fix Common Problems
  • Nerd Tools

 

syslog.txt

tower-diagnostics-20161211-1113.zip

Link to comment

So after a lot of testing, with my friend helping me, we found these issues with the cache:

- unRAID freezes when the cache mover is running (100-300GB of data moved)

- It does not matter whether it's an SSD or HDD: tested with a Kingston SSD and a WD Red HDD

- Plex causes a freeze when the library updates

- Using a different SATA cable or SATA port does not matter

- When the cache disk is out of the array (unassigned), it does not cause issues
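
Since the freeze seems tied to large writes through the cache, a repeatable write load might trigger it on demand instead of waiting for the mover. Here is a minimal sketch in Python, assuming Python 3 is available on the server. The function name, chunk size and sync interval are arbitrary choices for illustration, not anything unRAID provides.

```python
import os


def write_load(path, total_bytes, chunk_bytes=4 * 1024 * 1024, sync_every=16):
    """Write total_bytes of data to path and return the bytes written.

    One random chunk is generated once and reused for every write.
    The file is flushed and fsync'ed every `sync_every` chunks so the
    data actually reaches the disk instead of sitting in the page cache.
    """
    chunk = os.urandom(chunk_bytes)
    written = 0
    chunks_since_sync = 0
    with open(path, "wb") as f:
        while written < total_bytes:
            n = min(chunk_bytes, total_bytes - written)
            f.write(chunk[:n])
            written += n
            chunks_since_sync += 1
            if chunks_since_sync >= sync_every:
                f.flush()
                os.fsync(f.fileno())
                chunks_since_sync = 0
        # Final sync so the tail of the file is on disk as well.
        f.flush()
        os.fsync(f.fileno())
    return written
```

For example, write_load("/mnt/cache/loadtest.bin", 100 * 1024**3) would write about 100GB, assuming the cache is mounted at /mnt/cache and has that much free space. Running it with the syslog tailed on the console could show whether the freeze follows the write load or something else the mover does.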

>:(

 

There is no problem when running the same machine with the SSD on WINDOWS !!!  :o #UnraidFail

 

Link to comment

Since you have a monitor attached to your server and you feel that you have identified the condition that causes the problem, why don't you try this diagnostic tool:

 

"* If the system crashes completely and there is no way to capture a final syslog, then start a tail on the unRAID console or Telnet session (tail -f /var/log/syslog)."

 

You will have to take a picture of what is on the screen after the crash/lockup, so make sure that the picture is in focus and the flash reflection does not obscure any vital information.  (By the way, your posted picture was fine.)
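
As a complement to watching the tail on a monitor, the syslog can also be copied to the flash drive as it grows, so the last lines survive a hard reset. Here is a minimal sketch in Python, assuming Python 3 is available on the server. The function names and the polling interval are made up for illustration, and frequent writes will add wear to the flash drive.

```python
import os
import time


def mirror_new_lines(src_path, dst_path, offset=0):
    """Append bytes written to src_path since `offset` to dst_path.

    Returns the new offset. The copy is flushed and fsync'ed so it is
    handed to the destination device before the function returns.
    """
    size = os.path.getsize(src_path)
    if size < offset:
        # Source was rotated or truncated: start again from the top.
        offset = 0
    with open(src_path, "rb") as src:
        src.seek(offset)
        chunk = src.read()
    if chunk:
        with open(dst_path, "ab") as dst:
            dst.write(chunk)
            dst.flush()
            os.fsync(dst.fileno())
    return offset + len(chunk)


def mirror_forever(src_path, dst_path, interval=5.0):
    """Poll src_path every `interval` seconds until interrupted."""
    offset = 0
    while True:
        offset = mirror_new_lines(src_path, dst_path, offset)
        time.sleep(interval)
```

A call such as mirror_forever("/var/log/syslog", "/boot/logs/syslog-mirror.txt") would keep a copy on the flash drive. The destination file name is only an example. Anything logged in the last few seconds before a hard lockup can still be lost, so a picture of the console remains useful.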

Link to comment

I already posted the diagnostics and they show nothing. I also attached a screenshot which shows how unRAID looks when frozen.

If you want to help, it would be nice to look at the logs I attached.

 

Thank you

Link to comment

Was the computer actually running this command when the crash/lockup occurred, and did you take the picture after the crash?  (It looks to me as if the command was not run, as otherwise I would not be seeing a login prompt!)

Link to comment

Frank, I'm not saying I was tailing the syslog on the monitor, but I was running "Fix Common Problems", which stores the syslog to flash. Please check a few posts back, where I attached the syslog from before the crash.

 

Thank you

 

Link to comment

Friends don't let friends buy AsRock.

 

Buy a proper motherboard, either Gigabyte or Asus.  I'm confident that'll cure the issue.

 

ASROCK is ASUS  ;)

 

And thanks, but this is not really helpful.

I cannot just return the MB to the seller and say: Sorry, it's not working with unRAID, but on Windows it's completely fine.

:'(

I think unRAID should be stable enough to support any MB.

Link to comment

ASRock was originally the very low-end brand of Asus until it was spun off into its own separate company.  Personally I wouldn't buy anything from them because of a bad taste left in my mouth from their very poor quality when it was owned by Asus, but I have heard that their server motherboards are top of the line.

 

Side note though.  How many times have you ever said to yourself "Oh That's Just A Windows Glitch" when something didn't quite work right on Windows?  There is a reason why a workstation is different from a PC, even though they both run the exact same software.

Link to comment

Man, you don't need to tell me this. I am a software developer.

I use almost all desktop platforms: Linux/Unix, macOS, Windows.

Saying that ASRock sucks won't help solve my issue, and it's also not easy to tell the seller/vendor that this MB is not working on unRAID but is just fine on another platform (with this argument I cannot return the MB).

And I'm pretty sure that when I install Ubuntu or any other Linux, everything will work just fine.

So don't tell me it could not be an unRAID issue. It can be some hidden bug or a driver problem with a specific type of HW, as these things have happened in the past too.

As a developer, my responsibility is to make sure my code (app, web app, mobile app, native app) works, and if my app is PAID, then the reason to fix something is much more serious.

I need a solution, not chit-chat about what's good and what's not - at least this is what I hoped for from this community.

 

Thank you for understanding.

Link to comment

My point was to explain why HellDriver made the Don't Buy AsRock comment. I justified it with my own experience and also stated that I've heard the server boards are top of the line.

It's just a pet peeve of mine when people state that Windows works on such-and-such hardware, while everyone in the world constantly says "it's a Windows glitch", when in fact most Windows "glitches" can be blamed on hardware.

 

But yes you are correct that driver issues, etc can be to blame for a lot of hidden issues, but so can the hardware of the system as a whole.  Flakey power supplies are one key item.  Dust on solder joints can cause problems because under the right circumstances dust conducts electricity.

 

Unfortunately your syslog snippet just shows that the system presumably outright crashed.  The picture of what's on the display only showed what was normally there.  Usually, but not always, a software error would result in a kernel oops or something displaying on the screen.

 

Since there is very little information to go on, hardware tends to get blamed.  The only suggestion that I can offer is:

 

Install the NerdPack plugin (and set it to install Perl)

Install the Dynamix System Temp plugin, and have it detect the available sensors and then load the available drivers

Update FCP to the latest version

Run FCP in troubleshooting mode again.

 

With all that, the output of the various sensors will also get logged to the syslog which may or may not shed additional information on what's going on.  (Depends upon the sensors your mb has)

 

But TBH, your symptoms of an outright crash with no warnings or errors do point to power supply / CPU / motherboard / cooling.  And since very few people in the world have the means or knowledge to properly diagnose those items while they are installed in a system, the general course of action is to begin to replace components until you come across the faulty part.

I.e., just because you can load and run Windows does not mean at all that the hardware is not defective.  Personally, I have run unRaid on every P.O.S. motherboard that I have owned (all consumer level), and on my equipment the software itself has proven to be rock-solid when the hardware is not at fault.

 

And Limetech is very good at fixing faults with their software (or outright stating that it doesn't work with such and such component) if they can replicate the problem. 

Link to comment

I measured the temperature with Dynamix, but the values are at most 39 degrees Celsius.

I gave this unRAID box to my friend to do some tests and digging. He ran various benchmarks (again in Windows), and they show that the HW is perfectly fine.

From the symptoms I don't know what to expect; there was no kernel panic on screen, the console just froze.

The whole HW is brand new, including the PSU, which is a 550W from Corsair.

We will run several more tests, but I am starting to get desperate.

 

Thx

Link to comment

 

The point is that what you have done hasn't provided a clue about what the problem is.  What I was suggesting is a different diagnostic tool that might provide that clue.  You do have to leave the monitor running while you are doing the test.  Just allow the server to run until the problem happens, and the last thing that was logged will be on the screen.  (Possibly it will be later than what 'Fix Common Problems' captured, as that plugin has to store the syslog, which it can't do if the system is locked up.)

 

While I don't run the software combination that you are using, it seems likely that the problem is somehow involved in what is running (and possibly locking a file so Mover can't finish) or you have some mis-configuration in one of your Dockers. 

Link to comment

 

Additionally, I will go on record as stating the A88 chipset is 100% compatible with unRaid (I use an Asus A88X-Pro, albeit with an FM2 processor (A8-6600k), not an FM2+ like yours).

 

I concur. I have an FM2+ A88X ASrock board (A88M-G/3.1) with an A8-7600 that has been stable since upgrading to 8GB of RAM. This MB looks almost exactly like yours.

 

I only run one SATA port (cache drive) on the MB though.

Link to comment

 

So you needed to upgrade to 8GB of RAM ?

 

My machine already has 8GB of "Patriot 8GB DDR3 1600MHz CL11 Signature Line with cooler"

Link to comment

I also notice that you have ReiserFS. I'm surprised that no member has suggested this as the root cause.

I had a 6.2.4 AMD system that would not stay stable beyond 24 hours with ReiserFS. This box had been stable for over 3 years with 5.x. It would lock up SMB and would neither restart nor shut down. I eventually installed Debian on the unstable unRAID box in order to migrate the data to XFS because of this mysterious instability.

 

I emailed LimeTech for help about the problem, but never received a response  :-\

 

https://lime-technology.com/forum/index.php?topic=54452.msg521132

 

Do you have a drive to test the system without Reiser?

 

 

 

Link to comment

I have already bought a new HDD, and I will try to move the data over and reformat the existing drives as XFS.
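
Before reformatting a drive, it may be worth confirming that every file really made it to the new disk. Here is a minimal sketch in Python that compares two directory trees by SHA-256, assuming Python 3 is available and both disks are mounted. The function names are made up for illustration, and hashing several terabytes will take a long time.

```python
import hashlib
import os


def file_sha256(path, block_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in blocks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def tree_digests(root):
    """Map each file's path relative to root to its SHA-256 digest."""
    result = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            result[os.path.relpath(full, root)] = file_sha256(full)
    return result


def compare_trees(source_root, copy_root):
    """Return sorted relative paths that are missing or differ in the copy."""
    source = tree_digests(source_root)
    copy = tree_digests(copy_root)
    return sorted(p for p, d in source.items() if copy.get(p) != d)
```

An empty list from compare_trees("/mnt/disk1", "/mnt/disk4") would mean every file under the first path has an identical copy under the second. The two mount points are only examples and need to match the actual source and destination disks.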

Link to comment
