tokra, posted December 9, 2016
Here is my unRAID 6.2.4 machine:
- Fractal Design Node 804
- Fractal Design Integra M 550W black
- ASROCK FM2A88M EXTREME4 + R2.0
- AMD A8-7600
- Cooler Master Hyper TX3 EVO
- Patriot 8GB DDR3 1600MHz CL11 Signature Line with cooler
- Cache: Kingston HyperX FURY SSD 120GB
- 3x Western Digital RED 3TB
I switched to this hardware one week ago. After some hours, usually during the night, unRAID completely freezes. A friend also told me that the BIOS tends to overclock when left on its Auto setting, so I manually set the APU to 100 MHz (the range is 100-136 MHz, I think). Maybe the counter also needs to be adjusted?
I am desperate: the system runs really well until I leave it idle, and then it freezes overnight, usually after about 5-8 hours. When I wake up the next day, I have to hard reset it. I doubt it's a PSU issue, as the PSU is a brand-new Fractal Design Integra M 550W.
I checked the flash drive, and Windows Check Disk didn't find any errors. I also ran memtest86 for 2 hours without errors, but someone on this forum advised me to run memtest for at least 24 hours, so that is what I'm doing now. I also reformatted my previous flash drive and put a brand-new, clean unRAID installation on it.
Any other hints or ideas? :'( Thank you in advance.
Attachment: tower-diagnostics-20161206-0532.zip
Frank1940, posted December 11, 2016
I presume that it ran the 24-hour memtest without locking up or reporting a memory fault.
Can you Telnet or SSH into the server when it is locked up? If so, type diagnostics on the command line. That will write the diagnostics file to your flash drive (look in the root or the logs folder).
You are saying that it only locks up overnight and at no other time. Does it matter when you start the server? What happens if you turn it off, leave it off until evening, and then start it up? If it only locks up during the overnight period, do you have some job scheduled to run then? (And don't forget about the mover!)
Have you tried a quick push of the power button to see if you get a clean shutdown? (A quick push should start a clean shutdown, while holding it down forces a power-off, which results in an unclean shutdown.)
Attach a monitor to the server and see if there are any messages after it happens. You may have to take a picture of the monitor screen. If you do, make sure it can be easily read; a blurry or out-of-focus picture may not be enough to figure out what is going on.
You might also give us a rundown of all the plugins, Dockers and VMs that you are running. On your own, you might shut some of them down before you go to bed and see if you can isolate the issue yourself.
You have to understand that troubleshooting these types of problems from afar is very difficult. (It can be difficult even when one is seated right in front of the machine.) It could be caused by either software or faulty hardware.
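If the server still answers over SSH, Frank's suggestion can be run from another machine and the resulting file located afterwards. This is a minimal sketch, assuming the server's host name is tower (as in the diagnostics file names in this thread), that you log in as root, and that the diagnostics command is on root's PATH for a non-interactive SSH session. The newest_diagnostics helper is hypothetical; it only looks in the flash root and its logs folder, the two places Frank mentions.

```shell
#!/bin/bash
# From another machine (host name "tower" and root login are assumptions):
#   ssh root@tower diagnostics

# Hypothetical helper, run on the server: print the newest diagnostics zip
# found in the flash root or its logs folder (the flash is mounted at /boot).
newest_diagnostics() {
  local flash="${1:-/boot}"
  # ls -t sorts by modification time, newest first; a pattern that matches
  # nothing is reported on stderr, which is discarded here.
  ls -t "$flash"/*-diagnostics-*.zip "$flash"/logs/*-diagnostics-*.zip 2>/dev/null | head -n 1
}
```

Note that this only helps while the box still responds; once it stops answering ping, as described later in this thread, the console is the only place left to look.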
tokra, posted December 11, 2016
Alright, the issue is sporadic, and I am not saying it happens strictly overnight.
1) I checked the RAM with memtest86: no problems. I even swapped in the RAM from my gaming PC. DID NOT HELP
2) I changed the flash drive to a new SanDisk Cruzer 16GB. DID NOT HELP
3) I tried several versions of unRAID, even completely clean installations: 6.1.x, 6.2.4, and now I'm running 6.3.0 RC6. DID NOT HELP
4) My hardware is completely brand new, so I doubt it's a PSU issue. You can see my hardware components in my post signature.
5) When the system freezes, I cannot log in over SSH or even ping it. On the monitor there is only the last message from system startup and the tower login prompt. It is unresponsive; typing on the keyboard doesn't show up in the console.
6) I checked the SMART data of all 3 disks, and it looks OK.
7) I am now running "Fix Common Problems" in troubleshooting mode, so if the system freezes there should be a syslog in /boot/logs/ after the restart.
I hope the BIOS is up to date.
So what else do you recommend? I was also thinking of installing Windows on the SSD and running some performance tests...
Does the unRAID kernel support AMD CPUs? From what I saw at Lime Tech, all their test machines are Intel only...
Frank1940, posted December 11, 2016
If 'Fix Common Problems' doesn't find anything, you should try this from the troubleshooting guide found here: http://lime-technology.com/forum/index.php?topic=39257.0
"First, tell us the exact version of unRAID, the plugins and addons you have loaded, and what hardware you are using. You can tell us about your hardware either here in the post, or in your signature."
and
"* Try starting in Safe Mode, without any plugins or other addons. Does the problem still occur? Then repeat testing with and without various plugins and addons."
Remember, most folks are basically lazy and everyone here is a volunteer: the more information you provide about your system, the more likely they are to jump in and give you some assistance. (Both of my servers are basic unRAID NAS boxes, and I have absolutely no experience with Dockers or VMs. I only have some very basic knowledge of them that I have picked up by reading the forums.)
tokra, posted December 11, 2016
So here is my hardware:
- Case: Fractal Design Node 804
- PSU: Fractal Design Integra M 550W black
- MB: ASROCK FM2A88M EXTREME4 + R2.0
- CPU: AMD A8-7600
- CPU cooler: Cooler Master Hyper TX3 EVO
- RAM: Patriot 8GB DDR3 1600MHz CL11 Signature Line with cooler (this is what I use, but for testing I swapped it for 16GB 1333MHz from my gaming PC)
- Cache disk: Kingston HyperX FURY SSD 120GB
- Data + parity disks: 3x Western Digital RED 3TB
I am attaching the Fix Common Problems syslog and the diagnostics. In the next post I will attach a picture of what the screen looked like after the freeze, along with the S.M.A.R.T. status logs of the HDDs.
I have these Dockers installed:
- linuxserver/couchpotato
- limetech/plex
- linuxserver/sickbeard
- linuxserver/sickrage
- linuxserver/sonarr
And these plugins:
- CA Auto Update Applications
- CA Backup / Restore Appdata
- CA Cleanup Appdata
- Community Applications
- Dynamix System Buttons
- Dynamix System Information
- Dynamix System Statistics
- Dynamix System Temperature
- Dynamix webGui
- Fix Common Problems
- Nerd Tools
Attachments: syslog.txt, tower-diagnostics-20161211-1113.zip
tokra, posted December 11, 2016
Here are the S.M.A.R.T. reports and a picture of what the screen looks like after the freeze.
Attachments: tower-smart-20161211-0959.zip, tower-smart-20161211-1010.zip, tower-smart-20161211-1014.zip
John_M, posted December 12, 2016
Yes, unRAID runs on AMD CPUs and APUs. Many people who post here use them.
tokra, posted December 18, 2016
So after a lot of testing, with my friend helping me, we found these issues with the cache:
- unRAID freezes when the mover is running (100-300GB of data being moved)
- It does not matter whether the cache is an SSD or an HDD: tested with the Kingston SSD and a WD Red HDD
- Plex causes a freeze when its library updates
- Using a different SATA cable or SATA port makes no difference
- When the cache disk is out of the array (unassigned), it does not cause issues
There is no problem when running the same machine with the SSD on WINDOWS!!! #UnraidFail
Frank1940, posted December 18, 2016
Since you have a monitor attached to your server and you feel that you have identified the condition that causes the problem, why don't you try this diagnostic tool:
"* If the system crashes completely and there is no way to capture a final syslog, then start a tail on the unRAID console or Telnet session (tail -f /var/log/syslog)."
You will have to take a picture of what is on the screen after the crash/lockup, so make sure that the picture is in focus and that the flash reflection does not obscure any vital information. (By the way, your posted picture was fine.)
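A minimal sketch of that suggestion, assuming a bash shell on the unRAID console. Besides showing the log on screen, it appends each line to a file on the flash drive. The function name follow_syslog and the file /boot/logs/syslog-tail.txt are hypothetical; only /var/log/syslog comes from the guide quoted above. In a hard lockup the last lines may never reach the flash copy, so the photo of the screen remains the primary evidence.

```shell
#!/bin/bash
# Hypothetical helper: follow the syslog on the local console (as the guide
# suggests) and also append every line to a file on the flash drive.
# Assumption: /boot/logs is writable (FCP saves its logs there).
follow_syslog() {
  local src="${1:-/var/log/syslog}"
  local dest="${2:-/boot/logs/syslog-tail.txt}"
  mkdir -p "$(dirname "$dest")"
  # -n 50: start with the last 50 lines; -F: keep following across log rotation.
  # tee prints each line on the screen and appends it to $dest.
  tail -n 50 -F "$src" | tee -a "$dest"
}

# Usage on the console, then leave the monitor on until the lockup:
#   follow_syslog
```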
tokra, posted December 18, 2016
I already captured diagnostics, and they show nothing. I also attached a screenshot showing how unRAID looks when frozen. If you want to help, it would be nice if you looked at the logs I attached. Thank you
Frank1940, posted December 18, 2016
Was the computer actually running this command when the crash/lockup occurred, and did you take the picture after the crash? (It looks as if the command had not been run, as otherwise I would not be seeing a login prompt!)
tokra, posted December 18, 2016
Frank, I'm not saying I was tailing the syslog on the monitor, but I was running "Fix Common Problems", which stores the syslog on the flash drive. Please check a few posts back, where I attached the syslog from before the crash. Thank you
HellDiverUK, posted December 18, 2016
Friends don't let friends buy AsRock. Buy a proper motherboard, either Gigabyte or Asus. I'm confident that'll cure the issue.
tokra, posted December 18, 2016
ASROCK is ASUS. And thanks, but this is not really helpful. I cannot just return the MB to the seller and say: "Sorry, it's not working with unRAID, but on Windows it's completely fine." :'( I think unRAID should be stable enough to support any MB.
Squid, posted December 18, 2016
ASRock was originally the very low-end brand of Asus until it was spun off into its own separate company. Personally, I wouldn't buy anything from them because of the bad taste left in my mouth by their very poor quality when they were owned by Asus, but I have heard that their server motherboards are top of the line.
Side note, though: how many times have you ever said to yourself "Oh, that's just a Windows glitch" when something didn't quite work right on Windows? There is a reason why a workstation is different from a PC, and yet they both run the exact same software.
tokra, posted December 18, 2016
Man, you don't need to tell me this; I am a software developer. I use almost all desktop platforms: Linux/Unix, macOS, Windows. Saying "ASRock sucks" won't help solve my issue, and it's also not easy to tell the seller/vendor that this MB doesn't work on unRAID but is just fine on another platform (with that argument I cannot return the MB). And I'm pretty sure that if I install Ubuntu or any other Linux, everything will work just fine. So don't tell me it couldn't be an unRAID issue. It could be some hidden bug or some driver problem with a specific type of hardware, as these things have happened in the past too. As a developer, my responsibility is to make sure my code (app, web app, mobile app, native app) works, and if my app is PAID, the reason to fix something is much more serious. I need a solution, not chit-chat about what's good and what's not; at least that is what I hoped for from this community. Thank you for understanding.
Squid, posted December 18, 2016
My point was to explain why HellDiverUK's "Don't buy ASRock" comment came up; I backed it with my own experience and also said that I've heard their server boards are top of the line.
It's just a pet peeve of mine when I see people state that Windows works on such-and-such, while constantly hearing everyone in the world say "it's a Windows glitch", when in fact most Windows "glitches" can be blamed on hardware. But yes, you are correct that driver issues, etc. can be to blame for a lot of hidden issues, but so can the hardware of the system as a whole. Flaky power supplies are one key item. Dust on solder joints can cause problems because, under the right circumstances, dust conducts electricity.
Unfortunately, your syslog snippet just shows that the system presumably crashed outright. The picture of the display only showed what was normally there. Usually, but not always, a software error would result in a kernel oops or something similar on the screen. Since there is very little information to go on, hardware tends to get blamed.
The only suggestion that I can offer is:
- Install the NerdPack plugin (and set it to install Perl)
- Install the Dynamix System Temp plugin, have it detect the available sensors, and then load the available drivers
- Update FCP to the latest version
- Run FCP in troubleshooting mode again
With all that, the output of the various sensors will also get logged to the syslog, which may or may not shed additional light on what's going on (depending on the sensors your MB has).
But to be honest, your symptoms of an outright crash with no warnings or errors certainly imply the power supply / CPU / motherboard / cooling. And since very few people in the world have the means or knowledge to properly diagnose those items while they are installed in a system, the general course of action is to start replacing components until you come across the faulty part. In other words, just because you can load and run Windows does not at all mean that the hardware is not defective.
Personally, I have run unRAID on every P.O.S. motherboard that I have owned (all consumer level), and on my equipment the software itself has proven to be rock-solid when the hardware is not at fault. And Limetech is very good at fixing faults in their software (or outright stating that it doesn't work with such-and-such component) if they can replicate the problem.
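Squid notes that a software fault would usually leave a kernel oops or similar behind. As a rough aid, this sketch greps a saved syslog (for example a copy FCP left in /boot/logs) for common kernel failure messages. The function name and the pattern list are assumptions, not an official or exhaustive list; an empty result does not rule anything out, and it fits the "outright crash with no warnings" picture described above.

```shell
#!/bin/bash
# Hypothetical helper: list lines (with line numbers) in a saved syslog that
# look like kernel failure reports. The patterns are common kernel messages
# (oops/panic, call traces, OOM killer, machine-check errors), not a full list.
scan_crash_markers() {
  local log="${1:?usage: scan_crash_markers /path/to/syslog}"
  grep -nE 'Kernel panic|Oops|BUG:|Call Trace|general protection fault|Out of memory|oom-killer|Hardware Error|Machine check' "$log"
}

# Example (path pattern assumed): scan the newest syslog copy on the flash.
#   scan_crash_markers "$(ls -t /boot/logs/*syslog* | head -n 1)"
```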
tokra, posted December 18, 2016
I did measure temperatures with Dynamix, and the maximum is 39 °C. I gave this unRAID box to my friend to do some testing and digging. He ran various benchmarks (again in Windows), and they show that the hardware is perfectly fine. From the symptoms I don't know what to expect; there was no kernel panic on screen, the console just froze. All the hardware is brand new, including the PSU, which is a 550W from Corsair. We will run several more tests, but I'm starting to get desperate. Thx
Squid, posted December 18, 2016
Also check for BIOS updates.
Additionally, I will go on record as stating that the A88 chipset is 100% compatible with unRAID. (I use an Asus A88X-Pro, albeit with an FM2 processor (A8-6600K), not an FM2+ like yours.)
Frank1940, posted December 18, 2016
The point is that what you have done so far hasn't produced a clue about what the problem is. What I was suggesting is a different diagnostic tool that might provide that clue. You do have to leave the monitor running while you are doing the test. Just let the server run until the problem happens, and the last thing that was logged will be on the screen. (It may well be something logged after the last copy that 'Fix Common Problems' saved, since FCP has to write that copy to the flash drive, which it can't do once the system has locked up.)
While I don't run the software combination that you are using, it seems likely that the problem is related to something that is running (possibly something locking a file so the mover can't finish), or that you have a misconfiguration in one of your Dockers.
ixnu, posted December 19, 2016
I concur with Squid on the A88 chipset. I have an FM2+ A88X ASRock board (A88M-G/3.1) with an A8-7600 that has been stable since upgrading to 8GB of RAM. This MB looks almost exactly like yours. I only use one SATA port (cache drive) on the MB, though.
tokra, posted December 19, 2016
So you needed to upgrade to 8GB of RAM? My machine already has 8GB: "Patriot 8GB DDR3 1600MHz CL11 Signature Line with cooler".
ixnu, posted December 19, 2016
Yep. https://lime-technology.com/forum/index.php?topic=54731.msg522948
The OOM killer was invoked, although that probably had to do with cache dirs.
As a side note, it appears that your MB variant had one of the highest RMA percentages in 2014: https://www.techpowerup.com/forums/threads/2014-motherboard-rma-rate-new-update-from-hardware-fr.207128/
ixnu, posted December 19, 2016
I also notice that you have ReiserFS. I'm surprised that no other member has suggested it as the root cause. I had a 6.2.4 AMD system that would not stay stable beyond 24 hours with ReiserFS, even though the box had been stable for over 3 years on 5.x. It would lock up SMB and would neither restart nor shut down. Because of this mysterious instability, I eventually installed Debian on that unstable unRAID box in order to migrate the data to XFS. I emailed LimeTech for help with the problem but never received a response: https://lime-technology.com/forum/index.php?topic=54452.msg521132
Do you have a drive to test the system without ReiserFS?
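To see which disks are still on ReiserFS before deciding what to migrate, one option is to read the filesystem type from /proc/mounts. A minimal sketch, assuming the array is started so the disks are mounted; list_reiserfs_mounts is a hypothetical name, and the /dev/mdN to /mnt/diskN layout used in the check below is an assumption about how unRAID mounts its data disks.

```shell
#!/bin/bash
# Hypothetical helper: print device and mount point for every ReiserFS
# filesystem in a mounts table (default: the live /proc/mounts).
# /proc/mounts fields: device, mount point, fs type, options, dump, pass.
list_reiserfs_mounts() {
  local table="${1:-/proc/mounts}"
  awk '$3 == "reiserfs" { print $1, $2 }' "$table"
}

# Usage on the server: list_reiserfs_mounts
```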
tokra, posted December 19, 2016
I have already bought a new HDD; I will try to move the data onto it and reformat the existing drives as XFS.