November 26, 2013
I'm running 5.0 (stable release) on a new build (about 1 month old) and have never had it running stable. I have another server that has been running for years without any issues (5.0 as well). I have to turn off the tower to reboot and get everything started again. I'm running 5 drives (1 parity, 1 cache and 3 data).

I thought originally I was having a network problem, as I could only see the tower for a period of time (usually about 30-60 minutes). I have the IP fixed in both the tower and my router. I now know that the entire system crashes (I've plugged a monitor and keyboard into the tower and I cannot type anything). I've recently disabled the cache disk in Settings/Share Settings, and it still crashed.

The last time the system crashed, I noticed it was in the midst of a parity check: it was stuck at 16.5%, the estimated speed was 1MB/s and the estimated time was 5 days. Also, I'm pretty sure the parity check isn't the cause, as many times I've stopped the check after rebooting and it has still crashed.

Here is my syslog from the log link on the top right of the main page:

/usr/bin/tail -f /var/log/syslog
Nov 24 20:53:36 Tower emhttp: get_config_idx: fopen /boot/config/shares/Movies.cfg: No such file or directory - assigning defaults
Nov 24 20:53:36 Tower emhttp: get_config_idx: fopen /boot/config/shares/Music.cfg: No such file or directory - assigning defaults
Nov 24 20:53:36 Tower emhttp: get_config_idx: fopen /boot/config/shares/Photos.cfg: No such file or directory - assigning defaults
Nov 24 20:53:36 Tower emhttp: get_config_idx: fopen /boot/config/shares/TV.cfg: No such file or directory - assigning defaults
Nov 24 20:53:36 Tower emhttp: Restart SMB...
Nov 24 20:53:36 Tower emhttp: shcmd (39): killall -HUP smbd
Nov 24 20:53:37 Tower emhttp: shcmd (40): ps axc | grep -q rpc.mountd
Nov 24 20:53:37 Tower emhttp: _shcmd: shcmd (40): exit status: 1
Nov 24 20:53:37 Tower emhttp: shcmd (41): /usr/local/sbin/emhttp_event svcs_restarted
Nov 24 20:53:37 Tower emhttp_event: svcs_restarted

It just crashed again, and I have attached this syslog to report. Any ideas?

tower_syslog.txt
January 17, 2014 Author
Ok, sorry, but work and the holidays got in the way. After a long break, I finally have some time to look at this problem again. The last post said that I had an add-on that was incompatible or misconfigured. So I have completely re-formatted the flash drive and re-installed unRAID, this time with 5.0.4. I have not added or enabled any add-ons. This install is completely stock. I've mounted the drives and started the parity-sync. After about 10 minutes, the same problem surfaced: the server crashed and cannot be accessed either from the browser or directly on the server itself. Also, I've tried two other network cards with the same result. I have attached my latest syslog for anyone's review.

tower_syslog_2014-01-16.txt
January 18, 2014 Author
OK, I reformatted the flash drive again and re-installed 5.0.4, and this time the only thing I did was assign the drives and start the parity sync. I did nothing else. I'm using the stock GUI. I watched it run to about 30% and went to bed. Now, this morning, nothing. Dead again. However, on the server terminal itself, it's still alive. I used the ifconfig eth0 command to see if the IP was still assigned, and it is: 192.168.0.27. I typed that directly into the browser and the page cannot be found (Chrome says "This webpage is not available"). ethtool eth0 shows it is connected at 1Gb/s. So, it doesn't look like a networking problem? I've attached the latest syslog. What should I try next?

tower_syslog_2014-01-17.txt
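Since the console still responds while the browser cannot reach the box, one further check is whether the web server process itself is still alive. The following is only a minimal sketch: it assumes the web GUI is served by the emhttp process that shows up in the syslog earlier in this thread, and that `ps axc` works on the console. `has_process` is a hypothetical helper name, not an unRAID command.

```shell
# Hypothetical helper: reads `ps axc` output on stdin and succeeds (exit 0)
# if any line's last column equals the given command name.
has_process() {
  awk -v name="$1" '$NF == name { found = 1 } END { exit !found }'
}

# Usage on the server console (assumes emhttp serves the web GUI):
#   if ps axc | has_process emhttp; then
#     echo "emhttp is running: look at the network path or the GUI itself"
#   else
#     echo "emhttp is not running: the web GUI process died, not the network"
#   fi
```

If the process is missing while ifconfig and ethtool still look healthy, that would point away from the network cards and towards whatever killed the process.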
January 18, 2014
The syslog does not indicate a problem. It is 4 minutes long. Did you collect this syslog using the terminal to copy to the flash?
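For reference, copying the syslog to the flash from the terminal can be sketched roughly as below. This assumes the log lives at /var/log/syslog and the flash drive is mounted at /boot (both paths appear in the syslog excerpt at the top of the thread); `save_syslog` is a hypothetical helper name used only for illustration.

```shell
# Hypothetical helper: copy the live syslog to a timestamped file.
# Defaults assume the syslog is at /var/log/syslog and the flash is mounted at /boot.
save_syslog() {
  src="${1:-/var/log/syslog}"
  dest_dir="${2:-/boot}"
  dest="$dest_dir/syslog-$(date +%Y-%m-%d_%H%M%S).txt"
  # sync flushes the write so the copy survives a hard power-off
  cp "$src" "$dest" && sync && echo "$dest"
}

# Usage on the server console after the web GUI stops responding:
#   save_syslog
```

Taking the copy at the console after the failure, rather than from the web page shortly after boot, is what captures the log entries leading up to the crash.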
January 18, 2014 Author
Sorry, attached is the syslog from the console. Also, I did perform the memtest and there were no errors.

Hardware:
Gigabyte GA-A75M-UD2H Motherboard
AMD A6-3500 CPU
8GB RAM - (2) 4GB DDR3-1333
4GB SanDisk Cruzer Fit Flash Drive

tower_syslog_2014-01-18_Console_after_crash.txt
January 18, 2014
PSU? Have you checked all the connections? What about the fans, especially the CPU fan?
January 18, 2014 Author
OK, but I can't see the box from any web browser. I can't access any of the drives, even the flash, through the network. I completely lose it after a period of time. I have another box that has been running just fine for years. The memtest did run overnight. No errors. Also, all the fans are working properly and continue to run at good speed. I just went and updated the BIOS on the MB and started it back up. Can't think of anything else to try.
January 19, 2014 Author
Well, I might have some success. I did two things that may have fixed it. One was the BIOS update. The other is hard disk cooling. I noticed yesterday that the temperatures of the HDs during the sync process were climbing higher and higher (up to 52C at one point). The higher the disk sat in the case, the higher the temp. So I opened up the side and the front of the case and the temp went down a couple of degrees. Then I put a small window fan in front of the box so the air could blow from front to back across the drives, and the temp plummeted. It's now been up for over 14 hours, which is by far the longest this box has ever run. I think it's time today to go and buy a good case with proper cooling.

Does this cooling issue make any sense as a reason why the webGUI wouldn't work yet the console would? It's clear the CPU and computer still worked. Maybe it was the BIOS upgrade?
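For anyone who wants to watch drive temperatures from the console during a sync instead of through the GUI, here is a rough sketch. It assumes smartctl (from smartmontools) is available and that the drive reports its temperature as SMART attribute 194 with the reading in the raw-value column; some drives use a different attribute. The helper names and the 45C warning threshold are illustrative assumptions, not values from this thread or from unRAID.

```shell
# Hypothetical helper: reads `smartctl -A` output on stdin and prints the
# raw value (10th column) of SMART attribute 194, if the drive reports it.
drive_temp_from_smart() {
  awk '$1 == 194 { print $10; exit }'
}

# Hypothetical helper: print one status line for a drive.
# The default limit of 45 is an arbitrary illustrative threshold.
check_temp() {
  dev="$1"; temp="$2"; limit="${3:-45}"
  if [ -z "$temp" ]; then
    echo "unknown $dev"
  elif [ "$temp" -ge "$limit" ]; then
    echo "WARN $dev ${temp}C"
  else
    echo "ok $dev ${temp}C"
  fi
}

# Usage on the server console (device name is an example):
#   check_temp /dev/sda "$(smartctl -A /dev/sda | drive_temp_from_smart)"
```

Running this for each drive every few minutes during a parity sync would show whether the drives higher up in the case really do run hotter.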
January 19, 2014
Could have been the BIOS... From everyone's comments above, the consensus is pointing towards a hardware problem. The temperature sensors are just on the MB and the CPU itself, so the hard drives, any added cards, etc., could be much hotter. 52C shouldn't be fatal, but high heat can do strange things. Even if the CPU keeps running, the other chips on your MB could have problems. If you can't add fans and cooling to your existing rig, a new, well-cooled case is in order: find one that sucks air in over the hard drives.
January 19, 2014 Author
Thanks for everyone's help with this issue. I really appreciate it. Just in case, I did go and get a nice case with great cooling. But I do tend to agree: I think it was the BIOS update.