Unstable V6.0.1 Random Lockups



Hi All,

On a glowing recommendation from a mate at work, I am trying to migrate to unRAID from an N36L with an mdadm array, a QNAP TS-412, and another box (running all the services below) with six 3TB drives, but I am hitting some issues.

 

The killer so far is random crashing.

 

So far there is very little this box is doing as I am trying to set it up and prove that it can take over the duties of my existing boxes.

 

Required to run are

Sickbeard / Sonarr

SABnzbd

Logitech Media Server

TVHeadend (with HDHomeRun tuners; any direction on getting these tuners running would be awesome. I can't find anything beyond the fact that the drivers are included, and I'm not sure where to go from there)

Couchpotato

Headphones

mysql

kodi headless

 

I would also like to run some VMs on this once everything is set up and I am comfortable.

 

Everything seems to be going fine: Sonarr, Couch and Sab all talk and behave the way they should.

 

But then everything stops: there is no response from the console, and the USB keyboard does not respond when Caps Lock is pressed.

 

In the beginning, when it crashed it would take out my network. Maybe it goes nuts broadcasting, I'm not sure, but as soon as I unplugged the network cable or the power, everything recovered.

 

This seems to have stopped, but it still crashes around 20-30 hours.

 

I thought maybe it happened when mover was being run, but I have run mover from the command line and it all works nicely.

 

 

Setup is a 3TB WD Green WD30EURS (Parity),  3TB WD Red WD30EFRX, and a 256GB Fujitsu SSD (Cache)

 

M/B ASUSTek Gryphon Z87

CPU i7-4770K

HVM Enabled

IOMMU Disabled

RAM 32GB

Network Onboard Gig

Kernel Linux 4.0.4-unRAID x86_64

 

Are there any more details or output I can provide? (Of course there are :) My knowledge is limited but I learn quickly, and I know enough to build the boxes above; I just don't always know where to look for the output needed for diagnosis / troubleshooting.)

 

Any help would be greatly appreciated!

 

Cheers

 

Link to comment

Hi trurl,

 

Thanks for your response, diagnostics attached.

 

I have set up a ping monitor to notify me when it goes down. It stopped responding at 4:36am this morning and came back up at 7:58am, having rebooted. I am not sure if it has been left hung this long before (possibly while I was out over the weekend), because otherwise I go and have a look as soon as I notice it. That usually ends in a hard reset, as the terminal is unresponsive to input and nothing changes if the power button is pressed and it is left to shut down.
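
For anyone wanting a similar monitor, here is a minimal sketch in Python. It is not the tool used above. `ping_once` and `outage_windows` are hypothetical names, and the `-c`/`-W` flags assume the Linux iputils `ping`. The second function turns a series of ping results into "went down / came back" windows like the 4:36am to 7:58am one described here.

```python
import subprocess
from datetime import datetime


def ping_once(host, timeout_s=2):
    """Return True if a single ICMP echo to host succeeds.

    Assumes the Linux iputils ping (-c count, -W timeout in seconds).
    """
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def outage_windows(samples):
    """Turn (timestamp, reachable) samples into (went_down, came_back) pairs.

    came_back is None if the host was still down at the last sample.
    """
    windows = []
    down_since = None
    for when, reachable in samples:
        if not reachable and down_since is None:
            down_since = when  # first failed ping of a new outage
        elif reachable and down_since is not None:
            windows.append((down_since, when))
            down_since = None
    if down_since is not None:
        windows.append((down_since, None))
    return windows
```

In use, something like `samples.append((datetime.now(), ping_once("192.168.1.10")))` once a minute would feed it (the address is a placeholder). Stamping each sample with the monitoring machine's own clock keeps the outage times independent of the server's clock or the mail provider's.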

 

I will use screen to tail syslog and dmesg to files. Is there anything else you suggest I output to file?

 

Cheers

ironman-diagnostics-20150714-0912.zip

Link to comment

Please boot into safe mode and see if you can recreate the issue.  Post a diagnostics from after booting in safe mode.  The reason I'm asking you to do this is because you currently have plugins installed on your system which are not officially supported yet.

Link to comment

Also, I am not sure if this helps or not, but I tailed syslog and dmesg to files yesterday (attached) to see if anything happened before it crashed again (it did).

 

Since then, I have upgraded my BIOS to version 2103 and reseated the RAM.

 

Any input would be greatly appreciated, cheers.

 

 

Syslog output

 

Jul 14 07:58:54 IronMan dnsmasq-dhcp[2423]: read /var/lib/libvirt/dnsmasq/default.hostsfile

Jul 14 07:58:54 IronMan kernel: virbr0: port 1(virbr0-nic) entered disabled state

Jul 14 07:58:55 IronMan avahi-daemon[2062]: Service "IronMan" (/services/sftp-ssh.service) successfully established.

Jul 14 07:58:55 IronMan avahi-daemon[2062]: Service "IronMan" (/services/smb.service) successfully established.

Jul 14 07:58:55 IronMan avahi-daemon[2062]: Service "IronMan" (/services/ssh.service) successfully established.

Jul 14 07:58:55 IronMan ntpd[1540]: Listen normally on 3 docker0 172.17.42.1:123

Jul 14 07:58:55 IronMan ntpd[1540]: Listen normally on 4 virbr0 192.168.122.1:123

Jul 14 07:58:55 IronMan ntpd[1540]: new interface(s) found: waking up resolver

Jul 14 09:21:49 IronMan in.telnetd[13158]: connect from 192.168.1.254 (192.168.1.254)

Jul 14 09:21:55 IronMan login[13159]: ROOT LOGIN  on '/dev/pts/0' from '192.168.1.254'
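
When reading a capture like the one above, it can help to pull out the last logged timestamp and any long silences before it. Below is a minimal Python sketch. `parse_stamp` and `find_gaps` are hypothetical helper names. The year is passed in because syslog lines do not carry one (2015 is assumed from the diagnostics filename), and the 30 minute threshold is arbitrary.

```python
from datetime import datetime, timedelta


def parse_stamp(line, year=2015):
    """Parse the leading 'Mon DD HH:MM:SS' of a syslog line, or return None."""
    try:
        return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
    except ValueError:
        return None  # blank line or a line without a leading timestamp


def find_gaps(lines, threshold=timedelta(minutes=30), year=2015):
    """Return (last_timestamp, gaps), where gaps lists (before, after) pairs
    of consecutive entries that are at least `threshold` apart."""
    stamps = [s for s in (parse_stamp(l, year) for l in lines) if s is not None]
    gaps = [(a, b) for a, b in zip(stamps, stamps[1:]) if b - a >= threshold]
    last = stamps[-1] if stamps else None
    return last, gaps
```

A gap on its own does not prove a hang, since an idle box logs very little. In the excerpt above, the quiet period between 07:58:55 and 09:21:49 simply ends with a telnet login. The last timestamp in a log captured across a crash is the more useful figure, as it bounds when the system stopped writing.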

 

 

 

ironman-dmesg.txt

Link to comment

Some observations -

 

* The go file shows UnMENU has been used, and it's loading packages from /packages.  Are there any packages in that boot folder?

 

* Syslog ends just as the boot completes, looks basically OK, but a parity check is running, due to previous crash.  Last entry is at Jul 14 21:53:17, do you have an idea when it crashed?  I assume it didn't crash until later, but there are no further entries after 21:53:17.  Time of diagnostics capture is 22:05.

 

* The syslog tail has nothing later either, and the dmesg has nothing else to add.

 

* There are ACPI warnings in the syslog, which are quite common, but there are also ACPI errors, not so common.  I recommend checking for a BIOS upgrade for the motherboard, applying it if found.  Just noticed that you mention you just upgraded the BIOS, have you tested since?

 

* Earlier, you said "it still crashes around 20-30 hours".  I strongly recommend a thorough memory test, at least 36 hours.  There's a good Memtest on the boot menu.

 

* When it crashes, what's on the monitor?  Are you running that syslog tail from a Telnet session?  There really ought to be some last message or output somewhere, unless it's a memory problem.  Heat problem?  Anything running too hot?  Check the CPU temps.  And make sure all fans are working correctly.
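
For the temperature check suggested above, one option is to log readings to a file so that the last values before a hang survive. Below is a minimal Python sketch, assuming the kernel exposes thermal zones under /sys/class/thermal (standard Linux sysfs, with values in millidegrees Celsius; whether this board and unRAID build populate it is not confirmed here). `read_thermal_zones` and `too_hot` are hypothetical names and the 80 C limit is an arbitrary placeholder.

```python
from pathlib import Path


def read_thermal_zones(base="/sys/class/thermal"):
    """Return {zone label: degrees Celsius} for each readable thermal zone."""
    readings = {}
    for zone in sorted(Path(base).glob("thermal_zone*")):
        try:
            kind = (zone / "type").read_text().strip()
            millideg = int((zone / "temp").read_text().strip())
        except (OSError, ValueError):
            continue  # zone missing a file or reporting a non-numeric value
        readings[f"{zone.name} ({kind})"] = millideg / 1000.0
    return readings


def too_hot(readings, limit_c=80.0):
    """Return only the readings at or above limit_c (assumed threshold)."""
    return {name: temp for name, temp in readings.items() if temp >= limit_c}
```

If the directory does not exist, `read_thermal_zones()` returns an empty dict, in which case the BIOS hardware monitor is the fallback for checking temperatures and fan speeds.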

Link to comment

Thanks for looking Rob,

 

I have left it running since the BIOS upgrade and memory reseat, and it seems fine so far. Uptime is ~22 hours.

 

Below is what is in /boot/packages; the only one that I have installed is screen. I had previously thought that UnMENU was the culprit and had not run it, but the crashing still happened.

 

root@IronMan:/boot/packages# ls

airvideo-unmenu-package.conf*                  mover_fix-logging-unmenu-package.conf*

apcupsd-3.14.10-unmenu-package-x86_64.conf*    mysql-unmenu-package.conf*

apcupsd-3.14.10-unmenu-package.conf*          ntfs-3g-2010.3.6-unmenu-package.conf*

apcupsd-unmenu-package.conf*                  openssh-unmenu-package.conf*

apcupsd3-unmenu-package.conf*                  openssl-unmenu-package.conf*

bwm-ng-unmenu-package-x86_64.conf*            p910nd-unmenu-package.conf*

bwm-ng-unmenu-package.conf*                    pbzip2-unmenu-package.conf*

compiler-unmenu-package.conf*                  pciutils-unmenu-package.conf*

compiler-unmenu-package_x86_64.conf*          perl-unmenu-package.conf*

cpio-unmenu-package.conf*                      php-unmenu-package.conf*

cpufrequtils-unmenu-package.conf*              powerdown-2.06_ctlaltdel-unmenu-package.conf*

cxxlibs-unmenu-package.conf*                  powerdown-overtemp-unmenu-package.conf*

dmidecode-unmenu-package.conf*                powerdown_ctlaltdel-unmenu-package.conf*

ds_store_cleanup-unmenu-package.conf*          proftp-unmenu-package.conf*

dynamic-dns-unmenu-package.conf*              python-unmenu-package.conf*

encfs-unmenu-package.conf*                    python_cheetah-unmenu-package.conf*

file-unmenu-package.conf*                      reiserfsck-3.6.21-unmenu-package.conf*

hdparm-9.27-unmenu-package.conf*              rsync-unmenu-package.conf*

hdparm-9.37-unmenu-package-1.conf*            ruby-unmenu-package.conf*

htop-unmenu-package.conf*                      sabnzbd-unmenu-package.conf*

iftop-unmenu-package.conf*                    screen-4.0.3-x86_64-4.txz*

image_server-unmenu-package.conf*              screen-4.0.3-x86_64-4.txz.auto_install*

inotify-tools-unmenu-package.conf*            screen-4.0.3-x86_64-4.txz.manual_install*

iperf-unmenu-package.conf*                    screen-unmenu-package-x86_64.conf*

istat-unmenu-package.conf*                    screen-unmenu-package.conf*

jre-unmenu-package.conf*                      shellinabox-unmenu-package.conf*

lighttpd-unmenu-package.conf*                  smartctl-unmenu-package.conf*

lsof-unmenu-package.conf*                      socat-unmenu-package.conf*

mail-ssmtp-unmenu-package-x86_64.conf*        sqlite-unmenu-package.conf*

mail-ssmtp-unmenu-package.conf*                svn-unmenu-package.conf*

mail_status-unmenu-package.conf*              unraid-web-unmenu-package.conf*

md5deep-unmenu-package.conf*                  unrar-unmenu-package.conf*

monthly-parity-unmenu-package.conf*            utempter-1.1.5-x86_64-1.txz*

mover_conditional_sync-unmenu-package.conf*    vim-unmenu-package.conf*

mover_exclude_underscore-unmenu-package.conf*  zip-unmenu-package.conf*

root@IronMan:/boot/packages#

 

It crashed at 14/07/2015 9:46:23 PM or 9:57 PM (I am trying to suss out whether the time in the alert is the host system's time or Gmail's time). Within a few minutes either side of that, it became unresponsive to pings and its MAC address was unlearnt from my switch.

The diagnostic capture was after a reboot, in safe mode.

 

 

At the last crash there was some output I didn't manage to get a photo of, but usually the screen looks as it does straight after boot, except that the cursor stops flashing.

 

The syslog tail was via a screen output to a file.

 

Temperatures all seem hunky dory.

ironman-diagnostics-20150716-0803.zip

Link to comment

I have just had a look at the console screen, it has the output below.

 

cp: cannot stat '/boot/config/*.conf' : no such file or directory

 

I had a look at some other threads, and typing 'diagnostics' gives the same output, but it does create the diagnostics file in /boot/logs.

 

My flash is an Imation Nano Pro 8GB. I have seen some suggestions of reformatting and putting unRAID back onto the flash. Is that a likely solution?

Link to comment

Formatting the flash and preparing it like a new install will eliminate a likely source of corruption. Take a screenshot that shows the current disk assignments. Assign the disks by serial number to the same locations. The data on the drives will not be affected.

 

Find dockers or plugins that will work under v6. Some of the current config files of the add-ons may still be useful.

Link to comment

Hi All,

 

Just an update. I'm not really sure what has (hopefully) fixed this, but since doing the BIOS upgrade I had ~2 days stable at work, and have since taken it home and had it running for just over 3 days so far. All seems to be running nicely, with a VM and heaps of Dockers running without drama.

 

Happy Days!

 

Cheers

Link to comment
