
Crashes: I have access to the Console ONLY after crashes.


TedatTNT


I've recently changed EVERYTHING -- Unraid V6.1.2, new MB/CPU/Memory/Drives/Controllers. I added and then removed the docker functionality. My previous builds - V4 beta and later V5x - I rarely touched, so I'm not very educated in the ways of Unraid.

 

Unraid keeps crashing if I begin to pull or push a lot of data. I've been trying for days to copy 20-40 GB of data. During this, it invariably locks up on me. No GUI access, no Telnet access. Only console access.

 

I managed to complete a parity check a few days ago, but since then it has probably crashed/locked up on me about ten or twelve times.

 

I've finally got a syslog to share from the last crash. I'm not sure what else will be helpful, but I sure could use some help with this!

 

Syslog is attached.

 

Thanks in advance.

 

Ted

syslog-2015-09-22.txt

Link to comment

It's been up and running for a couple of hours -- this is from the web gui's log button:

 

Sep 22 12:52:13 UNRAID dhcpcd[1361]: eth0: sending REQUEST (xid 0x4e47068), next in 3.9 seconds

Sep 22 12:52:13 UNRAID dhcpcd[1361]: eth0: acknowledged 192.168.1.150 from 192.168.1.1

Sep 22 12:52:13 UNRAID dhcpcd[1361]: eth0: leased 192.168.1.150 for 1440 seconds

Sep 22 12:52:13 UNRAID dhcpcd[1361]: eth0: renew in 661 seconds, rebind in 1201 seconds

Sep 22 12:52:13 UNRAID dhcpcd[1361]: eth0: writing lease `/var/lib/dhcpcd/dhcpcd-eth0.lease'

Sep 22 12:52:13 UNRAID dhcpcd[1361]: eth0: IP address 192.168.1.150/24 already exists

Sep 22 12:52:13 UNRAID dhcpcd[1361]: eth0: executing `/lib/dhcpcd/dhcpcd-run-hooks' RENEW

Sep 22 12:52:13 UNRAID dhcpcd[1361]: eth0: ARP announcing 192.168.1.150 (1 of 2), next in 2.0 seconds

Sep 22 12:52:15 UNRAID dhcpcd[1361]: eth0: ARP announcing 192.168.1.150 (2 of 2)

Sep 22 12:53:59 UNRAID dhcpcd[1361]: eth0: xid 0xbb48d5f5 is for hwaddr 00:18:dd:01:03:bf:00:00:00:00:00:00:00:00:00:00

Sep 22 12:55:06 UNRAID dhcpcd[1361]: eth0: xid 0x5a79e4d5 is for hwaddr 00:18:dd:01:0c:f0:00:00:00:00:00:00:00:00:00:00

Sep 22 12:55:50 UNRAID dhcpcd[1361]: eth0: xid 0x3a2d881b is for hwaddr 00:18:dd:31:f3:80:00:00:00:00:00:00:00:00:00:00

Sep 22 13:03:14 UNRAID dhcpcd[1361]: eth0: renewing lease of 192.168.1.150

Sep 22 13:03:14 UNRAID dhcpcd[1361]: eth0: rebind in 540 seconds, expire in 779 seconds

Sep 22 13:03:14 UNRAID dhcpcd[1361]: eth0: sending REQUEST (xid 0x61f9d8dd), next in 4.8 seconds

Sep 22 13:03:14 UNRAID dhcpcd[1361]: eth0: acknowledged 192.168.1.150 from 192.168.1.1

Sep 22 13:03:14 UNRAID dhcpcd[1361]: eth0: leased 192.168.1.150 for 1440 seconds

Sep 22 13:03:14 UNRAID dhcpcd[1361]: eth0: renew in 644 seconds, rebind in 1184 seconds

Sep 22 13:03:14 UNRAID dhcpcd[1361]: eth0: writing lease `/var/lib/dhcpcd/dhcpcd-eth0.lease'

Sep 22 13:03:14 UNRAID dhcpcd[1361]: eth0: IP address 192.168.1.150/24 already exists

Sep 22 13:03:14 UNRAID dhcpcd[1361]: eth0: executing `/lib/dhcpcd/dhcpcd-run-hooks' RENEW

Sep 22 13:03:14 UNRAID dhcpcd[1361]: eth0: ARP announcing 192.168.1.150 (1 of 2), next in 2.0 seconds

Sep 22 13:03:16 UNRAID dhcpcd[1361]: eth0: ARP announcing 192.168.1.150 (2 of 2)

Sep 22 13:05:59 UNRAID dhcpcd[1361]: eth0: xid 0x46ba6fb3 is for hwaddr 00:18:dd:01:03:bf:00:00:00:00:00:00:00:00:00:00

Sep 22 13:07:06 UNRAID dhcpcd[1361]: eth0: xid 0x9c97f112 is for hwaddr 00:18:dd:01:0c:f0:00:00:00:00:00:00:00:00:00:00

Sep 22 13:07:50 UNRAID dhcpcd[1361]: eth0: xid 0x5b6da783 is for hwaddr 00:18:dd:31:f3:80:00:00:00:00:00:00:00:00:00:00

Sep 22 13:13:58 UNRAID dhcpcd[1361]: eth0: renewing lease of 192.168.1.150

Sep 22 13:13:58 UNRAID dhcpcd[1361]: eth0: rebind in 540 seconds, expire in 796 seconds

Sep 22 13:13:58 UNRAID dhcpcd[1361]: eth0: sending REQUEST (xid 0x1109532d), next in 4.6 seconds

Sep 22 13:13:58 UNRAID dhcpcd[1361]: eth0: acknowledged 192.168.1.150 from 192.168.1.1

Sep 22 13:13:58 UNRAID dhcpcd[1361]: eth0: leased 192.168.1.150 for 1440 seconds

Sep 22 13:13:58 UNRAID dhcpcd[1361]: eth0: renew in 657 seconds, rebind in 1197 seconds

Sep 22 13:13:58 UNRAID dhcpcd[1361]: eth0: writing lease `/var/lib/dhcpcd/dhcpcd-eth0.lease'

Sep 22 13:13:58 UNRAID dhcpcd[1361]: eth0: IP address 192.168.1.150/24 already exists

Sep 22 13:13:58 UNRAID dhcpcd[1361]: eth0: executing `/lib/dhcpcd/dhcpcd-run-hooks' RENEW

Sep 22 13:13:59 UNRAID dhcpcd[1361]: eth0: ARP announcing 192.168.1.150 (1 of 2), next in 2.0 seconds

Sep 22 13:14:01 UNRAID dhcpcd[1361]: eth0: ARP announcing 192.168.1.150 (2 of 2)

Sep 22 13:17:59 UNRAID dhcpcd[1361]: eth0: xid 0xaeda13b6 is for hwaddr 00:18:dd:01:03:bf:00:00:00:00:00:00:00:00:00:00

Sep 22 13:19:06 UNRAID dhcpcd[1361]: eth0: xid 0x5a3c7b8e is for hwaddr 00:18:dd:01:0c:f0:00:00:00:00:00:00:00:00:00:00

Sep 22 13:19:50 UNRAID dhcpcd[1361]: eth0: xid 0x4f9f9dcb is for hwaddr 00:18:dd:31:f3:80:00:00:00:00:00:00:00:00:00:00

Sep 22 13:21:32 UNRAID kernel: kvm: already loaded the other module

Sep 22 13:22:16 UNRAID emhttp: cmd: /usr/local/emhttp/plugins/dynamix/scripts/tail_log syslog

Link to comment

A quick comment, without looking at the syslog: your DHCP server (probably your router) is configured to require IP renewal every 24 minutes(!), and the renewal is effectively being performed every 11 minutes!  Perhaps someone tried to configure it for 24 hours and miscalculated, making it 24 minutes.  That should be changed to something MUCH longer, preferably several days.  You could also avoid the problem by using a static IP: just set it to what you have been getting from the DHCP server (IP: 192.168.1.150, subnet mask: 255.255.255.0, gateway and DNS: 192.168.1.1).
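Before typing those static values in, it can be worth confirming that they agree with each other. This is only a small sketch using Python's standard `ipaddress` module; the function name `check_static_config` is made up for illustration and has nothing to do with Unraid itself.

```python
import ipaddress

def check_static_config(ip, netmask, gateway):
    """Return the address in CIDR form and whether the gateway is on the same subnet."""
    iface = ipaddress.ip_interface(f"{ip}/{netmask}")
    gw = ipaddress.ip_address(gateway)
    return iface.with_prefixlen, gw in iface.network

# Values taken from the post above.
cidr, gateway_ok = check_static_config("192.168.1.150", "255.255.255.0", "192.168.1.1")
print(cidr, gateway_ok)
```

The CIDR form should match the `192.168.1.150/24` that dhcpcd reports in the log excerpt.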

 

You mentioned all the changes you have been making.  Another change to make: attach the diagnostics zip instead of the syslog.  ;)

 

Please see "Need help? Read me first!" for instructions.  The diagnostics zip contains the syslog plus much more, which would provide a complete picture of the system, including your networking setup.

 

Off-topic - I completely misinterpreted your topic heading.  I thought you were having crashes ONLY with your console, no others!  That I hadn't heard of before, and decided to check it out.

Link to comment

That's interesting about the lease time on DHCP - it is set to 1440 minutes - 24 hours. I prefer to manage networking from the router and therefore assign IPs to MAC addresses on the network -- including the UNRAID server. Is there any reason to avoid this?

 

Perhaps the DHCP renewal timing is a glitch with the router's firmware. I'll change it just to see if that helps.

 

Regarding the diagnostics.zip -- it is my understanding that I need to obtain this PRIOR to restarting. Is this correct? If so, after reading that Read Me, it seems I have a dilemma, as I can't access anything but the console when it goes down. Am I missing something?

 

My server JUST went down again, so if there is a way to get it, now would be the time!

 

Thanks,

 

Ted

Link to comment

The only problems I see are all related to networking, either configuration or an issue with the onboard NIC.  You mentioned the router "is set to 1440 minutes", but here's what your own syslog excerpt above is saying -

Sep 22 12:52:13 UNRAID dhcpcd[1361]: eth0: leased 192.168.1.150 for 1440 seconds

Sep 22 12:52:13 UNRAID dhcpcd[1361]: eth0: renew in 661 seconds, rebind in 1201 seconds

...

Sep 22 13:03:14 UNRAID dhcpcd[1361]: eth0: leased 192.168.1.150 for 1440 seconds

Sep 22 13:03:14 UNRAID dhcpcd[1361]: eth0: renew in 644 seconds, rebind in 1184 seconds

...

Sep 22 13:13:58 UNRAID dhcpcd[1361]: eth0: leased 192.168.1.150 for 1440 seconds

Sep 22 13:13:58 UNRAID dhcpcd[1361]: eth0: renew in 657 seconds, rebind in 1197 seconds

The router is giving you leases of 1440 seconds, not minutes, so you may want to check the units the router is using.
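If you want to check the units for yourself, here is a rough sketch that pulls the lease and renew times out of dhcpcd log lines and converts them to minutes. The regular expressions are written against the exact wording in the excerpt above, so treat them as an assumption; other dhcpcd versions may word these lines differently. The function name `summarize_leases` is just illustrative.

```python
import re

# Sample lines copied from the syslog excerpt above.
SAMPLE = """\
Sep 22 12:52:13 UNRAID dhcpcd[1361]: eth0: leased 192.168.1.150 for 1440 seconds
Sep 22 12:52:13 UNRAID dhcpcd[1361]: eth0: renew in 661 seconds, rebind in 1201 seconds
"""

LEASE_RE = re.compile(r"leased (\S+) for (\d+) seconds")
RENEW_RE = re.compile(r"renew in (\d+) seconds, rebind in (\d+) seconds")

def summarize_leases(text):
    """Collect the last seen lease, renew, and rebind times (in seconds)."""
    summary = {}
    for line in text.splitlines():
        m = LEASE_RE.search(line)
        if m:
            summary["ip"] = m.group(1)
            summary["lease_s"] = int(m.group(2))
        m = RENEW_RE.search(line)
        if m:
            summary["renew_s"] = int(m.group(1))
            summary["rebind_s"] = int(m.group(2))
    return summary

info = summarize_leases(SAMPLE)
print(f"lease: {info['lease_s'] / 60:.0f} min, renew after: {info['renew_s'] / 60:.1f} min")
```

1440 seconds works out to 24 minutes, and the renew timer of roughly 650 seconds is the "every 11 minutes" mentioned earlier.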

 

There's no problem managing the IPs yourself from the router, and no need to change anything.  I do still recommend a static IP, partly because you are setting up an FTP server, but also because the networking is then set up once on boot, and not done again over and over.  You see in your own excerpt above how much work is required each time, and it's redundant.  There's no problem leaving the reservation on the router just the way it is, as it keeps the address reserved and reminds you of how it's set up.  But it's much less work and disruption on the server if it only sets it up once, for good, statically preset.

 

That's all minor though.  The real problem seems to be the Marvell networking chipset (using the sky2 driver).  There were over 600 packet overruns.  That is outside my expertise and I don't know what causes it; hopefully someone else will.  But it's not right, and it's extremely unusual.  Perhaps even more seriously, you had the following NIC crash -

Sep 22 14:50:46 UNRAID kernel: sky2 0000:03:00.0: error interrupt status=0x80000000

Sep 22 14:50:46 UNRAID kernel: sky2 0000:03:00.0: PCI hardware error (0x2010)

Sep 22 14:50:53 UNRAID kernel: ------------[ cut here ]------------

Sep 22 14:50:53 UNRAID kernel: WARNING: CPU: 0 PID: 21763 at net/sched/sch_generic.c:303 dev_watchdog+0x194/0x1fa()

Sep 22 14:50:53 UNRAID kernel: NETDEV WATCHDOG: eth0 (sky2): transmit queue 0 timed out

A call trace follows, and the kernel recovers and continues on, but the Ethernet carrier is dropped and then goes up and down continuously.  That would severely impact any transfers over the network, and could cause timeouts at the other end.
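To see how often this happens across a whole syslog, something like the following sketch could count the relevant lines. The patterns are matched to the exact messages quoted above and are an assumption about what other sky2 faults look like; the function name `count_nic_faults` is hypothetical.

```python
import re

# Sample lines copied from the NIC crash quoted above.
SAMPLE = """\
Sep 22 14:50:46 UNRAID kernel: sky2 0000:03:00.0: error interrupt status=0x80000000
Sep 22 14:50:46 UNRAID kernel: sky2 0000:03:00.0: PCI hardware error (0x2010)
Sep 22 14:50:53 UNRAID kernel: WARNING: CPU: 0 PID: 21763 at net/sched/sch_generic.c:303 dev_watchdog+0x194/0x1fa()
Sep 22 14:50:53 UNRAID kernel: NETDEV WATCHDOG: eth0 (sky2): transmit queue 0 timed out
"""

PATTERNS = {
    # Driver-level errors reported by sky2 itself.
    "sky2_error": re.compile(r"kernel: sky2 \S+: (error interrupt|PCI hardware error)"),
    # The kernel watchdog giving up on a stuck transmit queue.
    "tx_timeout": re.compile(r"NETDEV WATCHDOG: \S+ \(\w+\): transmit queue \d+ timed out"),
}

def count_nic_faults(text):
    """Count log lines matching each fault pattern."""
    counts = {name: 0 for name in PATTERNS}
    for line in text.splitlines():
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts

print(count_nic_faults(SAMPLE))
```

To run it against a real log, read the file into a string and pass that in place of `SAMPLE`.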

 

You do have an additional NIC onboard.  You might try disabling the first in the BIOS, and connecting to the second.  Otherwise, you may need to look into adding a networking card; the Intels are recommended.  If you do, disable the onboard NICs in the BIOS.

 

Your BIOS is from 2009.  It's slightly possible that there's a newer BIOS, and that it has a firmware update for the onboard Marvell networking chipsets.

Link to comment

Wow, Rob J, Thank you SO much for all this information!

 

You've convinced me -- I'll stop using DHCP and use a hard-coded IP -- after I switch NICs, disabling the other in the BIOS.

 

I may even look at my old PCI cards -- I bet I have an Intel NIC that would work and could switch off those two.

 

I'm heading out of town for a couple of weeks, so I'll have to look back into this later.

 

Thanks again.

Link to comment

Even if it's only a short test time yet, that's a much cleaner system now!  You have already transferred over 4 to 5 times what the earlier syslog showed, and with zero overruns!  I would push it hard for a bit, try to make it crash.  It will either break, and you will replace whatever needs to be replaced, or you will gain a much higher confidence in it.
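While you push it, you could keep an eye on the overrun counters directly rather than waiting for the syslog. Below is a rough sketch that reads the per-interface counters Linux exposes under `/sys/class/net/<iface>/statistics`. Which counter matches the "overruns" figure is an assumption on my part (`rx_fifo_errors` or `rx_over_errors`, possibly depending on the driver), and the function name `read_counters` is made up for illustration.

```python
from pathlib import Path

def read_counters(iface="eth0",
                  names=("rx_bytes", "rx_fifo_errors", "rx_over_errors", "rx_dropped"),
                  root="/sys/class/net"):
    """Read the named counters for one interface; counters that are missing are skipped."""
    stats_dir = Path(root) / iface / "statistics"
    counters = {}
    for name in names:
        path = stats_dir / name
        if path.is_file():
            counters[name] = int(path.read_text().strip())
    return counters

# Prints an empty dict on a machine without an eth0 interface.
print(read_counters())
```

Run it before and after a large copy and compare the numbers; the error counters should stay flat on a healthy NIC.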

Link to comment

RobJ, I can't thank you enough for your help. I really had no idea where to begin but I was tired of rebooting it a few times a day. I'll be gone for a week and a half, but then I'll start pushing it hard again. And, if it locks up on me, I now know how to get a diagnostic from the console.

 

Thanks again!

 

Ted

Link to comment
