CorruptComputer Posted November 21, 2022
I tried to log into my server this morning to upgrade to the newly released 6.11.5 and noticed that it had hung, and was returning either a 500 server error or timing out when I tried to access the web interface. After rebooting through the iLO, though, the syslog is just empty, so I'm not sure what was happening. This never happened before this version, and nothing has changed on the hardware side, so I don't think it's hardware-related; the server has been running perfectly for years at this point. Does anyone know where the logs are stored, so I can get a record of what happened? I'm going to hold off on updating in case that wipes the old logs out, and see if it happens again.
JorgeB Posted November 21, 2022
Enable the syslog server and, if it happens again, post that log together with the complete diagnostics.
CorruptComputer (Author) Posted November 21, 2022
10 minutes ago, JorgeB said:
Enable the syslog server and post that if it happens again together with the complete diagnostics.
So no logs are stored if this wasn't enabled beforehand... Why is this not enabled by default? It seems rather useless to enable it AFTER a problem occurs, but I've enabled it now, so hopefully it will give some insight into what is going on if this happens again.
EDIT: Perhaps this info could be added to the setup guide? I feel it's very important to have system logging enabled for when problems occur. https://wiki.unraid.net/Articles/Getting_Started
JonathanM Posted November 21, 2022
5 minutes ago, CorruptComputer said:
Why is this not enabled by default?
Because there is no good universal location to log to; every situation is slightly different. The only location guaranteed to exist is the flash drive itself, and that is a very poor choice for logging, as the constant writes put a lot of wear on the licensed USB drive. Probably the best option is to send the logs to an SSD, but not everyone uses one in their server, and the path can be different for any given install. If you do choose to log to the boot USB, be sure to turn that off again as soon as you can.
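For reference, Unraid's built-in syslog server is (as far as I know) rsyslog under the hood, and pointing it at a disk-backed share amounts to a rule along these lines. This is only a sketch; the share path is an assumption for illustration, and on a real server the GUI setting writes the rule for you:

```conf
# Hypothetical rsyslog rule: write all facilities/priorities to a log file
# on an array share instead of the flash drive. The path is an assumption;
# any disk-backed share works, ideally one on an SSD-backed pool.
*.* /mnt/user/syslog/syslog-local.log
```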
CorruptComputer (Author) Posted November 21, 2022
11 minutes ago, JonathanM said:
Because there is no good universal location to log to, every situation is slightly different. The only universal location that is guaranteed to exist is the flash drive itself, and that is a very poor choice for a logging location as all the constant writes put much wear and tear on the licensed USB. Probably the best option is to send the logs to a SSD, but not everyone uses one in their server, and the path can be different for any given install. If you do choose to log to the boot USB, be sure to turn that off as soon as you can.
Yeah, for sure, I'm not logging to the USB. I added a new share and am saving the logs there. How large can I generally expect them to be? I set up a rotation of 4 files at 100 MB each; do you think that's sufficient if I notice issues within a day or two of them happening?
Not sure if my edit was in by the time you were replying, so I'll ask again: should the syslog configuration be added to the setup guide, so folks can get a log of issues when they happen?
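For scale, the rotation described above behaves like a logrotate rule of roughly this shape, i.e. at most about 500 MB on disk (the live file plus four archives), which is normally plenty unless something floods the log. This is a sketch only; the file name is an assumption, and Unraid's GUI handles the rotation itself:

```conf
# Hypothetical logrotate-style equivalent of "rotation of 4 at 100 MB":
# rotate whenever the live file reaches 100 MB, keeping 4 archives.
/mnt/user/syslog/syslog-local.log {
    size 100M
    rotate 4
    compress
    missingok
}
```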
CorruptComputer (Author) Posted April 4, 2023
Just had this happen to me again on 6.11.5, and got the following log:

Apr 1 01:02:51 neptune kernel: ------------[ cut here ]------------
Apr 1 01:02:51 neptune kernel: WARNING: CPU: 0 PID: 926 at net/netfilter/nf_conntrack_core.c:1208 __nf_conntrack_confirm+0xa5/0x2cb [nf_conntrack]
Apr 1 01:02:51 neptune kernel: Modules linked in: xt_mark xt_nat xt_CHECKSUM ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_mangle ip6table_nat iptable_mangle vhost_net tun vhost vhost_iotlb tap macvlan xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter xfs md_mod efivarfs ip6table_filter ip6_tables iptable_filter ip_tables x_tables bridge stp llc bonding tls igb x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel mgag200 ipmi_ssif drm_shmem_helper ghash_clmulni_intel aesni_intel drm_kms_helper crypto_simd cryptd drm rapl nvme intel_cstate intel_uncore backlight syscopyarea i2c_algo_bit acpi_ipmi sysfillrect sysimgblt i2c_core fb_sys_fops ahci nvme_core intel_pch_thermal wmi ipmi_si libahci acpi_tad acpi_power_meter button unix [last unloaded: igb]
Apr 1 01:02:51 neptune kernel: CPU: 0 PID: 926 Comm: kworker/0:1 Not tainted 5.19.17-Unraid #2
Apr 1 01:02:51 neptune kernel: Hardware name: HPE ProLiant MicroServer Gen10 Plus/ProLiant MicroServer Gen10 Plus, BIOS U48 07/14/2022
Apr 1 01:02:51 neptune kernel: Workqueue: events macvlan_process_broadcast [macvlan]
Apr 1 01:02:51 neptune kernel: RIP: 0010:__nf_conntrack_confirm+0xa5/0x2cb [nf_conntrack]
Apr 1 01:02:51 neptune kernel: Code: c6 48 89 44 24 10 e8 dd e2 ff ff 8b 7c 24 04 89 da 89 c6 89 04 24 e8 56 e6 ff ff 84 c0 75 a2 48 8b 85 80 00 00 00 a8 08 74 18 <0f> 0b 8b 34 24 8b 7c 24 04 e8 16 de ff ff e8 2c e3 ff ff e9 7e 01
Apr 1 01:02:51 neptune kernel: RSP: 0018:ffffc90000003cf0 EFLAGS: 00010202
Apr 1 01:02:51 neptune kernel: RAX: 0000000000000188 RBX: 0000000000000000 RCX: ab746607d338df42
Apr 1 01:02:51 neptune kernel: RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffffffffa02fcccc
Apr 1 01:02:51 neptune kernel: RBP: ffff8881e9620900 R08: 9f6e9c8d27a2e914 R09: d373ae0a6dc13241
Apr 1 01:02:51 neptune kernel: R10: 9920e9b9d70536e0 R11: 0600604e2c14e251 R12: ffffffff82909480
Apr 1 01:02:51 neptune kernel: R13: 0000000000036d3d R14: ffff8881e9e71400 R15: 0000000000000000
Apr 1 01:02:51 neptune kernel: FS: 0000000000000000(0000) GS:ffff88885ec00000(0000) knlGS:0000000000000000
Apr 1 01:02:51 neptune kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 1 01:02:51 neptune kernel: CR2: 00001464ee5d8000 CR3: 000000000420a003 CR4: 00000000003706f0
Apr 1 01:02:51 neptune kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Apr 1 01:02:51 neptune kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Apr 1 01:02:51 neptune kernel: Call Trace:
Apr 1 01:02:51 neptune kernel: <IRQ>
Apr 1 01:02:51 neptune kernel: nf_conntrack_confirm+0x25/0x54 [nf_conntrack]
Apr 1 01:02:51 neptune kernel: nf_hook_slow+0x3a/0x96
Apr 1 01:02:51 neptune kernel: ? ip_protocol_deliver_rcu+0x164/0x164
Apr 1 01:02:51 neptune kernel: NF_HOOK.constprop.0+0x79/0xd9
Apr 1 01:02:51 neptune kernel: ? ip_protocol_deliver_rcu+0x164/0x164
Apr 1 01:02:51 neptune kernel: ip_sabotage_in+0x47/0x58 [br_netfilter]
Apr 1 01:02:51 neptune kernel: nf_hook_slow+0x3a/0x96
Apr 1 01:02:51 neptune kernel: ? ip_rcv_finish_core.constprop.0+0x3b7/0x3b7
Apr 1 01:02:51 neptune kernel: NF_HOOK.constprop.0+0x79/0xd9
Apr 1 01:02:51 neptune kernel: ? ip_rcv_finish_core.constprop.0+0x3b7/0x3b7
Apr 1 01:02:51 neptune kernel: __netif_receive_skb_one_core+0x77/0x9c
Apr 1 01:02:51 neptune kernel: process_backlog+0x8c/0x116
Apr 1 01:02:51 neptune kernel: __napi_poll.constprop.0+0x28/0x124
Apr 1 01:02:51 neptune kernel: net_rx_action+0x159/0x24f
Apr 1 01:02:51 neptune kernel: __do_softirq+0x126/0x288
Apr 1 01:02:51 neptune kernel: do_softirq+0x7f/0xab
Apr 1 01:02:51 neptune kernel: </IRQ>
Apr 1 01:02:51 neptune kernel: <TASK>
Apr 1 01:02:51 neptune kernel: __local_bh_enable_ip+0x4c/0x6b
Apr 1 01:02:51 neptune kernel: netif_rx+0x52/0x5a
Apr 1 01:02:51 neptune kernel: macvlan_broadcast+0x10a/0x150 [macvlan]
Apr 1 01:02:51 neptune kernel: macvlan_process_broadcast+0xbc/0x12f [macvlan]
Apr 1 01:02:51 neptune kernel: process_one_work+0x1a8/0x295
Apr 1 01:02:51 neptune kernel: worker_thread+0x18b/0x244
Apr 1 01:02:51 neptune kernel: ? rescuer_thread+0x281/0x281
Apr 1 01:02:51 neptune kernel: kthread+0xe4/0xef
Apr 1 01:02:51 neptune kernel: ? kthread_complete_and_exit+0x1b/0x1b
Apr 1 01:02:51 neptune kernel: ret_from_fork+0x1f/0x30
Apr 1 01:02:51 neptune kernel: </TASK>
Apr 1 01:02:51 neptune kernel: ---[ end trace 0000000000000000 ]---
Apr 1 05:00:15 neptune crond[1103]: exit status 1 from user root /usr/local/sbin/mover &> /dev/null
Apr 2 01:53:58 neptune webGUI: Successful login user root from 10.0.0.88
Apr 2 01:54:01 neptune sSMTP[24591]: Creating SSL connection to host
Apr 2 01:54:01 neptune sSMTP[24591]: SSL connection using ECDHE-RSA-AES256-GCM-SHA384
Apr 2 01:54:03 neptune sSMTP[24591]: Sent mail for [email protected] (221 Bye) uid=0 username=root outbytes=821
Apr 2 01:54:39 neptune flash_backup: adding task: /usr/local/emhttp/plugins/dynamix.my.servers/scripts/UpdateFlashBackup update
Apr 2 01:55:01 neptune sSMTP[26871]: Creating SSL connection to host
Apr 2 01:55:02 neptune sSMTP[26871]: SSL connection using ECDHE-RSA-AES256-GCM-SHA384
Apr 2 01:55:03 neptune sSMTP[26871]: Sent mail for [email protected] (221 Bye) uid=0 username=root outbytes=848
Apr 2 04:40:01 neptune apcupsd[1629]: apcupsd exiting, signal 15
Apr 2 04:40:01 neptune apcupsd[1629]: apcupsd shutdown succeeded
Apr 2 04:40:03 neptune apcupsd[17815]: apcupsd 3.14.14 (31 May 2016) slackware startup succeeded
Apr 2 04:40:03 neptune apcupsd[17815]: NIS server startup succeeded
Apr 2 05:00:09 neptune crond[1103]: exit status 1 from user root /usr/local/sbin/mover &> /dev/null
Apr 3 04:00:07 neptune avahi-daemon[2511]: Registering new address record for fe80::90bc:40ff:fe4f:443d on shim-br0.*.
Apr 3 04:00:12 neptune kernel: igb 0000:02:00.0 eth0: igb: eth0 NIC Link is Down
Apr 3 04:00:12 neptune kernel: bond0: (slave eth0): link status definitely down, disabling slave
Apr 3 04:00:12 neptune kernel: device eth0 left promiscuous mode
Apr 3 04:00:12 neptune kernel: bond0: now running without any active interface!
Apr 3 04:00:12 neptune kernel: br0: port 1(bond0) entered disabled state
Apr 3 04:00:15 neptune ntpd[1082]: Deleting interface #1 br0, 10.0.1.1#123, interface stats: received=1215, sent=1215, dropped=0, active_time=238051 secs
Apr 3 04:00:15 neptune ntpd[1082]: 143.215.130.72 local addr 10.0.1.1 -> <null>
Apr 3 04:00:15 neptune ntpd[1082]: 80.241.0.72 local addr 10.0.1.1 -> <null>
Apr 3 04:00:15 neptune ntpd[1082]: 142.202.190.19 local addr 10.0.1.1 -> <null>
Apr 3 04:00:15 neptune ntpd[1082]: 69.164.213.136 local addr 10.0.1.1 -> <null>
Apr 3 04:01:27 neptune kernel: igb 0000:02:00.0 eth0: igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
Apr 3 04:01:27 neptune kernel: bond0: (slave eth0): link status definitely up, 1000 Mbps full duplex
Apr 3 04:01:27 neptune kernel: bond0: (slave eth0): making interface the new active one
Apr 3 04:01:27 neptune kernel: device eth0 entered promiscuous mode
Apr 3 04:01:27 neptune kernel: bond0: active interface up!
Apr 3 04:01:27 neptune kernel: br0: port 1(bond0) entered blocking state
Apr 3 04:01:27 neptune kernel: br0: port 1(bond0) entered forwarding state
Apr 3 04:01:29 neptune ntpd[1082]: Listen normally on 3 br0 10.0.1.1:123
Apr 3 04:01:29 neptune ntpd[1082]: new interface(s) found: waking up resolver
Apr 3 04:01:30 neptune avahi-daemon[2511]: Withdrawing address record for fe80::90bc:40ff:fe4f:443d on shim-br0.
Apr 3 04:01:30 neptune avahi-daemon[2511]: Registering new address record for fe80::90bc:40ff:fe4f:443d on shim-br0.*.
Apr 3 04:02:06 neptune avahi-daemon[2511]: Withdrawing address record for fe80::90bc:40ff:fe4f:443d on shim-br0.
Apr 3 05:00:09 neptune crond[1103]: exit status 1 from user root /usr/local/sbin/mover &> /dev/null
Apr 4 02:52:06 neptune kernel: BUG: Bad rss-counter state mm:000000005de9be02 type:MM_SHMEMPAGES val:1
Apr 4 05:00:16 neptune crond[1103]: exit status 1 from user root /usr/local/sbin/mover &> /dev/null
Apr 4 08:31:50 neptune emhttpd: Starting services...

If I had to guess from reading this, it looks like the mover crashed and took down the rest of the server with it. My ping check shows the server stopped responding around 4:40 am, and I restarted it around 8:00 am.
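When reading a long syslog like this, it helps to jump straight to the kernel warning markers rather than scanning the fallout afterwards. A minimal sketch (the helper function is hypothetical, and the share path in the example call is an assumption):

```shell
# Sketch: list the kernel call-trace markers in a saved syslog so the first
# fault, rather than the later fallout, is easy to spot.
find_traces() {
    # "cut here" opens a kernel WARNING, "Call Trace" heads the backtrace,
    # and "end trace" closes it; -n prints matching line numbers.
    grep -n 'cut here\|Call Trace\|end trace' "$1"
}

# Example usage (path is an assumption):
# find_traces /mnt/user/syslog/syslog-local.log
```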
JorgeB Posted April 4, 2023
5 minutes ago, CorruptComputer said:
Apr 1 01:02:51 neptune kernel: macvlan_broadcast+0x10a/0x150 [macvlan]
Apr 1 01:02:51 neptune kernel: macvlan_process_broadcast+0xbc/0x12f [macvlan]
Macvlan call traces are usually the result of running Docker containers with custom IP addresses, and they will eventually crash the server. Switching to ipvlan should fix it: Settings -> Docker Settings -> Docker custom network type -> ipvlan (advanced view must be enabled, top right).
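For anyone managing containers outside the Unraid GUI, the equivalent change in a compose file would look roughly like this. This is a sketch only: the network name, parent interface (br0), and subnet are assumptions for illustration, not values from this thread:

```yaml
# Hypothetical compose fragment: an ipvlan network in place of macvlan.
# With ipvlan, containers get their own IPs but share the parent
# interface's MAC address, avoiding the macvlan broadcast path that
# produced the nf_conntrack call trace above.
networks:
  lan:
    driver: ipvlan
    driver_opts:
      parent: br0        # assumed bridge/interface name
      ipvlan_mode: l2
    ipam:
      config:
        - subnet: 10.0.0.0/24   # assumed LAN subnet
          gateway: 10.0.0.1

services:
  app:
    image: nginx
    networks:
      lan:
        ipv4_address: 10.0.0.50  # assumed static container IP
```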
CorruptComputer (Author) Posted April 4, 2023
Ah I see, thanks for letting me know! I found the option you said to change, but for me it is not able to be changed:
JorgeB Posted April 4, 2023
10 minutes ago, CorruptComputer said:
but for me it is not able to be changed:
You need to stop the Docker service first.
CorruptComputer (Author) Posted April 4, 2023
Ah, I see. I am currently running a parity check so I can't stop the array. I'll make that change tomorrow once the check finishes. Thank you for your help!
JorgeB Posted April 4, 2023
2 minutes ago, CorruptComputer said:
Ah, I see. I am currently running a parity check so I can't stop the array.
OK, but you don't need to stop the array, just the Docker service; it's the first option on that page.