Unraid server becomes unresponsive


Hello guys, I need help with a problem that happened to my server. Please note it had already happened once, maybe 1-2 months ago, and then did not happen again until yesterday. Please find attached my diagnostics zip.

 

I don't understand how, but the machine enters an "unresponsive" state: while it is still powered on, it is not possible to access the web UI, the container UIs, or the SMB shares; nothing works. If I try to ping it, it is reported as unreachable. To my ears, the HDDs seem to be spun down.

The only solution (unfortunately) seems to be a hard shutdown, then powering on again. Last time this worked: the server ran a parity check afterwards and there were no errors (this time I managed to reboot it again, and the parity check is in progress).

 

After the first time, I enabled local rsyslog to an RPi, so I was able to collect the full log, which, as far as I can tell, doesn't say much.

 

I am sure that the machine was working fine until 19:15; unfortunately, I was then away from home until this morning.

 

In kernel.log there is a long series of the two lines below; then (I suppose at the time of the crash, when unfortunately I was not home) it stops, and the next line is from this morning's power-on after the hard shutdown.

2023-01-08T23:19:21+00:00 BerroServer kernel: pcieport 0000:00:1c.1: Enabling MPC IRBNCE
2023-01-08T23:19:21+00:00 BerroServer kernel: pcieport 0000:00:1c.1: Intel PCH root port ACS workaround enabled
2023-01-09T11:17:29+00:00 BerroServer kernel: md: unRAID driver 2.9.25 installed

It looks like 23:19 might have been the time of the crash (?)
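
One way to narrow down when logging stopped is to scan the rsyslog file for the widest pause between consecutive timestamps. Below is a minimal Python sketch, assuming every line starts with an ISO 8601 timestamp as in the excerpt above. The function name `largest_gap` is hypothetical, and the start of the gap only marks the last message that reached the log, not necessarily the exact moment of the crash.

```python
from datetime import datetime


def largest_gap(lines):
    """Return (gap, last_before, first_after) for the widest pause in the log.

    Assumes each line starts with an ISO 8601 timestamp followed by a space,
    e.g. '2023-01-08T23:19:21+00:00 BerroServer kernel: ...'.
    Lines without a parseable timestamp are skipped.
    Returns None if fewer than two timestamped lines are found.
    """
    best = None
    prev = None
    for line in lines:
        stamp = line.split(" ", 1)[0]
        try:
            current = datetime.fromisoformat(stamp)
        except ValueError:
            continue  # not a timestamped line
        if prev is not None:
            gap = current - prev
            if best is None or gap > best[0]:
                best = (gap, prev, current)
        prev = current
    return best


if __name__ == "__main__":
    # The three kernel.log lines quoted above.
    sample = [
        "2023-01-08T23:19:21+00:00 BerroServer kernel: pcieport 0000:00:1c.1: Enabling MPC IRBNCE",
        "2023-01-08T23:19:21+00:00 BerroServer kernel: pcieport 0000:00:1c.1: Intel PCH root port ACS workaround enabled",
        "2023-01-09T11:17:29+00:00 BerroServer kernel: md: unRAID driver 2.9.25 installed",
    ]
    gap, before, after = largest_gap(sample)
    print(f"Logging paused for {gap} after {before.isoformat()}")
```

Running the same function over each file in the rsyslog folder would show whether they all went quiet at the same time or at different times.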

 

Another file in the rsyslog folder, named ".log", may tell a bit more. Here's the tail:

2023-01-08T16:26:04+00:00 BerroServer  emhttpd: read SMART /dev/sdg
2023-01-08T16:26:11+00:00 BerroServer  emhttpd: read SMART /dev/sde
2023-01-08T16:27:34+00:00 BerroServer  shfs: share cache full
2023-01-08T16:27:34+00:00 BerroServer  emhttpd: read SMART /dev/sdf
2023-01-08T16:27:41+00:00 BerroServer  shfs: share cache full
2023-01-08T16:27:49+00:00 BerroServer  message repeated 148 times: [ shfs: share cache full]
2023-01-08T16:32:48+00:00 BerroServer  shfs: share cache full
2023-01-08T16:32:58+00:00 BerroServer  message repeated 9 times: [ shfs: share cache full]
2023-01-08T16:57:45+00:00 BerroServer  emhttpd: spinning down /dev/sdg
2023-01-08T17:05:15+00:00 BerroServer  emhttpd: spinning down /dev/sde
2023-01-08T17:05:15+00:00 BerroServer  emhttpd: spinning down /dev/sdf
2023-01-09T11:17:28+00:00 BerroServer  sshd[7822]: Server listening on 0.0.0.0 port 22.

In this log the last entry before the reboot is dated 17:05, so well before the supposed crash time of 23:19.

 

I have searched the forum for the message "shfs: share cache full" and it looks like it shouldn't be the cause of this problem.

The only similar issue I found was this post unraid-became-mostly-unresponsive which unfortunately led nowhere because there was no log.

 

If you need it, I can provide the full zip export of rsyslog.

 

Could it be a hardware-related problem? I have 4x8GB ECC RAM; would it be useful to run a memtest (after the parity check completes, of course)?

 

Thanks in advance for the support.

 

berroserver-diagnostics-20230109-1238.zip

37 minutes ago, aleberro said:

"shfs: share cache full" and it looks like it shouldn't be the cause of this problem.

It should not.

 

Without anything logged there aren't many clues. Try disabling PCIe ACS override; you can also try booting the server in safe mode with all Docker containers/VMs disabled and letting it run as a basic NAS for a few days. If it still crashes, it's likely a hardware problem; if it doesn't, start turning the other services back on one by one.
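
To check whether an ACS override is active on the running system, one option is to look at the kernel command line. This is only a sketch under an assumption: that the override is applied through a `pcie_acs_override=` boot parameter (this thread does not confirm how the setting is stored). The function name `acs_override_value` is hypothetical.

```python
def acs_override_value(cmdline):
    """Return the value of pcie_acs_override= in a kernel command line, or None.

    Assumption: the override is passed as a 'pcie_acs_override=' boot parameter.
    """
    for token in cmdline.split():
        if token.startswith("pcie_acs_override="):
            return token.split("=", 1)[1]
    return None


if __name__ == "__main__":
    # /proc/cmdline holds the parameters the running Linux kernel was booted with.
    try:
        with open("/proc/cmdline") as handle:
            value = acs_override_value(handle.read())
    except OSError:
        value = None
    print("ACS override:", value if value else "not set")
```

If the parameter is still reported after changing the setting, the server most likely has not been rebooted since the change.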

  • 4 weeks later...

First of all, thank you @JorgeB for the support.

 

The system worked flawlessly until yesterday. Please note that in the meantime I replaced an HDD which had begun showing SMART errors. The new HDD has been working without any issues for two weeks, so I guess the old drive was not related to this "unresponsive" problem, which has now happened again.

 

This time the kernel log has more info; please find attached the diagnostics .zip. Here are the last lines of the log, which, as far as I can tell, may be useful in investigating the problem:

 

2023-02-04T09:48:34+00:00 BerroServer kernel: pcieport 0000:00:1c.1: Enabling MPC IRBNCE
2023-02-04T09:48:34+00:00 BerroServer kernel: pcieport 0000:00:1c.1: Intel PCH root port ACS workaround enabled
2023-02-04T09:49:35+00:00 BerroServer kernel: pcieport 0000:00:1c.1: Enabling MPC IRBNCE
2023-02-04T09:49:35+00:00 BerroServer kernel: pcieport 0000:00:1c.1: Intel PCH root port ACS workaround enabled
2023-02-04T09:50:36+00:00 BerroServer kernel: pcieport 0000:00:1c.1: Enabling MPC IRBNCE
2023-02-04T09:50:36+00:00 BerroServer kernel: pcieport 0000:00:1c.1: Intel PCH root port ACS workaround enabled
2023-02-04T09:50:44+00:00 BerroServer kernel: general protection fault, probably for non-canonical address 0xa2948afb5bf76207: 0000 [#1] PREEMPT SMP PTI
2023-02-04T09:50:44+00:00 BerroServer kernel: CPU: 3 PID: 18557 Comm: app Tainted: G        W         5.19.17-Unraid #2
2023-02-04T09:50:44+00:00 BerroServer kernel: Hardware name: ASUSTeK COMPUTER INC. P9D-M Series/P9D-M Series, BIOS 2101 04/20/2018
2023-02-04T09:50:44+00:00 BerroServer kernel: RIP: 0010:nf_nat_setup_info+0x142/0x7b1 [nf_nat]
2023-02-04T09:50:44+00:00 BerroServer kernel: Code: 4c 89 f7 e8 2f f8 ff ff 48 8b 15 66 6a 00 00 89 c0 48 8d 04 c2 4c 8b 28 4d 85 ed 74 2a 49 81 ed 90 00 00 00 eb 21 8a 44 24 46 <41> 38 45 46 74 21 49 8b 95 90 00 00 00 48 85 d2 0f 84 53 ff ff ff
2023-02-04T09:50:44+00:00 BerroServer kernel: RSP: 0018:ffffc90000178730 EFLAGS: 00010282
2023-02-04T09:50:44+00:00 BerroServer kernel: RAX: ffff888103edf511 RBX: ffff8881a113b100 RCX: 469e93f3514ea88e
2023-02-04T09:50:44+00:00 BerroServer kernel: RDX: a2948afb5bf76251 RSI: d1e865f3880a7926 RDI: 1f64589d6c3d4144
2023-02-04T09:50:44+00:00 BerroServer kernel: RBP: ffffc900001787f8 R08: d776c335d6b7943a R09: 91b317315f4e5ab5
2023-02-04T09:50:44+00:00 BerroServer kernel: R10: 2d5ac8d98b98afa7 R11: ce13e7c889e48066 R12: ffffc9000017880c
2023-02-04T09:50:44+00:00 BerroServer kernel: R13: a2948afb5bf761c1 R14: ffffffff82909480 R15: 0000000000000000
2023-02-04T09:50:44+00:00 BerroServer kernel: FS:  000000c000380090(0000) GS:ffff88880fcc0000(0000) knlGS:0000000000000000
2023-02-04T09:50:44+00:00 BerroServer kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2023-02-04T09:50:44+00:00 BerroServer kernel: ? __ip_finish_output+0x144/0x144
2023-02-04T09:50:44+00:00 BerroServer kernel: nf_hook+0xdf/0x110
2023-02-04T09:50:44+00:00 BerroServer kernel: CR2: 000000c00040e000 CR3: 00000001870d4002 CR4: 00000000001706e0
2023-02-04T09:50:44+00:00 BerroServer kernel: ? ethnl_parse_bit+0xce/0x202
2023-02-04T09:50:44+00:00 BerroServer kernel: ? __ip_finish_output+0x144/0x144
2023-02-04T09:50:44+00:00 BerroServer kernel: ip_output+0x78/0x88
2023-02-04T09:50:44+00:00 BerroServer kernel: Call Trace:
2023-02-04T09:50:44+00:00 BerroServer kernel: <IRQ>
2023-02-04T09:50:44+00:00 BerroServer kernel: ? krealloc+0x7f/0x90
2023-02-04T09:50:44+00:00 BerroServer kernel: nf_nat_masquerade_ipv4+0x114/0x13c [nf_nat]
2023-02-04T09:50:44+00:00 BerroServer kernel: masquerade_tg+0x48/0x66 [xt_MASQUERADE]
2023-02-04T09:50:44+00:00 BerroServer kernel: ipt_do_table+0x51e/0x5bf [ip_tables]
2023-02-04T09:50:44+00:00 BerroServer kernel: ? xt_write_recseq_end+0xf/0x1c [ip_tables]
2023-02-04T09:50:44+00:00 BerroServer kernel: ? __local_bh_enable_ip+0x56/0x6b
2023-02-04T09:50:44+00:00 BerroServer kernel: ? __ip_finish_output+0x144/0x144
2023-02-04T09:50:44+00:00 BerroServer kernel: ip_sabotage_in+0x4a/0x58 [br_netfilter]
2023-02-04T09:50:44+00:00 BerroServer kernel: nf_hook_slow+0x3d/0x96
2023-02-04T09:50:44+00:00 BerroServer kernel: ? ip_rcv_finish_core.constprop.0+0x3b7/0x3b7
2023-02-04T09:50:44+00:00 BerroServer kernel: NF_HOOK.constprop.0+0x79/0xd9
2023-02-04T09:50:44+00:00 BerroServer kernel: ? ip_rcv_finish_core.constprop.0+0x3b7/0x3b7
2023-02-04T09:50:44+00:00 BerroServer kernel: __netif_receive_skb_one_core+0x77/0x9c
2023-02-04T09:50:44+00:00 BerroServer kernel: ? ipt_do_table+0x57a/0x5bf [ip_tables]
2023-02-04T09:50:44+00:00 BerroServer kernel: netif_receive_skb+0xbf/0x127
2023-02-04T09:50:44+00:00 BerroServer kernel: br_handle_frame_finish+0x476/0x4b0 [bridge]
2023-02-04T09:50:44+00:00 BerroServer kernel: ? br_pass_frame_up+0xdd/0xdd [bridge]
2023-02-04T09:50:44+00:00 BerroServer kernel: br_nf_hook_thresh+0xe5/0x109 [br_netfilter]
2023-02-04T09:50:44+00:00 BerroServer kernel: ? br_pass_frame_up+0xdd/0xdd [bridge]
2023-02-04T09:50:44+00:00 BerroServer kernel: nf_nat_inet_fn+0x126/0x1a8 [nf_nat]
2023-02-04T09:50:44+00:00 BerroServer kernel: nf_nat_ipv4_out+0x15/0x91 [nf_nat]
2023-02-04T09:50:44+00:00 BerroServer kernel: nf_hook_slow+0x3d/0x96
2023-02-04T09:50:44+00:00 BerroServer kernel: br_nf_pre_routing_finish+0x2c1/0x2ec [br_netfilter]
2023-02-04T09:50:44+00:00 BerroServer kernel: ? br_pass_frame_up+0xdd/0xdd [bridge]
2023-02-04T09:50:44+00:00 BerroServer kernel: ? NF_HOOK.isra.0+0xe4/0x140 [br_netfilter]
2023-02-04T09:50:44+00:00 BerroServer kernel: ? br_nf_hook_thresh+0x109/0x109 [br_netfilter]
2023-02-04T09:50:44+00:00 BerroServer kernel: br_nf_pre_routing+0x226/0x23a [br_netfilter]
2023-02-04T09:50:44+00:00 BerroServer kernel: ? br_nf_hook_thresh+0x109/0x109 [br_netfilter]
2023-02-04T09:50:44+00:00 BerroServer kernel: br_handle_frame+0x27f/0x2e7 [bridge]
2023-02-04T09:50:44+00:00 BerroServer kernel: ? br_pass_frame_up+0xdd/0xdd [bridge]
2023-02-04T09:50:44+00:00 BerroServer kernel: __netif_receive_skb_core.constprop.0+0x4f9/0x6e3
2023-02-04T09:50:44+00:00 BerroServer kernel: ? dequeue_load_avg+0x30/0x6d
2023-02-04T09:50:44+00:00 BerroServer kernel: ? enqueue_entity+0x150/0x1ae
2023-02-04T09:50:44+00:00 BerroServer kernel: __netif_receive_skb_one_core+0x40/0x9c
2023-02-04T09:50:44+00:00 BerroServer kernel: process_backlog+0x8c/0x116
2023-02-04T09:50:44+00:00 BerroServer kernel: __napi_poll.constprop.0+0x2b/0x124
2023-02-04T09:50:44+00:00 BerroServer kernel: net_rx_action+0x159/0x24f
2023-02-04T09:50:44+00:00 BerroServer kernel: ? _raw_spin_lock_irq+0x19/0x22
2023-02-04T09:50:44+00:00 BerroServer kernel: __do_softirq+0x129/0x288
2023-02-04T09:50:44+00:00 BerroServer kernel: do_softirq+0x7f/0xab
2023-02-04T09:50:44+00:00 BerroServer kernel: </IRQ>
2023-02-04T09:50:44+00:00 BerroServer kernel: <TASK>
2023-02-04T09:50:44+00:00 BerroServer kernel: __local_bh_enable_ip+0x4c/0x6b
2023-02-04T09:50:44+00:00 BerroServer kernel: ip_finish_output2+0x37d/0x3b0
2023-02-04T09:50:44+00:00 BerroServer kernel: ip_send_skb+0x15/0x3b
2023-02-04T09:50:44+00:00 BerroServer kernel: udp_send_skb+0x278/0x2e6
2023-02-04T09:50:44+00:00 BerroServer kernel: udp_sendmsg+0x72c/0x991
2023-02-04T09:50:44+00:00 BerroServer kernel: ? ip_neigh_gw4+0x8b/0x8b
2023-02-04T09:50:44+00:00 BerroServer kernel: ? sched_clock_cpu+0x12/0xa1
2023-02-04T09:50:44+00:00 BerroServer kernel: ? __smp_call_single_queue+0x23/0x35
2023-02-04T09:50:44+00:00 BerroServer kernel: ? ttwu_queue_wakelist+0x9a/0xcf
2023-02-04T09:50:44+00:00 BerroServer kernel: ? _raw_spin_unlock_irqrestore+0x24/0x3a
2023-02-04T09:50:44+00:00 BerroServer kernel: ? try_to_wake_up+0x20e/0x248
2023-02-04T09:50:44+00:00 BerroServer kernel: ? sock_sendmsg_nosec+0x2b/0x40
2023-02-04T09:50:44+00:00 BerroServer kernel: sock_sendmsg_nosec+0x2b/0x40
2023-02-04T09:50:44+00:00 BerroServer kernel: sock_write_iter+0x89/0xb8
2023-02-04T09:50:44+00:00 BerroServer kernel: new_sync_write+0x7f/0xbb
2023-02-04T09:50:44+00:00 BerroServer kernel: vfs_write+0xda/0x129
2023-02-04T09:50:44+00:00 BerroServer kernel: ksys_write+0x76/0xc2
2023-02-04T09:50:44+00:00 BerroServer kernel: ? fpregs_assert_state_consistent+0x1d/0x41
2023-02-04T09:50:44+00:00 BerroServer kernel: do_syscall_64+0x6b/0x81
2023-02-04T09:50:44+00:00 BerroServer kernel: entry_SYSCALL_64_after_hwframe+0x63/0xcd
2023-02-04T09:50:44+00:00 BerroServer kernel: RIP: 0033:0x40394e
2023-02-04T09:50:44+00:00 BerroServer kernel: Code: 48 89 6c 24 38 48 8d 6c 24 38 e8 0d 00 00 00 48 8b 6c 24 38 48 83 c4 40 c3 cc cc cc 49 89 f2 48 89 fa 48 89 ce 48 89 df 0f 05 <48> 3d 01 f0 ff ff 76 15 48 f7 d8 48 89 c1 48 c7 c0 ff ff ff ff 48
2023-02-04T09:50:44+00:00 BerroServer kernel: RSP: 002b:000000c0001a6198 EFLAGS: 00000206 ORIG_RAX: 0000000000000001
2023-02-04T09:50:44+00:00 BerroServer kernel: RAX: ffffffffffffffda RBX: 0000000000000009 RCX: 000000000040394e
2023-02-04T09:50:44+00:00 BerroServer kernel: RDX: 000000000000002f RSI: 000000c00020e002 RDI: 0000000000000009
2023-02-04T09:50:44+00:00 BerroServer kernel: RBP: 000000c0001a61d8 R08: 0000000000000000 R09: 0000000000000000
2023-02-04T09:50:44+00:00 BerroServer kernel: R10: 0000000000000000 R11: 0000000000000206 R12: 000000c0001a6318
2023-02-04T09:50:44+00:00 BerroServer kernel: R13: 0000000000000000 R14: 000000c000032340 R15: 000014c71145d038
2023-02-04T09:50:44+00:00 BerroServer kernel: </TASK>
2023-02-04T09:50:44+00:00 BerroServer kernel: Modules linked in: tcp_diag udp_diag inet_diag xt_CHECKSUM ipt_REJECT nf_reject_ipv4 ip6table_mangle ip6table_nat iptable_mangle vhost_net tun vhost vhost_iotlb tap xt_nat xt_tcpudp veth macvlan xt_conntrack nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype br_netfilter xfs nfsd auth_rpcgss oid_registry lockd grace sunrpc md_mod nct6775 nct6775_core hwmon_vid wmi jc42 iptable_nat xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libchacha ip6table_filter ip6_tables iptable_filter ip_tables x_tables bridge stp llc bonding tls igb x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel ast kvm drm_vram_helper drm_ttm_helper ttm drm_kms_helper crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel crypto_simd cryptd drm rapl intel_cstate agpgart i2c_i801 syscopyarea i2c_algo_bit sysfillrect i2c_smbus ahci
2023-02-04T09:50:44+00:00 BerroServer kernel: sysimgblt intel_uncore ipmi_si i2c_core fb_sys_fops libahci thermal fan video backlight button unix [last unloaded: igb]
2023-02-04T09:50:44+00:00 BerroServer kernel: ---[ end trace 0000000000000000 ]---
2023-02-04T09:50:44+00:00 BerroServer kernel: RIP: 0010:nf_nat_setup_info+0x142/0x7b1 [nf_nat]
2023-02-04T09:50:44+00:00 BerroServer kernel: Code: 4c 89 f7 e8 2f f8 ff ff 48 8b 15 66 6a 00 00 89 c0 48 8d 04 c2 4c 8b 28 4d 85 ed 74 2a 49 81 ed 90 00 00 00 eb 21 8a 44 24 46 <41> 38 45 46 74 21 49 8b 95 90 00 00 00 48 85 d2 0f 84 53 ff ff ff
2023-02-04T09:50:44+00:00 BerroServer kernel: RSP: 0018:ffffc90000178730 EFLAGS: 00010282
2023-02-04T09:50:44+00:00 BerroServer kernel: RAX: ffff888103edf511 RBX: ffff8881a113b100 RCX: 469e93f3514ea88e
2023-02-04T09:50:44+00:00 BerroServer kernel: RDX: a2948afb5bf76251 RSI: d1e865f3880a7926 RDI: 1f64589d6c3d4144
2023-02-04T09:50:44+00:00 BerroServer kernel: RBP: ffffc900001787f8 R08: d776c335d6b7943a R09: 91b317315f4e5ab5
2023-02-04T09:50:44+00:00 BerroServer kernel: R10: 2d5ac8d98b98afa7 R11: ce13e7c889e48066 R12: ffffc9000017880c
2023-02-04T09:50:44+00:00 BerroServer kernel: R13: a2948afb5bf761c1 R14: ffffffff82909480 R15: 0000000000000000
2023-02-04T09:50:44+00:00 BerroServer kernel: FS:  000000c000380090(0000) GS:ffff88880fcc0000(0000) knlGS:0000000000000000
2023-02-04T09:50:44+00:00 BerroServer kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2023-02-04T09:50:44+00:00 BerroServer kernel: CR2: 000000c00040e000 CR3: 00000001870d4002 CR4: 00000000001706e0

 

Is there something useful in the attached log?
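
For a trace this long, it can help to pull out just the headline fields: the faulting kernel function, the module it belongs to, and the process that was running. The following is a minimal Python sketch with hypothetical names (`summarize_oops`, `RIP_RE`, `COMM_RE`), assuming the kernel-mode `RIP: 0010:` and `Comm:` line formats seen in the log above.

```python
import re

# Kernel-mode instruction pointer, e.g.
# 'RIP: 0010:nf_nat_setup_info+0x142/0x7b1 [nf_nat]'
RIP_RE = re.compile(
    r"RIP: 0010:(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?: \[(?P<module>\w+)\])?"
)
# Process name, e.g. 'CPU: 3 PID: 18557 Comm: app Tainted: ...'
COMM_RE = re.compile(r"Comm: (?P<comm>\S+)")


def summarize_oops(lines):
    """Return the first faulting symbol, its module, and the process name."""
    summary = {"symbol": None, "module": None, "comm": None}
    for line in lines:
        if summary["symbol"] is None:
            match = RIP_RE.search(line)
            if match:
                summary["symbol"] = match.group("symbol")
                summary["module"] = match.group("module")
        if summary["comm"] is None:
            match = COMM_RE.search(line)
            if match:
                summary["comm"] = match.group("comm")
    return summary


if __name__ == "__main__":
    # Two lines copied from the log above.
    sample = [
        "2023-02-04T09:50:44+00:00 BerroServer kernel: CPU: 3 PID: 18557 Comm: app Tainted: G        W         5.19.17-Unraid #2",
        "2023-02-04T09:50:44+00:00 BerroServer kernel: RIP: 0010:nf_nat_setup_info+0x142/0x7b1 [nf_nat]",
    ]
    print(summarize_oops(sample))
```

On the log above this reports the fault in `nf_nat_setup_info` from the `nf_nat` module, hit while a process named `app` was sending a packet.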

 

Thanks in advance.

berroserver-diagnostics-20230205-1844.zip

  • 6 months later...
  • 2 weeks later...
  • 1 month later...
On 2/6/2023 at 4:43 AM, JorgeB said:

[RESOLVED] Try switching to ipvlan (Settings -> Docker Settings -> Docker custom network type -> ipvlan (advanced view must be enabled, top right)).

 

Do we know why this would be an issue? Mine suddenly became unresponsive a few weeks back. I tried safe mode, and safe mode without plugins, and the only fix was a reboot, as I couldn't log in via SSH, the GUI, or console/video. I switched Docker to ipvlan and so far so good; additionally, I also disabled Docker/VMs.

 

Hopefully this sticks and fixes my issue as well.
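
After changing the custom network type, it may be worth confirming that no Docker network still uses the macvlan driver. Here is a small Python sketch that parses the output of `docker network ls --format '{{.Name}} {{.Driver}}'`. The function name `macvlan_networks` is hypothetical, and the sample listing is made up for illustration, not taken from any server in this thread.

```python
def macvlan_networks(listing):
    """Return the names of networks whose driver is 'macvlan'.

    `listing` is the text printed by:
        docker network ls --format '{{.Name}} {{.Driver}}'
    Each line is expected to hold a network name and a driver name.
    """
    names = []
    for line in listing.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1] == "macvlan":
            names.append(parts[0])
    return names


if __name__ == "__main__":
    # Made-up sample output, for illustration only.
    sample = "bridge bridge\nhost host\nnone null\nbr0 macvlan\n"
    leftover = macvlan_networks(sample)
    print("Networks still on macvlan:", leftover or "none")
```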

 

  • 2 months later...

Mine was good for 4 days and then it started again. I also tested with all VMs and Docker containers off: same thing. I also spent 2 days running mem86 on all the RAM/CPU, and it came back clean. I'm leaning towards maybe a bad USB? I also upgraded the motherboard firmware, as it was 4 versions behind.

  • 2 months later...
  • 5 months later...

I started looking at the router that the server is connected to and noticed that the lights were off on that particular port.
Unplugging the Ethernet cable and putting it back in did not work. What did work was disconnecting the Ethernet cable from the back of the server/PC and reinserting it: the lights came back on, and I could immediately connect again via RDP, SSH, and the GUI.

 

Going through the logs, there is / was a Docker container that was messing with the same LAN adapter, or something to that extent.
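
If a local console or a scheduled script is available, the link state can also be read from the server side instead of from the router's port lights. This is a minimal Python sketch, assuming a Linux host that exposes `/sys/class/net/<interface>/operstate`. The function name `link_state` and the interface name `eth0` are assumptions, not details from this thread.

```python
from pathlib import Path


def link_state(interface, sysfs_root="/sys/class/net"):
    """Return the kernel's operstate for an interface ('up', 'down', ...).

    Returns None if the interface does not exist or cannot be read.
    """
    path = Path(sysfs_root) / interface / "operstate"
    try:
        return path.read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    # 'eth0' is an assumed interface name; adjust it to the adapter in use.
    print("eth0 link state:", link_state("eth0"))
```

Logging this value periodically to a local file would show, after the next incident, whether the adapter itself dropped the link before the server became unreachable.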
