Jump to content

Server Unresponsive


Recommended Posts

Good day all,

   Disclaimer, I'm new to this and honestly fumbling along as best as I can.  I have exhausted multiple resources prior to posting here.

My system specs are in my signature.   

I built my server initially with a Supermicro x9SCL mobo / xeon e3-1230v2 and 16gb ram.  It ran well, but transcoding was an issue.  I decided to replace a few items.  I changed to the X11SSH-F mobo, xeon e3-1285v6 and 64GB UDIMM.  I then added the Tesla p4.  It is a headless server.  With the new setup, it runs good and the TDARR and Plex containers are utilizing the GPU for transcoding now. However, when I go to bed, I wake up and I can no longer access the server from anywhere.  The MAC address doesn't even populate in the router. I end up shutting off the power to the server and rebooting.  This will allow me to access it from a desktop again.  Sometimes I can access it for hours, and sometimes it locks up again in minutes. As of right now, it's been up for 6h57m.  It's like when it's not being accessed, it just times out.  I have brought it up to my office and connected it to a monitor and no difference.  

I have tried another mobo, tried new ram, reloaded everything one step at a time.  The only thing I haven't replaced is the cpu.  It this a config issue or a hardware issue?  I have absolutely no idea and am unsure where to find more info. 

 

Any help is greatly appreciated!!

Screen Shot 2024-02-13 at 3.19.53 PM.png

Screen Shot 2024-02-13 at 3.29.26 PM.png

Link to comment

More specifically, here is a snippet of the log that I think might be an issue that I have no idea to resolve.  

 

Feb 14 09:38:01 GOLDRAID root: Fix Common Problems Version 2024.01.18

Feb 14 09:38:02 GOLDRAID root: Fix Common Problems: Warning: Share appdata set to cache-only, but files / folders exist on the array

Feb 14 09:38:02 GOLDRAID root: Fix Common Problems: Warning: Docker Application bazarr has an update available for it

Feb 14 09:38:10 GOLDRAID kernel: ------------[ cut here ]------------

Feb 14 09:38:10 GOLDRAID kernel: WARNING: CPU: 2 PID: 270 at net/netfilter/nf_conntrack_core.c:1210 __nf_conntrack_confirm+0xa4/0x2b0 [nf_conntrack]

Feb 14 09:38:10 GOLDRAID kernel: Modules linked in: nvidia_uvm(PO) macvlan xt_nat xt_tcpudp veth xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype br_netfilter xfs md_mod zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) tcp_diag inet_diag jc42 regmap_i2c ip6table_filter ip6_tables iptable_filter ip_tables x_tables efivarfs 8021q garp mrp bridge stp llc bonding tls igb intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp nvidia_drm(PO) nvidia_modeset(PO) kvm_intel i915 kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel sha512_ssse3 ast sha256_ssse3 sha1_ssse3 aesni_intel crypto_simd drm_vram_helper iosf_mbi drm_ttm_helper drm_buddy ipmi_ssif cryptd nvidia(PO) drm_display_helper rapl ttm mpt3sas drm_kms_helper intel_cstate raid_class intel_uncore drm intel_gtt i2c_i801 joydev i2c_smbus scsi_transport_sas agpgart syscopyarea ahci Feb 14 09:38:10 GOLDRAID kernel: i2c_algo_bit mei_me input_leds acpi_ipmi led_class video sysfillrect i2c_core mei libahci sysimgblt intel_pch_thermal wmi fb_sys_fops thermal fan ipmi_si backlight intel_pmc_core acpi_power_meter acpi_pad button unix [last unloaded: igb]

Feb 14 09:38:10 GOLDRAID kernel: CPU: 2 PID: 270 Comm: kworker/u16:5 Tainted: P IO 6.1.64-Unraid #1

Feb 14 09:38:10 GOLDRAID kernel: Hardware name: Supermicro PIO-1UUP-ND1-AI036/X11SSH-F, BIOS 3.0 06/06/2023

Feb 14 09:38:10 GOLDRAID kernel: Workqueue: events_unbound macvlan_process_broadcast [macvlan]

Feb 14 09:38:10 GOLDRAID kernel: RIP: 0010:__nf_conntrack_confirm+0xa4/0x2b0 [nf_conntrack]

Feb 14 09:38:10 GOLDRAID kernel: Code: 44 24 10 e8 e2 e1 ff ff 8b 7c 24 04 89 ea 89 c6 89 04 24 e8 7e e6 ff ff 84 c0 75 a2 48 89 df e8 9b e2 ff ff 85 c0 89 c5 74 18 <0f> 0b 8b 34 24 8b 7c 24 04 e8 18 dd ff ff e8 93 e3 ff ff e9 72 01

Feb 14 09:38:10 GOLDRAID kernel: RSP: 0000:ffffc90000178d98 EFLAGS: 00010202

Feb 14 09:38:10 GOLDRAID kernel: RAX: 0000000000000001 RBX: ffff88879ccdf800 RCX: 0c5d9175f44708e9

Feb 14 09:38:10 GOLDRAID kernel: RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff88879ccdf800

Feb 14 09:38:10 GOLDRAID kernel: RBP: 0000000000000001 R08: fe5f0278be059942 R09: 76ba4064315dc017

Feb 14 09:38:10 GOLDRAID kernel: R10: d444fff6b20efcb2 R11: ffffc90000178d60 R12: ffffffff82a14d00

Feb 14 09:38:10 GOLDRAID kernel: R13: 0000000000034e1b R14: ffff888416ba0400 R15: 0000000000000000

Feb 14 09:38:10 GOLDRAID kernel: FS: 0000000000000000(0000) GS:ffff889055280000(0000) knlGS:0000000000000000

Feb 14 09:38:10 GOLDRAID kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033

Feb 14 09:38:10 GOLDRAID kernel: CR2: 00001fb78b21b000 CR3: 00000002e2362001 CR4: 00000000003706e0

Feb 14 09:38:10 GOLDRAID kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000

Feb 14 09:38:10 GOLDRAID kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

Feb 14 09:38:10 GOLDRAID kernel: Call Trace:

Feb 14 09:38:10 GOLDRAID kernel: <IRQ>

Feb 14 09:38:10 GOLDRAID kernel: ? __warn+0xab/0x122

Feb 14 09:38:10 GOLDRAID kernel: ? report_bug+0x109/0x17e

Feb 14 09:38:10 GOLDRAID kernel: ? __nf_conntrack_confirm+0xa4/0x2b0 [nf_conntrack]

Feb 14 09:38:10 GOLDRAID kernel: ? handle_bug+0x41/0x6f

Feb 14 09:38:10 GOLDRAID kernel: ? exc_invalid_op+0x13/0x60

Feb 14 09:38:10 GOLDRAID kernel: ? asm_exc_invalid_op+0x16/0x20

Feb 14 09:38:10 GOLDRAID kernel: ? __nf_conntrack_confirm+0xa4/0x2b0 [nf_conntrack]

Feb 14 09:38:10 GOLDRAID kernel: ? __nf_conntrack_confirm+0x9e/0x2b0 [nf_conntrack]

Feb 14 09:38:10 GOLDRAID kernel: ? nf_nat_inet_fn+0x60/0x1a8 [nf_nat]

Feb 14 09:38:10 GOLDRAID kernel: nf_conntrack_confirm+0x25/0x54 [nf_conntrack]

Feb 14 09:38:10 GOLDRAID kernel: nf_hook_slow+0x3a/0x96

Feb 14 09:38:10 GOLDRAID kernel: ? ip_protocol_deliver_rcu+0x164/0x164

Feb 14 09:38:10 GOLDRAID kernel: NF_HOOK.constprop.0+0x79/0xd9

Feb 14 09:38:10 GOLDRAID kernel: ? ip_protocol_deliver_rcu+0x164/0x164

Feb 14 09:38:10 GOLDRAID kernel: __netif_receive_skb_one_core+0x77/0x9c

Feb 14 09:38:10 GOLDRAID kernel: process_backlog+0x8c/0x116

Feb 14 09:38:10 GOLDRAID kernel: __napi_poll.constprop.0+0x28/0x124

Feb 14 09:38:10 GOLDRAID kernel: net_rx_action+0x159/0x24f

Feb 14 09:38:10 GOLDRAID kernel: __do_softirq+0x126/0x288

Feb 14 09:38:10 GOLDRAID kernel: do_softirq+0x7f/0xab

Feb 14 09:38:10 GOLDRAID kernel: </IRQ>

Feb 14 09:38:10 GOLDRAID kernel: <TASK>

Feb 14 09:38:10 GOLDRAID kernel: __local_bh_enable_ip+0x4c/0x6b

Feb 14 09:38:10 GOLDRAID kernel: netif_rx+0x52/0x5a

Feb 14 09:38:10 GOLDRAID kernel: macvlan_broadcast+0x10a/0x150 [macvlan]

Feb 14 09:38:10 GOLDRAID kernel: ? _raw_spin_unlock+0x14/0x29

Feb 14 09:38:10 GOLDRAID kernel: macvlan_process_broadcast+0xbc/0x12f [macvlan]

Feb 14 09:38:10 GOLDRAID kernel: process_one_work+0x1a8/0x295

Feb 14 09:38:10 GOLDRAID kernel: worker_thread+0x18b/0x244

Feb 14 09:38:10 GOLDRAID kernel: ? rescuer_thread+0x281/0x281

Feb 14 09:38:10 GOLDRAID kernel: kthread+0xe4/0xef

Feb 14 09:38:10 GOLDRAID kernel: ? kthread_complete_and_exit+0x1b/0x1b

Feb 14 09:38:10 GOLDRAID kernel: ret_from_fork+0x1f/0x30

Feb 14 09:38:10 GOLDRAID kernel: </TASK>

Feb 14 09:38:10 GOLDRAID kernel: ---[ end trace 0000000000000000 ]---

Feb 14 09:38:12 GOLDRAID root: Fix Common Problems: Warning: Syslog mirrored to flash

Link to comment
39 minutes ago, GatorMB said:

Feb 14 09:38:10 GOLDRAID kernel: macvlan_broadcast+0x10a/0x150 [macvlan]

Feb 14 09:38:10 GOLDRAID kernel: ? _raw_spin_unlock+0x14/0x29

Feb 14 09:38:10 GOLDRAID kernel: macvlan_process_broadcast+0xbc/0x12f [macvlan]

 

Macvlan call traces will usually end up crashing the server, switching to ipvlan should fix it (Settings -> Docker Settings -> Docker custom network type -> ipvlan (advanced view must be enabled, top right)), then reboot.

Link to comment
1 hour ago, GatorMB said:

Feb 14 09:38:02 GOLDRAID root: Fix Common Problems: Warning: Share appdata set to cache-only, but files / folders exist on the array

Your appdata share has files on the array, and that share isn't configured to allow mover to move it to cache.

 

Also, your system share is all on the array.

 

Ideally, appdata, domains, and system shares would have all files on fast pool such as cache, with nothing on the array, and configured to stay there, so Dockers/VMs will perform better, and so array disks can spin down since these files are always open.

 

Nothing can move open files. You will have to disable Docker and VM Manager in Settings to get these moved to cache.

Link to comment
1 minute ago, trurl said:
docker network ls

NETWORK ID     NAME      DRIVER    SCOPE

400cca84e0d2   br0       ipvlan    local

06eb05e8c95b   bridge    bridge    local

74b5ac950c5a   host      host      local

8c0d0933753b   none      null      local

Link to comment
4 hours ago, trurl said:

Your appdata share has files on the array, and that share isn't configured to allow mover to move it to cache.

 

Also, your system share is all on the array.

 

Ideally, appdata, domains, and system shares would have all files on fast pool such as cache, with nothing on the array, and configured to stay there, so Dockers/VMs will perform better, and so array disks can spin down since these files are always open.

 

Nothing can move open files. You will have to disable Docker and VM Manager in Settings to get these moved to cache.

 

I went to the bash command and made sure all appdata / domains / system shares  were moved to the cache using:

rsync -av --remove-source-files /mnt/disk2/appdata/ /mnt/cache/appdata/

It moved all files.  I then removed all empty folders left behind:
find /mnt/disk2/appdata/ -type d -empty -delete

I corrected appdata folder permissions:
chmod -R 755 /mnt/cache/appdata/

chown -R nobody:users /mnt/cache/appdata/

 

I made sure that the appdata / domains / system shares were all now cache only in my shares menu.

I made sure that the data & iCloud-drive-sync shares were all now pointing to array in my shares menu.

I reloaded the nvidia driver, and installed the gpu statistics plug-in.

 

I removed changed the macvlan to ipvlan.

I have checked the cache pool and It's still pretty full (I think).  I have a 2tb ssd and a 256GB ssd.  Do I need a larger cache pool?

What am I missing?

And before I forget, huge thanks to trurl & JorgeB for all your help. I really appreciate the time you are taking!

Link to comment
34 minutes ago, GatorMB said:

I corrected appdata folder permissions:
chmod -R 755 /mnt/cache/appdata/

chown -R nobody:users /mnt/cache/appdata/

That might not be appropriate for all containers. That is why there is a Docker Safe New Permissions, so New Permissions can be run while excluding appdata.

Link to comment

So yesterday I woke up to an unresponsive server again.  No network connectivity, nothing.  So, I decided to try 2 more things.  I changed the PSU from a 600w to a new corsair 850 80+ Gold.  I then added a corsair water cooler for the cpu.  Again this am, I woke to a non-responsive server.  But this time it was still showing on the router as connected.

If it crashes over the weekend, I'll upload a new set of logs.  

goldraid-diagnostics-20240216-2022.zip syslog-2.txt

Link to comment

@GatorMB Do you have a idle session to Unraids WebUI opened somewhere on your network? I have an issue the server freezing randomly if I have a websession opened from a Windows box with Firefox. Randomly every 1-2 days with root logged in to Unraid, the server will freeze without any errors catched in the logs. If I log off or shutdown the Windows pc the server won't crash.

Link to comment
1 hour ago, bastl said:

@GatorMB Do you have a idle session to Unraids WebUI opened somewhere on your network? I have an issue the server freezing randomly if I have a websession opened from a Windows box with Firefox. Randomly every 1-2 days with root logged in to Unraid, the server will freeze without any errors catched in the logs. If I log off or shutdown the Windows pc the server won't crash.

No, I don't leave any active connections.  It's a headless server and I only log in to run a process or to try to figure out an use such as this.  I'm going to enable the IPMI function and connect that lan port for diagnostics later this weekend.

Link to comment

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.

Guest
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
  • Create New...