GatorMB posted February 13:
Good day all,

Disclaimer: I'm new to this and honestly fumbling along as best I can. I have exhausted multiple resources before posting here. My system specs are in my signature.

I originally built my server with a Supermicro X9SCL motherboard, a Xeon E3-1230 v2 and 16GB of RAM. It ran well, but transcoding was an issue, so I replaced a few parts. I switched to an X11SSH-F motherboard, a Xeon E3-1285 v6 and 64GB of UDIMM RAM, then added a Tesla P4. It is a headless server.

With the new setup it runs well, and the Tdarr and Plex containers are now using the GPU for transcoding. However, when I go to bed and wake up, I can no longer access the server from anywhere. Its MAC address doesn't even show up in the router. I end up cutting power to the server and rebooting, which lets me access it from a desktop again. Sometimes I can access it for hours, and sometimes it locks up again within minutes. As of right now, it's been up for 6h57m. It's as if it times out when it's not being accessed.

I have brought it up to my office and connected it to a monitor, with no difference. I have tried another motherboard, tried new RAM, and reloaded everything one step at a time. The only thing I haven't replaced is the CPU. Is this a config issue or a hardware issue? I have absolutely no idea and am unsure where to find more info. Any help is greatly appreciated!!
trurl posted February 13:
Attach Diagnostics to your NEXT post in this thread. Also set up a syslog server.
GatorMB posted February 13:
Hope this helps.
Attachments: goldraid-diagnostics-20240213-1617.zip, goldraid-syslog-20240213-2221.zip
JorgeB posted February 14:
Nothing obvious so far; post the persistent syslog after a crash.
11 hours ago, trurl said: "Also set up syslog server."
GatorMB posted February 14:
I have enabled the persistent syslog server, so I will hopefully have better info soon. I woke up this morning and it had locked up again.
Attachment: goldraid-diagnostics-20240214-0912.zip
JorgeB posted February 14:
And where is it? It only comes with the diags if you chose the "mirror to flash drive" option.
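For anyone in the same spot, here is a minimal shell sketch for grabbing the mirrored log right after a crash and reboot, before anything else happens to it. It assumes the flash drive is mounted at /boot and that the mirror writes files named syslog and syslog-previous into /boot/logs; those paths and the helper name save_mirrored_syslog are assumptions, so check them against what is actually on your flash drive.

```bash
#!/usr/bin/env bash
# Sketch: copy the syslog mirrored to the flash drive to a safe place,
# adding a timestamp so repeated crashes don't overwrite earlier copies.
# ASSUMPTION: the mirror lives in /boot/logs as "syslog" / "syslog-previous".
# save_mirrored_syslog is a hypothetical helper name.
save_mirrored_syslog() {
  local log_dir="${1:-/boot/logs}"   # where the mirrored log is assumed to be
  local dest_dir="${2:-.}"           # where to keep the copies
  local stamp copied=0 f
  stamp=$(date +%Y%m%d-%H%M%S)
  for f in syslog-previous syslog; do
    if [ -f "$log_dir/$f" ]; then
      cp -- "$log_dir/$f" "$dest_dir/$f-$stamp.txt"
      echo "$dest_dir/$f-$stamp.txt"   # report each copy that was made
      copied=$((copied + 1))
    fi
  done
  [ "$copied" -gt 0 ]                # non-zero status if nothing was found
}
```

Usage: `save_mirrored_syslog /boot/logs /path/to/persistent/folder`, then attach the copied files to the thread.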
GatorMB posted February 14:
Attachment: syslog.txt
I have enabled the syslog and will forward more data after the next crash. For now, this is all I have. Are my shares set up incorrectly?
GatorMB posted February 14:
More specifically, here is a snippet of the log that I think might be the issue, and I have no idea how to resolve it.
Feb 14 09:38:01 GOLDRAID root: Fix Common Problems Version 2024.01.18
Feb 14 09:38:02 GOLDRAID root: Fix Common Problems: Warning: Share appdata set to cache-only, but files / folders exist on the array
Feb 14 09:38:02 GOLDRAID root: Fix Common Problems: Warning: Docker Application bazarr has an update available for it
Feb 14 09:38:10 GOLDRAID kernel: ------------[ cut here ]------------
Feb 14 09:38:10 GOLDRAID kernel: WARNING: CPU: 2 PID: 270 at net/netfilter/nf_conntrack_core.c:1210 __nf_conntrack_confirm+0xa4/0x2b0 [nf_conntrack]
Feb 14 09:38:10 GOLDRAID kernel: Modules linked in: nvidia_uvm(PO) macvlan xt_nat xt_tcpudp veth xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype br_netfilter xfs md_mod zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) tcp_diag inet_diag jc42 regmap_i2c ip6table_filter ip6_tables iptable_filter ip_tables x_tables efivarfs 8021q garp mrp bridge stp llc bonding tls igb intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp nvidia_drm(PO) nvidia_modeset(PO) kvm_intel i915 kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel sha512_ssse3 ast sha256_ssse3 sha1_ssse3 aesni_intel crypto_simd drm_vram_helper iosf_mbi drm_ttm_helper drm_buddy ipmi_ssif cryptd nvidia(PO) drm_display_helper rapl ttm mpt3sas drm_kms_helper intel_cstate raid_class intel_uncore drm intel_gtt i2c_i801 joydev i2c_smbus scsi_transport_sas agpgart syscopyarea ahci
Feb 14 09:38:10 GOLDRAID kernel: i2c_algo_bit mei_me input_leds acpi_ipmi led_class video sysfillrect i2c_core mei libahci sysimgblt intel_pch_thermal wmi fb_sys_fops thermal fan ipmi_si backlight intel_pmc_core acpi_power_meter acpi_pad button unix [last unloaded: igb]
Feb 14 09:38:10 GOLDRAID kernel: CPU: 2 PID: 270 Comm: kworker/u16:5 Tainted: P IO 6.1.64-Unraid #1
Feb 14 09:38:10 GOLDRAID kernel: Hardware name: Supermicro PIO-1UUP-ND1-AI036/X11SSH-F, BIOS 3.0 06/06/2023
Feb 14 09:38:10 GOLDRAID kernel: Workqueue: events_unbound macvlan_process_broadcast [macvlan]
Feb 14 09:38:10 GOLDRAID kernel: RIP: 0010:__nf_conntrack_confirm+0xa4/0x2b0 [nf_conntrack]
Feb 14 09:38:10 GOLDRAID kernel: Code: 44 24 10 e8 e2 e1 ff ff 8b 7c 24 04 89 ea 89 c6 89 04 24 e8 7e e6 ff ff 84 c0 75 a2 48 89 df e8 9b e2 ff ff 85 c0 89 c5 74 18 <0f> 0b 8b 34 24 8b 7c 24 04 e8 18 dd ff ff e8 93 e3 ff ff e9 72 01
Feb 14 09:38:10 GOLDRAID kernel: RSP: 0000:ffffc90000178d98 EFLAGS: 00010202
Feb 14 09:38:10 GOLDRAID kernel: RAX: 0000000000000001 RBX: ffff88879ccdf800 RCX: 0c5d9175f44708e9
Feb 14 09:38:10 GOLDRAID kernel: RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff88879ccdf800
Feb 14 09:38:10 GOLDRAID kernel: RBP: 0000000000000001 R08: fe5f0278be059942 R09: 76ba4064315dc017
Feb 14 09:38:10 GOLDRAID kernel: R10: d444fff6b20efcb2 R11: ffffc90000178d60 R12: ffffffff82a14d00
Feb 14 09:38:10 GOLDRAID kernel: R13: 0000000000034e1b R14: ffff888416ba0400 R15: 0000000000000000
Feb 14 09:38:10 GOLDRAID kernel: FS: 0000000000000000(0000) GS:ffff889055280000(0000) knlGS:0000000000000000
Feb 14 09:38:10 GOLDRAID kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 14 09:38:10 GOLDRAID kernel: CR2: 00001fb78b21b000 CR3: 00000002e2362001 CR4: 00000000003706e0
Feb 14 09:38:10 GOLDRAID kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Feb 14 09:38:10 GOLDRAID kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Feb 14 09:38:10 GOLDRAID kernel: Call Trace:
Feb 14 09:38:10 GOLDRAID kernel: <IRQ>
Feb 14 09:38:10 GOLDRAID kernel: ? __warn+0xab/0x122
Feb 14 09:38:10 GOLDRAID kernel: ? report_bug+0x109/0x17e
Feb 14 09:38:10 GOLDRAID kernel: ? __nf_conntrack_confirm+0xa4/0x2b0 [nf_conntrack]
Feb 14 09:38:10 GOLDRAID kernel: ? handle_bug+0x41/0x6f
Feb 14 09:38:10 GOLDRAID kernel: ? exc_invalid_op+0x13/0x60
Feb 14 09:38:10 GOLDRAID kernel: ? asm_exc_invalid_op+0x16/0x20
Feb 14 09:38:10 GOLDRAID kernel: ? __nf_conntrack_confirm+0xa4/0x2b0 [nf_conntrack]
Feb 14 09:38:10 GOLDRAID kernel: ? __nf_conntrack_confirm+0x9e/0x2b0 [nf_conntrack]
Feb 14 09:38:10 GOLDRAID kernel: ? nf_nat_inet_fn+0x60/0x1a8 [nf_nat]
Feb 14 09:38:10 GOLDRAID kernel: nf_conntrack_confirm+0x25/0x54 [nf_conntrack]
Feb 14 09:38:10 GOLDRAID kernel: nf_hook_slow+0x3a/0x96
Feb 14 09:38:10 GOLDRAID kernel: ? ip_protocol_deliver_rcu+0x164/0x164
Feb 14 09:38:10 GOLDRAID kernel: NF_HOOK.constprop.0+0x79/0xd9
Feb 14 09:38:10 GOLDRAID kernel: ? ip_protocol_deliver_rcu+0x164/0x164
Feb 14 09:38:10 GOLDRAID kernel: __netif_receive_skb_one_core+0x77/0x9c
Feb 14 09:38:10 GOLDRAID kernel: process_backlog+0x8c/0x116
Feb 14 09:38:10 GOLDRAID kernel: __napi_poll.constprop.0+0x28/0x124
Feb 14 09:38:10 GOLDRAID kernel: net_rx_action+0x159/0x24f
Feb 14 09:38:10 GOLDRAID kernel: __do_softirq+0x126/0x288
Feb 14 09:38:10 GOLDRAID kernel: do_softirq+0x7f/0xab
Feb 14 09:38:10 GOLDRAID kernel: </IRQ>
Feb 14 09:38:10 GOLDRAID kernel: <TASK>
Feb 14 09:38:10 GOLDRAID kernel: __local_bh_enable_ip+0x4c/0x6b
Feb 14 09:38:10 GOLDRAID kernel: netif_rx+0x52/0x5a
Feb 14 09:38:10 GOLDRAID kernel: macvlan_broadcast+0x10a/0x150 [macvlan]
Feb 14 09:38:10 GOLDRAID kernel: ? _raw_spin_unlock+0x14/0x29
Feb 14 09:38:10 GOLDRAID kernel: macvlan_process_broadcast+0xbc/0x12f [macvlan]
Feb 14 09:38:10 GOLDRAID kernel: process_one_work+0x1a8/0x295
Feb 14 09:38:10 GOLDRAID kernel: worker_thread+0x18b/0x244
Feb 14 09:38:10 GOLDRAID kernel: ? rescuer_thread+0x281/0x281
Feb 14 09:38:10 GOLDRAID kernel: kthread+0xe4/0xef
Feb 14 09:38:10 GOLDRAID kernel: ? kthread_complete_and_exit+0x1b/0x1b
Feb 14 09:38:10 GOLDRAID kernel: ret_from_fork+0x1f/0x30
Feb 14 09:38:10 GOLDRAID kernel: </TASK>
Feb 14 09:38:10 GOLDRAID kernel: ---[ end trace 0000000000000000 ]---
Feb 14 09:38:12 GOLDRAID root: Fix Common Problems: Warning: Syslog mirrored to flash
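To check whether a syslog contains more warnings like this one, and whether they run through macvlan as this one does, a small awk-based sketch can count the blocks between the kernel's "[ cut here ]" and "---[ end trace" marker lines (both visible in the snippet above). The helper name is hypothetical, and treating /var/log/syslog as the live log path is an assumption.

```bash
#!/usr/bin/env bash
# Sketch: count kernel warning/call-trace blocks in a syslog file and how
# many of them mention macvlan. A block starts at "[ cut here ]" and ends
# at "---[ end trace". count_macvlan_traces is a hypothetical helper name.
count_macvlan_traces() {
  local file="$1"
  awk '
    /\[ cut here \]/      { in_trace = 1; has_macvlan = 0; total++; next }
    in_trace && /macvlan/ { has_macvlan = 1 }
    /---\[ end trace/     { if (in_trace && has_macvlan) mac++; in_trace = 0 }
    END                   { printf "traces=%d macvlan=%d\n", total, mac }
  ' "$file"
}
```

Usage: `count_macvlan_traces /var/log/syslog` (or a saved syslog-previous file). A non-zero macvlan count points at the same problem discussed in the replies.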
JorgeB posted February 14:
39 minutes ago, GatorMB said:
Feb 14 09:38:10 GOLDRAID kernel: macvlan_broadcast+0x10a/0x150 [macvlan]
Feb 14 09:38:10 GOLDRAID kernel: ? _raw_spin_unlock+0x14/0x29
Feb 14 09:38:10 GOLDRAID kernel: macvlan_process_broadcast+0xbc/0x12f [macvlan]
Macvlan call traces will usually end up crashing the server. Switching to ipvlan should fix it (Settings -> Docker Settings -> Docker custom network type -> ipvlan; advanced view must be enabled, top right), then reboot.
trurl posted February 14:
1 hour ago, GatorMB said: "Feb 14 09:38:02 GOLDRAID root: Fix Common Problems: Warning: Share appdata set to cache-only, but files / folders exist on the array"
Your appdata share has files on the array, and that share isn't configured to allow mover to move them to cache. Also, your system share is entirely on the array.
Ideally, the appdata, domains and system shares would have all their files on a fast pool such as cache, with nothing on the array, and be configured to stay there. That way Dockers/VMs will perform better, and array disks can spin down, since these files are always open.
Nothing can move open files, so you will have to disable Docker and VM Manager in Settings to get these moved to cache.
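A quick way to see which array disks still hold part of a share is to look for the share's folder on each disk mount. This sketch assumes the /mnt/diskN layout used elsewhere in this thread; share_on_array is a hypothetical helper name, and the mount root is a parameter only so it can be tried on a scratch directory.

```bash
#!/usr/bin/env bash
# Sketch: list array disks that still contain files for a given share,
# with the size used on each. ASSUMPTION: array disks are /mnt/disk1,
# /mnt/disk2, ... share_on_array is a hypothetical helper name.
share_on_array() {
  local share="$1" mnt="${2:-/mnt}" d
  # disk[0-9]* matches disk1, disk2, ... but not names like "disks".
  for d in "$mnt"/disk[0-9]*; do
    [ -d "$d/$share" ] || continue
    # Skip folders that exist but contain no files (empty leftovers).
    if [ -n "$(find "$d/$share" -type f -print -quit)" ]; then
      printf '%s\t%s\n' "$(du -sh "$d/$share" | cut -f1)" "$d/$share"
    fi
  done
}
```

Usage: `share_on_array appdata` and `share_on_array system`. No output means no files for that share were found on the array disks.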
GatorMB posted February 14:
So, it crashed again. Here is the syslog info.
Attachments: syslog-previous.txt, syslog.txt
trurl posted February 14:
What do you get from the command line with this?
docker network ls
GatorMB posted February 14:
1 minute ago, trurl said: docker network ls
NETWORK ID     NAME     DRIVER   SCOPE
400cca84e0d2   br0      ipvlan   local
06eb05e8c95b   bridge   bridge   local
74b5ac950c5a   host     host     local
8c0d0933753b   none     null     local
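The output above shows br0 on the ipvlan driver. For re-checking after future changes, here is a hedged sketch that reads name/driver pairs from `docker network ls --format '{{.Name}} {{.Driver}}'` and lists any network still using macvlan; macvlan_networks is a hypothetical name.

```bash
#!/usr/bin/env bash
# Sketch: read "NAME DRIVER" pairs on stdin and print the names of any
# networks using the macvlan driver. Exit status is 0 when none are found
# and 1 otherwise. macvlan_networks is a hypothetical helper name.
macvlan_networks() {
  awk '$2 == "macvlan" { print $1; found = 1 }
       END { exit (found ? 1 : 0) }'
}
```

Usage: `docker network ls --format '{{.Name}} {{.Driver}}' | macvlan_networks && echo "no macvlan networks"`.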
GatorMB posted February 14:
4 hours ago, trurl said: "Your appdata share has files on the array, and that share isn't configured to allow mover to move it to cache. ... You will have to disable Docker and VM Manager in Settings to get these moved to cache."
I went to the terminal and made sure all appdata / domains / system shares were moved to the cache using:
rsync -av --remove-source-files /mnt/disk2/appdata/ /mnt/cache/appdata/
It moved all the files. I then removed the empty folders left behind:
find /mnt/disk2/appdata/ -type d -empty -delete
I corrected the appdata folder permissions:
chmod -R 755 /mnt/cache/appdata/
chown -R nobody:users /mnt/cache/appdata/
I made sure that the appdata / domains / system shares were all now cache-only in my Shares menu.
I made sure that the data & iCloud-drive-sync shares were all now pointing to the array in my Shares menu.
I reloaded the Nvidia driver and installed the GPU Statistics plugin.
I changed the macvlan to ipvlan.
I have checked the cache pool and it's still pretty full (I think). I have a 2TB SSD and a 256GB SSD. Do I need a larger cache pool? What am I missing?
And before I forget, huge thanks to trurl & JorgeB for all your help. I really appreciate the time you are taking!
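On the cache question, it helps to see what is actually using the space before deciding on a bigger pool. A minimal sketch using standard `df -P` and `du`; pool_usage is a hypothetical helper, and /mnt/cache is taken from the commands above.

```bash
#!/usr/bin/env bash
# Sketch: show how full a pool is and which top-level folders use the space.
# pool_usage is a hypothetical helper name; POOL defaults to /mnt/cache.
pool_usage() {
  local pool="${1:-/mnt/cache}"
  # Overall use%: 5th column of the data line in POSIX-format df output.
  df -P "$pool" | awk 'NR == 2 { print "used: " $5 }'
  # Size of each top-level folder, smallest first, largest last.
  du -sh "$pool"/* 2>/dev/null | sort -h
}
```

Usage: `pool_usage /mnt/cache`. The last few lines show which shares or files account for most of the used space.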
trurl posted February 14:
34 minutes ago, GatorMB said:
I corrected appdata folder permissions:
chmod -R 755 /mnt/cache/appdata/
chown -R nobody:users /mnt/cache/appdata/
That might not be appropriate for all containers. That is why there is a Docker Safe New Permissions tool, so New Permissions can be run while excluding appdata.
GatorMB posted February 14:
19 minutes ago, trurl said: "That might not be appropriate for all containers. That is why there is a Docker Safe New Permissions, so New Permissions can be run while excluding appdata."
What would you suggest?
trurl posted February 14:
Just now, GatorMB said: "What would you suggest?"
Let each container manage their appdata permissions.
GatorMB posted February 14:
1 minute ago, trurl said: "Let each container manage their appdata permissions."
OK, then how would I reset it to that?
trurl posted February 15:
There is no "reset" for that. If one of your containers seems to be broken, you might have to delete its appdata and let it recreate it.
GatorMB posted February 15:
4 minutes ago, trurl said: "Do you have appdata backup?"
I will after this learning experience! lol. It's running fine now. I'll post in the morning and let you know if it crashes again. Thanks again for all your help!
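Since appdata backups came up, here is a bare-bones sketch of a dated tar archive. It assumes the containers are stopped first (the function does not stop them) and that the destination is somewhere other than the cache pool. backup_appdata is a hypothetical name, and this is only an illustration, not a substitute for a proper backup tool.

```bash
#!/usr/bin/env bash
# Sketch only: create a dated .tar.gz of an appdata folder.
# ASSUMPTIONS: containers using this appdata are already stopped (files held
# open by a running container may be copied mid-write), and DEST_DIR is on a
# different device than SRC. backup_appdata is a hypothetical helper name.
backup_appdata() {
  local src="${1:-/mnt/cache/appdata}" dest_dir="$2" archive
  if [ -z "$dest_dir" ] || [ ! -d "$dest_dir" ]; then
    echo "usage: backup_appdata [SRC] DEST_DIR (DEST_DIR must exist)" >&2
    return 2
  fi
  src="${src%/}"   # drop a trailing slash so dirname/basename behave
  archive="$dest_dir/appdata-$(date +%Y%m%d-%H%M%S).tar.gz"
  # -C switches to SRC's parent so the archive stores "appdata/..." paths.
  tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" && echo "$archive"
}
```

Usage: with Docker stopped, `backup_appdata /mnt/cache/appdata /path/to/backup/folder`. Restoring one container's folder is then a matter of extracting just that path from the archive.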
GatorMB posted February 17:
So yesterday I woke up to an unresponsive server again: no network connectivity, nothing. So I decided to try two more things. I replaced the 600W PSU with a new Corsair 850W 80+ Gold, and I added a Corsair water cooler for the CPU.
Again this morning, I woke up to a non-responsive server, but this time it was still showing on the router as connected. If it crashes over the weekend, I'll upload a new set of logs.
Attachments: goldraid-diagnostics-20240216-2022.zip, syslog-2.txt
bastl posted February 17:
@GatorMB Do you have an idle session to Unraid's WebUI open somewhere on your network? I have an issue with the server freezing randomly if I have a web session open from a Windows box with Firefox. Randomly, every 1-2 days with root logged in to Unraid, the server freezes without any errors caught in the logs. If I log off or shut down the Windows PC, the server won't crash.
GatorMB posted February 17:
1 hour ago, bastl said: "@GatorMB Do you have an idle session to Unraid's WebUI open somewhere on your network? ..."
No, I don't leave any active connections. It's a headless server, and I only log in to run a process or to try to figure out an issue such as this. I'm going to enable the IPMI function and connect that LAN port for diagnostics later this weekend.
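Once IPMI is connected, the BMC's System Event Log is worth checking for hardware events (power, temperature, memory) around the time of a freeze. Here is a sketch that filters `ipmitool sel elist` output down to a single date. It assumes ipmitool is available and that each entry is pipe-separated with the date in the second field as MM/DD/YYYY; check that against your own BMC's output first. sel_events_on is a hypothetical name.

```bash
#!/usr/bin/env bash
# Sketch: keep only IPMI System Event Log lines for one date.
# ASSUMPTION: input is `ipmitool sel elist` output, pipe-separated, with the
# date in the 2nd field as MM/DD/YYYY. Verify against your BMC's real output.
# sel_events_on is a hypothetical helper name.
sel_events_on() {
  local day="$1"
  awk -F'|' -v day="$day" '
    { d = $2; gsub(/^ +| +$/, "", d) }   # trim padding around the date field
    d == day                             # print entries from that date only
  '
}
```

Usage: `ipmitool sel elist | sel_events_on 02/17/2024`.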
GatorMB posted February 18:
OK, so I went away to the lake yesterday morning and left the server running. I got back an hour ago and it's all locked up again, and not showing on the network either. I did a hard reset and it came up fine. Here are the logs. I can't figure this out! Someone please point me in the right direction!
Attachments: syslog, syslog-previous
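When the log simply stops before a lockup, two things are worth pulling out of syslog-previous: when the last message was written, and whether anything alarming appears before that. Here is a hedged sketch. The grep patterns are an assumed starting point, not an exhaustive list; "NVRM: Xid" is included because Nvidia driver errors are logged that way and this box runs a Tesla P4. crash_summary is a hypothetical name.

```bash
#!/usr/bin/env bash
# Sketch: summarize a syslog-previous file after a hard reset.
# Prints the timestamp of the last line written, then every line matching a
# few common kernel trouble markers (with line numbers). The pattern list is
# a starting point chosen for this sketch, not an exhaustive set.
# crash_summary is a hypothetical helper name.
crash_summary() {
  local file="$1"
  # Syslog lines begin with a 15-character "Mon DD HH:MM:SS" stamp.
  echo "last entry: $(tail -n 1 "$file" | cut -c1-15)"
  grep -nE 'cut here|Call Trace|Machine Check|mce:|BUG:|Oops|soft lockup|hard LOCKUP|NVRM: Xid|Out of memory' "$file"
  return 0   # no matches is not an error here
}
```

Usage: `crash_summary syslog-previous`. Comparing the last-entry time with when the server stopped responding shows whether anything was logged right up to the hang.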