elcapitano, posted March 31, 2020 (edited March 31, 2020):
So I built an UnRaid server after watching SpaceinvaderOne's videos. I watched all of his videos for a month (more than once) and decided to build my own UnRaid server as a replacement for my Synology NAS + Mac Pro running Plex.
It has been online for 4 days and has crashed twice. It was totally unresponsive: no response on any IP, no response to anything. Krusader was transferring files when it happened, but I am not convinced this is the problem. I deleted Binhex-Krusader and installed a different version, but it still crashed with the new version of Krusader.
My config is as follows:
Power: Corsair HX850i
MB: AsRock X570 Extreme4
CPU: Ryzen 9 3900X
GPU: Nvidia Quadro P2000
RAM: 4 x 16GB Corsair Vengeance LPX DDR4 3600 MHz
Network card: Dual 10GbE Solarflare
HDD:
2 x XPG SX8200 Pro 2TB 3D NAND NVMe Gen3x4 PCIe M.2 (cache)
3 x 14 TB WD Red
1 x 6 TB WD Red (unregistered, for CCTV)
Where can I find the logs after a reboot to check the reason for the crash?
H.E.L.P.
Squid, posted March 31, 2020:
Very first thing is to read this post: https://forums.unraid.net/topic/46802-faq-for-unraid-v6/page/2/?tab=comments#comment-819173
JorgeB, posted March 31, 2020:
Quoting elcapitano: "RAM: 4 x 16GB Corsair Vengeance LPX DDR4 3600 MHz"
In some cases Ryzen is known to be unstable with overclocked RAM; see here for more info.
trurl, posted March 31, 2020:
Also, quoting elcapitano: "H.E.L.P."
Go to Tools - Diagnostics and attach the complete diagnostics zip file to your NEXT post.
Then, quoting elcapitano: "Dunno where to find logs after reboot to check reason for crash?"
Set up the Syslog Server to get your logs saved somewhere you can retrieve them after a crash: https://forums.unraid.net/topic/46802-faq-for-unraid-v6/?do=findComment&comment=781601
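The reason a syslog server helps here is that each log line leaves the server over the network as it is written, so it survives a hard freeze that takes the local log with it. For illustration only, here is a minimal sketch of a UDP receiver you could run on any other always-on machine. The port 5514 and the file name unraid-remote.log are hypothetical choices (the standard syslog port 514 usually needs root), and a real setup would normally use rsyslog or an existing log server rather than this.

```python
import socketserver
from datetime import datetime, timezone

# Hypothetical settings for this sketch: pick any free port and path on the receiving machine.
LISTEN_PORT = 5514
LOG_PATH = "unraid-remote.log"


def format_record(data: bytes, client_ip: str) -> str:
    """Decode one syslog datagram and prefix it with arrival time (UTC) and sender IP."""
    text = data.decode("utf-8", errors="replace").rstrip("\r\n\x00")
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    return f"{stamp} {client_ip} {text}\n"


class SyslogHandler(socketserver.BaseRequestHandler):
    """Append every received datagram to LOG_PATH."""

    def handle(self) -> None:
        # For a UDPServer, self.request is a (data, socket) pair.
        data, _sock = self.request
        # Open, append and close per message so each line reaches the file promptly.
        with open(LOG_PATH, "a", encoding="utf-8") as fh:
            fh.write(format_record(data, self.client_address[0]))


def make_server(host: str = "0.0.0.0", port: int = LISTEN_PORT) -> socketserver.UDPServer:
    """Create (but do not start) a UDP syslog receiver."""
    return socketserver.UDPServer((host, port), SyslogHandler)
```

On the receiving machine you would call `make_server().serve_forever()`, then point the remote-server fields on Unraid's Syslog Server settings page (described in the FAQ linked above) at that machine's IP, UDP, port 5514.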
elcapitano (author), posted March 31, 2020:
master-diagnostics-20200331-2145.zip
trurl, posted March 31, 2020:
Looks like you are running the Nvidia Unraid build. Does it happen with the standard Unraid build?
elcapitano (author), posted March 31, 2020:
Would that still make Plex GPU transcoding possible? Maybe I didn't get the memo. If not, my incentive to move across to UnRaid is reduced. Will look it up...
elcapitano (author), posted March 31, 2020:
I truly appreciate your answer. Are you saying there is no need for the Nvidia build?
primeval_god, posted March 31, 2020:
The Nvidia Build is required to use an Nvidia GPU with Docker (the alternative being hardware passthrough to a VM). However, the Nvidia Build is a community-supported plugin, not an officially supported unRAID build. One of the normal steps in debugging is to disable all plugins (and revert to a stock unRAID build) to determine whether the issue is with unRAID itself or with one or more of the plugins.
elcapitano (author), posted March 31, 2020:
Thanks... I removed the Nvidia Build plugin. Last time it went 24 hours before crashing, so I guess we will know soon. But I gotta say, wow, it's very disappointing if I can't use this plugin. It was my entire incentive to move across...
itimpi, posted March 31, 2020:
When you say you removed the plugin, do you mean that you reverted to a standard Unraid build? Merely removing the plugin would not revert you to the standard Unraid build.
elcapitano (author), posted March 31, 2020:
I also removed the Nvidia reference in the Plex Docker container.
Hoopster, posted March 31, 2020:
I suspect your problem is caused by hardware stability issues in general (BIOS settings, overclocked RAM, etc.) and has nothing to do with the Unraid Nvidia plugin/build. There are many, many unRAID users running that build with hardware similar to yours.
As @johnnie.black pointed out, with all four RAM slots on the MB populated, the fastest RAM speed a 3rd Gen Ryzen can support is DDR4-2667. If you are attempting to run the RAM at its rated (overclocked) speed of DDR4-3600, that will cause crashes.
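The rule of thumb above can be written down as a small lookup. This is a sketch: the table holds the commonly cited AMD figures for 3rd-gen Ryzen, keyed by DIMM count and ranks per DIMM (the four-DIMM, dual-rank entry is the DDR4-2667 figure quoted above). The other entries, and the rank count of any particular kit, are assumptions to check against AMD's and the motherboard vendor's documentation.

```python
# Maximum supported DDR4 speed (MT/s) for 3rd-gen Ryzen, keyed by (DIMMs installed, ranks per DIMM).
# Assumption: these are the commonly cited AMD figures; verify against AMD's and your board's documentation.
RYZEN3000_DDR4_LIMITS = {
    (2, 1): 3200,
    (2, 2): 3200,
    (4, 1): 2933,
    (4, 2): 2667,
}


def max_supported_speed(dimms: int, ranks: int) -> int:
    """Look up the officially supported ceiling; raises KeyError for layouts not in the table."""
    return RYZEN3000_DDR4_LIMITS[(dimms, ranks)]


def is_overclocked(configured_mts: int, dimms: int, ranks: int) -> bool:
    """True if the configured memory speed is above the supported ceiling."""
    return configured_mts > max_supported_speed(dimms, ranks)


# The kit in this thread: four DIMMs rated DDR4-3600 (dual rank assumed here).
print(max_supported_speed(4, 2))  # 2667
print(is_overclocked(3600, 4, 2))  # True
```

In BIOS terms, anything above the looked-up value (for example, the XMP/DOCP profile for a 3600 kit) counts as an overclock for stability-testing purposes.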
elcapitano (author), posted March 31, 2020:
Thanks, that actually makes sense. Will check the BIOS... All I did was update the BIOS before building the server.
elcapitano (author), posted April 1, 2020:
Thanks for the heads up. This is how the RAM is installed:
This is from the vendor's website:
Do you have any suggestions as to how it should be set in the BIOS?
elcapitano (author), posted April 1, 2020:
I reckon the Nvidia suggestion was correct. I had multiple entries like this before the system became unresponsive:
Apr 1 08:06:30 MASTER kernel: resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000c0000-0x000dffff window]
Apr 1 08:06:30 MASTER kernel: caller _nv000908rm+0x1bf/0x1f0 [nvidia] mapping multiple BARs
I removed the GPU Statistics plugin, and the log entries decreased.
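To check whether removing a plugin actually reduced these warnings, you can count them in a saved copy of the syslog before and after the change. A minimal sketch, assuming the log is saved as plain text with one entry per line in the same format as the two lines quoted above:

```python
import re
from collections import Counter

# Patterns for the two log lines quoted above.
SANITY_RE = re.compile(r"resource sanity check: requesting \[mem (0x[0-9a-f]+)-(0x[0-9a-f]+)\]")
CALLER_RE = re.compile(r"caller (\S+) \[(\w+)\]")


def summarize(lines):
    """Count sanity-check warnings per requested memory range, and per module named on 'caller' lines."""
    ranges = Counter()
    modules = Counter()
    for line in lines:
        sanity = SANITY_RE.search(line)
        if sanity:
            ranges[(sanity.group(1), sanity.group(2))] += 1
            continue
        caller = CALLER_RE.search(line)
        if caller:
            modules[caller.group(2)] += 1
    return ranges, modules
```

Usage, with a hypothetical file name: `with open("syslog-before.txt") as fh: ranges, modules = summarize(fh)`, then repeat on the post-change log and compare the counts.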
trurl, posted April 1, 2020:
Are you still having problems? If so, post new diagnostics.
elcapitano (author), posted April 1, 2020:
Thanks, but for now it seems that it was the GPU Diagnostics plugin that introduced the multiple warnings. I don't see many of the Nvidia warnings any more. The syslog is now being recorded off-server, so if it happens again, I should have better info.
elcapitano (author), posted April 2, 2020:
Froze again... Krusader disconnected shortly before the server froze. Nothing else in the syslog. I was in the process of transferring media from the NAS to the UnRaid array using Krusader.
master-diagnostics-20200402-1240.zip
trurl, posted April 2, 2020:
Not related, but I see some things in the syslog that suggest you are trying to add an admin user. Only root has access to the webUI and command line, and only users you create in the webUI have access to shares over the network.
trurl Posted April 2, 2020 Share Posted April 2, 2020 Apr 2 08:38:04 MASTER kernel: WARNING: CPU: 8 PID: 223 at net/netfilter/nf_conntrack_core.c:945 __nf_conntrack_confirm+0xa0/0x69e Apr 2 08:38:04 MASTER kernel: Modules linked in: nvidia_uvm(O) xt_CHECKSUM ipt_REJECT ip6table_mangle ip6table_nat nf_nat_ipv6 iptable_mangle ip6table_filter ip6_tables vhost_net tun vhost tap xt_nat macvlan ipt_MASQUERADE iptable_filter iptable_nat nf_nat_ipv4 nf_nat ip_tables xfs md_mod nct6775 hwmon_vid k10temp bonding sfc mdio igb(O) nvidia_drm(PO) nvidia_modeset(PO) nvidia(PO) edac_mce_amd crc32_pclmul pcbc aesni_intel aes_x86_64 glue_helper crypto_simd ghash_clmulni_intel cryptd drm_kms_helper drm kvm_amd kvm syscopyarea sysfillrect sysimgblt fb_sys_fops rsnvme(PO) agpgart i2c_piix4 ccp ahci i2c_core libahci wmi_bmof pcc_cpufreq nvme crct10dif_pclmul nvme_core wmi crc32c_intel button acpi_cpufreq [last unloaded: mdio] Apr 2 08:38:04 MASTER kernel: CPU: 8 PID: 223 Comm: kworker/8:1 Tainted: P O 4.19.107-Unraid #1 Apr 2 08:38:04 MASTER kernel: Hardware name: To Be Filled By O.E.M. 
To Be Filled By O.E.M./X570 Extreme4, BIOS P2.30 02/03/2020 Apr 2 08:38:04 MASTER kernel: Workqueue: events macvlan_process_broadcast [macvlan] Apr 2 08:38:04 MASTER kernel: RIP: 0010:__nf_conntrack_confirm+0xa0/0x69e Apr 2 08:38:04 MASTER kernel: Code: 04 e8 56 fb ff ff 44 89 f2 44 89 ff 89 c6 41 89 c4 e8 7f f9 ff ff 48 8b 4c 24 08 84 c0 75 af 48 8b 85 80 00 00 00 a8 08 74 26 <0f> 0b 44 89 e6 44 89 ff 45 31 f6 e8 95 f1 ff ff be 00 02 00 00 48 Apr 2 08:38:04 MASTER kernel: RSP: 0018:ffff888fde803d90 EFLAGS: 00010202 Apr 2 08:38:04 MASTER kernel: RAX: 0000000000000188 RBX: ffff8889945e9b00 RCX: ffff888e4bddf618 Apr 2 08:38:04 MASTER kernel: RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffffffff81e08fb4 Apr 2 08:38:04 MASTER kernel: RBP: ffff888e4bddf5c0 R08: 00000000e88d3a6f R09: ffffffff81c8aa80 Apr 2 08:38:04 MASTER kernel: R10: 0000000000000098 R11: ffff888f634f9400 R12: 000000000000ca6d Apr 2 08:38:04 MASTER kernel: R13: ffffffff81e91080 R14: 0000000000000000 R15: 000000000000ac8f Apr 2 08:38:04 MASTER kernel: FS: 0000000000000000(0000) GS:ffff888fde800000(0000) knlGS:0000000000000000 Apr 2 08:38:04 MASTER kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Apr 2 08:38:04 MASTER kernel: CR2: 0000146f70289000 CR3: 0000000fd4aa6000 CR4: 0000000000340ee0 Apr 2 08:38:04 MASTER kernel: Call Trace: Apr 2 08:38:04 MASTER kernel: <IRQ> Apr 2 08:38:04 MASTER kernel: ipv4_confirm+0xaf/0xb9 Apr 2 08:38:04 MASTER kernel: nf_hook_slow+0x3a/0x90 Apr 2 08:38:04 MASTER kernel: ip_local_deliver+0xad/0xdc Apr 2 08:38:04 MASTER kernel: ? ip_sublist_rcv_finish+0x54/0x54 Apr 2 08:38:04 MASTER kernel: ip_rcv+0xa0/0xbe Apr 2 08:38:04 MASTER kernel: ? 
ip_rcv_finish_core.isra.0+0x2e1/0x2e1 Apr 2 08:38:04 MASTER kernel: __netif_receive_skb_one_core+0x53/0x6f Apr 2 08:38:04 MASTER kernel: process_backlog+0x77/0x10e Apr 2 08:38:04 MASTER kernel: net_rx_action+0x107/0x26c Apr 2 08:38:04 MASTER kernel: __do_softirq+0xc9/0x1d7 Apr 2 08:38:04 MASTER kernel: do_softirq_own_stack+0x2a/0x40 Apr 2 08:38:04 MASTER kernel: </IRQ> Apr 2 08:38:04 MASTER kernel: do_softirq+0x4d/0x5a Apr 2 08:38:04 MASTER kernel: netif_rx_ni+0x1c/0x22 Apr 2 08:38:04 MASTER kernel: macvlan_broadcast+0x111/0x156 [macvlan] Apr 2 08:38:04 MASTER kernel: macvlan_process_broadcast+0xea/0x128 [macvlan] Apr 2 08:38:04 MASTER kernel: process_one_work+0x16e/0x24f Apr 2 08:38:04 MASTER kernel: worker_thread+0x1e2/0x2b8 Apr 2 08:38:04 MASTER kernel: ? rescuer_thread+0x2a7/0x2a7 Apr 2 08:38:04 MASTER kernel: kthread+0x10c/0x114 Apr 2 08:38:04 MASTER kernel: ? kthread_park+0x89/0x89 Apr 2 08:38:04 MASTER kernel: ret_from_fork+0x22/0x40 Apr 2 08:38:04 MASTER kernel: ---[ end trace eba31347ec0cb1fc ]--- Looks like something to do with nvidia again Quote Link to comment
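One way to read a trace like this: the "Modules linked in" line lists every loaded module (nvidia included), so by itself it says little about the cause. The frames after "Call Trace:" are what matter, and in this trace the only module-tagged frames are macvlan; the untagged frames belong to the core kernel (networking, softirq and workqueue code). The sketch below pulls module names out of the frames of the first trace. It assumes lines formatted as in this syslog, and the regex is illustrative rather than a complete parser for every kernel trace format.

```python
import re

# One frame looks like "kernel: [? ]function+0xOFF/0xLEN [module]"; the module tag is absent for built-in code.
FRAME_RE = re.compile(r"kernel:\s+(?:\? )?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(\w+)\])?")


def trace_modules(lines):
    """Return the module names tagged on the frames of the first call trace, in first-seen order."""
    in_trace = False
    modules = []
    for line in lines:
        if not in_trace:
            # Skip the header ("Modules linked in", registers, RIP) until the trace starts.
            in_trace = "Call Trace:" in line
            continue
        if "---[ end trace" in line:
            break
        frame = FRAME_RE.search(line)
        if frame and frame.group(2) and frame.group(2) not in modules:
            modules.append(frame.group(2))
    return modules
```

Run over the trace above, this returns `['macvlan']`.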
JorgeB, posted April 2, 2020:
Macvlan call traces are usually related to dockers with custom IP addresses.
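To find which containers have their own address: on Unraid, a container given a fixed IP on a custom network such as br0 is typically attached via macvlan, and `docker inspect` records that address under `NetworkSettings.Networks.<network>.IPAMConfig.IPv4Address`. The sketch below lists those containers from saved `docker inspect` output. The JSON layout is assumed from Docker's inspect format, so double-check it against your own output.

```python
import json


def containers_with_static_ip(inspect_json: str):
    """List (container, network, IPv4) for every network where a fixed IPv4 address is configured.

    Assumes the layout of `docker inspect` output: a JSON array of containers, each with
    NetworkSettings.Networks.<name>.IPAMConfig.IPv4Address when a static address was set.
    """
    found = []
    for container in json.loads(inspect_json):
        name = container.get("Name", "").lstrip("/")
        networks = (container.get("NetworkSettings") or {}).get("Networks") or {}
        for net_name, net in networks.items():
            ipam = (net or {}).get("IPAMConfig") or {}
            address = ipam.get("IPv4Address")
            if address:
                found.append((name, net_name, address))
    return found
```

Save the output on the server with `docker inspect $(docker ps -q) > inspect.json` and pass the file's contents in. Anything it lists is a candidate for the macvlan call traces described above.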
elcapitano (author), posted April 2, 2020:
Right... I have switched the dockers that had their own IP to bridge mode, except PiHole; I couldn't get it to work without its own IP. Will look into it later.
I do have one entry in the syslog:
Apr 2 16:22:53 MASTER kernel: igb 0000:09:00.0 eth1: mixed HW and IP checksum settings.
How do I fix that?