Super disappointed



So I built an UnRaid server after watching SpaceinvaderOne's videos.

 

I watched all his videos for a month (more than once) and decided to build my own UnRaid server, as a replacement for my Synology NAS + Mac Pro running Plex.

 

It's been online for 4 days, and has crashed twice. Totally unresponsive on all IPs, with no response to anything.

Krusader was transferring files when it happened, but I am not convinced that this is the problem. I deleted Binhex-Krusader and installed a different version, but it still crashed with the new version of Krusader.

 

My config is as follows:

Power: Corsair HX850i
MB: AsRock X570 Extreme4
CPU: Ryzen 9 -> 3900x
GPU: Nvidia Quadro p2000
RAM: 4 x 16GB Corsair Vengeance LPX DDR4 3600 MHz
Network card: Dual 10GbE Solarflare
HDD:
2 x XPG SX8200 Pro 2TB 3D NAND NVMe Gen3x4 PCIe M.2 (cache)
3 x 14 TB WD Red
1 x 6 TB WD Red (unregistered for CCTV)

 

Dunno where to find logs after reboot to check reason for crash?

H.E.L.P.

 

 

Edited by elcapitano

Also

18 minutes ago, elcapitano said:

H.E.L.P.

Go to Tools - Diagnostics and attach the complete diagnostics zip file to your NEXT post.

 

Then

19 minutes ago, elcapitano said:

Dunno where to find logs after reboot to check reason for crash?

Set up the Syslog Server so your logs are saved somewhere you can retrieve them after a crash:

 

https://forums.unraid.net/topic/46802-faq-for-unraid-v6/?do=findComment&comment=781601
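Once a syslog has been saved somewhere that survives the reboot, a quick way to find the interesting part is to filter for kernel lines that contain typical crash markers. The following is only a minimal sketch: the marker list and the function name find_suspect_lines are assumptions made for illustration, not something Unraid provides, and the list is not exhaustive.

```python
# Markers that often appear in kernel log lines around a crash.
# This list is an assumption for illustration and is not exhaustive.
MARKERS = ("WARNING:", "Call Trace:", "BUG:", "Oops", "Kernel panic")


def find_suspect_lines(lines):
    """Return (line_number, text) pairs for kernel lines containing a marker."""
    hits = []
    for number, text in enumerate(lines, start=1):
        if "kernel:" in text and any(marker in text for marker in MARKERS):
            hits.append((number, text.rstrip("\n")))
    return hits


# Small sample in the same shape as the syslog lines posted in this thread.
sample = [
    "Apr  2 08:38:04 MASTER kernel: WARNING: CPU: 8 PID: 223 at net/netfilter/nf_conntrack_core.c:945 __nf_conntrack_confirm+0xa0/0x69e",
    "Apr  2 08:38:04 MASTER kernel: Call Trace:",
    "Apr  2 08:38:04 MASTER kernel: ipv4_confirm+0xaf/0xb9",
]

for number, text in find_suspect_lines(sample):
    print(number, text)
```

To use it on a real file, read the saved syslog into a list of lines (for example with open(...).readlines() on wherever your syslog server writes it) and pass that list to find_suspect_lines.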

 


The Nvidia Build is required to use an Nvidia GPU with Docker (the alternative being hardware passthrough to a VM). The Nvidia Build is, however, a community-supported plugin and not an officially supported unRAID build. One of the normal steps in debugging is to disable all plugins (and revert to a stock unRAID build) to determine whether the issue is with unRAID itself or with one or more of the plugins.

1 minute ago, elcapitano said:

Thanks . .

I removed the Nvidia Build Plugin.

Went 24 hours before crash, so I guess we will know soon.

 

But, I gotta say, wow, very disappointing if I can't use this plugin.

My entire incentive to move across . . . 

When you say you removed the plugin, do you mean that you reverted to a standard Unraid build? Merely removing the plugin does not revert the server to the standard Unraid build.

10 minutes ago, elcapitano said:

Thanks . .

I removed the Nvidia Build Plugin.

Went 24 hours before crash, so I guess we will know soon.

 

But, I gotta say, wow, very disappointing if I can't use this plugin.

My entire incentive to move across . . . 

I suspect your problem is caused by hardware stability issues in general (BIOS settings, overclocked RAM, etc.) and has nothing to do with the UnRaid Nvidia plugin/build. There are many, many unRAID users running that build with hardware similar to yours.

 

As @johnnie.black pointed out, with all four RAM slots on the MB populated, the fastest RAM speed a 3rd Gen Ryzen officially supports is DDR4-2667. If you are attempting to run the RAM at its rated (overclocked) speed of DDR4-3600, that can cause crashes.
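To see what speed the RAM is actually running at, the output of dmidecode -t memory can be compared against the limit mentioned above. This is a rough sketch only: the sample text is made up for illustration (it is not from the poster's machine), the helper names are hypothetical, and the exact field label depends on the dmidecode version, which is why the pattern accepts both "Configured Memory Speed" and "Configured Clock Speed".

```python
import re

# Limit quoted in this thread for four populated slots (taken from the post, not verified here).
LIMIT_MTS = 2667

# Field label differs between dmidecode versions, so accept both spellings.
SPEED_FIELD = re.compile(r"Configured (?:Memory|Clock) Speed:\s*(\d+)\s*(?:MT/s|MHz)")


def configured_speeds(dmidecode_text):
    """Extract the configured speed of every populated DIMM from dmidecode-style text."""
    return [int(match.group(1)) for match in SPEED_FIELD.finditer(dmidecode_text)]


def over_limit(speeds, limit=LIMIT_MTS):
    """Return the speeds that exceed the given limit."""
    return [speed for speed in speeds if speed > limit]


# Illustrative input only, NOT real output from the system discussed here.
SAMPLE = """\
Memory Device
	Speed: 3600 MT/s
	Configured Memory Speed: 3600 MT/s
Memory Device
	Speed: 3600 MT/s
	Configured Memory Speed: 3600 MT/s
Memory Device
	Speed: Unknown
	Configured Memory Speed: Unknown
"""

speeds = configured_speeds(SAMPLE)
print("configured:", speeds, "over limit:", over_limit(speeds))
```

If any DIMM shows up in the over-limit list, lowering the memory speed in the BIOS (or disabling the XMP profile) would be the thing to try first.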


Reckon the Nvidia suggestion was correct.

I had multiple entries like this before the system became unresponsive:

 

Apr 1 08:06:30 MASTER kernel: resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000c0000-0x000dffff window]
Apr 1 08:06:30 MASTER kernel: caller _nv000908rm+0x1bf/0x1f0 [nvidia] mapping multiple BARs

 

Removed the GPU Statistics Plugin, and the log entries reduced.

6 hours ago, elcapitano said:

Reckon the Nvidia suggestion was correct.

I had multiple entries like this before the system became unresponsive:

 

Apr 1 08:06:30 MASTER kernel: resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000c0000-0x000dffff window]
Apr 1 08:06:30 MASTER kernel: caller _nv000908rm+0x1bf/0x1f0 [nvidia] mapping multiple BARs

 

Removed the GPU Statistics Plugin, and the log entries reduced.

Are you still having problems? If so post new diagnostics

Apr  2 08:38:04 MASTER kernel: WARNING: CPU: 8 PID: 223 at net/netfilter/nf_conntrack_core.c:945 __nf_conntrack_confirm+0xa0/0x69e
Apr  2 08:38:04 MASTER kernel: Modules linked in: nvidia_uvm(O) xt_CHECKSUM ipt_REJECT ip6table_mangle ip6table_nat nf_nat_ipv6 iptable_mangle ip6table_filter ip6_tables vhost_net tun vhost tap xt_nat macvlan ipt_MASQUERADE iptable_filter iptable_nat nf_nat_ipv4 nf_nat ip_tables xfs md_mod nct6775 hwmon_vid k10temp bonding sfc mdio igb(O) nvidia_drm(PO) nvidia_modeset(PO) nvidia(PO) edac_mce_amd crc32_pclmul pcbc aesni_intel aes_x86_64 glue_helper crypto_simd ghash_clmulni_intel cryptd drm_kms_helper drm kvm_amd kvm syscopyarea sysfillrect sysimgblt fb_sys_fops rsnvme(PO) agpgart i2c_piix4 ccp ahci i2c_core libahci wmi_bmof pcc_cpufreq nvme crct10dif_pclmul nvme_core wmi crc32c_intel button acpi_cpufreq [last unloaded: mdio]
Apr  2 08:38:04 MASTER kernel: CPU: 8 PID: 223 Comm: kworker/8:1 Tainted: P           O      4.19.107-Unraid #1
Apr  2 08:38:04 MASTER kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X570 Extreme4, BIOS P2.30 02/03/2020
Apr  2 08:38:04 MASTER kernel: Workqueue: events macvlan_process_broadcast [macvlan]
Apr  2 08:38:04 MASTER kernel: RIP: 0010:__nf_conntrack_confirm+0xa0/0x69e
Apr  2 08:38:04 MASTER kernel: Code: 04 e8 56 fb ff ff 44 89 f2 44 89 ff 89 c6 41 89 c4 e8 7f f9 ff ff 48 8b 4c 24 08 84 c0 75 af 48 8b 85 80 00 00 00 a8 08 74 26 <0f> 0b 44 89 e6 44 89 ff 45 31 f6 e8 95 f1 ff ff be 00 02 00 00 48
Apr  2 08:38:04 MASTER kernel: RSP: 0018:ffff888fde803d90 EFLAGS: 00010202
Apr  2 08:38:04 MASTER kernel: RAX: 0000000000000188 RBX: ffff8889945e9b00 RCX: ffff888e4bddf618
Apr  2 08:38:04 MASTER kernel: RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffffffff81e08fb4
Apr  2 08:38:04 MASTER kernel: RBP: ffff888e4bddf5c0 R08: 00000000e88d3a6f R09: ffffffff81c8aa80
Apr  2 08:38:04 MASTER kernel: R10: 0000000000000098 R11: ffff888f634f9400 R12: 000000000000ca6d
Apr  2 08:38:04 MASTER kernel: R13: ffffffff81e91080 R14: 0000000000000000 R15: 000000000000ac8f
Apr  2 08:38:04 MASTER kernel: FS:  0000000000000000(0000) GS:ffff888fde800000(0000) knlGS:0000000000000000
Apr  2 08:38:04 MASTER kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr  2 08:38:04 MASTER kernel: CR2: 0000146f70289000 CR3: 0000000fd4aa6000 CR4: 0000000000340ee0
Apr  2 08:38:04 MASTER kernel: Call Trace:
Apr  2 08:38:04 MASTER kernel: <IRQ>
Apr  2 08:38:04 MASTER kernel: ipv4_confirm+0xaf/0xb9
Apr  2 08:38:04 MASTER kernel: nf_hook_slow+0x3a/0x90
Apr  2 08:38:04 MASTER kernel: ip_local_deliver+0xad/0xdc
Apr  2 08:38:04 MASTER kernel: ? ip_sublist_rcv_finish+0x54/0x54
Apr  2 08:38:04 MASTER kernel: ip_rcv+0xa0/0xbe
Apr  2 08:38:04 MASTER kernel: ? ip_rcv_finish_core.isra.0+0x2e1/0x2e1
Apr  2 08:38:04 MASTER kernel: __netif_receive_skb_one_core+0x53/0x6f
Apr  2 08:38:04 MASTER kernel: process_backlog+0x77/0x10e
Apr  2 08:38:04 MASTER kernel: net_rx_action+0x107/0x26c
Apr  2 08:38:04 MASTER kernel: __do_softirq+0xc9/0x1d7
Apr  2 08:38:04 MASTER kernel: do_softirq_own_stack+0x2a/0x40
Apr  2 08:38:04 MASTER kernel: </IRQ>
Apr  2 08:38:04 MASTER kernel: do_softirq+0x4d/0x5a
Apr  2 08:38:04 MASTER kernel: netif_rx_ni+0x1c/0x22
Apr  2 08:38:04 MASTER kernel: macvlan_broadcast+0x111/0x156 [macvlan]
Apr  2 08:38:04 MASTER kernel: macvlan_process_broadcast+0xea/0x128 [macvlan]
Apr  2 08:38:04 MASTER kernel: process_one_work+0x16e/0x24f
Apr  2 08:38:04 MASTER kernel: worker_thread+0x1e2/0x2b8
Apr  2 08:38:04 MASTER kernel: ? rescuer_thread+0x2a7/0x2a7
Apr  2 08:38:04 MASTER kernel: kthread+0x10c/0x114
Apr  2 08:38:04 MASTER kernel: ? kthread_park+0x89/0x89
Apr  2 08:38:04 MASTER kernel: ret_from_fork+0x22/0x40
Apr  2 08:38:04 MASTER kernel: ---[ end trace eba31347ec0cb1fc ]---

Looks like something to do with nvidia again
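One way to check which kernel modules a call trace actually runs through is to count the [module] tags at the end of its frames. The sketch below is an illustration under assumptions: the regular expression only covers frames shaped like the ones in this log, and modules_in_trace is a made-up helper name. The input lines are copied from the trace above; on that excerpt the only tagged frames belong to macvlan, which by itself does not prove a root cause.

```python
import re
from collections import Counter

# Matches call-trace frames such as
#   "... kernel: macvlan_broadcast+0x111/0x156 [macvlan]"
# and captures the function name and the module tag. Frames without a tag
# (core kernel functions) are intentionally not matched.
FRAME = re.compile(
    r"kernel:\s+(?:\?\s+)?(\S+)\+0x[0-9a-f]+/0x[0-9a-f]+\s+\[(\w+)\]\s*$"
)


def modules_in_trace(lines):
    """Count how many call-trace frames belong to each tagged kernel module."""
    counts = Counter()
    for line in lines:
        match = FRAME.search(line)
        if match:
            counts[match.group(2)] += 1
    return counts


# Frames copied from the trace posted above.
trace = [
    "Apr  2 08:38:04 MASTER kernel: ipv4_confirm+0xaf/0xb9",
    "Apr  2 08:38:04 MASTER kernel: netif_rx_ni+0x1c/0x22",
    "Apr  2 08:38:04 MASTER kernel: macvlan_broadcast+0x111/0x156 [macvlan]",
    "Apr  2 08:38:04 MASTER kernel: macvlan_process_broadcast+0xea/0x128 [macvlan]",
    "Apr  2 08:38:04 MASTER kernel: process_one_work+0x16e/0x24f",
]

for module, frames in modules_in_trace(trace).items():
    print(module, frames)
```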


Right . .

I have switched the Docker containers that had their own IP to bridge mode, except PiHole. Couldn't get it to work without its own IP. Will look into it later.

 

I do have 1 entry in syslog:

Apr 2 16:22:53 MASTER kernel: igb 0000:09:00.0 eth1: mixed HW and IP checksum settings.

How do I fix that?

 
