Super disappointed

elcapitano · March 31, 2020

So I built an UnRaid server after watching SpaceinvaderOne's videos.

I watched all his videos for a month (more than once) and decided to build my own UnRaid server, as a replacement for my Synology NAS + Mac Pro running Plex.

It's been online for 4 days and has crashed twice. Totally unresponsive on all IPs, with no response to anything.

Krusader was transferring files when it happened, but I am not convinced that this is the problem. I deleted Binhex-Krusader and installed a different version, but it still crashed with the new version of Krusader.

My config is as follows:

Power: Corsair HX850i
MB: AsRock X570 Extreme4
CPU: Ryzen 9 -> 3900x
GPU: Nvidia Quadro p2000
RAM: 4 x 16GB Corsair Vengeance LPX DDR4 3600 MHz
Network card: Dual 10GbE Solarflare
HDD:
2 x XPG SX8200 Pro 2TB 3D NAND NVMe Gen3x4 PCIe M.2 (cache)
3 x 14 TB WD Red
1 x 6 TB WD Red (unregistered for CCTV)

Dunno where to find logs after reboot to check reason for crash?

H.E.L.P.

Edited March 31, 2020 by elcapitano

Squid · March 31, 2020

Very first thing is to read this post

https://forums.unraid.net/topic/46802-faq-for-unraid-v6/page/2/?tab=comments#comment-819173

JorgeB · March 31, 2020

4 minutes ago, elcapitano said:

RAM: 4 x 16GB Corsair Vengeance LPX DDR4 3600 MHz

In some cases Ryzen is known to be unstable with overclocked RAM, see here for more info.

trurl · March 31, 2020

Also

18 minutes ago, elcapitano said:

H.E.L.P.

Go to Tools - Diagnostics and attach the complete diagnostics zip file to your NEXT post.

Then

19 minutes ago, elcapitano said:

Dunno where to find logs after reboot to check reason for crash?

Set up the Syslog Server to get your logs saved somewhere you can retrieve them after a crash:

https://forums.unraid.net/topic/46802-faq-for-unraid-v6/?do=findComment&comment=781601
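
For anyone who wants to see what "saved somewhere else" amounts to, here is a minimal sketch of a receiver you could run on another machine. It is not the method from the FAQ post linked above, just an illustration, and it assumes the server has been set to forward its syslog over UDP to that machine. The file name `unraid-remote-syslog.log` and the helper `format_record` are made up for the example; port 514 is only the conventional syslog port.

```python
import socketserver
from datetime import datetime
from pathlib import Path

# Hypothetical output file; pick any path on the receiving machine.
LOG_PATH = Path("unraid-remote-syslog.log")


def format_record(sender_ip, payload, now=None):
    """Turn one raw syslog datagram into a single timestamped log line."""
    now = now or datetime.now()
    text = payload.decode("utf-8", errors="replace").strip()
    return f"{now.isoformat(timespec='seconds')} {sender_ip} {text}\n"


class SyslogHandler(socketserver.BaseRequestHandler):
    def handle(self):
        # For UDP servers, self.request is a (data, socket) pair.
        data, _sock = self.request
        line = format_record(self.client_address[0], data)
        # Open, append and close per message so each line is handed to the OS
        # right away instead of sitting in a long-lived buffer.
        with LOG_PATH.open("a", encoding="utf-8") as log_file:
            log_file.write(line)


def serve(host="0.0.0.0", port=514):
    # Binding to port 514 usually needs elevated privileges (assumption:
    # the sender is configured for UDP on this port).
    with socketserver.UDPServer((host, port), SyslogHandler) as server:
        server.serve_forever()
```

Calling `serve()` on the receiving machine keeps appending incoming messages to the file, so the last lines before a freeze survive the reboot of the server itself.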

elcapitano · March 31, 2020

master-diagnostics-20200331-2145.zip

trurl · March 31, 2020

Looks like you are running Nvidia Unraid build. Does it happen with the standard Unraid build?

elcapitano · March 31, 2020

Would that still make Plex GPU transcoding possible? Maybe I didn't get the memo.

If not, my incentive to move across to UnRaid is reduced.

Will look it up . . .

elcapitano · March 31, 2020

I truly appreciate your answer.

Are you saying, there is no need for the Nvidia build?

primeval_god · March 31, 2020

The Nvidia Build is required to use an Nvidia GPU with Docker (the alternative being hardware passthrough to a VM). The Nvidia Build is, however, a community-supported plugin and not an officially supported unRAID build. One of the normal steps in debugging is to disable all plugins (and revert to a stock unRAID build) to determine whether the issue is with unRAID itself or with one or more of the plugins.

elcapitano · March 31, 2020

Thanks . .

I removed the Nvidia Build Plugin.

Went 24 hours before crash, so I guess we will know soon.

But, I gotta say, wow, very disappointing if I can't use this plugin.

My entire incentive to move across . . .

itimpi · March 31, 2020

1 minute ago, elcapitano said:

Thanks . .

I removed the Nvidia Build Plugin.

Went 24 hours before crash, so I guess we will know soon.

But, I gotta say, wow, very disappointing if I can't use this plugin.

My entire incentive to move across . . .

When you say you removed the plugin, do you mean that you reverted to a standard Unraid build? If you merely removed the plugin, that would not revert you to the standard Unraid build.

elcapitano · March 31, 2020

I also removed the Nvidia reference in the Plex Docker container

Hoopster · March 31, 2020

10 minutes ago, elcapitano said:

Thanks . .

I removed the Nvidia Build Plugin.

Went 24 hours before crash, so I guess we will know soon.

But, I gotta say, wow, very disappointing if I can't use this plugin.

My entire incentive to move across . . .

I suspect your problem is caused by hardware stability issues (BIOS settings, overclocked RAM, etc. ) in general and has nothing to do with the UnRaid Nvidia plugin/build. There are many, many unRAID users running that build with hardware similar to yours.

As @johnnie.black pointed out, with all four RAM slots on the MB populated, the fastest RAM speed a 3rd Gen Ryzen can support is DDR4-2667. If you are attempting to run the RAM at its rated (overclocked) speed of DDR4-3600, that will cause crashes.

elcapitano · March 31, 2020

Thanks, that actually makes sense.

Will check BIOS...

All I did was update the BIOS before building the server.

elcapitano · April 1, 2020

Thanks for the heads up.

This is how the RAM is installed:

This is from the vendor's website:

285315969_Skrmbillede2020-04-0106_10_11.png.6739c5d7094671c45fd9c5b60b43762a.png

Do you have any suggestion as to how it should be set in BIOS?

elcapitano · April 1, 2020

Reckon the Nvidia suggestion was correct.

I had multiple entries like this before the system became unresponsive:

Apr 1 08:06:30 MASTER kernel: resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000c0000-0x000dffff window]
Apr 1 08:06:30 MASTER kernel: caller _nv000908rm+0x1bf/0x1f0 [nvidia] mapping multiple BARs

Removed the GPU Statistics Plugin, and the log entries reduced.

trurl · April 1, 2020

6 hours ago, elcapitano said:

Reckon the Nvidia suggestion was correct.

I had multiple entries like this before the system became unresponsive:

Apr 1 08:06:30 MASTER kernel: resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000c0000-0x000dffff window]
Apr 1 08:06:30 MASTER kernel: caller _nv000908rm+0x1bf/0x1f0 [nvidia] mapping multiple BARs

Removed the GPU Statistics Plugin, and the log entries reduced.

Are you still having problems? If so post new diagnostics

elcapitano · April 1, 2020

Thanks, but for now it seems that it was the GPU Statistics Plugin that introduced the multiple warnings.

I don't see many of the nvidia warnings any more.

Syslog is being recorded off server, so if it happens again, I should have better info.

elcapitano · April 2, 2020

Froze again . .

Krusader disconnected shortly before the server froze. Nothing else from syslog.

In the process of transferring media from NAS to UnRaid array, using Krusader.

master-diagnostics-20200402-1240.zip

trurl · April 2, 2020

Not related, but I see some things in syslog that suggest you are trying to add an admin user. Only root has access to the webUI and command line, and only users you create in the webUI have access to shares over the network.

trurl · April 2, 2020

Apr  2 08:38:04 MASTER kernel: WARNING: CPU: 8 PID: 223 at net/netfilter/nf_conntrack_core.c:945 __nf_conntrack_confirm+0xa0/0x69e
Apr  2 08:38:04 MASTER kernel: Modules linked in: nvidia_uvm(O) xt_CHECKSUM ipt_REJECT ip6table_mangle ip6table_nat nf_nat_ipv6 iptable_mangle ip6table_filter ip6_tables vhost_net tun vhost tap xt_nat macvlan ipt_MASQUERADE iptable_filter iptable_nat nf_nat_ipv4 nf_nat ip_tables xfs md_mod nct6775 hwmon_vid k10temp bonding sfc mdio igb(O) nvidia_drm(PO) nvidia_modeset(PO) nvidia(PO) edac_mce_amd crc32_pclmul pcbc aesni_intel aes_x86_64 glue_helper crypto_simd ghash_clmulni_intel cryptd drm_kms_helper drm kvm_amd kvm syscopyarea sysfillrect sysimgblt fb_sys_fops rsnvme(PO) agpgart i2c_piix4 ccp ahci i2c_core libahci wmi_bmof pcc_cpufreq nvme crct10dif_pclmul nvme_core wmi crc32c_intel button acpi_cpufreq [last unloaded: mdio]
Apr  2 08:38:04 MASTER kernel: CPU: 8 PID: 223 Comm: kworker/8:1 Tainted: P           O      4.19.107-Unraid #1
Apr  2 08:38:04 MASTER kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X570 Extreme4, BIOS P2.30 02/03/2020
Apr  2 08:38:04 MASTER kernel: Workqueue: events macvlan_process_broadcast [macvlan]
Apr  2 08:38:04 MASTER kernel: RIP: 0010:__nf_conntrack_confirm+0xa0/0x69e
Apr  2 08:38:04 MASTER kernel: Code: 04 e8 56 fb ff ff 44 89 f2 44 89 ff 89 c6 41 89 c4 e8 7f f9 ff ff 48 8b 4c 24 08 84 c0 75 af 48 8b 85 80 00 00 00 a8 08 74 26 <0f> 0b 44 89 e6 44 89 ff 45 31 f6 e8 95 f1 ff ff be 00 02 00 00 48
Apr  2 08:38:04 MASTER kernel: RSP: 0018:ffff888fde803d90 EFLAGS: 00010202
Apr  2 08:38:04 MASTER kernel: RAX: 0000000000000188 RBX: ffff8889945e9b00 RCX: ffff888e4bddf618
Apr  2 08:38:04 MASTER kernel: RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffffffff81e08fb4
Apr  2 08:38:04 MASTER kernel: RBP: ffff888e4bddf5c0 R08: 00000000e88d3a6f R09: ffffffff81c8aa80
Apr  2 08:38:04 MASTER kernel: R10: 0000000000000098 R11: ffff888f634f9400 R12: 000000000000ca6d
Apr  2 08:38:04 MASTER kernel: R13: ffffffff81e91080 R14: 0000000000000000 R15: 000000000000ac8f
Apr  2 08:38:04 MASTER kernel: FS:  0000000000000000(0000) GS:ffff888fde800000(0000) knlGS:0000000000000000
Apr  2 08:38:04 MASTER kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr  2 08:38:04 MASTER kernel: CR2: 0000146f70289000 CR3: 0000000fd4aa6000 CR4: 0000000000340ee0
Apr  2 08:38:04 MASTER kernel: Call Trace:
Apr  2 08:38:04 MASTER kernel: <IRQ>
Apr  2 08:38:04 MASTER kernel: ipv4_confirm+0xaf/0xb9
Apr  2 08:38:04 MASTER kernel: nf_hook_slow+0x3a/0x90
Apr  2 08:38:04 MASTER kernel: ip_local_deliver+0xad/0xdc
Apr  2 08:38:04 MASTER kernel: ? ip_sublist_rcv_finish+0x54/0x54
Apr  2 08:38:04 MASTER kernel: ip_rcv+0xa0/0xbe
Apr  2 08:38:04 MASTER kernel: ? ip_rcv_finish_core.isra.0+0x2e1/0x2e1
Apr  2 08:38:04 MASTER kernel: __netif_receive_skb_one_core+0x53/0x6f
Apr  2 08:38:04 MASTER kernel: process_backlog+0x77/0x10e
Apr  2 08:38:04 MASTER kernel: net_rx_action+0x107/0x26c
Apr  2 08:38:04 MASTER kernel: __do_softirq+0xc9/0x1d7
Apr  2 08:38:04 MASTER kernel: do_softirq_own_stack+0x2a/0x40
Apr  2 08:38:04 MASTER kernel: </IRQ>
Apr  2 08:38:04 MASTER kernel: do_softirq+0x4d/0x5a
Apr  2 08:38:04 MASTER kernel: netif_rx_ni+0x1c/0x22
Apr  2 08:38:04 MASTER kernel: macvlan_broadcast+0x111/0x156 [macvlan]
Apr  2 08:38:04 MASTER kernel: macvlan_process_broadcast+0xea/0x128 [macvlan]
Apr  2 08:38:04 MASTER kernel: process_one_work+0x16e/0x24f
Apr  2 08:38:04 MASTER kernel: worker_thread+0x1e2/0x2b8
Apr  2 08:38:04 MASTER kernel: ? rescuer_thread+0x2a7/0x2a7
Apr  2 08:38:04 MASTER kernel: kthread+0x10c/0x114
Apr  2 08:38:04 MASTER kernel: ? kthread_park+0x89/0x89
Apr  2 08:38:04 MASTER kernel: ret_from_fork+0x22/0x40
Apr  2 08:38:04 MASTER kernel: ---[ end trace eba31347ec0cb1fc ]---

Looks like something to do with nvidia again

JorgeB · April 2, 2020

Macvlan call traces are usually related to dockers with custom IP addresses.
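
If you want to check a saved syslog for this yourself, here is a rough sketch that tallies which kernel modules show up on the stack frames of each call trace. It assumes the log looks like the excerpt above, where a frame from a loadable module ends with the module name in square brackets, a trace starts at "Call Trace:" and stops at "---[ end trace". The function name `modules_in_call_traces` is made up for the example.

```python
import re
from collections import Counter

# Matches a module tag at the very end of a stack-frame line, e.g. "[macvlan]".
MODULE_TAG = re.compile(r"\[(\w+)\]\s*$")


def modules_in_call_traces(lines):
    """Count module tags on stack frames inside kernel 'Call Trace:' blocks."""
    counts = Counter()
    in_trace = False
    for line in lines:
        if "Call Trace:" in line:
            in_trace = True
            continue
        if "---[ end trace" in line:
            in_trace = False
            continue
        if in_trace:
            match = MODULE_TAG.search(line)
            if match:
                counts[match.group(1)] += 1
    return counts
```

Feed it the lines of the syslog (for example `open("syslog").read().splitlines()`, with whatever path your copy has). Frames from the core kernel carry no tag, so only loadable modules are counted, and lines outside a trace, such as the "Modules linked in:" list, are ignored.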

elcapitano · April 2, 2020

Right . .

I have switched the dockers that had their own IP to bridge mode . . except PiHole . . Couldn't get it to work without its own IP. Will look into it later.

I do have 1 entry in syslog:

Apr 2 16:22:53 MASTER kernel: igb 0000:09:00.0 eth1: mixed HW and IP checksum settings.

How do I fix that?
