SelfSD Posted September 30, 2020

My server has now crashed 3 days in a row due to out of memory problems. It happens overnight when I'm sleeping so I wake up to a hard locked OS, but today I managed to log in to it and copy the syslog. Sadly, running diagnostics just froze it completely, and even after an hour of waiting before force rebooting there was no ZIP file on the USB drive. I've attached the diagnostics from after the reboot. I know it's not very helpful for troubleshooting but it shows my system and configs at least.

It completed 1 pass of memtest this morning with no issues. I will have to wait until the weekend if I'm gonna run a day long check.

I thought the OOM issues were the problem so I tried to add a 128 GB swap file but it didn't help.

I'm by no means an expert at reading these call traces, but last time it was something about C states so I disabled C states in the BIOS. I know the C state issue mostly affects 1st generation Ryzen CPUs but I thought it could help. At least I'm not getting the C state error in the call traces any more, but it's still not working well... Maybe it's time to just retire this whole system and get some new hardware. ☹️

BTW, didn't Fix Common Problems have a diagnostic mode that would pull diagnostics every 15 minutes? I can't seem to find this option again.

syslog.log
unraid-diagnostics-20200930-0906.zip
JorgeB Posted September 30, 2020

Macvlan call traces are usually the result of dockers with a custom IP address; more info here.

For better stability it's also a good idea to respect the maximum RAM speed AMD officially supports for your config, which is 1333 MHz, not 1600.
SelfSD Posted September 30, 2020

Thanks for the link! I'll read through the topic and I'll turn down the memory clock.
SelfSD Posted September 30, 2020

The RAM has been manually set to 1333 MHz (the auto setting bumped it up to 1600 for some reason), and I've put my dockers on a separate VLAN, which seems to be the best solution from the thread you linked.

Now I just hope that the out of memory errors at the end were related to the call traces. 😕
SelfSD Posted October 1, 2020

I got another call trace after about 17 hours uptime. This time it's something else. Everything still seems to be running fine.

Oct 1 05:15:49 UnRaid kernel: general protection fault: 0000 [#1] SMP NOPTI
Oct 1 05:15:49 UnRaid kernel: CPU: 6 PID: 674 Comm: kswapd0 Tainted: P O 4.19.107-Unraid #1
Oct 1 05:15:49 UnRaid kernel: Hardware name: To be filled by O.E.M. To be filled by O.E.M./CROSSHAIR V FORMULA-Z, BIOS 2201 03/23/2015
Oct 1 05:15:49 UnRaid kernel: RIP: 0010:iput+0x87/0x154
Oct 1 05:15:49 UnRaid kernel: Code: 89 e6 e8 e5 80 50 00 85 c0 75 bc e9 df 00 00 00 48 8b 5d 28 a8 08 4c 8b 6b 30 74 0e 48 c7 c7 79 fc d2 81 e8 83 7b f2 ff 0f 0b <49> 8b 45 20 48 85 c0 74 29 48 89 ef e8 62 c7 89 00 85 c0 75 75 f6
Oct 1 05:15:49 UnRaid kernel: RSP: 0018:ffffc9000344bc08 EFLAGS: 00010246
Oct 1 05:15:49 UnRaid kernel: RAX: 0000000000000000 RBX: ffff88868e356000 RCX: 0000000000000000
Oct 1 05:15:49 UnRaid kernel: RDX: 0000000100000000 RSI: ffff888226874680 RDI: ffff888226874680
Oct 1 05:15:49 UnRaid kernel: RBP: ffff888226874600 R08: 0000000000000001 R09: ffffc9000344bb78
Oct 1 05:15:49 UnRaid kernel: R10: 0000000000000000 R11: ffff88881fb9fb40 R12: ffff888226874680
Oct 1 05:15:49 UnRaid kernel: R13: 7d808fd500000000 R14: ffffc9000344bc98 R15: ffff888222ce4cc0
Oct 1 05:15:49 UnRaid kernel: FS: 0000000000000000(0000) GS:ffff88881fb80000(0000) knlGS:0000000000000000
Oct 1 05:15:49 UnRaid kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 1 05:15:49 UnRaid kernel: CR2: 00007f091b0898c8 CR3: 00000006464e6000 CR4: 00000000000406e0
Oct 1 05:15:49 UnRaid kernel: Call Trace:
Oct 1 05:15:49 UnRaid kernel: __dentry_kill+0xcb/0x135
Oct 1 05:15:49 UnRaid kernel: shrink_dentry_list+0x149/0x185
Oct 1 05:15:49 UnRaid kernel: prune_dcache_sb+0x56/0x74
Oct 1 05:15:49 UnRaid kernel: super_cache_scan+0xee/0x16d
Oct 1 05:15:49 UnRaid kernel: do_shrink_slab+0x128/0x194
Oct 1 05:15:49 UnRaid kernel: shrink_slab+0x11b/0x276
Oct 1 05:15:49 UnRaid kernel: shrink_node+0x108/0x3cb
Oct 1 05:15:49 UnRaid kernel: kswapd+0x451/0x58a
Oct 1 05:15:49 UnRaid kernel: ? __switch_to_asm+0x41/0x70
Oct 1 05:15:49 UnRaid kernel: ? mem_cgroup_shrink_node+0xa4/0xa4
Oct 1 05:15:49 UnRaid kernel: kthread+0x10c/0x114
Oct 1 05:15:49 UnRaid kernel: ? kthread_park+0x89/0x89
Oct 1 05:15:49 UnRaid kernel: ret_from_fork+0x22/0x40
Oct 1 05:15:49 UnRaid kernel: Modules linked in: vhost_net tun vhost tap kvm_amd kvm ccp veth nvidia_uvm(O) xt_nat macvlan xt_CHECKSUM ipt_REJECT ip6table_mangle ip6table_nat nf_nat_ipv6 iptable_mangle ip6table_filter ip6_tables iptable_filter xfs md_mod it87 hwmon_vid iptable_nat ipt_MASQUERADE nf_nat_ipv4 nf_nat ip_tables wireguard ip6_udp_tunnel udp_tunnel bonding mlx4_en mlx4_core e1000e nvidia_drm(PO) nvidia_modeset(PO) nvidia(PO) edac_mce_amd crc32_pclmul pcbc aesni_intel aes_x86_64 glue_helper crypto_simd ghash_clmulni_intel cryptd drm_kms_helper drm syscopyarea sysfillrect sysimgblt fb_sys_fops mpt3sas fam15h_power mxm_wmi wmi_bmof agpgart k10temp crct10dif_pclmul wmi crc32c_intel i2c_piix4 ahci raid_class scsi_transport_sas i2c_core libahci button [last unloaded: kvm]
Oct 1 05:15:49 UnRaid kernel: ---[ end trace eda3ee69822f802e ]---
Oct 1 05:15:49 UnRaid kernel: RIP: 0010:iput+0x87/0x154
Oct 1 05:15:49 UnRaid kernel: Code: 89 e6 e8 e5 80 50 00 85 c0 75 bc e9 df 00 00 00 48 8b 5d 28 a8 08 4c 8b 6b 30 74 0e 48 c7 c7 79 fc d2 81 e8 83 7b f2 ff 0f 0b <49> 8b 45 20 48 85 c0 74 29 48 89 ef e8 62 c7 89 00 85 c0 75 75 f6
Oct 1 05:15:49 UnRaid kernel: RSP: 0018:ffffc9000344bc08 EFLAGS: 00010246
Oct 1 05:15:49 UnRaid kernel: RAX: 0000000000000000 RBX: ffff88868e356000 RCX: 0000000000000000
Oct 1 05:15:49 UnRaid kernel: RDX: 0000000100000000 RSI: ffff888226874680 RDI: ffff888226874680
Oct 1 05:15:49 UnRaid kernel: RBP: ffff888226874600 R08: 0000000000000001 R09: ffffc9000344bb78
Oct 1 05:15:49 UnRaid kernel: R10: 0000000000000000 R11: ffff88881fb9fb40 R12: ffff888226874680
Oct 1 05:15:49 UnRaid kernel: R13: 7d808fd500000000 R14: ffffc9000344bc98 R15: ffff888222ce4cc0
Oct 1 05:15:49 UnRaid kernel: FS: 0000000000000000(0000) GS:ffff88881fb80000(0000) knlGS:0000000000000000
Oct 1 05:15:49 UnRaid kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 1 05:15:49 UnRaid kernel: CR2: 00007f091b0898c8 CR3: 00000006464e6000 CR4: 00000000000406e0

Googling "kswapd0 Tainted" led me to this page, and following the steps there gives me the following:

cat /proc/sys/kernel/tainted
4225

linux-tools/kernel-tools is not installed and doesn't exist in the NerdPack plugin, so I can't run that.

for i in $(seq 18); do echo $(($i-1)) $(($(cat /proc/sys/kernel/tainted)>>($i-1)&1));done
0 1
1 0
2 0
3 0
4 0
5 0
6 0
7 1
8 0
9 0
10 0
11 0
12 1
13 0
14 0
15 0
16 0
17 0

If I'm reading the decoding table correctly, a proprietary module was loaded, the kernel died recently and an externally built module was loaded. This doesn't tell me much but maybe it can help finding out why it happened.

unraid-diagnostics-20201001-0839.zip
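For anyone repeating the taint check from the post above, the bit-by-bit loop can be wrapped in a small function that prints only the bits that are set. This is a minimal sketch and not an official tool: the function names `taint_bits` and `describe_bit` are made up for illustration, the value 4225 is the one quoted in the post, and only the three bits the poster identified are described (everything else is deferred to the kernel's own tainted-kernels table).

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of Unraid or the kernel tools.

# Print the positions of the set bits in a taint value, space separated.
# 18 bits are checked, matching the `seq 18` loop in the post.
taint_bits() {
    value=$1
    bit=0
    out=""
    while [ "$bit" -lt 18 ]; do
        if [ $(( (value >> bit) & 1 )) -eq 1 ]; then
            out="${out:+$out }$bit"
        fi
        bit=$((bit + 1))
    done
    echo "$out"
}

# Describe only the three bits discussed in the thread.
describe_bit() {
    case "$1" in
        0)  echo "proprietary module was loaded" ;;
        7)  echo "kernel died recently (OOPS or BUG)" ;;
        12) echo "externally built (out-of-tree) module was loaded" ;;
        *)  echo "see the kernel's tainted-kernels table" ;;
    esac
}

# 4225 = 4096 + 128 + 1, so bits 12, 7 and 0 are set.
for b in $(taint_bits 4225); do
    echo "bit $b: $(describe_bit "$b")"
done
```

On a live system the argument could be read straight from the kernel with `taint_bits "$(cat /proc/sys/kernel/tainted)"`, the same file the post reads.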
civic95man Posted October 1, 2020

9 hours ago, SelfSD said:
If I'm reading the decoding table correctly, a proprietary module was loaded

The 'tainted' keyword is because of the out-of-tree nvidia driver which was loaded. It's nothing bad or to be worried about - it just lets people know that you are using an "unsupported" configuration due to that oot driver, so kernel-level tech support would be limited.

The call trace appears to be a kernel bug relating to kswapd, and several other people are seeing this issue, according to Google. It *may* not be a critical issue and can be ignored, but I don't know for sure. You could always try the beta version to see if the newer kernel fixes things.
SelfSD Posted October 1, 2020

Thank you, that's good to hear. I've been holding off on the betas but maybe I should try the latest one. If the server crashes once more I'll jump on the new 29 beta.
civic95man Posted October 1, 2020

This current beta seems to be pretty solid for people, and for some it's the only way to support their new hardware. Just be sure to read up on the release notes because there were some significant changes.
SelfSD Posted October 2, 2020

I've been following the changes and I'm excited for it all! But I do wonder if that call trace messes with my dockers and VM. I'm not able to stop any of the dockers and they're all stuck on:

[s6-finish] sending all processes the KILL signal and exiting.

I managed to manually kill the dockers and stop the docker service, but now when I'm attempting to shut down the array, it's stuck on:

Oct 2 17:24:46 UnRaid root: Waiting on VMs to shutdown
Oct 2 17:24:46 UnRaid root: Stopping libvirtd...

I can't manually stop the process either as it's not running.

root@UnRaid:~# /etc/rc.d/rc.libvirt stop
libvirt is not running...

unraid-diagnostics-20201002-1731.zip
SelfSD Posted October 2, 2020

It got stuck on "turning off swap" even though I did not have a swapfile enabled. It generated a diagnostics file which I have attached.

I'm gonna give 6.9.0 beta 29 a try. Hopefully it will solve these weird issues.

unraid-diagnostics-20201002-1740.zip