cyberspectre Posted May 26, 2019
Hey everyone. My server had been solid up until recently, when I began experiencing new problems. Most importantly, the web UI stops working after a few days of uptime. Secondarily, and perhaps this is related, it's outputting messages I've never seen before, such as:
May 25 17:10:12 ANDRAS4 kernel: Call Trace:
May 25 17:10:12 ANDRAS4 kernel: lookup_fast+0x1d2/0x27a
May 25 17:10:12 ANDRAS4 kernel: path_openat+0x2b6/0xc07
May 25 17:10:12 ANDRAS4 kernel: ? filename_lookup.part.16+0xa5/0xcc
May 25 17:10:12 ANDRAS4 kernel: do_filp_open+0x4c/0xa9
May 25 17:10:12 ANDRAS4 kernel: ? _copy_to_user+0x22/0x28
May 25 17:10:12 ANDRAS4 kernel: do_sys_open+0x132/0x1ce
May 25 17:10:12 ANDRAS4 kernel: do_syscall_64+0x57/0xe6
May 25 17:10:12 ANDRAS4 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
May 25 17:10:12 ANDRAS4 kernel: RIP: 0033:0x145931420380
May 25 17:10:12 ANDRAS4 kernel: Code: 25 00 00 41 00 3d 00 00 41 00 74 36 48 8d 05 87 c3 0d 00 8b 00 85 c0 75 5a 89 f2 b8 01 01 00 00 48 89 fe bf 9c ff ff ff 0f 05 <48> 3d 00 f0 ff ff 0f 87 84 00 00 00 48 83 c4 68 5b 5d c3 0f 1f 44
May 25 17:10:12 ANDRAS4 kernel: RSP: 002b:00007ffcdba6f960 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
May 25 17:10:12 ANDRAS4 kernel: RAX: ffffffffffffffda RBX: 0000000000420f23 RCX: 0000145931420380
May 25 17:10:12 ANDRAS4 kernel: RDX: 0000000000000000 RSI: 0000000000664860 RDI: 00000000ffffff9c
May 25 17:10:12 ANDRAS4 kernel: RBP: 0000000000628270 R08: 0000000000000000 R09: 00000000ffffffff
May 25 17:10:12 ANDRAS4 kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000008
May 25 17:10:12 ANDRAS4 kernel: R13: 0000000000420f23 R14: 0000000000000000 R15: 00001459314fe4c0
May 25 17:10:12 ANDRAS4 kernel: Modules linked in: dm_mod dax xt_CHECKSUM iptable_mangle ipt_REJECT ebtable_filter ebtables ip6table_filter ip6_tables vhost_net tun vhost tap veth ipt_MASQUERADE iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 iptable_filter ip_tables nf_nat xfs md_mod it87 hwmon_vid igb i2c_algo_bit alx mdio edac_mce_amd kvm_amd kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd cryptd k10temp ahci i2c_piix4 glue_helper i2c_core libahci wmi_bmof mxm_wmi wmi ccp button pcc_cpufreq acpi_cpufreq [last unloaded: i2c_algo_bit]
May 25 17:10:12 ANDRAS4 kernel: ---[ end trace 08e156b27250f0bc ]---
May 25 17:10:12 ANDRAS4 kernel: RIP: 0010:__d_lookup_rcu+0x5a/0x12f
May 25 17:10:12 ANDRAS4 kernel: Code: 89 ea 41 89 ee d3 ea 48 8d 04 d0 48 8b 18 48 89 e8 48 c1 e8 20 48 89 04 24 48 83 e3 fe 48 85 db 0f 84 c4 00 00 00 4c 8d 6b f8 <44> 8b 63 fc 4c 39 7b 10 0f 85 aa 00 00 00 48 83 7b 08 00 0f 84 9f
May 25 17:10:12 ANDRAS4 kernel: RSP: 0018:ffffc90003f13bc0 EFLAGS: 00010282
May 25 17:10:12 ANDRAS4 kernel: RAX: 000000000000000c RBX: ffef88028cd56008 RCX: 000000000000000a
May 25 17:10:12 ANDRAS4 kernel: RDX: 000000000006f25d RSI: ffffc90003f13d50 RDI: ffff88028fcc3440
May 25 17:10:12 ANDRAS4 kernel: RBP: 0000000c1bc97651 R08: ffffc90003f13d50 R09: 8e04502e0637f6d7
May 25 17:10:12 ANDRAS4 kernel: R10: ffffc90003f13c2c R11: 8080808080808080 R12: ffffc90003f13c88
May 25 17:10:12 ANDRAS4 kernel: R13: ffef88028cd56000 R14: 000000001bc97651 R15: ffff88028fcc3440
May 25 17:10:12 ANDRAS4 kernel: FS: 00001459314fe540(0000) GS:ffff88082ec40000(0000) knlGS:0000000000000000
May 25 17:10:12 ANDRAS4 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 25 17:10:12 ANDRAS4 kernel: CR2: 00000000006ab000 CR3: 0000000282d2e000 CR4: 00000000003406e0
May 25 17:10:27 ANDRAS4 kernel: general protection fault: 0000 [#19] SMP NOPTI
May 25 17:10:27 ANDRAS4 kernel: CPU: 0 PID: 13062 Comm: lsof Tainted: G D 4.18.20-unRAID #1
May 25 17:10:27 ANDRAS4 kernel: Hardware name: Gigabyte Technology Co., Ltd. AX370-Gaming K7/AX370-Gaming K7, BIOS F10 12/07/2017
May 25 17:10:27 ANDRAS4 kernel: RIP: 0010:__d_lookup+0x3e/0x12b
May 25 17:10:27 ANDRAS4 kernel: Code: 51 44 8b 26 8b 0d fd 58 d5 00 48 8b 05 ee 58 d5 00 44 89 e2 d3 ea 48 8d 04 d0 48 8b 18 48 83 e3 fe 48 85 db 0f 84 dc 00 00 00 <44> 39 63 18 0f 85 ca 00 00 00 48 8d 43 50 48 89 c7 48 89 04 24 e8
May 25 17:10:27 ANDRAS4 kernel: RSP: 0018:ffffc90039777c80 EFLAGS: 00010206
May 25 17:10:27 ANDRAS4 kernel: RAX: ffffc9000047b378 RBX: 0010000000000000 RCX: 000000000000000a
May 25 17:10:27 ANDRAS4 kernel: RDX: 000000000006f46f RSI: ffffc90039777df0 RDI: ffff8807b58db980
May 25 17:10:27 ANDRAS4 kernel: RBP: ffff8808071f9800 R08: 61c8864680b583eb R09: ffff8807b58db980
May 25 17:10:27 ANDRAS4 kernel: R10: 0000000000000000 R11: 8080808080808080 R12: 000000001bd1bfbc
May 25 17:10:27 ANDRAS4 kernel: R13: ffffffffffffffff R14: ffffc90039777df0 R15: ffff8807b58db980
May 25 17:10:27 ANDRAS4 kernel: FS: 000014583c9ce540(0000) GS:ffff88082ec00000(0000) knlGS:0000000000000000
May 25 17:10:27 ANDRAS4 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 25 17:10:27 ANDRAS4 kernel: CR2: 00000000006ab000 CR3: 00000007b8c96000 CR4: 00000000003406f0
My diagnostics file is attached. Could someone take a look and let me know what's going on? Thanks!
Attachment: andras4-diagnostics-20190525-1645.zip
JorgeB Posted May 26, 2019
Are you using the Ryzen workarounds? https://forums.unraid.net/topic/80006-random-crashes-restarts/?do=findComment&comment=742911
cyberspectre Posted May 26, 2019 (edited)
13 hours ago, johnnie.black said:
> Are you using the Ryzen workarounds? https://forums.unraid.net/topic/80006-random-crashes-restarts/?do=findComment&comment=742911
I believe I've done all the workarounds. And, as I said, the system was stable with uptimes of several months until just recently. Here are the AMD optimizations I did. At least, all the ones I can remember. I should have made a note.
BIOS:
Global C-States set to DISABLED
Power Supply Idle set to TYPICAL
SVM set to ENABLE
UnRaid:
Set rcu_nocbs kernel parameter
Edited May 26, 2019 by cyberspectre
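Since the list above was written from memory, one way to confirm that the rcu_nocbs parameter is really in effect is to read the running kernel's command line. The following is a minimal sketch, assuming a Linux system where /proc/cmdline is readable; the helper names `parse_cmdline` and `rcu_nocbs_value` are hypothetical, and the example command line in the usage note is illustrative, not taken from this server.

```python
from pathlib import Path
from typing import Dict, Optional


def parse_cmdline(cmdline: str) -> Dict[str, Optional[str]]:
    """Split a kernel command line into {name: value}.

    Bare flags (no '=') map to None. This simple whitespace split does
    not handle quoted values that contain spaces.
    """
    params: Dict[str, Optional[str]] = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def rcu_nocbs_value(cmdline: str) -> Optional[str]:
    """Return the CPU list given to rcu_nocbs, or None if it is not set."""
    return parse_cmdline(cmdline).get("rcu_nocbs")


if __name__ == "__main__":
    proc_cmdline = Path("/proc/cmdline")
    if proc_cmdline.exists():
        value = rcu_nocbs_value(proc_cmdline.read_text())
        if value is None:
            print("rcu_nocbs is not on the running kernel's command line")
        else:
            print("rcu_nocbs is set to", value)
```

For instance, `rcu_nocbs_value("BOOT_IMAGE=/bzimage rcu_nocbs=0-15 initrd=/bzroot")` returns `"0-15"`. The BIOS settings cannot be checked this way and still have to be verified in the firmware setup screen.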
JorgeB Posted May 27, 2019
Try safe mode without any dockers/VMs for a few days. If it still stops responding, it's likely hardware related.
cyberspectre Posted May 27, 2019
15 hours ago, johnnie.black said:
> Try safe mode without any dockers/VMs for a few days, if it still stops responding it's likely hardware related.
Any idea what hardware I need to replace if it does?
cyberspectre Posted May 28, 2019
Sure enough, it happened again with no VMs or Dockers running. I ran memtest86+ and got no errors in the memory... Is it definitely hardware? What would cause this?
cyberspectre Posted May 28, 2019
51 minutes ago, cyberspectre said:
> Sure enough, it happened again with no VMs or Dockers running. I ran memtest86+ and got no errors in the memory... Is it definitely hardware? What would cause this?
Yikes, just kidding. I ran memtest86 for longer and did indeed get memory errors. Going to try to isolate which module now.
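With the memtest result in hand, the register dumps in the first post can be re-read for pointer values that are exactly one bit away from a valid address, which is a pattern a flipped memory bit can leave behind. The following is a rough sketch, assuming 48-bit virtual addressing (bits 63..47 of a valid address are all equal); the names `single_flip_bit` and `suspicious_registers` are hypothetical, and a hit is only a hint, since registers also hold ordinary data.

```python
import re
from typing import List, Optional, Tuple

# Matches register fields such as "RBX: ffef88028cd56008" in an oops dump.
REG_RE = re.compile(r"\b(R[A-Z0-9]{2}):\s*([0-9a-f]{16})\b")

TOP_MASK = 0x1FFFF  # the 17 bits 63..47


def single_flip_bit(value: int) -> Optional[int]:
    """Return the one bit index (47..63) whose flip would make `value`
    canonical under 48-bit addressing, or None if no single flip does
    (including when the value is already canonical)."""
    top = (value >> 47) & TOP_MASK
    ones = bin(top).count("1")
    if ones == 1:            # one stray 1 among zeros
        return 47 + top.bit_length() - 1
    if ones == 16:           # one stray 0 among ones
        missing = top ^ TOP_MASK
        return 47 + missing.bit_length() - 1
    return None


def suspicious_registers(log_text: str) -> List[Tuple[str, str, int]]:
    """List (register, hex value, bit index) for every dumped register
    that is a single bit flip away from a canonical address."""
    hits = []
    for name, hexval in REG_RE.findall(log_text):
        bit = single_flip_bit(int(hexval, 16))
        if bit is not None:
            hits.append((name, hexval, bit))
    return hits


if __name__ == "__main__":
    # Two register lines copied from the first post.
    sample = (
        "kernel: RAX: 000000000000000c RBX: ffef88028cd56008 RCX: 000000000000000a\n"
        "kernel: RAX: ffffc9000047b378 RBX: 0010000000000000 RCX: 000000000000000a\n"
    )
    for name, hexval, bit in suspicious_registers(sample):
        print(name, hexval, "differs from a canonical address at bit", bit)
```

Run over the two sample lines, this flags RBX in both dumps at the same bit position. That is consistent with a bad memory cell but does not prove it; isolating the module with memtest remains the real test.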