nraygun Posted April 9, 2023
I was doing some things in Plex (trying to add a cartoon series) and then all of a sudden the server crashed. I could not log into the GUI, but some of the shares still worked. And I could ssh into it. I managed to grab the dmesg before I did a reboot. I could not do a "powerdown" but I could do a "shutdown -h now". It's doing a parity check as I type this. Any ideas what happened here?

[3547852.627297] BUG: kernel NULL pointer dereference, address: 0000000000000000
[3547852.627882] #PF: supervisor write access in kernel mode
[3547852.628434] #PF: error_code(0x0002) - not-present page
[3547852.628975] PGD 0 P4D 0
[3547852.629511] Oops: 0002 [#1] PREEMPT SMP PTI
[3547852.630051] CPU: 8 PID: 19396 Comm: shfs Tainted: G W I 5.19.17-Unraid #2
[3547852.630621] Hardware name: Dell Inc. PowerEdge R710/XXXXXX, BIOS 6.6.0 05/22/2018
[3547852.631155] RIP: 0010:__rb_erase_color+0xe7/0x1ca
[3547852.631732] Code: 8b 6b 10 f6 45 00 01 75 2f 4c 8b 75 08 48 89 d8 48 89 ee 31 c9 48 83 c8 01 4c 89 ea 48 89 df 4c 89 73 10 48 89 5d 08 4c 89 f5 <49> 89 06 e8 d8 fe ff ff 2e e8 9a be 79 00 48 8b 45 10 48 85 c0 74
[3547852.632927] RSP: 0018:ffffc9000b057d18 EFLAGS: 00010286
[3547852.633522] RAX: ffff888cc8e3a621 RBX: ffff888cc8e3a620 RCX: 0000000000000000
[3547852.634153] RDX: ffff888370fb0cc8 RSI: ffff888cc8e3a1a0 RDI: ffff888cc8e3a620
[3547852.634759] RBP: 0000000000000000 R08: ffff888acf65f520 R09: ffffc9000b057cd0
[3547852.635361] R10: 00001466f75f7000 R11: 00001466f75f8000 R12: ffffffff811d547a
[3547852.636008] R13: ffff888370fb0cc8 R14: 0000000000000000 R15: ffff888cc8e3a600
[3547852.636646] FS: 000014670d18e6c0(0000) GS:ffff88902fb00000(0000) knlGS:0000000000000000
[3547852.637263] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[3547852.637876] CR2: 0000000000000000 CR3: 0000000855544004 CR4: 00000000000226e0
[3547852.638513] Call Trace:
[3547852.639169] <TASK>
[3547852.639824] __do_munmap+0x1c0/0x2e2
[3547852.640466] mmap_region+0x10d/0x45d
[3547852.641070] do_mmap+0x3c1/0x42d
[3547852.641664] vm_mmap_pgoff+0xbb/0x112
[3547852.642255] ksys_mmap_pgoff+0x138/0x166
[3547852.642835] do_syscall_64+0x6b/0x81
[3547852.643407] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[3547852.643971] RIP: 0033:0x14670d7a47f3
[3547852.644526] Code: ef e8 61 b8 ff ff eb e7 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 89 ca 41 f7 c1 ff 0f 00 00 75 14 b8 09 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 25 c3 0f 1f 40 00 48 8b 05 d9 95 0d 00 64 c7
[3547852.645726] RSP: 002b:000014670d18db88 EFLAGS: 00000246 ORIG_RAX: 0000000000000009
[3547852.646339] RAX: ffffffffffffffda RBX: 00005641df847b50 RCX: 000014670d7a47f3
[3547852.646954] RDX: 0000000000000003 RSI: 0000000000001000 RDI: 0000000000000000
[3547852.647569] RBP: 00005641df847a20 R08: 00000000ffffffff R09: 0000000000000000
[3547852.648180] R10: 0000000000000022 R11: 0000000000000246 R12: 00000000000000b8
[3547852.648787] R13: 00005641df847a88 R14: 0000000000000000 R15: 000014670d18dc80
[3547852.649398] </TASK>
[3547852.650000] Modules linked in: vhost_net vhost tap kvm_intel kvm macvlan md_mod tls xt_mark xt_CHECKSUM ipt_REJECT nf_reject_ipv4 ip6table_mangle ip6table_nat iptable_mangle tun vhost_iotlb veth xt_nat xt_tcpudp xt_conntrack nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype br_netfilter xfs nfsd auth_rpcgss oid_registry lockd grace sunrpc ipmi_devintf iptable_nat xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libchacha ip6table_filter ip6_tables iptable_filter ip_tables x_tables bridge stp llc bnx2 mgag200 drm_shmem_helper i2c_algo_bit drm_kms_helper drm sr_mod cdrom ipmi_ssif backlight intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel crypto_simd cryptd intel_cstate i2c_core input_leds syscopyarea intel_uncore mpt3sas led_class joydev ahci sysfillrect ata_piix sysimgblt fb_sys_fops
[3547852.650111] libahci raid_class scsi_transport_sas wmi ipmi_si acpi_power_meter button acpi_cpufreq unix [last unloaded: md_mod]
[3547852.656724] CR2: 0000000000000000
[3547852.658803] ---[ end trace 0000000000000000 ]---
[3547852.674564] RIP: 0010:__rb_erase_color+0xe7/0x1ca
[3547852.675323] Code: 8b 6b 10 f6 45 00 01 75 2f 4c 8b 75 08 48 89 d8 48 89 ee 31 c9 48 83 c8 01 4c 89 ea 48 89 df 4c 89 73 10 48 89 5d 08 4c 89 f5 <49> 89 06 e8 d8 fe ff ff 2e e8 9a be 79 00 48 8b 45 10 48 85 c0 74
[3547852.676900] RSP: 0018:ffffc9000b057d18 EFLAGS: 00010286
[3547852.677696] RAX: ffff888cc8e3a621 RBX: ffff888cc8e3a620 RCX: 0000000000000000
[3547852.678495] RDX: ffff888370fb0cc8 RSI: ffff888cc8e3a1a0 RDI: ffff888cc8e3a620
[3547852.679286] RBP: 0000000000000000 R08: ffff888acf65f520 R09: ffffc9000b057cd0
[3547852.680055] R10: 00001466f75f7000 R11: 00001466f75f8000 R12: ffffffff811d547a
[3547852.680811] R13: ffff888370fb0cc8 R14: 0000000000000000 R15: ffff888cc8e3a600
[3547852.681565] FS: 000014670d18e6c0(0000) GS:ffff88902fb00000(0000) knlGS:0000000000000000
[3547852.682360] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[3547852.683168] CR2: 0000000000000000 CR3: 0000000855544004 CR4: 00000000000226e0
[3548148.759343] br0: port 2(vnet1) entered disabled state
[3548148.762004] device vnet1 left promiscuous mode
[3548148.762668] br0: port 2(vnet1) entered disabled state
[3548940.712053] elogind-daemon[1579]: New session c20 of user root.
[3549039.152378] elogind-daemon[1579]: Removed session c20.

Attachment: flores-diagnostics-20230408-2107.zip
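When reading an oops like the one in the first post, the lines that usually matter first are the BUG/Oops headers, the CPU/PID/Comm line (which names the process that was running, here shfs), and the kernel-mode RIP line (which names the faulting function). A minimal sketch of a filter for those lines follows; the helper name oops_summary is hypothetical, and it assumes the log has one kernel message per line, as dmesg prints it.

```shell
#!/bin/bash
# Print only the lines of a kernel log that identify an oops:
# the BUG/Oops headers, the CPU/PID/Comm line, and the kernel-mode RIP.
# Reads log text on stdin. "oops_summary" is a hypothetical helper name.
oops_summary() {
    grep -E 'BUG:|Oops:|Comm:|RIP: 0010:'
}

# Usage on a live system (assumes dmesg is readable by the current user):
#   dmesg | oops_summary
# Or against a saved copy of the log:
#   oops_summary < dmesg.txt
```

The pattern deliberately skips the "RIP: 0033:" line, which is the user-space instruction pointer and rarely helps identify a kernel bug.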
trwolff04 Posted April 9, 2023 (edited April 9, 2023)
I have a very similar issue, with the same message in my log, although mine shows CPU 26 on the Tainted line and mentions qbittorrent. It's almost the same server (I am running a PowerEdge R720), and I have an open thread about it as well. Thoughts?
nraygun (Author) Posted April 9, 2023 (edited April 10, 2023)
@trwolff04 Seems like I am having more problems since going to unRaid 6.11.x. No problems before. In addition to this crash condition, my DelugeVPN goes down overnight sometimes (every day or two). A simple restart brings it back up. I'm trying to look into that as well in a separate posting.
trwolff04 Posted April 10, 2023 (edited April 10, 2023)
I'm having the same issue, except with haugene qbittorrent w/ VPN, though normally my GUI crashes or at least partially crashes when it happens. No issues on earlier 6.10.x builds.
nraygun (Author) Posted April 10, 2023
Hmmm, a similar sort of thing is happening for two users: bittorrent running, the server crashing, and a partially inoperative GUI. Do you know if anyone else is having these seemingly related problems? Also, I'm starting to think it's a kernel bug of some sort. Google shows a handful of reports of this null pointer dereference. How can we get the Limetech folks involved in this? Or do we need to jump on another thread?
nraygun (Author) Posted April 10, 2023 (edited April 10, 2023)
Thanks @JorgeB! My server is not, thankfully, crashing as much as others are reporting. I guess what we're seeing here is that my delugevpn errors and the server crashes may be related. I'll have to look through this thread for the version of binhex/arch-delugevpn that uses libtorrent 1.x. I'm not sure I want to switch out the whole bittorrent VPN container for another one. That'll be my last resort. I'm a relative noob on this stuff.
nraygun (Author) Posted April 10, 2023
I didn't see any recommendations for binhex/arch-delugevpn users so I asked for help in the thread JorgeB provided.
trwolff04 Posted April 10, 2023 (edited April 10, 2023)
qbittorrent has always run better for me, and binhex has a version available for that. You need to pull version 4.3.9-2-01, as that is the last one to use libtorrent v1. Mine is running fine again, though I'll know for sure in a week if it doesn't crash.
Through ssh:
docker pull binhex/arch-qbittorrentvpn:4.3.9-2-01
In Unraid Apps, search for and install binhex qbittorrentvpn. Under the container options, specify "binhex/arch-qbittorrentvpn:4.3.9-2-01" next to Repository and it should work fine. I know you don't want to switch, but I can confirm mine is operational.
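The point of the advice above is to pin an explicit image tag so a later update does not bring libtorrent v2 back. A minimal sketch of a guard around docker pull follows; is_pinned and pull_pinned are hypothetical helper names, not part of Docker, and the check is a simplification that does not handle registry addresses containing a port.

```shell
#!/bin/bash
# Succeed only if the image reference carries an explicit tag other than "latest".
# Simplification: a reference such as host:5000/image would be misread as tagged.
is_pinned() {
    case "$1" in
        *:latest) return 1 ;;
        *:*)      return 0 ;;
        *)        return 1 ;;
    esac
}

# Pull an image only when its tag is pinned; refuse otherwise.
pull_pinned() {
    if ! is_pinned "$1"; then
        echo "refusing to pull unpinned image: $1" >&2
        return 1
    fi
    docker pull "$1"
}

# Usage, with the tag quoted in the post above:
#   pull_pinned "binhex/arch-qbittorrentvpn:4.3.9-2-01"
```

The same pinned string is what goes in the Repository field of the container template, so the GUI and the command line refer to the same image.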
nraygun (Author) Posted April 10, 2023 (edited April 11, 2023)
@trwolff04 Did you just recently go to this version in an effort to resolve the server crashes, or have you been getting server crashes even with this version of qbittorrent? I just went ahead and got qbittorrent operational. Wasn't too bad. Testing it now.
trwolff04 Posted April 11, 2023 (edited April 11, 2023)
I downgraded today to the last version of binhex qbittorrent (4.3.9-2-01) that used libtorrent v1, specifically because I was having docker crashes, and sometimes full GUI crashes, anywhere between 24 hours and 8 days after starting my server. This version seems to fix it, but I won't know for sure until another week passes.
nraygun (Author) Posted April 11, 2023
56 minutes ago, trwolff04 said: I downgraded today to the last version of binhex qbittorrent (4.3.9-2-01) that used libtorrent v1, specifically because I was having docker crashes, and sometimes full GUI crashes, anywhere between 24 hours and 8 days after starting my server. This version seems to fix it, but I won't know for sure until another week passes.
Yep, same here. Now we wait! (I'll go update the other post to let them know I went to qbittorrent 4.3.9-2-01.)
nraygun (Author) Posted April 17, 2023
Hey @trwolff04 - All good here, how about you? No crashes (although my machine didn't crash much - maybe a few times), and qbittorrent is up and stable. The UI is good too.
nraygun (Author) Posted April 21, 2023 (edited April 21, 2023)
Had a crash - sort of. I removed one of my drives from the array using a method from SpaceInvader One. That went fine. At some point during this process, I got what appears to be a crash:
CPU: 22 PID: 18564 Comm: kworker/22:0 Tainted: G I 5.19.17-Unraid #2
Workqueue: events macvlan_process_broadcast [macvlan]
Everything seems fine and I didn't notice this crash until I looked in the logs. I've seen mention of this type of crash in other threads. It's rebuilding the parity since I removed a drive. Once this is done, I'll reboot one more time and monitor.
JorgeB Posted April 21, 2023
Try switching to ipvlan (Settings -> Docker Settings -> Docker custom network type -> ipvlan (advanced view must be enabled, top right)).
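After changing the custom network type in the GUI, it can help to confirm from the command line which driver the Docker network is actually using. The sketch below wraps docker network inspect with a Go-template format string; network_driver is a hypothetical helper name, and the network name br0 in the usage line is an assumption based on the bridge name that appears in the first post's log.

```shell
#!/bin/bash
# Print the driver (for example macvlan or ipvlan) of a Docker network.
# "network_driver" is a hypothetical helper; pass the network name as $1.
network_driver() {
    docker network inspect --format '{{.Driver}}' "$1"
}

# Usage (assumes the custom network is named br0; list names with "docker network ls"):
#   network_driver br0
```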
nraygun (Author) Posted April 21, 2023
5 hours ago, JorgeB said: Try switching to ipvlan (Settings -> Docker Settings -> Docker custom network type -> ipvlan (advanced view must be enabled, top right)).
I had tried that and it caused my DuckDNS container to stop working. Putting it back to macvlan restored operation. Maybe once the parity is done building, I'll try it again with a reboot.
nraygun (Author) Posted April 22, 2023
Parity done, changed to ipvlan, rebooted - all good I think. There's an entry about a kernel bug that I'll post in a different thread, but I think the macvlan crash should be OK now.
nraygun (Author) Posted April 23, 2023
12 hours ago, nraygun said: Parity done, changed to ipvlan, rebooted - all good I think. There's an entry about a kernel bug that I'll post in a different thread, but I think the macvlan crash should be OK now.
Nope. Using ipvlan renders my server inoperable. My server can't ping hosts and my DuckDNS no longer updates. Going back to macvlan restored all operation. While I get what appear to be crashes, my server stays operational. There's this post on setting up a separate network on a separate NIC that I might look into: https://forums.unraid.net/topic/137048-guide-how-to-solve-macvlan-and-ipvlan-issues-with-containers-on-a-custom-network/
But as far as the crashes from libtorrent go, I think I'm good using the old qbittorrent container with libtorrent 1.x.