• Crashes since updating to v6.11.x for qBittorrent and Deluge users


    JorgeB
    • Closed

    EDIT: the issue was traced to libtorrent 2.x; it's not an Unraid problem. More info in this post:

     

    https://forums.unraid.net/bug-reports/stable-releases/crashes-since-updating-to-v611x-for-qbittorrent-and-deluge-users-r2153/?do=findComment&comment=21671

     

     

    Original Post:

     

    I'm creating this to better track an issue that some users have been reporting, where Unraid started crashing after updating to v6.11.x (it happens with both 6.11.0 and 6.11.1). A very similar call trace is logged in all cases, e.g.:

     

    Oct 12 04:18:27 zaBOX kernel: BUG: kernel NULL pointer dereference, address: 00000000000000b6
    Oct 12 04:18:27 zaBOX kernel: #PF: supervisor read access in kernel mode
    Oct 12 04:18:27 zaBOX kernel: #PF: error_code(0x0000) - not-present page
    Oct 12 04:18:27 zaBOX kernel: PGD 0 P4D 0
    Oct 12 04:18:27 zaBOX kernel: Oops: 0000 [#1] PREEMPT SMP PTI
    Oct 12 04:18:27 zaBOX kernel: CPU: 4 PID: 28596 Comm: Disk Tainted: P     U  W  O      5.19.14-Unraid #1
    Oct 12 04:18:27 zaBOX kernel: Hardware name: Gigabyte Technology Co., Ltd. Z390 AORUS PRO WIFI/Z390 AORUS PRO WIFI-CF, BIOS F12 11/05/2021
    Oct 12 04:18:27 zaBOX kernel: RIP: 0010:folio_try_get_rcu+0x0/0x21
    Oct 12 04:18:27 zaBOX kernel: Code: e8 8e 61 63 00 48 8b 84 24 80 00 00 00 65 48 2b 04 25 28 00 00 00 74 05 e8 9e 9b 64 00 48 81 c4 88 00 00 00 5b c3 cc cc cc cc <8b> 57 34 85 d2 74 10 8d 4a 01 89 d0 f0 0f b1 4f 34 74 04 89 c2 eb
    Oct 12 04:18:27 zaBOX kernel: RSP: 0000:ffffc900070dbcc0 EFLAGS: 00010246
    Oct 12 04:18:27 zaBOX kernel: RAX: 0000000000000082 RBX: 0000000000000082 RCX: 0000000000000082
    Oct 12 04:18:27 zaBOX kernel: RDX: 0000000000000001 RSI: ffff888757426fe8 RDI: 0000000000000082
    Oct 12 04:18:27 zaBOX kernel: RBP: 0000000000000000 R08: 0000000000000028 R09: ffffc900070dbcd0
    Oct 12 04:18:27 zaBOX kernel: R10: ffffc900070dbcd0 R11: ffffc900070dbd48 R12: 0000000000000000
    Oct 12 04:18:27 zaBOX kernel: R13: ffff88824f95d138 R14: 000000000007292c R15: ffff88824f95d140
    Oct 12 04:18:27 zaBOX kernel: FS:  000014ed38204b38(0000) GS:ffff8888a0500000(0000) knlGS:0000000000000000
    Oct 12 04:18:27 zaBOX kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    Oct 12 04:18:27 zaBOX kernel: CR2: 00000000000000b6 CR3: 0000000209854005 CR4: 00000000003706e0
    Oct 12 04:18:27 zaBOX kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    Oct 12 04:18:27 zaBOX kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Oct 12 04:18:27 zaBOX kernel: Call Trace:
    Oct 12 04:18:27 zaBOX kernel: <TASK>
    Oct 12 04:18:27 zaBOX kernel: __filemap_get_folio+0x98/0x1ff
    Oct 12 04:18:27 zaBOX kernel: ? _raw_spin_unlock_irqrestore+0x24/0x3a
    Oct 12 04:18:27 zaBOX kernel: filemap_fault+0x6e/0x524
    Oct 12 04:18:27 zaBOX kernel: __do_fault+0x2d/0x6e
    Oct 12 04:18:27 zaBOX kernel: __handle_mm_fault+0x9a5/0xc7d
    Oct 12 04:18:27 zaBOX kernel: handle_mm_fault+0x113/0x1d7
    Oct 12 04:18:27 zaBOX kernel: do_user_addr_fault+0x36a/0x514
    Oct 12 04:18:27 zaBOX kernel: exc_page_fault+0xfc/0x11e
    Oct 12 04:18:27 zaBOX kernel: asm_exc_page_fault+0x22/0x30
    Oct 12 04:18:27 zaBOX kernel: RIP: 0033:0x14ed3a0ae7b5
    Oct 12 04:18:27 zaBOX kernel: Code: 8b 48 08 48 8b 32 48 8b 00 48 39 f0 73 09 48 8d 14 08 48 39 d6 eb 0c 48 39 c6 73 0b 48 8d 14 0e 48 39 d0 73 02 0f 0b 48 89 c7 <f3> a4 66 48 8d 3d 59 b7 22 00 66 66 48 e8 d9 d8 f6 ff 48 89 28 48
    Oct 12 04:18:27 zaBOX kernel: RSP: 002b:000014ed38203960 EFLAGS: 00010206
    Oct 12 04:18:27 zaBOX kernel: RAX: 000014ed371aa160 RBX: 000014ed38203ad0 RCX: 0000000000004000
    Oct 12 04:18:27 zaBOX kernel: RDX: 000014c036530000 RSI: 000014c03652c000 RDI: 000014ed371aa160
    Oct 12 04:18:27 zaBOX kernel: RBP: 0000000000000000 R08: 0000000000000000 R09: 000014ed38203778
    Oct 12 04:18:27 zaBOX kernel: R10: 0000000000000008 R11: 0000000000000246 R12: 0000000000000000
    Oct 12 04:18:27 zaBOX kernel: R13: 000014ed38203b40 R14: 000014ed384fe940 R15: 000014ed38203ac0
    Oct 12 04:18:27 zaBOX kernel: </TASK>
    Oct 12 04:18:27 zaBOX kernel: Modules linked in: macvlan xt_CHECKSUM ipt_REJECT nf_reject_ipv4 ip6table_mangle ip6table_nat iptable_mangle vhost_net vhost vhost_iotlb tap tun veth xt_nat xt_tcpudp xt_conntrack nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype br_netfilter xfs md_mod kvmgt mdev i915 iosf_mbi drm_buddy i2c_algo_bit ttm drm_display_helper intel_gtt agpgart hwmon_vid iptable_nat xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libchacha ip6table_filter ip6_tables iptable_filter ip_tables x_tables af_packet 8021q garp mrp bridge stp llc bonding tls ipv6 nvidia_drm(PO) nvidia_modeset(PO) nvidia(PO) x86_pkg_temp_thermal intel_powerclamp drm_kms_helper btusb btrtl i2c_i801 btbcm coretemp gigabyte_wmi wmi_bmof intel_wmi_thunderbolt mxm_wmi kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel crypto_simd cryptd
    Oct 12 04:18:27 zaBOX kernel: btintel rapl intel_cstate intel_uncore e1000e i2c_smbus bluetooth drm nvme nvme_core ahci i2c_core libahci ecdh_generic ecc syscopyarea sysfillrect input_leds sysimgblt led_class joydev nzxt_kraken2 intel_pch_thermal fb_sys_fops thermal fan video tpm_crb wmi tpm_tis backlight tpm_tis_core tpm acpi_pad button unix
    Oct 12 04:18:27 zaBOX kernel: CR2: 00000000000000b6
    Oct 12 04:18:27 zaBOX kernel: ---[ end trace 0000000000000000 ]---

     

    Another example with very different hardware:

    Oct 11 21:32:08 Impulse kernel: BUG: kernel NULL pointer dereference, address: 0000000000000056
    Oct 11 21:32:08 Impulse kernel: #PF: supervisor read access in kernel mode
    Oct 11 21:32:08 Impulse kernel: #PF: error_code(0x0000) - not-present page
    Oct 11 21:32:08 Impulse kernel: PGD 0 P4D 0
    Oct 11 21:32:08 Impulse kernel: Oops: 0000 [#1] PREEMPT SMP NOPTI
    Oct 11 21:32:08 Impulse kernel: CPU: 1 PID: 5236 Comm: Disk Not tainted 5.19.14-Unraid #1
    Oct 11 21:32:08 Impulse kernel: Hardware name: System manufacturer System Product Name/ROG STRIX B450-F GAMING II, BIOS 4301 03/04/2021
    Oct 11 21:32:08 Impulse kernel: RIP: 0010:folio_try_get_rcu+0x0/0x21
    Oct 11 21:32:08 Impulse kernel: Code: e8 8e 61 63 00 48 8b 84 24 80 00 00 00 65 48 2b 04 25 28 00 00 00 74 05 e8 9e 9b 64 00 48 81 c4 88 00 00 00 5b e9 cc 5f 86 00 <8b> 57 34 85 d2 74 10 8d 4a 01 89 d0 f0 0f b1 4f 34 74 04 89 c2 eb
    Oct 11 21:32:08 Impulse kernel: RSP: 0000:ffffc900026ffcc0 EFLAGS: 00010246
    Oct 11 21:32:08 Impulse kernel: RAX: 0000000000000022 RBX: 0000000000000022 RCX: 0000000000000022
    Oct 11 21:32:08 Impulse kernel: RDX: 0000000000000001 RSI: ffff88801e450b68 RDI: 0000000000000022
    Oct 11 21:32:08 Impulse kernel: RBP: 0000000000000000 R08: 000000000000000c R09: ffffc900026ffcd0
    Oct 11 21:32:08 Impulse kernel: R10: ffffc900026ffcd0 R11: ffffc900026ffd48 R12: 0000000000000000
    Oct 11 21:32:08 Impulse kernel: R13: ffff888428441cb8 R14: 00000000000028cd R15: ffff888428441cc0
    Oct 11 21:32:08 Impulse kernel: FS:  00001548d34fa6c0(0000) GS:ffff88842e840000(0000) knlGS:0000000000000000
    Oct 11 21:32:08 Impulse kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    Oct 11 21:32:08 Impulse kernel: CR2: 0000000000000056 CR3: 00000001a3fe6000 CR4: 00000000003506e0
    Oct 11 21:32:08 Impulse kernel: Call Trace:
    Oct 11 21:32:08 Impulse kernel: <TASK>
    Oct 11 21:32:08 Impulse kernel: __filemap_get_folio+0x98/0x1ff
    Oct 11 21:32:08 Impulse kernel: filemap_fault+0x6e/0x524
    Oct 11 21:32:08 Impulse kernel: __do_fault+0x30/0x6e
    Oct 11 21:32:08 Impulse kernel: __handle_mm_fault+0x9a5/0xc7d
    Oct 11 21:32:08 Impulse kernel: handle_mm_fault+0x113/0x1d7
    Oct 11 21:32:08 Impulse kernel: do_user_addr_fault+0x36a/0x514
    Oct 11 21:32:08 Impulse kernel: exc_page_fault+0xfc/0x11e
    Oct 11 21:32:08 Impulse kernel: asm_exc_page_fault+0x22/0x30
    Oct 11 21:32:08 Impulse kernel: RIP: 0033:0x1548dbc04741
    Oct 11 21:32:08 Impulse kernel: Code: 48 01 d0 eb 1b 0f 1f 40 00 f3 0f 1e fa 48 39 d1 0f 82 73 28 fc ff 0f 1f 00 f3 0f 1e fa 48 89 f8 48 83 fa 20 0f 82 af 00 00 00 <c5> fe 6f 06 48 83 fa 40 0f 87 3e 01 00 00 c5 fe 6f 4c 16 e0 c5 fe
    Oct 11 21:32:08 Impulse kernel: RSP: 002b:00001548d34f9808 EFLAGS: 00010202
    Oct 11 21:32:08 Impulse kernel: RAX: 000015480c010d30 RBX: 000015480c018418 RCX: 00001548d34f9a40
    Oct 11 21:32:08 Impulse kernel: RDX: 0000000000004000 RSI: 000015471f8cd50f RDI: 000015480c010d30
    Oct 11 21:32:08 Impulse kernel: RBP: 0000000000000000 R08: 0000000000000003 R09: 0000000000000000
    Oct 11 21:32:08 Impulse kernel: R10: 0000000000000008 R11: 0000000000000246 R12: 0000000000000000
    Oct 11 21:32:08 Impulse kernel: R13: 00001548d34f9ac0 R14: 0000000000000003 R15: 0000154814013d10
    Oct 11 21:32:08 Impulse kernel: </TASK>
    Oct 11 21:32:08 Impulse kernel: Modules linked in: xt_connmark xt_comment iptable_raw wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libchacha xt_mark xt_nat xt_CHECKSUM ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_mangle ip6table_nat iptable_mangle vhost_net tun vhost vhost_iotlb tap veth xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter xfs md_mod ip6table_filter ip6_tables iptable_filter ip_tables x_tables af_packet 8021q garp mrp bridge stp llc ipv6 mlx4_en mlx4_core igb i2c_algo_bit edac_mce_amd edac_core kvm_amd kvm wmi_bmof mxm_wmi asus_wmi_sensors crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel mpt3sas aesni_intel crypto_simd nvme cryptd ahci i2c_piix4 raid_class rapl k10temp i2c_core nvme_core ccp scsi_transport_sas libahci wmi button acpi_cpufreq unix [last unloaded: mlx4_core]
    Oct 11 21:32:08 Impulse kernel: CR2: 0000000000000056
    Oct 11 21:32:08 Impulse kernel: ---[ end trace 0000000000000000 ]---

     

    So they always start with this (the address at the end will vary):

     

    Oct 11 05:02:02 Cogsworth kernel: BUG: kernel NULL pointer dereference, address: 0000000000000076

     

    and always have this:

     

    Oct 11 05:02:02 Cogsworth kernel: Call Trace:
    Oct 11 05:02:02 Cogsworth kernel: <TASK>
    Oct 11 05:02:02 Cogsworth kernel: __filemap_get_folio+0x98/0x1ff
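
     

    To check whether a crash matches this pattern, a quick sketch (assuming a syslog copy survives the reboot, e.g. via a syslog mirror to the flash drive; the sample file below is fabricated for illustration):

    ```shell
    # Sample of the signature (fabricated lines for illustration):
    cat > /tmp/syslog-sample <<'EOF'
    Oct 11 05:02:02 Cogsworth kernel: BUG: kernel NULL pointer dereference, address: 0000000000000076
    Oct 11 05:02:02 Cogsworth kernel: Call Trace:
    Oct 11 05:02:02 Cogsworth kernel: <TASK>
    Oct 11 05:02:02 Cogsworth kernel: __filemap_get_folio+0x98/0x1ff
    EOF
    # Does the trace match this issue? Point the path at your saved syslog copy.
    # Prints the number of NULL-dereference traces that go through __filemap_get_folio.
    grep -A 3 'BUG: kernel NULL pointer dereference' /tmp/syslog-sample \
      | grep -c '__filemap_get_folio'
    ```

    If the count is non-zero, the crash most likely matches this report.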

     

    The fact that it's happening to various users with very different hardware, both Intel and AMD, makes me think it's not a hardware/firmware issue, so we can try to find anything they are running in common. These are the plugins I've found in common between the 4 or 5 cases so far; they are some of the most used plugins, so it's not surprising they are installed in all of them, but that also makes them easy to rule out:

     

    ca.backup2.plg - 2022.07.23  (Up to date)
    community.applications.plg - 2022.09.30  (Up to date)
    dynamix.active.streams.plg - 2020.06.17  (Up to date)
    file.activity.plg - 2022.08.19  (Up to date)
    fix.common.problems.plg - 2022.10.09  (Up to date)
    unassigned.devices.plg - 2022.10.03  (Up to date)
    unassigned.devices-plus.plg - 2022.08.19  (Up to date)

     

    So anyone having this issue, please try temporarily uninstalling/disabling these plugins to see if there's any difference.
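
     

    For reference, a scripted way to step through removing them. This is only a sketch: it prints the commands for Unraid's `plugin` utility rather than running them, so they can be reviewed first; removing them via the web UI's Plugins page is equivalent.

    ```shell
    # Print (not run) a removal command for each plugin listed above,
    # so they can be reviewed and executed one at a time.
    for p in ca.backup2 community.applications dynamix.active.streams \
             file.activity fix.common.problems unassigned.devices \
             unassigned.devices-plus; do
      echo "plugin remove $p.plg"
    done
    ```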




    User Feedback

    Recommended Comments



    1 hour ago, naebula said:

    I ran into the same issue yesterday while upgrading to 6.11.2 from 6.10.3. Even rolling back to 6.10.3 led to the GUI becoming unresponsive, and I had to power-button shut down many times. 

     

    Here's what I tried that did NOT work:

    • Reverting back to 6.10.3
    • Disabling docker (both on 6.10.3 and 6.11.2)
    • Completely deleting docker vDisk image and disabling docker
    • Rebuilding vDisk then upgrading to 6.11.2 
    • Removing the plugins listed in the first post
    • Removing all plugins (including CA)
    • Disabling VMs (I did this from the very beginning)
    • Formatting flash and recreating using USB Media Creator, then copying entire config folder

     

    The only solution that ultimately worked was to format and recreate the USB flash, then selectively copy back the bare minimum. Here's how I did that. 

    Format USB Flash, rewrite using USB Creator Tool, then copy only select files/folders:

    • key file
    • /config/shares
    • /config/pools

    Then I started it up and had to recreate all my users (not sure what I needed to copy to retain these settings).

    Then I installed a few docker containers via Apps and realized I needed all my templates, so I grabbed them from /config/plugins/dockerMan/templates-user.

     

    Needless to say, this was all I did on Saturday and I'm still rebuilding it now.

     

    Things I did not test that I still wonder about:

    • Could my cloudflare tunnel have caused it? I set this up since the last update. One side effect I kept seeing was a 502 bad gateway error when I tried to view logs via the browser, so it made me think it was network related. Here's how I set up the tunnel: https://docs.ibracorp.io/cloudflare-tunnel/ 
    • I also don't have SSL enabled for the GUI - again, network related, but not sure if that was it. 

     

    Nov  5 16:29:08 Tower kernel: BUG: kernel NULL pointer dereference, address: 0000000000000123
    Nov  5 16:29:08 Tower kernel: #PF: supervisor write access in kernel mode
    Nov  5 16:29:08 Tower kernel: #PF: error_code(0x0002) - not-present page
    Nov  5 16:29:08 Tower kernel: PGD 104a07067 P4D 104a07067 PUD 104a1c067 PMD 0 
    Nov  5 16:29:08 Tower kernel: Oops: 0002 [#1] PREEMPT SMP NOPTI
    Nov  5 16:29:08 Tower kernel: CPU: 4 PID: 767 Comm: udevd Tainted: G S                5.19.17-Unraid #2
    Nov  5 16:29:08 Tower kernel: Hardware name: Gigabyte Technology Co., Ltd. B550 AORUS PRO V2/B550 AORUS PRO V2, BIOS F15a 02/17/2022
    Nov  5 16:29:08 Tower kernel: RIP: 0010:netlink_recvmsg+0x2b3/0x2c0
    Nov  5 16:29:08 Tower kernel: Code: e8 5a c2 97 ff 8b 44 24 08 85 c0 41 0f 44 c4 48 8b 54 24 38 65 48 2b 14 25 28 00 00 00 74 05 e8 77 cc 0f 00 48 83 c4 40 5b 5d <01> 5c 41 5d 41 5e 41 5f c3 cc cc cc cc 0f 1f 44 00 00 41 56 49 89
    Nov  5 16:29:08 Tower kernel: RSP: 0018:ffffc90000857ce0 EFLAGS: 00010282
    Nov  5 16:29:08 Tower kernel: RAX: 0000000000000063 RBX: 0000000000000040 RCX: 0000000000000000
    Nov  5 16:29:08 Tower kernel: RDX: 0000000000000000 RSI: 0000000000000246 RDI: 00000000ffffffff
    Nov  5 16:29:08 Tower kernel: RBP: 00007ffe9bcd8210 R08: 0000000000000000 R09: ffffc90000857c28
    Nov  5 16:29:08 Tower kernel: R10: 0000000000000002 R11: 0000000000000020 R12: 0000000000000063
    Nov  5 16:29:08 Tower kernel: R13: ffffc90000857ca8 R14: ffffffff8223dbc0 R15: ffff88810490fd00
    Nov  5 16:29:08 Tower kernel: FS:  000014ca8cd9fbc0(0000) GS:ffff888ffe100000(0000) knlGS:0000000000000000
    Nov  5 16:29:08 Tower kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    Nov  5 16:29:08 Tower kernel: CR2: 0000000000000123 CR3: 000000010728c000 CR4: 0000000000750ee0
    Nov  5 16:29:08 Tower kernel: PKRU: 55555554
    Nov  5 16:29:08 Tower kernel: Call Trace:
    Nov  5 16:29:08 Tower kernel: <TASK>
    Nov  5 16:29:08 Tower kernel: ____sys_recvmsg+0x7e/0x144
    Nov  5 16:29:08 Tower kernel: ? __import_iovec+0xa7/0xc0
    Nov  5 16:29:08 Tower kernel: ? import_iovec+0x17/0x1d
    Nov  5 16:29:08 Tower kernel: ? copy_msghdr_from_user+0x5c/0x87
    Nov  5 16:29:08 Tower kernel: ? _raw_spin_unlock_irqrestore+0x24/0x3a
    Nov  5 16:29:08 Tower kernel: ___sys_recvmsg+0x7d/0xb8
    Nov  5 16:29:08 Tower kernel: ? _raw_write_unlock_irq+0x18/0x2d
    Nov  5 16:29:08 Tower kernel: ? do_epoll_wait+0x438/0x557
    Nov  5 16:29:08 Tower kernel: ? __rseq_handle_notify_resume+0x258/0x427
    Nov  5 16:29:08 Tower kernel: __sys_recvmsg+0x5b/0x92
    Nov  5 16:29:08 Tower kernel: do_syscall_64+0x6b/0x81
    Nov  5 16:29:08 Tower kernel: entry_SYSCALL_64_after_hwframe+0x63/0xcd
    Nov  5 16:29:08 Tower kernel: RIP: 0033:0x14ca8d2a5ae5
    Nov  5 16:29:08 Tower kernel: Code: 21 03 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b5 0f 1f 00 53 48 83 ec 10 80 3d fc 8a 0d 00 00 74 22 b8 2f 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 5b 48 63 d8 48 83 c4 10 48 89 d8 5b c3 0f 1f
    Nov  5 16:29:08 Tower kernel: RSP: 002b:00007ffe9bcd81a0 EFLAGS: 00000202 ORIG_RAX: 000000000000002f
    Nov  5 16:29:08 Tower kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 000014ca8d2a5ae5
    Nov  5 16:29:08 Tower kernel: RDX: 0000000000000000 RSI: 00007ffe9bcd8230 RDI: 0000000000000004
    Nov  5 16:29:08 Tower kernel: RBP: 0000000000000000 R08: 00000000064967d0 R09: 000000000045f370
    Nov  5 16:29:08 Tower kernel: R10: 00007ffe9bd45080 R11: 0000000000000202 R12: 000000000044a9e0
    Nov  5 16:29:08 Tower kernel: R13: 00007ffe9bcd82f0 R14: 00007ffe9bcd823c R15: 0000000000000000

     

     

    This was on a B550 / Ryzen 5600G system with 1 NVMe and 3 drives. 

     

     

    tower-diagnostics-20221105-1637.zip


    I've done a lot of the same, and the random crashing has gotten worse - every hour, with nothing in the logs. I've started replacing physical cables and drives to see if that is the problem. No relief yet. I guess I'll try a new USB install. PIA having to ditch all my config, though.  

    Some items to note: I too am on a Ryzen system; I wonder if this has something to do with it? I also have my main disks on ReiserFS but dread having to update them to XFS. 


    Also Ryzen.  Moved most of my torrents from Unraid to a seedbox (might be averaging 1.5 MiB/s now), and the frequency for me has stayed about the same: once every 5-8 days - the longest I've ever gone is 10.5 days.  My logs are the same as the ones I've posted 12 or so times before, so I won't bore anyone with duplicate details.  Now I'm just down to Plex pulling on the system, and I refuse to off-load it.  If this config won't function as a basic media server serving up 2-3 movie views per 24 hrs, well...

     

    Again, the (temporary) fix for me every time is to stop the docker engine and restart.  Works 100% of the time.

    6 hours ago, naebula said:
    Nov  5 16:29:08 Tower kernel: Call Trace:
    Nov  5 16:29:08 Tower kernel: <TASK>
    Nov  5 16:29:08 Tower kernel: ____sys_recvmsg+0x7e/0x144
    Nov  5 16:29:08 Tower kernel: ? __import_iovec+0xa7/0xc0
    Nov  5 16:29:08 Tower kernel: ? import_iovec+0x17/0x1d
    Nov  5 16:29:08 Tower kernel: ? copy_msghdr_from_user+0x5c/0x87
    Nov  5 16:29:08 Tower kernel: ? _raw_spin_unlock_irqrestore+0x24/0x3a
    Nov  5 16:29:08 Tower kernel: ___sys_recvmsg+0x7d/0xb8
    Nov  5 16:29:08 Tower kernel: ? _raw_write_unlock_irq+0x18/0x2d
    Nov  5 16:29:08 Tower kernel: ? do_epoll_wait+0x438/0x557
    Nov  5 16:29:08 Tower kernel: ? __rseq_handle_notify_resume+0x258/0x427
    Nov  5 16:29:08 Tower kernel: __sys_recvmsg+0x5b/0x92
    Nov  5 16:29:08 Tower kernel: do_syscall_64+0x6b/0x81
    Nov  5 16:29:08 Tower kernel: entry_SYSCALL_64_after_hwframe+0x63/0xcd

     

    Not sure you're having the same issue as the rest of us in this post. Your call trace would have to look something like the one in @JorgeB's original post. Most people rolling back to 6.10.3 seem to have their issue fixed once downgraded.

     

    4 hours ago, canuck said:


    I've done a lot of the same, and the random crashing has gotten worse. Every hour. Nothing in the logs. I've started replacing physical cables and drives to see if that is the problem. No relief yet. I guess I'll try a new USB install. PIA having to ditch all my config though.  

    Some items to note: I too am on a Ryzen system. Wonder if this has something to do with it? I also have my main disks on reiserfs but dread having to update them to XFS. 

     

    Are you logging your syslog to your flash? That should catch everything, so the reboot doesn't wipe any info about what's happening. I am also using Ryzen, but I doubt that's the issue, since the first post lists both a Ryzen and an Intel system.

     

    On 11/5/2022 at 6:27 AM, JorgeB said:

    There's a newer kernel, so it's worth trying for anyone affected.

     

    I did roll back to 6.10.3, but I have now upgraded to 6.11.2 to see if the issue still exists in this version as well. All the plugins on the list are installed.

    12 hours ago, naebula said:

    I ran into the same issue yesterday while upgrading to 6.11.2 from 6.10.3.

    It's not the same issue; this thread is about users having the same call trace as described in the 1st post, and for this issue rolling back solves it.

    13 minutes ago, dlandon said:

    Are you writing your downloads to an encrypted disk?  Array or UD?

     

    Unencrypted XFS cache (separate from the array cache).

     

    I was using UD at first (earlier part of this thread) because I never got around to moving the static non-array storage to cache when multiple caches were enabled in Unraid a few releases back. To debug and remove UD as the potential culprit of this issue, I removed all my static drive UD dependence, and ran a test with UD both installed and uninstalled and got the same error. I don't think this is a UD problem.

     

    -JesterEE

     

     


    Hi all, just a short update. Replaced the PSU and all the drive cabling, and swapped out a cache disk that was tossing errors (amazing what you find when you really start reading the logs, haha). I also set C-states in the BIOS to 'off' (the usual Unraid recommendation for Ryzen CPUs). So far 8 hours of uptime. Fingers crossed. Running 6.11.2.


    Well, I don't think this problem is a hardware error. Why was it working on 6.10.3, and suddenly after upgrading to 6.11.x the problem starts? This has to be something with the new kernel and how it handles high I/O for some reason.
    When this happens, my cache disk is busy because qBittorrent is not responding. All the other containers are running okay while this is happening.

    So here is the thing: my torrent program is installed on the cache disk (pool device, 2 x 250GB SSD, mirrored, BTRFS),
    and it downloads everything to another disk (pool device, 1 x 1TB SSD, XFS).

     

    I use UD, but only to get access to my NAS. I don't have any UD disks.

     

    image.png


    I switched a heavier I/O load back over onto my Unraid qBittorrent docker yesterday.  Last night at 2am the issue occurred again - the exact same (to the row) error messages.  The last several times this has occurred, I've noted that "docker stop $(docker ps -q)" would kill every docker EXCEPT qBittorrent, which would continue in a "running" state.  I could force it closed by turning off the docker engine via the Unraid UI, but turning the docker engine back on in the same manner always presented me with an error message:

    Quote

    Warning: stream_socket_client(): unable to connect to unix:///var/run/docker.sock (No such file or directory) in /usr/local/emhttp/plugins/dynamix.docker.manager/include/DockerClient.php on line 712
    Couldn't create socket: [2] No such file or directory
    Warning: Invalid argument supplied for foreach() in /usr/local/emhttp/plugins/dynamix.docker.manager/include/DockerClient.php on line 898

     

    The only way I've been able to find to correct this issue is with a reboot.  Though admittedly I'm no command line docker expert.

     

    Thus, it looks like if I put my heart into it, I'm able to reproduce this issue with a fair amount of regularity.

     

    As far as my configuration of qBittorrent goes, its PRIMARY drives are both Unassigned Devices mounted:  a 1TB NVME for the initial download and a 12TB spinner also mounted by Unassigned Devices.  That 12TB drive is identical to the other 6 that I have mounted as part of Unraid's primary array, so my logic is that there shouldn't be any difference of hardware there causing this particular issue.  Depending on the category of torrent, however, qBittorrent may also access files on the primary Unraid array.  It isn't common, but it does have that ability.

     

    Just updating with this week's random Unraid problems on this topic...


    Since the best indication we have for now is that high I/O from a docker container is causing this issue, anyone willing to try, please test with fio running in a docker:

     

    -look for the debian-bullseye container by ich777 in apps

    -add one or more paths for the storage where the torrents usually go, e.g.:

    imagem.png

     

    -after install open a terminal window to that container and type

    apt-get update
    apt-get -y install fio

     

    fio is very flexible: you can run sequential and random write and read loads. For example, for random reads and writes to a file:

     

    fio --filename=/x/test --size=100GB --direct=1 --rw=randrw --bs=4k --ioengine=libaio --iodepth=256 --runtime=1800 --numjobs=4 --time_based --group_reporting --name=iops-test-job --eta-newline=1

     

    Some options that can be adjusted as you like:
     

    --filename= set the correct path/file name
    --size= file size used for testing
    --runtime= how long the test runs

     

    See here for other test examples; hopefully someone will find a test pattern that reproduces this crash.

     

     

    Quote

    Nov 11 09:11:18 Unraid  emhttpd: shcmd (3084): umount /mnt/disk5
    Nov 11 09:11:18 Unraid root: umount: /mnt/disk5: target is busy.
    Nov 11 09:11:18 Unraid  emhttpd: shcmd (3084): exit status: 32
    Nov 11 09:11:18 Unraid  emhttpd: shcmd (3085): umount /mnt/cache
    Nov 11 09:11:18 Unraid root: umount: /mnt/cache: target is busy.

     

    Since I'm crashing 100% of the time since I increased the qBittorrent load, I used today's crash to dig around a bit more while trying to figure out how to cleanly shut things down.  As I mentioned before, "docker stop $(docker ps -q)" no longer shuts my docker engine down completely - the qBittorrent docker continues on as a zombie.  Yesterday, turning the docker engine off via the Settings tab threw an error, and the reboot to fix that error threw the array into a parity check.  Today, instead of killing the docker engine (which apparently does zero good in my situation), I simply tried to take the array offline, which resulted in the repetitive error message above (repeated 100+ times while writing this).

     

    The cache I can understand - I do write to it when I categorize a torrent into a "Movie_LongTerm" state (stored on the array).  However, disk5 was news to me.  Disk 5 is the only disk with a "games torrent" directory on it.  I'm not sure if this points to specific functionality being suspect, or if it simply failed randomly while trying to do games processing (raring, moving, etc.).  I can say that the game torrents tend to be the most accessed and largest files on the system (some are 300-400 GB each).

     

    So while I look into @JorgeB's proposed test today, I wondered if anyone might be able to share command-line commands that would be more "intrusive" for taking my array offline and/or killing the qBittorrent docker WITHOUT throwing my entire system into a parity recheck?
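
     

    For the "target is busy" part, a sketch of how to find which processes are holding files open under a mount: this is essentially what `fuser -m /mnt/disk5` or `lsof +D /mnt/disk5` report, but the /proc walk below needs no extra tools. The demo uses a temp directory standing in for /mnt/disk5; on a real system you'd point it at the mount and run as root.

    ```shell
    # Demo: find which PIDs hold files open under a directory (the reason
    # umount reports "target is busy"). The temp dir stands in for /mnt/disk5.
    d=$(mktemp -d)
    sleep 10 > "$d/held" &        # background process keeps a file open there
    holder=$!
    for fd in /proc/[0-9]*/fd/*; do
      case "$(readlink "$fd" 2>/dev/null)" in
        "$d"/*) echo "busy: ${fd%%/fd/*}";;   # /proc/<pid> of the holder
      esac
    done
    kill "$holder" 2>/dev/null
    rm -rf "$d"
    ```

    Killing (or gracefully stopping) the PIDs it reports should let the umount proceed without a hard reboot.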

     

    As a BTW: I'm on 6.11.3 now, and this still occurs.  Not that there was much hope a version increment would have corrected it, but still...


    Others have way more knowledge than I can bring to this, but my experience with the recent behaviour:

     

    unraid 6.11.2

    Intel 3770k

    Unraid GUI becomes unresponsive except for the header bar.  Unsure how long after a reboot this starts.

    Plex and -arr dockers seem to keep working, and accessible by clients and browser.

    binhex qbittorrent browser page is unresponsive.

    putty access maintained. powerdown -r doesn't work. Every other command-line attempt to shutdown or reboot fails. Eventually need to hard reboot.

    Here is the only thing I have to add that has not been mentioned in this thread, and I hope it helps give a clue to this:  from command line, htop does not run at all.

     

    Hard rebooting again now...

     

    Thanks

    III_D

    1 hour ago, III_D said:

    putty access maintained. powerdown -r doesn't work. Every other command-line attempt to shutdown or reboot fails.

     

    The command to reboot is: /sbin/reboot

     

     

    As for testing results: I ran fio for 30 minutes to see how it works, and the test ran fine. Next I ran a 6-hour test; it also completed without any issue. Both times I used the options posted above and only changed the runtime. But in typical fashion, randomly today while doing nothing besides seeding, I had a crash on 6.11.3. The syslog shows the typical info we already know.

     

    As for more testing: I saw a package in NerdPack/NerdTools called iotop. If you know what top/htop is, it's that for disk I/O info. I've been messing with it and logging the output by running a script with the User Scripts plugin. The log is basically spammed with deluge (my torrent app of choice), but I'm hoping to catch something with it when I start doing real testing.

     

    I'm not the best Linux user, so any help with the command is welcome. Here's what I'm currently running. Found here.

    Quote

    iotop -botqqqk | grep -P "\d\d\.\d\d K/s"  >> /mnt/disks/Torrents/log

     

    The output is spammed with deluge since I'm seeding, but I'm not sure I want to exclude that from being logged.

    Quote

    22:15:31 15658 be/4 nobody    123.36 K/s    0.00 K/s  0.00 %  0.00 % python /usr/bin/deluged -c /config -L info -l /config/deluged.log [Disk]
    22:15:32 16552 be/4 nobody     77.15 K/s    0.00 K/s  0.00 %  0.00 % python /usr/bin/deluged -c /config -L info -l /config/deluged.log [Disk]
    22:15:32 22698 be/4 nobody    123.44 K/s    0.00 K/s  0.00 %  0.00 % python /usr/bin/deluged -c /config -L info -l /config/deluged.log [Disk]
    22:15:32 16568 be/4 nobody    138.87 K/s    0.00 K/s  0.00 %  0.00 % python /usr/bin/deluged -c /config -L info -l /config/deluged.log [Disk]

     

    But if I wanted to exclude the deluge info, how would I go about doing that?


    I think a second pipe to grep with -v (for inverse match) may be the easiest way.  I'm sure someone else could offer something a lot more thought out :)

     

    iotop -botqqqk | grep -P "\d\d\.\d\d K/s" | grep -v deluge  >> /mnt/disks/Torrents/log
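
     

    For excluding more than one process, a single inverted match with alternation also works. A quick check against fabricated iotop-style lines (the process names after `grep -vE` are just examples):

    ```shell
    # Fabricated iotop-style lines, to show filtering out deluge (and any
    # other noisy process) while keeping the rest of the output.
    printf '%s\n' \
      '22:15:31 15658 be/4 nobody 123.36 K/s 0.00 K/s python /usr/bin/deluged [Disk]' \
      '22:15:32   942 be/4 root    55.10 K/s 0.00 K/s shfs /mnt/user' \
    | grep -vE 'deluged|qbittorrent'
    ```

    Only the non-deluge line survives, so in the real pipeline `grep -v deluge` (or the `-vE` form above) slots in right before the redirect to the log file.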

     

    On 11/10/2022 at 1:23 PM, JorgeB said:

    Since the best indication we have for now is that high i/o with a docker container is causing this issue anyone willing try please test with fio running in a docker:

     

    Still running 6.11.2, but I'll run this test for a couple days and see what happens w/o my torrent app on. Since @ShadyDeth reported the crash still happens in 6.11.3 and nothing concerning this issue has changed, we'll see what we see.

    On 11/14/2022 at 11:06 AM, JesterEE said:

     

    Still running 6.11.2, but I'll run this test for a couple days and see what happens w/o my torrent app on. Since @ShadyDeth reported the crash still happens in 6.11.3 and nothing concerning this issue has changed, we'll see what we see.

     

    I was playing with fio a bit and settled on this command to test the system with random reads/writes, to emulate how I think a torrent client performs while simultaneously downloading/seeding multiple files:

     

    fio --directory=/torrents --name=iops-test-job --ioengine=libaio --rw=randrw --bs=4k --iodepth=256 --direct=1 --group_reporting --eta-newline=1 --end_fsync=1 --time_based --size=2GB --numjobs=25 --runtime=86400
    

     

    This creates 25 2GB files (50GB total) in the /torrents directory (which must be mapped in the docker template) and tests random read/write performance for the duration of the runtime (in seconds).  I used timeanddate.com to create meaningful durations for my schedule.
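As a quick sanity check on the numbers in that command (nothing fio-specific here, just the arithmetic behind the 50GB total and the 86400-second runtime):

```shell
# 25 jobs x 2 GB each = 50 GB of test files;
# --runtime is given in seconds, so 24 hours = 24 * 60 * 60 = 86400.
NUMJOBS=25
SIZE_GB=2
HOURS=24
TOTAL_GB=$((NUMJOBS * SIZE_GB))
RUNTIME_S=$((HOURS * 60 * 60))
echo "total=${TOTAL_GB}GB runtime=${RUNTIME_S}s"
```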

     

    I'll run this for a few days and ... we'll see what we see. 😉

     

    If this turns up a big nothing-burger, I think the next logical test would be the docker network interface.  I have never done this, but it looks like netstress or iperf may be two good candidates. Does anyone have experience with these tools or Linux network stress testing in general?

    On 11/15/2022 at 12:58 PM, JesterEE said:

     

    fio --directory=/torrents --name=iops-test-job --ioengine=libaio --rw=randrw --bs=4k --iodepth=256 --direct=1 --group_reporting --eta-newline=1 --end_fsync=1 --time_based --size=2GB --numjobs=25 --runtime=86400
    

    I'll run this for a few days and ... we'll see what we see. 

     

    48 hours of continuous R/W and still no sign of the error. 24 more and I'm gonna call it a pass.

     

    Still looking for insight on network stress testing. 


    Okay all - crashes continue.  Logs have stayed the same - but the NATURE of the crash has changed.  I'm not sure when this began happening, but if I had to speculate, it happened with the upgrade to 6.11.3.

     

    Previously, my system interface would hang - no UI, nothing.  I could still SSH into the box, run diagnostics, and navigate, but I couldn't do anything else.  I understand from others this was their experience as well.  Performing a "docker stop $(docker ps -q)" would kill all the dockers and allow me to restart without a reboot.

     

    Now, the only thing that hangs on the system is my torrent docker (in my case, binhex-qbittorrentvpn).  The UI continues to be accessible, as does SSH.  This would seem to be a good sign; however, "docker stop $(docker ps -q)" no longer stops all my dockers - it leaves binhex-qbittorrentvpn "running" (zombie).  Nothing I've found has been able to fix this, short of a reboot.  For instance, I can run "/etc/rc.d/rc.docker stop", which won't succeed.  A "/etc/rc.d/rc.docker force_stop" does work, but upon executing "/etc/rc.d/rc.docker start", qbittorrent looks like it is running, yet "docker logs binhex-qbittorrentvpn" shows no new entries since the last run.
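For what it's worth, next time it hangs it might help to compare what the Docker daemon thinks the container's state is against reality. This is only a sketch: `docker inspect -f` with the `.State.Status`/`.State.Pid` fields is a real CLI call, but the `looks_zombie` helper and its pid=0 heuristic are my own assumptions, not an established diagnostic.

```shell
# Query the daemon's recorded status and PID for a container.
# The container name used later is just an example from this thread.
container_state() {
  docker inspect -f '{{.State.Status}} pid={{.State.Pid}}' "$1" 2>/dev/null
}

# Hypothetical heuristic: a status of "running" with pid=0 suggests the
# daemon has lost track of the underlying container process.
looks_zombie() {
  case "$1" in
    "running pid=0") echo yes ;;
    *) echo no ;;
  esac
}

# Usage: looks_zombie "$(container_state binhex-qbittorrentvpn)"
```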

     

    In short, while the UI now remains functional and the rest of the system can be accessed (apparently as normal), there is no fixing the qbittorrent docker without a reboot.  It's unclear to me what rebooting does to correct the issue, as I have nothing else to go on.

     

    I say all this while sadly aware that not everyone here experiencing this issue is using qbittorrent, so my findings are likely in some way unique to my config.  There is some sliver of hope in that upon trying to start the docker engine back up, I receive the error message "Error response from daemon: failed to allocate secondary ip address (server:192.168.3.1): Address already in use".  That's not a subnet I've been using, so maybe there's some cross-over I'm not seeing.  It's safe to disable/delete, so I'll turn that knob and see if it affects anything.

     

    Sorry, not much of an update from my situation.


    Well, my iotop logging test has left me with nothing to show for it. I got the crash:

     


    Nov 17 22:16:52 Impulse kernel: BUG: kernel NULL pointer dereference, address: 0000000000000076

     

    But my logging script stopped logging all info 6 minutes before that.

     


    22:01:04 28360 be/4 root        0.00 K/s  194.84 K/s  0.00 %  0.00 % shfs /mnt/user -disks 8191 -o default_permissions,allow_other,noatime -o remember=0
    22:01:04 25324 be/4 root        0.00 K/s   45.85 K/s  0.00 %  0.00 % [kworker/u64:7-events_power_efficient]
    22:01:04  8153 be/4 root        0.00 K/s   76.41 K/s  0.00 %  0.00 % shfs /mnt/user -disks 8191 -o default_permissions,allow_other,noatime -o remember=0
    22:01:04  7573 be/4 root        0.00 K/s   30.56 K/s  0.00 %  0.00 % shfs /mnt/user -disks 8191 -o default_permissions,allow_other,noatime -o remember=0
    22:01:35  8126 be/4 root        0.00 K/s  137.80 K/s  0.00 %  0.00 % [btrfs-transaction]
    22:04:14  5311 be/4 root        0.00 K/s   45.84 K/s  0.00 %  0.01 % [kworker/u64:0-events_power_efficient]
    22:04:14 28360 be/4 root        0.00 K/s  191.01 K/s  0.00 %  0.00 % shfs /mnt/user -disks 8191 -o default_permissions,allow_other,noatime -o remember=0
    22:04:14  8153 be/4 root        0.00 K/s   30.56 K/s  0.00 %  0.00 % shfs /mnt/user -disks 8191 -o default_permissions,allow_other,noatime -o remember=0
    22:04:14  7573 be/4 root        0.00 K/s   76.40 K/s  0.00 %  0.00 % shfs /mnt/user -disks 8191 -o default_permissions,allow_other,noatime -o remember=0
    22:04:45  8126 be/4 root        0.00 K/s  122.21 K/s  0.00 %  0.00 % [btrfs-transaction]
    22:04:54  5311 be/4 root       26.77 K/s    0.00 K/s  0.00 %  0.00 % [kworker/u64:0-btrfs-endio]
    22:05:00 30753 be/4 root        0.00 K/s   91.76 K/s  0.00 %  0.00 % [kworker/u64:6-btrfs-endio-write]
    22:05:00 32632 be/4 root        0.00 K/s  175.88 K/s  0.00 %  0.00 % [kworker/u64:1-mlx4_en]
    22:05:25 32632 be/4 root        0.00 K/s   61.21 K/s  0.00 %  0.00 % [kworker/u64:1-mlx4_en]
    22:05:31 32632 be/4 root        0.00 K/s   84.21 K/s  0.00 %  0.15 % [kworker/u64:1-writeback]
    22:05:31  5311 be/4 root        0.00 K/s   30.62 K/s  0.00 %  0.01 % [kworker/u64:0-events_power_efficient]
    22:05:31 30753 be/4 root        0.00 K/s  933.93 K/s  0.00 %  0.00 % [kworker/u64:6-btrfs-endio-write]
    22:05:31  8839 be/4 root        0.00 K/s  244.96 K/s  0.00 %  0.00 % [btrfs-transaction]
    22:05:31  8126 be/4 root        0.00 K/s  214.34 K/s  0.00 %  0.00 % [btrfs-transaction]
    22:06:01  8126 be/4 root        0.00 K/s  183.61 K/s  0.00 %  0.00 % [btrfs-transaction]
    22:07:23 30753 be/4 root        0.00 K/s   45.88 K/s  0.00 %  0.00 % [kworker/u64:6-btrfs-endio-write]
    22:07:23 28360 be/4 root        0.00 K/s   30.59 K/s  0.00 %  0.00 % shfs /mnt/user -disks 8191 -o default_permissions,allow_other,noatime -o remember=0
    22:07:23 28464 be/4 root        0.00 K/s  191.19 K/s  0.00 %  0.00 % shfs /mnt/user -disks 8191 -o default_permissions,allow_other,noatime -o remember=0
    22:07:23  8531 be/4 root        0.00 K/s   76.47 K/s  0.00 %  0.00 % shfs /mnt/user -disks 8191 -o default_permissions,allow_other,noatime -o remember=0
    22:07:55  8126 be/4 root        0.00 K/s  138.04 K/s  0.00 %  0.00 % [btrfs-transaction]
    22:10:34 30753 be/4 root        0.00 K/s   45.86 K/s  0.00 %  0.00 % [kworker/u64:6-mlx4_en]
    22:10:34 28360 be/4 root        0.00 K/s  194.91 K/s  0.00 %  0.00 % shfs /mnt/user -disks 8191 -o default_permissions,allow_other,noatime -o remember=0
    22:10:34  8152 be/4 root        0.00 K/s   76.44 K/s  0.00 %  0.00 % shfs /mnt/user -disks 8191 -o

     

    Everything here is normal activity that was logged numerous times throughout the run.

     

    6 hours ago, sundown said:

    Now, the only thing that hangs on the system is my torrent docker (in my case, binhex-qbittorrentvpn).

     

    Got a question for you, @sundown. I assume you're using the VPN function of that container. Are you using OpenVPN or WireGuard?

     

    Also, @JesterEE, @CiscoCoreX, @III_D, and anyone else reading this who has the issue: could you give a little more info on your torrent container and whether you're using its built-in VPN (OpenVPN or WireGuard), the "VPN Manager" built into Unraid, or no VPN at all?

     

    Just trying to look at changes from 6.10.3 to 6.11.x. There were WireGuard changes, but I use the built-in version in "binhex-delugevpn", so I'm not sure it has anything to do with that. Also, the Docker version was upgraded from 20.10.14 to 20.10.17 in 6.11.0 and to 20.10.18 in 6.11.1. Is there any way to downgrade to Docker 20.10.14 for testing? Would taking the docker.img from /mnt/user/system/docker/docker.img on 6.10.3 and copying it to 6.11.x work?

    8 hours ago, ShadyDeth said:

    Got a question for you, @sundown. I assume you're using the VPN function of that container. Are you using OpenVPN or WireGuard?

    @ShadyDeth Thanks for your continued analysis!  Yes, I am using the VPN functionality of the binhex-qbittorrentvpn docker...the WireGuard portion.  I maintain an OpenVPN container (ich777's OpenVPN-Client) for NZBGet, etc.  Additionally, I use VPN Manager's WireGuard tunnel so my son at college has access to some sandbox VMs with some oomph behind them.

     

    8 hours ago, ShadyDeth said:

    Also, the Docker version was upgraded from 20.10.14 to 20.10.17 in 6.11.0 and to 20.10.18 in 6.11.1. Is there any way to downgrade to Docker 20.10.14 for testing? Would taking the docker.img from /mnt/user/system/docker/docker.img on 6.10.3 and copying it to 6.11.x work?

    I'm a big fan of researching docker engine changes, and would love to know the answer to this!

    23 hours ago, Altwazar said:

    I don't use Unraid, but the same problem occurs under heavy usage with qBittorrent and libtorrent-rasterbar version 2 (https://github.com/arvidn/libtorrent/issues/6952). qBittorrent compiled with version 1.2 didn't solve the issue for me.

     

    Thank you very much for joining our community forum just to let us know that this is reproducible on other Linux systems and that it's an application error rather than a kernel error!

    5 minutes ago, JesterEE said:

    Thank you very much for joining our community forum just to let us know that this is reproducible on other Linux systems

    Yes, thank you @Altwazar. Is everyone having this specific call trace on Unraid using qBittorrent, or is there anyone not using it and still having the issue?


    At minimum, I believe @ShadyDeth mentioned he was using Deluge when hitting this problem, though that could be because Deluge also leverages the libtorrent 2.x libraries in the same manner.  I'm not terribly familiar with Deluge, unfortunately.





