• Crashes since updating to v6.11.x for qBittorrent and Deluge users


    JorgeB
    • Closed

    EDIT: The issue was traced to libtorrent 2.x; it's not an Unraid problem. More info in this post:

     

    https://forums.unraid.net/bug-reports/stable-releases/crashes-since-updating-to-v611x-for-qbittorrent-and-deluge-users-r2153/?do=findComment&comment=21671

     

     

    Original Post:

     

    I'm creating this to better track an issue that some users have been reporting, where Unraid started crashing after updating to v6.11.x (it happens with both 6.11.0 and 6.11.1). A very similar call trace is logged in all cases, e.g.:

     

    Oct 12 04:18:27 zaBOX kernel: BUG: kernel NULL pointer dereference, address: 00000000000000b6
    Oct 12 04:18:27 zaBOX kernel: #PF: supervisor read access in kernel mode
    Oct 12 04:18:27 zaBOX kernel: #PF: error_code(0x0000) - not-present page
    Oct 12 04:18:27 zaBOX kernel: PGD 0 P4D 0
    Oct 12 04:18:27 zaBOX kernel: Oops: 0000 [#1] PREEMPT SMP PTI
    Oct 12 04:18:27 zaBOX kernel: CPU: 4 PID: 28596 Comm: Disk Tainted: P     U  W  O      5.19.14-Unraid #1
    Oct 12 04:18:27 zaBOX kernel: Hardware name: Gigabyte Technology Co., Ltd. Z390 AORUS PRO WIFI/Z390 AORUS PRO WIFI-CF, BIOS F12 11/05/2021
    Oct 12 04:18:27 zaBOX kernel: RIP: 0010:folio_try_get_rcu+0x0/0x21
    Oct 12 04:18:27 zaBOX kernel: Code: e8 8e 61 63 00 48 8b 84 24 80 00 00 00 65 48 2b 04 25 28 00 00 00 74 05 e8 9e 9b 64 00 48 81 c4 88 00 00 00 5b c3 cc cc cc cc <8b> 57 34 85 d2 74 10 8d 4a 01 89 d0 f0 0f b1 4f 34 74 04 89 c2 eb
    Oct 12 04:18:27 zaBOX kernel: RSP: 0000:ffffc900070dbcc0 EFLAGS: 00010246
    Oct 12 04:18:27 zaBOX kernel: RAX: 0000000000000082 RBX: 0000000000000082 RCX: 0000000000000082
    Oct 12 04:18:27 zaBOX kernel: RDX: 0000000000000001 RSI: ffff888757426fe8 RDI: 0000000000000082
    Oct 12 04:18:27 zaBOX kernel: RBP: 0000000000000000 R08: 0000000000000028 R09: ffffc900070dbcd0
    Oct 12 04:18:27 zaBOX kernel: R10: ffffc900070dbcd0 R11: ffffc900070dbd48 R12: 0000000000000000
    Oct 12 04:18:27 zaBOX kernel: R13: ffff88824f95d138 R14: 000000000007292c R15: ffff88824f95d140
    Oct 12 04:18:27 zaBOX kernel: FS:  000014ed38204b38(0000) GS:ffff8888a0500000(0000) knlGS:0000000000000000
    Oct 12 04:18:27 zaBOX kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    Oct 12 04:18:27 zaBOX kernel: CR2: 00000000000000b6 CR3: 0000000209854005 CR4: 00000000003706e0
    Oct 12 04:18:27 zaBOX kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    Oct 12 04:18:27 zaBOX kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Oct 12 04:18:27 zaBOX kernel: Call Trace:
    Oct 12 04:18:27 zaBOX kernel: <TASK>
    Oct 12 04:18:27 zaBOX kernel: __filemap_get_folio+0x98/0x1ff
    Oct 12 04:18:27 zaBOX kernel: ? _raw_spin_unlock_irqrestore+0x24/0x3a
    Oct 12 04:18:27 zaBOX kernel: filemap_fault+0x6e/0x524
    Oct 12 04:18:27 zaBOX kernel: __do_fault+0x2d/0x6e
    Oct 12 04:18:27 zaBOX kernel: __handle_mm_fault+0x9a5/0xc7d
    Oct 12 04:18:27 zaBOX kernel: handle_mm_fault+0x113/0x1d7
    Oct 12 04:18:27 zaBOX kernel: do_user_addr_fault+0x36a/0x514
    Oct 12 04:18:27 zaBOX kernel: exc_page_fault+0xfc/0x11e
    Oct 12 04:18:27 zaBOX kernel: asm_exc_page_fault+0x22/0x30
    Oct 12 04:18:27 zaBOX kernel: RIP: 0033:0x14ed3a0ae7b5
    Oct 12 04:18:27 zaBOX kernel: Code: 8b 48 08 48 8b 32 48 8b 00 48 39 f0 73 09 48 8d 14 08 48 39 d6 eb 0c 48 39 c6 73 0b 48 8d 14 0e 48 39 d0 73 02 0f 0b 48 89 c7 <f3> a4 66 48 8d 3d 59 b7 22 00 66 66 48 e8 d9 d8 f6 ff 48 89 28 48
    Oct 12 04:18:27 zaBOX kernel: RSP: 002b:000014ed38203960 EFLAGS: 00010206
    Oct 12 04:18:27 zaBOX kernel: RAX: 000014ed371aa160 RBX: 000014ed38203ad0 RCX: 0000000000004000
    Oct 12 04:18:27 zaBOX kernel: RDX: 000014c036530000 RSI: 000014c03652c000 RDI: 000014ed371aa160
    Oct 12 04:18:27 zaBOX kernel: RBP: 0000000000000000 R08: 0000000000000000 R09: 000014ed38203778
    Oct 12 04:18:27 zaBOX kernel: R10: 0000000000000008 R11: 0000000000000246 R12: 0000000000000000
    Oct 12 04:18:27 zaBOX kernel: R13: 000014ed38203b40 R14: 000014ed384fe940 R15: 000014ed38203ac0
    Oct 12 04:18:27 zaBOX kernel: </TASK>
    Oct 12 04:18:27 zaBOX kernel: Modules linked in: macvlan xt_CHECKSUM ipt_REJECT nf_reject_ipv4 ip6table_mangle ip6table_nat iptable_mangle vhost_net vhost vhost_iotlb tap tun veth xt_nat xt_tcpudp xt_conntrack nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype br_netfilter xfs md_mod kvmgt mdev i915 iosf_mbi drm_buddy i2c_algo_bit ttm drm_display_helper intel_gtt agpgart hwmon_vid iptable_nat xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libchacha ip6table_filter ip6_tables iptable_filter ip_tables x_tables af_packet 8021q garp mrp bridge stp llc bonding tls ipv6 nvidia_drm(PO) nvidia_modeset(PO) nvidia(PO) x86_pkg_temp_thermal intel_powerclamp drm_kms_helper btusb btrtl i2c_i801 btbcm coretemp gigabyte_wmi wmi_bmof intel_wmi_thunderbolt mxm_wmi kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel crypto_simd cryptd
    Oct 12 04:18:27 zaBOX kernel: btintel rapl intel_cstate intel_uncore e1000e i2c_smbus bluetooth drm nvme nvme_core ahci i2c_core libahci ecdh_generic ecc syscopyarea sysfillrect input_leds sysimgblt led_class joydev nzxt_kraken2 intel_pch_thermal fb_sys_fops thermal fan video tpm_crb wmi tpm_tis backlight tpm_tis_core tpm acpi_pad button unix
    Oct 12 04:18:27 zaBOX kernel: CR2: 00000000000000b6
    Oct 12 04:18:27 zaBOX kernel: ---[ end trace 0000000000000000 ]---

     

    Another example with very different hardware:

    Oct 11 21:32:08 Impulse kernel: BUG: kernel NULL pointer dereference, address: 0000000000000056
    Oct 11 21:32:08 Impulse kernel: #PF: supervisor read access in kernel mode
    Oct 11 21:32:08 Impulse kernel: #PF: error_code(0x0000) - not-present page
    Oct 11 21:32:08 Impulse kernel: PGD 0 P4D 0
    Oct 11 21:32:08 Impulse kernel: Oops: 0000 [#1] PREEMPT SMP NOPTI
    Oct 11 21:32:08 Impulse kernel: CPU: 1 PID: 5236 Comm: Disk Not tainted 5.19.14-Unraid #1
    Oct 11 21:32:08 Impulse kernel: Hardware name: System manufacturer System Product Name/ROG STRIX B450-F GAMING II, BIOS 4301 03/04/2021
    Oct 11 21:32:08 Impulse kernel: RIP: 0010:folio_try_get_rcu+0x0/0x21
    Oct 11 21:32:08 Impulse kernel: Code: e8 8e 61 63 00 48 8b 84 24 80 00 00 00 65 48 2b 04 25 28 00 00 00 74 05 e8 9e 9b 64 00 48 81 c4 88 00 00 00 5b e9 cc 5f 86 00 <8b> 57 34 85 d2 74 10 8d 4a 01 89 d0 f0 0f b1 4f 34 74 04 89 c2 eb
    Oct 11 21:32:08 Impulse kernel: RSP: 0000:ffffc900026ffcc0 EFLAGS: 00010246
    Oct 11 21:32:08 Impulse kernel: RAX: 0000000000000022 RBX: 0000000000000022 RCX: 0000000000000022
    Oct 11 21:32:08 Impulse kernel: RDX: 0000000000000001 RSI: ffff88801e450b68 RDI: 0000000000000022
    Oct 11 21:32:08 Impulse kernel: RBP: 0000000000000000 R08: 000000000000000c R09: ffffc900026ffcd0
    Oct 11 21:32:08 Impulse kernel: R10: ffffc900026ffcd0 R11: ffffc900026ffd48 R12: 0000000000000000
    Oct 11 21:32:08 Impulse kernel: R13: ffff888428441cb8 R14: 00000000000028cd R15: ffff888428441cc0
    Oct 11 21:32:08 Impulse kernel: FS:  00001548d34fa6c0(0000) GS:ffff88842e840000(0000) knlGS:0000000000000000
    Oct 11 21:32:08 Impulse kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    Oct 11 21:32:08 Impulse kernel: CR2: 0000000000000056 CR3: 00000001a3fe6000 CR4: 00000000003506e0
    Oct 11 21:32:08 Impulse kernel: Call Trace:
    Oct 11 21:32:08 Impulse kernel: <TASK>
    Oct 11 21:32:08 Impulse kernel: __filemap_get_folio+0x98/0x1ff
    Oct 11 21:32:08 Impulse kernel: filemap_fault+0x6e/0x524
    Oct 11 21:32:08 Impulse kernel: __do_fault+0x30/0x6e
    Oct 11 21:32:08 Impulse kernel: __handle_mm_fault+0x9a5/0xc7d
    Oct 11 21:32:08 Impulse kernel: handle_mm_fault+0x113/0x1d7
    Oct 11 21:32:08 Impulse kernel: do_user_addr_fault+0x36a/0x514
    Oct 11 21:32:08 Impulse kernel: exc_page_fault+0xfc/0x11e
    Oct 11 21:32:08 Impulse kernel: asm_exc_page_fault+0x22/0x30
    Oct 11 21:32:08 Impulse kernel: RIP: 0033:0x1548dbc04741
    Oct 11 21:32:08 Impulse kernel: Code: 48 01 d0 eb 1b 0f 1f 40 00 f3 0f 1e fa 48 39 d1 0f 82 73 28 fc ff 0f 1f 00 f3 0f 1e fa 48 89 f8 48 83 fa 20 0f 82 af 00 00 00 <c5> fe 6f 06 48 83 fa 40 0f 87 3e 01 00 00 c5 fe 6f 4c 16 e0 c5 fe
    Oct 11 21:32:08 Impulse kernel: RSP: 002b:00001548d34f9808 EFLAGS: 00010202
    Oct 11 21:32:08 Impulse kernel: RAX: 000015480c010d30 RBX: 000015480c018418 RCX: 00001548d34f9a40
    Oct 11 21:32:08 Impulse kernel: RDX: 0000000000004000 RSI: 000015471f8cd50f RDI: 000015480c010d30
    Oct 11 21:32:08 Impulse kernel: RBP: 0000000000000000 R08: 0000000000000003 R09: 0000000000000000
    Oct 11 21:32:08 Impulse kernel: R10: 0000000000000008 R11: 0000000000000246 R12: 0000000000000000
    Oct 11 21:32:08 Impulse kernel: R13: 00001548d34f9ac0 R14: 0000000000000003 R15: 0000154814013d10
    Oct 11 21:32:08 Impulse kernel: </TASK>
    Oct 11 21:32:08 Impulse kernel: Modules linked in: xt_connmark xt_comment iptable_raw wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libchacha xt_mark xt_nat xt_CHECKSUM ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_mangle ip6table_nat iptable_mangle vhost_net tun vhost vhost_iotlb tap veth xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter xfs md_mod ip6table_filter ip6_tables iptable_filter ip_tables x_tables af_packet 8021q garp mrp bridge stp llc ipv6 mlx4_en mlx4_core igb i2c_algo_bit edac_mce_amd edac_core kvm_amd kvm wmi_bmof mxm_wmi asus_wmi_sensors crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel mpt3sas aesni_intel crypto_simd nvme cryptd ahci i2c_piix4 raid_class rapl k10temp i2c_core nvme_core ccp scsi_transport_sas libahci wmi button acpi_cpufreq unix [last unloaded: mlx4_core]
    Oct 11 21:32:08 Impulse kernel: CR2: 0000000000000056
    Oct 11 21:32:08 Impulse kernel: ---[ end trace 0000000000000000 ]---

     

    So they always start with this (end address will change):

     

    Oct 11 05:02:02 Cogsworth kernel: BUG: kernel NULL pointer dereference, address: 0000000000000076

     

    and always have this:

     

    Oct 11 05:02:02 Cogsworth kernel: Call Trace:
    Oct 11 05:02:02 Cogsworth kernel: <TASK>
    Oct 11 05:02:02 Cogsworth kernel: __filemap_get_folio+0x98/0x1ff
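A quick way to check a saved syslog for this signature is sketched below. It is only an illustration: the function name `crash_signature_count` is made up for this example, and the path in the usage comment assumes the default syslog location, which may differ on your system.

```shell
# Count lines in a log file that match the crash signature described above.
# crash_signature_count is a hypothetical helper name, not an Unraid tool.
crash_signature_count() {
    logfile="$1"
    # -c prints the number of matching lines; -E allows the | alternation.
    grep -cE 'BUG: kernel NULL pointer dereference|__filemap_get_folio\+0x' "$logfile"
}

# Example (path is an assumption; adjust to where your syslog is kept):
# crash_signature_count /var/log/syslog
```

A non-zero count only says the signature is present; the full trace and diagnostics are still worth posting.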

     

    The fact that it's happening to various users with very different hardware, both Intel and AMD, makes me think it's not a hardware/firmware issue, so we can try to find out whether they are running anything in common. These are the plugins I've found in common between the 4 or 5 cases so far. They are some of the most used plugins, so it's not surprising they are installed in all cases, but they are also easy to rule out:

     

    ca.backup2.plg - 2022.07.23  (Up to date)
    community.applications.plg - 2022.09.30  (Up to date)
    dynamix.active.streams.plg - 2020.06.17  (Up to date)
    file.activity.plg - 2022.08.19  (Up to date)
    fix.common.problems.plg - 2022.10.09  (Up to date)
    unassigned.devices.plg - 2022.10.03  (Up to date)
    unassigned.devices-plus.plg - 2022.08.19  (Up to date)

     

    So anyone having this issue, please try temporarily uninstalling/disabling these plugins to see if there's any difference.




    User Feedback

    Recommended Comments



    I'll chime in that I'm having the exact same issues. I was running 6.9.2 with a year of uptime. I did some hardware upgrades and had almost 40 days of uptime with a couple of lockups, then decided to update after some troubleshooting. Since updating to 6.11.1 a week ago, I've been getting a soft lock where I can only see the Unraid banner in the webGUI. I can only hard reboot, and it now seems to lock up every 2-3 days. A keyboard plugged directly into the server is also unresponsive after this happens.

     

    Everything seems to function except Binhex-Qbittorrent from what I can tell. I've enabled "Pre-allocate disk space" and removed the following plugins to see if it makes any difference.

     

    Unassigned Devices

    Unassigned Devices Plus

    File Activity

     

    Link to comment

    Ok, I removed both plugins below and am updating Binhex-DelugeVPN to pre-allocate.  I don't want to delete CA Backup as that is more useful/needed.  If I still lock up, I will try that.

     

    dynamix.active.streams.plg - 2020.06.17 (Up to date)

    file.activity.plg - 2022.08.19 (Up to date)

    Link to comment

    Since disabling the 7 plugins that were found to be in common with most users having this issue, I let my server run for 24hrs with no issues before reinstalling each plugin 1 by 1 and letting the server run for a full 24hrs before installing another one.

     

    • Oct 12th: Full 24hrs with none of the 7 plugins installed
    • Oct 13th: Reinstalled Community Applications (community.applications.plg). Ran full 24hrs, no issues.
    • Oct 14th: Reinstalled CA Backup / Restore Appdata (ca.backup2.plg). Ran full 24hrs, no issues.
    • Oct 15th: Reinstalled Fix Common Problems (fix.common.problems.plg). Ran full 24hrs, no issues.
    • Oct 16th: Reinstalled Unassigned Devices (unassigned.devices.plg). Installed at 1pm EST on the 16th, server crashed at 10am EST on Oct 17th.

     

    This could just be a coincidence, and one of the other plugins could be causing the problem, but it's looking like it might be Unassigned Devices.

     

    I have one 500GB ssd on Unassigned Devices for saving downloaded torrents onto via binhex-delugevpn.


    Running binhex-delugevpn using wireguard vpn option. The "Pre-allocate disk space" was turned off but I have now enabled it to see if that might help the issue somehow from other posts above.

     

    Just for testing purposes I'm going to uninstall the 3 plugins I installed before Unassigned Devices to see if I get another crash within a few days time.

     

    Including my diagnostics from the crash today.

    impulse-diagnostics-20221017-1545.zip

    Link to comment

    Thanks for the report @ShadyDeth.  I did NOT remove unassigned devices only because I consider it fairly critical.  My error is identical in almost every way to those posted in this thread - and yet here I am on day 5 of no crashes after removing a few plugins I documented from a few posts up.  I even let qbittorrent run full-bore last night thinking it may just be a load issue - but to my surprise, it was still running this morning.  Maybe I'm just getting lucky.  Another few days should tell the tale...though I haven't been able to run consecutively for 5 days straight since coming off of 6.9.x...

     

    Here's to no crashes...

    Edited by sundown
    Link to comment
    1 hour ago, sundown said:

    Thanks for the report @ShadyDeth.  I did NOT remove unassigned devices only because I consider it fairly critical.  My error is identical in almost every way to those posted in this thread - and yet here I am on day 4 of no crashes after removing a few plugins I documented from a few posts up.  I even let qbittorrent run full-bore last night thinking it may just be a load issue - but to my surprise, it was still running this morning.  Maybe I'm just getting lucky.  Another few days should tell the tale...though I haven't been able to run consecutively for 4 days straight since coming off of 6.9.x...

     

    Here's to no crashes...

     

    I had another crash today.  

     

    Previous troubleshooting steps:

    Changed Docker to ipvlan

    Uninstalled dynamix.active.streams

    Uninstalled file.activity

    Set DelugeVPN to pre-allocate

     

    Steps taken since the crash.

    Uninstalled CA Backup / Restore (ca.backup2)

     

    Let's see how long I can run it now without crashing.  If it crashes again, I may just revert to 6.10.x, since I don't think there is anything else mentioned in this thread that I haven't tried, besides uninstalling UD, which is not an option with my setup.

     

    Attaching additional diagnostics from today.

    debo-server-diagnostics-20221017-1336.zip

    Link to comment
    On 10/12/2022 at 4:44 PM, binhex said:

    As you can probably see from my list of plugins i only have the following installed from your list of commonly installed plugins, which does shorten the list quite a lot, although i am only a metric of 1 so far so take it with a large pinch of salt 🙂


    community.applications.plg - 2022.09.30 (Up to date)
    unassigned.devices.plg - 2022.10.03 (Up to date)
    unassigned.devices-plus.plg - 2022.08.19 (Up to date)

     

    I am loath to uninstall CA for obvious reasons, but i have nuked UD, so we shall see what happens.

     

    On 10/13/2022 at 11:14 AM, binhex said:

    i was using macvlan for one docker container, but have switched it to ipvlan now.

     

    @ich777 thanks for the pm.

    no crashes for me since doing the above, uptime 4 days 20 hours and counting, so MAYBE (a big maybe) it is related to Unassigned Devices (uninstalled after first (and only) crash as mentioned above), luckily for me i don't rely on UD, i only had it installed as i was previously playing with pre-clear on a USB connected drive.

    Link to comment
    7 minutes ago, binhex said:

     

    no crashes for me since doing the above, uptime 4 days 20 hours and counting, so MAYBE (a big maybe) it is related to Unassigned Devices (uninstalled after first (and only) crash as mentioned above), luckily for me i don't rely on UD, i only had it installed as i was previously playing with pre-clear on a USB connected drive.

     

    I hate that a pattern isn't necessarily emerging here - programmatically it doesn't make a whole lot of sense to me.  Unassigned Devices was really one of the only plugins I DIDN'T remove, and I'm now sitting at an uptime of 6d3h from doing nothing else after having 14 consecutive crashes.  I did have qbittorrent stop seeding torrents a few times, but the container didn't lock and the system stayed responsive and I was able to simply restart the container via the Docker interface and it's been running along just fine since. 

     

    There are new versions of the Fix Common Problems and Community Applications plugins available.  I'm hesitant to update just because they would complicate/nullify the testing of this issue...  I checked the changelogs for both, and nothing mentions any changes that would seemingly impact this situation either.

    Link to comment

    Crashed again last night.  I bit the bullet and reconfigured everything that was using UD and uninstalled UD/UD+ plugins.

     

    We'll see....

    Link to comment
    11 hours ago, binhex said:

     

    no crashes for me since doing the above, uptime 4 days 20 hours and counting, so MAYBE (a big maybe) it is related to Unassigned Devices (uninstalled after first (and only) crash as mentioned above), luckily for me i don't rely on UD, i only had it installed as i was previously playing with pre-clear on a USB connected drive.

     

    I have to agree. I only did 2 things on this cycle which currently has an uptime of 2 days 9 hours.

    1. Switched docker macvlan -> ipvlan (as noted above in another comment, this doesn't seem to be it)
    2. Moved all my drives off of Unassigned Devices and made them Pool Devices.

     

    All my plugins remain installed and fully updated as of this post's timestamp (i.e. all the ones @JorgeB listed in the OP and more).

     

    I'm thinking it was #2 and something is going on when interfacing UD with a lot of IO traffic (like torrenting) on the 6.11 series. I was previously using one of my UD drives for torrent seeding using LinuxServer.io's deluge container and the Gluetun VPN client docker container. I still use the same containers but moved my UD xfs formatted drive to a pool device (which, for those of you that haven't done this yet, will NOT destroy your data if you don't change the filesystem) and have not had another crash since. I also was using UD for my VM drive, Plex database, and scratch drive so UD was previously doing a lot of heavy lifting. But I really think it was the torrent traffic that did it in because I was often able to access the other features of my server (VMs, plex and docker containers) that were utilizing UD AFTER the effects from the OP were noted.

     

    I'm going to let it sit for another 15 hours or so and then switch back to docker macvlan to unfortunately, really point the finger at UD.

     

    -JesterEE

    Link to comment

    In my case I don't use UD like most of you do. I only use it for mounting a remote share.
    3 days of uptime now.
    I have a dedicated disk used only for the downloads from qbittorrent. It is in my Pool Devices.

    On the cache pool I have appdata, where my qbittorrent is installed.

    When this crash happens, I can't kill/restart qbittorrent. And I can't stop the array, because it says the cache disk is busy.

    So I always have to do a soft reboot of my Unraid server.

     

    Oct 13 05:58:09 Pegasus  emhttpd: shcmd (835): umount /mnt/cache
    Oct 13 05:58:09 Pegasus root: umount: /mnt/cache: target is busy.
    Oct 13 05:58:09 Pegasus  emhttpd: shcmd (835): exit status: 32
    Oct 13 05:58:09 Pegasus  emhttpd: Retry unmounting disk share(s)...
    Oct 13 05:58:14 Pegasus  emhttpd: Unmounting disks...
    Oct 13 05:58:14 Pegasus  emhttpd: shcmd (836): umount /mnt/cache
    Oct 13 05:58:14 Pegasus root: umount: /mnt/cache: target is busy.
    Oct 13 05:58:14 Pegasus  emhttpd: shcmd (836): exit status: 32
    Oct 13 05:58:14 Pegasus  emhttpd: Retry unmounting disk share(s)...
    Oct 13 05:58:19 Pegasus  emhttpd: Unmounting disks...
    Oct 13 05:58:19 Pegasus  emhttpd: shcmd (837): umount /mnt/cache
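When `umount` keeps reporting that the target is busy and the torrent client cannot be killed, one thing that may be worth checking (an assumption, not something confirmed in this thread) is whether any process is stuck in uninterruptible sleep, which `ps` shows as a state starting with `D`. The sketch below filters `ps` output for that state; `filter_d_state` is a hypothetical helper name.

```shell
# Keep only the rows of `ps -eo pid,stat,comm` output whose state starts with D
# (uninterruptible sleep). filter_d_state is a hypothetical helper name.
filter_d_state() {
    # Skip the header row (NR > 1) and match on the STAT column.
    awk 'NR > 1 && $2 ~ /^D/ { print }'
}

# Example: show processes that currently cannot be interrupted or killed.
# ps -eo pid,stat,comm | filter_d_state
```

If the torrent client shows up here, that would be consistent with it holding the cache mount open, but it does not by itself explain the cause.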

     

    What I did:

    - Made a script to restart my qbittorrent every day at the same time (not sure if this helps).

    - Enabled "Pre-allocate disk space for all files" in my qbittorrent.

    - Made some other changes in the Advanced tab as well: https://imgur.com/a/3dsQ3gF  One thing I noticed in the Advanced tab: I can't change anything for Disk IO type, read mode, or write mode. Even if you change something there, it just gets blanked out, as you can see in the image. Not sure if this has anything to do with this crash problem.

    - Switched docker macvlan -> ipvlan

     

    image.png

    Edited by CiscoCoreX
    Link to comment

     Not easy to find the cause so far but please keep posting the things you are trying to see if we can get to a conclusion, I think it's our best bet for now.

    Link to comment
    12 hours ago, JesterEE said:

    I'm going to let it sit for another 15 hours or so and then switch back to docker macvlan to unfortunately, really point the finger at UD.

     

    Welp ... glad I waited that extra 15 hours because it happened again just now.

     

    Attached diagnostics for those interested.

     

    After my restart, I'll be uninstalling Unassigned Devices and reverting to docker macvlan.

     

    -JesterEE

     

     

    cogsworth-diagnostics-20221019-1156.zip

    Link to comment
    9 hours ago, JorgeB said:

     Not easy to find the cause so far but please keep posting the things you are trying to see if we can get to a conclusion, I think it's our best bet for now.

    You are right, maybe it was a bad idea to make that restart script. I have disabled it now :)

     

    What I did:

    - Enable "Pre-allocate disk space for all files" in my qbittorrent

    - Switched docker macvlan -> ipvlan

     

    - Still running plugins:

     

    1. fix.common.problems.plg ... 2022.10.17
    2. community.applications.plg ... 2022.10.16
    3. user.scripts.plg ... 2022.08.01
    4. unassigned.devices.plg ... 2022.10.12
    5. unassigned.devices-plus.plg ... 2022.08.19
    6. page.notes.plg ... 2021.07.17
    7. open.files.plg ... 2022.08.19
    8. nvidia-driver.plg ... 2022.10.05
    9. nut.plg ... 2022.03.20
    10. intel-gpu-top.plg ... 2022.09.27
    11. gpustat.plg ... 2022.02.22
    12. dynamix.unraid.net.plg ...
    13. dynamix.system.temp.plg ... 2022.09.16b
    14. dynamix.system.stats.plg ... 2022.05.20a
    15. dynamix.system.buttons.plg ... 2022.06.20
    16. dynamix.file.manager.plg ... 2022.09.07
    17. docker.folder.plg ... 2022.09.24
    18. corsairpsu.plg ... 2021.10.05
    19. corefreq.plg ... 2022.07.21 DISABLE
    20. ca.mover.tuning.plg ... 2022.04.13 DISABLE
    21. ca.backup2.plg ... 2022.07.23

     

    Link to comment

    I have been crash free for 36 hours now after uninstalling Unassigned Devices plugins.  I wasn't able to go 24 hours in uptime since I installed 6.11.1.

     

    I will report back if I see another crash. 

    Link to comment
    10 hours ago, JesterEE said:

    After my restart, I'll be uninstalling Unassigned Devices and reverting to docker macvlan.

     

     

    Crashed again after 8 hours. Diagnostics attached (it's the same though)

     

    This time, before restarting dirty I wanted to try to get back to my webUI. Still couldn't, but since I can ssh in, I decided to try and stop all my dockers with command:

     

    docker stop $(docker ps -q)

     

    This hung for a moment but worked. The only container it couldn't stop is my torrent container (deluge). But after I stopped all the others, I was able to get back to my local WebUI and proceed with a normal shutdown. This is kinda weird to me, because I don't think that should have made a difference to the Unraid web backend, but I'm not going to think too much into it; it's all weird right now.
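A variation on the command above is sketched here: it stops containers one at a time and gives up on any that hang, so a single stuck container does not block the rest. This is only a sketch under assumptions: `DOCKER` and `STOP_TIMEOUT` are illustrative variables invented for this example, the 30-second default is arbitrary, and `timeout` is the coreutils utility.

```shell
# Stop running containers one at a time, giving up on any that hang.
# DOCKER and STOP_TIMEOUT are illustrative variables, not Docker or Unraid settings.
DOCKER="${DOCKER:-docker}"
STOP_TIMEOUT="${STOP_TIMEOUT:-30}"

stop_each_container() {
    stuck=""
    for id in $("$DOCKER" ps -q); do
        # timeout(1) abandons the stop request after STOP_TIMEOUT seconds.
        if ! timeout "$STOP_TIMEOUT" "$DOCKER" stop "$id" >/dev/null 2>&1; then
            stuck="$stuck $id"
        fi
    done
    # Report the IDs that did not stop, one per line.
    for id in $stuck; do
        echo "$id"
    done
}

# Example:
# stop_each_container
```

Any ID it prints is a container that did not stop in time; in the reports above, that was the torrent container.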

     

    When I stopped my array (yes, I usually manually stop my array before a shutdown to see if I can catch bad behavior like this), it hung unmounting the cache drive (where my docker appdata resides), so while I was eventually able to issue a shut down command, I'm pretty sure it wasn't as graceful as it should be (hitting the timeout period and triggering a hard poweroff). I believe something in docker was holding onto that mount from the deluge container being hung.

     

    Also notable, I was in the middle of 'yet another' parity check. I was able to see it got to 22.2% and was still chugging along like nothing went wrong.

     

    This leads me to believe it's some Unraid/Docker edge case and not a plugin interaction at all.  Docker was updated between 6.10.3 and 6.11.1, so maybe something isn't quite working right in that release:

    • 6.10.3: docker: version 20.10.17 (CVE-2022-29526 CVE-2022-30634 CVE-2022-30629 CVE-2022-30580 CVE-2022-29804 CVE-2022-29162 CVE-2022-31030)
    • 6.11.1: docker: version 20.10.18 (CVE-2022-27664 CVE-2022-32190 CVE-2022-36109)

     

    I'm going to restart, remove my deluge docker (but keep the appdata of course!) and reinstall Unassigned Devices. If this doesn't work, I think I'm going to head back to a stable 6.10.3 till LimeTech squares this off. If it does, I'm not sure what my next step will be (suggestions?).

     

    -JesterEE

     

    cogsworth-diagnostics-20221019-2143.zip

    Edited by JesterEE
    Update log with most relevant and more info.
    Link to comment

    @JesterEE
    I see we have the same problems, and the same behavior: the torrent program hangs and the cache disk is busy.

     

    Oct 19 21:42:31 Cogsworth root: umount: /mnt/cache: target is busy.
    Oct 19 21:42:31 Cogsworth  emhttpd: shcmd (29660): exit status: 32
    Oct 19 21:42:31 Cogsworth  emhttpd: Retry unmounting disk share(s)...
    Oct 19 21:42:36 Cogsworth  emhttpd: Unmounting disks...
    Oct 19 21:42:36 Cogsworth  emhttpd: shcmd (29661): umount /mnt/cache
    Oct 19 21:42:36 Cogsworth root: umount: /mnt/cache: target is busy.
    Oct 19 21:42:36 Cogsworth  emhttpd: shcmd (29661): exit status: 32
    Oct 19 21:42:36 Cogsworth  emhttpd: Retry unmounting disk share(s)...
    Oct 19 21:42:41 Cogsworth  emhttpd: Unmounting disks...


     

    Link to comment

    Just had another crash. This is what I came back to after a reboot.

     

    EATL7zP.png

     

    Too early in the morning to fix this.

     

    But once I get it fixed, I think I'm going to remove Unassigned Devices and see how long I can go without a crash. Maybe I didn't give it enough time the first go around.

     

    EDIT: Looks like I got a bad SFF 8087 TO 4 x SATA cable. Ordered some new ones. Testing on hold until the new ones arrive.

    Edited by ShadyDeth
    update
    Link to comment

    I'm optimistic that my problem is solved with unassigned devices removed. I just passed 5 days uptime and I was previously locking up after 2-3 days.

    Edited by NautilusGT
    Link to comment
    13 hours ago, ShadyDeth said:

    EDIT: Looks like I got a bad SFF 8087 TO 4 x SATA cable. Ordered some new ones. Testing on hold until the new ones arrive.

     

    Oooof, I've been there! Terrifying when you see your drives not there all at once.

    Link to comment

    I'm 15mins away from having stayed up 10 consecutive days.  I'm nearly ready to consider my issue solved.  These are the plugins I removed:

     

    ca.backup2.plg - 2022.07.23 (Up to date)

    dynamix.active.streams.plg - 2020.06.17 (Up to date)

    file.activity.plg - 2022.08.19 (Up to date)

     

    Given the current stability, I'm not willing to re-install them for further troubleshooting, as there is nothing there I can't live without.  I still have UD installed and it hosts my primary seed torrents, with a little over 12 TB total running 24x7.  At least in my situation, it does not appear to be Unassigned Devices-related.

     

    It's like one of my wife's unsolved murder podcasts - I know at the end I probably won't know exactly who caused it, but it doesn't keep me from wanting to know...

    Link to comment

    Well...I got to a little over 4 days of uptime before it crashed again.  I confirmed what @JesterEE noted: I was able to stop all dockers and get the WebUI back, so there appears to be something going on with Docker and not just the UD plugins.

     

    I just reverted back to 6.10.3 as I need this system to be stable.  I wish my dev environment was also having these issues as I have no problem to keep troubleshooting there.  I will keep an eye on this thread to see if this gets resolved.

     

    Attaching diagnostics.

    debo-server-diagnostics-20221022-1601.zip

    Link to comment

    WELP.  After 10 days and 4 hours of uptime, I encountered this issue yet again.

     

    @JesterEE Thank you tons for the docker command - I've tried several incarnations previously, but was unable to get a successful response.  This allowed me to basically restart the dockers and be on my merry way.

     

    Best I can tell going back through the posts, it appears everyone experiencing the issue is running some type of torrent client - mostly deluge or qbittorrent.  Is this somehow docker engine/IO related?  There are several here not running their torrents off of UD, so that doesn't appear as obvious, either.

     

    Over the 10 days I stayed running, I transferred a bit over 10 TB of torrents.  I don't think that's insubstantial - so maybe not load related at all?

     

    Latest diags attached for funsies.

    unraid-diagnostics-20221023-0827.zip

    Link to comment

    I am also running a Docker stack with ipvlan, on a Ryzen 5900X, and my Windows VM freezes as others stated above. Rolling back to 6.10 for now. I also had the Unassigned Devices and Community Applications plugins installed. The Windows VM typically freezes after a couple of hours of idle time.

     

    Link to comment




