• [6.7.0-rc1] System Hard Lock


    TechnoBabble28
    • Urgent

    I upgraded from 6.5.3 to 6.7.0-rc1 yesterday afternoon. The server initially locked up after about 4 hours, with no SSH or browser access, and required a hard reset. I then enabled troubleshooting mode in the Fix Common Problems (FCP) plugin, left the dashboard window open on my screen, and went to bed. When I woke up this morning it had locked up again, around 12.5 hours in. I am not running any VMs; this is purely a media server with a minimal set of Docker containers. The logs are attached below.

    Unraid.thumb.PNG.89e543f8e0299621ee334448646ccc9e.PNG

    FCPsyslog_tail.txt

    mediaserver-diagnostics-20190122-0245.zip




    User Feedback

    Recommended Comments



    The BIOS was updated to the latest version at the beginning of December, and I haven't had any issues with it. I am running a zenstates script to fix the turbo boost issue, and I disabled C-states in the BIOS when I first set the system up. However, I have not checked the C-state settings since the BIOS update to see whether they were changed, mainly because everything was stable on 6.5.3 even after the update. I will check that now.

    Link to comment

    Nope, that just hard locks the system every 20 minutes lol. I think I'll just go back to 6.5.3. That's a bummer, since the new dashboard is pretty nice. I'll close this thread.

    Link to comment

    I had the same issue on my TR 2990WX.

     

    In my /boot/config/go I have:

    /usr/local/sbin/zenstates --c6-disable

    I have also rolled back to 6.6.6. Mainly due to this, but also because of the BTRFS issue.

     

    I don't have diagnostics from the time because the system halted completely and required a hard reset from the physical power button. I am open to trying further tweaks if suggested.

    Link to comment
    46 minutes ago, d2dyno said:

    I have also rolled back to 6.6.6. Mainly due to this

    Are you saying the zenstates program is not working correctly in 6.7.0-rc1?

    Link to comment
    22 minutes ago, limetech said:

    Are you saying the zenstates program is not working correctly in 6.7.0-rc1?

    If I'm being honest, I am not well versed in zenstates and what it actually does. All I know is that with the same settings as on 6.6.6, I now get system crashes and kernel panics.

    Link to comment
    On 1/22/2019 at 1:35 PM, TechnoBabble28 said:

    C6 was set to Auto in this new BIOS version, so I have reset it to Disabled. Will see if that fixes the problem.

    @TechnoBabble28, I don't think you disabled the correct setting.  You do not want to adjust C6, as it will make the problem worse.  Leave it at Auto.  Instead, you need to disable "Global C-state Control", which is typically located here:

     

    • Advanced  -->  AMD CBS  -->  Zen Common Options  -->  Global C-state Control

     

    Tom, sorry I'm not going to try 6.7.0-RC1, I need stability in my life right now.

     

    Paul

     

    • Like 1
    Link to comment
    17 minutes ago, Pauven said:

    Tom, sorry I'm not going to try 6.7.0-RC1, I need stability in my life right now.

    No worries!

    • Like 1
    Link to comment
    3 minutes ago, TechnoBabble28 said:

    I have no such option anywhere in the BIOS for global C-states. These are my only C-state settings.

    Those are definitely not the ones you want to touch!

     

    AMD BIOSes are typically very complex, with settings hiding where you wouldn't think to look.  Try poking around, as Global C-state Control is a pretty universal option.

     

    Paul

    Link to comment

    You sir were correct. It was buried in the overclocking settings under "advanced cpu core settings". Have just disabled it and will re-update to 6.7.

    • Like 1
    Link to comment
    Just now, TechnoBabble28 said:

    You sir were correct. It was buried in the overclocking settings under "advanced cpu core settings". Have just disabled it and will re-update to 6.7.

    Let me know the results. I would really like to stay with 6.7 for BBR.

    Link to comment
    Quote

    Let me know the results. I would really like to stay with 6.7 for BBR.

    Will do. Just rebooted into 6.7. I think my longest up-time previously was about 12.5 hours so we will see if we make it that far.

    • Like 1
    Link to comment

    The most recent Ryzen BIOS releases added a power supply idle control setting that usually solves the hanging issue without completely disabling C-states. Look for "Power Supply Idle Control" (or similar) and set it to "Typical Current Idle" (or similar).

    • Like 1
    Link to comment

    I saw that while I was looking for the global c-states setting and set mine to typical also. Figured it was worth a shot. It's still up so far, fingers crossed. 

    Link to comment

    Did it change anything for you?

     

    I also have this connection issue with the latest RC, but I don't have a Ryzen...? :)

    Link to comment

    Server has been up and running for almost 18 hours as of now. That's the longest it's stayed on thus far. So far so good. Nothing unusual in the logs. 

    Link to comment
    17 hours ago, johnnie.black said:

    "Power Supply Idle Control" (or similar) and set it to "typical current idle" (or similar).

    This alone did not fix the issue. Trying with C-states disabled now.

    Link to comment
    3 hours ago, d2dyno said:

    This alone did not fix the issue. Trying with C-states disabled now.

    That's really disappointing.  I keep hoping ASRock will add that feature to my BIOS, or maybe I'll get lucky and find it hiding somewhere I hadn't looked. 

     

    But if it doesn't even work.... then I guess I don't need to email ASRock support to ask them to add this setting.

     

    What's your motherboard?

    Link to comment

    Whelp, just had another hard lock. No diagnostic files were created, so I am going to leave a terminal open with the syslog running to try and catch it next time.
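
    For anyone who wants to capture the log without keeping a terminal open, here is a minimal sketch of one way to mirror the syslog to storage that survives a hard reset. It assumes the live log is at /var/log/syslog and that the flash drive is mounted at /boot (as the go-file path earlier in the thread suggests). The function name mirror_log and the destination path are hypothetical, and continuous writes to a USB flash drive add wear, so treat it as a temporary debugging aid only.

```shell
#!/bin/bash
# Hypothetical helper: keep appending a growing log file to a second location.
# Arguments: $1 = source log (assumed /var/log/syslog), $2 = destination file.
mirror_log() {
    local src="$1" dest="$2"
    # Create the destination directory if it does not exist yet.
    mkdir -p "$(dirname "$dest")"
    # -n +1 copies what is already in the log; -F keeps following it,
    # including across log rotation. Runs in the background.
    tail -n +1 -F "$src" >> "$dest" &
    # Print the PID so the caller can stop the mirror later with kill.
    echo $!
}
```

    A possible invocation would be `mirror_log /var/log/syslog /boot/logs/syslog-live.txt`, with the printed PID passed to `kill` once the crash has been captured. Whether the last lines before a hard lock reach the disk depends on timing, so the tail end may still be missing.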

    Link to comment

    Well, seems I made it happen quicker this time 😅

    Jan 25 04:44:01 TheiaHD root: Fix Common Problems Version 2019.01.19
    Jan 25 04:44:01 TheiaHD root: Fix Common Problems: Warning: Share Backups set to not use the cache, but files / folders exist on the cache drive
    Jan 25 04:44:01 TheiaHD root: Fix Common Problems: Warning: Share Misc set to not use the cache, but files / folders exist on the cache drive
    Jan 25 04:44:01 TheiaHD root: Fix Common Problems: Warning: Share rec set to not use the cache, but files / folders exist on the cache drive
    Jan 25 04:44:01 TheiaHD root: Fix Common Problems: Warning: Docker Application bazarr has an update available for it
    Jan 25 04:44:01 TheiaHD root: Fix Common Problems: Warning: Docker Application binhex-rtorrentvpn has an update available for it
    Jan 25 04:44:01 TheiaHD root: Fix Common Problems: Warning: Docker Application Influxdb has an update available for it
    Jan 25 04:44:01 TheiaHD root: Fix Common Problems: Warning: Docker Application JellyFin has an update available for it
    Jan 25 04:44:01 TheiaHD root: Fix Common Problems: Warning: Docker Application letsencrypt has an update available for it
    Jan 25 04:44:01 TheiaHD root: Fix Common Problems: Warning: Docker Application lidarr has an update available for it
    Jan 25 04:44:01 TheiaHD root: Fix Common Problems: Warning: Docker Application mariadb has an update available for it
    Jan 25 04:44:01 TheiaHD root: Fix Common Problems: Warning: Docker Application nextcloud has an update available for it
    Jan 25 04:44:01 TheiaHD root: Fix Common Problems: Warning: Docker Application ombi has an update available for it
    Jan 25 04:44:01 TheiaHD root: Fix Common Problems: Warning: Docker Application openvpn-as has an update available for it
    Jan 25 04:44:01 TheiaHD root: Fix Common Problems: Warning: Docker Application plex has an update available for it
    Jan 25 04:44:01 TheiaHD root: Fix Common Problems: Warning: Docker Application plex2 has an update available for it
    Jan 25 04:44:01 TheiaHD root: Fix Common Problems: Warning: Docker Application qbt1 has an update available for it
    Jan 25 04:44:01 TheiaHD root: Fix Common Problems: Warning: Docker Application qbt2 has an update available for it
    Jan 25 04:44:01 TheiaHD root: Fix Common Problems: Warning: Docker Application radarr has an update available for it
    Jan 25 04:44:01 TheiaHD root: Fix Common Problems: Warning: Docker Application roon has an update available for it
    Jan 25 04:44:01 TheiaHD root: Fix Common Problems: Warning: Docker Application rutorrent has an update available for it
    Jan 25 04:44:01 TheiaHD root: Fix Common Problems: Warning: Docker Application sonarr has an update available for it
    Jan 25 04:44:01 TheiaHD root: Fix Common Problems: Error: Docker application ombi has volumes being passed that are mounted by Unassigned Devices, but they are not mounted with the slave option ** Ignored
    Jan 25 04:44:01 TheiaHD root: Fix Common Problems: Error: Docker application plex2 has volumes being passed that are mounted by Unassigned Devices, but they are not mounted with the slave option ** Ignored
    Jan 25 04:44:01 TheiaHD root: Fix Common Problems: Error: Docker application plex2 has volumes being passed that are mounted by Unassigned Devices, but they are not mounted with the slave option ** Ignored
    Jan 25 04:44:01 TheiaHD root: Fix Common Problems: Error: Docker application plex2 has volumes being passed that are mounted by Unassigned Devices, but they are not mounted with the slave option ** Ignored
    Jan 25 04:44:01 TheiaHD root: Fix Common Problems: Warning: unRaid's built in FTP server is currently disabled, but users are defined ** Ignored
    Jan 25 04:44:07 TheiaHD vnstatd[12183]: Traffic rate for "br0" higher than set maximum 1000 Mbit (30->4125, r14560 t886), syncing.
    Jan 25 04:44:07 TheiaHD root: Fix Common Problems: Warning: Template URL for docker application qbt1 is missing.
    Jan 25 04:44:07 TheiaHD root: Fix Common Problems: Warning: Template URL for docker application qbt2 is missing.
    Jan 25 04:44:39 TheiaHD kernel: br0: received packet on bond0 with own address as source address (addr:00:02:c9:54:99:2e, vlan:0)
    Jan 25 04:45:07 TheiaHD vnstatd[12183]: Traffic rate for "br0" higher than set maximum 1000 Mbit (30->4125, r10045 t648), syncing.
    Jan 25 04:45:45 TheiaHD kernel: br0: received packet on bond0 with own address as source address (addr:00:02:c9:54:99:2e, vlan:0)
    Jan 25 04:46:07 TheiaHD vnstatd[12183]: Traffic rate for "br0" higher than set maximum 1000 Mbit (30->4125, r4767 t489), syncing.
    Jan 25 04:46:10 TheiaHD kernel: br0: received packet on bond0 with own address as source address (addr:00:02:c9:54:99:2e, vlan:0)
    Jan 25 04:46:10 TheiaHD kernel: br0: received packet on bond0 with own address as source address (addr:00:02:c9:54:99:2e, vlan:0)
    Jan 25 04:46:23 TheiaHD kernel: br0: received packet on bond0 with own address as source address (addr:00:02:c9:54:99:2e, vlan:0)
    Jan 25 04:46:37 TheiaHD vnstatd[12183]: Traffic rate for "br0" higher than set maximum 1000 Mbit (30->4125, r4356 t454), syncing.
    Jan 25 04:47:08 TheiaHD vnstatd[12183]: Traffic rate for "br0" higher than set maximum 1000 Mbit (31->4263, r4505 t476), syncing.
    Jan 25 04:47:38 TheiaHD vnstatd[12183]: Traffic rate for "br0" higher than set maximum 1000 Mbit (30->4125, r15263 t964), syncing.
    Jan 25 04:48:08 TheiaHD vnstatd[12183]: Traffic rate for "br0" higher than set maximum 1000 Mbit (30->4125, r7365 t1192), syncing.
    Jan 25 04:48:38 TheiaHD vnstatd[12183]: Traffic rate for "br0" higher than set maximum 1000 Mbit (30->4125, r4193 t595), syncing.
    Jan 25 04:49:38 TheiaHD vnstatd[12183]: Traffic rate for "br0" higher than set maximum 1000 Mbit (30->4125, r5809 t593), syncing.
    Jan 25 04:49:43 TheiaHD kernel: br0: received packet on bond0 with own address as source address (addr:00:02:c9:54:99:2e, vlan:0)
    Jan 25 04:50:47 TheiaHD kernel: stack segment: 0000 [#1] SMP NOPTI
    Jan 25 04:50:47 TheiaHD kernel: CPU: 37 PID: 0 Comm: swapper/37 Not tainted 4.19.17-Unraid #1
    Jan 25 04:50:47 TheiaHD kernel: Hardware name: System manufacturer System Product Name/PRIME X399-A, BIOS 0808 10/12/2018
    Jan 25 04:50:47 TheiaHD kernel: RIP: 0010:kmem_cache_alloc+0x8d/0xf3
    Jan 25 04:50:47 TheiaHD kernel: Code: 24 65 49 8b 50 08 65 4c 03 05 77 9a ee 7e 49 83 78 10 00 49 8b 28 74 31 48 85 ed 74 2c 41 8b 44 24 20 48 8d 4a 01 49 8b 3c 24 <48> 8b 5c 05 00 48 89 e8 65 48 0f c7 0f 0f 94 c0 84 c0 74 bf 41 8b
    Jan 25 04:50:47 TheiaHD kernel: RSP: 0018:ffff88885f343bb0 EFLAGS: 00010206
    Jan 25 04:50:47 TheiaHD kernel: RAX: 0000000000000000 RBX: 0000000000488020 RCX: 000000000070fba4
    Jan 25 04:50:47 TheiaHD kernel: RDX: 000000000070fba3 RSI: 0000000000488020 RDI: 0000000000024240
    Jan 25 04:50:47 TheiaHD kernel: RBP: 0000888fc0000000 R08: ffff88885f364240 R09: 0000000000000000
    Jan 25 04:50:47 TheiaHD kernel: R10: ffff88894bef7200 R11: 0000000000000014 R12: ffff88885ec07780
    Jan 25 04:50:47 TheiaHD kernel: R13: 0000000000488020 R14: ffffffff81625612 R15: ffffffff8161611d
    Jan 25 04:50:47 TheiaHD kernel: FS: 0000000000000000(0000) GS:ffff88885f340000(0000) knlGS:0000000000000000
    Jan 25 04:50:47 TheiaHD kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    Jan 25 04:50:47 TheiaHD kernel: CR2: 000000c0083d1000 CR3: 0000001026e70000 CR4: 00000000003406e0
    Jan 25 04:50:47 TheiaHD kernel: Call Trace:
    Jan 25 04:50:47 TheiaHD kernel: <IRQ>
    Jan 25 04:50:47 TheiaHD kernel: br_nf_pre_routing+0x221/0x2f7
    Jan 25 04:50:47 TheiaHD kernel: nf_hook_slow+0x37/0x96
    Jan 25 04:50:47 TheiaHD kernel: br_handle_frame+0x291/0x2d0
    Jan 25 04:50:47 TheiaHD kernel: ? br_pass_frame_up+0x143/0x143
    Jan 25 04:50:47 TheiaHD kernel: __netif_receive_skb_core+0x461/0x793
    Jan 25 04:50:47 TheiaHD kernel: __netif_receive_skb_one_core+0x31/0x69
    Jan 25 04:50:47 TheiaHD kernel: netif_receive_skb_internal+0x9f/0xba
    Jan 25 04:50:47 TheiaHD kernel: napi_gro_frags+0x153/0x18b
    Jan 25 04:50:47 TheiaHD kernel: mlx4_en_process_rx_cq+0x7ea/0x953 [mlx4_en]
    Jan 25 04:50:47 TheiaHD kernel: ? mlx4_cq_completion+0x1e/0x63 [mlx4_core]
    Jan 25 04:50:47 TheiaHD kernel: ? mlx4_en_rx_irq+0x23/0x3e [mlx4_en]
    Jan 25 04:50:47 TheiaHD kernel: ? mlx4_eq_int+0xb2a/0xb55 [mlx4_core]
    Jan 25 04:50:47 TheiaHD kernel: mlx4_en_poll_rx_cq+0x66/0xc6 [mlx4_en]
    Jan 25 04:50:47 TheiaHD kernel: net_rx_action+0x10b/0x274
    Jan 25 04:50:47 TheiaHD kernel: __do_softirq+0xce/0x1e2
    Jan 25 04:50:47 TheiaHD kernel: irq_exit+0x5e/0x9d
    Jan 25 04:50:47 TheiaHD kernel: do_IRQ+0xa9/0xc7
    Jan 25 04:50:47 TheiaHD kernel: common_interrupt+0xf/0xf
    Jan 25 04:50:47 TheiaHD kernel: </IRQ>
    Jan 25 04:50:47 TheiaHD kernel: RIP: 0010:cpuidle_enter_state+0xe8/0x141
    Jan 25 04:50:47 TheiaHD kernel: Code: ff 45 84 ff 74 1d 9c 58 0f 1f 44 00 00 0f ba e0 09 73 09 0f 0b fa 66 0f 1f 44 00 00 31 ff e8 03 52 be ff fb 66 0f 1f 44 00 00 <48> 2b 1c 24 b8 ff ff ff 7f 48 b9 ff ff ff ff f3 01 00 00 48 39 cb
    Jan 25 04:50:47 TheiaHD kernel: RSP: 0018:ffffc9000662bea0 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffdb
    Jan 25 04:50:47 TheiaHD kernel: RAX: ffff88885f360c40 RBX: 0000014ae7927c17 RCX: 000000000000001f
    Jan 25 04:50:47 TheiaHD kernel: RDX: 0000014ae7927c17 RSI: 000000002ac00b45 RDI: 0000000000000000
    Jan 25 04:50:47 TheiaHD kernel: RBP: ffff8888412e3c00 R08: 0000000000000002 R09: 0000000000020500
    Jan 25 04:50:47 TheiaHD kernel: R10: 0000000000000000 R11: 00000408a8a6bd6e R12: 0000000000000002
    Jan 25 04:50:47 TheiaHD kernel: R13: 0000000000000002 R14: ffffffff81e5c4f8 R15: 0000000000000000
    Jan 25 04:50:47 TheiaHD kernel: do_idle+0x192/0x20e
    Jan 25 04:50:47 TheiaHD kernel: cpu_startup_entry+0x6a/0x6c
    Jan 25 04:50:47 TheiaHD kernel: start_secondary+0x197/0x1b2
    Jan 25 04:50:47 TheiaHD kernel: secondary_startup_64+0xa4/0xb0
    Jan 25 04:50:47 TheiaHD kernel: Modules linked in: tun iptable_mangle veth xt_nat ipt_MASQUERADE iptable_nat nf_nat_ipv4 iptable_filter ip_tables nf_nat md_mod bonding mlx4_en mlx4_core igb i2c_algo_bit amd64_edac_mod edac_mce_amd kvm_amd kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd mpt3sas cryptd raid_class wmi_bmof mxm_wmi glue_helper scsi_transport_sas ahci ccp i2c_piix4 k10temp i2c_core libahci pcc_cpufreq wmi button acpi_cpufreq [last unloaded: mlx4_core]
    Jan 25 04:50:47 TheiaHD kernel: ---[ end trace 9734a78e09c09700 ]---
    Jan 25 04:50:47 TheiaHD kernel: RIP: 0010:kmem_cache_alloc+0x8d/0xf3
    Jan 25 04:50:47 TheiaHD kernel: Code: 24 65 49 8b 50 08 65 4c 03 05 77 9a ee 7e 49 83 78 10 00 49 8b 28 74 31 48 85 ed 74 2c 41 8b 44 24 20 48 8d 4a 01 49 8b 3c 24 <48> 8b 5c 05 00 48 89 e8 65 48 0f c7 0f 0f 94 c0 84 c0 74 bf 41 8b
    Jan 25 04:50:47 TheiaHD kernel: RSP: 0018:ffff88885f343bb0 EFLAGS: 00010206
    Jan 25 04:50:47 TheiaHD kernel: RAX: 0000000000000000 RBX: 0000000000488020 RCX: 000000000070fba4
    Jan 25 04:50:47 TheiaHD kernel: RDX: 000000000070fba3 RSI: 0000000000488020 RDI: 0000000000024240
    Jan 25 04:50:47 TheiaHD kernel: RBP: 0000888fc0000000 R08: ffff88885f364240 R09: 0000000000000000
    Jan 25 04:50:47 TheiaHD kernel: R10: ffff88894bef7200 R11: 0000000000000014 R12: ffff88885ec07780
    Jan 25 04:50:47 TheiaHD kernel: R13: 0000000000488020 R14: ffffffff81625612 R15: ffffffff8161611d
    Jan 25 04:50:47 TheiaHD kernel: FS: 0000000000000000(0000) GS:ffff88885f340000(0000) knlGS:0000000000000000
    Jan 25 04:50:47 TheiaHD kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    Jan 25 04:50:47 TheiaHD kernel: CR2: 000000c0083d1000 CR3: 0000001026e70000 CR4: 00000000003406e0
    Jan 25 04:50:47 TheiaHD kernel: Kernel panic - not syncing: Fatal exception in interrupt
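
    When posting a capture like the one above, it can help to trim the syslog down to just the kernel fault. Below is a rough sketch that prints the kernel lines from the first fault header through the "Kernel panic" line. The function name extract_oops is hypothetical, and the list of fault headers it looks for is an assumption that covers the "stack segment" header seen here plus a few other common ones; it is not exhaustive.

```shell
#!/bin/bash
# Hypothetical helper: print only the kernel fault section of a saved syslog.
# Argument: $1 = path to a syslog file in the format shown above.
extract_oops() {
    awk '
        # Start capturing at the first kernel line that looks like a fault header.
        / kernel: / && /(stack segment|general protection fault|BUG:|Oops)/ { grab = 1 }
        # While capturing, print kernel lines only (skips vnstatd, plugin output, etc.).
        grab && / kernel: /
        # The panic line is the last one of interest; stop after it is printed.
        /Kernel panic/ { exit }
    ' "$1"
}
```

    Run against a saved copy of the log above, this should drop the Fix Common Problems and vnstatd lines and keep the register dump, the call trace, and the final panic line.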

     

    Link to comment




