Uk_tomcat_fan Posted April 2, 2022
I recently upgraded my Unraid server to a spare Ryzen 7 1700 CPU on an MSI B550 Tomahawk motherboard. Since then I have been seeing frequent crashes and lockups for a variety of reasons. My most recent crash is:
Apr 1 20:51:10 Tower kernel: ------------[ cut here ]------------
Apr 1 20:51:10 Tower kernel: WARNING: CPU: 7 PID: 7435 at kernel/exit.c:725 do_exit+0x4b/0x8eb
Apr 1 20:51:10 Tower kernel: Modules linked in: xt_nat veth xt_CHECKSUM ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_mangle ip6table_nat iptable_mangle nf_tables vhost_net tun vhost vhost_iotlb tap xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter xfs nfsd lockd grace sunrpc md_mod ip6table_filter ip6_tables iptable_filter ip_tables x_tables r8169 realtek sr_mod cdrom edac_mce_amd kvm_amd kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel crypto_simd r8125(O) cryptd glue_helper wmi_bmof i2c_piix4 rapl input_leds ccp ahci i2c_core wmi k10temp led_class cdc_acm libahci acpi_cpufreq button [last unloaded: realtek]
Apr 1 20:51:10 Tower kernel: CPU: 7 PID: 7435 Comm: unraidd10 Tainted: G S D O 5.10.28-Unraid #1
Apr 1 20:51:10 Tower kernel: Hardware name: Micro-Star International Co., Ltd. MS-7C91/MAG B550 TOMAHAWK (MS-7C91), BIOS A.80 12/16/2021
Apr 1 20:51:10 Tower kernel: RIP: 0010:do_exit+0x4b/0x8eb
Apr 1 20:51:10 Tower kernel: Code: 65 48 8b 1c 25 c0 7b 01 00 48 8b 83 e8 06 00 00 48 85 c0 74 17 48 8b 10 48 39 d0 75 0d 48 8b 50 10 48 83 c0 10 48 39 c2 74 02 <0f> 0b 65 8b 0d ec 40 fc 7e 89 c8 48 c7 c7 2e 61 d7 81 25 00 ff ff
Apr 1 20:51:10 Tower kernel: RSP: 0018:ffffc90000a7fee8 EFLAGS: 00010012
Apr 1 20:51:10 Tower kernel: RAX: ffffc90000a7fe40 RBX: ffff8881055a3800 RCX: 0000000000000027
Apr 1 20:51:10 Tower kernel: RDX: ffff88813be8c348 RSI: 0000000000000001 RDI: 0000000000000009
Apr 1 20:51:10 Tower kernel: RBP: 0000000000000000 R08: 0000000000000000 R09: 00000000ffffdfff
Apr 1 20:51:10 Tower kernel: R10: ffffc90000a7f958 R11: ffffc90000a7f950 R12: 0000000000000009
Apr 1 20:51:10 Tower kernel: R13: 0000000000000009 R14: 0000000000000046 R15: 0000000000000000
Apr 1 20:51:10 Tower kernel: FS: 0000000000000000(0000) GS:ffff888fee9c0000(0000) knlGS:0000000000000000
Apr 1 20:51:10 Tower kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 1 20:51:10 Tower kernel: CR2: 0000000000000000 CR3: 0000000383a90000 CR4: 00000000003506e0
Apr 1 20:51:10 Tower kernel: Call Trace:
Apr 1 20:51:10 Tower kernel: ? md_seq_show+0x69e/0x69e [md_mod]
Apr 1 20:51:10 Tower kernel: ? kthread+0xe5/0xea
Apr 1 20:51:10 Tower kernel: rewind_stack_do_exit+0x17/0x17
Apr 1 20:51:10 Tower kernel: RIP: 0000:0x0
Apr 1 20:51:10 Tower kernel: Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.
Apr 1 20:51:10 Tower kernel: RSP: 0000:0000000000000000 EFLAGS: 00000000 ORIG_RAX: 0000000000000000
Apr 1 20:51:10 Tower kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
Apr 1 20:51:10 Tower kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
Apr 1 20:51:10 Tower kernel: RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
Apr 1 20:51:10 Tower kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
Apr 1 20:51:10 Tower kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
Apr 1 20:51:10 Tower kernel: ---[ end trace 558ddcf995bf62ba ]---
I'm getting frustrated at being unable to find the root cause. I know I have a bad cache device, and that was causing the mover to lock up or crash. I'm wondering if anyone can assist or point me in the right direction.
My diagnostics are attached: tower-diagnostics-20220401-2103.zip
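When comparing several crashes like the one above, it can help to pull out just the fields that identify each warning: the CPU, the process, the source location, and the taint flags. The following is a minimal Python sketch of that idea. The function name `summarize_trace` and both regular expressions are illustrative assumptions written against the two log lines quoted from the post; they are not part of Unraid or any kernel tooling and may need adjusting for other log formats.

```python
import re

# Matches the "WARNING: CPU: ... PID: ... at file:line symbol" line of a kernel trace.
WARNING_RE = re.compile(
    r"WARNING: CPU: (?P<cpu>\d+) PID: (?P<pid>\d+) "
    r"at (?P<file>\S+):(?P<line>\d+) (?P<symbol>\S+)"
)

# Matches the summary line carrying the process name, taint flags and kernel version.
COMM_RE = re.compile(
    r"Comm: (?P<comm>\S+) Tainted: (?P<taint>.*?) (?P<kernel>\d+\.\d+\.\d+\S*) #"
)

# Two lines copied from the trace in the post, used as sample input.
SAMPLE = """\
Apr 1 20:51:10 Tower kernel: WARNING: CPU: 7 PID: 7435 at kernel/exit.c:725 do_exit+0x4b/0x8eb
Apr 1 20:51:10 Tower kernel: CPU: 7 PID: 7435 Comm: unraidd10 Tainted: G S D O 5.10.28-Unraid #1
"""


def summarize_trace(log_text):
    """Collect the identifying fields of the first kernel WARNING block found in log_text."""
    summary = {}
    for line in log_text.splitlines():
        warning = WARNING_RE.search(line)
        if warning and "source" not in summary:
            summary["cpu"] = int(warning["cpu"])
            summary["pid"] = int(warning["pid"])
            summary["source"] = f'{warning["file"]}:{warning["line"]}'
            summary["symbol"] = warning["symbol"]
        comm = COMM_RE.search(line)
        if comm and "comm" not in summary:
            summary["comm"] = comm["comm"]
            summary["taint"] = comm["taint"].split()
            summary["kernel"] = comm["kernel"]
    return summary


if __name__ == "__main__":
    for key, value in summarize_trace(SAMPLE).items():
        print(f"{key}: {value}")
```

In this trace the `D` taint flag is worth noting: the kernel sets it after an earlier oops or BUG, so this warning may be a follow-on to a fault logged before 20:51:10 rather than the first failure.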
peterg23 Posted April 2, 2022 The suggestions discussed in this link regarding Ryzen C-states and PSU idle power control may help: https://www.reddit.com/r/unRAID/comments/sohq5s/question_about_ryzen_5_1600/
Uk_tomcat_fan Posted April 2, 2022 (Author) Thank you. I forgot to mention that I have /usr/local/sbin/zenstates --c6-disable in the go file on my flash partition (I believe this is where I'm meant to add it). I will look at the PSU idle setting as well.
JonathanM Posted April 2, 2022 Another common mistake is running the RAM at too high a speed for the CPU / motherboard. The RAM may well be rated to handle it, but the CPU / board can't, and the BIOS defaults to what is basically an overclock.
ConnerVT Posted April 2, 2022 Went through this with my 1500X build. Since setting the PSU idle power option in the BIOS and setting the RAM to the Ryzen default speed appropriate for my memory (pay attention to whether your RAM is single or dual rank), my system has been rock-solid stable. I never touched my C-states. YMMV
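The rank and DIMM-count point above can be sketched as a small lookup. This is a hedged illustration only: the speeds in the table are assumed from AMD's launch-era DDR4 guidance for first-generation Ryzen as commonly quoted, and should be checked against AMD's and the motherboard vendor's documentation before relying on them. The names `SUPPORTED_SPEED_MTS` and `is_overclocked` are hypothetical.

```python
# Assumed officially supported DDR4 speeds (MT/s) for first-generation Ryzen,
# keyed by (rank, number of DIMMs). Verify these values before relying on them.
SUPPORTED_SPEED_MTS = {
    ("single", 2): 2667,
    ("dual", 2): 2400,
    ("single", 4): 2133,
    ("dual", 4): 1866,
}


def is_overclocked(configured_mts, rank, dimm_count):
    """Return True if the configured memory speed exceeds the assumed supported speed."""
    limit = SUPPORTED_SPEED_MTS[(rank, dimm_count)]
    return configured_mts > limit


if __name__ == "__main__":
    # Hypothetical example: two dual-rank DIMMs with an XMP profile at 3333 MT/s.
    print(is_overclocked(3333, "dual", 2))
    # The same kit dropped to 2400 MT/s.
    print(is_overclocked(2400, "dual", 2))
```

Under these assumed values, a speed that is fine for two single-rank DIMMs can still count as an overclock for four dual-rank DIMMs, which is why the rank and DIMM count matter when picking the BIOS setting.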
Uk_tomcat_fan Posted April 2, 2022 (Author) Just crashed again, and I couldn't get to the logs this time. I checked my BIOS settings and disabled C-states outright. Memory was set to 3333 MHz; I changed that to 2400 MHz (below the 1700's limit of 2600). I left C-states disabled in the go file as well. Here is a picture of my screen