mankey54 Posted March 6, 2021
Hi there, I need some help. I've been running Unraid for well over 4 years and have upgraded the hardware twice. The current hardware ran for well over a year with no issues until recently. To explain the symptom: on startup, my system automatically starts the array and auto-runs Docker containers and VMs. Recently Unraid has started crashing when the array starts. I've disabled Docker and VM autostart to try to narrow it down, but it still crashes. A screenshot is attached. As a next step I will try to keep it running with the array stopped, to confirm whether the array is indeed causing the crash. I'm more than happy to post any logs; let me know which ones and I will try to grab them before it crashes.
mankey54 Posted March 10, 2021 Author
Hi there, community. I have an update on this issue. I tried leaving Unraid running with the array stopped. It keeps running for maybe 1 hour and then crashes with a message similar to the one before. For context, I am running the previous stable release, I believe 6.8.3; I've not yet upgraded to the latest stable release.
Squid Posted March 10, 2021
Post your diagnostics to see if there's something obvious there.
mankey54 Posted March 10, 2021 Author
Thank you, Squid. I just read the Read Me First message that mentioned that. Here you go. Also, I need to make a correction: the Unraid version is 6.8.1.
supergrid-diagnostics-20210310-1450.zip
Squid Posted March 10, 2021
The first place to start is https://forums.unraid.net/topic/46802-faq-for-unraid-v6/page/2/?tab=comments#comment-819173
But I would also update to 6.9.1, as the later kernels have better Ryzen support.
Also run a memory test for a minimum of a single pass (ideally a couple of them).
mankey54 Posted March 16, 2021 Author
Hi Squid. I wanted to give you an update on my troubleshooting progress. I updated Unraid to 6.9.1 and completed a memory test (10 passes without error). I was able to restart Unraid and complete a parity check. I have now started Docker and continue to monitor the server. I'm slowly re-enabling all the settings; I still need to enable VMs, but I'm waiting for another day of stability before I do that. The server has been up and running for 1.5 days now, and I will continue to monitor it. Thank you so much for your support.
mankey54 Posted March 20, 2021 Author
Hi there, I have an update on this issue. The problem persists. I find it difficult to capture the error, but at one point while troubleshooting I was able to capture it. I see many segfault errors. Below are the logs I was able to capture:
Mar 20 17:59:34 SuperGrid kernel: Call Trace:
Mar 20 17:59:34 SuperGrid kernel: swake_up_one+0x16/0x24
Mar 20 17:59:34 SuperGrid kernel: rcu_nocb_gp_kthread+0x23c/0x433
Mar 20 17:59:34 SuperGrid kernel: ? rcu_bind_current_to_nocb+0x38/0x38
Mar 20 17:59:34 SuperGrid kernel: kthread+0xe5/0xea
Mar 20 17:59:34 SuperGrid kernel: ? __kthread_bind_mask+0x57/0x57
Mar 20 17:59:34 SuperGrid kernel: ret_from_fork+0x22/0x30
Mar 20 17:59:34 SuperGrid kernel: Modules linked in: xt_CHECKSUM ipt_REJECT ip6table_mangle ip6table_nat iptable_mangle vhost_net tun vhost vhost_iotlb tap veth xt_nat xt_MASQUERADE iptable_nat nf_nat xfs md_mod hwmon_vid ip6table_filter ip6_tables iptable_filter ip_tables bonding edac_mce_amd kvm_amd kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel r8169 aesni_intel crypto_simd cryptd nvme wmi_bmof realtek i2c_piix4 k10temp i2c_core nvme_core ahci glue_helper wmi ccp rapl libahci thermal button acpi_cpufreq
Mar 20 17:59:34 SuperGrid kernel: CR2: ffffffff81071b33
Mar 20 17:59:34 SuperGrid kernel: ---[ end trace 8f246789f2fd9320 ]---
Mar 20 17:59:34 SuperGrid kernel: RIP: 0010:update_nohz_stats+0x0/0x4e
Mar 20 17:59:34 SuperGrid kernel: Code: 40 84 ed 74 0a 31 f6 4c 89 ff e8 56 de ff ff 48 8b 74 24 08 48 83 c4 18 4c 89 ff 5b 5d 41 5c 41 5d 41 5e 41 5f e9 6b e8 67 00 <83> 7f 20 00 74 45 53 48 89 fb 8b bf 48 0a 00 00 89 f2 48 c7 c6 80
Mar 20 17:59:34 SuperGrid kernel: RSP: 0018:ffffc9000043fb50 EFLAGS: 00010046
Mar 20 17:59:34 SuperGrid kernel: RAX: 0000000000000005 RBX: ffff11091e962380 RCX: 0000000000000005
Mar 20 17:59:34 SuperGrid kernel: RDX: ffff888100022380 RSI: 0000000000000000 RDI: ffff11091e962380
Mar 20 17:59:34 SuperGrid kernel: RBP: 0000000000000005 R08: 0000000000000000 R09: 0000000000000005
Mar 20 17:59:34 SuperGrid kernel: R10: 0000000000000000 R11: ffff88881e962400 R12: ffffc9000043fc08
Mar 20 17:59:34 SuperGrid kernel: R13: ffffc9000043fc88 R14: ffffc9000043fd28 R15: ffff88810094f9c0
Mar 20 17:59:34 SuperGrid kernel: FS: 0000000000000000(0000) GS:ffff88881e9c0000(0000) knlGS:0000000000000000
Mar 20 17:59:34 SuperGrid kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 20 17:59:34 SuperGrid kernel: CR2: ffffffff81071b33 CR3: 000000000200c000 CR4: 0000000000350ee0
Mar 20 17:59:36 SuperGrid nmbd[9868]: [2021/03/20 17:59:36.873106, 0] ../../source3/libsmb/nmblib.c:922(send_udp)
Mar 20 17:59:36 SuperGrid nmbd[9868]: Packet send failed to 172.17.255.255(138) ERRNO=Network is unreachable
Mar 20 17:59:36 SuperGrid nmbd[9868]: [2021/03/20 17:59:36.873186, 0] ../../source3/libsmb/nmblib.c:922(send_udp)
Mar 20 17:59:36 SuperGrid nmbd[9868]: Packet send failed to 172.17.255.255(137) ERRNO=Network is unreachable
Mar 20 17:59:36 SuperGrid nmbd[9868]: [2021/03/20 17:59:36.873196, 0] ../../source3/nmbd/nmbd_packets.c:179(send_netbios_packet)
Mar 20 17:59:36 SuperGrid nmbd[9868]: send_netbios_packet: send_packet() to IP 172.17.255.255 port 137 failed
Mar 20 17:59:36 SuperGrid nmbd[9868]: [2021/03/20 17:59:36.873206, 0] ../../source3/nmbd/nmbd_nameregister.c:581(register_name)
Mar 20 17:59:36 SuperGrid nmbd[9868]: register_name: Failed to send packet trying to register name #001#002__MSBROWSE__#002<01>
Mar 20 17:59:44 SuperGrid nmbd[9868]: [2021/03/20 17:59:44.885890, 0] ../../source3/nmbd/nmbd_become_lmb.c:397(become_local_master_stage2)
Mar 20 17:59:44 SuperGrid nmbd[9868]: *****
Mar 20 17:59:44 SuperGrid nmbd[9868]:
Mar 20 17:59:44 SuperGrid nmbd[9868]: Samba name server SUPERGRID is now a local master browser for workgroup WORKGROUP on subnet 192.168.1.8
Mar 20 17:59:44 SuperGrid nmbd[9868]:
Mar 20 17:59:44 SuperGrid nmbd[9868]: *****
Mar 20 17:59:44 SuperGrid nmbd[9868]: [2021/03/20 17:59:44.886001, 0] ../../source3/nmbd/nmbd_become_lmb.c:397(become_local_master_stage2)
Mar 20 17:59:44 SuperGrid nmbd[9868]: *****
Mar 20 17:59:44 SuperGrid nmbd[9868]:
Mar 20 17:59:44 SuperGrid nmbd[9868]: Samba name server SUPERGRID is now a local master browser for workgroup WORKGROUP on subnet 192.168.122.1
Mar 20 17:59:44 SuperGrid nmbd[9868]:
Mar 20 17:59:44 SuperGrid nmbd[9868]: *****
Mar 20 17:59:56 SuperGrid kernel: sensors[10601]: segfault at 7ffe8754b20b ip 00007ffe8754b20b sp 00007ffcaf35ba10 error 14
Mar 20 17:59:56 SuperGrid kernel: Code: Unable to access opcode bytes at RIP 0x7ffe8754b1e1.
Mar 20 18:00:34 SuperGrid nmbd[9868]: [2021/03/20 18:00:34.939702, 0] ../../source3/libsmb/nmblib.c:922(send_udp)
Mar 20 18:00:34 SuperGrid nmbd[9868]: Packet send failed to 172.17.255.255(138) ERRNO=Network is unreachable
Mar 20 18:00:35 SuperGrid kernel: plugin[10819]: segfault at 4129808 ip 0000150b83470053 sp 00007ffe041296a0 error 4 in ld-2.30.so[150b83466000+20000]
Mar 20 18:00:35 SuperGrid kernel: Code: ef c0 48 83 7c 24 08 00 48 89 44 24 40 0f 29 44 24 50 74 11 f7 84 24 d0 00 00 00 fa ff ff ff 0f 85 e7 0c 00 00 48 8b 44 24 18 <49> 8b 0c 24 48 83 bc 24 d8 00 00 00 00 4c 8b 08 0f 85 67 02 00 00
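For anyone reading along, the segfault lines in the log above can be pulled out and decoded mechanically. What follows is only a rough sketch, not part of Unraid or its diagnostics: the names `SEGFAULT_RE`, `decode_error` and `parse_segfaults` are made up for illustration, and it assumes the number after `error` is the x86 page-fault error code printed in hexadecimal, which is how the Linux kernel is understood to format these lines.

```python
import re

# Matches kernel segfault lines shaped like the ones in the log above, e.g.
# "kernel: sensors[10601]: segfault at 7ffe8754b20b ip ... sp ... error 14"
SEGFAULT_RE = re.compile(
    r"kernel: (?P<proc>[^\s\[]+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
    r"(?: in (?P<module>[^\s\[]+)\[)?"
)

# x86 page-fault error code bits: (mask, meaning if set, meaning if clear).
# Assumption: the logged code is hexadecimal, so "error 14" means 0x14.
ERROR_BITS = [
    (0x01, "protection violation", "page not present"),
    (0x02, "write access", "non-write access"),
    (0x04, "user mode", "kernel mode"),
    (0x08, "reserved bit set", None),
    (0x10, "instruction fetch", None),
]


def decode_error(code):
    """Return a list of human-readable meanings for a page-fault error code."""
    meanings = []
    for mask, if_set, if_clear in ERROR_BITS:
        text = if_set if code & mask else if_clear
        if text:
            meanings.append(text)
    return meanings


def parse_segfaults(lines):
    """Extract segfault records (process, pid, addresses, decoded error)."""
    records = []
    for line in lines:
        match = SEGFAULT_RE.search(line)
        if not match:
            continue
        code = int(match.group("err"), 16)
        records.append({
            "proc": match.group("proc"),
            "pid": int(match.group("pid")),
            "addr": match.group("addr"),
            "ip": match.group("ip"),
            "module": match.group("module"),  # None when no "in <module>" part
            "error": code,
            "meaning": decode_error(code),
        })
    return records
```

Fed the two segfault lines above, this would report `sensors` faulting on an instruction fetch from user mode and `plugin` faulting on a non-write access inside `ld-2.30.so`. Unrelated processes faulting in different ways is the kind of pattern that tends to point at hardware rather than one misbehaving program, though the script itself cannot tell you that.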
mankey54 Posted March 21, 2021 Author
Hi there, I was able to capture diagnostics once the segfault appeared. Once I see this segfault, the system crashes within a few minutes. Please note that I have also stopped the array, but the error persists.
Mar 20 21:27:22 SuperGrid kernel: php[16009]: segfault at 2881cef8 ip 000015084ad30053 sp 00007ffc2881cd90 error 4 in ld-2.30.so[15084ad26000+20000]
Mar 20 21:27:22 SuperGrid kernel: Code: ef c0 48 83 7c 24 08 00 48 89 44 24 40 0f 29 44 24 50 74 11 f7 84 24 d0 00 00 00 fa ff ff ff 0f 85 e7 0c 00 00 48 8b 44 24 18 <49> 8b 0c 24 48 83 bc 24 d8 00 00 00 00 4c 8b 08 0f 85 67 02 00 00
Troubleshooting I have done so far, in this order:
- upgraded to the latest Unraid, 6.9.1
- completed a memory test (10 consecutive passes with 0 failures)
- turned off VMs and Docker to isolate whether it was related to a VM or the Docker image
- rebuilt the Docker image
- stopped the array
supergrid-diagnostics-20210320-2131.zip
trurl Posted March 21, 2021
You mentioned completing a RAM test, but you don't mention whether you made any of the other changes suggested in the FAQ linked above. Did you?
mankey54 Posted March 21, 2021 Author
Hi trurl. I've not made any of the changes in the FAQ. It talks about RAM speeds. I use DDR 2400, 1 stick, 32GB. I've not done any tuning on the motherboard side and I don't overclock. I've tested the RAM multiple times and let it run 2 nights straight, all passing with no failures. I'd also like to note that I've been running this setup for over a year and this issue only started recently. I've not overclocked anything or upgraded Unraid; previously I was on 6.8.3 and only upgraded to 6.9.1 after the issue arose, in hopes it would fix the problem.
ChatNoir Posted March 21, 2021
5 hours ago, mankey54 said: "Hi trurl. I've not made any of the changes in the FAQ. It talks about RAM speeds. I use DDR 2400, 1 stick, 32GB. I've not done any tuning on the motherboard side and I don't overclock. I've tested the RAM multiple times and let it run 2 nights straight, all passing with no failures."
The linked FAQ post does speak of RAM speed, but also of C-states. Those are the two major reasons behind Ryzen system lockups. Check your BIOS settings for that.
mankey54 Posted March 25, 2021 Author
Hi ChatNoir, thank you for your response. Regarding C-states, from my reading the FAQ says to disable this only if you are running an earlier Ryzen 1xxx. However, I am running a later Ryzen 3800X. That said, I looked through my BIOS and was not able to find the C-state settings. I've tried searching on Google for this with no luck. It's worth noting again that I've been running this hardware configuration for well over a year and have only recently encountered these crashes. No changes have been made to my system, which makes me believe it may be a hardware problem. I wonder if the flash drive holding the Unraid settings could be damaged or corrupt? I would like to look into the C-state settings if anyone can share how to navigate to them in the BIOS. I have a Gigabyte x570 Gaming mobo. Since my last post I went back to test the RAM and completed 10 passes without error. I restarted the system and tried to complete a parity check, but now I received a new error. See the attached screenshot. As next steps I will look to transfer to a new flash drive to see if it helps, and I'm also considering rolling back to 6.8.3 or 6.8.1. It's also worth noting again that I first had this problem when running 6.8.3, but it had been running stable under 6.8.3 for over 6 months, and I only recently upgraded to 6.9.1 in hopes of fixing the problem.
mankey54 Posted March 29, 2021 Author
Hi there, I have an update. Unfortunately the problem is getting worse. The system is now in a constant crash loop: Unraid starts up, and within 5-10 minutes the system crashes and restarts. I was able to pull down the diagnostics. I've not yet created a new USB flash drive, as I am not able to keep the system up long enough to even pull a backup. I do have an older backup from 2 weeks back that I will try to restore. However, this backup was captured during the initial issue, not before the issue first began. If anyone can help me I would greatly appreciate it. Any troubleshooting steps would help me out a lot. I'm a little lost now.
supergrid-diagnostics-20210329-1419.zip
trurl Posted March 29, 2021
There's a lot we can't tell from those diagnostics, since they were taken just after a reboot without the array started. Have you tried going to Settings and disabling Docker and VM Manager, then booting in SAFE mode? You can put your flash in your PC to make a backup. You only need the config folder to put your configuration on a new install. If you change flash drives you would have to transfer the license, of course.
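As a rough illustration of the backup step described above (copying only the config folder off the flash drive while it is plugged into a PC), here is a minimal Python sketch. The function name `backup_config` and the `unraid-config-<timestamp>` folder naming are hypothetical choices for this example, and the paths you pass in depend entirely on where your PC mounts the flash drive.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_config(flash_root, backup_root):
    """Copy <flash_root>/config into a new timestamped folder under backup_root.

    Both arguments are paths chosen by the caller; nothing here is
    specific to Unraid beyond the folder being named "config".
    """
    source = Path(flash_root) / "config"
    if not source.is_dir():
        raise FileNotFoundError(f"no config folder found at {source}")

    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = Path(backup_root) / f"unraid-config-{stamp}"

    # copytree creates the target (and any missing parents) and fails if it
    # already exists, so an earlier backup is never silently overwritten.
    shutil.copytree(source, target)
    return target
```

On a Windows PC the call might look like `backup_config("E:/", "C:/Users/me/unraid-backups")`, where both paths are placeholders. The copied folder can then be dropped onto a freshly prepared flash drive, with the license transfer handled separately as mentioned above.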
John_M Posted March 29, 2021
On 3/25/2021 at 10:57 PM, mankey54 said: "I would like to look into the C-state settings if anyone can share how to navigate to them in the BIOS. I have a Gigabyte x570 Gaming mobo."
The Power Supply Idle Control is the one to change, if that is indeed your problem, though it won't do any harm anyway. Set it to "Typical Current Idle" instead of the default "Low Current Idle". You should find it under the AMD_CBS section of your BIOS.
EDIT: https://download.gigabyte.com/FileList/Manual/mb_manual_x570-gaming-x_e_v1.pdf - page 23
mankey54 Posted March 29, 2021 Author
Hi trurl. Thank you for your reply. I was able to restore a previous backup from 2 weeks back; this version is 6.8.1. I let it run normally and it still has the crashing problem. After that, following your direction to run in SAFE mode, I am now doing that with the backup (6.8.1). It has been running for 5 minutes with no crash yet. I'm keeping the System Log window open to capture anything. What should I do from here? I went ahead and pulled down another set of diagnostics just in case.
supergrid-diagnostics-20210329-1520.zip
trurl Posted March 29, 2021
Those diagnostics don't have the array started. Also, it doesn't look like anyone has mentioned memtest.
mankey54 Posted March 29, 2021 Author
@ trurl, sure, I can start up the array. Since I'm using a new USB flash drive I will have to transfer the key to it. Regarding MemTest, I've actually run MemTest a few times already, as well as reseating the memory stick for good measure. I ran my last memory test 2 days back, and it completed 10 consecutive passes without any failures. My RAM is a Corsair Vengeance DDR4 PC2400 32GBx1.
Update on the current run: I had the system running (6.8.1, SAFE mode, array stopped) for about 40 minutes and the segmentation faults returned. Attached are the diagnostics from when they first appeared. It kept running a bit longer and then the system crashed again. I will try again with the array started, pull down the diagnostics, and repost.
@John_M thank you for your steps for the C-state setting. After I try these steps from trurl I will look into the BIOS settings you suggested.
supergrid-diagnostics-20210329-1555.zip
ConnerVT Posted March 29, 2021
4 hours ago, John_M said: "The Power Supply Idle Control is the one to change, if that is indeed your problem, though it won't do any harm anyway. Set it to "Typical Current Idle" instead of the default "Low Current Idle". You should find it under the AMD_CBS section of your BIOS."
Definitely check this in your BIOS. It wouldn't explain why you are having issues now, after running without any problems for the past year. But it definitely was needed to keep my Ryzen Unraid server from crashing, usually after running for an hour or so.
trurl Posted March 29, 2021
2 hours ago, mankey54 said: "@ trurl"
Just some advice on using the forum. To get that @ trurl to actually be a notification for that user, you have to type the @ and then immediately, without spaces, continue typing the username, and then you have to actually select the matching username from the popup. Like this: @mankey54
John_M Posted March 29, 2021
2 minutes ago, ConnerVT said: "But it definitely was needed to keep my Ryzen Unraid server from crashing, usually after running for an hour or so."
That makes sense if you're still using a 1500X. I think this was fixed a long time ago and I've never needed it with 2000-series or later, but it does no harm and it's better than globally disabling C-states, as some people recommend.
ConnerVT Posted March 30, 2021
Exactly. Though I've read some mixed thoughts as to whether it is 100% resolved in Zen+ and Zen 2. But that's to be expected on the Internet. As Power Supply Idle Control has no cost other than perhaps a negligible amount of power draw, it is worth checking off the list.
mankey54 Posted March 30, 2021 Author
@trurl I want to give an update. I replaced my USB flash drive and rebuilt it following the instructions: a new install of Unraid 6.9.1, with all content from the config folder copied to the new drive. The configuration transferred over; however, on starting the array and then a parity check, the system crashed and rebooted again. I was monitoring the syslog, but no error was captured at the point of the crash.
I've done a couple of things afterward hoping to isolate the problem. Here they are in order, with results:
1. Rebuilt a new flash drive with the previous version 6.8.3, copying over the config. Result: the configuration was recognized. On starting the array and a parity check, the system crashed and rebooted. Same symptom.
2. Made the BIOS configuration change suggested by @John_M, who helped me identify the setting on my mobo, and set the power setting to "Typical Current Idle". Result: boot into Unraid was the same as step 1. This time I did not let the array start and just let the system stabilize before enabling more features. About 1-2 hours later the system crashed and rebooted again.
3. Upgraded the BIOS. The version that was running was the initial release (F3 for the Gigabyte x570 mobo), so I upgraded to the next release (F4). Result: boot into Unraid was the same as step 1. I did not let the array start and let the system stabilize. Again, after 1 hour the system crashed and rebooted.
Things I've tested so far:
- upgraded Unraid to 6.9.1
- downgraded Unraid to 6.8.3
- tested memory (Memtest86+)
- reseated the RAM and also moved it to another slot
- changed the power setting to "Typical Current Idle"
- cleared the BIOS (physically on the mobo, with a screwdriver touching the two points)
- upgraded the BIOS to the next version (following Gigabyte's procedure)
At this point I am out of configuration options, and I can only deduce that it is a hardware problem. I'm in the process of RMAing my mobo, since the RAM tests seem successful. I don't have another system to transplant my hard drives into, so I will have to wait until I get a new mobo. However, if you have any other things to test, I am open to it.
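Since the post above mentions that nothing was captured in the syslog at the moment of the crash, one general approach is to keep appending the log to storage that survives a reboot and to force each write to disk. The sketch below is a generic illustration only, not an Unraid feature or anything the posters ran: `copy_new_lines` and `follow` are hypothetical names, both file paths are up to the caller, and log rotation is deliberately not handled.

```python
import os
import time


def copy_new_lines(src_path, dst_path, offset=0):
    """Append everything in src_path past `offset` to dst_path.

    The destination is flushed and fsync'ed so the copied data is on disk
    even if the machine resets right afterwards. Returns the new offset.
    """
    with open(src_path, "rb") as src:
        src.seek(offset)
        data = src.read()
        new_offset = src.tell()

    if data:
        with open(dst_path, "ab") as dst:
            dst.write(data)
            dst.flush()
            os.fsync(dst.fileno())
    return new_offset


def follow(src_path, dst_path, interval=1.0):
    """Poll src_path forever, mirroring new content to dst_path."""
    offset = 0
    while True:
        offset = copy_new_lines(src_path, dst_path, offset)
        time.sleep(interval)
```

A call such as `follow("/var/log/syslog", "/boot/logs/syslog-mirror.txt")` is the intended usage, where both paths are assumptions about a typical Linux layout rather than something confirmed in this thread. Frequent small writes add wear to a flash drive, so this is better suited to a short debugging window than to permanent use.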
John_M Posted March 30, 2021
What power supply are you using? Brand? Wattage? Age?
mankey54 Posted March 30, 2021 Author
@John_M It's an RMx Series™ RM1000x, 1000 Watt 80 PLUS, and I've had it since 2019, so it's not yet 2 years old.