[SOLVED] Unraid crashing upon starting up array



Hi there, I need some help. I've been running Unraid for well over 4 years and have upgraded the hardware twice. The current hardware has been running for well over a year with no issues until recently. To explain my symptoms: on boot, my system automatically starts the array and auto-starts Dockers and VMs. Recently I've been getting an Unraid crash that occurs when I start the array. I've disabled Docker and VM autostart to try to narrow it down, but it still crashes. Attached is the screenshot. My next step is to keep it running with the array stopped to confirm whether the array is indeed causing the crash.

 

I'm happy to post any logs; let me know which ones and I will try to grab them before it crashes.

20210305_211113.jpg


Hi there, community. I have an update on this issue. Afterward, I tried leaving Unraid running with the array stopped. It kept running for maybe 1 hour and then crashed with a message similar to the one before. For context, I am running the previous stable release, which I believe is 6.8.3; I've not yet upgraded to the latest stable release.


Hi Squid. I wanted to give you an update on my troubleshooting progress.

 

I updated Unraid to 6.9.1 and completed a memory test (10 passes without error). I was able to restart Unraid and complete a parity check. I have now started Docker and am continuing to monitor the server. I'm slowly re-enabling all the settings; I still need to enable VMs, but I'm waiting for another day of stability before doing so. The server has been up and running for 1.5 days now, and I will continue to monitor it.

 

Thank you so much for your support


Hi there, I have an update on this issue. The problem persists. I find it difficult to capture the error; however, at one point while troubleshooting I was able to capture it, and I see many segfault errors. Below are the logs I was able to capture:

 

Mar 20 17:59:34 SuperGrid kernel: Call Trace:
Mar 20 17:59:34 SuperGrid kernel: swake_up_one+0x16/0x24
Mar 20 17:59:34 SuperGrid kernel: rcu_nocb_gp_kthread+0x23c/0x433
Mar 20 17:59:34 SuperGrid kernel: ? rcu_bind_current_to_nocb+0x38/0x38
Mar 20 17:59:34 SuperGrid kernel: kthread+0xe5/0xea
Mar 20 17:59:34 SuperGrid kernel: ? __kthread_bind_mask+0x57/0x57
Mar 20 17:59:34 SuperGrid kernel: ret_from_fork+0x22/0x30
Mar 20 17:59:34 SuperGrid kernel: Modules linked in: xt_CHECKSUM ipt_REJECT ip6table_mangle ip6table_nat iptable_mangle vhost_net tun vhost vhost_iotlb tap veth xt_nat xt_MASQUERADE iptable_nat nf_nat xfs md_mod hwmon_vid ip6table_filter ip6_tables iptable_filter ip_tables bonding edac_mce_amd kvm_amd kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel r8169 aesni_intel crypto_simd cryptd nvme wmi_bmof realtek i2c_piix4 k10temp i2c_core nvme_core ahci glue_helper wmi ccp rapl libahci thermal button acpi_cpufreq
Mar 20 17:59:34 SuperGrid kernel: CR2: ffffffff81071b33
Mar 20 17:59:34 SuperGrid kernel: ---[ end trace 8f246789f2fd9320 ]---
Mar 20 17:59:34 SuperGrid kernel: RIP: 0010:update_nohz_stats+0x0/0x4e
Mar 20 17:59:34 SuperGrid kernel: Code: 40 84 ed 74 0a 31 f6 4c 89 ff e8 56 de ff ff 48 8b 74 24 08 48 83 c4 18 4c 89 ff 5b 5d 41 5c 41 5d 41 5e 41 5f e9 6b e8 67 00 <83> 7f 20 00 74 45 53 48 89 fb 8b bf 48 0a 00 00 89 f2 48 c7 c6 80
Mar 20 17:59:34 SuperGrid kernel: RSP: 0018:ffffc9000043fb50 EFLAGS: 00010046
Mar 20 17:59:34 SuperGrid kernel: RAX: 0000000000000005 RBX: ffff11091e962380 RCX: 0000000000000005
Mar 20 17:59:34 SuperGrid kernel: RDX: ffff888100022380 RSI: 0000000000000000 RDI: ffff11091e962380
Mar 20 17:59:34 SuperGrid kernel: RBP: 0000000000000005 R08: 0000000000000000 R09: 0000000000000005
Mar 20 17:59:34 SuperGrid kernel: R10: 0000000000000000 R11: ffff88881e962400 R12: ffffc9000043fc08
Mar 20 17:59:34 SuperGrid kernel: R13: ffffc9000043fc88 R14: ffffc9000043fd28 R15: ffff88810094f9c0
Mar 20 17:59:34 SuperGrid kernel: FS: 0000000000000000(0000) GS:ffff88881e9c0000(0000) knlGS:0000000000000000
Mar 20 17:59:34 SuperGrid kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 20 17:59:34 SuperGrid kernel: CR2: ffffffff81071b33 CR3: 000000000200c000 CR4: 0000000000350ee0
Mar 20 17:59:36 SuperGrid nmbd[9868]: [2021/03/20 17:59:36.873106, 0] ../../source3/libsmb/nmblib.c:922(send_udp)
Mar 20 17:59:36 SuperGrid nmbd[9868]: Packet send failed to 172.17.255.255(138) ERRNO=Network is unreachable
Mar 20 17:59:36 SuperGrid nmbd[9868]: [2021/03/20 17:59:36.873186, 0] ../../source3/libsmb/nmblib.c:922(send_udp)
Mar 20 17:59:36 SuperGrid nmbd[9868]: Packet send failed to 172.17.255.255(137) ERRNO=Network is unreachable
Mar 20 17:59:36 SuperGrid nmbd[9868]: [2021/03/20 17:59:36.873196, 0] ../../source3/nmbd/nmbd_packets.c:179(send_netbios_packet)
Mar 20 17:59:36 SuperGrid nmbd[9868]: send_netbios_packet: send_packet() to IP 172.17.255.255 port 137 failed
Mar 20 17:59:36 SuperGrid nmbd[9868]: [2021/03/20 17:59:36.873206, 0] ../../source3/nmbd/nmbd_nameregister.c:581(register_name)
Mar 20 17:59:36 SuperGrid nmbd[9868]: register_name: Failed to send packet trying to register name #001#002__MSBROWSE__#002<01>
Mar 20 17:59:44 SuperGrid nmbd[9868]: [2021/03/20 17:59:44.885890, 0] ../../source3/nmbd/nmbd_become_lmb.c:397(become_local_master_stage2)
Mar 20 17:59:44 SuperGrid nmbd[9868]: *****
Mar 20 17:59:44 SuperGrid nmbd[9868]:
Mar 20 17:59:44 SuperGrid nmbd[9868]: Samba name server SUPERGRID is now a local master browser for workgroup WORKGROUP on subnet 192.168.1.8
Mar 20 17:59:44 SuperGrid nmbd[9868]:
Mar 20 17:59:44 SuperGrid nmbd[9868]: *****
Mar 20 17:59:44 SuperGrid nmbd[9868]: [2021/03/20 17:59:44.886001, 0] ../../source3/nmbd/nmbd_become_lmb.c:397(become_local_master_stage2)
Mar 20 17:59:44 SuperGrid nmbd[9868]: *****
Mar 20 17:59:44 SuperGrid nmbd[9868]:
Mar 20 17:59:44 SuperGrid nmbd[9868]: Samba name server SUPERGRID is now a local master browser for workgroup WORKGROUP on subnet 192.168.122.1
Mar 20 17:59:44 SuperGrid nmbd[9868]:
Mar 20 17:59:44 SuperGrid nmbd[9868]: *****
Mar 20 17:59:56 SuperGrid kernel: sensors[10601]: segfault at 7ffe8754b20b ip 00007ffe8754b20b sp 00007ffcaf35ba10 error 14
Mar 20 17:59:56 SuperGrid kernel: Code: Unable to access opcode bytes at RIP 0x7ffe8754b1e1.
Mar 20 18:00:34 SuperGrid nmbd[9868]: [2021/03/20 18:00:34.939702, 0] ../../source3/libsmb/nmblib.c:922(send_udp)
Mar 20 18:00:34 SuperGrid nmbd[9868]: Packet send failed to 172.17.255.255(138) ERRNO=Network is unreachable
Mar 20 18:00:35 SuperGrid kernel: plugin[10819]: segfault at 4129808 ip 0000150b83470053 sp 00007ffe041296a0 error 4 in ld-2.30.so[150b83466000+20000]
Mar 20 18:00:35 SuperGrid kernel: Code: ef c0 48 83 7c 24 08 00 48 89 44 24 40 0f 29 44 24 50 74 11 f7 84 24 d0 00 00 00 fa ff ff ff 0f 85 e7 0c 00 00 48 8b 44 24 18 <49> 8b 0c 24 48 83 bc 24 d8 00 00 00 00 4c 8b 08 0f 85 67 02 00 00
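The segfault lines above follow the kernel's usual "segfault at ... ip ... sp ... error N [in lib[base+size]]" format. Below is a minimal Python sketch (the function names, and the idea of saving the syslog to a text file first, are assumptions, not Unraid tooling) that pulls those entries out of saved log text and decodes the `error` field, which the kernel prints in hex, using the x86 page-fault error-code bits (bit 0 protection violation vs. not present, bit 1 write vs. read, bit 2 user mode, bit 3 reserved bit, bit 4 instruction fetch). On that reading, `error 14` (0x14) on the `sensors` line is a user-mode instruction fetch from a non-present page, and `error 4` is a user-mode read of a non-present page. It only reports what the log says; it does not diagnose the hardware.

```python
import re
from typing import Iterable, List, NamedTuple, Optional, Tuple

# Matches kernel lines such as
#   "plugin[10819]: segfault at 4129808 ip 0000150b83470053 sp ... error 4 in ld-2.30.so[150b83466000+20000]"
# The " in <lib>[<base>+<size>]" suffix is optional; the sensors line has none.
SEGFAULT_RE = re.compile(
    r"(?P<comm>[^\s\[]+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp [0-9a-f]+ error (?P<err>[0-9a-f]+)"
    r"(?: in (?P<lib>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+[0-9a-f]+\])?"
)


class Segfault(NamedTuple):
    comm: str                 # process name as printed by the kernel
    pid: int
    addr: int                 # faulting address
    ip: int                   # instruction pointer at the time of the fault
    error: int                # page-fault error code (printed in hex)
    flags: Tuple[str, ...]    # decoded error-code bits
    lib: Optional[str]        # mapping that contains ip, if the kernel reported one
    offset: Optional[int]     # ip relative to the start of that mapping


def decode_error(code: int) -> Tuple[str, ...]:
    """Decode the x86 page-fault error code (bit 0 P, 1 W/R, 2 U/S, 3 RSVD, 4 I/D)."""
    flags = ["protection-violation" if code & 0x01 else "not-present"]
    flags.append("write" if code & 0x02 else "read")
    for bit, name in ((0x04, "user-mode"), (0x08, "reserved-bit"), (0x10, "instruction-fetch")):
        if code & bit:
            flags.append(name)
    return tuple(flags)


def parse_segfaults(lines: Iterable[str]) -> List[Segfault]:
    """Return one Segfault per matching kernel log line; all other lines are skipped."""
    faults = []
    for line in lines:
        m = SEGFAULT_RE.search(line)
        if not m:
            continue
        ip = int(m["ip"], 16)
        err = int(m["err"], 16)
        base = int(m["base"], 16) if m["base"] else None
        faults.append(Segfault(
            comm=m["comm"],
            pid=int(m["pid"]),
            addr=int(m["addr"], 16),
            ip=ip,
            error=err,
            flags=decode_error(err),
            lib=m["lib"],
            offset=ip - base if base is not None else None,
        ))
    return faults
```

For example, `parse_segfaults(open("syslog.txt"))` on a saved copy of the log (the file name is hypothetical). Applied to the lines posted in this thread, both the `plugin` and the later `php` faults come out in `ld-2.30.so` at offset 0xa053, i.e. the same instruction in the dynamic loader.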


Hi There,

   I was able to capture diagnostics once the segfault appeared. Once I see this segfault, the system crashes within a few minutes. Note that I have also stopped the array, but this error persists.

 

Mar 20 21:27:22 SuperGrid kernel: php[16009]: segfault at 2881cef8 ip 000015084ad30053 sp 00007ffc2881cd90 error 4 in ld-2.30.so[15084ad26000+20000]
Mar 20 21:27:22 SuperGrid kernel: Code: ef c0 48 83 7c 24 08 00 48 89 44 24 40 0f 29 44 24 50 74 11 f7 84 24 d0 00 00 00 fa ff ff ff 0f 85 e7 0c 00 00 48 8b 44 24 18 <49> 8b 0c 24 48 83 bc 24 d8 00 00 00 00 4c 8b 08 0f 85 67 02 00 00

 

Troubleshooting I have done so far, in this order:

- upgraded to the latest Unraid, 6.9.1

- completed a memory test (10 consecutive passes with 0 failures)

- turned off VMs and Docker to isolate whether it was related to a VM or the Docker image

- rebuilt the Docker image

- stopped the array

 

supergrid-diagnostics-20210320-2131.zip


Hi trurl. I've not made any of the changes in the FAQ. It talks about RAM speeds. I use DDR4-2400, a single 32GB stick. I've not done any tuning on the motherboard side and I don't overclock. I've tested the RAM multiple times, letting it run two nights straight, with all passes and no failures. I'd also like to note that I've been running this setup for over a year now and this issue only started recently. I haven't overclocked anything or upgraded Unraid; I was previously on 6.8.3 and only upgraded to 6.9.1 after the issue arose, in hopes that it would fix the problem.

5 hours ago, mankey54 said:

Hi trurl. I've not made any of the changes in the FAQ. It talks about RAM speeds. I use DDR4-2400, a single 32GB stick. I've not done any tuning on the motherboard side and I don't overclock. I've tested the RAM multiple times, letting it run two nights straight, with all passes and no failures.

 

The linked FAQ post does talk about RAM speed, but also about C-states. Those are the two major reasons behind Ryzen system lockups.

 

Check your BIOS settings for that.


Hi ChatNoir,

    Thank you for your response. Regarding C-states, from what I read, it says to disable this only if you are running an earlier Ryzen 1000-series. However, I am running a later model, a Ryzen 3800X. That said, I looked in my BIOS and was not able to find the C-state settings. I've tried searching on Google with no luck.

 

It's worth noting again that I've been running this hardware configuration for well over a year and have only recently encountered these crashes. There haven't been any changes made to my system, which makes me believe it could be a hardware problem.

 

I wonder if the flash drive holding the Unraid settings could be damaged or corrupt?

 

I would like to investigate the C-state settings if anyone can share how to navigate to them in the BIOS. I have a Gigabyte X570 Gaming motherboard.

 

Since then, I've gone back to test the RAM and have completed 10 passes without error. I restarted the system and tried to complete a parity check; however, now I've received a new error. See the attached screenshot.

 

My next step is to transfer to a new flash drive to see if it helps, but I'm also considering rolling back to 6.8.3 or 6.8.1. It's also worth noting again that I first had this problem when running 6.8.3, which had been running stable for over 6 months; I only recently upgraded to 6.9.1 in hopes of fixing the problem.

 

 

 

20210325_174555.jpg


Hi there, I have an update. Unfortunately, the problem is getting worse. The system is now in a constant crash loop: Unraid starts up, and within 5-10 minutes the system crashes and restarts. I was able to pull down the diagnostics. I've not yet created a new USB flash drive, as I can't keep the system up long enough to even pull a backup. I do have an older backup from 2 weeks ago that I will try to restore. However, that backup was captured while the issue was already occurring, not before it first began.

 

If anyone can help me, I would greatly appreciate it. Any troubleshooting steps would help me out a lot. I'm a little lost now.

supergrid-diagnostics-20210329-1419.zip


There's a lot we can't tell from those diagnostics, since they were taken just after a reboot without the array started.

 

Have you tried going to Settings and disabling Docker and VM Manager, then booting in SAFE mode?

 

You can put your flash in your PC to make a backup. You only need the config folder to put your configuration on a new install. If you change flash drives you would have to transfer the license of course.
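Since only the config folder is needed, the backup step can be scripted on the PC. Below is a minimal Python sketch, assuming the flash drive is already mounted on your PC at a path you supply; the `backup_config` helper and the folder naming are hypothetical, not part of Unraid. It only copies files and does nothing about the license transfer.

```python
import shutil
from datetime import datetime
from pathlib import Path
from typing import Optional


def backup_config(flash_root, backup_root, stamp: Optional[str] = None) -> Path:
    """Copy <flash_root>/config to <backup_root>/config-<stamp> and return the new path."""
    src = Path(flash_root) / "config"
    if not src.is_dir():
        raise FileNotFoundError(f"no config folder under {flash_root}")
    # Timestamped destination so each run produces a separate backup.
    stamp = stamp or datetime.now().strftime("%Y%m%d-%H%M%S")
    dest = Path(backup_root) / f"config-{stamp}"
    # copytree raises FileExistsError if dest exists, so an older backup is never overwritten.
    shutil.copytree(src, dest)
    return dest
```

Usage would look like `backup_config("E:/", "C:/unraid-backups")` on Windows, where both paths are placeholders for wherever the flash drive is mounted and wherever you keep backups.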

On 3/25/2021 at 10:57 PM, mankey54 said:

I would like to investigate the C-state settings if anyone can share how to navigate to them in the BIOS. I have a Gigabyte X570 Gaming motherboard.

 

The Power Supply Idle Control is the one to change, if that indeed is your problem, though it won't do any harm anyway. Set it to "Typical Current Idle" instead of the default "Low Current Idle". You should find it under the AMD_CBS section of your BIOS.

 

EDIT:  https://download.gigabyte.com/FileList/Manual/mb_manual_x570-gaming-x_e_v1.pdf  - page 23

 

 

Edited by John_M
Added manual page

Hi trurl. Thank you for your reply. I was able to restore a previous backup from 2 weeks ago; this version is 6.8.1. I let it run normally and it still has the crashing problem. Following your direction, I am now running in SAFE mode with the backup (6.8.1). It has been running for 5 minutes with no crash yet, and I'm keeping the System Log window open to capture anything. From here, what should I do? I went ahead and pulled down another diagnostic just in case.

supergrid-diagnostics-20210329-1520.zip


@ trurl,

     Sure, I can start up the array. Since I'm using a new USB flash drive, I will have to transfer the key to it.

 

     Regarding MemTest, I've actually run it a few times already, and I've also reseated the memory stick for good measure. My last memory test, run 2 days ago, completed 10 consecutive passes without any failures. My RAM is a Corsair Vengeance DDR4 PC2400 32GB x1.

 

   Update on the current run: I had the system running (6.8.1, SAFE mode, array stopped) for about 40 minutes and the segmentation faults returned. Attached are the diagnostics from when they first appeared. It continued to run a bit longer and then the system crashed again.

 

I will try again with the array started, pull down the diagnostics and repost.

 

@John_M thank you for the steps on the C-state setting. After I try these steps from trurl, I will look into the BIOS settings you suggested.

 

 

supergrid-diagnostics-20210329-1555.zip

4 hours ago, John_M said:

The Power Supply Idle Control is the one to change, if that indeed is your problem, though it won't do any harm anyway. Set it to "Typical Current Idle" instead of the default "Low Current Idle". You should find it under the AMD_CBS section of your BIOS.

 

Definitely check this in your BIOS. It wouldn't explain why you are having issues now, after running without any problems for the past year. But it definitely was needed to keep my Ryzen Unraid server from crashing, usually after running for an hour or so.

2 hours ago, mankey54 said:

@ trurl

Just some advice on using the forum.

 

To get that @ trurl to actually be a Notification for that user

you have to type the @

and then immediately without spaces continue typing the username

and then you have to actually select the matching username from the popup.

 

Like this:

@mankey54

2 minutes ago, ConnerVT said:

But it definitely was needed to keep my Ryzen Unraid server from crashing, usually after running for an hour or so.

 

That makes sense if you're still using a 1500X. I think this was fixed a long time ago and I've never needed it with 2000-series or later, but it does no harm and it's better than globally disabling C-states, as some people recommend.

 


Exactly. Though I've read some mixed opinions on whether it is 100% resolved in Zen+ and Zen 2. But that's to be expected on the Internet. As Power Supply Idle Control has no cost other than perhaps a negligible amount of extra power draw, it is worth checking off the list.

Edited by ConnerVT
stutter

@trurl I want to give an update. I've replaced my USB flash drive and rebuilt it following the instructions: a new install of Unraid 6.9.1, with all content from the config folder copied to the new drive. The configuration transferred over; however, upon starting the array and then a parity check, the system crashed and rebooted again. I was monitoring the syslog, but no errors were captured at the point of the crash.

 

I've done a couple of things since then, hoping to isolate the problem. Here they are in order, with results:

  1. rebuilt a new flash drive with the previous version, 6.8.3, copying over the config folder
    1. Result: the configuration was recognized. Upon starting the array and a parity check, the system crashed and rebooted. Same symptom.
  2. made the BIOS configuration change suggested by @John_M, who helped me identify the setting on my motherboard: Power Supply Idle Control set to "Typical Current Idle"
    1. Result: boot-up to Unraid was the same as step 1. This time I did not let the array start, and just let the system stabilize before enabling more features. About 1-2 hours later the system crashed and rebooted again.
  3. upgraded the BIOS. The version I was running was the initial release (F3 for the Gigabyte X570 motherboard), so I upgraded to the next release (F4).
    1. Result: boot-up to Unraid was the same as step 1. I did not let the array start and let the system stabilize. Again, after 1 hour the system crashed and rebooted.

 

Things I've tested so far:

  1. upgraded Unraid to 6.9.1
  2. downgraded Unraid to 6.8.3
  3. tested memory (Memtest86+)
  4. reseated the RAM and also moved it to another slot
  5. changed Power Supply Idle Control to "Typical Current Idle"
  6. cleared the BIOS (physically on the motherboard, touching a screwdriver across the two clear-CMOS points)
  7. upgraded the BIOS to the next version (following Gigabyte's procedure)

At this point I am out of configuration options, and I can only deduce that it is a hardware problem. I'm in the process of RMAing my motherboard, since the RAM tests seem successful. I don't have another system to transplant my hard drives into, so I will have to wait until I get a new motherboard.

 

However, if you have any other things to test, I am open to them.

 

 

  • JorgeB changed the title to [SOLVED] Unraid crashing upon starting up array
