
Unraid 6.9.2 Crashes constantly even on different hardware



Hi all,

 

I've had a pretty rough go with Unraid so far. I originally built a brand-new box with a Ryzen 1600, 16GB RAM, and 3x WD Red 4TB HDDs (1 parity). When I built that box and booted Unraid, it was extremely unstable; I rarely got more than 5 minutes of uptime. I found the FAQ about Ryzen and made all the BIOS tweaks, including disabling C-states. After that the crashes became random, roughly every couple of hours, and no amount of further tweaking could stop them. I turned on all the logging I could find, including to a remote rsyslog server as well as mirrored to the flash drive, but nothing of interest ever showed up in the log.

 

So I decided to buy a new i5, motherboard and RAM, and sell the Ryzen since it has known issues. Annoyingly, the crashes continue with the same symptoms: nothing in the logs at all, and they mostly happen around the 2-2.5 hour mark. The parity disk never has time to finish rebuilding and keeps starting from 0.

 

Any ideas? I've attached the syslog even though it's not very useful.

 

syslog tower-diagnostics-20220413-1149.zip

1 hour ago, JorgeB said:

Nothing obvious, are you still using any hardware from the old build, like PSU, RAM, etc?

Just the PSU and the HDDs. I'd be surprised if it was the PSU, as it's a Corsair that I took from another perfectly running (non-Unraid) server.

 

Having said that, I've suddenly hit 8 hours of uptime, after about 5 or 6 crashes so far, so maybe it's sorted itself out? I'll have to see how it goes overnight.

52 minutes ago, lowsydawg said:

so maybe it's sorted itself out?

Hopefully, but unlikely. If it continues to crash, you can try booting the server in safe mode with all Docker containers and VMs disabled, and let it run as a basic NAS for a few days. If it still crashes, it's likely a hardware problem; if it doesn't, start turning the other services back on one by one.

19 hours ago, JorgeB said:

Hopefully, but unlikely. If it continues to crash, you can try booting the server in safe mode with all Docker containers and VMs disabled, and let it run as a basic NAS for a few days. If it still crashes, it's likely a hardware problem; if it doesn't, start turning the other services back on one by one.

Yeah, it crashed after about 21h. I changed some BIOS settings and updated the BIOS, and it crashed again after about 5h. I got this in the logs, which seems a bit more interesting than before:

 

Apr 14 10:26:50 Tower nmbd[2924]: [2022/04/14 10:26:50.053692,  0] ../../source3/nmbd/nmbd_become_lmb.c:397(become_local_master_stage2)
Apr 14 10:26:50 Tower nmbd[2924]:   *****
Apr 14 10:26:50 Tower nmbd[2924]:   
Apr 14 10:26:50 Tower nmbd[2924]:   Samba name server TOWER is now a local master browser for workgroup WORKGROUP on subnet 172.17.0.1
Apr 14 10:26:50 Tower nmbd[2924]:   
Apr 14 10:26:50 Tower nmbd[2924]:   *****
Apr 14 10:26:57 Tower ntpd[1920]: kernel reports TIME_ERROR: 0x41: Clock Unsynchronized
Apr 14 10:27:25 Tower kernel: veth070eade: renamed from eth0
Apr 14 10:27:25 Tower kernel: docker0: port 1(veth9ce5876) entered disabled state
Apr 14 10:27:39 Tower kernel: docker0: port 1(veth9ce5876) entered disabled state
Apr 14 10:27:39 Tower kernel: device veth9ce5876 left promiscuous mode
Apr 14 10:27:39 Tower kernel: docker0: port 1(veth9ce5876) entered disabled state
Apr 14 12:07:29 Tower webGUI: Successful login user root from 192.168.100.194
Apr 14 12:07:57 Tower init: Switching to runlevel: 0
Apr 14 12:07:57 Tower init: Trying to re-exec init
Apr 14 12:07:59 Tower nginx: 2022/04/14 12:07:59 [alert] 4072#4072: *19490 open socket #20 left in connection 8
Apr 14 12:07:59 Tower nginx: 2022/04/14 12:07:59 [alert] 4072#4072: aborting
Apr 14 12:09:08 Tower kernel: Linux version 5.10.28-Unraid (root@Develop) (gcc (GCC) 9.3.0, GNU ld version 2.33.1-slack15) #1 SMP Wed Apr 7 08:23:18 PDT 2021
Apr 14 12:09:08 Tower kernel: Command line: BOOT_IMAGE=/bzimage initrd=/bzroot,/bzroot-gui unraidsafemode
Apr 14 12:09:08 Tower kernel: x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
Apr 14 12:09:08 Tower kernel: x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
Apr 14 12:09:08 Tower kernel: x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
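When hunting through a long mirrored syslog, it can help to pull out only the lines logged just before each new boot. The following is a minimal Python sketch, assuming the syslog (from the flash mirror or the remote rsyslog server) is available as a plain text file; the function name `lines_before_boots` is hypothetical, and it treats the kernel's `Linux version` line, as seen in the excerpt above, as the start of a new boot. Lines such as `init: Switching to runlevel: 0` right before a boot suggest an orderly shutdown, whereas a hard crash typically leaves no shutdown messages at all.

```python
import re
from collections import deque

# Assumption: each boot starts with a kernel line like
# "Tower kernel: Linux version 5.10.28-Unraid ...", as in the excerpt above.
BOOT_MARKER = re.compile(r"kernel: Linux version ")


def lines_before_boots(lines, context=5):
    """Hypothetical helper: for every boot marker found, return the last
    `context` lines that were logged before it (fewer if the log is short)."""
    recent = deque(maxlen=context)  # rolling window of the most recent lines
    results = []
    for line in lines:
        if BOOT_MARKER.search(line):
            results.append(list(recent))  # snapshot what preceded this boot
            recent.clear()
        else:
            recent.append(line.rstrip("\n"))
    return results


if __name__ == "__main__":
    import sys

    # Usage: python last_lines.py /path/to/syslog
    with open(sys.argv[1], errors="replace") as f:
        for i, chunk in enumerate(lines_before_boots(f), start=1):
            print(f"--- before boot #{i} ---")
            print("\n".join(chunk) or "(nothing logged)")
```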

 

I've booted into safemode now and will see what happens.

 

EDIT: Sigh, it crashed after only a couple of hours while in safe mode. Could it have anything to do with the Coral chip I have plugged in? I might try unplugging that next.

Edited by lowsydawg