
(SOLVED)[6.3.5 and 6.4.0_rc11i] New UnRAID server crashing every 2 Days


Kaninm


Hello all,

I'm hoping someone can offer some guidance to get my new UnRAID system stable. I just built a new Ryzen system for UnRAID with the following specs:

-Ryzen 1700X

-16GB ECC memory

-2 LSI HBA cards in IT mode

-12 hard disks, half new, half from my old system (SMART tests good)

-4 SSD cache drives (2 new Samsung EVOs and 2 used PNY drives)

 

Ever since I built it, it has been hard-locking roughly every 2 days. Sometimes I can reach the GUI for a couple of seconds; other times it doesn't show up on the network at all. It usually happens between 11 PM and 1 AM, and I have no scheduled tasks running at those times.

 

Memory tested good in memtest (although I only ran 1 pass). The system runs great apart from the crashing. I tried upgrading to the latest 6.4 build, but the problem persists.

The system runs headless, and I don't have a good way to hook up a monitor to see the crash.

 

I've attached the Troubleshooting mode logs. If anyone has any ideas, I'd really appreciate it.

FCPsyslog_tail.txt

syslog.txt

Link to comment

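Comparing your log with mine, it seems CPU:12 has an issue: your boot log reports a machine check error (a watchdog timeout) on that core, while mine brings up all 16 CPUs cleanly.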

 

Yours

Nov 18 07:15:00 MegaNAS kernel: Hierarchical SRCU implementation.
Nov 18 07:15:00 MegaNAS kernel: MCE: In-kernel MCE decoding enabled.
Nov 18 07:15:00 MegaNAS kernel: smp: Bringing up secondary CPUs ...
Nov 18 07:15:00 MegaNAS kernel: x86: Booting SMP configuration:
Nov 18 07:15:00 MegaNAS kernel: .... node  #0, CPUs:        #1  #2  #3  #4  #5  #6  #7  #8  #9 #10 #11 #12
Nov 18 07:15:00 MegaNAS kernel: mce: [Hardware Error]: Machine check events logged
Nov 18 07:15:00 MegaNAS kernel: [Hardware Error]: System Fatal error.
Nov 18 07:15:00 MegaNAS kernel: [Hardware Error]: CPU:12 (17:1:1) MC5_STATUS[-|UE|MiscV|PCC|AddrV|-|-|SyndV|TCC]: 0xbea0000000000108
Nov 18 07:15:00 MegaNAS kernel: [Hardware Error]: Error Addr: 0x0001ffff815c9472
Nov 18 07:15:00 MegaNAS kernel: [Hardware Error]: IPID: 0x000500b000000000, Syndrome: 0x000000004d000000
Nov 18 07:15:00 MegaNAS kernel: [Hardware Error]: Execution Unit Extended Error Code: 0
Nov 18 07:15:00 MegaNAS kernel: [Hardware Error]: Execution Unit Error: Watchdog timeout error.
Nov 18 07:15:00 MegaNAS kernel: [Hardware Error]: cache level: RESV, tx: GEN, mem-tx: GEN

Nov 18 07:15:00 MegaNAS kernel: #13 #14 #15
Nov 18 07:15:00 MegaNAS kernel: smp: Brought up 1 node, 16 CPUs

 

Mine

Nov 22 18:20:06 X370 kernel: Hierarchical SRCU implementation.
Nov 22 18:20:06 X370 kernel: MCE: In-kernel MCE decoding enabled.
Nov 22 18:20:06 X370 kernel: smp: Bringing up secondary CPUs ...
Nov 22 18:20:06 X370 kernel: x86: Booting SMP configuration:
Nov 22 18:20:06 X370 kernel: .... node  #0, CPUs:        #1  #2  #3  #4  #5  #6  #7  #8  #9 #10 #11 #12 #13 #14 #15
Nov 22 18:20:06 X370 kernel: smp: Brought up 1 node, 16 CPUs
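
If you want to check a saved syslog for the same kind of entry without reading it by eye, here is a rough Python sketch. It only assumes the line format shown in the excerpt above; the function names `find_mce_events` and `has_hardware_error` are made up for this example and are not part of any UnRAID tool.

```python
import re
import sys

# Matches the decoded machine-check status line as it appears in the excerpt, e.g.
# "[Hardware Error]: CPU:12 (17:1:1) MC5_STATUS[...]: 0xbea0000000000108"
# Assumption: other kernels or CPUs may format this line differently.
MCE_STATUS = re.compile(
    r"\[Hardware Error\]: CPU:(\d+) .*?(MC\d+_STATUS)\[.*?\]: (0x[0-9a-fA-F]+)"
)


def find_mce_events(lines):
    """Return a list of (cpu, bank, status) tuples, one per MCE status line."""
    events = []
    for line in lines:
        match = MCE_STATUS.search(line)
        if match:
            events.append((int(match.group(1)), match.group(2), match.group(3)))
    return events


def has_hardware_error(lines):
    """True if any line carries the kernel's "[Hardware Error]" tag."""
    return any("[Hardware Error]" in line for line in lines)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_mce.py syslog.txt
    with open(sys.argv[1], errors="replace") as handle:
        for cpu, bank, status in find_mce_events(handle):
            print(f"CPU {cpu}: {bank} = {status}")
```

Run against a log like the first one above, it should list CPU 12 with its MC5_STATUS value; a clean log like the second prints nothing.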

Link to comment
On 11/22/2017 at 9:15 PM, david279 said:

I know this bug well. It's the Ryzen bug. Follow the instructions here: https://forums.lime-technology.com/topic/61129-ryzen-freezes/?do=findComment&comment=599763

If that doesn't work, turn off C-states in the BIOS. I have a 1600X, so I know this struggle.


 

This seems to have solved the issue (4+ days stable so far at least)! Thank you for your help!

Link to comment

