Kaninm Posted November 20, 2017

Hello all, I'm hoping someone can offer some guidance to get my new unRAID system stable. I just built a new Ryzen system for unRAID with the following specs:

-Ryzen 1700X
-16GB ECC memory
-2 LSI HBA cards in IT mode
-12 hard disks, half new, half from my old system (SMART tests good)
-4 SSD cache drives (2 new Samsung EVOs and 2 used PNY drives)

Ever since I built it, it has been hard locking roughly every 2 days. Sometimes I can get to the GUI for a couple of seconds; other times it doesn't show up on the network at all. It usually happens between 11PM and 1AM, and I have no scheduled tasks running at those times. Memory tested good in memtest (although I only did 1 pass). The system runs great outside of the crashing issue. I tried upgrading to the latest 6.4 build, but the problem persists. The system runs headless and I don't have a good way to hook up a monitor to see the crash.

I attached the Troubleshooting mode logs. If anyone has any ideas, I'd really appreciate it.

Attachments: FCPsyslog_tail.txt, syslog.txt
Vr2Io Posted November 22, 2017

Comparing your log with mine, it seems CPU 12 has an issue: your boot log reports a fatal machine check on that core while the secondary CPUs are being brought up, and mine does not.

Yours:

Nov 18 07:15:00 MegaNAS kernel: Hierarchical SRCU implementation.
Nov 18 07:15:00 MegaNAS kernel: MCE: In-kernel MCE decoding enabled.
Nov 18 07:15:00 MegaNAS kernel: smp: Bringing up secondary CPUs ...
Nov 18 07:15:00 MegaNAS kernel: x86: Booting SMP configuration:
Nov 18 07:15:00 MegaNAS kernel: .... node #0, CPUs: #1 #2 #3 #4 #5 #6 #7 #8 #9 #10 #11 #12
Nov 18 07:15:00 MegaNAS kernel: mce: [Hardware Error]: Machine check events logged
Nov 18 07:15:00 MegaNAS kernel: [Hardware Error]: System Fatal error.
Nov 18 07:15:00 MegaNAS kernel: [Hardware Error]: CPU:12 (17:1:1) MC5_STATUS[-|UE|MiscV|PCC|AddrV|-|-|SyndV|TCC]: 0xbea0000000000108
Nov 18 07:15:00 MegaNAS kernel: [Hardware Error]: Error Addr: 0x0001ffff815c9472
Nov 18 07:15:00 MegaNAS kernel: [Hardware Error]: IPID: 0x000500b000000000, Syndrome: 0x000000004d000000
Nov 18 07:15:00 MegaNAS kernel: [Hardware Error]: Execution Unit Extended Error Code: 0
Nov 18 07:15:00 MegaNAS kernel: [Hardware Error]: Execution Unit Error: Watchdog timeout error.
Nov 18 07:15:00 MegaNAS kernel: [Hardware Error]: cache level: RESV, tx: GEN, mem-tx: GEN
Nov 18 07:15:00 MegaNAS kernel: #13 #14 #15
Nov 18 07:15:00 MegaNAS kernel: smp: Brought up 1 node, 16 CPUs

Mine:

Nov 22 18:20:06 X370 kernel: Hierarchical SRCU implementation.
Nov 22 18:20:06 X370 kernel: MCE: In-kernel MCE decoding enabled.
Nov 22 18:20:06 X370 kernel: smp: Bringing up secondary CPUs ...
Nov 22 18:20:06 X370 kernel: x86: Booting SMP configuration:
Nov 22 18:20:06 X370 kernel: .... node #0, CPUs: #1 #2 #3 #4 #5 #6 #7 #8 #9 #10 #11 #12 #13 #14 #15
Nov 22 18:20:06 X370 kernel: smp: Brought up 1 node, 16 CPUs
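For anyone who wants to check the flags in that MC5_STATUS line by hand, here is a rough Python sketch that decodes the raw status value into the named bits. The bit positions are my assumption, taken from the AMD MCA_STATUS layout as the Linux kernel defines it (Val=63, Over=62, UC=61, En=60, MiscV=59, AddrV=58, PCC=57, TCC=55, SyndV=53, Deferred=44, Poison=43); the function names are made up for illustration and are not part of any tool.

```python
# Assumed bit positions of the AMD MCA_STATUS register (not taken from this thread).
MCA_STATUS_FLAGS = {
    63: "Val",       # a valid error is logged
    62: "Over",      # an earlier error was overwritten
    61: "UC",        # uncorrected error (printed as "UE" by the kernel)
    60: "En",        # error reporting was enabled
    59: "MiscV",     # MISC register is valid
    58: "AddrV",     # address register is valid
    57: "PCC",       # processor context corrupt
    55: "TCC",       # task context corrupt
    53: "SyndV",     # syndrome register is valid
    44: "Deferred",  # deferred error
    43: "Poison",    # poisoned data consumed
}


def decode_mca_status(status):
    """Return the names of all flag bits set in a raw MCA status value."""
    return [
        name
        for bit, name in sorted(MCA_STATUS_FLAGS.items(), reverse=True)
        if (status >> bit) & 1
    ]


def extended_error_code(status):
    """Return bits 21:16 of the status value (assumed extended error code field)."""
    return (status >> 16) & 0x3F


if __name__ == "__main__":
    raw = 0xBEA0000000000108  # value from the MegaNAS log above
    print(decode_mca_status(raw))
    print(extended_error_code(raw))
```

With the value from the log, the set flags line up with what the kernel printed (UE, MiscV, PCC, AddrV, SyndV, TCC), and the extended error code comes out as 0, matching the "Extended Error Code: 0" line.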
david279 Posted November 23, 2017

I know this bug well. It's the Ryzen bug. Follow the instructions here: https://forums.lime-technology.com/topic/61129-ryzen-freezes/?do=findComment&comment=599763

If that doesn't work, turn off C-states in the BIOS. I have a 1600X, so I know this struggle.
david279 Posted November 23, 2017

Also, if you search for "0xbea0000000000108 AMD bug", you'll see it's not just an unRAID bug; people all over are having issues with this bug on Linux.
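If you want to pull the status code out of your own syslog to search for it, something like the following Python sketch should work. It assumes the kernel line has the same shape as the one in the MegaNAS log earlier in this thread; the function name and the regular expression are my own, so adjust them if your log looks different.

```python
import re

# Matches lines shaped like:
#   ... [Hardware Error]: CPU:12 (17:1:1) MC5_STATUS[...]: 0xbea0000000000108
# The pattern is an assumption based on the log posted in this thread.
MCE_LINE = re.compile(
    r"CPU:(\d+) \([^)]*\) MC(\d+)_STATUS\[[^\]]*\]: (0x[0-9a-fA-F]+)"
)


def find_mce_statuses(lines):
    """Return (cpu, bank, status) tuples for every MCE status line found."""
    results = []
    for line in lines:
        match = MCE_LINE.search(line)
        if match:
            cpu, bank, status = match.groups()
            results.append((int(cpu), int(bank), int(status, 16)))
    return results


if __name__ == "__main__":
    sample = [
        "Nov 18 07:15:00 MegaNAS kernel: [Hardware Error]: System Fatal error.",
        "Nov 18 07:15:00 MegaNAS kernel: [Hardware Error]: CPU:12 (17:1:1) "
        "MC5_STATUS[-|UE|MiscV|PCC|AddrV|-|-|SyndV|TCC]: 0xbea0000000000108",
    ]
    for cpu, bank, status in find_mce_statuses(sample):
        print(f"CPU {cpu}, bank {bank}, status {status:#018x}")
```

To run it against a real file, pass an open file object instead of the sample list, for example `find_mce_statuses(open("syslog.txt"))`.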
Kaninm (Author) Posted November 27, 2017

On 11/22/2017 at 9:15 PM, david279 said:
I know this bug well. It's the Ryzen bug. Follow the instructions here: https://forums.lime-technology.com/topic/61129-ryzen-freezes/?do=findComment&comment=599763 If that doesn't work, turn off C-states in the BIOS. I have a 1600X, so I know this struggle.

This seems to have solved the issue (4+ days stable so far, at least)! Thank you for your help!