December 26, 2022

Hello all,

I've set up Unraid as a NAS and Plex server on some old hardware. The RAM sticks passed Memtest after about 4 hours, I pushed the CPU to try to make it overheat and it did fine, and I removed all extraneous drives. I reset the BIOS and everything works fine except for the restarts. They only seem to happen when the server is idle. I wrote the syslog to flash (attached), and Fix Common Problems tells me that I have MCEs.

ASRock AB350 Gaming K4
AMD Ryzen 5 1600 Six-Core @ 3200 MHz
BIOS 9/6/2017 <- currently updating
G.SKILL 16GB 2X8GB DDR4 3200 KIT

Today's errors in log:

Dec 25 21:13:21 Tower ntpd[1149]: kernel reports TIME_ERROR: 0x41: Clock Unsynchronized
Dec 26 11:21:18 Tower root: modprobe: ERROR: could not insert 'kvm_amd': Operation not supported
Dec 26 11:20:11 Tower kernel: mce: [Hardware Error]: Machine check events logged
Dec 26 11:20:11 Tower kernel: mce: [Hardware Error]: CPU 5: Machine Check: 0 Bank 5: bea0000000000108
Dec 26 11:20:11 Tower kernel: mce: [Hardware Error]: TSC 0 ADDR 1ffff814f28ea MISC d012000101000000 SYND 4d000000 IPID 500b000000000
Dec 26 11:20:11 Tower kernel: mce: [Hardware Error]: PROCESSOR 2:800f11 TIME 1672075185 SOCKET 0 APIC a microcode 8001129

I'd appreciate any help you can provide!

Attachments: tower-diagnostics-20221225-2311 (1).zip, syslog

Edited December 26, 2022 by samauger
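For anyone trying to read the Bank 5 status value in the log above, here is a minimal Python sketch that splits it into its flag bits. It assumes the standard x86 machine-check status layout (bits 57 to 63 as flags, low 16 bits as the error code); the function name and the dictionary are only illustrative, and it does not attempt to interpret the vendor-specific fields.

```python
# Flag bits in the top of a 64-bit machine-check bank status value.
# Bit positions follow the standard x86 MCA layout; the labels are the
# usual short names for those bits.
STATUS_FLAGS = {
    63: "VAL",    # register contains a valid error
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # error was not corrected by hardware
    60: "EN",     # reporting was enabled for this error
    59: "MISCV",  # MISC register holds valid data
    58: "ADDRV",  # ADDR register holds valid data
    57: "PCC",    # processor context may be corrupt
}


def decode_mci_status(status_hex: str) -> dict:
    """Split a status value (hex string) into set flags and error code."""
    value = int(status_hex, 16)
    flags = [name for bit, name in STATUS_FLAGS.items() if (value >> bit) & 1]
    return {"flags": flags, "error_code": value & 0xFFFF}


if __name__ == "__main__":
    decoded = decode_mci_status("bea0000000000108")
    print(decoded["flags"], hex(decoded["error_code"]))
```

For the value in this thread, UC and PCC are both set, meaning the hardware reported an uncorrected error with possibly corrupt processor context. That fits a machine that resets rather than one that logs an error and carries on.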
December 26, 2022 (Author)

I've updated the BIOS to the most recent firmware (7.40) and no longer see those errors in the syslog. Hopefully that will fix the issue.
December 27, 2022 (Author)

Updating the BIOS did not help. Here are the logs from around the last seemingly unprovoked reboot.

Dec 26 18:58:27 Tower kernel: br-177e8ea15a89: port 9(veth40b213a) entered disabled state
Dec 26 18:58:27 Tower kernel: device veth40b213a left promiscuous mode
Dec 26 18:58:27 Tower kernel: br-177e8ea15a89: port 9(veth40b213a) entered disabled state
Dec 26 18:58:27 Tower avahi-daemon[3891]: Withdrawing address record for fe80::c43e:a5ff:fe3e:2c10 on veth40b213a.
Dec 26 19:15:33 Tower webGUI: Successful login user root from 192.168.0.197
Dec 26 19:47:02 Tower root: Fix Common Problems Version 2022.12.18
Dec 26 19:47:08 Tower root: Fix Common Problems: Warning: Syslog mirrored to flash
Dec 26 20:47:01 Tower root: Fix Common Problems Version 2022.12.18
Dec 26 20:47:05 Tower root: Fix Common Problems: Warning: Syslog mirrored to flash
Dec 26 21:21:25 Tower webGUI: Successful login user root from 192.168.0.197
Dec 26 21:47:01 Tower root: Fix Common Problems Version 2022.12.18
Dec 26 21:47:09 Tower root: Fix Common Problems: Warning: Syslog mirrored to flash
Dec 26 22:47:01 Tower root: Fix Common Problems Version 2022.12.18
Dec 26 22:47:05 Tower root: Fix Common Problems: Warning: Syslog mirrored to flash
Dec 26 23:47:01 Tower root: Fix Common Problems Version 2022.12.18
Dec 26 23:47:04 Tower root: Fix Common Problems: Warning: Syslog mirrored to flash
Dec 27 00:47:01 Tower root: Fix Common Problems Version 2022.12.18
Dec 27 00:47:05 Tower root: Fix Common Problems: Warning: Syslog mirrored to flash
Dec 27 01:47:01 Tower root: Fix Common Problems Version 2022.12.18
Dec 27 01:47:04 Tower root: Fix Common Problems: Warning: Syslog mirrored to flash
Dec 27 01:53:40 Tower kernel: Linux version 5.19.17-Unraid (root@Develop) (gcc (GCC) 12.2.0, GNU ld version 2.39-slack151) #2 SMP PREEMPT_DYNAMIC Wed Nov 2 11:54:15 PDT 2022
Dec 27 01:53:40 Tower kernel: Command line: BOOT_IMAGE=/bzimage initrd=/bzroot
Dec 27 01:53:40 Tower kernel: x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
Dec 27 01:53:40 Tower kernel: x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
Dec 27 01:53:40 Tower kernel: x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
Dec 27 01:53:40 Tower kernel: x86/fpu: xstate_offset[2]: 576, xstate_sizes[2]: 256
Dec 27 01:53:40 Tower kernel: x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'compacted' format.
Dec 27 01:53:40 Tower kernel: signal: max sigframe size: 1776
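If it helps anyone chasing the same symptom, here is a rough Python sketch for pulling reboot points out of a syslog mirrored to flash like the one above. It assumes every line starts with the 15-character `Mon DD HH:MM:SS` prefix and that `kernel: Linux version` is the first line of a fresh boot. The function names are made up for illustration, and the year has to be passed in because syslog does not record it.

```python
from datetime import datetime

BOOT_MARKER = "kernel: Linux version"  # assumed first kernel line of a boot


def parse_timestamp(line: str, year: int) -> datetime:
    """Parse the 'Mon DD HH:MM:SS' prefix; the year is supplied by the caller."""
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def find_reboots(lines, year):
    """Return (last_entry_before_boot, boot_time) pairs, one per boot marker."""
    reboots = []
    previous = None
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        if BOOT_MARKER in line and previous is not None:
            reboots.append((parse_timestamp(previous, year),
                            parse_timestamp(line, year)))
        previous = line
    return reboots


if __name__ == "__main__":
    sample = [
        "Dec 27 01:47:04 Tower root: Fix Common Problems: Warning: Syslog mirrored to flash",
        "Dec 27 01:53:40 Tower kernel: Linux version 5.19.17-Unraid",
    ]
    for last_entry, boot in find_reboots(sample, 2022):
        print(f"last entry {last_entry}, booted {boot}, gap {boot - last_entry}")
```

In the log above, the last entry before the boot marker is an hourly Fix Common Problems run with no shutdown messages after it, so the gap only bounds when the reset happened. The absence of any shutdown lines is what points to an abrupt reset rather than a clean restart.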
December 27, 2022 (Community Expert)

Make sure you take a look here; overclocked RAM has been known to sometimes do that.
December 27, 2022 (Author, Solution)

Thanks. It wasn't overclocked, but I've knocked it down to 1866 and turned off C-states. Hopefully that fixes it! Strangely, it only seems to happen when the machine is idling.

Update: now 3+ days of uptime without any issues! Thanks for suggesting the RAM fix. Hard to say if it was that or the C-state issue, but I appreciate the assistance nevertheless.

Edited December 31, 2022 by samauger