December 18, 2017

Disclaimer: I have not done any due diligence or troubleshooting on this issue.

Late last week (apparently just before -rc16b was released) I upgraded my server from -rc14 to -rc15e and noticed a couple of days later that it had crashed and rebooted. After I restarted the array and started a parity check, it ran fine for about 4 hours and then rebooted again. When I looked in the syslog, I noticed these errors, which were not there previously:

Quote
Dec 17 04:14:11 Tower kernel: smpboot: CPU0: AMD A4-4000 APU with Radeon(tm) HD Graphics (family: 0x15, model: 0x13, stepping: 0x1)
Dec 17 04:14:11 Tower kernel: mce: [Hardware Error]: Machine check events logged
Dec 17 04:14:11 Tower kernel: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 4: b2000010000b0c0f
Dec 17 04:14:11 Tower kernel: mce: [Hardware Error]: TSC 0
Dec 17 04:14:11 Tower kernel: mce: [Hardware Error]: PROCESSOR 2:610f31 TIME 1513502032 SOCKET 0 APIC 0 microcode 6001119
Dec 17 04:14:11 Tower kernel: Performance Events: Fam15h core perfctr, AMD PMU driver.

I didn't have time to dig into it, so I rolled back to -rc14; the errors went away and things have been stable since. My first guess is that it's related to the kernel change from 4.13 in -rc14 to 4.14 in -rc15b, but I have no proof.

Has anyone else seen an issue with the 4.14 kernel on an older AMD processor?

Thanks.
Dave

tower-syslog-20171217-0759.zip
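The status word on the "Bank 4" line can be partly decoded by hand while no decoder tool is available. The following is a minimal Python sketch, not an official tool: the helper names `parse_mce_line` and `decode_status` are made up for illustration, the flag positions are the architectural machine-check status bits (63 down to 57), and treating bits 20:16 as the extended error code is an assumption based on how AMD family 15h parts are usually documented.

```python
import re

# Matches the kernel line: "CPU 0: Machine Check: 0 Bank 4: b2000010000b0c0f"
MCE_LINE = re.compile(r"Machine Check: \d+ Bank (\d+): ([0-9a-fA-F]+)")

# Architectural flag bits in the top byte of a machine-check status register.
FLAG_BITS = {
    "valid": 63,
    "overflow": 62,
    "uncorrected": 61,
    "enabled": 60,
    "misc_valid": 59,
    "addr_valid": 58,
    "context_corrupt": 57,
}


def parse_mce_line(line):
    """Return (bank, status) from an mce syslog line, or None if it does not match."""
    match = MCE_LINE.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2), 16)


def decode_status(status):
    """Split a 64-bit status word into its flag bits and error-code fields."""
    decoded = {name: bool((status >> bit) & 1) for name, bit in FLAG_BITS.items()}
    decoded["error_code"] = status & 0xFFFF
    # Assumption: on AMD family 15h the extended error code sits in bits 20:16.
    decoded["ext_error_code"] = (status >> 16) & 0x1F
    return decoded


if __name__ == "__main__":
    line = ("Dec 17 04:14:11 Tower kernel: mce: [Hardware Error]: "
            "CPU 0: Machine Check: 0 Bank 4: b2000010000b0c0f")
    bank, status = parse_mce_line(line)
    print("bank", bank)
    for name, value in decode_status(status).items():
        print(name, hex(value) if isinstance(value, int) and not isinstance(value, bool) else value)
```

For the value in the log above, the valid, uncorrected, enabled, and context-corrupt flags come out set, with an error code of 0x0c0f. That is consistent with an error the hardware could not correct, but the sketch does not say which component reported it.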
December 18, 2017

19 minutes ago, dave.friant said:
Has anyone else seen an issue with the 4.14 kernel on an older AMD processor?

I'm running an A8-6600K, which is actually older (by release date) than your A4.

MCE errors tend to be hardware related rather than software related. The usual advice is to have the NerdPack plugin install mcelog and then, after the next MCE error appears, either post the output of mcelog or run a scan with the Fix Common Problems plugin, which includes that output in its logging.
December 18, 2017
Author

Thanks. I installed mcelog, but I get this when I try to run it:

mcelog: ERROR: AMD Processor family 21: mcelog does not support this processor. Please use the edac_mce_amd module instead.
CPU is unsupported

I've installed Fix Common Problems as well. I'll try reinstalling either -rc15e or -rc16b later and see what I get.
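Family 21 in the mcelog message is the same CPU family the syslog reports as 0x15. To see whether the edac_mce_amd module that mcelog recommends is currently loaded, one option is to look for it in /proc/modules. The sketch below is only an illustration: `is_module_loaded` is a hypothetical helper name, and the check reports False for a driver compiled directly into the kernel, because built-in code is not listed in /proc/modules.

```python
from pathlib import Path


def is_module_loaded(name, modules_text):
    """Return True if `name` appears as a module in /proc/modules-style text."""
    # The kernel lists module names with underscores, so normalise dashes.
    wanted = name.replace("-", "_")
    for line in modules_text.splitlines():
        fields = line.split()
        # The first whitespace-separated field on each line is the module name.
        if fields and fields[0] == wanted:
            return True
    return False


if __name__ == "__main__":
    proc_modules = Path("/proc/modules")
    if proc_modules.exists():
        loaded = is_module_loaded("edac_mce_amd", proc_modules.read_text())
        print("edac_mce_amd loaded:", loaded)
    else:
        print("/proc/modules is not available on this system")
```

If the module is not shipped with the running kernel at all, this simply reports that it is not loaded.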
December 18, 2017

20 minutes ago, dave.friant said:
Please use the edac_mce_amd module instead.

I actually thought (obviously mistakenly) that the module had been added a couple of revisions ago. I'll check it out tonight, as I also run only AMD chips.
December 19, 2017

@limetech, can we get the edac_mce_amd module added in?

Edited December 19, 2017 by Squid