stinger303 Posted August 21, 2020
Hello, my log is filling up with Machine Check Event errors, and I'm not sure what the issue is. Here's my log: syslog.txt
trurl Posted August 21, 2020
Have you done memtest?
stinger303 Posted August 22, 2020
"Have you done memtest?"
Not for about 5 years, since I put the server together. Should I run one?
ChatNoir Posted August 22, 2020
It would be a good place to start, so you can rule memory in or out. If you are running ECC RAM, deactivate ECC in the BIOS before running memtest. Can you post your diagnostics in your next post? (Tools / Diagnostics) They will provide more data for analysis.
Squid Posted August 22, 2020
Definitely memory
Aug 21 15:58:11 Tower1 kernel: EDAC sbridge MC0: HANDLING MCE MEMORY ERROR
Aug 21 15:58:11 Tower1 kernel: EDAC sbridge MC0: CPU 20: Machine Check Event: 0 Bank 5: cc026a8000010092
Aug 21 15:58:11 Tower1 kernel: EDAC sbridge MC0: TSC 0
Aug 21 15:58:11 Tower1 kernel: EDAC sbridge MC0: ADDR 85a848b80
Aug 21 15:58:11 Tower1 kernel: EDAC sbridge MC0: MISC 2440169686
Aug 21 15:58:11 Tower1 kernel: EDAC sbridge MC0: PROCESSOR 0:206d7 TIME 1598047091 SOCKET 0 APIC 9
Aug 21 15:58:11 Tower1 kernel: EDAC MC0: 2474 CE memory read error on CPU_SrcID#0_Ha#0_Chan#2_DIMM#0 (channel:2 slot:0 page:0x85a848 offset:0xb80 grain:32 syndrome:0x0 - OVERFLOW area:DRAM err_code:0001:0092 socket:0 ha:0 channel_mask:4 rank:1)
Your BIOS' event log may have more information on the exact location of the affected DIMM (dimm 0, channel 2).
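For anyone reading a report like the one above, here is a minimal Python sketch of how the fields in the final EDAC line can be pulled apart. It assumes 4 KiB pages (so the physical address is the page number shifted left by 12 bits, plus the offset) and that the number before "CE" is the corrected-error count for that report. parse_edac_line is a hypothetical helper name, not part of any existing tool.

```python
import re

# Sample line copied from the log posted above.
LINE = (
    "Aug 21 15:58:11 Tower1 kernel: EDAC MC0: 2474 CE memory read error on "
    "CPU_SrcID#0_Ha#0_Chan#2_DIMM#0 (channel:2 slot:0 page:0x85a848 "
    "offset:0xb80 grain:32 syndrome:0x0 - OVERFLOW area:DRAM "
    "err_code:0001:0092 socket:0 ha:0 channel_mask:4 rank:1)"
)

PAGE_SHIFT = 12  # assumption: 4 KiB pages


def parse_edac_line(line):
    """Hypothetical helper: extract location fields from one EDAC summary line."""
    head = re.search(r"EDAC MC(\d+): (\d+) (CE|UE) .* on (\S+) \((.*)\)", line)
    if head is None:
        return None  # not an EDAC summary line
    mc, count, kind, label, details = head.groups()
    # The parenthesised part is a run of key:value pairs separated by spaces.
    fields = dict(re.findall(r"(\w+):(\S+)", details))
    page = int(fields["page"], 16)
    offset = int(fields["offset"], 16)
    return {
        "mc": int(mc),
        "count": int(count),  # assumed to be the error count for this report
        "kind": kind,         # CE = corrected, UE = uncorrected
        "label": label,
        "channel": int(fields["channel"]),
        "slot": int(fields["slot"]),
        "rank": int(fields["rank"]),
        "address": (page << PAGE_SHIFT) | offset,
    }


info = parse_edac_line(LINE)
print(info["label"], "channel", info["channel"], "slot", info["slot"])
print("address", hex(info["address"]))
```

For this sample, the reconstructed address agrees with the ADDR 85a848b80 value in the same log. The channel and slot numbers are the memory controller's view and may not map directly onto the slot labels printed on the motherboard.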
stinger303 Posted August 27, 2020
I'm running the memtest now. The BIOS log only reported a communication error; it didn't say anything specific about RAM.
ghost82 Posted August 27, 2020 (edited)
I'm not sure I would say it's 100% related to RAM. The same errors happened in my build, and after replacing 2 RAM modules I found that the problem was one CPU with some scratched and oxidized pins.
Start with the RAM: leave only one module installed and run the OS, then check the log. If everything is good, add a second module, and so on. If you still get errors with a single module, even after moving it to a different slot, you will have to look for the failure elsewhere, most probably the CPU or the motherboard.
PS: In my case the location given in the logs (such as DIMM 0, channel 2) did not match the motherboard user manual, so I suggest checking the RAM manually.
Edited August 27, 2020 by ghost82
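When swapping modules one at a time as described above, it can help to total the corrected errors per DIMM label after each boot and compare runs. A rough Python sketch follows, assuming the syslog uses the same "EDAC MC0: <count> CE ... on <label> (" layout as the line posted earlier in this thread. tally_corrected_errors is a hypothetical name, and the sample lines are copied from that log.

```python
import re
from collections import Counter

# Matches only the EDAC summary lines, not the "EDAC sbridge" detail lines.
CE_RE = re.compile(r"EDAC MC\d+: (\d+) CE .* on (\S+) \(")


def tally_corrected_errors(lines):
    """Hypothetical helper: sum corrected-error counts per DIMM label."""
    totals = Counter()
    for line in lines:
        match = CE_RE.search(line)
        if match:
            totals[match.group(2)] += int(match.group(1))
    return totals


# Two lines from the log in this thread; only the second is a summary line.
sample = [
    "Aug 21 15:58:11 Tower1 kernel: EDAC sbridge MC0: HANDLING MCE MEMORY ERROR",
    "Aug 21 15:58:11 Tower1 kernel: EDAC MC0: 2474 CE memory read error on "
    "CPU_SrcID#0_Ha#0_Chan#2_DIMM#0 (channel:2 slot:0 page:0x85a848 "
    "offset:0xb80 grain:32 syndrome:0x0 - OVERFLOW area:DRAM "
    "err_code:0001:0092 socket:0 ha:0 channel_mask:4 rank:1)",
]

totals = tally_corrected_errors(sample)
for label, count in totals.most_common():
    print(label, count)
```

The function accepts any iterable of lines, so an open syslog file can be passed in directly. If the same label keeps accumulating errors no matter which module sits in that slot, that would point away from the module itself, which is consistent with the advice above.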
stinger303 Posted September 21, 2020
Thanks for everyone's help. I ran the memtest, which took about 4 days to complete. It didn't come up with any errors. I ended up just pulling out all the RAM and reseating it. I haven't had any more machine check errors since. Thanks