September 14, 2023

Hello, I'm getting the following errors in my system log and was wondering if someone could help me understand what they point to as the cause. The system is a 13900K. I had 128 GB of memory installed but removed 2 sticks to see if that was part of the issue, so I'm currently running 64 GB.

Sep 14 12:21:40 Tower kernel: mce: [Hardware Error]: Machine check events logged
Sep 14 12:21:40 Tower kernel: mce: [Hardware Error]: Machine check events logged
Sep 14 12:21:40 Tower kernel: traps: smartctl[23418] trap invalid opcode ip:1498cb85ed3f sp:7ffe28c07ba0 error:0 in libc-2.37.so[1498cb7e8000+169000]
Sep 14 12:52:04 Tower kernel: mce_notify_irq: 2 callbacks suppressed
Sep 14 12:52:04 Tower kernel: mce: [Hardware Error]: Machine check events logged
Sep 14 12:52:04 Tower kernel: traps: smartctl[9764] trap invalid opcode ip:14aeabe6ea85 sp:7fffb4afdc00 error:0 in libc-2.37.so[14aeabd9c000+169000]
Sep 14 13:03:20 Tower kernel: mce: [Hardware Error]: Machine check events logged
Sep 14 13:52:50 Tower kernel: mce: [Hardware Error]: Machine check events logged
Sep 14 13:52:50 Tower kernel: mce: [Hardware Error]: Machine check events logged
Sep 14 14:23:13 Tower kernel: smartctl[11388]: segfault at 337 ip 00001479e99b11f8 sp 00007ffef67ffcd0 error 4 in libc-2.37.so[1479e993a000+169000] likely on CPU 3 (core 4, socket 0)
Sep 14 14:23:13 Tower kernel: Code: c0 0f 84 b3 00 00 00 48 89 c2 4c 8d 52 10 48 8b 42 10 4c 89 d7 48 c1 ef 0c 49 89 f9 49 31 c1 48 39 c7 74 be eb ac 0f 1f 40 00 <31> c0 be 10 00 00 00 41 bd 02 00 00 00 bb 20 00 00 00 4c 8d 44 c5
Sep 14 14:23:13 Tower kernel: smartctl[11415]: segfault at 4b ip 000014a8023515fc sp 00007ffdfd75f620 error 6 in libc-2.37.so[14a80227e000+169000] likely on CPU 3 (core 4, socket 0)
Sep 14 14:23:13 Tower kernel: Code: 4c 89 d7 be e0 00 00 00 e8 51 ca f2 ff 49 89 c2 48 85 c0 0f 84 6d 02 00 00 49 c7 45 08 e0 00 00 00 49 89 45 00 e9 f5 ea ff ff <48> 89 ca e9 1c f4 ff ff bf c8 03 00 00 48 89 74 24 10 4c 89 54 24
Sep 14 14:23:13 Tower kernel: mce: [Hardware Error]: Machine check events logged

I've been scratching my head over this for a couple of days, so any help would be appreciated.
September 16, 2023 Author

Thanks for the reply. I put 2 sticks in and tested those with memtest; they passed after about 2 hours. I moved the sticks to the other 2 slots and ran it again with all 4 sticks installed, and it passed as well. I'm rebooting into Unraid to see if I can find a pattern in when the issue comes up after the server starts. Anything else I should try?

Update: I swapped the G.Skill RAM for Corsair RAM and the "hardware error" appeared in the log once again. Could it be an issue with the processor or motherboard? Is there a way to diagnose that, or is it just a matter of swapping in new components and checking whether the issue persists?

Edited September 16, 2023 by Stanui: New information
September 17, 2023

18 hours ago, Stanui said: Could it be an issue with the processor or motherboard?

It can. I would first try with just 2 RAM sticks; if the errors are the same, try the other two. That would basically rule out a RAM issue.
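
For anyone trying to find a pattern in when these errors appear, as the author mentions above, a small script can tally the relevant syslog lines. The following is only a rough sketch in Python, not an Unraid tool: the function name `summarize` and the regular expressions are assumptions written against the message formats in the log excerpt posted in this thread, and other kernels may word these lines differently.

```python
import re
from collections import Counter

# Patterns are assumptions based on the log lines quoted in this thread.
MCE = re.compile(r"mce: \[Hardware Error\]: Machine check events logged")
CRASH = re.compile(
    r"kernel: (?:traps: )?(\w+)\[\d+\]:? (segfault|trap invalid opcode)"
)
CPU = re.compile(r"likely on CPU (\d+)")
# Captures month, day and hour, e.g. "Sep 14 12" from "Sep 14 12:21:40".
STAMP = re.compile(r"^(\w{3}\s+\d+ \d{2}):\d{2}:\d{2}")


def summarize(lines):
    """Count machine check events per hour, crashes per process, and CPU hints."""
    mce_per_hour = Counter()
    crashes = Counter()
    cpus = Counter()
    for line in lines:
        stamp = STAMP.match(line)
        hour = stamp.group(1) if stamp else "unknown"
        if MCE.search(line):
            mce_per_hour[hour] += 1
        crash = CRASH.search(line)
        if crash:
            # Key is (process name, kind of fault).
            crashes[(crash.group(1), crash.group(2))] += 1
        cpu = CPU.search(line)
        if cpu:
            cpus[int(cpu.group(1))] += 1
    return mce_per_hour, crashes, cpus
```

To use it, pass any iterable of log lines, for example an open file handle on a saved copy of the syslog. If the CPU counter keeps pointing at the same core number across many crashes, that would be one more hint toward the processor rather than the RAM, though it is not proof on its own.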