February 10, 2023
Hardware errors in the syslog. If anyone can help decipher them, it would be appreciated. Looks to me like RAM had a bit of a hiccup. https://pastebin.com/peJvDPSP
Here are the parts I found most concerning. The server runs on hardware less than 6 months old: an AMD EPYC 7302 with ECC RAM on a Supermicro H11SSL-i.
Feb 9 21:24:27 unRAID-PLEX kernel: [Hardware Error]: event severity: fatal
Feb 9 21:24:27 unRAID-PLEX kernel: [Hardware Error]: Error 0, type: fatal
Feb 9 21:24:27 unRAID-PLEX kernel: [Hardware Error]: fru_text: DIMM Locate: P0E0
Feb 9 21:24:27 unRAID-PLEX kernel: [Hardware Error]: section_type: memory error
Feb 9 21:24:27 unRAID-PLEX kernel: [Hardware Error]: error_status: Storage error in DRAM memory (0x0000000000040400)
Feb 9 21:24:27 unRAID-PLEX kernel: [Hardware Error]: physical_address: 0x0000000489573800
Feb 9 21:24:27 unRAID-PLEX kernel: [Hardware Error]: node:0 card:6 module:0 rank:0 bank:2 row:17301 column:128
Feb 9 21:24:27 unRAID-PLEX kernel: [Hardware Error]: Error 1, type: recoverable
Feb 9 21:24:27 unRAID-PLEX kernel: [Hardware Error]: fru_text: ProcessorError
Feb 9 21:24:27 unRAID-PLEX kernel: [Hardware Error]: section_type: IA32/X64 processor error
Feb 9 21:24:27 unRAID-PLEX kernel: [Hardware Error]: Local APIC_ID: 0x22
Feb 9 21:24:27 unRAID-PLEX kernel: [Hardware Error]: CPUID Info:
Feb 9 21:24:27 unRAID-PLEX kernel: [Hardware Error]: 00000000: 00830f10 00000000 22200800 00000000
Feb 9 21:24:27 unRAID-PLEX kernel: [Hardware Error]: 00000010: 76d8320b 00000000 178bfbff 00000000
Feb 9 21:24:27 unRAID-PLEX kernel: [Hardware Error]: 00000020: 00000000 00000000 00000000 00000000
Feb 9 21:24:27 unRAID-PLEX kernel: [Hardware Error]: Error Information Structure 0:
Feb 9 21:24:27 unRAID-PLEX kernel: [Hardware Error]: Error Structure Type: cache error
Feb 9 21:24:27 unRAID-PLEX kernel: [Hardware Error]: Check Information: 0x000000003c4d00f7
Feb 9 21:24:27 unRAID-PLEX kernel: [Hardware Error]: Transaction Type: 1, Data Access
Feb 9 21:24:27 unRAID-PLEX kernel: [Hardware Error]: Operation: 3, data read
Feb 9 21:24:27 unRAID-PLEX kernel: [Hardware Error]: Level: 1
Feb 9 21:24:27 unRAID-PLEX kernel: [Hardware Error]: Uncorrected: true
Feb 9 21:24:27 unRAID-PLEX kernel: [Hardware Error]: Precise IP: true
Feb 9 21:24:27 unRAID-PLEX kernel: usb 9-2: new high-speed USB device number 2 using xhci_hcd
Feb 9 21:24:27 unRAID-PLEX kernel: [Hardware Error]: Restartable IP: true
Feb 9 21:24:27 unRAID-PLEX kernel: usb 7-1: new full-speed USB device number 2 using xhci_hcd
Feb 9 21:24:27 unRAID-PLEX kernel: [Hardware Error]: Overflow: true
Feb 9 21:24:27 unRAID-PLEX kernel: [Hardware Error]: Instruction Pointer: 0x0000000000000011
Feb 9 21:24:27 unRAID-PLEX kernel: [Hardware Error]: Context Information Structure 0:
Feb 9 21:24:27 unRAID-PLEX kernel: [Hardware Error]: Register Context Type: MSR Registers (Machine Check and other MSRs)
Feb 9 21:24:27 unRAID-PLEX kernel: [Hardware Error]: Register Array Size: 0x0050
Feb 9 21:24:27 unRAID-PLEX kernel: [Hardware Error]: MSR Address: 0xc0002001
Feb 9 21:24:27 unRAID-PLEX kernel: BUG: using smp_processor_id() in preemptible [00000000] code: swapper/0/1
Feb 9 21:24:27 unRAID-PLEX kernel: caller is mce_setup+0x29/0xea
Feb 9 21:24:27 unRAID-PLEX kernel: CPU: 1 PID: 1 Comm: swapper/0 Not tainted 5.19.17-Unraid #2
Feb 9 21:24:27 unRAID-PLEX kernel: Hardware name: Supermicro Super Server/H11SSL-i, BIOS 2.4 12/27/2021
Feb 9 21:24:27 unRAID-PLEX kernel: Call Trace:
Feb 9 21:24:27 unRAID-PLEX kernel: <TASK>
Feb 9 21:24:27 unRAID-PLEX kernel: dump_stack_lvl+0x44/0x5c
Feb 9 21:24:27 unRAID-PLEX kernel: check_preemption_disabled+0xcb/0xe0
Feb 9 21:24:27 unRAID-PLEX kernel: mce_setup+0x29/0xea
Feb 9 21:24:27 unRAID-PLEX kernel: apei_smca_report_x86_error+0x71/0x12e
Feb 9 21:24:27 unRAID-PLEX kernel: cper_print_proc_ia+0x560/0x59d
Feb 9 21:24:27 unRAID-PLEX kernel: ? vprintk_emit+0x175/0x18a
Feb 9 21:24:27 unRAID-PLEX kernel: ? thaw_kernel_threads+0x9b/0xad
Feb 9 21:24:27 unRAID-PLEX kernel: cper_estatus_print_section+0x6e1/0x88d
Feb 9 21:24:27 unRAID-PLEX kernel: ? snprintf+0x49/0x64
Feb 9 21:24:27 unRAID-PLEX kernel: ? cper_estatus_print+0x9e/0xe0
Feb 9 21:24:27 unRAID-PLEX kernel: cper_estatus_print+0x9e/0xe0
Feb 9 21:24:27 unRAID-PLEX kernel: bert_init+0x1d1/0x25a
Feb 9 21:24:27 unRAID-PLEX kernel: ? setup_bert_disable+0x19/0x19
Feb 9 21:24:27 unRAID-PLEX kernel: do_one_initcall+0x85/0x196
Feb 9 21:24:27 unRAID-PLEX kernel: kernel_init_freeable+0x1e1/0x224
Feb 9 21:24:27 unRAID-PLEX kernel: ? rest_init+0xcb/0xcb
Feb 9 21:24:27 unRAID-PLEX kernel: kernel_init+0x16/0x123
Feb 9 21:24:27 unRAID-PLEX kernel: ret_from_fork+0x22/0x30
Feb 9 21:24:27 unRAID-PLEX kernel: </TASK>
Feb 9 21:24:27 unRAID-PLEX kernel: mce: [Hardware Error]: Machine check events logged
Feb 9 21:24:27 unRAID-PLEX kernel: mce: [Hardware Error]: CPU 9: Machine Check: 0 Bank 0: fc002800000c0135
Feb 9 21:24:27 unRAID-PLEX kernel: mce: [Hardware Error]: TSC 0 ADDR 100000489573800 MISC d01c0dff00000000 PPIN 2b4b09634ec8065 IPID b000000000
Feb 9 21:24:27 unRAID-PLEX kernel: mce: [Hardware Error]: PROCESSOR 2:830f10 TIME 1676006579 SOCKET 0 APIC 22 microcode 8301055
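For anyone who wants to pull the memory-error fields out of a log like the one above without reading every line, here is a minimal Python sketch. It assumes the syslog lines look exactly like the excerpt in this post; the function name `parse_memory_error` and the `SAMPLE` string are made up for illustration and are not part of any Unraid or kernel tooling. It only extracts what the kernel already printed (DIMM label, physical address, rank/bank/row/column) and does not decode the MCA status words.

```python
import re

# Matches the "key: value" lines printed inside a [Hardware Error] record.
FIELD_RE = re.compile(
    r"\[Hardware Error\]:\s+"
    r"(fru_text|section_type|error_status|physical_address):\s*(.+)$"
)
# Matches the DIMM location line, e.g. "node:0 card:6 module:0 ...".
LOC_RE = re.compile(r"\[Hardware Error\]:\s+(node:\d+.*)$")


def parse_memory_error(lines):
    """Collect the first occurrence of each field (hypothetical helper).

    Only the first match per key is kept, so the later "ProcessorError"
    record does not overwrite the DIMM record that precedes it.
    """
    info = {}
    for line in lines:
        m = FIELD_RE.search(line)
        if m:
            info.setdefault(m.group(1), m.group(2).strip())
            continue
        m = LOC_RE.search(line)
        if m and "location" not in info:
            pairs = (p.split(":") for p in m.group(1).split())
            info["location"] = {k: int(v) for k, v in pairs}
    return info


# A few lines copied from the excerpt above.
SAMPLE = """\
Feb 9 21:24:27 unRAID-PLEX kernel: [Hardware Error]: fru_text: DIMM Locate: P0E0
Feb 9 21:24:27 unRAID-PLEX kernel: [Hardware Error]: section_type: memory error
Feb 9 21:24:27 unRAID-PLEX kernel: [Hardware Error]: physical_address: 0x0000000489573800
Feb 9 21:24:27 unRAID-PLEX kernel: [Hardware Error]: node:0 card:6 module:0 rank:0 bank:2 row:17301 column:128
Feb 9 21:24:27 unRAID-PLEX kernel: [Hardware Error]: fru_text: ProcessorError
"""

if __name__ == "__main__":
    for key, value in parse_memory_error(SAMPLE.splitlines()).items():
        print(f"{key}: {value}")
```

The `fru_text` value is the label the firmware reported for the DIMM, so it is the first thing to compare against the slot names in the board manual and the IPMI event log.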
February 10, 2023
Community Expert
Looks like a RAM problem. Check the system event log in the BIOS/IPMI; it might have some more info.
May 19, 2023
This doesn't add much, but I figured I'd mention that I'm having the same issue on the exact same hardware. I'm running a 7302P on a Supermicro H11SSL-i with 4x16GB DDR4 (it was 8x16GB, but I took half out to troubleshoot). I was also getting some weird crashes because my BIOS was resetting and C-states were getting re-enabled, but after I fixed that I started having memory crashes.