Darqfallen

Report Comments posted by Darqfallen

  1. What's also interesting is what is picked up by the IPMI:

    1306	11/04/2019 21:21:18	Unknown	BIOS POST Progress	Progress - Asserted
    1305	11/04/2019 21:19:01	Unknown	BIOS POST Progress	Progress - Asserted
    1304	11/04/2019 21:18:52	Unknown	BIOS POST Progress	Progress - Asserted
    1303	11/04/2019 21:18:52	Unknown	BIOS POST Progress	Progress - Asserted
    1302	11/04/2019 21:18:45	Unknown	BIOS POST Progress	Progress - Asserted
    1301	11/04/2019 21:18:38	Unknown	BIOS POST Progress	Progress - Asserted
    1300	11/04/2019 21:18:31	Unknown	BIOS POST Progress	Progress - Asserted
    1299	11/04/2019 21:17:15	Unknown	BIOS POST Progress	Progress - Asserted
    1298	11/04/2019 21:17:11	Unknown	BIOS POST Progress	Progress - Asserted
    1297	11/04/2019 21:17:02	Unknown	BIOS POST Progress	Progress - Asserted
    1296	11/04/2019 21:17:02	Unknown	BIOS POST Progress	Progress - Asserted
    1295	11/04/2019 21:17:02	Unknown	BIOS POST Progress	Progress - Asserted
    1294	11/04/2019 21:17:01	Unknown	BIOS POST Progress	Progress - Asserted
    1293	11/04/2019 21:16:51	Unknown	BIOS POST Progress	Progress - Asserted
    1292	11/04/2019 21:16:49	Unknown	BIOS POST Progress	Progress - Asserted
    1291	10/25/2030 22:31:12	Unknown		Progress - Asserted - Asserted
    1290	11/26/2031 15:22:24	Unknown	OS Critical Stop	Progress - Asserted - Asserted - Asserted
    1289	05/14/2023 08:49:52	Unknown	[undefined]	Progress - Asserted - Asserted - Asserted - Asserted
    1288	09/09/2028 04:57:20	Unknown	[undefined]	Progress - Asserted - Asserted - Asserted - Asserted - Asserted
    1287	04/03/1987 11:36:16	Unknown	OS Critical Stop	Progress - Asserted - Asserted - Asserted - Asserted - Asserted - Asserted
    1286	12/30/2025 09:38:56	Unknown	[undefined]	Progress - Asserted - Asserted - Asserted - Asserted - Asserted - Asserted - Asserted
    1285	11/04/2019 21:16:16	Unknown	OS Critical Stop	Run-Time Stop - Asserted

    Perhaps there is some update in the microcode that causes my CPUs to not play nice with the RAM.

    I've also included the syslog that is logging to the USB drive.

    The server crashed again as I was writing this, so I've downgraded back to 6.7.0.

    syslog.zip
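    For anyone wanting to pick the odd entries out of a longer event log export, here is a rough sketch. It assumes the rows are tab-separated in the same column order as the listing above (ID, timestamp, source, sensor, event); the function names and the one-day window are made up for illustration and are not part of any IPMI tool.

```python
from datetime import datetime, timedelta

# Rows copied from the listing above; tab separation is an assumption about the export format.
SAMPLE = [
    "1293\t11/04/2019 21:16:51\tUnknown\tBIOS POST Progress\tProgress - Asserted",
    "1292\t11/04/2019 21:16:49\tUnknown\tBIOS POST Progress\tProgress - Asserted",
    "1291\t10/25/2030 22:31:12\tUnknown\t\tProgress - Asserted - Asserted",
    "1290\t11/26/2031 15:22:24\tUnknown\tOS Critical Stop\tProgress - Asserted - Asserted - Asserted",
    "1285\t11/04/2019 21:16:16\tUnknown\tOS Critical Stop\tRun-Time Stop - Asserted",
]


def parse_row(line):
    """Split one row into (entry id, timestamp, sensor, event text)."""
    fields = line.strip().split("\t")
    stamp = datetime.strptime(fields[1], "%m/%d/%Y %H:%M:%S")
    return int(fields[0]), stamp, fields[3], fields[4]


def suspicious_entries(lines, window_days=1):
    """Return IDs whose timestamp is more than window_days away from the median timestamp."""
    entries = [parse_row(line) for line in lines if line.strip()]
    stamps = sorted(entry[1] for entry in entries)
    median = stamps[len(stamps) // 2]
    limit = timedelta(days=window_days)
    return [entry[0] for entry in entries if abs(entry[1] - median) > limit]


print(suspicious_entries(SAMPLE))
```

    On the five sample rows this flags entries 1291 and 1290, the ones dated 2030 and 2031.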

  2. I'm sorry, but I didn't realize that the diagnostics didn't use the whole syslog. I posted above my diag of when it was crashing. Attached is my whole syslog, including the crashes.

    syslog.zip

    I'm not sure what else to do. If I leave the server running on anything newer than 6.7.0, it crashes and reboots every 1-4 hours.

  3. Server ran for 2 hours then spat out these errors and rebooted.

    Oct 13 14:04:31 Dirge kernel: mce: [Hardware Error]: Machine check events logged
    Oct 13 14:04:42 Dirge kernel: mce: [Hardware Error]: Machine check events logged
    Oct 13 14:10:39 Dirge kernel: mce: [Hardware Error]: Machine check events logged
    Oct 13 14:10:52 Dirge kernel: mce: [Hardware Error]: Machine check events logged
    Oct 13 14:34:08 Dirge kernel: mce: [Hardware Error]: Machine check events logged
    Oct 13 15:04:12 Dirge kernel: mce: [Hardware Error]: Machine check events logged
    Oct 13 15:04:14 Dirge kernel: mce: [Hardware Error]: Machine check events logged
    Oct 13 15:09:48 Dirge kernel: mce_notify_irq: 1 callbacks suppressed
    Oct 13 15:09:48 Dirge kernel: mce: [Hardware Error]: Machine check events logged
    Oct 13 15:13:16 Dirge kernel: mce: [Hardware Error]: Machine check events logged
    Oct 13 15:15:25 Dirge kernel: mce: [Hardware Error]: Machine check events logged
    Oct 13 15:19:14 Dirge kernel: mce: [Hardware Error]: Machine check events logged
    Oct 13 15:19:15 Dirge kernel: mce: [Hardware Error]: Machine check events logged
    Oct 13 15:21:30 Dirge kernel: mce_notify_irq: 1 callbacks suppressed
    Oct 13 15:21:30 Dirge kernel: mce: [Hardware Error]: Machine check events logged
    Oct 13 15:35:23 Dirge kernel: mce: [Hardware Error]: Machine check events logged

    So I'm running a memtest to see if any memory modules have failed. I am not sure what these errors mean.
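    To see how often the machine check lines show up, a small script can bucket them by hour. This is only a sketch: it assumes the syslog lines start with "month day HH:MM:SS" as in the excerpt above, and `mce_per_hour` is a name invented here.

```python
from collections import Counter

MARKER = "mce: [Hardware Error]"

# Lines copied from the excerpt above.
SAMPLE = [
    "Oct 13 14:04:31 Dirge kernel: mce: [Hardware Error]: Machine check events logged",
    "Oct 13 14:04:42 Dirge kernel: mce: [Hardware Error]: Machine check events logged",
    "Oct 13 15:09:48 Dirge kernel: mce_notify_irq: 1 callbacks suppressed",
    "Oct 13 15:09:48 Dirge kernel: mce: [Hardware Error]: Machine check events logged",
]


def mce_per_hour(lines):
    """Count machine check lines per 'month day hour' bucket, skipping everything else."""
    counts = Counter()
    for line in lines:
        if MARKER not in line:
            continue
        month, day, clock = line.split()[:3]
        counts[f"{month} {day} {clock[:2]}:00"] += 1
    return dict(counts)


print(mce_per_hour(SAMPLE))
```

    The "callbacks suppressed" lines are skipped because they do not contain the hardware error marker, so the counts are a lower bound on the real number of events.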

  4. Yes, 6.7.2. Nope, the only reason I know it restarts is that I get the email that a parity check has started.

    I opened her up, pulled every stick of RAM, and swapped each with its opposite bank. I went into the BIOS and then noticed the PC3-10600 RAM was running at PC3-12800 speeds (1333 MHz vs 1600 MHz), so I've dropped it back down to the proper speed and will monitor for stability. I will let you know in 12 hours or so if it's still stable.
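    In case the two sets of numbers are confusing: 10600 and 12800 are peak transfer rates in MB/s, while 1333 and 1600 are the data rates in MT/s. With a standard 64-bit (8-byte) module the two are related by a factor of 8. A quick sketch of the arithmetic, with a function name made up for illustration:

```python
def peak_bandwidth_mb_s(data_rate_mt_s, bus_bytes=8):
    """Peak module bandwidth in MB/s for a given data rate in MT/s (8-byte bus assumed)."""
    return data_rate_mt_s * bus_bytes


print(peak_bandwidth_mb_s(1600))  # 12800, sold as PC3-12800
print(peak_bandwidth_mb_s(1333))  # 10664, rounded down and sold as PC3-10600
```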

  5. Fans are working great: all fans are reporting >5000 rpm, ambient temp in the room is 25C, System Ambient is 30C, CPU1 36C, CPU2 40C. I didn't have this issue in 6.7.0, but had a similar issue in 6.7.2. I didn't have the time to work on it, so I had downgraded to 6.7.0 again.

  6. 2 minutes ago, itimpi said:

    Have you checked that all fans are working? Random restarts can happen if the CPU is overheating. Other common causes are power supply and RAM issues.

    Fans are working great: all fans are reporting >5000 rpm, ambient temp in the room is 25C, System Ambient is 30C, CPU1 36C, CPU2 40C. I didn't have this issue in 6.7.0, but had a similar issue in 6.7.2. I didn't have the time to work on it, so I had downgraded to 6.7.0 again.