May 28, 2020
Hello hive mind. Since last night my server keeps going offline: it becomes unreachable and does not respond to ping. I have had to power cycle it, and then it does the same thing within a few minutes of starting up again. On the latest startup (the longest it has stayed online so far), Fix Common Problems flagged a Machine Check Error and said I should post my diagnostics here. I did have an issue a few months ago where the cache was becoming unwritable; that turned out to be a RAM issue, and the RAM has since been replaced. I don't know if it's connected, I'm just posting it for more background. Anyhow, my diagnostics are attached. alpha-diagnostics-20200528-1356.zip
May 28, 2020
May 28 13:49:33 Alpha kernel: mce: [Hardware Error]: Machine check events logged
May 28 13:49:33 Alpha kernel: [Hardware Error]: Corrected error, no action required.
May 28 13:49:33 Alpha kernel: [Hardware Error]: CPU:1 (15:2:0) MC2_STATUS[Over|CE|MiscV|-|AddrV|-|-|CECC]: 0xdc25402000040136
May 28 13:49:33 Alpha kernel: [Hardware Error]: Error Addr: 0x00000002dd798938
May 28 13:49:33 Alpha kernel: [Hardware Error]: MC2 Error: Fill ECC error on data fills.
May 28 13:49:33 Alpha kernel: [Hardware Error]: cache level: L2, tx: DATA, mem-tx: DRD
Quick googling suggests that it might be power related: https://unix.stackexchange.com/questions/238454/hardware-error-in-syslog-after-gaming
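For anyone wanting to read the MC2_STATUS value by hand, here is a rough Python sketch that turns the raw hex into the same flag list the kernel printed. The bit positions are an assumption taken from the general x86 machine-check status layout and from the order of the flags in the log above (Over, UC/CE, MiscV, PCC, AddrV, Deferred, Poison, ECC type); `decode_mc_status` is a made-up helper name, not part of any tool. Treat it as a reading aid, not a replacement for the kernel's own decoder.

```python
# Sketch only: the bit positions below are assumptions based on the x86 MCA
# status register layout and the flag order shown in the kernel log line.

def decode_mc_status(status):
    """Return the flag list for a raw MCx_STATUS value, in log order."""

    def bit(n):
        # True if bit n of the status value is set.
        return bool((status >> n) & 1)

    flags = [
        "Over" if bit(62) else "-",      # an earlier error was overwritten
        "UC" if bit(61) else "CE",       # uncorrected vs. corrected error
        "MiscV" if bit(59) else "-",     # MISC register holds valid data
        "PCC" if bit(57) else "-",       # processor context corrupt
        "AddrV" if bit(58) else "-",     # error address register is valid
        "Deferred" if bit(44) else "-",  # deferred error (assumed position)
        "Poison" if bit(43) else "-",    # poisoned data (assumed position)
    ]

    # Bits 46:45 are assumed to carry the ECC type: 2 = corrected, else uncorrected.
    ecc = (status >> 45) & 0b11
    if ecc:
        flags.append("CECC" if ecc == 2 else "UECC")
    return flags


if __name__ == "__main__":
    # Value copied from the log line in this thread.
    print("|".join(decode_mc_status(0xDC25402000040136)))
```

Run against the value from the log, this should print the same Over|CE|MiscV|-|AddrV|-|-|CECC string, i.e. a corrected ECC error with the overflow flag set, which means more than one error arrived before the register was read.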
May 31, 2020 Author
Thanks. I had a spare PSU kicking around, so in case the old one was throwing a bit of a wobbly I swapped it out. But overnight I got another message about machine check error events and woke up to the server being unreachable again.
This topic is now archived and is closed to further replies.