fazj Posted May 28, 2020

Hello hive mind. Since last night my server keeps going offline: it becomes unreachable and won't respond to ping. I have had to power cycle it, and it then does the same thing within a few minutes of starting up again. On the most recent startup (the longest it has stayed online), Fix Common Problems flagged a Machine Check Error and said I should post my diagnostics here.

A few months ago I had an issue where the cache was becoming unwritable. That turned out to be a RAM problem, and the RAM has since been replaced. I don't know if it's connected; I'm just mentioning it for background.

Anyhow, my diagnostics are attached: alpha-diagnostics-20200528-1356.zip
Squid Posted May 28, 2020

May 28 13:49:33 Alpha kernel: mce: [Hardware Error]: Machine check events logged
May 28 13:49:33 Alpha kernel: [Hardware Error]: Corrected error, no action required.
May 28 13:49:33 Alpha kernel: [Hardware Error]: CPU:1 (15:2:0) MC2_STATUS[Over|CE|MiscV|-|AddrV|-|-|CECC]: 0xdc25402000040136
May 28 13:49:33 Alpha kernel: [Hardware Error]: Error Addr: 0x00000002dd798938
May 28 13:49:33 Alpha kernel: [Hardware Error]: MC2 Error: Fill ECC error on data fills.
May 28 13:49:33 Alpha kernel: [Hardware Error]: cache level: L2, tx: DATA, mem-tx: DRD

Quick googling suggests that it might be power related: https://unix.stackexchange.com/questions/238454/hardware-error-in-syslog-after-gaming
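For readers wondering where the kernel's summary in that log comes from: the MC2_STATUS value can be decoded by hand. The sketch below is a minimal illustration, not a replacement for the kernel's decoder. It assumes the architectural MCi_STATUS bit layout described in the Intel SDM and AMD APM, plus the AMD-specific CECC/UECC bits (46/45) that Linux's AMD decoder reports; `decode_status` and the lookup tables are hypothetical helper names used only here.

```python
# Minimal sketch: decode an x86 machine-check MCi_STATUS value by hand.
# Assumption: architectural bit layout from the Intel SDM / AMD APM, plus the
# AMD-specific CECC/UECC bits (46/45) as reported by Linux's AMD MCE decoder.

STATUS_FLAGS = {
    63: "VAL",    # register contents are valid
    62: "OVER",   # another error was logged before this one was read (overflow)
    61: "UC",     # uncorrected error; when clear, the error was corrected
    60: "EN",     # error reporting was enabled
    59: "MISCV",  # MCi_MISC holds extra information
    58: "ADDRV",  # MCi_ADDR holds the error address
    57: "PCC",    # processor context corrupt
    46: "CECC",   # AMD: correctable ECC error
    45: "UECC",   # AMD: uncorrectable ECC error
}

# Memory-hierarchy error code layout (bits 15:0): 000F 0001 RRRR TTLL
REQUEST = {0x0: "GEN", 0x1: "RD", 0x2: "WR", 0x3: "DRD", 0x4: "DWR",
           0x5: "IRD", 0x6: "PREFETCH", 0x7: "EVICT", 0x8: "SNOOP"}
TX_TYPE = {0: "INSN", 1: "DATA", 2: "GEN"}
LEVEL = {0: "L0", 1: "L1", 2: "L2", 3: "GEN"}


def decode_status(status: int) -> dict:
    """Return the set flags and, for memory-hierarchy errors, the RRRR/TT/LL fields."""
    flags = [name for bit, name in STATUS_FLAGS.items() if (status >> bit) & 1]
    code = status & 0xFFFF  # MCA error code lives in the low 16 bits
    result = {"flags": flags, "corrected": "UC" not in flags, "error_code": code}
    # Mask out the filter bit (bit 12) before matching the 0000 0001 pattern.
    if (code & 0xEF00) == 0x0100:
        result["mem_tx"] = REQUEST.get((code >> 4) & 0xF, "reserved")
        result["tx"] = TX_TYPE.get((code >> 2) & 0x3, "reserved")
        result["level"] = LEVEL[code & 0x3]
    return result


if __name__ == "__main__":
    # MC2_STATUS value copied from the syslog excerpt above
    print(decode_status(0xDC25402000040136))
```

Run on the value from the log, this reports VAL, OVER, EN, MISCV, ADDRV and CECC with UC clear (a corrected error), and the low 16 bits (0x0136) decode to DRD / DATA / L2. That matches the kernel's "cache level: L2, tx: DATA, mem-tx: DRD" line. The OVER bit suggests more than one error hit this bank before it was read.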
fazj Posted May 31, 2020

Thanks. I had a spare PSU kicking around, so in case the old one was throwing a bit of a wobbly I swapped it in. Overnight, though, I again got a notification about machine check events and woke up to the server being unreachable.