April 30, 2024
For some time now I've been getting random system crashes, and I cannot pin down the cause. They usually happen in the morning hours, at highly irregular intervals: it happened today at about 7 am, the previous crash was about 3 weeks ago, before that it did not crash for 3 months, and before that it was crashing every few days. I'm attaching the server diagnostics file and the syslog from today (I'm storing the syslog on a share, so it's not lost on a crash). What can I do to find the cause of this?
Attachments: beholder-diagnostics-20240430-0850.zip, syslog.log
April 30, 2024 Community Expert
Apr 29 23:50:54 Beholder kernel: [Hardware Error]: Corrected error, no action required.
Apr 29 23:50:54 Beholder kernel: [Hardware Error]: CPU:0 (17:71:0) MC27_STATUS[-|CE|MiscV|-|-|-|SyndV|-|-|-]: 0x982000000002080b
Apr 29 23:50:54 Beholder kernel: [Hardware Error]: IPID: 0x0001002e00000500, Syndrome: 0x000000005a020001
Apr 29 23:50:54 Beholder kernel: [Hardware Error]: Power, Interrupts, etc. Ext. Error Code: 2, Link Error.
Apr 29 23:50:54 Beholder kernel: [Hardware Error]: cache level: L3/GEN, mem/io: IO, mem-tx: GEN, part-proc: SRC (no timeout)
This suggests a CPU problem. This time the error was corrected, but sometimes it may not be, and an uncorrected error can cause a crash.
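To see how often these machine check reports occur, and whether any of them are not marked as corrected, the saved syslog can be scanned for `[Hardware Error]` lines. The following is only a minimal sketch, assuming the syslog lines have the same layout as the excerpt above; the function names (`hardware_error_events`, `is_corrected`, `scan_file`) are made up for illustration and are not part of any Unraid tool.

```python
import re

# Matches lines such as:
#   Apr 29 23:50:54 Beholder kernel: [Hardware Error]: Corrected error, ...
# Assumption: the classic "Mon DD HH:MM:SS host kernel:" syslog prefix.
HW_ERR = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+\S+\s+kernel:\s+"
    r"\[Hardware Error\]:\s+(?P<msg>.*)$"
)

def hardware_error_events(lines):
    """Group [Hardware Error] messages by timestamp, in order of appearance."""
    events = {}
    for line in lines:
        match = HW_ERR.match(line.strip())
        if match:
            events.setdefault(match.group("ts"), []).append(match.group("msg"))
    return events

def is_corrected(messages):
    """True if the kernel reported this event as a corrected error."""
    return any(msg.startswith("Corrected error") for msg in messages)

def scan_file(path):
    """Print one summary line per hardware error event found in a syslog file."""
    with open(path, errors="replace") as handle:
        events = hardware_error_events(handle)
    for timestamp, messages in events.items():
        kind = "corrected" if is_corrected(messages) else "NOT marked corrected"
        print(f"{timestamp}: {len(messages)} lines, {kind}")
```

Calling `scan_file("syslog.log")` on the attached log would print one line per event. Corrected events that cluster shortly before the crashes would support the hardware theory, though the log may simply end before an uncorrected error can be written.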
May 7, 2024 Author
On 4/30/2024 at 11:39 AM, JorgeB said:
Apr 29 23:50:54 Beholder kernel: [Hardware Error]: Corrected error, no action required.
Apr 29 23:50:54 Beholder kernel: [Hardware Error]: CPU:0 (17:71:0) MC27_STATUS[-|CE|MiscV|-|-|-|SyndV|-|-|-]: 0x982000000002080b
Apr 29 23:50:54 Beholder kernel: [Hardware Error]: IPID: 0x0001002e00000500, Syndrome: 0x000000005a020001
Apr 29 23:50:54 Beholder kernel: [Hardware Error]: Power, Interrupts, etc. Ext. Error Code: 2, Link Error.
Apr 29 23:50:54 Beholder kernel: [Hardware Error]: cache level: L3/GEN, mem/io: IO, mem-tx: GEN, part-proc: SRC (no timeout)
This suggests a CPU problem, this time the error was corrected, but sometimes it may not be, and cause a crash.

So your recommendation is to replace my CPU?
May 7, 2024 Community Expert
If you can, try a different one first to confirm. I can't say for sure that the CPU is the problem, but it would be my main suspect.