Server Detected Hardware Errors


I got the following alert from Fix Common Problems. I did not have mcelog installed when I received it. It's installed now, but I haven't received the notification again since installing it. I have attached my diagnostics for further assistance. Thanks in advance!

Your server has detected hardware errors. You should install mcelog via the NerdPack plugin, post your diagnostics and ask for assistance on the unRaid forums. The output of mcelog (if installed) has been logged

 

EDIT: Updated the diagnostics report with the mce log.

Edited by cmarshall85
Update diagnostics with mce log

OK, I checked syslog.txt from the diagnostics and included the relevant log entries below.

It looks like the hardware error occurred on April 17th:

Apr 17 21:30:26 Tower kernel: mce: [Hardware Error]: Machine check events logged
Apr 17 21:30:26 Tower kernel: [Hardware Error]: Corrected error, no action required.
Apr 17 21:30:26 Tower kernel: [Hardware Error]: CPU:0 (17:71:0) MC27_STATUS[-|CE|MiscV|-|-|-|-|SyndV|-]: 0x982000000002080b
Apr 17 21:30:26 Tower kernel: [Hardware Error]: IPID: 0x0001002e00000500, Syndrome: 0x000000005a020001
Apr 17 21:30:26 Tower kernel: [Hardware Error]: Power, Interrupts, etc. Extended Error Code: 2
Apr 17 21:30:26 Tower kernel: [Hardware Error]: Power, Interrupts, etc. Error: Error on GMI link.
Apr 17 21:30:26 Tower kernel: [Hardware Error]: cache level: L3/GEN, mem/io: IO, mem-tx: GEN, part-proc: SRC (no timeout)
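For anyone wondering how the kernel gets from the raw status value to the fields printed above, here is a minimal Python sketch that decodes 0x982000000002080b. It assumes the standard x86 MCA status bit layout plus the AMD-specific SyndV, Deferred and Poison bits, and its lookup tables copy the strings the Linux edac_mce_amd decoder prints. It is an illustration, not the kernel's code, and it does not work out the bank type ("Power, Interrupts, etc."), which the kernel derives separately from the IPID value.

```python
# Illustrative sketch (not the kernel's code): decode the MC27_STATUS value
# from the April 17th log line. Assumes the x86 MCA status layout for bits
# 63..57 plus AMD's SyndV (53), Deferred (44) and Poison (43) bits.

STATUS = 0x982000000002080B  # copied from the MC27_STATUS log line

# (bit position, name) for the flag bits
FLAG_BITS = [
    (63, "Val"),       # register holds valid error information
    (62, "Over"),      # an earlier error was overwritten
    (61, "UC"),        # uncorrected; if clear, the error was corrected (CE)
    (60, "En"),        # error reporting was enabled
    (59, "MiscV"),     # MISC register holds extra information
    (58, "AddrV"),     # ADDR register holds an address
    (57, "PCC"),       # processor context corrupt
    (53, "SyndV"),     # AMD: syndrome register valid
    (44, "Deferred"),  # AMD: deferred error
    (43, "Poison"),    # AMD: poisoned data consumed
]

# Lookup tables for the bus-error form of the 16-bit MCA error code,
# using the same strings as the kernel's decoded output
LL = ["L0", "L1", "L2", "L3/GEN"]
II = ["MEM", "RESV", "IO", "GEN"]
RRRR = ["GEN", "RD", "WR", "DRD", "DWR", "IRD", "PRF", "EV", "SNP"]
PP = ["SRC", "RES", "OBS", "GEN"]


def bit(value: int, pos: int) -> bool:
    """Return True if bit `pos` of `value` is set."""
    return ((value >> pos) & 1) == 1


def decode(status: int) -> dict:
    flags = [name for pos, name in FLAG_BITS if bit(status, pos)]
    error_code = status & 0xFFFF             # bits 15:0, MCA error code
    ext_error_code = (status >> 16) & 0x3F   # bits 21:16, AMD extended error code
    result = {
        "flags": flags,
        "corrected": not bit(status, 61),
        "ext_error_code": ext_error_code,
    }
    # Bus errors use the pattern 0000 1PPT RRRR IILL in bits 15:0
    if (error_code & 0xF800) == 0x0800:
        rrrr = (error_code >> 4) & 0xF
        result["bus_error"] = {
            "cache_level": LL[error_code & 0x3],
            "mem_io": II[(error_code >> 2) & 0x3],
            "mem_tx": RRRR[rrrr] if rrrr < len(RRRR) else "RESV",
            "part_proc": PP[(error_code >> 9) & 0x3],
            "timed_out": bit(error_code, 8),
        }
    return result


if __name__ == "__main__":
    print(decode(STATUS))
```

Run against this value, it reports the flags Val, En, MiscV and SyndV, a corrected error (UC is clear), extended error code 2, and a bus error with cache level L3/GEN, mem/io IO, mem-tx GEN and part-proc SRC with no timeout, which lines up with the kernel's decode above.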

 

FCP logs this on every daily scan:

Apr 18 04:40:08 Tower root: Fix Common Problems: Error: Machine Check Events detected on your server
Apr 18 04:40:08 Tower root: mcelog not installed

 

On April 22nd I installed mcelog. From April 23rd to the present, FCP logs it slightly differently on its daily scan.

Apr 23 04:40:08 Tower root: Fix Common Problems: Error: Machine Check Events detected on your server
Apr 23 04:40:08 Tower root: mcelog: ERROR: AMD Processor family 23: mcelog does not support this processor.  Please use the edac_mce_amd module instead.
Apr 23 04:40:08 Tower root: CPU is unsupported

 

This thread I found explains why the log is slightly different now that mcelog is installed (mcelog doesn't work with AMD).

My questions are as follows:

  1. Is that initial error anything to worry about if it only occurred once?
  2. Do I keep getting the FCP popup notification about a hardware error, even after I told it to ignore it, because the error from the 17th is still in the system log and FCP detects it again every time it runs the daily scan?
  3. If I just reboot the server, and the error from the 17th doesn't happen again, will I not get that FCP popup notification anymore because the syslog will be erased on reboot and will no longer contain that error from the 17th?
Edited by cmarshall85
7 hours ago, cmarshall85 said:

mcelog doesn't work with AMD

The note about mcelog not supporting the processor is misleading. It's telling you (in very bad English) that the edac_mce_amd module is being used instead.

7 hours ago, cmarshall85 said:

Do I keep getting the FCP popup notification about a hardware error,

You shouldn't.

7 hours ago, cmarshall85 said:

If I just reboot the server, and the error from the 17th doesn't happen again, will I not get that FCP popup notification anymore because the syslog will be erased on reboot and will no longer contain that error from the 17th?

Yes.
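Why a reboot clears the warning: as the question assumes, the syslog is wiped at reboot, so the April 17th entry goes with it. The Python sketch below approximates that kind of check. It is hypothetical: it assumes the scan simply searches the current syslog (at an assumed default path of /var/log/syslog) for the kernel's "Machine check events logged" line, and the real Fix Common Problems logic may match on something else.

```python
# Hypothetical approximation of a "Machine Check Events detected" check.
# Assumption: the check only searches the current syslog for the kernel's
# MCE summary line; the real Fix Common Problems logic may differ.
import sys

# Assumed marker text, taken from the April 17th log entry
MCE_MARKER = "mce: [Hardware Error]: Machine check events logged"


def find_mce_lines(lines):
    """Return the syslog lines that contain the kernel's MCE marker."""
    return [line.rstrip("\n") for line in lines if MCE_MARKER in line]


def main(path: str = "/var/log/syslog") -> None:
    # Assumed default path; pass a different file as the first argument
    try:
        with open(path, errors="replace") as f:
            hits = find_mce_lines(f)
    except OSError as exc:
        print(f"Could not read {path}: {exc}")
        return
    if hits:
        print("Machine Check Events found in current syslog:")
        for line in hits:
            print("  " + line)
    else:
        print("No Machine Check Events in current syslog")


if __name__ == "__main__":
    main(*sys.argv[1:2])
```

As long as the April 17th line is in the file, every daily scan matches it again; after a reboot the file starts fresh and nothing matches unless a new machine check is logged.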
