Hardware Error MCE


Oddly, this started after I began transferring a few files directly from an rclone mount. Not sure if it's just a coincidence.

 

Quote

Jan 15 22:23:21 Backup kernel: mce: [Hardware Error]: Machine check events logged
Jan 15 22:23:21 Backup kernel: mce: [Hardware Error]: Machine check events logged
Jan 15 22:23:21 Backup kernel: CMCI storm detected: switching to poll mode
Jan 15 22:24:25 Backup kernel: mce_notify_irq: 15 callbacks suppressed
Jan 15 22:24:25 Backup kernel: mce: [Hardware Error]: Machine check events logged
Jan 15 22:24:35 Backup kernel: mce: [Hardware Error]: Machine check events logged
Jan 15 22:26:48 Backup kernel: mce_notify_irq: 2 callbacks suppressed
Jan 15 22:26:48 Backup kernel: mce: [Hardware Error]: Machine check events logged
Jan 15 22:27:00 Backup kernel: mce: [Hardware Error]: Machine check events logged
Jan 15 22:29:00 Backup kernel: mce_notify_irq: 2 callbacks suppressed
Jan 15 22:29:00 Backup kernel: mce: [Hardware Error]: Machine check events logged
Jan 15 22:34:00 Backup kernel: CMCI storm subsided: switching to interrupt mode
Jan 15 22:37:07 Backup kernel: mce: [Hardware Error]: Machine check events logged
Jan 15 22:37:54 Backup kernel: mce: [Hardware Error]: Machine check events logged
Jan 15 22:38:20 Backup kernel: mce: [Hardware Error]: Machine check events logged
Jan 15 22:38:29 Backup kernel: mce: [Hardware Error]: Machine check events logged
Jan 15 22:39:20 Backup kernel: CMCI storm detected: switching to poll mode
Jan 15 22:39:21 Backup kernel: mce_notify_irq: 18 callbacks suppressed
Jan 15 22:39:21 Backup kernel: mce: [Hardware Error]: Machine check events logged
Jan 15 22:39:22 Backup kernel: mce: [Hardware Error]: Machine check events logged
Jan 15 22:40:26 Backup kernel: mce_notify_irq: 1 callbacks suppressed
Jan 15 22:40:26 Backup kernel: mce: [Hardware Error]: Machine check events logged
Jan 15 22:40:40 Backup kernel: mce: [Hardware Error]: Machine check events logged
Jan 15 22:43:14 Backup kernel: mce_notify_irq: 1 callbacks suppressed
Jan 15 22:43:14 Backup kernel: mce: [Hardware Error]: Machine check events logged
Jan 15 22:46:30 Backup kernel: mce: [Hardware Error]: Machine check events logged
Jan 15 22:50:00 Backup kernel: mce: [Hardware Error]: Machine check events logged
Jan 15 22:53:10 Backup kernel: mce: [Hardware Error]: Machine check events logged
Jan 15 22:54:32 Backup kernel: mce: [Hardware Error]: Machine check events logged
Jan 15 22:54:37 Backup kernel: mce: [Hardware Error]: Machine check events logged
Jan 15 22:55:42 Backup kernel: mce_notify_irq: 5 callbacks suppressed
Jan 15 22:55:42 Backup kernel: mce: [Hardware Error]: Machine check events logged
Jan 15 22:55:52 Backup login[32403]: ROOT LOGIN on '/dev/pts/0'
Jan 15 22:56:00 Backup kernel: mce: [Hardware Error]: Machine check events logged
Jan 15 22:56:43 Backup kernel: mce_notify_irq: 1 callbacks suppressed
Jan 15 22:56:43 Backup kernel: mce: [Hardware Error]: Machine check events logged
Jan 15 22:56:54 Backup kernel: mce: [Hardware Error]: Machine check events logged
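
For anyone who wants to gauge how busy a log like this really is, here is a minimal Python sketch that tallies the three kinds of lines seen above. The function name `summarize_mce` and the sample list are made up for illustration; the patterns are taken straight from the quoted messages. The "callbacks suppressed" lines are kernel rate-limit notices, so the number of "events logged" lines understates how many notifications actually fired.

```python
import re
from collections import Counter

# Patterns copied from the syslog lines quoted above.
MCE_LOGGED = re.compile(r"mce: \[Hardware Error\]: Machine check events logged")
SUPPRESSED = re.compile(r"mce_notify_irq: (\d+) callbacks suppressed")
STORM = re.compile(r"CMCI storm (detected|subsided)")


def summarize_mce(lines):
    """Tally MCE-related syslog lines (hypothetical helper).

    Returns (logged, suppressed, storms): the count of 'events logged'
    lines, the sum of rate-limited callbacks, and a Counter of CMCI
    storm transitions.
    """
    logged = 0
    suppressed = 0
    storms = Counter()
    for line in lines:
        if MCE_LOGGED.search(line):
            logged += 1
            continue
        m = SUPPRESSED.search(line)
        if m:
            suppressed += int(m.group(1))
            continue
        m = STORM.search(line)
        if m:
            storms[m.group(1)] += 1
    return logged, suppressed, storms


# A few lines taken from the excerpt above.
sample = [
    "Jan 15 22:23:21 Backup kernel: mce: [Hardware Error]: Machine check events logged",
    "Jan 15 22:23:21 Backup kernel: CMCI storm detected: switching to poll mode",
    "Jan 15 22:24:25 Backup kernel: mce_notify_irq: 15 callbacks suppressed",
    "Jan 15 22:24:25 Backup kernel: mce: [Hardware Error]: Machine check events logged",
    "Jan 15 22:34:00 Backup kernel: CMCI storm subsided: switching to interrupt mode",
]

logged, suppressed, storms = summarize_mce(sample)
print(logged, suppressed, dict(storms))
```

To run it over a full syslog, pass the open file object instead of `sample`; the path depends on the system.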



 

backup-diagnostics-20190115-2259.zip


Thanks, johnnie.black, always willing to help. Much appreciated. I have mcelog installed from the Nerd Pack, but it has never worked, and I know a ton of people who say the same thing. The system event log only shows the fan sensors (sys_fan1 through sys_fan4 and the CPU1 and CPU2 fans), all as "lower critical going low", asserted or deasserted. That's probably due to the fans I'm using.


I did notice this

Quote

70 | 01/10/2019 01:47:57 | 34AC Lost | Power Supply | Power Supply Input Lost or Out of Range - Asserted

But then I realized that was when I had shut the server down gracefully and had to pull it. I didn't open it, just did some rearranging in the rack. I'm running a memtest right now.

  • 2 weeks later...

I upgraded to the latest RC version available, and the hardware errors are still continuing. Here are the latest syslog messages, which I haven't seen before.

 

Quote

Jan 25 06:00:14 Backup kernel: Uhhuh. NMI received for unknown reason 21 on CPU 0.
Jan 25 06:00:14 Backup kernel: Do you have a strange power saving mode enabled?
Jan 25 06:00:14 Backup kernel: Dazed and confused, but trying to continue
Jan 25 06:00:14 Backup kernel: DMAR: DRHD: handling fault status reg 2
Jan 25 06:00:14 Backup kernel: DMAR: [DMA Read] Request device [02:00.0] fault addr ff0cf000 [fault reason 06] PTE Read access is not set
Jan 25 06:00:14 Backup kernel: DMAR: [DMA Read] Request device [03:00.0] fault addr fed22000 [fault reason 06] PTE Read access is not set
Jan 25 06:00:14 Backup kernel: DMAR: DRHD: handling fault status reg 202
Jan 25 06:00:14 Backup kernel: DMAR: [DMA Read] Request device [03:00.0] fault addr fed13000 [fault reason 06] PTE Read access is not set
Jan 25 06:00:14 Backup kernel: DMAR: DRHD: handling fault status reg 302
Jan 25 06:00:14 Backup kernel: DMAR: [DMA Read] Request device [03:00.0] fault addr fed14000 [fault reason 06] PTE Read access is not set
Jan 25 06:00:15 Backup kernel: mce_notify_irq: 58 callbacks suppressed
Jan 25 06:00:15 Backup kernel: mce: [Hardware Error]: Machine check events logged
Jan 25 06:00:16 Backup kernel: mce: [Hardware Error]: Machine check events logged
Jan 25 06:00:22 Backup kernel: dmar_fault: 8424 callbacks suppressed
Jan 25 06:00:22 Backup kernel: DMAR: DRHD: handling fault status reg 402
Jan 25 06:00:22 Backup kernel: DMAR: [DMA Read] Request device [02:00.0] fault addr fea3a000 [fault reason 06] PTE Read access is not set
Jan 25 06:00:23 Backup kernel: DMAR: DRHD: handling fault status reg 502
Jan 25 06:00:23 Backup kernel: DMAR: [DMA Read] Request device [03:00.0] fault addr fec10000 [fault reason 06] PTE Read access is not set
Jan 25 06:00:23 Backup kernel: DMAR: DRHD: handling fault status reg 602
Jan 25 06:00:23 Backup kernel: DMAR: [DMA Read] Request device [03:00.0] fault addr fe7df000 [fault reason 06] PTE Read access is not set
Jan 25 06:00:23 Backup kernel: DMAR: DRHD: handling fault status reg 702
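
To see which devices the DMAR faults point at, a rough Python sketch like the one below groups the fault lines by PCI address. The helper name `dmar_faults_by_device` is hypothetical and the regular expression is only fitted to the lines quoted above, so other kernels may format these messages differently. Because of the "dmar_fault: 8424 callbacks suppressed" rate limiting, the counts are a lower bound.

```python
import re
from collections import Counter

# Fitted to the DMAR fault lines quoted above; an assumption, not a spec.
DMAR_FAULT = re.compile(
    r"DMAR: \[DMA (Read|Write)\] Request device \[([0-9a-fA-F:.]+)\] "
    r"fault addr ([0-9a-fA-F]+) \[fault reason (\d+)\]"
)


def dmar_faults_by_device(lines):
    """Count DMAR fault lines per requesting PCI device (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        m = DMAR_FAULT.search(line)
        if m:
            counts[m.group(2)] += 1  # group 2 is the bus:device.function address
    return counts


# Three lines taken from the excerpt above.
sample = [
    "Jan 25 06:00:14 Backup kernel: DMAR: [DMA Read] Request device [02:00.0] fault addr ff0cf000 [fault reason 06] PTE Read access is not set",
    "Jan 25 06:00:14 Backup kernel: DMAR: [DMA Read] Request device [03:00.0] fault addr fed22000 [fault reason 06] PTE Read access is not set",
    "Jan 25 06:00:14 Backup kernel: DMAR: [DMA Read] Request device [03:00.0] fault addr fed13000 [fault reason 06] PTE Read access is not set",
]

faults = dmar_faults_by_device(sample)
print(dict(faults))
```

The addresses it reports (02:00.0 and 03:00.0 here) can then be looked up with `lspci -s 02:00.0` to find out which cards are involved.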

 

backup-diagnostics-20190125-0754.zip

Edited by slimshizn
