
Hardware error from APEI Generic Hardware Error Source: 514



Good afternoon all!

 

Fix Common Problems just notified me that my log folder was filling up - currently about 67% full.  I took a look and saw that there are 2-3 syslog files totaling about 256 MB, so I opened the newest syslog and I'm seeing this repeated:

 

Jun  6 15:31:25 Guardian kernel: nvme 0000:02:00.0: AER: aer_layer=Physical Layer, aer_agent=Receiver ID
Jun  6 15:31:25 Guardian kernel: nvme 0000:02:00.0: AER: aer_status: 0x00000001, aer_mask: 0x00000000
Jun  6 15:31:25 Guardian kernel: nvme 0000:02:00.0:    [ 0] RxErr                  (First)
Jun  6 15:31:25 Guardian kernel: nvme 0000:02:00.0: AER: aer_layer=Physical Layer, aer_agent=Receiver ID
Jun  6 15:31:31 Guardian kernel: {127142}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 514
Jun  6 15:31:31 Guardian kernel: {127142}[Hardware Error]: It has been corrected by h/w and requires no further action
Jun  6 15:31:31 Guardian kernel: {127142}[Hardware Error]: event severity: corrected
Jun  6 15:31:31 Guardian kernel: {127142}[Hardware Error]:  Error 0, type: corrected
Jun  6 15:31:31 Guardian kernel: {127142}[Hardware Error]:   section_type: PCIe error
Jun  6 15:31:31 Guardian kernel: {127142}[Hardware Error]:   port_type: 0, PCIe end point
Jun  6 15:31:31 Guardian kernel: {127142}[Hardware Error]:   version: 0.2
Jun  6 15:31:31 Guardian kernel: {127142}[Hardware Error]:   command: 0x0406, status: 0x0010
Jun  6 15:31:31 Guardian kernel: {127142}[Hardware Error]:   device_id: 0000:02:00.0
Jun  6 15:31:31 Guardian kernel: {127142}[Hardware Error]:   slot: 0
Jun  6 15:31:31 Guardian kernel: {127142}[Hardware Error]:   secondary_bus: 0x00
Jun  6 15:31:31 Guardian kernel: {127142}[Hardware Error]:   vendor_id: 0x144d, device_id: 0xa80a
Jun  6 15:31:31 Guardian kernel: {127142}[Hardware Error]:   class_code: 010802
Jun  6 15:31:31 Guardian kernel: {127142}[Hardware Error]:   bridge: secondary_status: 0x0000, control: 0x0000
Jun  6 15:31:31 Guardian kernel: {127142}[Hardware Error]:  Error 1, type: corrected
Jun  6 15:31:31 Guardian kernel: {127142}[Hardware Error]:   section_type: PCIe error
Jun  6 15:31:31 Guardian kernel: {127142}[Hardware Error]:   port_type: 0, PCIe end point
Jun  6 15:31:31 Guardian kernel: {127142}[Hardware Error]:   version: 0.2
Jun  6 15:31:31 Guardian kernel: {127142}[Hardware Error]:   command: 0x0406, status: 0x0010
Jun  6 15:31:31 Guardian kernel: {127142}[Hardware Error]:   device_id: 0000:02:00.0
Jun  6 15:31:31 Guardian kernel: {127142}[Hardware Error]:   slot: 0
Jun  6 15:31:31 Guardian kernel: {127142}[Hardware Error]:   secondary_bus: 0x00
Jun  6 15:31:31 Guardian kernel: {127142}[Hardware Error]:   vendor_id: 0x144d, device_id: 0xa80a
Jun  6 15:31:31 Guardian kernel: {127142}[Hardware Error]:   class_code: 010802
Jun  6 15:31:31 Guardian kernel: {127142}[Hardware Error]:   bridge: secondary_status: 0x0000, control: 0x0000
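
For anyone reading the log above, here is a minimal sketch of how the `aer_status` value can be decoded. It assumes the value is the PCIe AER Correctable Error Status register (the records are all marked "corrected") and uses the short names the Linux kernel prints, such as `RxErr`. The function name `decode_aer_status` is made up for illustration.

```python
# Correctable Error Status bits from the PCIe AER capability,
# paired with the short names the Linux kernel prints in its AER messages.
AER_CORRECTABLE_BITS = {
    0: "RxErr",         # Receiver Error
    6: "BadTLP",        # Bad TLP
    7: "BadDLLP",       # Bad DLLP
    8: "Rollover",      # REPLAY_NUM Rollover
    12: "Timeout",      # Replay Timer Timeout
    13: "NonFatalErr",  # Advisory Non-Fatal Error
    14: "CorrIntErr",   # Corrected Internal Error
    15: "HeaderOF",     # Header Log Overflow
}


def decode_aer_status(status: int) -> list[str]:
    """Return the names of the correctable-error bits set in `status`."""
    return [
        name
        for bit, name in sorted(AER_CORRECTABLE_BITS.items())
        if status & (1 << bit)
    ]


# Value taken from the log line "aer_status: 0x00000001".
print(decode_aer_status(0x00000001))  # → ['RxErr']
```

Only bit 0 is set, which matches the `[ 0] RxErr` line: a receiver error at the physical layer that the link recovered from on its own.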

 

device_id: 0000:02:00.0 is my Samsung 980 Pro NVMe drive, which is the second drive in my cache pool.

 

I haven't noticed this error before and have been running this setup for about 3-4 months.  The only thing that has changed is upgrading from 6.9 -> 6.10.1 -> 6.10.2

 

One thing that's weird is I looked at the attributes for the drive and see: "Power on hours 315 (13d, 3h)" - which is NOT right....

Its partner drive has "Power on hours 2,605", which is ~108 days or ~3 1/2 months - and sounds about right as they were installed at almost the same time (about a week apart)
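
The arithmetic behind those two figures, as a quick sketch (the helper name `hours_to_days` is just for illustration):

```python
def hours_to_days(hours: int) -> tuple[int, int]:
    """Split a power-on-hours count into whole days and leftover hours."""
    return divmod(hours, 24)


# Power-on hours as reported in the drive attributes quoted above.
for label, hours in [("erroring drive", 315), ("partner drive", 2605)]:
    days, rem = hours_to_days(hours)
    print(f"{label}: {hours} h = {days} d {rem} h")
```

This gives 13 d 3 h for the erroring drive and 108 d 13 h for its partner, so the two counters disagree by roughly 95 days even though the drives were installed about a week apart.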

 

Any suggestions?

 

guardian-diagnostics-20220606-1527.zip

  • 1 month later...

Well, I finally got a chance to look into this.  I have 4 NVMe slots on my board and 2 NVMe drives in a cache pool.

I pulled both drives and installed heatsinks on them because every so often, when the mover was running, one of the drives would hit 50-60C and I didn't like that much. (I think it was the drive that was erroring, but I'm not sure.)

 

Originally the drive with the errors was in slot 0, and the 2nd drive in slot 1.

 

When reinstalling the drives I put the drive with errors in slot 2, and the 2nd drive in slot 0, leaving slots 1 and 3 unoccupied.

 

I'm still seeing the errors and they've followed the drive....

Some Google research seems to indicate this is a harmless error - BUT in a week or so the log folder will fill up and I'll need to delete syslog.1 or reboot.
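
To put a number on how much of the log these messages take up, something like the following rough sketch could be used. The function name `error_share` and the matching rules are my own assumptions, not an existing tool; it simply counts bytes in lines that contain `[Hardware Error]` or that come from the `nvme` driver for the given PCI address.

```python
def error_share(lines, device="0000:02:00.0"):
    """Return (bytes in error-related lines, total bytes) for the given lines."""
    noisy = total = 0
    for line in lines:
        size = len(line.encode("utf-8"))
        total += size
        if "[Hardware Error]" in line or f"nvme {device}:" in line:
            noisy += size
    return noisy, total


# Two lines copied from the log in the first post, plus one placeholder
# standing in for unrelated log traffic (not a real log line).
sample = [
    "Jun  6 15:31:25 Guardian kernel: nvme 0000:02:00.0: AER: "
    "aer_status: 0x00000001, aer_mask: 0x00000000\n",
    "Jun  6 15:31:31 Guardian kernel: {127142}[Hardware Error]: "
    "event severity: corrected\n",
    "placeholder line standing in for unrelated log traffic\n",
]

noisy, total = error_share(sample)
print(f"{noisy} of {total} bytes ({noisy / total:.0%}) are from these errors")
```

On a real system the same function can be fed an open file, for example `with open("/var/log/syslog") as f: error_share(f)`, assuming that is where the syslog lives.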

 

Any suggestions besides seeing if Samsung will replace the drive?

 
