PCIe error



Hi, I have a recurring error in my logs that I can't track down! All my PCIe cards seem to function correctly. Does anyone know what these messages are telling me?

Thanks,

Tim

Aug 13 16:38:22 Tower kernel: pcieport 0000:00:1b.4: AER: Corrected error received: 0000:00:1b.4
Aug 13 16:38:22 Tower kernel: pcieport 0000:00:1b.4: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Aug 13 16:38:22 Tower kernel: pcieport 0000:00:1b.4:   device [8086:a32c] error status/mask=00000001/00002000
Aug 13 16:38:22 Tower kernel: pcieport 0000:00:1b.4:    [ 0] RxErr                  (First)
Aug 13 16:38:22 Tower kernel: pcieport 0000:00:1b.4: AER: Corrected error received: 0000:00:1b.4
Aug 13 16:38:22 Tower kernel: pcieport 0000:00:1b.4: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Aug 13 16:38:22 Tower kernel: pcieport 0000:00:1b.4:   device [8086:a32c] error status/mask=00000001/00002000
Aug 13 16:38:22 Tower kernel: pcieport 0000:00:1b.4:    [ 0] RxErr                  (First)
Aug 13 16:39:34 Tower kernel: pcieport 0000:00:1b.4: AER: Corrected error received: 0000:00:1b.4
Aug 13 16:39:34 Tower kernel: pcieport 0000:00:1b.4: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Aug 13 16:39:34 Tower kernel: pcieport 0000:00:1b.4:   device [8086:a32c] error status/mask=00000001/00002000
Aug 13 16:39:34 Tower kernel: pcieport 0000:00:1b.4:    [ 0] RxErr                  (First)
Aug 13 16:39:35 Tower kernel: pcieport 0000:00:1b.4: AER: Corrected error received: 0000:00:1b.4
Aug 13 16:39:35 Tower kernel: pcieport 0000:00:1b.4: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Aug 13 16:39:35 Tower kernel: pcieport 0000:00:1b.4:   device [8086:a32c] error status/mask=00000001/00002000
Aug 13 16:39:35 Tower kernel: pcieport 0000:00:1b.4:    [ 0] RxErr                  (First)
Aug 13 16:39:41 Tower kernel: pcieport 0000:00:1b.4: AER: Corrected error received: 0000:00:1b.4
Aug 13 16:39:41 Tower kernel: pcieport 0000:00:1b.4: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Aug 13 16:39:41 Tower kernel: pcieport 0000:00:1b.4:   device [8086:a32c] error status/mask=00000001/00002000
Aug 13 16:39:41 Tower kernel: pcieport 0000:00:1b.4:    [ 0] RxErr                  (First)
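For anyone decoding these lines: `error status/mask=00000001/00002000` prints two register values from the port's AER Correctable Error Status and Mask registers. Status bit 0 is Receiver Error (the `RxErr` on the following line), and mask bit 13 (0x2000) is Advisory Non-Fatal Error, which is masked. A minimal Python sketch of that decoding follows; the bit table follows the PCIe AER correctable-error register layout using the short names Linux prints, and the helper names are hypothetical.

```python
# Correctable-error bits in the PCIe AER Correctable Error Status/Mask registers,
# labelled with the short names the Linux kernel prints (e.g. "RxErr").
CORRECTABLE_BITS = {
    0: "RxErr",         # Receiver Error (physical layer)
    6: "BadTLP",        # Bad Transaction Layer Packet
    7: "BadDLLP",       # Bad Data Link Layer Packet
    8: "Rollover",      # REPLAY_NUM rollover
    12: "Timeout",      # Replay timer timeout
    13: "NonFatalErr",  # Advisory non-fatal error
    14: "CorrIntErr",   # Corrected internal error
    15: "HeaderOF",     # Header log overflow
}


def decode(value: int) -> list[str]:
    """Return the names of the correctable-error bits set in a register value."""
    return [name for bit, name in sorted(CORRECTABLE_BITS.items()) if (value >> bit) & 1]


def decode_status_mask(field: str) -> dict[str, list[str]]:
    """Decode a 'status/mask' pair as printed in the log, e.g. '00000001/00002000'."""
    status_hex, mask_hex = field.split("/")
    return {"status": decode(int(status_hex, 16)), "mask": decode(int(mask_hex, 16))}


if __name__ == "__main__":
    print(decode_status_mask("00000001/00002000"))
    # {'status': ['RxErr'], 'mask': ['NonFatalErr']}
```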

1 month later...

Same here, everything is functioning fine, but the syslog fills up within minutes every time after I clear it. And I have no other PCIe slots available to switch to/from.

Can someone on the unRAID dev team tell me how to disable logging for this?

Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: AER: Corrected error received: 0000:00:1b.4
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: can't find device of ID00dc
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: AER: Corrected error received: 0000:00:1b.4
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: can't find device of ID00dc
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: AER: Corrected error received: 0000:00:1b.4
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: can't find device of ID00dc
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: AER: Corrected error received: 0000:00:1b.4
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: can't find device of ID00dc
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: AER: Corrected error received: 0000:00:1b.4
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4:   device [8086:a32c] error status/mask=00000001/00002000
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4:    [ 0] RxErr                  (First)
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: AER: Corrected error received: 0000:00:1b.4
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4:   device [8086:a32c] error status/mask=00000001/00002000
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4:    [ 0] RxErr                  (First)
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: AER: Corrected error received: 0000:00:1b.4
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: can't find device of ID00dc
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: AER: Corrected error received: 0000:00:1b.4
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4:   device [8086:a32c] error status/mask=00000001/00002000
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4:    [ 0] RxErr                  (First)
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: AER: Corrected error received: 0000:00:1b.4
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4:   device [8086:a32c] error status/mask=00000001/00002000
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4:    [ 0] RxErr                  (First)
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: AER: Corrected error received: 0000:00:1b.4
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: can't find device of ID00dc
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: AER: Corrected error received: 0000:00:1b.4
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: can't find device of ID00dc
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: AER: Corrected error received: 0000:00:1b.4
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4:   device [8086:a32c] error status/mask=00000001/00002000
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4:    [ 0] RxErr                  (First)
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: AER: Corrected error received: 0000:00:1b.4
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4:   device [8086:a32c] error status/mask=00000001/00002000
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4:    [ 0] RxErr                  (First)
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: AER: Corrected error received: 0000:00:1b.4
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: can't find device of ID00dc
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: AER: Multiple Corrected error received: 0000:00:1b.4
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: can't find device of ID00dc
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: AER: Corrected error received: 0000:00:1b.4
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4:   device [8086:a32c] error status/mask=00000001/00002000
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4:    [ 0] RxErr                  (First)
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: AER: Corrected error received: 0000:00:1b.4
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: can't find device of ID00dc
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: AER: Corrected error received: 0000:00:1b.4
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: can't find device of ID00dc
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: AER: Corrected error received: 0000:00:1b.4
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: can't find device of ID00dc
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: AER: Multiple Corrected error received: 0000:00:1b.4
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4:   device [8086:a32c] error status/mask=00000001/00002000
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4:    [ 0] RxErr                  (First)
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: AER: Corrected error received: 0000:00:1b.4
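The `ID00dc` in `can't find device of ID00dc` is a 16-bit PCIe requester ID, which packs the bus (bits 15:8), device (bits 7:3) and function (bits 2:0). Decoding it gives 00:1b.4, the same address as the reporting port (`pcieport 0000:00:1b.4`), rather than the address of a card behind it. A small sketch, assuming PCI domain 0000 since the ID itself does not carry a domain; the function name is hypothetical.

```python
def decode_requester_id(rid: int, domain: int = 0) -> str:
    """Turn a 16-bit PCIe requester ID into dddd:bb:dd.f notation.

    The domain is not part of the requester ID, so it is passed in (assumed 0).
    """
    bus = (rid >> 8) & 0xFF      # bits 15:8
    device = (rid >> 3) & 0x1F   # bits 7:3
    function = rid & 0x7         # bits 2:0
    return f"{domain:04x}:{bus:02x}:{device:02x}.{function}"


if __name__ == "__main__":
    # "can't find device of ID00dc" from the log above
    print(decode_requester_id(0x00DC))  # 0000:00:1b.4
```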

This is coming from the PCIe slot where an LSI HBA card is installed. I only use 4 of the 8 available SATA connections; perhaps that causes this, but still, it's pretty ridiculous.
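On the question of disabling the logging: one blunt option is the kernel boot parameter `pci=noaer`, which turns off PCIe AER reporting entirely. It hides the messages; it does not change whatever is happening on the link. On unRAID the kernel command line sits on the `append` lines of `syslinux.cfg` on the flash drive (assumed here to be `/boot/syslinux/syslinux.cfg`). The sketch below only shows the text edit on an example stanza; `add_kernel_param` is a hypothetical helper and the stanza layout is an assumption.

```python
def add_kernel_param(cfg_text: str, param: str) -> str:
    """Append `param` to each 'append' line of a syslinux config (idempotent)."""
    out = []
    for line in cfg_text.splitlines():
        stripped = line.lstrip()
        # Match 'append' regardless of case; only kernel command lines are touched
        if stripped.lower().startswith("append ") and param not in stripped.split():
            line = line.rstrip() + " " + param
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Example stanza modelled on a typical unRAID syslinux.cfg (assumed layout)
    example = "label Unraid OS\n  menu default\n  kernel /bzimage\n  append initrd=/bzroot\n"
    print(add_kernel_param(example, "pci=noaer"), end="")
```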

13 hours ago, Julius said:

Same here, everything is functioning fine, but the syslog fills up within minutes every time after I clear it. And I have no other PCIe slots available to switch to/from.

Can someone on the unRAID dev team tell me how to disable logging for this?

This is coming from the PCIe slot where an LSI HBA card is installed. I only use 4 of the 8 available SATA connections; perhaps that causes this, but still, it's pretty ridiculous.

I also have an HBA card with only 4 SATA drives connected, and I don't have this issue. What firmware version are you running?
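To compare firmware without rebooting into the card's BIOS: assuming the card is handled by the Linux `mpt3sas` driver (which covers SAS2308-based cards such as the 9207-8i on current kernels), each SCSI host it registers exposes a `version_fw` attribute under `/sys/class/scsi_host/`. LSI's `sas2flash -list` utility also reports the firmware version. A minimal sketch that collects the sysfs values; the function name is hypothetical, and hosts from drivers without that attribute are simply skipped.

```python
from pathlib import Path


def hba_firmware_versions(sysfs_root: str = "/sys/class/scsi_host") -> dict[str, str]:
    """Map each SCSI host exposing a 'version_fw' attribute to its firmware string."""
    versions = {}
    for host in sorted(Path(sysfs_root).glob("host*")):
        fw_file = host / "version_fw"
        # Hosts whose driver does not provide version_fw are skipped
        if fw_file.is_file():
            versions[host.name] = fw_file.read_text().strip()
    return versions


if __name__ == "__main__":
    for host, fw in hba_firmware_versions().items():
        print(f"{host}: firmware {fw}")
```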


Hmm... apparently it doesn't make one bit of difference how I set the boot options for the card, or whether the boot flash is even present.

As soon as I use this LSI 9207-8i card in this PCIe slot, I get this:

Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 134131744 bytes) in /usr/local/emhttp/plugins/dynamix/include/Syslog.php on line 20

Note that *everything* else is error-free on this unRAID server and its hardware.

Attached is the syslog from the latest run, which filled it up.

syslog.zip
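About the PHP error: 134217728 bytes is exactly 128 MiB (128 × 1024 × 1024), and the failed request was for 134131744 bytes (about 127.9 MiB) in a single allocation. That suggests, as an assumption not verified against the plugin source, that Syslog.php reads the whole log into memory at once. Reading only the last N lines keeps memory bounded no matter how large the log grows. A Python sketch using `collections.deque` with `maxlen`; `tail` is a hypothetical helper and `/var/log/syslog` is an assumed log path.

```python
import os
from collections import deque


def tail(path: str, n: int = 1000) -> list[str]:
    """Return the last n lines of a text file, holding at most n lines in memory."""
    with open(path, "r", errors="replace") as f:
        # A deque with maxlen drops the oldest line whenever a new one arrives,
        # so memory grows with n, not with the size of the file.
        return list(deque(f, maxlen=n))


if __name__ == "__main__":
    log_path = "/var/log/syslog"  # assumed location
    if os.path.exists(log_path):
        try:
            for line in tail(log_path, 50):
                print(line, end="")
        except OSError as exc:
            print(f"cannot read {log_path}: {exc}")
```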

5 hours ago, trott said:

Turning off AER does not actually fix the issue. If I were you, I would change the HBA card: once the error can no longer be corrected automatically, you might lose your data.

I don't think you've read that correctly: AER does the reporting of the corrected errors, not the correcting itself. The AER driver receives the corrected-error notification but fails to clear it. Besides, the error is not coming from the card but from the PCIe hardware on the mainboard. Of course I can try a different card, but I doubt it will make a difference; the source is in unRAID's Linux kernel, not in the card's hardware. (The card is already one of the best ones, with extensive config options and up-to-date firmware.)
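The reporting/correcting distinction holds as far as PCIe goes: correctable errors are recovered by the link hardware, and AER is the mechanism that reports them. On kernels that provide it, per-device totals also appear in sysfs as `aer_dev_correctable` (one `name count` line per error type plus a `TOTAL_ERR_COR` line), which lets you check whether the counts keep climbing regardless of what gets logged. A sketch parser follows; the file's presence depends on kernel version, and the helper names are hypothetical.

```python
from pathlib import Path


def parse_aer_counters(text: str) -> dict[str, int]:
    """Parse 'name count' lines such as 'RxErr 12' or 'TOTAL_ERR_COR 12'."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def read_correctable_counters(bdf: str = "0000:00:1b.4") -> dict[str, int]:
    """Read a device's correctable AER counters from sysfs (kernel-dependent file)."""
    path = Path("/sys/bus/pci/devices") / bdf / "aer_dev_correctable"
    return parse_aer_counters(path.read_text())


if __name__ == "__main__":
    try:
        counters = read_correctable_counters()
    except OSError:
        print("aer_dev_correctable is not available here")
    else:
        print("RxErr:", counters.get("RxErr"), "total:", counters.get("TOTAL_ERR_COR"))
```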

3 hours ago, Julius said:

I don't think you've read that correctly: AER does the reporting of the corrected errors, not the correcting itself. The AER driver receives the corrected-error notification but fails to clear it. Besides, the error is not coming from the card but from the PCIe hardware on the mainboard. Of course I can try a different card, but I doubt it will make a difference; the source is in unRAID's Linux kernel, not in the card's hardware. (The card is already one of the best ones, with extensive config options and up-to-date firmware.)

Yes, the error is corrected, but you need to find out why the error occurs in the first place. A corrected error with reporting turned off does not mean there is no issue.
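A first step toward finding the cause is measuring how often, and on which port, the errors fire. The sketch below counts corrected-error reports per second and per port in saved syslog lines (for example from an extracted syslog.zip); the regex assumes the line format shown in this thread, and `aer_events_per_second` is a hypothetical name.

```python
import re
from collections import Counter

# Matches kernel syslog lines like:
# "Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: AER: Corrected error received: ..."
AER_LINE = re.compile(
    r"^(?P<ts>\w{3} +\d+ \d\d:\d\d:\d\d) \S+ kernel: "
    r"pcieport (?P<port>[0-9a-f:.]+): AER: (?:Multiple )?Corrected error received"
)


def aer_events_per_second(lines) -> Counter:
    """Count corrected-error reports per (timestamp, port) pair."""
    counts = Counter()
    for line in lines:
        m = AER_LINE.match(line)
        if m:
            counts[(m["ts"], m["port"])] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: AER: Corrected error received: 0000:00:1b.4",
        "Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: can't find device of ID00dc",
        "Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: AER: Multiple Corrected error received: 0000:00:1b.4",
    ]
    for (ts, port), n in aer_events_per_second(sample).most_common():
        print(ts, port, n)
```

Running it on logs captured with the card in different slots would show whether the rate follows the slot or the card.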
