August 13, 2019

Hi,

I have a recurring error that I have noticed in the logs that I can't track down! All my PCIe cards seem to function correctly. Does anyone know what they are telling me?

Thanks,
Tim

Aug 13 16:38:22 Tower kernel: pcieport 0000:00:1b.4: AER: Corrected error received: 0000:00:1b.4
Aug 13 16:38:22 Tower kernel: pcieport 0000:00:1b.4: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Aug 13 16:38:22 Tower kernel: pcieport 0000:00:1b.4: device [8086:a32c] error status/mask=00000001/00002000
Aug 13 16:38:22 Tower kernel: pcieport 0000:00:1b.4: [ 0] RxErr (First)
Aug 13 16:38:22 Tower kernel: pcieport 0000:00:1b.4: AER: Corrected error received: 0000:00:1b.4
Aug 13 16:38:22 Tower kernel: pcieport 0000:00:1b.4: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Aug 13 16:38:22 Tower kernel: pcieport 0000:00:1b.4: device [8086:a32c] error status/mask=00000001/00002000
Aug 13 16:38:22 Tower kernel: pcieport 0000:00:1b.4: [ 0] RxErr (First)
Aug 13 16:39:34 Tower kernel: pcieport 0000:00:1b.4: AER: Corrected error received: 0000:00:1b.4
Aug 13 16:39:34 Tower kernel: pcieport 0000:00:1b.4: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Aug 13 16:39:34 Tower kernel: pcieport 0000:00:1b.4: device [8086:a32c] error status/mask=00000001/00002000
Aug 13 16:39:34 Tower kernel: pcieport 0000:00:1b.4: [ 0] RxErr (First)
Aug 13 16:39:35 Tower kernel: pcieport 0000:00:1b.4: AER: Corrected error received: 0000:00:1b.4
Aug 13 16:39:35 Tower kernel: pcieport 0000:00:1b.4: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Aug 13 16:39:35 Tower kernel: pcieport 0000:00:1b.4: device [8086:a32c] error status/mask=00000001/00002000
Aug 13 16:39:35 Tower kernel: pcieport 0000:00:1b.4: [ 0] RxErr (First)
Aug 13 16:39:41 Tower kernel: pcieport 0000:00:1b.4: AER: Corrected error received: 0000:00:1b.4
Aug 13 16:39:41 Tower kernel: pcieport 0000:00:1b.4: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Aug 13 16:39:41 Tower kernel: pcieport 0000:00:1b.4: device [8086:a32c] error status/mask=00000001/00002000
Aug 13 16:39:41 Tower kernel: pcieport 0000:00:1b.4: [ 0] RxErr (First)
August 13, 2019 Community Expert

This can sometimes be fixed by using a different PCIe slot, especially by moving the card from a CPU slot to a PCH slot or vice versa. A BIOS update might also help.
August 13, 2019 Community Expert

They usually are, but if nothing else they will spam and fill up the log.
August 13, 2019 Author

OK, thanks guys. I'll try to work out which slot it's referring to, and then I might be able to work out what's going on!

Cheers,
Tim
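To work out which port the messages refer to, something like the following could help. It is a minimal sketch, assuming the log line format shown in this thread; `aer_devices` is a hypothetical helper name, not an existing tool.

```shell
#!/bin/bash
# List the unique PCI addresses that appear in AER messages.
# Reads syslog text on stdin and prints one address per line.
# aer_devices is a hypothetical helper, not part of any package.
aer_devices() {
    grep -oE 'pcieport [0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]: AER:' \
        | awk '{ sub(/:$/, "", $2); print $2 }' \
        | sort -u
}

# Usage (paths and address are examples):
#   aer_devices < /var/log/syslog
#   lspci -s 0000:00:1b.4 -vv    # details of the reporting port
#   lspci -tv                    # tree view, shows what sits behind that port
```

The address printed is the root port itself; the card in the slot is whatever `lspci -tv` shows below it.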
September 23, 2019

Same here, all is functioning fine, but syslog is filling up in minutes every time after I clear the log. And I have no PCI-e slots available to switch to/from. Can someone at unRAID dev tell me how to disable logging for this?

Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: AER: Corrected error received: 0000:00:1b.4
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: can't find device of ID00dc
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: AER: Corrected error received: 0000:00:1b.4
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: can't find device of ID00dc
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: AER: Corrected error received: 0000:00:1b.4
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: can't find device of ID00dc
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: AER: Corrected error received: 0000:00:1b.4
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: can't find device of ID00dc
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: AER: Corrected error received: 0000:00:1b.4
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: device [8086:a32c] error status/mask=00000001/00002000
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: [ 0] RxErr (First)
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: AER: Corrected error received: 0000:00:1b.4
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: device [8086:a32c] error status/mask=00000001/00002000
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: [ 0] RxErr (First)
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: AER: Corrected error received: 0000:00:1b.4
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: can't find device of ID00dc
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: AER: Corrected error received: 0000:00:1b.4
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: device [8086:a32c] error status/mask=00000001/00002000
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: [ 0] RxErr (First)
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: AER: Corrected error received: 0000:00:1b.4
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: device [8086:a32c] error status/mask=00000001/00002000
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: [ 0] RxErr (First)
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: AER: Corrected error received: 0000:00:1b.4
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: can't find device of ID00dc
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: AER: Corrected error received: 0000:00:1b.4
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: can't find device of ID00dc
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: AER: Corrected error received: 0000:00:1b.4
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: device [8086:a32c] error status/mask=00000001/00002000
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: [ 0] RxErr (First)
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: AER: Corrected error received: 0000:00:1b.4
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: device [8086:a32c] error status/mask=00000001/00002000
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: [ 0] RxErr (First)
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: AER: Corrected error received: 0000:00:1b.4
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: can't find device of ID00dc
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: AER: Multiple Corrected error received: 0000:00:1b.4
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: can't find device of ID00dc
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: AER: Corrected error received: 0000:00:1b.4
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: device [8086:a32c] error status/mask=00000001/00002000
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: [ 0] RxErr (First)
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: AER: Corrected error received: 0000:00:1b.4
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: can't find device of ID00dc
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: AER: Corrected error received: 0000:00:1b.4
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: can't find device of ID00dc
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: AER: Corrected error received: 0000:00:1b.4
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: can't find device of ID00dc
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: AER: Multiple Corrected error received: 0000:00:1b.4
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: device [8086:a32c] error status/mask=00000001/00002000
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: [ 0] RxErr (First)
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: AER: Corrected error received: 0000:00:1b.4

This is coming from the PCI slot where an LSI HBA card is. I only use 4 SATA connections of the 8 available; perhaps that causes this, but still, it's pretty ridiculous.
September 24, 2019

OK, for now I 'fixed' this by destroying syslog every few hours via the User Scripts plugin:

#!/bin/bash
# Delete the filled-up syslog and recreate it empty
rm -f /var/log/syslog
touch /var/log/syslog
# Restore world-readable permissions on the new file
chmod 0644 /var/log/syslog
exit 0
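Deleting and recreating the file works as a stopgap, but a syslog daemon that still has the old file open may keep writing to the deleted file. Below is a minimal sketch of an alternative that truncates the log in place once it passes a size limit. `trim_log` and the 10 MB limit in the usage comment are hypothetical choices, not part of the original script.

```shell
#!/bin/bash
# Truncate a log file in place once it exceeds a size limit.
# Truncating keeps the same file (inode), so a logger that has it
# open in append mode can carry on writing to it.
# trim_log is a hypothetical helper name.
trim_log() {
    local file=$1 max_bytes=$2
    # Nothing to do if the file does not exist
    [ -f "$file" ] || return 0
    local size
    size=$(( $(wc -c < "$file") ))
    if [ "$size" -gt "$max_bytes" ]; then
        : > "$file"
    fi
}

# Usage (example values): empty the syslog once it grows past 10 MB
#   trim_log /var/log/syslog 10485760
```

Like the original script, this only keeps the log small; it does nothing about the errors being logged.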
September 24, 2019

13 hours ago, Julius said: This is coming from the PCI slot where an LSI HBA card is. I only use 4 SATA connections of the 8 available; perhaps that causes this, but still, it's pretty ridiculous.

I also have an HBA card with only 4 SATA ports connected, and no such issue. What's your firmware version?
September 24, 2019

2 minutes ago, trott said: I also have an HBA card with only 4 SATA ports connected, and no such issue. What's your firmware version?

The latest version from the Broadcom site; I think it's called Avago in the BIOS now. I'm going to try erasing the BIOS entirely from the card, following this, and check whether that solves it.
September 24, 2019

Hmmm... apparently it doesn't make one bit of difference how I set the boot options for the card, or whether the boot flash is even there. As soon as I use this LSI 9207-8i card in this PCI-e slot, I get this:

Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 134131744 bytes) in /usr/local/emhttp/plugins/dynamix/include/Syslog.php on line 20

Note that *everything* else is error-free on this unRAID server and its hardware. Attached is the latest run that filled up syslog: syslog.zip
September 24, 2019

Found out it is a known kernel issue on many Linux distros, relating to Advanced Error Reporting (AER): http://billauer.co.il/blog/2015/10/linux-pcie-aer/ Apparently it can be switched off per device. See also https://gist.github.com/Brainiarc7/3179144393747f35e5155fdbfd675554

The problem is that I can't test this for days yet, because parity is being recreated here.
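Switching the reporting off per device would, as far as I understand it, amount to setting bits in the port's AER Correctable Error Mask register. Below is a minimal sketch, assuming the `status/mask` values from the log lines in this thread. `aer_new_mask` is a hypothetical helper name, and the `setpci` line in the comments is an untested assumption about how the value could be applied.

```shell
#!/bin/bash
# Compute a correctable-error mask that also masks the bits currently
# set in the status word. Both arguments are the hex values from the
# "error status/mask=" log line, without a 0x prefix.
# aer_new_mask is a hypothetical helper name.
aer_new_mask() {
    local status=$1 mask=$2
    printf '%08x\n' $(( 0x$status | 0x$mask ))
}

# Usage with the values from the log (status 00000001, mask 00002000):
#   aer_new_mask 00000001 00002000
#
# Assumption, untested: the result could then be written to the
# Correctable Error Mask register (offset 0x14 in the AER capability):
#   setpci -s 00:1b.4 ECAP_AER+0x14.l=<new mask>
# A setpci write does not survive a reboot.
```

Alternatively, the `pci=noaer` kernel boot parameter turns AER off for the whole system. Either way this only hides the messages; the link errors themselves still occur.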
September 25, 2019

Turning off AER does not actually fix the issue. If I were you, I would change the HBA card; once an error cannot be corrected automatically, you might lose your data.
September 25, 2019

5 hours ago, trott said: Turning off AER does not actually fix the issue. If I were you, I would change the HBA card; once an error cannot be corrected automatically, you might lose your data.

I don't think you've read it correctly: AER does the reporting of the corrected errors, not the correcting itself. The AER driver receives the corrected-error notification but fails to clear it. Besides, the error is not coming from the card, but from the PCIe hardware on the mainboard. Of course I can try a different card, but I doubt it will make a difference; the source is in unRAID's Linux kernel, not the card's hardware. (It is already one of the best ones, with vast config options and up-to-date firmware.)

Edited September 25, 2019 by Julius
September 25, 2019

3 hours ago, Julius said: I don't think you've read it correctly: AER does the reporting of the corrected errors, not the correcting itself. The AER driver receives the corrected-error notification but fails to clear it. Besides, the error is not coming from the card, but from the PCIe hardware on the mainboard. Of course I can try a different card, but I doubt it will make a difference; the source is in unRAID's Linux kernel, not the card's hardware. (It is already one of the best ones, with vast config options and up-to-date firmware.)

Yes, the error is corrected, but you need to find out why there is an error in the first place. A corrected error with reporting turned off does not mean there is no issue.
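As a first step towards finding the cause, the `status/mask` words in the log can be decoded bit by bit. Below is a minimal sketch, assuming the bit positions of the PCIe AER Correctable Error Status register as I recall them from the specification. `aer_decode_correctable` is a hypothetical helper, and only the `RxErr` label is taken from the log in this thread; the other labels are shorthand.

```shell
#!/bin/bash
# Decode an AER correctable-error status (or mask) word into labels.
# Argument: 8-digit hex value without a 0x prefix, as printed after
# "error status/mask=" in the kernel log.
# Bit positions are an assumption based on the PCIe AER register layout.
aer_decode_correctable() {
    local value=$(( 0x$1 )) out="" pair bit label
    for pair in 0:RxErr 6:BadTLP 7:BadDLLP 8:Rollover \
                12:Timeout 13:AdvNonFatalErr 14:CorrIntErr 15:HeaderOF; do
        bit=${pair%%:*}
        label=${pair#*:}
        # Append the label when this bit is set
        if [ $(( (value >> bit) & 1 )) -eq 1 ]; then
            out="${out:+$out }$label"
        fi
    done
    printf '%s\n' "$out"
}

# Usage with the values from this thread:
#   aer_decode_correctable 00000001   # the status word
#   aer_decode_correctable 00002000   # the mask word
```

For the values in this thread, status 00000001 decodes to RxErr (a receiver error on the physical layer) and mask 00002000 to AdvNonFatalErr. A receiver error points at signal quality on the link, which fits the earlier advice about trying another slot.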
This topic is now archived and is closed to further replies.