MothyTim Posted August 13, 2019
Hi, I have a recurring error that I have noticed in the logs that I can't track down! All my PCIe cards seem to function correctly. Does anyone know what they are telling me?
Thanks, Tim
Aug 13 16:38:22 Tower kernel: pcieport 0000:00:1b.4: AER: Corrected error received: 0000:00:1b.4
Aug 13 16:38:22 Tower kernel: pcieport 0000:00:1b.4: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Aug 13 16:38:22 Tower kernel: pcieport 0000:00:1b.4: device [8086:a32c] error status/mask=00000001/00002000
Aug 13 16:38:22 Tower kernel: pcieport 0000:00:1b.4: [ 0] RxErr (First)
Aug 13 16:38:22 Tower kernel: pcieport 0000:00:1b.4: AER: Corrected error received: 0000:00:1b.4
Aug 13 16:38:22 Tower kernel: pcieport 0000:00:1b.4: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Aug 13 16:38:22 Tower kernel: pcieport 0000:00:1b.4: device [8086:a32c] error status/mask=00000001/00002000
Aug 13 16:38:22 Tower kernel: pcieport 0000:00:1b.4: [ 0] RxErr (First)
Aug 13 16:39:34 Tower kernel: pcieport 0000:00:1b.4: AER: Corrected error received: 0000:00:1b.4
Aug 13 16:39:34 Tower kernel: pcieport 0000:00:1b.4: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Aug 13 16:39:34 Tower kernel: pcieport 0000:00:1b.4: device [8086:a32c] error status/mask=00000001/00002000
Aug 13 16:39:34 Tower kernel: pcieport 0000:00:1b.4: [ 0] RxErr (First)
Aug 13 16:39:35 Tower kernel: pcieport 0000:00:1b.4: AER: Corrected error received: 0000:00:1b.4
Aug 13 16:39:35 Tower kernel: pcieport 0000:00:1b.4: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Aug 13 16:39:35 Tower kernel: pcieport 0000:00:1b.4: device [8086:a32c] error status/mask=00000001/00002000
Aug 13 16:39:35 Tower kernel: pcieport 0000:00:1b.4: [ 0] RxErr (First)
Aug 13 16:39:41 Tower kernel: pcieport 0000:00:1b.4: AER: Corrected error received: 0000:00:1b.4
Aug 13 16:39:41 Tower kernel: pcieport 0000:00:1b.4: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Aug 13 16:39:41 Tower kernel: pcieport 0000:00:1b.4: device [8086:a32c] error status/mask=00000001/00002000
Aug 13 16:39:41 Tower kernel: pcieport 0000:00:1b.4: [ 0] RxErr (First)
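For anyone trying to read the `error status/mask=00000001/00002000` part of these lines: both numbers are hexadecimal bit fields, and each set bit in the status names one kind of corrected error. The following is a minimal sketch of a decoder, not something from this thread. It assumes the bit positions of the PCIe Correctable Error Status register and the short names the Linux AER driver prints (such as `RxErr`); the function name `decode_aer_correctable` is made up for illustration.

```shell
#!/bin/bash
# Decode a PCIe AER correctable-error status (or mask) value, given as hex
# digits without a 0x prefix, into the short names the kernel log uses.
# Assumption: bit layout of the Correctable Error Status register.
decode_aer_correctable() {
    local value=$((16#$1))
    local -a bits=(0 6 7 8 12 13 14 15)
    local -a names=(RxErr BadTLP BadDLLP Rollover Timeout NonFatalErr CorrIntErr HeaderOF)
    local i out=""
    for i in "${!bits[@]}"; do
        # Test whether bit number bits[i] is set in the value
        if (( (value >> bits[i]) & 1 )); then
            out+="${out:+ }${names[i]}"
        fi
    done
    echo "${out:-none}"
}
```

Under those assumptions, `decode_aer_correctable 00000001` gives `RxErr` (a receiver error on the physical layer, matching the `[ 0] RxErr` line), and the mask `00002000` decodes to `NonFatalErr`, meaning only that error type is masked from reporting.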
JorgeB Posted August 13, 2019
This can sometimes be fixed by using a different PCIe slot, especially by changing from a CPU slot to a PCH slot or vice versa. A BIOS update might also help.
testdasi Posted August 13, 2019
Also there's no need to be too concerned. These errors are harmless.
JorgeB Posted August 13, 2019
They usually are, but if nothing else they will spam and fill up the log.
MothyTim Posted August 13, 2019
Ok thanks guys, I'll try and work out which slot it's referring to and then I might be able to work out what's going on! Cheers, Tim
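One way to work out what sits behind the port named in the log is to look at sysfs, where devices below a bridge or root port show up as sub-directories of that port's directory. The sketch below assumes that standard Linux sysfs layout; the function name `list_downstream_devices` is hypothetical and not part of any tool.

```shell
#!/bin/bash
# List the PCI functions directly below a bridge or root port via sysfs.
# Argument: the port's sysfs directory,
#           e.g. /sys/bus/pci/devices/0000:00:1b.4 (address taken from the log)
list_downstream_devices() {
    local port_dir="$1" entry name
    for entry in "$port_dir"/*; do
        name=$(basename "$entry")
        # Child devices are directories named domain:bus:device.function;
        # other entries (power, driver, port services) do not match.
        if [ -d "$entry" ] && [[ "$name" =~ ^[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]$ ]]; then
            echo "$name"
        fi
    done
}
```

Calling `list_downstream_devices /sys/bus/pci/devices/0000:00:1b.4` should print the address of whatever card is attached to that port, and `lspci -s <address>` then names it. `lspci -tv` shows the same relationship as a tree.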
Julius Posted September 23, 2019
Same here, all is functioning fine, but syslog is filling up in minutes every time after I clear the log. And I have no PCI-e slots available to switch to/from. Can someone at unRAID dev tell me how to disable logging for this?
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: AER: Corrected error received: 0000:00:1b.4
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: can't find device of ID00dc
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: AER: Corrected error received: 0000:00:1b.4
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: can't find device of ID00dc
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: AER: Corrected error received: 0000:00:1b.4
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: can't find device of ID00dc
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: AER: Corrected error received: 0000:00:1b.4
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: can't find device of ID00dc
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: AER: Corrected error received: 0000:00:1b.4
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: device [8086:a32c] error status/mask=00000001/00002000
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: [ 0] RxErr (First)
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: AER: Corrected error received: 0000:00:1b.4
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: device [8086:a32c] error status/mask=00000001/00002000
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: [ 0] RxErr (First)
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: AER: Corrected error received: 0000:00:1b.4
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: can't find device of ID00dc
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: AER: Corrected error received: 0000:00:1b.4
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: device [8086:a32c] error status/mask=00000001/00002000
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: [ 0] RxErr (First)
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: AER: Corrected error received: 0000:00:1b.4
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: device [8086:a32c] error status/mask=00000001/00002000
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: [ 0] RxErr (First)
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: AER: Corrected error received: 0000:00:1b.4
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: can't find device of ID00dc
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: AER: Corrected error received: 0000:00:1b.4
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: can't find device of ID00dc
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: AER: Corrected error received: 0000:00:1b.4
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: device [8086:a32c] error status/mask=00000001/00002000
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: [ 0] RxErr (First)
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: AER: Corrected error received: 0000:00:1b.4
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: device [8086:a32c] error status/mask=00000001/00002000
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: [ 0] RxErr (First)
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: AER: Corrected error received: 0000:00:1b.4
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: can't find device of ID00dc
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: AER: Multiple Corrected error received: 0000:00:1b.4
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: can't find device of ID00dc
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: AER: Corrected error received: 0000:00:1b.4
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: device [8086:a32c] error status/mask=00000001/00002000
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: [ 0] RxErr (First)
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: AER: Corrected error received: 0000:00:1b.4
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: can't find device of ID00dc
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: AER: Corrected error received: 0000:00:1b.4
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: can't find device of ID00dc
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: AER: Corrected error received: 0000:00:1b.4
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: can't find device of ID00dc
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: AER: Multiple Corrected error received: 0000:00:1b.4
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: device [8086:a32c] error status/mask=00000001/00002000
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: [ 0] RxErr (First)
Sep 23 11:09:05 silent kernel: pcieport 0000:00:1b.4: AER: Corrected error received: 0000:00:1b.4
This is coming from the PCI slot where an LSI HBA card is. I only use 4 SATA connections of the 8 available, perhaps that causes this, but still, it's pretty ridiculous.
Julius Posted September 24, 2019
OK, for now I 'fixed' this by destroying syslog every few hours via the user scripts plugin:
#!/bin/bash
rm -f /var/log/syslog
touch /var/log/syslog
chmod 0644 /var/log/syslog
exit 0
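A possible variation on the script above, offered as a sketch rather than a tested Unraid recipe: if the syslog daemon keeps `/var/log/syslog` open (an assumption about how the daemon is running here), deleting the file can leave the daemon writing to the removed file until it is restarted, whereas emptying the file in place avoids that. The function name `trim_log` and the size limit are illustrative choices.

```shell
#!/bin/bash
# Empty a log file in place once it grows past a size limit.
# Truncating keeps the same file, so a daemon that holds it open
# keeps writing to it (assumption: the daemon appends to the file).
trim_log() {
    local file="$1" max_bytes="$2" size
    # Nothing to do if the file does not exist
    [ -f "$file" ] || return 0
    size=$(wc -c < "$file")
    if [ "$size" -gt "$max_bytes" ]; then
        : > "$file"
    fi
}
```

Scheduled through the same user scripts plugin, a call such as `trim_log /var/log/syslog $((50 * 1024 * 1024))` would clear the log only when it exceeds roughly 50 MiB, instead of wiping it unconditionally.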
trott Posted September 24, 2019
13 hours ago, Julius said: Same here, all is functioning fine, but syslog is filling up in minutes every time after I clear the log. [...] This is coming from the PCI slot where a LSI HBA card is. I only use 4 SATA connections of the 8 available, perhaps that causes this, but still, it's pretty ridiculous.
I also have an HBA card with only 4 SATA ports connected, and I see no such issue. What firmware version are you running?
Julius Posted September 24, 2019
2 minutes ago, trott said: I also have HBA card with only 4 SATA connected, no such issue. what's firmware version
The latest version from the Broadcom site; I think it's called Avago in the BIOS now. I'm going to try to erase the BIOS entirely from the card, following this, and check if that solves it.
Julius Posted September 24, 2019
Hmmm.. apparently it doesn't make one bit of difference how I set the boot options for the card, or whether the boot flash is even there or not. As soon as I use this LSI 9207-8i card in this PCI-e slot, I get this:
Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 134131744 bytes) in /usr/local/emhttp/plugins/dynamix/include/Syslog.php on line 20
Note that *everything* else is error-free on this unRAID server and its hardware. Attached is the latest run that filled up syslog: syslog.zip
Julius Posted September 24, 2019
Found out it is a known kernel issue on many Linux distros relating to Advanced Error Reporting (AER): http://billauer.co.il/blog/2015/10/linux-pcie-aer/ and apparently it can be switched off per device. See also https://gist.github.com/Brainiarc7/3179144393747f35e5155fdbfd675554
Problem is, I can't test this for days yet, because parity is being recreated here.
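Besides per-device masking, the Linux kernel documents a global boot option, `pci=noaer`, that disables the use of PCIe Advanced Error Reporting. On Unraid, boot options are normally added to the `append` line of `/boot/syslinux/syslinux.cfg`; treat that path as an assumption to check against your own setup. Note that this only stops the reporting and does nothing about the link errors themselves. The sketch below checks whether the running kernel was booted with that option; the function name `aer_disabled_on_cmdline` is made up for illustration.

```shell
#!/bin/bash
# Return 0 if the kernel command line contains pci=noaer
# (possibly combined with other pci= options, e.g. pci=noaer,nomsi).
# Argument: file holding the command line, defaults to /proc/cmdline.
aer_disabled_on_cmdline() {
    local cmdline_file="${1:-/proc/cmdline}"
    local -a words=()
    local word
    # Split the single command-line string into words
    read -r -a words < "$cmdline_file" || true
    for word in "${words[@]}"; do
        case "$word" in
            pci=*)
                # Wrap the option list in commas so each option can be matched whole
                case ",${word#pci=}," in
                    *,noaer,*) return 0 ;;
                esac
                ;;
        esac
    done
    return 1
}
```

After a reboot with the option in place, `aer_disabled_on_cmdline && echo "AER reporting disabled at boot"` gives a quick confirmation that the setting was picked up.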
trott Posted September 25, 2019
Turning off AER does not actually fix the issue. If I were you, I would change the HBA card; once an error cannot be automatically corrected, you might lose your data.
Julius Posted September 25, 2019
5 hours ago, trott said: turn off the AER did actually not fix the issue, if I were you, I will change the HBA card, once the error cannot be automatically corrected, you might lose your data
I don't think you've read it correctly: AER is doing the reporting of the corrected errors, not the correcting itself. The AER driver receives the corrected error notification but fails to clear it. Besides, the error is not coming from the card, but from the PCIe hardware on the mainboard. Of course I can try a different card, but I doubt it will make a difference; the source is in unRAID's Linux kernel, not the card's hardware. (It's already one of the best ones, with vast config options and up-to-date firmware.)
Edited September 25, 2019 by Julius
trott Posted September 25, 2019
3 hours ago, Julius said: I don't think you've read correctly, the AER is doing the reporting of the corrected errors, not the correcting itself. [...]
Yes, the error is corrected, but you need to find out why there is an error in the first place. A corrected error with reporting turned off does not mean there is no issue.
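To judge whether the underlying problem changes after moving a card, reseating it, or updating firmware, it can help to count the corrected-error reports per port before and after. This is a minimal sketch that assumes log lines in the format quoted in this thread; the function name `count_aer_by_port` is hypothetical.

```shell
#!/bin/bash
# Count "Corrected error received" reports per reporting port in a log file.
# Output: one line per port, "<port address> <count>", sorted by address.
count_aer_by_port() {
    awk '/AER: (Multiple )?Corrected error received/ {
             # The last field of these lines is the reporting port address
             count[$NF]++
         }
         END { for (port in count) print port, count[port] }' "$1" | sort
}
```

Running `count_aer_by_port /var/log/syslog` over comparable time windows gives a rough rate to compare, which says more about the link than whether the messages are visible or suppressed.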