• [6.6.7] PCI Errors in the log


    Cisnet
    • Minor

    I noticed that my Unraid server randomly freezes overnight after it has been on for a while, and I have to power the whole PC down. I looked in the logs and saw a bunch of these errors:

     

    Mar 22 17:04:20 Tower kernel: pcieport 0000:00:03.2: [12] Replay Timer Timeout
    Mar 22 17:04:21 Tower kernel: pcieport 0000:00:03.2: AER: Corrected error received: 0000:00:03.2
    Mar 22 17:04:21 Tower kernel: pcieport 0000:00:03.2: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID)
    Mar 22 17:04:21 Tower kernel: pcieport 0000:00:03.2: device [8086:2f0a] error status/mask=00000040/00002000
    Mar 22 17:04:21 Tower kernel: pcieport 0000:00:03.2: [ 6] Bad TLP
    Mar 22 17:04:21 Tower kernel: pcieport 0000:00:03.2: AER: Corrected error received: 0000:00:03.2
    Mar 22 17:04:21 Tower kernel: pcieport 0000:00:03.2: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID)
    Mar 22 17:04:21 Tower kernel: pcieport 0000:00:03.2: device [8086:2f0a] error status/mask=00000080/00002000
    Mar 22 17:04:21 Tower kernel: pcieport 0000:00:03.2: [ 7] Bad DLLP
    Mar 22 17:04:22 Tower kernel: pcieport 0000:00:03.2: AER: Multiple Corrected error received: 0000:00:03.2
    Mar 22 17:04:22 Tower kernel: pcieport 0000:00:03.2: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID)
    Mar 22 17:04:22 Tower kernel: pcieport 0000:00:03.2: device [8086:2f0a] error status/mask=00000040/00002000
    Mar 22 17:04:22 Tower kernel: pcieport 0000:00:03.2: [ 6] Bad TLP
    Mar 22 17:05:23 Tower kernel: pcieport 0000:00:03.0: AER: Corrected error received: 0000:00:03.0
    Mar 22 17:05:23 Tower kernel: pcieport 0000:00:03.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)
    Mar 22 17:05:23 Tower kernel: pcieport 0000:00:03.0: device [8086:2f08] error status/mask=00001000/00002000
    Mar 22 17:05:23 Tower kernel: pcieport 0000:00:03.0: [12] Replay Timer Timeout
    Mar 22 17:05:47 Tower kernel: pcieport 0000:00:03.0: AER: Corrected error received: 0000:00:03.0
    Mar 22 17:05:47 Tower kernel: pcieport 0000:00:03.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)
    Mar 22 17:05:47 Tower kernel: pcieport 0000:00:03.0: device [8086:2f08] error status/mask=00001000/00002000
    Mar 22 17:05:47 Tower kernel: pcieport 0000:00:03.0: [12] Replay Timer Timeout
    Mar 22 17:06:15 Tower kernel: pcieport 0000:00:03.2: AER: Corrected error received: 0000:00:03.2
    Mar 22 17:06:15 Tower kernel: pcieport 0000:00:03.2: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID)
    Mar 22 17:06:15 Tower kernel: pcieport 0000:00:03.2: device [8086:2f0a] error status/mask=00000080/00002000
    Mar 22 17:06:15 Tower kernel: pcieport 0000:00:03.2: [ 7] Bad DLLP
    Mar 22 17:06:16 Tower kernel: pcieport 0000:00:03.2: AER: Corrected error received: 0000:00:03.2
    Mar 22 17:06:16 Tower kernel: pcieport 0000:00:03.2: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)
    Mar 22 17:06:16 Tower kernel: pcieport 0000:00:03.2: device [8086:2f0a] error status/mask=00001000/00002000
    Mar 22 17:06:16 Tower kernel: pcieport 0000:00:03.2: [12] Replay Timer Timeout
    Mar 22 17:06:32 Tower kernel: pcieport 0000:00:03.2: AER: Corrected error received: 0000:00:03.2
    Mar 22 17:06:32 Tower kernel: pcieport 0000:00:03.2: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID)
    Mar 22 17:06:32 Tower kernel: pcieport 0000:00:03.2: device [8086:2f0a] error status/mask=00000040/00002000
    Mar 22 17:06:32 Tower kernel: pcieport 0000:00:03.2: [ 6] Bad TLP
    Mar 22 17:07:13 Tower kernel: pcieport 0000:00:03.0: AER: Corrected error received: 0000:00:03.0
    Mar 22 17:07:13 Tower kernel: pcieport 0000:00:03.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)
    Mar 22 17:07:13 Tower kernel: pcieport 0000:00:03.0: device [8086:2f08] error status/mask=00001000/00002000
    Mar 22 17:07:13 Tower kernel: pcieport 0000:00:03.0: [12] Replay Timer Timeout
    Mar 22 17:07:55 Tower ntpd[2703]: kernel reports TIME_ERROR: 0x41: Clock Unsynchronized
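
For anyone reading the `error status/mask=` values in the log above: the first hex number is a bit field, and the bracketed numbers on the following lines (`[ 6]`, `[ 7]`, `[12]`) are the bits that are set in it. Below is a minimal Python sketch that decodes such a value. The function name `decode_correctable` and the lookup table are illustrative only; the bit names follow the PCIe AER Correctable Error Status register layout, and the kernel's wording for a given bit may differ slightly between versions.

```python
from typing import List

# Bit positions of the PCIe AER Correctable Error Status register.
# Bits not listed here are reserved.
CORRECTABLE_BITS = {
    0: "Receiver Error",
    6: "Bad TLP",
    7: "Bad DLLP",
    8: "REPLAY_NUM Rollover",
    12: "Replay Timer Timeout",
    13: "Advisory Non-Fatal Error",
    14: "Corrected Internal Error",
    15: "Header Log Overflow",
}


def decode_correctable(status_hex: str, mask_hex: str = "0") -> List[str]:
    """Return the names of the bits set in status_hex.

    Bits that are also set in mask_hex are tagged as masked.
    """
    status = int(status_hex, 16)
    mask = int(mask_hex, 16)
    names = []
    for bit in range(32):
        if (status >> bit) & 1:
            name = CORRECTABLE_BITS.get(bit, f"bit {bit} (reserved/unknown)")
            if (mask >> bit) & 1:
                name += " (masked)"
            names.append(name)
    return names


if __name__ == "__main__":
    # Values copied from the log lines in this report.
    for status in ("00000040", "00000080", "00001000"):
        print(status, decode_correctable(status, "00002000"))
```

In this report the mask `00002000` has only bit 13 set, so none of the three errors that show up (bits 6, 7 and 12) are masked.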

     

    Any ideas?

    tower-diagnostics-20190322-1712.zip





    I get plenty of the same errors, except that my device is different: device [8086:6f0a].

    That translates to Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D PCI Express Root Port 3.

     

    The machine is stable under Windows Server and Windows 10, and all devices work like a charm.

    I don't see any freezes in Unraid; at least the web UI works fine and everything seems OK. But the log is being flooded with these errors.

     

    Does anyone know why this happens and how to fix it?


    So, it was the network card in PCIe slot 1: a small, cheap 2.5G network card based on a Realtek chip. I tried it in different slots and got the same hardware-layer error flood, only with a different address (a different PCI Express Root Port).

    Interestingly, it shows no errors until the network cable is plugged in, and then they come like an avalanche. The funny part is that when the log is open, the web UI talks to the server through this very card to display it, so the more log output I view, the more faults are generated, to the point that the Unraid web UI gets very slow.
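
To check which root port the flood follows when a card is moved between slots, the error lines can be counted per port. Here is a rough Python sketch, assuming the syslog line format shown in the first post (`pcieport <address>: [NN] <error name>`). The `tally_aer` helper and its regular expression are illustrative only and may need adjusting for other kernel versions.

```python
import re
from collections import Counter

# Matches the per-error detail lines, e.g.
#   "... kernel: pcieport 0000:00:03.2: [ 6] Bad TLP"
# Summary lines ("AER: Corrected error received", "device [8086:...]")
# deliberately do not match, so each error is counted once.
AER_LINE = re.compile(
    r"pcieport (?P<port>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]):"
    r"\s+\[\s*(?P<bit>\d+)\]\s+(?P<name>.+?)\s*$"
)


def tally_aer(lines):
    """Count AER detail lines per (port address, error name)."""
    counts = Counter()
    for line in lines:
        match = AER_LINE.search(line)
        if match:
            counts[(match.group("port"), match.group("name"))] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage (the log path is an assumption): python3 tally.py < /var/log/syslog
    for (port, name), n in tally_aer(sys.stdin).most_common():
        print(f"{n:6d}  {port}  {name}")
```

If the counts move to a different port address after the card is moved to another slot, that points at the card rather than at one particular slot.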

     

    Fortunately, this card will not stay in the future server. It is a temporary solution until a full 10G network is in place (the current switch tops out at 2.5G, and my other network cards only do 10G/1000/100, not 2.5G).

     

    I also tried every BIOS setting I could (maybe not all of them, the number of combinations is massive), and the result is always the same. Cheap card = occasional problems?

     

     

    Edited by Grzywa
    grammar ?;]



