L0rdRaiden Posted March 4, 2018

I have started getting crashes/freezes of the entire Unraid server. It started while I was using Plex, but not under load, so I'm not sure it's related. This is the error log, but I don't understand it. Any idea where I should start troubleshooting this?

Mar 4 07:52:17 MediaCenter kernel: pcieport 0000:00:01.2: AER: Corrected error received: id=0008
Mar 4 07:52:17 MediaCenter kernel: pcieport 0000:00:01.2: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=000a(Transmitter ID)
Mar 4 07:52:17 MediaCenter kernel: pcieport 0000:00:01.2: device [1022:15d3] error status/mask=00001000/00006000
Mar 4 07:52:17 MediaCenter kernel: pcieport 0000:00:01.2: [12] Replay Timer Timeout
Mar 4 07:52:50 MediaCenter kernel: pcieport 0000:00:01.2: AER: Corrected error received: id=0008
Mar 4 07:52:50 MediaCenter kernel: pcieport 0000:00:01.2: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=000a(Transmitter ID)
Mar 4 07:52:50 MediaCenter kernel: pcieport 0000:00:01.2: device [1022:15d3] error status/mask=00001000/00006000
Mar 4 07:52:50 MediaCenter kernel: pcieport 0000:00:01.2: [12] Replay Timer Timeout
Mar 4 07:53:45 MediaCenter kernel: pcieport 0000:00:01.2: AER: Corrected error received: id=0008
Mar 4 07:53:45 MediaCenter kernel: pcieport 0000:00:01.2: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=000a(Transmitter ID)
Mar 4 07:53:45 MediaCenter kernel: pcieport 0000:00:01.2: device [1022:15d3] error status/mask=00001000/00006000
Mar 4 07:53:45 MediaCenter kernel: pcieport 0000:00:01.2: [12] Replay Timer Timeout
Mar 4 07:54:10 MediaCenter kernel: pcieport 0000:00:01.2: AER: Corrected error received: id=0008
Mar 4 07:54:10 MediaCenter kernel: pcieport 0000:00:01.2: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=000a(Transmitter ID)
Mar 4 07:54:10 MediaCenter kernel: pcieport 0000:00:01.2: device [1022:15d3] error status/mask=00001000/00006000
Mar 4 07:54:10 MediaCenter kernel: pcieport 0000:00:01.2: [12] Replay Timer Timeout
Mar 4 07:54:14 MediaCenter kernel: pcieport 0000:00:01.2: AER: Corrected error received: id=0008
Mar 4 07:54:14 MediaCenter kernel: pcieport 0000:00:01.2: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=000a(Transmitter ID)
Mar 4 07:54:14 MediaCenter kernel: pcieport 0000:00:01.2: device [1022:15d3] error status/mask=00001000/00006000
Mar 4 07:54:14 MediaCenter kernel: pcieport 0000:00:01.2: [12] Replay Timer Timeout
Mar 4 07:54:47 MediaCenter root: Fix Common Problems Version 2018.02.18
Mar 4 07:54:49 MediaCenter root: Fix Common Problems: Error: Default docker appdata location is not a cache-only share
Mar 4 07:54:49 MediaCenter root: Fix Common Problems Version 2018.02.18
Mar 4 07:54:50 MediaCenter kernel: pcieport 0000:00:01.2: AER: Corrected error received: id=0008
Mar 4 07:54:50 MediaCenter kernel: pcieport 0000:00:01.2: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=000a(Transmitter ID)
Mar 4 07:54:50 MediaCenter kernel: pcieport 0000:00:01.2: device [1022:15d3] error status/mask=00001000/00006000
Mar 4 07:54:50 MediaCenter kernel: pcieport 0000:00:01.2: [12] Replay Timer Timeout
Mar 4 07:54:51 MediaCenter root: Fix Common Problems: Error: Default docker appdata location is not a cache-only share
Mar 4 07:54:51 MediaCenter root: Fix Common Problems: Error: unclean shutdown detected of your server
Mar 4 07:54:51 MediaCenter kernel: pcieport 0000:00:01.2: AER: Corrected error received: id=0008
Mar 4 07:54:51 MediaCenter kernel: pcieport 0000:00:01.2: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=000a(Transmitter ID)
Mar 4 07:54:51 MediaCenter kernel: pcieport 0000:00:01.2: device [1022:15d3] error status/mask=00001000/00006000
Mar 4 07:54:51 MediaCenter kernel: pcieport 0000:00:01.2: [12] Replay Timer Timeout
Mar 4 07:54:53 MediaCenter root: Fix Common Problems: Error: unclean shutdown detected of your server
Mar 4 07:55:03 MediaCenter kernel: pcieport 0000:00:01.2: AER: Corrected error received: id=0008
Mar 4 07:55:03 MediaCenter kernel: pcieport 0000:00:01.2: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=000a(Transmitter ID)
Mar 4 07:55:03 MediaCenter kernel: pcieport 0000:00:01.2: device [1022:15d3] error status/mask=00001000/00006000
Mar 4 07:55:03 MediaCenter kernel: pcieport 0000:00:01.2: [12] Replay Timer Timeout
Mar 4 07:55:17 MediaCenter kernel: pcieport 0000:00:01.2: AER: Corrected error received: id=0008
Mar 4 07:55:17 MediaCenter kernel: pcieport 0000:00:01.2: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=000a(Transmitter ID)
Mar 4 07:55:17 MediaCenter kernel: pcieport 0000:00:01.2: device [1022:15d3] error status/mask=00001000/00006000
Mar 4 07:55:17 MediaCenter kernel: pcieport 0000:00:01.2: [12] Replay Timer Timeout
Mar 4 07:55:22 MediaCenter kernel: pcieport 0000:00:01.2: AER: Corrected error received: id=0008
Mar 4 07:55:22 MediaCenter kernel: pcieport 0000:00:01.2: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=000a(Transmitter ID)
Mar 4 07:55:22 MediaCenter kernel: pcieport 0000:00:01.2: device [1022:15d3] error status/mask=00001000/00006000
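One possible first step with a log like the one above is simply to count which device and which error type keep repeating, so the noise collapses to a single line. The sketch below is only an illustration: the function name `tally_aer` and the `SAMPLE` list are made up for this example, and the regular expression assumes the kernel message layout seen in this post.

```python
import re
from collections import Counter

# Matches the per-error detail line the kernel prints after an AER report, e.g.
#   "... kernel: pcieport 0000:00:01.2: [12] Replay Timer Timeout"
# The layout is assumed from the log excerpt in this thread.
AER_DETAIL = re.compile(
    r"kernel: pcieport (?P<dev>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]):"
    r"\s*\[\s*(?P<bit>\d+)\]\s*(?P<name>.+?)\s*$"
)


def tally_aer(lines):
    """Count AER detail lines per (device address, status bit, error name)."""
    counts = Counter()
    for line in lines:
        match = AER_DETAIL.search(line)
        if match:
            key = (match["dev"], int(match["bit"]), match["name"])
            counts[key] += 1
    return counts


# Hypothetical sample: a few lines copied from the log in this post.
SAMPLE = [
    "Mar 4 07:52:17 MediaCenter kernel: pcieport 0000:00:01.2: AER: Corrected error received: id=0008",
    "Mar 4 07:52:17 MediaCenter kernel: pcieport 0000:00:01.2: [12] Replay Timer Timeout",
    "Mar 4 07:54:47 MediaCenter root: Fix Common Problems Version 2018.02.18",
    "Mar 4 07:52:50 MediaCenter kernel: pcieport 0000:00:01.2: [12] Replay Timer Timeout",
]

for (dev, bit, name), count in tally_aer(SAMPLE).items():
    print(f"{dev} bit {bit} ({name}): {count} occurrence(s)")
```

Run against the full syslog, a tally like this would show whether every report comes from the same bridge (here 0000:00:01.2) and the same error type, which narrows the search to whatever sits behind that port.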
Squid Posted March 4, 2018

You really should post the entire diagnostics to put the snippet into perspective.
L0rdRaiden Posted March 4, 2018

1 minute ago, Squid said: You really should post the entire diagnostics to put the snippet into perspective.

I have attached the file. Where should I start looking? Thanks

mediacenter-diagnostics-20180304-0808.zip
L0rdRaiden Posted March 4, 2018

It happened again after I extracted a RAR file from my PC directly in one of the Unraid shares over SMB. Could this be the issue? How can I fix it?
L0rdRaiden Posted March 4, 2018

Please, I need help. It happened again while I was just using Radarr and sending files to ruTorrent. All I run are Docker containers, but a failing Docker container isn't supposed to kill the host.
Squid Posted March 4, 2018

Seems to me that you've got a hardware problem and/or BIOS problem. Perhaps someone like @johnnie.black might know more info. I myself haven't seen a dummy host bridge, and that's what your logs are referencing.

00:01.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge [1022:1452]
00:01.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:15d3]
Kernel driver in use: pcieport
L0rdRaiden Posted March 4, 2018

1 hour ago, Squid said: Seems to me that you've got a hardware problem and/or BIOS problem. Perhaps someone like @johnnie.black might know more info. I myself haven't seen a dummy host bridge, and that's what your logs are referencing.

00:01.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge [1022:1452]
00:01.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:15d3]
Kernel driver in use: pcieport

Thanks for the help. These are the devices:

IOMMU group 1:[1022:15d3] 00:01.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 15d3
IOMMU group 0:[1022:1452] 00:01.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge
IOMMU group 2:[1022:1452] 00:08.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge

I'm only using one PCI slot, which holds an Ethernet card with two Intel NICs (I350).

I had already applied this before the freezes started:

Quote: We ported a simplified version of the zenstates.py utility to C (to avoid including python in bzroot) which may be used to disable Ryzen C6 states (as workaround for Ryzen idle freeze issue). We have found that sometimes bios option to disable C6 does not exist or does not do the right thing. If you want to use this utility, we suggest that you edit the config/go file on your USB flash device. Add this line just before emhttp is invoked:

/usr/local/sbin/zenstates --c6-disable
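The quoted instruction ("add this line just before emhttp is invoked") can be sketched as a small text transformation. This is only an illustration under assumptions: the helper name `add_c6_disable` is made up, and the sample go file contents are assumed rather than taken from this thread, so check the actual config/go file on the flash drive before changing anything.

```python
ZENSTATES_LINE = "/usr/local/sbin/zenstates --c6-disable"


def add_c6_disable(go_text):
    """Return go_text with the zenstates line placed before the emhttp call.

    If the line is already present, the text is returned unchanged.
    """
    lines = go_text.splitlines()
    if any(line.strip() == ZENSTATES_LINE for line in lines):
        return go_text
    for index, line in enumerate(lines):
        stripped = line.strip()
        # Skip comments; the first real line mentioning emhttp is the invocation.
        if "emhttp" in stripped and not stripped.startswith("#"):
            lines.insert(index, ZENSTATES_LINE)
            break
    else:
        raise ValueError("no emhttp invocation found in go file")
    return "\n".join(lines) + "\n"


# Assumed example of a go file; the real file may contain more lines.
SAMPLE_GO = (
    "#!/bin/bash\n"
    "# Start the Management Utility\n"
    "/usr/local/sbin/emhttp &\n"
)

print(add_c6_disable(SAMPLE_GO))
```

Doing the same edit by hand in a text editor is just as good; the point is only that the zenstates call has to run before emhttp starts, and that it should appear once.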
JorgeB Posted March 5, 2018

18 hours ago, L0rdRaiden said: Mar 4 07:52:50 MediaCenter kernel: pcieport 0000:00:01.2: AER: Corrected error received: id=0008

These errors can usually be fixed with a BIOS update or by using the offending PCIe card in a different slot, preferably swapping from a CPU slot to a chipset slot, or vice versa.
L0rdRaiden Posted March 5, 2018

11 minutes ago, johnnie.black said: These errors can usually be fixed with a BIOS update or by using the offending PCIe card in a different slot, preferably swapping from a CPU slot to a chipset slot, or vice versa.

For now I have changed a BIOS setting (RYZEN here) related to the PSU idle state. It looks like it's working; we will see.
L0rdRaiden Posted March 7, 2018

On 3/5/2018 at 11:12 AM, johnnie.black said: These errors can usually be fixed with a BIOS update or by using the offending PCIe card in a different slot, preferably swapping from a CPU slot to a chipset slot, or vice versa.

I'm getting the freeze again. Could you please confirm whether this is related to the Ryzen C6 state issue or not? Or should I try switching the PCI card to another slot? If it's a problem with the PCI card, what is the root cause? A hardware failure?

@limetech Please help me here, I'm about to RMA the processor and get something else.

Ryzen C6 state bug: https://bugzilla.kernel.org/show_bug.cgi?id=196683#c194

Mar 7 23:54:50 MediaCenter kernel: pcieport 0000:00:01.2: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=000a(Transmitter ID)
Mar 7 23:54:50 MediaCenter kernel: pcieport 0000:00:01.2: device [1022:15d3] error status/mask=00001000/00006000
Mar 7 23:54:50 MediaCenter kernel: pcieport 0000:00:01.2: [12] Replay Timer Timeout
Mar 7 23:55:02 MediaCenter kernel: pcieport 0000:00:01.2: AER: Corrected error received: id=0008
Mar 7 23:55:02 MediaCenter kernel: pcieport 0000:00:01.2: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=000a(Transmitter ID)
Mar 7 23:55:02 MediaCenter kernel: pcieport 0000:00:01.2: device [1022:15d3] error status/mask=00001000/00006000
Mar 7 23:55:02 MediaCenter kernel: pcieport 0000:00:01.2: [12] Replay Timer Timeout
Mar 7 23:55:07 MediaCenter kernel: vethfda2b9d: renamed from eth0
Mar 7 23:55:08 MediaCenter login[6512]: ROOT LOGIN on '/dev/pts/1'
Mar 7 23:55:09 MediaCenter kernel: pcieport 0000:00:01.2: AER: Corrected error received: id=0008
Mar 7 23:55:09 MediaCenter kernel: pcieport 0000:00:01.2: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=000a(Transmitter ID)
Mar 7 23:55:09 MediaCenter kernel: pcieport 0000:00:01.2: device [1022:15d3] error status/mask=00001000/00006000
Mar 7 23:55:09 MediaCenter kernel: pcieport 0000:00:01.2: [12] Replay Timer Timeout
Mar 7 23:55:15 MediaCenter kernel: pcieport 0000:00:01.2: AER: Corrected error received: id=0008
Mar 7 23:55:15 MediaCenter kernel: pcieport 0000:00:01.2: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=000a(Transmitter ID)
Mar 7 23:55:15 MediaCenter kernel: pcieport 0000:00:01.2: device [1022:15d3] error status/mask=00001000/00006000
Mar 7 23:55:15 MediaCenter kernel: pcieport 0000:00:01.2: [12] Replay Timer Timeout
Mar 7 23:55:19 MediaCenter kernel: pcieport 0000:00:01.2: AER: Corrected error received: id=0008
Mar 7 23:55:19 MediaCenter kernel: pcieport 0000:00:01.2: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=000a(Transmitter ID)
Mar 7 23:55:19 MediaCenter kernel: pcieport 0000:00:01.2: device [1022:15d3] error status/mask=00001000/00006000

mediacenter-diagnostics-20180307-2357.zip
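For anyone wondering what `error status/mask=00001000/00006000` means in these lines: both values are hexadecimal bit fields, and the `[12]` in the following log line is the set bit in the status value. The sketch below decodes them. The bit names are the correctable-error names from the PCIe AER register layout as best I know it, so treat the table as an assumption to check against the specification; the function name `decode_status_mask` is made up for this example.

```python
import re

# Correctable Error Status bits (assumed from the PCIe AER register layout).
# Only valid for lines reported with severity=Corrected; uncorrectable
# errors use a different register with different bit meanings.
CORRECTABLE_BITS = {
    0: "Receiver Error",
    6: "Bad TLP",
    7: "Bad DLLP",
    8: "REPLAY_NUM Rollover",
    12: "Replay Timer Timeout",
    13: "Advisory Non-Fatal Error",
    14: "Corrected Internal Error",
    15: "Header Log Overflow",
}

STATUS_MASK = re.compile(r"error status/mask=([0-9a-fA-F]{8})/([0-9a-fA-F]{8})")


def bit_names(value):
    """List the names of all bits set in a 32-bit register value."""
    return [
        CORRECTABLE_BITS.get(bit, f"bit {bit}")
        for bit in range(32)
        if (value >> bit) & 1
    ]


def decode_status_mask(line):
    """Decode the status and mask fields of one log line, or return None."""
    match = STATUS_MASK.search(line)
    if match is None:
        return None
    status, mask = (int(group, 16) for group in match.groups())
    return {"reported": bit_names(status), "masked": bit_names(mask)}


line = (
    "Mar 7 23:55:15 MediaCenter kernel: pcieport 0000:00:01.2: "
    "device [1022:15d3] error status/mask=00001000/00006000"
)
print(decode_status_mask(line))
```

Under that table, the status value has only bit 12 set, which matches the `[12] Replay Timer Timeout` line the kernel prints, and the mask value covers bits 13 and 14, which are therefore not reported. So every entry in the log is the same single link-layer error repeating on the same port.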
JorgeB Posted March 7, 2018

7 minutes ago, L0rdRaiden said: I'm getting the freeze again. Could you please confirm whether this is related to the Ryzen C6 state issue or not? Or should I try switching the PCI card to another slot?

I can't confirm anything. Those errors might be unrelated to the crash, but they are obviously not good, and I already told you what you can do to try to get rid of them. If a BIOS update and/or changing slots doesn't help, the only remaining option is a different motherboard or replacing whichever PCIe card is causing them. This usually isn't bad hardware but a compatibility issue.
L0rdRaiden Posted March 7, 2018

2 minutes ago, johnnie.black said: I can't confirm anything. Those errors might be unrelated to the crash, but they are obviously not good, and I already told you what you can do to try to get rid of them. If a BIOS update and/or changing slots doesn't help, the only remaining option is a different motherboard or replacing whichever PCIe card is causing them. This usually isn't bad hardware but a compatibility issue.

Thanks, I already have the latest BIOS. I will try another PCI slot.

It seems that I'm not the only one: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1521173
Archived
This topic is now archived and is closed to further replies.