Crash/Freeze



I have started getting crashes/freezes of the entire Unraid server.

It started while I was using Plex, but the server was not under load, so I'm not sure it's related.

 

This is the error log, but I don't understand it. Any ideas? Where should I start troubleshooting this?

Mar 4 07:52:17 MediaCenter kernel: pcieport 0000:00:01.2: AER: Corrected error received: id=0008
Mar 4 07:52:17 MediaCenter kernel: pcieport 0000:00:01.2: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=000a(Transmitter ID)
Mar 4 07:52:17 MediaCenter kernel: pcieport 0000:00:01.2: device [1022:15d3] error status/mask=00001000/00006000
Mar 4 07:52:17 MediaCenter kernel: pcieport 0000:00:01.2: [12] Replay Timer Timeout
Mar 4 07:52:50 MediaCenter kernel: pcieport 0000:00:01.2: AER: Corrected error received: id=0008
Mar 4 07:52:50 MediaCenter kernel: pcieport 0000:00:01.2: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=000a(Transmitter ID)
Mar 4 07:52:50 MediaCenter kernel: pcieport 0000:00:01.2: device [1022:15d3] error status/mask=00001000/00006000
Mar 4 07:52:50 MediaCenter kernel: pcieport 0000:00:01.2: [12] Replay Timer Timeout
Mar 4 07:53:45 MediaCenter kernel: pcieport 0000:00:01.2: AER: Corrected error received: id=0008
Mar 4 07:53:45 MediaCenter kernel: pcieport 0000:00:01.2: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=000a(Transmitter ID)
Mar 4 07:53:45 MediaCenter kernel: pcieport 0000:00:01.2: device [1022:15d3] error status/mask=00001000/00006000
Mar 4 07:53:45 MediaCenter kernel: pcieport 0000:00:01.2: [12] Replay Timer Timeout
Mar 4 07:54:10 MediaCenter kernel: pcieport 0000:00:01.2: AER: Corrected error received: id=0008
Mar 4 07:54:10 MediaCenter kernel: pcieport 0000:00:01.2: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=000a(Transmitter ID)
Mar 4 07:54:10 MediaCenter kernel: pcieport 0000:00:01.2: device [1022:15d3] error status/mask=00001000/00006000
Mar 4 07:54:10 MediaCenter kernel: pcieport 0000:00:01.2: [12] Replay Timer Timeout
Mar 4 07:54:14 MediaCenter kernel: pcieport 0000:00:01.2: AER: Corrected error received: id=0008
Mar 4 07:54:14 MediaCenter kernel: pcieport 0000:00:01.2: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=000a(Transmitter ID)
Mar 4 07:54:14 MediaCenter kernel: pcieport 0000:00:01.2: device [1022:15d3] error status/mask=00001000/00006000
Mar 4 07:54:14 MediaCenter kernel: pcieport 0000:00:01.2: [12] Replay Timer Timeout
Mar 4 07:54:47 MediaCenter root: Fix Common Problems Version 2018.02.18
Mar 4 07:54:49 MediaCenter root: Fix Common Problems: Error: Default docker appdata location is not a cache-only share
Mar 4 07:54:49 MediaCenter root: Fix Common Problems Version 2018.02.18
Mar 4 07:54:50 MediaCenter kernel: pcieport 0000:00:01.2: AER: Corrected error received: id=0008
Mar 4 07:54:50 MediaCenter kernel: pcieport 0000:00:01.2: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=000a(Transmitter ID)
Mar 4 07:54:50 MediaCenter kernel: pcieport 0000:00:01.2: device [1022:15d3] error status/mask=00001000/00006000
Mar 4 07:54:50 MediaCenter kernel: pcieport 0000:00:01.2: [12] Replay Timer Timeout
Mar 4 07:54:51 MediaCenter root: Fix Common Problems: Error: Default docker appdata location is not a cache-only share
Mar 4 07:54:51 MediaCenter root: Fix Common Problems: Error: unclean shutdown detected of your server
Mar 4 07:54:51 MediaCenter kernel: pcieport 0000:00:01.2: AER: Corrected error received: id=0008
Mar 4 07:54:51 MediaCenter kernel: pcieport 0000:00:01.2: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=000a(Transmitter ID)
Mar 4 07:54:51 MediaCenter kernel: pcieport 0000:00:01.2: device [1022:15d3] error status/mask=00001000/00006000
Mar 4 07:54:51 MediaCenter kernel: pcieport 0000:00:01.2: [12] Replay Timer Timeout
Mar 4 07:54:53 MediaCenter root: Fix Common Problems: Error: unclean shutdown detected of your server
Mar 4 07:55:03 MediaCenter kernel: pcieport 0000:00:01.2: AER: Corrected error received: id=0008
Mar 4 07:55:03 MediaCenter kernel: pcieport 0000:00:01.2: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=000a(Transmitter ID)
Mar 4 07:55:03 MediaCenter kernel: pcieport 0000:00:01.2: device [1022:15d3] error status/mask=00001000/00006000
Mar 4 07:55:03 MediaCenter kernel: pcieport 0000:00:01.2: [12] Replay Timer Timeout
Mar 4 07:55:17 MediaCenter kernel: pcieport 0000:00:01.2: AER: Corrected error received: id=0008
Mar 4 07:55:17 MediaCenter kernel: pcieport 0000:00:01.2: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=000a(Transmitter ID)
Mar 4 07:55:17 MediaCenter kernel: pcieport 0000:00:01.2: device [1022:15d3] error status/mask=00001000/00006000
Mar 4 07:55:17 MediaCenter kernel: pcieport 0000:00:01.2: [12] Replay Timer Timeout
Mar 4 07:55:22 MediaCenter kernel: pcieport 0000:00:01.2: AER: Corrected error received: id=0008
Mar 4 07:55:22 MediaCenter kernel: pcieport 0000:00:01.2: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=000a(Transmitter ID)
Mar 4 07:55:22 MediaCenter kernel: pcieport 0000:00:01.2: device [1022:15d3] error status/mask=00001000/00006000

 

Edited by L0rdRaiden

Seems to me that you've got a hardware problem and/or BIOS problem.  Perhaps someone like @johnnie.black might know more info.  I myself haven't seen a dummy host bridge, and that's what your logs are referencing.

 

00:01.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge [1022:1452]
00:01.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:15d3]
	Kernel driver in use: pcieport

 

1 hour ago, Squid said:

Seems to me that you've got a hardware problem and/or BIOS problem.  Perhaps someone like @johnnie.black might know more info.  I myself haven't seen a dummy host bridge, and that's what your logs are referencing.

 


00:01.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge [1022:1452]
00:01.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:15d3]
	Kernel driver in use: pcieport

 

Thanks for the help

These are the devices

IOMMU group 1:[1022:15d3] 00:01.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 15d3

IOMMU group 0:[1022:1452] 00:01.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge

IOMMU group 2:[1022:1452] 00:08.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge

 

I'm only using one PCIe slot, which holds an Ethernet card with two Intel NICs (I350).

 

I had already applied this before the freezes started:

Quote

We ported a simplified version of the zenstates.py utility to C (to avoid including python in bzroot) which may be used to disable Ryzen C6 states (as workaround for Ryzen idle freeze issue). We have found that sometimes bios option to disable C6 does not exist or does not do the right thing. If you want to use this utility, we suggest that you edit the config/go file on your USB flash device. Add this line just before emhttp is invoked:

/usr/local/sbin/zenstates --c6-disable

 

 

Edited by L0rdRaiden
18 hours ago, L0rdRaiden said:

Mar 4 07:52:50 MediaCenter kernel: pcieport 0000:00:01.2: AER: Corrected error received: id=0008

 

These errors can usually be fixed with a bios update or by using the offending PCIe card in a different slot, preferably swapping from a CPU slot to a chipset slot, or vice versa.
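
For anyone trying to read the kernel lines quoted above: the `error status/mask=00001000/00006000` field is a pair of hexadecimal bitmasks, and the `[12] Replay Timer Timeout` line names the status bit that is set. Below is a minimal Python sketch of that decoding, assuming the bit names of the PCIe AER Correctable Error Status register. The function names are made up for illustration, and this is not how the kernel itself does it.

```python
# Bit positions of the PCIe AER Correctable Error Status register.
# Names follow the PCIe specification; only these bits are decoded here.
CORRECTABLE_BITS = {
    0: "Receiver Error",
    6: "Bad TLP",
    7: "Bad DLLP",
    8: "REPLAY_NUM Rollover",
    12: "Replay Timer Timeout",
    13: "Advisory Non-Fatal Error",
    14: "Corrected Internal Error",
    15: "Header Log Overflow",
}


def decode_bits(value_hex):
    """Return the names of all known bits set in a hex string such as '00001000'."""
    value = int(value_hex, 16)
    return [name for bit, name in sorted(CORRECTABLE_BITS.items()) if value & (1 << bit)]


def decode_status_mask(field):
    """Split a 'status/mask' field from the log and decode both halves.

    (Hypothetical helper name, used only for this illustration.)
    """
    status_hex, mask_hex = field.split("/")
    return {"reported": decode_bits(status_hex), "masked": decode_bits(mask_hex)}


# Value copied from the log lines in this thread.
result = decode_status_mask("00001000/00006000")
print(result)
```

For the value in this thread, the only reported bit is 12 (Replay Timer Timeout), which matches the line the kernel prints right after it. A replay timeout means the port resent a packet because it was not acknowledged in time, and "Corrected" means the retry succeeded.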

11 minutes ago, johnnie.black said:

 

These errors can usually be fixed with a bios update or by using the offending PCIe card in a different slot, preferably swapping from a CPU slot to a chipset slot, or vice versa.

For now I have changed a BIOS setting (Ryzen here) related to the PSU idle state. It looks like it's working; we will see.

On 3/5/2018 at 11:12 AM, johnnie.black said:

 

These errors can usually be fixed with a bios update or by using the offending PCIe card in a different slot, preferably swapping from a CPU slot to a chipset slot, or vice versa.

 

I'm getting the freeze again. Could you please confirm whether or not this is related to the Ryzen C6 state issue? Or should I try moving the PCIe card to another slot?

If it's a problem with the PCIe card, what is the root cause? A hardware failure?

@limetech

 

Please help me here. I'm about to RMA the processor and get something else.

 

Ryzen C6 state bug:

https://bugzilla.kernel.org/show_bug.cgi?id=196683#c194

 

Mar 7 23:54:50 MediaCenter kernel: pcieport 0000:00:01.2: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=000a(Transmitter ID)
Mar 7 23:54:50 MediaCenter kernel: pcieport 0000:00:01.2: device [1022:15d3] error status/mask=00001000/00006000
Mar 7 23:54:50 MediaCenter kernel: pcieport 0000:00:01.2: [12] Replay Timer Timeout
Mar 7 23:55:02 MediaCenter kernel: pcieport 0000:00:01.2: AER: Corrected error received: id=0008
Mar 7 23:55:02 MediaCenter kernel: pcieport 0000:00:01.2: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=000a(Transmitter ID)
Mar 7 23:55:02 MediaCenter kernel: pcieport 0000:00:01.2: device [1022:15d3] error status/mask=00001000/00006000
Mar 7 23:55:02 MediaCenter kernel: pcieport 0000:00:01.2: [12] Replay Timer Timeout
Mar 7 23:55:07 MediaCenter kernel: vethfda2b9d: renamed from eth0
Mar 7 23:55:08 MediaCenter login[6512]: ROOT LOGIN on '/dev/pts/1'
Mar 7 23:55:09 MediaCenter kernel: pcieport 0000:00:01.2: AER: Corrected error received: id=0008
Mar 7 23:55:09 MediaCenter kernel: pcieport 0000:00:01.2: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=000a(Transmitter ID)
Mar 7 23:55:09 MediaCenter kernel: pcieport 0000:00:01.2: device [1022:15d3] error status/mask=00001000/00006000
Mar 7 23:55:09 MediaCenter kernel: pcieport 0000:00:01.2: [12] Replay Timer Timeout
Mar 7 23:55:15 MediaCenter kernel: pcieport 0000:00:01.2: AER: Corrected error received: id=0008
Mar 7 23:55:15 MediaCenter kernel: pcieport 0000:00:01.2: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=000a(Transmitter ID)
Mar 7 23:55:15 MediaCenter kernel: pcieport 0000:00:01.2: device [1022:15d3] error status/mask=00001000/00006000
Mar 7 23:55:15 MediaCenter kernel: pcieport 0000:00:01.2: [12] Replay Timer Timeout
Mar 7 23:55:19 MediaCenter kernel: pcieport 0000:00:01.2: AER: Corrected error received: id=0008
Mar 7 23:55:19 MediaCenter kernel: pcieport 0000:00:01.2: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=000a(Transmitter ID)
Mar 7 23:55:19 MediaCenter kernel: pcieport 0000:00:01.2: device [1022:15d3] error status/mask=00001000/00006000

 

mediacenter-diagnostics-20180307-2357.zip

Edited by L0rdRaiden
7 minutes ago, L0rdRaiden said:

I'm getting the freeze again. Could you please confirm whether or not this is related to the Ryzen C6 state issue? Or should I try moving the PCIe card to another slot?

I can't confirm anything. Those errors might be unrelated to the crash, but they are obviously not good, and I already told you what you can do to try to get rid of them. If a BIOS update and/or changing slots doesn't help, the only remaining option is a different motherboard or replacing whichever PCIe card is causing them. This usually isn't bad hardware but a compatibility issue.

2 minutes ago, johnnie.black said:

I can't confirm anything. Those errors might be unrelated to the crash, but they are obviously not good, and I already told you what you can do to try to get rid of them. If a BIOS update and/or changing slots doesn't help, the only remaining option is a different motherboard or replacing whichever PCIe card is causing them. This usually isn't bad hardware but a compatibility issue.

Thanks. I already have the latest BIOS; I will try another PCIe slot.

 

It seems that I'm not the only one: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1521173
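
To judge whether a slot change or BIOS setting actually reduces these messages, one option is to count the corrected-error reports per PCIe port before and after the change. A rough Python sketch follows. The regular expression is written against the log lines in this thread, the helper name is hypothetical, and the sample lines are copied from the posts above rather than read from a real syslog.

```python
import re
from collections import Counter

# Matches the "AER: Corrected error received" lines as they appear in this thread.
AER_LINE = re.compile(r"kernel: pcieport (\S+): AER: Corrected error received")


def count_corrected_errors(lines):
    """Return a Counter mapping PCIe port address -> number of corrected-error reports."""
    counts = Counter()
    for line in lines:
        match = AER_LINE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Sample lines copied from the log excerpt posted in this thread.
sample = [
    "Mar 7 23:55:02 MediaCenter kernel: pcieport 0000:00:01.2: AER: Corrected error received: id=0008",
    "Mar 7 23:55:02 MediaCenter kernel: pcieport 0000:00:01.2: [12] Replay Timer Timeout",
    "Mar 7 23:55:07 MediaCenter kernel: vethfda2b9d: renamed from eth0",
    "Mar 7 23:55:09 MediaCenter kernel: pcieport 0000:00:01.2: AER: Corrected error received: id=0008",
]

counts = count_corrected_errors(sample)
print(dict(counts))
```

On a live system the same function could be fed the lines of the syslog file instead of the sample list. If the count for `0000:00:01.2` drops to zero after moving the card, that would suggest the slot was involved, though it would not by itself show that these errors caused the freezes.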

Edited by L0rdRaiden