UNRaid Crashes - 6.7.2

playisfun60 · September 22, 2019

The crashes seem totally random. I cannot access any dockers, SSH, or the web GUI, and have to IPMI in and reset the server.

At one point it was happening every day, then there was no issue for the last 2 weeks, and then it happened again today.

I am seeing this in some of the logs:

Sep 22 14:02:32 NAS kernel: rcu: INFO: rcu_sched self-detected stall on CPU

Any ideas?

Thanks

syslog-bad.log

Edited September 22, 2019 by playisfun60

playisfun60 · September 22, 2019

Here are the diagnostics, but they were captured after the issue and a reboot.

nas-diagnostics-20190922-1627.zip

trurl · September 23, 2019

Have you done memtest?

Ancan · September 23, 2019

I hit this thread looking for info on the exact same message, which I got today. For me the shares still seemed to be up, and I could connect via SSH. The web GUI and the hosted VMs were dead, though.

I haven't done memtest, but plan to. Otherwise, I've found out there are some stubborn issues with Ryzen on Linux, which might or might not be fixed by limiting the C-states the CPU is allowed to enter, or by disabling C-states completely. Hopefully a newer Linux kernel would help as well, but the outlook isn't good for that, since the latest beta is still on the old 4.19 LTS.
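
Before limiting C-states, it can help to see which idle states the kernel actually exposes. The following is a minimal sketch, assuming the standard Linux cpuidle sysfs layout (`/sys/devices/system/cpu/cpuN/cpuidle/stateM/`); the function name `list_cstates` is made up for illustration, and the directory may be absent if no cpuidle driver is loaded.

```python
from pathlib import Path


def list_cstates(cpu_dir):
    """Return (state_dir, name, disabled) tuples for one CPU directory.

    cpu_dir is expected to look like /sys/devices/system/cpu/cpu0.
    An empty list is returned if the CPU has no cpuidle directory.
    """
    state_dirs = sorted(
        Path(cpu_dir).glob("cpuidle/state*"),
        key=lambda p: int(p.name[len("state"):]),  # numeric order: state2 before state10
    )
    states = []
    for state_dir in state_dirs:
        name = (state_dir / "name").read_text().strip()
        disable_file = state_dir / "disable"
        # "1" in the disable file means the kernel will not enter this state
        disabled = disable_file.exists() and disable_file.read_text().strip() == "1"
        states.append((state_dir.name, name, disabled))
    return states
```

Calling `list_cstates("/sys/devices/system/cpu/cpu0")` on the server would list the states for the first core. This only reports what is there; whether limiting the deeper states helps with the stalls is still a guess.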

playisfun60 · September 24, 2019

My specs are as follows. The issue started up randomly; before that there was never an issue.

X11SPH-nCTF

Xeon Silver 4114

192GB ECC

8 x 8TB with Dual Parity

2 x Samsung 500GB SSD + 512GB ADATA NVMe in RAID, total = 756GB cache

No MemTest as of Yet. Could memory really be the issue, given that I was running Unraid for over 2 years without any problems before this started?

Edited September 24, 2019 by playisfun60

playisfun60 · September 29, 2019

Anyone have any Ideas, just crashed again?

trurl · September 29, 2019

52 minutes ago, playisfun60 said:

Anyone have any Ideas, just crashed again?

On 9/23/2019 at 10:15 PM, playisfun60 said:

No MemTest as of Yet

Still no memtest?

playisfun60 · September 29, 2019

I ran memtest for 24 hours and no issue was found. I think it might be related to high CPU usage from the Plex docker when it scans for media. I have disabled automatic scanning and will see how this goes.

Any other thoughts on causes?

Thanks

playisfun60 · October 16, 2019

This turned out not to be the issue; it just happened again.

Vr2Io · October 16, 2019

Please try booting in safe mode first.

If nothing changes, then I suggest removing the NVMe next. The logs show DMAR faults from device 04:00.0, which is the NVMe controller:

Sep 13 06:00:02 NAS kernel: DMAR: [DMA Read] Request device [04:00.0] fault addr ed8d1000 [fault reason 06] PTE Read access is not set
Sep 13 06:00:02 NAS kernel: DMAR: DRHD: handling fault status reg 502

04:00.0 Non-Volatile memory controller [0108]: Silicon Motion, Inc. Device [126f:2260] (rev 03)
Subsystem: Silicon Motion, Inc. Device [126f:2260]
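
To check how often each device triggers these faults across a whole syslog, something like the following could be used. It is a rough sketch, not an official tool: the regular expression only matches the DMAR message format quoted in this post (other kernel versions may word it differently), and the function name `dmar_faults` is made up for illustration.

```python
import re

# Matches lines like the one quoted above:
# "DMAR: [DMA Read] Request device [04:00.0] fault addr ed8d1000 [fault reason 06] ..."
DMAR_FAULT = re.compile(
    r"DMAR: \[DMA (?P<op>Read|Write)\] Request device \[(?P<device>[0-9a-fA-F:.]+)\] "
    r"fault addr (?P<addr>[0-9a-fA-F]+) \[fault reason (?P<reason>\d+)\]"
)


def dmar_faults(lines):
    """Count DMAR fault lines per PCI device address."""
    counts = {}
    for line in lines:
        match = DMAR_FAULT.search(line)
        if match:
            device = match.group("device")
            counts[device] = counts.get(device, 0) + 1
    return counts


# The two log lines from this post, used as sample input.
SAMPLE = [
    "Sep 13 06:00:02 NAS kernel: DMAR: [DMA Read] Request device [04:00.0] "
    "fault addr ed8d1000 [fault reason 06] PTE Read access is not set",
    "Sep 13 06:00:02 NAS kernel: DMAR: DRHD: handling fault status reg 502",
]

print(dmar_faults(SAMPLE))  # {'04:00.0': 1}
```

For a real log, pass an open file instead of `SAMPLE`, for example `dmar_faults(open("syslog-bad.log"))`. The device address can then be matched against `lspci` output, as done above for 04:00.0.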

Edited October 16, 2019 by Benson
