September 22, 2019

The crashes seem totally random. When one hits, I cannot access any Docker containers, SSH, or the web GUI, and I have to go in over IPMI and reset the server. At one point it was happening every day, then there was no issue for the last 2 weeks, and then it happened again today.

I am seeing this in some of the logs:

Sep 22 14:02:32 NAS kernel: rcu: INFO: rcu_sched self-detected stall on CPU

Any ideas? Thanks.

syslog-bad.log

Edited September 22, 2019 by playisfun60
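To see how often the stall message quoted above shows up, and when, a saved syslog can be scanned for it. The following is only a minimal sketch: the regular expression assumes the classic syslog layout of the line in the post, the function name `find_rcu_stalls` is hypothetical, and the second sample line is made up purely to show a non-matching entry.

```python
import re

# Matches the stall message quoted in the post. The timestamp/host layout is
# assumed to follow the classic syslog format of that line.
STALL_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+kernel:\s+"
    r"rcu: INFO: (?P<flavor>\w+) self-detected stall on CPU"
)


def find_rcu_stalls(lines):
    """Return (timestamp, host, rcu_flavor) for every RCU stall line found."""
    hits = []
    for line in lines:
        match = STALL_RE.match(line.strip())
        if match:
            hits.append(
                (match.group("stamp"), match.group("host"), match.group("flavor"))
            )
    return hits


# First line is copied from the post; the second is an invented filler line.
sample = [
    "Sep 22 14:02:32 NAS kernel: rcu: INFO: rcu_sched self-detected stall on CPU",
    "Sep 22 14:02:33 NAS kernel: some unrelated message",
]
stalls = find_rcu_stalls(sample)
```

To run it against a real file, pass an open file handle instead of `sample`, for example `find_rcu_stalls(open("syslog-bad.log"))`. Comparing the resulting timestamps with scheduled jobs may help narrow down what the server was doing before each stall.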
September 22, 2019 Author

Here are the diagnostics, but they were captured after the issue and a reboot.

nas-diagnostics-20190922-1627.zip
September 23, 2019

I hit this thread looking for info on the exact same message, which I got today. For me the shares still seemed to be up, and I could connect via SSH. The web GUI and the hosted VMs were dead, though.

I haven't run memtest yet, but I plan to. Otherwise, I've found out there are some stubborn issues with Ryzen on Linux, which might or might not be fixed by limiting the C-states the CPU is allowed to enter, or by disabling C-states completely. Hopefully a newer Linux kernel would help as well, but the outlook for that isn't good, since the latest beta is still on the old 4.19 LTS.
September 24, 2019 Author

My specs are as follows. The issue started up randomly; before that there was never a problem.

X11SPH-nCTF
Xeon Silver 4114
192GB ECC
8 x 8TB with Dual Parity
2 Samsung 500 GB SSD + 512 ADATA NVME Raid total = 756GB Cache

No memtest as of yet. Could memory really be the issue, given that I ran Unraid for over 2 years without any problem before this started?

Edited September 24, 2019 by playisfun60
September 29, 2019 Community Expert

52 minutes ago, playisfun60 said:
Anyone have any Ideas, just crashed again?

On 9/23/2019 at 10:15 PM, playisfun60 said:
No MemTest as of Yet

Still no memtest?
September 29, 2019 Author

I ran memtest for 24 hours and no issue was found. I think the crashes might be related to high CPU usage from the Plex docker when it scans for media. I have disabled automatic scanning and will see how this goes.

Any other thoughts on causes? Thanks.
October 16, 2019

Please try booting in safe mode first. If that makes no difference, I suggest removing the NVMe next. Your log shows a DMAR fault on device 04:00.0, which is the NVMe controller:

Sep 13 06:00:02 NAS kernel: DMAR: [DMA Read] Request device [04:00.0] fault addr ed8d1000 [fault reason 06] PTE Read access is not set
Sep 13 06:00:02 NAS kernel: DMAR: DRHD: handling fault status reg 502

04:00.0 Non-Volatile memory controller [0108]: Silicon Motion, Inc. Device [126f:2260] (rev 03)
Subsystem: Silicon Motion, Inc. Device [126f:2260]

Edited October 16, 2019 by Benson
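The step of tying a DMAR fault to a piece of hardware can be sketched in a few lines: pull the PCI address out of each fault line, count faults per address, then look that address up in the PCI device listing (the `04:00.0 Non-Volatile memory controller` entry quoted above). This is only an illustrative sketch; the regular expression is written against the single log line in this post, so other kernels may format the message differently, and the function name `dmar_faults_by_device` is hypothetical.

```python
import re
from collections import Counter

# Pattern written against the DMAR fault line quoted in the post; other
# kernel versions may word or order the fields differently (assumption).
DMAR_RE = re.compile(
    r"DMAR: \[DMA (?P<op>Read|Write)\] Request device \[(?P<dev>[0-9a-fA-F:.]+)\] "
    r"fault addr (?P<addr>[0-9a-fA-F]+) \[fault reason (?P<reason>\d+)\]"
)


def dmar_faults_by_device(lines):
    """Count DMAR fault lines per PCI device address (e.g. '04:00.0')."""
    counts = Counter()
    for line in lines:
        match = DMAR_RE.search(line)
        if match:
            counts[match.group("dev")] += 1
    return counts


# Both lines are copied from the post; only the first is a fault record.
sample = [
    "Sep 13 06:00:02 NAS kernel: DMAR: [DMA Read] Request device [04:00.0] "
    "fault addr ed8d1000 [fault reason 06] PTE Read access is not set",
    "Sep 13 06:00:02 NAS kernel: DMAR: DRHD: handling fault status reg 502",
]
faults = dmar_faults_by_device(sample)
```

If one address dominates the counts across a full syslog, that device is a reasonable first candidate to pull or move to another slot, which is the same reasoning behind removing the NVMe here.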