playisfun60 Posted September 22, 2019 (edited)
The crashes seem totally random. When it happens I cannot access any Docker containers, SSH, or the web GUI, and I have to IPMI in and reset the server. At one point it was happening every day, then there was no issue for the last 2 weeks, and then it happened again today.

I'm seeing this in some of the logs:

Sep 22 14:02:32 NAS kernel: rcu: INFO: rcu_sched self-detected stall on CPU

Any ideas? Thanks.

syslog-bad.log
Edited September 22, 2019 by playisfun60
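To check a saved syslog for the message quoted above, something like the following Python sketch could help. It is not from the original posters: the function name `find_rcu_stalls` and the regular expression are illustrative assumptions, based only on the single log line shown in the post.

```python
import re

# Assumed pattern, derived from the one log line quoted in the post:
# "<Mon> <day> <HH:MM:SS> <host> kernel: rcu: INFO: <flavor> self-detected stall on CPU"
STALL_RE = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+kernel:\s+"
    r"rcu: INFO: (?P<flavor>\w+) self-detected stall on CPU"
)


def find_rcu_stalls(lines):
    """Return (timestamp, host, flavor) for every RCU stall line in `lines`."""
    stalls = []
    for line in lines:
        match = STALL_RE.match(line.strip())
        if match:
            stalls.append((match["timestamp"], match["host"], match["flavor"]))
    return stalls


# The only sample used here is the line quoted in the post.
sample = [
    "Sep 22 14:02:32 NAS kernel: rcu: INFO: rcu_sched self-detected stall on CPU",
]
print(find_rcu_stalls(sample))
```

The function accepts any iterable of lines, so an open file object (for example the attached syslog-bad.log) can be passed in directly. Seeing when the stalls cluster may help correlate them with scheduled jobs or heavy load.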
playisfun60 Posted September 22, 2019
Here are the diagnostics, but they were captured after the issue and a reboot.

nas-diagnostics-20190922-1627.zip
trurl Posted September 23, 2019
Have you done memtest?
Ancan Posted September 23, 2019
I hit this thread while looking for info on the exact same message, which I got today. For me the shares still seemed to be up, and I could connect via SSH. The web GUI and the hosted VMs were dead, though.

I haven't done memtest, but plan to. Otherwise, I've found out there are some stubborn issues with Ryzen on Linux, which might or might not be fixed by limiting the C-state the CPU is allowed to enter, or by disabling C-states completely. Hopefully a fresh Linux kernel would help as well, but the outlook isn't good for that, since the latest beta is still on the old 4.19 LTS.
playisfun60 Posted September 24, 2019 (edited)
My specs are as follows. The issue started up randomly; before that there was never an issue.

X11SPH-nCTF
Xeon Silver 4114
192GB ECC
8 x 8TB with dual parity
2 Samsung 500 GB SSD + 512 ADATA NVMe, RAID total = 756GB cache

No memtest as of yet. Could that be the issue, given that I was running Unraid for over 2 years without any problem before this started?
Edited September 24, 2019 by playisfun60
playisfun60 Posted September 29, 2019
Anyone have any ideas? It just crashed again.
trurl Posted September 29, 2019
52 minutes ago, playisfun60 said: "Anyone have any Ideas, just crashed again?"
On 9/23/2019 at 10:15 PM, playisfun60 said: "No MemTest as of Yet"
Still no memtest?
playisfun60 Posted September 29, 2019
I ran memtest for 24 hours and no issue was found. I think it might be related to the Plex Docker container's high CPU usage when scanning for media. I have disabled automatic scanning and will see how this goes.

Any other thoughts on causes? Thanks.
playisfun60 Posted October 16, 2019
This turned out not to be the issue; it just happened again.
Vr2Io Posted October 16, 2019 (edited)
Please try booting in safe mode first. If there is no change, I suggest removing the NVMe next.

Sep 13 06:00:02 NAS kernel: DMAR: [DMA Read] Request device [04:00.0] fault addr ed8d1000 [fault reason 06] PTE Read access is not set
Sep 13 06:00:02 NAS kernel: DMAR: DRHD: handling fault status reg 502

04:00.0 Non-Volatile memory controller [0108]: Silicon Motion, Inc. Device [126f:2260] (rev 03)
Subsystem: Silicon Motion, Inc. Device [126f:2260]
Edited October 16, 2019 by Benson
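The DMAR lines above name the faulting PCI device (04:00.0), which is how the NVMe controller was singled out. As a rough sketch only, the same extraction could be scripted in Python. The helper name `parse_dmar_faults` and the regular expression are assumptions inferred from the one fault line quoted in this post, not a documented kernel log format.

```python
import re
from collections import Counter

# Assumed pattern, inferred from the single DMAR fault line quoted in the post.
DMAR_RE = re.compile(
    r"DMAR: \[DMA (?P<op>Read|Write)\] Request device \[(?P<device>[0-9a-fA-F:.]+)\] "
    r"fault addr (?P<addr>[0-9a-fA-F]+) \[fault reason (?P<reason>\d+)\]\s*(?P<text>.*)"
)


def parse_dmar_faults(lines):
    """Return one dict per DMAR fault line: op, device, addr, reason, text."""
    faults = []
    for line in lines:
        match = DMAR_RE.search(line)
        if match:
            faults.append(match.groupdict())
    return faults


# Sample input is limited to the two kernel lines quoted in the post.
sample = [
    "Sep 13 06:00:02 NAS kernel: DMAR: [DMA Read] Request device [04:00.0] "
    "fault addr ed8d1000 [fault reason 06] PTE Read access is not set",
    "Sep 13 06:00:02 NAS kernel: DMAR: DRHD: handling fault status reg 502",
]

faults = parse_dmar_faults(sample)
# Count faults per PCI address so a repeat offender stands out.
per_device = Counter(fault["device"] for fault in faults)
print(per_device)
```

The PCI address it reports can then be matched by eye against the lspci-style listing, as was done above for the Silicon Motion controller.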