slcc2c Posted June 26
Hi all, thanks in advance for the help.
Issue
I cannot connect to my server over the network, and I also cannot log in to the CLI with a keyboard/mouse and monitor. Unraid appears to be crashing in the background. In my syslogs I'm seeing messages similar to "CPU: 9 PID: 1076 Comm: kworker/u48:14 Tainted" and "CPU: 9 PID: 24651 Comm: C1 CompilerThre Tainted: P" across multiple crashes. Memtests are passing, safe mode works fine, and I can start my dockers, etc. I am using the latest version of Unraid.
Timeline
I did a full rebuild in January and moved my USB drive to the new machine. Over the last 3-4 months I noticed instability, pinned it down to the USB drive failing, and migrated to a new USB 2.0 32GB drive in early June. Since then I have been getting <24hrs of uptime before the server becomes unreachable. During those episodes the machine was still on and the CLI was up. Over the last 2 weeks, I have been experiencing the issue above.
Files
syslog_crash & syslog-previous: both have the CPU Tainted error
syslog_login_crash: syslog from when I was able to log in over the web interface but the system immediately crashed
tower-diagnostics-20240625-1718.zip: most recent diagnostics that were successfully written
If it is helpful I can download diagnostics from safe mode as well.
slcc2c Posted June 26
I was attempting some troubleshooting in safe mode and got another crash, with no obvious errors. During these "crashes" the CLI and WebUI are unresponsive but the system stays on.
syslog_safe
JorgeB Posted June 26
Try booting the server in safe mode with all Docker containers/VMs disabled, and let it run as a basic NAS for a few days. If it still crashes, it's likely a hardware problem. If it doesn't, start turning the other services back on one by one.
slcc2c Posted June 26
@JorgeB appreciate it. Any take on logging I can use during this time to capture what the HW failure might be? All the HW is a few months old and was purchased at the same time, so there are no obvious culprits.
slcc2c Posted June 27
Another safe mode crash, and the log doesn't have anything interesting in it. This points to a HW failure, but I'm having trouble identifying the weak HW.
syslog_safe_crash
JorgeB Posted June 28
12 hours ago, slcc2c said: "but having trouble id'ing the weak HW."
That can be difficult to do without starting to swap some parts. If you have multiple sticks of RAM, try running the server with just one. If it behaves the same, try with a different one. That will basically rule out bad RAM.
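On the logging question above: while testing one RAM stick or one service at a time, it can help to check each saved syslog for the same kind of "CPU: ... Tainted" lines quoted in the first post. The following is only a rough sketch, not an Unraid tool. The function name `scan_lines` and the extra patterns ("Call Trace:", "Kernel panic", "[Hardware Error]") are assumptions added for illustration. They are common Linux kernel message strings, but nothing in this thread confirms they appear in these particular logs.

```python
import re
import sys
from collections import Counter

# Only the "Tainted" pattern comes from the messages quoted in this thread.
# The other three are assumed, commonly seen kernel markers.
PATTERNS = {
    "tainted": re.compile(r"CPU: \d+ PID: \d+ Comm: .+ Tainted"),
    "call_trace": re.compile(r"Call Trace:"),
    "panic": re.compile(r"Kernel panic"),
    "hardware_error": re.compile(r"\[Hardware Error\]"),
}

# Pulls out the CPU number so repeated crashes on one core stand out.
CPU_RE = re.compile(r"CPU: (\d+) PID: \d+ Comm: .+? Tainted")


def scan_lines(lines):
    """Return (hits, cpu_counts) for an iterable of syslog lines.

    hits is a list of (line_number, pattern_name, text) tuples.
    cpu_counts is a Counter mapping CPU number to tainted-line count.
    """
    hits = []
    cpu_counts = Counter()
    for number, line in enumerate(lines, start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((number, name, line.rstrip()))
        match = CPU_RE.search(line)
        if match:
            cpu_counts[int(match.group(1))] += 1
    return hits, cpu_counts


if __name__ == "__main__":
    # Usage: python scan_syslog.py syslog_crash syslog-previous
    for path in sys.argv[1:]:
        with open(path, errors="replace") as handle:
            hits, cpu_counts = scan_lines(handle)
        print(f"{path}: {len(hits)} marker line(s), per-CPU counts: {dict(cpu_counts)}")
        for number, name, text in hits:
            print(f"  line {number} [{name}] {text}")
```

If the same CPU number keeps showing up across crashes, that is one more data point to weigh alongside the RAM swap test, not proof of a bad part on its own.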