June 5, 2024 Hi all, every so often (roughly once a day, sometimes once every two days) my system hangs. The system normally runs headless, but I've hooked up a monitor to diagnose the issue. When the problem occurs, the screen is black as well, but the system is still powered on. I'm thinking it's a hardware issue, but I'm not very good at diagnosing these types of issues on Linux. I'm running a CWWK N305 NAS board. I've already run memtest for 8 passes, and it didn't find any errors. Just to be sure and to rule out some software-related causes, I undid some of the powertop optimizations I had in place. The system is not running super cool, but I haven't seen CPU die temperatures above 70 °C (and even if heat were an issue, the CPU should throttle to much lower clock speeds rather than crash outright). How do I go about diagnosing this? When the problem occurs, the only way to recover is a hard reboot. From experience with other Linux environments, I believe there should be logs from previous boots somewhere, which could be a starting point. Any suggestions for debugging this would be much appreciated! If I'm missing any important details, please tell me and I'd be happy to provide them. Edited June 5, 2024 by joz
June 5, 2024 Community Expert You should post your system's diagnostics zip file in your next post in this thread to get more informed feedback. It is always a good idea to post this if answering your question might involve us seeing how you have things set up or looking at recent logs. The syslog in the diagnostics is the RAM version, which starts afresh every time the system is booted. You should enable the syslog server (probably with the Mirror to Flash option set) to get a syslog that survives a reboot, so we can see what leads up to a crash. The Mirror to Flash option is the easiest to set up (and if used, the file is then automatically included in any diagnostics), but if you are worried about excessive wear on the flash drive, you can put your server's address into the remote server field.
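Once a persistent syslog exists, the useful part is usually the last few entries written before each reboot. As a rough illustration (not an Unraid tool), a minimal Python sketch could pull those out of the saved syslog file. It assumes the file is plain text with one entry per line and that every boot logs the kernel banner `kernel: Linux version ...`; the function name `last_lines_before_boots` and the default window of 20 lines are hypothetical choices.

```python
import re
import sys
from collections import deque
from typing import Iterable, List, Tuple

# Assumption: every boot logs the kernel banner "kernel: Linux version ...".
# Adjust this pattern if your syslog marks boots differently.
BOOT_MARKER = re.compile(r"kernel: Linux version")


def last_lines_before_boots(
    lines: Iterable[str], context: int = 20
) -> List[Tuple[int, List[str]]]:
    """For each boot marker that follows earlier log lines, return
    (line number of the marker, the last `context` lines before it)."""
    results: List[Tuple[int, List[str]]] = []
    recent: deque = deque(maxlen=context)  # rolling window of recent lines
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if BOOT_MARKER.search(line):
            if recent:  # only report if something was logged before this boot
                results.append((lineno, list(recent)))
            recent.clear()
        else:
            recent.append(line)
    return results


def main(path: str) -> None:
    # errors="replace" keeps going if the log contains stray bytes
    with open(path, encoding="utf-8", errors="replace") as fh:
        for boot_line, tail in last_lines_before_boots(fh):
            print(f"--- last entries before the boot at line {boot_line} ---")
            for entry in tail:
                print(entry)


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Run it with the path to the file written by the syslog server as the only argument; where that file lives depends on how the syslog server was configured. A rolling `deque` keeps memory use constant even for a large log.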
June 7, 2024 Author Thanks! I turned on the syslog server (pointing it at the machine's own IP), but so far there have been no lockups. I'm sure it'll happen again, so we'll wait and see. Edited June 7, 2024 by joz
June 9, 2024 Author It crashed again today. At first glance I don't see anything relevant in the syslog, but maybe you guys can find something. The diagnostics are attached: diagnostics-20240609-1446.zip
June 9, 2024 Community Expert Can you also post the syslog created by the syslog server? It is not automatically included in the diagnostics.
June 9, 2024 Author Ah, I didn't realize. See attached. The last log entry before the crash (and my reboot) was at Jun 9 13:13:03. I do have SSDs that are throwing SMART errors, which might be the cause of the instability, but I'm not sure. The SSDs are used as cache, and both cache pools (2x256GB and 2x1TB) are redundant. I planned to drive them into the ground (that is, until they fail completely), since they don't contain anything valuable that isn't on the array or otherwise backed up. I thought a bad cache drive like that couldn't bring the whole system down, but I might be wrong. Edited June 12, 2024 by joz
June 9, 2024 Author The Docker image, however, is on the 2x256GB SSDs. My next step is probably removing the drives with SMART errors to see if stability improves, unless you guys have other recommendations for diagnosing or testing. Edited June 9, 2024 by joz
June 9, 2024 Community Expert Unfortunately nothing relevant was logged, so this could be a hardware issue. One thing you can try is booting the server in safe mode with all Docker containers and VMs disabled, and letting it run as a basic NAS for a few days. If it still crashes, it's likely a hardware problem; if it doesn't, start re-enabling the other services one by one.