ds123 Posted November 23, 2023
Hi,
My server crashes randomly every once in a while (twice already this month), causing multiple parity checks. This started happening recently even though no new containers or plugins were installed. Attaching diagnostics taken after the last crash. What could be the issue, and what is the best way to find the root cause?
tower-diagnostics-20231124-0040.zip
JorgeB Posted November 24, 2023
Enable the syslog server and post the saved syslog after a crash.
ds123 Posted December 2, 2023 (edited)
Hi,
I enabled the syslog server, logging to a local cache share (to avoid a lot of writes to the flash drive). Today, a week later, the server crashed again and started another parity check. However, the local syslog file has no entries from just before the crash (the last entry before the crash was written an hour earlier).
Two questions:
1. Is using a local file for the syslog server a problem when trying to catch crash issues? If so, will writing to the flash drive help?
2. Is it safe to stop the parity check? This is the third time in the past 3 weeks that the system has run a parity check because of this issue.
JorgeB Posted December 3, 2023
14 hours ago, ds123 said: "1. Is using a local file for the syslog server problematic to catch crash issues?"
It usually works fine. Make sure it's correctly configured; a lot of users misread the instructions and don't fill in the remote server IP.
14 hours ago, ds123 said: "Is it safe to stop the parity check? it's the third time over the past 3 weeks the system runs a parity check due to this issue"
You can stop it for now, but it's good to run one once this issue is resolved.
ds123 Posted December 3, 2023
5 hours ago, JorgeB said: "It usually works fine. Make sure it's correctly configured; a lot of users misread the instructions and don't fill in the remote server IP. You can stop it for now, but it's good to run one once this issue is resolved."
It is configured correctly: there is a syslog file in the share and logs are being written to it. It just doesn't have any entries from the crash.
JorgeB Posted December 3, 2023
If there's nothing relevant logged, it usually points to a hardware issue. One thing you can try is to boot the server in safe mode with all Docker containers and VMs disabled and let it run as a basic NAS for a few days. If it still crashes, it's likely a hardware problem. If it doesn't crash, start turning the other services back on one by one.
ds123 Posted December 3, 2023 (edited)
2 hours ago, JorgeB said: "If there's nothing relevant logged, it usually points to a hardware issue. One thing you can try is to boot the server in safe mode with all Docker containers and VMs disabled and let it run as a basic NAS for a few days. If it still crashes, it's likely a hardware problem. If it doesn't crash, start turning the other services back on one by one."
What capabilities are affected by switching to safe mode? Will the array be inactive?
If it's a hardware issue, how do I identify the faulty hardware?
itimpi Posted December 3, 2023
1 hour ago, ds123 said: "What capabilities are affected by switching to safe mode? will the array be inactive?"
Safe Mode stops all plugins from installing/running. Other functionality is not affected.
ds123 Posted January 4 (edited)
Hi,
I switched to safe mode last Friday and disabled all Docker containers and VMs. About 5 days later, the server crashed. This time the server didn't restart by itself: I found it powered off and had to press the power button to start it again. Nothing relevant was logged to the local syslog file; in fact, there are no entries at all from the day the crash occurred.
Does this mean it's a hardware problem? What is the next step?
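For anyone wanting to confirm where a saved syslog goes quiet before a crash, a minimal Python sketch along these lines may help. It assumes the file uses the traditional `Mon DD HH:MM:SS host process: message` line format, which carries no year, so the year has to be supplied by hand. The function names, the 30-minute threshold, and the sample lines are all hypothetical and are not taken from the diagnostics in this thread.

```python
from datetime import datetime, timedelta


def parse_timestamp(line, year):
    """Return the leading 'Mon DD HH:MM:SS' stamp of a syslog line, or None."""
    try:
        # Assumption: traditional syslog stamps omit the year, so it is
        # prefixed here before parsing.
        return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
    except ValueError:
        return None  # not a timestamped line (e.g. a continuation line)


def find_gaps(lines, year, min_gap=timedelta(minutes=30)):
    """Return (last_entry, next_entry) pairs separated by at least min_gap.

    Note: a log that spans New Year is not handled by this sketch.
    """
    gaps = []
    previous = None
    for line in lines:
        stamp = parse_timestamp(line, year)
        if stamp is None:
            continue
        if previous is not None and stamp - previous >= min_gap:
            gaps.append((previous, stamp))
        previous = stamp
    return gaps


# Hypothetical sample lines, made up for illustration only.
sample = [
    "Dec  2 12:58:40 Tower kernel: example message",
    "Dec  2 13:01:05 Tower emhttpd: example message",
    "Dec  2 14:10:00 Tower kernel: example message after reboot",
]

for last_seen, first_after in find_gaps(sample, year=2023):
    print(f"silent from {last_seen} to {first_after}")
```

To run it against a real file, pass an open file object instead of `sample`. A long silent window ending at the first boot message is consistent with the machine hanging or losing power without the OS logging anything, but it does not by itself identify the failing component.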
JorgeB Posted January 5
10 hours ago, ds123 said: "Does this mean it's a hardware problem?"
Most likely. A server should never reboot or power off on its own. Start by using a different PSU if one is available.