Gibbo592 Posted August 31, 2021
Hi everyone, I'm looking for some help troubleshooting my new server install, which randomly crashes 5-10 minutes after starting. I have removed all cards and all drives except one to continue testing. I can't migrate my old server until I get this resolved, so any help is greatly appreciated. Attached are the diagnostics and syslog files from after the last crash: tower-diagnostics-20210831-2006.zip, syslog
Gibbo592 (Author) Posted September 1, 2021
I have temporarily installed Windows 10. It has been running for 5 hours with no issues.
Gibbo592 (Author) Posted September 2, 2021
Definitely an issue with Unraid. Slackware 14.2 has been running for more than 24 hours with no issues.
trurl Posted September 2, 2021
Have you done memtest?
Gibbo592 (Author) Posted September 2, 2021 (edited September 2, 2021)
Hi trurl, thank you for commenting, I appreciate the suggestion. I actually swapped out the two new 8 GB modules for another new module that was originally in my HP Gen 10 server. I don't believe this issue is hardware related: Windows 10 Pro, which I installed as a test, ran fine, and the SiSoftware Sandra benchmark ran fine. I also tried the latest Slackware, first from the live CD and then installed, and it all runs fine. Only Unraid crashes after 5-10 minutes. However, I will run memtest and report back.
Gibbo592 (Author) Posted September 2, 2021 (edited September 2, 2021)
Hi trurl, memtest has been running for almost 1 hour 20 minutes with no issues. Screen capture attached.
Gibbo592 (Author) Posted September 5, 2021
I temporarily installed TrueNAS and Ubuntu Server, and both run OK. Only Unraid has the issue of shutting down, so I assume it is a driver issue?
jonp Posted September 6, 2021
Hi there, What we really need is for you to get the system running with a monitor and keyboard attached. Log in to the console and then type the following command:
tail /var/log/syslog -f
This will begin printing your log directly to the monitor, and when the system crashes, you can capture the final events in the log prior to the crash, which should provide some indication of what is causing it. All the best, Jon
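A possible variation on the command above, for cases where nobody is watching the monitor at the moment of the crash: mirror the followed log to a file that survives a hard reset. This is only a sketch. The function name `log_to_flash` and the file name `syslog-capture.txt` are made up for illustration, and it assumes the Unraid USB flash drive is mounted at /boot.

```shell
# log_to_flash SRC DEST [TAIL_OPTIONS...]
# Print the last 50 lines of SRC and append the same lines to DEST.
# Any extra arguments (for example -f) are passed straight through to tail.
log_to_flash() {
    src=$1
    dest=$2
    shift 2
    tail -n 50 "$@" "$src" | tee -a "$dest"
}

# On the server console (hypothetical target file, assumes the flash is at /boot).
# With -f this keeps running until Ctrl+C or until the system hangs:
#   log_to_flash /var/log/syslog /boot/syslog-capture.txt -f
```

Because `tee -a` appends, repeated runs accumulate in the same file, so it may be worth deleting the capture file between attempts.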
Gibbo592 (Author) Posted September 7, 2021
Hi Jon, syslog tail attached. The only thing I can see is a time difference of 10 hours, which is the difference between UTC and Australian EST.
Gibbo592 (Author) Posted September 7, 2021
I tried setting the BIOS time to UTC and then got this error
Gibbo592 (Author) Posted September 7, 2021
I just managed to download the diagnostics before it became unresponsive: tower-diagnostics-20210907-1027.zip
Gibbo592 (Author) Posted September 7, 2021
Something weird is going on with the time. I tried changing the NTP servers:
root@Tower:~# hwclock --show
2021-09-07 20:47:19.137429+10:00
root@Tower:~# ntpq -p
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 LOCAL(0)        .LOCL.          10 l  438   64  100    0.000   +0.000   0.000
+Server.local    216.239.35.0     2 u   46   64  177    0.533   -3.040   1.881
-resolv.internod 203.35.83.242    2 u   27   64  177   13.468   +5.395   2.128
+time.cloudflare 10.85.8.92       3 u   35   64  177   11.104   -0.635   2.659
*time4.google.co .GOOG.           1 u   34   64  177  158.321   +1.003   2.384
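For anyone reading the ntpq output above: the peer whose line starts with `*` is the one ntpd has currently selected as its sync source, and the offset column is in milliseconds. A small sketch for pulling that out of `ntpq -p` follows. The function name `selected_peer_offset` is made up for illustration, and it assumes the standard ten-column `ntpq -p` layout shown in the post.

```shell
# selected_peer_offset: read `ntpq -p` output on stdin and print the name and
# offset (in milliseconds) of the peer marked with "*", the selected sync source.
# Prints nothing if no peer is currently selected.
selected_peer_offset() {
    # $1 is the remote name with its tally character, $9 is the offset column
    awk '/^\*/ { print substr($1, 2), $9 }'
}

# Usage on the server:
#   ntpq -p | selected_peer_offset
```

Applied to the output posted above, this picks out time4.google.co with an offset of about +1 ms, which would suggest ntpd itself had a sync source at that moment.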
Gibbo592 (Author) Posted September 7, 2021
Hi Jon, the kernel reports "TIME_ERROR: 0x41: Clock Unsynchronized" in the syslog: tower-syslog-20210907-0217.zip
jonp Posted September 7, 2021
Hi Gibbo, Is the last screenshot you posted above what was seen when the system crashed?
Gibbo592 (Author) Posted September 7, 2021
Hi Jonp, yes, all the screenshots are from when the system becomes unresponsive. The GUI is still there, but only a hard reset gets it working again. It is becoming very frustrating. I love the simplicity of Unraid's operation, but this new build is going badly. Any suggestions greatly appreciated.
jonp Posted September 8, 2021
To be perfectly honest, I have absolutely no idea what is wrong with your system. The logs aren't printing any significant errors, and I doubt the time sync issue is causing the crash. One thing to try would be to leave the array in a stopped state but with the server on, and see if you can recreate the crash while it is just idling like that. If so, then we know it isn't storage related. If not, then maybe there is something amiss with the storage or storage controller(s).
Gibbo592 (Author) Posted September 8, 2021
Hi Jonp, I appreciate your help. I emailed you some more information. I have tried running the system with nothing running, just the USB, and the outcome is the same: a random crash where it just hangs, and only a physical restart will solve it.
Gibbo592 (Author) Posted September 15, 2021 (edited September 15, 2021)
I have found a workaround for the random crashing which may help someone else: I added the acpi=off kernel option in the syslinux configuration (initrd=/bzroot acpi=off). This worked for both 6.9.2 and 6.10.0rc. You can try it temporarily by pressing Tab at the boot menu and manually adding acpi=off to the line. To make it permanent, add the acpi=off option to the syslinux config.
I also found issues with the NTP service: the syslog was being flooded with "nchan: a message from the past has just been published. unless the system time has been adjusted, this should never happen." I tried several NTP servers close to home, but the error persisted. I have now stopped the NTP service and the issue is no longer apparent.
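For the permanent version of the workaround above, one way it could be scripted is sketched here. The function name `add_boot_option` is made up for illustration, and the sketch assumes the config lives at /boot/syslinux/syslinux.cfg on the flash drive and that kernel options sit on lines beginning with "append". It changes every such line, so all boot menu entries that have one would get the option. Editing the file by hand achieves the same thing.

```shell
# add_boot_option FILE OPTION
# Append OPTION to every "append" line in FILE that does not already contain it.
# A backup of the original is kept as FILE.bak.
add_boot_option() {
    file=$1
    opt=$2
    cp "$file" "$file.bak"
    awk -v opt="$opt" '
        # Pad with spaces so the option only matches as a whole word
        $1 == "append" && index(" " $0 " ", " " opt " ") == 0 { $0 = $0 " " opt }
        { print }
    ' "$file.bak" > "$file"
}

# On the server (assumed path, check it before running):
#   add_boot_option /boot/syslinux/syslinux.cfg acpi=off
```

Running it a second time leaves the file unchanged, since lines that already carry the option are skipped. Note that the second run also overwrites FILE.bak with the already modified file.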