August 31, 2021 Hi everyone, I'm looking for help troubleshooting my new server install, which randomly crashes 5-10 minutes after starting. I have removed all cards and all drives except one to continue testing. I can't migrate my old server until I get this resolved, so any help is greatly appreciated. Attached are the diagnostics and syslog files from after the last crash: tower-diagnostics-20210831-2006.zip syslog
September 1, 2021 Author I have temporarily installed Windows 10. It has been running for 5 hours with no issues.
September 2, 2021 Author Definitely an issue with Unraid: Slackware 14.2 has been running for more than 24 hours with no issues.
September 2, 2021 Author Hi trurl, thank you for commenting, I appreciate your suggestion. I actually swapped out the two 8 GB modules, which were new, for another new module that was originally in my HP Gen10 server. I don't believe this issue is hardware related: Windows 10 Pro, which I installed as a test, ran fine, and the SiSoftware Sandra benchmark ran fine. I also tried the latest Slackware, first from the live CD and then installed, and it all runs fine. Only Unraid crashes after 5-10 minutes. However, I will run Memtest and report back. Edited September 2, 2021 by Gibbo592
September 2, 2021 Author Hi trurl, Memtest has been running for almost 1 hour 20 minutes with no issues; screen capture attached. Edited September 2, 2021 by Gibbo592
September 5, 2021 Author I temporarily installed TrueNAS and Ubuntu Server and both run OK. Only Unraid has the issue of shutting down, so I assume it is a driver issue?
September 6, 2021 Hi there, What we really need is for you to get the system running with a monitor and keyboard attached. Log in to the console and type the following command:
tail /var/log/syslog -f
This will begin printing your log directly to the monitor. When the system crashes, you can capture the final events in the log prior to the crash, which should give some indication of what is causing it. All the best, Jon
September 7, 2021 Author Hi Jon, syslog tail attached. The only thing I can see is a 10-hour time difference, which is the difference between UTC and Australian EST.
September 7, 2021 Author I just managed to get the diagnostics to download before it became unresponsive: tower-diagnostics-20210907-1027.zip
September 7, 2021 Author Something weird is going on with the time. I tried changing the NTP servers:
root@Tower:~# hwclock --show
2021-09-07 20:47:19.137429+10:00
root@Tower:~# ntpq -p
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 LOCAL(0)        .LOCL.          10 l  438   64  100    0.000   +0.000   0.000
+Server.local    216.239.35.0     2 u   46   64  177    0.533   -3.040   1.881
-resolv.internod 203.35.83.242    2 u   27   64  177   13.468   +5.395   2.128
+time.cloudflare 10.85.8.92       3 u   35   64  177   11.104   -0.635   2.659
*time4.google.co .GOOG.           1 u   34   64  177  158.321   +1.003   2.384
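For anyone reading along, here is a rough Python sketch (not something posted in the thread) of one way to read the hwclock and ntpq -p output in this post. The names `Peer`, `parse_peers`, and `TALLY_CODES` are made up for illustration, and the column spacing of the pasted output was restored by hand. It assumes the standard ntpq peer layout: a one-character tally code, then ten whitespace-separated columns, with `reach` printed in octal and `offset` in milliseconds.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List

# Output copied from the post; column spacing restored by hand (assumption).
NTPQ_OUTPUT = """\
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 LOCAL(0)        .LOCL.          10 l  438   64  100    0.000   +0.000   0.000
+Server.local    216.239.35.0     2 u   46   64  177    0.533   -3.040   1.881
-resolv.internod 203.35.83.242    2 u   27   64  177   13.468   +5.395   2.128
+time.cloudflare 10.85.8.92       3 u   35   64  177   11.104   -0.635   2.659
*time4.google.co .GOOG.           1 u   34   64  177  158.321   +1.003   2.384
"""

# Meaning of the first-column tally character, per the ntpq documentation.
TALLY_CODES = {
    "*": "system peer (the source currently being followed)",
    "+": "candidate (included in the combined estimate)",
    "-": "outlier (discarded by the clustering step)",
    "x": "falseticker (discarded)",
    " ": "not used",
}


@dataclass
class Peer:
    tally: str
    remote: str
    reach: int        # 8-bit shift register of recent poll results
    offset_ms: float  # ntpq prints offsets in milliseconds

    @property
    def polls_answered(self) -> int:
        """How many of the last eight polls got a reply."""
        return bin(self.reach & 0xFF).count("1")


def parse_peers(text: str) -> List[Peer]:
    """Parse the peer rows of `ntpq -p` output, skipping header lines."""
    peers = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith(("remote", "=")):
            continue
        if line[0] in TALLY_CODES:
            tally, fields = line[0], line[1:].split()
        else:  # tally column lost when the text was pasted
            tally, fields = " ", line.split()
        if len(fields) != 10:
            continue
        # fields[6] is reach (octal), fields[8] is offset
        peers.append(Peer(tally, fields[0], int(fields[6], 8), float(fields[8])))
    return peers


# hwclock printed local time with a +10:00 offset; convert it to UTC.
local = datetime.fromisoformat("2021-09-07 20:47:19.137429+10:00")
utc = local.astimezone(timezone.utc)
print("hwclock local:", local.isoformat(), "-> UTC:", utc.isoformat())

for peer in parse_peers(NTPQ_OUTPUT):
    print(f"{peer.remote:<16} {peer.offset_ms:+8.3f} ms  "
          f"{peer.polls_answered}/8 polls  {TALLY_CODES[peer.tally]}")
```

Read this way, the +10:00 on the hwclock line is just the local timezone offset, and the `*` row shows a peer had been selected, with every network source within a few milliseconds. That is only a reading of this one snapshot, not a diagnosis of the crash.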
September 7, 2021 Author Hi Jon, could this be it? "kernel reports TIME_ERROR: 0x41: Clock Unsynchronized" appears in the syslog: tower-syslog-20210907-0217.zip
September 7, 2021 Hi Gibbo, Is the last screenshot you posted above what you saw when the system crashed?
September 7, 2021 Author Hi Jonp, yes, all the screenshots are from when the system becomes unresponsive. The GUI is still displayed, but only a hard reset gets it working again. It is becoming very frustrating. I love the simplicity of Unraid's operation, but this new build is going badly. Any suggestions greatly appreciated.
September 8, 2021 To be perfectly honest, I have absolutely no idea what is wrong with your system. The logs aren't printing any significant errors, and I doubt the time sync issue is causing the crash. One thing to try would be to leave the array in a stopped state with the server on, and see whether you can recreate the crash while it is just idling like that. If so, then we know it isn't storage related. If not, then maybe there is something amiss with the storage or storage controller(s).
September 8, 2021 Author Hi Jonp, I appreciate your help. I emailed you some more information. I have tried running the system with nothing attached except the USB drive, and the outcome is the same: a random crash where it just hangs, and only a physical restart fixes it.
September 15, 2021 Author I have found a workaround for the random crashing, which may help someone else: I added the acpi=off kernel option in the syslinux configuration, so the boot line reads initrd=/bzroot acpi=off. This worked for both 6.9.2 and 6.10.0rc. You can try it temporarily by pressing Tab at the boot menu and manually adding acpi=off to the line. To make it permanent, add the acpi=off option to the syslinux config. I also found issues with the NTP service: the syslog was being flooded with "nchan: a message from the past has just been published. unless the system time has been adjusted, this should never happen." I tried several NTP servers close to home, but the error persisted. I have now stopped the NTP service and the issue is no longer apparent. Edited September 15, 2021 by Gibbo592
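As an illustration of the permanent change, the boot stanza might end up looking something like the fragment below. This is a sketch, not the poster's actual file: it assumes the stock Unraid layout, where the config usually lives at syslinux/syslinux.cfg on the flash drive, and the label and kernel lines are assumptions that may differ on your system. Only the acpi=off on the append line comes from the post.

```
# Assumed stock stanza; labels and other lines may differ on your system.
label Unraid OS
  menu default
  kernel /bzimage
  # acpi=off added after the existing initrd option, as described in the post
  append initrd=/bzroot acpi=off
```

After a reboot, cat /proc/cmdline on the console should show whether the option was actually passed to the kernel. Keep in mind that acpi=off disables ACPI entirely, so it is a workaround, not a fix for whatever is triggering the hang.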