mwasserman Posted August 4, 2023 (edited)

Hi everyone,

I've been running Unraid on this Lenovo ThinkServer TS140 for about 6 years without a single issue. As of about 2-3 months ago I've been getting random lockups roughly every 2-3 weeks.

- Unraid 6.12.2
- Processor: Intel Xeon E3-1246 v3
- Memory: 32GB ECC

I'm running many Docker containers and VMs; nothing changed between the stable period and the random crashes. Syslog to USB stick was enabled during the last crash, so the diagnostics dump and syslog are attached. The last crash occurred sometime between these two lines:

Aug 3 02:00:38 Tower root: /mnt/cache: 188.6 GiB (202545577984 bytes) trimmed on /dev/sdg1
Aug 3 18:19:16 Tower kernel: microcode: microcode updated early to revision 0x28, date = 2019-11-12

I'm in the process of running Memtest86+ v6.20 now to see if anything comes up. Any help figuring out what is going on here is much appreciated.

tower-diagnostics-20230803-1828.zip syslog

Edited September 5, 2023 by mwasserman
JorgeB Posted August 4, 2023

Nothing being logged usually points to a hardware problem. One thing you can try is booting the server in safe mode with all Docker containers/VMs disabled and letting it run as a basic NAS for a few days. If it still crashes, it's likely a hardware problem; if it doesn't, start turning the other services back on one by one.
mwasserman Posted September 5, 2023 (edited)

I've tried a few different changes; so far I'm still getting random crashes every 3-6 days. Here is what I have done, plus some new information. Can anyone help me make sense of the errors I was able to see on the monitor?

- Attached a monitor and keyboard so I can see the terminal after a crash
- Ran Memtest86+ v6.20; passed 1 round
- Changed out the power supply
- Upgraded to 6.12.3
- Server ran for 6 days before a complete deadlock. Nothing on the monitor or keyboard; Num Lock didn't even work
- Read on other posts that this can be caused by the duplicati docker, so I shut down the duplicati docker
- Crashed after 2 days, but this time the terminal still worked. Screenshot of the errors attached; OCR of them below to make this searchable:

Tower login:
crond[1420]: exit status 126 from user root /usr/bin/run-parts /etc/cron.hourly 1> /dev/null
crond[11850]: unable to exec /usr/sbin/sendmail: cron output for user root /usr/bin/run-parts /etc/cron.hourly 1> /dev/null to /dev/null
crond[1420]: exit status 135 from user root /usr/local/emhttp/plugins/dynamix/scripts/monitor &> /dev/null
(the "exit status 135" line above repeats roughly ten more times, identically)
Hint: Num Lock on

I tried to call "diagnostics" from the command line to collect diagnostics but received "command not found".

I just upgraded to 6.12.4; let's see if that makes any difference. Any other suggestions for things to try? My next step may be to roll back to a pre-6.12 version, as everything seems to have gone downhill as of 6.12.x.

Edited September 5, 2023 by mwasserman
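For anyone searching on that exit code: in POSIX shells, an exit status above 128 means the process was killed by a signal, where the signal number is status minus 128. So 135 - 128 = 7, which is SIGBUS. SIGBUS is raised when a memory access to a mapped file fails, and one plausible cause on a system that boots from flash is the USB device dropping off the bus mid-read (this connection to the boot drive is my interpretation, not something confirmed in the thread). A minimal sketch reproducing the status:

```shell
# Exit statuses above 128 mean "killed by signal (status - 128)".
# 135 - 128 = 7 = SIGBUS (bus error).
sleep 30 &
pid=$!
kill -BUS "$pid"          # deliver signal 7 (SIGBUS) to the child
wait "$pid"
echo "exit status: $?"    # prints: exit status: 135
```

This is why crond logs "exit status 135" for the monitor script: the script's process was terminated by SIGBUS, not a normal error return.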
mwasserman Posted October 15, 2023 (marked as Solution)

Downgraded to 6.11.5 and have been up for 21 days. Definitely not a hardware issue; likely just another bug in the 6.12.x line of Unraid. Going to stay with 6.11.5 for a while now.
exibit Posted October 21, 2023 (edited)

I'm having this exact same issue, and this seems to be the only thread I can find with the same symptoms: random processes dying with exit code 135 in the system log, making the WebUI, SSH, and other services (e.g. Docker containers) unstable and inaccessible. I was already on 6.12.3 before this started occurring, though, and I've introduced several changes into my (previously rock-solid) Unraid server this week:

- Upgraded CPU from i5-10400 to i7-11700K
- Added an RTX 3090
- Upgraded Unraid from 6.12.3 to 6.12.4
- Added several GPU-utilizing containers

I'm also considering the possibility that my current PSU (850 watts) just isn't strong enough to power my 12 drives _and_ a GPU, but I don't have a spare lying around to test this theory; I would have to order one.

@mwasserman, you don't happen to have a GPU in your machine?

Edited October 21, 2023 by exibit (typos)
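On the PSU question, a rough worst-case budget suggests 850 W could indeed be marginal during simultaneous drive spin-up with the GPU active. All figures below are nominal spec-sheet assumptions, not measurements from this machine:

```shell
# Back-of-envelope peak power budget (assumed nominal figures):
#   RTX 3090 board power        ~350 W (transients can spike higher)
#   i7-11700K at PL2 load       ~250 W
#   3.5" HDD spin-up            ~25 W each, x12 if not staggered
#   Fans, motherboard, RAM      ~75 W
gpu=350; cpu=250; drives=$((12 * 25)); misc=75
total=$((gpu + cpu + drives + misc))
echo "estimated peak draw: ${total} W"   # prints: estimated peak draw: 975 W
```

Steady-state draw is far lower, but a brief overlap of drive spin-up and a GPU power transient is exactly where an undersized PSU tends to show up as unexplained instability.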
mwasserman Posted October 21, 2023

@exibit, no dedicated GPU on this system; just using Intel Quick Sync for Plex transcoding. I've now been up 28 days running 6.11.5. No plans to move from this version for a while.
exibit Posted November 15, 2023 (edited)

For future readers, I just wanted to follow up with what worked for me. I had been using the same USB 3 port for my boot drive for over a year. For some reason (maybe the new processor I installed, or the latest version of Unraid?), using a USB 3 port for my boot drive now makes my system extremely unstable. All issues went away, and I've had 2 weeks of uptime, after simply moving my boot drive to a USB 2 port on my machine. ¯\_(ツ)_/¯

Edited November 15, 2023 by exibit
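For readers wanting to confirm which controller and negotiated link speed their boot flash drive is actually on before and after moving it, two standard Linux commands can show it (both available on typical Unraid installs; exact device paths will vary per machine):

```shell
# Show the USB topology with negotiated link speeds:
#   480M = USB 2.0 (high speed), 5000M/10000M = USB 3.x (SuperSpeed)
lsusb -t

# Per-device speeds straight from sysfs (works even without usbutils);
# each printed line is <device path>:<speed in Mbit/s>
grep . /sys/bus/usb/devices/*/speed 2>/dev/null
```

A flash drive reporting 480 after the move confirms it renegotiated down to USB 2.0, which is the configuration that proved stable here.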