6.12.X Random Crashes roughly every 2-3 weeks


Go to solution Solved by mwasserman,

Hi everyone,

I've been running Unraid on this Lenovo ThinkServer TS140 for about 6 years without a single issue. As of about 2-3 months ago I've been getting random lockups roughly every 2-3 weeks.

 

  • Unraid 6.12.2
  • Processor: Intel Xeon E3-1246 v3
  • Memory: 32GB ECC
  • Running many dockers and VMs; nothing new was added between the stable period and the start of the random crashes.

 

Syslog to USB stick was enabled during the last crash. Diagnostics dump and syslog are attached.

 

The last crash occurred sometime between these 2 lines. 
Aug  3 02:00:38 Tower root: /mnt/cache: 188.6 GiB (202545577984 bytes) trimmed on /dev/sdg1
Aug  3 18:19:16 Tower kernel: microcode: microcode updated early to revision 0x28, date = 2019-11-12
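
For anyone trying to narrow down a crash window in their own syslog the same way, here is a minimal Python sketch that finds the longest silence between consecutive log lines. The function name `largest_gap` and the `year` parameter are my own, not anything from Unraid. Syslog timestamps carry no year, so the year is an assumption you pass in, and a log that spans New Year's will not be handled correctly.

```python
from datetime import datetime

def largest_gap(lines, year=2023):
    """Return (gap, line_before, line_after) for the longest silence
    between consecutive timestamped lines, or None if fewer than two
    lines carry a parseable timestamp."""
    best = None
    prev_time, prev_line = None, None
    for line in lines:
        try:
            # The first 15 characters look like "Aug  3 02:00:38".
            # The year is assumed, since syslog does not record it.
            stamp = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # skip lines without a leading timestamp
        if prev_time is not None:
            gap = stamp - prev_time
            if best is None or gap > best[0]:
                best = (gap, prev_line, line)
        prev_time, prev_line = stamp, line
    return best

if __name__ == "__main__":
    sample = [
        "Aug  3 02:00:38 Tower root: /mnt/cache: 188.6 GiB (202545577984 bytes) trimmed on /dev/sdg1",
        "Aug  3 18:19:16 Tower kernel: microcode: microcode updated early to revision 0x28, date = 2019-11-12",
    ]
    gap, before, after = largest_gap(sample)
    print(f"Longest gap: {gap}")
    print(f"  last line before: {before}")
    print(f"  first line after: {after}")
```

On the two lines quoted above, this reports a gap of 16:18:38, so the crash happened somewhere in that window.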

 

I'm in the process of running Memtest86+ v6.20 now to see if anything comes up.

 

Any help figuring out what is going on here is much appreciated.

tower-diagnostics-20230803-1828.zip syslog

Edited by mwasserman
  • mwasserman changed the title to 6.12.2 Random Crashes roughly every 2-3 weeks

Nothing being logged usually points to a hardware problem. One thing you can try is to boot the server in safe mode with all docker/VMs disabled and let it run as a basic NAS for a few days. If it still crashes, it's likely a hardware problem; if it doesn't, start turning the other services back on one by one.

  • 1 month later...

I've tried a few different changes, but so far I'm still getting random crashes every 3-6 days.

 

Here is what I have done, along with some new information. Can anyone help me make sense of the errors I was able to see on the monitor?

  1. Attached monitor and keyboard so I can see the terminal after crash
  2. Ran Memtest86+ v6.20. Passed 1 round
    1. MemTest.thumb.jpg.94c9b899c890a28efb281a9d3ffba411.jpg
  3. Changed out power supply
  4. Upgraded to 6.12.3
  5. Server ran for 6 days before a complete deadlock. Nothing on the monitor and no response from the keyboard; even Num Lock didn't work
  6. Read in other posts that this can be caused by the duplicati docker, so I shut down the duplicati docker
  7. Crashed after 2 days but this time the terminal still worked. Screenshot of errors
    1. 588196923_Erroraftercrash2023-09-04.thumb.jpg.7a5061b043c312eafb7eaab098f4d74f.jpg
    2. OCR of errors to make this searchable
      1. Tower login: crond [1420]: exit status 126 from user root /usr/bin/run-parts /etc/cron.hourly 1> /dev/null 
        crond [11850]: unable to exec /usr/sbin/sendmail: cron output for user root /usr/bin/run-parts /etc/cron.hourly 1> /dev/null to /dev/null 
        crond [1420]: exit status 135 from user root /usr/local/emhttp/plugins/dynamix/scripts/monitor &> /dev/null 
        crond [1420]: exit status 135 from user root /usr/local/emhttp/plugins/dynamix/scripts/monitor &> /dev/null 
        crond [1420]: exit status 135 from user root /usr/local/emhttp/plugins/dynamix/scripts/monitor &> /dev/null 
        crond [1420]: exit status 135 from user root /usr/local/emhttp/plugins/dynamix/scripts/monitor &> /dev/null 
        crond [1420]: exit status 135 from user root /usr/local/emhttp/plugins/dynamix/scripts/monitor &> /dev/null 
        crond [1420]: exit status 135 from user root /usr/local/emhttp/plugins/dynamix/scripts/monitor &> /dev/null 
        crond [1420]: exit status 135 from user root /usr/local/emhttp/plugins/dynamix/scripts/monitor &> /dev/null 
        crond [1420]: exit status 135 from user root /usr/local/emhttp/plugins/dynamix/scripts/monitor &> /dev/null
        Hint: Num Lock on
        Tower login: crond [1420]: exit status 135 from user root /usr/local/emhttp/plugins/dynamix/scripts/monitor &> /dev/null 
        crond [1420]: exit status 135 from user root /usr/local/emhttp/plugins/dynamix/scripts/monitor &> /dev/null
  8. I tried to run "diagnostics" from the command line to collect diagnostics but received "command not found"
  9. I just upgraded to 6.12.4; let's see if that makes any difference.
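
A note on reading the exit statuses in the OCR above: by the usual shell convention, a status above 128 means the process was killed by signal number (status - 128), 126 means the command was found but could not be executed, and 127 means the command was not found. On Linux x86, 135 - 128 = 7 is SIGBUS. Below is a minimal Python sketch of that decoding. The function name `describe_exit_status` is my own, and it assumes crond is reporting shell-style statuses. Signal numbers vary by platform, so run it on Linux to get the names that apply here.

```python
import signal

def describe_exit_status(status):
    """Translate a shell-style exit status into a short description.

    Assumes the common shell convention: 128 + N means "killed by
    signal N"; 126 and 127 are reserved for exec failures.
    """
    if status > 128:
        signum = status - 128
        try:
            # Signal numbering is platform-specific (7 is SIGBUS on Linux x86).
            name = signal.Signals(signum).name
        except ValueError:
            name = f"signal {signum}"
        return f"killed by {name}"
    if status == 126:
        return "command found but could not be executed"
    if status == 127:
        return "command not found"
    return f"exited with status {status}"

if __name__ == "__main__":
    for status in (126, 135):
        print(status, "->", describe_exit_status(status))
```

A bus error commonly shows up when a memory-mapped file's backing storage can no longer be read, so repeated 135s together with "command not found" for a stock command may hint that some storage the system runs from became unreadable. That is only an interpretation, not a diagnosis.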

Any other suggestions for things to try? My next step may be to roll back to a pre-6.12 version, as everything seems to have gone downhill as of 6.12.X.

 

Edited by mwasserman
  • mwasserman changed the title to 6.12.X Random Crashes roughly every 2-3 weeks
  • 1 month later...

I'm having this exact same issue, and this seems to be the only thread I can find with the same symptoms: random processes dying with exit code 135 in the system log, making the WebUI, SSH, and other services (e.g. Docker containers) unstable and inaccessible. I was already on 6.12.3 before this started occurring, though, and I've introduced several changes into my (previously rock-solid) Unraid server this week:

- Upgraded CPU from i5-10400 to i7-11700k

- Added an RTX 3090

- Upgraded Unraid from 6.12.3 to 6.12.4

- Added several GPU-utilizing containers

 

I'm also considering the possibility my current PSU just isn't strong enough (850 watts) to power my 12 drives _and_ a GPU, but I don't just have a spare lying around to test this theory - I would have to order one.

 

@mwasserman, you don't happen to have a GPU in your machine?

Edited by exibit
Typos
  • 4 weeks later...

For future readers, I just wanted to follow up with what worked for me. I have been using the same USB 3 port for my boot drive for over a year. For some reason (maybe the new processor I installed in my machine, or the latest version of Unraid?) using a USB 3 port with my boot drive now makes my system extremely unstable. All issues went away and I've had 2 weeks of uptime after simply moving my boot drive to a USB 2 port on my machine ¯\_(ツ)_/¯

Edited by exibit
