
Constant crashes since updating to 6.9.x


I have had constant crashes (every 12-48 hours) ever since updating to 6.9.0, and I'm now on 6.9.2. I have not been able to figure out why. Every time, I have no choice but to do a hard reboot; I can't even SSH in.

 

The crashes seem to be random as far as I can tell. 

 

I attached diagnostics, but they are from after a reboot. However, I do run a syslog server on my Synology and I have attached what I believe are the logs from when it crashed (~1:30AM) to when I did a hard reboot (~6:15AM).

 

serverus-diagnostics-20210414-0639.zip

All_2021-4-14-6_38_29.csv

I appear to be in the same boat. Only adding my voice to amplify the message. 

On 4/14/2021 at 1:02 PM, relink said:

The crashes seem to be random as far as I can tell. 

I notice that you have a fairly old BIOS from 2019 - there are several updates since then according to the Asus web site for your motherboard.  That may or may not be relevant.


Also, and before you try updating anything, have you run a Memtest lately?  How many RAM modules are plugged in?  Are they running at stock speeds?  (2933MHz

 

While it is possible that the newer Unraid builds are responsible, it is also possible that there is a weakness elsewhere in the system and the newer software is using the memory slightly differently.  Whenever I have a system that was previously OK start to misbehave, I check out the RAM first (several complete passes if possible).  RAM can also occasionally just go bad over time, so it's always worth checking.

  • Author
3 minutes ago, S80_UK said:

I notice that you have a fairly old BIOS from 2019 - there are several updates since then according to the Asus web site for your motherboard.  That may or may not be relevant.

I can certainly try it. 

 

4 minutes ago, S80_UK said:

Also, and before you try updating anything, have you run a Memtest lately?  How many RAM modules are plugged in?  Are they running at stock speeds?  (2933MHz

 

Yup, it was fine just a few months ago. I have 4 modules, and everything in this system is stock speeds.

 

I did read in another topic that it could possibly be the GPU Stats plugin, so I have also uninstalled that for now. I did not have that plugin prior to 6.9.

  • Community Expert

If I read this FAQ entry right, you are above the acceptable memory speed.

You should run your RAM at 2133MHz, not 2666. (1st gen Ryzen, 4 DIMM, single rank)

You might want to test that.

Yes - that is certainly a possibility.  Again, my advice to the OP would be to make no change to their system initially but to run a long Memtest and see what happens.  If there are errors, then reduce the RAM speed and retest.  But by how much would depend on the motherboard.  I would say going down to 2133MHz would be a bit drastic for his motherboard - 2666 might be worth trying first.  If the Memtest doesn't show anything, I would still try a lower RAM speed setting to see if stability under Unraid then improves.  If it is a genuine software issue somewhere, then the RAM speed should make little difference to the stability.
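For anyone who wants to confirm what speed the modules are actually running at before and after a BIOS change, here is a minimal sketch. It assumes `dmidecode` is available on the server and is run as root; the helper name `ram_speeds` is made up for this example, and the exact field label ("Configured Memory Speed" or "Configured Clock Speed") varies with the dmidecode version, so both are matched.

```shell
#!/bin/bash
# Hypothetical helper: print the rated and configured speed lines for each DIMM.
# It reads `dmidecode --type memory` output on stdin, so it can be tried on saved output too.
ram_speeds() {
  # Keep only the speed fields, drop empty slots (reported as "Unknown"),
  # and strip the leading indentation.
  grep -E '^[[:space:]]*(Speed|Configured (Memory|Clock) Speed):' \
    | grep -v 'Unknown' \
    | sed -E 's/^[[:space:]]+//'
}

# Usage (as root):
#   dmidecode --type memory | ram_speeds
```

If the configured speed still shows the old value after lowering it in the BIOS, the setting did not take effect and any stability comparison would be misleading.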

 

4 hours ago, DuzAwe said:

I appear to be in the same boat. Only adding my voice to amplify the message. 

@DuzAwe - by all means try the guidance here, but it is perfectly possible that you have a different issue.  It would probably benefit you and the OP for you to create a separate thread for your issue so that responses and suggestions do not get mixed.

  • Author

I’ll try running another RAM test, but I won’t be able to take my rig down for that long until Saturday.
 

Is there anything else I could try looking into in the meantime?

 

Also, I’ve had no issues since removing the GPU Stats plugin. However, I am still within the window for another crash to happen, so I’m not calling it solved just yet.

@S80_UK I'm currently mirroring logs to my USB and waiting for another crash. I have also removed the GPU Stats plugin and have previously ruled out my RAM. Should I get another crash, I'll create my own thread.
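Once a mirrored or remote syslog has captured a crash, a quick first pass is to pull out the lines that usually accompany a kernel-level fault. This is only a rough sketch: the helper name `crash_hints` and the pattern list are assumptions for illustration, not an exhaustive set, and the log path is whatever file your mirror or syslog server wrote.

```shell
#!/bin/bash
# Hypothetical helper: print numbered lines from a syslog file that often
# appear around a kernel fault. The pattern list is illustrative only.
crash_hints() {
  local logfile="$1"
  # -n shows line numbers, -i ignores case, -E enables alternation.
  grep -n -i -E 'call trace|kernel panic|oops|out of memory|machine check|general protection fault' "$logfile"
}

# Usage:
#   crash_hints /path/to/mirrored/syslog
```

No matches does not prove the system is healthy; a hard lockup can stop logging before anything is written.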

  • Author

So far removing the GPU Stats plugin seems to have been the fix. 

 

EDIT: After some reading in the NVIDIA driver thread, it appears this issue can be resolved by running the script below on array start. I'm going to try it out and see what happens.

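#!/bin/bash
# Start the NVIDIA persistence daemon so driver state stays loaded while no client is using the GPU.
nvidia-persistenced
# List the processes that currently have the NVIDIA device nodes open.
fuser -v /dev/nvidia*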

 

Edited by relink
