Unraid becoming unresponsive every few days


Hi Hi,

 

I recently built a new server, jumped a few hurdles, and it is up and running, but every few days it just locks up. It is a Ryzen-based build and I have done all the suggested tweaks for it, but I am not sure this is the cause. It appears to get stuck in a loop after losing connection to the NAS. It has happened twice now in a week and I have managed to grab the diagnostics both times.

 

Anyone got any ideas? I pretty much have to forcibly shut down the server and reboot it every time, which obviously is not really healthy.

 

Any help would be greatly appreciated.

 

diagnostics.zip


Which Ryzen version? I think the first-gen chips had issues like this, but there were workarounds. If it is a first-gen chip, Spaceinvader One has some tutorials on YouTube to resolve the problem. Years ago I had lockup problems on an x170 Xeon platform and a firmware upgrade fixed it.

Edited by Jessie

 

16 hours ago, Vr2Io said:

It doesn't look Ryzen-related; you have OOM (out of memory) errors and a call trace. I suggest troubleshooting your Docker containers/apps.

 

 

Thanks for this. I'm not sure how to troubleshoot it, as I am still very new to Unraid. But essentially, are you saying that after a few days I simply run out of RAM?
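One way to confirm the out-of-memory diagnosis is to search the syslog for the kernel's OOM-killer messages. The helper below is only a sketch: the function name `find_oom_events` is made up for illustration, and the default log path is an assumption (inside a diagnostics zip the syslog lives under a different name, so pass that path explicitly).

```shell
#!/bin/bash
# List kernel OOM-killer events found in a syslog file.
# find_oom_events is a hypothetical helper; the default path is an assumption.
find_oom_events() {
    local log="${1:-/var/log/syslog}"
    # "invoked oom-killer" and "Out of memory:" are the phrases the Linux
    # kernel logs when it starts killing processes to reclaim memory.
    grep -E 'invoked oom-killer|Out of memory:' "$log"
}
```

Running `find_oom_events /var/log/syslog` after a lockup should print the matching lines, which usually name the process that was killed.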

 

Edit: I already ran a memtest when I first set up the server and again after finding my first few issues (now resolved), and it came up clear after about 20 passes.

 

 

18 hours ago, Jessie said:

Which Ryzen version? I think the first-gen chips had issues like this, but there were workarounds. If it is a first-gen chip, Spaceinvader One has some tutorials on YouTube to resolve the problem. Years ago I had lockup problems on an x170 Xeon platform and a firmware upgrade fixed it.

It is a new build, so it is a Ryzen 7 3700X. I have already done the C-states workaround etc.

Edited by LFFPicard
Just now, Vr2Io said:

Yes, it seems to be caused by Docker.

That is a surprise, ha. I thought 16GB would have been plenty. I mean, I can slap 32GB in there instead, but what would be the best way of determining which docker is causing it?

I have Plex using 6GB of RAM as a transcoder, but as far as I have seen, the other dockers are obviously limited but not using a lot of memory. As a first step, I can remove or lower the Plex transcode.

15 minutes ago, LFFPicard said:

Thanks for this. I'm not sure how to troubleshoot it, as I am still very new to Unraid. But essentially, are you saying that after a few days I simply run out of RAM?

If it is caused by one or more docker containers you can check which ones are using the most RAM by turning on the advanced view in the upper right corner in the Dockers tab.

 

[Screenshot: Docker tab with advanced view enabled, showing per-container memory usage]
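The same numbers are available from the command line through `docker stats`. As a rough sketch, the hypothetical helper `rank_by_mem` below sorts tab-separated "name, memory percent" lines so the hungriest container comes first; it assumes input in the form produced by the `--format` string shown after the block.

```shell
#!/bin/bash
# Sort "name<TAB>percent%" lines by memory percentage, highest first.
# rank_by_mem is a hypothetical helper name used for illustration.
rank_by_mem() {
    # -k2,2nr: numeric, reversed sort on the second tab-separated field;
    # sort -n reads the leading number and ignores the trailing "%".
    sort -t "$(printf '\t')" -k2,2nr
}
```

A one-off snapshot could then be ranked with `docker stats --no-stream --format '{{.Name}}\t{{.MemPerc}}' | rank_by_mem`.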

 

You can also limit how much RAM a docker container uses by adding the --memory= parameter to the docker container Extra Parameters.

 

Here is an example of one of my containers which I have limited to using a maximum of 4GB RAM:

 

[Screenshot: container settings with a --memory limit in the Extra Parameters field]
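To check that a limit actually took effect, `docker inspect --format '{{.HostConfig.Memory}}' <container>` prints the configured limit in bytes, with 0 meaning no limit. The sketch below only makes that number readable; `describe_mem_limit` is a made-up helper name, and the container name in the usage line is a placeholder.

```shell
#!/bin/bash
# Turn the byte count reported by docker inspect into a readable limit.
# describe_mem_limit is a hypothetical helper for illustration.
describe_mem_limit() {
    local bytes="$1"
    if [ "$bytes" -eq 0 ]; then
        # Docker reports 0 when no --memory limit is configured
        echo "unlimited"
    else
        echo "$(( bytes / 1024 / 1024 )) MiB"
    fi
}
```

For example: `describe_mem_limit "$(docker inspect --format '{{.HostConfig.Memory}}' my-container)"`.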

Edited by Hoopster

Increasing system RAM is not a solution; you need to find the cause. In fact, I'm not familiar with Docker and its settings, but I think some setting may prevent it from crashing the whole system.

 

7 minutes ago, LFFPicard said:

As a first step, I can remove or lower the Plex transcode.

That should be worth a try.

4 minutes ago, Hoopster said:

If it is caused by one or more docker containers you can check which ones are using the most RAM by turning on the advanced view in the upper right corner in the Dockers tab.

Yeah, that's about the only thing I could think of. In fact, I can see Tdarr_aio is using almost 2GB at the moment. I am wondering if that will keep getting higher. I know v2 of that docker is due soon, so maybe a complete overhaul will stop it eating the most RAM. I will keep an eye on it.

 

5 minutes ago, Vr2Io said:

Increasing system RAM is not a solution; you need to find the cause. In fact, I'm not familiar with Docker and its settings, but I think some setting may prevent it from crashing the whole system.

That should be worth a try.

Dropped Plex from 6GB to 4GB of RAM. If it still causes a problem, I will stop it using RAM transcoding entirely.

20 minutes ago, LFFPicard said:

I thought 16GB would have been plenty.

It should be if things are properly configured, unless you have a really high number of docker containers or if you are running multiple VMs.

 

When things start to slow down on your server or after it has been running for a while, check how unRAID memory is being used by looking at the Memory graph in Stats --> System Stats tab.

 

[Screenshot: Memory graph from the System Stats tab]

 

Having a lot of cached RAM is not an issue; that's just how Linux works as cached RAM can be allocated as needed.  If your Used RAM is consistently high, you need to investigate further why that is happening.
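To put a number on "Used" versus "cached", the kernel's `MemAvailable` figure already counts reclaimable cache as available. The following is a minimal sketch that reads a meminfo-style file and prints the percentage of RAM that is genuinely in use; `mem_used_percent` is a hypothetical helper name.

```shell
#!/bin/bash
# Print the percentage of RAM in use, treating reclaimable cache as free.
# mem_used_percent is a hypothetical helper; it defaults to /proc/meminfo.
mem_used_percent() {
    local meminfo="${1:-/proc/meminfo}"
    # used = MemTotal - MemAvailable (both reported in kB)
    awk '/^MemTotal:/     { total = $2 }
         /^MemAvailable:/ { avail = $2 }
         END { if (total > 0) printf "%d\n", (total - avail) * 100 / total }' "$meminfo"
}
```

Calling `mem_used_percent` with no argument reads the live system. A value that climbs steadily over several days points at a leak rather than at normal caching.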

Edited by Hoopster
9 minutes ago, LFFPicard said:

Dropped Plex from 6GB to 4GB of RAM

How are you doing this? 

 

Just setting a RAM limit on the container as illustrated above does not affect the RAM Plex transcoding can use.  That only has an impact on the Plex application itself.

 

If you are not limiting Plex RAM transcoding by setting up a RAM disk, Plex transcoding can use ALL the RAM on your server and cause it to crash; especially if more than one transcode is happening simultaneously or a long transcode is taking place.

 

9 minutes ago, Hoopster said:

How are you doing this? 

 


I followed a guide for it (I forget which one), and I have a user script set up, so I changed that. Obviously it won't take effect until I restart the array, as the script is set to run on array startup.

 

Script - I changed from 6g to 4g for the moment.

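#!/bin/bash
# Create the Plex transcode scratch directory (no error if it already exists)
mkdir -p /tmp/PlexRamScratch
chmod -R 777 /tmp/PlexRamScratch
# Mount a RAM disk capped at 4 GB, unless one is already mounted there
mountpoint -q /tmp/PlexRamScratch || mount -t tmpfs -o size=4g tmpfs /tmp/PlexRamScratch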
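After the array restarts, it may be worth checking that the RAM disk is really mounted with the new size. This is a small sketch, assuming the `/tmp/PlexRamScratch` path from the script above; `ramdisk_usage` is a made-up helper name.

```shell
#!/bin/bash
# Show which filesystem backs a directory and how full it is.
# ramdisk_usage is a hypothetical helper; the default path comes from the script above.
ramdisk_usage() {
    local dir="${1:-/tmp/PlexRamScratch}"
    # df -Pk prints one header line, then: filesystem, 1K-blocks, used,
    # available, capacity, mount point
    df -Pk "$dir" | awk 'NR == 2 { printf "%s: %d of %d KiB used (%s)\n", $1, $3, $2, $5 }'
}
```

If the first field printed is `tmpfs` and the total is about 4194304 KiB, the 4g cap is in place. If it shows the root filesystem instead, the mount did not happen and transcodes are not being capped.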

 

On 1/10/2021 at 9:13 AM, LFFPicard said:

 

It is a new build, so it is a Ryzen 7 3700X. I have already done the C-states workaround etc.

I think the C-state issue was relevant to Ryzen 1 systems. I've built plenty of systems on Ryzen 2 and 3 platforms with no mods needed. Was it doing it before you adjusted the C-states? Maybe reset the board back to factory settings and start again.

For reference, I built on Asus and Gigabyte x170 platforms and X570 Gaming X and B550 Gaming X, all of which ran fine with r7 3770x.

 

If you are passing through PCI devices, maybe isolate them and see if that is causing it. And as the previous person said, introduce your dockers one at a time to see if they are causing it.

 

  • 2 weeks later...

Just an update for anyone who was following this.

It turned out to be a memory leak in the tdarr_aio docker. I had it running constantly, and after a few days it had eaten all the RAM. Since the last crash I have not had the docker running at all, and I am currently on 13 days of uptime with no issues.

That being said, Tdarr version 2 is out soon, so that should fix the issue, and I will give it another test run then.
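For anyone wanting to catch a leak like this before it takes the server down, a simple approach is to log a container's memory use at intervals and look at the trend. The sketch below assumes a log file where each line is `<unix-time> <memory-in-MiB>`; that format and the helper name `mem_growth` are invented for illustration, and the samples would have to be converted to a single unit before logging.

```shell
#!/bin/bash
# Print how much memory use grew between the first and last logged sample.
# mem_growth and the "<unix-time> <MiB>" log format are assumptions.
mem_growth() {
    awk 'NR == 1 { first = $2 }
                 { last = $2 }
         END     { if (NR > 0) print last - first }' "$1"
}
```

A container whose growth figure keeps rising day after day while it sits idle is a good candidate for a leak.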
