Web GUI & Docker Crashed


I've been having crashes pretty much since I first installed unRAID. I've dealt with it by plugging the server into a smart plug that I just turn off and on to restart it... Yeah, it's gross. So I'd like to figure out what the issue is. The longest the server has stayed up was 14 days, most recently (that run ended about a week ago), and during that time I barely touched the config or files beyond normal cloud/media streaming. Otherwise it is rarely up more than 3 days at a time.

 

I did some basic checks when I first set it up: memtest overnight, downgrading a few versions, etc. I've tried a ton of Ryzen-specific fixes from the forum, but I have no record of exactly what I've tried. I do have the Ryzen 1700 overclocked to 3.8 GHz (I think), but since the system is still up that doesn't seem to be the issue, and temps have been rock solid even when stressing it. I just upgraded to 6.4.1 hoping that would resolve it, but it didn't. This time I did notice that the VM running on the server had not gone down, even though I couldn't access the web GUI. Telnet works though! So I was able to dump the diagnostics. I also have another diagnostics file from this morning that I don't remember creating. One note: I see that the docker volume is full, and `docker ps` doesn't respond. Could this really be what's been bringing down my server so often?

 

Currently: the web GUI is not accessible and none of the docker containers are running, but the VM and telnet are both functional. Are there any other diagnostics I can try, or should I just restart it?

 

Hardware: https://pcpartpicker.com/user/sup3r_b0wlz/saved/CM7WXL

Unraid Version: 6.4.1

Diagnostics: The 1347 one is the current one, and I included the one from 9:46 this morning also

starkillerbase-diagnostics-20180313-0946.zip

starkillerbase-diagnostics-20180313-1347.zip

syslog-20180313-094600.txt

Edited by sup3rb0wlz
Link to comment
2 hours ago, trurl said:

See the Docker FAQ section about this:

 

https://lime-technology.com/forums/topic/57181-real-docker-faq/

 

 

Thanks! That's a good read. I know I've dealt with this on my decision machine by accessing the docker image and checking the sizes and such. 

 

Could that really be bringing the entire web gui and such down? It makes sense it would crash docker of course.

Link to comment
2 hours ago, Frank1940 said:

I believe the general recommendation is to NEVER overclock a server!!!  You certainly don't need it for the basic NAS part of unRAID, and if it is required for some VM usage, that should probably be on bare metal!  I would suggest that you start there.  

 

I'm doing a lot more than "basic NAS" stuff though. I see a huge benefit in games in my VMs from the extra clock speed. I can try running at stock settings, but I haven't seen a difference in my limited testing. 

Link to comment
51 minutes ago, sup3rb0wlz said:

 

Thanks! That's a good read. I know I've dealt with this on my decision machine by accessing the docker image and checking the sizes and such. 

 

Could that really be bringing the entire web gui and such down? It makes sense it would crash docker of course.

Your docker image should not be allowed to fill up. And, it's possible to misconfigure volume mappings in such a way that the application writes into RAM instead of to unRAID storage.

 

All of the unRAID OS, with all of its usual linux OS folders, is in RAM. The boot process unpacks the OS fresh from the archives on the flash drive into RAM on each boot. The only actual storage is the flash drive (which you don't want dockers to write on), which is mounted at /boot, and mountpoints within /mnt, such as /mnt/user (the user shares), /mnt/cache, /mnt/disk1, ... (the actual disks), and /mnt/disks/... (mounted Unassigned Devices). If you map a container volume to some other host path, the application is going to write into RAM. If you fill up RAM, it is going to start killing processes.
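To make that rule concrete, here is a minimal sketch that classifies a host path purely by its prefix, using only the mountpoints listed above. The name `classify_host_path` is made up for illustration, and the function only does string matching; it does not check what is actually mounted on the server.

```shell
#!/bin/bash
# classify_host_path PATH: print "flash", "storage", or "ram" based only on
# the prefix of PATH. Hypothetical helper; it does not inspect real mounts.
classify_host_path() {
    case "$1" in
        /boot|/boot/*)             echo "flash" ;;    # flash drive: keep containers off it
        /mnt/user|/mnt/user/*)     echo "storage" ;;  # user shares
        /mnt/cache|/mnt/cache/*)   echo "storage" ;;  # cache disk
        /mnt/disks|/mnt/disks/*)   echo "storage" ;;  # mounted Unassigned Devices
        /mnt/disk[0-9]*)           echo "storage" ;;  # array disks: /mnt/disk1, /mnt/disk2, ...
        *)                         echo "ram" ;;      # everything else is the RAM filesystem
    esac
}

classify_host_path /mnt/user/appdata
classify_host_path /mnt/Cache/appdata   # wrong case, so this is not the cache disk
```

Running `df` on the path in question on the server itself is a more direct check, since it reports the filesystem that actually backs the path.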

 

The other part of the puzzle is that even if you get the container volumes mapped OK, you must configure the application to only write to those container volumes. If you let it write anywhere else, it is going to be writing into the docker image and fill it up.
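One way to keep an eye on this is to check how full the docker image is from the telnet prompt. A small sketch, assuming the docker image is mounted at /var/lib/docker (verify that on your own system); `usage_percent` is a hypothetical helper name that just parses `df -P` output.

```shell
#!/bin/bash
# usage_percent PATH: print the used percentage (without the % sign) of the
# filesystem that holds PATH, read from the POSIX-format output of df.
usage_percent() {
    df -P "$1" | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

# Assumption: the docker image is mounted at /var/lib/docker.
# Fall back to / so the sketch still runs on a machine without docker.
target=/var/lib/docker
[ -d "$target" ] || target=/
echo "$target is $(usage_percent "$target")% full"
```

If the number keeps climbing while the containers run, some application is probably writing to a path inside the image rather than to a mapped volume.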

Link to comment

So what I'm getting is that this is all caused by misconfigured docker volumes or applications writing to a bad path in the container. I'll need to investigate some more though. I just didn't think that would lock up the web GUI. But if it's writing to RAM, it could be starving the system and causing the web GUI process to crash. It also seems weird that it would be so random.

Link to comment

Another thing that sometimes trips up people is that linux is case sensitive. For example, if you map something to /mnt/cache/..., that is the cache disk, but if you map something to /mnt/Cache/..., that isn't a mountpoint and would go to RAM. Same thing within the application. If you have mapped /mnt/user/media to /Media, but then you tell the application to use /media instead, it is going to be in the docker image instead of the mapped storage.

 

And, the application must reference the container volume, not the host path. So for example, if you have host path /mnt/user/media mapped to container /Media, you must have the application use /Media and not /mnt/user/media.
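The two rules above (case sensitivity, and container path versus host path) can be sketched as a tiny lookup. This is an illustration only: `resolve_container_path` is a hypothetical helper, and it models a single volume mapping such as the /mnt/user/media to /Media example.

```shell
#!/bin/bash
# resolve_container_path HOST_DIR CONTAINER_DIR APP_PATH
# Given one volume mapping (like docker's -v HOST_DIR:CONTAINER_DIR), print
# the host path that APP_PATH ends up on, or "docker image" if APP_PATH is
# outside the mapped container directory. Hypothetical helper.
resolve_container_path() {
    local host_dir="$1" container_dir="$2" app_path="$3"
    case "$app_path" in
        "$container_dir"|"$container_dir"/*)
            # Inside the mapped volume: swap the container prefix for the host one.
            echo "${host_dir}${app_path#"$container_dir"}" ;;
        *)
            # Not under the mapped volume: the write stays inside the docker image.
            echo "docker image" ;;
    esac
}

resolve_container_path /mnt/user/media /Media /Media/movies            # mapped storage
resolve_container_path /mnt/user/media /Media /media/movies            # wrong case
resolve_container_path /mnt/user/media /Media /mnt/user/media/movies   # host path used inside the container
```

Only the first call lands on mapped storage; the other two print "docker image", which is how the image slowly fills up.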

Link to comment
3 hours ago, trurl said:

Another thing that sometimes trips up people is that linux is case sensitive. For example, if you map something to /mnt/cache/..., that is the cache disk, but if you map something to /mnt/Cache/..., that isn't a mountpoint and would go to RAM. Same thing within the application. If you have mapped /mnt/user/media to /Media, but then you tell the application to use /media instead, it is going to be in the docker image instead of the mapped storage.

 

And, the application must reference the container volume, not the host path. So for example, if you have host path /mnt/user/media mapped to container /Media, you must have the application use /Media and not /mnt/user/media.

 

Yeah, I know the difference between container and host paths threw me off at first with docker. I re-created my docker image file (with 30GB to debug) and re-created all of my containers. All of the paths look right, but I still need to check the application paths. I think Resilio may be logging to somewhere inside the docker image. Hopefully I don't see another crash in the next couple of days. If I do, I think something else is the culprit.

Edited by sup3rb0wlz
Use English
Link to comment
  • 3 weeks later...

Running in safe mode worked great for about 2 weeks. During that time I added plugins back one by one, until I was back to about what I normally run with. Then after a few days it crashed again.


Then I ran in safe mode again with no plugins except Fix Common Problems (for its troubleshooting mode) and got another crash. Here are the logs for it. Any ideas? I don't see anything in the logs to indicate any issues. I don't think any applications are writing to RAM, and the server wasn't under much load. I'm stumped, and I'm thinking it might just be a random crash due to hardware, even though it doesn't completely shut off. 

starkillerbase-diagnostics-20180401-2139.zip

Link to comment

I don't see anything obvious in your diagnostics.

I see you're using zenstates in your go file to control C6 state. Good.

Have you tried the syslinux rcu_nocbs tweak? Even if you were using it in normal mode you're not using it in safe mode.

label unRAID OS
  menu default
  kernel /bzimage
  append rcu_nocbs=0-15 initrd=/bzroot 
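The 0-15 range in the append line matches the 16 hardware threads of the Ryzen 1700 (CPUs 0 through 15). As a rough sketch, assuming the range is meant to cover every CPU thread, it can be derived from `nproc` on other processors; `rcu_nocbs_range` is a made-up helper name.

```shell
#!/bin/bash
# rcu_nocbs_range [CPU_COUNT]: print the rcu_nocbs kernel parameter covering
# CPUs 0..CPU_COUNT-1. Defaults to the count reported by nproc.
# Hypothetical helper; check the result against your own CPU before using it.
rcu_nocbs_range() {
    local cpus="${1:-$(nproc)}"
    echo "rcu_nocbs=0-$(( cpus - 1 ))"
}

rcu_nocbs_range 16   # Ryzen 1700: 8 cores / 16 threads
rcu_nocbs_range      # whatever this machine reports
```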

If you want to try a different unRAID version, try 6.5.1-rc3.

Are you still overclocking? Don't.

 

Link to comment
17 minutes ago, sup3rb0wlz said:

My memory XMP profile is set, I'll try disabling that, even though it ran memtest for like 2 days strong.

 

It depends what speed you're using after enabling XMP. If you don't enable XMP your RAM will run at the maximum JEDEC speed (1066 MHz * 2 = 2133). If your RAM is rated fast enough you can enable XMP and choose the 1333 MHz * 2 = 2666 profile without over-clocking anything because that's the rated speed of the 1000-series Ryzen uncore. With the 2000-series you can choose the 1466 MHz * 2 = 2933 profile (if your RAM is fast enough) without over-clocking anything. This assumes you have only two single rank DIMMs. If you have more physical RAM chips the rated speed is lower because of bus loading due to the RAM being unbuffered.
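The "clock * 2" figures above are rounded: the memory clocks are multiples of 133.33 MHz (1066.67, 1333.33, 1466.67), which is why 1066 * 2 is quoted as 2133. A small sketch of that arithmetic, with `ddr_rate` as a hypothetical helper that takes the number of 133.33 MHz steps and truncates the result the same way as the figures quoted here:

```shell
#!/bin/bash
# ddr_rate STEPS: print the DDR4 transfer rate for a memory clock of
# STEPS x 133.33 MHz (two transfers per clock), truncated to an integer.
# Hypothetical helper for illustration only.
ddr_rate() {
    awk -v steps="$1" 'BEGIN { printf "%d\n", steps * 400 / 3 * 2 }'
}

ddr_rate 8    # 1066 MHz clock
ddr_rate 10   # 1333 MHz clock
ddr_rate 11   # 1466 MHz clock
ddr_rate 12   # 1600 MHz clock
```

The four calls print 2133, 2666, 2933 and 3200, so a kit advertised at 3200 sits two steps above the 2666 figure given for the 1000-series.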

Link to comment

My RAM is listed in the first post. It's currently running at 2933 instead of the advertised 3200.

 

I tried updating to the latest release in the next branch.

Here's one of the many errors from nginx, along with an ls of /var/run. Everything else seems to be working correctly.

 

Apr  3 00:16:01 tower nginx: 2018/04/03 00:16:01 [crit] 15952#15952: *793 connect() to unix:/var/run/php5-fpm.sock failed (2: No such file or directory) while connecting to upstream, client: 192.168.1.200, server: , request: "POST /plugins/unassigned.devices/UnassignedDevices.php HTTP/1.1", upstream: "fastcgi://unix:/var/run/php5-fpm.sock:", host: "192.168.1.250", referrer: "http://192.168.1.250/Main"

 

 


root@tower:/var/log# ls /var/run
acpid.pid      docker.sock=     nginx.origin   ntpd.pid       rsyslogd.pid   syslogd.pid
acpid.socket=  dockerd.pid      nginx.pid      rpc.statd.pid  samba/         ttyd.sock=
atd.pid        emhttpd.socket=  nginx.socket=  rpcbind/       sm-notify.pid  utmp
dbus/          inetd.pid        nmbd.pid       rpcbind.lock   smbd.pid       winbindd.pid
docker/        libvirt/         nscd/          rpcbind.sock=  sshd.pid
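Note that the listing has no php5-fpm.sock entry, which is the socket nginx says it cannot connect to. A minimal sketch for checking that directly from the prompt; `check_socket` is a made-up helper, and the socket path is the one taken from the nginx error above.

```shell
#!/bin/bash
# check_socket PATH: print "present" if PATH exists and is a Unix socket,
# otherwise "missing". Hypothetical helper for illustration.
check_socket() {
    if [ -S "$1" ]; then
        echo "present"
    else
        echo "missing"
    fi
}

# Socket path copied from the nginx error message.
check_socket /var/run/php5-fpm.sock
```

A "missing" result would be consistent with the PHP side of the web GUI being down while nginx itself is still running.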

 

Edited by sup3rb0wlz
Link to comment
5 hours ago, sup3rb0wlz said:

My RAM is listed in the first post. It's currently running at 2933 instead of the advertised 3200.

 

Which means that you're overclocking your Ryzen 1700's uncore - which includes the memory controller, L3 cache and Infinity Fabric. Many people think that overclocking only refers to the cores. The memory is over-specified for the task. That's not to say you're unable to use it at 3200 - just that by doing so you'd be overclocking part of the processor.

 

I don't see your memory on the Gigabyte QVL. Are you really using a single DIMM?

Link to comment

Yeah, 1 module. DDR4 is expensive, and I want to have room to upgrade.

 

This is the closest model. C16D instead of C16C and 16GVKB instead of 16CVK

G.SKILL 8GB 2Rx8 F4-3200C16D-16GVKB DS 16-18-18-38 1.35v v v v 2133 G

 

I swear I checked the support list for the RAM before I bought it.

Link to comment
