Hello my unraid brothers,
Running into a head scratcher here and I'm probably just not knowledgeable enough to work my way through this one.
Recently it seems as though something is crashing on my server, which then causes a runaway memory leak until the log is filled and I have to restart Unraid.
I get a lot of:
Apr 12 06:36:47 Mars nginx: 2020/04/12 06:36:47 [alert] 7446#7446: worker process 17403 exited on signal 6
Apr 12 06:36:49 Mars nginx: 2020/04/12 06:36:49 [alert] 7446#7446: worker process 17404 exited on signal 6
Apr 12 06:36:51 Mars nginx: 2020/04/12 06:36:51 [alert] 7446#7446: worker process 17406 exited on signal 6
Apr 12 06:36:53 Mars nginx: 2020/04/12 06:36:53 [alert] 7446#7446: worker process 17410 exited on signal 6
Apr 12 06:36:55 Mars nginx: 2020/04/12 06:36:55 [alert] 7446#7446: worker process 17411 exited on signal 6
Apr 12 06:36:57 Mars nginx: 2020/04/12 06:36:57 [alert] 7446#7446: worker process 17414 exited on signal 6
Apr 12 06:36:59 Mars nginx: 2020/04/12 06:36:59 [alert] 7446#7446: worker process 17417 exited on signal 6
Apr 12 06:37:01 Mars nginx: 2020/04/12 06:37:01 [alert] 7446#7446: worker process 17418 exited on signal 6
Apr 12 06:37:03 Mars nginx: 2020/04/12 06:37:03 [alert] 7446#7446: worker process 17446 exited on signal 6
Apr 12 06:37:05 Mars nginx: 2020/04/12 06:37:05 [alert] 7446#7446: worker process 17464 exited on signal 6
Apr 12 06:37:07 Mars nginx: 2020/04/12 06:37:07 [alert] 7446#7446: worker process 17467 exited on signal 6
Apr 12 06:37:09 Mars nginx: 2020/04/12 06:37:09 [alert] 7446#7446: worker process 17468 exited on signal 6
Apr 12 06:37:11 Mars nginx: 2020/04/12 06:37:11 [alert] 7446#7446: worker process 17471 exited on signal 6
Apr 12 06:37:13 Mars nginx: 2020/04/12 06:37:13 [alert] 7446#7446: worker process 17477 exited on signal 6
Until ultimately:
Apr 12 06:37:15 Mars nginx: 2020/04/12 06:37:15 [crit] 17478#17478: ngx_slab_alloc() failed: no memory
Apr 12 06:37:15 Mars nginx: 2020/04/12 06:37:15 [error] 17478#17478: shpool alloc failed
Apr 12 06:37:15 Mars nginx: 2020/04/12 06:37:15 [error] 17478#17478: nchan: Out of shared memory while allocating message of size 9665. Increase nchan_max_reserved_memory.
Apr 12 06:37:15 Mars nginx: 2020/04/12 06:37:15 [error] 17478#17478: *464356 nchan: error publishing message (HTTP status code 500), client: unix:, server: , request: "POST /pub/disks?buffer_length=1 HTTP/1.1", host: "localhost"
Apr 12 06:37:15 Mars nginx: 2020/04/12 06:37:15 [error] 17478#17478: MEMSTORE:00: can't create shared message for channel /disks
This kind of continues until the log gets filled and I can no longer access it. I'm having a hard time understanding where to even look for an issue here, as it seems like it's exclusive to nginx, which I'm assuming is what's used to host the web interface of Unraid.
dmesg is equally unhelpful (to me, that is):
[176756.112509] nginx[26013]: segfault at 0 ip 0000000000000000 sp 00007ffc2ec8f258 error 14 in nginx[400000+21000]
[176756.112515] Code: Bad RIP value.
This repeats as far as my mouse can scroll. The machine still works for the most part, but the UI doesn't respond correctly a lot of the time.
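In case it's useful, here's roughly how I've been checking how bad it's gotten between reboots (I'm assuming the syslog lives at /var/log/syslog like on my box; adjust the path if yours differs):

```shell
#!/bin/sh
# Count how many nginx workers have died on signal 6 so far
grep -c 'exited on signal 6' /var/log/syslog

# Check how full the log filesystem is, since the log
# filling up is what eventually forces me to reboot
df -h /var/log
```

The crash counter climbs steadily once the problem starts, and the log partition usage follows right behind it.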
Restarting the nginx service with
/etc/rc.d/rc.nginx restart
seems to fix whatever problem I'm dealing with.
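As a stopgap while I figure out the root cause, I've been considering a crude watchdog along these lines (just my own sketch, not anything official; again assuming the syslog is at /var/log/syslog):

```shell
#!/bin/bash
# If recent syslog lines show nginx workers dying on signal 6,
# restart nginx before the log fills up and locks me out.
if tail -n 50 /var/log/syslog | grep -q 'exited on signal 6'; then
    /etc/rc.d/rc.nginx restart
fi
```

Dropped into cron every few minutes it would at least keep the UI reachable, though obviously I'd rather fix whatever is actually crashing.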
It may be relevant that I'm using a Docker container as a reverse proxy so I can have valid certs and access to certain pages, including the Unraid web UI. Also, while I'm not new to Linux, I still struggle with understanding a lot of aspects, so ELI5 explanations are appreciated and welcome.
Thoughts on what to try?
Thanks for your time.
mars-syslog-20200412-1728.zip