Adubs Posted April 12, 2020 (edited April 14, 2020)
Hello my unraid brothers,
I've run into a head scratcher here, and I'm probably just not knowledgeable enough to work my way through this one. Recently it seems as though something is crashing on my server, which then causes a runaway memory leak until the log is filled and I have to restart unraid. I get a lot of:
Apr 12 06:36:47 Mars nginx: 2020/04/12 06:36:47 [alert] 7446#7446: worker process 17403 exited on signal 6
Apr 12 06:36:49 Mars nginx: 2020/04/12 06:36:49 [alert] 7446#7446: worker process 17404 exited on signal 6
Apr 12 06:36:51 Mars nginx: 2020/04/12 06:36:51 [alert] 7446#7446: worker process 17406 exited on signal 6
Apr 12 06:36:53 Mars nginx: 2020/04/12 06:36:53 [alert] 7446#7446: worker process 17410 exited on signal 6
Apr 12 06:36:55 Mars nginx: 2020/04/12 06:36:55 [alert] 7446#7446: worker process 17411 exited on signal 6
Apr 12 06:36:57 Mars nginx: 2020/04/12 06:36:57 [alert] 7446#7446: worker process 17414 exited on signal 6
Apr 12 06:36:59 Mars nginx: 2020/04/12 06:36:59 [alert] 7446#7446: worker process 17417 exited on signal 6
Apr 12 06:37:01 Mars nginx: 2020/04/12 06:37:01 [alert] 7446#7446: worker process 17418 exited on signal 6
Apr 12 06:37:03 Mars nginx: 2020/04/12 06:37:03 [alert] 7446#7446: worker process 17446 exited on signal 6
Apr 12 06:37:05 Mars nginx: 2020/04/12 06:37:05 [alert] 7446#7446: worker process 17464 exited on signal 6
Apr 12 06:37:07 Mars nginx: 2020/04/12 06:37:07 [alert] 7446#7446: worker process 17467 exited on signal 6
Apr 12 06:37:09 Mars nginx: 2020/04/12 06:37:09 [alert] 7446#7446: worker process 17468 exited on signal 6
Apr 12 06:37:11 Mars nginx: 2020/04/12 06:37:11 [alert] 7446#7446: worker process 17471 exited on signal 6
Apr 12 06:37:13 Mars nginx: 2020/04/12 06:37:13 [alert] 7446#7446: worker process 17477 exited on signal 6
Until ultimately:
Apr 12 06:37:15 Mars nginx: 2020/04/12 06:37:15 [crit] 17478#17478: ngx_slab_alloc() failed: no memory
Apr 12 06:37:15 Mars nginx: 2020/04/12 06:37:15 [error] 17478#17478: shpool alloc failed
Apr 12 06:37:15 Mars nginx: 2020/04/12 06:37:15 [error] 17478#17478: nchan: Out of shared memory while allocating message of size 9665. Increase nchan_max_reserved_memory.
Apr 12 06:37:15 Mars nginx: 2020/04/12 06:37:15 [error] 17478#17478: *464356 nchan: error publishing message (HTTP status code 500), client: unix:, server: , request: "POST /pub/disks?buffer_length=1 HTTP/1.1", host: "localhost"
Apr 12 06:37:15 Mars nginx: 2020/04/12 06:37:15 [error] 17478#17478: MEMSTORE:00: can't create shared message for channel /disks
This kind of continues until the log gets filled and I can no longer access it. I'm having a hard time understanding where to even look for the issue, as it seems to be exclusive to nginx, which I'm assuming is what's used to host the unraid web interface.
dmesg is equally unhelpful (to me, that is):
[176756.112509] nginx[26013]: segfault at 0 ip 0000000000000000 sp 00007ffc2ec8f258 error 14 in nginx[400000+21000]
[176756.112515] Code: Bad RIP value.
This repeats as far as my mouse can scroll. The machine still works for the most part, but the UI doesn't really respond correctly a lot of the time. Restarting the nginx service with /etc/rc.d/rc.nginx restart seems to fix whatever problem I'm dealing with.
It may be relevant that I am using a docker container as a reverse proxy so I can have valid certs and access to certain pages, including the unraid webui.
Also, while I'm not new to Linux, I still struggle with understanding a lot of aspects, so ELI5 is appreciated and welcome. Thoughts on what to try? Thanks for your time.
mars-syslog-20200412-1728.zip
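For anyone hitting the same symptoms, here is a rough sketch of how the log could be summarized before it fills up. It is only an illustration: the function names are made up, and the default path /var/log/syslog is an assumption, so pass a different file if your log lives elsewhere.

```shell
#!/bin/bash
# Hypothetical helper: count nginx "worker process N exited on signal S"
# alerts in a syslog file, grouped by signal number.
# Assumption: the log path defaults to /var/log/syslog.
count_worker_exits() {
    local logfile="${1:-/var/log/syslog}"
    grep -oE 'worker process [0-9]+ exited on signal [0-9]+' "$logfile" \
        | awk '{ count[$NF]++ } END { for (s in count) print "signal " s ": " count[s] }' \
        | sort
}

# Hypothetical helper: show how full the filesystem holding the log is.
log_usage() {
    df -h "${1:-/var/log}"
}
```

Calling count_worker_exits with the path to a saved syslog prints one line per signal number, and log_usage shows how much room is left under /var/log. Neither touches nginx or changes anything on the system.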
JonathanM Posted April 12, 2020
Adubs said: "including the unraid webui."
Try removing external access to the webui. While it should be OK in theory, it's not a validated configuration. For now, all webui access should be secured by VPN and not exposed.
Adubs Posted April 12, 2020
I've been using a reverse proxy for a while now, but this error is new to me. I will refrain from doing so, though, for the sake of testing. I typically only browse locally anyway.
Adubs Posted April 14, 2020 (edited April 14, 2020)
Well, it happened again while browsing locally only. The CPU monitor is dead and the log is full. Anyone wanna take another stab?
mars-syslog-20200414-0100.zip
Adubs Posted April 14, 2020
Just for giggles I removed the dark theme plugin too just now. Shot in the dark, but I'm out of ideas.
trurl Posted April 14, 2020
Does it work for even a little while when you first boot? If so, get us the diagnostics instead of just the syslog. Go to Tools - Diagnostics and attach the complete diagnostics zip file to your NEXT post.
Adubs Posted April 14, 2020
The server seems to work somewhat normally besides the UI taking a dump. Restarting the nginx daemon brings some functionality back, but it seems as though it breaks overnight again.
mars-diagnostics-20200413-1814.zip
Adubs Posted April 14, 2020
I have not. I did get the memory based on the QVL for the board, and I would think that if I had a memory issue, I wouldn't keep seeing the same problem with nginx workers crashing out endlessly, and other parts of the system would be affected as well. At any rate, I will run one tonight. How long would you like me to leave it running?
Dissones4U Posted April 14, 2020
Adubs said: "What amount of time would you like me to leave it running for?"
24 hours is the typical recommendation.
Adubs Posted April 14, 2020
I can feel my data addiction withdrawals already. See you guys tomorrow.
Adubs Posted April 15, 2020
Good news, no memory issues to report.
bunzie Posted April 26, 2020
I am experiencing the same problem. 128M of log entries, mostly:
[Sun Apr 26 15:35:26 2020] nginx[22457]: segfault at 0 ip 0000000000000000 sp 00007ffe25f56028 error 14 in nginx[400000+21000]
[Sun Apr 26 15:35:26 2020] Code: Bad RIP value.
Expanded the log ramdisk:
mount -o remount,size=512m /var/log
Restarted nginx:
/etc/rc.d/rc.nginx restart
The UI behavior has returned to normal, and the log messages have stopped. I upgraded to 6.8.3 on 4/21, and this is the first time I have seen this behavior.
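The two commands in the post above could be wrapped in a small guard so they only run when the log filesystem is actually close to full. This is a sketch, not a fix for the underlying crash: the function names are hypothetical, the 90% threshold is an arbitrary assumption, and the 512m size is simply the value used above. It would need to be run as root on the server.

```shell
#!/bin/bash
# Hypothetical helper: print the usage percentage (without the % sign) of the
# filesystem that holds the given path, parsed from POSIX-format df output.
usage_percent() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Hypothetical wrapper around the workaround from the post above.
# Assumptions: a default threshold of 90 percent and a 512m target size.
relieve_full_log() {
    local threshold="${1:-90}"
    local used
    used=$(usage_percent /var/log)
    if [ "$used" -ge "$threshold" ]; then
        echo "/var/log is ${used}% full; enlarging it and restarting nginx"
        mount -o remount,size=512m /var/log
        /etc/rc.d/rc.nginx restart
    else
        echo "/var/log is ${used}% full; nothing to do"
    fi
}
```

Calling relieve_full_log with no argument uses the 90% threshold; passing a number such as 75 lowers it. Defining the functions does nothing on its own, so nothing is remounted or restarted until relieve_full_log is called explicitly.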
Adubs Posted April 26, 2020 (edited April 26, 2020)
I have yet to have the same issue after removing the dark theme plugin. I don't know that it was the direct cause, but it stopped after I removed it. You should upload your diagnostics for good measure.
enmesh-parisian-latest Posted August 3, 2020
Same problem here on 6.8.3.
trurl Posted August 3, 2020
enmesh-parisian-latest said: "Same problem here on 6.8.3"
This thread is a few months old now. Since your problem is the "same", have you tried any of the suggestions in this thread?
Adubs Posted August 3, 2020
I had it happen again randomly a couple of times. I think it happens when you leave a NoVNC session open for an extended period of time. I switched to just using SSH on my VMs and haven't looked back.
Adubs Posted August 4, 2020 (edited August 4, 2020)
Did it again to me last night. I'm out of ideas, guys.
mars-diagnostics-20200804-0504.zip
Adubs Posted August 4, 2020
I was thinking about this on my way to work, and I believe leaving the UI open in Chrome overnight is causing this. It only ever seems to happen when the interface gets used, which I have not been doing very often these days. I am usually pretty good about closing everything out once I'm done messing with it, but this time I left it up overnight. I'll see if I can replicate it consistently.
Squid Posted August 4, 2020
Try two tests. Leave it on the dashboard (which it appears you've been doing), and then leave it on something like Settings.
Frostbite2600 Posted August 24, 2020
Mine's back too; here's my thread of troubleshooting as well. I did, however, leave the UI open in a browser for the last couple of days, which I haven't done in a while. I'll continue to monitor mine as well, but I'm glad to know it's not just me, because this has plagued me for months and caused me to rebuild my entire setup, only to be back at square one.
Adubs Posted August 24, 2020
I have been unsuccessful at reproducing this issue again despite leaving the window open overnight in different screens.
Dravas Posted April 8, 2021
I know this is an old post, but I found this, which might help: https://futurestud.io/tutorials/nginx-solve-reponse-status-0-worker-process-exited