[6.9.1] Bug with nginx / nchan "exited on signal 6"


    Dovy6
    • Urgent

    There appears to be a bug with nchan, as described here: https://github.com/slact/nchan/issues/534

    This seems related to the issue I am having. This is the second or third time it has happened to me: out of nowhere, with no apparent trigger, my syslog fills with hundreds of messages like these:

    root@unraid:~# tail /var/log/syslog
    Mar 15 00:45:47 unraid nginx: 2021/03/15 00:45:47 [alert] 3161#3161: worker process 4945 exited on signal 6
    Mar 15 00:45:49 unraid nginx: 2021/03/15 00:45:49 [alert] 3161#3161: worker process 4964 exited on signal 6
    Mar 15 00:45:51 unraid nginx: 2021/03/15 00:45:51 [alert] 3161#3161: worker process 4985 exited on signal 6
    Mar 15 00:45:53 unraid nginx: 2021/03/15 00:45:53 [alert] 3161#3161: worker process 5003 exited on signal 6
    Mar 15 00:45:55 unraid nginx: 2021/03/15 00:45:55 [alert] 3161#3161: worker process 5023 exited on signal 6

    This repeats indefinitely until the logs fill up, and while it is happening Unraid slowly grinds to a halt.
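    For reference, signal 6 is SIGABRT: each nginx worker is aborting itself because nchan's assert() fails (the assertion message shows up in error.log below). You can confirm the signal name right on the server:

    root@unraid:~# kill -l 6
    ABRT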

    tail /var/log/nginx/error.log shows this:

    root@unraid:~# tail -n 50 /var/log/nginx/error.log
    2021/03/15 00:45:20 [alert] 3161#3161: worker process 4358 exited on signal 6
    ker process: ./nchan-1.2.7/src/store/spool.c:479: spool_fetch_msg: Assertion `spool->msg_status == MSG_INVALID' failed.
    2021/03/15 00:45:22 [alert] 3161#3161: worker process 4427 exited on signal 6
    ker process: ./nchan-1.2.7/src/store/spool.c:479: spool_fetch_msg: Assertion `spool->msg_status == MSG_INVALID' failed.
    2021/03/15 00:45:22 [alert] 3161#3161: worker process 4454 exited on signal 6
    ker process: ./nchan-1.2.7/src/store/spool.c:479: spool_fetch_msg: Assertion `spool->msg_status == MSG_INVALID' failed.
    2021/03/15 00:45:24 [alert] 3161#3161: worker process 4461 exited on signal 6
    ker process: ./nchan-1.2.7/src/store/spool.c:479: spool_fetch_msg: Assertion `spool->msg_status == MSG_INVALID' failed.
    2021/03/15 00:45:26 [alert] 3161#3161: worker process 4514 exited on signal 6
    ker process: ./nchan-1.2.7/src/store/spool.c:479: spool_fetch_msg: Assertion `spool->msg_status == MSG_INVALID' failed.
    2021/03/15 00:45:27 [alert] 3161#3161: worker process 4584 exited on signal 6
    ker process: ./nchan-1.2.7/src/store/spool.c:479: spool_fetch_msg: Assertion `spool->msg_status == MSG_INVALID' failed.
    2021/03/15 00:45:28 [alert] 3161#3161: worker process 4599 exited on signal 6
    ker process: ./nchan-1.2.7/src/store/spool.c:479: spool_fetch_msg: Assertion `spool->msg_status == MSG_INVALID' failed.
    2021/03/15 00:45:29 [alert] 3161#3161: worker process 4607 exited on signal 6
    ker process: ./nchan-1.2.7/src/store/spool.c:479: spool_fetch_msg: Assertion `spool->msg_status == MSG_INVALID' failed.
    2021/03/15 00:45:30 [alert] 3161#3161: worker process 4659 exited on signal 6
    ker process: ./nchan-1.2.7/src/store/spool.c:479: spool_fetch_msg: Assertion `spool->msg_status == MSG_INVALID' failed.
    2021/03/15 00:45:31 [alert] 3161#3161: worker process 4712 exited on signal 6
    ker process: ./nchan-1.2.7/src/store/spool.c:479: spool_fetch_msg: Assertion `spool->msg_status == MSG_INVALID' failed.
    2021/03/15 00:45:32 [alert] 3161#3161: worker process 4747 exited on signal 6
    ker process: ./nchan-1.2.7/src/store/spool.c:479: spool_fetch_msg: Assertion `spool->msg_status == MSG_INVALID' failed.
    2021/03/15 00:45:34 [alert] 3161#3161: worker process 4776 exited on signal 6
    ker process: ./nchan-1.2.7/src/store/spool.c:479: spool_fetch_msg: Assertion `spool->msg_status == MSG_INVALID' failed.
    2021/03/15 00:45:36 [alert] 3161#3161: worker process 4795 exited on signal 6
    ker process: ./nchan-1.2.7/src/store/spool.c:479: spool_fetch_msg: Assertion `spool->msg_status == MSG_INVALID' failed.
    2021/03/15 00:45:38 [alert] 3161#3161: worker process 4816 exited on signal 6
    ker process: ./nchan-1.2.7/src/store/spool.c:479: spool_fetch_msg: Assertion `spool->msg_status == MSG_INVALID' failed.
    2021/03/15 00:45:40 [alert] 3161#3161: worker process 4850 exited on signal 6
    ker process: ./nchan-1.2.7/src/store/spool.c:479: spool_fetch_msg: Assertion `spool->msg_status == MSG_INVALID' failed.
    2021/03/15 00:45:41 [alert] 3161#3161: worker process 4872 exited on signal 6
    ker process: ./nchan-1.2.7/src/store/spool.c:479: spool_fetch_msg: Assertion `spool->msg_status == MSG_INVALID' failed.
    2021/03/15 00:45:43 [alert] 3161#3161: worker process 4886 exited on signal 6
    ker process: ./nchan-1.2.7/src/store/spool.c:479: spool_fetch_msg: Assertion `spool->msg_status == MSG_INVALID' failed.
    2021/03/15 00:45:45 [alert] 3161#3161: worker process 4908 exited on signal 6
    ker process: ./nchan-1.2.7/src/store/spool.c:479: spool_fetch_msg: Assertion `spool->msg_status == MSG_INVALID' failed.
    2021/03/15 00:45:47 [alert] 3161#3161: worker process 4945 exited on signal 6
    ker process: ./nchan-1.2.7/src/store/spool.c:479: spool_fetch_msg: Assertion `spool->msg_status == MSG_INVALID' failed.
    2021/03/15 00:45:49 [alert] 3161#3161: worker process 4964 exited on signal 6
    ker process: ./nchan-1.2.7/src/store/spool.c:479: spool_fetch_msg: Assertion `spool->msg_status == MSG_INVALID' failed.
    2021/03/15 00:45:51 [alert] 3161#3161: worker process 4985 exited on signal 6
    ker process: ./nchan-1.2.7/src/store/spool.c:479: spool_fetch_msg: Assertion `spool->msg_status == MSG_INVALID' failed.
    2021/03/15 00:45:53 [alert] 3161#3161: worker process 5003 exited on signal 6
    ker process: ./nchan-1.2.7/src/store/spool.c:479: spool_fetch_msg: Assertion `spool->msg_status == MSG_INVALID' failed.

    I happened to be logged in to my server via SSH when this happened this time, and I was able to run '/etc/rc.d/rc.nginx stop'. This terminates nginx (and obviously means I cannot use the Unraid GUI), but it appears to stop the system from grinding to a halt.
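    Until the root cause is found, I may try a crude watchdog along these lines to restart nginx automatically when the flood starts (an untested sketch; the grep pattern and 60-second interval are my own guesses):

    while true; do
        if tail -n 20 /var/log/nginx/error.log | grep -q 'spool_fetch_msg'; then
            # same init script as above, but restart instead of stop
            /etc/rc.d/rc.nginx restart
        fi
        sleep 60
    done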

     

    Please see the GitHub issue linked above, where some others have noted that this may be related to an old, stale Unraid tab left open in a browser somewhere. I will try to track down any open tabs. I only have one other computer that might have a tab open, but I don't have access to it at this exact moment, so I can't test that theory right now.

     

    I am unfortunately unable to trigger this bug on demand.

    I was able to generate a diagnostics.zip, but I'm having trouble uploading it right now. I think it's a permissions issue. I'll attach it once I figure that out.

     

    Thanks for your help, everyone.

     




    Recommended Comments

    I've found all the Unraid tabs on all of my computers and either closed or reloaded them, and the problem still persists, so that is not the fix. My next step will have to be rebooting the server, which in my experience fixes the problem, but only temporarily. Once again, unfortunately, I'm not quite sure how to intentionally reproduce the issue. I do know it renders my server nigh unusable, so I've upped the priority to Urgent. If there are any other troubleshooting steps I can take, please advise.


    I restarted the browsers on my laptop while I had PuTTY open in several windows running 'tail -f /var/log/syslog' and 'tail -f /var/log/nginx/error.log'.
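    (For what it's worth, tail can follow both logs in a single window:)

    root@unraid:~# tail -f /var/log/syslog /var/log/nginx/error.log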

     

    The errors stopped after restarting the browsers. I am attaching an updated diagnostics.zip taken afterwards, in the hope that it can help.

    unraid-diagnostics-20210315-1541.zip


    I am having the exact same issue. I only have tabs open on one computer at the moment. I am trying to get the diagnostics, but I can't get it to download right now.

     

    EDIT: Added diagnostics.

    serverus-diagnostics-20210331-0914.zip


     

    On 3/31/2021 at 6:12 AM, relink said:

    I am having the exact same issue. I only have tabs open on one computer at the moment. I am trying to get the diagnostics, but I can't get it to download right now.

     

    Your issue is not related. The logs show:
     

    Mar 31 09:07:40 SERVERUS kernel: resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000c0000-0x000dffff window]
    Mar 31 09:07:40 SERVERUS kernel: caller _nv000708rm+0x1af/0x200 [nvidia] mapping multiple BARs

     

    Here is a starting point: 

     


    When you say:

       I happened to be logged in to my server via SSH when this happened

    and:

       I had PuTTY open in several windows

    are you referring to the built-in web terminal that you access from the Unraid webgui, or are you using the standalone application PuTTY to SSH into your server, rather than the built-in web terminal?

     

    Both are valid; I'm just trying to narrow down what could be happening.


    Also, what browsers are you using? If you use multiple, have you noticed that the issue resolves when you close a particular one?


    I am using the standalone app, PuTTY, to access my server. Unfortunately, I have not yet figured out how to trigger this intentionally, so I can't explore and pin down exactly where the bug gets triggered. I access Unraid with both Firefox and Chrome, and yeah, I'm the guy with 3 browser windows and 30 tabs open in each. To the best of my recollection it was Chrome that I restarted to fix the problem, though I can't promise I'm correct, unfortunately.


    Please let me know if there's anything you can suggest I do to try to trigger it and help isolate the bug. Also, be advised that I updated to 6.9.2 today. Again, this only happens occasionally (it has happened to me at least 3 separate times), but I don't know why or when.


    Thanks! Sorry to hammer on this point, but are you saying you don't use the web terminal at all, so we can rule it out as a potential source of the problem?


    I have used the web terminal before, but the last time this error cropped up I was using PuTTY. I'm not sure whether the web terminal is a potential source. I can open the web terminal and play around to see if I can trigger the bug... Any clues as to what I should do that might cause it? Is there any help in my logs attached above, from when the error happened and when I was able to clear it by restarting the web browsers?

     

    On 4/14/2021 at 11:52 AM, ljm42 said:

    Your issue is not related.

    Thank you for this, I believe you may have helped me solve another issue I was having. 

    On 4/14/2021 at 11:13 AM, Dovy6 said:

    Any clues as to what I should do that would potentially cause it?

     

    You mentioned that you solved the problem by closing forgotten tabs on another computer. Can you estimate how long those tabs had been left open? More than 7 days?


    Almost definitely, though I use the tabs often. I have had the same browser windows open on my laptop for months now, both Firefox and Chrome, each with at least 2 Unraid tabs. Recently I've been using my desktop computer more and my laptop less, which means my laptop tabs had been stale for a while; probably over a week since I'd last used them.


    I'm having a similar issue. The log is filling up in two places, /var/log/nginx and /var/log/syslog, and both record the same error.

     

    [screenshot of the same repeated "worker process exited on signal 6" errors]

     

    I restart my machine every 2 days, so nothing is stale on my end.

    root@Tower:~# du -sm /var/log/*
    0   /var/log/btmp
    0   /var/log/cron
    0   /var/log/debug
    1   /var/log/dmesg
    1   /var/log/docker.log
    1   /var/log/faillog
    1   /var/log/lastlog
    1   /var/log/libvirt
    0   /var/log/maillog
    0   /var/log/messages
    0   /var/log/nfsd
    54  /var/log/nginx
    0   /var/log/packages
    1   /var/log/pkgtools
    0   /var/log/plugins
    0   /var/log/pwfail
    0   /var/log/removed_packages
    0   /var/log/removed_scripts
    0   /var/log/removed_uninstall_scripts
    1   /var/log/samba
    0   /var/log/scripts
    0   /var/log/secure
    0   /var/log/setup
    0   /var/log/spooler
    0   /var/log/swtpm
    75  /var/log/syslog
    0   /var/log/vfio-pci
    1   /var/log/wtmp


    Instead of restarting the server I did the following:

    /etc/rc.d/rc.nginx restart
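    If /var/log is already completely full, I believe the runaway files can first be emptied in place with coreutils truncate, without restarting anything:

    root@Tower:~# truncate -s 0 /var/log/syslog /var/log/nginx/error.log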

     

    I have Dynamix System Stats installed, so I went ahead and also increased the memory allocated to the log from ~120 MB to 300 MB.
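    For reference, /var/log on Unraid is a tmpfs, so as far as I can tell the setting boils down to something like this remount (a sketch; 300m is just the size I chose):

    root@Tower:~# mount -o remount,size=300m /var/log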




