John_M (Author) Posted March 25, 2018 (edited)

SOLVED: The problem was specific to the Safari web browser and was fixed in unRAID 6.5.1-rc6.

I'm getting a series of nginx-related error messages in my syslog that I've never seen before:

Mar 25 19:23:49 Northolt nginx: 2018/03/25 19:23:49 [crit] 7004#7004: ngx_slab_alloc() failed: no memory
Mar 25 19:23:49 Northolt nginx: 2018/03/25 19:23:49 [error] 7004#7004: shpool alloc failed
Mar 25 19:23:49 Northolt nginx: 2018/03/25 19:23:49 [error] 7004#7004: nchan: Out of shared memory while allocating message of size 293. Increase nchan_max_reserved_memory.
Mar 25 19:23:49 Northolt nginx: 2018/03/25 19:23:49 [error] 7004#7004: *224281 nchan: error publishing message (HTTP status code 500), client: unix:, server: , request: "POST /pub/cpuload?buffer_length=1 HTTP/1.1", host: "localhost"
Mar 25 19:23:49 Northolt nginx: 2018/03/25 19:23:49 [error] 7004#7004: MEMSTORE:00: can't create shared message for channel /cpuload
Mar 25 19:31:26 Northolt bunker: Verify task for disk3 finished, duration: 13 hr, 31 min, 25 sec.

They are repeated over and over and began during a scheduled run of the Dynamix File Integrity plugin to check the contents of disk3. Since the end of that task they have stopped appearing. It looks as though something is failing to allocate memory.

northolt-diagnostics-20180325-1948.zip

EDIT: I found this from two years ago but it was supposedly fixed.

Edited April 16, 2018 by John_M: Found a reference, marked as solved
John_M (Author) Posted March 27, 2018

It did a similar thing yesterday while idle. It spewed the error messages for less than a minute (from 17:08:16 to 17:08:37) and then stopped. I haven't rebooted. I'll see if it does it again, to try to determine whether there's a pattern.

northolt-diagnostics-20180327-1133.zip
shaunsund Posted March 27, 2018

I'll post my diags after OOM messages. I like nginx, but the memory handling/configuration seems to need some work.

unzorg-diagnostics-20180327-0741.zip
John_M (Author) Posted March 27, 2018

25 minutes ago, shaunsund said: "I'll post my diags after OOM messages. I like nginx, but the memory handling/configuration seems to need some work."

I don't see any similarities with my case at all. You had an OOM - it looks as though python triggered it - and nginx got reaped. I'm seeing nginx/nchan allocation failures and not a single OOM. There are things you can try to avoid the OOMs, or you might want to start your own thread.
John_M (Author) Posted April 1, 2018

I'm now seeing this on my other two servers during the monthly parity check:

Apr 1 14:42:03 Mandaue nginx: 2018/04/01 14:42:03 [crit] 13132#13132: ngx_slab_alloc() failed: no memory
Apr 1 14:42:03 Mandaue nginx: 2018/04/01 14:42:03 [error] 13132#13132: shpool alloc failed
Apr 1 14:42:03 Mandaue nginx: 2018/04/01 14:42:03 [error] 13132#13132: nchan: Out of shared memory while allocating message of size 3181. Increase nchan_max_reserved_memory.
Apr 1 14:42:03 Mandaue nginx: 2018/04/01 14:42:03 [error] 13132#13132: *280770 nchan: error publishing message (HTTP status code 500), client: unix:, server: , request: "POST /pub/var?buffer_length=1 HTTP/1.1", host: "localhost"
Apr 1 14:42:03 Mandaue nginx: 2018/04/01 14:42:03 [error] 13132#13132: MEMSTORE:00: can't create shared message for channel /var
Apr 1 14:42:03 Mandaue nginx: 2018/04/01 14:42:03 [crit] 13132#13132: ngx_slab_alloc() failed: no memory
Apr 1 14:42:03 Mandaue nginx: 2018/04/01 14:42:03 [error] 13132#13132: shpool alloc failed
Apr 1 14:42:03 Mandaue nginx: 2018/04/01 14:42:03 [error] 13132#13132: nchan: Out of shared memory while allocating message of size 8911. Increase nchan_max_reserved_memory.
Apr 1 14:42:03 Mandaue nginx: 2018/04/01 14:42:03 [error] 13132#13132: *280771 nchan: error publishing message (HTTP status code 500), client: unix:, server: , request: "POST /pub/disks?buffer_length=1 HTTP/1.1", host: "localhost"
Apr 1 14:42:03 Mandaue nginx: 2018/04/01 14:42:03 [error] 13132#13132: MEMSTORE:00: can't create shared message for channel /disks

and

Apr 1 10:02:49 Lapulapu nginx: 2018/04/01 10:02:49 [crit] 7019#7019: ngx_slab_alloc() failed: no memory
Apr 1 10:02:49 Lapulapu nginx: 2018/04/01 10:02:49 [error] 7019#7019: shpool alloc failed
Apr 1 10:02:49 Lapulapu nginx: 2018/04/01 10:02:49 [error] 7019#7019: nchan: Out of shared memory while allocating message of size 3177. Increase nchan_max_reserved_memory.
Apr 1 10:02:49 Lapulapu nginx: 2018/04/01 10:02:49 [error] 7019#7019: *1328700 nchan: error publishing message (HTTP status code 500), client: unix:, server: , request: "POST /pub/var?buffer_length=1 HTTP/1.1", host: "localhost"
Apr 1 10:02:49 Lapulapu nginx: 2018/04/01 10:02:49 [error] 7019#7019: MEMSTORE:00: can't create shared message for channel /var
Apr 1 10:02:49 Lapulapu nginx: 2018/04/01 10:02:49 [crit] 7019#7019: ngx_slab_alloc() failed: no memory
Apr 1 10:02:49 Lapulapu nginx: 2018/04/01 10:02:49 [error] 7019#7019: shpool alloc failed
Apr 1 10:02:49 Lapulapu nginx: 2018/04/01 10:02:49 [error] 7019#7019: nchan: Out of shared memory while allocating message of size 5178. Increase nchan_max_reserved_memory.
Apr 1 10:02:49 Lapulapu nginx: 2018/04/01 10:02:49 [error] 7019#7019: *1328701 nchan: error publishing message (HTTP status code 500), client: unix:, server: , request: "POST /pub/disks?buffer_length=1 HTTP/1.1", host: "localhost"
Apr 1 10:02:49 Lapulapu nginx: 2018/04/01 10:02:49 [error] 7019#7019: MEMSTORE:00: can't create shared message for channel /disks

It seems to be associated with periods of heavy disk activity. Am I really the only one seeing this? Diagnostics in OP.
John_M (Author) Posted April 1, 2018 (edited)

Just to confirm, the third server is doing the same during its parity check:

Apr 1 14:46:15 Northolt nginx: 2018/04/01 14:46:15 [crit] 7004#7004: ngx_slab_alloc() failed: no memory
Apr 1 14:46:15 Northolt nginx: 2018/04/01 14:46:15 [error] 7004#7004: shpool alloc failed
Apr 1 14:46:15 Northolt nginx: 2018/04/01 14:46:15 [error] 7004#7004: nchan: Out of shared memory while allocating message of size 3181. Increase nchan_max_reserved_memory.
Apr 1 14:46:15 Northolt nginx: 2018/04/01 14:46:15 [error] 7004#7004: *1542554 nchan: error publishing message (HTTP status code 500), client: unix:, server: , request: "POST /pub/var?buffer_length=1 HTTP/1.1", host: "localhost"
Apr 1 14:46:15 Northolt nginx: 2018/04/01 14:46:15 [error] 7004#7004: MEMSTORE:00: can't create shared message for channel /var
Apr 1 14:46:15 Northolt nginx: 2018/04/01 14:46:15 [crit] 7004#7004: ngx_slab_alloc() failed: no memory
Apr 1 14:46:15 Northolt nginx: 2018/04/01 14:46:15 [error] 7004#7004: shpool alloc failed
Apr 1 14:46:15 Northolt nginx: 2018/04/01 14:46:15 [error] 7004#7004: nchan: Out of shared memory while allocating message of size 5233. Increase nchan_max_reserved_memory.
Apr 1 14:46:15 Northolt nginx: 2018/04/01 14:46:15 [error] 7004#7004: *1542555 nchan: error publishing message (HTTP status code 500), client: unix:, server: , request: "POST /pub/disks?buffer_length=1 HTTP/1.1", host: "localhost"
Apr 1 14:46:15 Northolt nginx: 2018/04/01 14:46:15 [error] 7004#7004: MEMSTORE:00: can't create shared message for channel /disks

In each case this is while the parity check is in progress. Does anyone know what "channel /disks" and "channel /var" refer to? In my OP it was "channel /cpuload".

Edited April 1, 2018 by John_M: Added the fact that it's during parity check
John_M (Author) Posted April 1, 2018

The repeated error messages stopped once the parity check was complete. Has anyone else seen them today? There's no indication in the GUI, only in the syslog.
John_M (Author) Posted April 1, 2018

Would anyone be prepared to look and confirm that they are not seeing similar entries in their syslog, please? I'm using 6.5.0 on two servers and 6.5.1-rc3 on the third. The error messages seem to appear during periods of high disk activity - today during a monthly parity check and last Sunday during a scheduled Dynamix File Integrity scan - and they repeat over and over.
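For anyone willing to check, a quick way to count these entries is to grep the syslog for the two distinctive error strings. This is only a sketch: the live log on unRAID is /var/log/syslog, but the example below runs against a small sample file so the command can be tried anywhere.

```shell
# Count nchan/nginx allocation failures in a syslog file.
# On unRAID you would point this at /var/log/syslog instead;
# the sample lines below are abbreviated for illustration.
logfile=$(mktemp)
cat > "$logfile" <<'EOF'
Apr 1 14:42:03 Mandaue nginx: [crit] ngx_slab_alloc() failed: no memory
Apr 1 14:42:03 Mandaue nginx: [error] nchan: Out of shared memory while allocating message of size 3181. Increase nchan_max_reserved_memory.
Apr 1 14:42:04 Mandaue kernel: md: recovery thread: check ...
EOF
grep -cE 'ngx_slab_alloc\(\) failed|nchan: Out of shared memory' "$logfile"   # prints 2
```

A count of zero means the server is not affected; an affected server during a parity check will show thousands.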
1812 Posted April 1, 2018

That error does not appear on my server running 6.5.1-rc3, nor on the one running 6.5.0.
John_M (Author) Posted April 1, 2018

1 hour ago, 1812 said: "That error does not appear on my server running 6.5.1-rc3, nor on the one running 6.5.0"

Thanks for checking, I appreciate it.
John_M (Author) Posted April 2, 2018

Inspired by this thread I updated the server that first showed the error to version 6.5.1-rc3 and restarted in Safe Mode. I then ran a non-correcting parity check and waited. After a while the errors started appearing in the syslog. I was beginning to think that it was caused by the SNMP plugin but that doesn't seem to be the case. I have no dockers running and VMs are disabled. The nginx worker process is using around 800 MB of RAM.

northolt-diagnostics-20180402-0207.zip
John_M (Author) Posted April 2, 2018

The parity check I started last night in Safe Mode completed successfully but my syslog is full of nginx-related errors just like the ones above. So the issue is not related to a plugin. It isn't related to a docker either - I had the docker service enabled but no dockers running. I have no VMs on this server and the VM service is permanently disabled. I've attached the diagnostics taken just after the parity check completed with zero errors.

northolt-diagnostics-20180402-1258.zip
John_M (Author) Posted April 8, 2018 (edited)

I updated one server to 6.5.1-rc4 in the hope it would fix the problem but it didn't. Diagnostics grabbed during a parity check.

mandaue-diagnostics-20180408-0246.zip

Edited April 8, 2018 by John_M: Added diagnostics
limetech Posted April 8, 2018

1 hour ago, John_M said: "I updated one server to 6.5.1-rc4 in the hope it would fix the problem but it didn't."

There are three topics going for this same issue and we have been trying to reproduce it without success. It seems you can reproduce this issue reliably, right? If so, please try adding this line in your 'go' file just before emhttp is started:

sed -i 's/$arg_buffer_length/1/g' /etc/rc.d/rc.nginx

Then reboot (sorry) and see if the issue goes away.
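For anyone curious what that sed command actually does: in a POSIX basic regular expression a $ that is not at the end of the pattern is literal, so the command replaces every literal occurrence of the nginx variable name $arg_buffer_length with the constant 1. A demonstration on a scratch file (the directive line below is hypothetical, not the actual contents of rc.nginx):

```shell
# Show the substitution on a scratch copy rather than the real rc.nginx.
# The sample directive is illustrative only.
sample=$(mktemp)
printf 'nchan_message_buffer_length $arg_buffer_length;\n' > "$sample"
sed -i 's/$arg_buffer_length/1/g' "$sample"
cat "$sample"   # prints: nchan_message_buffer_length 1;
```

In other words, any nchan directive that was reading a buffer length from the request's query string would be pinned to 1 instead.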
John_M (Author) Posted April 8, 2018

4 minutes ago, limetech said: "There are three topics going for this same issue & we have been trying to reproduce without success. Seems like you can reproduce this issue right? If so, please try adding this line in your 'go' file just before emhttp is started: sed -i 's/$arg_buffer_length/1/g' /etc/rc.d/rc.nginx - Then reboot (sorry) and see if issue goes away."

Thanks for the reply, Tom. I've searched and haven't found any other threads about this specific error spamming the syslog. It might well be linked with the problem some people are seeing where nginx uses an excessive amount (several gigabytes) of RAM, but that isn't what I'm seeing, nor has anyone else confirmed that they are seeing my particular problem. I can indeed reproduce my issue simply by running a parity check and waiting. So thanks for the suggestion. I'll edit my go file and reboot, start a non-correcting parity check and report back tomorrow. Thanks again.
John_M (Author) Posted April 8, 2018

@limetech The modification to the go file did not fix the problem. Errors from nginx started to spam the syslog about two hours after the start of a parity check. This time I'm getting three similar sets of five messages per second, relating to the "channels" /var, /disks and /cpuload, and the syslog rapidly fills up:

Apr 8 07:04:42 Mandaue nginx: 2018/04/08 07:04:42 [crit] 7210#7210: ngx_slab_alloc() failed: no memory
Apr 8 07:04:42 Mandaue nginx: 2018/04/08 07:04:42 [error] 7210#7210: shpool alloc failed
Apr 8 07:04:42 Mandaue nginx: 2018/04/08 07:04:42 [error] 7210#7210: nchan: Out of shared memory while allocating message of size 3181. Increase nchan_max_reserved_memory.
Apr 8 07:04:42 Mandaue nginx: 2018/04/08 07:04:42 [error] 7210#7210: *31760 nchan: error publishing message (HTTP status code 500), client: unix:, server: , request: "POST /pub/var?buffer_length=1 HTTP/1.1", host: "localhost"
Apr 8 07:04:42 Mandaue nginx: 2018/04/08 07:04:42 [error] 7210#7210: MEMSTORE:00: can't create shared message for channel /var
Apr 8 07:04:42 Mandaue nginx: 2018/04/08 07:04:42 [crit] 7210#7210: ngx_slab_alloc() failed: no memory
Apr 8 07:04:42 Mandaue nginx: 2018/04/08 07:04:42 [error] 7210#7210: shpool alloc failed
Apr 8 07:04:42 Mandaue nginx: 2018/04/08 07:04:42 [error] 7210#7210: nchan: Out of shared memory while allocating message of size 8888. Increase nchan_max_reserved_memory.
Apr 8 07:04:42 Mandaue nginx: 2018/04/08 07:04:42 [error] 7210#7210: *31761 nchan: error publishing message (HTTP status code 500), client: unix:, server: , request: "POST /pub/disks?buffer_length=1 HTTP/1.1", host: "localhost"
Apr 8 07:04:42 Mandaue nginx: 2018/04/08 07:04:42 [error] 7210#7210: MEMSTORE:00: can't create shared message for channel /disks
Apr 8 07:04:42 Mandaue nginx: 2018/04/08 07:04:42 [crit] 7210#7210: ngx_slab_alloc() failed: no memory
Apr 8 07:04:42 Mandaue nginx: 2018/04/08 07:04:42 [error] 7210#7210: shpool alloc failed
Apr 8 07:04:42 Mandaue nginx: 2018/04/08 07:04:42 [error] 7210#7210: nchan: Out of shared memory while allocating message of size 339. Increase nchan_max_reserved_memory.
Apr 8 07:04:42 Mandaue nginx: 2018/04/08 07:04:42 [error] 7210#7210: *31762 nchan: error publishing message (HTTP status code 500), client: unix:, server: , request: "POST /pub/cpuload?buffer_length=1 HTTP/1.1", host: "localhost"
Apr 8 07:04:42 Mandaue nginx: 2018/04/08 07:04:42 [error] 7210#7210: MEMSTORE:00: can't create shared message for channel /cpuload

I'll keep one server on 6.5.1-rc4 for experimenting with, but I'm going to have to move the others back to 6.4.1. Is there anything else I could try? I have already tried Safe Mode. Diagnostics grabbed during parity check.

mandaue-diagnostics-20180408-1035.zip
John_M (Author) Posted April 8, 2018

I've tried searching the forum and the only reports I can find that mention these error messages (apart from a couple of false alarms) are written by me. It might be related to the high memory usage issue, but one person experiencing that issue has confirmed that he isn't seeing this one. I've tried Googling the error messages but I'm only seeing reports from 3 to 5 years ago. For example, a search for "ngx_slab_alloc() failed: no memory" reveals this from 2013 and this from 2016. What is interesting is that the same guy, Wandenberg, answers both questions, and in 2013 he said that there was a known issue with the nginx code, which surely must have been fixed by now. As part of his reply he gives a link but I really don't have a clue what they are talking about.

I have the issue on all three servers and it's 100% repeatable - just run a parity check and stand back - it takes a couple of hours for the error messages to start. That's the case for normal boot and Safe Mode, with no dockers running and no VMs. I first saw the problem with unRAID 6.5.0 and I've seen it with 6.5.1-rc3 and now -rc4. I don't believe I can be alone, but so far nobody has confirmed that they have the problem too. If you don't check your syslog you might well not notice, as the system continues to function, though it certainly affects performance.
limetech Posted April 8, 2018

7 hours ago, John_M said: "I have the issue on all three servers and it's 100% repeatable - just run a parity check and stand back - it takes a couple of hours for the error messages to start."

Please try 6.5.1-rc5 on the next branch. This brings nginx and nchan up to their latest versions, which will be necessary in order to correspond with the developers to help solve this problem. What would also be most helpful, if it doesn't make a difference in repeatability, is to run in Safe Mode just to be as vanilla as possible. I think this issue is related to the other nginx/nchan issues being reported, just manifesting a little differently. Thx for your help!
John_M (Author) Posted April 8, 2018

3 minutes ago, limetech said: "Please try 6.5.1-rc5 on the next branch. [...] I think this issue is related to the other nginx/nchan issues being reported, just manifesting a little differently. Thx for your help!"

Thanks Tom. I'm downloading it now. I'll restart in Safe Mode with no dockers or VMs and run a parity check again.
John_M (Author) Posted April 8, 2018

Updated to -rc5 and, before rebooting, commented out the sed command in my go file that @limetech asked me to try. Now running in Safe Mode. No dockers or VMs running. I kicked off another non-correcting parity check and I'll leave it alone for a few hours and then check on it.
limetech Posted April 8, 2018

1 hour ago, John_M said: "I commented out the sed command in my go file"

Right, you can get rid of that. It should have had absolutely no effect; I just wanted to confirm that.
John_M (Author) Posted April 8, 2018

3 hours ago, limetech said: "Right, you can get rid of that. It should have had absolutely no effect, just wanted to confirm that."

Thanks. I'll delete the line completely before my next reboot. The parity check has been running for 4 hours 40 minutes and the nginx error messages have not appeared this time. This is a definite improvement. I'll let it run its course and see how it stands tomorrow.
John_M (Author) Posted April 9, 2018

@limetech I'm afraid -rc5 doesn't fix the nginx errors. It took 6 hours before they started happening but they are back, thick and fast. I'll let the parity check continue while I sleep. Safe Mode diagnostics:

mandaue-diagnostics-20180409-0214.zip
John_M (Author) Posted April 9, 2018

Parity check completed without parity errors but with plenty of nginx errors. Is it possible to try increasing the nchan_max_reserved_memory parameter they mention? Safe Mode diagnostics after completion:

mandaue-diagnostics-20180409-0959.zip
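For reference, nchan_max_reserved_memory is the directive the error message itself points at: it sizes the shared-memory pool that ngx_slab_alloc() draws from when nchan stores published messages. A hedged sketch of what raising it might look like - the 64M figure and the placement in the http block are illustrative only, and the stock unRAID /etc/nginx configuration may lay this out differently:

```nginx
http {
    # Illustrative value only: enlarges the shared-memory pool nchan uses
    # for its message store. The size actually needed during a parity
    # check, and the stock unRAID default, are not known from this thread.
    nchan_max_reserved_memory 64M;

    # ... remainder of the stock configuration unchanged ...
}
```

Since unRAID regenerates its nginx configuration at boot, any such edit made directly to the running config would not survive a reboot; it would only be useful as a test.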
John_M (Author) Posted April 9, 2018

A different server, also running 6.5.1-rc5 but in normal mode with plugins and two dockers running plus a lightweight Observium VM, also produced the nginx errors during a parity check, but it too took longer for the messages to start appearing than with previous unRAID versions. Servers "lapulapu" and "mandaue" have similar hardware specs: consumer Asus motherboards, AMD A88X chipsets, socket FM2/FM2+ processors, 16 GB RAM, Dell H310 HBAs. Normal boot mode diagnostics after completion:

lapulapu-diagnostics-20180409-1235.zip