acbaldwi Posted April 17, 2020
Hello, I am getting some interesting errors. After a day or three of running, my shares drop and all my Docker containers stop working; the GUI is also buggy and unresponsive. The server has 72 GB of RAM and 49 TB of free space. The only way to fix it is to reboot the server. I am running CrashPlan as a Docker container, and it is trying to back up several TB of data. Could it be eating up all the memory? I am seeing these errors in the syslog:
2020-04-15 09:36:47 Local7.Alert 192.168.1.12 Apr 15 09:36:47 Argos nginx: 2020/04/15 09:36:47 [alert] 7394#7394: worker process 4099 exited on signal 6
2020-04-15 09:36:47 Local7.Critical 192.168.1.12 Apr 15 09:36:47 Argos nginx: 2020/04/15 09:36:47 [crit] 4100#4100: ngx_slab_alloc() failed: no memory
2020-04-15 09:36:47 Local7.Error 192.168.1.12 Apr 15 09:36:47 Argos nginx: 2020/04/15 09:36:47 [error] 4100#4100: shpool alloc failed
2020-04-15 09:36:47 Local7.Error 192.168.1.12 Apr 15 09:36:47 Argos nginx: 2020/04/15 09:36:47 [error] 4100#4100: nchan: Out of shared memory while allocating message of size 996. Increase nchan_max_reserved_memory.
2020-04-15 09:36:47 Local7.Error 192.168.1.12 Apr 15 09:36:47 Argos nginx: 2020/04/15 09:36:47 [error] 4100#4100: *1035724 nchan: error publishing message (HTTP status code 500), client: unix:, server: , request: "POST /pub/cpuload?buffer_length=1 HTTP/1.1", host: "localhost"
2020-04-15 09:36:47 Local7.Error 192.168.1.12 Apr 15 09:36:47 Argos nginx: 2020/04/15 09:36:47 [error] 4100#4100: MEMSTORE:00: can't create shared message for channel /cpuload
2020-04-15 09:36:48 Local7.Alert 192.168.1.12 Apr 15 09:36:48 Argos nginx: 2020/04/15 09:36:48 [alert] 7394#7394: worker process 4100 exited on signal 6
2020-04-15 09:36:48 Local7.Critical 192.168.1.12 Apr 15 09:36:48 Argos nginx: 2020/04/15 09:36:48 [crit] 4113#4113: ngx_slab_alloc() failed: no memory
2020-04-15 09:36:48 Local7.Error 192.168.1.12 Apr 15 09:36:48 Argos nginx: 2020/04/15 09:36:48 [error] 4113#4113: shpool alloc failed
2020-04-15 09:36:48 Local7.Error 192.168.1.12 Apr 15 09:36:48 Argos nginx: 2020/04/15 09:36:48 [error] 4113#4113: nchan: Out of shared memory while allocating message of size 995. Increase nchan_max_reserved_memory.
2020-04-15 09:36:48 Local7.Error 192.168.1.12 Apr 15 09:36:48 Argos nginx: 2020/04/15 09:36:48 [error] 4113#4113: *1035729 nchan: error publishing message (HTTP status code 500), client: unix:, server: , request: "POST /pub/cpuload?buffer_length=1 HTTP/1.1", host: "localhost"
2020-04-15 09:36:48 Local7.Error 192.168.1.12 Apr 15 09:36:48 Argos nginx: 2020/04/15 09:36:48 [error] 4113#4113: MEMSTORE:00: can't create shared message for channel /cpuload
2020-04-15 09:36:49 Local7.Alert 192.168.1.12 Apr 15 09:36:49 Argos nginx: 2020/04/15 09:36:49 [alert] 7394#7394: worker process 4113 exited on signal 6
2020-04-15 09:36:49 Local7.Alert 192.168.1.12 Apr 15 09:36:49 Argos nginx: 2020/04/15 09:36:49 [alert] 7394#7394: worker process 4114 exited on signal 6
2020-04-15 09:36:49 Local7.Critical 192.168.1.12 Apr 15 09:36:49 Argos nginx: 2020/04/15 09:36:49 [crit] 4115#4115: ngx_slab_alloc() failed: no memory
2020-04-15 09:36:49 Local7.Error 192.168.1.12 Apr 15 09:36:49 Argos nginx: 2020/04/15 09:36:49 [error] 4115#4115: shpool alloc failed
2020-04-15 09:36:49 Local7.Error 192.168.1.12 Apr 15 09:36:49 Argos nginx: 2020/04/15 09:36:49 [error] 4115#4115: nchan: Out of shared memory while allocating message of size 997. Increase nchan_max_reserved_memory.
2020-04-15 09:36:49 Local7.Error 192.168.1.12 Apr 15 09:36:49 Argos nginx: 2020/04/15 09:36:49 [error] 4115#4115: *1035752 nchan: error publishing message (HTTP status code 500), client: unix:, server: , request: "POST /pub/cpuload?buffer_length=1 HTTP/1.1", host: "localhost"
acbaldwi (Author) Posted April 17, 2020 (edited)
Also, as an FYI, I cannot run the normal diagnostics when this happens, since they freeze along with the server. The diagnostics taken after a reboot are attached below. In the log I found this disk error, even though the disk's status ball is still green (it is an SSD cache device):
Apr 16 23:59:20 Argos kernel: sd 3:0:2:0: [sdd] 976773168 512-byte logical blocks: (500 GB/466 GiB)
Apr 16 23:59:20 Argos kernel: sd 3:0:2:0: [sdd] Write Protect is off
Apr 16 23:59:20 Argos kernel: sd 3:0:2:0: [sdd] Mode Sense: 7f 00 10 08
Apr 16 23:59:20 Argos kernel: sd 3:0:2:0: [sdd] Write cache: enabled, read cache: enabled, supports DPO and FUA
Apr 16 23:59:20 Argos kernel: sdd: sdd1
Apr 16 23:59:20 Argos kernel: sd 3:0:2:0: [sdd] Attached SCSI disk
Apr 16 23:59:20 Argos kernel: BTRFS: device fsid f64d3fca-2954-4e54-a739-b3c62a96626e devid 1 transid 6078670 /dev/sdd1
Apr 17 00:00:25 Argos emhttpd: Samsung_SSD_860_EVO_500GB_S3Z1NB0KA81758X (sdd) 512 976773168
Apr 17 00:00:25 Argos emhttpd: import 30 cache device: (sdd) Samsung_SSD_860_EVO_500GB_S3Z1NB0KA81758X
Apr 17 00:00:30 Argos root: /usr/sbin/wsdd
Apr 17 00:02:57 Argos kernel: BTRFS info (device sdd1): disk space caching is enabled
Apr 17 00:02:57 Argos kernel: BTRFS info (device sdd1): has skinny extents
Apr 17 00:02:57 Argos kernel: BTRFS info (device sdd1): enabling ssd optimizations
Apr 17 00:02:57 Argos kernel: BTRFS info (device sdd1): resizing devid 1
Apr 17 00:02:57 Argos kernel: BTRFS info (device sdd1): new size for /dev/sdd1 is 500107829248
Apr 17 00:02:57 Argos kernel: BTRFS info (device sdd1): resizing devid 2
Apr 17 00:02:57 Argos kernel: BTRFS info (device sdd1): new size for /dev/sdb1 is 500107829248
Apr 17 00:03:04 Argos root: /usr/sbin/wsdd
Apr 17 00:30:45 Argos kernel: sd 3:0:2:0: [sdd] tag#152 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Apr 17 00:30:45 Argos kernel: sd 3:0:2:0: [sdd] tag#152 Sense Key : 0x5 [current]
Apr 17 00:30:45 Argos kernel: sd 3:0:2:0: [sdd] tag#152 ASC=0x21 ASCQ=0x0
Apr 17 00:30:45 Argos kernel: sd 3:0:2:0: [sdd] tag#152 CDB: opcode=0x42 42 00 00 00 00 00 00 00 18 00
Apr 17 00:30:45 Argos kernel: print_req_error: critical target error, dev sdd, sector 976773056
Apr 17 00:30:45 Argos kernel: BTRFS warning (device sdd1): failed to trim 1 device(s), last error -121
argos-diagnostics-20200417-0010.zip
Edited April 17, 2020 by acbaldwi (added info)
JorgeB Posted April 17, 2020
See here to enable syslog mirror to flash, then post that syslog when the shares fail again.
The cache pool is not redundant; see here to fix it.
acbaldwi (Author) Posted April 18, 2020
On 4/17/2020 at 2:44 AM, johnnie.black said: See here to enable syslog mirror to flash, then post that syslog when the shares fails again. Cache pool is not redundant, see here to fix.
Attached are the logs from my syslog server; sorry, they are quite big.
SyslogCatchAll-2020-04-16.zip
JorgeB Posted April 18, 2020
The file is too big, even for Notepad++. Please post the standard diagnostics so we can see if something is spamming the syslog.
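If a syslog is too large to open in an editor, one way to check whether something is spamming it is to count how often each message repeats. The script below is a minimal sketch, not an Unraid tool. It assumes a plain-text syslog with one message per line, and the file name `syslog_top.py` and the `top_messages` helper are hypothetical. Every run of digits is collapsed to `#`, so entries that differ only in timestamps, PIDs or sizes (like the repeated nginx/nchan errors in the first post) are grouped together.

```python
import re
import sys
from collections import Counter

# Collapse every run of digits so that lines differing only in timestamps,
# PIDs, request IDs or message sizes are counted as the same message.
DIGITS = re.compile(r"\d+")


def normalize(line: str) -> str:
    return DIGITS.sub("#", line.strip())


def top_messages(lines, n=10):
    # Count normalized messages, skipping blank lines, and return the n most frequent.
    counts = Counter(normalize(line) for line in lines if line.strip())
    return counts.most_common(n)


def main(path: str, n: int = 20) -> None:
    # Read the file line by line instead of loading it whole; only the
    # distinct normalized messages and their counts are kept in memory.
    with open(path, encoding="utf-8", errors="replace") as fh:
        for msg, count in top_messages(fh, n):
            print(f"{count:>10}  {msg[:200]}")


if __name__ == "__main__":
    main(sys.argv[1], int(sys.argv[2]) if len(sys.argv) > 2 else 20)
```

Usage, assuming the syslog has been extracted to a hypothetical path: `python3 syslog_top.py /path/to/syslog.txt 20`. A single message template with a count far above everything else is a likely spam source.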