
unraid shares drop dockers fail



Hello,

 

I am getting some interesting errors: after a day or three of running, my shares drop and all my Docker containers stop working. The GUI also becomes buggy and unresponsive. The server has 72 GB of RAM and 49 TB of free space.

 

The only way to fix it is by rebooting the server

 

I am running CrashPlan as a Docker container, and it is trying to back up several TB of data. Could it be eating up all the memory?

 

 

I am seeing these errors in the syslog:

 

2020-04-15 09:36:47    Local7.Alert    192.168.1.12    Apr 15 09:36:47 Argos nginx: 2020/04/15 09:36:47 [alert] 7394#7394: worker process 4099 exited on signal 6
2020-04-15 09:36:47    Local7.Critical    192.168.1.12    Apr 15 09:36:47 Argos nginx: 2020/04/15 09:36:47 [crit] 4100#4100: ngx_slab_alloc() failed: no memory
2020-04-15 09:36:47    Local7.Error    192.168.1.12    Apr 15 09:36:47 Argos nginx: 2020/04/15 09:36:47 [error] 4100#4100: shpool alloc failed
2020-04-15 09:36:47    Local7.Error    192.168.1.12    Apr 15 09:36:47 Argos nginx: 2020/04/15 09:36:47 [error] 4100#4100: nchan: Out of shared memory while allocating message of size 996. Increase nchan_max_reserved_memory.
2020-04-15 09:36:47    Local7.Error    192.168.1.12    Apr 15 09:36:47 Argos nginx: 2020/04/15 09:36:47 [error] 4100#4100: *1035724 nchan: error publishing message (HTTP status code 500), client: unix:, server: , request: "POST /pub/cpuload?buffer_length=1 HTTP/1.1", host: "localhost"
2020-04-15 09:36:47    Local7.Error    192.168.1.12    Apr 15 09:36:47 Argos nginx: 2020/04/15 09:36:47 [error] 4100#4100: MEMSTORE:00: can't create shared message for channel /cpuload
2020-04-15 09:36:48    Local7.Alert    192.168.1.12    Apr 15 09:36:48 Argos nginx: 2020/04/15 09:36:48 [alert] 7394#7394: worker process 4100 exited on signal 6
2020-04-15 09:36:48    Local7.Critical    192.168.1.12    Apr 15 09:36:48 Argos nginx: 2020/04/15 09:36:48 [crit] 4113#4113: ngx_slab_alloc() failed: no memory
2020-04-15 09:36:48    Local7.Error    192.168.1.12    Apr 15 09:36:48 Argos nginx: 2020/04/15 09:36:48 [error] 4113#4113: shpool alloc failed
2020-04-15 09:36:48    Local7.Error    192.168.1.12    Apr 15 09:36:48 Argos nginx: 2020/04/15 09:36:48 [error] 4113#4113: nchan: Out of shared memory while allocating message of size 995. Increase nchan_max_reserved_memory.
2020-04-15 09:36:48    Local7.Error    192.168.1.12    Apr 15 09:36:48 Argos nginx: 2020/04/15 09:36:48 [error] 4113#4113: *1035729 nchan: error publishing message (HTTP status code 500), client: unix:, server: , request: "POST /pub/cpuload?buffer_length=1 HTTP/1.1", host: "localhost"
2020-04-15 09:36:48    Local7.Error    192.168.1.12    Apr 15 09:36:48 Argos nginx: 2020/04/15 09:36:48 [error] 4113#4113: MEMSTORE:00: can't create shared message for channel /cpuload
2020-04-15 09:36:49    Local7.Alert    192.168.1.12    Apr 15 09:36:49 Argos nginx: 2020/04/15 09:36:49 [alert] 7394#7394: worker process 4113 exited on signal 6
2020-04-15 09:36:49    Local7.Alert    192.168.1.12    Apr 15 09:36:49 Argos nginx: 2020/04/15 09:36:49 [alert] 7394#7394: worker process 4114 exited on signal 6
2020-04-15 09:36:49    Local7.Critical    192.168.1.12    Apr 15 09:36:49 Argos nginx: 2020/04/15 09:36:49 [crit] 4115#4115: ngx_slab_alloc() failed: no memory
2020-04-15 09:36:49    Local7.Error    192.168.1.12    Apr 15 09:36:49 Argos nginx: 2020/04/15 09:36:49 [error] 4115#4115: shpool alloc failed
2020-04-15 09:36:49    Local7.Error    192.168.1.12    Apr 15 09:36:49 Argos nginx: 2020/04/15 09:36:49 [error] 4115#4115: nchan: Out of shared memory while allocating message of size 997. Increase nchan_max_reserved_memory.
2020-04-15 09:36:49    Local7.Error    192.168.1.12    Apr 15 09:36:49 Argos nginx: 2020/04/15 09:36:49 [error] 4115#4115: *1035752 nchan: error publishing message (HTTP status code 500), client: unix:, server: , request: "POST /pub/cpuload?buffer_length=1 HTTP/1.1", host: "localhost"
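In case it helps anyone skimming a long syslog like the one above, here is a rough Python sketch that groups the nginx lines by severity and message so the repeating pattern is easier to see. The regular expression and the `summarize` helper are my own assumptions based only on the line format shown in this post; they are not part of Unraid or nginx.

```python
import re
from collections import Counter

# Matches the nginx portion of a forwarded syslog line, for example:
# "... Argos nginx: 2020/04/15 09:36:47 [crit] 4100#4100: ngx_slab_alloc() failed: no memory"
# The pattern is an assumption derived from the lines quoted in this post.
NGINX_RE = re.compile(
    r"nginx: (?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) "
    r"\[(?P<level>\w+)\] (?P<pid>\d+)#\d+: (?P<msg>.*)$"
)


def summarize(lines):
    """Count nginx messages by (level, normalized message text)."""
    counts = Counter()
    for line in lines:
        match = NGINX_RE.search(line.rstrip())
        if not match:
            continue  # not an nginx line in the expected format
        msg = match.group("msg")
        # Drop the leading connection id such as "*1035724 ".
        msg = re.sub(r"^\*\d+ ", "", msg)
        # Replace every number with N so that lines differing only in
        # PIDs or message sizes fall into the same bucket.
        msg = re.sub(r"\d+", "N", msg)
        counts[(match.group("level"), msg)] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python summarize_nginx.py < syslog.txt
    for (level, msg), n in summarize(sys.stdin).most_common():
        print(f"{n:5d}  [{level}] {msg}")
```

Run against the excerpt above, this should show the same few messages (the slab allocation failure, the nchan shared memory message, and the worker exit) repeating about once per second.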
 

 

 

memory stats.PNG

syslog.PNG

outofdiskspace.PNG

dockerdrive.PNG

noshares.PNG

drivesaregreen.PNG

dockersarerunning.PNG


Also, as an FYI, I cannot run the normal diagnostics while this is happening, because they freeze along with the server. The logs from after a reboot are attached.

 

I also found this disk error, but the disk's status ball is green (it is an SSD cache drive):

Apr 16 23:59:20 Argos kernel: sd 3:0:2:0: [sdd] 976773168 512-byte logical blocks: (500 GB/466 GiB)
Apr 16 23:59:20 Argos kernel: sd 3:0:2:0: [sdd] Write Protect is off
Apr 16 23:59:20 Argos kernel: sd 3:0:2:0: [sdd] Mode Sense: 7f 00 10 08
Apr 16 23:59:20 Argos kernel: sd 3:0:2:0: [sdd] Write cache: enabled, read cache: enabled, supports DPO and FUA
Apr 16 23:59:20 Argos kernel: sdd: sdd1
Apr 16 23:59:20 Argos kernel: sd 3:0:2:0: [sdd] Attached SCSI disk
Apr 16 23:59:20 Argos kernel: BTRFS: device fsid f64d3fca-2954-4e54-a739-b3c62a96626e devid 1 transid 6078670 /dev/sdd1
Apr 17 00:00:25 Argos emhttpd: Samsung_SSD_860_EVO_500GB_S3Z1NB0KA81758X (sdd) 512 976773168
Apr 17 00:00:25 Argos emhttpd: import 30 cache device: (sdd) Samsung_SSD_860_EVO_500GB_S3Z1NB0KA81758X
Apr 17 00:00:30 Argos root: /usr/sbin/wsdd
Apr 17 00:02:57 Argos kernel: BTRFS info (device sdd1): disk space caching is enabled
Apr 17 00:02:57 Argos kernel: BTRFS info (device sdd1): has skinny extents
Apr 17 00:02:57 Argos kernel: BTRFS info (device sdd1): enabling ssd optimizations
Apr 17 00:02:57 Argos kernel: BTRFS info (device sdd1): resizing devid 1
Apr 17 00:02:57 Argos kernel: BTRFS info (device sdd1): new size for /dev/sdd1 is 500107829248
Apr 17 00:02:57 Argos kernel: BTRFS info (device sdd1): resizing devid 2
Apr 17 00:02:57 Argos kernel: BTRFS info (device sdd1): new size for /dev/sdb1 is 500107829248
Apr 17 00:03:04 Argos root: /usr/sbin/wsdd
Apr 17 00:30:45 Argos kernel: sd 3:0:2:0: [sdd] tag#152 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Apr 17 00:30:45 Argos kernel: sd 3:0:2:0: [sdd] tag#152 Sense Key : 0x5 [current]
Apr 17 00:30:45 Argos kernel: sd 3:0:2:0: [sdd] tag#152 ASC=0x21 ASCQ=0x0
Apr 17 00:30:45 Argos kernel: sd 3:0:2:0: [sdd] tag#152 CDB: opcode=0x42 42 00 00 00 00 00 00 00 18 00
Apr 17 00:30:45 Argos kernel: print_req_error: critical target error, dev sdd, sector 976773056
Apr 17 00:30:45 Argos kernel: BTRFS warning (device sdd1): failed to trim 1 device(s), last error -121
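For anyone reading the kernel lines above, here is a rough Python sketch that decodes the SCSI fields and checks how close the failed sector is to the end of the device. The lookup tables only cover the values that appear in this log (sense key 0x5, ASC/ASCQ 0x21/0x00, opcode 0x42), with meanings taken from the SCSI standards; the `decode_scsi_error` helper is hypothetical and not part of any tool.

```python
import re

# Only the values seen in the log above are listed; meanings follow the
# SCSI SPC/SBC standards. Anything else is reported as "unknown".
SENSE_KEYS = {0x5: "ILLEGAL REQUEST"}
ASC_ASCQ = {(0x21, 0x00): "LOGICAL BLOCK ADDRESS OUT OF RANGE"}
OPCODES = {0x42: "UNMAP"}


def decode_scsi_error(lines):
    """Extract and decode the SCSI error fields from kernel log lines."""
    result = {}
    for line in lines:
        m = re.search(r"Sense Key : (0x[0-9a-fA-F]+)", line)
        if m:
            result["sense_key"] = SENSE_KEYS.get(int(m.group(1), 16), "unknown")
        m = re.search(r"ASC=(0x[0-9a-fA-F]+) ASCQ=(0x[0-9a-fA-F]+)", line)
        if m:
            pair = (int(m.group(1), 16), int(m.group(2), 16))
            result["asc_ascq"] = ASC_ASCQ.get(pair, "unknown")
        m = re.search(r"CDB: opcode=(0x[0-9a-fA-F]+)", line)
        if m:
            result["opcode"] = OPCODES.get(int(m.group(1), 16), "unknown")
        m = re.search(r"(\d+) 512-byte logical blocks", line)
        if m:
            result["total_sectors"] = int(m.group(1))
        m = re.search(r"dev \w+, sector (\d+)", line)
        if m:
            result["failed_sector"] = int(m.group(1))
    # Distance between the failed sector and the end of the device.
    if "total_sectors" in result and "failed_sector" in result:
        result["sectors_from_end"] = (
            result["total_sectors"] - result["failed_sector"]
        )
    return result
```

Fed the lines above, this decodes the failed command as an UNMAP (the SCSI form of a TRIM/discard) rejected as an illegal request for an out-of-range block address, on a request starting very close to the end of the device. My reading is that this points at a rejected TRIM rather than a read or write failure, which would fit the green status, but that is an interpretation and not something the log states outright.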

argos-diagnostics-20200417-0010.zip

Edited by acbaldwi
added info