
Random unresponsiveness after large data movements


TheSkaz


Sep  8 02:28:40 Tower nginx: 2021/09/08 02:28:40 [crit] 26160#26160: ngx_slab_alloc() failed: no memory
Sep  8 02:28:40 Tower nginx: 2021/09/08 02:28:40 [error] 26160#26160: shpool alloc failed
Sep  8 02:28:40 Tower nginx: 2021/09/08 02:28:40 [error] 26160#26160: nchan: Out of shared memory while allocating message of size 3559. Increase nchan_max_reserved_memory.
Sep  8 02:28:40 Tower nginx: 2021/09/08 02:28:40 [error] 26160#26160: *436728 nchan: error publishing message (HTTP status code 500), client: unix:, server: , request: "POST /pub/var?buffer_length=1 HTTP/1.1", host: "localhost"
Sep  8 02:28:40 Tower nginx: 2021/09/08 02:28:40 [error] 26160#26160: MEMSTORE:00: can't create shared message for channel /var
Sep  8 02:28:40 Tower nginx: 2021/09/08 02:28:40 [crit] 26160#26160: ngx_slab_alloc() failed: no memory
Sep  8 02:28:40 Tower nginx: 2021/09/08 02:28:40 [error] 26160#26160: shpool alloc failed
Sep  8 02:28:40 Tower nginx: 2021/09/08 02:28:40 [error] 26160#26160: nchan: Out of shared memory while allocating message of size 2892. Increase nchan_max_reserved_memory.
Sep  8 02:28:40 Tower nginx: 2021/09/08 02:28:40 [error] 26160#26160: *436730 nchan: error publishing message (HTTP status code 500), client: unix:, server: , request: "POST /pub/devs?buffer_length=1 HTTP/1.1", host: "localhost"
Sep  8 02:28:40 Tower nginx: 2021/09/08 02:28:40 [error] 26160#26160: MEMSTORE:00: can't create shared message for channel /devs
Sep  8 02:28:40 Tower nginx: 2021/09/08 02:28:40 [crit] 26160#26160: ngx_slab_alloc() failed: no memory
Sep  8 02:28:40 Tower nginx: 2021/09/08 02:28:40 [error] 26160#26160: shpool alloc failed
Sep  8 02:28:40 Tower nginx: 2021/09/08 02:28:40 [error] 26160#26160: nchan: Out of shared memory while allocating message of size 2289. Increase nchan_max_reserved_memory.
Sep  8 02:28:40 Tower nginx: 2021/09/08 02:28:40 [error] 26160#26160: *436731 nchan: error publishing message (HTTP status code 500), client: unix:, server: , request: "POST /pub/shares?buffer_length=1 HTTP/1.1", host: "localhost"
Sep  8 02:28:40 Tower nginx: 2021/09/08 02:28:40 [error] 26160#26160: MEMSTORE:00: can't create shared message for channel /shares
Sep  8 02:28:40 Tower nginx: 2021/09/08 02:28:40 [crit] 26160#26160: ngx_slab_alloc() failed: no memory
Sep  8 02:28:40 Tower nginx: 2021/09/08 02:28:40 [error] 26160#26160: shpool alloc failed
Sep  8 02:28:40 Tower nginx: 2021/09/08 02:28:40 [error] 26160#26160: nchan: Out of shared memory while allocating message of size 3233. Increase nchan_max_reserved_memory.
Sep  8 02:28:40 Tower nginx: 2021/09/08 02:28:40 [error] 26160#26160: *436732 nchan: error publishing message (HTTP status code 500), client: unix:, server: , request: "POST /pub/cpuload?buffer_length=1 HTTP/1.1", host: "localhost"
Sep  8 02:28:40 Tower nginx: 2021/09/08 02:28:40 [error] 26160#26160: MEMSTORE:00: can't create shared message for channel /cpuload

The system froze again. Here is something that I think is useful.
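To check whether these nchan "Out of shared memory" errors recur around each freeze, a minimal Python sketch can tally the channels that failed to publish and the largest message size that could not be allocated. It assumes the syslog line format shown in the log above; the function name summarize_nchan_failures is hypothetical.

```python
import re
from collections import Counter

# Matches lines such as:
#   ... MEMSTORE:00: can't create shared message for channel /var
CHANNEL_RE = re.compile(r"can't create shared message for channel (\S+)")
# Matches: "Out of shared memory while allocating message of size 3559."
SIZE_RE = re.compile(r"Out of shared memory while allocating message of size (\d+)")


def summarize_nchan_failures(lines):
    """Count failures per nchan channel and track the largest failed allocation.

    `lines` is any iterable of log lines (a list or an open file).
    Returns (Counter of channel -> failure count, largest size in bytes).
    """
    channels = Counter()
    largest = 0
    for line in lines:
        m = CHANNEL_RE.search(line)
        if m:
            channels[m.group(1)] += 1
        m = SIZE_RE.search(line)
        if m:
            largest = max(largest, int(m.group(1)))
    return channels, largest
```

For example, opening the system log (assumed here to be /var/log/syslog) and passing the file object to summarize_nchan_failures gives a per-channel count that can be compared between crashes.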


Those are ZFS pools that I created myself. The problem usually seems to occur when writing to those pools at 2TB/s or more, though I could be wrong. Upon reboot, the vmstorage and fast pools have corrupt files on them. I delete the files, scrub, and clean up the pools, and everything works again. I don't know whether the system crashes are causing the corruption or the corruption is causing the crashes.

 

fast is a RAIDZ array of 2TB NVMe drives.

vmstorage is a RAIDZ array of 240GB SSDs.
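The corrupt files that turn up after a reboot are normally listed by zpool status -v under a "Permanent errors" heading. Below is a minimal Python sketch that extracts that list from the command's text output, assuming the usual layout (a line starting with "errors: Permanent errors" followed by indented entries, with each pool's section starting with "pool:"). The function name permanent_error_files is hypothetical.

```python
def permanent_error_files(status_text: str) -> list[str]:
    """Return the entries listed under 'Permanent errors' in `zpool status -v` text.

    Assumes each pool section begins with a "pool:" line and that the damaged
    entries follow an "errors: Permanent errors ..." line, one per line.
    """
    files: list[str] = []
    collecting = False
    for raw in status_text.splitlines():
        line = raw.strip()
        if line.startswith("errors:"):
            # Only the "Permanent errors" variant is followed by a file list;
            # "errors: No known data errors" is not.
            collecting = line.startswith("errors: Permanent errors")
        elif line.startswith("pool:"):
            collecting = False  # the next pool's section begins
        elif collecting and line:
            files.append(line)
    return files
```

One way to obtain the input is subprocess.run(["zpool", "status", "-v", "fast"], capture_output=True, text=True).stdout. The sketch only reads the report; it does not delete anything, so the list can be reviewed before removing files and scrubbing again.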

20 hours ago, trurl said:

Is that using the ZFS plugin? Do you have any problems without those?

 

You can go directly to the correct support thread for any of your plugins by selecting its Support Link on the Plugins page.

I have been testing that. I have been writing only to the cache drive (a 1TB NVMe drive), and it is "more" stable, but it still crashes.

On 9/10/2021 at 9:56 AM, JorgeB said:

You should run memtest.

 

Memtest resulted in 0 errors. 

 

What I observed was that when RAM usage shot up (for example, when filling up a RAM drive), the "cached" RAM was not released fast enough, which caused the system to crash. ZFS also seems to be a culprit: having four Sabrent Gen4 2TB NVMe drives in a RAID0-equivalent array seems to cause issues at high write speeds. At the moment I think I have the RAM under control, and I am researching ZFS.

On 9/10/2021 at 9:56 AM, JorgeB said:

You should run memtest.

I was still having issues, so I am rerunning memtest: [screenshot 20210922_171730]

 

It takes about 24 hours for one pass. Is one pass enough, or do I need to let all the passes complete?

 

Also, I finally found the ZFS settings that are best for NVMe drives (as proposed by LTT) and have implemented them. I still get weird memory errors.

primarycache=metadata

autotrim=on

atime=off
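These three settings are not all applied the same way: autotrim is a pool property (set with zpool set), while primarycache and atime are dataset properties (set with zfs set, and inherited by child datasets when set on the pool's root dataset). A small Python sketch that builds and prints the matching commands for one pool without running them; the pool name "fast" comes from this thread, and tuning_commands is a hypothetical helper.

```python
import shlex

# Dataset-level properties from the post (applied with "zfs set").
DATASET_PROPS = {"primarycache": "metadata", "atime": "off"}
# autotrim is a pool-level property, so it is applied with "zpool set".
POOL_PROPS = {"autotrim": "on"}


def tuning_commands(pool: str) -> list[list[str]]:
    """Build (but do not run) the commands that apply the settings to one pool."""
    cmds = [["zfs", "set", f"{k}={v}", pool] for k, v in DATASET_PROPS.items()]
    cmds += [["zpool", "set", f"{k}={v}", pool] for k, v in POOL_PROPS.items()]
    return cmds


# Print the commands for review; nothing is executed here.
for cmd in tuning_commands("fast"):
    print(shlex.join(cmd))
```

Printing rather than executing keeps the sketch safe to run; each list could be passed to subprocess.run once the commands have been checked.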
