September 7, 2021
My server freezes and requires a hard reset at random times. It seems that creating/moving large data sets triggers this. I'll attempt to see if I can recreate it now that I have syslog writing to flash.
tower-diagnostics-20210907-0907.zip
September 8, 2021 Author
Sep 8 02:28:40 Tower nginx: 2021/09/08 02:28:40 [crit] 26160#26160: ngx_slab_alloc() failed: no memory
Sep 8 02:28:40 Tower nginx: 2021/09/08 02:28:40 [error] 26160#26160: shpool alloc failed
Sep 8 02:28:40 Tower nginx: 2021/09/08 02:28:40 [error] 26160#26160: nchan: Out of shared memory while allocating message of size 3559. Increase nchan_max_reserved_memory.
Sep 8 02:28:40 Tower nginx: 2021/09/08 02:28:40 [error] 26160#26160: *436728 nchan: error publishing message (HTTP status code 500), client: unix:, server: , request: "POST /pub/var?buffer_length=1 HTTP/1.1", host: "localhost"
Sep 8 02:28:40 Tower nginx: 2021/09/08 02:28:40 [error] 26160#26160: MEMSTORE:00: can't create shared message for channel /var
Sep 8 02:28:40 Tower nginx: 2021/09/08 02:28:40 [crit] 26160#26160: ngx_slab_alloc() failed: no memory
Sep 8 02:28:40 Tower nginx: 2021/09/08 02:28:40 [error] 26160#26160: shpool alloc failed
Sep 8 02:28:40 Tower nginx: 2021/09/08 02:28:40 [error] 26160#26160: nchan: Out of shared memory while allocating message of size 2892. Increase nchan_max_reserved_memory.
Sep 8 02:28:40 Tower nginx: 2021/09/08 02:28:40 [error] 26160#26160: *436730 nchan: error publishing message (HTTP status code 500), client: unix:, server: , request: "POST /pub/devs?buffer_length=1 HTTP/1.1", host: "localhost"
Sep 8 02:28:40 Tower nginx: 2021/09/08 02:28:40 [error] 26160#26160: MEMSTORE:00: can't create shared message for channel /devs
Sep 8 02:28:40 Tower nginx: 2021/09/08 02:28:40 [crit] 26160#26160: ngx_slab_alloc() failed: no memory
Sep 8 02:28:40 Tower nginx: 2021/09/08 02:28:40 [error] 26160#26160: shpool alloc failed
Sep 8 02:28:40 Tower nginx: 2021/09/08 02:28:40 [error] 26160#26160: nchan: Out of shared memory while allocating message of size 2289. Increase nchan_max_reserved_memory.
Sep 8 02:28:40 Tower nginx: 2021/09/08 02:28:40 [error] 26160#26160: *436731 nchan: error publishing message (HTTP status code 500), client: unix:, server: , request: "POST /pub/shares?buffer_length=1 HTTP/1.1", host: "localhost"
Sep 8 02:28:40 Tower nginx: 2021/09/08 02:28:40 [error] 26160#26160: MEMSTORE:00: can't create shared message for channel /shares
Sep 8 02:28:40 Tower nginx: 2021/09/08 02:28:40 [crit] 26160#26160: ngx_slab_alloc() failed: no memory
Sep 8 02:28:40 Tower nginx: 2021/09/08 02:28:40 [error] 26160#26160: shpool alloc failed
Sep 8 02:28:40 Tower nginx: 2021/09/08 02:28:40 [error] 26160#26160: nchan: Out of shared memory while allocating message of size 3233. Increase nchan_max_reserved_memory.
Sep 8 02:28:40 Tower nginx: 2021/09/08 02:28:40 [error] 26160#26160: *436732 nchan: error publishing message (HTTP status code 500), client: unix:, server: , request: "POST /pub/cpuload?buffer_length=1 HTTP/1.1", host: "localhost"
Sep 8 02:28:40 Tower nginx: 2021/09/08 02:28:40 [error] 26160#26160: MEMSTORE:00: can't create shared message for channel /cpuload
System froze again. Here is something of use, I think.
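For anyone skimming a long syslog for the same pattern, the nchan errors above could be tallied with a short script. This is only a rough sketch: the function name summarize_nchan_errors and the regular expressions are illustrative assumptions based on the log lines quoted in this post, not anything shipped with Unraid or nginx.

```python
import re
from collections import Counter

# Matches the line naming the channel nchan could not publish to
# (pattern derived from the log lines quoted in this thread).
CHANNEL_RE = re.compile(r"MEMSTORE:\d+: can't create shared message for channel (\S+)")
# Matches the size of the message that could not be allocated.
SIZE_RE = re.compile(r"Out of shared memory while allocating message of size (\d+)")


def summarize_nchan_errors(log_text):
    """Return (failure count per channel, list of message sizes that failed)."""
    channels = Counter(CHANNEL_RE.findall(log_text))
    sizes = [int(size) for size in SIZE_RE.findall(log_text)]
    return channels, sizes


if __name__ == "__main__":
    # Two entries copied from the post; on a live system you would pass in
    # the contents of the syslog file instead.
    sample = (
        "Sep 8 02:28:40 Tower nginx: 2021/09/08 02:28:40 [error] 26160#26160: "
        "nchan: Out of shared memory while allocating message of size 3559. "
        "Increase nchan_max_reserved_memory.\n"
        "Sep 8 02:28:40 Tower nginx: 2021/09/08 02:28:40 [error] 26160#26160: "
        "MEMSTORE:00: can't create shared message for channel /var\n"
    )
    channels, sizes = summarize_nchan_errors(sample)
    for channel, count in channels.items():
        print(channel, count)
    print("failed message sizes:", sizes)
```

A steadily growing count across every channel would suggest the shared memory zone is exhausted in general rather than one channel misbehaving.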
September 8, 2021
On 9/7/2021 at 5:17 PM, TheSkaz said: It seems that creating/moving large data sets triggers this.
How are you moving/creating this data? Is it through the network or within Unraid?
September 8, 2021 Community Expert
df in your diagnostics shows a few mounts that aren't the usual ones Unraid would create:
Filesystem  Size  Used  Avail  Use%  Mounted on
datastore   3.6T  2.3G  3.6T   1%    /datastore
vmstorage   1.1T  916G  159G   86%   /vmstorage
fast        7.1T  492G  6.6T   7%    /fast
so I assume you must have done that yourself. Are these involved in your problem?
September 8, 2021 Author
Those are ZFS pools that I created myself. Usually it's writing to those pools at 2TB/s or greater that seems to cause this. I could be wrong. Upon reboot, the vmstorage and fast pools have corrupt files on them. I delete the files, scrub, and clean the pools, and we are good to go. I don't know if the system crashing is causing the corruption, or the corruption is causing the system to crash.
fast is a raidz array of 2TB NVMe drives.
vmstorage is a raidz array of 240GB SSD drives.
September 8, 2021 Author
28 minutes ago, ChatNoir said: How are you moving/creating this data? Is it through the network or within Unraid?
Within a docker or VM in Unraid.
September 8, 2021
3 hours ago, TheSkaz said: within a docker or vm in unraid
Since your log extract mentions out-of-memory issues, are you sure that you are not using a wrong path somehow, writing your data to memory instead and thus crashing the server?
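One way to check the wrong-path theory is to look up which mount actually backs a target path. The sketch below parses text in the /proc/mounts format and picks the longest matching mount point. The helper names (find_mount, is_ram_backed), the sample mount table, and the set of filesystem types treated as RAM-backed are assumptions made for illustration.

```python
import os

# Filesystem types assumed here to live in RAM (an illustrative list).
RAM_BACKED = {"tmpfs", "ramfs", "rootfs"}

# Hypothetical mount table in /proc/mounts format, for demonstration only.
SAMPLE_MOUNTS = """\
rootfs / rootfs rw 0 0
tmpfs /dev/shm tmpfs rw 0 0
fast /fast zfs rw 0 0
"""


def find_mount(path, mounts_text):
    """Return (mount_point, fstype) of the mount that contains path, or None."""
    path = os.path.normpath(path)
    best = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fstype = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            # Prefer the longest mount point; on a tie the later entry wins,
            # because a later mount shadows an earlier one.
            if best is None or len(mount_point) >= len(best[0]):
                best = (mount_point, fstype)
    return best


def is_ram_backed(path, mounts_text):
    """True if the mount containing path has a filesystem type in RAM_BACKED."""
    mount = find_mount(path, mounts_text)
    return mount is not None and mount[1] in RAM_BACKED


if __name__ == "__main__":
    # A mistyped path such as /fastt falls through to the root filesystem.
    for target in ("/fast/data/file.bin", "/fastt/data/file.bin"):
        print(target, find_mount(target, SAMPLE_MOUNTS), is_ram_backed(target, SAMPLE_MOUNTS))
```

On a real system the second argument would be the contents of /proc/mounts. Note that this simple parser does not decode escaped characters (such as spaces) in mount point names.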
September 8, 2021 Author
That may be possible... let me see. On a different front, it crashed again. What happens after a crash is that I reboot, get a kernel panic, reboot again, and it boots up: this happens consistently.
September 8, 2021 Author
Did it again. This time I have 0 dockers running and the VM service stopped. It is showing that I am using 85ish GB of RAM...
tower-diagnostics-20210908-1451.zip
September 8, 2021 Author
Looking at the process list, it shows 0% memory usage across all 2367 processes.
September 9, 2021 Author
After a ton of googling, I ran echo 3 > /proc/sys/vm/drop_caches this morning, and that cleared up a lot of the RAM. The system still crashed overnight.
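To compare memory before and after dropping caches, the kernel's own numbers in /proc/meminfo can be read directly. The following is a minimal sketch: parse_meminfo and cache_share are hypothetical helper names, and the sample values are made up for the demonstration. Only the field names (MemTotal, MemFree, Buffers, Cached) are standard /proc/meminfo fields.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB for most fields)."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def cache_share(values):
    """Fraction of total RAM reported as page cache plus buffers."""
    return (values["Cached"] + values["Buffers"]) / values["MemTotal"]


if __name__ == "__main__":
    # Made-up numbers; on a live system, read the text from /proc/meminfo
    # once before and once after dropping caches and compare the results.
    sample = (
        "MemTotal:       1000 kB\n"
        "MemFree:         300 kB\n"
        "Buffers:          50 kB\n"
        "Cached:          450 kB\n"
    )
    info = parse_meminfo(sample)
    print("cache share of RAM:", cache_share(info))
```

If the share barely changes after dropping caches while used RAM stays high, the memory is probably being held somewhere other than the page cache.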
September 9, 2021 Community Expert
On 9/8/2021 at 11:52 AM, TheSkaz said: ZFS pool
Is that using the ZFS plugin? Do you have any problems without those? You can go directly to the correct support thread for any of your plugins by selecting its Support Link on the Plugins page.
September 10, 2021 Author
20 hours ago, trurl said: Is that using the ZFS plugin? Do you have any problems without those? You can go directly to the correct support thread for any of your plugins by selecting its Support Link on the Plugins page.
I have been testing that. I have been doing writes only to the cache drive (a 1TB NVMe drive), and it's "more" stable but will still crash.
September 16, 2021 Author
On 9/10/2021 at 9:56 AM, JorgeB said: You should run memtest.
Memtest resulted in 0 errors. What I observed was that if RAM usage shot up (like filling up a ramdrive or similar), the "cached RAM" would not release fast enough, causing a system crash. Also, ZFS seems to be a culprit. Effectively having 4 Sabrent Gen4 2TB NVMe drives in a RAID0-equivalent array seems to cause issues with high write speeds. At the moment, I think I have the RAM under control, and I am researching ZFS.
September 22, 2021 Author
On 9/10/2021 at 9:56 AM, JorgeB said: You should run memtest.
I was still having issues and am rerunning memtest. It takes about 24 hours for 1 pass. Is one pass good enough, or do I need to let all the passes complete? Also, I finally found the ZFS settings that are best for NVMe drives (as proposed by LTT) and have them implemented. I still get weird memory errors.
primarycache=metadata
autotrim=on
atime=off
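For reference, in OpenZFS primarycache and atime are dataset properties (changed with zfs set), while autotrim is a pool property (changed with zpool set). The sketch below only builds and prints the command lines and does not run anything. The pool name fast is taken from earlier in the thread; applying the dataset properties to the pool's root dataset is an assumption, and build_commands is a hypothetical helper.

```python
# Settings quoted in the post, grouped by where OpenZFS expects them.
DATASET_PROPS = {"primarycache": "metadata", "atime": "off"}  # applied with `zfs set`
POOL_PROPS = {"autotrim": "on"}  # applied with `zpool set`


def build_commands(pool, dataset=None):
    """Return argv lists that would apply the settings; nothing is executed."""
    # Assumption: if no dataset is given, target the pool's root dataset.
    dataset = dataset or pool
    commands = [["zpool", "set", f"{key}={value}", pool] for key, value in POOL_PROPS.items()]
    commands += [["zfs", "set", f"{key}={value}", dataset] for key, value in DATASET_PROPS.items()]
    return commands


if __name__ == "__main__":
    for argv in build_commands("fast"):
        print(" ".join(argv))
```

Printing the commands first makes it easy to review them before running anything against a live pool.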