TheSkaz Posted September 7, 2021

My server freezes and requires a hard reset at random times. It seems that creating/moving large data sets triggers this. I'll attempt to see if I can recreate it now that I have syslog writing to flash.

tower-diagnostics-20210907-0907.zip
TheSkaz Posted September 8, 2021

Sep 8 02:28:40 Tower nginx: 2021/09/08 02:28:40 [crit] 26160#26160: ngx_slab_alloc() failed: no memory
Sep 8 02:28:40 Tower nginx: 2021/09/08 02:28:40 [error] 26160#26160: shpool alloc failed
Sep 8 02:28:40 Tower nginx: 2021/09/08 02:28:40 [error] 26160#26160: nchan: Out of shared memory while allocating message of size 3559. Increase nchan_max_reserved_memory.
Sep 8 02:28:40 Tower nginx: 2021/09/08 02:28:40 [error] 26160#26160: *436728 nchan: error publishing message (HTTP status code 500), client: unix:, server: , request: "POST /pub/var?buffer_length=1 HTTP/1.1", host: "localhost"
Sep 8 02:28:40 Tower nginx: 2021/09/08 02:28:40 [error] 26160#26160: MEMSTORE:00: can't create shared message for channel /var
Sep 8 02:28:40 Tower nginx: 2021/09/08 02:28:40 [crit] 26160#26160: ngx_slab_alloc() failed: no memory
Sep 8 02:28:40 Tower nginx: 2021/09/08 02:28:40 [error] 26160#26160: shpool alloc failed
Sep 8 02:28:40 Tower nginx: 2021/09/08 02:28:40 [error] 26160#26160: nchan: Out of shared memory while allocating message of size 2892. Increase nchan_max_reserved_memory.
Sep 8 02:28:40 Tower nginx: 2021/09/08 02:28:40 [error] 26160#26160: *436730 nchan: error publishing message (HTTP status code 500), client: unix:, server: , request: "POST /pub/devs?buffer_length=1 HTTP/1.1", host: "localhost"
Sep 8 02:28:40 Tower nginx: 2021/09/08 02:28:40 [error] 26160#26160: MEMSTORE:00: can't create shared message for channel /devs
Sep 8 02:28:40 Tower nginx: 2021/09/08 02:28:40 [crit] 26160#26160: ngx_slab_alloc() failed: no memory
Sep 8 02:28:40 Tower nginx: 2021/09/08 02:28:40 [error] 26160#26160: shpool alloc failed
Sep 8 02:28:40 Tower nginx: 2021/09/08 02:28:40 [error] 26160#26160: nchan: Out of shared memory while allocating message of size 2289. Increase nchan_max_reserved_memory.
Sep 8 02:28:40 Tower nginx: 2021/09/08 02:28:40 [error] 26160#26160: *436731 nchan: error publishing message (HTTP status code 500), client: unix:, server: , request: "POST /pub/shares?buffer_length=1 HTTP/1.1", host: "localhost"
Sep 8 02:28:40 Tower nginx: 2021/09/08 02:28:40 [error] 26160#26160: MEMSTORE:00: can't create shared message for channel /shares
Sep 8 02:28:40 Tower nginx: 2021/09/08 02:28:40 [crit] 26160#26160: ngx_slab_alloc() failed: no memory
Sep 8 02:28:40 Tower nginx: 2021/09/08 02:28:40 [error] 26160#26160: shpool alloc failed
Sep 8 02:28:40 Tower nginx: 2021/09/08 02:28:40 [error] 26160#26160: nchan: Out of shared memory while allocating message of size 3233. Increase nchan_max_reserved_memory.
Sep 8 02:28:40 Tower nginx: 2021/09/08 02:28:40 [error] 26160#26160: *436732 nchan: error publishing message (HTTP status code 500), client: unix:, server: , request: "POST /pub/cpuload?buffer_length=1 HTTP/1.1", host: "localhost"
Sep 8 02:28:40 Tower nginx: 2021/09/08 02:28:40 [error] 26160#26160: MEMSTORE:00: can't create shared message for channel /cpuload

System froze again. Here is something of use, I think.
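The nchan errors above mean the webGUI's nginx exhausted the shared-memory pool it uses to publish status updates; they are a symptom of memory pressure rather than its cause. As a sketch of a workaround only (the service-script path is the stock Unraid/Slackware location, and the `nchan_shared_memory_size` directive name comes from the nchan documentation, not from this thread):

```shell
# Restart the webGUI's nginx to reclaim nchan's shared-memory slab.
# This clears the symptom; it does not fix whatever is filling the pool.
/etc/rc.d/rc.nginx restart

# To enlarge the pool instead, the nchan docs describe a directive for the
# http{} block of the nginx config (64M here is an illustrative guess):
#   nchan_shared_memory_size 64M;
```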
ChatNoir Posted September 8, 2021

On 9/7/2021 at 5:17 PM, TheSkaz said:
"It seems that creating/moving large data sets triggers this."

How are you moving/creating this data? Is it through the network or within Unraid?
trurl Posted September 8, 2021

df in your diagnostics shows a few mounts that aren't ones Unraid would normally create:

Filesystem  Size  Used  Avail  Use%  Mounted on
datastore   3.6T  2.3G  3.6T   1%    /datastore
vmstorage   1.1T  916G  159G   86%   /vmstorage
fast        7.1T  492G  6.6T   7%    /fast

so I assume you must have created these yourself. Are they involved in your problem?
TheSkaz Posted September 8, 2021

Those are ZFS pools that I did create myself. Usually it's writing to those pools at 2TB/s or greater that seems to cause this, though I could be wrong. Upon reboot, the vmstorage and fast pools have corrupt files on them. I delete the files, scrub, and clean the pools, and we are good to go. I don't know if the system crashing is causing the corruption, or the corruption is causing the system to crash.

fast is a raidz array of 2TB NVMe drives.
vmstorage is a raidz array of 240GB SSD drives.
TheSkaz Posted September 8, 2021

28 minutes ago, ChatNoir said:
"How are you moving/creating this data? Is it through the network or within Unraid?"

Within a Docker container or VM in Unraid.
ChatNoir Posted September 8, 2021

3 hours ago, TheSkaz said:
"Within a Docker container or VM in Unraid."

Since your log extract mentions out-of-memory issues, are you sure you are not using a wrong path somewhere and writing your data to memory instead, thus crashing the server?
TheSkaz Posted September 8, 2021

That may be possible... let me see.

On a different front, it crashed again. What happens after a crash is: I reboot, get a kernel panic, reboot again, and then it boots up. This happens consistently.
TheSkaz Posted September 8, 2021

It did it again. This time I have 0 Docker containers running and the VM service stopped, yet it is showing that I am using 85-ish GB of RAM...

tower-diagnostics-20210908-1451.zip
TheSkaz Posted September 8, 2021

Looking at the processes shows 0% memory usage across all 2367 processes.
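A kernel-side consumer such as the page cache or the ZFS ARC does not show up in per-process memory figures, which would explain ~85 GB "used" with 0% across all processes. One way to check, as a sketch using only standard /proc/meminfo fields (nothing Unraid-specific):

```shell
# Compare total, free, and *available* memory. MemAvailable includes
# reclaimable caches, so a large gap between MemFree and MemAvailable
# means the RAM is cache, not leaked process memory.
awk '/^MemTotal|^MemFree|^MemAvailable|^Cached/ {print $1, $2, $3}' /proc/meminfo
```

If MemAvailable is also low while no process accounts for the usage, the memory is pinned by something outside the page cache, such as the ARC or a tmpfs.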
TheSkaz Posted September 9, 2021

After a ton of googling, I ran echo 3 > /proc/sys/vm/drop_caches this morning, and that cleared up a lot of the RAM. The system still crashed overnight.
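For reference, a sketch of that cache-drop sequence with the usually-recommended sync first (root required; it discards only clean caches, so it is non-destructive but temporarily hurts read performance):

```shell
sync                               # flush dirty pages to disk first
echo 3 > /proc/sys/vm/drop_caches  # 3 = page cache plus dentries and inodes
```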
trurl Posted September 9, 2021

On 9/8/2021 at 11:52 AM, TheSkaz said:
"ZFS pool"

Is that using the ZFS plugin? Do you have any problems without those pools? You can go directly to the correct support thread for any of your plugins by selecting its Support Link on the Plugins page.
TheSkaz Posted September 10, 2021

20 hours ago, trurl said:
"Is that using the ZFS plugin? Do you have any problems without those?"

I have been testing that. I have been doing writes only to the cache drive (a 1TB NVMe drive) and it's "more" stable, but it will still crash.
JorgeB Posted September 10, 2021

You should run memtest.
TheSkaz Posted September 16, 2021

On 9/10/2021 at 9:56 AM, JorgeB said:
"You should run memtest."

Memtest resulted in 0 errors. What I observed was that if RAM usage shot up (like filling a ramdrive or similar), the "cached RAM" would not release fast enough, causing a system crash. ZFS also seems to be a culprit: effectively having 4 Sabrent Gen4 2TB NVMe drives in a RAID0-equivalent array seems to cause issues at high write speeds. At the moment I think I have the RAM under control, and I am researching ZFS.
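If the "cached RAM" that will not release is the ZFS ARC (which by default may grow to roughly half of RAM and can be slow to shrink under sudden pressure), one common mitigation is to cap it. A sketch, assuming the ZFS plugin exposes the standard OpenZFS module parameter; the 32 GiB value is purely illustrative, not a recommendation from this thread:

```shell
# Cap the ZFS ARC at 32 GiB (value is in bytes). Takes effect immediately;
# add the line to a boot script (e.g. /boot/config/go on Unraid) to persist it.
echo 34359738368 > /sys/module/zfs/parameters/zfs_arc_max
```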
TheSkaz Posted September 22, 2021

On 9/10/2021 at 9:56 AM, JorgeB said:
"You should run memtest."

I was still having issues, and am rerunning memtest. It takes about 24 hours for 1 pass; is one pass good enough, or do I need to let all the passes complete? Also, I finally found the ZFS settings that are supposed to be best for NVMe drives (as proposed by LTT) and have them implemented, but I still get weird memory errors:

primarycache=metadata
autotrim=on
atime=off
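For reference, a sketch of applying those settings to the "fast" pool from earlier in this thread. Note that `primarycache` and `atime` are dataset properties (`zfs set`), while trimming is controlled by the pool property spelled `autotrim` (`zpool set`):

```shell
# Dataset properties: cache only metadata in ARC, skip access-time updates.
zfs set primarycache=metadata fast
zfs set atime=off fast
# Pool property: issue TRIM to the NVMe devices automatically as blocks free.
zpool set autotrim=on fast
```

Be aware that primarycache=metadata disables ARC caching of file data, which can hurt read performance; it mainly helps when the drives are faster than a cache hit is worth.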
TheSkaz Posted September 22, 2021 Author Share Posted September 22, 2021 side note, absolutely killing me that its using 1 core.... Quote Link to comment