Server hanging, logs bloating, unable to download diagnostics

Ninjadude101 · August 10, 2021

Hi all,

I'm having a problem with my Unraid 6.9.2 server lately, whereby it will suddenly start filling up the logs (I am talking specifically about the "Log" value shown in the "Memory" section on the Dashboard page of the web ui). Seen here:

image.png.5276dd42e8638d50d11d2c687f770a41.png

I'll grant you, 27% isn't that bad, but this has happened before (at a guess about once a month), this morning it was at 10% or so and previously I haven't noticed until the Log was at 100%. Since I want to figure out why this is happening and put a stop to it, the last time it happened, I set up a local syslog server so that I could retain the logs after a reboot.

I would post diagnostics however trying to retrieve them via the web ui just hangs, seemingly forever.

I've had a quick look over the logs from my PC (trying to view them in the web ui also hangs and shows a white page), and I notice a lot of this:

Aug 10 12:40:15 Empress kernel: nginx[22272]: segfault at 0 ip 0000000000000000 sp 00007ffe39131ec8 error 14 in nginx[400000+22000]
Aug 10 12:40:15 Empress kernel: Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.
Aug 10 12:40:15 Empress nginx: 2021/08/10 12:40:15 [alert] 10570#10570: worker process 22272 exited on signal 11
Aug 10 12:40:15 Empress nginx: 2021/08/10 12:40:15 [crit] 22275#22275: ngx_slab_alloc() failed: no memory
Aug 10 12:40:15 Empress nginx: 2021/08/10 12:40:15 [error] 22275#22275: shpool alloc failed
Aug 10 12:40:15 Empress nginx: 2021/08/10 12:40:15 [error] 22275#22275: nchan: Out of shared memory while allocating channel /dockerload. Increase nchan_max_reserved_memory.
Aug 10 12:40:15 Empress nginx: 2021/08/10 12:40:15 [error] 22275#22275: *5505839 nchan: error publishing message (HTTP status code 507), client: unix:, server: , request: "POST /pub/dockerload?buffer_length=0 HTTP/1.1", host: "localhost"
Aug 10 12:40:16 Empress nginx: 2021/08/10 12:40:16 [crit] 22275#22275: ngx_slab_alloc() failed: no memory
Aug 10 12:40:16 Empress nginx: 2021/08/10 12:40:16 [error] 22275#22275: shpool alloc failed
Aug 10 12:40:16 Empress nginx: 2021/08/10 12:40:16 [error] 22275#22275: nchan: Out of shared memory while allocating channel /cpuload. Increase nchan_max_reserved_memory.
Aug 10 12:40:16 Empress nginx: 2021/08/10 12:40:16 [error] 22275#22275: *5505840 nchan: error publishing message (HTTP status code 507), client: unix:, server: , request: "POST /pub/cpuload?buffer_length=1 HTTP/1.1", host: "localhost"
Aug 10 12:40:16 Empress nginx: 2021/08/10 12:40:16 [crit] 22275#22275: ngx_slab_alloc() failed: no memory
Aug 10 12:40:16 Empress nginx: 2021/08/10 12:40:16 [error] 22275#22275: shpool alloc failed
Aug 10 12:40:16 Empress nginx: 2021/08/10 12:40:16 [error] 22275#22275: nchan: Out of shared memory while allocating channel /disks. Increase nchan_max_reserved_memory.
Aug 10 12:40:16 Empress nginx: 2021/08/10 12:40:16 [error] 22275#22275: *5505841 nchan: error publishing message (HTTP status code 507), client: unix:, server: , request: "POST /pub/disks?buffer_length=1 HTTP/1.1", host: "localhost"
Aug 10 12:40:16 Empress nginx: 2021/08/10 12:40:16 [crit] 22275#22275: ngx_slab_alloc() failed: no memory
Aug 10 12:40:16 Empress nginx: 2021/08/10 12:40:16 [error] 22275#22275: shpool alloc failed
Aug 10 12:40:16 Empress nginx: 2021/08/10 12:40:16 [error] 22275#22275: nchan: Out of shared memory while allocating channel /shares. Increase nchan_max_reserved_memory.
Aug 10 12:40:16 Empress nginx: 2021/08/10 12:40:16 [error] 22275#22275: *5505842 nchan: error publishing message (HTTP status code 507), client: unix:, server: , request: "POST /pub/shares?buffer_length=1 HTTP/1.1", host: "localhost"
Aug 10 12:40:17 Empress nginx: 2021/08/10 12:40:17 [crit] 22275#22275: ngx_slab_alloc() failed: no memory
Aug 10 12:40:17 Empress nginx: 2021/08/10 12:40:17 [error] 22275#22275: shpool alloc failed
Aug 10 12:40:17 Empress nginx: 2021/08/10 12:40:17 [error] 22275#22275: nchan: Out of shared memory while allocating channel /cpuload. Increase nchan_max_reserved_memory.
Aug 10 12:40:17 Empress nginx: 2021/08/10 12:40:17 [error] 22275#22275: *5505843 nchan: error publishing message (HTTP status code 507), client: unix:, server: , request: "POST /pub/cpuload?buffer_length=1 HTTP/1.1", host: "localhost"
Aug 10 12:40:17 Empress nginx: 2021/08/10 12:40:17 [crit] 22275#22275: ngx_slab_alloc() failed: no memory
Aug 10 12:40:17 Empress nginx: 2021/08/10 12:40:17 [error] 22275#22275: shpool alloc failed
Aug 10 12:40:17 Empress nginx: 2021/08/10 12:40:17 [error] 22275#22275: nchan: Out of shared memory while allocating channel /var. Increase nchan_max_reserved_memory.
Aug 10 12:40:17 Empress nginx: 2021/08/10 12:40:17 [alert] 22275#22275: *5505844 header already sent while keepalive, client: 172.16.0.68, server: 0.0.0.0:80

I can't get into the terminal via the web ui (this also hangs), but I am doubtful that I have run out of memory as I have 32GB and it's usually only consuming about 10-15% of that.

How can I troubleshoot this? I have the docker service running, currently 18 containers are up. I have the VM service enabled however no VMs are currently running. The plugins I have installed are:

Auto Turbo Write Mode (Squid)
Community Applications (Squid)
Disk Location (Olehj)
Docker Folder (GuildDart)
Dynamix SSD Trim (Dynamix)
File Activity (Dlandon)
Fix Common Problems (Squid)
IPMI Tools (Dmacias)
NerdPack GUI (Dmacias)
Network Stats (Dorgan)
Nvidia Driver (Ich777)
Open Files (Dlandon)
SSH Config Tool (Docgyver) (Not sure if this one is enabled - it states incompatibility with this version of Unraid)
Theme Engine (Skitals)
Tips and Tweaks (Dlandon)
Unassigned Devices (Dlandon)
Unassigned Devices Plus (Dlandon)
unBALANCE (Jbrodriguez)
User Scripts (Squid)
Wake On Lan (Dmacias)

Looking forward to your reply

Dan

Ninjadude101 · August 10, 2021

Btw, in the ~30 minutes or so it's taken me to write the first post, the logs has grown to 37%.

trurl · August 10, 2021

1 minute ago, Ninjadude101 said:

NerdPack

What do you load from that?

Ninjadude101 · August 10, 2021

1 minute ago, trurl said:

What do you load from that?

Good question.

image.png.76a4bbec3dcf3f5d0bd24fc02e4baaa7.png

image.png.3bdd9c8b21e16170f2a9d83d9b0425ad.png

I'll be honest, I was surprised to see that many installed - I thought I had only used it for one or two things... Though it escapes me right now which of these I actually need.

Dan

trurl · August 10, 2021

Just wanted to make sure you weren't loading atop, that one is notorious for eating log space.

Reboot and post diagnostics just to give us a better idea of your system and configuration. Also

https://wiki.unraid.net/Manual/Troubleshooting#Persistent_Logs_.28Syslog_server.29

Ninjadude101 · August 10, 2021

14 minutes ago, trurl said:

Just wanted to make sure you weren't loading atop, that one is notorious for eating log space.

Reboot and post diagnostics just to give us a better idea of your system and configuration. Also

https://wiki.unraid.net/Manual/Troubleshooting#Persistent_Logs_.28Syslog_server.29

Ok, have restarted the server gracefully and the diagnostics are attached.

With regard to the syslog server, I have it set up as described under the "Local Syslog Server" section in the link you posted, albeit, the share is not cache-only (though I might change that, makes sense).

- EDIT -

Actually I would like to point out that the syslog rotation settings do not appear to be working as expected. Here are the settings regarding that:

image.png.24e04e80294466c696c13ec85fde9301.png

Yet, the single file in that share is currently ~31MB.

Thanks

Dan

empress-diagnostics-20210810-1447.zip

Edited August 10, 2021 by Ninjadude101

trurl · August 10, 2021

Not obviously what you have asked about, not sure if related or not yet.

Why have you given 80G to docker.img? 20G is usually more than enough and making it larger won't keep you from filling it, it will only make it take longer to fill. The usual reason for filling docker.img is an application writing to a path (case-sensitive) that isn't mapped.

Ninjadude101 · August 10, 2021

Just now, trurl said:

Not obviously what you have asked about, not sure if related or not yet.

Why have you given 80G to docker.img? 20G is usually more than enough and making it larger won't keep you from filling it, it will only make it take longer to fill. The usual reason for filling docker.img is an application writing to a path (case-sensitive) that isn't mapped.

Did that a while ago now, can't remember exactly why but I'm sure it had filled up and caused me headaches having to reinstall all the containers - I just wanted to avoid it happening again. If there is still a container filling it up, is there a way to determine which one?

Dan

trurl · August 10, 2021

Also looks like you had created a share named Downloads or downloads or something. It doesn't exist anymore but its settings do. And you specified a path somewhere, perhaps docker mapping, that created a share with default settings, with a similar name in different upper/lower case. Any folder at the top of pools or array is automatically a user share, whether you intentionally created it as a share or not, and will have default settings unless you change them.

Check each of your container applications (in the application itself) and make sure that any path you have specified corresponds to a container path in that docker's mappings, case-sensitive and not relative (must begin with /).

And, it looks like your appdata must have overflowed to disk6, which makes me wonder if you aren't downloading into that share by some application.

13 minutes ago, Ninjadude101 said:

If there is still a container filling it up, is there a way to determine which one?

On the Docker page, click Container Size at the bottom and post the results.

Ninjadude101 · August 10, 2021

20 minutes ago, trurl said:

Also looks like you had created a share named Downloads or downloads or something. It doesn't exist anymore but its settings do. And you specified a path somewhere, perhaps docker mapping, that created a share with default settings, with a similar name in different upper/lower case. Any folder at the top of pools or array is automatically a user share, whether you intentionally created it as a share or not, and will have default settings unless you change them.

Check each of your container applications (in the application itself) and make sure that any path you have specified corresponds to a container path in that docker's mappings, case-sensitive and not relative (must begin with /).

And, it looks like your appdata must have overflowed to disk6, which makes me wonder if you aren't downloading into that share by some application.

On the Docker page, click Container Size at the bottom and post the results.

Name                              Container     Writable          Log
---------------------------------------------------------------------
binhex-krusader                     1.92 GB      35.6 MB      57.6 kB
DiskSpeed                           1.23 GB          0 B      23.2 MB
jenkins-master                      1.05 GB          0 B      23.2 MB
Octoprint-A30                        907 MB      5.18 MB      18.8 MB
Octoprint-Prusa                      907 MB      5.06 MB      52.3 kB
Octoprint-E180                       902 MB          0 B      23.2 MB
Sonarr                               807 MB       178 MB      5.89 MB
plex                                 801 MB       138 MB       119 kB
Lidarr                               800 MB       457 MB      1.40 MB
LANCache                             624 MB      69.7 kB       384 kB
Draw.io                              534 MB          0 B      23.2 MB
flaresolverr                         440 MB      4.95 MB       465 kB
speedtest                            440 MB      82.6 kB      2.67 kB
Requestrr                            420 MB          0 B      23.2 MB
HL2RPTeaser                          383 MB      3.88 MB      1.85 kB
RGForum                              383 MB      3.81 MB      1.86 kB
bazarr                               380 MB      9.27 kB      11.6 MB
Jackett                              370 MB       109 MB      8.68 MB
Radarr                               358 MB      50.3 MB      1.88 MB
swag                                 357 MB          0 B      23.2 MB
RGDatabase                           352 MB          0 B      23.2 MB
ombi                                 289 MB          0 B      23.2 MB
trilium                              267 MB          0 B      23.2 MB
Deluge                               260 MB       304 kB      6.10 MB
QDirStat                             248 MB      23.6 kB      67.1 kB
nginx                                182 MB          0 B      23.2 MB
tautulli                             158 MB      38.8 MB      1.79 MB
DHCP                                 117 MB          0 B      23.2 MB
GM_BalanceTest                       117 MB      3.88 MB      11.6 kB
GM_BalanceTest_2                     117 MB      3.88 MB      12.4 kB
cadvisor                            69.6 MB          0 B      1.21 kB
OpenSpeedTest                       55.5 MB          0 B      23.2 MB
AdGuard-Home                        51.4 MB          0 B      23.2 MB
Authelia                            49.9 MB          0 B      23.2 MB
network-multitool                   38.1 MB          0 B      23.2 MB
unpackerr                           4.47 MB          0 B      1.44 MB
---------------------------------------------------------------------
Total size                          16.4 GB      1.04 GB       407 MB

I'm checking the likely candidates for that folder mapping issue - you are correct that I have a downloads share, at /mnt/user/downloads/ and all containers which are intentionally making use of it have it mapped internally to /downloads, but I will check the settings within the applications to see if I've made a typo somewhere...

I suspect it's either Lidarr, Radarr, or Sonarr to be honest, but the path mappings in the settings on those isn't amazingly organised.

These are all the ones which have that share mapped, or are somehow related to that share:

Thanks

Dan

trurl · August 10, 2021

Probably you created a user share named Downloads, cache-yes and exclude disk6, but aren't using it now, and the paths which specified downloads created a user share named downloads with default settings, cache-no and include all. For some reason that share does have files on cache though. Maybe they were on that old Downloads share and you moved them to downloads.

What does this look like now?

1 hour ago, Ninjadude101 said:

Ninjadude101 · August 10, 2021

9 minutes ago, trurl said:

Probably you created a user share named Downloads, cache-yes and exclude disk6, but aren't using it now, and the paths which specified downloads created a user share named downloads with default settings, cache-no and include all. For some reason that share does have files on cache though. Maybe they were on that old Downloads share and you moved them to downloads.

What does this look like now?

It looks like this now:

image.png.6a8807a54103635c4b08d9b4387754d4.png

Which is pretty much as expected - when this has happened before, the Log has always gone back to a reasonable size after a reboot, and will stay like that until it starts going crazy again (which may be a month from now).

I do have a downloads share, but not a Downloads share. I am not sure where you are seeing the Downloads share? I've checked with ls against all disks and the cache and can confirm that there does not appear to be a Downloads (capital D) folder in the root of any of them.

It's possible at one point in time I DID have a Downloads share but I must have changed the name some time ago.

The downloads share has free reign to use all disks in the array, and it is set to use the cache but move to the array when the mover runs.

Thanks

Dan

Edited August 10, 2021 by Ninjadude101

trurl · August 10, 2021

What do you get from the command line with this?

ls /boot/config/shares

Ninjadude101 · August 10, 2021

12 minutes ago, trurl said:
What do you get from the command line with this?
ls /boot/config/shares

image.png.e14ac383bd9a0bc0ca5cd750e4e22812.png

I guess I should rename that Downloads.cfg to downloads.cfg

Thanks

Dan

trurl · August 10, 2021

Post a screenshot of the settings for your downloads share.

Ninjadude101 · August 10, 2021

I notice at this point that the downloads share does NOT have free reign like I said - I am not sure why I excluded disk 6 but I will probably remove that.

I will also likely change the allocation method to high water, I didn't realise it was using fill-up.

Left these alone before taking the screenshot though so I'm not tampering with the investigation

Thanks

Dan

trurl · August 10, 2021

Thanks, I was just trying to make sense of conflicting information in the shares folder in your Diagnostics. Looks like it is really reading in the settings from Downloads.cfg, but in Diagnostics, shares/d-------s (1).cfg shows the settings but has no files, and shares/d-------s.cfg has default settings (which means cache-no) but it has files on multiple disks including cache and disk6.

Sorry to digress from your problem.

41 minutes ago, Ninjadude101 said:

It looks like this now:

Keep an eye on it for a while and if Log or Docker grow much post new diagnostics.

Ninjadude101 · August 10, 2021

2 minutes ago, trurl said:

Thanks, I was just trying to make sense of conflicting information in the shares folder in your Diagnostics. Looks like it is really reading in the settings from Downloads.cfg, but in Diagnostics, shares/d-------s (1).cfg shows the settings but has no files, and shares/d-------s.cfg has default settings (which means cache-no) but it has files on multiple disks including cache and disk6.

Sorry to digress from your problem.

Keep an eye on it for a while and if Log or Docker grow much post new diagnostics.

Ah - I think the confusion comes from the fact I also have a domains share - that one is only allowed on disk6 and cache drive.

I'll keep an eye like you said, but it could be a few weeks before anything interesting happens.

Thanks for the help!

Dan

trurl · August 10, 2021

Names of the appdata, domains, system shares (or whatever you have specified in your VM and Docker settings in config/docker.cfg and config/domains.cfg) are not anonymized so that one I didn't have any doubt about.

Ninjadude101 · August 23, 2021

@trurl

I'm not sure if it's relevant or a different issue altogether, but today my unraid terminal (via web UI) keeps disconnecting and reconnecting - this happened before when I first logged this issue (although I think I forgot to mention it).

Current log file is 7% so not too bad yet but it IS preventing me from using the terminal.

Checked the logs, and there's a lot of this currently:

Aug 23 11:47:53 Empress nginx: 2021/08/23 11:47:53 [alert] 10743#10743: worker process 8217 exited on signal 6
Aug 23 11:47:55 Empress nginx: 2021/08/23 11:47:55 [alert] 10743#10743: worker process 8277 exited on signal 6
Aug 23 11:47:57 Empress nginx: 2021/08/23 11:47:57 [alert] 10743#10743: worker process 8339 exited on signal 6
Aug 23 11:47:59 Empress nginx: 2021/08/23 11:47:59 [alert] 10743#10743: worker process 8408 exited on signal 6
Aug 23 11:48:01 Empress nginx: 2021/08/23 11:48:01 [alert] 10743#10743: worker process 8477 exited on signal 6
Aug 23 11:48:03 Empress nginx: 2021/08/23 11:48:03 [alert] 10743#10743: worker process 8581 exited on signal 6
Aug 23 11:48:05 Empress nginx: 2021/08/23 11:48:05 [alert] 10743#10743: worker process 8637 exited on signal 6
Aug 23 11:48:07 Empress nginx: 2021/08/23 11:48:07 [alert] 10743#10743: worker process 8699 exited on signal 6
Aug 23 11:48:09 Empress nginx: 2021/08/23 11:48:09 [alert] 10743#10743: worker process 8761 exited on signal 6
Aug 23 11:48:11 Empress nginx: 2021/08/23 11:48:11 [alert] 10743#10743: worker process 8831 exited on signal 6
Aug 23 11:48:13 Empress nginx: 2021/08/23 11:48:13 [alert] 10743#10743: worker process 9018 exited on signal 6
Aug 23 11:48:15 Empress nginx: 2021/08/23 11:48:15 [alert] 10743#10743: worker process 9082 exited on signal 6
Aug 23 11:48:17 Empress nginx: 2021/08/23 11:48:17 [alert] 10743#10743: worker process 9177 exited on signal 6
Aug 23 11:48:19 Empress nginx: 2021/08/23 11:48:19 [alert] 10743#10743: worker process 9257 exited on signal 6
Aug 23 11:48:21 Empress nginx: 2021/08/23 11:48:21 [alert] 10743#10743: worker process 9358 exited on signal 6
Aug 23 11:48:23 Empress nginx: 2021/08/23 11:48:23 [alert] 10743#10743: worker process 9420 exited on signal 6
Aug 23 11:48:25 Empress nginx: 2021/08/23 11:48:25 [alert] 10743#10743: worker process 9563 exited on signal 6
Aug 23 11:48:27 Empress nginx: 2021/08/23 11:48:27 [alert] 10743#10743: worker process 9636 exited on signal 6
Aug 23 11:48:29 Empress nginx: 2021/08/23 11:48:29 [alert] 10743#10743: worker process 9711 exited on signal 6
Aug 23 11:48:31 Empress nginx: 2021/08/23 11:48:31 [alert] 10743#10743: worker process 9779 exited on signal 6

trurl · August 23, 2021

No idea

Server hanging, logs bloating, unable to download diagnostics

Recommended Posts

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Join the conversation