vagrantprodigy Posted January 14, 2020
I've been having some problems with my system recently, and am getting the same error constantly in my syslog. The error is:
Tower rsyslogd: action 'action 1' (module 'builtin:omfile') message lost, could not be processed. Check for additional error messages before this one. [v8.36.0 try http://www.rsyslog.com/e/2027 ]
I'm currently on 6.6.7 (upgrading to 6.8 caused all of my containers to vanish), and am running a Ryzen-based system. How do I narrow down what is causing this issue?
trurl Posted January 14, 2020
Go to Tools - Diagnostics and attach the complete diagnostics zip file to your NEXT post.
vagrantprodigy Posted January 14, 2020 Author
Diagnostics attached. tower-diagnostics-20200114-0904.zip
trurl Posted January 14, 2020
Does it log those messages in SAFE mode?
Unrelated, but you have literally the largest docker image I have ever seen, 200G. I guess it's possible but unlikely that your large number of dockers is legitimately using the 38G of that 200G it has currently, but usually a docker image that large means one or more applications are misconfigured and are writing into the docker image instead of to mapped storage. Have you had problems filling the docker image? If so, that might explain why
29 minutes ago, vagrantprodigy said: all of my containers to vanish
Making the docker image larger will not fix this problem; it will only make it take longer to fill. The docker image should not grow. I usually recommend 20G as being sufficient for most users.
The typical cause of filling the docker image is an application writing to a path that does not match the container mappings. Common mistakes are not using the same upper/lower case as in the mappings (Linux is case-sensitive) or not using an absolute path (beginning with /).
And your system share is on the array instead of all on cache where it belongs.
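The two mapping mistakes described above (wrong letter case, non-absolute path) can be illustrated with a minimal Python sketch. It is not an Unraid or Docker tool: the function name `check_app_path` and the example paths are hypothetical, and it only compares a path configured inside an application against the container-side paths of that container's volume mappings.

```python
from pathlib import PurePosixPath
from typing import Sequence


def check_app_path(app_path: str, mapped_container_paths: Sequence[str]) -> str:
    """Classify a path configured in an app against the container's mapped paths.

    Hypothetical helper for illustration only; the mappings must be supplied by hand.
    """
    # A path that does not begin with "/" is relative, so it is resolved against
    # the app's working directory and is unlikely to land on mapped storage.
    if not app_path.startswith("/"):
        return "relative: not an absolute path, likely written inside the image"

    path = PurePosixPath(app_path)
    for mapped in mapped_container_paths:
        root = PurePosixPath(mapped)
        if path == root or root in path.parents:
            return "ok: covered by mapping " + mapped

    # Linux is case-sensitive, so /Downloads and /downloads are different paths.
    lowered = PurePosixPath(app_path.lower())
    for mapped in mapped_container_paths:
        root = PurePosixPath(mapped.lower())
        if lowered == root or root in lowered.parents:
            return "case mismatch: differs only by case from mapping " + mapped

    return "unmapped: no mapping covers this path, writes stay inside the image"


if __name__ == "__main__":
    mappings = ["/config", "/downloads"]  # hypothetical container-side paths
    for candidate in ["/config/db", "/Downloads/tv", "downloads/tv", "/data/tv"]:
        print(candidate, "->", check_app_path(candidate, mappings))
```

Anything reported as other than "ok" is a path whose writes would plausibly accumulate in the docker image rather than on mapped storage.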
itimpi Posted January 14, 2020
21 minutes ago, vagrantprodigy said: Diagnostics attached. tower-diagnostics-20200114-0904.zip
Looking at those diagnostics, the error messages start appearing after the NerdPack plugin starts loading modules into RAM. Do you get the same issues if you boot in Safe Mode (which stops plugins running), or even if you just stop NerdPack from loading anything? Using plugins always runs the risk of them loading code modules that are incompatible with the release of Unraid that you are running and thus de-stabilising the system.
vagrantprodigy Posted January 14, 2020 Author
4 minutes ago, trurl said: Does it log those messages in SAFE mode? Unrelated, but you have literally the largest docker image I have ever seen, 200G. [...] And your system share is on the array instead of all on cache where it belongs.
I haven't tried safe mode yet. I increased the docker image size over a year ago due to plex filling the image. That finally stopped for me with an update about 6 months ago. I just added the cache a few months ago, and am gradually moving shares over to it. System is the next on the list to be moved.
vagrantprodigy Posted January 14, 2020 Author
3 minutes ago, itimpi said: Looking at those diagnostics the error messages start appearing after the NerdPack plugin starts loading modules into RAM. [...]
I'll give that a shot. Thanks.
trurl Posted January 14, 2020
7 minutes ago, vagrantprodigy said: I increased the docker image size over a year ago due to plex filling the image. That finally stopped for me with an update about 6 months ago.
Just an update to the docker or an update to the application wouldn't have fixed the problem since the docker or the application wasn't the cause of the problem. Probably you changed something else that caused it to stop. I still think your docker image usage is too large at 38G, and docker allocation of 200G is ridiculous.
17 minutes ago, trurl said: Does it log those messages in SAFE mode?
15 minutes ago, itimpi said: Looking at those diagnostics the error messages start appearing after the NerdPack plugin starts loading modules into RAM.
Exactly my thoughts. Are you actually using all those NerdPack modules you install?
vagrantprodigy Posted January 14, 2020 Author
11 minutes ago, trurl said: Just an update to the docker or an update to the application wouldn't have fixed the problem since the docker or the application wasn't the cause of the problem. Probably you changed something else that caused it to stop. [...] Are you actually using all those NerdPack modules you install?
I believe what stopped the plex overruns was an actual unRAID update, but it was quite a while ago, so I could be mistaken. I previously used the NerdPack modules, but I'm not at the moment. I'm going to uninstall the entire NerdPack plugin and reboot later today.
vagrantprodigy Posted January 14, 2020 Author
I am still getting an error in safe mode, after NerdPack was uninstalled. I've attached the safe mode diagnostic. tower-diagnostics-20200114-1045.zip
mlapaglia Posted January 14, 2020
Does uninstalling NerdPack uninstall all of the modules? You might need to uninstall the modules through NerdPack first, then uninstall NerdPack itself.
itimpi Posted January 14, 2020
39 minutes ago, vagrantprodigy said: I am still getting an error in safe mode, after NerdPack was uninstalled. I've attached the safe mode diagnostic. tower-diagnostics-20200114-1045.zip
There are network errors about not being able to contact the syslog server you have configured at address 192.168.0.50. Do you have a machine at that address up and running a syslog server?
vagrantprodigy Posted January 14, 2020 Author
1 hour ago, itimpi said: There are network errors around not being able to contact the syslog server you have configured to be used at address 192.168.0.50. Do you have such a server up and running a syslog server?
I don't. Do you know offhand where the setting to disable this is?
itimpi Posted January 14, 2020
5 minutes ago, vagrantprodigy said: I don't. Do you know offhand where the setting to disable this is?
Settings->Syslog Server
vagrantprodigy Posted January 14, 2020 Author
1 minute ago, itimpi said: Settings->Syslog Server
I don't have this on my server. My page map is attached. unraid_page_map.txt
itimpi Posted January 14, 2020
The syslog server was introduced in the 6.7 series of releases, which will be why you do not have that page, as I see you are currently on the 6.6.7 release. What you DO have is the config/rsyslog.cfg file on the flash drive, which stores the settings associated with that page. I wonder if this is a remnant of your previous update attempt? You could try renaming that file to see if it fixes things.
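The rename suggested above can be done from any shell or file manager; as a minimal Python sketch, the hypothetical helper `set_aside` below renames a file by appending a suffix so that it can be restored later. The `.bak` suffix and the `/boot` mount point in the usage comment are assumptions, not something stated in the thread.

```python
from pathlib import Path


def set_aside(path: str, suffix: str = ".bak") -> Path:
    """Rename `path` to `path + suffix` and return the new location.

    Hypothetical helper: refuses to overwrite an earlier backup.
    """
    source = Path(path)
    target = source.with_name(source.name + suffix)
    if target.exists():
        raise FileExistsError(f"{target} already exists; choose another suffix")
    source.rename(target)
    return target


# Example call, assuming the flash drive is mounted at /boot:
# set_aside("/boot/config/rsyslog.cfg")
```

Renaming rather than deleting keeps the old settings available in case the file turns out not to be the cause.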
vagrantprodigy Posted January 14, 2020 Author
That file shows a syslog server at IP address 192.168.0.90. That is my PRTG server. I believe I did have syslog routed there at one point. I'll stand a syslog sensor up again to see if that fixes it. If not, I'll rename the file and reboot.
vagrantprodigy Posted March 12, 2020 Author
I upgraded from 6.6.7 to 6.8.3 today, and spent most of the day reinstalling containers, fixing plugins, etc. to squash all of the bugs/incompatibilities. I have two remaining showstoppers.
One is that the server can't reach the internet post-upgrade. It appears to me that the default static route is for the wrong bridge, and therefore the traffic can't exit to the internet. The bridge it is trying to use is a local storage network (connects to my ESXi host). I have tried to delete this route, but the delete button is not working.
My other issue, possibly related, is that my containers are painfully slow, my docker page takes several minutes to load, and all of the icons for my containers are missing. I just see the ? icon instead of the icon that should appear for each container. The appdata for these is on NVMe storage, and neither issue existed this morning on 6.6.7.
tower-diagnostics-20200312-1611.zip
Squid Posted March 12, 2020
20 minutes ago, vagrantprodigy said: My other issue, possibly related, is that my containers are painfully slow, my docker page takes several minutes to load, and all of the icons for my containers are missing. I just see the ? icon instead of the icon that should appear for each container. The appdata for these is on NVME storage, and neither issue existed this morning on 6.6.7.
It's because of the inability to reach the internet. One change in 6.8 was how docker icons are handled, and it necessitates that it redownload them, and it's failing. Until it does manage that, you will see that issue (unless it's a real issue, in which case I can supply a couple of commands to fix it).
On the issue of not reaching the internet, because of all the bonding etc. I'm not particularly able to help.
vagrantprodigy Posted March 12, 2020 Author
2 minutes ago, Squid said: It's because of the inability to reach the internet. One change in 6.8 was how docker icons are handled, and it necessitates that it redownload them, and it's failing. [...] On the issue of not reaching the internet, because of all the bonding etc. I'm not particularly able to help.
Good to know. So I really just have one issue to fix, which is why the network setup works in 6.6.7 and not in 6.8.3. Hopefully someone is able to assist; I'd hate to have to roll back to 6.6.7 for the 7th time.
trurl Posted March 12, 2020
1 hour ago, vagrantprodigy said: (connects to my ESXi host)
Are you running Unraid on ESXi?
vagrantprodigy Posted March 12, 2020 Author
No, I have a separate host. unRAID is on bare metal, as is ESXi.
vagrantprodigy Posted March 14, 2020 Author
I seem to have fixed it. Adding a metric of 2 (versus the default of 1) to the storage network gateway made br0 take precedence, and I can now access external resources.
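Why the metric change helps can be shown with a small Python sketch of route selection: among default routes, the one with the lowest metric is preferred. This is a simplified model, not the kernel's actual routing code. The interface name `br1` and both gateway addresses are hypothetical placeholders; only `br0` and the metric values 1 and 2 come from the post above.

```python
from typing import NamedTuple, Sequence


class DefaultRoute(NamedTuple):
    interface: str
    gateway: str
    metric: int


def pick_default_route(routes: Sequence[DefaultRoute]) -> DefaultRoute:
    """Return the default route with the lowest metric (lower is preferred).

    On equal metrics this sketch simply keeps the first route listed,
    which is a property of the sketch and not a claim about Linux.
    """
    return min(routes, key=lambda route: route.metric)


if __name__ == "__main__":
    routes = [
        # Storage network gateway, metric raised to 2 (name and address hypothetical).
        DefaultRoute("br1", "10.0.0.1", 2),
        # Main bridge, left at the default metric of 1 (address hypothetical).
        DefaultRoute("br0", "192.168.0.1", 1),
    ]
    print(pick_default_route(routes).interface)
```

With the storage gateway at metric 2, the route through br0 wins regardless of the order in which the routes were added.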
vagrantprodigy Posted March 14, 2020 Author
I was able to fix this recently. The fix was to rename the syslog file on the flash drive (in the logs folder) to syslog.old. This generated a new file, and stopped the mass error messages. The original syslog file was up to 4GB in size, which I would assume is a hard limit for it.
Edited March 14, 2020 by vagrantprodigy
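The 4GB figure is consistent with the FAT32 per-file limit of 4 GiB minus one byte, assuming the flash drive is FAT32-formatted (the thread does not say so). A minimal Python sketch for checking how close a log file is to that limit follows; the helper names and the path in the usage comment are hypothetical.

```python
import os

# FAT32 stores file sizes in a 32-bit field, so a single file tops out at 2**32 - 1 bytes.
FAT32_MAX_FILE_BYTES = 2**32 - 1


def fat32_headroom(size_bytes: int) -> int:
    """Return how many more bytes a file of this size could grow on FAT32."""
    return max(FAT32_MAX_FILE_BYTES - size_bytes, 0)


def is_at_fat32_limit(path: str) -> bool:
    """Return True if the file at `path` has reached the FAT32 size ceiling."""
    return os.path.getsize(path) >= FAT32_MAX_FILE_BYTES


# Example call, assuming the flash drive is mounted at /boot:
# is_at_fat32_limit("/boot/logs/syslog")
```

A file at the ceiling cannot accept further writes, which would fit the "message lost, could not be processed" errors from the file output module, though the thread does not confirm that diagnosis.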
trurl Posted March 14, 2020
3 hours ago, vagrantprodigy said: I was able to fix this recently. The fix was to rename the syslog file on the flash drive (in the logs folder) to syslog.old. This generated a new file, and stopped the mass error messages. The original syslog file was up to 4GB in size, which I would assume is a hard limit for it.
Not really recommended to continually write syslog to flash. That should only be done for specific troubleshooting purposes. Even better is to have it stored somewhere else if you are running syslog server.