Helmonder Posted February 24, 2020 (edited)

My log is flooded with the following errors (note: the system is running in safe mode):

Feb 24 09:51:56 Tower kernel: CPU: 2 PID: 24127 Comm: kworker/u16:2 Tainted: G W O 4.19.98-Unraid #1
Feb 24 09:51:56 Tower kernel: Call Trace:
Feb 24 09:51:56 Tower kernel: CPU: 6 PID: 0 Comm: swapper/6 Tainted: G W O 4.19.98-Unraid #1
Feb 24 09:51:56 Tower kernel: Call Trace:
Feb 24 09:52:35 Tower kernel: CPU: 2 PID: 24127 Comm: kworker/u16:2 Tainted: G W O 4.19.98-Unraid #1
Feb 24 09:52:35 Tower kernel: Call Trace:

My system has been unstable for several weeks, and I am still trying to find out what is going on. I have seen this kind of error before, mostly on a crashed system. The system was responding, but after I asked for diagnostics it appears to have crashed. I can get to the console over IPMI and was able to log on. The last thing I saw on the console is attached as a JPG.

Since I could reach the console, I tried to do by hand what diagnostics does. Attached are:

output.lsscsi
output.lspci
output.lsusb
output.free

I then tried to capture the output of lsof, which appears to crash the system (or at least it takes more than 10 minutes; since that is not consistent with how long diagnostics normally runs, I am assuming something crashed here). I waited 30 minutes to see if the system would come out of it. It did not, so I had to reboot. output.lsof was created as a file, but with zero length. I think that means the lsof command itself failed.

Please note that the attached diagnostics file is from after the reboot!

It is probably worthwhile to mention that my VM was still running, but the Dockers no longer worked. The webgui also remained responsive, but nothing done in it made the array do anything: a shutdown, for example, can be issued, but the array does nothing with the command. The spin up command also has no effect (I would have been able to hear the disks spinning up if it had).

The docker log (after the reboot; can this not be handled by the syslog server?) contains a few errors:

time="2020-02-24T10:23:43.239786942+01:00" level=warning msg="failed to load plugin io.containerd.snapshotter.v1.aufs" error="modprobe aufs failed: "modprobe: FATAL: Module aufs not found in directory /lib/modules/4.19.98-Unraid\n": exit status 1"
time="2020-02-24T10:23:43.240002302+01:00" level=warning msg="could not use snapshotter zfs in metadata plugin" error="path /var/lib/docker/containerd/daemon/io.containerd.snapshotter.v1.zfs must be a zfs filesystem to be used with the zfs snapshotter: skip plugin"
20-02-24T10:23:43.240006594+01:00" level=warning msg="could not use snapshotter aufs in metadata plugin" error="modprobe aufs failed: "modprobe: FATAL: Module aufs not found in directory /lib/modules/4.19.98-Unraid\n": exit status 1"
time="2020-02-24T10:23:43.702525077+01:00" level=warning msg="Error (Unable to complete atomic operation, key modified) deleting object [endpoint 65965ca92a4db8e8e1120be09586cccb830390e0797118d6e678a05677c09533 8cf70541ca1ae1d586bb62231f7dc766a431efe9e335640e6aff7b4348239b0a], retrying...."
time="2020-02-24T10:23:43.769540653+01:00" level=info msg="Removing stale sandbox d61df21b9b07644e15d16fffc9e6f335862f8873fd3c973a091ec6a54af6e5ae (0d861ea3fb51f011d4b933019d431c748929e514113d30377ca78d7338527156)"
time="2020-02-24T10:23:43.776965114+01:00" level=warning msg="Error (Unable to complete atomic operation, key modified) deleting object [endpoint 65965ca92a4db8e8e1120be09586cccb830390e0797118d6e678a05677c09533 266f272cf438655a4355648f790f1946576a6bcabe95b75125d117a8199733c6], retrying...."
time="2020-02-24T10:23:43.863984737+01:00" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can be used to set a preferred IP address"

Attachments: output.free output.lspci output.lsscsi output.lsusb syslog.crash tower-diagnostics-20200224-1024.zip

Edited March 20, 2020 by Helmonder
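For anyone collecting the same outputs by hand, a small wrapper that puts a time limit on each command can keep one hanging tool (lsof in the post above) from blocking the rest. This is only a sketch: the function name `capture`, the 60 second default and the return strings are assumptions made for illustration, and it is not how the Unraid diagnostics tool works internally. A process stuck in uninterruptible I/O wait may not die even when killed, in which case this wrapper would still hang.

```python
import subprocess
from pathlib import Path


def capture(cmd, out_path, timeout_s=60):
    """Run cmd, write its output to out_path, and report how it ended.

    Returns "ok", "exit <code>", "timeout" or "missing".
    The name and return values are illustrative, not part of any tool.
    """
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # keep error messages in the same file
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired as exc:
        # Keep whatever was produced before the deadline, if anything.
        Path(out_path).write_bytes(exc.output or b"")
        return "timeout"
    except FileNotFoundError:
        # The command is not installed on this system.
        return "missing"
    Path(out_path).write_bytes(result.stdout)
    return "ok" if result.returncode == 0 else f"exit {result.returncode}"
```

Calling it as `capture(["lsof"], "output.lsof", timeout_s=120)` would then return "timeout" instead of leaving the console waiting indefinitely, and the status string distinguishes a zero-length file caused by a timeout from one caused by a failing command.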
JorgeB Posted February 24, 2020

The first call trace is macvlan related; that usually happens when using dockers with custom IP addresses. After that there are out of memory errors, so also check your RAM allocation/usage.
Helmonder Posted February 24, 2020

All of my dockers have their own IP address, but that should work, right? I am running with 32 GB of ECC memory, and that is always more than half full with cache. If I am correct, OOM should mean the system is low on memory, so I am not sure what I could check here?
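On the "half full with cache" point: page cache is reclaimable, so it does not by itself indicate low memory. The kernel's own estimate of what can still be allocated is the MemAvailable field in /proc/meminfo. A minimal sketch for reading it follows; the function names are made up for illustration, and this only shows system-wide numbers, not memory reserved for VMs or limits on individual containers.

```python
def parse_meminfo(text):
    """Return the numeric fields of /proc/meminfo as a dict (values in kB)."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def memory_summary(fields):
    """Summarise total, available and cache memory from parsed meminfo."""
    total = fields["MemTotal"]
    # MemAvailable is absent on very old kernels; fall back to MemFree.
    available = fields.get("MemAvailable", fields.get("MemFree", 0))
    cache = fields.get("Cached", 0) + fields.get("Buffers", 0)
    return {
        "total_gib": round(total / 1024**2, 1),
        "available_pct": round(100 * available / total, 1),
        "cache_pct": round(100 * cache / total, 1),
    }
```

On the server this could be run as `memory_summary(parse_meminfo(open("/proc/meminfo").read()))`. A high cache percentage together with a healthy available percentage would suggest the OOM events come from somewhere else, for example a single process or VM allocation.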
JorgeB Posted February 24, 2020

9 minutes ago, Helmonder said: All of my dockers have their own IP address, but that should work right ?
Helmonder Posted February 24, 2020 (edited)

That is a lot of info. I think I can infer that the advice is to move my dockers with their own IP address away from the br0 interface to another interface. I also think I understood that this is linked to a separate VLAN? I have been able to create a VLAN under settings/network, but that does not appear to make a different network available for my dockers. I am missing something..

EDIT:

Edited February 24, 2020 by Helmonder
Helmonder Posted February 24, 2020 (edited)

I think I have found it. Enabling advanced settings shows more options in the docker setup. I have now enabled BR0.5 there (with the same settings as BR0) and disabled BR0 (or should I keep BR0 active?). The dockers appear to be running but are not reachable.

Edited February 24, 2020 by Helmonder
Helmonder Posted February 24, 2020

I think I need to do something with the routing table. I have never done that before. It now looks as follows:
Helmonder Posted February 24, 2020 (edited)

I now changed it to the following:

But this has not fixed anything; I still cannot reach my dockers. I also changed the VLAN number to 6, since 6 is a VLAN number I already use internally. That does not seem to do anything either.

Edited February 24, 2020 by Helmonder
Helmonder Posted February 24, 2020 (edited)

Since I cannot get the VLAN setup to work, I first put everything back to how it was (all on br0), which brought everything back. Maybe someone can assist me in getting this to work?

As an alternative I have now moved as many dockers as possible back to the regular bridge interface (so without a dedicated IP address). I kind of liked having everything on a separate address, but it is not really necessary. Maybe this helps.

One thing that for some reason I cannot switch back is Plex. When I switch it to bridge I end up with no IP address at all. In the allocations I also see a lot of Plex ports assigned, but when clicking "edit" they are not visible. I moved it to HOST mode, and that appears to work.

This means that I now have no more Dockers with dedicated IP addresses. Hopefully that solves my crashes.

It is worth noting, though, that my setup (with separate IP addresses) has been working for as long as Unraid has had that functionality, and only started giving issues sometime in January.

I do hope that @limetech solves this in the end. Different IPs make things with firewalls and VPN somewhat easier.

At the moment I am still using safe mode and will keep it that way for a couple of weeks to see if the issues come back.

Edited February 24, 2020 by Helmonder
Helmonder Posted February 25, 2020 (edited)

Absolutely no errors in the log since yesterday, so that points in a good direction, I think. The only errors now visible are rsyslogd errors during startup: the daemon starts doing network traffic before the network is available. The following appears to be for Red Hat, but may be useful:

copy /usr/lib/systemd/system/rsyslog.service to /etc/systemd/system
edit /etc/systemd/system/rsyslog.service and add "After=network-online.target" to the [Unit] section

Edited February 25, 2020 by Helmonder
Helmonder Posted February 27, 2020

Still stable. The system is unusually "quiet", with a lot more disks spun down. All dockers still work and do their thing, though. I will keep the system in this state until the end of next week and report back. If everything stays fine I will then restart without safe mode and see if it stays that way.
Helmonder Posted March 6, 2020 (edited)

Restarted without safe mode yesterday and the errors are coming back:

Mar 5 17:49:07 Tower kernel: CPU: 4 PID: 15956 Comm: CPU 0/KVM Tainted: G O 4.19.98-Unraid #1
Mar 5 17:49:07 Tower kernel: Call Trace:
Mar 5 22:58:33 Tower kernel: CPU: 4 PID: 15956 Comm: CPU 0/KVM Tainted: G O 4.19.98-Unraid #1
Mar 5 22:58:33 Tower kernel: Call Trace:
Mar 5 23:04:08 Tower kernel: CPU: 4 PID: 15956 Comm: CPU 0/KVM Tainted: G O 4.19.98-Unraid #1
Mar 5 23:04:08 Tower kernel: Call Trace:
Mar 5 23:04:08 Tower kernel: CPU: 4 PID: 15956 Comm: CPU 0/KVM Tainted: G W O 4.19.98-Unraid #1
Mar 5 23:04:08 Tower kernel: Call Trace:
Mar 5 23:04:08 Tower kernel: CPU: 4 PID: 15956 Comm: CPU 0/KVM Tainted: G W O 4.19.98-Unraid #1
Mar 5 23:04:08 Tower kernel: Call Trace:
Mar 5 23:06:04 Tower kernel: CPU: 4 PID: 15956 Comm: CPU 0/KVM Tainted: G W O 4.19.98-Unraid #1
Mar 5 23:06:04 Tower kernel: Call Trace:
Mar 5 23:06:04 Tower kernel: CPU: 4 PID: 15956 Comm: CPU 0/KVM Tainted: G W O 4.19.98-Unraid #1
Mar 5 23:06:04 Tower kernel: Call Trace:
Mar 5 23:06:18 Tower kernel: CPU: 4 PID: 15956 Comm: CPU 0/KVM Tainted: G W O 4.19.98-Unraid #1
Mar 5 23:06:18 Tower kernel: Call Trace:
Mar 5 23:06:18 Tower kernel: CPU: 4 PID: 15956 Comm: CPU 0/KVM Tainted: G W O 4.19.98-Unraid #1
Mar 5 23:06:18 Tower kernel: Call Trace:
Mar 6 01:08:35 Tower kernel: CPU: 4 PID: 15956 Comm: CPU 0/KVM Tainted: G W O 4.19.98-Unraid #1
Mar 6 01:08:35 Tower kernel: Call Trace:
Mar 6 01:08:35 Tower kernel: CPU: 4 PID: 15956 Comm: CPU 0/KVM Tainted: G W O 4.19.98-Unraid #1
Mar 6 01:08:35 Tower kernel: Call Trace:
Mar 6 16:36:04 Tower kernel: CPU: 4 PID: 15956 Comm: CPU 0/KVM Tainted: G W O 4.19.98-Unraid #1
Mar 6 16:36:04 Tower kernel: Call Trace:

Diagnostics are attached. I will prune back my plugins to see if I can find the culprit. I will first remove all my Dynamix plugins, not because I think those are the culprit, but because I want to put them back as soon as possible.

Attachments: tower-diagnostics-20200306-1724.zip

Edited March 6, 2020 by Helmonder
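When the log is flooded like this, it can help to count the call trace headers per process to see whether one task (here the KVM vCPU thread) accounts for most of them. A minimal sketch, assuming syslog lines in the format quoted in this thread; the regular expression and the function name `count_traces` are written for this example and are not part of any Unraid tooling.

```python
import re
from collections import Counter

# Matches the header line the kernel prints before a call trace, e.g.
# "... kernel: CPU: 4 PID: 15956 Comm: CPU 0/KVM Tainted: G W O ..."
TRACE_HEADER = re.compile(
    r"kernel:\s+CPU:\s+(?P<cpu>\d+)\s+PID:\s+(?P<pid>\d+)\s+"
    r"Comm:\s+(?P<comm>.+?)\s+(?:Tainted|Not tainted):"
)


def count_traces(lines):
    """Count call trace headers per (process name, PID)."""
    counts = Counter()
    for line in lines:
        match = TRACE_HEADER.search(line)
        if match:
            counts[(match.group("comm"), int(match.group("pid")))] += 1
    return counts
```

Feeding it the syslog, for example `count_traces(open("syslog.crash"))` with the file attached earlier in the thread, gives a quick per-process tally that is easier to compare between safe mode and normal boots than the raw log.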
Helmonder Posted March 7, 2020

24 hours with no errors without the Dynamix plugins. Not what I expected, to be honest. I will keep it like this for a few days just to be sure and will then slowly bring back the plugins one by one.
Helmonder Posted March 8, 2020 (edited)

Still no errors. Installed Dynamix SSD Trim, Wireguard and SystemStats as the first ones to bring back. Since these are not continuously doing something, I do not suspect them of causing issues. Will report back in a few days.

Edited March 8, 2020 by Helmonder
Helmonder Posted March 9, 2020

36 hours later and still no crashes. I have now installed cache dirs. Hoping it stays stable; I would really miss this one.
Helmonder Posted March 10, 2020

Still no crashes. Updated to 6.8.3. No errors (except for the long list of rsyslogd errors at startup, which show up because it gets launched before the network is available).
Helmonder Posted March 20, 2020

System has been fully stable for a week. I will keep it in this state.