Fatal system crash (SOLVED)



My log is flooded with the following errors (note: system is running in safe mode):

 

Feb 24 09:51:56 Tower kernel: CPU: 2 PID: 24127 Comm: kworker/u16:2 Tainted: G        W  O      4.19.98-Unraid #1
Feb 24 09:51:56 Tower kernel: Call Trace:
Feb 24 09:51:56 Tower kernel: CPU: 6 PID: 0 Comm: swapper/6 Tainted: G        W  O      4.19.98-Unraid #1
Feb 24 09:51:56 Tower kernel: Call Trace:
Feb 24 09:52:35 Tower kernel: CPU: 2 PID: 24127 Comm: kworker/u16:2 Tainted: G        W  O      4.19.98-Unraid #1
Feb 24 09:52:35 Tower kernel: Call Trace:

My system has been unstable for several weeks, and I am still trying to find out what is going on. I have seen this kind of error before, mostly when the system had crashed.

 

The system was responding, but after I requested diagnostics it appears to have crashed. I can reach the console through IPMI and have been able to log on. The last thing I saw on the console is attached as a JPG.

 

Since I was able to get to the console, I tried to do by hand what diagnostics does. Attached are:

 

output.lsscsi

output.lspci

output.lsusb

output.free

 

I then tried to capture the output of lsof, and this appears to crash the system, or at least it takes more than 10 minutes. Since that is not consistent with how long diagnostics normally runs, I am assuming something crashed here. I waited 30 minutes to see if the system would recover. It did not, so I had to reboot. output.lsof was created as a file but with zero length. I think that means the lsof command itself failed.
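For anyone repeating the manual capture: putting a time limit on each command keeps one hanging tool from blocking the rest. Below is a minimal sketch, assuming Python 3 is available on the server; the helper name `capture` and the 60-second default are my own choices, not something diagnostics itself uses.

```python
import subprocess


def capture(cmd, outfile, timeout_s=60):
    """Run cmd and write its combined stdout/stderr to outfile.

    Returns True if the command finished within timeout_s seconds,
    False if it had to be killed. On timeout a short marker line is
    written instead, so the file is never silently empty.
    """
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        marker = "TIMEOUT after {}s: {}\n".format(timeout_s, " ".join(cmd))
        with open(outfile, "w") as fh:
            fh.write(marker)
        return False
    with open(outfile, "wb") as fh:
        fh.write(result.stdout)
    return True
```

A call such as `capture(["lsof"], "output.lsof", timeout_s=120)` would then either produce the listing or a timeout marker. One caveat: if the process is stuck inside the kernel in uninterruptible sleep, killing it may not take effect and the call can still block, so this helps with a slow command, not necessarily with a wedged one.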

 

Please note that the attached diagnostics file was taken after the reboot!

 

 

It is probably worth mentioning that my VM was still running, but the Dockers no longer worked. The webGUI itself also remained responsive, but nothing on it led to the array doing anything: a shutdown, for example, could be issued, but the array did nothing with the command. The spin-up command also had no effect (I would have been able to hear the disks spinning up if it had worked).

 

 

Capture.JPG

 

The Docker log after the reboot (can this not be managed by the syslog server?) contains a few errors:

 

time="2020-02-24T10:23:43.239786942+01:00" level=warning msg="failed to load plugin io.containerd.snapshotter.v1.aufs" error="modprobe aufs failed: "modprobe: FATAL: Module aufs not found in directory /lib/modules/4.19.98-Unraid\n": exit status 1" 

time="2020-02-24T10:23:43.240002302+01:00" level=warning msg="could not use snapshotter zfs in metadata plugin" error="path /var/lib/docker/containerd/daemon/io.containerd.snapshotter.v1.zfs must be a zfs filesystem to be used with the zfs snapshotter: skip plugin" 

time="2020-02-24T10:23:43.240006594+01:00" level=warning msg="could not use snapshotter aufs in metadata plugin" error="modprobe aufs failed: "modprobe: FATAL: Module aufs not found in directory /lib/modules/4.19.98-Unraid\n": exit status 1" 

time="2020-02-24T10:23:43.702525077+01:00" level=warning msg="Error (Unable to complete atomic operation, key modified) deleting object [endpoint 65965ca92a4db8e8e1120be09586cccb830390e0797118d6e678a05677c09533 8cf70541ca1ae1d586bb62231f7dc766a431efe9e335640e6aff7b4348239b0a], retrying...."
time="2020-02-24T10:23:43.769540653+01:00" level=info msg="Removing stale sandbox d61df21b9b07644e15d16fffc9e6f335862f8873fd3c973a091ec6a54af6e5ae (0d861ea3fb51f011d4b933019d431c748929e514113d30377ca78d7338527156)"
time="2020-02-24T10:23:43.776965114+01:00" level=warning msg="Error (Unable to complete atomic operation, key modified) deleting object [endpoint 65965ca92a4db8e8e1120be09586cccb830390e0797118d6e678a05677c09533 266f272cf438655a4355648f790f1946576a6bcabe95b75125d117a8199733c6], retrying...."
time="2020-02-24T10:23:43.863984737+01:00" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can be used to set a preferred IP address"
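To see at a glance how serious these entries are, the `level=` field of each line can be tallied. This is only a rough sketch, assuming the log keeps the `level=<name>` layout shown above; the function name `count_levels` is made up for illustration.

```python
import re
from collections import Counter

# Matches the logfmt-style severity field, e.g. level=warning
LEVEL = re.compile(r"\blevel=(\w+)")


def count_levels(lines):
    """Return a Counter of severity names found in the given log lines."""
    counts = Counter()
    for line in lines:
        match = LEVEL.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Run over the lines quoted above, this would report only `warning` and `info` entries, with nothing logged at `level=error`.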

 

output.free output.lspci output.lsscsi output.lsusb syslog.crash tower-diagnostics-20200224-1024.zip

Edited by Helmonder

That is a lot of info. If I understand it correctly, the advice is to move my Dockers that have their own IP address away from the br0 interface to another interface.

 

I also think I have understood that this is linked to a separate VLAN? I have been able to create a VLAN under Settings/Network, but that does not appear to make a different network available for my Dockers. I am missing something.

 

EDIT:

 

 


I think I have found it. Enabling advanced settings shows more options in the Docker setup.

 

I have now enabled br0.5 in there (with the same settings as br0) and disabled br0 (or should I keep br0 active?).

 

The Dockers appear to be running but are not reachable.


Since I cannot get the VLAN setup to work, I first restored everything to how it was (all on br0); that made everything work again.

 

Maybe someone can assist me on getting this to work?

 

As an alternative, I have now moved as many Dockers as possible back to the regular bridge interface (so without a dedicated IP address). I kind of liked having everything on a separate address, but it is not really necessary. Maybe this helps.

 

One thing I for some reason cannot switch back is Plex. When I switch it to bridge, I end up with no IP address at all. Also, in the allocations I see a lot of Plex ports assigned, but when clicking "edit" they are not visible. I moved it to host mode, and that appears to work. This means I now have no more Dockers with dedicated IP addresses. Hopefully that solves my crashes.

 

It is good to point out, though, that my setup (with separate IP addresses) has been working for as long as Unraid has had that functionality, and only started giving issues sometime in January. I do hope that @limetech solves this in the end. Different IPs make things with firewalls and VPN somewhat easier.

 

At the moment I am still using safe mode, and I will keep it that way for a couple of weeks to see if the issues come back.


No errors at all in the log since yesterday, so that points in a good direction, I think.

 

The only errors now visible are rsyslogd errors during startup. The daemon starts sending network traffic before the network is available.

 

The following appears to be for Red Hat, but it may be useful:

 

copy /usr/lib/systemd/system/rsyslog.service to /etc/systemd/system
edit /etc/systemd/system/rsyslog.service and add "After=network-online.target" to the [Unit] section

 

  • 2 weeks later...

I restarted without safe mode yesterday and the errors are coming back:

 

Mar  5 17:49:07 Tower kernel: CPU: 4 PID: 15956 Comm: CPU 0/KVM Tainted: G           O      4.19.98-Unraid #1
Mar  5 17:49:07 Tower kernel: Call Trace:
Mar  5 22:58:33 Tower kernel: CPU: 4 PID: 15956 Comm: CPU 0/KVM Tainted: G           O      4.19.98-Unraid #1
Mar  5 22:58:33 Tower kernel: Call Trace:
Mar  5 23:04:08 Tower kernel: CPU: 4 PID: 15956 Comm: CPU 0/KVM Tainted: G           O      4.19.98-Unraid #1
Mar  5 23:04:08 Tower kernel: Call Trace:
Mar  5 23:04:08 Tower kernel: CPU: 4 PID: 15956 Comm: CPU 0/KVM Tainted: G        W  O      4.19.98-Unraid #1
Mar  5 23:04:08 Tower kernel: Call Trace:
Mar  5 23:04:08 Tower kernel: CPU: 4 PID: 15956 Comm: CPU 0/KVM Tainted: G        W  O      4.19.98-Unraid #1
Mar  5 23:04:08 Tower kernel: Call Trace:
Mar  5 23:06:04 Tower kernel: CPU: 4 PID: 15956 Comm: CPU 0/KVM Tainted: G        W  O      4.19.98-Unraid #1
Mar  5 23:06:04 Tower kernel: Call Trace:
Mar  5 23:06:04 Tower kernel: CPU: 4 PID: 15956 Comm: CPU 0/KVM Tainted: G        W  O      4.19.98-Unraid #1
Mar  5 23:06:04 Tower kernel: Call Trace:
Mar  5 23:06:18 Tower kernel: CPU: 4 PID: 15956 Comm: CPU 0/KVM Tainted: G        W  O      4.19.98-Unraid #1
Mar  5 23:06:18 Tower kernel: Call Trace:
Mar  5 23:06:18 Tower kernel: CPU: 4 PID: 15956 Comm: CPU 0/KVM Tainted: G        W  O      4.19.98-Unraid #1
Mar  5 23:06:18 Tower kernel: Call Trace:
Mar  6 01:08:35 Tower kernel: CPU: 4 PID: 15956 Comm: CPU 0/KVM Tainted: G        W  O      4.19.98-Unraid #1
Mar  6 01:08:35 Tower kernel: Call Trace:
Mar  6 01:08:35 Tower kernel: CPU: 4 PID: 15956 Comm: CPU 0/KVM Tainted: G        W  O      4.19.98-Unraid #1
Mar  6 01:08:35 Tower kernel: Call Trace:
Mar  6 16:36:04 Tower kernel: CPU: 4 PID: 15956 Comm: CPU 0/KVM Tainted: G        W  O      4.19.98-Unraid #1
Mar  6 16:36:04 Tower kernel: Call Trace:
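To check whether the traces keep coming from the same process, the header lines can be grouped by command name and PID. The sketch below assumes the header format stays exactly as in the lines above; the name `count_trace_headers` is hypothetical.

```python
import re
from collections import Counter

# Header line that precedes each "Call Trace:" in the syslog excerpt.
# The command name can contain spaces (e.g. "CPU 0/KVM"), so it is
# matched lazily up to the " Tainted: " field.
TRACE_HEADER = re.compile(r"kernel: CPU: (\d+) PID: (\d+) Comm: (.+?) Tainted: ")


def count_trace_headers(lines):
    """Count kernel trace headers per (command name, PID) pair."""
    counts = Counter()
    for line in lines:
        match = TRACE_HEADER.search(line)
        if match:
            counts[(match.group(3), int(match.group(2)))] += 1
    return counts
```

Feeding it the syslog with something like `count_trace_headers(open("syslog"))` (the file path is an assumption) returns a counter keyed by process. For the excerpt above, every header belongs to `CPU 0/KVM` with PID 15956, which points at the VM rather than at a kernel worker thread.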

Diagnostics are attached.

 

I will prune back my plugins to see if I can find the culprit.

 

I will first remove all my Dynamix plugins, not because I think those are the culprit, but because I want to put them back as soon as possible.

 

tower-diagnostics-20200306-1724.zip
