Report Comments posted by Helmonder
-
Since I am redoing the complete cache drive anyway, I decided to also recreate my docker image file and redownload all my dockers.
Actually a very easy process:
1) turn off docker in settings
2) delete the docker image file
3) turn on docker in settings (this recreates the file)
4) all of your dockers are now gone
5) choose "add docker" in the docker screen and look up your docker under "user templates" in the drop-down; that will reinstall the docker with all your previous settings and mappings
6) set the dockers to auto-start if you had that before (this is not automatic)
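For reference, steps 1-3 can also be sketched from a shell. This is only a sketch: the service script and image path shown in the comments are the Unraid defaults as I understand them (check your own Settings > Docker page for the real path), and the GUI route above is the supported way. The demo below uses a stand-in path so nothing real is touched:

```shell
# Sketch of steps 1-3 with a stand-in path (real default on Unraid is
# usually /mnt/user/system/docker/docker.img -- verify in the GUI first).
DOCKER_IMG="/tmp/demo/docker.img"
mkdir -p "$(dirname "$DOCKER_IMG")"
touch "$DOCKER_IMG"                # stand-in for the existing image file
# /etc/rc.d/rc.docker stop         # 1) stop the Docker service (assumed script)
rm -f "$DOCKER_IMG"                # 2) delete the image file
# /etc/rc.d/rc.docker start        # 3) start Docker; Unraid recreates the file
[ ! -e "$DOCKER_IMG" ] && echo "image removed"
```

The commented-out service calls are left inert on purpose; uncomment them only on a real Unraid box.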
-
It's still a hassle... but that is to be preferred over having a format option that people might make mistakes with... I don't mind jumping through a few hoops.
-
1) I copied the complete contents of the cache drive to a share on my array. Before I did that I turned off docker and KVM in settings (their being active might interfere with the copy).
2) I ran the copy through PuTTY, but inside "screen", to make sure an interrupted SSH session would not kill the copy (I used mc).
3) The copy went fine (no errors), but just to make sure it really, really was OK, I compared the file sizes of the copy and the original; they were the same.
4) Then on to reformatting the cache drive. That is not a straightforward process, it appears... there only is a format button when a drive is not formatted. There used to be a way around this:
- Stop the array.
- Go to Main, cache drive, change the file system if you need to, then press format.
- In case the file system was already what you wanted, you needed to change to a file system you did not want, format, and then do it the other way around.
Now, however, there only is the option for BTRFS... so this does not work any more. To get the disk to a point where I could reformat it, I did in the end:
- Stop the array.
- Remove the cache drive from the array (not physically, just change to "no drive" in the cache selection).
- Start the array; the drive will show up as an "unassigned drive".
- Run a limited preclear (erase only); that removes the file system.
- Stop the array again.
- Add the cache drive back in its original spot.
- Start the array and format the drive (which is now an option).
5) Now copying all the data back from the array to the cache drive.
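The size check in step 3 can be scripted. This is a minimal sketch with made-up example paths (the real source and destination were the cache drive and an array share); a stronger check would compare checksums, e.g. `rsync -rcn src/ dst/`:

```shell
# Verify a copy by comparing the total byte counts of source and destination.
# Paths below are examples created just for the demonstration.
mkdir -p /tmp/src /tmp/dst
printf 'some data' > /tmp/src/a.txt
cp /tmp/src/a.txt /tmp/dst/a.txt
src_bytes=$(du -sb /tmp/src | cut -f1)   # -b counts apparent size in bytes (GNU du)
dst_bytes=$(du -sb /tmp/dst | cut -f1)
if [ "$src_bytes" = "$dst_bytes" ]; then echo "sizes match"; else echo "MISMATCH"; fi
```

As for step 2: running the copy inside screen means starting it with `screen -S copy`, detaching with Ctrl-A D, and reattaching later with `screen -r copy`; the copy survives a dropped SSH session either way.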
-
Thanks for the heads-up! I run monthly appdata backups, but I will make an extra one right now and then do the reformat.
-
But those call traces I get with macvlan are not what should happen, right?
-
Well.. disabling bonding does not cause the promiscuous mode to go away.
-
Mmm.... Did some more searching...:
As explained before promiscuous mode means a packet sniffer instructed your ethernet device to listen to all traffic. This can be a benign or a malicious act, but usually you will know if you run an application that provides you with traffic statistics (say ntop or vnstat) or an IDS (say Snort, Prelude, tcpdump or wireshark) or Something Else (say a DHCP client which isn't promiscuous mode but could be identified as one). Reviewing your installed packages might turn up valid applications that fit the above categories. Else, if an interface is (still) in promiscuous mode (old or new style) then running 'ip link show' will show the "PROMISC" tag and when a sniffer is not hidden then running Chkrootkit or Rootkit Hunter (or both) should show details about applications. If none of the above returns satisfying results then a more thorough inspection of the system is warranted (regardless the time between promisc mode switching as posted above being ridiculously short).
In my case I do not expect that there is something bad going on... The question, however, is whether this "promiscuous mode" is triggered by an individual docker (and is that docker maybe "bad"), or whether this mode is triggered by unraid in combination with the docker mechanism, and if so: why?
So I checked...
Starting any docker on my system will immediately trigger "promiscuous mode"... It does not matter which docker it is, so that points to unraid doing something there.
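As the quoted text above notes, the PROMISC flag shows up in the flag list of `ip link show`. A quick way to list only the promiscuous interfaces is to filter that output; since the real output is machine-specific, the parsing is demonstrated here against captured sample lines (the interface names are examples):

```shell
# On the live server you would run:
#   ip -o link show | awk -F': ' '/PROMISC/ {print $2}'
# Demonstrated on sample `ip -o link show` output:
sample='2: bond0: <BROADCAST,MULTICAST,PROMISC,MASTER,UP,LOWER_UP> mtu 1500
5: br0: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500
3: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500'
# -F': ' splits "index: name: <flags>" so $2 is the interface name
printf '%s\n' "$sample" | awk -F': ' '/PROMISC/ {print $2}'
# prints: bond0 and br0, each on its own line
```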
I checked my log file:
Apr 28 07:07:27 Tower rc.inet1: ip link set bond0 promisc on master br0 up
Apr 28 07:07:27 Tower kernel: device bond0 entered promiscuous mode
Apr 28 07:07:30 Tower kernel: device eth1 entered promiscuous mode
Apr 28 07:08:10 Tower kernel: device br0 entered promiscuous mode
Apr 28 07:08:12 Tower kernel: device virbr0-nic entered promiscuous mode
The promiscuous mode is related to network bonding, which is what I am using..
I find some info on it in combination with changing a VLAN's hardware address:
https://wiki.linuxfoundation.org/networking/bonding
Note that changing a VLAN interface's HW address would set the underlying device – i.e. the bonding interface – to promiscuous mode, which might not be what you want.
I am going to stop digging as I am completely out of my comfort zone... and into some kind of rabbit hole; maybe this promiscuous mode has nothing to do with the issue.
Anyone ?
-
The last few lines in the log now show:
Apr 28 11:27:35 Tower kernel: vetha56c284: renamed from eth0
Apr 28 11:27:47 Tower kernel: device br0 left promiscuous mode
Apr 28 11:27:47 Tower kernel: veth3808ad4: renamed from eth0
The "renamed from eth0" lines point to the two Dockers stopping; the "device br0 left promiscuous mode" line appears at that same moment.
I googled a bit and read that "promiscuous mode" is most likely activated when some kind of traffic monitoring / sniffing is going on. Is there something resembling that active in combination with Dockers in unraid?
-
I am seeing call traces in the current log (so at this time):
Apr 28 11:08:45 Tower kernel: Call Trace:
Apr 28 11:08:45 Tower kernel: <IRQ>
Apr 28 11:08:45 Tower kernel: ipv4_confirm+0xaf/0xb7
Apr 28 11:08:45 Tower kernel: nf_hook_slow+0x37/0x96
Apr 28 11:08:45 Tower kernel: ip_local_deliver+0xa7/0xd5
Apr 28 11:08:45 Tower kernel: ? ip_sublist_rcv_finish+0x53/0x53
Apr 28 11:08:45 Tower kernel: ip_rcv+0x9e/0xbc
Apr 28 11:08:45 Tower kernel: ? ip_rcv_finish_core.isra.0+0x2e5/0x2e5
Apr 28 11:08:45 Tower kernel: __netif_receive_skb_one_core+0x4d/0x69
Apr 28 11:08:45 Tower kernel: process_backlog+0x7e/0x116
Apr 28 11:08:45 Tower kernel: net_rx_action+0x10b/0x274
Apr 28 11:08:45 Tower kernel: __do_softirq+0xce/0x1e2
Apr 28 11:08:45 Tower kernel: do_softirq_own_stack+0x2a/0x40
Apr 28 11:08:45 Tower kernel: </IRQ>
Apr 28 11:08:45 Tower kernel: do_softirq+0x4d/0x59
Apr 28 11:08:45 Tower kernel: netif_rx_ni+0x1c/0x22
Apr 28 11:08:45 Tower kernel: macvlan_broadcast+0x10f/0x153 [macvlan]
Apr 28 11:08:45 Tower kernel: macvlan_process_broadcast+0xd5/0x131 [macvlan]
Apr 28 11:08:45 Tower kernel: process_one_work+0x16e/0x24f
Apr 28 11:08:45 Tower kernel: ? pwq_unbound_release_workfn+0xb7/0xb7
Apr 28 11:08:45 Tower kernel: worker_thread+0x1dc/0x2ac
Apr 28 11:08:45 Tower kernel: kthread+0x10b/0x113
Apr 28 11:08:45 Tower kernel: ? kthread_park+0x71/0x71
Apr 28 11:08:45 Tower kernel: ret_from_fork+0x35/0x40
Apr 28 11:08:45 Tower kernel: ---[ end trace c12044621539eec0 ]---
This seems to correspond with "macvlan" as discussed in the following post:
https://forums.unraid.net/topic/75175-macvlan-call-traces/
Maybe tonight I experienced a kernel panic as a result of this macvlan issue? Weird though... the server has been stable for weeks... So maybe there is some combination going on with the amount of disk traffic the parity rebuild was causing?
In the days before the parity rebuild I had two 10TB WD Reds doing a preclear. That did not cause an issue... so the parity sync might be the thing that pushes something over the edge.
I had actually already shut down all of my dockers except for Pi-hole and the HA-Bridge for Domoticz. I have now turned those off as well... Since no docker with its own IP address is running anymore, I would expect the errors to go away now.
-
The system has come up again with invalid parity; it also restarted the parity sync / data rebuild. The notification in the GUI told me that the parity sync was completed successfully (of course this wasn't the case; separate issue in the notification system).
The preclear was stopped but could be resumed (it restarted the post-read phase).
I do not know if it is valuable, but I included the current syslog. I also checked the log rotation, but the previous log is five days old, so that will not help.
Added: all dockers show an update ready, which does not seem correct; this was not the case yesterday. I have now stopped all dockers to give the array some rest during the parity rebuild.
-
I will now restart the array; it is not responding to anything.
MAJOR ISSUE: CACHE DRIVES FILESYSTEM GONE ?? (SOLVED)
-
in Prereleases
Posted
And it works out great!
All my dockers are back up and running.
It took a cache drive crash to start it off, but effectively I have now "powerwashed" my complete cache drive.