Unresponsive Dockers (and WebUI)


DZMM


I've posted about this problem before and not found a solution, but after my experience tonight I really need a fix/help please.

 

I stupidly made a mistake changing a CPU pin via the new CPU pinning page, which meant I had to reinstall my dockers.  This has taken me over 2 hours and I've been stuck on the last one for about 30 minutes because my dockers keep freezing, locking up or becoming unresponsive.  During this period I haven't been able to use my other dockers because they are so slow.

 

(Edit: forgot to add that the webUI is slow as well.)

 

This has been happening intermittently for the last couple of months and is a real pain in the ass.

 

Help please.

 

highlander-diagnostics-20181011-2303.zip

Edited by DZMM
5 minutes ago, johnnie.black said:

You appear to be running a btrfs balance and it's going for some time, while it's running it might be normal for the server to be much slower due to high I/O, wait for the balance to finish or pause it and see if it makes a difference.

Will do, but I only started the balance after reading the other post, and I've been having the problem for a long time already.

 

I will report back and re-post diagnostics if the problem starts again once the balance has stopped.

16 minutes ago, johnnie.black said:

There are some nginx errors, not sure if they are important or not, does rebooting fix the problem?

The problem comes and goes.  Rebooting doesn't fix it permanently, e.g. last night I rebooted to do a fresh install of all my dockers, which took me around 3 hours and was the final straw.  I've had this problem since at least 6.5.2.

 

 

It's been suggested I've got something incompatible in Nerd Pack, but the only thing it could be is unionfs, which I really need, as the only other bits I have installed are unrar, screen and python.  I'd love to use mergerfs instead, but I don't know how to install it.

 

 

It's so bad I'm even tempted to try a fresh unRAID installation, as I can't think of anything else to do.  I've fired off a couple of bug reports to limetech but I've had no response.

Edited by DZMM

I've removed python from Nerd Pack as I don't need it.  Just after rebooting I saw this in my logs:

 

Oct 12 12:05:49 Highlander kernel: TCP: request_sock_TCP: Possible SYN flooding on port 19182. Sending cookies.  Check SNMP counters.

19182 is the port I use for inbound torrents.  Maybe deluge is using too many connections with the max set at 1200?  I'm going to stop deluge for a bit and then reduce the limit to 600 to see if that's the cause, although I doubt it, as last night deluge was one of the last dockers I tried to re-install.

  • 2 weeks later...

Still having no joy - pleading for help from the forum or @limetech as it's super-frustrating when I can't get into dockers or the dashboard:

 

Oct 26 17:18:12 Highlander nginx: 2018/10/26 17:18:12 [error] 7448#7448: *431830 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 192.168.30.10, server: , request: "POST /plugins/dynamix.docker.manager/include/Events.php HTTP/2.0", upstream: "fastcgi://unix:/var/run/php5-fpm.sock", host: "1d087a25aac48109ee9a15217a105d14c06e02a6.unraid.net", referrer: "https://1d087a25aac48109ee9a15217a105d14c06e02a6.unraid.net/Dashboard"
Oct 26 17:18:25 Highlander nginx: 2018/10/26 17:18:25 [error] 7448#7448: *431830 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 192.168.30.10, server: , request: "POST /webGui/include/DashUpdate.php HTTP/2.0", upstream: "fastcgi://unix:/var/run/php5-fpm.sock", host: "1d087a25aac48109ee9a15217a105d14c06e02a6.unraid.net", referrer: "https://1d087a25aac48109ee9a15217a105d14c06e02a6.unraid.net/Dashboard"

Even collecting diagnostics took forever and took a few attempts 😞

 

highlander-diagnostics-20181026-1738.zip
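
For anyone wanting to see which webGui pages are timing out most often, the nginx entries can be tallied per request path. This is only a rough Python sketch: the function name `count_timeouts` is made up for illustration, the regex assumes the error format shown in the two entries above, and the sample lines are trimmed after the request field.

```python
import re
from collections import Counter

# Captures the request path from nginx "upstream timed out" error entries.
# Non-greedy matching lets it cope with several entries fused onto one line.
TIMEOUT_RE = re.compile(r'upstream timed out .*?request: "(?:GET|POST) (\S+) HTTP')


def count_timeouts(log_text):
    """Return a Counter of request paths that hit an upstream timeout."""
    return Counter(TIMEOUT_RE.findall(log_text))


# Trimmed copies of the two entries posted above (fields after the request dropped).
sample = (
    'Oct 26 17:18:12 Highlander nginx: 2018/10/26 17:18:12 [error] 7448#7448: *431830 '
    'upstream timed out (110: Connection timed out) while reading response header '
    'from upstream, client: 192.168.30.10, server: , '
    'request: "POST /plugins/dynamix.docker.manager/include/Events.php HTTP/2.0"\n'
    'Oct 26 17:18:25 Highlander nginx: 2018/10/26 17:18:25 [error] 7448#7448: *431830 '
    'upstream timed out (110: Connection timed out) while reading response header '
    'from upstream, client: 192.168.30.10, server: , '
    'request: "POST /webGui/include/DashUpdate.php HTTP/2.0"\n'
)

for path, hits in count_timeouts(sample).most_common():
    print(hits, path)
```

In practice the text would come from the syslog inside the diagnostics zip rather than from a hard-coded sample.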


A huge number of log entries are:

Oct 25 13:43:11 Highlander kernel: DMAR: DRHD: handling fault status reg 2
Oct 25 13:43:11 Highlander kernel: DMAR: [DMA Write] Request device [08:00.0] fault addr 2187c3000 [fault reason 02] Present bit in context entry is clear

Device [08:00.0] is a USB controller:

08:00.0 USB controller [0c03]: ASMedia Technology Inc. ASM1142 USB 3.1 Host Controller [1b21:1242]
	Subsystem: ASUSTeK Computer Inc. ASM1142 USB 3.1 Host Controller [1043:8675]
	Kernel driver in use: vfio-pci

Two things to try.  First, you can try adding this to the kernel append line in syslinux:

iommu=pt

Or, don't try to pass that controller through to a VM.
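
To get a feel for how many of these faults a syslog contains and which devices they point at, the entries can be counted per PCI address. The following is a minimal Python sketch, not anything shipped with unRAID: `dmar_faults` is a hypothetical helper, and the regex assumes the kernel message layout seen in the two lines quoted above.

```python
import re
from collections import Counter

# Captures the PCI address and fault reason from kernel DMAR fault lines.
DMAR_RE = re.compile(
    r"DMAR: \[DMA (?:Read|Write)\] Request device \[([0-9a-f:.]+)\] "
    r"fault addr [0-9a-f]+ \[fault reason (\d+)\]"
)


def dmar_faults(log_text):
    """Count DMAR faults per (device, fault reason) pair."""
    return Counter(DMAR_RE.findall(log_text))


# The two log lines quoted above.
sample = """\
Oct 25 13:43:11 Highlander kernel: DMAR: DRHD: handling fault status reg 2
Oct 25 13:43:11 Highlander kernel: DMAR: [DMA Write] Request device [08:00.0] fault addr 2187c3000 [fault reason 02] Present bit in context entry is clear
"""

for (device, reason), hits in dmar_faults(sample).most_common():
    print(f"{device} reason {reason}: {hits}")
```

The address it reports can then be matched against the lspci output, as was done for 08:00.0 above.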

8 minutes ago, limetech said:

Two things to try.  First, you can try adding this to the kernel append line in syslinux:


iommu=pt

 

Thanks - like this?

 

default menu.c32
menu title Highlander Boot Options
prompt 0
timeout 80
label unRAID OS (stubbed)
  menu default
  kernel /bzimage
  append vfio-pci.ids=8086:1521,8086:8d20,1b21:1242 initrd=/bzroot iommu=pt
label unRAID OS GUI Mode (stubbed)
  kernel /bzimage
  append vfio-pci.ids=8086:1521,8086:8d20,1b21:1242 initrd=/bzroot,/bzroot-gui iommu=pt
label unRAID OS GUI Safe Mode (no plugins or stubs)
  kernel /bzimage
  append initrd=/bzroot,/bzroot-gui unraidsafemode
label unRAID OS Safe Mode (no plugins, no GUI, no stubs)
  kernel /bzimage
  append initrd=/bzroot unraidsafemode
label Memtest86+
  kernel /memtest
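
A quick way to double-check which boot entries actually carry the flag is to list the tokens on each label's append line. This is only a sketch in Python: `append_flags` is a made-up helper, it assumes the simple label/append layout shown in the config above, and the sample is an excerpt of that config rather than the whole file.

```python
def append_flags(cfg_text):
    """Map each boot label to the list of tokens on its append line."""
    flags = {}
    label = None
    for raw in cfg_text.splitlines():
        line = raw.strip()
        if line.startswith("label "):
            label = line[len("label "):]
            flags[label] = []  # a label may have no append line (e.g. Memtest86+)
        elif line.startswith("append ") and label is not None:
            flags[label] = line.split()[1:]
    return flags


# Excerpt of the config posted above.
sample = """\
label unRAID OS (stubbed)
  menu default
  kernel /bzimage
  append vfio-pci.ids=8086:1521,8086:8d20,1b21:1242 initrd=/bzroot iommu=pt
label unRAID OS Safe Mode (no plugins, no GUI, no stubs)
  kernel /bzimage
  append initrd=/bzroot unraidsafemode
"""

for name, tokens in append_flags(sample).items():
    print(name, "->", "iommu=pt" in tokens)
```

Since syslinux splits the append line on whitespace, the position of `iommu=pt` relative to `initrd=` should not matter.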

 

10 minutes ago, limetech said:

Or, don't try to pass that controller through to a VM.

I need to pass the USB controller through if possible, not just for convenience but because my Logitech C920 webcam doesn't work when assigned via the VM manager.


The syslinux change didn't fix the problem - dockers are still timing out and I'm still struggling to use the webUI, the dashboard and docker pages in particular.  Diags attached.

 

highlander-diagnostics-20181028-0811.zip

 

@limetech I'm going to try not passing the USB controller through to a VM today, as suggested, but I don't think that's the problem.  The timestamps for the DMAR errors match the times when I failed several times to delete or make changes to a VM using that controller, i.e. I think that's probably a different problem.


Hmm, this is weird.  I removed the USB controller from my syslinux, but it's still available to pass through to a VM?  I'm pretty sure this shouldn't be possible, or it never used to be?  I'm now wondering if the stubbing was the source of my problems, because I was having problems hot-plugging a USB keyboard on that controller, which now works without the stub.  I'll run this way for a bit to see if things get better with dockers and the UI.

 

The problem that I think caused the DMAR faults above is still there though.  I have two VM profiles that use the same image file (one has an extra keyboard passed through for when I play LEGO Star Wars with my kids, which requires keyboard sharing, so we use two keyboards to make it easier).  Now that I can hot-plug the 2nd keyboard properly, as per above, I tried 'Remove VM' (not 'Remove VM & Disks') but it's just spinning round and round.

 

highlander-diagnostics-20181028-0905.zip


6 hour update:  Looking very good so far with no issues - starting and stopping dockers has been fine, and dockers/unRAID have not been dropping out and have been much snappier, e.g. navigating plex is at least 3x faster, and producing diags was instant rather than taking a minute or two.  It's given my machine a new lease of life.

 

When did it become possible to pass through USB controllers without stubbing?

 

The only problem, probably unrelated, is that I can't delete the unwanted VM profile - is there a safe way to do it manually outside the GUI?

 

Will keep going for another day, but this is looking promising.

 

highlander-syslog-20181028-1516.zip

Edited by DZMM