DZMM Posted October 11, 2018 (edited) I've posted about this problem before and not found a solution, but after my experience tonight I really need a fix/help please. I stupidly made a mistake when changing a CPU pin via the new CPU pinning page, which meant I had to reinstall my dockers. This has taken me over 2 hours, and I've been stuck on the last one for about 30 mins because my dockers keep freezing/locking up/becoming unresponsive. During this period I haven't been able to use my other dockers as they are so slow (Edit: forgot to add that the webui is slow as well). This has been happening intermittently for the last couple of months and is a real pain in the ass. Help please. highlander-diagnostics-20181011-2303.zip
JorgeB Posted October 11, 2018 You appear to be running a btrfs balance, and it has been going for some time. While it's running it can be normal for the server to be much slower due to high I/O. Wait for the balance to finish, or pause it, and see if it makes a difference.
DZMM Posted October 11, 2018 5 minutes ago, johnnie.black said: "You appear to be running a btrfs balance and it's going for some time, while it's running it might be normal for the server to be much slower due to high I/O, wait for the balance to finish or pause it and see if it makes a difference." Will do, but I only started the balance after reading the other post, and I've been having the problem for a long time. Once the balance has finished, I will report back when the problem starts again and re-post diagnostics.
DZMM Posted October 12, 2018 Updated diagnostics after the balance finished - still having problems accessing the webui and dockers. highlander-diagnostics-20181012-1025.zip
JorgeB Posted October 12, 2018 There are some nginx errors; I'm not sure if they are important or not. Does rebooting fix the problem?
DZMM Posted October 12, 2018 (edited) 16 minutes ago, johnnie.black said: "There are some nginx errors, not sure if they are important or not, does rebooting fix the problem?" The problem comes and goes. Rebooting doesn't fix it permanently, e.g. last night I rebooted to do a fresh install of all my dockers, which took me around 3 hours and was the final straw. I've had this problem since at least 6.5.2. It's been suggested that I've got something incompatible in Nerd Pack, but the only thing it could be is unionfs, as the only other bits I have installed are unrar, screen and python. I really need unionfs, although I'd love to use mergerfs instead, but I don't know how to install it. It's so bad I'm tempted to even try a fresh unRAID installation as I can't think of anything else to do. I've fired off a couple of bug reports to limetech but I've had no response.
JorgeB Posted October 12, 2018 1 minute ago, DZMM said: "I'm tempted to even try a fresh unRAID installation" That, or disable all dockers/plugins and re-enable them one by one.
DZMM Posted October 12, 2018 I've removed python from Nerd Pack as I don't need it. Just after rebooting I saw this in my logs:
Oct 12 12:05:49 Highlander kernel: TCP: request_sock_TCP: Possible SYN flooding on port 19182. Sending cookies. Check SNMP counters.
19182 is the port I use for inbound torrents. Maybe deluge is using too many connections with the max set at 1200? I'm going to stop deluge for a bit and then reduce the max to 600 to see if that's the cause, although I doubt it, as last night deluge was one of the last dockers I tried to re-install.
DZMM Posted October 12, 2018 @bonienl I was checking my SSL/TLS settings and I noticed that the instructions for pfsense users to add:
server:
private-domain: "unraid.net"
have gone - should I remove this in pfsense? Thanks
DZMM Posted October 14, 2018 Still a pain, randomly losing access to dockers and all of the webui. highlander-diagnostics-20181014-0825.zip
DZMM Posted October 26, 2018 Still having no joy - pleading for help from the forum or @limetech as it's super-frustrating when I can't get into dockers or the dashboard:
Oct 26 17:18:12 Highlander nginx: 2018/10/26 17:18:12 [error] 7448#7448: *431830 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 192.168.30.10, server: , request: "POST /plugins/dynamix.docker.manager/include/Events.php HTTP/2.0", upstream: "fastcgi://unix:/var/run/php5-fpm.sock", host: "1d087a25aac48109ee9a15217a105d14c06e02a6.unraid.net", referrer: "https://1d087a25aac48109ee9a15217a105d14c06e02a6.unraid.net/Dashboard"
Oct 26 17:18:25 Highlander nginx: 2018/10/26 17:18:25 [error] 7448#7448: *431830 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 192.168.30.10, server: , request: "POST /webGui/include/DashUpdate.php HTTP/2.0", upstream: "fastcgi://unix:/var/run/php5-fpm.sock", host: "1d087a25aac48109ee9a15217a105d14c06e02a6.unraid.net", referrer: "https://1d087a25aac48109ee9a15217a105d14c06e02a6.unraid.net/Dashboard"
Even collecting diagnostics took forever and needed a few attempts 😞 highlander-diagnostics-20181026-1738.zip
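One way to see which webui endpoints are timing out, and how often, is to tally the nginx errors by request path. The following is a minimal sketch, assuming the syslog has been extracted from the diagnostics zip as plain text; the function name `timed_out_requests` and the regular expression are illustrative and are based only on the two log lines quoted above.

```python
import re
from collections import Counter

# Pattern inferred from the nginx error lines quoted in the post above
# (an assumption; other nginx versions may format the message differently).
UPSTREAM_TIMEOUT = re.compile(
    r'upstream timed out .*? request: "(?P<method>[A-Z]+) (?P<path>\S+) HTTP/[\d.]+"'
)

def timed_out_requests(lines):
    """Count 'upstream timed out' errors per requested path."""
    counts = Counter()
    for line in lines:
        match = UPSTREAM_TIMEOUT.search(line)
        if match:
            counts[match.group("path")] += 1
    return counts

if __name__ == "__main__":
    # Shortened copies of the two lines from the post, used as sample input.
    sample = [
        'Oct 26 17:18:12 Highlander nginx: [error] upstream timed out (110: Connection timed out) '
        'while reading response header from upstream, client: 192.168.30.10, server: , '
        'request: "POST /plugins/dynamix.docker.manager/include/Events.php HTTP/2.0"',
        'Oct 26 17:18:25 Highlander nginx: [error] upstream timed out (110: Connection timed out) '
        'while reading response header from upstream, client: 192.168.30.10, server: , '
        'request: "POST /webGui/include/DashUpdate.php HTTP/2.0"',
    ]
    for path, n in timed_out_requests(sample).most_common():
        print(n, path)
```

To run it against a real log, pass an open file object instead of the sample list; a tally dominated by one or two paths would suggest a specific page or plugin rather than a general slowdown.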
limetech Posted October 26, 2018 A huge number of log entries are:
Oct 25 13:43:11 Highlander kernel: DMAR: DRHD: handling fault status reg 2
Oct 25 13:43:11 Highlander kernel: DMAR: [DMA Write] Request device [08:00.0] fault addr 2187c3000 [fault reason 02] Present bit in context entry is clear
Device [08:00.0] is a USB controller:
08:00.0 USB controller [0c03]: ASMedia Technology Inc. ASM1142 USB 3.1 Host Controller [1b21:1242]
Subsystem: ASUSTeK Computer Inc. ASM1142 USB 3.1 Host Controller [1043:8675]
Kernel driver in use: vfio-pci
Two things to try. First, you can try adding this to the kernel append line in syslinux:
iommu=pt
Or, don't try to pass that controller through to a VM.
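To get a rough idea of how many of these DMAR faults there are and which device they come from, the entries can be counted per PCI address. This is only a sketch under the assumption that the fault lines always look like the one quoted above; `count_dmar_faults` and its regular expression are hypothetical names, not part of any unRAID tooling.

```python
import re
from collections import Counter

# Matches the kernel line format quoted above, e.g.
# "DMAR: [DMA Write] Request device [08:00.0] fault addr 2187c3000 ..."
# (format assumed from that single example).
DMAR_FAULT = re.compile(
    r"DMAR: \[DMA (?P<op>Read|Write)\] Request device \[(?P<dev>[0-9a-f:.]+)\]"
)

def count_dmar_faults(lines):
    """Count DMAR DMA faults, keyed by (PCI address, operation)."""
    counts = Counter()
    for line in lines:
        match = DMAR_FAULT.search(line)
        if match:
            counts[(match.group("dev"), match.group("op"))] += 1
    return counts

if __name__ == "__main__":
    # The two syslog lines from the post, used as sample input.
    sample = [
        "Oct 25 13:43:11 Highlander kernel: DMAR: DRHD: handling fault status reg 2",
        "Oct 25 13:43:11 Highlander kernel: DMAR: [DMA Write] Request device [08:00.0] "
        "fault addr 2187c3000 [fault reason 02] Present bit in context entry is clear",
    ]
    for (dev, op), n in count_dmar_faults(sample).most_common():
        print(f"{dev} DMA {op}: {n}")
```

Comparing the counts before and after a change such as adding `iommu=pt` gives a quick indication of whether the faults have actually stopped.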
DZMM Posted October 26, 2018 8 minutes ago, limetech said: "Two things to try. First you can try adding this to kernel append line in syslinux: iommu=pt" Thanks - like this?
default menu.c32
menu title Highlander Boot Options
prompt 0
timeout 80
label unRAID OS (stubbed)
  menu default
  kernel /bzimage
  append vfio-pci.ids=8086:1521,8086:8d20,1b21:1242 initrd=/bzroot iommu=pt
label unRAID OS GUI Mode (stubbed)
  kernel /bzimage
  append vfio-pci.ids=8086:1521,8086:8d20,1b21:1242 initrd=/bzroot,/bzroot-gui iommu=pt
label unRAID OS GUI Safe Mode (no plugins or stubs)
  kernel /bzimage
  append initrd=/bzroot,/bzroot-gui unraidsafemode
label unRAID OS Safe Mode (no plugins, no GUI, no stubs)
  kernel /bzimage
  append initrd=/bzroot unraidsafemode
label Memtest86+
  kernel /memtest
10 minutes ago, limetech said: "Or, don't try to pass that controller through to a VM." I need to pass through the USB controller if possible, not just for convenience, but because my Logitech C920 webcam doesn't work when assigned via the VM manager.
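A quick way to double-check a syslinux config like the one posted above is to list any boot entries that stub devices with `vfio-pci.ids=` but lack `iommu=pt` on their append line. The sketch below is illustrative only: `labels_missing_param` is a hypothetical helper, and it assumes the simple `label` / `append` layout shown in the post.

```python
def labels_missing_param(cfg_text, param="iommu=pt", only_if="vfio-pci.ids="):
    """Return labels whose append line has an `only_if` argument but not `param`."""
    missing = []
    label = None
    for raw in cfg_text.splitlines():
        line = raw.strip()
        if line.lower().startswith("label "):
            # Remember which boot entry the following lines belong to.
            label = line[len("label "):].strip()
        elif line.lower().startswith("append ") and label is not None:
            args = line.split()[1:]
            stubbed = any(arg.startswith(only_if) for arg in args)
            if stubbed and param not in args:
                missing.append(label)
    return missing

if __name__ == "__main__":
    # Excerpt of the config from the post above.
    cfg = """\
label unRAID OS (stubbed)
  menu default
  kernel /bzimage
  append vfio-pci.ids=8086:1521,8086:8d20,1b21:1242 initrd=/bzroot iommu=pt
label unRAID OS GUI Mode (stubbed)
  kernel /bzimage
  append vfio-pci.ids=8086:1521,8086:8d20,1b21:1242 initrd=/bzroot,/bzroot-gui iommu=pt
label unRAID OS Safe Mode (no plugins, no GUI, no stubs)
  kernel /bzimage
  append initrd=/bzroot unraidsafemode
"""
    print(labels_missing_param(cfg))
```

For the config as posted, both stubbed entries already carry `iommu=pt`, so the helper returns an empty list; the safe mode entries are skipped because they do not stub any devices.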
limetech Posted October 26, 2018 3 minutes ago, DZMM said: "Thanks - like this?" Yup
limetech Posted October 26, 2018 4 minutes ago, DZMM said: "I need to passthrough the USB controller if possible not just for convenience, but because my logitech C920 webcam doesn't work when assigned via the VM manager." Trying it without the passthrough, just to see if it makes any difference, would help isolate the issue.
DZMM Posted October 26, 2018 Thanks - rebooting now. Fingers crossed.
DZMM Posted October 26, 2018 19 minutes ago, limetech said: "Doing so to see if it makes any difference would help isolating the issue." NP - will try that next if the syslinux change doesn't work.
DZMM Posted October 28, 2018 The syslinux change didn't fix the problem - dockers are still timing out and I'm still struggling to use the webui, the dashboard and docker pages in particular. Diags attached: highlander-diagnostics-20181028-0811.zip @limetech I'm going to try not passing the USB controller through to a VM today, as suggested, but I don't think that's the problem. The timestamps of the DMAR errors match the times when I failed several times to delete/make changes to a VM using that controller, i.e. I think that's probably a different problem.
DZMM Posted October 28, 2018 Hmm, this is weird. I removed the USB controller from my syslinux, but it's still available to pass through to a VM? I'm pretty sure this isn't possible, or it never used to be? I'm now wondering if the stubbing was the source of my problems, because I was having problems hot-plugging a USB keyboard on that controller, and that now works without the stub. I'll run this way for a bit to see if things get better with dockers and the UI. The problem which I think was the cause of the DMAR faults above is still there though. I have two VM profiles that use the same image file (one has an extra keyboard passed through for when I play LEGO Star Wars with my kids, which requires keyboard sharing, so we use two keyboards to make it easier). Now that I can hot-plug the 2nd keyboard properly as per above, I tried to 'Remove VM' (not 'Remove VM & Disks'), but it's just spinning round and round. highlander-diagnostics-20181028-0905.zip
DZMM Posted October 28, 2018 (edited) 6 hour update: Looking very good so far with no issues - starting and stopping dockers has been fine, and dockers/unRAID have not been dropping out and have been much snappier, e.g. navigating plex is at least 3 times faster, and producing diags was instant rather than taking a minute or two. It's given my machine a new lease of life. When did it become possible to pass through USB controllers without stubbing? The only, probably unrelated, problem is that I can't delete the unwanted VM profile - is there a safe way to do this manually outside the GUI? Will keep going for another day, but this is looking promising. highlander-syslog-20181028-1516.zip
DZMM Posted October 29, 2018 Ok, not passing through the USB controller didn't fix the problem - I've just had a long period where dockers and the GUI weren't available. highlander-diagnostics-20181029-1555.zip