SigmaInigma
Posted June 4, 2021

I've been using Unraid for over 4 years now and it has always been amazing, but recently I have been experiencing some system instability and I could use some help figuring out what's going on. I'm not sure exactly when this started, but it's been roughly two months. About once per week my Unraid system becomes unresponsive and I am unable to access any of my Docker containers or even the Unraid web UI. To recover I typically just hold the power button on the system and reboot it, but this is happening often enough that it's become very annoying.

Since the syslog gets wiped on reboot I have had trouble diagnosing the issue, so I recently enabled writing the syslog to the flash drive in order to persist the logs. It has happened twice since I started doing that. Here are the last few lines before the system becomes unresponsive:

First occurrence:

May 22 03:00:12 Tower Docker Auto Update: Community Applications Docker Autoupdate finished
May 22 03:40:01 Tower crond[1996]: exit status 1 from user root /usr/local/sbin/mover &> /dev/null

Second occurrence:

Jun 3 00:00:02 Tower Plugin Auto Update: Community Applications Plugin Auto Update finished
Jun 3 02:00:55 Tower kernel: XFS (nvme0n1p1): Metadata corruption detected at xfs_dinode_verify+0xa3/0x581 [xfs], inode 0x3042f953 dinode
Jun 3 02:00:55 Tower kernel: XFS (nvme0n1p1): Unmount and run xfs_repair
Jun 3 02:00:55 Tower kernel: XFS (nvme0n1p1): First 128 bytes of corrupted metadata buffer:
Jun 3 02:00:55 Tower kernel: 00000000: 49 4e 81 a4 03 02 00 00 00 00 00 63 00 00 00 64  IN.........c...d
Jun 3 02:00:55 Tower kernel: 00000010: 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00  ................
Jun 3 02:00:55 Tower kernel: 00000020: 60 9c 7e 83 15 70 32 44 60 9c 7e 83 15 70 32 44  `.~..p2D`.~..p2D
Jun 3 02:00:55 Tower kernel: 00000030: 60 9c 7e 83 15 70 32 44 00 00 00 00 00 02 83 70  `.~..p2D.......p
Jun 3 02:00:55 Tower kernel: 00000040: 00 00 00 00 00 00 00 29 00 00 00 00 00 00 00 01  .......)........
Jun 3 02:00:55 Tower kernel: 00000050: 00 00 00 02 00 00 00 00 00 00 00 00 b4 55 9d fd  .............U..
Jun 3 02:00:55 Tower kernel: 00000060: ff ff ff ff 1e 67 d2 f9 00 00 00 00 00 00 00 07  .....g..........
Jun 3 02:00:55 Tower kernel: 00000070: 00 00 00 34 00 00 a3 b4 00 00 00 00 00 00 00 00  ...4............
Jun 3 03:00:01 Tower Docker Auto Update: Community Applications Docker Autoupdate running
Jun 3 03:00:01 Tower Docker Auto Update: Checking for available updates
Jun 3 03:00:04 Tower Docker Auto Update: No updates will be installed
Jun 3 03:40:01 Tower crond[1924]: exit status 1 from user root /usr/local/sbin/mover &> /dev/null
Jun 3 05:00:34 Tower root: /etc/libvirt: 125.7 MiB (131846144 bytes) trimmed on /dev/loop3
Jun 3 05:00:34 Tower root: /var/lib/docker: 2.8 GiB (2966016000 bytes) trimmed on /dev/loop2
Jun 3 05:00:34 Tower root: /mnt/cache: 448.4 GiB (481491525632 bytes) trimmed on /dev/nvme0n1p1

This line seems to be the common link between the two occurrences, but I'm not sure whether the corrupted metadata buffer from the second occurrence is also a concern:

Jun 3 03:40:01 Tower crond[1924]: exit status 1 from user root /usr/local/sbin/mover &> /dev/null

Here is my system info:

Unraid Version: 6.9.2
Model: Custom
M/B: Micro-Star International Co., Ltd. Z370 GAMING M5 (MS-7B58) Version 1.0 - s/n: I316865908
BIOS: American Megatrends Inc. Version 1.A0. Dated: 06/08/2020
CPU: Intel® Core™ i7-8700K CPU @ 3.70GHz
HVM: Enabled
IOMMU: Disabled
Cache: 384 KiB, 1536 KiB, 12 MB
Memory: 16 GiB DDR4 (max. installable capacity 64 GiB)
Network: eth0: 10000 Mbps, full duplex, mtu 1500; eth1: interface down
Kernel: Linux 5.10.28-Unraid x86_64
OpenSSL: 1.1.1j

Any idea what could be going on? I've uploaded the syslog from my flash drive with the persisted information, as well as the diagnostics zip file, but the zip was generated after the reboot, so its syslog does not contain what happened prior to the reboot. One other thing to note: I recently switched to a different server to run Unraid, and while the issues I mentioned did not start immediately after the switch, I'm wondering if that might have anything to do with it. Let me know if there is any more information I can provide. Thanks!

tower-diagnostics-20210604-1140.zip
syslog (3)
JorgeB
Posted June 4, 2021

Macvlan call traces are usually the result of having Docker containers with a custom IP address; more info below:
https://forums.unraid.net/topic/70529-650-call-traces-when-assigning-ip-address-to-docker-containers/

See also here:
https://forums.unraid.net/bug-reports/stable-releases/690691-kernel-panic-due-to-netfilter-nf_nat_setup_info-docker-static-ip-macvlan-r1356/

5 minutes ago, SigmaInigma said:
XFS (nvme0n1p1): Metadata corruption detected at xfs_dinode_verify+0xa3/0x581

This also means you need to run a filesystem check on that device; the corruption is likely the result of the unclean shutdowns.
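For reference, a sketch of how that check could look from the Unraid console. The device name comes from the syslog above; the array must be stopped or in maintenance mode so the filesystem is not mounted, and this is a starting point rather than a definitive procedure:

```shell
# Dry run first: -n reports problems without modifying the filesystem.
# nvme0n1p1 is the cache device named in the kernel messages above.
xfs_repair -n /dev/nvme0n1p1

# If corruption is confirmed, run the actual repair (only with the
# filesystem unmounted, i.e. array stopped or in maintenance mode):
xfs_repair /dev/nvme0n1p1
```

On Unraid the same check can also be started from the GUI via the device's "Check Filesystem Status" section while the array is in maintenance mode.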
SigmaInigma
Posted June 4, 2021 (Author)

49 minutes ago, JorgeB said:
Macvlan call traces are usually the result of having dockers with a custom IP address... Also this means you need to run a filesystem check on this device, this is likely the result of the unclean shutdowns.

Did you see any macvlan call traces in my log? I might be missing it, but I didn't see anything like that. I do have one of my containers on the br0 network with a custom IP though, so maybe that is causing the issue?

Container         Network   IP              Ports
ApacheGuacamole   br0       192.168.117.11  8080
CUPS              bridge    192.168.117.10  631
Dolphin           bridge    192.168.117.10  8080
EAPcontroller     host      192.168.117.10  ???
heimdall          proxynet  192.168.117.10  8143, 8180
letsencrypt       proxynet  192.168.117.10  180, 1443
ombi              proxynet  192.168.117.10  3579
plex              host      192.168.117.10  1900, 3005, 5353, 8324, 32400, 32410, 32412, 32413, 32414, 32469
radarr            proxynet  192.168.117.10  787
radarr-uhd        proxynet  192.168.117.10  7879
sabnzbd           proxynet  192.168.117.10  7070, 9090
sonarr            proxynet  192.168.117.10  8989
unifi-controller  bridge    192.168.117.10  3478, 8080, 8443, 8843, 8880, 10001
JorgeB
Posted June 4, 2021

8 minutes ago, SigmaInigma said:
Did you see any Macvlan call traces in my log?

It's at the top of the log, line 10 or so.
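For anyone else hunting for these in a persisted syslog, a quick grep is enough. The sample lines below are illustrative stand-ins, not copied from the actual log (a real macvlan call trace is much longer):

```shell
# Write an illustrative stand-in for a persisted syslog; the trace
# lines here are hypothetical examples of what to look for.
cat > /tmp/sample_syslog <<'EOF'
Jun  3 02:10:11 Tower kernel: Call Trace:
Jun  3 02:10:11 Tower kernel: macvlan_broadcast+0x116/0x144 [macvlan]
Jun  3 03:00:01 Tower Docker Auto Update: Checking for available updates
EOF

# Case-insensitive search for call traces and macvlan frames,
# with line numbers so they are easy to locate in the full log:
grep -n -i -E 'call trace|macvlan' /tmp/sample_syslog
```

On a real system the persisted log lives on the flash drive (under /boot/logs when syslog mirroring is enabled), so point grep there instead of the sample file.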
SigmaInigma
Posted June 4, 2021 (Author)

Got it, I'll try moving this container to the bridge network instead and see if that fixes it. Thanks!
John_M
Posted June 4, 2021

4 hours ago, SigmaInigma said:
Jun 3 03:40:01 Tower crond[1924]: exit status 1 from user root /usr/local/sbin/mover &> /dev/null

This is nothing to worry about. It's the standard log entry when the Mover finds nothing to move. The non-zero exit status makes it look like an error, but it isn't.
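As a tiny illustration of why that entry looks scarier than it is: cron logs any non-zero exit status from a job, even when the command simply had nothing to do. A hypothetical stand-in for the mover (not the real script):

```shell
# Stand-in for /usr/local/sbin/mover exiting 1 because there was
# nothing to move; cron reports any non-zero status the same way.
sh -c 'exit 1' > /dev/null 2>&1
status=$?
echo "exit status $status from user root (simulated mover run)"
# → exit status 1 from user root (simulated mover run)
```

The output is discarded just as in the crontab entry (`&> /dev/null`), so the only trace left is cron's "exit status 1" line in the syslog.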