Horrible Performance after updating to 6.12 RC6

Stubbs · May 21, 2023

I recently updated from 6.11.5 stable to 6.12 RC6, and since then Unraid performance has been shocking. The WebUI hangs, SSH connections hang, Docker containers randomly flicker between working and not responding, and the "Main" page often takes forever to load my array information. It gets especially bad when I stop the array; every single action seems to take forever while the array is stopped.

To recount my actions before and during the update:

I unstubbed my HBA Card. I was originally passing it through to a TrueNAS VM for a test zpool, but decided to let Unraid use it because of 6.12's zfs support.
I moved the HBA card from the second to the third (bottom) x16 PCIe slot. Since Unraid will be using this card directly, I no longer have to worry about IOMMU groups; the bottom slot was always hard to separate from other interfaces.
I installed a new Intel Optane P1600X M.2 SSD in my motherboard's M.2 slot.

The system log is full of entries like these. I don't know if they're relevant:


May 21 21:49:11 Tower kernel: device veth7ada777 left promiscuous mode
May 21 21:49:11 Tower kernel: docker0: port 10(veth7ada777) entered disabled state
May 21 21:49:12 Tower kernel: docker0: port 10(vethb79be41) entered blocking state
May 21 21:49:12 Tower kernel: docker0: port 10(vethb79be41) entered disabled state
May 21 21:49:12 Tower kernel: device vethb79be41 entered promiscuous mode
May 21 21:49:12 Tower kernel: docker0: port 10(vethb79be41) entered blocking state
May 21 21:49:12 Tower kernel: docker0: port 10(vethb79be41) entered forwarding state
May 21 21:49:12 Tower kernel: docker0: port 10(vethb79be41) entered disabled state
May 21 21:49:12 Tower kernel: eth0: renamed from vethecd0195
May 21 21:49:12 Tower kernel: IPv6: ADDRCONF(NETDEV_CHANGE): vethb79be41: link becomes ready
May 21 21:49:12 Tower kernel: docker0: port 10(vethb79be41) entered blocking state
May 21 21:49:12 Tower kernel: docker0: port 10(vethb79be41) entered forwarding state
May 21 21:50:35 Tower kernel: docker0: port 1(vethbcd1caf) entered disabled state
May 21 21:50:35 Tower kernel: vetha908dc1: renamed from eth0
May 21 21:50:35 Tower kernel: docker0: port 1(vethbcd1caf) entered disabled state
May 21 21:50:35 Tower kernel: device vethbcd1caf left promiscuous mode
May 21 21:50:35 Tower kernel: docker0: port 1(vethbcd1caf) entered disabled state
May 21 21:50:45 Tower kernel: docker0: port 10(vethb79be41) entered disabled state
May 21 21:50:45 Tower kernel: vethecd0195: renamed from eth0
May 21 21:50:45 Tower kernel: docker0: port 10(vethb79be41) entered disabled state
May 21 21:50:45 Tower kernel: device vethb79be41 left promiscuous mode
May 21 21:50:45 Tower kernel: docker0: port 10(vethb79be41) entered disabled state
May 21 21:51:44 Tower kernel: docker0: port 1(vethe1280f3) entered blocking state
May 21 21:51:44 Tower kernel: docker0: port 1(vethe1280f3) entered disabled state
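
One rough way to judge whether entries like these are relevant is to count how often veth interfaces join and leave the Docker bridge. The sketch below is only an illustration: the function name `veth_churn` is made up, and it assumes each "entered promiscuous mode" line corresponds to a container attaching to the bridge and each "left promiscuous mode" line to one detaching. A high count over a short period would suggest containers are restarting rather than running steadily.

```python
import re
from collections import Counter

# Matches kernel lines such as:
#   "May 21 21:49:12 Tower kernel: device vethb79be41 entered promiscuous mode"
PROMISC_RE = re.compile(r"kernel: device (veth\w+) (entered|left) promiscuous mode")

def veth_churn(lines):
    """Count veth interfaces joining ('entered') and leaving ('left') a bridge."""
    counts = Counter()
    for line in lines:
        m = PROMISC_RE.search(line)
        if m:
            counts[m.group(2)] += 1
    return counts
```

It could be run over the full syslog with something like `veth_churn(open("syslog.txt"))`, where the file name is a placeholder for wherever the log was saved.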

I attached two diagnostics. The one marked "initial" was right after the update when I booted the server back up. The one marked "21-05-2023" is one I initiated just now, with the whole system running horribly.

(21-05-2023)tower-diagnostics-20230521-1505.zip initial-diagnostics-tower-diagnostics-20230520-0359.zip

JorgeB · May 21, 2023

Try booting in safe mode first to rule out any plugin issues. Also, whatever is causing these might be a problem:

May 20 14:38:53 Tower unassigned.devices: Warning: shell_exec(/sbin/arp -an 'JOHN-PC.local' 2>&1) took longer than 10s!
### [PREVIOUS LINE REPEATED 2 TIMES] ###
May 20 14:38:54 Tower unassigned.devices: Warning: shell_exec(/usr/bin/nmblookup 'TRUENAS' | /bin/head -n1 | /bin/awk '{print $1}' 2>/dev/null) took longer than 5s!
May 20 14:38:54 Tower unassigned.devices: Warning: shell_exec(/sbin/arp -a 'TRUENAS' 2>&1) took longer than 15s!
May 20 14:38:57 Tower inotifywait[7344]: Watches established.
May 20 14:39:02 Tower unassigned.devices: Warning: shell_exec(/usr/bin/nmblookup 'JOHN-PC' | /bin/head -n1 | /bin/awk '{print $1}' 2>/dev/null) took longer than 5s!
### [PREVIOUS LINE REPEATED 2 TIMES] ###
May 20 14:39:04 Tower unassigned.devices: Warning: shell_exec(/sbin/arp -an 'TRUENAS.local' 2>&1) took longer than 10s!
May 20 14:39:11 Tower unassigned.devices: Warning: shell_exec(/usr/bin/nmblookup 'TRUENAS' | /bin/head -n1 | /bin/awk '{print $1}' 2>/dev/null) took longer than 5s!
May 20 14:39:17 Tower unassigned.devices: Warning: shell_exec(/sbin/arp -a 'TRUENAS' 2>&1) took longer than 15s!
### [PREVIOUS LINE REPEATED 2 TIMES] ###
May 20 14:39:18 Tower unassigned.devices: Warning: shell_exec(/usr/bin/nmblookup 'JOHN-PC' | /bin/head -n1 | /bin/awk '{print $1}' 2>/dev/null) took longer than 5s!
### [PREVIOUS LINE REPEATED 2 TIMES] ###
May 20 14:39:27 Tower unassigned.devices: Warning: shell_exec(/sbin/arp -an 'TRUENAS.local' 2>&1) took longer than 10s!
### [PREVIOUS LINE REPEATED 2 TIMES] ###
May 20 14:39:29 Tower unassigned.devices: Warning: shell_exec(/usr/bin/nmblookup 'TRUENAS' | /bin/head -n1 | /bin/awk '{print $1}' 2>/dev/null) took longer than 5s!
May 20 14:39:30 Tower unassigned.devices: Remote Share '//JOHN-PC/Camera Roll' is not set to auto mount.
May 20 14:39:30 Tower unassigned.devices: Remote Share '//TRUENAS/Photos' is not set to auto mount.
May 20 14:39:33 Tower unassigned.devices: Warning: shell_exec(/sbin/arp -a 'TRUENAS' 2>&1) took longer than 15s!

dlandon · May 21, 2023

You have an issue with the remote server or the network.

Fiservedpi · May 22, 2023

I'm seeing the same thing out of nowhere after RC6: overall lag that I can't really pinpoint to one thing. Diagnostics attached.

tower-diagnostics-20230522-1806.zip

Stubbs · May 22, 2023

On 5/21/2023 at 7:20 PM, JorgeB said:

Try booting in safe mode first to rule out any plugin issues. Also, whatever is causing these might be a problem:

May 20 14:38:53 Tower unassigned.devices: Warning: shell_exec(/sbin/arp -an 'JOHN-PC.local' 2>&1) took longer than 10s!
### [PREVIOUS LINE REPEATED 2 TIMES] ###
May 20 14:38:54 Tower unassigned.devices: Warning: shell_exec(/usr/bin/nmblookup 'TRUENAS' | /bin/head -n1 | /bin/awk '{print $1}' 2>/dev/null) took longer than 5s!
May 20 14:38:54 Tower unassigned.devices: Warning: shell_exec(/sbin/arp -a 'TRUENAS' 2>&1) took longer than 15s!
May 20 14:38:57 Tower inotifywait[7344]: Watches established.
May 20 14:39:02 Tower unassigned.devices: Warning: shell_exec(/usr/bin/nmblookup 'JOHN-PC' | /bin/head -n1 | /bin/awk '{print $1}' 2>/dev/null) took longer than 5s!
### [PREVIOUS LINE REPEATED 2 TIMES] ###
May 20 14:39:04 Tower unassigned.devices: Warning: shell_exec(/sbin/arp -an 'TRUENAS.local' 2>&1) took longer than 10s!
May 20 14:39:11 Tower unassigned.devices: Warning: shell_exec(/usr/bin/nmblookup 'TRUENAS' | /bin/head -n1 | /bin/awk '{print $1}' 2>/dev/null) took longer than 5s!
May 20 14:39:17 Tower unassigned.devices: Warning: shell_exec(/sbin/arp -a 'TRUENAS' 2>&1) took longer than 15s!
### [PREVIOUS LINE REPEATED 2 TIMES] ###
May 20 14:39:18 Tower unassigned.devices: Warning: shell_exec(/usr/bin/nmblookup 'JOHN-PC' | /bin/head -n1 | /bin/awk '{print $1}' 2>/dev/null) took longer than 5s!
### [PREVIOUS LINE REPEATED 2 TIMES] ###
May 20 14:39:27 Tower unassigned.devices: Warning: shell_exec(/sbin/arp -an 'TRUENAS.local' 2>&1) took longer than 10s!
### [PREVIOUS LINE REPEATED 2 TIMES] ###
May 20 14:39:29 Tower unassigned.devices: Warning: shell_exec(/usr/bin/nmblookup 'TRUENAS' | /bin/head -n1 | /bin/awk '{print $1}' 2>/dev/null) took longer than 5s!
May 20 14:39:30 Tower unassigned.devices: Remote Share '//JOHN-PC/Camera Roll' is not set to auto mount.
May 20 14:39:30 Tower unassigned.devices: Remote Share '//TRUENAS/Photos' is not set to auto mount.
May 20 14:39:33 Tower unassigned.devices: Warning: shell_exec(/sbin/arp -a 'TRUENAS' 2>&1) took longer than 15s!

In safe mode, it took a long time to boot up and become accessible, but once it was, it seemed fine. It's hard to tell, though, because even outside safe mode it can perform fine and then things randomly start going wrong. I've attached two diagnostics, one taken during safe mode and one after.

On 5/21/2023 at 9:02 PM, dlandon said:

You have an issue with the remote server or the network.

I don't understand. What remote server? Network? I didn't have any of these problems while I was on 6.11.5 stable.

I virtualize my router through a pfSense VM with a quad NIC passed through. Again, worked fine on the last update.

(22-05-2023) (reboot post-safemode) tower-diagnostics-20230522-1604.zip (22-05-2023) (safe mode) tower-diagnostics-20230522-1537(non-anon).zip

dlandon · May 22, 2023

52 minutes ago, Stubbs said:

I don't understand. What remote server? Network? I didn't have any of these problems while I was on 6.11.5 stable.

Unassigned Devices is timing out when trying to perform operations on your remote server (TRUENAS).

Stubbs · May 23, 2023

2 hours ago, dlandon said:

Unassigned Devices is timing out when trying to perform operations on your remote server (TRUENAS).

This was a test TrueNAS VM I made and created a mount point for on Unraid. The VM has been off for weeks now, and I deleted that mount point as soon as I started having this issue. I currently have zero unassigned devices, but I'm still having issues with the UI hanging randomly.

Even if Unassigned Devices was trying to perform operations on an offline remote, I don't see why that would cause the whole server to have issues. Nothing I run depended on those unassigned remotes.

dlandon · May 23, 2023

5 minutes ago, Stubbs said:

Even if Unassigned Devices was trying to perform operations on an offline remote, I don't see why that would cause the whole server to have issues. Nothing I run depended on those unassigned remotes.

UD is trying to connect to the TRUENAS server on your network. There seems to be an issue with JOHN-PC also.

Even if UD has nothing mounted, it is still trying to poll your remote servers to get an online status. Remove the remote shares from UD that are no longer being used. That will stop the logging of the UD messages, but probably not solve the server issues. Then reboot and post new diagnostics.
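
To see how much polling time those warnings represent, the log lines can be tallied per host. This is a minimal sketch rather than anything UD provides: the function name `tally_timeouts` is hypothetical, it assumes `NAME` and `NAME.local` are the same machine, and it assumes a "PREVIOUS LINE REPEATED N TIMES" marker means N additional occurrences of the preceding line. Each warning only says a command took longer than its limit, so the seconds figure is a lower bound.

```python
import re
from collections import defaultdict

# First quoted argument of the timed-out command is taken to be the host name.
WARN_RE = re.compile(
    r"unassigned\.devices: Warning: shell_exec\(\S+[^']*'(?P<host>[^']+)'"
    r".*took longer than (?P<limit>\d+)s!"
)
REPEAT_RE = re.compile(r"PREVIOUS LINE REPEATED (\d+) TIMES")

def tally_timeouts(lines):
    """Map host -> [warning count, lower bound on seconds spent waiting]."""
    totals = defaultdict(lambda: [0, 0])
    last = None  # (host, limit) of the most recent warning line, if any
    for line in lines:
        m = WARN_RE.search(line)
        if m:
            host = m["host"].upper()
            # Assumption: 'NAME.local' and 'NAME' refer to the same machine.
            if host.endswith(".LOCAL"):
                host = host[: -len(".LOCAL")]
            last = (host, int(m["limit"]))
            times = 1
        else:
            r = REPEAT_RE.search(line)
            if not (r and last):
                last = None  # unrelated line: a later repeat marker is not ours
                continue
            times = int(r.group(1))  # assumed to mean N extra occurrences
        host, limit = last
        totals[host][0] += times
        totals[host][1] += times * limit
    return dict(totals)
```

Hosts that keep appearing in the result after their shares have been removed would indicate that UD is still polling them.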

dlandon · May 23, 2023

4 hours ago, Fiservedpi said:

I'm seeing the same thing out of nowhere after RC6: overall lag that I can't really pinpoint to one thing. Diagnostics attached.

tower-diagnostics-20230522-1806.zip

Boot in safe mode and see if the issue persists.

Stubbs · May 23, 2023

5 minutes ago, dlandon said:

UD is trying to connect to the TRUENAS server on your network. There seems to be an issue with JOHN-PC also.

Even if UD has nothing mounted, it is still trying to poll your remote servers to get an online status. Remove the remote shares from UD that are no longer being used. That will stop the logging of the UD messages, but probably not solve the server issues. Then reboot and post new diagnostics.

As I said, I deleted both those TrueNAS and JOHN-PC unassigned disks (mount points) as soon as I started having issues. Am I missing something with the deletion process? I don't see anything under /mnt/disks or /mnt/remotes either. I am not seeing any mentions of TRUENAS or JOHN-PC in my system log, which should be reflected in the last two diagnostics files I attached here:

3 hours ago, Stubbs said:

In safe mode, it took a long time to boot up and become accessible, but once it was, it seemed fine. It's hard to tell, though, because even outside safe mode it can perform fine and then things randomly start going wrong. I've attached two diagnostics, one taken during safe mode and one after.

I don't understand. What remote server? Network? I didn't have any of these problems while I was on 6.11.5 stable.

I virtualize my router through a pfSense VM with a quad NIC passed through. Again, worked fine on the last update.

(22-05-2023) (reboot post-safemode) tower-diagnostics-20230522-1604.zip (22-05-2023) (safe mode) tower-diagnostics-20230522-1537(non-anon).zip

dlandon · May 23, 2023

Your screenshot is for UD disks. It looks like you have some remote shares assigned to TRUENAS and JOHN-PC. You need to remove those if they are no longer being used.

Show a full screen shot of UD.

dlandon · May 23, 2023

Ok. Looking at your logs again, I see you took care of the TRUENAS and JOHN-PC issues. I also see this in your logs, which points to a network issue of some sort:

May 23 02:40:01 Tower root: Fix Common Problems: Warning: Share wikijs database set to cache-only, but files / folders exist on the array
May 23 02:40:01 Tower root: Fix Common Problems: Error: Unable to communicate with GitHub.com ** Ignored
May 23 02:40:02 Tower root: Fix Common Problems: Warning: unRaids built in FTP server is currently disabled, but users are defined
May 23 02:40:02 Tower root: Fix Common Problems: Other Warning: Could not check for blacklisted plugins
May 23 02:40:05 Tower root: Fix Common Problems: Other Warning: Background notifications not enabled
May 23 02:40:09 Tower kernel: igb 0000:09:00.0 eth0: igb: eth0 NIC Link is Down
May 23 02:40:09 Tower kernel: bond0: (slave eth0): link status definitely down, disabling slave
May 23 02:40:09 Tower kernel: device eth0 left promiscuous mode
May 23 02:40:09 Tower kernel: bond0: now running without any active interface!
May 23 02:40:09 Tower kernel: br0: port 1(bond0) entered disabled state
May 23 02:40:11 Tower ntpd[1656]: Deleting interface #4 br0, 10.10.20.8#123, interface stats: received=0, sent=0, dropped=0, active_time=804 secs
May 23 02:40:21 Tower kernel: igb 0000:09:00.0 eth0: igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
May 23 02:40:21 Tower kernel: bond0: (slave eth0): link status definitely up, 1000 Mbps full duplex
May 23 02:40:21 Tower kernel: bond0: (slave eth0): making interface the new active one
May 23 02:40:21 Tower kernel: device eth0 entered promiscuous mode
May 23 02:40:21 Tower kernel: bond0: active interface up!
May 23 02:40:21 Tower kernel: br0: port 1(bond0) entered blocking state
May 23 02:40:21 Tower kernel: br0: port 1(bond0) entered forwarding state
May 23 02:40:22 Tower ntpd[1656]: Listen normally on 5 br0 10.10.20.8:123
May 23 02:40:22 Tower ntpd[1656]: new interface(s) found: waking up resolver
May 23 02:40:23 Tower root: Fix Common Problems: Other Warning: Could not perform unknown plugins installed checks
May 23 02:40:23 Tower root: Fix Common Problems: Warning: Share system set to use pool optane, but files / folders exist on the cache pool

At one point you had no active interface for at least twelve seconds. FCP is also struggling to connect to the internet.
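
The gap can be measured directly from the kernel's link-state lines. The following is a minimal sketch, not part of any Unraid tool: the function name `link_outages` is hypothetical, and because syslog timestamps omit the year, one is assumed.

```python
import re
from datetime import datetime

# Matches kernel link-state lines such as:
#   "May 23 02:40:09 Tower kernel: igb 0000:09:00.0 eth0: igb: eth0 NIC Link is Down"
LINK_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2}) .*?"
    r"(?P<iface>\w+) NIC Link is (?P<state>Up|Down)"
)

def link_outages(lines, year=2023):
    """Return (iface, down_time, up_time, seconds) for each Down -> Up pair."""
    down_since = {}
    outages = []
    for line in lines:
        m = LINK_RE.search(line)
        if not m:
            continue
        # Syslog timestamps carry no year, so the caller supplies one.
        ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
        if m["state"] == "Down":
            # Keep the first Down time if several arrive before an Up.
            down_since.setdefault(m["iface"], ts)
        elif m["iface"] in down_since:
            start = down_since.pop(m["iface"])
            outages.append((m["iface"], start, ts, (ts - start).total_seconds()))
    return outages
```

For the two `igb` lines in the excerpt above (Down at 02:40:09, Up at 02:40:21) this reports a single 12-second outage on eth0. Running it over a whole syslog would show whether the link drops are a one-off or recurring.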

Stubbs · May 23, 2023

1 hour ago, dlandon said:

Your screenshot is for UD disks. It looks like you have some remote shares assigned to TRUENAS and JOHN-PC. You need to remove those if they are no longer being used.

Show a full screen shot of UD.

My mistake, I forgot the disk and remote share boxes were separate. Nevertheless, as you found out, the remote shares list was empty too.

38 minutes ago, dlandon said:

Ok. Looking at your logs again, I see you took care of the TRUENAS and JOHN-PC issues. I also see this in your logs, which points to a network issue of some sort:

May 23 02:40:01 Tower root: Fix Common Problems: Warning: Share wikijs database set to cache-only, but files / folders exist on the array
May 23 02:40:01 Tower root: Fix Common Problems: Error: Unable to communicate with GitHub.com ** Ignored
May 23 02:40:02 Tower root: Fix Common Problems: Warning: unRaids built in FTP server is currently disabled, but users are defined
May 23 02:40:02 Tower root: Fix Common Problems: Other Warning: Could not check for blacklisted plugins
May 23 02:40:05 Tower root: Fix Common Problems: Other Warning: Background notifications not enabled
May 23 02:40:09 Tower kernel: igb 0000:09:00.0 eth0: igb: eth0 NIC Link is Down
May 23 02:40:09 Tower kernel: bond0: (slave eth0): link status definitely down, disabling slave
May 23 02:40:09 Tower kernel: device eth0 left promiscuous mode
May 23 02:40:09 Tower kernel: bond0: now running without any active interface!
May 23 02:40:09 Tower kernel: br0: port 1(bond0) entered disabled state
May 23 02:40:11 Tower ntpd[1656]: Deleting interface #4 br0, 10.10.20.8#123, interface stats: received=0, sent=0, dropped=0, active_time=804 secs
May 23 02:40:21 Tower kernel: igb 0000:09:00.0 eth0: igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
May 23 02:40:21 Tower kernel: bond0: (slave eth0): link status definitely up, 1000 Mbps full duplex
May 23 02:40:21 Tower kernel: bond0: (slave eth0): making interface the new active one
May 23 02:40:21 Tower kernel: device eth0 entered promiscuous mode
May 23 02:40:21 Tower kernel: bond0: active interface up!
May 23 02:40:21 Tower kernel: br0: port 1(bond0) entered blocking state
May 23 02:40:21 Tower kernel: br0: port 1(bond0) entered forwarding state
May 23 02:40:22 Tower ntpd[1656]: Listen normally on 5 br0 10.10.20.8:123
May 23 02:40:22 Tower ntpd[1656]: new interface(s) found: waking up resolver
May 23 02:40:23 Tower root: Fix Common Problems: Other Warning: Could not perform unknown plugins installed checks
May 23 02:40:23 Tower root: Fix Common Problems: Warning: Share system set to use pool optane, but files / folders exist on the cache pool

At one point you had no active interface for at least twelve seconds. FCP is also struggling to connect to the internet.

Hopefully I'll figure it out one day.

dlandon · May 23, 2023

On 5/22/2023 at 10:44 PM, Stubbs said:

My mistake, I forgot the disk and remote share boxes were separate. Nevertheless, as you found out, the remote shares list was empty too.

Hopefully I'll figure it out one day.

See if your router or switch has any diagnostic tools like cable tests. Try rebooting all your network equipment.
