GUI unresponsive after updating from 6.11.5 to 6.12.6

bwnautilus · December 3, 2023

This is especially noticeable when selecting the Dashboard, Plugins or Docker tabs. Sometimes it takes up to 45 seconds to render the page. On the Dashboard tab there are always at least 3 CPUs pegged at 100%. The page finishes rendering when the CPUs go back to normal load. I also notice this process in htop that pops to the top while the page is rendering:

/usr/local/bin/unraid-api/unraid-api /snapshot/api/dist/unraid-api.cjs start

I will be rolling back to 6.11.5. Diags attached. Thanks in advance.

mediatower-diagnostics-20231203-1452.zip

ljm42 · December 3, 2023

It looks like you've been with us for a while so I'd recommend navigating to Settings > Docker, switching to advanced view, and changing the "Docker custom network type" from macvlan to ipvlan. This is the default setting that Unraid 6.11 ships with. This will prevent crashes related to macvlan call traces. I'm not sure whether the slowdowns you are seeing are related to macvlan issues.

If that doesn't help, since htop is pointing at the unraid-api I'd suggest uninstalling the Connect plugin to see if that helps. It may be a symptom of the problem and not the cause, but might as well rule it out.

bwnautilus · December 4, 2023

@ljm42 Thanks for your suggestions. After rolling back to 6.11.5 (GUI was back to normal) I changed the Docker settings to ipvlan and removed Connect. Downloaded 6.12.6 and rebooted. With the array not started, the GUI is still unresponsive on Dashboard and Plugins tabs. I will roll back to 6.11.5 again and wait for an updated Unraid release.

ljm42 · December 5, 2023

This feels like something specific to your environment that may not be automatically solved by a new release. It is up to you, but if you'd like to keep going here's what I'd recommend...

Set up a new flash drive with 6.12.6 and boot into a default config. Navigate around the webgui and see how it responds. If this all works fine, then that points to a configuration issue with your server that we can work to isolate.

If it is still slow, I would start by focusing on your client. Try accessing the server from private/incognito mode, or a different browser or even a different computer. The webgui did change, so it is possible that browser extensions or security software on the client could be causing issues with the updated webgui.

Either way, be sure to grab diagnostics while in this state.

aje14700 · January 8

@bwnautilus It seems like I had something similar. My dashboard would take 1-2 minutes to load, and 1 CPU thread would be locked to 100%, the process would die, and then pop back up on a logical CPU.

It _seems_ like it was related to CPU temperature monitoring. The process that was locking up the CPU was `sensors`. It would switch between `sensors -A`, `sensors -u -A`, and `sensors -u -c /tmp/sensors.conf`.

When I tried turning off CPU temp detection, it would settle down for about 20 seconds before starting back up. My CPU is an AMD Ryzen 5 5600G, and I'm using the `k10temp` driver. For me, `6.12.4` is fine, but `6.12.6` has this issue. You could try updating to `6.12.4` and see if the issue still occurs, or try disabling CPU temperature sensing (if turned on) before updating.

aje14700 · January 16

After switching back to 6.12.4, I had 0 issues until about 8 days later. The UI would hang and take forever, and 1 logical core was always pegged at 100%. htop still showed `sensors` sucking up CPU.

I believe I have figured out the issue. At some point, `sensors` can no longer read particular attributes from my CPU (`amdgpu-pci-0900`).

When running `sensors -A`, this was my output:

~# sensors -A
amdgpu-pci-0900
vddgfx:           N/A  
vddnb:            N/A  
edge:             N/A  
PPT:              N/A  

k10temp-pci-00c3
MB Temp:      +33.8°C  

nvme-pci-0800
Composite:    +34.9°C  (low  = -60.1°C, high = +89.8°C)
                       (crit = +94.8°C)

And it would take forever on the amdgpu chip. Running with JSON output gave some extra clues:

# sensors -j
{
   "amdgpu-pci-0900":{
      "Adapter": "PCI adapter",
      "vddgfx":{
ERROR: Can't get value of subfeature in0_input: Can't read

      },
      "vddnb":{
ERROR: Can't get value of subfeature in1_input: Can't read

      },
      "edge":{
ERROR: Can't get value of subfeature temp1_input: Can't read

      },
      "PPT":{
ERROR: Can't get value of subfeature power1_average: Can't read

      }
   },
   "k10temp-pci-00c3":{
      "Adapter": "PCI adapter",
      "MB Temp":{
         "temp1_input": 33.750
      }
   },
   "nvme-pci-0800":{
      "Adapter": "PCI adapter",
      "Composite":{
         "temp1_input": 32.850,
         "temp1_max": 89.850,
         "temp1_min": -60.150,
         "temp1_crit": 94.850,
         "temp1_alarm": 0.000
      }
   }
}
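One way to confirm which chip is slow to read is to query each chip on its own under a time limit. The following is a minimal sketch and not something from the posts in this thread: `slow_chips` and the `SENSORS_CMD` override are hypothetical names, and it assumes the coreutils `timeout` command is available and that `sensors` is given a chip name as its argument.

```shell
#!/bin/bash
# Sketch only: slow_chips and SENSORS_CMD are hypothetical names.
# Usage: slow_chips LIMIT_SECONDS CHIP...
# Prints each chip whose `sensors -A CHIP` call did not finish within the limit.
slow_chips() {
    local limit="$1"
    shift
    # SENSORS_CMD lets a stand-in command be used instead of the real binary.
    local cmd="${SENSORS_CMD:-sensors}"
    local chip rc
    for chip in "$@"; do
        rc=0
        # timeout exits with status 124 when the command is killed at the limit.
        timeout "$limit" "$cmd" -A "$chip" >/dev/null 2>&1 || rc=$?
        if [ "$rc" -eq 124 ]; then
            echo "$chip"
        fi
    done
}
```

With the chip names from the output above, this could be called as `slow_chips 5 amdgpu-pci-0900 k10temp-pci-00c3 nvme-pci-0800`. A chip that is merely slow but finishes within the limit is not reported, so the limit may need adjusting.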

I ended up modifying `/boot/config/plugins/dynamix.system.temp/sensors.conf` to include the following:

chip "amdgpu-pci-0900"
ignore "in0"
ignore "in1"
ignore "temp1"
ignore "power1"

Then, for it to take effect without a reboot, I ran `cp /boot/config/plugins/dynamix.system.temp/sensors.conf /etc/sensors.d/sensor.conf`

And once the file copied, the issue immediately went away. So the issue seems to be related to the chip timing out for sensor readings.
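The `ignore` lines above line up with the subfeatures that failed in the `sensors -j` output (`in0_input` becomes `in0`, `power1_average` becomes `power1`). As a rough sketch, the mapping could be derived from the error text instead of by hand. `ignores_from_errors` is a hypothetical helper name, and it assumes the error lines have exactly the wording shown in this thread. It does not track which chip each error belongs to, so the output still has to be placed under the right `chip` line manually.

```shell
#!/bin/bash
# Sketch only: ignores_from_errors is a hypothetical helper name.
# Reads `sensors -j` output (stdout and stderr mixed) on stdin and prints one
# sensors.conf ignore statement per unreadable feature.
ignores_from_errors() {
    # The feature name is the part of the subfeature before the first
    # underscore, e.g. temp1_input -> temp1.
    sed -n "s/^ERROR: Can't get value of subfeature \([^_ ]*\)_[^:]*: Can't read.*/ignore \"\1\"/p" |
        awk '!seen[$0]++'   # drop repeats, keep first-seen order
}
```

On the server this would be used as `sensors -j 2>&1 | ignores_from_errors`.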

I'm sure there's a better approach, but hopefully this helps @bwnautilus and anyone else running into this issue.

RoTalk · January 16

I am watching this because I might be experiencing the same thing.

I kept seeing 1 logical core pegged in the same environment, and noticed that I suddenly couldn't connect to or get into the server. I'd try from another laptop, only for it to freeze.

I went as far as changing the Docker/VLAN settings and replacing my thumb drive. I will attempt your fix and report back.

aje14700 · January 16

A note on my solution above: if you need to change any settings in System Temp, the UI will take a while to load because it runs `sensors` without the configuration. It will also overwrite any changes made to that file on save (since it's not aware of our changes).

I set up a user script that appends my ignore lines onto the conf and copies it to the running configuration in /etc/sensors.d so I can manually trigger it if needed.
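A user script along those lines might look like the sketch below. This is not aje14700's actual script: `apply_sensor_ignores` is a hypothetical name, the default paths and the ignore lines are the ones quoted earlier in this thread, and the chip name will differ on other hardware.

```shell
#!/bin/bash
# Sketch only: apply_sensor_ignores is a hypothetical helper name.
# Usage: apply_sensor_ignores [SRC] [DEST]
# Defaults are the paths quoted earlier in this thread.
apply_sensor_ignores() {
    local src="${1:-/boot/config/plugins/dynamix.system.temp/sensors.conf}"
    local dest="${2:-/etc/sensors.d/sensor.conf}"
    local chip='chip "amdgpu-pci-0900"'

    # Append the block only if the chip line is missing, so the script can be
    # re-run after System Temp rewrites the file without duplicating entries.
    if ! grep -qF "$chip" "$src" 2>/dev/null; then
        # Make sure the existing file ends with a newline before appending.
        if [ -s "$src" ] && [ -n "$(tail -c1 "$src")" ]; then
            echo >> "$src"
        fi
        cat >> "$src" <<'EOF'
chip "amdgpu-pci-0900"
ignore "in0"
ignore "in1"
ignore "temp1"
ignore "power1"
EOF
    fi

    # Copy to the running configuration so it applies without a reboot.
    mkdir -p "$(dirname "$dest")"
    cp "$src" "$dest"
}

# Example invocation with the default paths (uncomment on the server):
# apply_sensor_ignores
```

Checking for the chip line first keeps the script safe to trigger repeatedly, which matters because saving settings in System Temp overwrites the file.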

bwnautilus · January 24

On 1/16/2024 at 11:35 AM, aje14700 said:

And once the file copied, the issue immediately went away. So the issue seems to be related to the chip timing out for sensor readings.

I'm sure there's a better approach, but hopefully this helps @bwnautilus and anyone else running into this issue.

Thanks for looking into this. My Unraid system that's experiencing this problem is Xeon-based and I do not see any CPU spikes when running 'sensors -A'. But as I mentioned previously, I'm back on 6.11.5 - don't want to do the upgrade/downgrade thing again.

Glad the solution worked for you.
