CPU 100% Bug - Dockers and VMs lag


Go to solution Solved by HeNotSatisfied,

Hey guys,
since I'm getting desperate, I decided to write a post in the hope that some of you have a few more ideas.
Here's the problem:
About every 25 seconds, one CPU core goes to 100% for about 5 seconds.
"top" shows that a "kworker" process is very active during this time. I have already tried all the fixes I could find on the forum and on Reddit, rebuilt the whole system several times, and tried different hard disks, motherboards and SAS HBAs. No plugins, VMs or Docker containers are installed.
Once again, I really hope you guys can help me.
The following components are installed:

 

CPU: Intel i5 10500
Motherboard: ASRock H510M-HVS R2.0
RAM: 32 GB DDR4 (2x 16 GB Corsair CMK32GX4M2A2666C16)
PSU: Corsair RM550x (2021)
SAS HBA: Dell PERC H310
 

 

hsrv01-diagnostics-20240120-1923.zip
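
To log exactly when the spikes happen, the batch output of "top" can be filtered for busy kworker threads. This is only a rough sketch: the function name hot_kworkers is made up for illustration, and it assumes the default procps "top" column layout (%CPU in column 9, command in column 12) and a locale that prints a decimal point.

```shell
# hot_kworkers MIN_CPU
# Reads "top -b" output on stdin and prints PID, %CPU and name of every
# kworker thread at or above MIN_CPU percent.
# Hypothetical helper; assumes the default procps column order.
hot_kworkers() {
    awk -v min="$1" '$12 ~ /^kworker/ && $9 + 0 >= min + 0 { print $1, $9, $12 }'
}

# Possible usage (two iterations, since the first sample of top can be inaccurate):
#   top -b -n 2 -d 1 | hot_kworkers 50
```

Running this in a loop with a timestamp in front of each hit should show whether the spikes really follow a fixed 25 to 30 second rhythm.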

Link to comment

I have this same problem on my board, an ASRockRack E3C246D4U. Took me forever to track down. It appeared in the 6.11 series. Blacklisting the GPU i915 module fixed it. Now I can't use hardware transcoding, but the CPU not spiking every 30 seconds was worth the tradeoff for me.
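
For anyone wanting to try the same workaround, here is a minimal sketch. It assumes that Unraid picks up modprobe configuration files from /boot/config/modprobe.d on the flash drive at boot (this should be the case on 6.9 and later, but check your version). The function name blacklist_i915 is made up for illustration.

```shell
# blacklist_i915 [DIR]
# Writes a modprobe config that prevents the i915 module from loading.
# DIR defaults to /boot/config/modprobe.d (assumed Unraid location).
blacklist_i915() {
    dir="${1:-/boot/config/modprobe.d}"
    mkdir -p "$dir" || return 1
    printf 'blacklist i915\n' > "$dir/i915.conf"
}

# Possible usage, followed by a reboot:
#   blacklist_i915
#   lsmod | grep i915    # after the reboot this should print nothing
```

As noted above, this gives up hardware transcoding on the iGPU, so it is the blunt option.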

Link to comment

By the way, here's the call trace on my system on the active core during a spike. You can see it trying to do some display-related work:

 

Call Trace:
[  967.656630]  <NMI>
[  967.656631]  ? nmi_cpu_backtrace+0xd3/0x104
[  967.656633]  ? nmi_cpu_backtrace_handler+0xd/0x15
[  967.656635]  ? nmi_handle+0x54/0x131
[  967.656637]  ? do_raw_spin_lock+0xb/0x1a
[  967.656638]  ? default_do_nmi+0x66/0x15b
[  967.656640]  ? exc_nmi+0xbf/0x130
[  967.656642]  ? end_repeat_nmi+0x16/0x67
[  967.656644]  ? do_raw_spin_lock+0xb/0x1a
[  967.656645]  ? do_raw_spin_lock+0xb/0x1a
[  967.656647]  ? do_raw_spin_lock+0xb/0x1a
[  967.656647]  </NMI>
[  967.656648]  <TASK>
[  967.656648]  _raw_spin_lock_irqsave+0x2c/0x37
[  967.656650]  fwtable_read32+0x2c/0xb8 [i915]
[  967.656723]  get_data+0x54/0x63 [i915]
[  967.656796]  bit_xfer+0x252/0x3e1 [i2c_algo_bit]
[  967.656800]  gmbus_xfer+0x44/0x92 [i915]
[  967.656909]  __i2c_transfer+0x2af/0x39b [i2c_core]
[  967.656917]  i2c_transfer+0xa2/0xc6 [i2c_core]
[  967.656923]  drm_do_probe_ddc_edid+0xc6/0x130 [drm]
[  967.656950]  ? drm_get_override_edid+0x53/0x53 [drm]
[  967.656973]  edid_block_read+0x3a/0xc1 [drm]
[  967.656996]  _drm_do_get_edid+0x83/0x2ec [drm]
[  967.657018]  ? drm_get_override_edid+0x53/0x53 [drm]
[  967.657041]  drm_get_edid+0x34/0x5c [drm]
[  967.657063]  intel_hdmi_set_edid+0x9d/0x271 [i915]
[  967.657136]  intel_hdmi_detect+0xc7/0x101 [i915]
[  967.657207]  drm_helper_probe_detect_ctx+0x81/0xf4 [drm_kms_helper]
[  967.657219]  output_poll_execute+0x10e/0x1fb [drm_kms_helper]
[  967.657231]  process_one_work+0x1a8/0x295
[  967.657233]  worker_thread+0x18b/0x244
[  967.657235]  ? rescuer_thread+0x281/0x281
[  967.657237]  kthread+0xe4/0xef
[  967.657239]  ? kthread_complete_and_exit+0x1b/0x1b
[  967.657241]  ret_from_fork+0x1f/0x30
[  967.657243]  </TASK>
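
Read from the bottom up, the trace shows a kworker running output_poll_execute, which probes the HDMI connector and reads the EDID over I2C through i915. To get a quick overview of which kernel modules are involved in a trace like this, the module tags at the end of each frame can be counted. This is just a sketch: trace_modules is a made-up helper name, and it assumes the trace has been saved as plain text with one frame per line.

```shell
# trace_modules
# Reads a kernel call trace on stdin and prints "COUNT MODULE" for every
# module tag such as [i915] found at the end of a frame, busiest first.
# Hypothetical helper for illustration only.
trace_modules() {
    awk '
        { sub(/[ \t]+$/, "") }                       # drop trailing blanks
        match($0, /\[[A-Za-z0-9_]+\]$/) {            # module tag at end of line
            count[substr($0, RSTART + 1, RLENGTH - 2)]++
        }
        END { for (m in count) print count[m], m }
    ' | sort -rn
}

# Possible usage, assuming the trace was saved to trace.txt:
#   trace_modules < trace.txt
```

The leading timestamps such as [  967.656650] are not counted, because they contain spaces and a dot and are not at the end of the line.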

Link to comment
  • Solution

The problem I found was that this behavior occurs as soon as no screen is connected, or the screen is dimmed or turned off. To fix this, I connected an HDMI dummy plug to the server (LINK). The screen power-saving settings then need to be disabled. For me this results in approx. 0.5~1.5 W higher power consumption.
The only way I have found to disable them while preserving all hardware functions is to customize the "go" file at:

/boot/config/go

[usb-drive/config/go].


Here, the first command after #!/bin/bash must be /bin/setterm -blank 0 -powersave off -powerdown 0. For me the file then looks like this:

 

#!/bin/bash
#fix kworker bug - keep display active
/bin/setterm -blank 0 -powersave off -powerdown 0
# Start the Management Utility
/usr/local/sbin/emhttp &

The system must then be restarted.


Adjusting the go file alone does not solve the problem; an HDMI dummy plug or a monitor is also required.
Normally, on Slackware Linux, the file "/etc/rc.d/rc.setterm" would be adjusted. The lines are already preconfigured and only need to be uncommented. See Slackware docs [LINK].

Since Unraid recreates the /etc/ directory at every boot and loads it into RAM, these settings are lost after every restart.

You could have the file replaced at every boot, but I didn't want to do that, and I think it's easier to just put the command in the go file.
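
The go-file change described above could also be scripted so that it is safe to run more than once. This is a minimal sketch, not a tested Unraid tool: the function name add_setterm_to_go is made up, and it assumes the go file lives at /boot/config/go and normally starts with a #!/bin/bash line.

```shell
# add_setterm_to_go [FILE]
# Inserts the setterm command right after the shebang line of FILE
# (default: /boot/config/go). Does nothing if the command is already there.
# A backup of the original is kept as FILE.bak.
add_setterm_to_go() {
    go_file="${1:-/boot/config/go}"
    cmd='/bin/setterm -blank 0 -powersave off -powerdown 0'
    [ -f "$go_file" ] || return 1
    if grep -qF -- "$cmd" "$go_file"; then
        return 0
    fi
    cp "$go_file" "$go_file.bak" || return 1
    awk -v cmd="$cmd" '
        NR == 1 && /^#!/ { print; print cmd; next }   # shebang first, then command
        NR == 1          { print cmd }                # no shebang: command goes first
        { print }
    ' "$go_file.bak" > "$go_file"
}

# Possible usage, followed by a reboot:
#   add_setterm_to_go
```

This only covers the software half of the fix; the HDMI dummy plug or monitor is still needed.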

go

Link to comment
  • 4 weeks later...

I think I'm having the same problem as you. I have the ASRock Rack E3C246D2I board and I get the kworker spikes only when the integrated graphics is enabled. I tried what you described with the go file and a dummy VGA plug, but I'm still getting the spikes.

Any ideas what else I could try? Did you change anything else in the BIOS?

Link to comment

OK, I fixed it, so I'll share for anyone else having the problem.

On my motherboard, I found out that the iGPU is not supported for VGA output, which explains why the previous solution didn't work for me.

I ended up adding i915.disable_display=1 to the end of each "append" line in the syslinux config file on the flash drive. Now my file looks like this:

 

default menu.c32
menu title Lime Technology, Inc.
prompt 0
timeout 50
label Unraid OS
  menu default
  kernel /bzimage
  append initrd=/bzroot i915.disable_display=1
label Unraid OS GUI Mode
  kernel /bzimage
  append initrd=/bzroot,/bzroot-gui i915.disable_display=1
label Unraid OS Safe Mode (no plugins, no GUI)
  kernel /bzimage
  append initrd=/bzroot unraidsafemode i915.disable_display=1
label Unraid OS GUI Safe Mode (no plugins)
  kernel /bzimage
  append initrd=/bzroot,/bzroot-gui unraidsafemode i915.disable_display=1
label Memtest86+
  kernel /memtest

 

After a reboot I no longer have the CPU spikes or the kworker process running every 30 seconds. Plus, the iGPU still works for hardware transcoding.
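
For anyone who would rather script the edit than change each "append" line by hand, something like the following might work. It is only a sketch under assumptions: the function name add_disable_display is made up, the default path /boot/syslinux/syslinux.cfg is where the file usually sits on the Unraid flash drive but should be checked, and the file is assumed to use Unix line endings.

```shell
# add_disable_display [FILE]
# Appends i915.disable_display=1 to every "append" line of a syslinux config
# that does not already contain it. FILE defaults to
# /boot/syslinux/syslinux.cfg (assumed path). The original is kept as FILE.bak.
add_disable_display() {
    cfg="${1:-/boot/syslinux/syslinux.cfg}"
    [ -f "$cfg" ] || return 1
    cp "$cfg" "$cfg.bak" || return 1
    awk -v param='i915.disable_display=1' '
        /^[ \t]*append[ \t]/ && index($0, param) == 0 {
            sub(/[ \t]+$/, "")          # trim trailing blanks
            $0 = $0 " " param           # add the parameter to the end of the line
        }
        { print }
    ' "$cfg.bak" > "$cfg"
}

# Possible usage, followed by a reboot:
#   add_disable_display
#   cat /proc/cmdline    # after the reboot, the parameter should be listed here
```

Entries without an "append" line, such as the Memtest86+ one, are left untouched.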

Link to comment
