Load Average Rocketed



Hi,

 

For the past few days I have been suffering from a high load average.

 

Some minutes or hours after booting, the load starts to increase until the system is overloaded and unreachable (neither SSH nor the GUI responds), which forces me to do an unsafe shutdown.

 

Today I managed to take a photo when it started to increase (on another day I also saw kworker shoot up).

 

Any idea how to deal with this?

 

Thank you

unraid.jpg

nasdiego-syslog-20221110-2223.zip

Edited by dellorianes

OK, I will try, but it is difficult to catch. Once the load starts increasing, I have a window of only a few minutes before I can no longer access the GUI or the command line (and therefore cannot download the diagnostics file), and the increase starts at random times.
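
Because the window is so short, one option is to log the load average continuously to storage that survives a hard reset, so there is something to look at afterwards. Below is a minimal sketch, not a tested tool. It assumes Python 3 is installed on the server (it may not be by default), that /proc/loadavg is readable, and that the log path passed in is on persistent storage. The `factor=2.0` overload cutoff is an arbitrary, hypothetical choice.

```python
import os
import time

def parse_loadavg(text):
    """Return (load1, load5, load15) from the contents of /proc/loadavg."""
    fields = text.split()
    return float(fields[0]), float(fields[1]), float(fields[2])

def is_overloaded(load1, cpu_count, factor=2.0):
    """Flag a 1-minute load above factor x the CPU count (arbitrary cutoff)."""
    return load1 > factor * cpu_count

def monitor(log_path, interval=10):
    """Append a timestamped load sample every `interval` seconds.

    Each sample is flushed and fsync'ed so it is on disk before a crash.
    This loop never returns; stop it with Ctrl+C.
    """
    cpus = os.cpu_count() or 1
    while True:
        with open("/proc/loadavg") as f:
            load1, load5, load15 = parse_loadavg(f.read())
        flag = " OVERLOAD" if is_overloaded(load1, cpus) else ""
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        with open(log_path, "a") as out:
            out.write(f"{stamp} {load1} {load5} {load15}{flag}\n")
            out.flush()
            os.fsync(out.fileno())
        time.sleep(interval)
```

It would be started with something like monitor("/boot/logs/loadavg.log"), where that path is only an assumed location on the flash drive.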

 

Since yesterday I have been running in safe mode (with parity running) and everything is working fine. Could it be a Docker issue?


I am returning to this topic because this morning the server went back to high CPU load values.

 

Core 4 is continuously at 100% load. I have an 8-core i7-9700 (3000 MHz) and 4x16 GB of DDR4 at 3200 MHz.

I tried to get the diagnostics, but the file does not download, and neither do the logs.

I have attached the error and warning messages from the logs (copied and pasted from the GUI).

 

Output of the command ~# btrfs dev stats /mnt/cache:
[/dev/nvme1n1p1].write_io_errs    0
[/dev/nvme1n1p1].read_io_errs     0
[/dev/nvme1n1p1].flush_io_errs    0
[/dev/nvme1n1p1].corruption_errs  48
[/dev/nvme1n1p1].generation_errs  0
[/dev/nvme0n1p1].write_io_errs    0
[/dev/nvme0n1p1].read_io_errs     0
[/dev/nvme0n1p1].flush_io_errs    0
[/dev/nvme0n1p1].corruption_errs  32
[/dev/nvme0n1p1].generation_errs  0
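
The only non-zero counters in this output are corruption_errs, on both devices (48 and 32). The following is a minimal sketch for pulling the non-zero counters out of pasted output automatically. It assumes the `[device].counter  value` layout shown above; the function names are hypothetical and not part of btrfs.

```python
import re

# Matches lines such as "[/dev/nvme1n1p1].corruption_errs  48".
LINE_RE = re.compile(r"^\[(?P<dev>[^\]]+)\]\.(?P<counter>\w+)\s+(?P<value>\d+)$")

def parse_dev_stats(text):
    """Return {device: {counter: value}} parsed from `btrfs dev stats` output."""
    stats = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            stats.setdefault(match["dev"], {})[match["counter"]] = int(match["value"])
    return stats

def nonzero_counters(stats):
    """Keep only counters that are not zero, dropping devices with none."""
    result = {}
    for dev, counters in stats.items():
        bad = {name: value for name, value in counters.items() if value != 0}
        if bad:
            result[dev] = bad
    return result

# The output pasted in this post.
SAMPLE = """\
[/dev/nvme1n1p1].write_io_errs    0
[/dev/nvme1n1p1].read_io_errs     0
[/dev/nvme1n1p1].flush_io_errs    0
[/dev/nvme1n1p1].corruption_errs  48
[/dev/nvme1n1p1].generation_errs  0
[/dev/nvme0n1p1].write_io_errs    0
[/dev/nvme0n1p1].read_io_errs     0
[/dev/nvme0n1p1].flush_io_errs    0
[/dev/nvme0n1p1].corruption_errs  32
[/dev/nvme0n1p1].generation_errs  0
"""

print(nonzero_counters(parse_dev_stats(SAMPLE)))
```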

warning+errors_18-11-22.txt


Running memtest86 produced thousands of errors.

Repeating the test with 2 of the 4 RAM modules gave no errors over a complete test.

Repeating it with the other 2 modules gave the same result.

Repeating the test again with all 4 modules, errors appeared in less than an hour.

 

My motherboard's memory specification is:

 

4 x DIMM memory slots, max. 64GB, DDR4 4266(O.C.)/4133(O.C.)/4000(O.C.)/3866(O.C.)/3733(O.C.)/3600(O.C.)/3466(O.C.)/3400(O.C.)/3333(O.C.)/3300(O.C.)/3200(O.C.)/3000(O.C.)/2800(O.C.)/2666/2400/2133 MHz Non-ECC, Un-buffered

 

I was running the RAM at 3200 MHz, so I changed it to 2666 MHz, after which I could complete one test without errors.
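
According to the specification line above, 3200 MHz is marked as an overclocked (O.C.) speed on this board, and 2666 MHz is the highest speed listed without that mark. A small sketch that reads this out of the spec string, assuming the `speed(O.C.)/speed/...` layout quoted above; the helper name `split_speeds` is hypothetical.

```python
import re

# Speed list copied from the motherboard specification quoted above.
SPEC = (
    "DDR4 4266(O.C.)/4133(O.C.)/4000(O.C.)/3866(O.C.)/3733(O.C.)/3600(O.C.)/"
    "3466(O.C.)/3400(O.C.)/3333(O.C.)/3300(O.C.)/3200(O.C.)/3000(O.C.)/"
    "2800(O.C.)/2666/2400/2133 MHz"
)

def split_speeds(spec):
    """Split the listed speeds into (stock, overclocked) lists of MHz values."""
    stock, overclocked = [], []
    # Each match is a 4-digit speed plus an optional "(O.C.)" tag.
    for speed, oc_tag in re.findall(r"(\d{4})(\(O\.C\.\))?", spec):
        (overclocked if oc_tag else stock).append(int(speed))
    return stock, overclocked

stock, overclocked = split_speeds(SPEC)
print("highest non-O.C. speed:", max(stock))
print("3200 listed as O.C.:", 3200 in overclocked)
```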

 

This is the diagnostics file after the change.

 

What should I do to stabilize the system?

 

Thank you again

nasdiego-diagnostics-20221120-1017.zip


Once the parity operation that was running had finished, I set the RAM to 2133 MHz.

 

After that, the logs show the following warnings:

 

Nov 21 07:17:15 NASDiego kernel: Warning: PCIe ACS overrides enabled; This may allow non-IOMMU protected peer-to-peer DMA
Nov 21 07:17:15 NASDiego kernel: ACPI: Early table checksum verification disabled
Nov 21 07:17:15 NASDiego kernel: floppy0: no floppy controllers found
Nov 21 07:17:15 NASDiego kernel: i915 0000:00:02.0: [drm] failed to retrieve link info, disabling eDP
Nov 21 07:17:21 NASDiego  mcelog: failed to prefill DIMM database from DMI data
Nov 21 07:17:32 NASDiego kernel: ACPI Warning: SystemIO range 0x0000000000000295-0x0000000000000296 conflicts with OpRegion 0x0000000000000290-0x0000000000000299 (\AMW0.SHWM) (20220331/utaddress-204)
Nov 21 07:17:54 NASDiego  rpc.statd[8814]: Failed to read /var/lib/nfs/state: Success
Nov 21 07:28:01 NASDiego kernel: EDID block 0 (tag 0x00) checksum is invalid, remainder is 182
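
To see at a glance which component each message comes from, the lines can be grouped by their source (kernel, mcelog, rpc.statd). This is a minimal sketch, assuming the `date host source[pid]: message` layout seen above. The function name is hypothetical, and only three of the lines are included as sample input.

```python
import re

# date, host, source name, optional [pid], then the message text.
SYSLOG_RE = re.compile(
    r"^(\w{3}\s+\d+\s+[\d:]{8})\s+(\S+)\s+([^\s:\[]+)(?:\[\d+\])?:\s+(.*)$"
)

def group_by_source(lines):
    """Return {source: [messages]} for every line that matches the layout."""
    groups = {}
    for line in lines:
        match = SYSLOG_RE.match(line)
        if match:
            _when, _host, source, message = match.groups()
            groups.setdefault(source, []).append(message)
    return groups

# Three of the lines from the log excerpt above.
SAMPLE_LINES = [
    "Nov 21 07:17:15 NASDiego kernel: floppy0: no floppy controllers found",
    "Nov 21 07:17:21 NASDiego  mcelog: failed to prefill DIMM database from DMI data",
    "Nov 21 07:17:54 NASDiego  rpc.statd[8814]: Failed to read /var/lib/nfs/state: Success",
]

for source, messages in group_by_source(SAMPLE_LINES).items():
    print(source, len(messages))
```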

 

Diagnostics included.

 

Do I need to do anything about those warnings?

 

Thank you

nasdiego-diagnostics-20221121-1018.zip

