
Neverending CPU Taxing Processes


Go to solution Solved by JorgeB,


I currently have two processes that seem to be eating up a constant 10% of my CPU processing power. This is annoying as it's wasting a lot of power and heating up my CPU quite a bit. They run even when the array is stopped and restarts do not get rid of them:

 

They are:

rsyslogd

irq/123-aerdrv

 

What are these and how do I kill them forever?

 

-Jim

Link to comment

It looks like one of your PCIe devices is producing lots of errors (that is what the irq/xxx process is handling), and those errors are then being written to syslog. You could disable the syslog server to quiet the first process, but you should really be looking at the root cause.

 

Look into your syslog and see if there are lots of AER (Advanced Error Reporting) events logged. 

 

Or post your diagnostics so someone here can check.
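If the log is large, a quick tally can show which device is responsible. Below is a minimal sketch, assuming the AER lines look like the `AER: Corrected error received: <address>` form (other kernel versions may word this message differently). The function name `count_aer_events` and the sample line are illustrative only.

```python
import re
from collections import Counter

# Assumed line format (kernel wording varies between versions):
#   "... pcieport 0000:00:06.0: AER: Corrected error received: 0000:14:00.0"
AER_RE = re.compile(r"AER: (\w+) error received: (\S+)")


def count_aer_events(lines):
    """Count AER report lines per (device address, severity)."""
    counts = Counter()
    for line in lines:
        match = AER_RE.search(line)
        if match:
            severity, device = match.groups()
            counts[(device, severity)] += 1
    return counts


# Illustrative sample line; real input would come from the syslog file.
sample = [
    "Feb  7 05:06:08 Tower kernel: pcieport 0000:00:06.0: "
    "AER: Corrected error received: 0000:14:00.0",
]
for (device, severity), n in count_aer_events(sample).most_common():
    print(f"{device}  {severity}: {n}")
```

To run it against a real log, pass an open file object instead of `sample`. The log is probably at `/var/log/syslog`, but treat that path as an assumption and check your own system.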

Edited by apandey
Adding clarity about device being pcie
Link to comment

Ohh yeah, look at that:

 

Feb  7 05:06:08 Tower kernel: nvme 0000:14:00.0:    [ 0] RxErr                 
Feb  7 05:06:08 Tower kernel: pcieport 0000:00:06.0: AER: Corrected error received: 0000:14:00.0
Feb  7 05:06:08 Tower kernel: nvme 0000:14:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Feb  7 05:06:08 Tower kernel: nvme 0000:14:00.0:   device [1987:5018] error status/mask=00000001/00006000

 

Repeated over and over again. I turned off the server completely, and when I powered it back on I noticed that the one core running this process was no longer permanently locked at 100%. Instead it slowly climbed from 20% to 100% over the course of about 30 minutes.

A little background: this is a machine whose PCIe devices consist of 9 NVMe drives. These drives have been dying at a pretty staggering rate: 3 of the 9 have died infant-mortality deaths. The latest drive that died was the parity drive. I got a replacement from the manufacturer, installed it, rebuilt parity successfully, and then noticed this CPU drain. I guess even if there is something wrong, I'd prefer Unraid not melt my system down trying to warn me about it.
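For anyone reading the `error status/mask=00000001/00006000` field in the log above: it can be decoded bit by bit. This is a rough sketch, assuming the bit layout of the PCIe AER Correctable Error Status register and the short names the Linux kernel prints for those bits (such as `RxErr`). It only applies to lines with `severity=Corrected`, and `decode_correctable` is a made-up helper name.

```python
import re

# Assumed mapping: bit position -> short name, for the AER *correctable*
# error status register. Uncorrectable errors use a different table.
CORRECTABLE_BITS = {
    0: "RxErr",         # receiver error
    6: "BadTLP",        # bad transaction layer packet
    7: "BadDLLP",       # bad data link layer packet
    8: "Rollover",      # replay number rollover
    12: "Timeout",      # replay timer timeout
    13: "NonFatalErr",  # advisory non-fatal error
    14: "CorrIntErr",   # corrected internal error
    15: "HeaderOF",     # header log overflow
}

STATUS_RE = re.compile(r"error status/mask=([0-9a-fA-F]{8})/([0-9a-fA-F]{8})")


def bit_names(value):
    """Return the names of all known bits set in value, lowest bit first."""
    return [name for bit, name in sorted(CORRECTABLE_BITS.items())
            if (value >> bit) & 1]


def decode_correctable(line):
    """Decode the status/mask pair from one kernel log line, or return None."""
    match = STATUS_RE.search(line)
    if match is None:
        return None
    status, mask = (int(group, 16) for group in match.groups())
    return {"reported": bit_names(status), "masked": bit_names(mask)}


line = ("nvme 0000:14:00.0:   device [1987:5018] "
        "error status/mask=00000001/00006000")
print(decode_correctable(line))
```

For the line in this thread the only reported bit is bit 0, which agrees with the `[ 0] RxErr` line the kernel printed alongside it. The mask value covers bits 13 and 14.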

Link to comment
30 minutes ago, Jlarimore said:

I'd prefer Unraid not melt my system down trying to warn me about it

Try running with the kernel boot argument:

pci=nommconf

 

This isn't Unraid's doing; you will probably see the same on other Linux distros too. You can Google those errors and find other solutions as well. It's better to solve the underlying issue than to mute the logs. Who knows, the drives may be dying from the same underlying cause.

Link to comment

OK. As I understand it, I am trying to disable some power management feature of my NVMe drives by editing some kind of sys config file. Let's pretend for a second I don't really know much about Unix. Where is this file? How do I get to it?

 

Interesting that this problem only presented itself after I replaced the broken parity disk. I wonder if the newer drive has different firmware or hardware components. Appears to be an identical Sabrent 8TB Rocket 4 Plus just like the others.

Edited by Jlarimore
Lingering Thought
Link to comment

Found it!

 

Did something like this:

kernel /bzimage
append initrd=/bzroot

append pcie_aspm=off

 

Now the system is not booting. I assume something is wrong with my formatting. What do I do next? Pull the flash drive and edit the config file directly on my Windows PC? How should it read?

Link to comment
19 minutes ago, Jlarimore said:

Found it!

 

Did something like this:

kernel /bzimage
append initrd=/bzroot

append pcie_aspm=off

 

Now the system is not booting. I assume something is wrong with my formatting. What do I do next? Pull the flash drive and edit the config file directly on my Windows PC? How should it read?

There should be only one append line, with all the options on that same line, separated by spaces.
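To illustrate the rule, here is a small sketch that folds several append lines in a boot stanza into one. It is only an illustration of the formatting, not an official tool, and `merge_append_lines` is a hypothetical helper name.

```python
def merge_append_lines(stanza_lines):
    """Fold every 'append' line into the first one, options space separated."""
    merged = []
    options = []
    first_index = None
    indent = ""
    for line in stanza_lines:
        parts = line.split()
        if parts and parts[0].lower() == "append":
            if first_index is None:
                # Remember where the first append line was and how it was indented.
                first_index = len(merged)
                indent = line[: len(line) - len(line.lstrip())]
                merged.append(None)  # placeholder, filled in below
            options.extend(parts[1:])
        else:
            merged.append(line)
    if first_index is not None:
        merged[first_index] = indent + "append " + " ".join(options)
    return merged


broken = [
    "kernel /bzimage",
    "append initrd=/bzroot",
    "",
    "append pcie_aspm=off",
]
for line in merge_append_lines(broken):
    print(line)
```

The broken stanza from the post above comes out with a single `append initrd=/bzroot pcie_aspm=off` line. Lines that are not append lines, including blank ones, are kept as they were.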

Link to comment

Alright, I found the sysconfig file on my PC and cleaned out the bad formatting. Replaced it with:

 

kernel /bzimage
append initrd=/bzroot pcie_aspm=off

 

And we're golden. No more log spam and the resulting system drain! Yay! Thank you guys for all the help. 50 W less power usage at idle and CPU temp down 30 degrees. I'm pretty impressed with how much the logging program can tax a 12900K. Glad it's not multithreaded!
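For anyone wanting to confirm the option actually reached the kernel after a reboot: Linux exposes the boot command line in `/proc/cmdline`. A minimal sketch, with `has_kernel_param` as a hypothetical helper name and a sample string standing in for the real file contents:

```python
def has_kernel_param(cmdline, param):
    """Return True if param appears as a whole word on the kernel command line."""
    return param in cmdline.split()


def read_cmdline(path="/proc/cmdline"):
    """Read the running kernel's command line (Linux only)."""
    with open(path) as handle:
        return handle.read()


# Sample string for illustration; on the server itself use read_cmdline().
sample = "BOOT_IMAGE=/bzimage initrd=/bzroot pcie_aspm=off"
print(has_kernel_param(sample, "pcie_aspm=off"))
```

Splitting on whitespace avoids a false match on a longer option that merely contains the same text.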

Edited by Jlarimore
Comic relief
Link to comment
