February 7, 2023
I currently have two processes that seem to be eating up a constant 10% of my CPU. This is annoying, as it wastes a lot of power and heats up my CPU quite a bit. They run even when the array is stopped, and restarting does not get rid of them. They are: rsyslogd and irq/123-aerdrv. What are these, and how do I kill them for good? -Jim
February 7, 2023
It looks like one of your PCIe devices is producing lots of errors (which is what irq/xxx is processing), and those errors are then being written to syslog. You can disable the syslog server to reduce the first one, but you should really be looking for the root cause. Check your syslog for lots of AER (Advanced Error Reporting) events, or post diagnostics so someone here can check.
Edited February 7, 2023 by apandey: added clarity about the device being PCIe
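As a rough way to do that check, the sketch below counts kernel `PCIe Bus Error` lines per device and severity. It assumes the syslog is a plain-text file at `/var/log/syslog` and that the lines look like the ones quoted later in this thread; `count_aer_events` and its regex are hypothetical helpers for illustration, not part of Unraid.

```python
import re
import sys
from collections import Counter

# Matches kernel AER summary lines such as:
#   "... kernel: nvme 0000:14:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)"
# The device address is a PCI domain:bus:device.function string.
AER_LINE = re.compile(
    r"kernel: \S+ (?P<dev>[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]): "
    r"PCIe Bus Error: severity=(?P<sev>[^,]+)"
)


def count_aer_events(lines):
    """Count 'PCIe Bus Error' lines per (device, severity) pair."""
    counts = Counter()
    for line in lines:
        match = AER_LINE.search(line)
        if match:
            counts[(match.group("dev"), match.group("sev"))] += 1
    return counts


if __name__ == "__main__":
    # Assumption: syslog is at /var/log/syslog; pass another path to override.
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/syslog"
    try:
        with open(path, errors="replace") as log:
            for (dev, sev), n in count_aer_events(log).most_common():
                print(f"{n:8d}  {dev}  {sev}")
    except OSError as exc:
        print(f"could not read {path}: {exc}", file=sys.stderr)
```

A device with thousands of `Corrected` entries, all with the same address, is the one to look at first.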
February 7, 2023 Author
Ohh yeah, look at that:
  Feb 7 05:06:08 Tower kernel: nvme 0000:14:00.0: [ 0] RxErr
  Feb 7 05:06:08 Tower kernel: pcieport 0000:00:06.0: AER: Corrected error received: 0000:14:00.0
  Feb 7 05:06:08 Tower kernel: nvme 0000:14:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
  Feb 7 05:06:08 Tower kernel: nvme 0000:14:00.0: device [1987:5018] error status/mask=00000001/00006000
Repeated over and over again.
I turned off the server completely, and when I powered it back on, I noticed that the one core running this process was no longer permanently locked at 100%. Instead, it slowly climbed from 20% to 100% over about 30 minutes.
A little background: the PCIe devices in this machine are 9 NVMe drives, and they have been dying at a pretty staggering rate. 3 of the 9 have died early (infant mortality). The latest one to die was the parity drive. I got a replacement from the manufacturer, installed it, rebuilt parity successfully, and then noticed this CPU drain. I guess even if something is wrong, I'd prefer Unraid not melt my system down trying to warn me about it.
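The `error status/mask` value in those lines is a pair of hex bitmaps. A minimal decoding sketch, assuming they are the AER Correctable Error Status and Mask registers (which is what a `severity=Corrected` event reports) and reusing the short bit names the Linux kernel prints, such as `RxErr`; `decode_correctable` is a hypothetical helper:

```python
# Bit positions of the PCIe AER Correctable Error Status/Mask registers,
# labeled with the short names the Linux kernel prints (e.g. "[ 0] RxErr").
CORRECTABLE_BITS = {
    0: "RxErr",         # Receiver Error (physical layer)
    6: "BadTLP",        # Bad Transaction Layer Packet
    7: "BadDLLP",       # Bad Data Link Layer Packet
    8: "Rollover",      # REPLAY_NUM rollover
    12: "Timeout",      # Replay timer timeout
    13: "NonFatalErr",  # Advisory non-fatal error
    14: "CorrIntErr",   # Corrected internal error
    15: "HeaderOF",     # Header log overflow
}


def decode_correctable(status_mask: str) -> tuple[list[str], list[str]]:
    """Split 'status/mask' hex text and name the set bits in each half."""
    status_hex, mask_hex = status_mask.split("/")
    status, mask = int(status_hex, 16), int(mask_hex, 16)

    def names(value: int) -> list[str]:
        return [name for bit, name in sorted(CORRECTABLE_BITS.items())
                if value & (1 << bit)]

    return names(status), names(mask)


if __name__ == "__main__":
    reported, masked = decode_correctable("00000001/00006000")
    print("reported:", reported)
    print("masked:  ", masked)
```

Under that assumption, `00000001/00006000` decodes to a single Receiver Error (bit 0, consistent with `type=Physical Layer` in the log), while advisory non-fatal and corrected internal errors (bits 13 and 14, i.e. 0x2000 + 0x4000) are masked.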
February 7, 2023
30 minutes ago, Jlarimore said: "I'd prefer Unraid not melt my system down trying to warn me about it"
Try running with the kernel boot argument pci=nommconf.
This isn't Unraid's doing; you would probably see the same on other Linux distros too. You can Google those errors and find other solutions as well. It's better to solve the underlying issue than to mute the logs. Who knows, the drives may be dying from the same underlying cause.
February 7, 2023 Community Expert Solution
Try this first: https://forums.unraid.net/topic/118286-nvme-drives-throwing-errors-filling-logs-instantly-how-to-resolve/?do=findComment&comment=1165009
February 8, 2023 Author
OK. As I understand it, I'm trying to disable a power management feature of my NVMe drives by editing some kind of system config file. Let's pretend for a second that I don't really know much about Unix. Where is this file, and how do I get to it?
Interestingly, this problem only appeared after I replaced the broken parity disk. I wonder if the newer drive has different firmware or hardware components. It appears to be an identical Sabrent 8TB Rocket 4 Plus, just like the others.
Edited February 8, 2023 by Jlarimore: lingering thought
February 8, 2023 Community Expert
Click on the Flash drive on the Main tab, then scroll down to "Syslinux Configuration".
February 8, 2023 Author
Found it! Did something like this:
  kernel /bzimage
  append initrd=/bzroot
  append pcie_aspm=off
Now the system is not booting. I assume something is wrong with my formatting. What do I do next? Pull the flash drive and edit the config file directly on my Windows PC? How should it read?
February 8, 2023 Community Expert
Replying to Jlarimore's post above: there should only be one append line, with all the options on that same line, separated by spaces.
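A minimal sketch of that rule, assuming a simple `label` / `kernel` / `append` layout like the one in this thread: the hypothetical `merge_append_lines` helper folds any extra `append` lines inside a boot entry into the first one, so all options end up on a single space-separated line. It is an illustration only, not a full syslinux parser.

```python
def merge_append_lines(cfg_text: str) -> str:
    """Fold repeated 'append' lines within each 'label' block into the first one.

    Hypothetical helper: it only understands the plain label/kernel/append
    layout and ignores other syslinux features.
    """
    out: list[str] = []
    first_append = None  # index in `out` of the current label's first append line
    for line in cfg_text.splitlines():
        stripped = line.strip()
        keyword = stripped.split(None, 1)[0].lower() if stripped else ""
        if keyword == "label":
            first_append = None  # a new boot entry starts here
            out.append(line)
        elif keyword == "append":
            options = stripped.split()[1:]
            if first_append is None:
                first_append = len(out)
                out.append(line)
            elif options:
                # Move this line's options onto the entry's first append line.
                out[first_append] = out[first_append].rstrip() + " " + " ".join(options)
        else:
            out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    broken = "label Unraid OS\n  kernel /bzimage\n  append initrd=/bzroot\n  append pcie_aspm=off\n"
    print(merge_append_lines(broken), end="")
```

Run on the broken layout from the post above, this yields a single `append initrd=/bzroot pcie_aspm=off` line under `kernel /bzimage`.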
February 8, 2023 Author
Alright, I found the syslinux config file on my PC and cleaned out the bad formatting. I replaced it with:
  kernel /bzimage
  append initrd=/bzroot pcie_aspm=off
And we're golden. No more log spam and no more resulting system drain! Yay! Thank you guys for all the help. Idle power usage is down 50 W and CPU temperature is down 30 degrees. I'm pretty impressed with how much the logging program can tax a 12900K. Glad it's not multithreaded!
Edited February 8, 2023 by Jlarimore: comic relief