Jlarimore Posted February 7, 2023
I currently have two processes that seem to be eating up a constant 10% of my CPU processing power. This is annoying, as it's wasting a lot of power and heating up my CPU quite a bit. They run even when the array is stopped, and restarts do not get rid of them. They are:
rsyslogd
irq/123-aerdrv
What are these and how do I kill them forever? -Jim
apandey Posted February 7, 2023 (edited)
It looks like one of your PCIe devices is producing lots of errors (which is what irq/xxx is processing), and those errors are then being written to syslog. You can disable the syslog server to reduce the load from the first process, but you should really be looking at the root cause. Look into your syslog and see if there are lots of AER (Advanced Error Reporting) events logged, or post diagnostics so someone here can check.
Edited February 7, 2023 by apandey: Adding clarity about device being PCIe
Jlarimore (Author) Posted February 7, 2023
Ohh yeah, look at that:
Feb 7 05:06:08 Tower kernel: nvme 0000:14:00.0: [ 0] RxErr
Feb 7 05:06:08 Tower kernel: pcieport 0000:00:06.0: AER: Corrected error received: 0000:14:00.0
Feb 7 05:06:08 Tower kernel: nvme 0000:14:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Feb 7 05:06:08 Tower kernel: nvme 0000:14:00.0: device [1987:5018] error status/mask=00000001/00006000
Repeated over and over again. I turned off the server completely, and when I powered it back on, I noticed that the one core running this process was no longer permanently locked at 100%. It slowly went up from 20% to 100% over the course of about 30 minutes.
A little background: this is a machine where the PCIe devices consist of 9 NVMe drives. These drives have been dying at a pretty staggering rate; 3 of the 9 have died infant-mortality deaths. The latest drive that died was the parity drive. I got a replacement drive from the manufacturer, installed it, rebuilt parity successfully, and then noticed this CPU drain.
I guess even if there is something wrong, I'd prefer Unraid not melt my system down trying to warn me about it.
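For anyone who wants to see how many of these events are piling up and which device they come from, here is a minimal Python sketch. It assumes the log lines look like the ones quoted above; the regular expression, the function names, and the default path /var/log/syslog are illustrative assumptions, not anything Unraid ships.

```python
import re
from collections import Counter

# Pattern modelled only on the line quoted in this thread, e.g.
#   "... pcieport 0000:00:06.0: AER: Corrected error received: 0000:14:00.0"
# Other kernels may word AER messages differently (assumption).
AER_LINE = re.compile(r"AER: (\w+) error received: ([0-9a-fA-F:.]+)")


def count_aer_events(lines):
    """Count AER 'error received' messages per (device, severity)."""
    counts = Counter()
    for line in lines:
        match = AER_LINE.search(line)
        if match:
            severity, device = match.groups()
            counts[(device, severity)] += 1
    return counts


def count_aer_events_in_file(path="/var/log/syslog"):
    """Convenience wrapper; the default path is an assumption."""
    with open(path, errors="replace") as handle:
        return count_aer_events(handle)
```

Calling `count_aer_events_in_file().most_common()` would list the noisiest devices first; a single PCI address dominating the output points at one drive or slot.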
apandey Posted February 7, 2023
30 minutes ago, Jlarimore said: I'd prefer Unraid not melt my system down trying to warn me about it
Try running with the kernel boot argument pci=nommconf
This isn't Unraid's doing; you will probably see the same on other Linux distros too. You can Google those errors and find other solutions as well. It's better to solve the underlying issue than to mute the logs; who knows whether the drives are dying from the same underlying cause.
JorgeB Posted February 7, 2023 (Solution)
Try this first: https://forums.unraid.net/topic/118286-nvme-drives-throwing-errors-filling-logs-instantly-how-to-resolve/?do=findComment&comment=1165009
Jlarimore (Author) Posted February 8, 2023 (edited)
OK. As I understand it, I am trying to disable a power management feature of my NVMe drives by editing some kind of system config file. Let's pretend for a second that I don't really know much about Unix. Where is this file? How do I get to it?
Interesting that this problem only presented itself after I replaced the broken parity disk. I wonder if the newer drive has different firmware or hardware components. It appears to be an identical Sabrent 8TB Rocket 4 Plus, just like the others.
Edited February 8, 2023 by Jlarimore: Lingering Thought
JorgeB Posted February 8, 2023
Click on the Flash drive on the Main page, then scroll down to "Syslinux Configuration".
Jlarimore (Author) Posted February 8, 2023
Found it! Did something like this:
kernel /bzimage
append initrd=/bzroot
append pcie_aspm=off
Now the system is not booting. I assume something is wrong with my formatting. What do I do next? Pull the flash drive and edit the config file directly on my Windows PC? How should it read?
itimpi Posted February 8, 2023
19 minutes ago, Jlarimore said: Found it! Did something like this:
kernel /bzimage
append initrd=/bzroot
append pcie_aspm=off
Now the system is not booting. I assume something is wrong with my formatting. What do I do next? Pull the flash drive and edit the config file directly on my Windows PC? How should it read?
There should be only one append line, with all the options on that same line, separated by spaces.
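The "one append line" rule can be applied mechanically. Below is a rough Python sketch that folds repeated append lines in a boot entry into a single space-separated line. The function name and the "label Unraid OS" entry in the usage note are assumptions for illustration; it does only simple text handling and is not a full Syslinux parser.

```python
def merge_append_lines(config_text):
    """Fold repeated `append` lines within each boot entry into one line.

    A new `label` line starts a new entry, so each entry keeps its own
    single append line. All other lines pass through unchanged.
    """
    out = []
    append_idx = None  # position in `out` of the current entry's append line
    for line in config_text.splitlines():
        parts = line.strip().split(None, 1)
        keyword = parts[0].lower() if parts else ""
        if keyword == "label":
            append_idx = None
            out.append(line)
        elif keyword == "append":
            args = parts[1] if len(parts) > 1 else ""
            if append_idx is None:
                append_idx = len(out)
                out.append(line)
            elif args:
                # Extra append line: move its options onto the first one.
                out[append_idx] = out[append_idx].rstrip() + " " + args
        else:
            out.append(line)
    return "\n".join(out)
```

Given a hypothetical entry "label Unraid OS" followed by the three lines quoted above, the result keeps the kernel line and ends with a single `append initrd=/bzroot pcie_aspm=off`.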
Jlarimore (Author) Posted February 8, 2023 (edited)
Alright, I found the Syslinux config file on my PC and cleaned out the bad formatting. Replaced it with:
kernel /bzimage
append initrd=/bzroot pcie_aspm=off
And we're golden. No more log spam and subsequent system drain! Yay! Thank you guys for all the help. 50 W less power usage at idle and CPU temp down 30 degrees.
I'm pretty impressed with how much the logging program can tax a 12900K. Glad it's not multithreaded!
Edited February 8, 2023 by Jlarimore: Comic relief
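To confirm after a reboot that the option actually reached the kernel, one quick check is to look at the running kernel's command line, which Linux exposes in /proc/cmdline. A small Python sketch follows; the helper names are made up for illustration.

```python
def has_kernel_arg(cmdline, arg):
    """Return True if `arg` appears as a whole space-separated token."""
    return arg in cmdline.split()


def read_cmdline(path="/proc/cmdline"):
    """Read the boot arguments the running Linux kernel was started with."""
    with open(path) as handle:
        return handle.read().strip()
```

On the server, `has_kernel_arg(read_cmdline(), "pcie_aspm=off")` should return True once the corrected append line is in effect.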