kernel: Disabling IRQ #16


xioustic

Hi,

 

I've been facing the same problem for over a year now and can't solve it.

For me it only happens when playing media on my LibreELEC VM, but it's completely random.

When the IRQ 16 error occurs, the currently displayed picture freezes but the sound continues.

 

I first got this issue after upgrading to unRAID v6.3.1.

Today my VM crashed three times in a row with this error (that never happened before), so I started searching Google again...

 

I came across this post (unfortunately it's in German):

https://www.thomas-krenn.com/de/wiki/IRQ_16:_nobody_cared_Problem_beim_Einsatz_von_mehreren_NVIDIA_Grafikkarten_beheben

 

He describes basically the same behaviour and solved the problem by adding the kernel parameter 'acpi=debug'.

 

As I have no idea what this means or how to do it, I wanted to ask whether this can be set by just changing my syslinux.cfg.

Can I try this out safely and do I just need to add a new line to the file?

Can someone jump in here please? I really need to get rid of this error...

Edited by Marv
  • 5 months later...

Marv,

 

Yes, I think this can be done just by editing the syslinux.cfg file on the USB drive at /syslinux/syslinux.cfg. Find the line that starts with "append" for your boot menu option and add "acpi=debug" to the end of it, separated from the existing options by a space (not a comma).
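
A minimal sketch of that edit, for anyone who prefers the command line. It assumes a running unRAID system with the flash drive mounted at /boot (so the file is /boot/syslinux/syslinux.cfg) and a boot entry labelled "unRAID OS"; check the "label" lines in your own file first. add_kernel_param is a hypothetical helper name, not an unRAID tool. It keeps a backup copy and adds the parameter to the append line of the chosen entry only. Kernel parameters are separated by spaces; a comma after initrd=/bzroot would be read as part of the initrd file list.

```shell
# Hypothetical helper: add one kernel parameter to the "append" line of a
# single boot entry in a syslinux.cfg file.
# Usage: add_kernel_param PARAM LABEL FILE
add_kernel_param() {
    param=$1
    label=$2
    cfg=$3

    # Keep a backup so the original can be restored if the server will not boot.
    cp "$cfg" "$cfg.bak" || return 1

    awk -v p="$param" -v want="$label" '
        # Remember which "label ..." entry we are currently inside.
        $1 == "label" {
            cur = $0
            sub(/^[ \t]*label[ \t]+/, "", cur)
            sub(/[ \t]+$/, "", cur)
        }
        # Only touch the append line of the requested entry.
        $1 == "append" && cur == want {
            present = 0
            for (i = 2; i <= NF; i++) if ($i == p) present = 1
            if (!present) {
                sub(/[ \t]+$/, "")
                $0 = $0 " " p
            }
        }
        { print }
    ' "$cfg.bak" > "$cfg"
}

# Example call (label and path are assumptions, adjust to your system):
# add_kernel_param acpi=debug "unRAID OS" /boot/syslinux/syslinux.cfg
```

The change only takes effect after a reboot. Note that running the helper a second time overwrites the .bak file with the already edited version, so copy the original somewhere else if you want to keep it.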

 

Thanks for the reply, I'll be trying this solution later. Had abandoned unRAID up to this point.

48 minutes ago, xioustic said:

Thanks for the reply, I'll be trying this solution later. Had abandoned unRAID up to this point.

In your tests and attempts to resolve this, did you conclude that unRAID itself was the cause of the issue? Or was it more that you could not get the combination of unRAID, VMs, your hardware, etc. to work properly, and therefore migrated to a solution other than unRAID?

 

I had this same issue with my first unRAID server back in 2011.  In my case, it turned out to be a hardware issue that was triggered by switching monitor inputs on a monitor shared between my unRAID server and desktop PC.  This was never going to be a long-term solution and I just had it set up that way for unRAID installation, configuration and testing purposes.  It took me a while to figure out this was the cause of my IRQ 16 disabled issue.

 

Everything was fine until I switched monitor inputs, at which point throughput on the drives attached to a PCIe SATA disk controller went in the tank.  It turns out the driver for the SATA card chipset was assigned to IRQ 16, as was USB1.  Somehow, switching inputs on my monitor messed with anything on IRQ 16.  The problem completely went away by simply attaching a dedicated monitor to my server.  I have since upgraded the motherboard, CPU, several other hardware components and the unRAID OS many times without the problem returning.

 

Did you type "cat /proc/interrupts" at a command prompt to see what is assigned to IRQ 16 on your system?  Something is interfering with one or more of the devices sharing that interrupt, and knowing which ones they are is helpful in troubleshooting.
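
To narrow the output down to a single interrupt, something like the following can help. It is only a sketch: irq_line is a hypothetical helper name, and the optional second argument exists so it can be run against a saved copy of the file. Each line of /proc/interrupts starts with the IRQ number and a colon, followed by per-CPU counters and the names of the drivers registered on that line.

```shell
# Hypothetical helper: print the /proc/interrupts line for one IRQ number.
# Usage: irq_line IRQ [FILE]   (FILE defaults to /proc/interrupts)
irq_line() {
    # The first field of each line is the IRQ number followed by a colon.
    awk -v irq="$1" '$1 == (irq ":")' "${2:-/proc/interrupts}"
}

# Example: irq_line 16
```

If more than one driver name shows up at the end of that line, the interrupt is shared, and any one of those devices could be the one misbehaving.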

 

Below is the ancient thread on the subject, which contained several good suggestions, none of which worked in my case because I had not identified the source of the problem.  I thought it was an unRAID or preclear script problem since that is where I saw the issue occur.  I am certainly not saying this is your problem, but several posters in these forums have had this "IRQ XX disabled" problem for a variety of reasons, and I don't think any of them have anything to do specifically with unRAID itself.

 

 

  • 2 years later...

This same problem was happening to me. I had my Windows gaming VM with my NVIDIA GPU passed through, but the performance was crap.

 

I would see "kernel:Disabling IRQ #16" errors on the web terminal but had no idea what they meant.

 

After booting up Windows, basic operations like right-clicking on the desktop were super laggy (about a 1-second delay), but I figured it was a driver issue. So I reinstalled the NVIDIA drivers and noticed that, for the few seconds between the old NVIDIA drivers being unloaded and the new ones starting up, the performance of Windows jumped up to what I would normally expect.

 

I assumed I was doing something wrong since there is a lot of chatter on the internet about screwing up GPU passthrough. I spent 3 full nights fiddling with VM settings, restarting the VM, changing VFIO-PCI settings, rebooting unRAID, trying the VM again, etc., and nothing helped. During this whole time I had one display hooked up to my motherboard output and another display hooked up to the NVIDIA GPU.

 

After I finally gave up with the VM I unplugged the display hooked up to the motherboard and started fiddling with other unRAID stuff. Several reboots and a week or two later I decided to boot up the VM again to fiddle a bit more and this time it just worked, every time. Even VMs that were "slow" before now ran like normal. This made me super nervous but I was just happy that it was working.

 

About a month later, due to flash drive issues, I needed to hook up the onboard display again so I could see the unRAID output. I got the flash drive issues fixed, started up the VM, and saw that it was slow again. I shut down the VM and saw this in my log (just in case it helps LT):

 

Dec 18 01:05:48 Tower kernel: irq 16: nobody cared (try booting with the "irqpoll" option)
Dec 18 01:05:48 Tower kernel: CPU: 2 PID: 0 Comm: swapper/2 Tainted: P O 4.19.107-Unraid #1
Dec 18 01:05:48 Tower kernel: Hardware name: Gigabyte Technology Co., Ltd. Z170X-Gaming 7/Z170X-Gaming 7, BIOS F20 11/04/2016
Dec 18 01:05:48 Tower kernel: Call Trace:
Dec 18 01:05:48 Tower kernel: <IRQ>
Dec 18 01:05:48 Tower kernel: dump_stack+0x67/0x83
Dec 18 01:05:48 Tower kernel: __report_bad_irq+0x30/0xa5
Dec 18 01:05:48 Tower kernel: note_interrupt+0x1d8/0x229
Dec 18 01:05:48 Tower kernel: handle_irq_event_percpu+0x4f/0x6f
Dec 18 01:05:48 Tower kernel: handle_irq_event+0x34/0x51
Dec 18 01:05:48 Tower kernel: handle_fasteoi_irq+0x92/0xfc
Dec 18 01:05:48 Tower kernel: handle_irq+0x1c/0x1f
Dec 18 01:05:48 Tower kernel: do_IRQ+0x46/0xd0
Dec 18 01:05:48 Tower kernel: common_interrupt+0xf/0xf
Dec 18 01:05:48 Tower kernel: </IRQ>
Dec 18 01:05:48 Tower kernel: RIP: 0010:cpuidle_enter_state+0xe8/0x141
Dec 18 01:05:48 Tower kernel: Code: ff 45 84 f6 74 1d 9c 58 0f 1f 44 00 00 0f ba e0 09 73 09 0f 0b fa 66 0f 1f 44 00 00 31 ff e8 7a 8d bb ff fb 66 0f 1f 44 00 00 <48> 2b 2c 24 b8 ff ff ff 7f 48 b9 ff ff ff ff f3 01 00 00 48 39 cd
Dec 18 01:05:48 Tower kernel: RSP: 0018:ffffc900031dbe98 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffde
Dec 18 01:05:48 Tower kernel: RAX: ffff88884fa9fac0 RBX: ffff88884faaa100 RCX: 000000000000001f
Dec 18 01:05:48 Tower kernel: RDX: 0000000000000000 RSI: 000000001fefa611 RDI: 0000000000000000
Dec 18 01:05:48 Tower kernel: RBP: 0000018f4e3253cb R08: 0000018f4e3253cb R09: 0000000000000001
Dec 18 01:05:48 Tower kernel: R10: 0000000000000000 R11: 071c71c71c71c71c R12: 0000000000000001
Dec 18 01:05:48 Tower kernel: R13: ffffffff81e5b120 R14: 0000000000000000 R15: ffffffff81e5b198
Dec 18 01:05:48 Tower kernel: ? cpuidle_enter_state+0xbf/0x141
Dec 18 01:05:48 Tower kernel: do_idle+0x17e/0x1fc
Dec 18 01:05:48 Tower kernel: cpu_startup_entry+0x6a/0x6c
Dec 18 01:05:48 Tower kernel: start_secondary+0x197/0x1b2
Dec 18 01:05:48 Tower kernel: secondary_startup_64+0xa4/0xb0
Dec 18 01:05:48 Tower kernel: handlers:
Dec 18 01:05:48 Tower kernel: [<0000000008c48ea5>] i801_isr [i2c_i801]
Dec 18 01:05:48 Tower kernel: [<0000000067bc464a>] vfio_intx_handler
Dec 18 01:05:48 Tower kernel: [<0000000067bc464a>] vfio_intx_handler
Dec 18 01:05:48 Tower kernel: Disabling IRQ #16
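
The "handlers:" section near the end of that trace lists the interrupt handlers that were registered on the line when the kernel gave up on it; here that is i801_isr from the i2c_i801 module and two vfio_intx_handler entries. A rough sketch for pulling just those lines out of a saved log follows. irq_handlers is a hypothetical helper name, the syslog path in the example is an assumption about where your log lives, and the pattern is written against the log format shown above, so other kernel versions may need adjustments.

```shell
# Hypothetical helper: list the handlers named in "nobody cared" reports.
# Usage: irq_handlers [FILE]   (reads standard input when FILE is omitted)
irq_handlers() {
    awk '
        /kernel: handlers:/ { in_list = 1; next }   # start of the handler list
        /Disabling IRQ/     { in_list = 0 }         # end of the report
        in_list {
            # Drop the timestamp, host and "[<address>]" prefix.
            sub(/^.*\[<[0-9a-f]+>\] /, "")
            print
        }
    ' "$@"
}

# Example (log path is an assumption): irq_handlers /var/log/syslog
```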

Then I started searching and found this post.

 

Based on the info here, I found that I can replicate the "slow Windows VM" 100% of the time just by plugging a display into my motherboard output. When I do, GPU utilization goes to near 100% in the Windows VM, and the only fix I've found is to reboot unRAID.


So, I found this post from 2011: https://www.linuxquestions.org/questions/slackware-14/disabling-irq-16-a-879964/

 

Quote

I am using Slackware64 13.37 and every once and awhile I will get a notification that the kernel is disabling IRQ #16. When this happens it feels like I lose 3D acceleration. KDE becomes very sluggish and lags like crazy. Even when I type into text boxes they lag. The only way to fix it temporarily is to reboot, but then it will happen again in a few days.

 

Isn't unRAID based on Slackware? Perhaps this is an issue with the distro and not something specific to unRAID.
