May 30, 2016

Hello. I've searched through the forums and saw some seemingly related issues, but nothing seemed to solve my issue. Hoping you brilliant folks have some ideas.

Issue: The server freezes. VM unresponsive. Plugins unresponsive. Shares unavailable. Fixed only by hard reset. Seems to happen when VM/server is under load or having a lot of activity. Happens once a week or so, but haven't nailed down an exact event that triggers it.

System:
unRaid 6.1.7; dockers: PMS, CouchPotato, Sab, Sonarr, Transmission; Windows 7 VM
Super Micro X8SIE-F running latest BIOS, Xeon X3470, GTX 950, 24GB ECC RAM (IBM 44T1586), PCIE USB card
USB card and GPU passed through to VM

Syslog from right before freeze:

Message from syslogd@Tower at May 30 17:27:08 ...
 kernel:Disabling IRQ #44
May 30 17:27:08 Tower kernel: irq 44: nobody cared (try booting with the "irqpoll" option)
May 30 17:27:08 Tower kernel: CPU: 0 PID: 8337 Comm: Timer-Scheduler Not tainted 4.1.15-unRAID #1
May 30 17:27:08 Tower kernel: Hardware name: Supermicro X8SIE/X8SIE, BIOS 1.2a 06/27/2012
May 30 17:27:08 Tower kernel: 0000000000000000 ffff88063fc03e28 ffffffff815f1ad0 ffff88063fc10a01
May 30 17:27:08 Tower kernel: ffff880616d22400 ffff88063fc03e58 ffffffff8107b05f 00000001003efbd1
May 30 17:27:08 Tower kernel: ffff880616d22400 0000000000000000 000000000000002c ffff88063fc03e98
May 30 17:27:08 Tower kernel: Call Trace:
May 30 17:27:08 Tower kernel: <IRQ> [<ffffffff815f1ad0>] dump_stack+0x4c/0x6e
May 30 17:27:08 Tower kernel: [<ffffffff8107b05f>] __report_bad_irq+0x2b/0xbe
May 30 17:27:08 Tower kernel: [<ffffffff8107b46a>] note_interrupt+0x19d/0x227
May 30 17:27:08 Tower kernel: [<ffffffff81079460>] handle_irq_event_percpu+0xe0/0xf2
May 30 17:27:08 Tower kernel: [<ffffffff810794ae>] handle_irq_event+0x3c/0x5e
May 30 17:27:08 Tower kernel: [<ffffffff8107bdeb>] handle_edge_irq+0xc3/0xdc
May 30 17:27:08 Tower kernel: [<ffffffff8100cf66>] handle_irq+0x1a/0x24
May 30 17:27:08 Tower kernel: [<ffffffff8100c9f8>] do_IRQ+0x49/0xcd
May 30 17:27:08 Tower kernel: [<ffffffff815f7cee>] common_interrupt+0x6e/0x6e
May 30 17:27:08 Tower kernel: <EOI>
May 30 17:27:08 Tower kernel: handlers:
May 30 17:27:08 Tower kernel: [<ffffffffa0077dea>]

Output from cat /proc/interrupts:

          CPU0     CPU1     CPU2     CPU3     CPU4     CPU5     CPU6     CPU7
  0:    502701        0        0        0        0        0        0        0  IO-APIC-edge  timer
  1:         1        0        0        2        0        0        0        0  IO-APIC-edge  i8042
  8:         0        0        0       31        0        0        0        0  IO-APIC-edge  rtc0
  9:         0        0        0        0        0        0        0        0  IO-APIC-fasteoi  acpi
 12:         0        0        0        3        0        0        0        0  IO-APIC-edge  i8042
 16:         0        0        0        0        0        0   415950        0  IO-APIC 16-fasteoi  vfio-intx(0000:01:00.0)
 17:         0        0        0      139        0        0        0        0  IO-APIC 17-fasteoi  vfio-intx(0000:01:00.1)
 18:         0        0        0        0        0        0        0        0  IO-APIC 18-fasteoi  i801_smbus
 19:         0   258079        0        0        0        0        0        0  IO-APIC 19-fasteoi  ata_piix, ata_piix
 21:         0        0       70        0        0        0        0        0  IO-APIC 21-fasteoi  ehci_hcd:usb3
 23:         0        0        0        0    22683        0        0        0  IO-APIC 23-fasteoi  ehci_hcd:usb4
 24:   2745742        0        0        0        0        0        0        0  HPET_MSI-edge  hpet2
 25:         0  2635305        0        0        0        0        0        0  HPET_MSI-edge  hpet3
 26:         0        0  1499462        0        0        0        0        0  HPET_MSI-edge  hpet4
 27:         0        0        0  1628165        0        0        0        0  HPET_MSI-edge  hpet5
 28:         0        0        0        0  1067375        0        0        0  HPET_MSI-edge  hpet6
 29:         0        0        0        0        0        0        0        0  DMAR_MSI-edge  dmar0
 30:         0        0        0        0        0        0        0        0  PCI-MSI-edge  aerdrv, PCIe PME
 31:         0        0        0        0        0        0        0        0  PCI-MSI-edge  PCIe PME
 32:         0        0        0        0        0        0        0        0  PCI-MSI-edge  PCIe PME
 33:         0        0        0        0        0        0        0        0  PCI-MSI-edge  PCIe PME
 34:         0        0        0        0        0    38436        0        0  PCI-MSI-edge  vfio-msix[0](0000:02:00.0)
 35:         0        0        0        0        0        0        0        0  PCI-MSI-edge  vfio-msix[1](0000:02:00.0)
 36:         0        0        0        0        0        0        0        0  PCI-MSI-edge  vfio-msix[2](0000:02:00.0)
 37:         0        0        0        0        0        0        0        0  PCI-MSI-edge  vfio-msix[3](0000:02:00.0)
 38:         0        0        0        0        0        0        0        0  PCI-MSI-edge  vfio-msix[4](0000:02:00.0)
 39:         0        0        0        0        0        0        0        0  PCI-MSI-edge  vfio-msix[5](0000:02:00.0)
 40:         0        0        0        0        0        0        0        0  PCI-MSI-edge  vfio-msix[6](0000:02:00.0)
 41:         0        0        0        0        0        0        0        0  PCI-MSI-edge  vfio-msix[7](0000:02:00.0)
 42:         0        0        0        0        0   609579        0        0  PCI-MSI-edge  eth0-rx-0
 43:         0        0        0        0        0        0   601308        0  PCI-MSI-edge  eth0-tx-0
 44:         0        0        0        0        0        0        0        2  PCI-MSI-edge  eth0
 45:         0        0     1264        0        0        0        0        0  PCI-MSI-edge  eth1-rx-0
 46:         0        0        0        0        0        0        0        0  PCI-MSI-edge  eth1-tx-0
 47:         0        0        0        0        1        0        0        0  PCI-MSI-edge  eth1
NMI:         0        0        0        0        0        0        0        0  Non-maskable interrupts
LOC:       302      290      275      260      245  1267087  1027816  1006148  Local timer interrupts
SPU:         0        0        0        0        0        0        0        0  Spurious interrupts
PMI:         0        0        0        0        0        0        0        0  Performance monitoring interrupts
IWI:         0        0        2        0        0        1        0        3  IRQ work interrupts
RTR:         7        0        0        0        0        0        0        0  APIC ICR read retries
RES:    200537   200475   769912   889199   674601  1128501   846023   854416  Rescheduling interrupts
CAL:       272      229      324      306      324      253      326      310  Function call interrupts
TLB:      6105     5713     4787     4839     3520     4152     3505     3578  TLB shootdowns
TRM:         0        0        0        0        0        0        0        0  Thermal event interrupts
THR:         0        0        0        0        0        0        0        0  Threshold APIC interrupts
MCE:         0        0        0        0        0        0        0        0  Machine check exceptions
MCP:        10       10       10       10       10       10       10       10  Machine check polls
HYP:         0        0        0        0        0        0        0        0  Hypervisor callback interrupts
ERR:         0
MIS:         0
May 31, 2016

In my opinion, a missed or unhandled IRQ is a software bug, but a deep one: it could be in the kernel, the BIOS, a driver, or the chipset firmware. Your best chance of fixing it is to run the latest version of each, in the hope that the bug has already been found and fixed. You're on 6.1.7, so the first step is to upgrade to 6.1.9, or possibly 6.2-beta21, to get a more recent kernel and drivers.
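Before updating anything, it can help to see which device sits on the IRQ the kernel disabled. In the /proc/interrupts output posted above, IRQ 44 is listed as eth0. Here is a minimal Python sketch of that lookup. The function name `describe_irq` and the `SAMPLE` excerpt are made up for illustration (the excerpt is trimmed from the posted output), and the parsing assumes the usual layout of one count per CPU followed by a description.

```python
from typing import Optional

# Trimmed excerpt of the /proc/interrupts output from the first post.
SAMPLE = """\
CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7
42: 0 0 0 0 0 609579 0 0 PCI-MSI-edge eth0-rx-0
43: 0 0 0 0 0 0 601308 0 PCI-MSI-edge eth0-tx-0
44: 0 0 0 0 0 0 0 2 PCI-MSI-edge eth0
45: 0 0 1264 0 0 0 0 0 PCI-MSI-edge eth1-rx-0
"""


def describe_irq(interrupts_text: str, irq: int) -> Optional[str]:
    """Return the text after the per-CPU counts for `irq`, or None if absent."""
    lines = interrupts_text.splitlines()
    if not lines:
        return None
    # The header row names one column per CPU (CPU0, CPU1, ...).
    ncpus = len(lines[0].split())
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep or label.strip() != str(irq):
            continue
        fields = rest.split()
        # Skip the per-CPU counters; what remains is controller type + device.
        return " ".join(fields[ncpus:])
    return None


# On a live system you could pass open("/proc/interrupts").read() instead.
print(describe_irq(SAMPLE, 44))  # PCI-MSI-edge eth0
```

Knowing the device narrows down which driver or firmware to update first, although it does not prove that device is the one at fault.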
June 23, 2016 (Author)

Quick update for anyone who stumbles upon this post in the future: I updated to 6.1.9 and added the irqpoll option to syslinux, and the issue has not recurred.
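For anyone unsure what "added the irqpoll option to syslinux" looks like: kernel boot parameters go on the append line of the syslinux config. Here is a rough Python sketch of that edit. The helper `add_kernel_param` and the `SAMPLE_CFG` text are hypothetical; the sample is an assumption about what a stock unRAID 6 boot entry looks like, not the poster's actual file, so check your own config before changing it.

```python
# Assumed shape of a stock boot entry; your real file may differ.
SAMPLE_CFG = """\
label unRAID OS
  menu default
  kernel /bzimage
  append initrd=/bzroot
"""


def add_kernel_param(cfg_text: str, param: str) -> str:
    """Add `param` to every 'append' line that does not already have it."""
    out = []
    for line in cfg_text.splitlines():
        stripped = line.strip()
        # Only touch append lines, and skip ones that already carry the param.
        if stripped.startswith("append ") and param not in stripped.split():
            line = f"{line.rstrip()} {param}"
        out.append(line)
    return "\n".join(out) + "\n"


print(add_kernel_param(SAMPLE_CFG, "irqpoll"))
```

With the sample above, the append line becomes `append initrd=/bzroot irqpoll`. Note that this sketch changes every boot entry that has an append line; editing the one line by hand in a text editor works just as well.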
This topic is now archived and is closed to further replies.