jeffreywhunter Posted April 22, 2015

Just installed Docker and then created a container for Plex (which is running with the default settings, not completely configured yet). I'm seeing the following in my syslog and on the console. Should I try to boot with the irqpoll option? Duh, how do you do that? I'm sure this means that some piece of hardware is requesting an interrupt that something else has claimed. It seems like IRQ 16 is the problem, which is where the AOC-SAS2LP-MV8 HBA controller resides.

Tail of syslog (full syslog attached):

Apr 22 15:54:17 HunterNAS-6 kernel: irq 16: nobody cared (try booting with the "irqpoll" option) (Errors)
Apr 22 15:54:17 HunterNAS-6 kernel: CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.19.4-unRAID #1 (Errors)
Apr 22 15:54:17 HunterNAS-6 kernel: Hardware name: System manufacturer System Product Name/P8Z68-V PRO, BIOS 3603 11/09/2012
Apr 22 15:54:17 HunterNAS-6 kernel: 0000000000000000 ffff88041f203e18 ffffffff815f7e84 0000000000040001
Apr 22 15:54:17 HunterNAS-6 kernel: ffff88040ca0f600 ffff88041f203e48 ffffffff81075853 000000010029112f
Apr 22 15:54:17 HunterNAS-6 kernel: ffff88040ca0f600 0000000000000000 0000000000000010 ffff88041f203e88
Apr 22 15:54:17 HunterNAS-6 kernel: Call Trace: (Errors)
Apr 22 15:54:17 HunterNAS-6 kernel: <IRQ> [<ffffffff815f7e84>] dump_stack+0x4c/0x6e
Apr 22 15:54:17 HunterNAS-6 kernel: [<ffffffff81075853>] __report_bad_irq+0x2b/0xbe
Apr 22 15:54:17 HunterNAS-6 kernel: [<ffffffff81075c5e>] note_interrupt+0x19d/0x227
Apr 22 15:54:17 HunterNAS-6 kernel: [<ffffffff81073dbc>] handle_irq_event_percpu+0xe0/0xf2
Apr 22 15:54:17 HunterNAS-6 kernel: [<ffffffff81073e0a>] handle_irq_event+0x3c/0x5e
Apr 22 15:54:17 HunterNAS-6 kernel: [<ffffffff810764bb>] handle_fasteoi_irq+0x7a/0xdb
Apr 22 15:54:17 HunterNAS-6 kernel: [<ffffffff8100d45a>] handle_irq+0x1a/0x24
Apr 22 15:54:17 HunterNAS-6 kernel: [<ffffffff810895ee>] ? __tick_nohz_idle_enter+0x27e/0x308
Apr 22 15:54:17 HunterNAS-6 kernel: [<ffffffff8100cefc>] do_IRQ+0x49/0xcd
Apr 22 15:54:17 HunterNAS-6 kernel: [<ffffffff815fdf2d>] common_interrupt+0x6d/0x6d
Apr 22 15:54:17 HunterNAS-6 kernel: <EOI> [<ffffffff814eac9c>] ? cpuidle_enter_state+0x49/0x9f
Apr 22 15:54:17 HunterNAS-6 kernel: [<ffffffff814eac95>] ? cpuidle_enter_state+0x42/0x9f
Apr 22 15:54:17 HunterNAS-6 kernel: [<ffffffff814ead91>] cpuidle_enter+0x12/0x14
Apr 22 15:54:17 HunterNAS-6 kernel: [<ffffffff8106d148>] cpu_startup_entry+0x19a/0x272
Apr 22 15:54:17 HunterNAS-6 kernel: [<ffffffff815eb460>] rest_init+0x80/0x84
Apr 22 15:54:17 HunterNAS-6 kernel: [<ffffffff818aded9>] start_kernel+0x412/0x41f
Apr 22 15:54:17 HunterNAS-6 kernel: [<ffffffff818ad8bd>] ? set_init_arg+0x56/0x56
Apr 22 15:54:17 HunterNAS-6 kernel: [<ffffffff818ad120>] ? early_idt_handlers+0x120/0x120
Apr 22 15:54:17 HunterNAS-6 kernel: [<ffffffff818ad4c6>] x86_64_start_reservations+0x2a/0x2c
Apr 22 15:54:17 HunterNAS-6 kernel: [<ffffffff818ad5b6>] x86_64_start_kernel+0xee/0xfd
Apr 22 15:54:17 HunterNAS-6 kernel: handlers:
Apr 22 15:54:17 HunterNAS-6 kernel: [<ffffffff8149fec0>] usb_hcd_irq (Drive related)
Apr 22 15:54:17 HunterNAS-6 kernel: [<ffffffffa00a25e6>] mvs_interrupt [mvsas] (Drive related)
Apr 22 15:54:17 HunterNAS-6 kernel: Disabling IRQ #16

I did an lsdev, and there is mvsas on IRQ 16, but I can't see any conflicts in the output.

Device           DMA    IRQ  I/O Ports
------------------------------------------------
0000:00:02.0                 f000-f03f
0000:00:19.0                 f080-f09f
0000:00:1f.2             26  f060-f07f f0a0-f0a3 f0b0-f0b7 f0c0-f0c3 f0d0-f0d7
0000:00:1f.3                 f040-f05f
0000:01:00.0             27  e000-e01f e020-e023 e030-e037 e040-e043 e050-e057
0000:05:00.0             28  d000-d00f d010-d013 d020-d027 d030-d033 d040-d047
ACPI                         0400-0403 0404-0405 0408-040b 0410-0415 0420-042f 0450-0450
acpi                      9
ahci                         d000-d00f d010-d013 d020-d027 d030-d033 d040-d047
                             e000-e01f e020-e023 e030-e037 e040-e043 e050-e057
                             f060-f07f f0a0-f0a3 f0b0-f0b7 f0c0-f0c3 f0d0-f0d7
cascade            4
dma                          0080-008f
dma1                         0000-001f
dma2                         00c0-00df
EC                           0062-0062 0066-0066
ehci_hcd:usb2            23
eth0                     25
fpu                          00f0-00ff
i8042                  1 12
keyboard                     0060-0060 0064-0064
mvsas                    16
PCI                          0000-0cf7 0cf8-0cff 0d00-ffff d000-dfff e000-efff
pic1                         0020-0021
pic2                         00a0-00a1
pnp                          0200-020f 0290-029f 0454-0457 0458-047f 04d0-04d1
                             0500-057f 0680-069f 164e-164f ffff-ffff ffff-ffff
PNP0C04:00                   00f0-00ff
PNP0C09:00                   0062-0062 0066-0066
rtc0                      8  0070-0077
timer                     0
timer0                       0040-0043
timer1                       0050-0053
vga+                         03c0-03df

cat /proc/interrupts does show the SAS controller and the USB controller on the same IRQ: IO-APIC 16-fasteoi ehci_hcd:usb1, mvsas. Since Linux can't change the IRQ, I'm not sure how to resolve this. My BIOS does not show any way to edit IRQs, and I've disabled everything in the BIOS that's not being used (including the USB 3.0 controller). Thoughts?

           CPU0       CPU1       CPU2       CPU3
  0:         13          0          0          0  IO-APIC-edge timer
  1:          3          0          0          0  IO-APIC-edge i8042
  8:         33          0          0          0  IO-APIC-edge rtc0
  9:          0          0          0          0  IO-APIC-fasteoi acpi
 12:          3          0          0          0  IO-APIC-edge i8042
 16:    2513722          0          0          0  IO-APIC 16-fasteoi ehci_hcd:usb1, mvsas
 23:       9469          0          0          0  IO-APIC 23-fasteoi ehci_hcd:usb2
 25:     559950          0          0          0  PCI-MSI-edge 0000:00:1f.2
 26:      13377          0          0          0  PCI-MSI-edge eth0
 27:     260473          0          0          0  PCI-MSI-edge 0000:01:00.0
 28:     565037          0          0          0  PCI-MSI-edge 0000:05:00.0
NMI:          0          0          0          0  Non-maskable interrupts
LOC:     141721     127078     121843     119970  Local timer interrupts
SPU:          0          0          0          0  Spurious interrupts
PMI:          0          0          0          0  Performance monitoring interrupts
IWI:          0          0          0          0  IRQ work interrupts
RTR:          3          0          0          0  APIC ICR read retries
RES:      87963       5858       6259       4213  Rescheduling interrupts
CAL:         72        117         73        126  Function call interrupts
TLB:       1976        887        835        851  TLB shootdowns
TRM:          0          0          0          0  Thermal event interrupts
THR:          0          0          0          0  Threshold APIC interrupts
MCE:          0          0          0          0  Machine check exceptions
MCP:          9          9          9          9  Machine check polls
HYP:          0          0          0          0  Hypervisor callback interrupts
ERR:          0
MIS:          0

Attachment: syslog-2015-04-22.txt

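The check done by eye on /proc/interrupts above (which IRQ lines carry more than one handler) can be sketched in Python. This is only an illustration: the function name `shared_irqs` and the parsing heuristic are assumptions, and the sample rows are copied from the output in the post rather than read from a live system.

```python
def shared_irqs(interrupts_text):
    """Map IRQ number -> handler names, for IRQ lines with more than one handler."""
    lines = interrupts_text.strip().splitlines()
    n_cpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    shared = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        label = label.strip()
        if not sep or not label.isdigit():
            continue  # skip summary rows such as NMI, LOC, ERR
        tail = rest.split()[n_cpus:]  # drop the per-CPU counters
        if len(tail) < 2:
            continue  # no handler registered on this line
        # Handlers are comma-separated; the first chunk also carries the
        # controller/trigger words, so keep only its last token.
        chunks = " ".join(tail).split(",")
        handlers = [chunks[0].split()[-1]] + [c.strip() for c in chunks[1:]]
        if len(handlers) > 1:
            shared[int(label)] = handlers
    return shared


# Rows copied from the /proc/interrupts output in the post above.
SAMPLE = """\
 CPU0 CPU1 CPU2 CPU3
 12: 3 0 0 0 IO-APIC-edge i8042
 16: 2513722 0 0 0 IO-APIC 16-fasteoi ehci_hcd:usb1, mvsas
 23: 9469 0 0 0 IO-APIC 23-fasteoi ehci_hcd:usb2
 26: 13377 0 0 0 PCI-MSI-edge eth0
LOC: 141721 127078 121843 119970 Local timer interrupts
ERR: 0
"""

print(shared_irqs(SAMPLE))
```

On a live Linux system the same function could be fed the contents of /proc/interrupts instead of the sample text. Sharing a line is not a fault in itself; it only narrows down which drivers are candidates when the kernel reports that nobody handled the interrupt.
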
RobJ Posted April 22, 2015

kernel: irq 16: nobody cared (try booting with the "irqpoll" option)

This has been rather rare, and generally tough to solve. Something set up an interrupt on IRQ 16, and it fired, but nobody answered the bell. That's a bug! Somewhere. The problem is finding who's to blame. It's low-level, and almost always hardware related. In this case, you did the homework, which shows that there are 2 handlers that are *supposed* to handle any IRQ 16s: a USB driver and the mvsas driver. But the low-level code involved could be in the BIOS USB support, or one of the USB drivers, or in the SAS card BIOS/firmware, or in the mvsas driver module. Online, I found a number of 'IRQ #, nobody cared' reports involving the USB driver, so that puts some suspicion on it.

Things to try:

* update the motherboard BIOS (unlikely to help, but you never know)
* update the SAS card firmware (might help, probably low chance though)
* try the "irqpoll" option, just add the word to the append line in your syslinux.cfg (never heard of this helping anyone yet! plus it will likely affect system performance)
* move USB connected devices to very different ports, because there are different USB drivers assigned to different pairs of ports (just might help, and easy to try)
* replace motherboard or SAS card (sorry, sometimes that's the only choice left!)
* wait for help from someone else who's been there, and fixed it or worked around it

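The "irqpoll" suggestion in the list above (adding the word to the append line in syslinux.cfg) could look roughly like this. It is a sketch under assumptions: the helper name `add_kernel_option` is made up for illustration, and the sample config is an assumed typical unRAID layout, not the poster's actual file. Back up the real file before editing it.

```python
def add_kernel_option(cfg_text, option="irqpoll"):
    """Return cfg_text with `option` added to every 'append' line that lacks it."""
    out = []
    for line in cfg_text.splitlines():
        words = line.split()
        # Syslinux keywords are case-insensitive, so compare in lower case.
        if words and words[0].lower() == "append" and option not in words[1:]:
            line = line.rstrip() + " " + option
        out.append(line)
    return "\n".join(out) + "\n"


# Assumed example layout; the real syslinux.cfg on a given system may differ.
SAMPLE_CFG = """\
default /syslinux/menu.c32
label unRAID OS
  menu default
  kernel /bzimage
  append initrd=/bzroot
"""

print(add_kernel_option(SAMPLE_CFG))
```

Note that this sketch touches every append line it finds, so a config with several boot entries would get the option on each of them. Editing just the default entry by hand is the simpler route for a one-off test, and the option takes effect only after a reboot.
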
jeffreywhunter Posted April 23, 2015

I thought this might be the case. One interesting observation: everything's working fine. I can access the USB without issue. I'm currently loading 1.8TB of movies onto the system at 109MB/s. Shouldn't there be something not working?

RobJ Posted April 23, 2015

That's a good question. I took a look at your syslog and found that your motherboard architecture is set up as 2 USB buses, with bus 1 assigned IRQ 16 and bus 2 assigned IRQ 23. Your USB mouse, keyboard, and flash drive are all connected to bus 2, so you lucked out there. The syslog says bus 1 has 6 ports and bus 2 has 8 ports, but some are just pinouts on the motherboard, some may not even have pins, and some may be the ports on the front of the case. I think it's likely that some of the ports stopped working once IRQ 16 was disabled.

However, mvsas is handling 5 of your drives: Disks 2, 3, 4, 6, and 7. I would normally think that the 5 drives would now be unresponsive. Nowhere can I find an actual IRQ assigned to mvsas, yet the syslog shows an mvsas interrupt handler on IRQ 16, so it must be assigned there. Perhaps it is controlling the I/O differently, I don't know.

Since the syslog you attached stops immediately after the disabling of IRQ 16, it's not clear that you actually accessed any of those drives. On boot, a parity sync began, which obviously read from them all, but the disabling happened just after 7.5 hours had passed, and I think by then the parity calc process had passed the 2GB mark and was through with all 5 drives. Your fast writing to the drives is probably going to User Shares, so it is all being written directly to the SSD Cache drive, not the 5 drives. At 3:40am, the Mover will kick in and try to move the data from the Cache drive to the data drives, and then you will quickly find out if they are responding. Or you can try reading directly from one of those disks now (not from the Shares, from the disk itself). If that works, then mvsas is not using interrupt-driven I/O.

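The direct-read test suggested above (read from the disk itself rather than through the Shares) could be sketched as follows. The function name `time_direct_read` is made up for illustration, and the path in the usage comment is hypothetical; it assumes that individual array disks are mounted under /mnt/disk2, /mnt/disk3 and so on.

```python
import time


def time_direct_read(path, max_bytes=256 * 1024 * 1024, chunk_size=1024 * 1024):
    """Read up to max_bytes from path; return (bytes_read, seconds_elapsed)."""
    total = 0
    start = time.monotonic()
    with open(path, "rb") as f:
        while total < max_bytes:
            chunk = f.read(min(chunk_size, max_bytes - total))
            if not chunk:
                break  # reached end of file
            total += len(chunk)
    return total, time.monotonic() - start


# Hypothetical usage on one of the disks behind the mvsas controller:
# n, secs = time_direct_read("/mnt/disk2/Movies/some_file.mkv")
# print(f"{n / secs / 1e6:.1f} MB/s")
```

A read that hangs or raises an I/O error would point at the controller; a normal transfer rate would suggest the drives are still reachable. Pick a file that has not been read or written recently, otherwise the data may be served from the page cache in RAM and never touch the disk.
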
jeffreywhunter Posted April 23, 2015

RobJ wrote: "Since the syslog you attached stops immediately after the disabling of IRQ 16, it's not clear that you actually accessed any of those drives."

I moved nearly 2TB of movie files after that syslog was generated. The share (Movies) was spread across two disks (Disk2 and Disk3). With high-water allocation in place, it filled half of Disk2, then finished up on Disk3. So unRAID was using both disks to their fullest capacity, as far as I can tell.

This topic is now archived and is closed to further replies.