March 12, 2016

Hi all,

I'm having trouble troubleshooting a random crashing issue with unRAID 6.1.8. Basically the system will run fine for hours and then lock up completely, so I have to do a hard reboot. This has happened consistently once a day this week, and today it happened twice (the system is only a week old, so it's been happening since day one).

I have a monitor plugged into the GT730 that unRAID outputs its shell prompt to, and every time it has locked up there has been nothing visible on that shell prompt, so for the last few days I have kept an SSH window open tailing the syslog file. Initially I found that one of my WD Reds had failed and was throwing a ton of write errors, so I pulled it and shrank the array a few days ago, hoping that would resolve the lockups, but it has not.

In my logs I have seen the following error around the time of a few of the crashes (except when nothing gets logged before the lockup):

INFO: rcu_sched self-detected stall on CPU

From all of the search results I've found, this error seems to be tied to a lot of problems with ReiserFS; however, all of my drives have been formatted with Btrfs since day one.

I've also gotten this error once:

mce: [Hardware Error]: Machine check events logged

But I have no clue where this is logged, since it says it should be in /var/log/mcelog but that file does not exist (or it gets cleaned up on reboot and doesn't appear until an error occurs).

I would love to provide you guys with the logs I took over the last few days, but I made the unfortunate mistake of leaving them in a PuTTY window that Windows decided was unimportant during the night when it installed Windows updates >.<

I have attached the zip file from running diagnostics.
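Since the tailed log was lost along with the PuTTY window, one option is to copy syslog to storage that survives a hard reboot. The following is only a minimal sketch, assuming Python is installed on the host (it may not be by default) and that the destination is on persistent storage. The function names `copy_new_bytes` and `follow` and the destination path in the usage note are hypothetical, not part of unRAID.

```python
import os
import time


def copy_new_bytes(src_path, dst_path, offset=0):
    """Append whatever was written to src_path after `offset` to dst_path.

    Returns the new offset. If the source shrank (truncated or rotated),
    copying restarts from the beginning of the source.
    """
    if os.path.getsize(src_path) < offset:
        offset = 0
    with open(src_path, "rb") as src:
        src.seek(offset)
        data = src.read()
    if data:
        with open(dst_path, "ab") as dst:
            dst.write(data)
            dst.flush()
            # Push the data to the device so it survives a hard lockup.
            os.fsync(dst.fileno())
    return offset + len(data)


def follow(src_path, dst_path, interval=1.0, max_polls=None):
    """Poll src_path and mirror new data to dst_path.

    Runs forever when max_polls is None; otherwise stops after that many polls.
    """
    offset, polls = 0, 0
    while max_polls is None or polls < max_polls:
        if os.path.exists(src_path):
            offset = copy_new_bytes(src_path, dst_path, offset)
        time.sleep(interval)
        polls += 1
    return offset
```

It could be started with something like `follow("/var/log/syslog", "/boot/syslog-copy.txt")`, where the destination path is an assumption. Syncing on every poll adds write wear on flash media, so a longer interval may be preferable once the crash has been captured.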
This is the system:

Intel Xeon 2960 12c/24t 2.6 GHz processor
ASRock Extreme 6 (2011v3 socket, X99 chipset) motherboard
64 GB DDR4 2133 (8 x 8 GB kit) memory
EVGA GeForce GTX970 graphics card (for host passthrough only)
PNY GeForce GT730 graphics card (for unRAID graphics only, as the motherboard does not have onboard graphics)
4 x 2 TB WD Red hard drives (for array layer)
2 x 256 GB Samsung 850 EVO SSDs (for cache layer)
1 x 512 GB Samsung 950 PRO M.2 NVMe drive (for the future, currently not in use but installed in the machine)

Windows 10 VM:

<domain type='kvm' id='1' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
  <name>Endeavour</name>
  <uuid>e0bbbec9-44bc-3cc1-f71a-448bb65e0194</uuid>
  <description>Workstation</description>
  <metadata>
    <vmtemplate name="Custom" icon="windows.png" os="windows"/>
  </metadata>
  <memory unit='KiB'>25165824</memory>
  <currentMemory unit='KiB'>25165824</currentMemory>
  <memoryBacking>
    <nosharepages/>
    <locked/>
  </memoryBacking>
  <vcpu placement='static'>8</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='0'/>
    <vcpupin vcpu='1' cpuset='1'/>
    <vcpupin vcpu='2' cpuset='2'/>
    <vcpupin vcpu='3' cpuset='3'/>
    <vcpupin vcpu='4' cpuset='4'/>
    <vcpupin vcpu='5' cpuset='5'/>
    <vcpupin vcpu='6' cpuset='6'/>
    <vcpupin vcpu='7' cpuset='7'/>
  </cputune>
  <resource>
    <partition>/machine</partition>
  </resource>
  <os>
    <type arch='x86_64' machine='pc-i440fx-2.3'>hvm</type>
  </os>
  <features>
    <acpi/>
    <apic/>
  </features>
  <cpu mode='host-passthrough'>
    <topology sockets='1' cores='8' threads='1'/>
  </cpu>
  <clock offset='localtime'>
    <timer name='rtc' tickpolicy='catchup'/>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/bin/qemu-system-x86_64</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='writeback'/>
      <source file='/mnt/nvme/Endeavour/vdisk1.img'/>
      <backingStore/>
      <target dev='hdc' bus='virtio'/>
      <boot order='1'/>
      <alias name='virtio-disk2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </disk>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='writeback'/>
      <source file='/mnt/user/ArrayVDisks/Endeavour/vdisk2.img'/>
      <backingStore/>
      <target dev='hdd' bus='virtio'/>
      <alias name='virtio-disk3'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </disk>
    <controller type='usb' index='0'>
      <alias name='usb'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pci-root'>
      <alias name='pci.0'/>
    </controller>
    <controller type='virtio-serial' index='0'>
      <alias name='virtio-serial0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </controller>
    <interface type='bridge'>
      <mac address='52:54:00:48:dd:b8'/>
      <source bridge='br0'/>
      <target dev='vnet0'/>
      <model type='virtio'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </interface>
    <serial type='pty'>
      <source path='/dev/pts/0'/>
      <target port='0'/>
      <alias name='serial0'/>
    </serial>
    <console type='pty' tty='/dev/pts/0'>
      <source path='/dev/pts/0'/>
      <target type='serial' port='0'/>
      <alias name='serial0'/>
    </console>
    <channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/channel/target/Endeavour.org.qemu.guest_agent.0'/>
      <target type='virtio' name='org.qemu.guest_agent.0' state='disconnected'/>
      <alias name='channel0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <hostdev mode='subsystem' type='usb' managed='yes'>
      <source>
        <vendor id='0x05e3'/>
        <product id='0x0732'/>
        <address bus='2' device='4'/>
      </source>
      <alias name='hostdev0'/>
    </hostdev>
    <hostdev mode='subsystem' type='usb' managed='yes'>
      <source>
        <vendor id='0x046d'/>
        <product id='0xc52f'/>
        <address bus='1' device='10'/>
      </source>
      <alias name='hostdev1'/>
    </hostdev>
    <hostdev mode='subsystem' type='usb' managed='yes'>
      <source>
        <vendor id='0x1b1c'/>
        <product id='0x1c07'/>
        <address bus='1' device='5'/>
      </source>
      <alias name='hostdev2'/>
    </hostdev>
    <hostdev mode='subsystem' type='usb' managed='yes'>
      <source>
        <vendor id='0x1532'/>
        <product id='0x0203'/>
        <address bus='1' device='4'/>
      </source>
      <alias name='hostdev3'/>
    </hostdev>
    <memballoon model='virtio'>
      <alias name='balloon0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </memballoon>
  </devices>
  <qemu:commandline>
    <qemu:arg value='-device'/>
    <qemu:arg value='ioh3420,bus=pci.0,addr=1c.0,multifunction=on,port=2,chassis=1,id=root.1'/>
    <qemu:arg value='-device'/>
    <qemu:arg value='vfio-pci,host=02:00.0,bus=root.1,addr=00.0,multifunction=on,x-vga=on'/>
    <qemu:arg value='-device'/>
    <qemu:arg value='vfio-pci,host=00:1b.0,bus=root.1,addr=01.0'/>
  </qemu:commandline>
</domain>

Server 2012 VM:

<domain type='kvm' id='2'>
  <name>Nyx</name>
  <uuid>1583133c-98ee-3342-24da-45b22af1fbe4</uuid>
  <description>SQL Server</description>
  <metadata>
    <vmtemplate name="Custom" icon="windows.png" os="windows"/>
  </metadata>
  <memory unit='KiB'>4194304</memory>
  <currentMemory unit='KiB'>4194304</currentMemory>
  <memoryBacking>
    <nosharepages/>
    <locked/>
  </memoryBacking>
  <vcpu placement='static'>8</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='16'/>
    <vcpupin vcpu='1' cpuset='17'/>
    <vcpupin vcpu='2' cpuset='18'/>
    <vcpupin vcpu='3' cpuset='19'/>
    <vcpupin vcpu='4' cpuset='20'/>
    <vcpupin vcpu='5' cpuset='21'/>
    <vcpupin vcpu='6' cpuset='22'/>
    <vcpupin vcpu='7' cpuset='23'/>
  </cputune>
  <resource>
    <partition>/machine</partition>
  </resource>
  <os>
    <type arch='x86_64' machine='pc-i440fx-2.3'>hvm</type>
  </os>
  <features>
    <acpi/>
    <apic/>
    <hyperv>
      <relaxed state='on'/>
      <vapic state='on'/>
      <spinlocks state='on' retries='8191'/>
    </hyperv>
  </features>
  <cpu mode='host-passthrough'>
    <topology sockets='1' cores='8' threads='1'/>
  </cpu>
  <clock offset='localtime'>
    <timer name='hypervclock' present='yes'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/bin/qemu-system-x86_64</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='writeback'/>
      <source file='/mnt/user/ArrayVDisks/Nyx/vdisk1.img'/>
      <backingStore/>
      <target dev='hdb' bus='virtio'/>
      <boot order='1'/>
      <alias name='virtio-disk1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </disk>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <source file='/mnt/user/ISOs/virtio-win-0.1.113.iso'/>
      <backingStore/>
      <target dev='hda' bus='ide'/>
      <readonly/>
      <boot order='2'/>
      <alias name='ide0-0-0'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
    <controller type='usb' index='0'>
      <alias name='usb'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pci-root'>
      <alias name='pci.0'/>
    </controller>
    <controller type='ide' index='0'>
      <alias name='ide'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
    </controller>
    <controller type='virtio-serial' index='0'>
      <alias name='virtio-serial0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </controller>
    <interface type='bridge'>
      <mac address='52:54:00:18:e4:9d'/>
      <source bridge='br0'/>
      <target dev='vnet1'/>
      <model type='virtio'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
    <serial type='pty'>
      <source path='/dev/pts/1'/>
      <target port='0'/>
      <alias name='serial0'/>
    </serial>
    <console type='pty' tty='/dev/pts/1'>
      <source path='/dev/pts/1'/>
      <target type='serial' port='0'/>
      <alias name='serial0'/>
    </console>
    <channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/channel/target/Nyx.org.qemu.guest_agent.0'/>
      <target type='virtio' name='org.qemu.guest_agent.0' state='disconnected'/>
      <alias name='channel0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <input type='tablet' bus='usb'>
      <alias name='input0'/>
    </input>
    <input type='mouse' bus='ps2'/>
    <input type='keyboard' bus='ps2'/>
    <graphics type='vnc' port='5900' autoport='yes' websocket='5700' listen='0.0.0.0' keymap='en-us'>
      <listen type='address' address='0.0.0.0'/>
    </graphics>
    <video>
      <model type='vmvga' vram='16384' heads='1'/>
      <alias name='video0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </video>
    <memballoon model='virtio'>
      <alias name='balloon0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </memballoon>
  </devices>
</domain>

hydra-diagnostics-20160311-1905.zip
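The two definitions above pin their vCPUs to different host CPUs (0 to 7 for Endeavour, 16 to 23 for Nyx). As a rough sketch, the pinning can be read out of a domain definition with Python's standard library. The helper name `pinned_host_cpus` is hypothetical, the sample string is a trimmed-down stand-in for the Nyx definition, and the code assumes every `cpuset` is a single CPU number, as it is in both VMs here.

```python
import xml.etree.ElementTree as ET

# Trimmed-down stand-in for the Nyx definition, for illustration only.
SAMPLE_DOMAIN = """
<domain type='kvm'>
  <name>Nyx</name>
  <cputune>
    <vcpupin vcpu='0' cpuset='16'/>
    <vcpupin vcpu='1' cpuset='17'/>
  </cputune>
</domain>
"""


def pinned_host_cpus(domain_xml):
    """Return (domain name, sorted list of host CPUs named in <vcpupin>).

    Assumes each cpuset attribute holds one CPU number; ranges such as
    '0-3' are valid in libvirt but are not handled by this sketch.
    """
    root = ET.fromstring(domain_xml)
    name = root.findtext("name")
    cpus = sorted(int(pin.get("cpuset")) for pin in root.iter("vcpupin"))
    return name, cpus


if __name__ == "__main__":
    print(pinned_host_cpus(SAMPLE_DOMAIN))
```

Running the same function over both full definitions gives a quick way to confirm that the pinned sets do not overlap.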
March 12, 2016 (Author)

Got a crash again in the middle of the night. Here's the log:

Mar 12 03:38:24 Hydra kernel: INFO: rcu_preempt detected stalls on CPUs/tasks: { 10} (detected by 22, t=60006 jiffies, g=985233, c=985232, q=2959)
Mar 12 03:38:24 Hydra kernel: Task dump for CPU 10:
Mar 12 03:38:24 Hydra kernel: qemu-system-x86 R running task 0 6701 1 0x00000008
Mar 12 03:38:24 Hydra kernel: 0000000000000000 ffff88103d2ebbc8 ffffffff8141c799 ffff881057957180
Mar 12 03:38:24 Hydra kernel: ffff88103d2ebcb8 0000001700000014 0000000000000000 ffff881057957180
Mar 12 03:38:24 Hydra kernel: ffff88103d2ebcb8 000000000000001a ffff881053016440 ffff88103d2ebc28
Mar 12 03:38:24 Hydra kernel: Call Trace:
Mar 12 03:38:24 Hydra kernel: [<ffffffff8141c799>] ? modify_irte+0x95/0xbc
Mar 12 03:38:24 Hydra kernel: [<ffffffff8141caf9>] ? intel_ioapic_set_affinity+0x137/0x177
Mar 12 03:38:24 Hydra kernel: [<ffffffff8141cfbf>] ? set_remapped_irq_affinity+0x19/0x1e
Mar 12 03:38:24 Hydra kernel: [<ffffffff8107a0cd>] ? irq_do_set_affinity+0x17/0x45
Mar 12 03:38:24 Hydra kernel: [<ffffffff8107a1b3>] ? setup_affinity+0xb8/0xc3
Mar 12 03:38:24 Hydra kernel: [<ffffffff8107a973>] ? __setup_irq+0x2e7/0x42f
Mar 12 03:38:24 Hydra kernel: [<ffffffff814864d4>] ? vfio_pci_set_intx_trigger+0x152/0x152
Mar 12 03:38:24 Hydra kernel: [<ffffffff8107ac33>] ? request_threaded_irq+0xff/0x13d
Mar 12 03:38:24 Hydra kernel: [<ffffffff81485d00>] ? vfio_intx_set_signal+0x111/0x198
Mar 12 03:38:24 Hydra kernel: [<ffffffff8148647e>] ? vfio_pci_set_intx_trigger+0xfc/0x152
Mar 12 03:38:24 Hydra kernel: [<ffffffff8148676a>] ? vfio_pci_set_irqs_ioctl+0x92/0x9c
Mar 12 03:38:24 Hydra kernel: [<ffffffff814852c1>] ? vfio_pci_ioctl+0x397/0x7be
Mar 12 03:38:24 Hydra kernel: [<ffffffff8112c096>] ? fsnotify+0x267/0x27d
Mar 12 03:38:24 Hydra kernel: [<ffffffff814815bc>] ? vfio_device_fops_unl_ioctl+0x1e/0x28
Mar 12 03:38:24 Hydra kernel: [<ffffffff8110c336>] ? do_vfs_ioctl+0x367/0x421
Mar 12 03:38:24 Hydra kernel: [<ffffffff81114053>] ? __fget+0x6c/0x78
Mar 12 03:38:24 Hydra kernel: [<ffffffff8110c429>] ? SyS_ioctl+0x39/0x64
Mar 12 03:38:24 Hydra kernel: [<ffffffff815f74ee>] ? system_call_fastpath+0x12/0x71
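When collecting several of these crashes, it can help to pull the key fields out of each stall report and compare the stalled CPU with the vCPU pinning in the two VM definitions. The sketch below is one possible way to do that. The regular expression is written against the single log line quoted in this thread, so other kernel versions may format the message differently, and the names `parse_stall`, `vms_pinned_to`, and `PINNED` are hypothetical.

```python
import re

# Pattern based on the one stall line quoted in this thread.
STALL_RE = re.compile(
    r"INFO: (rcu_\w+) detected stalls on CPUs/tasks: "
    r"\{([^}]*)\} \(detected by (\d+), t=(\d+) jiffies"
)

# vCPU pinning taken from the two domain definitions posted earlier.
PINNED = {"Endeavour": range(0, 8), "Nyx": range(16, 24)}


def parse_stall(line):
    """Extract RCU flavor, stalled CPUs, detecting CPU, and stall length."""
    match = STALL_RE.search(line)
    if match is None:
        return None
    flavor, cpus, detector, jiffies = match.groups()
    return {
        "flavor": flavor,
        # Keep numeric entries only; non-numeric task entries are skipped.
        "stalled_cpus": [int(tok) for tok in cpus.split() if tok.isdigit()],
        "detected_by": int(detector),
        "jiffies": int(jiffies),
    }


def vms_pinned_to(cpu):
    """Return the names of VMs whose vCPUs are pinned to the given host CPU."""
    return [name for name, cpus in PINNED.items() if cpu in cpus]


if __name__ == "__main__":
    line = ("Mar 12 03:38:24 Hydra kernel: INFO: rcu_preempt detected stalls "
            "on CPUs/tasks: { 10} (detected by 22, t=60006 jiffies, "
            "g=985233, c=985232, q=2959)")
    info = parse_stall(line)
    for cpu in info["stalled_cpus"]:
        print(cpu, vms_pinned_to(cpu))
```

For the line above, the stalled CPU is 10, which falls outside both pinned ranges, so the stalled qemu-system-x86 task was not running on a host CPU reserved for either VM's vCPUs. That narrows things down a little but does not by itself identify the cause.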
March 12, 2016

MCE events are always hard to figure out, as they could be caused by almost anything hardware-related. I would start with a very long Memtest run (from the unRAID boot menu). I don't think the MCE is related to your stalls, though.
March 12, 2016 (Author)

Yeah, that's what I figured. Honestly, I only ever saw the MCE event once, so I'm not as worried about that error.

I just ran a memory test for several hours and everything seems to check out with that. I'm now running a CPU stress tester off a bootable USB key and I'm going to let it run until around this time tomorrow. Since it was locking up at least once a day, even at idle, if this test passes then it's got to be something in unRAID (which I would prefer, as at least that gives me options I can change).

I'll post back when the CPU stress test finishes.
March 13, 2016 (Author)

Yeah, I ran the CPU stress test for more than 20 hours and there weren't any issues. I'll keep logging the crashes to determine what might be causing the issue.
March 21, 2016 (Author)

So, an update on this: the crashing seems to be getting less frequent over time, but I think I may have pinned down what is causing it: Windows locking and automatically going into standby. The times when the system crashes appear to coincide with when Windows would start to enter the standby state.

I'm going to just switch to High Performance mode for this VM because I honestly would rather have it in that mode anyway (not to mention Windows isn't directly controlling the hard drives), but I wanted to know if anyone else here has experienced these kinds of symptoms with that as the cause. It may not even be Windows standby that's the issue; it might be related to the graphics card passthrough. But it definitely appears to be related to when the system starts trying to power things on and off.
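Besides changing the power plan inside Windows, libvirt's domain XML has a `<pm>` element that controls whether the S3 (suspend-to-mem) and S4 (suspend-to-disk) sleep states are advertised to the guest at all. The sketch below shows one way to add it to a definition with Python's standard library. Whether this actually prevents the lockups described in this thread is untested, and the function name `disable_guest_sleep` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Keep the 'qemu:' prefix used by <qemu:commandline> when re-serializing.
ET.register_namespace("qemu", "http://libvirt.org/schemas/domain/qemu/1.0")


def disable_guest_sleep(domain_xml):
    """Return the domain XML with S3/S4 sleep states disabled via <pm>."""
    root = ET.fromstring(domain_xml)
    pm = root.find("pm")
    if pm is None:
        pm = ET.SubElement(root, "pm")
    for state in ("suspend-to-mem", "suspend-to-disk"):
        node = pm.find(state)
        if node is None:
            node = ET.SubElement(pm, state)
        node.set("enabled", "no")
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Minimal stand-in definition, for illustration only.
    print(disable_guest_sleep("<domain type='kvm'><name>Endeavour</name></domain>"))
```

The resulting `<pm>` block could equally be typed by hand into the VM's XML. The script only makes the change repeatable across several definitions.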