rpetz

Members
  • Posts

    17


  1. xvga?

    Hey guys,

    So I'm building a second server at my house for the very specific purpose of handling VMs outside of my unraid storage/docker machine. I'm not using unraid for this second server, but I am using it as a bit of a jumping-off point to figure out how to get NVIDIA GPU passthrough working on my server. I'm doing this because I have successfully gotten my two NVIDIA GTX 970s to pass through to my Windows VM on unraid 6.1.9 without hypervisor, and on 6.2 with hypervisor. However, I'm getting Code 43 errors in Device Manager when I try this from my own server.

    I've updated QEMU to the latest (2.5) and libvirt to the latest (1.3.1), and I've matched the XML for my domain as closely as possible to the XML generated by unraid. I'm using VFIO rather than pci-stub, and I've verified through lspci -nnk that the correct drivers are being loaded for the graphics cards. I also have a secondary NVIDIA GT 730 that I'm using as the graphics card for the server itself. Still, my machine throws Code 43 when I pass through my 970s.

    I have one discrepancy between my XML and the unraid XML, and it's on the hostdev tag: unraid has an extra attribute, xvga='yes', that my machine will not accept. When I try to save the XML through virsh edit, it complains that that attribute is not supported. This leads me to believe that unraid is running a package I'm missing or that mine is out of date. I know that the schema for my XML is defined in /usr/share/libvirt/schemas/domaincommon.rng, but updating the schema file to allow xvga wouldn't actually provide the underlying library support for it (at least, that's what I would assume). If anyone has any insight into this I'd greatly appreciate it - I've been banging my head against a wall for a long time with this.
    I'm running Ubuntu Server 16.04 (Xenial)... to eliminate variables, I'm first documenting how to get it running on the same machine I run unraid on, meaning the hardware is no different between unraid and my server (for now):
      • ASRock Extreme6 mobo
      • Intel Xeon 12-core processor
      • 2x NVIDIA GeForce GTX 970s
      • NVIDIA GeForce GT 730
      • 64GB DDR4 RAM
      • 5x 2TB WD Red HDDs
      • 2x 250GB Samsung 850 EVO SSDs
      • 1x 512GB Samsung 950 EVO NVMe SSD
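    A follow-up note on the xvga attribute: stock libvirt 1.3.1 has no xvga attribute on the hostdev element - it appears to be an unraid-specific patch - so updating domaincommon.rng alone can't help. The closest stock equivalent is to hand the GPU to QEMU yourself with x-vga=on via a qemu:commandline block, the same technique that appears in the Endeavour config elsewhere in these posts. A hypothetical sketch - the PCI addresses and root-port wiring are examples, not my actual config:

```xml
<!-- Sketch only: pass the GPU via QEMU args instead of hostdev xvga='yes'.
     The host= addresses are examples; take yours from lspci. -->
<domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
  <!-- ... rest of the domain definition unchanged ... -->
  <qemu:commandline>
    <qemu:arg value='-device'/>
    <qemu:arg value='ioh3420,bus=pci.0,addr=1c.0,multifunction=on,port=2,chassis=1,id=root.1'/>
    <qemu:arg value='-device'/>
    <qemu:arg value='vfio-pci,host=02:00.0,bus=root.1,addr=00.0,multifunction=on,x-vga=on'/>
  </qemu:commandline>
</domain>
```

    Note the xmlns:qemu namespace declaration on the domain element - virsh edit rejects qemu:commandline without it.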
  2. Wow, that release came a lot quicker than I expected - unfortunately I've already started to explore other options outside of unraid for my setup, but almost all of them are falling on their faces with my particular hardware, so I may very well wind up back at unraid this weekend, in which case I'll give this a go. Thanks for following up, though - it's good to hear that I may have this as a fall-back option!
  3. I haven't even been able to get far enough to see that issue - sounds like I might just move back to 6.1.9 until this is all sorted. Thanks for the info on your setup!
  4. Quoting another user: "I have the exact same issue. I moved my VMs to the array during the upgrade, to add the cache drive. When I started a VM it showed the symptoms you described, even before adding the NVMe cache drive. Some instances of other programs (mc, htop) also froze and had to be restarted in another SSH session. It may be related to NVMe, because you also have an NVMe disk, but at this point I don't think it is. Once I added the new cache (NVMe) and moved the "system" share with libvirt.img and all VMs to the cache, they were working again. To sum it up, there are two issues: 1) A VM (at least Windows) running a vdisk on the array boots infinitely (without freezing). 2) Destroying a VM in the state mentioned above will fail (resource busy) and lock up the web GUI and other things. I believe running VMs directly from the array was never recommended, but it definitely worked with 6.1.9. Some of my VMs had a second disk on the array for backup reasons - not anymore. By the way, some VMs (Win10) booted to the desktop, but I could not do anything apart from moving the mouse. My Server 2012 R2 VM never made it that far. Maybe some guest-agent issue that begins once it gets loaded, which may happen at different times for different VMs/OSs."

Yessss - exact same issues. I can attest to running vdisks off of the array working flawlessly for me in 6.1.8 and 6.1.9 as well. While I would like to blame the move to the NVMe cache drive, I'm technically not even involving the cache drive on my worst VM: its vdisk is on a share on the array (called ArrayVDisks) and that share is set not to use the cache drive.
Here's the state of my three VMs:
  • Atlas - array-only vdisk; boots infinitely; locks up the server when force stopping
  • Nyx - array-only vdisk; boots 75% of the time; unable to do anything but keyboard/mouse input once on the desktop; locks up the server when force stopping
  • Endeavour - NVMe cache vdisk for boot/OS, secondary array vdisk for permanent storage; boots 75% of the time; same desktop responsiveness and force-stop behavior as Nyx

Neither Nyx nor Endeavour can be shut down gracefully after a successful boot, as doing so locks up the server. Endeavour is the most telling of the issue, as I reproduced this several times:
  1. Launch the VM, boot into Windows, log in
  2. Launch UPlay (Ubisoft's version of Steam, essentially) and notice a patch is available for a game loaded onto my secondary drive (the array vdisk)
  3. The patch starts to download, gets about halfway, then just stops and never finishes
  4. Wait an hour, then shut down the VM, which tries to shut down for twenty minutes without success and requires a force shutdown - locking up the server and requiring a hard shutdown
  5. Restart the server and repeat steps 1-4 with the same result

Hope these repro steps help narrow down this issue.
  5. Here's something interesting from my libvirt log:

2016-03-23 02:28:05.379+0000: 5808: info : libvirt version: 1.3.1
2016-03-23 02:28:05.379+0000: 5808: info : hostname: Hydra
2016-03-23 02:28:05.379+0000: 5808: warning : qemuDomainObjTaint:2223 : Domain id=1 name='Atlas' uuid=10486b00-4ece-cd7e-33b9-29bb551f8c89 is tainted: high-privileges
2016-03-23 02:28:05.379+0000: 5808: warning : qemuDomainObjTaint:2223 : Domain id=1 name='Atlas' uuid=10486b00-4ece-cd7e-33b9-29bb551f8c89 is tainted: host-cpu

Could this be the reason this VM won't start and locks up the system on shutdown?
  6. As an update - I ran a BTRFS scrub against all of my drives and none of them came back with any errors, so I'm really confused as to why I'm seeing csum errors in syslog.

EDIT: from what I'm reading (I'm a software engineer, but only semi-new to Linux), the loop0 device that's throwing the csum warnings sounds like it actually represents the 'user' mount... so I suppose it makes sense that the drives are fine but there is something wrong with the loop0 mount?

EDIT2: looks like the loop0 errors were from bringing forward my old docker.img file; I rebuilt that and the loop0 errors appear to have disappeared.

Okay, so now the issue still remains that the underlying disks are getting locked up while the VMs are trying to access them... if I try to shut down a VM it simply locks up the system, and if I try to force stop it from the web UI it throws an error stating that the device is busy and then locks up the system.
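A quick way to confirm where the csum noise is coming from is to tally the BTRFS csum warnings in syslog per device; `losetup /dev/loop0` will also show which image file backs the loop device. A small sketch (the log path is whatever your syslog file is):

```shell
#!/bin/sh
# Tally BTRFS csum warnings per device from a syslog file, to confirm the
# errors are confined to loop0 (a loopback image) rather than a real disk.
csum_by_device() {
  grep -o 'BTRFS warning (device [^)]*): csum failed' "$1" \
    | sed 's/.*device \([^)]*\)).*/\1/' \
    | sort | uniq -c
}
# Example: csum_by_device /var/log/syslog
```

A count that is entirely against loop0 points at the docker/libvirt image file rather than the underlying BTRFS filesystems.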
  7. Ehhh, I spoke too soon - my VMs start but very quickly degrade. The issue very much seems to be that the underlying disks are getting locked up; the guest OSes run, but once they start trying to access any of the shares they immediately stop responding.
  8. Okay, I seem to have alleviated my issues - I was planning to replace my unraid USB key with a new one, since the original key was quite old and not one I would trust long-term. I got a new key set up, transferred my unraid license, and rebuilt the unraid configuration from the ground up (fresh unraid install, not from a backup) without touching the data on the array. I am now able to start the VMs that I care about without issue. There is still one VM causing the symptoms described in the post above, but I had literally just built that VM and I really don't mind destroying and rebuilding it. That said, this issue still isn't resolved because of the BTRFS errors I described above. I would like to resolve those, as they appear to be pretty significant. Any ideas?
  9. So I decided to give the unraid 6.2 beta a try because of the NVMe and Nvidia/Hyper-V support, and I'm getting a really weird issue now. None of my VMs will load into Windows; one of them did once (after probably ten unsuccessful tries) but hasn't again since. Basically I start the server, start the array, start a VM, and it just sits at the spinning Windows logo during boot indefinitely. I've let it sit for an hour with no change. I have tried stopping the VM - nothing - and I have tried force stopping the VM, which always comes back with the following error (with a different process ID each time):

Failed to terminate process 13447 with SIGKILL: Device or resource busy

After that error appears the web UI locks up (I still get SSH access) and I cannot unmount any of the disks to safely shut the machine down, so I have to force restart it.

The vdisk for the VM is stored on my array, though it does use the cache layer. Obviously, the VM operated fine in 6.1.9 - however, there is one major difference now: my cache layer is no longer a pair of 256GB SSDs but a single 512GB NVMe SSD. That said, in 6.1.9 I had a different VM running off the NVMe drive (which was mounted into unraid in my go file) that worked flawlessly - I no longer have that VM.

The VM logs from the web UI will never load, so I don't have that info to post, since I'm not aware of where those logs are stored (I'll be happy to grab them if someone could point me to where they are located).
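Two sketches that may help with a VM stuck like this, both assuming a stock libvirt layout (unraid may keep things elsewhere): the per-domain QEMU log usually lives at /var/log/libvirt/qemu/&lt;name&gt;.log, and a SIGKILL failing with "Device or resource busy" generally means the qemu process is stuck in uninterruptible (D-state) I/O, which can be listed:

```shell
#!/bin/sh
# List processes stuck in uninterruptible sleep (D state); a qemu-system
# process here would explain the SIGKILL "Device or resource busy" failure.
blocked_procs() {
  ps -eo pid,stat,comm | awk 'NR > 1 && $2 ~ /^D/ {print $1, $3}'
}
# Per-VM log (path assumed for stock libvirt):
#   tail -n 50 /var/log/libvirt/qemu/Nyx.log
```

A process stuck in D state cannot be killed, even with SIGKILL, until the I/O it is blocked on completes - which matches the web UI hanging until a hard reboot.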
Here's one of the VM's XML config:

<domain type='kvm' id='1'>
  <name>Nyx</name>
  <uuid>1583133c-98ee-3342-24da-45b22af1fbe4</uuid>
  <description>SQL Server</description>
  <metadata>
    <vmtemplate xmlns="unraid" name="Windows 10" icon="windows.png" os="windows10"/>
  </metadata>
  <memory unit='KiB'>8388608</memory>
  <currentMemory unit='KiB'>8388608</currentMemory>
  <memoryBacking>
    <nosharepages/>
  </memoryBacking>
  <vcpu placement='static'>8</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='8'/>
    <vcpupin vcpu='1' cpuset='9'/>
    <vcpupin vcpu='2' cpuset='10'/>
    <vcpupin vcpu='3' cpuset='11'/>
    <vcpupin vcpu='4' cpuset='12'/>
    <vcpupin vcpu='5' cpuset='13'/>
    <vcpupin vcpu='6' cpuset='14'/>
    <vcpupin vcpu='7' cpuset='15'/>
  </cputune>
  <resource>
    <partition>/machine</partition>
  </resource>
  <os>
    <type arch='x86_64' machine='pc-i440fx-2.3'>hvm</type>
  </os>
  <features>
    <acpi/>
    <apic/>
    <hyperv>
      <relaxed state='on'/>
      <vapic state='on'/>
      <spinlocks state='on' retries='8191'/>
      <vendor id='none'/>
    </hyperv>
  </features>
  <cpu mode='host-passthrough'>
    <topology sockets='1' cores='4' threads='2'/>
  </cpu>
  <clock offset='localtime'>
    <timer name='hypervclock' present='yes'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/local/sbin/qemu</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='writeback'/>
      <source file='/mnt/user/ArrayVDisks/Nyx/vdisk1.img'/>
      <backingStore/>
      <target dev='hdc' bus='virtio'/>
      <boot order='1'/>
      <alias name='virtio-disk2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </disk>
    <controller type='usb' index='0' model='nec-xhci'>
      <alias name='usb'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </controller>
    <controller type='pci' index='0' model='pci-root'>
      <alias name='pci.0'/>
    </controller>
    <controller type='virtio-serial' index='0'>
      <alias name='virtio-serial0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </controller>
    <interface type='bridge'>
      <mac address='52:54:00:18:e4:9d'/>
      <source bridge='br0'/>
      <target dev='vnet0'/>
      <model type='virtio'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
    <serial type='pty'>
      <source path='/dev/pts/8'/>
      <target port='0'/>
      <alias name='serial0'/>
    </serial>
    <console type='pty' tty='/dev/pts/8'>
      <source path='/dev/pts/8'/>
      <target type='serial' port='0'/>
      <alias name='serial0'/>
    </console>
    <channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/channel/target/domain-Nyx/org.qemu.guest_agent.0'/>
      <target type='virtio' name='org.qemu.guest_agent.0' state='disconnected'/>
      <alias name='channel0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <input type='tablet' bus='usb'>
      <alias name='input0'/>
    </input>
    <input type='mouse' bus='ps2'/>
    <input type='keyboard' bus='ps2'/>
    <graphics type='vnc' port='5900' autoport='yes' websocket='5700' listen='0.0.0.0' keymap='en-us'>
      <listen type='address' address='0.0.0.0'/>
    </graphics>
    <video>
      <model type='qxl' ram='65536' vram='65536' vgamem='16384' heads='1'/>
      <alias name='video0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </video>
    <memballoon model='virtio'>
      <alias name='balloon0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </memballoon>
  </devices>
</domain>

Of note, in the syslog I do see I'm getting a lot of this error:

Hydra kernel: BTRFS warning (device loop0): csum failed ino 3069 off 1785856 csum 2365913268 expected csum 1094680760

And regardless of VM execution I am seeing this error frequently now as well (usually it is repeated over and over, like it's trying to complete a task that isn't ever finishing correctly):

Mar 22 10:11:03 Hydra kernel: ------------[ cut here ]------------
Mar 22 10:11:03 Hydra kernel: WARNING: CPU: 11 PID: 8076 at fs/btrfs/extent-tree.c:4180 btrfs_free_reserved_data_space_noquota+0x5b/0x7b()
Mar 22 10:11:03 Hydra kernel: Modules linked in: xt_CHECKSUM iptable_mangle ipt_REJECT nf_reject_ipv4 ebtable_filter ebtables vhost_net vhost macvtap macvlan xt_nat veth iptable_filter ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_nat_ipv4 nf_nat ip_tables md_mod tun mxm_wmi x86_pkg_temp_thermal coretemp kvm_intel kvm i2c_i801 e1000e alx ptp mdio ahci pps_core nvme libahci wmi [last unloaded: md_mod]
Mar 22 10:11:03 Hydra kernel: CPU: 11 PID: 8076 Comm: kworker/u48:3 Tainted: G W 4.4.5-unRAID #1
Mar 22 10:11:03 Hydra kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X99 Extreme6, BIOS P2.10 12/15/2015
Mar 22 10:11:03 Hydra kernel: Workqueue: writeback wb_workfn (flush-btrfs-6)
Mar 22 10:11:03 Hydra kernel: 0000000000000000 ffff88102d5f3600 ffffffff8136891e 0000000000000000
Mar 22 10:11:03 Hydra kernel: 0000000000001054 ffff88102d5f3638 ffffffff8104a28a ffffffff812ab3d1
Mar 22 10:11:03 Hydra kernel: 0000000000004000 ffff88105359ec00 ffff880fa9dc3be0 ffff88102d5f3734
Mar 22 10:11:03 Hydra kernel: Call Trace:
Mar 22 10:11:03 Hydra kernel: [<ffffffff8136891e>] dump_stack+0x61/0x7e
Mar 22 10:11:03 Hydra kernel: [<ffffffff8104a28a>] warn_slowpath_common+0x8f/0xa8
Mar 22 10:11:03 Hydra kernel: [<ffffffff812ab3d1>] ? btrfs_free_reserved_data_space_noquota+0x5b/0x7b
Mar 22 10:11:03 Hydra kernel: [<ffffffff8104a347>] warn_slowpath_null+0x15/0x17
Mar 22 10:11:03 Hydra kernel: [<ffffffff812ab3d1>] btrfs_free_reserved_data_space_noquota+0x5b/0x7b
Mar 22 10:11:03 Hydra kernel: [<ffffffff812c27d0>] btrfs_clear_bit_hook+0x143/0x272
Mar 22 10:11:03 Hydra kernel: [<ffffffff812d8f25>] clear_state_bit+0x8b/0x155
Mar 22 10:11:03 Hydra kernel: [<ffffffff812d9227>] __clear_extent_bit+0x238/0x2c3
Mar 22 10:11:03 Hydra kernel: [<ffffffff812d96e3>] clear_extent_bit+0x12/0x14
Mar 22 10:11:03 Hydra kernel: [<ffffffff812d9c76>] extent_clear_unlock_delalloc+0x46/0x18f
Mar 22 10:11:03 Hydra kernel: [<ffffffff8111df29>] ? igrab+0x32/0x46
Mar 22 10:11:03 Hydra kernel: [<ffffffff812d662d>] ? __btrfs_add_ordered_extent+0x288/0x2cf
Mar 22 10:11:03 Hydra kernel: [<ffffffff812c65cd>] cow_file_range+0x300/0x3bd
Mar 22 10:11:03 Hydra kernel: [<ffffffff812c7249>] run_delalloc_range+0x321/0x331
Mar 22 10:11:03 Hydra kernel: [<ffffffff812da2af>] writepage_delalloc.isra.14+0xaa/0x126
Mar 22 10:11:03 Hydra kernel: [<ffffffff812dc3d4>] __extent_writepage+0x150/0x1f7
Mar 22 10:11:03 Hydra kernel: [<ffffffff812dc6d1>] extent_write_cache_pages.isra.10.constprop.24+0x256/0x30c
Mar 22 10:11:03 Hydra kernel: [<ffffffff812dcbcf>] extent_writepages+0x46/0x57
Mar 22 10:11:03 Hydra kernel: [<ffffffff812c4384>] ? btrfs_direct_IO+0x28e/0x28e
Mar 22 10:11:03 Hydra kernel: [<ffffffff812c2f19>] btrfs_writepages+0x23/0x25
Mar 22 10:11:03 Hydra kernel: [<ffffffff810c2bbf>] do_writepages+0x1b/0x24
Mar 22 10:11:03 Hydra kernel: [<ffffffff8112945b>] __writeback_single_inode+0x3d/0x151
Mar 22 10:11:03 Hydra kernel: [<ffffffff81129a15>] writeback_sb_inodes+0x212/0x38e
Mar 22 10:11:03 Hydra kernel: [<ffffffff81129c02>] __writeback_inodes_wb+0x71/0xa9
Mar 22 10:11:03 Hydra kernel: [<ffffffff81129de8>] wb_writeback+0x10b/0x195
Mar 22 10:11:03 Hydra kernel: [<ffffffff8112a37f>] wb_workfn+0x157/0x22b
Mar 22 10:11:03 Hydra kernel: [<ffffffff8112a37f>] ? wb_workfn+0x157/0x22b
Mar 22 10:11:03 Hydra kernel: [<ffffffff8105ac40>] process_one_work+0x194/0x2a0
Mar 22 10:11:03 Hydra kernel: [<ffffffff8105b5f6>] worker_thread+0x26b/0x353
Mar 22 10:11:03 Hydra kernel: [<ffffffff8105b38b>] ? rescuer_thread+0x285/0x285
Mar 22 10:11:03 Hydra kernel: [<ffffffff8105f870>] kthread+0xcd/0xd5
Mar 22 10:11:03 Hydra kernel: [<ffffffff8105f7a3>] ? kthread_worker_fn+0x137/0x137
Mar 22 10:11:03 Hydra kernel: [<ffffffff8161a43f>] ret_from_fork+0x3f/0x70
Mar 22 10:11:03 Hydra kernel: [<ffffffff8105f7a3>] ? kthread_worker_fn+0x137/0x137
Mar 22 10:11:03 Hydra kernel: ---[ end trace a5e83c137feb7195 ]---

I have attached my diagnostics file with this post.
Here's my rig:
  • ASRock Extreme6 mobo - X99 chipset
  • Intel Xeon 12-core processor
  • 64GB DDR4 memory
  • 10TB WD Red SATA array (5x 2TB hard drives - one drive parity, four drives storage, 8TB usable)
  • 512GB Samsung 950 NVMe SSD
  • 2x NVidia GeForce GTX 970 GFX cards (not in SLI - used for VM passthrough)
  • NVidia GeForce GT 730 GFX card (used for unraid video out, as X99 chipsets do not have onboard graphics)

All storage disks are formatted using BTRFS. This server isn't doing anything critical yet, so I'm not too worried about rolling it back, but obviously I would like to get it running again if I could, haha.

BTW, I know that this is beta software and I don't expect it to be perfect. However, I just read through the entire 6.2b18 and 6.2b19 release threads and didn't find anyone complaining about this issue, so I imagine it is something tied to me specifically that might be resolvable, since the only crazy thing in my setup is the NVMe drive, which others have reported as working fine.

hydra-diagnostics-20160322-1034.zip
  10. So, an update on this - the crashing seems to be getting less frequent over time, but I think I may have pinned down what is causing it: Windows locking and automatically going into standby. The crashes appear to coincide with when Windows would start to kick into its standby state. I'm going to switch to High Performance mode for this VM, because I'd honestly rather have it in that mode anyway (not to mention Windows isn't directly controlling the hard drives), but I wanted to know whether anyone else here has experienced these kinds of symptoms where that was the cause? It may not even be Windows standby that's the issue - it might be related to the graphics card passthrough - but it definitely appears to be related to when the system starts trying to power things on and off.
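If the standby theory is right, it can be ruled out from inside the guest as well as via the power plan. These are standard Windows powercfg commands run from an elevated command prompt inside the VM (the zero timeout disables the AC standby timer entirely; a sketch, not a full power-plan setup):

```
:: Run inside the Windows guest from an elevated command prompt.
powercfg /change standby-timeout-ac 0
powercfg /hibernate off
```

With sleep and hibernate disabled, any remaining crash at idle would point back at the passthrough rather than guest power management.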
  11. Yeah, I ran the CPU stress test in excess of 20 hours and there weren't any issues... I'll keep logging the crashes to determine what might be causing the issue.
  12. Yeah, that's what I figured - honestly I only ever saw the MCE event once, so I'm not as worried about that error. I just ran a memory test for several hours and everything seems to check out with that. I'm now running a CPU stress tester off of a bootable USB key and I'm going to let it run until around this time tomorrow. Since it was locking up at least once a day, even at idle, if this test comes back fine then it's got to be something in unraid (which I would prefer, as at least that gives me options I can change). I'll post up when the CPU stress test comes back.
  13. Got a crash again in the middle of the night - here's the log:

Mar 12 03:38:24 Hydra kernel: INFO: rcu_preempt detected stalls on CPUs/tasks: { 10} (detected by 22, t=60006 jiffies, g=985233, c=985232, q=2959)
Mar 12 03:38:24 Hydra kernel: Task dump for CPU 10:
Mar 12 03:38:24 Hydra kernel: qemu-system-x86 R running task 0 6701 1 0x00000008
Mar 12 03:38:24 Hydra kernel: 0000000000000000 ffff88103d2ebbc8 ffffffff8141c799 ffff881057957180
Mar 12 03:38:24 Hydra kernel: ffff88103d2ebcb8 0000001700000014 0000000000000000 ffff881057957180
Mar 12 03:38:24 Hydra kernel: ffff88103d2ebcb8 000000000000001a ffff881053016440 ffff88103d2ebc28
Mar 12 03:38:24 Hydra kernel: Call Trace:
Mar 12 03:38:24 Hydra kernel: [<ffffffff8141c799>] ? modify_irte+0x95/0xbc
Mar 12 03:38:24 Hydra kernel: [<ffffffff8141caf9>] ? intel_ioapic_set_affinity+0x137/0x177
Mar 12 03:38:24 Hydra kernel: [<ffffffff8141cfbf>] ? set_remapped_irq_affinity+0x19/0x1e
Mar 12 03:38:24 Hydra kernel: [<ffffffff8107a0cd>] ? irq_do_set_affinity+0x17/0x45
Mar 12 03:38:24 Hydra kernel: [<ffffffff8107a1b3>] ? setup_affinity+0xb8/0xc3
Mar 12 03:38:24 Hydra kernel: [<ffffffff8107a973>] ? __setup_irq+0x2e7/0x42f
Mar 12 03:38:24 Hydra kernel: [<ffffffff814864d4>] ? vfio_pci_set_intx_trigger+0x152/0x152
Mar 12 03:38:24 Hydra kernel: [<ffffffff8107ac33>] ? request_threaded_irq+0xff/0x13d
Mar 12 03:38:24 Hydra kernel: [<ffffffff81485d00>] ? vfio_intx_set_signal+0x111/0x198
Mar 12 03:38:24 Hydra kernel: [<ffffffff8148647e>] ? vfio_pci_set_intx_trigger+0xfc/0x152
Mar 12 03:38:24 Hydra kernel: [<ffffffff8148676a>] ? vfio_pci_set_irqs_ioctl+0x92/0x9c
Mar 12 03:38:24 Hydra kernel: [<ffffffff814852c1>] ? vfio_pci_ioctl+0x397/0x7be
Mar 12 03:38:24 Hydra kernel: [<ffffffff8112c096>] ? fsnotify+0x267/0x27d
Mar 12 03:38:24 Hydra kernel: [<ffffffff814815bc>] ? vfio_device_fops_unl_ioctl+0x1e/0x28
Mar 12 03:38:24 Hydra kernel: [<ffffffff8110c336>] ? do_vfs_ioctl+0x367/0x421
Mar 12 03:38:24 Hydra kernel: [<ffffffff81114053>] ? __fget+0x6c/0x78
Mar 12 03:38:24 Hydra kernel: [<ffffffff8110c429>] ? SyS_ioctl+0x39/0x64
Mar 12 03:38:24 Hydra kernel: [<ffffffff815f74ee>] ? system_call_fastpath+0x12/0x71
  14. Hi all, I'm having trouble with a random crashing issue on unraid 6.1.8. Basically the system will run fine for hours and then lock up completely, so I have to do a hard reboot. This has happened consistently once a day this week, and today it happened twice (the system is only a week old, so it's been happening since day one).

I have a monitor plugged into the GT 730 that unraid outputs its shell prompt to, and every time it has locked up there is nothing visible on that prompt - so for the last few days I have kept an SSH window open tailing the syslog file. Initially I found that one of my WD Reds had failed and was throwing a ton of write errors, so I pulled it from the array and shrunk the array a few days ago, hoping that would resolve the lockups, but it has not.

In my logs I have notably seen the following error around the time of a few of the crashes (except when nothing gets logged before the lockup):

INFO: rcu_sched self-detected stall on CPU

From all of the search results I've found, this error seems to be tied to a lot of problems with ReiserFS - however, all of my drives have been formatted with BTRFS since day one. I've also gotten this error once:

mce: [Hardware Error]: Machine check events logged

But I have zero clue where this is logged to, since it says it should be in /var/log/mcelog but that file does not exist (or gets cleaned up on reboot and doesn't appear until an error occurs).

I would love to provide you guys with the logs I took over the last few days, but I made the unfortunate mistake of leaving them in a PuTTY window that Windows decided was unimportant when it installed updates during the night >.< I have attached the zip file from running diagnostics.
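On the missing /var/log/mcelog: that file is only written when the mcelog daemon is running, but the raw machine-check lines stay in the kernel ring buffer, so they can be pulled out of dmesg. A sketch:

```shell
#!/bin/sh
# Filter machine-check (MCE) lines out of a log stream.
# Usage: dmesg | mce_lines
mce_lines() {
  grep -iE 'mce|machine check'
}
```

Note the ring buffer is lost on reboot, so this has to be captured before the hard reset that follows a lockup.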
This is the system:
  • Intel Xeon 2960 12c/24t 2.6GHz processor
  • AsRock Extreme 6 (2011-v3 socket, X99 chipset) motherboard
  • 64GB DDR4 2133 (8x 8GB kit) memory
  • EVGA GeForce GTX 970 graphics card (for host passthrough only)
  • PNY GeForce GT 730 graphics card (for unraid graphics only, as the motherboard does not have onboard graphics)
  • 4x 2TB WD Red hard drives (for the array layer)
  • 2x 256GB Samsung 850 EVO SSDs (for the cache layer)
  • 1x 512GB Samsung 950 PRO M.2 NVMe hard drive (for the future; currently not in use but installed in the machine)

Windows 10 VM:

<domain type='kvm' id='1' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
  <name>Endeavour</name>
  <uuid>e0bbbec9-44bc-3cc1-f71a-448bb65e0194</uuid>
  <description>Workstation</description>
  <metadata>
    <vmtemplate name="Custom" icon="windows.png" os="windows"/>
  </metadata>
  <memory unit='KiB'>25165824</memory>
  <currentMemory unit='KiB'>25165824</currentMemory>
  <memoryBacking>
    <nosharepages/>
    <locked/>
  </memoryBacking>
  <vcpu placement='static'>8</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='0'/>
    <vcpupin vcpu='1' cpuset='1'/>
    <vcpupin vcpu='2' cpuset='2'/>
    <vcpupin vcpu='3' cpuset='3'/>
    <vcpupin vcpu='4' cpuset='4'/>
    <vcpupin vcpu='5' cpuset='5'/>
    <vcpupin vcpu='6' cpuset='6'/>
    <vcpupin vcpu='7' cpuset='7'/>
  </cputune>
  <resource>
    <partition>/machine</partition>
  </resource>
  <os>
    <type arch='x86_64' machine='pc-i440fx-2.3'>hvm</type>
  </os>
  <features>
    <acpi/>
    <apic/>
  </features>
  <cpu mode='host-passthrough'>
    <topology sockets='1' cores='8' threads='1'/>
  </cpu>
  <clock offset='localtime'>
    <timer name='rtc' tickpolicy='catchup'/>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/bin/qemu-system-x86_64</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='writeback'/>
      <source file='/mnt/nvme/Endeavour/vdisk1.img'/>
      <backingStore/>
      <target dev='hdc' bus='virtio'/>
      <boot order='1'/>
      <alias name='virtio-disk2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </disk>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='writeback'/>
      <source file='/mnt/user/ArrayVDisks/Endeavour/vdisk2.img'/>
      <backingStore/>
      <target dev='hdd' bus='virtio'/>
      <alias name='virtio-disk3'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </disk>
    <controller type='usb' index='0'>
      <alias name='usb'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pci-root'>
      <alias name='pci.0'/>
    </controller>
    <controller type='virtio-serial' index='0'>
      <alias name='virtio-serial0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </controller>
    <interface type='bridge'>
      <mac address='52:54:00:48:dd:b8'/>
      <source bridge='br0'/>
      <target dev='vnet0'/>
      <model type='virtio'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </interface>
    <serial type='pty'>
      <source path='/dev/pts/0'/>
      <target port='0'/>
      <alias name='serial0'/>
    </serial>
    <console type='pty' tty='/dev/pts/0'>
      <source path='/dev/pts/0'/>
      <target type='serial' port='0'/>
      <alias name='serial0'/>
    </console>
    <channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/channel/target/Endeavour.org.qemu.guest_agent.0'/>
      <target type='virtio' name='org.qemu.guest_agent.0' state='disconnected'/>
      <alias name='channel0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <hostdev mode='subsystem' type='usb' managed='yes'>
      <source>
        <vendor id='0x05e3'/>
        <product id='0x0732'/>
        <address bus='2' device='4'/>
      </source>
      <alias name='hostdev0'/>
    </hostdev>
    <hostdev mode='subsystem' type='usb' managed='yes'>
      <source>
        <vendor id='0x046d'/>
        <product id='0xc52f'/>
        <address bus='1' device='10'/>
      </source>
      <alias name='hostdev1'/>
    </hostdev>
    <hostdev mode='subsystem' type='usb' managed='yes'>
      <source>
        <vendor id='0x1b1c'/>
        <product id='0x1c07'/>
        <address bus='1' device='5'/>
      </source>
      <alias name='hostdev2'/>
    </hostdev>
    <hostdev mode='subsystem' type='usb' managed='yes'>
      <source>
        <vendor id='0x1532'/>
        <product id='0x0203'/>
        <address bus='1' device='4'/>
      </source>
      <alias name='hostdev3'/>
    </hostdev>
    <memballoon model='virtio'>
      <alias name='balloon0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </memballoon>
  </devices>
  <qemu:commandline>
    <qemu:arg value='-device'/>
    <qemu:arg value='ioh3420,bus=pci.0,addr=1c.0,multifunction=on,port=2,chassis=1,id=root.1'/>
    <qemu:arg value='-device'/>
    <qemu:arg value='vfio-pci,host=02:00.0,bus=root.1,addr=00.0,multifunction=on,x-vga=on'/>
    <qemu:arg value='-device'/>
    <qemu:arg value='vfio-pci,host=00:1b.0,bus=root.1,addr=01.0'/>
  </qemu:commandline>
</domain>

Server 2012 VM:

<domain type='kvm' id='2'>
  <name>Nyx</name>
  <uuid>1583133c-98ee-3342-24da-45b22af1fbe4</uuid>
  <description>SQL Server</description>
  <metadata>
    <vmtemplate name="Custom" icon="windows.png" os="windows"/>
  </metadata>
  <memory unit='KiB'>4194304</memory>
  <currentMemory unit='KiB'>4194304</currentMemory>
  <memoryBacking>
    <nosharepages/>
    <locked/>
  </memoryBacking>
  <vcpu placement='static'>8</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='16'/>
    <vcpupin vcpu='1' cpuset='17'/>
    <vcpupin vcpu='2' cpuset='18'/>
    <vcpupin vcpu='3' cpuset='19'/>
    <vcpupin vcpu='4' cpuset='20'/>
    <vcpupin vcpu='5' cpuset='21'/>
    <vcpupin vcpu='6' cpuset='22'/>
    <vcpupin vcpu='7' cpuset='23'/>
  </cputune>
  <resource>
    <partition>/machine</partition>
  </resource>
  <os>
    <type arch='x86_64' machine='pc-i440fx-2.3'>hvm</type>
  </os>
  <features>
    <acpi/>
    <apic/>
    <hyperv>
      <relaxed state='on'/>
      <vapic state='on'/>
      <spinlocks state='on' retries='8191'/>
    </hyperv>
  </features>
  <cpu mode='host-passthrough'>
    <topology sockets='1' cores='8' threads='1'/>
  </cpu>
  <clock offset='localtime'>
    <timer name='hypervclock' present='yes'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/bin/qemu-system-x86_64</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='writeback'/>
      <source file='/mnt/user/ArrayVDisks/Nyx/vdisk1.img'/>
      <backingStore/>
      <target dev='hdb' bus='virtio'/>
      <boot order='1'/>
      <alias name='virtio-disk1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </disk>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <source file='/mnt/user/ISOs/virtio-win-0.1.113.iso'/>
      <backingStore/>
      <target dev='hda' bus='ide'/>
      <readonly/>
      <boot order='2'/>
      <alias name='ide0-0-0'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
    <controller type='usb' index='0'>
      <alias name='usb'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pci-root'>
      <alias name='pci.0'/>
    </controller>
    <controller type='ide' index='0'>
      <alias name='ide'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
    </controller>
    <controller type='virtio-serial' index='0'>
      <alias name='virtio-serial0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </controller>
    <interface type='bridge'>
      <mac address='52:54:00:18:e4:9d'/>
      <source bridge='br0'/>
      <target dev='vnet1'/>
      <model type='virtio'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
    <serial type='pty'>
      <source path='/dev/pts/1'/>
      <target port='0'/>
      <alias name='serial0'/>
    </serial>
    <console type='pty' tty='/dev/pts/1'>
      <source path='/dev/pts/1'/>
      <target type='serial' port='0'/>
      <alias name='serial0'/>
    </console>
    <channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/channel/target/Nyx.org.qemu.guest_agent.0'/>
      <target type='virtio' name='org.qemu.guest_agent.0' state='disconnected'/>
      <alias name='channel0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <input type='tablet' bus='usb'>
      <alias name='input0'/>
    </input>
    <input type='mouse' bus='ps2'/>
    <input type='keyboard' bus='ps2'/>
    <graphics type='vnc' port='5900' autoport='yes' websocket='5700' listen='0.0.0.0' keymap='en-us'>
      <listen type='address' address='0.0.0.0'/>
    </graphics>
    <video>
      <model type='vmvga' vram='16384' heads='1'/>
      <alias name='video0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </video>
    <memballoon model='virtio'>
      <alias name='balloon0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </memballoon>
  </devices>
</domain>

hydra-diagnostics-20160311-1905.zip
  15. Well, to say it 'doesn't support it' is not necessarily true... you can mount the drive through the shell without issue (it's quite simple to do, actually). What you can't do (as far as I understand it) is assign it as a cache drive, or even see it appear in the web GUI at all.
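For anyone wanting the shell route, a hypothetical sketch of the manual mount (the device node and mount point are examples - check lsblk for the real names; putting the lines in the flash go file makes the mount persist across reboots, which is how I had my NVMe drive mounted in 6.1.9):

```shell
# Hypothetical go-file fragment: mount an NVMe drive outside the array.
# /dev/nvme0n1p1 and /mnt/nvme are example names - verify with lsblk first.
mkdir -p /mnt/nvme
mount /dev/nvme0n1p1 /mnt/nvme
```

A vdisk path like /mnt/nvme/Endeavour/vdisk1.img can then be referenced directly in a VM's XML, bypassing the cache-drive assignment entirely.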