rottenpotatoes

Members
  • Content Count

    8
Community Reputation

0 Neutral

About rottenpotatoes

  • Rank
    Newbie
  1. Additional troubleshooting that I've now done: I tried rebuilding a new USB stick by transferring my key to a new USB drive, importing all the drives, and recreating my dockers. I didn't even get past my first couple of dockers before this behavior continued. I thought I would try a memtest to see if something wasn't right. Every time (I tried multiple times), it would hang momentarily and reboot within seconds of choosing the memtest option. This made me think the RAM needed reseating. I did that, and I still cannot get a memtest to run.
  2. Example of how, when it locks up, I can still log in via the console and issue commands like ‘reboot’, but they don’t actually complete. The attached picture shows the output of my reboot command, where it says “going down for reboot now” along with all its subsequent output; however, it’s been sitting like this for 10+ minutes now. Unfortunately, time for another reset.
  3. Current issue: The server keeps locking up for some reason.

     Behavior: It runs fine, then different things start to fail. I'll lose connection to a docker app, then all dockers, then the Unraid web page. Even if the web page hasn't locked up yet, if I try to download diagnostics, it just spins without generating anything. However, I'm still able to log in via the console (directly, not through the web page). But even then, major things like 'reboot' don't work: the BIOS speaker beeps but nothing happens. There are no 'sigterm' messages; it just sits there and requires a press of the reset button. Sometimes (most times, but not every time), trace dumps are printed on the console, where I can see them when I log in directly.

     Other weird behavior: I've noticed discrepancies between the 'top' output and the Dashboard on utilization. Sometimes, minutes before a lockup like the one described above, when I notice things aren't behaving, the Dashboard shows one CPU pegged at 100%, then another, then 4, 6, then more until lockup, while 'top' shows the system basically idle. Other times, when things are working properly (in the attached screenshot, Plex is working fine doing a transcode), 'top' shows way over 100% CPU usage, but the Dashboard shows less than 10% usage.

     Full story: I am upgrading my server.
     From: Intel i7 4970x, ASUS MAXIMUS VI EXTREME, 16GB DDR3 Gskill, 256GB Adata (nvme sata) cache drive
     To: Threadripper 2990wx, Asus Zenith Extreme Alpha, 128GB DDR4 3000 Gskill, dual 500GB WD black (nvme pcie x4) cache drives
     Same array drives and same PCI cards. Not updating graphics cards as of yet.

     Moving procedure and what went wrong: I changed all shares to not use the old cache drive, ran mover, and made sure the cache was clear. Then I brought the system down. I put all the old drives, which had been in a separate Sans Digital enclosure, into the new case, swapped the flash drive over, and booted up.
     Things were working immediately on boot, so I re-enabled cache usage, now pointing to the two new SSDs, and started up dockers. Little did I know that things were about to go south. I started experiencing all sorts of issues: PCIe errors on the main console, file system errors, kernel panics. The system would lock up as described above. I'd reboot, and it would fail to even reach the console login before a kernel panic or trace dump froze the system (requiring a press of the reset button). When I did get in, I disabled docker and VMs and set the array to not auto-start. I tried removing all my extra PCIe devices, leaving only the main graphics card. I was still getting PCIe errors sometimes. I even tried copying the entirety of my flash drive to my laptop, rebuilding the flash drive with a fresh install using the Unraid install tool, and then copying my key and configs from the backup to the 'new' drive. I would still see all the same errors, yet sometimes it would boot fine and show my drives, dockers, and VMs (not started, of course).

     Between all this trial and error and the multiple lockups and reboots, I was incredibly unlucky: Disk 5 (one of my 4TB drives) was corrupted. I know I did this next part wrong. It's too late, I've moved past it, and I've accepted that I will have to reacquire 3TB of data. Unraid started showing the disk as disabled, so I tried to remove the drive so I could re-add it and rebuild. I stopped the array, removed the drive, restarted the array, stopped it, re-added the drive, and started again. This made the rebuild option available. Shortly thereafter, the system locked up. After a reset, the drive had corrupted data but a green status. I tried to get the array to rebuild the drive, but although the drive would fail to mount, it would still show as green. I tried to do a parity check, and that locked up the OS.
     I started the array in maintenance mode, then tried a filesystem check and a subsequent fix, but the system wouldn't do it, saying I needed to do a parity rebuild, or something like that. When I stopped the array, I removed the drive, but then I couldn't start the array because it was greyed out, so I couldn't re-add the drive to make Unraid think I had replaced it. Somewhere in all this, the format option became available, and I formatted the drive, thinking "OK, clean FS, then I can rebuild". I was wrong. It formatted the drive, wrote the clean drive to parity, and then the OS locked up again. I got pissed and left it alone for a day.

     I did some research into the crashing and found, in one of SpaceInvaderOne's older videos on Ryzen, a suggestion to add the no callbacks option to aid stability. I did this, and the system stayed stable long enough to do a parity check. This only proved that the data was gone. So be it. Move along.

     Now that I have better stability, I was finally able to restart some dockers. Plex works, and I got some other dockers up and running. Then there was still a lockup, a reset, and a trace error or kernel panic stopping the OS from completely starting. Another reset, and then hours of stability, until another lockup. I have gone as long as 2 days and as short as 30 minutes between lockups. I have yet to try any VMs, only dockers. I still have only the one graphics card attached, along with all my drives, but nothing else.

     That's the story of how I got to where I am now.

     unraid-diagnostics-20190508-0943.zip
  4. I have put in a bug report as recommended. The person helping me there is requesting: "Could you get a backtrace of qemu crashing - this might be easiest if your distro records core dumps somewhere." I don't know how to do this. Can someone help me figure out 1) what this is and 2) how to do it, so I can get this info into the bug report? Also, FYI for anyone interested, here is my bug report and conversation: https://bugs.launchpad.net/qemu/+bug/1821054
  5. Here is a copy of the syslog. The repeated lines delineate the lines written during the latest trial.

     Mar 20 12:17:19 unraid kernel: vfio-pci 0000:05:05.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=none
     Mar 20 12:22:55 unraid kernel: vfio-pci 0000:05:05.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=none
     Mar 20 12:22:55 unraid kernel: vfio-pci 0000:05:05.0: Refused to change power state, currently in D0
     Mar 20 12:22:55 unraid kernel: vfio-pci 0000:05:05.1: Refused to change power state, currently in D0
     Mar 20 12:22:55 unraid kernel: br0: port 2(vnet0) entered blocking state
     Mar 20 12:22:55 unraid kernel: br0: port 2(vnet0) entered disabled state
     Mar 20 12:22:55 unraid kernel: device vnet0 entered promiscuous mode
     Mar 20 12:22:55 unraid kernel: br0: port 2(vnet0) entered blocking state
     Mar 20 12:22:55 unraid kernel: br0: port 2(vnet0) entered forwarding state
     Mar 20 12:22:57 unraid avahi-daemon[8902]: Joining mDNS multicast group on interface vnet0.IPv6 with address fe80::fc54:ff:fee9:40aa.
     Mar 20 12:22:57 unraid avahi-daemon[8902]: New relevant interface vnet0.IPv6 for mDNS.
     Mar 20 12:22:57 unraid avahi-daemon[8902]: Registering new address record for fe80::fc54:ff:fee9:40aa on vnet0.*.
     Mar 20 12:22:58 unraid kernel: qemu-system-x86[14591]: segfault at a8 ip 0000561f5568437a sp 00007fff097b28e0 error 4 in qemu-system-x86_64[561f5536d000+af2000]
     Mar 20 12:22:58 unraid kernel: Code: f9 ff 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 55 53 48 89 fb 48 83 ec 08 48 8b 6f 58 e8 3e df ff ff 48 89 df e8 e6 e9 ff ff <48> 8b 85 a8 00 00 00 48 85 c0 74 2e 8b 93 a0 00 00 00 39 90 a0 00
     Mar 20 12:22:58 unraid avahi-daemon[8902]: Interface vnet0.IPv6 no longer relevant for mDNS.
     Mar 20 12:22:58 unraid avahi-daemon[8902]: Leaving mDNS multicast group on interface vnet0.IPv6 with address fe80::fc54:ff:fee9:40aa.
     Mar 20 12:22:58 unraid kernel: br0: port 2(vnet0) entered disabled state
     Mar 20 12:22:58 unraid kernel: device vnet0 left promiscuous mode
     Mar 20 12:22:58 unraid kernel: br0: port 2(vnet0) entered disabled state
     Mar 20 12:22:58 unraid avahi-daemon[8902]: Withdrawing address record for fe80::fc54:ff:fee9:40aa on vnet0.
     Mar 20 12:22:59 unraid kernel: vfio-pci 0000:05:05.1: Refused to change power state, currently in D0
     Mar 20 12:22:59 unraid kernel: vfio-pci 0000:05:05.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=none
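
     One way to tell whether every crash is the same crash is to subtract the mapping base (the first number in the square brackets) from the instruction pointer on each kernel segfault line. The following is a minimal Python sketch, not a replacement for a real backtrace; the regular expression and the helper name `segfault_offset` are illustrative assumptions, and the two sample lines are copied from the syslogs in these posts.

```python
import re

# Matches the kernel's "segfault at ... ip ... sp ... error N in obj[base+size]" line.
# The pattern is an assumption based on the log lines quoted in these posts.
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) "
    r"error (?P<err>\d+) in (?P<obj>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def segfault_offset(line):
    """Return (object name, ip - mapping base) for a segfault line, or None."""
    match = SEGFAULT_RE.search(line)
    if match is None:
        return None
    ip = int(match["ip"], 16)
    base = int(match["base"], 16)
    return match["obj"], ip - base


LINES = [
    "Mar 20 12:22:58 unraid kernel: qemu-system-x86[14591]: segfault at a8 "
    "ip 0000561f5568437a sp 00007fff097b28e0 error 4 in "
    "qemu-system-x86_64[561f5536d000+af2000]",
    "Mar 7 20:57:52 unraid kernel: qemu-system-x86[11275]: segfault at a8 "
    "ip 000055f57ce3237a sp 00007ffea39fb520 error 4 in "
    "qemu-system-x86_64[55f57cb1b000+af2000]",
]

for line in LINES:
    obj, offset = segfault_offset(line)
    print(obj, hex(offset))  # both lines print: qemu-system-x86_64 0x31737a
```

     The absolute addresses differ between runs, but the offset into the mapping is identical, which suggests the same instruction faults each time. The fault address a8 also matches the displacement in the marked bytes of the Code: line (<48> 8b 85 a8 00 00 00), which is consistent with a read through a null pointer. Turning the offset into a function name still needs a qemu binary with symbols, so this only narrows things down.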
  6. In reply to your second message, this is the error I get when I try to attach the PCI bridge you specify.
  7. In reply to your first message, the below is the only message I get.
  8. I am trying to set up an XP VM with hardware passthrough of an old Radeon 9250 and a Creative Labs Sound Blaster card. I have the cards connected to my Unraid server via a PCIe to dual PCI adapter (an example can be found here: https://www.aliexpress.com/item/Desktop-PCI-Express-PCI-e-to-PCI-Adapter-Card-PCIe-to-Dual-Pci-Slot-Expansion-Card/32849431927.html ). My IOMMU groups show the adapter, the sound card, and what I believe to be the two outputs of the graphics card all in the same group. I think this is fine, as all the devices are to be passed to my VM.

     IOMMU group 15:
     [12d8:e111] 01:00.0 PCI bridge: Pericom Semiconductor PI7C9X111SL PCIe-to-PCI Reversible Bridge (rev 02)
     [1274:5880] 02:04.0 Multimedia audio controller: Ensoniq 5880B / Creative Labs CT5880 (rev 02)
     [1002:5960] 02:05.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] RV280 [Radeon 9200 PRO] (rev 01)
     [1002:5940] 02:05.1 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] RV280 [Radeon 9200 PRO] (Secondary) (rev 01)

     However, when I try to start the VM, I'm greeted with a popup and the message in the title: "internal error: qemu unexpectedly closed the monitor". The syslog isn't much help either.
     I tried to start several times in succession so that the errors would be evident in the logs:

     Mar 7 20:57:52 unraid kernel: vfio-pci 0000:02:05.0: Refused to change power state, currently in D0
     Mar 7 20:57:52 unraid kernel: vfio-pci 0000:02:05.1: Refused to change power state, currently in D0
     Mar 7 20:57:52 unraid kernel: br0: port 2(vnet0) entered blocking state
     Mar 7 20:57:52 unraid kernel: br0: port 2(vnet0) entered disabled state
     Mar 7 20:57:52 unraid kernel: device vnet0 entered promiscuous mode
     Mar 7 20:57:52 unraid kernel: br0: port 2(vnet0) entered blocking state
     Mar 7 20:57:52 unraid kernel: br0: port 2(vnet0) entered forwarding state
     Mar 7 20:57:52 unraid kernel: qemu-system-x86[11275]: segfault at a8 ip 000055f57ce3237a sp 00007ffea39fb520 error 4 in qemu-system-x86_64[55f57cb1b000+af2000]
     Mar 7 20:57:52 unraid kernel: Code: f9 ff 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 55 53 48 89 fb 48 83 ec 08 48 8b 6f 58 e8 3e df ff ff 48 89 df e8 e6 e9 ff ff <48> 8b 85 a8 00 00 00 48 85 c0 74 2e 8b 93 a0 00 00 00 39 90 a0 00
     Mar 7 20:57:52 unraid kernel: br0: port 2(vnet0) entered disabled state
     Mar 7 20:57:52 unraid kernel: device vnet0 left promiscuous mode
     Mar 7 20:57:52 unraid kernel: br0: port 2(vnet0) entered disabled state
     Mar 7 20:57:53 unraid kernel: vfio-pci 0000:02:05.1: Refused to change power state, currently in D0
     Mar 7 20:57:53 unraid kernel: vfio-pci 0000:02:05.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=none
     Mar 7 21:08:20 unraid kernel: vfio-pci 0000:02:05.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=none
     Mar 7 21:08:20 unraid kernel: vfio-pci 0000:02:05.0: Refused to change power state, currently in D0
     Mar 7 21:08:20 unraid kernel: vfio-pci 0000:02:05.1: Refused to change power state, currently in D0
     Mar 7 21:08:20 unraid kernel: br0: port 2(vnet0) entered blocking state
     Mar 7 21:08:20 unraid kernel: br0: port 2(vnet0) entered disabled state
     Mar 7 21:08:20 unraid kernel: device vnet0 entered promiscuous mode
     Mar 7 21:08:20 unraid kernel: br0: port 2(vnet0) entered blocking state
     Mar 7 21:08:20 unraid kernel: br0: port 2(vnet0) entered forwarding state
     Mar 7 21:08:21 unraid kernel: qemu-system-x86[7453]: segfault at a8 ip 00005590982c637a sp 00007fff8132e7b0 error 4 in qemu-system-x86_64[559097faf000+af2000]
     Mar 7 21:08:21 unraid kernel: Code: f9 ff 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 55 53 48 89 fb 48 83 ec 08 48 8b 6f 58 e8 3e df ff ff 48 89 df e8 e6 e9 ff ff <48> 8b 85 a8 00 00 00 48 85 c0 74 2e 8b 93 a0 00 00 00 39 90 a0 00
     Mar 7 21:08:21 unraid kernel: br0: port 2(vnet0) entered disabled state
     Mar 7 21:08:21 unraid kernel: device vnet0 left promiscuous mode
     Mar 7 21:08:21 unraid kernel: br0: port 2(vnet0) entered disabled state
     Mar 7 21:08:22 unraid kernel: vfio-pci 0000:02:05.1: Refused to change power state, currently in D0
     Mar 7 21:08:22 unraid kernel: vfio-pci 0000:02:05.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=none
     Mar 7 21:15:46 unraid kernel: vfio-pci 0000:02:05.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=none
     Mar 7 21:15:46 unraid kernel: vfio-pci 0000:02:05.0: Refused to change power state, currently in D0
     Mar 7 21:15:46 unraid kernel: vfio-pci 0000:02:05.1: Refused to change power state, currently in D0
     Mar 7 21:15:46 unraid kernel: br0: port 2(vnet0) entered blocking state
     Mar 7 21:15:46 unraid kernel: br0: port 2(vnet0) entered disabled state
     Mar 7 21:15:46 unraid kernel: device vnet0 entered promiscuous mode
     Mar 7 21:15:46 unraid kernel: br0: port 2(vnet0) entered blocking state
     Mar 7 21:15:46 unraid kernel: br0: port 2(vnet0) entered forwarding state
     Mar 7 21:15:46 unraid kernel: qemu-system-x86[27299]: segfault at a8 ip 000055982823137a sp 00007ffd38193ae0 error 4 in qemu-system-x86_64[559827f1a000+af2000]
     Mar 7 21:15:46 unraid kernel: Code: f9 ff 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 55 53 48 89 fb 48 83 ec 08 48 8b 6f 58 e8 3e df ff ff 48 89 df e8 e6 e9 ff ff <48> 8b 85 a8 00 00 00 48 85 c0 74 2e 8b 93 a0 00 00 00 39 90 a0 00
     Mar 7 21:15:46 unraid kernel: br0: port 2(vnet0) entered disabled state
     Mar 7 21:15:46 unraid kernel: device vnet0 left promiscuous mode
     Mar 7 21:15:46 unraid kernel: br0: port 2(vnet0) entered disabled state
     Mar 7 21:15:47 unraid kernel: vfio-pci 0000:02:05.1: Refused to change power state, currently in D0
     Mar 7 21:15:47 unraid kernel: vfio-pci 0000:02:05.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=none

     VM XML below:

     <?xml version='1.0' encoding='UTF-8'?>
     <domain type='kvm'>
       <name>Windows XP</name>
       <uuid>91230a79-b145-678a-dc42-52195269234d</uuid>
       <description>XP</description>
       <metadata>
         <vmtemplate xmlns="unraid" name="Windows XP" icon="windowsxp.png" os="windowsxp"/>
       </metadata>
       <memory unit='KiB'>3145728</memory>
       <currentMemory unit='KiB'>3145728</currentMemory>
       <memoryBacking>
         <nosharepages/>
       </memoryBacking>
       <vcpu placement='static'>2</vcpu>
       <cputune>
         <vcpupin vcpu='0' cpuset='1'/>
         <vcpupin vcpu='1' cpuset='5'/>
       </cputune>
       <os>
         <type arch='x86_64' machine='pc-i440fx-3.0'>hvm</type>
       </os>
       <features>
         <acpi/>
         <apic/>
       </features>
       <cpu mode='host-passthrough' check='none'>
         <topology sockets='1' cores='1' threads='2'/>
       </cpu>
       <clock offset='localtime'>
         <timer name='rtc' tickpolicy='catchup'/>
         <timer name='pit' tickpolicy='delay'/>
         <timer name='hpet' present='no'/>
       </clock>
       <on_poweroff>destroy</on_poweroff>
       <on_reboot>restart</on_reboot>
       <on_crash>restart</on_crash>
       <devices>
         <emulator>/usr/local/sbin/qemu</emulator>
         <disk type='file' device='cdrom'>
           <driver name='qemu' type='raw'/>
           <source file='/mnt/user/isos/en_windows_xp_professional_with_service_pack_3_x86_cd_x14-80428.iso'/>
           <target dev='hda' bus='ide'/>
           <readonly/>
           <boot order='2'/>
           <address type='drive' controller='0' bus='0' target='0' unit='0'/>
         </disk>
         <disk type='file' device='disk'>
           <driver name='qemu' type='raw' cache='writeback'/>
           <source file='/mnt/user/domains/Windows XP/vdisk1.img'/>
           <target dev='hdc' bus='ide'/>
           <boot order='1'/>
           <address type='drive' controller='0' bus='1' target='0' unit='0'/>
         </disk>
         <controller type='usb' index='0' model='ich9-ehci1'>
           <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x7'/>
         </controller>
         <controller type='usb' index='0' model='ich9-uhci1'>
           <master startport='0'/>
           <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0' multifunction='on'/>
         </controller>
         <controller type='usb' index='0' model='ich9-uhci2'>
           <master startport='2'/>
           <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x1'/>
         </controller>
         <controller type='usb' index='0' model='ich9-uhci3'>
           <master startport='4'/>
           <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x2'/>
         </controller>
         <controller type='pci' index='0' model='pci-root'/>
         <controller type='ide' index='0'>
           <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
         </controller>
         <controller type='virtio-serial' index='0'>
           <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
         </controller>
         <interface type='bridge'>
           <mac address='52:54:00:e9:40:aa'/>
           <source bridge='br0'/>
           <model type='rtl8139'/>
           <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
         </interface>
         <serial type='pty'>
           <target type='isa-serial' port='0'>
             <model name='isa-serial'/>
           </target>
         </serial>
         <console type='pty'>
           <target type='serial' port='0'/>
         </console>
         <channel type='unix'>
           <target type='virtio' name='org.qemu.guest_agent.0'/>
           <address type='virtio-serial' controller='0' bus='0' port='1'/>
         </channel>
         <input type='tablet' bus='usb'>
           <address type='usb' bus='0' port='1'/>
         </input>
         <input type='mouse' bus='ps2'/>
         <input type='keyboard' bus='ps2'/>
         <hostdev mode='subsystem' type='pci' managed='yes' xvga='yes'>
           <driver name='vfio'/>
           <source>
             <address domain='0x0000' bus='0x02' slot='0x05' function='0x0'/>
           </source>
           <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
         </hostdev>
         <hostdev mode='subsystem' type='pci' managed='yes'>
           <driver name='vfio'/>
           <source>
             <address domain='0x0000' bus='0x02' slot='0x05' function='0x1'/>
           </source>
           <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
         </hostdev>
         <hostdev mode='subsystem' type='pci' managed='yes'>
           <driver name='vfio'/>
           <source>
             <address domain='0x0000' bus='0x02' slot='0x04' function='0x0'/>
           </source>
           <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
         </hostdev>
         <hostdev mode='subsystem' type='usb' managed='no'>
           <source>
             <vendor id='0x0461'/>
             <product id='0x4d17'/>
           </source>
           <address type='usb' bus='0' port='2'/>
         </hostdev>
         <hostdev mode='subsystem' type='usb' managed='no'>
           <source>
             <vendor id='0x062a'/>
             <product id='0x0201'/>
           </source>
           <address type='usb' bus='0' port='3'/>
         </hostdev>
         <memballoon model='none'/>
       </devices>
     </domain>

     Also, in case it's needed, here's my syslinux config:

     default menu.c32
     menu title Lime Technology, Inc.
     prompt 0
     timeout 50
     label unRAID OS
       menu default
       kernel /bzimage
       append pcie_acs_override=downstream,multifunction initrd=/bzroot vfio_iommu_type1.allow_unsafe_interrupts=1
     label unRAID OS GUI Mode
       kernel /bzimage
       append pcie_acs_override=downstream,multifunction vfio_iommu_type1.allow_unsafe_interrupts=1 initrd=/bzroot,/bzroot-gui
     label unRAID OS Safe Mode (no plugins, no GUI)
       kernel /bzimage
       append pcie_acs_override=downstream initrd=/bzroot unraidsafemode
     label unRAID OS GUI Safe Mode (no plugins)
       kernel /bzimage
       append pcie_acs_override=downstream initrd=/bzroot,/bzroot-gui unraidsafemode
     label Memtest86+
       kernel /memtest

     Any ideas on how to get this to boot?
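
     To double-check that the VM XML passes through the same devices that appear in the IOMMU group, the host PCI addresses can be pulled out of the <hostdev> entries and compared against the group listing. This is a minimal Python sketch using only the standard library; the helper name `hostdev_pci_addresses` and the trimmed sample XML are illustrative assumptions (the sample keeps only the hostdev entries from the XML in this post).

```python
import xml.etree.ElementTree as ET


def hostdev_pci_addresses(domain_xml):
    """Return host PCI addresses (bus:slot.function) of PCI hostdev entries."""
    root = ET.fromstring(domain_xml)
    found = []
    for hostdev in root.iter("hostdev"):
        if hostdev.get("type") != "pci":
            continue  # skip USB passthrough entries
        addr = hostdev.find("./source/address")
        if addr is None:
            continue
        # libvirt writes these as hex strings such as '0x02'
        bus = int(addr.get("bus"), 16)
        slot = int(addr.get("slot"), 16)
        func = int(addr.get("function"), 16)
        found.append(f"{bus:02x}:{slot:02x}.{func:x}")
    return found


# Trimmed sample: only the hostdev entries from the XML in this post.
SAMPLE_XML = """
<domain type='kvm'>
  <devices>
    <hostdev mode='subsystem' type='pci' managed='yes' xvga='yes'>
      <source><address domain='0x0000' bus='0x02' slot='0x05' function='0x0'/></source>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source><address domain='0x0000' bus='0x02' slot='0x05' function='0x1'/></source>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source><address domain='0x0000' bus='0x02' slot='0x04' function='0x0'/></source>
    </hostdev>
    <hostdev mode='subsystem' type='usb' managed='no'>
      <source><vendor id='0x0461'/><product id='0x4d17'/></source>
    </hostdev>
  </devices>
</domain>
"""

# Addresses copied from the "IOMMU group 15" listing in this post.
IOMMU_GROUP_15 = {"01:00.0", "02:04.0", "02:05.0", "02:05.1"}

passed = hostdev_pci_addresses(SAMPLE_XML)
print(passed)                               # ['02:05.0', '02:05.1', '02:04.0']
print(sorted(IOMMU_GROUP_15 - set(passed))) # ['01:00.0']
```

     With the values from this post, the only group member not listed as a hostdev is the PCIe-to-PCI bridge at 01:00.0; the sketch only reports that difference and does not say whether it matters for the crash.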