Everything posted by TheTechnoPilot

  1. So sadly, neither of those, nor a direct override passing the hardware IDs to the VFIO driver on boot, has had any effect on the boot error... @SpaceInvaderOne, is there any chance you might be able to chime in with any thoughts? The VM boot log:

     -chardev socket,id=charmonitor,fd=24,server,nowait \
     -mon chardev=charmonitor,id=monitor,mode=control \
     -rtc base=localtime \
     -no-hpet \
     -no-shutdown \
     -boot strict=on \
     -device pcie-root-port,port=0x9,chassis=1,id=pci.1,bus=pcie.0,addr=0x1.0x1 \
     -device pcie-root-port,port=0xa,chassis=2,id=pci.2,bus=pcie.0,addr=0x1.0x2 \
     -device pcie-root-port,port=0xb,chassis=3,id=pci.3,bus=pcie.0,addr=0x1.0x3 \
     -device pcie-root-port,port=0x13,chassis=4,id=pci.4,bus=pcie.0,addr=0x2.0x3 \
     -device pcie-root-port,port=0x14,chassis=5,id=pci.5,bus=pcie.0,addr=0x2.0x4 \
     -device pcie-root-port,port=0x8,chassis=6,id=pci.6,bus=pcie.0,multifunction=on,addr=0x1 \
     -device ich9-usb-ehci1,id=usb,bus=pcie.0,addr=0x7.0x7 \
     -device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pcie.0,multifunction=on,addr=0x7 \
     -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pcie.0,addr=0x7.0x1 \
     -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pcie.0,addr=0x7.0x2 \
     -device virtio-serial-pci,id=virtio-serial0,bus=pci.2,addr=0x0 \
     -drive 'file=/mnt/user/domains/Windows 10/vdisk2.img,format=raw,if=none,id=drive-virtio-disk2,cache=writeback' \
     -device virtio-blk-pci,scsi=off,bus=pci.3,addr=0x0,drive=drive-virtio-disk2,id=virtio-disk2,bootindex=1,write-cache=on \
     -drive file=/mnt/user/isos/Windows10_Install.iso,format=raw,if=none,id=drive-sata0-0-0,readonly=on \
     -device ide-cd,bus=ide.0,drive=drive-sata0-0-0,id=sata0-0-0,bootindex=2 \
     -drive file=/mnt/user/isos/virtio-win-0.1.160-1.iso,format=raw,if=none,id=drive-sata0-0-1,readonly=on \
     -device ide-cd,bus=ide.1,drive=drive-sata0-0-1,id=sata0-0-1 \
     -netdev tap,fd=27,id=hostnet0,vhost=on,vhostfd=28 \
     -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:e4:5e:83,bus=pci.1,addr=0x0 \
     -chardev pty,id=charserial0 \
     -device isa-serial,chardev=charserial0,id=serial0 \
     -chardev socket,id=charchannel0,fd=29,server,nowait \
     -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 \
     -device usb-tablet,id=input0,bus=usb.0,port=1 \
     -device vfio-pci,host=09:00.0,id=hostdev0,bus=pci.4,addr=0x0 \
     -device vfio-pci,host=09:00.1,id=hostdev1,bus=pci.5,addr=0x0 \
     -device usb-host,hostbus=1,hostaddr=2,id=hostdev2,bus=usb.0,port=2 \
     -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
     -msg timestamp=on

     2020-02-10 00:28:41.371+0000: Domain id=1 is tainted: high-privileges
     2020-02-10 00:28:41.371+0000: Domain id=1 is tainted: host-cpu
     char device redirected to /dev/pts/0 (label charserial0)
     2020-02-10T00:28:43.974057Z qemu-system-x86_64: vfio_err_notifier_handler(0000:09:00.1) Unrecoverable error detected. Please collect any data possible and then kill the guest
     2020-02-10T00:28:43.974162Z qemu-system-x86_64: vfio_err_notifier_handler(0000:09:00.0) Unrecoverable error detected. Please collect any data possible and then kill the guest

     Also see the attached log from my last boot. coultonstudios-diagnostics-20200210-0034.zip
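[Editor's note] For anyone hitting the same vfio_err_notifier_handler errors, a first sanity check is confirming which kernel driver actually holds each of the GPU's two functions before the VM starts. A minimal sketch; the 0000:09:00.x addresses are the ones from the log above and would need adjusting on other systems:

```shell
# Print the kernel driver currently bound to a PCI device, if any.
check_driver() {
  link="/sys/bus/pci/devices/$1/driver"
  if [ -e "$link" ]; then
    basename "$(readlink -f "$link")"
  else
    echo "no driver bound"
  fi
}

# GPU video and HDMI-audio functions, per the boot log above
for dev in 0000:09:00.0 0000:09:00.1; do
  echo "$dev -> $(check_driver "$dev")"
done
```

Both functions should report vfio-pci before the VM is started; if either still shows amdgpu (or nothing), the stub/override isn't taking effect at boot.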
  2. So, interestingly, adding that line after my ACS overrides didn't disable video during loading on reboot. Am I using it right? Oddly, when I tried to start the VM again at that point, it did yank the card from UnRAID's command line, but Win10 didn't seem to grab it, and the boot stalled with the same error and a single stuck, pinned logical core out of the 10 pairs assigned.
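[Editor's note] When the guest stalls like this, the host kernel log usually holds the matching vfio/PCIe-AER messages that say why. A hedged sketch for pulling them out; the 09:00 address matches the posts above and the grep pattern is only illustrative:

```shell
# Filter vfio / PCIe-AER lines mentioning the GPU out of the kernel log.
filter_gpu_errors() {
  grep -iE 'vfio|AER|09:00\.[01]'
}

# Show the 20 most recent matching lines, if any.
dmesg 2>/dev/null | filter_gpu_errors | tail -n 20
```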
  3. I didn't really think it would fix this issue; my thinking was that it's simply a worthwhile upgrade for me while I'm going through all this, to improve overall compatibility across the board for my setup.
  4. You are 100% correct, and I have no need or desire to run UnRaid in GUI mode; it was just to confirm the new card was functioning without issue. Normally I run without it, and the only loss I feel in doing so is not seeing the IP address on bootup when working off a new network (I use it for work on the sets of feature films). I'll give this a try. Would you say, though, that this behaviour for grabbing control is graphics-card dependent? It had no issue previously when using the GTX970.
  5. I've not yet, but I'm having general VM issues I need to sort through first before diving deeper into getting it working in MacOS. In trying to find solutions to this weird issue I stumbled upon the reset-fix issue, and since your work also seems to deal with board audio pass-through issues along with Ryzen sensors, and I haven't bothered upgrading from 6.7.2 yet, this seemed like a good thing to look at and a reason to upgrade. Right now my MacOS VM is still running High Sierra, and I want to get that going first, ideally on the new card, but I need to fix this basic pass-through error before even going there.
  6. Hey there bud, bit of a noobie here when it comes to this level. I'm running a 3900X on an ROG B450-I board with a Strix Vega 64, and I'm still back on 6.7.2. Upgrading to your build looks like my best bet now, having just moved to the Vega 64 and discovered it is not as smooth sailing as I hoped (MacOS VMs, so I needed to stay in the Radeon camp). Sorry for asking such a basic question, but perhaps it could also help others: what is my best bet for upgrading to your specific build with these fixes, as I've not done such an upgrade before?
  7. Hi there @johngc, looking at your XML, the place where you are hanging seems to be:

     <loader readonly='yes' type='pflash'>/usr/share/qemu/ovmf-x64/OVMF_CODE-pure-efi.fd</loader>
     <nvram>/etc/libvirt/qemu/nvram/988cac7a-49bb-4a9e-844a-f791ce1ffb0d_VARS-pure-efi.fd</nvram>

     which, based on what seems to be your file structure, should actually read:

     <loader readonly='yes' type='pflash'>/mnt/disks/VMs/MacinaboxCatalina/ovmf/OVMF_CODE.fd</loader>
     <nvram>/mnt/disks/VMs/MacinaboxCatalina/ovmf/OVMF_VARS.fd</nvram>

     That should hopefully get you un-hung on boot. To completely correct everything, though, you will also want to change:

     <vmtemplate xmlns="unraid" name="Windows 10" icon="default.png" os="Catalina"/>

     back to:

     <vmtemplate xmlns="unraid" name="Windows 10" icon="/mnt/disks/VMs/MacinaboxCatalina/icon/catalina.png" os="Catalina"/>

     Hope that helps!
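[Editor's note] A quick way to confirm which of those loader/nvram paths actually exists before editing the XML is to check them from the UnRAID terminal. A sketch; the MacinaboxCatalina paths are the ones assumed in the post above:

```shell
# Report whether each firmware file referenced by the domain XML is present.
check_path() {
  if [ -f "$1" ]; then
    echo "OK: $1"
  else
    echo "MISSING: $1"
  fi
}

check_path /mnt/disks/VMs/MacinaboxCatalina/ovmf/OVMF_CODE.fd
check_path /mnt/disks/VMs/MacinaboxCatalina/ovmf/OVMF_VARS.fd
```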
  8. Hey all, so I just picked up an ASUS ROG STRIX Radeon Vega 64 to replace my older Gigabyte GTX970 Mini in my UnRAID 6.7.2 build. I specifically went this route because I extensively use MacOS and wanted to move to a natively supported card for that VM, so I didn't have to deal with the stupidities of trying to get an NVidia card supported in MacOS (after a few hardware changes that altered the PCI Express port mappings, I was unable to get the GTX970 recognized in MacOS, though it came up without any issue and fully functional in my Windows10 VM with the same XML settings for the card).

     I swapped the cards, and upon reboot everything seemed to be working fine, both in the BIOS and across both the command-line and GUI versions of UnRAID. But when I go to start one of the VMs (after changing the PCI ports in the XML to match the new change from 7 to 9 and removing the vBIOS injection I was using to make the GTX card work in a VM), each of them hangs after getting a green triangle on the VM, without losing UnRAID's interface from the monitor connected to the Vega 64, and pins one of the assigned logical CPU cores. Looking in the VM logs I am getting:

     2020-02-05T22:23:04.260722Z qemu-system-x86_64: vfio_err_notifier_handler(0000:09:00.1) Unrecoverable error detected. Please collect any data possible and then kill the guest
     2020-02-05T22:23:04.260795Z qemu-system-x86_64: vfio_err_notifier_handler(0000:09:00.0) Unrecoverable error detected. Please collect any data possible and then kill the guest

     I need help, people! I knew I would have to do a little work in MacOS to remove Clover's forcing of the NVidia driver, etc., but I expected that, like bare-metal hardware, Windows 10 should boot right up as long as I updated the PCI assignment from bus 7 to 9! I've included below my Win10 VM XML and attached the diagnostics from extensive attempts with both VMs last night and from my reboot today, primarily just trying to start my Win10 VM. PLEASE HELP!
😝

     <?xml version='1.0' encoding='UTF-8'?>
     <domain type='kvm'>
       <name>Backblaze</name>
       <uuid>44cda2aa-66af-a307-7f6a-232c3dc374fd</uuid>
       <metadata>
         <vmtemplate xmlns="unraid" name="Windows 10" icon="/mnt/user/domains/Backblaze/backblaze.png" os="windows10"/>
       </metadata>
       <memory unit='KiB'>58720256</memory>
       <currentMemory unit='KiB'>58720256</currentMemory>
       <memoryBacking>
         <nosharepages/>
       </memoryBacking>
       <vcpu placement='static'>20</vcpu>
       <cputune>
         <vcpupin vcpu='0' cpuset='2'/>
         <vcpupin vcpu='1' cpuset='14'/>
         <vcpupin vcpu='2' cpuset='3'/>
         <vcpupin vcpu='3' cpuset='15'/>
         <vcpupin vcpu='4' cpuset='4'/>
         <vcpupin vcpu='5' cpuset='16'/>
         <vcpupin vcpu='6' cpuset='5'/>
         <vcpupin vcpu='7' cpuset='17'/>
         <vcpupin vcpu='8' cpuset='6'/>
         <vcpupin vcpu='9' cpuset='18'/>
         <vcpupin vcpu='10' cpuset='7'/>
         <vcpupin vcpu='11' cpuset='19'/>
         <vcpupin vcpu='12' cpuset='8'/>
         <vcpupin vcpu='13' cpuset='20'/>
         <vcpupin vcpu='14' cpuset='9'/>
         <vcpupin vcpu='15' cpuset='21'/>
         <vcpupin vcpu='16' cpuset='10'/>
         <vcpupin vcpu='17' cpuset='22'/>
         <vcpupin vcpu='18' cpuset='11'/>
         <vcpupin vcpu='19' cpuset='23'/>
       </cputune>
       <os>
         <type arch='x86_64' machine='pc-i440fx-3.1'>hvm</type>
         <loader readonly='yes' type='pflash'>/usr/share/qemu/ovmf-x64/OVMF_CODE-pure-efi.fd</loader>
         <nvram>/etc/libvirt/qemu/nvram/44cda2aa-66af-a307-7f6a-232c3dc374fd_VARS-pure-efi.fd</nvram>
       </os>
       <features>
         <acpi/>
         <apic/>
         <hyperv>
           <relaxed state='on'/>
           <vapic state='on'/>
           <spinlocks state='on' retries='8191'/>
           <vendor_id state='on' value='none'/>
         </hyperv>
       </features>
       <cpu mode='host-passthrough' check='none'>
         <topology sockets='1' cores='20' threads='1'/>
       </cpu>
       <clock offset='localtime'>
         <timer name='hypervclock' present='yes'/>
         <timer name='hpet' present='no'/>
       </clock>
       <on_poweroff>destroy</on_poweroff>
       <on_reboot>restart</on_reboot>
       <on_crash>restart</on_crash>
       <devices>
         <emulator>/usr/local/sbin/qemu</emulator>
         <disk type='file' device='disk'>
           <driver name='qemu' type='raw' cache='writeback'/>
           <source file='/mnt/user/domains/Backblaze/vdisk1.img'/>
           <target dev='hdc' bus='virtio'/>
           <boot order='1'/>
           <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
         </disk>
         <disk type='file' device='cdrom'>
           <driver name='qemu' type='raw'/>
           <source file='/mnt/user/isos/Windows10_Install.iso'/>
           <target dev='hda' bus='ide'/>
           <readonly/>
           <boot order='2'/>
           <address type='drive' controller='0' bus='0' target='0' unit='0'/>
         </disk>
         <disk type='file' device='cdrom'>
           <driver name='qemu' type='raw'/>
           <source file='/mnt/user/isos/virtio-win-0.1.160-1.iso'/>
           <target dev='hdb' bus='ide'/>
           <readonly/>
           <address type='drive' controller='0' bus='0' target='0' unit='1'/>
         </disk>
         <controller type='pci' index='0' model='pci-root'/>
         <controller type='ide' index='0'>
           <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
         </controller>
         <controller type='virtio-serial' index='0'>
           <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
         </controller>
         <controller type='usb' index='0' model='ich9-ehci1'>
           <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x7'/>
         </controller>
         <controller type='usb' index='0' model='ich9-uhci1'>
           <master startport='0'/>
           <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0' multifunction='on'/>
         </controller>
         <controller type='usb' index='0' model='ich9-uhci2'>
           <master startport='2'/>
           <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x1'/>
         </controller>
         <controller type='usb' index='0' model='ich9-uhci3'>
           <master startport='4'/>
           <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x2'/>
         </controller>
         <interface type='bridge'>
           <mac address='52:54:00:60:31:97'/>
           <source bridge='br0'/>
           <model type='virtio'/>
           <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
         </interface>
         <serial type='pty'>
           <target type='isa-serial' port='0'>
             <model name='isa-serial'/>
           </target>
         </serial>
         <console type='pty'>
           <target type='serial' port='0'/>
         </console>
         <channel type='unix'>
           <target type='virtio' name='org.qemu.guest_agent.0'/>
           <address type='virtio-serial' controller='0' bus='0' port='1'/>
         </channel>
         <input type='tablet' bus='usb'>
           <address type='usb' bus='0' port='1'/>
         </input>
         <input type='mouse' bus='ps2'/>
         <input type='keyboard' bus='ps2'/>
         <hostdev mode='subsystem' type='pci' managed='yes'>
           <driver name='vfio'/>
           <source>
             <address domain='0x0000' bus='0x09' slot='0x00' function='0x0'/>
           </source>
           <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0' multifunction='on'/>
         </hostdev>
         <hostdev mode='subsystem' type='pci' managed='yes'>
           <driver name='vfio'/>
           <source>
             <address domain='0x0000' bus='0x09' slot='0x00' function='0x1'/>
           </source>
           <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x1'/>
         </hostdev>
         <hostdev mode='subsystem' type='pci' managed='yes'>
           <driver name='vfio'/>
           <source>
             <address domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
           </source>
           <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
         </hostdev>
         <memballoon model='none'/>
       </devices>
     </domain>

     coultonstudios-diagnostics-20200205-2114.zip coultonstudios-diagnostics-20200205-2224.zip
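[Editor's note] Since the card moved from bus 7 to bus 9, it is also worth re-checking which IOMMU group each function landed in, as ACS overrides and slot changes reshuffle these. A minimal sysfs walk, with the root parameterized purely so it can be pointed at a test tree:

```shell
# List every device under an iommu_groups tree, one line per device.
list_iommu_groups() {
  root=${1:-/sys/kernel/iommu_groups}
  for d in "$root"/*/devices/*; do
    [ -e "$d" ] || continue          # skip when the glob matches nothing
    n=${d#"$root"/}                  # strip the tree root...
    n=${n%%/*}                       # ...leaving just the group number
    printf 'IOMMU group %s: %s\n' "$n" "$(basename "$d")"
  done
}

list_iommu_groups
```

Passing the GPU cleanly generally wants 09:00.0 and 09:00.1 in a group with no unrelated devices.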
  9. Thank you so much @johnnie.black and @trurl for all your help! I am back up and running, with all my data fully parity-synced, and am now just working on getting it all backed up through a VM into the cloud! While initially this all made me feel like my array was fragile, I now know that it is even more robust than I realized, and I have learned so much going forward! Thanks again for stepping in and making that the case!
  10. By that you just mean using new configuration but not flagging it as parity valid, correct?
  11. OMG, @trurl & @johnnie.black, I figured out what an idiot I am! I somehow managed to put the wrong 8TB Barracuda drive into my case, and therefore told the system to add what was actually my old Disk2 to the array as Disk4! Okay, so that I don't bork everything, please confirm my understanding: at this point I should be able to go back through and use the New Configuration (parity valid) option to correct this, putting the real Disk4 back into the array, and then, once that is done, do a full parity check (with write corrections to disk) to get myself back up and running, correct? I'm still amazed at my basic stupidity and how I somehow managed to put the wrong drive into the external enclosure...
  12. Honestly, not at that folder size, since my server should be at just over 30TB (77%) space utilization, and was at shut-down; now it is only showing about 28.5TB (74%). So while I wish that were the case, I don't suspect it is a possible explanation, unfortunately... Also, even if somehow someone deleted the files on the share (though there are no authorized users besides myself on the current network), I am using recyclebin with manual emptying only, so they should still be taking up space on the array.
  13. That's what I'm thinking; it seems very odd for the data to be missing. While admittedly I only looked in the share for it (so perhaps it's not being recognized as part of that share), the total used space on the disk doesn't support unraid thinking it is there in the native disk file system either. I'm wondering if the filesystem on that drive got damaged and in essence lost the pointers to that folder, and no longer considers it used space. That's the only explanation I can imagine for losing essentially one whole folder, from what I can tell. I'm tempted to run a filesystem check on the drive when I next continue my troubleshooting, to see if it can repair such corruption to the directory structure (though I don't want it to go as aggressive as I saw on disk2 if I can help it; the files there were so strewn about that I recovered from backup only and deleted the recovered files).

      For @trurl, who asked: unfortunately, it seems I've lost the ability to recover this directory from backup. When I checked this morning, my old build had been offline too long during my migration to unraid to recover this directory from my Backblaze backup. 😞

      I have taken the array offline for now and will probably shut down the server (unless any further diagnostics would help first), then wait until the pack of new locking SATA cables I ordered an hour ago arrives on Friday. I think that's a good first step before working with the system any further, to eliminate that one area of potential failure (the ones I just used admittedly came with the SATA controller I installed, and are pretty generic, no-name, non-locking).
  14. No; however, considering the available-free-space listing of the disk outside the array (before I added it back), it now seems wrong when I think back, as it had 2.9TB free when it really should be at about 1.3TB, and this is the same when mounted in the array. Honestly, with all that I have done in the last couple of days without yet running a full parity check, I worry the existing parity is probably already wrong at this point. This is also why I preferred to bet on the original drive being intact.
  15. During those 8 hrs, I slept. I followed the new configuration, trusted parity instructions, as the only change from a functional full array to a non-functional one last night was a reboot and an accidental array start while seemingly missing a disk (this time not using ControlR to start the array, since I hadn't picked up on the issue in its interface).
  16. On the actual disk when I put it back into the array via new configuration (trusted parity).
  17. So... ugh, it seems I am missing a bunch of data from the drive. Is there harm if I pull it and put in a new drive to rebuild from the existing parity? Am I just asking for issues doing that? I really don't understand how I could be missing whole sections of data from the drive; it is literally as if it is missing its portion of an entire share, approx 1.5TB worth. I now suspect the new SATA cables I put in might be flaky, as this drive wasn't on the new SATA controller but is on a new cable. Either way, I am not going to do anything until I replace the cables to be sure.
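[Editor's note] On why a single-disk rebuild from parity works in principle: single parity is the XOR of all data disks, so any one missing disk equals the XOR of the surviving disks plus parity. A toy illustration with three pretend byte values:

```shell
# Single-parity reconstruction demo: parity = d1 XOR d2 XOR d3,
# so a lost d2 can be recovered as d1 XOR d3 XOR parity.
d1=170 d2=51 d3=204
parity=$(( d1 ^ d2 ^ d3 ))
rebuilt_d2=$(( d1 ^ d3 ^ parity ))
echo "original d2=$d2 rebuilt d2=$rebuilt_d2"
```

This is also why flaky SATA cables are so dangerous during a rebuild: if any survivor's reads are wrong, the reconstructed disk is wrong in exactly those spots.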
  18. OMG, more god-damn issues!!! I rebooted my server and used the ControlR app to start the array, not realizing that my disk4 hadn't been detected, I think, and it showed up as unmountable. Now, on stopping the array, of course it is recognizing it as new and wanting to do a parity rebuild on it... but the disk itself is definitely still mountable from UD. johnnie.black, can you please walk me through the steps to re-add it to the array without rebuilding it? It seems to be mounting fine outside the array, and I definitely don't want to rebuild it and risk going through what I just did on disk2 (for which I didn't end up using any of the rebuilt data, only my backup and the original disk2). Please see the attached log in case there is something I am missing! coultonstudios-diagnostics-20200108-0458.zip
  19. AMAZING!!! Thank you for that final tip! That quickly fixed the problem, and I am happy to say that so far all files seem to have survived on the old disk; I am beginning to copy them all over, project by project, checking file integrities. Also, I have discovered what happened and why my VM trashed my rebuild: what had been my USB3.2 PCI-E device that I was passing through to MacOS on slot 1 became slot 2 when I installed the new SATA controller, and that SATA controller (which had been controlling the Parity Drive, Drive 1, and Drive 2) became slot 1 and got yanked from UnRAID when I started the VM; then, no clue what happened when I unpaused it while it was still passed through. Lessons definitely learned. Guess we can also flag this as solved, thanks to you!!! Now my build challenge is just getting my VM back up and running, as I decided I should start the XML from scratch after that.
  20. Okay great! That got it mountable, though sadly all the files are strewn about in the lost+found folder... (none seem to have really survived outside it). The first log is the diagnostic file from that.

      So my solution was to copy over all my backups, and then the files I need from the original disk2. However, I ran into a very weird issue. With the array started, the original disk2 wouldn't succeed in mounting, so I decided to restart the system and try mounting it first. When I did that, the disk mounted fine, but when I then started the array, the new disk2 showed up as unmountable. To check whether they are related, I shut down the array and unmounted the original disk2 (note that this one is always mounted from UD), then restarted the array, and all drives showed up fine! I have included a second diagnostic from this reboot and these weird symptoms as well. Can you please take a look and tell me what is going on there? Do I have any reason not to trust the array? Or is it just some type of collision with the old disk2, and is there any way to mount it in UD without causing that, so I can transfer files from the old disk onto the new one?

      EDIT - Note that my own troubleshooting investigation seems to suggest it is because both drives share the same UUID? Is there any way to change this on the old disk2 to allow it to be mounted after starting the array, or to block the new disk2 from mounting when the old one is pre-mounted? coultonstudios-diagnostics-20200105-2225.zip coultonstudios-diagnostics-20200105-2232.zip
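[Editor's note] On the UUID-collision guess: XFS does refuse to mount a second filesystem carrying the same UUID, and the usual workaround is to regenerate the UUID on the clone. A sketch only, not verified against this exact setup; /dev/sdX1 is a placeholder, and this must only ever be run against the unmounted old disk2, never the array member:

```shell
# Regenerate the filesystem UUID on an XFS partition so it can be
# mounted alongside a clone that still carries the old UUID.
regen_xfs_uuid() {
  disk=$1
  if [ -b "$disk" ]; then
    xfs_admin -U generate "$disk" \
      && xfs_admin -u "$disk"      # print the new UUID back to confirm
  else
    echo "not a block device: $disk"
    return 1
  fi
}

regen_xfs_uuid /dev/sdX1 || true   # placeholder device; edit before use
```

A one-off alternative, if supported by the mounting tool, is mounting the old disk with XFS's `nouuid` option instead of rewriting anything.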
  21. So, frustratingly, while the parity sync has completed, drive2 is still showing up as unmountable / no file system... Could you take a peek, johnnie.black, and let me know what the situation is? What are my options here; is there anything I can do? coultonstudios-diagnostics-20200104-1942.zip
  22. Thank you for your help, johnnie.black! I decided to run out today and buy a new 10TB to replace that 8TB for this, especially since the 8TB is still mountable. I am happy to say your procedure has my array started again and rebuilding disk2 onto this new drive at a fairly sustained 185MB/s, and I look forward to hopefully reporting back with success no later than Sunday morning! My only regret is that I don't get to run my normal check on the new drive (I use a test program within MacOS that doesn't just write out the drive but does two full write-and-read-back passes plus a sustained random-access test). Oh well, I guess this will partially test it. Oh, BTW, yes, the system showed disk2 as unmountable in this case. For protection's sake, I have all dockers shut down, am not starting any VMs, and am letting the system focus only on the rebuild this time! NO RISKS!
  23. Would there be any chance, though, that what I am copying off has been corrupted during the attempted rebuild? I would assume so...
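[Editor's note] One way to answer that corruption question concretely, rather than assuming, is to checksum the files as they come off the old disk and verify them at the destination. A sketch; the mount points in the commented-out call are hypothetical:

```shell
# Compare checksums of every file in a source tree against a copy of it.
verify_copy() {
  src=$1
  dst=$2
  (cd "$src" && find . -type f -exec md5sum {} +) \
    | (cd "$dst" && md5sum -c --quiet -) \
    && echo "all files match"
}

# Hypothetical paths: old disk mounted via UD, copy on the array.
# verify_copy /mnt/disks/old_disk2/Share /mnt/user/Share
```

`md5sum -c --quiet` stays silent for matching files and reports only mismatches, so a clean run prints nothing but the final confirmation.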
  24. Thanks johnnie.black! I did have one minor SMART report on that disk in the past, admittedly, which is probably what you see in the log, but it felt minor enough to ignore for the time being. So, before I go through with this procedure: first off, yes, disk2 still shows as mountable (it does succeed in mounting in UD) and has generally the correct used space. However, I currently don't have a spare disk large enough to throw in the slot that isn't already an almost complete backup of the data on disk2. I would think that in this circumstance I couldn't trust the validity of the data on the drive once a parity operation started; or, because it was all valid data being replaced identically with itself, it would remain valid... (though the sudden super-fast rebuild makes me question that). I also want to check your thoughts on what I described in my initial post: was booting a VM that mainly lives on the cache drive a mistake that UnRAID tried to protect me against by pausing the parity sync? I kinda want to understand what happened before bringing things back online, if that makes sense.
  25. It is the original disk, yes, but unfortunately it has already had a rebuild started on it, which I cancelled when it took off claiming it was rebuilding at 22GB/s, which is truly impossible across even a PCI interface, let alone SATA. As I understand it, I need to bring the array back online with the other disks, then just do a parity rebuild on ST8000DM004-2CX188_WCT07A05. Oh, I will just note that it originally needed a rebuild, as mentioned, only because the device ID changed when I switched the underlying SATA controller on that disk to one that properly reported its ID. Perhaps I should have come on here first to see if one of you wizards could have told unraid to ignore the ID change and trust that the disk was the same.