
Posts posted by TheTechnoPilot

  1. Hi all,

     

    So my UnRaid server has finally been running exactly as I desire over the last couple of months, with my final upgrade in the past week to the GPU it will have for the foreseeable future. Now I'm down to dealing with some housekeeping items, and one of those is my RGB setup (because we all know RGB increases array read and write speeds! 😝). I'm running an ASUS X570 Impact DTX board with on-board ASUS Aura controlling two RGB fans and an EK waterblock, plus all the motherboard lighting (so I can't just use a separate USB-only controller). I've been trying to pass this controller through to my primary Windows 10 VM so that the Aura software can control my lighting. Despite getting the Aura controller to show up in my selectable USB device list in Windows, using device-ID pass-through of the single device with the name from the PCI-E device list (note that even having forced breakout of PCI-E groups for other purposes, it still shares its group with some other devices, so I can't pass through the whole PCI-E group), the ASUS software claims not to see the controller when I attempt to start it.

     

    I would of course love to control this natively in UnRaid, so I don't need to start my VM to have my lighting the way I want, but at this point I would settle for just getting the pass-through to work. Does anyone have any ideas on how I can either handle this natively within UnRaid, or get it to properly pass through for control in Windows? Thanks in advance for any tips!
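
    For reference, what I've been attempting amounts to a plain USB hostdev entry in the VM XML. A minimal sketch, with placeholder vendor/product IDs standing in for whatever lsusb actually reports for the controller:

        <hostdev mode='subsystem' type='usb' managed='no'>
          <source>
            <!-- placeholder IDs: 0x0b05 is just ASUS's USB vendor ID as an example;
                 substitute the actual vendor/product values lsusb reports -->
            <vendor id='0x0b05'/>
            <product id='0x0001'/>
          </source>
        </hostdev>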

  2. So sadly, neither of those, nor directly binding the hardware IDs to the VFIO driver on boot, has had any effect on the boot error... :(
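
     For clarity, this is the kind of syslinux.cfg binding I mean; a sketch, where the vendor:device pairs are illustrative placeholders for what lspci -nn reports for the card, not values to copy:

         append vfio-pci.ids=1002:687f,1002:aaf8 initrd=/bzroot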

     

    @SpaceInvaderOne is there any chance you might be able to chime in with any thoughts?

     

    The VM boot log:

    -chardev socket,id=charmonitor,fd=24,server,nowait \
    -mon chardev=charmonitor,id=monitor,mode=control \
    -rtc base=localtime \
    -no-hpet \
    -no-shutdown \
    -boot strict=on \
    -device pcie-root-port,port=0x9,chassis=1,id=pci.1,bus=pcie.0,addr=0x1.0x1 \
    -device pcie-root-port,port=0xa,chassis=2,id=pci.2,bus=pcie.0,addr=0x1.0x2 \
    -device pcie-root-port,port=0xb,chassis=3,id=pci.3,bus=pcie.0,addr=0x1.0x3 \
    -device pcie-root-port,port=0x13,chassis=4,id=pci.4,bus=pcie.0,addr=0x2.0x3 \
    -device pcie-root-port,port=0x14,chassis=5,id=pci.5,bus=pcie.0,addr=0x2.0x4 \
    -device pcie-root-port,port=0x8,chassis=6,id=pci.6,bus=pcie.0,multifunction=on,addr=0x1 \
    -device ich9-usb-ehci1,id=usb,bus=pcie.0,addr=0x7.0x7 \
    -device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pcie.0,multifunction=on,addr=0x7 \
    -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pcie.0,addr=0x7.0x1 \
    -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pcie.0,addr=0x7.0x2 \
    -device virtio-serial-pci,id=virtio-serial0,bus=pci.2,addr=0x0 \
    -drive 'file=/mnt/user/domains/Windows 10/vdisk2.img,format=raw,if=none,id=drive-virtio-disk2,cache=writeback' \
    -device virtio-blk-pci,scsi=off,bus=pci.3,addr=0x0,drive=drive-virtio-disk2,id=virtio-disk2,bootindex=1,write-cache=on \
    -drive file=/mnt/user/isos/Windows10_Install.iso,format=raw,if=none,id=drive-sata0-0-0,readonly=on \
    -device ide-cd,bus=ide.0,drive=drive-sata0-0-0,id=sata0-0-0,bootindex=2 \
    -drive file=/mnt/user/isos/virtio-win-0.1.160-1.iso,format=raw,if=none,id=drive-sata0-0-1,readonly=on \
    -device ide-cd,bus=ide.1,drive=drive-sata0-0-1,id=sata0-0-1 \
    -netdev tap,fd=27,id=hostnet0,vhost=on,vhostfd=28 \
    -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:e4:5e:83,bus=pci.1,addr=0x0 \
    -chardev pty,id=charserial0 \
    -device isa-serial,chardev=charserial0,id=serial0 \
    -chardev socket,id=charchannel0,fd=29,server,nowait \
    -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 \
    -device usb-tablet,id=input0,bus=usb.0,port=1 \
    -device vfio-pci,host=09:00.0,id=hostdev0,bus=pci.4,addr=0x0 \
    -device vfio-pci,host=09:00.1,id=hostdev1,bus=pci.5,addr=0x0 \
    -device usb-host,hostbus=1,hostaddr=2,id=hostdev2,bus=usb.0,port=2 \
    -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
    -msg timestamp=on
    2020-02-10 00:28:41.371+0000: Domain id=1 is tainted: high-privileges
    2020-02-10 00:28:41.371+0000: Domain id=1 is tainted: host-cpu
    char device redirected to /dev/pts/0 (label charserial0)
    2020-02-10T00:28:43.974057Z qemu-system-x86_64: vfio_err_notifier_handler(0000:09:00.1) Unrecoverable error detected. Please collect any data possible and then kill the guest
    2020-02-10T00:28:43.974162Z qemu-system-x86_64: vfio_err_notifier_handler(0000:09:00.0) Unrecoverable error detected. Please collect any data possible and then kill the guest

    Also see the attached log from my last boot.

    coultonstudios-diagnostics-20200210-0034.zip

  3. 1 hour ago, Skitals said:

    I can't say for certain. I required "video=efifb:off" even with a single nvidia card on my setup. I use two GPUs now, and only pass my non-primary GPU so it's no longer needed for my setup.

    So, adding that line after my ACS overrides didn't disable video during loading on reboot.  Am I using it right?  Interestingly, when I tried to start the VM again at that point, it did yank the card from UnRAID's command line, but Win10 didn't seem to grab it, and the boot stalled with the same error and a single stuck, pinned logical core out of the 10 pairs assigned.
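
    For reference, here's the ordering I used; a sketch of the relevant syslinux.cfg stanza as I have it (the ACS override value is mine, the rest is the stock layout as I understand it):

        label unRAID OS
          menu default
          kernel /bzimage
          append pcie_acs_override=downstream,multifunction video=efifb:off initrd=/bzroot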

  4. 9 minutes ago, Skitals said:

    If I follow correctly when you start the win10 vm the monitor output doesn't change from unraid. It sounds like vfio is having trouble grabbing control of the gpu from unraid. 

     

    Add "video=efifb:off" as a boot flag to your syslinux.cfg. This will disable the framebuffer in unraid making the gpu fully available. This will disable video output after the bootloader screen, so no gui, you will need to launch your vm from the webgui on another machine.

     

    If you want to be able to use GUI mode I believe you will need 2 gpus. The gui mode is only recommended for fixing networking issues if you can't access the webgui, so setting efifb:off shouldn't be a loss.

    You are 100% correct, and I have no need or desire to run UnRaid in GUI mode; it was just to confirm the new card was functioning without issue.  Normally I run without it, and the only loss I feel doing so is not seeing the IP address on boot-up when working off a new network (I use the server for work on feature film sets).

     

    I'll give this a try. Would you say, though, that this grabbing-control behaviour would be graphics-card dependent? It had no issue previously when using the GTX970.

  5. 1 minute ago, Skitals said:

    What's your issue, specifically? Does your MacOS VM work, just looking for a vega reset fix? I've had no luck getting my 5700 XT working (at all, not reset related) in Catalina despite it having native bare metal support. AMD Mac OS is its own can of worms I haven't tackled (yet).

    Not yet; I'm having general VM issues I need to sort through first, before diving deeper into getting it working in MacOS.

    In trying to find solutions to this weird issue I stumbled upon the reset-fix thread, and since your work also seems to deal with onboard audio pass-through issues and the Ryzen sensors, and I haven't bothered upgrading from 6.7.2 yet, this seemed like a good thing to look at and a good reason to upgrade.

     

    Right now my MacOS VM is still running High Sierra, and ideally I want to get that going on the new card first, but I need to fix this basic pass-through error before even going there.

  6. On 12/10/2019 at 4:18 PM, Skitals said:

    build 2020.01.21:

    6.8.0-rc5 kernel with

    navi reset v2

    vega reset patch* (used in conjunction with this script)

    k10temp Ryzen patches from 1/18/20

    (new) Fix for onboard audio passthrough on x570

     

    *modified to add 0x6863 in addition to 0x687f

     

     

    6.8.0-rc5-Skitals-2020.01.21.zip 18.16 MB · 25 downloads

    Hey there bud, bit of a noobie here when it comes to this level.  Running a 3900X on an ROG B450-I board with a Strix Vega 64, and still back on 6.7.2.  Looks like upgrading to your build is my best bet now, having just moved to the Vega 64 and discovered it is not as smooth sailing as I hoped (MacOS VMs, so I needed to stay in the Radeon camp).  Sorry for asking such a basic question, but perhaps it could also help others: what is the best way to upgrade to your specific build with these fixes? I've not done such an upgrade before.
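
    From what I've gathered reading around, installing a custom build is just a matter of swapping the kernel files on the flash drive; my rough understanding, sketched below using the stock paths (please correct me if your build needs anything more):

        # back up the stock kernel files on the unRAID flash drive
        mkdir /boot/stock
        cp /boot/bz* /boot/stock/
        # then extract the bz* files from the build zip into /boot and reboot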

  7. Hi there @johngc, looking at your XML, it seems where you are hanging is:

        <loader readonly='yes' type='pflash'>/usr/share/qemu/ovmf-x64/OVMF_CODE-pure-efi.fd</loader>
        <nvram>/etc/libvirt/qemu/nvram/988cac7a-49bb-4a9e-844a-f791ce1ffb0d_VARS-pure-efi.fd</nvram>

    which, based on what seems to be your file structure, should actually read:

        <loader readonly='yes' type='pflash'>/mnt/disks/VMs/MacinaboxCatalina/ovmf/OVMF_CODE.fd</loader>
        <nvram>/mnt/disks/VMs/MacinaboxCatalina/ovmf/OVMF_VARS.fd</nvram>

    This should hopefully get you un-hung on boot, though to completely correct everything you will also want to change

    <vmtemplate xmlns="unraid" name="Windows 10" icon="default.png" os="Catalina"/>

    back to:

    <vmtemplate xmlns="unraid" name="Windows 10" icon="/mnt/disks/VMs/MacinaboxCatalina/icon/catalina.png" os="Catalina"/>

    Hope that helps!

  8. Hey all,

     

    So I just picked up an ASUS ROG STRIX Radeon Vega 64 to replace my older Gigabyte GTX970 Mini in my UnRAID 6.7.2 build.  I specifically went this route because I extensively use MacOS and wanted to move to a natively supported card for that VM, so I didn't have to deal with the stupidities of trying to get an NVidia card supported in MacOS (after a few hardware changes that altered the PCI Express port mappings, I was unable to get the GTX970 recognized in MacOS, though it came up without any issue and fully functional in my Windows 10 VM with the same XML settings for the card).  I swapped the cards, and upon reboot everything seems to be working fine both in BIOS and across both the command-line and GUI versions of UnRAID.  But when I go to start one of the VMs (after changing the PCI ports in the XML to match the move from bus 7 to 9, and removing the vBIOS injection I was using to make the GTX card work in a VM), each of them hangs after getting a green triangle on the VM, without losing UnRAID's interface from the monitor connected to the Vega 64, and pins one of the assigned logical CPU cores.  Looking in the VM logs I am getting:

    2020-02-05T22:23:04.260722Z qemu-system-x86_64: vfio_err_notifier_handler(0000:09:00.1) Unrecoverable error detected. Please collect any data possible and then kill the guest
    2020-02-05T22:23:04.260795Z qemu-system-x86_64: vfio_err_notifier_handler(0000:09:00.0) Unrecoverable error detected. Please collect any data possible and then kill the guest

    I need help, people!!! I knew I would have to do a little work on MacOS to remove Clover's forcing of the NVidia driver, etc., but I expected that, as with bare-metal hardware, Windows 10 would boot right up as long as I updated the PCI assignment from bus 7 to 9!  I've included my Win10 VM XML below, and also attached the diagnostics from both of last night's extensive attempts with both VMs and from my reboot today, primarily just trying to start my Win10 VM.

     

    PLEASE HELP! 😝
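
    For anyone following along, this is how I confirmed the card's new bus number before editing the XML; a quick sketch (the commented output is what I expect to see, paraphrased rather than pasted verbatim):

        # list the GPU and its HDMI audio function with vendor:device IDs
        lspci -nn | grep -i 'vga\|audio'
        # expected along the lines of:
        #   09:00.0 VGA compatible controller [0300]: AMD ... Vega 10 ...
        #   09:00.1 Audio device [0403]: AMD ... Vega 10 HDMI Audio ...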

    <?xml version='1.0' encoding='UTF-8'?>
    <domain type='kvm'>
      <name>Backblaze</name>
      <uuid>44cda2aa-66af-a307-7f6a-232c3dc374fd</uuid>
      <metadata>
        <vmtemplate xmlns="unraid" name="Windows 10" icon="/mnt/user/domains/Backblaze/backblaze.png" os="windows10"/>
      </metadata>
      <memory unit='KiB'>58720256</memory>
      <currentMemory unit='KiB'>58720256</currentMemory>
      <memoryBacking>
        <nosharepages/>
      </memoryBacking>
      <vcpu placement='static'>20</vcpu>
      <cputune>
        <vcpupin vcpu='0' cpuset='2'/>
        <vcpupin vcpu='1' cpuset='14'/>
        <vcpupin vcpu='2' cpuset='3'/>
        <vcpupin vcpu='3' cpuset='15'/>
        <vcpupin vcpu='4' cpuset='4'/>
        <vcpupin vcpu='5' cpuset='16'/>
        <vcpupin vcpu='6' cpuset='5'/>
        <vcpupin vcpu='7' cpuset='17'/>
        <vcpupin vcpu='8' cpuset='6'/>
        <vcpupin vcpu='9' cpuset='18'/>
        <vcpupin vcpu='10' cpuset='7'/>
        <vcpupin vcpu='11' cpuset='19'/>
        <vcpupin vcpu='12' cpuset='8'/>
        <vcpupin vcpu='13' cpuset='20'/>
        <vcpupin vcpu='14' cpuset='9'/>
        <vcpupin vcpu='15' cpuset='21'/>
        <vcpupin vcpu='16' cpuset='10'/>
        <vcpupin vcpu='17' cpuset='22'/>
        <vcpupin vcpu='18' cpuset='11'/>
        <vcpupin vcpu='19' cpuset='23'/>
      </cputune>
      <os>
        <type arch='x86_64' machine='pc-i440fx-3.1'>hvm</type>
        <loader readonly='yes' type='pflash'>/usr/share/qemu/ovmf-x64/OVMF_CODE-pure-efi.fd</loader>
        <nvram>/etc/libvirt/qemu/nvram/44cda2aa-66af-a307-7f6a-232c3dc374fd_VARS-pure-efi.fd</nvram>
      </os>
      <features>
        <acpi/>
        <apic/>
        <hyperv>
          <relaxed state='on'/>
          <vapic state='on'/>
          <spinlocks state='on' retries='8191'/>
          <vendor_id state='on' value='none'/>
        </hyperv>
      </features>
      <cpu mode='host-passthrough' check='none'>
        <topology sockets='1' cores='20' threads='1'/>
      </cpu>
      <clock offset='localtime'>
        <timer name='hypervclock' present='yes'/>
        <timer name='hpet' present='no'/>
      </clock>
      <on_poweroff>destroy</on_poweroff>
      <on_reboot>restart</on_reboot>
      <on_crash>restart</on_crash>
      <devices>
        <emulator>/usr/local/sbin/qemu</emulator>
        <disk type='file' device='disk'>
          <driver name='qemu' type='raw' cache='writeback'/>
          <source file='/mnt/user/domains/Backblaze/vdisk1.img'/>
          <target dev='hdc' bus='virtio'/>
          <boot order='1'/>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
        </disk>
        <disk type='file' device='cdrom'>
          <driver name='qemu' type='raw'/>
          <source file='/mnt/user/isos/Windows10_Install.iso'/>
          <target dev='hda' bus='ide'/>
          <readonly/>
          <boot order='2'/>
          <address type='drive' controller='0' bus='0' target='0' unit='0'/>
        </disk>
        <disk type='file' device='cdrom'>
          <driver name='qemu' type='raw'/>
          <source file='/mnt/user/isos/virtio-win-0.1.160-1.iso'/>
          <target dev='hdb' bus='ide'/>
          <readonly/>
          <address type='drive' controller='0' bus='0' target='0' unit='1'/>
        </disk>
        <controller type='pci' index='0' model='pci-root'/>
        <controller type='ide' index='0'>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
        </controller>
        <controller type='virtio-serial' index='0'>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
        </controller>
        <controller type='usb' index='0' model='ich9-ehci1'>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x7'/>
        </controller>
        <controller type='usb' index='0' model='ich9-uhci1'>
          <master startport='0'/>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0' multifunction='on'/>
        </controller>
        <controller type='usb' index='0' model='ich9-uhci2'>
          <master startport='2'/>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x1'/>
        </controller>
        <controller type='usb' index='0' model='ich9-uhci3'>
          <master startport='4'/>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x2'/>
        </controller>
        <interface type='bridge'>
          <mac address='52:54:00:60:31:97'/>
          <source bridge='br0'/>
          <model type='virtio'/>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
        </interface>
        <serial type='pty'>
          <target type='isa-serial' port='0'>
            <model name='isa-serial'/>
          </target>
        </serial>
        <console type='pty'>
          <target type='serial' port='0'/>
        </console>
        <channel type='unix'>
          <target type='virtio' name='org.qemu.guest_agent.0'/>
          <address type='virtio-serial' controller='0' bus='0' port='1'/>
        </channel>
        <input type='tablet' bus='usb'>
          <address type='usb' bus='0' port='1'/>
        </input>
        <input type='mouse' bus='ps2'/>
        <input type='keyboard' bus='ps2'/>
        <hostdev mode='subsystem' type='pci' managed='yes'>
          <driver name='vfio'/>
          <source>
            <address domain='0x0000' bus='0x09' slot='0x00' function='0x0'/>
          </source>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0' multifunction='on'/>
        </hostdev>
        <hostdev mode='subsystem' type='pci' managed='yes'>
          <driver name='vfio'/>
          <source>
            <address domain='0x0000' bus='0x09' slot='0x00' function='0x1'/>
          </source>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x1'/>
        </hostdev>
        <hostdev mode='subsystem' type='pci' managed='yes'>
          <driver name='vfio'/>
          <source>
            <address domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
          </source>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
        </hostdev>
        <memballoon model='none'/>
      </devices>
    </domain>

    coultonstudios-diagnostics-20200205-2114.zip coultonstudios-diagnostics-20200205-2224.zip

  9. Thank you so much @johnnie.black and @trurl for all your help!  I am back up and running with all my data fully parity-synced, and am now just working on getting it all backed up through a VM back into the cloud!  While initially this all made me feel like my array was fragile, I now know that it is even more robust than I realized, and I have learned so much going forward!  Thanks again for stepping in and making that the case!

  10. 2 hours ago, johnnie.black said:

    You can do that but if there are many sync errors it could be faster to just do a full parity sync, it depends on how much data changed on the other disk.

    By that you just mean using New Configuration but not flagging it as parity valid, correct?

  11. On 1/8/2020 at 12:18 PM, trurl said:

    Are you sure the missing data isn't on another disk? Easy to accidentally move something to some place you didn't intend, such as under another folder for example.

    OMG, @trurl & @johnnie.black, I figured out what an idiot I am!  I accidentally managed to put the wrong 8TB Barracuda drive into my case, and therefore told the system to add what was actually my old Disk2 to the array as Disk4!  Okay, so that I don't bork everything, please confirm my understanding: at this point I should be able to go back through and use the New Configuration (parity valid) option to correct this, putting the real Disk4 back into the array, and then once that is done, do a full parity check (with write corrections to disk) to get myself back up and running, correct?

     

    I'm still amazed at my basic stupidity and how I somehow managed to put the wrong drive into the external enclosure...

  12. 3 minutes ago, trurl said:

    Are you sure the missing data isn't on another disk? Easy to accidentally move something to some place you didn't intend, such as under another folder for example.

    Honestly, not at that folder size: my server should be at just over 30TB (77%) space utilization, and was at shut-down, but now it is only showing about 28.5TB (74%).  So while I wish that were the case, I don't think it's a possible explanation, unfortunately...

     

    Oh, and even if someone had somehow deleted the files on the share (though there are no authorized users besides myself on the current network), I am also using Recycle Bin with manual emptying only, so the files should still be taking up space on the array.

  13. 22 minutes ago, johnnie.black said:

    It would be very weird, if not extremely unlikely, that if you used the old disk there's data missing, assuming it mounted directly without running xfs_repair.

    That's what I'm thinking; it seems very odd for the data to be missing.  While admittedly I only looked in the share for it (so perhaps it's just not being recognized as part of that share), the total used space on the disk doesn't support unraid thinking it is there in the native disk filesystem either...

     

    I'm wondering if the filesystem on that drive got damaged and in essence lost the pointers to that folder, so it no longer considers it used space.  That's the only explanation I can imagine for losing essentially one whole folder, from what I can tell.  I'm tempted to run a filesystem check on the drive when I next continue my troubleshooting, to see if it can repair such corruption to the directory structure (though I don't want it to go as aggressive as what I saw on disk2, if I can help it; there the files were so strewn about that I recovered entirely from backup and deleted the recovered files).  For @trurl, who asked: unfortunately, when I checked this morning, it seems I've lost the ability to recover this directory from backup, because in my migration to unraid my old build has been offline for too long to recover it from my Backblaze backup. 😞

     

    I have taken the array offline for now and will probably shut down the server (unless any further diagnostics first would be of help), and then wait till the pack of new locking SATA cables I ordered an hour ago arrives on Friday.  I think that's a good first step before I go any further working with the system, to eliminate that one area of potential failure (the ones I just used are admittedly what came with the SATA controller I installed, and are pretty generic, no-name, non-locking).
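
    For when I do get to that filesystem check, this is what I have in mind, read-only first (the device name is a placeholder for however the disk shows up at that point):

        # dry run: -n reports problems without writing anything to the filesystem
        xfs_repair -n /dev/sdX1
        # only if that looks sane, re-run without -n to actually repair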

  14. 1 minute ago, trurl said:

    So neither the emulated disk nor the actual disk mounted outside the array?

    No. However, considering the free-space listing of the disk outside the array (before I added it back), it now seems wrong when I think back: it had 2.9TB free when it really should only be at about 1.3TB, and this is the same when mounted in the array.  Honestly, with all that I have done in the last couple of days, not having yet done a full parity check, I worry the existing parity is probably already wrong at this point.  This is also why I preferred to bet on the original drive being intact.

  15. 1 minute ago, trurl said:

    There were 8 hours between the post you quoted and your post. The post you quoted suggested some possible approaches. You don't mention if you actually did anything during those eight hours. Did you just examine the drive and discover you might be missing something, or did you do more than that?

    Those 8 hrs I slept.  I followed the new-configuration, trusted-parity instruction, as the only change between having a functional full array and a non-functional one last night was a reboot and an accidental array start while seemingly missing a disk (I won't be using ControlR to start the array again, since I didn't pick up on the issue in its interface).

  16. 1 minute ago, johnnie.black said:

    Do you mean on the emulated disk or the actual disk mounted outside the array?

    On the actual disk when I put it back into the array via new configuration (trusted parity). 

  17. 7 hours ago, johnnie.black said:

    Note that the disk is showing a recent UNC @ LBA error, so it might have issues; it would be safer to rebuild to a new disk and keep that one intact for now. But if you want to re-enable it, do a new config and trust parity. Note that since the disk was mounted outside the array read/write, you'll need to run a correcting parity check, because a few sync errors are expected.

    So... ugh, it seems I am missing a bunch of data from the drive.  Is there harm if I pull it and put in a new drive to rebuild from the existing parity?  Am I just asking for issues doing that?  I really don't understand how I could be missing whole sections of data from the drive; it is literally like it is missing its portion of an entire share, approx 1.5TB worth.  I am now suspecting the new SATA cables I put in might be flaky, as this drive wasn't on the new SATA controller but is on a new cable.  Either way, I am not going to do anything till I replace the cables, to be sure.

  18. On 1/6/2020 at 4:06 AM, johnnie.black said:

    Yes, UUID will be duplicated, you can change the one from the UD device with:

    
    xfs_admin -U generate /dev/sdX1

     

    OMG, more god-damn issues!!!  I rebooted my server and used the ControlR app to start the array, not realizing that my disk4 apparently hadn't been detected and was showing up as unmountable.  Now, stopping the array, it is of course recognizing the disk as new and wanting to do a parity rebuild on it... but the disk itself is definitely still mountable from UD.  johnnie.black, can you please walk me through the steps to re-add it to the array without rebuilding?  It seems to be mounting fine outside the array, and I definitely don't want to rebuild it and risk going through what I just did on disk2 (where I didn't end up using any of the rebuilt data, only my backup and the original disk2).  Please see the attached log in case there is something I am missing!

    coultonstudios-diagnostics-20200108-0458.zip

  19. On 1/6/2020 at 4:06 AM, johnnie.black said:

    Yes, UUID will be duplicated, you can change the one from the UD device with:

    
    xfs_admin -U generate /dev/sdX1

     

    AMAZING!!!  Thank you for that final tip!  That quickly fixed the problem, and I am happy to say that so far all files seem to have survived on the old disk; I am beginning to copy them all over, project by project, checking file integrity.

     

    Also, I have discovered what happened and why my VM trashed my rebuild.  It seems that the USB 3.2 PCI-E device I was passing through to MacOS on slot 1 became slot 2 when I installed the new SATA controller; the SATA controller, which had been handling the Parity Drive, Drive 1, and Drive 2, became slot 1 and got yanked from UnRAID when I started the VM; and then, no clue what happened when I unpaused the sync while the controller was still passed through.  Lessons definitely learned.  Guess we can also flag this as solved, thanks to you!!!

     

    Now my remaining challenge is just getting my VM back up and running, as I decided I should start the XML from scratch after that. :P

  20. 13 hours ago, johnnie.black said:

    That's expected; if it was unmountable at the beginning of the rebuild it would be the same at the end. You need to check the filesystem on disk2:

     

    https://wiki.unraid.net/Check_Disk_Filesystems#Checking_and_fixing_drives_in_the_webGui

     

    Remove -n or nothing will be done, and if it asks for -L, use it.

    Okay great!  That got it mountable, though sadly all the files are strewn about in the lost+found folder...(none seem to have really survived outside that).  The first log is the diagnostic file from that.

     

    So my solution was to copy over all my backups, and then the files I needed from the original disk2; however, I ran into a very weird issue.  With the array started, the original disk2 wouldn't succeed in mounting, so I decided to restart the system and try mounting it first.  When I did that, the disk mounted fine, but when I then started the array, the new disk2 showed up as unmountable.  To check whether the two are related, I shut down the array and unmounted the original disk2 (note that this one is always being mounted from UD), then restarted the array, and all drives showed up fine!  I have included a second diagnostic from this reboot and its weird symptoms as well; can you please take a look and tell me what is going on there?  Do I have any reason not to trust the array?  Or is it just some type of collision with the old disk2, and is there any way to mount it in UD without causing the collision, so I can transfer files from the old disk onto the new one?

     

    EDIT - Note that my own troubleshooting seems to suggest it is because both drives share the same UUID?  Is there any way to change this on the old disk2 to allow it to be mounted after starting the array, or to block the new disk2 from mounting when the old one is pre-mounted?
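
    For reference, this is how I spotted the duplication; a quick sketch (the device letters are placeholders for whatever the array and UD assign on a given boot):

        # print filesystem UUIDs; the new disk2 and the old disk2 report the same value
        blkid /dev/sdX1 /dev/sdY1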

    coultonstudios-diagnostics-20200105-2225.zip coultonstudios-diagnostics-20200105-2232.zip

  21. On 1/3/2020 at 1:43 PM, johnnie.black said:

    Difficult to say for sure. The rebuild being cancelled might not be a big deal: if the disk was being rebuilt with the same data as was there before, i.e., if nothing was written to the emulated disk, cancelling the rebuild by itself wouldn't be a problem, since anything before and after that sector would be as it was; maybe that one file could be corrupted. Now, if something really happened during your rebuild, like errors on other disks, then yeah, that data would be corrupt.

    So, frustratingly, while the parity sync has completed, drive2 is still showing up as unmountable/no file system...  Could you take a peek, johnnie.black, and let me know what the situation is?  What are my options here, is there anything that I can do?

    coultonstudios-diagnostics-20200104-1942.zip

  22. Thank you for your help johnnie.black!  I decided to run out today and buy a new 10TB to replace that 8TB for this, especially since the 8TB is still mountable.  Your procedure, I am so happy to say, has my array started back up and rebuilding disk2 onto this new drive at a fairly sustained 185MB/s, and I look forward to hopefully reporting back with success no later than Sunday morning!  My only regret is that I obviously don't get to run my normal check on the new drive (I actually use another test method, a program within MacOS, which doesn't just write out the drive but does two full write-then-read passes plus a sustained random-access test).  Oh well, guess this will partially test it.

     

    Oh, BTW, yes, the system showed disk2 in this case as unmountable.  For protection's sake I have all dockers shut down, am not starting any VMs, and am letting the system focus only on the rebuild this time!  NO RISKS!

  23. 21 minutes ago, johnnie.black said:

    So if I'm understanding correctly, you already have a backup of disk2, or at least of the most important data. If that's correct then yeah, try rebuilding on top; if needed you can also try to back up any missing data using the old disk with UD.

    Would there be any chance, though, that what I am copying off was corrupted during the attempted rebuild?  I would assume so...

  24. Thanks johnnie.black!  Admittedly I did have one minor SMART report on that disk in the past, which is probably what you see in the log, but it was minor enough that I felt I could ignore it for the time being.

     

    So before I go through with this procedure: first off, yes, disk2 still shows as mountable (it does succeed in mounting in UD) and has generally the correct used space.  However, I currently don't have a spare disk large enough to throw in the slot that isn't already an almost complete backup of the data on disk2.  I would think that in this circumstance I couldn't trust the validity of the data on the drive once the parity operation started; or, because it was all valid data being replaced identically with itself, would it remain valid? (Though the sudden super-fast rebuild makes me question that.)

     

    I also want to check your thoughts on what happened, as I described in my initial post:

    21 hours ago, TheTechnoPilot said:

    All looked well with the parity sync going at 150MB/s, so I then went to start up my MacOS VM that lives predominantly on my cache drive (primary virtual disk location).  After boot-up I noticed that for some odd reason my sync was now paused, so I unpaused it (it was at most 1-2% at that point).  Suddenly the sync said it was running at 22GB/s and took off, chugging through up to 10% of the rebuild, which made no sense for having run only 5 minutes at that point, so I immediately paused the sync again and hard-stopped the virtual machine.  I then tried cancelling the rebuild, taking the array offline, and restarting the system, hoping for the option to restart the parity sync

    Was booting a VM that mainly lives on the cache drive a mistake that UnRAID tried to protect me against by pausing the parity/sync?  I kinda want to understand what happened before bringing things back online if that makes sense.