• 6.8.0 RC1+RC4 corrupted QCOW2 vdisks on XFS! Warning "unraid qcow2_free_clusters failed: Invalid argument", probably due to compressed QCOW2 files


    bastl
    • Solved Minor

    Edit: retested with RC6

     

    Installing VMs on XFS array drives works fine, and the same goes for the BTRFS cache drive. No corruption has shown up on the qcow2 vdisks so far with the same tests as before. Existing compressed qcow2 images that got corrupted in RC1-4 show no issues so far. I will keep an eye on it over the next couple of days. Compressing an uncompressed qcow2 also no longer produces corrupted vdisks. It looks like the patches in qemu 4.1.1 fixed my issues.

     

    ------------------------------------------------------------------------------

     

    EDIT: Edited the title to make the issue easier to understand. The main issue is that a guest OS can't be installed reliably on qcow2 vdisks hosted on XFS-formatted drives: the installation fails or leaves a corrupted install. Existing images can also be affected. There are some reports that ext4 is affected too, and the warnings I got when using compressed qcow2 files on BTRFS might be related. The affected qemu version is 4.1. Using RAW images should be fine.
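
    For anyone who wants to try the RAW workaround, a disk entry might look roughly like the fragment below. This is only a sketch: the file name `vdisk1.raw` is a hypothetical example, and the remaining attributes are copied from the Win7 XML posted further down. The existing qcow2 image would have to be converted to raw first (for example with `qemu-img convert`), because changing the `type` attribute alone does not convert the data.

    ```xml
    <disk type='file' device='disk'>
      <!-- type='raw' instead of 'qcow2'; the image file itself must already be in raw format -->
      <driver name='qemu' type='raw' cache='writeback'/>
      <!-- hypothetical path, adjust to your own share and VM name -->
      <source file='/mnt/user/VMs/Win7_Outlook/vdisk1.raw'/>
      <target dev='hdc' bus='scsi'/>
      <boot order='1'/>
    </disk>
    ```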

     

     

    First of all, I updated from 6.7.1 to 6.8.0 RC1 on Saturday. Everything went fine, I thought, except for some qemu arguments (the root-port fix) that prevented 2 VMs with GPU passthrough from booting. Nothing else changed, and there were no errors in the server logs. As on every weekday morning, an extra Win7 VM started up automatically. Fine so far. On Tuesday, after a software update, I had to restart the VM and it wouldn't come back online. The VM showed a weird error I had never seen before, and after some searching on the web it was clear the file system had somehow been corrupted. I restored the vdisk from a backup and it booted back up. This time I didn't install any updates or use it for the usual office work. It idled for a couple of minutes and then I noticed the following errors in the VM logs.

     

     "unraid qcow2_free_clusters failed: Invalid argument"

    1088646363_invalidargument.thumb.png.9a69fe9aec53158beb2a63da69bc6ec2.png

     

    Restarting worked this time, although it felt a bit slower than usual, and the error count climbed quickly. Inside the VM I didn't notice any performance degradation or errors, so it looked like a false positive. I rebooted the VM again and it wouldn't start. I took the vdisk, attached it to another VM and ran chkdsk, which found hundreds of file system errors and tried to recover them until it either finished with unrecoverable errors or froze completely.

     

    Time to check the other VMs I'm using with a qcow2 vdisk. And what a surprise, a Linux Mint VM also showed this error after running for a couple of minutes. I played around a bit with the XML and removed a couple of tweaks: "discard='unmap'" and "numatune memory mode='strict' nodeset='0'". Same error. Everything else in the XML is at its defaults and has been running for almost 2 years now. I tried reverting to different vdisk backups going back to September. After running for a couple of minutes, every one of them showed errors. The Win7 VM once reported an unreadable file and crashed; on the next try the first boot was fine. Some reboots were fine, some froze, some reported file system corruption. I tried different types of VMs (OVMF, SeaBIOS, Q35-3.0, i440fx-3.0); it doesn't matter, always the same issue. The only thing all these VMs have in common is that they use qcow2 as the disk image format!?

     

    All the VMs are hosted on a single BTRFS NVMe cache drive. I even tried putting the vdisk for the Win7 VM on the array. Same issue: after a couple of minutes the errors start popping up. I then tried various backups going back to March. Only VMs with a directly passed-through SSD/HDD are not affected.

     

    Is there anything I can try to prevent the vdisk corruption?

     

    Below are 2 xml files and the diagnostics from the server running since saturday.

     

    Win7 i440fx VM

    <?xml version='1.0' encoding='UTF-8'?>
    <domain type='kvm'>
      <name>Win7_Outlook</name>
      <uuid>0b67611b-12b3-d0fd-c02b-055394dd34dc</uuid>
      <metadata>
        <vmtemplate xmlns="unraid" name="Windows 7" icon="windows7.png" os="windows7"/>
      </metadata>
      <memory unit='KiB'>4194304</memory>
      <currentMemory unit='KiB'>4194304</currentMemory>
      <memoryBacking>
        <nosharepages/>
      </memoryBacking>
      <vcpu placement='static'>4</vcpu>
      <cputune>
        <vcpupin vcpu='0' cpuset='4'/>
        <vcpupin vcpu='1' cpuset='20'/>
        <vcpupin vcpu='2' cpuset='5'/>
        <vcpupin vcpu='3' cpuset='21'/>
      </cputune>
      <numatune>
        <memory mode='strict' nodeset='0'/>
      </numatune>
      <os>
        <type arch='x86_64' machine='pc-i440fx-3.0'>hvm</type>
      </os>
      <features>
        <acpi/>
        <apic/>
      </features>
      <cpu mode='host-passthrough' check='none'>
        <topology sockets='1' cores='4' threads='1'/>
      </cpu>
      <clock offset='localtime'>
        <timer name='rtc' tickpolicy='catchup'/>
        <timer name='pit' tickpolicy='delay'/>
        <timer name='hpet' present='no'/>
      </clock>
      <on_poweroff>destroy</on_poweroff>
      <on_reboot>restart</on_reboot>
      <on_crash>restart</on_crash>
      <devices>
        <emulator>/usr/local/sbin/qemu</emulator>
        <disk type='file' device='disk'>
          <driver name='qemu' type='qcow2' cache='writeback' discard='unmap'/>
          <source file='/mnt/user/VMs/Win7_Outlook/WIN7_OUTLOOK.qcow2'/>
          <target dev='hdc' bus='scsi'/>
          <boot order='1'/>
          <address type='drive' controller='0' bus='0' target='0' unit='2'/>
        </disk>
        <disk type='file' device='cdrom'>
          <driver name='qemu' type='raw'/>
          <source file='/mnt/user/isos/Acronis/AcronisMedia.117iso.iso'/>
          <target dev='hda' bus='ide'/>
          <readonly/>
          <boot order='2'/>
          <address type='drive' controller='0' bus='0' target='0' unit='0'/>
        </disk>
        <controller type='usb' index='0' model='ich9-ehci1'>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x7'/>
        </controller>
        <controller type='usb' index='0' model='ich9-uhci1'>
          <master startport='0'/>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0' multifunction='on'/>
        </controller>
        <controller type='usb' index='0' model='ich9-uhci2'>
          <master startport='2'/>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x1'/>
        </controller>
        <controller type='usb' index='0' model='ich9-uhci3'>
          <master startport='4'/>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x2'/>
        </controller>
        <controller type='scsi' index='0' model='virtio-scsi'>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
        </controller>
        <controller type='pci' index='0' model='pci-root'/>
        <controller type='ide' index='0'>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
        </controller>
        <controller type='virtio-serial' index='0'>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
        </controller>
        <interface type='bridge'>
          <mac address='52:54:00:64:a8:e2'/>
          <source bridge='br0'/>
          <model type='virtio'/>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
        </interface>
        <serial type='pty'>
          <target type='isa-serial' port='0'>
            <model name='isa-serial'/>
          </target>
        </serial>
        <console type='pty'>
          <target type='serial' port='0'/>
        </console>
        <channel type='unix'>
          <target type='virtio' name='org.qemu.guest_agent.0'/>
          <address type='virtio-serial' controller='0' bus='0' port='1'/>
        </channel>
        <input type='tablet' bus='usb'>
          <address type='usb' bus='0' port='1'/>
        </input>
        <input type='mouse' bus='ps2'/>
        <input type='keyboard' bus='ps2'/>
        <graphics type='vnc' port='-1' autoport='yes' websocket='-1' listen='0.0.0.0' keymap='de'>
          <listen type='address' address='0.0.0.0'/>
        </graphics>
        <video>
          <model type='qxl' ram='65536' vram='65536' vgamem='16384' heads='1' primary='yes'/>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
        </video>
        <memballoon model='virtio'>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
        </memballoon>
      </devices>
    </domain>

    Mint Q35 VM

    <?xml version='1.0' encoding='UTF-8'?>
    <domain type='kvm' id='7'>
      <name>Mint</name>
      <uuid>065a6081-e954-0913-370d-b6001262fb61</uuid>
      <metadata>
        <vmtemplate xmlns="unraid" name="Debian" icon="linux-mint.png" os="debian"/>
      </metadata>
      <memory unit='KiB'>8388608</memory>
      <currentMemory unit='KiB'>8388608</currentMemory>
      <memoryBacking>
        <nosharepages/>
      </memoryBacking>
      <vcpu placement='static'>4</vcpu>
      <cputune>
        <vcpupin vcpu='0' cpuset='6'/>
        <vcpupin vcpu='1' cpuset='22'/>
        <vcpupin vcpu='2' cpuset='7'/>
        <vcpupin vcpu='3' cpuset='23'/>
      </cputune>
      <numatune>
        <memory mode='strict' nodeset='0'/>
      </numatune>
      <resource>
        <partition>/machine</partition>
      </resource>
      <os>
        <type arch='x86_64' machine='pc-q35-3.0'>hvm</type>
        <loader readonly='yes' type='pflash'>/usr/share/qemu/ovmf-x64/OVMF_CODE-pure-efi.fd</loader>
        <nvram>/etc/libvirt/qemu/nvram/065a6081-e954-0913-370d-b6001262fb61_VARS-pure-efi.fd</nvram>
      </os>
      <features>
        <acpi/>
        <apic/>
      </features>
      <cpu mode='host-passthrough' check='none'>
        <topology sockets='1' cores='4' threads='1'/>
      </cpu>
      <clock offset='utc'>
        <timer name='rtc' tickpolicy='catchup'/>
        <timer name='pit' tickpolicy='delay'/>
        <timer name='hpet' present='no'/>
      </clock>
      <on_poweroff>destroy</on_poweroff>
      <on_reboot>restart</on_reboot>
      <on_crash>restart</on_crash>
      <devices>
        <emulator>/usr/local/sbin/qemu</emulator>
        <disk type='file' device='disk'>
          <driver name='qemu' type='qcow2' cache='writeback' discard='unmap'/>
          <source file='/mnt/user/VMs/Mint/vdisk1.img'/>
          <backingStore/>
          <target dev='hdc' bus='scsi'/>
          <boot order='1'/>
          <alias name='scsi0-0-0-2'/>
          <address type='drive' controller='0' bus='0' target='0' unit='2'/>
        </disk>
        <controller type='usb' index='0' model='nec-xhci' ports='15'>
          <alias name='usb'/>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
        </controller>
        <controller type='scsi' index='0' model='virtio-scsi'>
          <alias name='scsi0'/>
          <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
        </controller>
        <controller type='pci' index='0' model='pcie-root'>
          <alias name='pcie.0'/>
        </controller>
        <controller type='pci' index='1' model='pcie-root-port'>
          <model name='pcie-root-port'/>
          <target chassis='1' port='0x8'/>
          <alias name='pci.1'/>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0' multifunction='on'/>
        </controller>
        <controller type='pci' index='2' model='pcie-root-port'>
          <model name='pcie-root-port'/>
          <target chassis='2' port='0x9'/>
          <alias name='pci.2'/>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
        </controller>
        <controller type='pci' index='3' model='pcie-root-port'>
          <model name='pcie-root-port'/>
          <target chassis='3' port='0xa'/>
          <alias name='pci.3'/>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
        </controller>
        <controller type='pci' index='4' model='pcie-root-port'>
          <model name='pcie-root-port'/>
          <target chassis='4' port='0xb'/>
          <alias name='pci.4'/>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x3'/>
        </controller>
        <controller type='pci' index='5' model='pcie-root-port'>
          <model name='pcie-root-port'/>
          <target chassis='5' port='0xc'/>
          <alias name='pci.5'/>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x4'/>
        </controller>
        <controller type='pci' index='6' model='pcie-root-port'>
          <model name='pcie-root-port'/>
          <target chassis='6' port='0xd'/>
          <alias name='pci.6'/>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x5'/>
        </controller>
        <controller type='pci' index='7' model='pcie-root-port'>
          <model name='pcie-root-port'/>
          <target chassis='7' port='0xe'/>
          <alias name='pci.7'/>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x6'/>
        </controller>
        <controller type='pci' index='8' model='pcie-root-port'>
          <model name='pcie-root-port'/>
          <target chassis='8' port='0xf'/>
          <alias name='pci.8'/>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x7'/>
        </controller>
        <controller type='pci' index='9' model='pcie-to-pci-bridge'>
          <model name='pcie-pci-bridge'/>
          <alias name='pci.9'/>
          <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
        </controller>
        <controller type='virtio-serial' index='0'>
          <alias name='virtio-serial0'/>
          <address type='pci' domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
        </controller>
        <controller type='sata' index='0'>
          <alias name='ide'/>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x1f' function='0x2'/>
        </controller>
        <interface type='bridge'>
          <mac address='52:54:00:fd:86:8a'/>
          <source bridge='br0'/>
          <target dev='vnet1'/>
          <model type='virtio'/>
          <alias name='net0'/>
          <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
        </interface>
        <serial type='pty'>
          <source path='/dev/pts/1'/>
          <target type='isa-serial' port='0'>
            <model name='isa-serial'/>
          </target>
          <alias name='serial0'/>
        </serial>
        <console type='pty' tty='/dev/pts/1'>
          <source path='/dev/pts/1'/>
          <target type='serial' port='0'/>
          <alias name='serial0'/>
        </console>
        <channel type='unix'>
          <source mode='bind' path='/var/lib/libvirt/qemu/channel/target/domain-7-Mint/org.qemu.guest_agent.0'/>
          <target type='virtio' name='org.qemu.guest_agent.0' state='disconnected'/>
          <alias name='channel0'/>
          <address type='virtio-serial' controller='0' bus='0' port='1'/>
        </channel>
        <input type='tablet' bus='usb'>
          <alias name='input0'/>
          <address type='usb' bus='0' port='3'/>
        </input>
        <input type='mouse' bus='ps2'>
          <alias name='input1'/>
        </input>
        <input type='keyboard' bus='ps2'>
          <alias name='input2'/>
        </input>
        <hostdev mode='subsystem' type='pci' managed='yes'>
          <driver name='vfio'/>
          <source>
            <address domain='0x0000' bus='0x08' slot='0x00' function='0x0'/>
          </source>
          <alias name='hostdev0'/>
          <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x0'/>
        </hostdev>
        <hostdev mode='subsystem' type='pci' managed='yes'>
          <driver name='vfio'/>
          <source>
            <address domain='0x0000' bus='0x08' slot='0x00' function='0x1'/>
          </source>
          <alias name='hostdev1'/>
          <address type='pci' domain='0x0000' bus='0x06' slot='0x00' function='0x0'/>
        </hostdev>
        <hostdev mode='subsystem' type='usb' managed='no'>
          <source>
            <vendor id='0x04fc'/>
            <product id='0x0003'/>
            <address bus='3' device='4'/>
          </source>
          <alias name='hostdev2'/>
          <address type='usb' bus='0' port='1'/>
        </hostdev>
        <hostdev mode='subsystem' type='usb' managed='no'>
          <source>
            <vendor id='0x0a12'/>
            <product id='0x0001'/>
            <address bus='3' device='3'/>
          </source>
          <alias name='hostdev3'/>
          <address type='usb' bus='0' port='2'/>
        </hostdev>
        <memballoon model='none'/>
      </devices>
      <seclabel type='dynamic' model='dac' relabel='yes'>
        <label>+0:+100</label>
        <imagelabel>+0:+100</imagelabel>
      </seclabel>
    </domain>

    unraid-diagnostics-20191016-1208.zip




    User Feedback

    Recommended Comments



    retested with RC5:

     

    No issues in any of my tests. Different guests (Mint, PopOS, Win7, Win10) install without issues on XFS-formatted array drives. The cache drive also works without any problems so far.

     

    The issue I had with compressed qcow2 vdisks is also gone. Old existing compressed files can be used without getting corrupted. Newly generated and compressed qcow2 files are also fine.

     

    Let's hope they fix it in future qemu versions.

    • Thanks 1
    Link to comment

    Is there any way of manually updating qemu to 4.1 in RC5 or upcoming builds? I won't be using qcow2, especially compressed. I just pass through the whole controller for my Windows 10 VM. Since 4.0.1 doesn't allow me to use the previous patch:

     

    <qemu:commandline>
    <qemu:arg value='-global'/>
    <qemu:arg value='pcie-root-port.speed=8'/>
    <qemu:arg value='-global'/>
    <qemu:arg value='pcie-root-port.width=16'/>
    </qemu:commandline>


    Which means my pcie lanes are only running at 1x speed. That will be a problem.

     

     

    image.png

    Edited by darthcircuit
    Link to comment
    Are you using an AMD GPU? If not, you could switch the machine type back to i440fx and that would give you the correct speeds.

    Link to comment
    14 minutes ago, david279 said:

    Are you using an AMD GPU? If not, you could switch the machine type back to i440fx and that would give you the correct speeds.

    I'm running an Nvidia 1080 Ti. I'd prefer to stay on Q35 if I can. I haven't done a lot of testing on i440fx, but everything I've been reading from people with the same setup (Threadripper) says I'll get better performance on Q35 with my Windows VM.

     

    Even if 4.0.1 was patched to take the <qemu:commandline> option again, that would be better than nothing.

    Edited by darthcircuit
    Link to comment
    33 minutes ago, darthcircuit said:

    Even if 4.0.1 was patched to take the <qemu:commandline> option again, that would be better than nothing.

    Starting from 6.8 RC1 I had to remove the extra qemu command lines, and I haven't noticed any decrease in graphics performance yet, not in RC1, RC4 or RC5.

    Link to comment

    @darthcircuit 3-5fps more or less? come on 🤨 1080ti for 4k as the bare minimum and 60fps in most games on highest settings not even reachable. If you have a high refresh rate monitor and want a good 4k experience, get a 2080ti 😂

    Link to comment
    32 minutes ago, bastl said:

    @darthcircuit 3-5fps more or less? come on 🤨 1080ti for 4k as the bare minimum and 60fps in most games on highest settings not even reachable. If you have a high refresh rate monitor and want a good 4k experience, get a 2080ti 😂

    I'll use your argument from before. Why buy new hardware when this works fine for me? I never said I wanted a high refresh rate or even 60fps (although most of the games I play run at that). I just want to game smoothly from my couch on my TV. 3-5 fps isn't much, but it helps. Why should your edge case performance come at the expense of my edge case or vice versa? All I'm asking for is the option. I had the option before.

     

    If there's a problem with 4.1, just put in a disclaimer when it's selected that there are problems with qcow2. Or just patch 4.0.1 like it was in 3.x so I can fix it myself. Or let me manually update the binary myself. There are lots of solutions to this problem.

     

    Edited by darthcircuit
    Link to comment
    2 hours ago, darthcircuit said:

    Or let me manually update the binary myself.

    Or just be patient - this issue will get resolved very soon.

    • Thanks 1
    • Haha 1
    Link to comment

     

    25 minutes ago, limetech said:

    Or just be patient - this issue will get resolved very soon.

    I’m happy to wait :) just trying to figure out what the game plan is. 
     

    EDIT: just read through my last reply and I realized that I came across as rude. Wasn’t meaning to. Sorry about that. I’m just trying to get everything running as best as I can. I appreciate your hard work :) 

    Edited by darthcircuit
    Link to comment
    35 minutes ago, darthcircuit said:

     

    I’m happy to wait :) just trying to figure out what the game plan is. 
     

    EDIT: just read through my last reply and I realized that I came across as rude. Wasn’t meaning to. Sorry about that. I’m just trying to get everything running as best as I can. I appreciate your hard work :) 

    No worries.

    Link to comment
    20 hours ago, darthcircuit said:

    Which means my pcie lanes are only running at 1x speed. That will be a problem.

    FYI, starting with QEMU 4.0 they only changed the naming. These custom arguments still work.
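
    With the new naming, the earlier patch would presumably look like the fragment below. This is a sketch, assuming the renamed properties are `x-speed` and `x-width`, the experimental `pcie-root-port` options described in the QEMU 4.0 release notes. The values 8 and 16 are taken from the patch quoted earlier in this thread. libvirt also needs the qemu namespace declared on the `domain` element for the `qemu:commandline` block to be accepted.

    ```xml
    <domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
      <!-- existing domain definition stays unchanged here -->
      <qemu:commandline>
        <!-- assumed new names: x-speed / x-width instead of speed / width -->
        <qemu:arg value='-global'/>
        <qemu:arg value='pcie-root-port.x-speed=8'/>
        <qemu:arg value='-global'/>
        <qemu:arg value='pcie-root-port.x-width=16'/>
      </qemu:commandline>
    </domain>
    ```

    Whether the guest then reports the expected link speed is worth checking inside the VM, since these options are marked experimental.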

     

     

     

    Edited by bastl
    • Thanks 2
    Link to comment

    Retested with RC6:

     

    Installing qcow2 VMs on XFS array drives works fine, and the same goes for the BTRFS cache drive. No corruption has shown up on the qcow2 vdisks so far with the same tests as before. Existing compressed qcow2 images that got corrupted in RC1-4 show no issues so far. I will keep an eye on it over the next couple of days. Compressing an uncompressed qcow2 also no longer produces corrupted vdisks. It looks like the patches in qemu 4.1.1 fixed my issues.

    Link to comment





  • Status Definitions

     

    Open = Under consideration.

     

    Solved = The issue has been resolved.

     

    Solved version = The issue has been resolved in the indicated release version.

     

    Closed = Feedback or opinion better posted on our forum for discussion. Also for reports we cannot reproduce or need more information. In this case just add a comment and we will review it again.

     

    Retest = Please retest in latest release.


    Priority Definitions

     

    Minor = Something not working correctly.

     

    Urgent = Server crash, data loss, or other showstopper.

     

    Annoyance = Doesn't affect functionality but should be fixed.

     

    Other = Announcement or other non-issue.