• 6.8.0 RC1-RC4 corrupted QCOW2 vdisks on XFS! Warning "unraid qcow2_free_clusters failed: Invalid argument", probably due to compressed QCOW2 files


    bastl
    • Solved Minor

    Edit: retested with RC6

     

    Installing VMs on XFS array drives works fine, same for the BTRFS cache drive. No corruption found on the qcow2 vdisks so far with the same tests as before. Already existing compressed qcow2 images, which got corrupted before on RC1-4, show no issues so far. Will keep an eye on it over the next couple of days. Compressing an uncompressed qcow2 also no longer produces corrupted vdisks. Looks like the patches in qemu 4.1.1 fixed my issues.

     

    ------------------------------------------------------------------------------

     

    EDIT: Edited the title for a better understanding of the issue. The main issue is that qcow2 vdisks hosted on XFS-formatted drives won't allow the guest OS to be installed without issues. The installation will fail or lead to a corrupted install. Existing images can also be affected. There are some reports of ext4 also being affected, and the warnings I got using compressed qcow2 files on BTRFS might be related to this. The affected qemu version is 4.1. Using RAW images should be fine.

     

     

    First of all, I did the update from 6.7.1 to 6.8.0RC1 on Saturday. Everything went fine, I thought, except for some qemu arguments preventing 2 VMs with GPU passthrough from booting up (root-port fix). Nothing else changed. No errors in the server logs. As on every weekday morning, an extra Win7 VM started up automatically. Fine so far. On Tuesday, after a software update, I had to restart the VM and it didn't come back online. The VM showed some weird error I had never seen before, and after some searching on the web it was clear the file system had somehow become corrupted. I restored the vdisk from a backup and it booted back up. This time I didn't install any updates or use it as normal for office stuff. It idled for a couple of minutes and I noticed the following errors in the VM logs.

     

     "unraid qcow2_free_clusters failed: Invalid argument"


     

    Restarting worked this time, even if it felt a bit slower than usual, but the errors shown quickly counted up. Inside the VM I didn't notice any performance degradation or errors so far. It looked like a false positive. I rebooted the VM again and it wouldn't start up. I took the vdisk, attached it to another VM and fired up chkdsk, and it found hundreds of file system errors, trying to recover them to the point where either chkdsk finished with unrecoverable errors or it froze completely.

     

    Time to check the other VMs I'm using with a qcow2 vdisk. And what a surprise, a Linux Mint VM also showed this error after running for a couple of minutes. I played around a bit with the xml and removed a couple of tweaks: removed "discard='unmap'" and "numatune memory mode='strict' nodeset='0'" and tried again. Same error. Everything else in the xml is at defaults and has run for almost 2 years now. I tried reverting to different vdisk backups going back to September. All of them showed errors after running for a couple of minutes. The Win7 VM once reported an unreadable file and crashed; on the next try, the first boot was fine. Some reboots were fine, some froze, some reported filesystem corruption. I tried it with different types of VMs, OVMF, SeaBIOS, Q35-3.0, i440fx-3.0, doesn't matter, always the same issue. The only thing that is the same on all VMs is that they use qcow2 as the disk image format!?

     

    All the VMs are hosted on a single BTRFS NVME cache drive. I've even tried the vdisk for the Win7 VM sitting on the array. Same issue, after a couple of minutes the errors pop up. I then tried various backups going back to March. Only VMs with a directly passed-through SSD/HDD are not affected by this.

     

    Is there anything I can try to prevent the vdisk corruption?
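
    For anyone else hitting this warning, qemu-img has a built-in consistency check that can confirm whether the qcow2 metadata itself is damaged (run it only while the VM is shut down; the path below is just one of my vdisks as an example):

    ```shell
    # Check qcow2 metadata consistency (VM must be powered off).
    # Exit code 0 means clean; leaks/corruptions are listed otherwise.
    qemu-img check /mnt/user/VMs/Win7_Outlook/WIN7_OUTLOOK.qcow2

    # qemu-img can also try to repair leaked clusters -- back up the
    # image first:
    qemu-img check -r leaks /mnt/user/VMs/Win7_Outlook/WIN7_OUTLOOK.qcow2
    ```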

     

    Below are 2 xml files and the diagnostics from the server running since saturday.

     

    Win7 i440fx VM

    <?xml version='1.0' encoding='UTF-8'?>
    <domain type='kvm'>
      <name>Win7_Outlook</name>
      <uuid>0b67611b-12b3-d0fd-c02b-055394dd34dc</uuid>
      <metadata>
        <vmtemplate xmlns="unraid" name="Windows 7" icon="windows7.png" os="windows7"/>
      </metadata>
      <memory unit='KiB'>4194304</memory>
      <currentMemory unit='KiB'>4194304</currentMemory>
      <memoryBacking>
        <nosharepages/>
      </memoryBacking>
      <vcpu placement='static'>4</vcpu>
      <cputune>
        <vcpupin vcpu='0' cpuset='4'/>
        <vcpupin vcpu='1' cpuset='20'/>
        <vcpupin vcpu='2' cpuset='5'/>
        <vcpupin vcpu='3' cpuset='21'/>
      </cputune>
      <numatune>
        <memory mode='strict' nodeset='0'/>
      </numatune>
      <os>
        <type arch='x86_64' machine='pc-i440fx-3.0'>hvm</type>
      </os>
      <features>
        <acpi/>
        <apic/>
      </features>
      <cpu mode='host-passthrough' check='none'>
        <topology sockets='1' cores='4' threads='1'/>
      </cpu>
      <clock offset='localtime'>
        <timer name='rtc' tickpolicy='catchup'/>
        <timer name='pit' tickpolicy='delay'/>
        <timer name='hpet' present='no'/>
      </clock>
      <on_poweroff>destroy</on_poweroff>
      <on_reboot>restart</on_reboot>
      <on_crash>restart</on_crash>
      <devices>
        <emulator>/usr/local/sbin/qemu</emulator>
        <disk type='file' device='disk'>
          <driver name='qemu' type='qcow2' cache='writeback' discard='unmap'/>
          <source file='/mnt/user/VMs/Win7_Outlook/WIN7_OUTLOOK.qcow2'/>
          <target dev='hdc' bus='scsi'/>
          <boot order='1'/>
          <address type='drive' controller='0' bus='0' target='0' unit='2'/>
        </disk>
        <disk type='file' device='cdrom'>
          <driver name='qemu' type='raw'/>
          <source file='/mnt/user/isos/Acronis/AcronisMedia.117iso.iso'/>
          <target dev='hda' bus='ide'/>
          <readonly/>
          <boot order='2'/>
          <address type='drive' controller='0' bus='0' target='0' unit='0'/>
        </disk>
        <controller type='usb' index='0' model='ich9-ehci1'>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x7'/>
        </controller>
        <controller type='usb' index='0' model='ich9-uhci1'>
          <master startport='0'/>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0' multifunction='on'/>
        </controller>
        <controller type='usb' index='0' model='ich9-uhci2'>
          <master startport='2'/>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x1'/>
        </controller>
        <controller type='usb' index='0' model='ich9-uhci3'>
          <master startport='4'/>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x2'/>
        </controller>
        <controller type='scsi' index='0' model='virtio-scsi'>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
        </controller>
        <controller type='pci' index='0' model='pci-root'/>
        <controller type='ide' index='0'>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
        </controller>
        <controller type='virtio-serial' index='0'>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
        </controller>
        <interface type='bridge'>
          <mac address='52:54:00:64:a8:e2'/>
          <source bridge='br0'/>
          <model type='virtio'/>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
        </interface>
        <serial type='pty'>
          <target type='isa-serial' port='0'>
            <model name='isa-serial'/>
          </target>
        </serial>
        <console type='pty'>
          <target type='serial' port='0'/>
        </console>
        <channel type='unix'>
          <target type='virtio' name='org.qemu.guest_agent.0'/>
          <address type='virtio-serial' controller='0' bus='0' port='1'/>
        </channel>
        <input type='tablet' bus='usb'>
          <address type='usb' bus='0' port='1'/>
        </input>
        <input type='mouse' bus='ps2'/>
        <input type='keyboard' bus='ps2'/>
        <graphics type='vnc' port='-1' autoport='yes' websocket='-1' listen='0.0.0.0' keymap='de'>
          <listen type='address' address='0.0.0.0'/>
        </graphics>
        <video>
          <model type='qxl' ram='65536' vram='65536' vgamem='16384' heads='1' primary='yes'/>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
        </video>
        <memballoon model='virtio'>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
        </memballoon>
      </devices>
    </domain>

    Mint Q35 VM

    <?xml version='1.0' encoding='UTF-8'?>
    <domain type='kvm' id='7'>
      <name>Mint</name>
      <uuid>065a6081-e954-0913-370d-b6001262fb61</uuid>
      <metadata>
        <vmtemplate xmlns="unraid" name="Debian" icon="linux-mint.png" os="debian"/>
      </metadata>
      <memory unit='KiB'>8388608</memory>
      <currentMemory unit='KiB'>8388608</currentMemory>
      <memoryBacking>
        <nosharepages/>
      </memoryBacking>
      <vcpu placement='static'>4</vcpu>
      <cputune>
        <vcpupin vcpu='0' cpuset='6'/>
        <vcpupin vcpu='1' cpuset='22'/>
        <vcpupin vcpu='2' cpuset='7'/>
        <vcpupin vcpu='3' cpuset='23'/>
      </cputune>
      <numatune>
        <memory mode='strict' nodeset='0'/>
      </numatune>
      <resource>
        <partition>/machine</partition>
      </resource>
      <os>
        <type arch='x86_64' machine='pc-q35-3.0'>hvm</type>
        <loader readonly='yes' type='pflash'>/usr/share/qemu/ovmf-x64/OVMF_CODE-pure-efi.fd</loader>
        <nvram>/etc/libvirt/qemu/nvram/065a6081-e954-0913-370d-b6001262fb61_VARS-pure-efi.fd</nvram>
      </os>
      <features>
        <acpi/>
        <apic/>
      </features>
      <cpu mode='host-passthrough' check='none'>
        <topology sockets='1' cores='4' threads='1'/>
      </cpu>
      <clock offset='utc'>
        <timer name='rtc' tickpolicy='catchup'/>
        <timer name='pit' tickpolicy='delay'/>
        <timer name='hpet' present='no'/>
      </clock>
      <on_poweroff>destroy</on_poweroff>
      <on_reboot>restart</on_reboot>
      <on_crash>restart</on_crash>
      <devices>
        <emulator>/usr/local/sbin/qemu</emulator>
        <disk type='file' device='disk'>
          <driver name='qemu' type='qcow2' cache='writeback' discard='unmap'/>
          <source file='/mnt/user/VMs/Mint/vdisk1.img'/>
          <backingStore/>
          <target dev='hdc' bus='scsi'/>
          <boot order='1'/>
          <alias name='scsi0-0-0-2'/>
          <address type='drive' controller='0' bus='0' target='0' unit='2'/>
        </disk>
        <controller type='usb' index='0' model='nec-xhci' ports='15'>
          <alias name='usb'/>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
        </controller>
        <controller type='scsi' index='0' model='virtio-scsi'>
          <alias name='scsi0'/>
          <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
        </controller>
        <controller type='pci' index='0' model='pcie-root'>
          <alias name='pcie.0'/>
        </controller>
        <controller type='pci' index='1' model='pcie-root-port'>
          <model name='pcie-root-port'/>
          <target chassis='1' port='0x8'/>
          <alias name='pci.1'/>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0' multifunction='on'/>
        </controller>
        <controller type='pci' index='2' model='pcie-root-port'>
          <model name='pcie-root-port'/>
          <target chassis='2' port='0x9'/>
          <alias name='pci.2'/>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
        </controller>
        <controller type='pci' index='3' model='pcie-root-port'>
          <model name='pcie-root-port'/>
          <target chassis='3' port='0xa'/>
          <alias name='pci.3'/>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
        </controller>
        <controller type='pci' index='4' model='pcie-root-port'>
          <model name='pcie-root-port'/>
          <target chassis='4' port='0xb'/>
          <alias name='pci.4'/>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x3'/>
        </controller>
        <controller type='pci' index='5' model='pcie-root-port'>
          <model name='pcie-root-port'/>
          <target chassis='5' port='0xc'/>
          <alias name='pci.5'/>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x4'/>
        </controller>
        <controller type='pci' index='6' model='pcie-root-port'>
          <model name='pcie-root-port'/>
          <target chassis='6' port='0xd'/>
          <alias name='pci.6'/>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x5'/>
        </controller>
        <controller type='pci' index='7' model='pcie-root-port'>
          <model name='pcie-root-port'/>
          <target chassis='7' port='0xe'/>
          <alias name='pci.7'/>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x6'/>
        </controller>
        <controller type='pci' index='8' model='pcie-root-port'>
          <model name='pcie-root-port'/>
          <target chassis='8' port='0xf'/>
          <alias name='pci.8'/>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x7'/>
        </controller>
        <controller type='pci' index='9' model='pcie-to-pci-bridge'>
          <model name='pcie-pci-bridge'/>
          <alias name='pci.9'/>
          <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
        </controller>
        <controller type='virtio-serial' index='0'>
          <alias name='virtio-serial0'/>
          <address type='pci' domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
        </controller>
        <controller type='sata' index='0'>
          <alias name='ide'/>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x1f' function='0x2'/>
        </controller>
        <interface type='bridge'>
          <mac address='52:54:00:fd:86:8a'/>
          <source bridge='br0'/>
          <target dev='vnet1'/>
          <model type='virtio'/>
          <alias name='net0'/>
          <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
        </interface>
        <serial type='pty'>
          <source path='/dev/pts/1'/>
          <target type='isa-serial' port='0'>
            <model name='isa-serial'/>
          </target>
          <alias name='serial0'/>
        </serial>
        <console type='pty' tty='/dev/pts/1'>
          <source path='/dev/pts/1'/>
          <target type='serial' port='0'/>
          <alias name='serial0'/>
        </console>
        <channel type='unix'>
          <source mode='bind' path='/var/lib/libvirt/qemu/channel/target/domain-7-Mint/org.qemu.guest_agent.0'/>
          <target type='virtio' name='org.qemu.guest_agent.0' state='disconnected'/>
          <alias name='channel0'/>
          <address type='virtio-serial' controller='0' bus='0' port='1'/>
        </channel>
        <input type='tablet' bus='usb'>
          <alias name='input0'/>
          <address type='usb' bus='0' port='3'/>
        </input>
        <input type='mouse' bus='ps2'>
          <alias name='input1'/>
        </input>
        <input type='keyboard' bus='ps2'>
          <alias name='input2'/>
        </input>
        <hostdev mode='subsystem' type='pci' managed='yes'>
          <driver name='vfio'/>
          <source>
            <address domain='0x0000' bus='0x08' slot='0x00' function='0x0'/>
          </source>
          <alias name='hostdev0'/>
          <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x0'/>
        </hostdev>
        <hostdev mode='subsystem' type='pci' managed='yes'>
          <driver name='vfio'/>
          <source>
            <address domain='0x0000' bus='0x08' slot='0x00' function='0x1'/>
          </source>
          <alias name='hostdev1'/>
          <address type='pci' domain='0x0000' bus='0x06' slot='0x00' function='0x0'/>
        </hostdev>
        <hostdev mode='subsystem' type='usb' managed='no'>
          <source>
            <vendor id='0x04fc'/>
            <product id='0x0003'/>
            <address bus='3' device='4'/>
          </source>
          <alias name='hostdev2'/>
          <address type='usb' bus='0' port='1'/>
        </hostdev>
        <hostdev mode='subsystem' type='usb' managed='no'>
          <source>
            <vendor id='0x0a12'/>
            <product id='0x0001'/>
            <address bus='3' device='3'/>
          </source>
          <alias name='hostdev3'/>
          <address type='usb' bus='0' port='2'/>
        </hostdev>
        <memballoon model='none'/>
      </devices>
      <seclabel type='dynamic' model='dac' relabel='yes'>
        <label>+0:+100</label>
        <imagelabel>+0:+100</imagelabel>
      </seclabel>
    </domain>

    unraid-diagnostics-20191016-1208.zip




    User Feedback

    Recommended Comments



    I tested a lot of stuff and I think I found something. All the affected vdisks are compressed qcow2 files. If I convert them back to uncompressed qcow2 files, it looks like this error won't appear and the vdisks don't get corrupted. Did something change in the current qemu build in how compressed qcow2 files are handled, or is this maybe a known issue already? Is it maybe an Unraid-only issue?

     

    I used the following command to create the compressed files, in case someone wants to test it. This way I save nearly 20GB of space for a 100GB vdisk file with 55GB allocated. The compressed file is 35GB.

    qemu-img convert -O qcow2 -c -p uncompressed_image.qcow2 compressed_image.qcow2
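
    Converting an affected image back to an uncompressed qcow2 is the same command without the -c flag (a sketch; the filenames are placeholders, and it's safest to keep the original until the new image boots cleanly):

    ```shell
    # Rewrite a compressed qcow2 as a plain (uncompressed) qcow2.
    # Without -c, all clusters are written uncompressed; -p shows progress.
    qemu-img convert -O qcow2 -p compressed_image.qcow2 restored_image.qcow2
    ```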

     

    • Thanks 1
    Link to comment
    9 hours ago, limetech said:

    Changed Priority to Minor

    Just for my understanding, a bug that causes data loss isn't "urgent"?

    Link to comment
    56 minutes ago, bastl said:

    Just for my understanding, a bug that causes data loss isn't "urgent"?

    You posted a workaround: don't use compressed qcow2; it's not an option we offer either.

    Link to comment
    7 minutes ago, limetech said:

    it's not an option we offer either.

    I get the point, but qemu with all its packages is part of Unraid, and "qemu-img convert" is even mentioned in the wiki. There is lots of stuff people have to manually edit files or use the CLI for because there isn't a UI option built into Unraid yet. And all of those are options.

     

    To sum things up, for my use case with a couple of VMs on the cache drive, I have to watch the size of the files. I only have 500GB of cache, and the "OPTION" to save 20-30% space is always welcome 😉

     

    Link to comment

    Yes I understand but we don't explicitly test for that since it's not a "standard" option (yet) in VM manager.  The issue is still marked as a bug which we will look into but it's not a "show-stopper" type bug that requires us to drop everything we're doing and take a look.

     

    Could be this corruption is related to the 'sqlite db corruption' issue, which is a true show stopper for a lot of people.

    Link to comment
    8 hours ago, bastl said:

    Quick report on RC4. Issue still exists.

    Where do the actual image files exist?  On 'cache' device or on an array device?

    Link to comment
    1 hour ago, limetech said:

    Where do the actual image files exist?  On 'cache' device or on an array device?

    On a single 500GB 960 Evo BTRFS-formatted NVME in a non-default share called "VMs", set as the default storage path in VM manager ("/mnt/user/VMs/"). Haven't tried it on the array or UD.

    Link to comment

    From your syslog I see you are using a Highpoint RocketNVME controller to access your NVMe device, using the 'rsnvme' driver we included per user request.  First I would say, we're pretty disappointed in Highpoint.  Their r750 driver won't compile on the newest kernels, and though it compiles on previous 5.x kernels, it doesn't work.  In correspondence with Highpoint they basically say all their Linux engineers (plural is questionable) are busy, so go kick rocks.  I would say get a different controller.

     

    Also in syslog this line pops up a few times:

    Oct 12 15:05:06 UNRAID kernel: nvme nvme0: failed to set APST feature (-19)

     

    Further evidence the issue lies with a buggy driver.

     

    Having said that, it's always possible your SSD is failing.

     

    Link to comment
    46 minutes ago, limetech said:

    Highpoint RocketNVME controller

    For almost 2 years now I've been using this board (ASRock Fatal1ty X399) and the onboard slots/controller with a Samsung 960 Evo 500GB as cache and a Samsung 960 Pro 500GB for passthrough. I never had any issues before. I don't know why I should get a new controller if basically everything is functional, except using compressed qcow2 files on 6.8 RC.

    46 minutes ago, limetech said:

    Oct 12 15:05:06 UNRAID kernel: nvme nvme0: failed to set APST feature (-19)

    Not exactly sure if I've seen this since I started using Unraid (Nov 2017), but I've had this warning for a long time with no signs of side effects. As I understand it, this error/warning matters more in situations where the device, let's say a notebook, goes into a sleep state and the drive doesn't change its power state correctly or gets dropped. When Unraid is running in my setup, there are always VMs and Dockers up and running, utilizing the drive, so it will never go into a power-saving state, right? The warning is only reported once at Unraid boot and never again. No errors or warnings in the logs, even after a week.

     

     

    The following is from the Arch wiki. This is a mass-market mainstream device and not a niche product, so it's really hard to believe there won't be driver support for it. Maybe the Unraid kernel is missing the patch for the APST issue.

    https://wiki.archlinux.org/index.php/Solid_state_drive/NVMe#Power_Saving_APST

    Samsung drive errors on Linux 4.10
    
    On Linux 4.10, drive errors can occur and cause system instability. This seems to be the result of a power saving state that the drive cannot use. Adding the kernel parameter nvme_core.default_ps_max_latency_us=5500 [3][4] disables the lowest power saving state, preventing write errors.

    There is also a patch mentioned for it, which should already be merged into the mainline 4.11 kernel.
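
    If anyone wants to try that parameter on Unraid, it would go on the append line of the boot entry in syslinux.cfg (a sketch, assuming the stock Unraid boot layout; the existing append line varies per install):

    ```shell
    # Edit /boot/syslinux/syslinux.cfg and extend the append line of the
    # default boot entry, e.g.:
    #   append nvme_core.default_ps_max_latency_us=5500 initrd=/bzroot
    #
    # After a reboot, confirm the value took effect:
    cat /sys/module/nvme_core/parameters/default_ps_max_latency_us
    ```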

     

    46 minutes ago, limetech said:

    Having said that, it's always possible your SSD is failing.

    The drive's warranty is rated for "3 Years or 200 TBW", and at 80TB I'm not even close to that. No errors reported in the SMART report as far as I can see.
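
    For reference, the same SMART data can be pulled on the command line (assuming smartmontools and/or nvme-cli are available; /dev/nvme0 is an example device name):

    ```shell
    # Full SMART/health report, including percentage used and media errors:
    smartctl -a /dev/nvme0

    # Raw NVMe SMART log (data units written, media and error log entries):
    nvme smart-log /dev/nvme0
    ```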


     

    Not exactly sure how to interpret the second attached SMART screenshot.

    Sure, drives can fail, but in my case I can definitely reproduce this vdisk corruption. I've tried it several times with different VMs/backups already, and I have set up a couple of fresh VMs from scratch with NO errors, as long as I don't compress the vdisk to save 10-20% space. EVERY TIME I compress the vdisk, what works on 6.7.2 won't work on 6.8 RC.

     

    Btw. a larger new drive is already on its way. I will report back whether the issue is the same with a Samsung 970 Evo Plus 1TB.

     

    Edited by bastl
    Link to comment

    Maybe stick to the 2 year old software versions then instead of trying to upgrade to newer tech that Highpoint hasn't kept up with?

    Link to comment

    I had 2 existing Linux VMs in Unraid.  I upgraded to RC4; both VMs were created in the Unraid interface with no "hacking", using the available qcow2 option.

     

    Both corrupted very soon after upgrading to RC4.  (Fortunately I didn't boot my main VMs.)

     

    I reinstalled Ubuntu, selected qcow2, and by the time it came to reboot the newly installed VM, it was already corrupted.

     

    Downgraded back to 6.7.2, reinstalled using exactly the same options, and the VM is behaving normally again.

    Link to comment

    Short info: a compressed qcow2 vdisk sitting on the XFS-formatted array shows the same error. Installing Mint or PopOS with a RAW image on the array went fine. As soon as I choose qcow2 in the template, without even compressing it, the installation fails during the process. The only things I did:

     

    1. add VM

    2. select Linux template

    3. 4 cores, 4GB RAM

    4. 30GB disk qcow2 virtio

    5. select boot iso (Mint 19.1 xfce, PopOS 19.10)

    6. VNC set to german

    7. untick "start vm after creation"

     

    On RAW it installs fine; on qcow2 it aborts during the process. This time there are no warnings or errors in the VM logs. Removing VM+disks in one case, where I had a RAW and a QCOW2 image attached, only removed the RAW vdisk.

    Link to comment

    @limetech Another test, this time with the Linux template changed to i440fx-4.1, rest the same, new qcow2 vdisk also on the array as before. Same problem, the Mint installer aborts during the install.

     

    After that I did a couple more tests and narrowed it down a bit. All the following tests were done with default Linux template settings: 4 cores, 4GB RAM, 20GB qcow2 virtio vdisks on the array, Mint 19.1 Xfce. If I manually set cache='none' or cache='directsync' instead of the default cache='writeback' in the xml, I am able to install the OS without any issues. writethrough and unsafe produce the same problem as the default writeback: the installer aborts.
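
    For anyone wanting to reproduce this, the only change in those tests was the cache attribute on the disk's driver line in the XML (the driver lines below mirror the templates above):

    ```xml
    <!-- default template setting (installer aborted in these tests): -->
    <driver name='qemu' type='qcow2' cache='writeback' discard='unmap'/>

    <!-- manually edited setting that installed without issues: -->
    <driver name='qemu' type='qcow2' cache='none' discard='unmap'/>
    ```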

     

    Something is wrong, in my opinion, with how the IO to the underlying device/filesystem is handled when using the default settings. I can't tell if it's a qemu thing or if something else in Unraid is the culprit.

     

    Next test with the vdisk on the NVME cache with cache='writeback': a fresh installation works, but my already existing compressed qcow2 files still produce the same errors, no matter what I define for cache. Compressed qcow2 produces more IO, from what I understand, which in general shouldn't be an issue. It might slow the VM down a bit, but it shouldn't end in corruption.

     

    @Fizzyade Can you post your diagnostics so Limetech can have a deeper look into your system specs? He said it might be my "Highpoint RocketNVME controller" causing this issue. I'm on an X399 ASRock Threadripper board and the affected NVME is a Samsung 960 Evo 500GB set up as the cache drive, but as my tests already showed, it isn't only an issue on that drive; vdisks on the array are also affected. Are you using similar hardware?

     

    Latest diagnostics attached, if needed. The new drive arrived, let's have a look.

    unraid-diagnostics-20191029-1307.zip

    Link to comment
    2 hours ago, Fizzyade said:

    I just checked and luckily I saved the diagnostics, although saying that I can't see anything in there relating to this.

     

    Whatever the issue is, downgrading to 6.7 makes it go away. 

    tower-diagnostics-20191027-0004.zip 191.88 kB · 0 downloads

    Thanks for the diags. Had a quick look into them. Not the error I originally opened this thread for. No NVME/SSD from Samsung, different array disks than mine. A second-gen Ryzen 2700 on an X470 board compared to my first-gen TR4 1950X on an X399 board. No "APST feature" warning; OK, you have no NVMes at all, only 2 Seagate SSDs. Not sure what onboard controller is used for your SATA connections. Highpoint? I don't know.

     

    But you described the same thing I saw when installing a VM on default qcow2 images: sooner or later they get corrupted, or the install fails, right?

     

    I think we need a couple more people testing this @limetech

     

     

    Link to comment

    I found a couple of different reports going back to 2015 talking about file system corruption on compressed qcow2 files. Let's hope there will be a fix as fast as possible. I guess there are a couple more users using qcow2 for their vdisks, and if 6.8 releases to stable without a fix.....I don't even want to think about it. Fingers crossed.

    Link to comment
    16 minutes ago, bastl said:

    6.8 releases to stable without a fix

    Can't hold back 6.8 release because of this.  Compressed qcow2 has never been an option in our VM manager.

    Link to comment
    16 minutes ago, limetech said:

    Can't hold back 6.8 release because of this.  Compressed qcow2 has never been an option in our VM manager.

    It has been for a while.

    (Screenshot attached: selected qcow2, though the screenshot shows the default.)

     


    Edited by jbartlett
    Link to comment
    1 minute ago, limetech said:

    That's not compressed.

    Ah, then it's a misunderstanding between that and dynamically expanding. I'll update the subject of my report.

    Link to comment
    11 minutes ago, jbartlett said:

    Ah, then it's a misunderstanding between that and dynamically expanding. I'll update the subject of my report.

    We can downgrade qemu from 4.1.x back to 4.0.x - think that will solve it?

    Link to comment





  • Status Definitions

     

    Open = Under consideration.

     

    Solved = The issue has been resolved.

     

    Solved version = The issue has been resolved in the indicated release version.

     

    Closed = Feedback or opinion better posted on our forum for discussion. Also for reports we cannot reproduce or need more information. In this case just add a comment and we will review it again.

     

    Retest = Please retest in latest release.


    Priority Definitions

     

    Minor = Something not working correctly.

     

    Urgent = Server crash, data loss, or other showstopper.

     

    Annoyance = Doesn't affect functionality but should be fixed.

     

    Other = Announcement or other non-issue.