Server seems to be degrading the more I use it



Hi all, I'm having a real nightmare with Unraid lately. I had everything working on the Unraid version that shipped with Linux kernel 5: stable as anything, no reboot bug on my AMD GPU, and the VM working flawlessly. Since that update was pulled and the next one reverted to kernel 4, the issues started.

My install is *currently* on the Unraid 6.9 beta, to try to get as close as possible to what the 6.8 version was.

AMD Ryzen 3700x

MSI B450 Carbon AC Motherboard

Vega 56 GPU

8x drives

2x SSD for cache

1x NVMe M.2 used for passthrough

Creative Z Soundcard

LSI 9208 (I think) RAID card

Symptoms I currently have without the VM:

Plugins page extremely slow to load

Random Dynamix errors in the log. I tried removing the plugins I thought were causing them, but I still get them. Three examples:

/bin/sh: line 1: 18906 Segmentation fault      /etc/rc.d/rc.diskinfo --daemon &> /dev/null

/etc/cron.hourly/user.script.start.hourly.sh: line 2: 17357 Segmentation fault      /usr/local/emhttp/plugins/user.scripts/startSchedule.php hourly

/bin/sh: line 1: 19893 Segmentation fault      /usr/local/emhttp/plugins/dynamix/scripts/monitor &> /dev/null
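When segfaults are spread across unrelated programs (here a disk-info daemon, a PHP cron script and the Dynamix monitor), that pattern tends to point at memory or CPU instability rather than at one misbehaving plugin. A quick way to see how widespread they are is to tally them per program. This is a minimal sketch, assuming the syslog path is passed on the command line (on Unraid it is normally /var/log/syslog) and a Python 3 interpreter is available; the function and script names are made up for illustration.

```python
import re
import sys
from collections import Counter

# Shell-reported crashes, e.g.
# "/bin/sh: line 1: 18906 Segmentation fault      /etc/rc.d/rc.diskinfo --daemon &> /dev/null"
SHELL_SEGV = re.compile(r"line \d+:\s+\d+ Segmentation fault\s+(\S+)")
# Kernel-reported crashes, which use the form "name[pid]: segfault at ..."
KERNEL_SEGV = re.compile(r"(\S+)\[\d+\]: segfault at")


def tally_segfaults(lines):
    """Count segmentation faults per program across an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = SHELL_SEGV.search(line) or KERNEL_SEGV.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python3 tally_segfaults.py /var/log/syslog
    with open(sys.argv[1], errors="replace") as log:
        for program, count in tally_segfaults(log).most_common():
            print(f"{count:5d}  {program}")
```

If the counts are spread thinly over many different programs rather than concentrated on one, that is consistent with the "instability" explanation discussed later in the thread.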

 

Symptoms with the VM:

Starting it sometimes locks up the server

If I run the AMD reboot script that SpaceInvaderOne made to drop the GPU, put the server to sleep and then start the VM, it sometimes works but mostly doesn't; occasionally it works for a few hours and then the server locks up
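For anyone unfamiliar with that workaround, it relies on three standard Linux sysfs interfaces: writing 1 to a device's remove file detaches it from the PCI bus, writing mem to /sys/power/state suspends to RAM, and writing 1 to /sys/bus/pci/rescan re-enumerates the bus once the host is woken. The Python sketch below shows only that general sequence under those assumptions. It is not SpaceInvaderOne's actual script, the name reset_amd_gpu is hypothetical, it needs root and a Python 3 interpreter on the host, and the sysfs root is a parameter only so the sequence can be tried against a scratch directory.

```python
import sys
from pathlib import Path


def reset_amd_gpu(functions, sysfs=Path("/sys")):
    """Detach PCI functions, suspend to RAM, then rescan the PCI bus.

    `functions` are full PCI addresses such as "0000:2b:00.0".
    On a real host `sysfs` must be /sys and this must run as root.
    Returns the files written, in order.
    """
    written = []
    for address in functions:
        remove = sysfs / "bus" / "pci" / "devices" / address / "remove"
        remove.write_text("1")  # detach this function from the PCI bus
        written.append(remove)

    state = sysfs / "power" / "state"
    state.write_text("mem")  # suspend to RAM; returns once the host is woken up
    written.append(state)

    rescan = sysfs / "bus" / "pci" / "rescan"
    rescan.write_text("1")  # re-enumerate the bus so the GPU comes back
    written.append(rescan)
    return written


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python3 reset_amd_gpu.py 0000:2b:00.0 0000:2b:00.1
    reset_amd_gpu(sys.argv[1:])
```

The addresses in the usage comment, 0000:2b:00.0 (the Vega 56) and 0000:2b:00.1 (its audio function), are the hostdev source addresses in the VM XML in this post; both functions are removed before suspending so the card is re-enumerated as a whole.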

 

I posted a thread here

I didn't really get anywhere with it. I tried the custom Vega/Navi patches, with no changes worth noting, and as you can see in that thread I also tried a lot of other things, such as a fresh USB stick.

If I can get the VM to run for more than a day, it then works absolutely fine until the next server reboot. Here is the XML for that VM:

<?xml version='1.0' encoding='UTF-8'?>
<domain type='kvm'>
  <name>Windows 10</name>
  <uuid>b6aeda16-43c9-b72d-b2b2-f77ebbe8c9b3</uuid>
  <metadata>
    <vmtemplate xmlns="unraid" name="Windows 10" icon="windows.png" os="windows10"/>
  </metadata>
  <memory unit='KiB'>16777216</memory>
  <currentMemory unit='KiB'>16777216</currentMemory>
  <memoryBacking>
    <nosharepages/>
  </memoryBacking>
  <vcpu placement='static'>8</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='4'/>
    <vcpupin vcpu='1' cpuset='12'/>
    <vcpupin vcpu='2' cpuset='5'/>
    <vcpupin vcpu='3' cpuset='13'/>
    <vcpupin vcpu='4' cpuset='6'/>
    <vcpupin vcpu='5' cpuset='14'/>
    <vcpupin vcpu='6' cpuset='7'/>
    <vcpupin vcpu='7' cpuset='15'/>
  </cputune>
  <os>
    <type arch='x86_64' machine='pc-q35-4.2'>hvm</type>
    <loader readonly='yes' type='pflash'>/usr/share/qemu/ovmf-x64/OVMF_CODE-pure-efi.fd</loader>
    <nvram>/etc/libvirt/qemu/nvram/b6aeda16-43c9-b72d-b2b2-f77ebbe8c9b3_VARS-pure-efi.fd</nvram>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <hyperv>
      <relaxed state='on'/>
      <vapic state='on'/>
      <spinlocks state='on' retries='8191'/>
      <vendor_id state='on' value='none'/>
    </hyperv>
  </features>
  <cpu mode='host-passthrough' check='none'>
    <topology sockets='1' cores='4' threads='2'/>
    <cache mode='passthrough'/>
    <feature policy='require' name='topoext'/>
  </cpu>
  <clock offset='localtime'>
    <timer name='hypervclock' present='yes'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/local/sbin/qemu</emulator>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <source file='/mnt/user/isos/virtio-win-0.1.173-2.iso'/>
      <target dev='hdb' bus='sata'/>
      <readonly/>
      <address type='drive' controller='0' bus='0' target='0' unit='1'/>
    </disk>
    <controller type='usb' index='0' model='qemu-xhci' ports='15'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </controller>
    <controller type='sata' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x1f' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pcie-root'/>
    <controller type='virtio-serial' index='0'>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
    </controller>
    <controller type='pci' index='1' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='1' port='0x8'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0' multifunction='on'/>
    </controller>
    <controller type='pci' index='2' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='2' port='0x9'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
    </controller>
    <controller type='pci' index='3' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='3' port='0xa'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
    </controller>
    <controller type='pci' index='4' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='4' port='0xb'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x3'/>
    </controller>
    <controller type='pci' index='5' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='5' port='0xc'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x4'/>
    </controller>
    <controller type='pci' index='6' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='6' port='0xd'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x5'/>
    </controller>
    <interface type='bridge'>
      <mac address='52:54:00:89:0f:af'/>
      <source bridge='br0'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </interface>
    <serial type='pty'>
      <target type='isa-serial' port='0'>
        <model name='isa-serial'/>
      </target>
    </serial>
    <console type='pty'>
      <target type='serial' port='0'/>
    </console>
    <channel type='unix'>
      <target type='virtio' name='org.qemu.guest_agent.0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <input type='mouse' bus='ps2'/>
    <input type='keyboard' bus='ps2'/>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x2b' slot='0x00' function='0x0'/>
      </source>
      <rom file='/mnt/user/isos/Powercolor.RXVega56.8176.170730.rom'/>
      <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x2b' slot='0x00' function='0x1'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x26' slot='0x00' function='0x0'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='usb' managed='no'>
      <source>
        <vendor id='0x046d'/>
        <product id='0xc262'/>
      </source>
      <address type='usb' bus='0' port='1'/>
    </hostdev>
    <hostdev mode='subsystem' type='usb' managed='no'>
      <source>
        <vendor id='0x1e7d'/>
        <product id='0x2dcd'/>
      </source>
      <address type='usb' bus='0' port='2'/>
    </hostdev>
    <hostdev mode='subsystem' type='usb' managed='no'>
      <source>
        <vendor id='0x2516'/>
        <product id='0x0004'/>
      </source>
      <address type='usb' bus='0' port='3'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
      </source>
    </hostdev>
    <memballoon model='none'/>
  </devices>
</domain>
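Two things in this XML are worth checking against the host: whether each guest thread pair (vCPU 0/1, 2/3, ..., matching topology cores='4' threads='2') is pinned to a physical core and its SMT sibling, and which host PCI devices are passed through, so their IOMMU groups can be inspected. The Python sketch below reads a saved copy of the domain XML, for example from virsh dumpxml "Windows 10" > win10.xml. It assumes every cpuset is a single CPU number, as here, and that host CPUs n and n+8 are SMT siblings, which is typical for an 8-core/16-thread Ryzen 3700X but should be confirmed in /sys/devices/system/cpu/cpuN/topology/thread_siblings_list. All function names are hypothetical.

```python
import os
import sys
import xml.etree.ElementTree as ET


def pinned_cpus(root):
    """Host CPU for each vCPU, ordered by vCPU number (assumes single-CPU cpusets)."""
    pins = sorted(root.iter("vcpupin"), key=lambda p: int(p.get("vcpu")))
    return [int(p.get("cpuset")) for p in pins]


def mismatched_pairs(cpus, sibling_offset=8):
    """Guest thread pairs (vCPU 2k, 2k+1) not pinned to host CPUs n and n+offset."""
    bad = []
    for vcpu in range(0, len(cpus) - 1, 2):
        first, second = cpus[vcpu], cpus[vcpu + 1]
        if abs(first - second) != sibling_offset:
            bad.append((vcpu, first, second))
    return bad


def pci_hostdevs(root):
    """Host PCI addresses of passed-through devices, formatted like `lspci -D`."""
    addresses = []
    for hostdev in root.iter("hostdev"):
        if hostdev.get("type") != "pci":
            continue  # skip USB passthrough entries
        src = hostdev.find("source/address")
        fields = [int(src.get(k), 16) for k in ("domain", "bus", "slot", "function")]
        addresses.append("{:04x}:{:02x}:{:02x}.{:x}".format(*fields))
    return addresses


def iommu_group(address):
    """IOMMU group number from sysfs, or None if the link does not exist."""
    link = os.path.join("/sys/bus/pci/devices", address, "iommu_group")
    return os.path.basename(os.readlink(link)) if os.path.islink(link) else None


if __name__ == "__main__" and len(sys.argv) > 1:
    root = ET.parse(sys.argv[1]).getroot()
    print("mismatched pairs:", mismatched_pairs(pinned_cpus(root)) or "none")
    for address in pci_hostdevs(root):
        print(address, "iommu group", iommu_group(address))
```

Read against the XML above, the pinning (4/12, 5/13, 6/14, 7/15) passes this check, and the passed-through PCI devices are 0000:2b:00.0, 0000:2b:00.1, 0000:26:00.0 and 0000:01:00.0.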

I'm at my wits' end. I just don't have anything solid to pin the issue on, and I can't work it out. I can get the VM working sometimes, but it's always some random sequence of rebooting, running the SIO script, starting the VM and praying. Nothing consistently works.

I'll add diagnostics shortly, in case anyone reads this before I've attached them.

Help?!

atlas-diagnostics-20200422-0857.zip

Edited by J89eu
17 minutes ago, Squid said:

You're overclocking your memory using the XMP / AMP memory profile. All overclocks introduce instability into the system. You should reconfigure it to run at its rated speed of 2666MHz and then see if the issues continue.

Hi, I can try that, though my RAM is 3200MHz Corsair Vengeance LPX (CMK32GX4M2Z3200C16).

18 minutes ago, Squid said:

Yeah, I know (it's in the diagnostics). The RAM is 2666MHz, overclocked to 3200MHz. All manufacturers lie when rating their memory, which is why they list the SPD speed (what the RAM is actually rated for) in their specs.

OK, I ran it at 2666MHz with XMP off, and as soon as I started the VM it spun for a while before the server ground to a halt and locked up again.

3 minutes ago, jonathanm said:

It's not the memory that's the issue, it's the CPU and motherboard.

Why? I don't understand
