unRAID Server Release 6.2.0-beta21 Available



Look at your VM mappings... some of those shares don't even exist.

 

You had it mapped to /mnt/user/system/libvirt when it should have been /mnt/user/libvirt. From what I can see, you changed it to get it working.

 

 

/mnt/disk8/libvirt = /mnt/user/libvirt as far as the system cares.
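
A quick way to see this from the console (the filename below is just an example; use whatever actually lives in the share) is to list the same file through both paths - /mnt/user/* is simply a merged view of the individual /mnt/disk* shares:

# Same data, two paths (example filename)
ls -l /mnt/disk8/libvirt/libvirt.img
ls -l /mnt/user/libvirt/libvirt.img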

Link to comment

Please note that I awoke this morning to find that my daily backup of the array had failed overnight. On checking from a networked PC, I found that all mapped shares to the array had been dropped in Windows Explorer.

 

I was able to access the GUI, and on trying to access the mapped drives manually, they suddenly became accessible again.

 

Please find attached the diagnostics file I obtained this morning.

tower-diagnostics-20160411-0643.zip

Link to comment

Please note that I awoke this morning to find that my daily backup of the array had failed overnight. On checking from a networked PC, I found that all mapped shares to the array had been dropped in Windows Explorer.

 

I was able to access the GUI, and on trying to access the mapped drives manually, they suddenly became accessible again.

 

Please find attached the diagnostics file I obtained this morning.

 

Not sure what time your backup starts, but I noticed in your logs that either the network was unplugged or the switch/router connected to your unRAID box was reset:

Apr 10 18:51:16 Tower kernel: e1000e: eth0 NIC Link is Down
Apr 10 18:51:17 Tower ntpd[1573]: Deleting interface #2 eth0, 192.168.1.10#123, interface stats: received=75, sent=75, dropped=0, active_time=7137 secs
Apr 10 18:51:17 Tower ntpd[1573]: 192.168.1.200 local addr 192.168.1.10 -> <null>
Apr 10 18:52:14 Tower kernel: e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
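
If you want to check whether that lines up with when your backup runs, grepping the syslog for link events will show every flap (path as on stock unRAID):

grep -i "NIC Link is" /var/log/syslog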

Link to comment

I made reference to this issue in another post, located here, but thought it better to focus here on the issues related to OVMF and beta 21 that I'm having.

http://lime-technology.com/forum/index.php?topic=48241.0

 

When I try to transfer files around the shares from the VM's vdisks, or access them from the primary/secondary vdisk, my Win10 OVMF-440 VM crashes, loses the ability to find files, crashes unRAID, etc. For the first time tonight I was able to successfully transfer 22GB from a share to the primary vdisk without that sort of crash. However, shortly afterwards, while trying to access and update the game located in those folders, I started getting multiple errors of the "file does not exist" / "cannot find specified file" type.

 

Attached is the diagnostics file from after the VM started acting strangely.

 

For ease of use, I will also repost the xml and VM log as well.

 

The system is...

MB- MSI X99A SLI Plus

CPU- Intel Xeon E5 2670 V3

Mem- 2x8GB Kingston DDR4-2133

GPU- (2) MSI GTX960 GAMING 4G

SSD- (1) 250GB SK hynix (cache)

HDD- (2) 3TB Seagate (parity and storage)

 

I have Docker enabled but nothing installed, only the 2 default plugins, and I have only added the vbios rom tag to the XML file.

 

XML

 

<domain type='kvm' id='1'>
  <name>Win10OVMF</name>
  <uuid>685fc4b3-40bb-64df-d571-cdf37b27f929</uuid>
  <metadata>
    <vmtemplate xmlns="unraid" name="Windows 10" icon="windows.png" os="windows10"/>
  </metadata>
  <memory unit='KiB'>7340032</memory>
  <currentMemory unit='KiB'>7340032</currentMemory>
  <memoryBacking>
    <nosharepages/>
    <locked/>
  </memoryBacking>
  <vcpu placement='static'>10</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='2'/>
    <vcpupin vcpu='1' cpuset='3'/>
    <vcpupin vcpu='2' cpuset='4'/>
    <vcpupin vcpu='3' cpuset='5'/>
    <vcpupin vcpu='4' cpuset='6'/>
    <vcpupin vcpu='5' cpuset='14'/>
    <vcpupin vcpu='6' cpuset='15'/>
    <vcpupin vcpu='7' cpuset='16'/>
    <vcpupin vcpu='8' cpuset='17'/>
    <vcpupin vcpu='9' cpuset='18'/>
  </cputune>
  <resource>
    <partition>/machine</partition>
  </resource>
  <os>
    <type arch='x86_64' machine='pc-i440fx-2.5'>hvm</type>
    <loader readonly='yes' type='pflash'>/usr/share/qemu/ovmf-x64/OVMF_CODE-pure-efi.fd</loader>
    <nvram>/etc/libvirt/qemu/nvram/685fc4b3-40bb-64df-d571-cdf37b27f929_VARS-pure-efi.fd</nvram>
  </os>
  <features>
    <acpi/>
    <apic/>
    <hyperv>
      <relaxed state='on'/>
      <vapic state='on'/>
      <spinlocks state='on' retries='8191'/>
      <vendor id='none'/>
    </hyperv>
  </features>
  <cpu mode='host-passthrough'>
    <topology sockets='1' cores='5' threads='2'/>
  </cpu>
  <clock offset='localtime'>
    <timer name='hypervclock' present='yes'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/local/sbin/qemu</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='writeback'/>
      <source file='/mnt/user/vdisks/Win10OVMF/vdisk1.img'/>
      <backingStore/>
      <target dev='hdc' bus='virtio'/>
      <boot order='1'/>
      <alias name='virtio-disk2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </disk>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <source file='/mnt/user/ISOs/OS iso/Windows 10 Pro  64bit.iso'/>
      <backingStore/>
      <target dev='hda' bus='sata'/>
      <readonly/>
      <boot order='2'/>
      <alias name='sata0-0-0'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <source file='/mnt/user/ISOs/virtio iso/virtio-win-0.1.113.iso'/>
      <backingStore/>
      <target dev='hdb' bus='sata'/>
      <readonly/>
      <alias name='sata0-0-1'/>
      <address type='drive' controller='0' bus='0' target='0' unit='1'/>
    </disk>
    <controller type='usb' index='0' model='nec-xhci'>
      <alias name='usb'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </controller>
    <controller type='pci' index='0' model='pci-root'>
      <alias name='pci.0'/>
    </controller>
    <controller type='sata' index='0'>
      <alias name='sata0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </controller>
    <controller type='virtio-serial' index='0'>
      <alias name='virtio-serial0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </controller>
    <interface type='bridge'>
      <mac address='52:54:00:1f:4d:ff'/>
      <source bridge='br0'/>
      <target dev='vnet0'/>
      <model type='virtio'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </interface>
    <serial type='pty'>
      <source path='/dev/pts/0'/>
      <target port='0'/>
      <alias name='serial0'/>
    </serial>
    <console type='pty' tty='/dev/pts/0'>
      <source path='/dev/pts/0'/>
      <target type='serial' port='0'/>
      <alias name='serial0'/>
    </console>
    <channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/channel/target/domain-Win10OVMF/org.qemu.guest_agent.0'/>
      <target type='virtio' name='org.qemu.guest_agent.0' state='connected'/>
      <alias name='channel0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
      </source>
      <alias name='hostdev0'/>
      <rom file='/boot/vbios.rom'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x03' slot='0x00' function='0x1'/>
      </source>
      <alias name='hostdev1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='usb' managed='no'>
      <source>
        <vendor id='0x056e'/>
        <product id='0x0035'/>
        <address bus='5' device='3'/>
      </source>
      <alias name='hostdev2'/>
    </hostdev>
    <hostdev mode='subsystem' type='usb' managed='no'>
      <source>
        <vendor id='0x1c4f'/>
        <product id='0x0002'/>
        <address bus='5' device='8'/>
      </source>
      <alias name='hostdev3'/>
    </hostdev>
    <memballoon model='virtio'>
      <alias name='balloon0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x09' function='0x0'/>
    </memballoon>
  </devices>
</domain>

 

VM log

 

2016-04-11 10:46:00.476+0000: starting up libvirt version: 1.3.1, qemu version: 2.5.1, hostname: Beast
LC_ALL=C PATH=/bin:/sbin:/usr/bin:/usr/sbin HOME=/ QEMU_AUDIO_DRV=none /usr/local/sbin/qemu -name Win10OVMF -S -machine pc-i440fx-2.5,accel=kvm,usb=off,mem-merge=off -cpu host,hv_time,hv_relaxed,hv_vapic,hv_spinlocks=0x1fff,hv_vendor_id=none -drive file=/usr/share/qemu/ovmf-x64/OVMF_CODE-pure-efi.fd,if=pflash,format=raw,unit=0,readonly=on -drive file=/etc/libvirt/qemu/nvram/685fc4b3-40bb-64df-d571-cdf37b27f929_VARS-pure-efi.fd,if=pflash,format=raw,unit=1 -m 7168 -realtime mlock=on -smp 10,sockets=1,cores=5,threads=2 -uuid 685fc4b3-40bb-64df-d571-cdf37b27f929 -nographic -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-Win10OVMF/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime -no-hpet -no-shutdown -boot strict=on -device nec-usb-xhci,id=usb,bus=pci.0,addr=0x7 -device ahci,id=sata0,bus=pci.0,addr=0x3 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x4 -drive file=/mnt/usio-pci,host=03:00.0,id=hostdev0,bus=pci.0,addr=0x6,romfile=/boot/vbios.rom -device vfio-pci,host=03:00.1,id=hostdev1,bus=pci.0,addr=0x8 -device usb-host,hostbus=5,hostaddr=3,id=hostdev2 -device usb-host,hostbus=5,hostaddr=8,id=hostdev3 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x9 -msg timestamp=on
Domain id=1 is tainted: high-privileges
Domain id=1 is tainted: host-cpu
char device redirected to /dev/pts/0 (label charserial0)
2016-04-11T12:02:53.561741Z qemu-system-x86_64: terminating on signal 15 from pid 6261
2016-04-11 12:02:53.756+0000: shutting down

beast-diagnostics-20160411-1950.zip

Link to comment

I have a Docker container that always says "update ready" on the unRAID Docker page.  I can update it 6 times in a row and it will still say that.  When I do run the update, the logs say the container is up to date and nothing is downloaded; it just re-launches.

 

The Docker is the official Postgres Docker https://hub.docker.com/_/postgres/

 

I am running 6.2.0 beta 21 but this has been happening since at least 6.2.0 beta 18 (when I upgraded to the beta).
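
For what it's worth, one way to confirm the local image really is current despite the "update ready" flag (a sketch using the official postgres image named above; adjust the tag if you run something else):

# Compare the local image digest against what a pull would fetch
docker inspect --format '{{index .RepoDigests 0}}' postgres
docker pull postgres:latest    # reports "Image is up to date" when nothing newer exists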

 

Diagnostics attached

aeris-diagnostics-20160411-1055.zip

Link to comment

I am creating a 2nd server - testing it for now, then putting data on it once I trust it.  I have a few questions if someone can help.

1- The unRAID IRC channel says it's broken - should I just wait for beta 22?  At the moment all I need to do is load it up with data and share it over Samba (no VMs); at some point I would use Docker containers.

2- I precleared 4 Seagate 8TB drives (harvested from USB enclosures), 2 cycles each.  After that I made 1 the parity drive and assigned 3 as data, then I just tried to mount.  It appears there are some errors/warnings - are these errors normal, in that the server sees the drives aren't formatted and formats them?  I see a number of other errors in my syslog as well; is there something wrong?

3- It is doing a parity sync/data rebuild, at 2.2%.  I know a parity sync is normal when creating an array for the first time, but is it normal for it to take so long when there is no actual data?

 

If someone has time to look at my log and answer these questions I would appreciate it, before I move forward with transferring data over and enabling my 2nd unRAID Pro key on this trial.

 

pipe-diagnostics-20160411-1152.zip

Link to comment

2- I precleared 4 Seagate 8TB drives (harvested from USB enclosures), 2 cycles each.  After that I made 1 the parity drive and assigned 3 as data, then I just tried to mount.  It appears there are some errors/warnings - are these errors normal, in that the server sees the drives aren't formatted and formats them?  I see a number of other errors in my syslog as well; is there something wrong?

It is normal for new drives to be seen as unformatted.

3- It is doing a parity sync/data rebuild, at 2.2%.  I know a parity sync is normal when creating an array for the first time, but is it normal for it to take so long when there is no actual data?

Parity has no idea what is on the disks and as such is unaware of the data on them at the file system level.  It just sees each disk as a bunch of sectors that need protecting against the failure of any disk, so yes, on any new system the whole of each disk will be read as part of creating parity.
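
As a rough sanity check on the time involved: reading an 8TB drive end to end at an assumed average of ~150 MB/s takes most of a day, so a long initial sync is expected even with no data on the disks (the throughput figure is an assumption, not taken from your logs):

# 8 TB read end-to-end at ~150 MB/s (assumed average)
echo $(( 8000000 / 150 / 3600 )) hours    # roughly 14-15 hours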

Link to comment

2- I precleared 4 Seagate 8TB drives (harvested from USB enclosures), 2 cycles each.  After that I made 1 the parity drive and assigned 3 as data, then I just tried to mount.  It appears there are some errors/warnings - are these errors normal, in that the server sees the drives aren't formatted and formats them?  I see a number of other errors in my syslog as well; is there something wrong?

It is normal for new drives to be seen as unformatted.

3- It is doing a parity sync/data rebuild, at 2.2%.  I know a parity sync is normal when creating an array for the first time, but is it normal for it to take so long when there is no actual data?

Parity has no idea what is on the disks and as such is unaware of the data on them at the file system level.  It just sees each disk as a bunch of sectors that need protecting against the failure of any disk, so yes, on any new system the whole of each disk will be read as part of creating parity.

 

Would this email/popup be normal? - Event: unRAID Parity disk error

Subject: Warning [PIPE] - Parity disk, parity-sync in progress

Description: ST8000AS0002-1NA17Z_Z8408NKD (sdb)

Importance: warning

 

Link to comment

This morning my GUI was unresponsive, and I had to power off/power on to get it working again.  I was able to telnet into the box and get a diagnostics file (attached).  I suspect (without much data to base it on) that it had something to do with my Plex docker.  I tried to stop the Plex docker from the command line (docker stop plex) and it just hung the command line until I hit Ctrl-C.

 

Diags attached.  I'd appreciate input, as I've found this to be less reliable than I would like; I had to do the power toggle a couple of times yesterday too.
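
For what it's worth, when docker stop hangs like that, killing the container outright is worth trying before reaching for the power button (a sketch; it won't help if the Docker daemon itself is wedged):

docker kill plex    # skip the graceful stop and send SIGKILL directly
docker ps           # check whether the container actually went away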

unraid-diagnostics-20160411-0912.zip

Link to comment

I have a Docker container that always says "update ready" on the unRAID Docker page.  I can update it 6 times in a row and it will still say that.  When I do run the update, the logs say the container is up to date and nothing is downloaded; it just re-launches.

 

The Docker is the official Postgres Docker https://hub.docker.com/_/postgres/

 

I am running 6.2.0 beta 21 but this has been happening since at least 6.2.0 beta 18 (when I upgraded to the beta).

 

Diagnostics attached

 

I have the same issue with the "mysql" app. Likely due to the single name/no user repository.

 

https://registry.hub.docker.com/_/mysql/

Link to comment

I have a Docker container that always says "update ready" on the unRAID Docker page.  I can update it 6 times in a row and it will still say that.  When I do run the update, the logs say the container is up to date and nothing is downloaded; it just re-launches.

 

The Docker is the official Postgres Docker https://hub.docker.com/_/postgres/

 

I am running 6.2.0 beta 21 but this has been happening since at least 6.2.0 beta 18 (when I upgraded to the beta).

 

Diagnostics attached

 

I have the same issue with the "mysql" app. Likely due to the single name/no user repository.

 

https://registry.hub.docker.com/_/mysql/

 

We plan on fixing the update issue for official Docker images in a future beta.

Link to comment

I awoke this morning and found that my array had been successfully backed up overnight at 2am, and the mapped drives were also shown in Windows Explorer.

 

I copied a file into one of the shares and deleted it OK; however, if I then try to access any further folders & files in any of the other shares, there is a delay of several seconds before they appear.

 

Please find attached diagnostics file

tower-diagnostics-20160412-0625.zip

Link to comment

I awoke this morning and found that my array had been successfully backed up overnight at 2am, and the mapped drives were also shown in Windows Explorer.

 

I copied a file into one of the shares and deleted it OK; however, if I then try to access any further folders & files in any of the other shares, there is a delay of several seconds before they appear.

 

Please find attached diagnostics file

The Dynamix Cache Dirs plugin will help to alleviate this.

Link to comment

This morning my GUI was unresponsive, and I had to power off/power on to get it working again.  I was able to telnet into the box and get a diagnostics file (attached).  I suspect (without much data to base it on) that it had something to do with my Plex docker.  I tried to stop the Plex docker from the command line (docker stop plex) and it just hung the command line until I hit Ctrl-C.

 

Diags attached.  I'd appreciate input, as I've found this to be less reliable than I would like; I had to do the power toggle a couple of times yesterday too.

 

I'm in the same situation.  I've tried disabling all plugins, VMs and dockers, but it still doesn't improve the lockups.  The system becomes unresponsive with a load (according to top) of >50.  iotop doesn't show any IO activity and top says the CPU is not busy, yet the load remains extremely high.  dmesg doesn't have anything of interest, with the last messages being about spindowns.  The system will not shut down once the load gets that high, so I have to resort to powering off.

 

I should also mention that I can connect through telnet while this is happening, but depending on what command I issue, the session will lock up.  For example, a "btrfs fi sh" will never return.

 

I also noticed that any significant concurrent IO brings on the problem quickly, which made me wonder if perhaps there is some kind of deadlock/race condition in the new dual-parity code.  Totally unsubstantiated (sorry Tom, I'm not trying to point fingers!), just offering my uninformed guess.  My other thought was that maybe BTRFS was dying under the concurrent IO, but then again BTRFS was solid when I transferred the 70+ TB from my ZFS disks onto BTRFS so I could move back to unRAID.  I did that using Ubuntu 16.04 Beta, which also used the 4.4 kernel, with 3 disks copying at the same time (from 3 other disks, so no thrashing).  Average throughput on the hardware saturated a SATA2 connection and I never had a lockup in the two weeks it took to move the data.

 

All 27 data drives in the array are formatted with BTRFS and are spread across two Norco 24-bay enclosures using an Intel SAS expander.  This setup was reliable using Ubuntu 14.04 and ZoL, so I know the hardware is solid.  Something just needs to be tweaked a little to make it reliable.

 

Also, FWIW, the heavy IO that brings on the lockup was not using any of the drives on the expander.  It came from moving data from one drive to another using mc in an ssh session while NzbGet was uncompressing a large 200GB download.

 

Diags are attached and you can PM me if you want me to test anything for you...

 

Thanks to the LimeTech staff and volunteers for all your efforts!

unmedia-diagnostics-20160412-0734.zip

Link to comment

I previously had Cache Dirs installed, and the problem was happening.  I went back to bare metal to take plugins out of my testing.

 

Sometimes, instead of a delay, I get a hard lockup of TeraCopy / Windows Explorer.

 

You might try turning off SMB2&3 on your client PC (assuming it's Win10?) and see if that helps.  6.2 includes an updated Samba, but I had issues with it that seemed to go away when I turned off SMB2&3 on my Win10 machine.
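
If changing the client side is awkward, the same effect can be tested from the server by capping Samba's protocol level - a sketch only, assuming unRAID merges /boot/config/smb-extra.conf into its Samba configuration:

# In /boot/config/smb-extra.conf (applies under Samba's [global] settings):
server max protocol = NT1    # limit negotiation to SMB1 for testing only, then restart Samba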

 

Link to comment

I previously had Cache Dirs installed, and the problem was happening.  I went back to bare metal to take plugins out of my testing.

 

Sometimes, instead of a delay, I get a hard lockup of TeraCopy / Windows Explorer.

Little tip:  it helps to quote responses in this thread because there are many conversations going on at once.  PM me if you're running docker apps.  I have a possible theory but need someone to try it (and I have been trying to justify it to myself).

 


 

Link to comment

Did a little more testing related to the high load/unresponsive server tonight.  I re-enabled all the disabled dockers and queued up some downloads.  I now have a par2 repair stuck in the download queue that puts enough IO strain on the server to have it lock up within 10 minutes of booting.  I've rebooted 3 times to ensure that it will lock up consistently.

 

I then did something daring (or maybe stupid  :o ) to eliminate the possibility of it being the dual parity.  I unassigned both my parity drives and rebooted the server to see if it would lock up.  The good news is that it locked up within 10 minutes of booting, so I'm now assuming it does not have anything to do with the new/changed dual-parity code (sorry for doubting you, Tom!).  The bad news is I'm more stumped than ever as to what it could be.

 

Below is what top and iotop report at roughly the same time.  The server is sitting in an unresponsive state right now, although my previously connected ssh sessions continue to update the top and iotop screens.  You'll notice that top shows a high load and the wa figure indicates it's waiting on IO of some sort, but iotop doesn't show any significant disk use.  In fact there is no disk use at all, and if I leave it long enough the drives spin down as per their settings (seen in syslog and dmesg).

 

I'm not enough of a Linux guru to figure out where to look next, so if anyone has suggestions on what next steps could be, please pass them along.  Thanks!
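
One thing that might be worth capturing next time it hangs (a sketch using standard tools that should already be on the box): load this high with an idle CPU and no IO usually means tasks stuck in uninterruptible sleep (state D), and the kernel can be asked to dump their stacks:

# List tasks in uninterruptible sleep and what they are waiting on
ps -eo pid,stat,wchan:32,comm | awk '$2 ~ /D/'

# Dump blocked-task stack traces into the kernel log (requires sysrq to be enabled)
echo w > /proc/sysrq-trigger
dmesg | tail -n 100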

 

top:

top - 20:53:21 up 42 min,  3 users,  load average: 38.53, 38.31, 32.92
Tasks: 1031 total,   2 running, 1029 sleeping,   0 stopped,   0 zombie
%Cpu(s): 10.7 us,  9.2 sy,  0.0 ni,  0.0 id, 80.2 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 32989816 total, 17111804 free,  1649940 used, 14228072 buff/cache
KiB Swap:        0 total,        0 free,        0 used. 30570024 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
21618 nobody    20   0  513616  11372   3148 R  99.7  0.0  29:55.60 /usr/bin/par2 r /incomplete-d+
18167 nobody    20   0  432220 162120  37060 S  60.2  0.5  17:19.48 ./Plex Media Server
7785 root      20   0   83092  21380   7160 S   6.9  0.1   2:42.74 /usr/bin/python /usr/sbin/iot+
8696 root      20   0   25892   4212   2468 R   1.0  0.0   0:22.84 top
  292 root      39  19       0      0      0 S   0.3  0.0   0:00.22 [khugepaged]
11685 root      20   0   25772   3880   2368 S   0.3  0.0   0:21.00 top
    1 root      20   0    4372   1640   1532 S   0.0  0.0   0:07.00 init
    2 root      20   0       0      0      0 S   0.0  0.0   0:00.03 [kthreadd]
    3 root      20   0       0      0      0 S   0.0  0.0   0:00.21 [ksoftirqd/0]
    5 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 [kworker/0:0H]
    7 root      20   0       0      0      0 S   0.0  0.0   0:01.18 [rcu_preempt]
    8 root      20   0       0      0      0 S   0.0  0.0   0:00.00 [rcu_sched]
    9 root      20   0       0      0      0 S   0.0  0.0   0:00.00 [rcu_bh]
   10 root      rt   0       0      0      0 S   0.0  0.0   0:00.01 [migration/0]
   11 root      rt   0       0      0      0 S   0.0  0.0   0:00.01 [migration/1]
   12 root      20   0       0      0      0 S   0.0  0.0   0:00.08 [ksoftirqd/1]
   14 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 [kworker/1:0H]
   15 root      rt   0       0      0      0 S   0.0  0.0   0:00.01 [migration/2]
   16 root      20   0       0      0      0 S   0.0  0.0   0:00.02 [ksoftirqd/2]
   18 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 [kworker/2:0H]
   19 root      rt   0       0      0      0 S   0.0  0.0   0:00.01 [migration/3]
   20 root      20   0       0      0      0 S   0.0  0.0   0:00.07 [ksoftirqd/3]
   21 root      20   0       0      0      0 S   0.0  0.0   0:00.09 [kworker/3:0]
   22 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 [kworker/3:0H]
   23 root      rt   0       0      0      0 S   0.0  0.0   0:00.01 [migration/4]
   24 root      20   0       0      0      0 S   0.0  0.0   0:00.06 [ksoftirqd/4]
   26 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 [kworker/4:0H]
   27 root      rt   0       0      0      0 S   0.0  0.0   0:00.01 [migration/5]
   28 root      20   0       0      0      0 S   0.0  0.0   0:00.02 [ksoftirqd/5]
   30 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 [kworker/5:0H]

 

iotop:

Total DISK READ :       0.00 B/s | Total DISK WRITE :       0.00 B/s
Actual DISK READ:       0.00 B/s | Actual DISK WRITE:       0.00 B/s
  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND
4945 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.03 % [kworker/u16:12]
    1 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % init
    2 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kthreadd]
    3 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [ksoftirqd/0]
    5 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kworker/0:0H]
    7 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [rcu_preempt]
    8 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [rcu_sched]
    9 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [rcu_bh]
   10 rt/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [migration/0]
   11 rt/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [migration/1]
   12 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [ksoftirqd/1]
   14 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kworker/1:0H]
   15 rt/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [migration/2]
   16 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [ksoftirqd/2]
   18 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kworker/2:0H]
   19 rt/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [migration/3]
   20 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [ksoftirqd/3]
   21 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kworker/3:0]
   22 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kworker/3:0H]
   23 rt/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [migration/4]
   24 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [ksoftirqd/4]
   26 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kworker/4:0H]
   27 rt/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [migration/5]
   28 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [ksoftirqd/5]
   30 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kworker/5:0H]
   31 rt/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [migration/6]
   32 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [ksoftirqd/6]
   33 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kworker/6:0]
   34 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kworker/6:0H]
   35 rt/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [migration/7]

 

dmesg | tail -n 50:

[  646.361780] eth0: renamed from veth85dfe66
[  646.370907] docker0: port 4(veth71faf9b) entered forwarding state
[  646.370923] docker0: port 4(veth71faf9b) entered forwarding state
[  646.466434] docker0: port 2(veth415f089) entered forwarding state
[  648.772820] device veth6254b33 entered promiscuous mode
[  648.772972] docker0: port 5(veth6254b33) entered forwarding state
[  648.772988] docker0: port 5(veth6254b33) entered forwarding state
[  648.773860] docker0: port 5(veth6254b33) entered disabled state
[  651.842532] docker0: port 3(vethcda3321) entered forwarding state
[  653.168843] eth0: renamed from veth3c62677
[  653.173696] docker0: port 5(veth6254b33) entered forwarding state
[  653.173712] docker0: port 5(veth6254b33) entered forwarding state
[  661.378664] docker0: port 4(veth71faf9b) entered forwarding state
[  667.530740] BTRFS info (device loop1): disk space caching is enabled
[  667.530744] BTRFS: has skinny extents
[  668.226770] docker0: port 5(veth6254b33) entered forwarding state
[  668.917968] BTRFS info (device loop1): new size for /dev/loop1 is 1073741824
[  668.925642] tun: Universal TUN/TAP device driver, 1.6
[  668.925643] tun: (C) 1999-2004 Max Krasnyansky <[email protected]>
[  670.188941] device virbr0-nic entered promiscuous mode
[  670.303816] virbr0: port 1(virbr0-nic) entered listening state
[  670.303830] virbr0: port 1(virbr0-nic) entered listening state
[  670.326386] virbr0: port 1(virbr0-nic) entered disabled state
[ 2500.891496] mdcmd (63): spindown 19
[ 2501.318623] mdcmd (64): spindown 21
[ 2503.464412] mdcmd (65): spindown 9
[ 2503.868055] mdcmd (66): spindown 10
[ 2504.154809] mdcmd (67): spindown 11
[ 2505.158121] mdcmd (68): spindown 14
[ 2505.585249] mdcmd (69): spindown 17
[ 2507.589672] mdcmd (70): spindown 1
[ 2508.026666] mdcmd (71): spindown 3
[ 2509.029710] mdcmd (72): spindown 4
[ 2509.456308] mdcmd (73): spindown 8
[ 2510.460182] mdcmd (74): spindown 12
[ 2510.897515] mdcmd (75): spindown 13
[ 2511.184286] mdcmd (76): spindown 15
[ 2511.471072] mdcmd (77): spindown 16
[ 2511.757819] mdcmd (78): spindown 20
[ 2513.185802] mdcmd (79): spindown 5
[ 2514.473687] mdcmd (80): spindown 6
[ 2518.143213] mdcmd (81): spindown 26
[ 2520.572942] mdcmd (82): spindown 2
[ 2527.584083] mdcmd (83): spindown 22
[ 2533.878293] mdcmd (84): spindown 23
[ 2535.306545] mdcmd (85): spindown 7
[ 2535.593328] mdcmd (86): spindown 18
[ 2536.022367] mdcmd (87): spindown 24
[ 2536.449974] mdcmd (88): spindown 25
[ 2536.736169] mdcmd (89): spindown 27

 

tail -n 50 /var/log/syslog

Apr 12 20:22:19 unmedia root: Starting libvirtd...
Apr 12 20:22:19 unmedia kernel: tun: Universal TUN/TAP device driver, 1.6
Apr 12 20:22:19 unmedia kernel: tun: (C) 1999-2004 Max Krasnyansky <[email protected]>
Apr 12 20:22:19 unmedia emhttp: nothing to sync
Apr 12 20:22:19 unmedia rc.unRAID[18670][18674]: Processing /etc/rc.d/rc.unRAID.d/ start scripts.
Apr 12 20:22:20 unmedia kernel: device virbr0-nic entered promiscuous mode
Apr 12 20:22:21 unmedia avahi-daemon[12607]: Joining mDNS multicast group on interface virbr0.IPv4 with address 192.168.122.1.
Apr 12 20:22:21 unmedia avahi-daemon[12607]: New relevant interface virbr0.IPv4 for mDNS.
Apr 12 20:22:21 unmedia avahi-daemon[12607]: Registering new address record for 192.168.122.1 on virbr0.IPv4.
Apr 12 20:22:21 unmedia kernel: virbr0: port 1(virbr0-nic) entered listening state
Apr 12 20:22:21 unmedia kernel: virbr0: port 1(virbr0-nic) entered listening state
Apr 12 20:22:21 unmedia dnsmasq[19079]: started, version 2.75 cachesize 150
Apr 12 20:22:21 unmedia dnsmasq[19079]: compile time options: IPv6 GNU-getopt no-DBus i18n IDN DHCP DHCPv6 no-Lua TFTP no-conntrack ipset auth no-DNSSEC loop-detect inotify
Apr 12 20:22:21 unmedia dnsmasq-dhcp[19079]: DHCP, IP range 192.168.122.2 -- 192.168.122.254, lease time 1h
Apr 12 20:22:21 unmedia dnsmasq-dhcp[19079]: DHCP, sockets bound exclusively to interface virbr0
Apr 12 20:22:21 unmedia dnsmasq[19079]: reading /etc/resolv.conf
Apr 12 20:22:21 unmedia dnsmasq[19079]: using nameserver 192.168.10.1#53
Apr 12 20:22:21 unmedia dnsmasq[19079]: read /etc/hosts - 2 addresses
Apr 12 20:22:21 unmedia dnsmasq[19079]: read /var/lib/libvirt/dnsmasq/default.addnhosts - 0 addresses
Apr 12 20:22:21 unmedia dnsmasq-dhcp[19079]: read /var/lib/libvirt/dnsmasq/default.hostsfile
Apr 12 20:22:21 unmedia kernel: virbr0: port 1(virbr0-nic) entered disabled state
Apr 12 20:38:02 unmedia sshd[24198]: Accepted none for root from 192.168.10.248 port 50879 ssh2
Apr 12 20:52:51 unmedia kernel: mdcmd (63): spindown 19
Apr 12 20:52:52 unmedia kernel: mdcmd (64): spindown 21
Apr 12 20:52:54 unmedia kernel: mdcmd (65): spindown 9
Apr 12 20:52:54 unmedia kernel: mdcmd (66): spindown 10
Apr 12 20:52:54 unmedia kernel: mdcmd (67): spindown 11
Apr 12 20:52:55 unmedia kernel: mdcmd (68): spindown 14
Apr 12 20:52:56 unmedia kernel: mdcmd (69): spindown 17
Apr 12 20:52:58 unmedia kernel: mdcmd (70): spindown 1
Apr 12 20:52:58 unmedia kernel: mdcmd (71): spindown 3
Apr 12 20:52:59 unmedia kernel: mdcmd (72): spindown 4
Apr 12 20:53:00 unmedia kernel: mdcmd (73): spindown 8
Apr 12 20:53:01 unmedia kernel: mdcmd (74): spindown 12
Apr 12 20:53:01 unmedia kernel: mdcmd (75): spindown 13
Apr 12 20:53:01 unmedia kernel: mdcmd (76): spindown 15
Apr 12 20:53:02 unmedia kernel: mdcmd (77): spindown 16
Apr 12 20:53:02 unmedia kernel: mdcmd (78): spindown 20
Apr 12 20:53:03 unmedia kernel: mdcmd (79): spindown 5
Apr 12 20:53:05 unmedia kernel: mdcmd (80): spindown 6
Apr 12 20:53:08 unmedia kernel: mdcmd (81): spindown 26
Apr 12 20:53:11 unmedia kernel: mdcmd (82): spindown 2
Apr 12 20:53:18 unmedia kernel: mdcmd (83): spindown 22
Apr 12 20:53:24 unmedia kernel: mdcmd (84): spindown 23
Apr 12 20:53:26 unmedia kernel: mdcmd (85): spindown 7
Apr 12 20:53:26 unmedia kernel: mdcmd (86): spindown 18
Apr 12 20:53:26 unmedia kernel: mdcmd (87): spindown 24
Apr 12 20:53:27 unmedia kernel: mdcmd (88): spindown 25
Apr 12 20:53:27 unmedia kernel: mdcmd (89): spindown 27
Apr 12 21:01:01 unmedia sshd[27286]: Accepted none for root from 192.168.10.248 port 51560 ssh2


Link to comment

Did a little more testing related to the high load/unresponsive server tonight.  I re-enabled all the disabled dockers and queued up some downloads.  I now have a par2 repair stuck in the download queue that puts enough IO strain on the server to have it lock up within 10 minutes of booting.  I've rebooted 3 times to ensure that it will lock up consistently.

...[snipped]...

tail -n 50 /var/log/syslog

Apr 12 20:22:19 unmedia rc.unRAID[18670][18674]: Processing /etc/rc.d/rc.unRAID.d/ start scripts.

Apr 12 20:22:20 unmedia kernel: device virbr0-nic entered promiscuous mode

Apr 12 20:22:21 unmedia avahi-daemon[12607]: Joining mDNS multicast group on interface virbr0.IPv4 with address 192.168.122.1.

Apr 12 20:22:21 unmedia avahi-daemon[12607]: New relevant interface virbr0.IPv4 for mDNS.

Apr 12 20:22:21 unmedia avahi-daemon[12607]: Registering new address record for 192.168.122.1 on virbr0.IPv4.

Apr 12 20:22:21 unmedia kernel: virbr0: port 1(virbr0-nic) entered listening state

Apr 12 20:22:21 unmedia kernel: virbr0: port 1(virbr0-nic) entered listening state

Apr 12 20:22:21 unmedia dnsmasq[19079]: started, version 2.75 cachesize 150

Apr 12 20:22:21 unmedia dnsmasq[19079]: compile time options: IPv6 GNU-getopt no-DBus i18n IDN DHCP DHCPv6 no-Lua TFTP no-conntrack ipset auth no-DNSSEC loop-detect inotify

Apr 12 20:22:21 unmedia dnsmasq-dhcp[19079]: DHCP, IP range 192.168.122.2 -- 192.168.122.254, lease time 1h

Apr 12 20:22:21 unmedia dnsmasq-dhcp[19079]: DHCP, sockets bound exclusively to interface virbr0

Apr 12 20:22:21 unmedia dnsmasq[19079]: reading /etc/resolv.conf

Apr 12 20:22:21 unmedia dnsmasq[19079]: using nameserver 192.168.10.1#53

Apr 12 20:22:21 unmedia dnsmasq[19079]: read /etc/hosts - 2 addresses

Apr 12 20:22:21 unmedia dnsmasq[19079]: read /var/lib/libvirt/dnsmasq/default.addnhosts - 0 addresses

Apr 12 20:22:21 unmedia dnsmasq-dhcp[19079]: read /var/lib/libvirt/dnsmasq/default.hostsfile

Apr 12 20:22:21 unmedia kernel: virbr0: port 1(virbr0-nic) entered disabled state

Apr 12 20:38:02 unmedia sshd[24198]: Accepted none for root from 192.168.10.248 port 50879 ssh2

I'm in no way an expert here, but what's curious is that the internal bridge is set up, then disabled.  That would leave anything using it hanging.  That may not be important though, as nothing has had time to begin using it.

 

Some possible steps to alter what's happening: try removing dnsmasq from the equation (I don't know if you can), try disabling avahi just to see if anything changes, and perhaps try it without the internal bridging.
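
For the last one, taking libvirt's internal bridge out of the picture is easy to test (a sketch; "default" is libvirt's usual name for the virbr0 network, and destroying it also stops the dnsmasq instance it spawned):

virsh net-destroy default                 # take virbr0 down for this boot
virsh net-autostart default --disable     # optional: keep it from starting on the next boot
ps aux | grep -E 'dnsmasq|avahi'          # confirm what is still running afterwards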

Link to comment
This topic is now closed to further replies.