unRAID Server Release 6.2.0-beta20 Available


Recommended Posts

Upgraded to 6.2.0-beta20 and the array would not start due to four disks missing. I have a diagnostics file from before and after the upgrade.

 

My understanding from reading through the previous release posts is that this could be related to the Marvell SATA controller that is integrated into this motherboard. Four drives are connected to the Marvell controllers. Prior to the upgrade all disks functioned properly and this has never been a problem in the past. I checked my BIOS firmware and I believe that I am on the most recent version for the GA-990FXA-UD5.

 

I have an LSI SAS2008 controller card plugged into the motherboard that has been flashed to IT mode; it currently does not have any drives connected to it.

 

Should I power down and connect the drives currently attached to the Marvell SATA controllers to the SAS2008? (I have not tried this card yet - I just flashed it recently and then installed it.)

 

Or should I downgrade the software and then look at switching those drives over to the SAS2008 card?

 

Is there some way of upgrading the driver for the Marvell SATA controller? I would still like to be able to use those ports, and they have always worked well in the past.

 

Jude,

 

If you are willing to test this again, I'd like you to try booting up beta18 and see if the issue happens.  I know you've tested 6.1.x and 6.2-beta20, but we added support for AMD IOMMUv2 in beta19 that wasn't there in beta18.  I would like to know if this has anything to do with it, given that the key event in your logs on the beta20 test was this:

Mar 28 08:41:02 Tower kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=03:00.1 domain=0x0000 address=0x00000000a9a60440 flags=0x0070]
Mar 28 08:41:02 Tower kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=03:00.1 domain=0x0000 address=0x00000000a9a60450 flags=0x0070]
Mar 28 08:41:02 Tower kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=03:00.1 domain=0x0000 address=0x00000000a9a80440 flags=0x0070]
Mar 28 08:41:02 Tower kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=03:00.1 domain=0x0000 address=0x00000000a9a80450 flags=0x0070]
...
Mar 28 08:41:02 Tower kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=09:00.1 domain=0x0000 address=0x00000000a9ae0440 flags=0x0070]
Mar 28 08:41:02 Tower kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=09:00.1 domain=0x0000 address=0x00000000a9ae0450 flags=0x0070]
Mar 28 08:41:02 Tower kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=09:00.1 domain=0x0000 address=0x00000000a9b00440 flags=0x0070]
Mar 28 08:41:02 Tower kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=09:00.1 domain=0x0000 address=0x00000000a9b00450 flags=0x0070]
Mar 28 08:41:02 Tower kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=03:00.1 domain=0x0000 address=0x00000000a9a60440 flags=0x0070]
Mar 28 08:41:02 Tower kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=03:00.1 domain=0x0000 address=0x00000000a9a60450 flags=0x0070]

 

What's interesting is that neither 09:00.1 nor 03:00.1 exists in your lspci output (the parent device does, but not that function).  I'm no expert, but I think the Marvell controllers use a virtual device of their own that conflicts with IOMMU/DMA.  Perhaps this issue doesn't present itself with IOMMUv1, but it does with IOMMUv2.  Please report back after testing to let us know.
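
For anyone who wants to check the same thing on their own hardware, here's a rough sketch (the 03:00 and 09:00 bus IDs are just the ones from this log; substitute your own, and the sysfs path assumes the IOMMU is enabled):

# List every function the Marvell controllers actually expose (substitute your own bus IDs)
lspci -s 03:00
lspci -s 09:00

# Check whether the kernel brought up the AMD IOMMU and whether any page faults were logged
dmesg | grep -i -e 'AMD-Vi' -e 'IO_PAGE_FAULT'

# Show which IOMMU group each PCI device was placed in
find /sys/kernel/iommu_groups/ -type l | sort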

 

I will test that and get back to you.

Link to comment

Minor display aberration with my daily status reports.  The notification system keeps telling me that my array of 13 drives is nice and healthy.  Trouble is that I only have 12 drives.  Presumably, it's including my non-existent parity 2 drive in the calculations.

 

+1

 

I have 13 drives (1 parity + 10 data + 2 cache pool) and the notification shows 14 drives are nice and healthy.

media-diagnostics-20160329-2347.zip

Link to comment

I think there is a problem when you run more than one vdisk for a VM and the vdisks are on different disks.

I set up a Windows 2012 R2 VM with the OS vdisk on the cache and the data vdisk on the array. It worked fine until I copied something to the data vdisk; then I had to use the reset button to get it up and running again, since I could not stop the array and the reboot command did not work. It said it was going down for a reboot, but nothing happened.

I tried it multiple times, and every time it copied about 1-2 GB to the data vdisk before it stopped and refused to shut down.

I did not try this setup on 6.1.9, only on 6.2 beta 18 and 19.

 

After that I deleted the VM and set it up again, this time with only one vdisk (I just made one extra partition when I installed Windows 2012 R2), and I also let unRAID decide where to put the vdisk image.

Working fine for me now :)

It has crashed suddenly, but at least I did not have to use the reset button again.

 

As an FYI, I run one of my main VMs here with two virtual disks attached to it (one in a btrfs cache pool and one in the array).  I haven't noticed any issues and even just tried copying data from one to the other and back and haven't seen any problems.
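
If anyone wants to reproduce this kind of two-vdisk setup from the command line, here is a sketch under assumed paths and names (the image path, size, and domain name below are placeholders, not the configuration from this report):

# Create a raw image for the second (data) vdisk on an array disk
qemu-img create -f raw /mnt/disk1/domains/Win2012R2/vdisk2.img 100G

# Attach it to the VM as an extra virtio disk and keep it in the saved definition
virsh attach-disk Win2012R2 /mnt/disk1/domains/Win2012R2/vdisk2.img vdb --targetbus virtio --subdriver raw --persistent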

 

Now it could be that one of your storage devices on the array is having problems.  Have you tried running a filesystem check?
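
A minimal sketch of what a check looks like from the command line, assuming an XFS data disk and the array started in maintenance mode (substitute the right md device, or use the Check option on the disk's page in the webGUI):

# Read-only check of the first data disk's filesystem (-n makes no changes)
xfs_repair -n /dev/md1

# For a ReiserFS disk the equivalent read-only check would be:
# reiserfsck --check /dev/md1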

 

I did a readcheck, and it did not find any problems.

And I don't think that any of my disks have problems. Before I started using unRAID, I ran Windows Server 2012 R2 with both StableBit DrivePool and Scanner; the disks were scanned every 30 days or so :) And I think StableBit Scanner scanned the whole surface every time :)

 

But as I mentioned, it works fine now that I only use one vdisk image.

Link to comment

Minor display aberration with my daily status reports.  The notification system keeps telling me that my array of 13 drives is nice and healthy.  Trouble is that I only have 12 drives.  Presumably, it's including my non-existent parity 2 drive in the calculations.

 

+1

 

I have 13 drives (1 parity + 10 data + 2 cache pool) and the notification shows 14 drives are nice and healthy.

 

Correction made; it will be available in the next release.

 

Link to comment

Docker updates worked perfectly but now I am having some issues getting my Win10 VM updated.

 

I went into the VM tab and attempted to "edit" the VM (Win10ProIsolCPUs) so that it would update to the new settings. I may have done something wrong during that process, because now the VM won't start and there are errors displayed on the VM tab. See screen capture.

 

 

I have copies of my old XML file and copies of the Win10 disk image. Should I attempt to fix this current "edit" or would it be better to use a template and import the existing Win10 disk image and then make changes to the generated XML if needed?

 

Edit

 

I went into the terminal and virsh to see if I could start the VM from there. It reported that the VM started, and when I turned on my TV the VM was getting the passed-through audio and video as well as the USB controller and attached devices. See screenshots.

 

The VM tab is still showing errors, and the dashboard is no longer showing my working Dockers or VMs.

 

Diagnostics attached

 

I couldn't add the diagnostics file to the previous post, so here it is.

 

Jude,

 

Please try booting into safe mode and report back if the errors persist.

 

Restarted in safe mode; the same errors are showing on the VM tab page (see screen cap).

Clicking on the Windows icon brings up the menu but it will not start the VM.

 

Using the virsh utility I was not able to start the VM:

 

root@Tower:~# virsh
Welcome to virsh, the virtualization interactive terminal.

Type:  'help' for help with commands
       'quit' to quit

virsh # start Win10ProIsolCPUs
error: Failed to start domain Win10ProIsolCPUs
error: internal error: process exited while connecting to monitor: 2016-03-29T16:49:35.257351Z qemu-system-x86_64: -device vfio-pci,host=08:00.0,id=hostdev3,bus=pci.2,addr=0x7: vfio: error, group 13 is not viable, please ensure all devices within the iommu_group are bound to their vfio bus driver.
2016-03-29T16:49:35.257415Z qemu-system-x86_64: -device vfio-pci,host=08:00.0,id=hostdev3,bus=pci.2,addr=0x7: vfio: failed to get group 13
2016-03-29T16:49:35.257439Z qemu-system-x86_64: -device vfio-pci,host=08:00.0,id=hostdev3,bus=pci.2,addr=0x7: Device initialization failed

virsh # 
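
A sketch of how to see what else shares that IOMMU group, assuming the group number from the error above (13) and that 08:00.0 is the device being passed through:

# List every device in the IOMMU group the error complains about
ls /sys/kernel/iommu_groups/13/devices/

# Show what those devices are and which kernel driver currently claims them
for dev in /sys/kernel/iommu_groups/13/devices/*; do
    lspci -nnk -s "${dev##*/}"
done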

 

XML

 

<domain type='kvm' id='1'>
  <name>Win10ProIsolCPUs</name>
  <uuid>0c2749f8-96cd-1238-0965-6f9d33c9758f</uuid>
  <metadata>
    <vmtemplate xmlns="unraid" name="Windows 10" icon="windows.png" os="windows10"/>
  </metadata>
  <memory unit='KiB'>8388608</memory>
  <currentMemory unit='KiB'>8388608</currentMemory>
  <memoryBacking>
    <nosharepages/>
    <locked/>
  </memoryBacking>
  <vcpu placement='static'>5</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='2'/>
    <vcpupin vcpu='1' cpuset='3'/>
    <vcpupin vcpu='2' cpuset='4'/>
    <vcpupin vcpu='3' cpuset='5'/>
    <vcpupin vcpu='4' cpuset='6'/>
  </cputune>
  <resource>
    <partition>/machine</partition>
  </resource>
  <os>
    <type arch='x86_64' machine='pc-q35-2.3'>hvm</type>
  </os>
  <features>
    <acpi/>
    <apic/>
    <hyperv>
      <relaxed state='on'/>
      <vapic state='on'/>
      <spinlocks state='on' retries='8191'/>
      <vendor id='none'/>
    </hyperv>
  </features>
  <cpu mode='host-passthrough'>
    <topology sockets='1' cores='5' threads='1'/>
  </cpu>
  <clock offset='localtime'>
    <timer name='hypervclock' present='yes'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/local/sbin/qemu</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='writeback'/>
      <source file='/mnt/disk/vmdisk/Win10Pro/vdisk1.img'/>
      <backingStore/>
      <target dev='hdc' bus='virtio'/>
      <boot order='1'/>
      <alias name='virtio-disk2'/>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x03' function='0x0'/>
    </disk>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <source file='/mnt/user/ISO Library Share/Windows.iso'/>
      <backingStore/>
      <target dev='hda' bus='sata'/>
      <readonly/>
      <boot order='2'/>
      <alias name='sata0-0-0'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <source file='/mnt/user/VirtIO Drivers/virtio-win-0.1.109.iso'/>
      <backingStore/>
      <target dev='hdb' bus='sata'/>
      <readonly/>
      <alias name='sata0-0-1'/>
      <address type='drive' controller='0' bus='0' target='0' unit='1'/>
    </disk>
    <controller type='usb' index='0' model='nec-xhci'>
      <alias name='usb'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </controller>
    <controller type='sata' index='0'>
      <alias name='ide'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x1f' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pcie-root'>
      <alias name='pcie.0'/>
    </controller>
    <controller type='pci' index='1' model='dmi-to-pci-bridge'>
      <model name='i82801b11-bridge'/>
      <alias name='pci.1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x1e' function='0x0'/>
    </controller>
    <controller type='pci' index='2' model='pci-bridge'>
      <model name='pci-bridge'/>
      <target chassisNr='2'/>
      <alias name='pci.2'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x01' function='0x0'/>
    </controller>
    <controller type='virtio-serial' index='0'>
      <alias name='virtio-serial0'/>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x02' function='0x0'/>
    </controller>
    <interface type='bridge'>
      <mac address='52:54:00:69:22:69'/>
      <source bridge='br0'/>
      <target dev='vnet0'/>
      <model type='virtio'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x01' function='0x0'/>
    </interface>
    <serial type='pty'>
      <source path='/dev/pts/1'/>
      <target port='0'/>
      <alias name='serial0'/>
    </serial>
    <console type='pty' tty='/dev/pts/1'>
      <source path='/dev/pts/1'/>
      <target type='serial' port='0'/>
      <alias name='serial0'/>
    </console>
    <channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/channel/target/domain-Win10ProIsolCPUs/org.qemu.guest_agent.0'/>
      <target type='virtio' name='org.qemu.guest_agent.0' state='disconnected'/>
      <alias name='channel0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <hostdev mode='subsystem' type='pci' managed='yes' xvga='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
      </source>
      <alias name='hostdev0'/>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x04' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x01' slot='0x00' function='0x1'/>
      </source>
      <alias name='hostdev1'/>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x05' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
      </source>
      <alias name='hostdev2'/>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x06' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x08' slot='0x00' function='0x0'/>
      </source>
      <alias name='hostdev3'/>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x07' function='0x0'/>
    </hostdev>
    <memballoon model='virtio'>
      <alias name='balloon0'/>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x08' function='0x0'/>
    </memballoon>
  </devices>
</domain>

 

Diagnostics attached

 

 

Edit

 

Booted up again in normal mode. Errors are still on the VM tab, but I am able to use the Windows icon menu to start the VM.

 

OK, you need to boot up in safe mode again, but please comment out these lines from your go file (or make a backup and revert to the stock go file):

 

cd /boot/packages && find . -name '*.auto_install' -type f -print | sort | xargs -n1 sh -c 

sleep 30; blockdev --setra 2048 /dev/md*
unraid_notify start

#
# Set up daily SSD cache trim for unRaid cron
#
fromdos < /boot/custom/DailyTrim > /etc/cron.daily/DailyTrim
chmod +x /etc/cron.daily/DailyTrim

#### Snap outside array disk mount and share
/boot/config/plugins/snap/snap.sh -b

### Mount and share snap disk
/boot/config/plugins/snap/snap.sh -m vmdisk
/boot/config/plugins/snap/snap.sh -ms vmdisk
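
For reference, one way to back up the customized go file and drop back to a stock one (a sketch only; the stock go file just starts the management utility):

# Keep a copy of the customized go file on the flash drive
cp /boot/config/go /boot/config/go.custom.bak

# Replace it with a minimal stock go file
cat > /boot/config/go <<'EOF'
#!/bin/bash
# Start the Management Utility
/usr/local/sbin/emhttp &
EOF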

 

I probably need to add this to the OP of this thread, but if anyone has an issue with the beta and has customized their installation with plugins, scripts, or other modifications, you need to strip down to a stock setup before posting an issue.  Safe mode is OK to use if all you have are plugins, but in your case, you have heavily customized your setup and almost any of those things could be causing this breakage.  If the issue persists, the next step I will resort to is having you recreate your libvirt.img file, manually copying the XML for your VMs to a safe place before you do so.
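
If it comes to that, a quick sketch of saving the VM definitions first (the domain name is the one from this thread; list yours with virsh list --all):

# Save a copy of each VM definition before touching libvirt.img
mkdir -p /boot/vm-xml-backup
virsh dumpxml Win10ProIsolCPUs > /boot/vm-xml-backup/Win10ProIsolCPUs.xml

# After a fresh libvirt.img has been created, the VM can be re-defined from the copy:
# virsh define /boot/vm-xml-backup/Win10ProIsolCPUs.xml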

 

OK, so I made the changes to the go file so that it is running as stock.

 

I rebooted into safe mode. There are a number of errors on the VM tab. Most of them relate to the missing SSD drive that the VM images are stored on (I am now using the Unassigned Devices plugin to mount that drive). The error that shows whether I boot in safe mode or not is there at the end of the list; see screen cap.

 

I will revert to beta18 to check on the Marvell controller issue and see if this has any effect on the VM errors. Otherwise I would be happy to try whatever else you suggest. Thanks

 

Screenshot_2016-03-30_07_23_48.png

Link to comment

Upgraded to 6.2.0-beta20 and the array would not start due to four disks missing. I have a diagnostics file from before and after the upgrade.

 

My understanding from reading through the previous release posts is that this could be related to the Marvell SATA controller that is integrated into this motherboard. Four drives are connected to the Marvell controllers. Prior to the upgrade all disks functioned properly and this has never been a problem in the past. I checked my BIOS firmware and I believe that I am on the most recent version for the GA-990FXA-UD5.

 

I have an LSI SAS2008 controller card plugged into the motherboard that has been flashed to IT mode; it currently does not have any drives connected to it.

 

Should I power down and connect the drives currently attached to the Marvell SATA controllers to the SAS2008? (I have not tried this card yet - I just flashed it recently and then installed it.)

 

Or should I downgrade the software and then look at switching those drives over to the SAS2008 card?

 

Is there some way of upgrading the driver for the Marvell SATA controller? I would still like to be able to use those ports, and they have always worked well in the past.

 

Jude,

 

If you are willing to test this again, I'd like you to try booting up beta18 and see if the issue happens.  I know you've tested 6.1.x and 6.2-beta20, but we added support for AMD IOMMUv2 in beta19 that wasn't there in beta18.  I would like to know if this has anything to do with it, given that the key event in your logs on the beta20 test was this:

Mar 28 08:41:02 Tower kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=03:00.1 domain=0x0000 address=0x00000000a9a60440 flags=0x0070]
Mar 28 08:41:02 Tower kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=03:00.1 domain=0x0000 address=0x00000000a9a60450 flags=0x0070]
Mar 28 08:41:02 Tower kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=03:00.1 domain=0x0000 address=0x00000000a9a80440 flags=0x0070]
Mar 28 08:41:02 Tower kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=03:00.1 domain=0x0000 address=0x00000000a9a80450 flags=0x0070]
...
Mar 28 08:41:02 Tower kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=09:00.1 domain=0x0000 address=0x00000000a9ae0440 flags=0x0070]
Mar 28 08:41:02 Tower kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=09:00.1 domain=0x0000 address=0x00000000a9ae0450 flags=0x0070]
Mar 28 08:41:02 Tower kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=09:00.1 domain=0x0000 address=0x00000000a9b00440 flags=0x0070]
Mar 28 08:41:02 Tower kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=09:00.1 domain=0x0000 address=0x00000000a9b00450 flags=0x0070]
Mar 28 08:41:02 Tower kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=03:00.1 domain=0x0000 address=0x00000000a9a60440 flags=0x0070]
Mar 28 08:41:02 Tower kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=03:00.1 domain=0x0000 address=0x00000000a9a60450 flags=0x0070]

 

What's interesting is that neither 09:00.1 nor 03:00.1 exists in your lspci output (the parent device does, but not that function).  I'm no expert, but I think the Marvell controllers use a virtual device of their own that conflicts with IOMMU/DMA.  Perhaps this issue doesn't present itself with IOMMUv1, but it does with IOMMUv2.  Please report back after testing to let us know.

 

I can't get beta18 from the link on the beta18 announcement page, and the Lime-Tech download page only lists beta20. Could you send me a GitHub link? Thanks

Link to comment

I have been having an issue with the past two betas of losing connectivity to the server and the console becoming mostly unresponsive.  I did happen to get the syslog copied onto the flash drive before rebooting, but couldn't get the diagnostics run because the UI was not responding.

System boots fine and starts the array without issue, then is quiet for about 90 minutes, then suddenly at Mar 29 21:48:02, something goes wrong with the Realtek NIC, and a Call Trace is reported.

Mar 29 21:48:02 Dumpster kernel: WARNING: CPU: 1 PID: 0 at net/sched/sch_generic.c:303 dev_watchdog+0x18c/0x1f2()
Mar 29 21:48:02 Dumpster kernel: NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
Mar 29 21:48:02 Dumpster kernel: Modules linked in: md_mod mxm_wmi powernow_k8 kvm_amd kvm k8temp r8169 mii mvsas libsas ahci scsi_transport_sas libahci pata_amd wmi acpi_cpufreq
Mar 29 21:48:02 Dumpster kernel: CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.4.6-unRAID #1
Mar 29 21:48:02 Dumpster kernel: Hardware name: 113 1/113-M2-E113, BIOS 6.00 PG 09/30/2008
Mar 29 21:48:02 Dumpster kernel: 0000000000000000 ffff88013fc83dd0 ffffffff813688da ffff88013fc83e18
Mar 29 21:48:02 Dumpster kernel: 000000000000012f ffff88013fc83e08 ffffffff8104a28a ffffffff81552498
Mar 29 21:48:02 Dumpster kernel: ffff88013a430000 ffff880095a52400 ffff88013a4303a0 0000000000000001
Mar 29 21:48:02 Dumpster kernel: Call Trace:
Mar 29 21:48:02 Dumpster kernel: <IRQ>  [<ffffffff813688da>] dump_stack+0x61/0x7e
...[snipped]...
Mar 29 21:48:02 Dumpster kernel: ---[ end trace 3bb4eae1ef92424f ]---
Mar 29 21:48:02 Dumpster kernel: r8169 0000:05:00.0 eth0: link up
Mar 29 21:48:14 Dumpster kernel: r8169 0000:05:00.0 eth0: link up
Mar 29 21:48:23 Dumpster kernel: rpc-srv/tcp: nfsd: sent only 125556 when sending 262276 bytes - shutting down socket
Mar 29 21:48:44 Dumpster kernel: r8169 0000:05:00.0 eth0: link up
Mar 29 21:49:59 Dumpster kernel: CIFS VFS: Server 192.168.0.67 has not responded in 120 seconds. Reconnecting...
Mar 29 21:50:20 Dumpster kernel: r8169 0000:05:00.0 eth0: link up

There's no previous link down message, but there is almost nothing but link up messages for the rest of the syslog until the attempted shutdown.  You'll notice that the link up messages all come at intervals that are multiples of 6 seconds.  They start at somewhat random 6-second multiples, but quickly settle into a series of 42 seconds, then 48 seconds, then stay almost completely at 60, 66, and 72 second intervals until the end.
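
For anyone wanting to check their own syslog the same way, a rough sketch (the path is wherever you copied the syslog to; GNU date is assumed for the timestamp parsing):

# Print the gap, in seconds, between successive "link up" messages
grep 'eth0: link up' /boot/logs/syslog.txt |
while read -r mon day time _; do
    cur=$(date -d "$mon $day $time" +%s)
    [ -n "$prev" ] && echo "interval: $((cur - prev))s"
    prev=$cur
done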

 

It's too soon to conclude that the Realtek or its driver is defective, but I suspect that if you replaced it with an Intel NIC, you would not see these issues.

Link to comment

I have been having an issue with the past two betas of losing connectivity to the server and the console becoming mostly unresponsive.  I did happen to get the syslog copied onto the flash drive before rebooting, but couldn't get the diagnostics run because the UI was not responding.

System boots fine and starts the array without issue, then is quiet for about 90 minutes, then suddenly at Mar 29 21:48:02, something goes wrong with the Realtek NIC, and a Call Trace is reported.

Mar 29 21:48:02 Dumpster kernel: WARNING: CPU: 1 PID: 0 at net/sched/sch_generic.c:303 dev_watchdog+0x18c/0x1f2()
Mar 29 21:48:02 Dumpster kernel: NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
Mar 29 21:48:02 Dumpster kernel: Modules linked in: md_mod mxm_wmi powernow_k8 kvm_amd kvm k8temp r8169 mii mvsas libsas ahci scsi_transport_sas libahci pata_amd wmi acpi_cpufreq
Mar 29 21:48:02 Dumpster kernel: CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.4.6-unRAID #1
Mar 29 21:48:02 Dumpster kernel: Hardware name: 113 1/113-M2-E113, BIOS 6.00 PG 09/30/2008
Mar 29 21:48:02 Dumpster kernel: 0000000000000000 ffff88013fc83dd0 ffffffff813688da ffff88013fc83e18
Mar 29 21:48:02 Dumpster kernel: 000000000000012f ffff88013fc83e08 ffffffff8104a28a ffffffff81552498
Mar 29 21:48:02 Dumpster kernel: ffff88013a430000 ffff880095a52400 ffff88013a4303a0 0000000000000001
Mar 29 21:48:02 Dumpster kernel: Call Trace:
Mar 29 21:48:02 Dumpster kernel: <IRQ>  [<ffffffff813688da>] dump_stack+0x61/0x7e
...[snipped]...
Mar 29 21:48:02 Dumpster kernel: ---[ end trace 3bb4eae1ef92424f ]---
Mar 29 21:48:02 Dumpster kernel: r8169 0000:05:00.0 eth0: link up
Mar 29 21:48:14 Dumpster kernel: r8169 0000:05:00.0 eth0: link up
Mar 29 21:48:23 Dumpster kernel: rpc-srv/tcp: nfsd: sent only 125556 when sending 262276 bytes - shutting down socket
Mar 29 21:48:44 Dumpster kernel: r8169 0000:05:00.0 eth0: link up
Mar 29 21:49:59 Dumpster kernel: CIFS VFS: Server 192.168.0.67 has not responded in 120 seconds. Reconnecting...
Mar 29 21:50:20 Dumpster kernel: r8169 0000:05:00.0 eth0: link up

There's no previous link down message, but there is almost nothing but link up messages for the rest of the syslog until the attempted shutdown.  You'll notice that the link up messages all come at intervals that are multiples of 6 seconds.  They start at somewhat random 6-second multiples, but quickly settle into a series of 42 seconds, then 48 seconds, then stay almost completely at 60, 66, and 72 second intervals until the end.

 

It's too soon to conclude that the Realtek or its driver is defective, but I suspect that if you replaced it with an Intel NIC, you would not see these issues.

I think I have an Intel NIC lying around.  What would you suggest as the next course of action: revert back to 6.1 to test out the Realtek, or test out the Intel under 6.2?

 

Thanks,

Jeff

 

Link to comment

I think I have an Intel NIC lying around.  What would you suggest as the next course of action: revert back to 6.1 to test out the Realtek, or test out the Intel under 6.2?

I assume you have already tested the Realtek with 6.1?  It would be useful to know if replacing the Realtek under 6.2 clears up the networking issues.

Link to comment

I assume you have already tested the Realtek with 6.1?  It would be useful to know if replacing the Realtek under 6.2 clears up the networking issues.

Yeah, it is the one on the motherboard, and I have been using it with unRAID since 5.x without any issues.  I'll try the Intel card tonight and see how it works out.

 

Thanks for your help.

Link to comment

What speeds can you expect during a dual parity sync? I used to have around 80 MB/s average, starting at 120 MB/s. This sync starts at 40-45 MB/s. It's at 5% now, saying 19h to go!

 

Is that normal?

 

For Intel CPUs there's a penalty of ~15%, but only when you're CPU limited; I wouldn't expect that with your setup.

 

AMD CPUs appear to take a larger hit, ~50% in the ones I tested.

Link to comment

 

For Intel CPUs there's a penalty of ~15%, but only when you're CPU limited; I wouldn't expect that with your setup.

 

AMD CPUs appear to take a larger hit, ~50% in the ones I tested.

 

OK, the CPU looks like it's running at 30-50% in the Dashboard.

I'll let it finish anyway.

Link to comment

What speeds can you expect during a dual parity sync? I used to have around 80 MB/s average, starting at 120 MB/s. This sync starts at 40-45 MB/s. It's at 5% now, saying 19h to go!

 

Is that normal?

 

For my Test Bed server (specs below) the time was 7:51:00 +/- 2 minutes running 6.1.*.  With dual parity, it was 7:55:02.  As you can see, that system has a CPU that is about as low as you can go on the AMD totem pole.  (However, I did run the optimization script on it and used its output to reset the md tunables.)

 

I would say that you might have a problem.  Have a look at the SMART reports for your drives as a starting point.
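
A minimal sketch of pulling a SMART report from the command line, assuming the drive in question is /dev/sdb (the per-disk SMART report in the webGUI shows the same data):

# Full SMART report, including attributes and the self-test log
smartctl -a /dev/sdb

# For drives behind some SAS HBAs the device type may need to be given explicitly, e.g.:
# smartctl -a -d sat /dev/sdb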

Link to comment

 

 

Default tunable values are not optimized for the LSI; try these (you can change them while the check is running and see if it improves):

 

Settings > Disk Settings
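
The suggested values were posted as a screenshot, so they aren't reproduced here. As a sketch only, the same fields under Settings > Disk Settings can also be changed on the fly with unRAID's mdcmd; the numbers below are placeholders, not a recommendation:

# Adjust the md driver tunables on the fly (placeholder values, not a recommendation)
mdcmd set md_num_stripes 4096
mdcmd set md_sync_window 2048

# The values currently in effect are shown on the Settings > Disk Settings page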

 

That helped; it went up to 50-55 MB/s. Only one of the parity disks is on the LSI, the other is on the mobo.

Link to comment

 

For my Test Bed server (specs below) the time was 7:51:00 +/- 2 minutes running 6.1.*.  With dual parity, it was 7:55:02.  As you can see, that system has a CPU that is about as low as you can go on the AMD totem pole.  (However, I did run the optimization script on it and used its output to reset the md tunables.)

 

I would say that you might have a problem.  Have a look at the SMART reports for your drives as a starting point.

 

I've checked all disks; nothing seems to be wrong there. Maybe I'll stop the parity check and run the diskspeed script to see if anything is not OK.

 

Where can I find that optimization script? Edit: Found it!

Link to comment

 

For my Test Bed server (specs below) the time was 7:51:00 +/- 2 minutes running 6.1.*.  With dual parity, it was 7:55:02.  As you can see, that system has a CPU that is about as low as you can go on the AMD totem pole.  (However, I did run the optimization script on it and used its output to reset the md tunables.)

 

I would say that you might have a problem.  Have a look at the SMART reports for your drives as a starting point.

 

I've checked all disks; nothing seems to be wrong there. Maybe I'll stop the parity check and run the diskspeed script to see if anything is not OK.

 

Where can I find that optimization script? Edit: Found it!

 

Before I get any more PMs...

 

    http://lime-technology.com/forum/index.php?topic=29009.0

Link to comment

I think I have an Intel NIC lying around.  What would you suggest as the next course of action: revert back to 6.1 to test out the Realtek, or test out the Intel under 6.2?

I assume you have already tested the Realtek with 6.1?  It would be useful to know if replacing the Realtek under 6.2 clears up the networking issues.

I must have used my Intel card in another machine.  Guess if it happens again, I'll have to try reverting back to 6.1...

Link to comment
This topic is now closed to further replies.