Bagpuss

Posts posted by Bagpuss

  1. Just a quick message to say that I've solved the problem now. Mapping out the USB ports helped me a lot.

    I've now moved the unRAID USB stick to the 0e:00 USB controller, and am now able to passthrough both USB controllers on 09:00.

     

    I also made an elementary mistake when editing the XML to make the GPU a multifunction device. I put the multifunction='on' in the wrong place. Doh!

     

    I'm very happy now. 😀

     

    Just need to work out how to put my Windows install on the Sabrent Rocket NVMe drive now. Any pointers would be much appreciated.

     

  2. 10 minutes ago, meep said:

    Hmm, you say you can’t pass through 0e.00.0, and show an error that seems to indicate 0e.00.4. Are you saying you've tried 0e.00.3 and it produces the same error? And this is due to a known kernel issue on the X570 platform?

     

    Have you tried ACS override? Could you drop in a PCIe USB adapter and pass that through?

    Check this thread for more details on the problems with X570:

     

    I've tried all combinations I could think of, and it wouldn't work.

    I guess I could get a separate PCI USB adapter, but I'd prefer not to. One of the reasons I went with the Crosshair MB was for the extra USB ports (it's got 12).

     

  3. 19 minutes ago, meep said:

    Why pass through 0e.00.0? 0e.00.3 is the usb controller?

    It is, and according to other posts on here, there is a Linux kernel bug which causes this to fail on X570 platforms.

    The onboard audio controller is similarly affected. I'm told there is a patch for the audio controller, but I've not seen anything for the USB.

    In both cases, this isn't yet available in a 5.x kernel for unRAID.

     

  4. Hi All,

     

    I've recently built a new Ryzen 3950X system based on the Asus Crosshair VIII Hero (WiFi) motherboard (latest BIOS v1302).

    System has a Gigabyte RTX2080 OC graphics card, Mellanox ConnectX-2 dual NIC, 2x32GB Corsair Vengeance RGB Pro DIMMs, and 4 x SSD (2 x M.2 and 2 x SATA).

    I had some random lockup issues, but these are now fixed with the 6.0-beta1.

     

    I'm now trying to create a Windows 10 VM where I passthrough my RTX2080 and a USB controller.

    I initially had some major issues with the VM locking up, but from searching here, I discovered that you can't passthrough the onboard audio and one specific USB controller.

     

    I've used Skitals' VFIO-PCI plugin (thanks Skitals), and have now managed to successfully passthrough just the GPU, so I can now boot into Windows 10.

     

    My current config is as follows:

    IOMMU group 0:	[1022:1482] 00:01.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
    IOMMU group 1:	[1022:1483] 00:01.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge
    IOMMU group 2:	[1022:1483] 00:01.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge
    IOMMU group 3:	[1022:1482] 00:02.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
    IOMMU group 4:	[1022:1482] 00:03.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
    IOMMU group 5:	[1022:1483] 00:03.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge
    IOMMU group 6:	[1022:1482] 00:04.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
    IOMMU group 7:	[1022:1482] 00:05.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
    IOMMU group 8:	[1022:1482] 00:07.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
    IOMMU group 9:	[1022:1484] 00:07.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B]
    IOMMU group 10:	[1022:1482] 00:08.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
    IOMMU group 11:	[1022:1484] 00:08.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B]
    IOMMU group 12:	[1022:1484] 00:08.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B]
    IOMMU group 13:	[1022:1484] 00:08.3 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B]
    IOMMU group 14:	[1022:790b] 00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller (rev 61)
    	[1022:790e] 00:14.3 ISA bridge: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge (rev 51)
    IOMMU group 15:	[1022:1440] 00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 0
    	[1022:1441] 00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 1
    	[1022:1442] 00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 2
    	[1022:1443] 00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 3
    	[1022:1444] 00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 4
    	[1022:1445] 00:18.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 5
    	[1022:1446] 00:18.6 Host bridge: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 6
    	[1022:1447] 00:18.7 Host bridge: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 7
    IOMMU group 16:	[1987:5016] 01:00.0 Non-Volatile memory controller: Phison Electronics Corporation E16 PCIe4 NVMe Controller (rev 01)
    IOMMU group 17:	[1022:57ad] 02:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Matisse Switch Upstream
    IOMMU group 18:	[1022:57a3] 03:01.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge
    IOMMU group 19:	[1022:57a3] 03:02.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge
    IOMMU group 20:	[1022:57a3] 03:03.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge
    IOMMU group 21:	[1022:57a3] 03:05.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge
    IOMMU group 22:	[1022:57a3] 03:06.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge
    IOMMU group 23:	[1022:57a4] 03:08.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge
    	[1022:1485] 09:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Reserved SPP
    	[1022:149c] 09:00.1 USB controller: Advanced Micro Devices, Inc. [AMD] Matisse USB 3.0 Host Controller
    	[1022:149c] 09:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Matisse USB 3.0 Host Controller
    IOMMU group 24:	[1022:57a4] 03:09.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge
    	[1022:7901] 0a:00.0 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] (rev 51)
    IOMMU group 25:	[1022:57a4] 03:0a.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge
    	[1022:7901] 0b:00.0 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] (rev 51)
    IOMMU group 26:	[1987:5012] 04:00.0 Non-Volatile memory controller: Phison Electronics Corporation E12 NVMe Controller (rev 01)
    IOMMU group 27:	[15b3:6750] 05:00.0 Ethernet controller: Mellanox Technologies MT26448 [ConnectX EN 10GigE, PCIe 2.0 5GT/s] (rev b0)
    IOMMU group 28:	[10ec:8125] 06:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8125 2.5GbE Controller
    IOMMU group 29:	[8086:1539] 07:00.0 Ethernet controller: Intel Corporation I211 Gigabit Network Connection (rev 03)
    IOMMU group 30:	[8086:2723] 08:00.0 Network controller: Intel Corporation Wi-Fi 6 AX200 (rev 1a)
    IOMMU group 31:	[10de:1e87] 0c:00.0 VGA compatible controller: NVIDIA Corporation TU104 [GeForce RTX 2080 Rev. A] (rev a1)
    	[10de:10f8] 0c:00.1 Audio device: NVIDIA Corporation TU104 HD Audio Controller (rev a1)
    	[10de:1ad8] 0c:00.2 USB controller: NVIDIA Corporation TU104 USB 3.1 Host Controller (rev a1)
    	[10de:1ad9] 0c:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU104 USB Type-C UCSI Controller (rev a1)
    IOMMU group 32:	[1022:148a] 0d:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Function
    IOMMU group 33:	[1022:1485] 0e:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Reserved SPP
    IOMMU group 34:	[1022:1486] 0e:00.1 Encryption controller: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Cryptographic Coprocessor PSPCPP
    IOMMU group 35:	[1022:149c] 0e:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Matisse USB 3.0 Host Controller
    IOMMU group 36:	[1022:1487] 0e:00.4 Audio device: Advanced Micro Devices, Inc. [AMD] Starship/Matisse HD Audio Controller
    IOMMU group 37:	[1022:7901] 0f:00.0 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] (rev 51)
    IOMMU group 38:	[1022:7901] 10:00.0 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] (rev 51)
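
For anyone wanting to produce a similar listing from the command line, here is a minimal sketch that walks the kernel's `/sys/kernel/iommu_groups` tree. The function name and its optional root-directory argument are my own additions for illustration (the argument only exists so the function can be pointed at a test directory); it prints bare PCI addresses rather than the full device descriptions shown above.

```shell
# List every PCI device by IOMMU group, reading the kernel's sysfs tree.
# The optional argument (an alternative sysfs root) is a hypothetical testing aid.
list_iommu_groups() {
    local root="${1:-/sys/kernel/iommu_groups}"
    local dev group
    for dev in "$root"/*/devices/*; do
        [ -e "$dev" ] || continue      # nothing matched: IOMMU disabled or unsupported
        group="${dev%/devices/*}"      # strip "/devices/<address>"
        group="${group##*/}"           # keep only the group number
        printf 'IOMMU group %s:\t%s\n' "$group" "${dev##*/}"
    done | sort -t' ' -k3,3n           # numeric order, so group 2 sorts before group 10
}

list_iommu_groups
```

Feeding each printed address to `lspci -nns <address>` should add the vendor/device IDs and names, which is roughly what the listing above contains.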

     

    For unRAID boot menu entry, I have:

    kernel /bzimage
    append video=efifb:off initrd=/bzroot

    With VFIO-PCI, I'm currently just passing through the GPU with:

    BIND=0c:00.0 0c:00.1 0c:00.2 0c:00.3

     

    Output from plugin is as follows:

    vfio-pci.thumb.jpg.7d14d3d9b0041e254aa5c74ef24034d9.jpg

     

    I know that I can't passthrough the 0e:00.0 as this will cause the VM to crash the system with the following errors:

    Apr 18 15:51:41 Tower kernel: vfio-pci 0000:0e:00.4: not ready 1023ms after FLR; waiting
    Apr 18 15:51:43 Tower kernel: vfio-pci 0000:0e:00.4: not ready 2047ms after FLR; waiting
    Apr 18 15:51:46 Tower kernel: vfio-pci 0000:0e:00.4: not ready 4095ms after FLR; waiting
    Apr 18 15:51:51 Tower kernel: vfio-pci 0000:0e:00.4: not ready 8191ms after FLR; waiting
    Apr 18 15:52:01 Tower kernel: vfio-pci 0000:0e:00.4: not ready 16383ms after FLR; waiting
    Apr 18 15:52:18 Tower kernel: vfio-pci 0000:0e:00.4: not ready 32767ms after FLR; waiting
    Apr 18 15:52:54 Tower kernel: vfio-pci 0000:0e:00.4: not ready 65535ms after FLR; giving up
    Apr 18 15:52:54 Tower kernel: clocksource: timekeeping watchdog on CPU16: Marking clocksource 'tsc' as unstable because the skew is too large:
    Apr 18 15:52:54 Tower kernel: clocksource:                       'hpet' wd_now: 9510c1c6 wd_last: 9357bec0 mask: ffffffff
    Apr 18 15:52:54 Tower kernel: clocksource:                       'tsc' cs_now: 19b24db3682 cs_last: 19a510945b0 mask: ffffffffffffffff
    Apr 18 15:52:54 Tower kernel: tsc: Marking TSC unstable due to clocksource watchdog
    Apr 18 15:52:54 Tower kernel: TSC found unstable after boot, most likely due to broken BIOS. Use 'tsc=unstable'.
    Apr 18 15:52:54 Tower kernel: sched_clock: Marking unstable (474034170098, -113600877)<-(474295664568, -375107169)
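
To spot this failure quickly in a long syslog, a small filter along these lines may help. It is only a sketch: the function name is made up, and it assumes the kernel message keeps the exact "not ready ...ms after FLR; giving up" wording seen above. The demo feeds it two of the lines from this post.

```shell
# Print each PCI address for which the kernel gave up waiting after a
# Function Level Reset (FLR). Reads syslog text on stdin.
flr_failures() {
    sed -n 's/.*vfio-pci \([0-9a-f:.]*\): not ready [0-9]*ms after FLR; giving up.*/\1/p' | sort -u
}

# Demo using two log lines copied from the post above.
flr_failures <<'EOF'
Apr 18 15:52:18 Tower kernel: vfio-pci 0000:0e:00.4: not ready 32767ms after FLR; waiting
Apr 18 15:52:54 Tower kernel: vfio-pci 0000:0e:00.4: not ready 65535ms after FLR; giving up
EOF
```

On a live system the same function could be run against the syslog file, for example `flr_failures < /var/log/syslog` (the path is an assumption and may differ).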

    I tried passing through just the 09:00.0 and 09:00.1 devices, but this prevented the unRAID USB stick from being recognised, as it's on 09:00.3.
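
Since 09:00.1 and 09:00.3 share IOMMU group 23 in the listing above, it helps to know which controller each USB bus hangs off before binding anything. The sketch below shows one way to do that by resolving the sysfs links for each USB root hub. The helper name is hypothetical, and the example path in the comment is illustrative rather than copied from this system.

```shell
# Extract the PCI address of the host controller from the resolved sysfs
# path of a USB root hub, e.g. /sys/devices/pci0000:00/.../0000:09:00.3/usb3
usb_bus_controller() {
    local path="${1%/usb*}"        # drop the trailing "/usbN" component
    printf '%s\n' "${path##*/}"    # what remains ends in the controller's PCI address
}

# On a live system, walk every USB root hub and show its controller.
for bus in /sys/bus/usb/devices/usb*; do
    [ -e "$bus" ] || continue      # no USB buses visible (e.g. inside a container)
    printf '%s -> %s\n' "${bus##*/}" "$(usb_bus_controller "$(readlink -f "$bus")")"
done
```

Matching the bus numbers against `lsusb` output should then show which controller the unRAID flash drive is plugged into.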

    I'm now a bit stuck on how to passthrough the USB that I need. Do I have to use pcie_acs_override?

     

    My VM XML config is as follows:

    <?xml version='1.0' encoding='UTF-8'?>
    <domain type='kvm'>
      <name>Windows 10</name>
      <uuid>a0e0b769-3ca4-ea6f-e2b0-d80d9519c029</uuid>
      <metadata>
        <vmtemplate xmlns="unraid" name="Windows 10" icon="windows.png" os="windows10"/>
      </metadata>
      <memory unit='KiB'>16777216</memory>
      <currentMemory unit='KiB'>16777216</currentMemory>
      <memoryBacking>
        <nosharepages/>
      </memoryBacking>
      <vcpu placement='static'>16</vcpu>
      <cputune>
        <vcpupin vcpu='0' cpuset='1'/>
        <vcpupin vcpu='1' cpuset='17'/>
        <vcpupin vcpu='2' cpuset='2'/>
        <vcpupin vcpu='3' cpuset='18'/>
        <vcpupin vcpu='4' cpuset='3'/>
        <vcpupin vcpu='5' cpuset='19'/>
        <vcpupin vcpu='6' cpuset='4'/>
        <vcpupin vcpu='7' cpuset='20'/>
        <vcpupin vcpu='8' cpuset='5'/>
        <vcpupin vcpu='9' cpuset='21'/>
        <vcpupin vcpu='10' cpuset='6'/>
        <vcpupin vcpu='11' cpuset='22'/>
        <vcpupin vcpu='12' cpuset='7'/>
        <vcpupin vcpu='13' cpuset='23'/>
        <vcpupin vcpu='14' cpuset='8'/>
        <vcpupin vcpu='15' cpuset='24'/>
      </cputune>
      <os>
        <type arch='x86_64' machine='pc-i440fx-4.2'>hvm</type>
        <loader readonly='yes' type='pflash'>/usr/share/qemu/ovmf-x64/OVMF_CODE-pure-efi.fd</loader>
        <nvram>/etc/libvirt/qemu/nvram/a0e0b769-3ca4-ea6f-e2b0-d80d9519c029_VARS-pure-efi.fd</nvram>
      </os>
      <features>
        <acpi/>
        <apic/>
        <hyperv>
          <relaxed state='on'/>
          <vapic state='on'/>
          <spinlocks state='on' retries='8191'/>
          <vendor_id state='on' value='none'/>
        </hyperv>
      </features>
      <cpu mode='host-passthrough' check='none'>
        <topology sockets='1' cores='8' threads='2'/>
        <cache mode='passthrough'/>
        <feature policy='require' name='topoext'/>
      </cpu>
      <clock offset='localtime'>
        <timer name='hypervclock' present='yes'/>
        <timer name='hpet' present='no'/>
      </clock>
      <on_poweroff>destroy</on_poweroff>
      <on_reboot>restart</on_reboot>
      <on_crash>restart</on_crash>
      <devices>
        <emulator>/usr/local/sbin/qemu</emulator>
        <disk type='file' device='disk'>
          <driver name='qemu' type='raw' cache='writeback'/>
          <source file='/mnt/user/domains/Windows 10/vdisk1.img'/>
          <target dev='hdc' bus='virtio'/>
          <boot order='1'/>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
        </disk>
        <disk type='file' device='cdrom'>
          <driver name='qemu' type='raw'/>
          <source file='/mnt/user/isos/Win10_1909_EnglishInternational_x64.iso'/>
          <target dev='hda' bus='ide'/>
          <readonly/>
          <boot order='2'/>
          <address type='drive' controller='0' bus='0' target='0' unit='0'/>
        </disk>
        <disk type='file' device='cdrom'>
          <driver name='qemu' type='raw'/>
          <source file='/mnt/user/isos/virtio-win-0.1.173-2.iso'/>
          <target dev='hdb' bus='ide'/>
          <readonly/>
          <address type='drive' controller='0' bus='0' target='0' unit='1'/>
        </disk>
        <controller type='pci' index='0' model='pci-root'/>
        <controller type='ide' index='0'>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
        </controller>
        <controller type='virtio-serial' index='0'>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
        </controller>
        <controller type='usb' index='0' model='ich9-ehci1'>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x7'/>
        </controller>
        <controller type='usb' index='0' model='ich9-uhci1'>
          <master startport='0'/>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0' multifunction='on'/>
        </controller>
        <controller type='usb' index='0' model='ich9-uhci2'>
          <master startport='2'/>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x1'/>
        </controller>
        <controller type='usb' index='0' model='ich9-uhci3'>
          <master startport='4'/>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x2'/>
        </controller>
        <interface type='bridge'>
          <mac address='52:54:00:01:81:be'/>
          <source bridge='br0'/>
          <model type='virtio'/>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
        </interface>
        <serial type='pty'>
          <target type='isa-serial' port='0'>
            <model name='isa-serial'/>
          </target>
        </serial>
        <console type='pty'>
          <target type='serial' port='0'/>
        </console>
        <channel type='unix'>
          <target type='virtio' name='org.qemu.guest_agent.0'/>
          <address type='virtio-serial' controller='0' bus='0' port='1'/>
        </channel>
        <input type='tablet' bus='usb'>
          <address type='usb' bus='0' port='1'/>
        </input>
        <input type='mouse' bus='ps2'/>
        <input type='keyboard' bus='ps2'/>
        <sound model='ich9'>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
        </sound>
        <hostdev mode='subsystem' type='pci' managed='yes'>
          <driver name='vfio'/>
          <source>
            <address domain='0x0000' bus='0x0c' slot='0x00' function='0x0'/>
          </source>
          <rom file='/mnt/disk1/isos/TU104.stripped.rom'/>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0' multifunction='on'/>
        </hostdev>
        <hostdev mode='subsystem' type='pci' managed='yes'>
          <driver name='vfio'/>
          <source>
            <address domain='0x0000' bus='0x0c' slot='0x00' function='0x1'/>
          </source>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x1'/>
        </hostdev>
        <hostdev mode='subsystem' type='pci' managed='yes'>
          <driver name='vfio'/>
          <source>
            <address domain='0x0000' bus='0x0c' slot='0x00' function='0x2'/>
          </source>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x2'/>
        </hostdev>
        <hostdev mode='subsystem' type='pci' managed='yes'>
          <driver name='vfio'/>
          <source>
            <address domain='0x0000' bus='0x0c' slot='0x00' function='0x3'/>
          </source>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x3'/>
        </hostdev>
        <memballoon model='none'/>
      </devices>
    </domain>

     

    I've mapped out the USB ports, and they look like this:

     

    867838813_CrosshairUSBMapping.thumb.png.9baa7040649c86ed2462a5d571e1da2e.png

     

    Ultimately, I want to use this as my gaming VM, and so need to passthrough enough USB for the mouse, keyboard and Oculus Rift sensors.

     

    Would really appreciate some pointers on where to go next.

     

    Thanks,

     

    Andy.

  5. Hi All,

     

    Just wondering if anyone can help me?

     

    I've recently built a new Ryzen 3950X system based on the Asus Crosshair VIII Hero (WiFi) motherboard (latest BIOS v1302).

    System has a Gigabyte RTX2080 OC graphics card, Mellanox ConnectX-2 dual NIC, 2x32GB Corsair Vengeance RGB Pro DIMMs, and 4 x SSD (2 x M.2 and 2 x SATA).

    The system has been working fine with a native install of Windows 10 1909. I've run Karhu RAM Test and OCCT for 24 hours and seen no errors. I've also run multiple different benchmarks (3DMark, Cinebench R20, Blender, AIDA64 stress test and Prime95) with no issues. Maximum CPU temperature during any of these runs was around 71C. I'm not overclocking the CPU and am running the RAM at its rated 3600MHz.

     

    Once I was certain that the build was stable under Windows, I wanted to test out performance in unRAID. I'm aiming to replace several Synology units with unRAID and also do some GPU passthrough for gaming.

     

    I started off by working my way through SpaceInvader One's tutorial on setting up a Windows VM without passthrough. This worked fine, and I was about to embark on GPU passthrough when unRAID panicked. I've attached a screenshot showing the panic strings from the console. When the system crashed, no VMs were running, and I was just clicking between the tabs in the WebUI.

     

    Following the crash, the whole system wouldn't even POST. I was getting a 0d error in the Q-Code readout on the motherboard (which is documented as being for future expansion), and the RAM error LED was lit orange on the motherboard. Googling reveals that others have seen this error on the previous versions of the Crosshair motherboard, but I couldn't find anything specific to the X570 variant that I'm running with.

     

    In order to diagnose, I removed the Corsair memory, and installed 2 x 8GB T-Force Xtreem DIMMs. This caused the system to POST again, and I was able to reset the CMOS and get things up and running again. I then ran 24 hours of RAM and stress tests with these DIMMs under Windows, which didn't show any problems. I then re-installed the Corsair memory, repeated the tests and still didn't see a problem.

     

    I'd really appreciate people's thoughts on what I should do next, as I really need stability in unRAID if I'm going to replace my Synology systems.

    I've seen suggestions that some tweaks are needed with Ryzen, but I largely thought that these were no longer required with the 3950X. Should I consider any of the following:

     

    - Change "Power Supply Idle Control" (or similar) and set it to "typical current idle" (or similar).

    - Add RCU callbacks parameter to syslinux file.

    - Run latest beta with 5.x kernel.

     

    I've also attached the diagnostic output from the server, in case anyone wants to take a look.

     

    Thanks,

     

    Andy.

     

    unraid-panic-smaller.jpg

    tower-diagnostics-20200418-1614-anonymous.zip

  6. Very interesting drive.  It is not behaving in a standard way, so my comments should be treated as low confidence, as I've never seen a drive like this.

     

    Current_Pending_Sector count did not change, did not increase because long test does not write, but I was hoping it would drop, perhaps even clear completely.  Reallocated_Event_Count increased by 58, yet there are no remapped sectors and no logged errors.  Offline_Uncorrectable increased by 112.

     

    Your previous Preclear caused Current_Pending_Sector count to increase by 112, Reallocated_Event_Count to increase by 183 (an average of 61 per Preclear pass), and Offline_Uncorrectable to increase by 336 (112 times 3, 112 per Preclear pass).

     

    It's very difficult to trust this drive, but I can't actually say it's bad, with no errors logged and no critical attribute values (those marked Prefail).

     

    Try one more Preclear.  You might also look for a firmware update from Samsung.

     

    I'm glad to hear you say that. I was genuinely confused by what it's doing.

     

    I'll give it one more preclear, and see what happens. I've not had chance to look for a firmware update, so I'll give that a try too.

    At the end of the day, it's a recycled drive from an old Dell laptop, so no great loss if it doesn't work.

     

    Thanks again for your help.

     

  7.  

    I can't help being a little suspicious of these SMART numbers.  Attributes 196, 197, and 198 are all increasing by significant amounts, yet there are no remapped sectors and no SMART errors logged.  It says 375 hours on the drive, is that plausible?  I note that it indicates there have been 3 times as many power cycles of the drive as there have been operational hours.  That is, for every hour it has been on, it thinks it has been turned on 3 times each hour?!?  I suppose that is possible if this was in a laptop used for a number of short sessions (turn on, check email, turn off), or perhaps aggressive power-saving (quick turn off of hard drive when idle).

     

    I recommend running a SMART long test, to see if it will reset some of the SMART attributes.  Then post a subsequent SMART report.  If we can get it to reset, and it looks OK, then you will need at least one more Preclear before you can trust this drive.

     

    Thanks for getting back to me, Rob.

     

    As for the operational hours etc., it's entirely possible that these figures are accurate, as the drive was a pull from a defunct primary (elementary) school laptop. It's quite likely that it was turned on and off regularly in very short sessions.

     

    I'll try running a long SMART test, and see what happens.
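
For comparing reports before and after a test, a small filter like the following can pull out just the sector-related attributes (5, 196, 197 and 198) from `smartctl -A` output. This is a sketch only: the function name is hypothetical, it assumes the usual ten-column attribute table with the raw value in the last column, and the sample rows in the demo are made up rather than taken from this drive.

```shell
# Reduce `smartctl -A` output to NAME=RAW_VALUE for the sector-health attributes.
smart_sector_attrs() {
    awk '$1 == 5 || $1 == 196 || $1 == 197 || $1 == 198 { print $2 "=" $10 }'
}

# Demo on made-up sample rows in the usual smartctl table layout.
smart_sector_attrs <<'EOF'
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       375
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       12
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       34
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       56
EOF
```

On a real drive this would be used as `smartctl -A /dev/sdX | smart_sector_attrs`, and a long self-test can be started with `smartctl -t long /dev/sdX` (replace `sdX` with the actual device).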

     

  8. Hi All,

     

    I've just done a preclear on a 120GB 2.5" drive that I was intending to use as a cache drive in a small unRAID system that I'm building for a friend.

    I've done 3 pre-clear cycles on the drive, and on each subsequent cycle, the number of sectors pending re-allocation has increased by 112.

     

    I've attached the pre-clear reports, and would appreciate an opinion on the condition of this drive.

     

    Thanks,

     

    Andy.

    preclear_rpt__S14PJD0Q752009_2013-07-25.txt

    preclear_start__S14PJD0Q752009_2013-07-25.txt

    preclear_finish__S14PJD0Q752009_2013-07-25.txt

  9. Hi All,

     

    Just had a slightly suspect pre-clear report (my first, thankfully) and I'd really appreciate an opinion on this particular disk.

    I was planning to run another pre-clear to see if I get the same result, but haven't done this yet.

     

    My concern is around some UNC reads and writes that are logged on the drive.

     

    I've included the pre and post SMART results, as well as the pre-clear report.

    The thing that I'm concerned about is that the sectors pending re-allocation have gone down following pre-clear.

    I was sort of expecting that these sectors would have been re-allocated.

     

    pre-clear report

    ========================================================================1.13
    == invoked as: ./preclear_disk.sh -A /dev/sdc
    ==  WDC WD30EZRX-00MMMB0    WD-WMAWZXXXXXXX
    == Disk /dev/sdc has been successfully precleared
    == with a starting sector of 1 
    == Ran 1 cycle
    ==
    == Using :Read block size = 8225280 Bytes
    == Last Cycle's Pre Read Time  : 9:16:43 (89 MB/s)
    == Last Cycle's Zeroing time   : 9:02:10 (92 MB/s)
    == Last Cycle's Post Read Time : 21:30:38 (38 MB/s)
    == Last Cycle's Total Time     : 39:50:31
    ==
    == Total Elapsed Time 39:50:31
    ==
    == Disk Start Temperature: 24C
    ==
    == Current Disk Temperature: 32C, 
    ==
    ============================================================================
    ** Changed attributes in files: /tmp/smart_start_sdc  /tmp/smart_finish_sdc
                    ATTRIBUTE   NEW_VAL OLD_VAL FAILURE_THRESHOLD STATUS      RAW_VALUE
              Seek_Error_Rate =   100     200            0        ok          0
          Temperature_Celsius =   120     128            0        ok          32
    No SMART attributes are FAILING_NOW
    
    7 sectors were pending re-allocation before the start of the preclear.
    7 sectors were pending re-allocation after pre-read in cycle 1 of 1.
    0 sectors were pending re-allocation after zero of disk in cycle 1 of 1.
    0 sectors are pending re-allocation at the end of the preclear,
        a change of -7 in the number of sectors pending re-allocation.
    0 sectors had been re-allocated before the start of the preclear.
    0 sectors are re-allocated at the end of the preclear,
        the number of sectors re-allocated did not change. 
    ============================================================================
    

     

     

    I've attached the SMART reports, as they were too large to be posted as part of the message content.

     

    Any thoughts on this would be much appreciated.

     

    Andy.

    preclear_finish.txt

    preclear_start.txt

  10. Hi there,

     

    I'm also looking for a way to put the server to sleep.

     

    I'm running a Plex server, and I think that's the problem. The library of the Plex server is on the cache drive. I think that's the only solution.

    The Plex process keeps the cache drive spun up all the time, and then the server does not go to sleep. I'm using the Sleep Mode of SF.

     

    Is there a way to ignore the cache drive or, better, to look at network traffic without requiring exactly 0 bytes of traffic? That will never happen. The ethernet card will always have some bytes of traffic.

    If I am using the server to stream to a Plex client, or copying or reading some files, there are always some megabytes per minute, not just a few KB. That must be enough to tell whether the server can go to sleep or not.

     

    Does the SF Cache Dir plugin cause some drive activity? Does the "Wait for array inactivity" option look at the cache drive?

     

    Sorry about the questions, but there is no information about the Sleep Mode used in SF and how to use it properly.

     

    Hi Julian,

     

    I've spotted the same problems with this script on my unRAID box.

    I've attached a fixed version, which ignores activity on the cache drive. I've not noticed any side effects of this on my machine, and I've been running the script for 3 months now.

     

    Also, the checking for idle network activity is broken, and has been from the start.

     

    The current test for idle network is:

    TCP=$(bwm-ng -o csv -c 1 -d 0 -T avg | grep eth0 | cut -d";" -f5)

     

    However, this fails to take into account that the avg function in bwm-ng needs sample data over multiple seconds to calculate an average.

    When you repeatedly run this command in a shell, you will find that 7 times out of 10, it returns 0.00, even when the network is very busy.

     

    I was finding that my server was going to sleep at really odd times, when I knew that downloading/uploading was in full flow.

     

    Expecting 0.00 for average network activity is unrealistic, unless the machine is unplugged from the network. There is always some kind of network housekeeping going on, even in a very small network.

     

    To fix the problem, I've changed the test to this:

                    TCP=$(bwm-ng -o csv -c 30 -d 0 -t 1000 -T avg | grep eth0 | cut -d";" -f5 | tail -1 | sed 's/.\{3\}$//')

     

    This does two things. Firstly, we move to a 1 second sample interval, and take 30 samples to calculate our average.

    Secondly, I've moved to integers for reporting the network activity. The fractions were not significant in determining activity, so it seemed pointless to test them.
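
To make the integer conversion concrete, here is a small sketch that runs the tail end of the same pipeline on a canned CSV row instead of live `bwm-ng` output. The sample row and its values are made up, and `is_idle` with its threshold is a hypothetical helper, not something from the attached script.

```shell
# Hypothetical sample of one bwm-ng CSV row; the field values are made up.
sample='1587221501;eth0;2048.50;3272.97;5321.47;3273;2049'

# Same filters as the new test, fed from a string instead of bwm-ng:
# pick the eth0 row, take field 5, keep the last row, drop the final
# three characters (the decimal point and two decimals).
total_rate() {
    printf '%s\n' "$1" | grep eth0 | cut -d";" -f5 | tail -1 | sed 's/.\{3\}$//'
}

# Hypothetical helper: treat the link as idle below a chosen threshold.
is_idle() {
    [ "$1" -lt "$2" ]
}

total_rate "$sample"    # prints 5321
```

With an integer in hand, a comparison such as `is_idle "$(total_rate "$sample")" 10000` becomes a plain shell test, which is the practical benefit of dropping the fraction.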

     

    The complete new function is as follows:

     

    check_TCP_activity() {
            if [ "$checkTCP" = "$yes" ]
            then
                    # Previous test for this failed to acknowledge that first value from bwm-ng is nearly always 0
                    # when using -c 1. A true average is only calculated if you let bwm-ng run multiple times.
                    # New test sets sample interval to 1 second and takes 30 samples, returning the last one for testing.
                    # On my system, even when completely idle, you still see approx 5-6KB/s total network activity.
                    # Expecting 0.00 activity is unrealistic, unless machine is unplugged from the network.
                    # Have also moved to integers, as the fractions are unimportant in this test.
    
                    TCP=$(bwm-ng -o csv -c 30 -d 0 -t 1000 -T avg | grep eth0 | cut -d";" -f5 | tail -1 | sed 's/.\{3\}$//')
            else
                    TCP="$noTCP"
            fi
            echo "$TCP"
    }
    

     

    In addition, I've also added a couple of other features:

     

    1) Debug logging

     

    Possible options are:

     

    # Enable debug logging

    # debug=0 - no logging

    # debug=1 - logs to syslog and auto_s3_sleep.log

    # debug=2 - logs to syslog

    # debug=3 - logs to auto_s3_sleep.log

    # debug=4 - log to console

     

    Simply edit the script, and change the debug= line to one of the values above (depending on what you want).

    The default is 'debug=1'.
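
As a rough illustration of how the five debug levels could be dispatched, here is a minimal sketch. It is not the code from the attached script: the function name `log_msg`, the `auto_s3_sleep` syslog tag and the log file location are all assumptions, and the demo sets `debug=4` so that it only writes to the console.

```shell
debug=4                                        # the script's default is 1; 4 keeps this demo on the console
logfile="${logfile:-/tmp/auto_s3_sleep.log}"   # hypothetical location; the post only names the file

# Send one message to the destinations selected by $debug.
log_msg() {
    case "$debug" in
        1) logger -t auto_s3_sleep "$1"; echo "$(date) $1" >> "$logfile" ;;
        2) logger -t auto_s3_sleep "$1" ;;
        3) echo "$(date) $1" >> "$logfile" ;;
        4) echo "$1" ;;
        *) : ;;                                # debug=0 (or anything else): stay silent
    esac
}

log_msg "checking sleep conditions"
```

Keeping the dispatch in one function means the rest of the script can call `log_msg` unconditionally and leave the choice of destination to the single `debug=` line.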

     

    2) Shell activity checking.

     

    Options are set in these lines:

    checkSSH=$yes # check for any SSH connections

    checkSHELL=$no # check for any locally logged in sessions (if "no" allows console debugging)

     

    This allows you to prevent the machine from sleeping if someone is logged in over ssh, at the console, or both.

    The default is to prevent sleep if someone is logged in over SSH, but allow it if they are logged in at the console.

     

    I've not had a single 'false' sleep, since making these changes.

     

    I've also attached a simple script which runs the 'sleep' checks, and reports on whether the system will go to sleep when the countdown timer expires.

    I keep meaning to add a 'test' option to the main script, but I've never got around to it.

     

    Hope this is useful for you.

     

    Andy.

    auto_s3_sleep.sh.zip

    test_sleep_conditions.sh.zip
