Rhynri

Report Comments posted by Rhynri

  1. I wrote a rather in-depth reply, then accidentally deleted it, and there is no undelete.

     

    Suffice it to say, moving the VM to the other NUMA node reduced the incidence of the problem and improved the rendering performance of the VM in question.  It's still not gone, but I think a lot of the remaining NUMA misses are related to Unraid caching things, which is hardly a priority operation:

     

    numastat
                               node0           node1
    numa_hit              2773556844      1684914320
    numa_miss                6233397       193845232
    numa_foreign           193845232         6233397
    interleave_hit             84430           84643
    local_node            2773481539      1684881326
    other_node               6308702       193878226

    Starting from a clean boot, numastat shows very few numa_miss events while the two important VMs boot, relative to the previous configuration.  The figures above are after 8 days of uptime.
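
To compare configurations more easily, the counters above can be reduced to one number per node. The following is a minimal sketch, not part of any Unraid tooling: it assumes the plain `numastat` table format shown above, and the function names `parse_numastat` and `miss_ratio` are made up for illustration. The ratio is the share of pages allocated on a node that had been intended for a different node.

```python
def parse_numastat(text):
    """Parse plain `numastat` output into {counter: [value_per_node, ...]}."""
    stats = {}
    for line in text.strip().splitlines():
        fields = line.split()
        # Skip the "node0 node1" header row and any non-numeric lines.
        if len(fields) < 2 or not all(f.isdigit() for f in fields[1:]):
            continue
        stats[fields[0]] = [int(f) for f in fields[1:]]
    return stats


def miss_ratio(stats):
    """Per node: numa_miss / (numa_hit + numa_miss)."""
    return [
        miss / (hit + miss)
        for hit, miss in zip(stats["numa_hit"], stats["numa_miss"])
    ]


# Counters copied from the numastat output in this post.
SAMPLE = """
                           node0           node1
numa_hit              2773556844      1684914320
numa_miss                6233397       193845232
numa_foreign           193845232         6233397
interleave_hit             84430           84643
local_node            2773481539      1684881326
other_node               6308702       193878226
"""

if __name__ == "__main__":
    for node, ratio in enumerate(miss_ratio(parse_numastat(SAMPLE))):
        print(f"node{node}: {ratio:.2%} of pages were numa_miss")
```

For the figures in this post that works out to roughly 0.2% on node0 and 10% on node1. Capturing the same numbers before and after a change makes the comparison less of a judgment call.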

     

    @limetech - If you could please include lstopo in a future release, I'd greatly appreciate it.  I linked a Slackware build for hwloc in a previous post in this thread, if that helps.  There are a few BIOS settings relating to IOMMU allocation in relation to the CCXs on Threadripper, and I'd like to do some A/B testing with lstopo to see what, if any, difference they make.  As I mentioned in that reply, it would also potentially be a useful addition to the System Devices page.  Please and thank you for your time and effort in making Unraid OS awesome.

  2. 4 hours ago, testdasi said:

    @Rhynri: so does <numatune> work at all?

    It looks like it's trying to work.  It slows down the startup significantly and causes the NUMA misses to skyrocket.  I've since discovered that only one of my VMs behaves this way.  I'm wondering if I can move that one to the other node, the one it keeps trying to allocate memory on, and see if that fixes the issue.  Does anyone know if it matters which cores are isolated?  Say I want to move my isolated cores to the beginning (0-11 physical) instead of the end (4-15 physical): does Unraid care at all?
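
Before moving a VM between nodes, it helps to confirm which logical CPUs the kernel assigns to each node. This is a minimal sketch, assuming the standard Linux sysfs files `/sys/devices/system/node/nodeN/cpulist` are present on the host; the helper names are hypothetical. It does not answer whether Unraid cares about the position of isolated cores, and the logical CPU numbers it prints may not match the "physical" core numbering used above.

```python
from pathlib import Path


def parse_cpulist(cpulist):
    """Expand a kernel cpulist string such as "0-7,16-23" into a list of ints."""
    cpus = []
    for part in cpulist.strip().split(","):
        if not part:
            continue
        if "-" in part:
            start, end = part.split("-")
            cpus.extend(range(int(start), int(end) + 1))
        else:
            cpus.append(int(part))
    return cpus


def node_cpus(sysfs_root="/sys/devices/system/node"):
    """Return {node_id: [cpu, ...]} for every nodeN directory under sysfs_root."""
    result = {}
    for node_dir in sorted(Path(sysfs_root).glob("node[0-9]*")):
        cpulist = (node_dir / "cpulist").read_text()
        result[int(node_dir.name[4:])] = parse_cpulist(cpulist)
    return result


if __name__ == "__main__":
    for node, cpus in node_cpus().items():
        print(f"node{node}: {cpus}")
```

Pinning a VM's vCPUs to CPUs from a single node's list, and keeping its memory on that same node, is the arrangement the numastat counters are meant to confirm.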

  3. I've been looking into this, and I think it may have something to do with which NUMA node the GPU is on. I was able to force correct NUMA allocations by changing the memory size of my node0 VM to neatly fill the available memory on that node, then booting the remaining two, but that results in a super lopsided memory allocation (28,16,8), and it's a very manual process. 

     

    I'm going to be asking around the VFIO community to see if there is anything I've been overlooking.

     

    I've been trying to install hwloc (slackbuild link) into Unraid so I can have access to the very useful lstopo, which would let me know which node(s) my PCIe devices are on.  I keep running into compilation issues, though, so I'm going to keep working on that.  Even as a standalone, the lstopo output would be very useful to have on the Tools page, as it gives you a very good idea of which devices are nested for pass-through... it's arguably as useful as anything on the [Tools]>[System Devices] page in terms of pass-through usage.  I've also attached an image of what the lstopo GUI output looks like.

     

    Example (not my system):

    # lstopo
    Machine (256GB)
      NUMANode L#0 (P#0 128GB)
        Socket L#0 + L3 L#0 (20MB)
          L2 L#0 (256KB) + L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0 + PU L#0 (P#0)
          L2 L#1 (256KB) + L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1 + PU L#1 (P#2)
          L2 L#2 (256KB) + L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2 + PU L#2 (P#4)
          L2 L#3 (256KB) + L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3 + PU L#3 (P#6)
          L2 L#4 (256KB) + L1d L#4 (32KB) + L1i L#4 (32KB) + Core L#4 + PU L#4 (P#8)
          L2 L#5 (256KB) + L1d L#5 (32KB) + L1i L#5 (32KB) + Core L#5 + PU L#5 (P#10)
          L2 L#6 (256KB) + L1d L#6 (32KB) + L1i L#6 (32KB) + Core L#6 + PU L#6 (P#12)
          L2 L#7 (256KB) + L1d L#7 (32KB) + L1i L#7 (32KB) + Core L#7 + PU L#7 (P#14)
        HostBridge L#0
          PCIBridge
            PCI 1000:005d
              Block L#0 "sda"
          PCIBridge
            PCI 14e4:16a1
              Net L#1 "eth0"
            PCI 14e4:16a1
              Net L#2 "eth1"
            PCI 14e4:16a1
              Net L#3 "eth2"
            PCI 14e4:16a1
              Net L#4 "eth3"
          PCI 8086:8d62
          PCIBridge
            PCIBridge
              PCIBridge
                PCIBridge
                  PCI 102b:0534
          PCI 8086:8d02
            Block L#5 "sr0"
      NUMANode L#1 (P#1 128GB)
        Socket L#1 + L3 L#1 (20MB)
          L2 L#8 (256KB) + L1d L#8 (32KB) + L1i L#8 (32KB) + Core L#8 + PU L#8 (P#1)
          L2 L#9 (256KB) + L1d L#9 (32KB) + L1i L#9 (32KB) + Core L#9 + PU L#9 (P#3)
          L2 L#10 (256KB) + L1d L#10 (32KB) + L1i L#10 (32KB) + Core L#10 + PU L#10 (P#5)
          L2 L#11 (256KB) + L1d L#11 (32KB) + L1i L#11 (32KB) + Core L#11 + PU L#11 (P#7)
          L2 L#12 (256KB) + L1d L#12 (32KB) + L1i L#12 (32KB) + Core L#12 + PU L#12 (P#9)
          L2 L#13 (256KB) + L1d L#13 (32KB) + L1i L#13 (32KB) + Core L#13 + PU L#13 (P#11)
          L2 L#14 (256KB) + L1d L#14 (32KB) + L1i L#14 (32KB) + Core L#14 + PU L#14 (P#13)
          L2 L#15 (256KB) + L1d L#15 (32KB) + L1i L#15 (32KB) + Core L#15 + PU L#15 (P#15)
        HostBridge L#7
          PCIBridge
            PCI 15b3:1003
              Net L#6 "eth4"
              Net L#7 "eth5"
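
Without lstopo, part of the same information (which NUMA node each PCI device is attached to) can be read from sysfs. This is a rough sketch, assuming the kernel exposes a `numa_node` file under `/sys/bus/pci/devices/<address>/`, as mainline Linux does; the function name is made up for illustration. A value of -1 means the firmware did not report a node for that device, which may well be the case depending on BIOS settings.

```python
from pathlib import Path


def pci_numa_nodes(sysfs_root="/sys/bus/pci/devices"):
    """Return {pci_address: numa_node}; -1 means no node was reported."""
    nodes = {}
    for dev in sorted(Path(sysfs_root).iterdir()):
        node_file = dev / "numa_node"
        if node_file.is_file():
            nodes[dev.name] = int(node_file.read_text().strip())
    return nodes


if __name__ == "__main__":
    for address, node in pci_numa_nodes().items():
        print(f"{address}  node {node}")
```

Matching the printed addresses against the [Tools]>[System Devices] page shows which node a GPU or USB controller sits on, though it does not give the nested bridge view that lstopo does.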

     

    image.png

  4. NUMA daemon source

     

    As for the webterminal, once it has enough text to get a decent scrollback, the scrolling gets choppy and the typing lags a little.  I do use a fairly old MacBook Air and Chrome to access Unraid, but it’s not something I noticed on the last build.  It’s possible it’s just that machine being goofy too.

     

    I haven’t had time to research the issue fully, but I’ll look into it tomorrow and let you know if I find any suggestions. 

  5. I have a 1950.  I’m using all the back USB ports for the Unraid OS itself and then I pass individual controllers off a Sonnet card for the VMs.  I have the wireless and Bluetooth disabled because it cleans up the pass through for the rest, although I’d love for unraid to be able to use the wireless for additional network redundancy.  

     

    In the BIOS I have NUMA set up, and then additional tweaks to the PCIe setup because I’m splitting the bottom slot between the Sonnet card and a U.2.

  6. 9 hours ago, Benson said:

    01:00.0 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] X399 Series Chipset USB 3.1 xHCI Controller [1022:43ba] (rev 02)

        Subsystem: ASMedia Technology Inc. Device [1b21:1142]
        Kernel driver in use: xhci_hcd

     

    06:00.0 USB controller [0c03]: ASMedia Technology Inc. Device [1b21:2142]
        Subsystem: ASUSTeK Computer Inc. Device [1043:8756]
        Kernel driver in use: xhci_hcd

     

    You have several XHCI controllers in your system. If I am correct, TR / Ryzen should have a USB controller that comes from the CPU (but the MFG may not use it). Do all of them have the same problem? I suggest trying a different USB port too.

    The ASMedia USB header requires a specific plug that I don't have available, so it is enabled but not currently used.

    The rest of the USBs (minus my Sonnet USB card, which is for VM use only and isolated at boot) I believe stem from the AMD chipset.  All available USBs seem to exhibit the problem.

     

     

    2 hours ago, Jerky_san said:

    Since you have a Zenith Extreme, we have the same board. Mine works and yours doesn't, so let's do a few experiments. Do you have a chassis USB 3.0 cable that can connect the onboard USB3.0 boards to external ports? It's #16 in the motherboard list in your manual. If you do, then plug the USB drive for Unraid into a USB port off of that. Further, if your chassis has USB 2.0 ports, you can use header #26 like I did.

    In my previous reply, I mentioned that the front panel USB works for M/KB with the new BIOS revision.  While this isn't terribly obvious (sorry about that), front panel connections are always via a MB header.  I haven't attempted to boot from my front panel USB 2.0 ports, but even if that does in fact solve the problem, I'd argue it's not a solution: it won't work if you don't have USB 2.0 header ports available, and it's probably a symptom of a bigger problem.

     

    Also, what TR do you have and what board revision?  Use:

     dmidecode --type 2

    And you should get back something like mine:

    Manufacturer: ASUSTeK COMPUTER INC.
    Product Name: ROG ZENITH EXTREME
    Version: Rev 1.xx
    Serial Number: 170706217200585
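
When comparing boards across machines, the same fields can be pulled out programmatically. This is a minimal sketch, assuming `dmidecode` is installed and run as root; `parse_dmi_fields` and `baseboard_info` are hypothetical helper names, and the parser only handles simple "Key: Value" lines like those above.

```python
import subprocess


def parse_dmi_fields(text):
    """Turn "Key: Value" lines from dmidecode output into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(": ")
        if sep and value:
            fields[key] = value.strip()
    return fields


def baseboard_info():
    """Run `dmidecode --type 2` (needs root) and parse its output."""
    out = subprocess.run(
        ["dmidecode", "--type", "2"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_dmi_fields(out)


if __name__ == "__main__":
    info = baseboard_info()
    print(info.get("Product Name"), info.get("Version"))
```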

     

  7. I was able to get a little farther this time, in that the front panel USB worked, meaning I had a functional keyboard.  I was able to get a USB key to mount, and copied off the contents of the /var/log directory, figuring that'd be the most useful.

     

    I then tried booting with the UNRAID drive in this port, but unfortunately that failed in the exact same manner.  I've PM'd the full zip to eschultz and am posting the dmesg output here for all eyes.

    dmesg.txt

  8. 43 minutes ago, eschultz said:

    Can you please try upgrading to BIOS 1402 (released on 8/10/18) and retrying Unraid 6.6.0-rc1?

    Yessir, am working on that now.  Will follow up after that's done.  I actually thought I had this update, but apparently I downloaded it and never applied it.

    20 minutes ago, jonathanm said:

    Having array drives attached via USB is not ideal. It will work when everything is healthy, but error handling and recovery are subpar, and SMART reporting can be problematic.

    Thank you for your concern, but I'm well aware of this.  There is no way to fit them inside the chassis with the other hardware in there.  I actually was going to mention this in the original post, but figured it didn't move the conversation forward or have anything to do with the bug.  My apologies for the omission.