• [6.6.0-rc1] xHCI dead, incomplete boot Threadripper 1950


    Rhynri
    • Solved Urgent

    My apologies upfront: no logs are available, as the xHCI controller dies and there is no PS/2 support on the board for CLI login.  Additionally, since the UNRAID drive is unreachable during a critical point in boot, no networking is available.

     

    As you can see from the attached screenshot, the kernel reports the xHCI controller as dead at a critical point in the boot sequence, which prevents a successful boot.  Just before this, you can see my array drives being listed, and they are all attached via USB, so the controller is clearly up just prior to this message.  The first time this happened, the machine hung so severely that it required a full hard power cycle before it would reboot properly and enumerate the USB-attached drives again.  From what I can see, the boot proceeds in line with previous boot logs until this point.  The same thing happens on safe boot.

     

    Rolling back to the previous version was successful, and Unraid booted without incident.  Please let me know how I may assist with troubleshooting.  The diagnostic file is in a separate post below.  The motherboard is an Asus Zenith Extreme.

    unraidBoot.jpg





    Having array drives attached via USB is not ideal. It will work when everything is healthy, but error handling and recovery are subpar, and SMART reporting can be problematic.

    Link to comment
    43 minutes ago, eschultz said:

    Can you please try upgrading to BIOS 1402 (released on 8/10/18) and retrying Unraid 6.6.0-rc1?

    Yessir, am working on that now.  Will follow up after that's done.  Actually thought I had this update but apparently downloaded it and didn't apply it.

    20 minutes ago, jonathanm said:

    Having array drives attached via USB is not ideal. It will work when everything is healthy, but error handling and recovery are subpar, and SMART reporting can be problematic.

    Thank you for your concern, but I'm well aware of this.  There is no way to fit them inside the chassis with the other hardware in there.  I actually was going to mention this in the original post, but figured it didn't move the conversation forward or have anything to do with the bug.  My apologies for the omission.

    Link to comment

    I was able to get a little farther this time, in that the front panel USB worked, meaning I had a functional keyboard.  I was able to get a USB key to mount, and copied off the contents of the /var/log directory, figuring that'd be the most useful.
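For anyone else stuck at the same point, the copy step can be scripted roughly as below. This is only a sketch: the function name `save_logs` and the mount point `/mnt/usbkey` are placeholders for illustration, not anything Unraid provides, and it assumes the key is already mounted.

```shell
# save_logs SRC DEST: copy a log directory into a timestamped folder under DEST.
# Hypothetical helper; both paths are assumptions chosen for illustration.
save_logs() {
    src=$1
    dest=$2
    target="$dest/logs-$(date +%Y%m%d-%H%M%S)"
    mkdir -p "$target" || return 1
    # cp -R rather than cp -a: FAT-formatted keys cannot store Unix ownership
    cp -R "$src/." "$target/" || return 1
    sync    # flush writes before the key is pulled
    echo "$target"
}

# Example (assumes the key is already mounted at /mnt/usbkey):
# save_logs /var/log /mnt/usbkey
```

The timestamped folder keeps logs from successive failed boots from overwriting each other.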

     

    I then tried booting with the UNRAID drive in this port, but unfortunately that failed in the exact same manner.  I've PM'd the full zip to eschultz and am posting the dmesg output here for all eyes.

    dmesg.txt

    Link to comment

    The xHCI controller itself appears to work, but it fails on the device side. Could this be an xHCI driver update issue in 6.6.0-rc1?

     

    [   30.917042] usb usb6-port4: attempt power cycle
    [   32.189079] usb 6-1: new SuperSpeed USB device number 5 using xhci_hcd
    [   32.202160] usb-storage 6-1:1.0: USB Mass Storage device detected
    [   32.202696] scsi host3: usb-storage 6-1:1.0
     

     

    [  102.443495] usb 1-5: USB disconnect, device number 2
    [  109.781040] usb 1-5: new high-speed USB device number 5 using xhci_hcd
    [  110.008870] usb-storage 1-5:1.0: USB Mass Storage device detected
     

    01:00.0 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] X399 Series Chipset USB 3.1 xHCI Controller [1022:43ba] (rev 02)
        Subsystem: ASMedia Technology Inc. Device [1b21:1142]
        Kernel driver in use: xhci_hcd

     

    06:00.0 USB controller [0c03]: ASMedia Technology Inc. Device [1b21:2142]
        Subsystem: ASUSTeK Computer Inc. Device [1043:8756]
        Kernel driver in use: xhci_hcd

     

    You have several xHCI controllers in your system. If I am correct, TR / Ryzen should also have a USB controller that comes from the CPU (though the manufacturer may not use it). Do all of them have the same problem? I suggest trying a different USB port too.
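To compare controllers between systems, a listing like the one above can be pulled out of `lspci` with something along these lines. Treat it as a sketch: `usb_controllers` is a made-up helper name, and it assumes `lspci -nnk` output in the layout quoted above (device line starting at column 0, detail lines indented).

```shell
# usb_controllers: read `lspci -nnk` output on stdin and keep only the
# USB controller entries together with their indented detail lines.
# Hypothetical helper name, for illustration only.
usb_controllers() {
    awk '
        # A line starting at column 0 begins a new device entry;
        # remember whether that entry is a USB controller.
        /^[^[:space:]]/ { show = /USB controller/ }
        # Print non-empty lines that belong to a USB controller entry.
        show && NF
    '
}

# Example:
# lspci -nnk | usb_controllers
```

Each entry that survives the filter shows which kernel driver is bound, which makes it easier to see whether every port in use hangs off the same controller.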

    Edited by Benson
    Link to comment

    Since you have a Zenith Extreme, we have the same board. Mine works and yours doesn't, so let's do a few experiments. Do you have a chassis USB 3.0 cable that can connect the onboard USB 3.0 header to external ports? It is #16 in the motherboard layout list in your manual. If you do, plug the Unraid USB drive into a port off of that. Further, if your chassis has USB 2.0 ports, you can use header #26 like I did.

    Link to comment
    9 hours ago, Benson said:

    01:00.0 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] X399 Series Chipset USB 3.1 xHCI Controller [1022:43ba] (rev 02)

        Subsystem: ASMedia Technology Inc. Device [1b21:1142]
        Kernel driver in use: xhci_hcd

     

    06:00.0 USB controller [0c03]: ASMedia Technology Inc. Device [1b21:2142]
        Subsystem: ASUSTeK Computer Inc. Device [1043:8756]
        Kernel driver in use: xhci_hcd

     

    You have several xHCI controllers in your system. If I am correct, TR / Ryzen should also have a USB controller that comes from the CPU (though the manufacturer may not use it). Do all of them have the same problem? I suggest trying a different USB port too.

    The ASMedia USB header requires a specific plug that I don't have available, so it is enabled but not currently used.

    The rest of the USBs (minus my Sonnet USB card, which is for VM use only and isolated at boot) I believe stem from the AMD chipset.  All available USBs seem to exhibit the problem.

     

     

    2 hours ago, Jerky_san said:

    Since you have a Zenith Extreme, we have the same board. Mine works and yours doesn't, so let's do a few experiments. Do you have a chassis USB 3.0 cable that can connect the onboard USB 3.0 header to external ports? It is #16 in the motherboard layout list in your manual. If you do, plug the Unraid USB drive into a port off of that. Further, if your chassis has USB 2.0 ports, you can use header #26 like I did.

    In my previous reply, I mentioned that the front panel USB works for mouse/keyboard with the new BIOS revision.  While this isn't terribly obvious (sorry about that), front panel connections are always via a motherboard header.  I haven't attempted to boot from my front panel USB 2.0 ports, but I'd argue that even if that does solve the problem, it's not a real solution: it won't work if you don't have USB 2.0 header ports available, and it's probably a symptom of a bigger problem.

     

    Also, what TR do you have and what board revision?  Use:

     dmidecode --type 2

    And you should get back something like mine:

    Manufacturer: ASUSTeK COMPUTER INC.
            Product Name: ROG ZENITH EXTREME
            Version: Rev 1.xx
            Serial Number: 170706217200585
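If you would rather not post the serial number, the output can be trimmed to the three fields that matter here. A rough sketch, assuming the `dmidecode --type 2` layout shown above; `board_info` is just an illustrative name.

```shell
# board_info: read `dmidecode --type 2` output on stdin and keep only the
# manufacturer, product name, and board version lines (serial number dropped).
# Hypothetical helper name, for illustration only.
board_info() {
    grep -E '^[[:space:]]*(Manufacturer|Product Name|Version):' |
        sed 's/^[[:space:]]*//'    # strip the leading indentation
}

# Example (dmidecode normally needs root):
# dmidecode --type 2 | board_info
```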

     

    Edited by Rhynri
    Link to comment

    I have a 2990WX

     

    Base Board Information
            Manufacturer: ASUSTeK COMPUTER INC.
            Product Name: ROG ZENITH EXTREME
            Version: Rev 1.xx

     

    The only BIOS changes I made were turning on "Enumerate all IOMMU in IVRS" and SVM. Also, I used to use my USB 3.0 ports, but I had to stop because when I passed through my controller it passed two USB controllers instead of one. The onboard one and the 3.1 in the back are the only two that don't pass when I do the passing command, even though I only told Unraid to grab one of the IOMMU groups and not both. It was a very strange thing. Also, if I turned off my wireless/Bluetooth, that USB controller no longer passes.

    Link to comment

    I have a 1950.  I’m using all the back USB ports for the Unraid OS itself, and then I pass individual controllers off a Sonnet card for the VMs.  I have the wireless and Bluetooth disabled because it cleans up the passthrough for the rest, although I’d love for Unraid to be able to use the wireless for additional network redundancy.

    In the BIOS I have NUMA set up, plus additional tweaks to the PCIe setup because I’m splitting the bottom slot between the Sonnet card and a U.2.

    Edited by Rhynri
    Link to comment

    Hello!  The new RC2 now boots correctly.  Thanks!  I've run into some other weirdness but it's not the same issue so I've closed this one.

    Edited by Rhynri
    Link to comment



