• 6.9.1 network card instability


    shergar
    • Solved Urgent

    Since upgrading from beta 25 to 6.9.0 (now 6.9.1), my HP NC523SFP network card has been continually dropping out. Initially there appeared to be no rhyme or reason to it, but now it consistently drops approximately 30 minutes after I restart my server. The card works fine in the meantime. Unfortunately, this has rendered my server useless in its primary role. It appears that the card is timing out for some reason and cannot be restarted. Any ideas?

    tower-diagnostics-20210311-1344.zip





    Is anyone able to tell if the new Linux kernel has deprecated support for my card? I have seen a number of issues reported as a result of upgrading to rc2. Is that when the kernel was updated? I'm literally at a loss with this, and the only solution appears to be replacing my card with a newer Mellanox model. I have had to revert my network to the onboard 1 Gbit NIC, as it's the only way I can reliably maintain functionality of my Plex server. It's been rock solid since.

    Link to comment

     

    I don't know if you still have your issue at the moment, but I encountered a similar issue on 6.8.x with my R720 onboard NIC: it would just drop the eth link, and no amount of pounding Ethernet cables into RJ45 ports would fix it.
    I would have to restart the server several times to get what I would call a "good boot", where it doesn't lose connection within the first 30 minutes to 1 hour of uptime.

     

    I ended up splurging on a PCIe Base-T NIC, which has held up flawlessly since.
    Then, since 6.8.3, and worse since 6.9, it's my OTHER PCIe NIC, an SFP+ one, that has started to have this issue.

     

    And know what? NO one, in 10 months, cared enough to take a look at it.

     

    A deal-breaking issue for a server, randomly losing connection, and no one gives a foo about it.

    So my call would be for you to take a loss on a PCIe slot and half a hundred quid or more for a PCIe NIC, or to switch to a server distro other than Unraid 6.9, because at this rate it ain't gonna be fixed before Unraid 42.0.

     


     

    Link to comment
    On 3/13/2021 at 12:16 AM, shergar said:

    Is anyone able to tell if the new Linux kernel has deprecated support for my card?

    It is still supported.

    Mar 11 13:39:48 Tower kernel: QLogic 1/10 GbE Converged/Intelligent Ethernet Driver v5.3.66
    Mar 11 13:39:48 Tower kernel: qlcnic 0000:0a:00.0: 2048KB memory map
    Mar 11 13:39:48 Tower kernel: qlcnic 0000:0a:00.0: Default minidump capture mask 0x1f
    Mar 11 13:39:48 Tower kernel: qlcnic 0000:0a:00.0: FW dump enabled
    Mar 11 13:39:48 Tower kernel: qlcnic 0000:0a:00.0: Supports FW dump capability
    Mar 11 13:39:48 Tower kernel: qlcnic 0000:0a:00.0: Driver v5.3.66, firmware v4.20.1
    Mar 11 13:39:48 Tower kernel: qlcnic: 2c:59:e5:7c:64:80: NC523SFP 10Gb 2-port Server Adapter Board Chip rev 0x54
    Mar 11 13:39:48 Tower kernel: qlcnic 0000:0a:00.0: using msi-x interrupts
    Mar 11 13:39:48 Tower kernel: qlcnic 0000:0a:00.0: eth0: XGbE port initialized

     

    You may want to check the HP site to see whether there are known issues with this specific driver version.
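
    To compare against a vendor's known-issues list, the driver and firmware versions can be pulled out of the log text. The following is a minimal sketch, assuming the version line always has the format shown in the log above; the function name and the sample string are illustrative only.

```python
import re

# Pattern for the qlcnic version line as it appears in the log above, e.g.
# "qlcnic 0000:0a:00.0: Driver v5.3.66, firmware v4.20.1"
# Assumption: other kernel versions print the line in the same format.
VERSION_RE = re.compile(
    r"qlcnic (?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"Driver v(?P<driver>[\d.]+), firmware v(?P<firmware>[\d.]+)"
)


def qlcnic_versions(log_text):
    """Return a list of (pci_address, driver, firmware) tuples found in log_text."""
    return [
        (m.group("pci"), m.group("driver"), m.group("firmware"))
        for m in VERSION_RE.finditer(log_text)
    ]


# Two lines copied from the log above, used as sample input.
sample = (
    "Mar 11 13:39:48 Tower kernel: qlcnic 0000:0a:00.0: FW dump enabled\n"
    "Mar 11 13:39:48 Tower kernel: qlcnic 0000:0a:00.0: Driver v5.3.66, firmware v4.20.1\n"
)

for pci, driver, firmware in qlcnic_versions(sample):
    print(f"{pci}: driver {driver}, firmware {firmware}")
```

    On a real system the input would be the syslog text, for example from the diagnostics zip, rather than a hard-coded string.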

     

    Link to comment

    I think I may have stumbled on something whilst playing about with my new CPU. Whilst looking through the System Temp plugin, I noticed that my qlc NIC temp was 104 deg C. I have never seen this in System Temp before (although I had never looked), but I'm sure that, for some reason, the update has now made that a thing and is disabling the card after a while due to over-temperature. I have read that this will only reset on restart, which also lines up with what I am seeing. I have made no hardware changes with the update from beta to stable, so I can only deduce that a driver update within the kernel is causing this. Anyways, I have a 40mm fan on the way to go directly on the NIC, in addition to the 2 80mm fans I have cooling the general PCIe slots area of my server. I will keep you posted on my results.
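
    For anyone who wants to check the same reading without the plugin: Linux exposes temperature sensors under /sys/class/hwmon, with values in millidegrees Celsius. Below is a rough sketch that lists them and flags hot ones. It assumes the NIC driver registers a hwmon sensor on your kernel, which may not be the case, and the 90 C limit is an arbitrary example value, not a figure from HP or QLogic.

```python
from pathlib import Path


def read_hwmon_temps(root="/sys/class/hwmon"):
    """Yield (chip_name, sensor_file, degrees_c) for every temp*_input under root."""
    for chip in sorted(Path(root).glob("hwmon*")):
        name_file = chip / "name"
        # Fall back to the directory name if the chip has no "name" file.
        name = name_file.read_text().strip() if name_file.exists() else chip.name
        for sensor in sorted(chip.glob("temp*_input")):
            try:
                millideg = int(sensor.read_text().strip())
            except (OSError, ValueError):
                continue  # sensor unreadable or empty; skip it
            yield name, sensor.name, millideg / 1000.0


def hot_sensors(root="/sys/class/hwmon", limit_c=90.0):
    """Return readings at or above limit_c (an arbitrary example threshold)."""
    return [r for r in read_hwmon_temps(root) if r[2] >= limit_c]


if __name__ == "__main__":
    for name, sensor, temp in read_hwmon_temps():
        flag = "  <-- hot" if temp >= 90.0 else ""
        print(f"{name} {sensor}: {temp:.1f} C{flag}")
```

    Running it shortly after boot and again after 20 to 30 minutes would show whether the NIC reading climbs before the link drops.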

    Link to comment

    Update: the new fan is zip-tied to my NIC. The temp has dropped from 104 deg C at idle to a very acceptable 64 deg C under load. No dropped connections as yet, with about an hour and a half of uptime. Looks promising. One small observation though: I'm maxing out at about 900MB/s, whereas prior to the update I was running a full 1.04GB/s. A win is a win though.
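
    To put the throughput figures in perspective, here is a quick back-of-the-envelope calculation. It assumes decimal units (1 GB/s = 1000 MB/s) and compares against the raw 10 Gbit/s line rate of the card, ignoring protocol overhead.

```python
# Figures quoted in the post above; decimal units are an assumption.
before_mb_s = 1.04 * 1000      # 1.04 GB/s expressed in MB/s
after_mb_s = 900.0             # throughput seen after the update

drop_mb_s = before_mb_s - after_mb_s
drop_pct = 100.0 * drop_mb_s / before_mb_s

# Raw line rate of a 10 Gbit/s link in MB/s, ignoring protocol overhead.
line_rate_mb_s = 10_000 / 8

print(f"drop: {drop_mb_s:.0f} MB/s ({drop_pct:.1f}% of the earlier figure)")
print(f"share of line rate before: {before_mb_s / line_rate_mb_s:.0%}")
print(f"share of line rate after:  {after_mb_s / line_rate_mb_s:.0%}")
```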

    Link to comment
    On 4/10/2021 at 7:43 AM, shergar said:

     Anyways I have a 40mm fan on the way to go directly on the nic, In addition to the 2 80mm fans I have cooling the general pcie slots area of my server. I will keep you posted on my results

    I'd like to ask about your general setup (server frame and size/layout). I have a Chenbro NR12000 (one of those 12-disk 1U servers), and the HP 10G NC523SFP card in mine is also losing connectivity. I was thinking of getting one of the 40mm fans you mentioned from Amazon to try to help it out, but with not much room in there, I don't know how you have yours set up to work decently.

    Link to comment
    16 minutes ago, Eugene D said:

    I'd like to ask your general setup

    This is a solved thread for an Unraid release. You might get a better response if you start a new thread in an appropriate subforum, or maybe just message the person you are replying to.

    Link to comment



