• 6.10-RC1 Unable to access WebUI post update on two servers


    DieFalse
    • Minor

    Upgraded to 6.10-RC1 to try to resolve a kernel issue (call traces), and now I can only access SSH, not the WebUI. The WebUI times out regardless of whether I use https://ip, http://ip, the FQDN, or hash.unraid.net.

     

    Notes:

    While I can SSH in, running anything freezes the session indefinitely unless it is a basic command that doesn't draw the screen (ls, ps, etc. work; top, htop, and mc freeze).

    I can see the Samba and NFS shares, but trying to browse into any of them freezes the session.

    It seems the server is overloaded. I was able to run "docker stop $(docker ps -q)", which worked, but I still can't do anything useful. I also ran "/etc/rc.d/rc.libvirt stop" and the server is still hung; I am currently waiting to see if "fuser -mv /mnt/disk* /mnt/user/*" returns anything, as it is hung as well.
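
    Since anything that touches the array or draws a full screen hangs, a low-impact way to see what is stuck (a generic Linux sketch, not Unraid-specific) is to check the load and look for tasks in uninterruptible (D) sleep:

      # Load averages without needing top/htop
      cat /proc/loadavg

      # List processes stuck in uninterruptible (D) sleep, typically blocked on I/O
      ps -eo state,pid,wchan,cmd | awk '$1 == "D"'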

     

    netstat shows nginx listening on the expected ports:

     

    netstat -tulpn | grep LISTEN
    tcp        0      0 0.0.0.0:111             0.0.0.0:*               LISTEN      7642/rpcbind
    tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN      47166/nginx: master
    tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      7447/sshd: /usr/sbi
    tcp        0      0 0.0.0.0:443             0.0.0.0:*               LISTEN      47166/nginx: master
    tcp        0      0 0.0.0.0:41787           0.0.0.0:*               LISTEN      7646/rpc.statd
    tcp6       0      0 :::111                  :::*                    LISTEN      7642/rpcbind
    tcp6       0      0 :::80                   :::*                    LISTEN      47166/nginx: master
    tcp6       0      0 :::22                   :::*                    LISTEN      7447/sshd: /usr/sbi
    tcp6       0      0 :::36151                :::*                    LISTEN      7646/rpc.statd
    tcp6       0      0 :::443                  :::*                    LISTEN      47166/nginx: master
     

    /etc/rc.d/rc.nginx reports nginx as running, and restarting it makes no difference.

     

    curl localhost returns a 302, so I am assuming this has to do with the DNS redirect that Unraid does to the hash URL? (Is there any way to disable that? I use my own DNS and hostnames with valid certs anyway, so it would be easy to configure.)
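
    For what it's worth, a quick way to see where that 302 actually points (a generic curl sketch, nothing Unraid-specific) is to inspect the response headers:

      # Show only the response headers; the Location header reveals the redirect target
      curl -sI http://localhost

      # Follow the redirect (ignoring certificate mismatches) and report the final URL and status code
      curl -skL -o /dev/null -w '%{url_effective} -> %{http_code}\n' http://localhost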

     

    I had to move the contents of /boot/previous back to /boot to restore access and control.
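
    For anyone else needing to do the same, the downgrade essentially amounts to putting the prior release's boot files back on the flash root (a minimal sketch, assuming the standard layout where the upgrade leaves the previous release's bz* files in /boot/previous):

      # Copy the prior release's kernel/rootfs images back to the flash root, then reboot
      cp /boot/previous/bz* /boot/
      reboot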

     

    I was unable to pull diagnostics as every method timed out, and pulling the USB was not possible due to physical access limitations until I restored. I did collect two sets during the issue, though, and have attached them.

     

    Diagnostics are attached from both servers:

    Arcanine = AMD Threadripper on an X399D8A-2T

    GSA = Intel Xeon on a Dell PowerEdge R7220XD

    The common factor is that the 10GbE SFP+ card is the same in both units.

    arcanine-diagnostics-20210819-1207.zip gsa-diagnostics-20210819-0946.zip gsa-diagnostics-20210819-1005.zip syslog.zip





    Recommended Comments

    > curl localhost returns a 302, so I am assuming this has to do with the DNS redirect that Unraid does to the hash URL? (Is there any way to disable that? I use my own DNS and hostnames with valid certs anyway, so it would be easy to configure.)

     

    ssh into the server and type:

      use_ssl no

    This is the equivalent of going to Settings -> Management Access and setting Use SSL to No.  You will then be able to access the webgui using:

      http://ipaddress  (note: http not https)

    Link to comment
    34 minutes ago, ljm42 said:

    > ssh into the server and type: use_ssl no [...]

     

    Thank you - I will try that to see if I can get into the WebUI, but the hangs and the inability to do anything even over SSH are worrisome. Is there anything useful in the diagnostics?

    Link to comment

    It appears that your bz* files have been modified; are you running a custom kernel? Only the stock kernel is supported.

    Link to comment

    No, I have never had a non-stock kernel on GSA. Arcanine used to, for Mellanox support and something else, but it went back to stock over three releases ago.

     

    How do I restore .bz to stock if it's modified?

    Link to comment
    18 hours ago, fmp4m said:

    > How do I restore .bz to stock if it's modified?

     

    arcanine has the Unraid-Kernel-Helper.plg plugin installed, and the bzfirmware file is 20MB whereas the stock 6.10.0-rc1 file is 10MB.
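
    For reference, the on-flash file sizes can be checked with a plain directory listing (nothing Unraid-specific here):

      # List the boot images on the flash drive with human-readable sizes
      ls -lh /boot/bz*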

     

    Does that plugin have an option to return to stock? If so run that and then uninstall the plugin.

     

    Unfortunately, nothing else is really standing out to me. Hopefully somebody else will see something.

    Link to comment
    1 minute ago, ljm42 said:

    > Does that plugin have an option to return to stock? If so run that and then uninstall the plugin. [...]

     

    That plugin has been removed; the behaviour exists on both GSA and Arcanine. There was no "revert" option, so I assume that if Arcanine is the only one with bad .bz files, I will need to do something to revert it?

     

    I thought that running the upgrade installs the latest .bz files, so something would have had to modify them post-upgrade?

    Link to comment

    gsa has the same plugin installed, so I'd remove that and restore it to stock as well.

     

    gsa has a bunch of btrfs mentions in the log; not sure if that is an issue?

    Aug 18 21:45:17 GSA kernel: BTRFS info (device sdb1): relocating block group 43333096308736 flags data
    Aug 18 21:45:22 GSA kernel: BTRFS info (device sdb1): found 9 extents, stage: move data extents
    Aug 18 21:45:22 GSA kernel: BTRFS info (device sdb1): found 9 extents, stage: update data pointers
    Aug 18 21:45:23 GSA kernel: BTRFS info (device sdb1): relocating block group 43332022566912 flags data
    Aug 18 21:45:28 GSA kernel: BTRFS info (device sdb1): found 8 extents, stage: move data extents
    Aug 18 21:45:28 GSA kernel: BTRFS info (device sdb1): found 8 extents, stage: update data pointers
    Aug 18 21:45:29 GSA kernel: BTRFS info (device sdb1): relocating block group 43330948825088 flags data
    Aug 18 21:45:33 GSA kernel: BTRFS info (device sdb1): found 9 extents, stage: move data extents
    Aug 18 21:45:34 GSA kernel: BTRFS info (device sdb1): found 9 extents, stage: update data pointers
    Aug 18 21:45:35 GSA kernel: BTRFS info (device sdb1): relocating block group 43329875083264 flags data
    Aug 18 21:45:40 GSA kernel: BTRFS info (device sdb1): found 8 extents, stage: move data extents
    Aug 18 21:45:41 GSA kernel: BTRFS info (device sdb1): found 8 extents, stage: update data pointers
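
    Those messages usually come from a btrfs balance. Whether one is still running can be checked from the shell (the /mnt/cache mount point below is an assumption; substitute the pool that sdb1 belongs to):

      # Show whether a balance is in progress on the pool
      btrfs balance status /mnt/cache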
    
    

     

    Link to comment
    15 minutes ago, ljm42 said:

    > gsa has the same plugin installed, so I'd remove that and restore it to stock as well. [...]

     

    I will restore per the instructions this evening. The btrfs issues were resolved in another thread; the call traces on previous versions are what led me to try 6.10 and, in turn, to file this report. I'll advise when both are 100% stock and then upgrade to 6.10-RC1 again to see if any difference is noticed.

    Link to comment
    17 hours ago, fmp4m said:

    > I will restore per the instructions this evening. [...]

     

    Ok, Arcanine is on 6.10rc1 with no issues as of this morning.

     

    GSA I could not get into any way other than SSH, as previously mentioned, so it still has the issues originally reported, except that loading the WebUI now returns error 500 instead of a 302.

     

    I used use_ssl no and got into the WebUI; call traces are still happening heavily.

     

    The kernel files should all be stock now.

     

    Since I had previously downgraded, I manually copied the 6.9.2 files to the flash root, rebooted, then upgraded to 6.10-RC1 through the UI and verified the bz files matched the downloaded zip for 6.10-RC1.
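
    For anyone repeating this, one way to verify is to compare checksums of the on-flash files against the ones in the release zip (a minimal sketch; the extracted-zip path below is only an example):

      # Checksums of the boot files currently on the flash drive
      md5sum /boot/bz*

      # Checksums of the same files from the extracted stock release zip
      md5sum /mnt/user/downloads/unRAIDServer-6.10.0-rc1/bz*

      # Any mismatch means the on-flash copy is not stock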

    gsa-diagnostics-20210822-1057.zip

    Link to comment

    I continue to have problems with GSA and Arcanine. Arcanine became completely unresponsive last night. GSA is SSH'able, but nothing will run correctly.

     

    Neither is usable in its current state, so I will be forced to revert to 6.9.2 soon to bring them back online. I would like to give all the information I can to help resolve this; please let me know what to provide. I can even provide remote access if you want to PM me.

    Link to comment

    Tracking this down with Discord assistance, it was determined that the servers were blocking/routing incorrectly due to jumbo frames on the NICs. When the MTU on the NICs is higher than 1500, no web access across the network is possible (yes, jumbo frames are correctly configured on the switches and router, and jumbo frames worked on <6.9.2). When the MTU is set to 1500, the WebUI loads correctly.
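
    For anyone hitting the same thing, the MTU can be checked and temporarily dropped back to 1500 from the shell (the interface name br0 below is an assumption; on Unraid the persistent setting lives under Settings -> Network Settings):

      # Show the current MTU on all interfaces (brief output)
      ip -br link

      # Temporarily set a specific interface back to the standard MTU
      ip link set dev br0 mtu 1500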

    Link to comment



