• HP Server - keyboard/mouse/usb & critical BIOS failure


    SliMat
    • Annoyance

    OK, quite a lengthy summary here of a problem which has been going on for over a year; today I am confident that I have got to the bottom of it.

     

    I have previously posted several threads which provide a lot of details for each occasion;

     

     

     

    But in summary... I have had serious reliability issues with all versions of UnRAID after 6.5.0.

     

    The servers concerned are 2 x HP MicroServer G8s, 2 x HP DL380p G8s and, as of today, 1 x ML30 G9. Yes... I have replaced two servers thinking it was a hardware issue 😞

     

    The machines are configured to boot in legacy mode and the USB key with UnRAID is installed on the internal USB port. However, when the machine boots it sometimes shows a critical BIOS error in the HP ILO, and once this error shows you have no keyboard/mouse control in the ILO remote console screen - so all you can do is force the machine to reboot by using the hold power option in the ILO. Then, in most cases, after the machine boots it fails to recognise the USB key in the internal USB port, and the only way I'd found to boot the machine back into UnRAID was to move the USB key to an external port and reboot.

     

    As I had never managed to sort this out, I bought a new ML30 G9 over Christmas and built it today with a trial key using v6.8.1. As soon as it booted I selected the 'UnRAID with GUI' option and it booted into UnRAID. Immediately the server showed a critical PCI-Express error and the keyboard/mouse wouldn't work from the ILO remote console... So I logged into UnRAID from the LAN GUI (which worked OK) and shut down the machine. It wouldn't shut down, so I had to force it with a 'hold power button' from ILO. When the machine was powered down it still showed the critical BIOS PCI-Express error, so I removed the power cord and pressed the power button to discharge everything. When I reconnected power and got on the ILO, the error had cleared.

     

    I then rebooted the machine and forgot to change to the 'UnRAID with GUI' boot option, which I always have as default so I can get into UnRAID from ILO. I noticed the error didn't appear, so I rebooted and chose the 'UnRAID with GUI' option and immediately the critical error flashed up!

     

    I have a friend who runs UnRAID 6.8.0 on another Microserver G8 which is completely remote from me. He switched his to 'UnRAID with GUI' mode and also found he had lost ILO keyboard/mouse - so he rebooted and it's now stuck on a 'boot device not found' error, as it seems it can't recognise the internal USB port at the moment. When he gets home he will cold boot it and I know it will be fine again!

     

    I have attached two diagnostics .zip files from my ML30 G9 today - one was generated when the machine DIDN'T have the critical BIOS error and one when it was in critical condition - in case they are different! The one generated at 11:46 is in normal condition and the 11:55 one is in critical condition.

     

    So far I have replaced my Microserver with another Microserver and then bought this ML30 G9, as I thought it was a compatibility issue with the Microserver, and I replaced my DL380 G8 in the datacentre as I thought there was a problem with the original machine... So all in all it's been quite costly for me to get to this point, and it would be nice to know that this IS the cause and whether it can be fixed.

     

    I have the virgin ML30 G9 with a trial key and no data or config - so if you want anything done on this to try and find the cause please let me know.

     

    Hope to hear back soon!

     

    hector2-diagnostics-20200117-1146.zip hector2-diagnostics-20200117-1155.zip
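
    To see whether the "normal" and "critical" diagnostics actually differ, the two archives can be compared file by file. The following is only a minimal sketch: it assumes each zip wraps its files in a single top-level folder (stripped here so the 11:46 and 11:55 captures line up), and the function names are made up for illustration. Files whose own names contain a timestamp would simply be listed as present in only one archive.

```python
import sys
import zipfile
from pathlib import PurePosixPath


def member_crcs(zip_path, strip_top=True):
    """Map each file path inside the archive to its stored CRC-32."""
    crcs = {}
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            parts = PurePosixPath(info.filename).parts
            # Assumption: the first path component is a per-capture folder name.
            if strip_top and len(parts) > 1:
                parts = parts[1:]
            crcs["/".join(parts)] = info.CRC
    return crcs


def diff_diagnostics(zip_a, zip_b):
    """Return (only_in_a, only_in_b, changed) as sorted lists of paths."""
    a, b = member_crcs(zip_a), member_crcs(zip_b)
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    changed = sorted(name for name in set(a) & set(b) if a[name] != b[name])
    return only_a, only_b, changed


if __name__ == "__main__" and len(sys.argv) == 3:
    only_a, only_b, changed = diff_diagnostics(sys.argv[1], sys.argv[2])
    for title, names in (("only in first", only_a),
                         ("only in second", only_b),
                         ("changed", changed)):
        print(f"{title}:")
        for name in names:
            print(f"  {name}")
```

    Run it with the two attachment names as arguments; the "changed" list is where to start looking for whatever the critical state logged differently.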




    User Feedback

    Recommended Comments

    Another update - in case anyone is locked out of their remote HP server saying "No Boot Device Found"... I have tested and successfully recovered from this on a remote server by using the "reset ILO" feature then power cycling the machine and from ILO remote console make sure "UnRAID with no GUI" is selected from the blue option screen - then it boots, recognises USB key and you can get control of the server again.

     

    REMEMBER TO CHANGE DEFAULT BOOT MODE TO "WITHOUT GUI" IF YOU HAVE CHANGED IT TO BOOT WITH GUI MODE (AS I HAD!!)
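
    For anyone who would rather script that change than click through the menu: the default boot entry is normally the label carrying a `menu default` line in the flash drive's syslinux configuration. The sketch below is a hedged illustration only. It assumes the stock label names "Unraid OS" and "Unraid OS GUI Mode" and that the file lives at /boot/syslinux/syslinux.cfg on a running server; check your own file first, as labels may differ between releases. The helper name and the sample text are made up for the example.

```python
# Assumed sample of a syslinux.cfg where GUI mode is currently the default.
SAMPLE = """default menu.c32
prompt 0
timeout 50
label Unraid OS
  kernel /bzimage
  append initrd=/bzroot
label Unraid OS GUI Mode
  menu default
  kernel /bzimage
  append initrd=/bzroot,/bzroot-gui
"""


def set_default_label(cfg_text, label):
    """Return cfg_text with 'menu default' moved under the given label."""
    # Drop every existing 'menu default' marker first.
    lines = [ln for ln in cfg_text.splitlines()
             if ln.strip().lower() != "menu default"]
    out = []
    found = False
    for ln in lines:
        out.append(ln)
        # Exact match, so "Unraid OS" does not also match "Unraid OS GUI Mode".
        if ln.strip() == f"label {label}":
            out.append("  menu default")
            found = True
    if not found:
        raise ValueError(f"label {label!r} not found in configuration")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    print(set_default_label(SAMPLE, "Unraid OS"))
```

    Keep a copy of the original file before writing anything back to the flash drive; an unknown label raises an error rather than silently leaving the server with no default entry.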


    Respectfully, while I agree that it's urgent in the sense that there is something wrong that needs to be addressed, there is a valid workaround in place to run unraid without triggering the issue, and it only affects a small subset of hardware. GUI mode just doesn't work properly on some systems. It's been that way since it was introduced.

     

    I don't think this deserves the urgent tag, which implies a showstopper issue for general usage in a majority of hardware with no workaround.

    1 minute ago, jonathanm said:

    Respectfully, while I agree that it's urgent in the sense that there is something wrong that needs to be addressed, there is a valid workaround in place to run unraid without triggering the issue, and it only affects a small subset of hardware. GUI mode just doesn't work properly on some systems. It's been that way since it was introduced.

     

    I don't think this deserves the urgent tag, which implies a showstopper issue for general usage in a majority of hardware with no workaround.

    Hi Jonathanm... I flagged it as urgent because it renders the machine unable to boot from the internal USB port, saying that there is no bootable device. When my DL380 did this in the datacenter over a year ago I had a 160 mile round trip on a Saturday to recover it - then another 160 mile round trip to re-install it, plus I paid for a replacement server. So, as it can render the machine unusable, I deemed it to be urgent... if you think it's a minor inconvenience I will change the flag!

    7 minutes ago, jonathanm said:

    Respectfully, while I agree that it's urgent in the sense that there is something wrong that needs to be addressed, there is a valid workaround in place to run unraid without triggering the issue, and it only affects a small subset of hardware. GUI mode just doesn't work properly on some systems. It's been that way since it was introduced.

     

    I don't think this deserves the urgent tag, which implies a showstopper issue for general usage in a majority of hardware with no workaround.

    The 'workaround' was only discovered a few minutes ago... but I have changed to "annoyance" if it's not deemed important that people's machines can be left unusable 😐

     

     

    Edited by SliMat
    typo
    17 minutes ago, SliMat said:

    but I have changed to "annoyance" if it's not deemed important that people's machines can be left unusable

    I think the 'urgent' category is meant to be used for 'drop everything else until this is fixed' type errors and ones that can cause data loss?  I would only have downgraded this one to 'Minor' rather than all the way to 'Annoyance', but that is just my view.

    6 minutes ago, itimpi said:

    I think the 'urgent' category is meant to be used for 'drop everything else until this is fixed' type errors and ones that can cause data loss?  I would only have downgraded this one to 'Minor' rather than all the way to 'Annoyance', but that is just my view.

    Urgent seemed appropriate as Urgent shows "Server crash, data loss, or other showstopper" and this problem causes servers to crash and is a show stopper because it introduces unreliability... granted the tests I've done today do mean that it is possible to recover a 'crashed' machine remotely - just a shame I didn't work this out a year, or so, ago when it caused me a major outage and I am still running 6.5.0 in the datacentre as I couldn't trust any later versions 😞

     

    Not being a programmer, developer or engineer, I am still not confident enough to upgrade my DL380 remotely until I can book a slot in the DC, so that I can pop in if a remote upgrade fails. Some of the information on this machine is mission-critical to my business, and if it goes down again it's a pain switching data across to a backup machine!

     

    OK... we'll meet in the middle - "Minor" it is!

     

    ;-)

    32 minutes ago, SliMat said:

    The 'workaround' was only discovered a few minutes ago... but I have changed to "annoyance" if it's not deemed important that people's machines can be left unusable 😐

     

     

    It is important. I'm not saying that it isn't.

     

    It's just that the urgent tag triggers a bunch of immediate attention, which isn't necessarily productive in this specific instance. Better to put it in the queue of important things to try to fix, instead of in the "emergency we better find a solution before thousands of people corrupt their data" category, only to find out that it's not that big of a deal for 99% of the user base.

     

    Screaming for attention for something that in the grand scheme isn't a show stopper may cause the issue to get pushed down farther than it deserves to be, as an overreaction to the initial panic.

     

    Politely asking for help resolving it goes a lot further than pushing the panic button.

    22 minutes ago, SliMat said:

    Urgent seemed appropriate as Urgent shows "Server crash, data loss, or other showstopper"

    You need to see this in the grand scheme of things.

    I understand it is urgent to you, but the problem is limited to your specific hardware.

    Anything marked "urgent" requires immediate attention because it potentially impacts a large portion of users, which isn't the case here.

     

    A good start for situations like this is "minor", and since there is a workaround now, "annoyance" is appropriate.

    15 minutes ago, bonienl said:

    ...the problem is limited to your specific hardware.

    Anything marked "urgent" requires immediate attention because it potentially impacts a large portion of users, which isn't the case here.

    Fair enough - downgraded as far as I can. I just thought as HP is one of the biggest server and PC manufacturers there may be a fair few people running their kit... Sorry for the trouble. I'll just pop back occasionally to see if a fix is released.

     

    Thanks.

    1 minute ago, SliMat said:

    I'll just pop back occasionally to see if a fix is released

    That's alright. "Annoyance" doesn't mean there is no follow-up; it will get addressed at some point.

    1 hour ago, sota said:

    can you post up the BIOS and iLO versions that are installed on the affected machines?

     

    This has been causing me major problems for over a year, so I have updated them all to the latest versions, but I don't have all the details of the previous versions...

     

    HP Microserver/s:

    Current system ROM = J06 05/21/2018 (previous system ROM = J06 11/02/2015)

    Current ILO = ILO4 - ILO Firmware 2.70 May 07 2019 (can't remember the previous version - but it was the latest version without HTML5 console)

     

    DL380s (G8):

    Current System ROM = P70 05/21/2018 (previous system ROM = P70 07/01/2015)

    Current ILO = ILO4 - ILO Firmware 2.70 May 07 2019 (can't remember the previous version - but it was the latest version without HTML5 console)

     

    I don't have the ML30 (G9) details here but can get them tomorrow.

     

    Perhaps @matto2494 can confirm his ILO/ROM versions too?

    Edited by SliMat

    My DL380 G8...

    System ROM P73 05/24/2019

    Backup System ROM P73 08/02/2014

    iLO Firmware Version 2.70 May 07 2019

     

    I haven't had any of the issues you've described whatsoever.

    I did upgrade the ROMs on mine shortly after I got it, in fact before I even installed unRAID on it.





  • Status Definitions

     

    Open = Under consideration.

     

    Solved = The issue has been resolved.

     

    Solved version = The issue has been resolved in the indicated release version.

     

    Closed = Feedback or opinion better posted on our forum for discussion. Also for reports we cannot reproduce or need more information. In this case just add a comment and we will review it again.

     

    Retest = Please retest in latest release.


    Priority Definitions

     

    Minor = Something not working correctly.

     

    Urgent = Server crash, data loss, or other showstopper.

     

    Annoyance = Doesn't affect functionality but should be fixed.

     

    Other = Announcement or other non-issue.