ML30 (General HP Error) - UnRAID causes BIOS Critical Failure


SliMat


Hi all

 

This is a problem I have tried to sort out before without success (despite great help from @1812), but there has been a [bad] development today. Because I was convinced there might be a hardware issue with my HP Microserver G8, over Christmas I bought a new ML30 with the intention of replacing the Microserver... but the issue appeared on the new machine instantly. Here is an overview, and I really hope someone can help. I am even happy to let someone log in remotely to the server and experiment, as it's a fresh machine!

 

So, I have several HP servers running UnRAID (various versions) and they ALL have the same problem... so I am really hoping someone can help me get to the bottom of this.

 

The underlying problem seems to be that UnRAID causes a PCI Bus Error - reports as: "Uncorrectable PCI Express Error" in the Integrated Management Log.

 

I have had this issue with my DL380 (G8) in a datacenter, which is currently running 6.5.0 because that is the last version I have successfully installed without getting this error. Obviously, with it being in a DC, I can't pop in whenever I lose control of it!

 

I also have an HP Microserver (G8) at home which has the same issue and today I have installed a clean trial version of UnRAID v6.8.1 which has failed on the first boot!

 

This is what is happening on the brand new ML30 (G9), and it is essentially the same on all three HP models. When the server boots and I am connected to the ILO Remote Console, I have no keyboard/mouse, so I cannot get past the login screen. On the front of the server a red LED flashes, which indicates a fault with the machine. When I log in to the ILO console I see a System Health - Critical warning:

[Screenshot: ML30-critical.jpg]

 

When you check the Integrated Management Log, it reports an Uncorrectable PCI Express Error:

[Screenshot: ML30-IML.jpg]

 

I have plugged a keyboard and mouse into a USB port on the ML30 and I do get keyboard/mouse control in the UnRAID environment. But as this is a brand new installation with NO data or config done yet, this has to be a problem with UnRAID itself.

 

Just for completeness: the trial USB key was created using the UnRAID key creator and, as the server wouldn't boot into UnRAID in its default UEFI mode, I have switched it to Legacy boot. The onboard B140i controller is not configured and there are 4 x SATA hard disks installed. These are all recognised within UnRAID:

[Screenshot: ML30-disks.jpg]

[Screenshot: ML30-disks2.jpg]

 

As mentioned I have reported this several times with no cure, yet. Here is one of the earlier posts...

 

As mentioned, the ML30 G9 which I have just set up is an absolutely stock install with no config yet, so I am happy to try any advice regardless of possible data loss, as there is no data on this machine!

 

Hopefully someone will have some thoughts, as I can't believe I am the only one suffering with these problems. They have been going on for over a year now and it is forcing me to consider dropping UnRAID, as I can't get it working reliably!

 

@1812 has written lots of very useful posts about HP servers, but I can't find anything which fits this problem!

 

Thanks in advance


OK, a quick addendum... 

 

I tried to shut the stock ML30 G9 down from the UnRAID console and it counted up and then told me the machine was powered off...

[Screenshot: ML30-shutdown2.jpg]

 

But the machine is still powered on and the screen shows this...

[Screenshot: ML30-shutdown.jpg]

 

But it has failed to shut down!

 

So, I will have to force it down 😞


Addendum 2:

 

I just forced the machine to power off and the 'critical error' is still showing in the ILO (I have seen this before). So I removed the power and pressed the power button to drain any residual voltage in the system, then reconnected the power, and all is back to normal on the ML30 G9 (at least until I try booting UnRAID again):

[Screenshot: ML30-reset.jpg]


************ I think I may have found the issue... but wouldn't know where to start fixing it... ************

 

I just noticed this... I inadvertently let UnRAID boot into its default non-GUI mode when I downloaded the diagnostics ZIP. After this the ML30 G9 didn't show a critical error, and when I shut it down from the web GUI over my LAN it shut down within a few seconds. So I rebooted it into GUI mode and immediately I had no keyboard/mouse in the remote console and the server was showing a critical error again. I have attached another log to this message, taken while the server is in the critical condition... hope this helps someone work this out.

 

I can confirm that all my servers (apart from this one) are set to boot into GUI mode by default... so I am about to change that and see if my Microserver boots OK ;-)
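For anyone who wants to make the same change directly on the flash drive, here is a minimal sketch of moving the default boot entry away from GUI mode. It assumes the boot menu is a Syslinux config (typically syslinux/syslinux.cfg on the Unraid flash drive) in which the default entry is the `label` stanza containing a `menu default` line. The sample config text and the function name `set_default_label` are illustrative only and are not taken from this thread, so check your own file before changing anything.

```python
# Hypothetical sample of a Syslinux boot menu with GUI mode as the default.
# Real files contain more entries; this is only for illustration.
SAMPLE_CFG = "\n".join([
    "label Unraid OS",
    "  kernel /bzimage",
    "  append initrd=/bzroot",
    "label Unraid OS GUI Mode",
    "  menu default",
    "  kernel /bzimage",
    "  append initrd=/bzroot,/bzroot-gui",
]) + "\n"


def set_default_label(cfg_text: str, wanted_label: str) -> str:
    """Return cfg_text with 'menu default' moved under the given label."""
    out = []
    found = False
    for line in cfg_text.splitlines():
        stripped = line.strip()
        # Drop whichever entry is currently marked as the default.
        if stripped.lower() == "menu default":
            continue
        out.append(line)
        # Re-add the marker directly under the label we want to boot.
        if stripped.lower().startswith("label ") and stripped[6:].strip() == wanted_label:
            out.append("  menu default")
            found = True
    if not found:
        raise ValueError(f"label not found: {wanted_label!r}")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    print(set_default_label(SAMPLE_CFG, "Unraid OS"))
```

The function only transforms text, so it can be tried on a copy of the file first. Writing the result back to the flash drive is left out on purpose.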

 

hector2-diagnostics-20200117-1155.zip


Another BIG update...

 

Having worked out that it may be a GUI issue, I asked a friend who has a Microserver G8 to try booting his into GUI mode and guess what... he has also lost keyboard/mouse control in the ILO Remote Console AND he has lost the internal USB port, so his machine is now looping, saying there is no bootable device!!!

 

 

So this would definitely appear to be a problem with HP Servers when you boot UnRAID into GUI mode!!!

 

Back to you guys for help 🙂

 

44 minutes ago, testdasi said:

I would suggest raising a bug report with Limetech, attaching all the details, including diagnostics for them to have a look.

Will do, as I am confident that this is the issue. I have bought 2 Microservers, 2 DL380s and an ML30 in the last year trying to sort this out LOL

- thanks

(I'm reading through this pre-coffee.) All very interesting, because I run 2 ML30 G9s without issue but have never booted them into GUI mode. I will try it this weekend when one of them is not in use and try to confirm the behavior, though I'm sure it will occur. I vaguely recall getting intermittent errors related to video output on one or two previous HP servers I've used Unraid on, but I don't remember which ones specifically. I don't recall it causing shutdown issues either, but that doesn't mean it's not a problem. iLo seems to rear an issue every now and then on Unraid, but usually nothing this drastic.

 

 

Hi @1812 - I linked you in this post as I thought you would be interested... I am confident that this is a confirmed bug now (even though I have bought 3 different servers to prove it!), as I asked a friend who is miles away to boot his Microserver into GUI mode to prove the fault, and he has lost keyboard/mouse and his machine is stuck in a "no boot device found" loop until he gets home.

 

I am just posting the bug report now and will link the post URL here shortly, in case anyone in future has this problem and can use this information. I found the 'cure' by chance and am so pleased, as this has caused me real concerns for some time!

 

Thanks

I'll also try the 2 hp workstations I have and see what happens just for "fun." I never use GUI mode because I don't have a use case for it, but now I'm very interested! 

Do let me know what happens, as I can confirm this is an issue with DL380p G8s, Microserver G8s and now ML30 G9s! I'd be pleased to know that I have actually been able to help with something constructive, rather than just take-take-take from the forum 😄

 


Another update... In case anyone gets to this point because they have an HP server showing this critical error in BIOS and no keyboard/mouse/USB port: I have just tested this on my ML30 G9, and if you reboot the server and do an ILO reset before it boots into UnRAID, it clears the critical error from the ILO console ;-)

Just had it confirmed by another user that doing this enabled him to clear the fault and reboot his Microserver into UnRAID successfully 😄
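For a server that can only be reached remotely, an iLO reset can in principle also be requested over the network. The sketch below only builds such a request and does not send it. It uses the standard Redfish `Manager.Reset` action and assumes the iLO firmware exposes the Redfish API at this path and accepts an empty JSON body. The host name, credentials and the function name `build_ilo_reset_request` are placeholders. The thread does not say how the reset above was triggered, and nobody here has confirmed this route.

```python
import base64
import json
import urllib.request


def build_ilo_reset_request(host: str, user: str, password: str) -> urllib.request.Request:
    """Build (but do not send) a Redfish request asking the iLO to reset itself."""
    # Assumption: the management processor is exposed as Managers/1,
    # which is the usual layout on a single-iLO server.
    url = f"https://{host}/redfish/v1/Managers/1/Actions/Manager.Reset/"
    # HTTP Basic authentication header built from the iLO credentials.
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        data=json.dumps({}).encode(),  # assumed: the action takes no parameters
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


if __name__ == "__main__":
    # Placeholder values; nothing is sent to any server here.
    req = build_ilo_reset_request("ilo.example.lan", "admin", "secret")
    print(req.get_method(), req.full_url)
```

Actually sending the request would be a call to `urllib.request.urlopen(req)`. Many iLO interfaces use a self-signed certificate, so certificate handling would need attention before this works.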
