HP Proliant / Workstation & unRaid Information Thread


1812


No, I don't think using consumer disks should be an issue.

 

Just wondering if there might be some interaction with pin 3 of the SATA power connector - the signal in the SAS standard and in SATA v3.2+ and v3.3 intended to turn power to the drive on/off to force a full reboot of the drive. But I can't understand why connecting two SSDs would make the backplane change behavior.

 

And I can't see how 2xHDD + 2xSSD would draw so much current that something would break.

Link to comment
12 hours ago, pwm said:

Just wondering if there might be some interaction with pin 3 of the SATA power connector

 

I am aware of this issue. But backplanes of ML110 G9 and ML150 G9 can operate them properly.

 

I am getting more convinced that mine was DOA somehow. I will try another ML150 Gen9. Fingers crossed :)

Link to comment
  • 2 weeks later...
On 5/21/2018 at 6:24 PM, 1812 said:

 

Very perplexing. I was just reading about some issues with sata disk recognition on this gen/model and some were resolved by a firmware update... have you looked into that?

 

On 5/21/2018 at 10:26 PM, pwm said:

Any component burned by supplying power to four disks?

In case you wondered, please find below an update on my issue:

 

HPE replaced the server with a new ML150 Gen9. It comes with an H240 card and a 1 TB LFF SATA hard disk with an HPE sticker.

 

Problem started from day 1.

1. I removed H240 as I don't need any RAID for unRAID.

2. Connected mini-SAS cable from the drive cage to mini-SAS on the motherboard.

3. I opted for AHCI in BIOS instead of B140i.

(No other changes, such as upgrading the BIOS or replacing the hard disk.)

4. Bummer. The server wouldn't recognize the hard disk. Tried both mini-SAS ports on the motherboard. No way.

 

After fiddling for hours, I tried the H240 just to see. It worked in HBA mode. Then I removed the H240 and connected the cable to the mini-SAS port on the motherboard. It worked again. Crazy!

 

Anyway, I decided to use the H240 as it seemed the safer way. There was no problem for 2-3 days. As the H240 doesn't report any hard disk SMART attributes back to unRAID, I replaced it with an LSI card. It worked fully. So far, so good.

 

After 10 days of good use, something happened: the server lost the LFF SATA hard disks again. There was a flashing red light on the front. I checked iLO. There were some critical power supply problems highlighted in red. The final message from iLO was:

Critical,178,17991,0x0014,System Error,,,06/02/2018 07:51:00,9: Server Critical Fault (Service Information: Runtime Fault, System Board,  P5V/P3V3/Chipset/AUX Regulators 1 (04h)) 

Now the motherboard port, the H240 and the LSI card do not detect the LFF SATA hard disks at all.

 

Contacted online HPE support last night. The tech examined the iLO report and decided that the motherboard needs to be replaced because of the above error. He said I will be contacted on Monday (tomorrow).

 

I am having a hard time believing that two of my ML150 Gen9 servers had the same motherboard problem. How probable is that?

 

All the best.

 

 

Link to comment

Normal best practices result in quite robust electronics.

 

I wonder if HP have picked up one or two hw designers from Apple. Apple computers are known for having quite weird "unlucky" designs.

 

But HP have by tradition managed well to design and produce very well working equipment that just keeps working way past the expected economical lifetime.

 

One problem currently is that lots of companies buy parts from other companies to save on their own R&D and manufacturing costs. Normally that's not an issue, except it means they lose control and often aren't aware of changes introduced in the parts they buy. And the subcontractors aren't 100% aware of how the parts are used. So over time, issues may be introduced that never would have happened if engineer A could have walked two corridors down and asked engineer B for feedback about a planned product change.

 

I hope you get a resolution to your problems.

Link to comment

I have an ML350 G5 I got from work. I know it is older. It only had one Xeon 5520 in it. I know I can get a second Xeon on eBay for cheap. But how can I tell what this board can handle, if I want to get the fastest Xeon possible for the board? I know Dell's support page tells you such things. Does HP have a site like that, listing the top specs of the board?

Link to comment
13 hours ago, bphillips330 said:

I have an ML350 G5 I got from work. I know it is older. It only had one Xeon 5520 in it. I know I can get a second Xeon on eBay for cheap. But how can I tell what this board can handle, if I want to get the fastest Xeon possible for the board? I know Dell's support page tells you such things. Does HP have a site like that, listing the top specs of the board?

 

https://h20195.www2.hpe.com/v2/GetPDF.aspx/c04284193.pdf

 

I would google around and see what others have put in as max procs as well. This includes looking at ebay ads too for complete working servers of the same model.

Link to comment

I recently bought a UPS (APC Smart UPS X 3000 VA LCD), and I'm currently testing it. It's been identified by the apcupsd daemon in Unraid, and it initiated a shutdown as I'd expected.

 

What I want is a way to have the server start when mains power is restored, but I don't know how to go about this. Any tips? This would "have" to work both when mains is restored after the shutdown has been initiated but before the UPS is entirely depleted, and when the UPS has been entirely depleted.

Link to comment
2 hours ago, Fredrick said:

What I want is a way to have the server start when mains power is restored, but I don't know how to go about this. Any tips? This would "have" to work both when mains is restored after the shutdown has been initiated but before the UPS is entirely depleted, and when the UPS has been entirely depleted.

 

The BIOS has a setting if the machine should resume previous state on return of mains power. So if the machine was running when it lost power, then it auto-boots when power is returned.

 

However, you really should think twice about starting the machine as soon as the power returns. If you let it run on batteries until they are depleted and then wake up the machine the moment power returns, you have no UPS capacity left - so a second power failure (quite likely if the first was an extended outage) will make your machine fail rather badly with no possibility of a soft shutdown.

 

If you want automatic start after the machine has been shut down, then you should delay the automatic start until the UPS has recharged. If you are home and need the machine earlier, then for that specific case you should start it manually after evaluating the current battery state.

 

If you really, really want the machine to autostart before the batteries have been fully charged, then you should instead configure your system so it shuts down quite early - consuming at the very most maybe 30% of the UPS capacity. Then you will know that there is a good probability of the machine making a safe shutdown a second time if the power is immediately lost again.

Link to comment

Thanks, that is solid input :)

 

Is there a way to configure this to only boot the server when the UPS has reached maybe 50% capacity? And I don't see how the BIOS setting would help. If Unraid shuts down before the battery is depleted, the last power state would be off, right? So it wouldn't turn back on.

 

I guess I could somehow have a service on a secondary Raspberry Pi or similar to do the following

  1. Check if server is off
  2. Check if UPS-charge>50%
  3. Wake-On-Lan
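
The three steps above could be sketched roughly like this in Python. This is only a sketch under assumptions: it assumes the UPS is monitored by apcupsd, so that `apcaccess status` prints a `BCHARGE` line, and it leaves the "is the server off" check (a ping, for example) to the caller. The function names and the 50% default are made up for illustration.

```python
import re
import socket


def parse_battery_charge(status_text):
    """Return the BCHARGE percentage from `apcaccess status` style output, or None."""
    match = re.search(r"^BCHARGE\s*:\s*([\d.]+)", status_text, re.MULTILINE)
    return float(match.group(1)) if match else None


def build_magic_packet(mac):
    """Build a Wake-on-LAN magic packet: 6 bytes of 0xFF, then the MAC repeated 16 times."""
    mac_bytes = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(mac_bytes) != 6:
        raise ValueError("MAC address must be 6 bytes")
    return b"\xff" * 6 + mac_bytes * 16


def should_wake(server_is_up, charge_percent, threshold=50.0):
    """Wake only if the server is down and the UPS charge is above the threshold."""
    return (not server_is_up) and charge_percent is not None and charge_percent > threshold


def send_magic_packet(mac, broadcast="255.255.255.255", port=9):
    """Send the magic packet as a UDP broadcast on the local network."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(build_magic_packet(mac), (broadcast, port))
```

A cron job or small loop on the Pi would feed the output of `apcaccess status` into `parse_battery_charge`, and call `send_magic_packet` with the server's MAC address when `should_wake` returns true. Wake-on-LAN also has to be enabled in the server's BIOS/NIC settings for this to do anything.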
Link to comment

I personally have a RPi as "owner" for the UPS units.

 

Besides communicating with UPS and computers, the RPi can also communicate with Z-Wave power plugs to turn on/off equipment.

The RPi, Z-Wave and an LTE-connected gateway are run from a separate UPS, giving them a huge battery time.

 

I have some machines that don't turn off the PSU on shutdown - so for them the RPi cuts the power from the outside. And the BIOS setting makes them boot when power is restored.

Some machines do turn off the PSU. Then WoL or IPMI might wake the machine.

Link to comment
On 6/11/2018 at 9:09 PM, pwm said:

I personally have a RPi as "owner" for the UPS units.

 

 

Any tips/guides on how you set this up? I followed this guide, but that basically leaves me with my RPi as the NUT server, and I don't know how to set it up as you suggested. I've tried to read the NUT-documentation but that was some heavy stuff..

 

Do you control the power of VMs/Unraid server from the RPi itself, or from the individual NUT-client setups on those machines? And how do you set up the RPi to wake the server when UPS is back above a given charge?

Link to comment
On 6/12/2018 at 9:21 PM, Fredrick said:

 

Any tips/guides on how you set this up? I followed this guide, but that basically leaves me with my RPi as the NUT server, and I don't know how to set it up as you suggested. I've tried to read the NUT-documentation but that was some heavy stuff..

 

Do you control the power of VMs/Unraid server from the RPi itself, or from the individual NUT-client setups on those machines? And how do you set up the RPi to wake the server when UPS is back above a given charge?


I run a rather strange setup, since I work a lot with developing systems-level software.

 

So I run apcupsd to handle my UPS equipment.

And more apcupsd instances in machines that should respond to loss of power and shut down.

And I run a Mosquitto MQTT broker.

All apcupsd and lots of status information from the servers are published on the MQTT broker.

And I have hacked some basic MQTT support for a Z-Wave hat for an RPi.

So subscribing to MQTT topics I can see the current state of all the equipment.

And I can publish MQTT topics to turn on/off equipment.

In some situations by controlling Z-Wave power plugs.

In some situations by issuing WoL.

ipmitool can be great if the system supports IPMI.
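
For the IPMI case, a minimal sketch of what the wake-up call might look like from Python. It assumes `ipmitool` is installed on the controlling machine and that the server's BMC accepts IPMI over LAN with the `lanplus` interface; the host, user and password values are placeholders, and the helper names are invented for this example.

```python
import subprocess

# Subset of `ipmitool chassis power` actions allowed by this sketch.
ALLOWED_ACTIONS = {"on", "off", "soft", "status"}


def ipmi_power_command(host, user, password, action="on"):
    """Return the argv list for an ipmitool chassis power command over LAN."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported action: {action}")
    return [
        "ipmitool", "-I", "lanplus",
        "-H", host, "-U", user, "-P", password,
        "chassis", "power", action,
    ]


def run_ipmi_power(host, user, password, action="on"):
    """Run the command and return the CompletedProcess (needs ipmitool on PATH)."""
    return subprocess.run(
        ipmi_power_command(host, user, password, action),
        capture_output=True, text=True, check=False,
    )
```

Something like `run_ipmi_power("192.0.2.10", "admin", "secret", "on")` would then be triggered from whatever decides the UPS has recharged enough, for example an MQTT subscriber.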

 

In the end, it's a quite large infrastructure. And quite a lot of it developed when I was a bit pissed about lack of supervision in unRAID 5 and while Limes spent time working on Xen support in early unRAID 6 alpha/beta versions.

Link to comment

That seems like quite the hack just to get a (single) computer in my case to turn off in time, and turn back on when battery is good :P

 

Had another outage today, and the server is not given enough time to gracefully shut down. It seems to force shutdown after 90s, which is not enough. I'm not really sure which setting would define this.

 

Also my UPS has three different groups (groups of outlets in the back) which can be controlled individually through the network interface, but that data/function seems to be lost with NUT. 

Link to comment
  • 2 weeks later...

@Fredrick Back in the day, before we had decent smart UPSs along with solid software, we did the "no-no" to trick the systems and buy some time. We would use two UPSs in line with each other. Yes, I know this is bad practice, but is it? Hear me out. Two UPS systems, both smart. #1 was connected to the wall outlet, and #2 was connected to #1. The PC was plugged into #2, but its smart communication link went to #1. So when #1 lost power and did its shut-off procedure, #2 gave the PC enough time to shut off. #2 was just programmed to turn off below a certain power level and back on when it reached a better charge, if it turned off at all. Keep all that in mind.

 

Since UPS units only have one USB port (unless you're using one that actually has Ethernet, which I doubt is the case here), using two UPS systems can fix some issues. With unRaid you can assign the USB comms to the virtual system, and it can tell that system to power off before any of the actions of the primary UPS for unRaid kick in. A secondary cheap UPS with USB communication, powering miscellaneous items such as a monitor or external DVD drive, can tell the virtual system to power off after 30 seconds without power. I'm not saying to daisy chain. I'm just saying another power monitoring unit might be a good idea. Besides, if your server has dual power supplies, I would recommend plugging the 2nd supply into the other unit, but make sure you have/get a UPS that can handle the load. I run two UPS units for ours, with each PSU plugged into its own UPS. One is managed via unRaid, the other is programmed. I haven't tried it, but unRaid might be able to manage both; I just haven't spent the time yet.

Link to comment

FYI for the HP ProLiant Community...

 

Recently I had some interesting interactions with my DL360 G7 with the Smart Array G6 RAID controller and this is all food for thought.  Some of you may call this a noob experience and some may go, hmm... Interesting.

 

I run unRaid in a zero hardware RAID configuration and use the unRaid RAID abilities for remote management. Basically each drive is its own RAID 0/1, because that's the only way the RAID controller will let you mount the drives individually. When doing so, the RAID controller assigns each drive a virtual serial number. unRaid uses the serial number of each drive to assign its configuration for that drive. For example, your parity drive is assigned "12345678" by the RAID controller, and unRaid loads drive 12345678 as the parity drive as programmed. Here's where I had a problem. IF you introduce a new drive to the system, or remove one, you have to go into your RAID controller's hardware config and make those changes. Upon doing so, my Smart Array G6 reassigns the virtual serial numbers of ALL my drives even though I only changed one. When unRaid booted, it didn't recognize any of the drives and I had to dump the old config and reassign the drives. No big deal, as long as you're not afraid of the big "WARNING" unRaid gives you when you decide to create a new config, and as long as you reconfigure your system properly. So be very careful.
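
One way to see this coming is to record the drive identifiers before touching the controller config and compare them afterwards. Below is a rough Python sketch of that idea. It assumes a Linux system where udev populates `/dev/disk/by-id` (the entry names normally embed the model and serial the controller reports); it does not reproduce unRaid's own assignment logic, and the function names are made up for illustration.

```python
import os


def list_disk_ids(by_id_dir="/dev/disk/by-id"):
    """Map each whole-disk identifier to the device node it points at."""
    ids = {}
    for name in sorted(os.listdir(by_id_dir)):
        if "-part" in name:  # skip partition entries, keep whole disks only
            continue
        ids[name] = os.path.realpath(os.path.join(by_id_dir, name))
    return ids


def changed_ids(before, after):
    """Return (disappeared, appeared) identifier sets between two snapshots."""
    return set(before) - set(after), set(after) - set(before)
```

Save the result of `list_disk_ids()` somewhere off the array before the change, run it again after the reboot, and pass both snapshots to `changed_ids`. If every identifier shows up as both disappeared and appeared, the controller has renumbered everything and the old unRaid assignments will not match.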

 

Another note: I did recently lose an SSD in my array, and I replaced it via hot swap with an SSD of the same make and model. I was going to configure the system at next boot. This wasn't urgent, as that drive hadn't been assigned to any tasks yet and was 99% empty to begin with. Another emergency pulled me away before I could assign the drive from the command line. So it was in the system as unconfigured, no biggie. Yeah right. Remember, I'm not running multi-drive hardware RAID; each drive is independent. On the next boot, the Smart Array G6 said, "Hey, I don't know what this drive is, so I'm not loading ANY logical drives". I had to go for a little drive just to pull out a drive so the controller would load all physical drives and unRaid booted normally.

 

I have never had a RAID controller ignore its config and act as such.  I'm seriously thinking of switching to a new PCIe controller, but then I think about the age of the system, why bother.  I know the habits of this unit and it's reliable....mostly.

 

Moral of the story...

You're better off leaving the dead drive in the system, remove all files from it, and mark it unusable by unRaid before even thinking about replacing it.  Cause the parity drive will have to be rebuilt due to the new assignment of the virtual serial numbers assigned by the changes in drives.  So anything on that dead drive that was being emulated, WILL BE LOST!!!  Again, remove access to dead drive on all shares, move all files to another drive, and backup your system before swapping any drives.  Changing any of the config in your hardware RAID controller (Smart Array G6 and probably some others) will reassign the virtual serial numbers and the unRaid parity drive will be lost.

 

That's what I learned recently.  Lucky me, it wasn't the 100% hard way.  Thought I'd share this for others to keep in mind and save them a headache and/or $$$.

Link to comment
  • 1 month later...

Hi there,

 

I am having issues with my HP ProLiant DL380p Gen8 Server. For more details you can quickly scan over this thread here:

Although I kind of solved the issue in the end by changing some tunables, at least so that parity checks work now, there are still issues persisting. For example:

 

- High CPU usage (~25 % system) when working with PCIe lanes (Transferring data through the HBA, Downloading files through the Network card)

- Very high CPU usage (>40 % in idle) when turning on VMs with many CPU cores passed through from different CPUs.

 

My server basically works but I want to solve this issue. My old server (HP ProLiant DL180 G6) did not have this issue.

 

I thought maybe someone here also has a ProLiant server and has / had the same issue?

 

I will provide further details if needed.

Edited by JuliusZet
Link to comment

I have an HP ML350e G8 and have started to experience parity check issues similar to the ones you were experiencing. I went through multiple parity check cycles with the default tunables and an HP H220 HBA without any issues, but started to see your problems once I added an additional drive, and the issues persist even after removing the drive. I will be attempting to rebuild the server from scratch when I get a chance, to rule out any configuration issues.

Link to comment

 

23 hours ago, JuliusZet said:

Hi there,

 

I am having issues with my HP ProLiant DL380p Gen8 Server. For more details you can quickly scan over this thread here:

Although I kind of solved the issue in the end by changing some tunables, at least so that parity checks work now, there are still issues persisting. For example:

 

- High CPU usage (~25 % system) when working with PCIe lanes (Transferring data through the HBA, Downloading files through the Network card)

- Very high CPU usage (>40 % in idle) when turning on VMs with many CPU cores passed through from different CPUs.

 

My server basically works but I want to solve this issue. My old server (HP ProLiant DL180 G6) did not have this issue.

 

I thought maybe someone here also has a ProLiant server and has / had the same issue?

 

I will provide further details if needed.

 

I don't have any gen 8's and am unable to replicate these problems on older hardware.  But looking at your original thread, you had multiple call traces, which are probably not helping. Unfortunately I am unable to help diagnose those. But I'd be curious if they persisted after you modified the tunables.

Link to comment

I've got a HP DL380G7 -- it's up and running and has been relatively trouble free.  But last night I rebooted and the fans now won't quiet down.   Rebooted again and same issue.  I know this is a tricky and probably bios related problem but all my drives are showing in SMART and 25-30C.  System temps are all low.  System is basically at idle.  Thoughts on things I could try to quiet it down?  It's in my basement so it doesn't need to be silent.... but it's gone from a negligible amount of noise to an annoying amount.

Link to comment
55 minutes ago, Fffrank said:

I've got a HP DL380G7 -- it's up and running and has been relatively trouble free.  But last night I rebooted and the fans now won't quiet down.   Rebooted again and same issue.  I know this is a tricky and probably bios related problem but all my drives are showing in SMART and 25-30C.  System temps are all low.  System is basically at idle.  Thoughts on things I could try to quiet it down?  It's in my basement so it doesn't need to be silent.... but it's gone from a negligible amount of noise to an annoying amount.

 

It's determined by the BIOS and how many PCIe cards are in the system. Have you tried the latest BIOS with the Spectre fixes? I believe some of the releases have quieter fan curves. Also, if you had a fan go out, the others will compensate by ramping up (you can check the fans in iLo or possibly on the server maintenance light area).

Link to comment
4 hours ago, 1812 said:

 

It's determined by the BIOS and how many PCIe cards are in the system. Have you tried the latest BIOS with the Spectre fixes? I believe some of the releases have quieter fan curves. Also, if you had a fan go out, the others will compensate by ramping up (you can check the fans in iLo or possibly on the server maintenance light area).

I've got an HP hba card and an expander. Bios info: 

 

<description>BIOS</description>
     <vendor>HP</vendor>
     <physid>0</physid>
     <version>P67</version>
     <date>08/16/2015</date>

 

I actually have no idea how to login to iLo. Do I need to connect another Ethernet cable?

Link to comment
11 hours ago, Fffrank said:

I've got an HP hba card and an expander. Bios info: 

 


<description>BIOS</description>
     <vendor>HP</vendor>
     <physid>0</physid>
     <version>P67</version>
     <date>08/16/2015</date>

 

I actually have no idea how to login to iLo. Do I need to connect another Ethernet cable?

 

That "should" be an OK fan profile. HP is also offering a newer BIOS for everyone with the Spectre mitigation, which might have the same or an improved fan curve. To connect to iLo, you need to connect another ethernet cable to the iLo port on the back of the server. When the server boots, it may give you an option to configure iLo if it hasn't been configured, and there you can view the IP address. If iLo is locked with a password, you'll have to look up how to use the jumpers on the motherboard to reset it. The easier way is to use a local IP range scanner and watch for the new address assignment after you plug in the port and/or reboot the server with the port plugged in.

Link to comment
1 hour ago, 1812 said:

 

That "should" be an OK fan profile. HP is also offering a newer BIOS for everyone with the Spectre mitigation, which might have the same or an improved fan curve. To connect to iLo, you need to connect another ethernet cable to the iLo port on the back of the server. When the server boots, it may give you an option to configure iLo if it hasn't been configured, and there you can view the IP address. If iLo is locked with a password, you'll have to look up how to use the jumpers on the motherboard to reset it. The easier way is to use a local IP range scanner and watch for the new address assignment after you plug in the port and/or reboot the server with the port plugged in.

Got it -- I'll try this over the weekend.  I can watch the DHCP server on my pfsense for a new ethernet assignment (assuming that my server iLo uses DHCP, anyway.) 

Link to comment
On 9/28/2018 at 10:19 AM, Fffrank said:

Got it -- I'll try this over the weekend.  I can watch the DHCP server on my pfsense for a new ethernet assignment (assuming that my server iLo uses DHCP, anyway.) 

Was able to connect to iLo3. It was a bit of a headache, as the firmware was at 1.15, which only supported TLS 1.0 (which has been dropped from every current web browser). I was finally able to install Firefox 23 and then enable TLS 1.0 and SSL3 in order to connect and update to the 1.20 and then the 1.90 firmware. Cool!  ;)

 

Next question -- how can I update the system BIOS either remotely or from UnRaid? HPE only makes RPM packages and Windows packages available. It doesn't look like UnRaid can install RPMs. I tried to flash via a Windows VM and it couldn't detect the system. My server is headless, so I don't believe I can do it via USB... Thoughts?

Link to comment
