HP DL360 Fan noise


happythatsme


Hi All,

 

Wondering if anyone can help, or at least point me in the right direction. I understand this is not an HP support forum, but I know many of you have experience.

 

I bought two used HP DL360s (no warranty or support from HP):

  1. 2x 10-core Intel(R) Xeon(R) CPU E5-2660 v2 @ 2.20GHz
  2. 2x 6-core Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz

 

Both of them are running fine; however, the fans on the 2x 10-core server run at 94% most of the time, even with the VMs and Docker containers stopped.

The 2x 6-core server is fine, with its fans at 26%.

 

2x 10-core:

[screenshot]

[screenshot]

2x 6-core:

[screenshot]

[screenshot]

 

I can't really see any major difference between them.

I have installed a Solarflare card in each of them. 

 

dl360-20core-diagnostics-20200430-1356.zip (licensed)

tower-diagnostics-20200430-1357.zip (trial)

 

Is there anything else I should be looking for?

 

Thanks

On 4/30/2020 at 7:30 PM, Flubster said:

What brand are the HDDs? Are they HP SAS drives? My DL380p doesn't like non-HP-approved drives at all and spins up the fans if I install one. Also try removing the 3rd-party card and seeing if that helps. They can be very stubborn servers!

Thanks, I checked: they are all HP.


Hi Everyone, 

 

Thanks for your help so far. I wanted to document what I've learned over the weekend, in case anyone else comes across this thread, and also so I can remember what I did.


I tried the following:

  1. Removed all drives and the 3rd-party NIC: the fans continued to ramp up.
  2. Changed many settings in the BIOS:
    1. Cooling
    2. Performance
    3. etc.
    4. Reset to defaults
    5. Copied the settings exactly from the 12-core to the 20-core server
  3. Created a fresh install of Unraid, in case my install had been messed up. Made no difference.
  4. Stopped the Docker and VM managers. Made no difference.
  5. Stopped the array. Made no difference.
  6. The 20-core has a 750 W PSU; the 12-core has a 460 W PSU.
    1. Moved the spare PSU from the 12-core to the 20-core. Made no difference.
    2. Plugged both 460 W PSUs, with power connected, into the 20-core. Made no difference.
  7. Updated iLO and applied a trial license (free until the end of 2020).
    1. Free iLO Advanced 2020
  8. I have not attempted to update any other firmware yet.
    1. Note: the 20-core and 12-core seem to have the same setup.

Viewing the fan status via iLO became tiresome, and I also had no way to view historical performance. I decided to pull stats from the server (CPU, fan speed, temps, etc.) and plot them in a time-series DB, so that I could compare temps vs. fan speed or CPU usage vs. fan speed.

 

I came across this post: https://www.homelabrat.com/ipmi-dashboard/

 

Enable IPMI over LAN Access

Log in to iLO:

  1. Administration
  2. Access Settings
  3. Enable IPMI/DCMI over LAN Access

 

I installed Community Applications

 

Then installed the following Docker containers:

  • influxdb 
  • grafana
  • telegraf 

Influxdb:

default - no changes

 

Telegraf:

Make sure a file exists at /mnt/user/appdata/telegraf/telegraf.conf. Note: the Docker install created a folder named telegraf.conf; I deleted it and replaced it with a file containing the contents of this file:

Open the console of the telegraf container:

[screenshot]

 

Install ipmitool by running the following commands:

 

apk update
apk add ipmitool
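Packages added with apk inside a running container are not part of the image, so ipmitool will be missing again whenever the telegraf container is recreated (for example after an image update). A minimal sketch for re-running the install from the Unraid host terminal, assuming the container is named `telegraf` and is Alpine-based (as the use of apk above implies); `ensure_ipmitool` is a hypothetical helper name:

```shell
#!/bin/bash
# Hypothetical helper: install ipmitool inside a running Alpine-based container,
# but only if it is not already present. Run this from the Unraid host terminal.
ensure_ipmitool() {
  local container=${1:-telegraf}   # assumed container name
  docker exec "$container" sh -c \
    'command -v ipmitool >/dev/null 2>&1 || { apk update && apk add ipmitool; }'
}
```

Usage: `ensure_ipmitool telegraf`. Because the install is skipped when ipmitool is already found, it is safe to run repeatedly.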

Now you can use ipmitool to pull stats from iLO. Verify that the following command works, replacing the IP/USER/PASS with your own:

  • Note: -I can be either lan or lanplus; lanplus worked for me.
ipmitool -H 192.168.1.142 -U admin -P password -I lanplus sdr

 

You should see something like this:

ipmitool -H 192.168.1.142 -U admin -P password -I lanplus sdr
UID Light        | 0x00              | ok
Sys. Health LED  | no reading        | ns
01-Inlet Ambient | 32 degrees C      | ok
02-CPU 1         | 40 degrees C      | ok
03-CPU 2         | 40 degrees C      | ok
04-P1 DIMM 1-6   | disabled          | ns
05-P1 DIMM 7-12  | 39 degrees C      | ok
06-P2 DIMM 1-6   | disabled          | ns
07-P2 DIMM 7-12  | 37 degrees C      | ok
08-P1 Mem Zone   | 39 degrees C      | ok
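To check just the numeric readings without the `ns`/`disabled` rows, the table can be filtered with awk. A minimal sketch, assuming the `name | reading | status` layout shown above; `parse_sdr` is a hypothetical helper name:

```shell
#!/bin/bash
# Hypothetical helper: read `ipmitool ... sdr` output on stdin and print
# "sensor<TAB>value<TAB>unit" for every sensor with a numeric reading.
# Rows such as "0x00", "no reading" or "disabled" are skipped.
parse_sdr() {
  awk -F'|' '
    function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
    {
      name = trim($1); reading = trim($2)
      # Keep readings like "32 degrees C" or "35.28 percent"
      if (reading ~ /^[0-9.]+ /) {
        value = reading; sub(/ .*/, "", value)      # leading number
        unit = reading;  sub(/^[^ ]+ /, "", unit)   # everything after it
        printf "%s\t%s\t%s\n", name, value, unit
      }
    }'
}
```

Usage: `ipmitool -H 192.168.1.142 -U admin -P password -I lanplus sdr | parse_sdr`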

 

Now edit the following file:

/mnt/user/appdata/telegraf/telegraf.conf

 

Search the file for IPMI, then remove the # from the following lines, or paste the text below (replacing your IP/USER/PASS):

  • I'm pulling stats from iLO every 10 seconds.
[[inputs.ipmi_sensor]]
servers = ["admin:password@lanplus(192.168.1.142)","admin:password@lanplus(192.168.1.110)"]
interval = "10s"
timeout = "20s"
path = "/usr/sbin/ipmitool"
metric_version = 2
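If more servers are added later, the `servers` line can be generated rather than typed by hand. A minimal sketch, assuming every iLO uses the same username and password and the lanplus interface as above; `ipmi_servers_line` is a hypothetical helper name:

```shell
#!/bin/bash
# Hypothetical helper: print a Telegraf ipmi_sensor "servers = [...]" line
# for one or more iLO addresses that share the same credentials.
# Usage: ipmi_servers_line USER PASS IP [IP...]
ipmi_servers_line() {
  local user=$1 pass=$2
  shift 2
  local entries="" ip
  for ip in "$@"; do
    # Entry format used above: "user:pass@interface(host)", comma-separated.
    entries+="${entries:+,}\"${user}:${pass}@lanplus(${ip})\""
  done
  printf 'servers = [%s]\n' "$entries"
}
```

For example, `ipmi_servers_line admin password 192.168.1.142 192.168.1.110` prints the same `servers` line used in the config above.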

Save and close the file, then restart telegraf:

 

[screenshot]

 

Check the logs for errors by clicking the log icon on the right of the telegraf container:

[screenshot]

 

Telegraf should now be pulling information from both servers and sending it to the telegraf database in InfluxDB every 10 seconds.
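To confirm the data is actually landing in InfluxDB before moving on to Grafana, you can query InfluxDB's HTTP API from the Unraid terminal. A minimal sketch, assuming InfluxDB 1.x (which serves InfluxQL on `/query`) with authentication left disabled; `influx_show_measurements` is a hypothetical helper name. If Telegraf is working, an `ipmi_sensor` measurement would be expected in the list.

```shell
#!/bin/bash
# Hypothetical helper: list the measurements in an InfluxDB 1.x database.
# Usage: influx_show_measurements http://UNRAIDIP:8086 [database]
influx_show_measurements() {
  local url=${1:?usage: influx_show_measurements URL [DB]}
  local db=${2:-telegraf}
  # -G sends the --data-urlencode pairs as a GET query string.
  curl -sS -G "${url%/}/query" \
    --data-urlencode "db=${db}" \
    --data-urlencode "q=SHOW MEASUREMENTS"
}
```

Usage: `influx_show_measurements http://UNRAIDIP:8086 telegraf`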

 

Grafana

Now we need to visualize the data. Note that I am not an expert in Grafana; I followed a few tutorials online.

 

Log in to the Grafana container at http://UNRAIDIP:3000

The default username/password is admin/admin.

 

We need to connect Grafana to InfluxDB: on the left, select the settings icon and then Data Sources.

[screenshot]

 

Change the following:

name: influxdb

URL: http://UNRAIDIP:8086

Database: telegraf

 

[screenshot]

 

Hit Save & Test. If everything worked, you should see the following:

 

[screenshot]
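The same data source can also be created through Grafana's HTTP API instead of the UI. A minimal sketch, assuming Grafana's `/api/datasources` endpoint with basic auth using the default admin/admin login; `grafana_add_influx` and the `GRAFANA_AUTH` variable are hypothetical names, and the JSON field names may differ between Grafana versions:

```shell
#!/bin/bash
# Hypothetical helper: create an InfluxDB data source in Grafana via its HTTP API.
# Usage: grafana_add_influx http://UNRAIDIP:3000 http://UNRAIDIP:8086 [database]
grafana_add_influx() {
  local grafana=${1:?Grafana URL} influx=${2:?InfluxDB URL} db=${3:-telegraf}
  # Same values as the form above: name, URL and database.
  local payload
  payload=$(printf '{"name":"influxdb","type":"influxdb","access":"proxy","url":"%s","database":"%s"}' \
    "$influx" "$db")
  curl -sS -u "${GRAFANA_AUTH:-admin:admin}" \
    -H 'Content-Type: application/json' \
    -X POST "${grafana%/}/api/datasources" \
    -d "$payload"
}
```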

 

We could create a dashboard from scratch, but that would take too long, so let's import one.

Click the plus icon and select Import:

 

[screenshot]

 

There are pre-made dashboards here; I tried both, but neither worked straight away:

https://grafana.com/grafana/dashboards/10192 (DL380 Gen8)

https://grafana.com/grafana/dashboards/10191 (DL360 Gen7)

 

 

Paste the ID (either 10191 or 10192) and press Tab.

 

[screenshot]

 

You should then see the following: 

  • Name: whatever you want
  • Folder: general will make it easier to find in the future
  • uid: hit change if you see an error

[screenshot]

 

You should now see a screen like this; note that the server address is wrong:

 

[screenshot]

 

Click the settings icon at the top right and select Variables:

 

[screenshot]

 

Double-click the server variable, change the values to match your servers, hit the Update button, then go back to the dashboard.

 

[screenshot]

 

You can now select your IP at the top left; select it and hit Save.

 

[screenshot]

 

You should then see something like this:

[screenshot]

 

Note: the fan panels didn't work by default for me, so experiment until you get them working the way you like.

 

HP Serv-1588565699801.json: you can also import my dashboard from this JSON file.

 

I can now see the temperature and fan speed over time:

  • Notice the three red circles where the fans dropped to around 30%.
  • I realized I had reset the iLO at those points, which had also reset the fans.

[screenshot]

 

 

By running the following command, I could reset the iLO and thus reset the fans:

ipmitool -H 192.168.1.110 -U admin -P password -I lanplus mc reset warm

As the fans still ramped back up to 95% after 17 minutes or so, I decided to reset the iLO every 5 minutes.

 

[screenshot]

 

I installed the CA User Scripts plugin, then created a new script called resetIlo.

 

Script:

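#!/bin/bash
# Warm-reset the iLO management controller; as a side effect this also resets the fans.
# Replace the IP/USER/PASS with your own iLO details.
ipmitool -H 192.168.1.110 -U admin -P password -I lanplus mc reset warm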

 

Set a custom schedule to run it every 5 minutes:

*/5 * * * *

 

[screenshot]
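A possible refinement of the resetIlo script (a sketch, not part of the setup described here): only reset the iLO when a fan reading is actually high, and pass the password via ipmitool's `-E` option, which reads it from the `IPMI_PASSWORD` environment variable, instead of on the command line. It assumes the fan sensors are returned by `sdr type Fan` with readings containing `percent`; `fans_above`, `ILO_HOST`, `ILO_USER` and `FAN_THRESHOLD` are hypothetical names, and the variables would need to be set (e.g. exported at the top of the script) before it does anything:

```shell
#!/bin/bash
# Hypothetical variant of resetIlo: warm-reset the iLO only when a fan is above a threshold.
# Configure via environment (example values from this thread):
#   ILO_HOST=192.168.1.110 ILO_USER=admin IPMI_PASSWORD=password FAN_THRESHOLD=60

# Succeeds (exit 0) if any "percent" reading in sdr output on stdin exceeds $1.
fans_above() {
  local threshold=${1:?threshold in percent}
  awk -F'|' -v t="$threshold" '
    {
      # Check every column, so both the sdr and sdr-type layouts work.
      for (i = 2; i <= NF; i++)
        if ($i ~ /percent/ && $i + 0 > t + 0) hit = 1
    }
    END { exit (hit ? 0 : 1) }'
}

# Only talk to the iLO when a host is configured.
if [[ -n "${ILO_HOST:-}" ]]; then
  # -E makes ipmitool read the password from the IPMI_PASSWORD environment variable.
  if ipmitool -I lanplus -H "$ILO_HOST" -U "${ILO_USER:-admin}" -E sdr type Fan \
      | fans_above "${FAN_THRESHOLD:-60}"; then
    ipmitool -I lanplus -H "$ILO_HOST" -U "${ILO_USER:-admin}" -E mc reset warm
  fi
fi
```

It can run on the same `*/5 * * * *` schedule; when the fans are below the threshold, the iLO is simply left alone.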

 

The average over a 5-minute period is now 45%, much better than 95%.

[screenshot]

 

I clearly have not found the cause of the issue, but I've at least found a workaround for now.

 

Last 6 hours:

[screenshot]

 

Thanks

 

 

 

Edited by happythatsme