happythatsme Posted April 30, 2020

Hi all,

I'm wondering if anyone can help, or at least point me in the right direction. I understand this is not an HP support forum, but I know many of you have experience with this hardware.

I bought 2x used HP DL360 servers (no warranty or support from HP):

- 2x 10-core Intel(R) Xeon(R) CPU E5-2660 v2 @ 2.20GHz
- 2x 6-core Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz

Both of them are running fine. However, the fans on the 2x 10-core server run at 94% most of the time, even with the VMs and Docker containers stopped. The 2x 6-core server is fine, with its fans running at 26%.

[Screenshots: 2x 10-core; 2x 6-core]

I can't really see any major difference between them. I have installed a Solarflare card in each of them.

dl360-20core-diagnostics-20200430-1356.zip (licensed)
tower-diagnostics-20200430-1357.zip (trial)

Is there anything else I should be looking for?

Thanks
happythatsme (Author) Posted April 30, 2020

Both servers are HP DL360 Gen8. I copied the temperatures into a table to compare. There is not much difference: the readings are mostly the same on each server, and if anything the 20-core seems to run cooler than the 12-core.
Flubster Posted April 30, 2020

What brand are the HDDs? Are they HP SAS drives? My DL380p doesn't like non-HP-approved drives at all and spins up the fans if I install one. Also try removing the third-party card and see if that helps. They can be very stubborn servers!
sota Posted May 1, 2020

Also check your power settings in iLO.
happythatsme (Author) Posted May 4, 2020

On 4/30/2020 at 7:30 PM, Flubster said:
    What brand are the HDDs? Are they HP SAS drives? My DL380p doesn't like non-HP-approved drives at all and spins up the fans if I install one. Also try removing the third-party card and see if that helps. They can be very stubborn servers!

Thanks, I checked and they are all HP.
happythatsme (Author) Posted May 4, 2020 (edited)

Hi everyone,

Thanks for your help so far. I wanted to document what I've learned over the weekend in case anyone else comes across this thread, and so I can remember what I did.

I tried the following:

- Removed all drives and the third-party NIC. The fans continued to ramp up.
- Changed many settings in the BIOS (cooling, performance, etc.).
- Reset to defaults.
- Copied the settings exactly from the 12-core to the 20-core.
- Created a fresh install of Unraid in case my installation had been messed up. Made no difference.
- Stopped Docker and the VM manager. Made no difference.
- Stopped the array. Made no difference.
- The 20-core has a 750 W PSU and the 12-core has 460 W PSUs. Removed the spare PSU from the 12-core and tried it in the 20-core. Made no difference.
- Plugged both 460 W PSUs, powered, into the 20-core. Made no difference.
- Updated iLO. Trial license: free until the end of 2020 (Free iLO Advanced 2020).

I have not attempted to update any other firmware yet. Note that the 20-core and 12-core seem to have the same setup.

Viewing the fan status via iLO became tiresome, and I had no way to view historical performance. I decided to pull stats from the server (CPU, fan speed, temperatures, etc.) and store them in a time-series DB, so that I could plot temperature against fan speed, or CPU usage against fan speed.
I came across this post: https://www.homelabrat.com/ipmi-dashboard/

Enable IPMI over LAN access

Log in to iLO, then go to Administration, Access Settings, and enable "IPMI/DCMI over LAN Access".

I installed Community Applications, then installed the following Docker containers:

- influxdb
- grafana
- telegraf

Influxdb

Default, no changes.

Telegraf

Ensure you create a file at the following location: /mnt/user/appdata/telegraf/telegraf.conf

Note: the Docker install created a folder named telegraf.conf. I deleted it and replaced it with a file containing the contents of https://github.com/influxdata/telegraf/blob/master/etc/telegraf.conf

Open the console of the telegraf container and install ipmitool by running the following commands:

    apk update
    apk add ipmitool

Now you can use ipmitool to pull stats from iLO. Verify that the following command works, replacing the IP/USER/PASS with your own. Note that -I can be either lan or lanplus; lanplus worked for me.

    ipmitool -H 192.168.1.142 -U admin -P password -I lanplus sdr

You should see something like this:

    ipmitool -H 192.168.1.142 -U admin -P password -I lanplus sdr
    UID Light        | 0x00         | ok
    Sys. Health LED  | no reading   | ns
    01-Inlet Ambient | 32 degrees C | ok
    02-CPU 1         | 40 degrees C | ok
    03-CPU 2         | 40 degrees C | ok
    04-P1 DIMM 1-6   | disabled     | ns
    05-P1 DIMM 7-12  | 39 degrees C | ok
    06-P2 DIMM 1-6   | disabled     | ns
    07-P2 DIMM 7-12  | 37 degrees C | ok
    08-P1 Mem Zone   | 39 degrees C | ok

Now edit the following file: /mnt/user/appdata/telegraf/telegraf.conf

Search the file for IPMI, then remove the # from the following lines, or paste the text below (replacing the IP/USER/PASS with your own). I'm pulling stats from iLO every 10 seconds.

    [[inputs.ipmi_sensor]]
      servers = ["admin:password@lanplus(192.168.1.142)","admin:password@lanplus(192.168.1.110)"]
      interval = "10s"
      timeout = "20s"
      path = "/usr/sbin/ipmitool"
      metric_version = 2

Save and close the file, then restart telegraf. Check the logs to ensure there are no errors by clicking the log icon on the right of the telegraf Docker container. Telegraf should now be pulling information from both of my servers and sending it to the telegraf database in InfluxDB every 10 seconds.

Grafana

Now we need to visualize the data. Note that I am not an expert in Grafana; I followed a few tutorials online.

Log in to the Grafana container by connecting to http://UNRAIDIP:3000. The default username/password is admin/admin.

We need to connect Grafana to InfluxDB. On the left, select the settings icon and select Data Sources. Change the following:

    Name: influxdb
    URL: http://UNRAIDIP:8086
    Database: telegraf

Hit Save & Test. If everything worked, you should see the following: [screenshot]

We could create a dashboard from scratch, but that would take too long, so let's import one. Click the plus icon and select Import. [screenshot]

Pre-made dashboards are here. I tried both; neither worked straight away.
https://grafana.com/grafana/dashboards/10192 (DL380 Gen8)
https://grafana.com/grafana/dashboards/10191 (DL360 Gen7)

Paste the ID (either 10191 or 10192) and hit the Tab key. You should then see the following: [screenshot]

- Name: whatever you want
- Folder: General will make it easier to find in the future
- UID: hit Change if you see an error

You should now see a screen like so: [screenshot]

Note that the server address is wrong. Click the settings icon at the top right and select Variables. Double-click the server variable, change the values to match your server, hit the Update button, then go back to the dashboard. You can now select your IP from the top left; select it and hit Save.

You should then see something like so: [screenshot]

Note that the fans didn't work by default for me, but mess around until you get it working as you like. You can also import my dashboard from this JSON file: HP Serv-1588565699801.json

I can now see the temperature and the fan speed over time: [screenshot]

Notice the three red circles where the fans dropped down to around 30%. I realized I had reset the iLO at those points, and that had also reset the fans. By running the following command, I could reset the iLO and thus reset the fans:

    ipmitool -H 192.168.1.110 -U admin -P password -I lanplus mc reset warm

As the fans were still ramping up to 95% after 17 minutes or so, I decided to reset the iLO every 5 minutes. I installed the CA User Scripts plugin, then created a new script called resetIlo.

Script:

    #!/bin/bash
    ipmitool -H 192.168.1.110 -U admin -P password -I lanplus mc reset warm

Set a custom schedule to run every 5 minutes:

    */5 * * * *

The average over a 5-minute period is now 45%. While I clearly have not found the cause of the issue, I've at least found a "workaround" for now; a 45% average is much better than 95%.

Last 6 hours: [screenshot]

Thanks

Edited May 7, 2020 by happythatsme
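A possible refinement of the fixed five-minute reset above is to reset the iLO only when the fans have actually ramped up. The following is a minimal sketch, not a tested fix. It assumes that fan lines in the `ipmitool ... sdr` output look like `Fan 1 | 94.08 percent | ok` (fan lines are not shown in the output quoted in this post, so the pattern may need adjusting). The function names, the `FAN_THRESHOLD`, `ILO_USER` and `ILO_PASS` variables, and the default threshold of 60 are all hypothetical.

```shell
#!/bin/bash
# Sketch: reset the iLO only when a fan reading exceeds a threshold.
# ASSUMPTION: fan lines from `ipmitool ... sdr` look like
#   "Fan 1 | 94.08 percent | ok"
# Adjust the awk pattern if your iLO reports fans differently.

# Reads sdr-style output on stdin and prints the highest fan percentage,
# truncated to an integer (prints 0 if no fan line matches).
max_fan_percent() {
  awk -F'|' '
    $1 ~ /^Fan/ && $2 ~ /percent/ { v = $2 + 0; if (v > max) max = v }
    END { printf "%d\n", max }
  '
}

# Succeeds when the integer in $1 is strictly above the threshold in $2.
fans_too_high() {
  [ "$1" -gt "$2" ]
}

# Only contact the iLO when a host is passed, e.g.: resetIlo.sh 192.168.1.110
if [ "$#" -ge 1 ]; then
  host="$1"
  threshold="${FAN_THRESHOLD:-60}"   # hypothetical default, in percent

  # Wrapper so the connection options are written once.
  ilo() {
    ipmitool -H "$host" -U "${ILO_USER:-admin}" -P "${ILO_PASS:-password}" -I lanplus "$@"
  }

  current="$(ilo sdr | max_fan_percent)"
  if fans_too_high "$current" "$threshold"; then
    ilo mc reset warm
  fi
fi
```

Scheduled with the same `*/5 * * * *` entry in CA User Scripts and given the iLO address as its argument, this would skip the reset whenever the fans are already below the threshold. Whether that gives a lower average than the unconditional reset would need to be checked on the Grafana dashboard.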