[Plugin] IPMI for unRAID 6.1+



Hey @dmacias, first of all thank you very much for your gorgeous work and the already great plugin. I'd love the opportunity to help with the development. Where are you from?

 

Second:

My unRAID box runs a Plex server and is built for heavy transcoding. When I start a transcoded 4K stream, the CPU load climbs to about 90% in less than 2 seconds. Within about 10 seconds the temperature reaches 80°C, and about 30 seconds later about 90°C, all at a base fan speed of 1000 RPM with my current stock cooler (I'm already looking for a better one). After 20 to 30 seconds at 80 to 90°C, the fan control kicks in and sets the fan to 100%, an insanely loud 3100 RPM, which cools the CPU down to 60°C for a minute. Then the fan control lowers the fan speed to about 1800 RPM, and the temperature goes back up to about 80°C... and so on.

This game of the temperature going up and down is a known issue with systems that are controlled but not regulated when the cycle time is not short enough (I'm studying mechanical engineering ;)

With a 10-second cycle time it works like a charm.

 

Since bringing down execution counts is verrrry appreciated, I totally agree that a 10 sec cycle time is significantly too short for HDDs, but for CPU cooling it's a must.

 

FYI

https://forums.freenas.org/index.php?resources/fan-scripts-for-supermicro-boards-using-pid-logic.24/

This is a link to a FreeNAS forum resource where a similar fan control mechanism was developed earlier. The author suggests the following cycle times:

Quote
  1. For the main drive cycle, 5-6 minutes is probably a good interval.
  2. For CPU interval, 1-15 seconds may be appropriate. For my passively cooled processor with low thermal design power, 15 seconds is OK. More powerful CPUs have faster temperature spikes and need shorter intervals.

 

I'm using a powerful Intel Xeon E3-1245 v6 CPU with a TDP of 78W, and I can confirm fast temperature spikes.
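To illustrate the point about short cycles (this sketch is mine, not the plugin's logic or the linked PID scripts): instead of jumping between a base speed and 100%, a controller can scale the fan duty with temperature and re-evaluate it every short cycle, e.g. every 10 sec. The bash function below is a plain proportional curve; its name, the thresholds and the minimum duty are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical proportional fan curve: duty (%) rises linearly from MIN_DUTY
# at T_LOW to 100% at T_HIGH. Values are illustrative, not plugin settings.
# Usage: fan_duty TEMP T_LOW T_HIGH MIN_DUTY   (temperatures in °C)
fan_duty() {
  local temp=$1 t_low=$2 t_high=$3 min_duty=$4
  if [ "$temp" -le "$t_low" ]; then
    echo "$min_duty"
  elif [ "$temp" -ge "$t_high" ]; then
    echo 100
  else
    # Integer linear interpolation between (t_low, min_duty) and (t_high, 100).
    echo $(( min_duty + (100 - min_duty) * (temp - t_low) / (t_high - t_low) ))
  fi
}
```

For example, `fan_duty 60 40 80 30` gives 65 (%). Called every cycle with the latest CPU temperature, this is only the P part of a PID loop; the integral and derivative terms that the linked scripts' name implies are left out.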


That's a good use case. We've had some discussion in here about this, but I believe FANA, FANB, ... are for peripherals, including hard drives, and when set to auto they are controlled based on system temp. FAN1234 are based on CPU temp. I was going to ask what the temps are when set to auto and which mode you use (high I/O, standard, optimal). But if your CPU is on FANA, that wouldn't work. The link mentions the same thing. Thanks for that; I hadn't read that one. It has a good idea about setting the fan mode first to keep the BMC from changing the fan speeds. Very thorough write-up.

I may be able to just add an HDD polling time and then add back the 10 sec intervals below 1 min. But I still need to be careful about sending too many IPMI commands.

At the moment my CPU is still on FAN1, but later I want to switch it back to FANA to make use of FAN1234 for my 4 case (HDD) fans.

 

One question:

What exactly is the bottleneck in execution count:

Polling CPU temperatures?

Polling HDD temperatures?

Or setting fan speeds?

 


I'm not sure what you mean. For hard drives, smartctl can impede drive performance if run too often. CPU temps and fan speeds both go through FreeIPMI, which reads from and sends commands to the BMC, and the BMC can lock up and stop responding.
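For reference, a common way to read a SATA drive's temperature from the command line is SMART attribute 194 in `smartctl -A` output. The sketch below assumes ATA drives that report that attribute (SAS and NVMe drives print a different layout), and the function names are hypothetical rather than anything the plugin uses. The `-n standby` option makes smartctl skip a drive that is spun down instead of waking it.

```bash
#!/usr/bin/env bash
# Extract the drive temperature from `smartctl -A` output read on stdin.
# Assumes SMART attribute 194 (Temperature_Celsius); the raw value is the
# 10th column and may carry a suffix like "(Min/Max 20/45)", which is dropped.
parse_smart_temp() {
  awk '$1 == 194 { print $10; exit }'
}

# Hypothetical wrapper: `-n standby` skips the check if the drive is in
# standby/sleep, so polling doesn't spin it up (it then prints nothing).
hdd_temp() {
  smartctl -n standby -A "$1" | parse_smart_temp
}
```

`hdd_temp /dev/sdb` prints the temperature, or nothing if the drive is in standby or lacks attribute 194.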

OK, so (correct me if I'm wrong) it IS NOT problematic to poll CPU temps and set the fan speed every 10 seconds,

but it IS problematic to poll HDD temps via smartctl more often than once every few minutes.

If that is correct,

wouldn't it be an option to keep the script from polling HDD temps too often by storing the polled temperature in a local variable along with a timestamp, and doing a little check like

if [age of HDD temp timestamp] < 60 sec then
	keep [local HDD temp]
else
	[local HDD temp] = [new HDD temp polled from smartctl]
	[timestamp of HDD temp] = [now]
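A minimal bash version of the timestamp check above, assuming a single drive and a caller that supplies the current time; the variable and function names are hypothetical. It compares the age of the cached value against the limit and records a new timestamp on every real poll.

```bash
#!/usr/bin/env bash
# Hypothetical cache for one drive's temperature: only run the (slow) poll
# command when the cached value is older than MAX_AGE seconds.
HDD_TEMP=""      # last polled temperature (empty = never polled)
HDD_TEMP_TS=0    # epoch seconds of the last poll
MAX_AGE=60       # seconds, as in the pseudocode above

# Usage: refresh_hdd_temp NOW POLL_COMMAND [ARGS...]
# Updates HDD_TEMP/HDD_TEMP_TS in place. Call it directly, not inside $(...),
# or the update happens in a subshell and is lost.
refresh_hdd_temp() {
  local now=$1
  shift
  if [ -z "$HDD_TEMP" ] || [ $((now - HDD_TEMP_TS)) -ge "$MAX_AGE" ]; then
    HDD_TEMP=$("$@")
    HDD_TEMP_TS=$now
  fi
}
```

In a loop you would call something like `refresh_hdd_temp "$(date +%s)" your_temp_command` every tick and read `$HDD_TEMP`; with several drives you'd keep one cached value and timestamp per drive.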

 

Basically yes, but it's a little more complicated than that. The script is written in PHP with a main while loop and a couple of nested loops: one for the fans/commands, and one for the polling time. The polling loop's length varies with the polling time, but it ticks at a fixed 10 sec so it can check the fan config for changes while waiting. E.g. it sleeps for 10 sec, then loops 6 times to make a minute. I would need to carry global true/false flags for the HDD poll and the sensor poll, and maybe a couple of counters to count the time through the main and nested loops, so that when the time is reached, the poll flag becomes true. It becomes more complicated if someone sets the HDD poll shorter than the sensor poll. I could limit that in the settings or deal with it in the script.
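The counter idea can be sketched like this. It's bash rather than the plugin's PHP, and every name is hypothetical: the loop ticks every 10 sec, and each poll has its own elapsed-time counter that sets a true/false flag when its interval is reached. Clamping the HDD interval to the sensor interval is one way to handle the "HDD poll shorter than sensor poll" case; limiting it in the settings would work too.

```bash
#!/usr/bin/env bash
# Hypothetical counter-based scheduler: the loop wakes every TICK seconds
# (so it can still re-read the fan config), but the IPMI sensor poll and
# the smartctl HDD poll each run only when their own interval has elapsed.
TICK=10

# Usage: init_schedule SENSOR_INTERVAL HDD_INTERVAL (both in seconds)
init_schedule() {
  SENSOR_INTERVAL=$1
  HDD_INTERVAL=$2
  # If the HDD poll is set shorter than the sensor poll, clamp it: in this
  # sketch HDD temps are only used when the fans are updated anyway.
  if [ "$HDD_INTERVAL" -lt "$SENSOR_INTERVAL" ]; then
    HDD_INTERVAL=$SENSOR_INTERVAL
  fi
  # Start both counters at their interval so the first tick polls everything.
  sensor_elapsed=$SENSOR_INTERVAL
  hdd_elapsed=$HDD_INTERVAL
}

# Called once per tick: sets poll_sensors / poll_hdd to true or false,
# then advances both counters by TICK seconds.
tick() {
  poll_sensors=false
  poll_hdd=false
  if [ "$sensor_elapsed" -ge "$SENSOR_INTERVAL" ]; then
    poll_sensors=true
    sensor_elapsed=0
  fi
  if [ "$hdd_elapsed" -ge "$HDD_INTERVAL" ]; then
    poll_hdd=true
    hdd_elapsed=0
  fi
  sensor_elapsed=$((sensor_elapsed + TICK))
  hdd_elapsed=$((hdd_elapsed + TICK))
}
```

A main loop would call `init_schedule 10 300` once, then repeatedly run `tick`, do the IPMI read and fan update when `poll_sensors` is true and the smartctl read when `poll_hdd` is true, and `sleep "$TICK"`.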

 

I'm not sure of the impact of reading IPMI sensors and setting the IPMI fans every 10 sec. I have seen it choke on too-frequent commands. This could make the unRAID webgui unresponsive while it waits on IPMI commands to render the Settings and Readings pages, or make the whole webgui wait to display the footer. It could also error out and not display anything. There could also be problems writing config files with the editor.
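One general way to keep a hung BMC from blocking everything else (an assumption on my part, not something the plugin is said to do) is to put a hard time limit on each IPMI call. This bash sketch uses coreutils `timeout`; the wrapper name and the 5 sec limit are hypothetical. It doesn't reduce the number of commands, it only bounds how long a stuck one can block.

```bash
#!/usr/bin/env bash
# Hypothetical guard around IPMI calls: give each command a hard time limit
# (coreutils `timeout`) so a BMC that stops answering can't hang the loop.
IPMI_TIMEOUT=5   # seconds; an assumed value, tune it for your BMC

# Usage: ipmi_call COMMAND [ARGS...]; returns the command's exit status,
# or 124 (timeout's convention) if it had to be killed.
ipmi_call() {
  local rc=0
  timeout "$IPMI_TIMEOUT" "$@" || rc=$?
  if [ "$rc" -eq 124 ]; then
    echo "ipmi_call: '$*' timed out after ${IPMI_TIMEOUT}s" >&2
  fi
  return "$rc"
}
```

For example, `ipmi_call ipmi-sensors` behaves like `ipmi-sensors`, except that it gives up with status 124 if the BMC doesn't answer in time.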

 

All that said, I'll see what I can come up with, but there may be some side effects with shorter intervals.

 

 

 

  • 2 weeks later...
17 hours ago, saisora said:

First of all, great plugin! Thanks for your work.

 

Second: I have a little problem with my Supermicro X11 board. It doesn't want to save my fan threshold configuration.

Am I doing something wrong?

Maybe. What exactly are you doing? Are you clicking on the APPLY at the bottom? What browser and version of unRAID? Do you have fan.cfg in /boot/config/plugins/ipmi/?  Maybe a screenshot and contents of fan.cfg.

Reply to dmacias's post of 10.7.2017 at 4:24 PM:

Thanks for the quick reply. Yes, of course I saved my config ;)

I have "UNRAID Version 6.3.5".

 

root@UNRAID:/boot/config/plugins/ipmi# cat fan.cfg
FANCONTROL="disable"
FANPOLL="3"
FANIP=""
IPMIBOARD=""

 

I can see that the config is updated, but the errors are still coming.

I have also tested it with values lower than 300 RPM, with the same result.

 

I hope you have enough information now. If you need something else just say it.

(BTW, I'm an unRAID and IPMI newbie, as you've probably noticed ;) )

Attachments: ipmi-sensors.config.txt, ipmi_event_log.PNG, ipmi_sensors.PNG


Your event log shows that your lower critical threshold is 500 and your lower non-recoverable threshold is 300, and that your FAN4 RPM drops to or below those values, triggering an alert.

Your config shows 300, 225, 150 for FAN4.

I would select the sensor config in the editor again, then click Revert (this reads the config from the BMC again and saves it to the flash drive). Then select the sensor config again. Let me know whether the values for FAN4 are still 300, 225, 150 or have changed back to 500, 300. There could be a problem saving the config to the BMC.
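To compare the flash copy with what the event log reports, it can help to pull just the FAN4 thresholds out of the file. The awk filter below assumes FreeIPMI's usual layout (`Section <id>_<NAME>` ... `EndSection`, with keys like `Lower_Critical_Threshold`); the function name is hypothetical, so check it against your own file.

```bash
#!/usr/bin/env bash
# Print the lower thresholds of one sensor from an ipmi-sensors-config style
# file read on stdin. Assumes "Section <id>_<NAME>" ... "EndSection" blocks
# with Lower_*_Threshold keys; "##" comment lines are ignored.
# Usage: lower_thresholds SENSOR_NAME < FILE
lower_thresholds() {
  awk -v name="$1" '
    $1 == "Section"    { insec = ($2 ~ ("_" name "$")) }
    $1 == "EndSection" { insec = 0 }
    insec && $1 ~ /^Lower_.*_Threshold$/ { print $1, $2 }
  '
}
```

Usage: `lower_thresholds FAN4 < /boot/config/plugins/ipmi/ipmi-sensors.config`.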

If I click Revert, it resets the values; the same happens if I open the sensor config. The values are then 700, 500, 300 (in the "ipmi-sensors.config" file).

Yes, that is what I thought: the mainboard never gets the actual settings from the file. Is there an easy workaround?

 

 


Try editing the sensors config from the webgui again. Then check the file ipmi-sensors.config. Let me know if the file is saved/changed from the webgui. If the file looks good then you can run this command. Otherwise edit the file then run the command.

ipmi-sensors-config --filename=/boot/config/plugins/ipmi/ipmi-sensors.config --commit

 

 


You are a genius! It didn't like my values, so I tried some other values and now it works! Thanks for your help!

Maybe it would be good if the errors displayed on the CLI also appeared in the GUI.

 

  • 1 month later...

Right now I have one external USB 3 enclosure attached to my server and I'm running a preclear on it. It's not a well-designed enclosure and runs a little warm (52°C). I notice that this seems to spin up my drive bay cooling fans. Is it possible to separate externally and internally mounted drives so that they don't spin up the fans needlessly?

I do not think you can easily do exactly what you asked. However, you can override the global settings for any of the array drives by clicking on the drive in the Main tab and setting a custom temperature just for that drive.

Yeah. I tried that to stop the over-temp notifications, but I think it only works for array drives. If I select the drive in the UD section, I just get the SMART status. I managed to calm the fans down by setting them to work with "System Temp" rather than "HDD Temperature". Not a great solution, but it will save wear and tear on the drives until the preclear is finished.

 

Was hoping that the plugin could be modified to ignore external drives, but my guess is that SM lumps all drive temps together.

Just now, itimpi said:

You can set the global settings to what you want for the external drives, and then set each array drive to what you want for array drives. That might give you the setup you want.

I think I'll just leave it as is for now. The temp has stabilized halfway between the warning temp and the critical temp, so as long as it doesn't change I won't get any more notifications (I think). If I change too many settings, I might forget to reset something a week from now when the preclear finishes.

Reply to saisora's post of 7/13/2017 at 10:20 AM:

I'm having this same issue: it won't accept my values. After running the command, it says to try new values because it cannot encode them accurately. I'm not sure what else to do; I have Noctua fans. What values ended up working for you?
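A likely explanation (my reading, not confirmed in this thread): in the IPMI spec, an analog sensor's thresholds are stored as a raw byte x and converted as y = (M·x + B·10^K1)·10^K2, with M, B and the exponents taken from the sensor's SDR record. Only values that land exactly on one of those steps can be stored, so other values get rejected, and the step comes from the board's sensor, not from the fans. The bash helper below assumes the simple case B = 0 and both exponents 0 (so y = M·x); the function name and the example step of 100 RPM are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper for a linear sensor whose SDR has B = 0 and both
# exponents 0: a threshold is stored as one raw byte x (0-255) and read back
# as y = M * x. Round a wanted RPM to the nearest value that can be stored.
# Usage: nearest_encodable RPM M
nearest_encodable() {
  local rpm=$1 m=$2 x
  x=$(( (rpm + m / 2) / m ))        # round to the nearest raw step
  if [ "$x" -gt 255 ]; then x=255; fi
  echo $(( x * m ))
}
```

For example, if the step on your board were 100 RPM, `nearest_encodable 225 100` gives 200, so 225 could not be stored as-is while 300 or 200 could.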


Hi all,

 

I'm having a bit of trouble getting Fan Control to work on an ASRock Rack E3C236D2I.

 

I'm getting all the sensors (voltages/temps/fan speeds for the 3 fan headers) OK on the IPMI Sensors page. It's obviously talking to the IPMI fine; the data all matches what I get from the actual IPMI.

Fan Control settings are greyed out, and when I try to Force Start the ipmifan I get:

 

ipmifan[3855]: Your AsrockRack motherboard is not supported or setup yet

What exactly should I be setting up? Is there some auto scan or file that needs to be edited?

 

Sorry, I'm a bit useless; I'm still getting used to all this, as it's been a while since I did anything CLI-based.

Reply to wgstarks's post of 9/7/2017 at 3:58 PM:

I'll look at adding a setting to ignore specific drives in fan control when HDD temps are polled.
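One way such a setting could work (a sketch, not the plugin's code): keep a list of devices to skip, like the USB enclosure above, and take the highest temperature over the remaining drives. The input format and function name are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical: highest temperature among drives, skipping an ignore list.
# Input lines on stdin are "device temperature", e.g. "sdb 38".
# Usage: max_hdd_temp "DEV1 DEV2 ..."   (space-separated devices to ignore)
# Prints nothing if every drive is ignored.
max_hdd_temp() {
  awk -v ignore="$1" '
    BEGIN { n = split(ignore, list, " "); for (i = 1; i <= n; i++) skip[list[i]] = 1 }
    NF >= 2 && !($1 in skip) && (!seen || $2 + 0 > max) { max = $2 + 0; seen = 1 }
    END { if (seen) print max }
  '
}
```

For example, `printf 'sdb 38\nsdj 52\n' | max_hdd_temp sdj` prints 38.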

Did you click Configure on the Fan Control page? Fan control needs an array of values that reference the fan names and their location in the IPMI raw command.
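Roughly, such an array maps each fan name to a byte position in the raw command's data. The bash sketch below shows the idea only: the map format, fan names and filler value are hypothetical, and the netfn/command bytes that come before the data (and how many positions there are) depend on the board, so they aren't shown.

```bash
#!/usr/bin/env bash
# Hypothetical: build the per-fan data bytes of a raw fan command.
# $1 = "NAME:POS ..." map, $2 = number of byte slots,
# $3 = value for slots with no fan assigned (board-specific, check yours),
# remaining args = NAME=DUTY pairs (duty in %, written as a hex byte).
raw_duty_bytes() {
  local map=$1 slots=$2 filler=$3
  shift 3
  printf '%s\n' "$@" | awk -v map="$map" -v slots="$slots" -v filler="$filler" '
    BEGIN {
      n = split(map, pairs, " ")
      for (i = 1; i <= n; i++) { split(pairs[i], kv, ":"); pos[kv[1]] = kv[2] }
      for (i = 1; i <= slots; i++) byte[i] = filler
    }
    # Unknown fan names are ignored.
    { split($0, nv, "="); if (nv[1] in pos) byte[pos[nv[1]]] = sprintf("0x%02x", nv[2]) }
    END { for (i = 1; i <= slots; i++) printf "%s%s", byte[i], (i < slots ? " " : "\n") }
  '
}
```

For example, `raw_duty_bytes "CPU_FAN1:1 FRNT_FAN1:3" 4 0x00 CPU_FAN1=50 FRNT_FAN1=100` prints `0x32 0x00 0x64 0x00`.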
