July 10, 2008 I've backed up and remuxed over 500 blu-rays and hd dvds now, and this has taken a lot of time. My biggest fear is that fans will fail and that several disks will die on me due to heat. A feature that could spin down the disks and safely power down my server if disk temps reach a certain threshold would be a great safety feature. I know this has been mentioned in the "request thread", but is there a way to set up/build this kind of function with the latest unRAID?
July 10, 2008 You could write a script that uses smartctl, awk, and a few other tools to parse the smartctl output, get the temps, and initiate a shutdown if they are above a threshold, then fire it off regularly via cron.
July 10, 2008 Anything is possible; it just has to be scripted. For example, the following lines can be used to retrieve temperatures in a script:

    root@Atlas:/boot/custom/bin# smartctl -a /dev/sdf | egrep -i 'emperature'
    194 Temperature_Celsius 0x0022 114 109 000 Old_age Always - 38
    root@Atlas:/boot/custom/bin# smartctl -a /dev/sdg | egrep -i 'emperature'
    190 Airflow_Temperature_Cel 0x0022 060 057 045 Old_age Always - 40 (Lifetime Min/Max 23/41)
    194 Temperature_Celsius 0x0022 040 042 000 Old_age Always - 40 (0 19 0 0)

or even

    root@Atlas:/boot/custom/bin# smartctl -a /dev/sdg | egrep '^194'
    194 Temperature_Celsius 0x0022 040 042 000 Old_age Always - 40 (0 19 0 0)

This has the side effect of waking the disk up if it is asleep. Therefore you may have to use the -n argument like this:

    root@Atlas:/etc# smartctl -n standby -a /dev/sdc
    smartctl version 5.38 [i486-slackware-linux-gnu] Copyright © 2002-8 Bruce Allen
    Home page is http://smartmontools.sourceforge.net/
    Device is in STANDBY mode, exit(2)

FWIW, there is also the -H health test that can be done (and should be). I don't think -H wakes up the disk from standby, so it may be usable at all times. See the examples below.

    root@Atlas:/etc# smartctl -H /dev/sdg
    smartctl version 5.38 [i486-slackware-linux-gnu] Copyright © 2002-8 Bruce Allen
    Home page is http://smartmontools.sourceforge.net/
    === START OF READ SMART DATA SECTION ===
    SMART overall-health self-assessment test result: PASSED
    root@Atlas:/etc# smartctl -n standby -H /dev/sdc
    smartctl version 5.38 [i486-slackware-linux-gnu] Copyright © 2002-8 Bruce Allen
    Home page is http://smartmontools.sourceforge.net/
    Device is in STANDBY mode, exit(2)
    root@Atlas:/etc# smartctl -H /dev/sdc
    smartctl version 5.38 [i486-slackware-linux-gnu] Copyright © 2002-8 Bruce Allen
    Home page is http://smartmontools.sourceforge.net/
    === START OF READ SMART DATA SECTION ===
    SMART overall-health self-assessment test result: PASSED
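As a rough sketch of how the smartctl output above could feed a threshold check: the two helper names (smart_temp, is_too_hot), the 50 C limit and the device in the usage comment are assumptions for illustration, not anything unRAID ships. The parser relies on the raw value being the tenth field of the attribute 194 line, as in the sample output above; a drive that reports temperature under a different attribute would need a different match.

```shell
#!/bin/sh
# Sketch only: helper names and the threshold are hypothetical.

# Read "smartctl -A"-style attribute lines on stdin and print the raw
# value (10th field) of attribute 194. Prints nothing if no such line
# is present, e.g. when the drive is in standby.
smart_temp() {
    awk '$1 == 194 { print $10; exit }'
}

# Succeeds only if $1 is a plain integer greater than the limit in $2.
# A missing or non-numeric reading is never treated as "too hot".
is_too_hot() {
    temp=$1
    limit=$2
    case "$temp" in
        ''|*[!0-9]*) return 1 ;;
    esac
    [ "$temp" -gt "$limit" ]
}

# Hypothetical usage (device name and 50 C limit are assumptions):
#   t=$(smartctl -n standby -A /dev/sdc | smart_temp)
#   is_too_hot "$t" 50 && echo "would start a clean shutdown here"
```

Treating an empty reading as "not hot" keeps a sleeping drive from triggering anything, at the cost of never noticing a drive that stops reporting altogether.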
July 10, 2008 IMO this should be in the main official distro. unRAID is all about safe data storage, and heat is the biggest killer of them all. Long overdue, I reckon.
July 10, 2008 Author Something must be wrong with this thread (or the forum). Half the time I check this thread for replies I just see 3 replies, the last one posted by NASUser saying he thinks this feature should be in the official distro (I think so too, btw). Other times I come here, the last reply is by WeeboTech saying he might do an installer package (I hope). Why do replies disappear from time to time?
July 10, 2008 There are a bunch of messages missing from this thread! I posted a tool called hddtemp and RobJ responded to it!! Thought something was funky today!
July 11, 2008 It looks like the forum was moved, or at least the ip address changed. I've been accessing the forums by the ip address since Tom moved the site. Today, about halfway through the day, that stopped working. If I go to the forums by the ip address I get a page not found. I can now get to the forums by the domain name. I bet some people were also accessing the forums by ip address, and when that stopped working some posts got lost on the way. I've never heard of that happening but I guess anything is possible.
Phil
P.S. Finally not a newbie anymore.
July 11, 2008 Here's a tool I found that can be used in shell scripts to grab the hdd temperature. It won't spin up drives which are in standby. However, I don't know if it will keep a drive from going into standby. It has a DB file that goes into /usr/share/misc. You'll need to add lines in your /boot/config/go script to put these into place. Over the weekend I'll create a package. Look for it in a torrent near you. LOL!

Here's an example of the output:

    root@Atlas:/boot/custom/bin# hddtemp /dev/sdg
    WARNING: Drive /dev/sdg doesn't appear in the database of supported drives
    WARNING: But using a common value, it reports something.
    WARNING: Note that the temperature shown could be wrong.
    WARNING: See --help, --debug and --drivebase options.
    WARNING: And don't forget you can add your drive to hddtemp.db
    /dev/sdg: ST31000340AS €: 42 C or F
    root@Atlas:/boot/custom/bin# hddtemp /dev/sdc
    WARNING: Drive /dev/sdc doesn't appear in the database of supported drives
    WARNING: But using a common value, it reports something.
    WARNING: Note that the temperature shown could be wrong.
    WARNING: See --help, --debug and --drivebase options.
    WARNING: And don't forget you can add your drive to hddtemp.db
    /dev/sdc: WDC WD10EACS-00ZJB0 €: 41 C or F
    root@Atlas:/boot/custom/bin# hddtemp /dev/sdd
    /dev/sdd: open: No such file or directory
    root@Atlas:/boot/custom/bin# hddtemp /dev/sde
    /dev/sde: WDC WD10EACS-00ZJB0 €: drive is sleeping

An interesting link which shows how to script with it: http://www.cyberciti.biz/tips/howto-monitor-hard-drive-temperature.html
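A script built on hddtemp has to tell apart the three kinds of result lines shown above: a temperature, a sleeping drive, and an error. The function below is a minimal sketch of that classification, working purely on one line of text; the name hddtemp_status and the three return strings are assumptions made for this example, and the WARNING lines would need to be filtered out before calling it.

```shell
#!/bin/sh
# Sketch only: classify one hddtemp result line (passed as $1).
# Prints the temperature digits, "sleeping", or "unknown".
hddtemp_status() {
    reading=${1##*: }                # keep the text after the last ": "
    case "$reading" in
        *sleeping*) echo "sleeping" ;;
        [0-9]*)     echo "${reading%%[!0-9]*}" ;;   # leading digits only
        *)          echo "unknown" ;;
    esac
}

# Hypothetical usage (the device name is an assumption):
#   line=$(hddtemp /dev/sde 2>/dev/null | grep -v '^WARNING')
#   hddtemp_status "$line"
```

Keeping "sleeping" distinct from "unknown" lets a monitoring script skip spun-down drives quietly while still flagging a device that could not be read.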
July 11, 2008
"It looks like the forum was moved. Or at least the ip address changed."
Maybe Tom is rehosting the forum?
"Here's a tool I found that can be used in shell scripts to grab the hdd temperature. [...] Over the weekend I'll create a package. Look for it in a torrent near you. LOL!"
Thanks WeeboTech!
July 11, 2008 Can't help being a bit miffed here. I put a fair amount of time, effort and research into about 3 to 5 or more posts today, plus some work on the wiki, and it has all been lost, apparently. All changes to the wiki today have been removed and are not even in the history, so they are lost. I can't help but think that if someone is working on maintenance of the forums and wiki, they could have provided a little notification to us. Maybe tomorrow I'll try to recreate some of what I did today; I'm too tired now.
July 11, 2008 Someone else mentioned the IP address for lime-technology changed mid-day. To me, that would indicate the site was moved from one host to another. I'll bet the backup used to migrate was taken before those extra posts you described were made. You are correct... It would have been nice to have had some kind of advance notification. <optimist mode> Perhaps Tom asked his web host to move him to a plan that had more bandwidth, and that resulted in a different IP since it was moved to a different physical server. Perhaps he did not have advance notification of when the move would be made. </optimist mode> I agree, it would have been nice to have some notice. I too helped a few users, and those posts are gone too.
Joe L.
July 12, 2008 It looks like our hosting provider changed servers: http://dotable.com/defiant-server-announcements/2052-defiant-migration-new-shadow-server.html Sorry about the problems. For a brief amount of time incoming email was bounced as well. I am now subscribed to the new server's announcement forum, so hopefully I can give advance warning when something like this happens again.
July 12, 2008
"It looks like the forum was moved. Or at least the ip address changed."
"It looks like our hosting provider changed servers [...]"
Looks like I guessed it right. At least I was right for once. That's ok Tom. Thanks for letting us know.
July 16, 2008 Author
"Here's a tool I found that can be used in shell scripts to grab the hdd temperature. [...]"
Did you get this working?
July 17, 2008 Yes, it works. I did not make an installable package; however, you can unzip and install it manually.
July 17, 2008 Author Could you please (if you have the time) write short instructions on what to download and unzip, and how to get it working? I don't have any experience installing stuff on my unRAID server. But this is a must, as I have bad dreams about heat failures all the time these days. I would really appreciate your help!
July 17, 2008 I sat down and pondered this for a while. The more I think about it, the more I come to the conclusion that this shouldn't be a user hack. It should ABSOLUTELY be part of the standard OS. It is so CRITICAL and really not that much work to add in as a beta feature for us all to test. I know everyone has their own opinions of what is vital, and it is subjective, but I think we can probably all agree that if this saves one user's data it's worth delaying something else for. Tom, can we have your opinion please?
July 17, 2008 I agree, but then again, how many issues have been posted of drives getting too hot on their own in builds that have been solid for a while? I could see this being an issue for new builds until all the kinks have been worked out. UPS support may be more of a priority, as that is always an external problem that even careful monitoring cannot prevent.
July 17, 2008 They are probably as important as each other, as they deal with the 2 biggest sources of preventable hardware failures. It is also probably fair to say, though, that everyone would benefit from heat protection and only a smaller subset would benefit from UPS support. Considering all the community groundwork that has been done, it's not unreasonable to ask for both. I'd love to hear Tom's opinion on this, as he seems to never answer posts to do with hacking unRAID, which is a bit of a shame since it's the community doing free development and testing.
July 17, 2008 Tom needs to implement features that appeal to prospective customers. Requests from the peanut gallery may be a good source of suggestions, but ultimately Tom needs to make his own decisions based on his perception of market need.

Excessive heat caused by unexpected fan failure (UFF?) or other situations (e.g., an HVAC system failing) can cause serious damage to the drives and even the computer. That being said, I have never heard of a single unRAID user reporting such a failure. (Most users that have had heat problems saw the high heat numbers but didn’t act.) I’d classify extreme heat as a low-likelihood, high-impact risk. I personally would like to see features to get unRAID to shut down when the drive temps get high. I think it likely that sometime over the next 10 years it would save my butt. I think it would provide a very positive perception that unRAID does all it can to safeguard its data.

UPS support is a feature that many prospective users would view as very basic functionality. Is this a feature that impacts users’ buying decisions? Not sure. I personally would like it because I bought a UPS to avoid lengthy parity rebuilds. But the lack of a UPS feature creates a high-likelihood, low-impact risk for most unRAID users (IMO).
July 17, 2008
"Id love to hear Toms opinion on this as he seems to never answer posts to do with hacking unRAID which is a bit of a shame since its the community doing free development and testing."
NASuser, why do your posts inevitably end up as trolls? I'm really sick and tired of them, particularly the snide swipes at Tom and trying to "call him out" like a gunfighter. When you write your own software application, you can decide what features to add and when. Until then, no one hired you as a project manager for unRAID.

I applaud people making suggestions for features and proposing ideas. Voting for features is fine... trying to publicly pressure a developer to implement them is not. We don't need the trolls. Bashing Tom over the fact that unRAID development is not happening on any individual's preferred schedule or priorities is not helpful. It has gotten old and stale. I humbly suggest such posts be taken to the lounge, and let's leave the technical forum areas for technical information.

This thread is titled "Safety", specifically a (IMHO, paranoid) fear about multiple drive failures due to overheating. A script to shutdown/spindown on overheating was suggested. Shutdown/spindown on overheating is a feature with some appeal. But such a "feature" involves a lot of work, more than the scripts proposed to date. Various decisions have to be made for the algorithms:
What do you do when you get a transient spike? Bad data? No data?
Can you get a checksum to validate SMART data?
Do you do a time-weighted average, a moving average, or a single-point algorithm?
Will all drive models support temp reading w/o spinup? Do you start keeping a "bad models" list of drive models that don't, so you can special-handle them?
Do you add an interface for enabling/disabling it for individual drives?
Anyone with even basic script and HTML skills could do a drive temp monitoring interface in HTML/PHP and some shell scripts, keep a small text database of results each time it runs so as to do weighted averages, and put it all in a browser interface. Include e-mail notification, passwords, and all the bells and whistles you want. Heck, this sounds like a good stand-alone application for any *nix box that isn't using SNMP for real systems management. Someone could become famous as a shareware author!

Look at just one recent change made in unRAID... larger fields for the entire drive ID. Seems pretty simple, eh? A "safe" change? Far from it, as it hosed a bunch of people. This was particularly exacerbated by the fact that it is drive model specific, and Tom, even with the Herculean QC he does, cannot be expected to have every make and model of every drive to test things on. There can ALWAYS be vast unintended consequences to "simple" software changes. Any futzing around with taking actions on drive temps is fraught with risk, as just GETTING drive temps (a necessary step) is far from the "black box" routine most people envision.

I'm starting to feel like Linus feels when dealing with the OpenBSD crowd.
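To make the transient-spike and bad-data questions concrete, here is one possible approach, sketched under stated assumptions: discard readings that are not plain integers between 1 and 99, then average the last N survivors, and give no verdict at all when fewer than N valid readings exist. The function name avg_last, the 1-99 sanity range and the window size are all hypothetical choices, not something taken from unRAID or smartmontools.

```shell
#!/bin/sh
# Sketch only: moving average over the last N plausible readings.
# Usage: avg_last N reading1 reading2 ...
# Prints the truncated integer average, or nothing if there are fewer
# than N readings that pass the sanity filter.
avg_last() {
    n=$1
    shift
    printf '%s\n' "$@" | awk -v n="$n" '
        # Keep only plain integers in an assumed sane range of 1..99 C.
        /^[0-9]+$/ && $1 > 0 && $1 < 100 { v[++c] = $1 }
        END {
            if (c < n) exit                  # not enough data: no verdict
            for (i = c - n + 1; i <= c; i++) s += v[i]
            printf "%d\n", s / n
        }'
}

# Hypothetical usage, with readings kept in a small text file, one per line:
#   avg=$(avg_last 3 $(tail -n 10 /tmp/sdc.temps))
#   [ -n "$avg" ] && [ "$avg" -gt 50 ] && echo "sustained high temperature"
```

A single bogus value such as 255 is dropped by the filter, and a single genuine spike is diluted by the window, so only a sustained rise crosses the limit. The trade-off is slower reaction to a real, fast failure.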
July 17, 2008 So as far as safety features go, what is or should be implemented? How and by whom? My thoughts have been:
1. A real off-host mail subsystem for alerts. I have developed some packages to assist. What is holding me back from releasing my packages is a drop-in architecture agreed upon by all, including the core developer.
2. Array Health Alert. Joe's script does well, but I think some of the mail part should be externalized.
3. Drive Health Alert. I have scripts to check SMART health. Depends on the off-host mail subsystem.
4. UPS / Power Status. We have a drop-in daemon package, APCUPSD. Again, I think the drop-in architecture should be considered so it's easy for users to add in packages.
   1. Needs proper power down support. (I can create a package; users still have to modify the go script.)
   2. Needs off-host mail support.
5. Drive Health Alert 2 - Temperature alarm and shutdown on high level.
   1. Needs off-host mail support.
   2. Needs proper power down support. (I can create a package; users still have to modify the go script.)
   3. Needs an interface to configure options (can come at a later time).
A lot can be done by the community; some of this depends on Tom's blessing in other threads.
July 17, 2008 Along with the mail alarming I would also like to see RSS of major events, but everything else seems sensible. A clear structure of mail subjects, including a classification and priority level, would be useful to sketch out before it is implemented, so that serious events stand out and the inevitable humdrum stuff can be filed away. A few of my firewall clusters use mail for all sorts of reporting and send a zipped copy of their config files daily. This is especially useful for disaster recovery. Whilst on the topic of safety, it might also be worth considering something like md5deep/tripwire to check for changed system files. On a static OS like unRAID this could be quite effective.
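One way such a subject structure might look, purely as a strawman for discussion: a fixed tag, then a priority, then a classification, then free text. The function name alert_subject, the "[unRAID]" tag and the three priority names are assumptions invented for this sketch; nothing here is an agreed format.

```shell
#!/bin/sh
# Sketch only: build a mail subject with a fixed, filterable prefix.
# Usage: alert_subject PRIORITY CLASS "summary text"
# PRIORITY must be CRITICAL, WARNING or INFO (a hypothetical set).
alert_subject() {
    case "$1" in
        CRITICAL|WARNING|INFO) ;;
        *) echo "unknown priority: $1" >&2; return 1 ;;
    esac
    printf '[unRAID][%s][%s] %s\n' "$1" "$2" "$3"
}

# Hypothetical usage:
#   alert_subject CRITICAL TEMP "sdc at 55 C, shutting down"
```

Because the priority and classification sit in fixed bracketed positions, a mail filter can match on them with a simple prefix rule and route serious events separately from routine reports.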
July 17, 2008 I think a consistent mail subject line is a good idea. I'm used to NAGIOS and its mechanism, and I use the subject to route specific messages to my pager or visual and auditory alert system. http://www.cotrone.com/rob/archives/2006/04/roku_soundbridg_1.html We should open a discussion on a consistent subject line and go from there. As far as tripwire goes, unless you are granting access from outside of your network, I don't see the need. This is something I have no desire to implement or work on. I'm not negating its potential usefulness; I just don't want to spend time on something that is going to give me so little in return.
This topic is now archived and is closed to further replies.