
unRAID Server release 4.5-beta4 available


limetech


I need to report that there is a problem in the startup of the NTP daemon, for some but not all users.  It is not starting successfully for me, and judging by all of the syslogs I've checked, it appears it is not starting for perhaps a third of other users either.  A typical syslog line sequence is:

Mar 17 19:59:17 Media ntpd_initres[1464]: host name not found: pool.ntp.org
Mar 17 19:59:17 Media ntpd_initres[1464]: couldn't resolve `pool.ntp.org', giving up on it

Mine is:

Apr 14 10:04:46 JacoBack ntpd_initres[1430]: host name not found: 0.us.pool.ntp.org
Apr 14 10:04:46 JacoBack ntpd_initres[1430]: couldn't resolve `0.us.pool.ntp.org', giving up on it
Apr 14 10:04:47 JacoBack ntpd_initres[1430]: host name not found: 1.us.pool.ntp.org
Apr 14 10:04:47 JacoBack ntpd_initres[1430]: couldn't resolve `1.us.pool.ntp.org', giving up on it
Apr 14 10:04:48 JacoBack ntpd_initres[1430]: host name not found: 2.us.pool.ntp.org
Apr 14 10:04:48 JacoBack ntpd_initres[1430]: couldn't resolve `2.us.pool.ntp.org', giving up on it

 

I have had this same issue in 4.4.2 ever since I started using unRAID:

 

Apr 12 11:11:54 Tower ifplugd(eth0)[1195]: Link beat detected.

Apr 12 11:11:54 Tower ntpd_initres[1238]: host name not found: 0.pool.ntp.org

Apr 12 11:11:54 Tower ntpd_initres[1238]: couldn't resolve `0.pool.ntp.org', giving up on it

 

Running a Gigabyte P965-DS3 v3.3 Motherboard and using the on-board NIC.

 

RobJ's solution of hitting the apply button on the time settings does resolve the problem.  It would be great to have this fixed!

 

Link to comment

Seems to me that the ntpd process is looking for the network connection before it is established. 

 

That is supported by the sequence of log entries in the syslog.  The network link is brought up AFTER the ntpd daemon is started.

[pre]

Apr 11 07:07:06 Tower ntpd[1356]: ntpd [email protected] Mon May  7 05:15:03 UTC 2007 (1)

Apr 11 07:07:06 Tower ntpd[1357]: precision = 1.000 usec

Apr 11 07:07:06 Tower ntpd[1357]: ntp_io: estimated max descriptors: 1024, initial socket boundary: 16

Apr 11 07:07:06 Tower ntpd[1357]: Listening on interface #0 wildcard, 0.0.0.0#123 Disabled

Apr 11 07:07:06 Tower ntpd[1357]: Listening on interface #1 lo, 127.0.0.1#123 Enabled

Apr 11 07:07:06 Tower ntpd[1357]: kernel time sync status 0040

...

Apr 11 07:07:07 Tower ifplugd(eth0)[1325]: Link beat detected.

Apr 11 07:07:07 Tower emhttp: shcmd (5): rm /etc/samba/smb-shares.conf >/dev/null 2>&1

Apr 11 07:07:07 Tower emhttp: shcmd: shcmd (5): exit status: 1

Apr 11 07:07:07 Tower emhttp: shcmd (6): cp /etc/exports- /etc/exports

Apr 11 07:07:07 Tower emhttp: shcmd (7): killall -HUP smbd

Apr 11 07:07:07 Tower emhttp: shcmd (8): /etc/rc.d/rc.nfsd restart | logger

Apr 11 07:07:08 Tower ntpd_initres[1360]: host name not found: us.pool.ntp.org

Apr 11 07:07:08 Tower ntpd_initres[1360]: couldn't resolve `us.pool.ntp.org', giving up on it

Apr 11 07:07:08 Tower ifplugd(eth0)[1325]: Executing '/etc/ifplugd/ifplugd.action eth0 up'.        <-- these should be done before starting the NTP daemon.

Apr 11 07:07:08 Tower logger: /etc/rc.d/rc.inet1:  /sbin/ifconfig eth0 hw ether 00:11:11:75:FB:7E        <-- these should be done before starting the NTP daemon.

Apr 11 07:07:08 Tower logger: /etc/rc.d/rc.inet1:  /sbin/ifconfig eth0 192.168.2.100 broadcast 192.168.2.255 netmask 255.255.255.0    <-- these should be done before starting the NTP daemon.

Apr 11 07:07:08 Tower logger: /etc/rc.d/rc.inet1:  /sbin/route add default gw 192.168.2.1 metric 1      <-- these should be done before starting the NTP daemon.

[/pre]

 

Edit:

I just noticed the process-ID for ifplugd is 1325, and the process-ID for ntpd_initres was 1360, indicating the network startup process was started first... however, it was nowhere near completed when the ntpd process went looking for a network time server.  A time delay of 10 seconds or more might be in order before starting the ntp daemon.

 

Might even be able to schedule it with something like:

echo "/etc/rc.d/rc.ntpd restart" | at now + 1 minute
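An alternative to a fixed delay is to poll until name resolution works and only then restart the daemon. The following is a minimal sketch, not anything shipped with unRAID: `retry_until` is a hypothetical helper name, and the commented example assumes `getent` is present on the server and that `pool.ntp.org` is the configured time server.

```shell
#!/bin/bash
# retry_until TRIES DELAY CMD...
# Hypothetical helper: run CMD until it succeeds or TRIES attempts are used up.
retry_until() {
    local tries=$1 delay=$2 i
    shift 2
    for ((i = 0; i < tries; i++)); do
        # Succeed as soon as the probe command succeeds.
        "$@" >/dev/null 2>&1 && return 0
        # Pause between attempts, but not after the last one.
        if [ $((i + 1)) -lt "$tries" ]; then
            sleep "$delay"
        fi
    done
    return 1
}

# Example (assumptions: getent exists, pool.ntp.org is the configured host):
#   retry_until 30 2 getent hosts pool.ntp.org && /etc/rc.d/rc.ntpd restart
```

Compared with a fixed one-minute delay, this restarts ntpd as soon as DNS answers and gives up after a bounded number of attempts.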

 

Joe L.

 

Link to comment

The new power button shutdown does not seem to be working, for me or for users here.  I had a sudden thunderstorm this morning, very close, and tried the power button first.

 

Next I tried powerdown, expecting my copy of an older version of WeeboTech's powerdown script to shut things down, only to see a series of errors involving localhost, http, and other web stuff.  I finally tried Ctrl-Alt-Del, and that worked, as it is hooked directly to my powerdown script.

 

I checked and there is a new powerdown script installed in /usr/local/sbin, which just happens to be first in the path, plus a link to it in the home directory.  The contents of this script are:

#!/bin/bash
#
# Helper script to gracefully power-down unRAID server.
# Works only if webGui is running.

/usr/bin/wget localhost/update.htm?shutdown=apply >/dev/null 

... which explains the errors I got.  I'm guessing a webGui is coming, but not quite here?

Link to comment

Often, when users are trying to power down from the command line, it is because the web-interface is NOT available.  I personally don't think the use of wget/emhttp is a good long term solution.  (Also, from my prior experience, unless Tom changed something, it will fail if there is a root administrative password.)
Link to comment

One last concern:  Perhaps because I'm more of a purist, but I'm uncomfortable with setting the NCQ queue_depth to a higher value than the kernel decided was best.  The older SATA controllers don't support NCQ, so they set the queue_depth to 0, not 1.  Perhaps it is completely equivalent to 1, but I've seen some set to 1 and others to 0, so I have to assume there *could* be a difference in the handling of it somewhere.

 

I'd like to suggest changing the comparison operator from -ne to -lt in the following line from set_ncq:

if [ $2 -ne `cat /sys/block/$1/device/queue_depth` ]; then 

 

Then it will only set the queue_depth to 1 if it is higher than 1.
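To make the difference concrete, here is a small sketch of the comparison on its own. It is not the actual set_ncq script; `needs_lowering` is a hypothetical name, and the device `sda` in the commented example is an assumption.

```shell
#!/bin/bash
# needs_lowering WANTED CURRENT
# Hypothetical helper: succeed only when WANTED is strictly below CURRENT,
# i.e. the -lt test suggested above instead of -ne.
needs_lowering() {
    local wanted=$1 current=$2
    [ "$wanted" -lt "$current" ]
}

# Example on a live system (the device name sda is an assumption):
#   current=$(cat /sys/block/sda/device/queue_depth)
#   needs_lowering 1 "$current" && echo 1 > /sys/block/sda/device/queue_depth
```

With -ne, a wanted depth of 1 against a current depth of 0 would trigger a write; with -lt it does not, so a controller that reports 0 is left alone.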

 

( Note: I'm going out of town tomorrow evening for 3 or 4 days.  You will be happy to see me shut up for awhile, eh?  ;D )

Link to comment

The memtest file included in unRAID Server 4.5-beta4.zip is actually the zip of the actual memtest tool.  If you try to run it, from the boot menu, you will get a corrupt kernel image error.  The memtest is actually available, if you extract it from the server distro, then extract the memtest86+-2.10.bin from within memtest, and rename it to memtest, and copy it to the flash.  (That memtest86+-2.10.bin is actually memtest86+-2.11.bin - their naming mistake.)

 

Heh - noticed that about an hour after posting -beta4.  Fixed in -beta5, you guys don't miss a thing :)

Link to comment

The new power button shutdown does not seem to be working, for me or for users here.  I had a sudden thunderstorm this morning, very close, and tried the power button first.

 

Next I tried powerdown, expecting my copy of an older version of WeeboTech's powerdown script to shut things down, only to see a series of errors involving localhost, http, and other web stuff.  I finally tried Ctrl-Alt-Del, and that worked, as it is hooked directly to my powerdown script.

 

I checked and there is a new powerdown script installed in /usr/local/sbin, which just happens to be first in the path, plus a link to it in the home directory.  The contents of this script are:

#!/bin/bash
#
# Helper script to gracefully power-down unRAID server.
# Works only if webGui is running.

/usr/bin/wget localhost/update.htm?shutdown=apply >/dev/null 

... which explains the errors I got.  I'm guessing a webGui is coming, but not quite here?

 

Fixed in -beta5.

Link to comment

Often, when users are trying to power down from the command line, it is because the web-interface is NOT available.   I personally don't think the use of wget/emhttp is a good long term solution.  (Also, from my prior experience, unless Tom changed something, it will fail if there is a root administrative password.)

 

The feature was added to permit graceful power down when pressing the Power Switch on the case.  The event has to be hooked up to a script & so I just made the script available via symlink in the root 'home' directory.  The script works by using 'wget' to simulate the user clicking the 'Power down' button.  This is because there's a lot of work to shut everything down gracefully which has already been implemented in 'emhttp'.  There are cases where the script won't work & it has been fixed in -beta5.

 

In those cases where the web-interface is not available, I guess I'd ask what do you mean by "not available"?  If you don't have access to a web-browser, then that's what the button is for.  If you only have access to the console then you can use the button or the 'powerdown' command (or if you don't have a button or acpi is not enabled, then you can use the 'powerdown' command).  If the web-interface is unresponsive then this is a bug which should be fixed.

 

If all else fails you can use the three-finger-salute to invoke linux shutdown command, though in this case parity check will want to run upon next reboot.  If that fails, you can hold the power button down for around 5 seconds & the system bios will shut it down.  If that fails you can yank the power cord :)

Link to comment

One last concern:  Perhaps because I'm more of a purist, but I'm uncomfortable with setting the NCQ queue_depth to a higher value than the kernel decided was best.  The older SATA controllers don't support NCQ, so they set the queue_depth to 0, not 1.  Perhaps it is completely equivalent to 1, but I've seen some set to 1 and others to 0, so I have to assume there *could* be a difference in the handling of it somewhere.

 

I'd like to suggest changing the comparison operator from -ne to -lt in the following line from set_ncq:

if [ $2 -ne `cat /sys/block/$1/device/queue_depth` ]; then 

 

Then it will only set the queue_depth to 1 if it is higher than 1.

 

( Note: I'm going out of town tomorrow evening for 3 or 4 days.  You will be happy to see me shut up for awhile, eh?   ;D )

 

I haven't seen any cases where the actual queue_depth value is a 0, but of course I have not used all the h/w out there people use to build their server :)  In a cursory review of the code it looks like a value of 1 is always used to turn off NCQ; nevertheless, I modified the set_ncq script so that if you're trying to set queue_depth to 1 and the value is already 1 or 0, then it won't try to set it to 1 (script will just exit).  This is in -beta5.

Link to comment
If all else fails you can use the three-finger-salute to invoke linux shutdown command, though in this case parity check will want to run upon next reboot.  If that fails, you can hold the power button down for around 5 seconds & the system bios will shut it down.  If that fails you can yank the power cord :)

 

Perhaps consider my graceful shutdown at the shell level and hook that into the 3 finger salute.

You can include my powerdown script/package if you want.

 

I think hooking the bare minimum into /etc/rc.local_shutdown, or hooking this up with my rc.unRAID script, could help alleviate some conditions which cause a parity check on boot.

 

Link to comment

I also vote for the 3 finger salute... not all systems have an ATX PSU or even a power switch.  I have built a number of servers and punched out or otherwise didn't use a power switch... it's a server and no good can come of an easy-to-reach power switch!

Link to comment

Often, when users are trying to power down from the command line, it is because the web-interface is NOT available.   I personally don't think the use of wget/emhttp is a good long term solution.  (Also, from my prior experience, unless Tom changed something, it will fail if there is a root administrative password.)

 

The feature was added to permit graceful power down when pressing the Power Switch on the case.  The event has to be hooked up to a script & so I just made the script available via symlink in the root 'home' directory.  The script works by using 'wget' to simulate the user clicking the 'Power down' button.  This is because there's a lot of work to shut everything down gracefully which has already been implemented in 'emhttp'.  There are cases where the script won't work & it has been fixed in -beta5.

 

In those cases where the web-interface is not available, I guess I'd ask what do you mean by "not available"?  If you don't have access to a web-browser, then that's what the button is for.  If you only have access to the console then you can use the button or the 'powerdown' command (or if you don't have a button or acpi is not enabled, then you can use the 'powerdown' command).  If the web-interface is unresponsive then this is a bug which should be fixed.

Specifically, there are times when emhttp was either killed by the kernel as it ran out of memory, or unresponsive for some other unknown reason.  We then get users coming to the forum panicking, stating they can't get to the web-interface and want to shut the server down cleanly.

 

The use of wget would not help them at that point...  emhttp is toast, there is little memory left to invoke much of anything.

 

I like the idea of the rc.d/rc.unRAID script having all the logic needed to cleanly stop the array. (kill samba, nfs, close open files, loop devices, sync, un-mount file-systems, stop the array, and then power down.)

 

That way, no matter how it is triggered, whether by the three-finger-salute or a specific "powerdown" command (which just invokes /etc/rc.d/rc.unRAID stop, followed by poweroff), we shut down cleanly.  It is just as likely that a UPS will initiate a shutdown on an extended power outage.  It must shut down all processes, including any extensions, and for that reason I like the rc.d structure.  rc.0 calls rc.local_shutdown, which in turn has:

[ -x /etc/rc.d/rc.unRAID ] && /etc/rc.d/rc.unRAID stop
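As a rough illustration of the ordering described above, a "stop" action might look something like the sketch below. This is not WeeboTech's rc.unRAID script. The function names are hypothetical, `/root/mdcmd stop` and the `/mnt/user` and `/mnt/disk*` mount points are assumptions about the server, and closing open files and loop devices is left out for brevity. DRY_RUN defaults to 1, so the sketch only prints the steps unless it is explicitly set to 0.

```shell
#!/bin/bash
# Hypothetical sketch of an ordered "stop" sequence; not the actual rc.unRAID.
# DRY_RUN defaults to 1 so nothing is executed, only printed.
DRY_RUN="${DRY_RUN:-1}"

run_step() {
    # Print the step; execute it only when DRY_RUN=0, and keep going on failure.
    echo "step: $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@" || echo "warning: '$*' failed, continuing" >&2
    fi
}

stop_sequence() {
    local d
    run_step killall smbd nmbd          # stop Samba
    run_step /etc/rc.d/rc.nfsd stop     # stop NFS
    run_step sync                       # flush buffers to disk
    run_step umount /mnt/user           # assumed user-share mount point
    for d in /mnt/disk*; do             # assumed data-disk mount points
        [ -d "$d" ] || continue
        run_step umount "$d"
    done
    run_step /root/mdcmd stop           # assumed command to stop the array
    run_step /sbin/poweroff             # finally, power down
}

# Example: DRY_RUN=1 stop_sequence   (prints the steps without running them)
```

Because every step goes through `run_step`, one failed step does not abort the rest, which matters when the point is to get as close to a clean shutdown as possible.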

 

If all else fails you can use the three-finger-salute to invoke linux shutdown command, though in this case parity check will want to run upon next reboot.

Actually, with the way the powerdown command WeeboTech is tied in, it shuts down cleanly and does not need a parity check.

  If that fails, you can hold the power button down for around 5 seconds & the system bios will shut it down.  If that fails you can yank the power cord :)

Believe it or not, I've had to do that once or twice in my past... (but not on the unRAID server, since I have NO power button... remember, YOU removed it before you shipped the MD1200 server to me.  Fortunately, the rocker switches on the power supplies still work  ;D)

 

 

Link to comment

As requested I switched from AHCI mode to IDE mode and my Parity-check speeds are now identical at about 95Mb/s whether "Force NCQ Disabled" is set to On or Off. I am running the latest BIOS (1803) on the motherboard. I hope this helps; if you want me to check anything else just let me know.

 

 

Santan.

Link to comment
The use of wget would not help them at that point...  emhttp is toast, there is little memory left to invoke much of anything.

 

wget also won't work if they have a password on root.

 

I like the idea of the rc.d/rc.unRAID script having all the logic needed to cleanly stop the array. (kill samba, nfs, close open files, loop devices, sync, un-mount file-systems, stop the array, and then power down.)

 

I second that.  And if unRAID is already shut down, the call to the script during shutdown is harmless.

Link to comment

 

Another feature of my rc.unRAID script is it saves the syslogs on the flash drive.

If zip is included in the distribution it will automatically zip the latest one as an asciicrlf txt file too. (this lets it load in notepad easier).

 

Link to comment

I like that a lot. Has it got a size sanity check? Occasionally my syslog is huge, and you wouldn't want to wait 10 minutes while a file is copied to flash.

 

No size check, it saves the latest 10 logs (configurable in script) and zips the latest.

It could be changed to only zip and save the latest.

However, I had not heard of any issues so far.
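For readers wondering what "keep the latest N logs" can look like, here is a minimal sketch. It is not taken from the rc.unRAID script; `prune_logs` is a hypothetical name, and it assumes the saved logs sit alone in one directory with file names that contain no newlines.

```shell
#!/bin/bash
# prune_logs DIR [KEEP]
# Hypothetical helper: delete all but the KEEP newest files in DIR (default 10).
prune_logs() {
    local dir=$1 keep=${2:-10} name
    # ls -1t lists newest first; tail -n +N skips the first KEEP names.
    ls -1t "$dir" | tail -n +"$((keep + 1))" | while IFS= read -r name; do
        rm -f -- "$dir/$name"
    done
}

# Example (the directory is an assumption): prune_logs /boot/logs 10
```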

 

I say, if your syslog is that big, there might be another issue.

Link to comment

Yeah, there is an issue... moving files around with mc combined with ls -R created zillions of dupe file syslog entries. It's more of a byproduct of how things are implemented than a bug, though, and I don't expect it to be fixed anytime soon.

 

If it takes xx number of log entries, that's a perfect solution though.

Link to comment

I think that a combination of the first lines and the tail-end lines might be perfect.  Stuff in the middle is very likely to be duplicates and of little additional value.

 

Something like this would grab at most 6000 lines, and most often far less:

( head -3000 /var/log/syslog | cat -b
  cat -b /var/log/syslog | tail -3000 ) | sort -u | gzip - >/boot/logs/syslog.gzip
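The same idea can be sketched without numbering and sorting, by checking the line count first. This is a different technique from the one-liner above, offered only as an illustration: `trim_log` is a hypothetical name, and the "lines omitted" marker is an addition of this sketch.

```shell
#!/bin/bash
# trim_log FILE [KEEP]
# Hypothetical helper: print the first KEEP and last KEEP lines of FILE
# (default 3000 each). Short files are printed whole, so nothing is duplicated.
trim_log() {
    local file=$1 keep=${2:-3000} total
    total=$(wc -l < "$file" | tr -d ' ')
    if [ "$total" -le $((keep * 2)) ]; then
        cat "$file"
    else
        head -n "$keep" "$file"
        echo "... $((total - keep * 2)) lines omitted ..."
        tail -n "$keep" "$file"
    fi
}

# Example (paths are assumptions):
#   trim_log /var/log/syslog 3000 | gzip -c > /boot/logs/syslog.gz
```

The marker line makes it obvious in the saved copy that the middle of the log was dropped.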

Link to comment

Can't you do something like:

grep -v duplicate /var/log/syslog >/boot/syslog_nodups

 

You have had this massive dupe problem for awhile, and I have to confess, I refrained from saying anything before, because I thought you would take care of it without our help.  I've never heard of a dupe problem that wasn't fairly easy to fix, once you understood where the dupes were, and had decent tools to deal with it.  So we must be missing something here.  The UnMENU 'Dupe Files' tool should help if it is just odd files duped across various disks.  But if it is massive amounts of duplication, that is usually the easiest to fix, because it is NOT odd files but whole trees that are duplicated.  For that, you simply rename a selected top level or branch (eg. Movies Archive or Movies2), and --- BAMM --- the whole problem is solved!  In just a few seconds!  I admit I'm probably missing something important.  Tell us more, and maybe someone will have some ideas.  It is not something you should be trying to work around, or live with.

Link to comment

The problem stems from having so many files and dupes. Nothing more really. How they happen and where they come from is unavoidable and not something I can go into here.

 

Every now and again I track them down and delete them, but even having a few dupes and doing ls -R means loads of logs.

 

Things might be better now that syslog is fixed; however, until unRAID stops telling me I have a dupe file that it has already told me about, the only fix is to find the files... and that's manpower and time I can't spare at the drop of a hat.

 

I haven't installed unMENU since the new www project as I had loads of stability problems with it.

 

Grepping the logs for duplicates only shows one of the dupe files and not both. Not so helpful if you have 15 drives all with the same folder structure.

 

Edit: I have since found out from Joe that this statement is simply incorrect. That's good news, as it makes it much easier for me to find these dupes.

 

So in summary, I've had it for a while... but not the same dupe files.

Link to comment
Grepping the logs for duplicates

 

No, this was "grep -v duplicate /var/log/syslog >/boot/syslog_without_dupe_lines", which saves a tiny copy of your syslog minus the dupe lines.  Perhaps WeeboTech can create a filtering version of his tool, with the option to save and zip only the filtered 'unduped' version of the syslog.
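A filtering save along those lines could be sketched as follows. This is only an illustration, not part of WeeboTech's tool: `save_filtered` is a hypothetical name, and the default pattern `duplicate` is simply the one used in the grep command above.

```shell
#!/bin/bash
# save_filtered SRC DEST [PATTERN]
# Hypothetical helper: write a gzip-compressed copy of SRC to DEST,
# leaving out every line that matches PATTERN (default: duplicate).
save_filtered() {
    local src=$1 dest=$2 pattern=${3:-duplicate}
    # grep -v exits 1 when no lines are left; that is not an error here,
    # and the function's status is that of gzip.
    grep -v -- "$pattern" "$src" | gzip -c > "$dest"
}

# Example (destination path is an assumption):
#   save_filtered /var/log/syslog /boot/logs/syslog_nodups.gz
```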

Link to comment

I would like to make a request to have the ability to obtain a clean power down by depressing the power off button on the server. In my situation, I keep the server powered down most of the time. At the most, I may use it for an hour or two, three times a week. It is located in the basement with my other network gear.

 

I have a remote operated switch connected across the power button so I can turn on the server remotely. This works great, but to shut down I have to remember to use the web management page to do so. And often I don't have the computer turned on which means I must deal with that issue.

 

If power down, as proposed, remains, then I could program my remote to power down the server, along with the other components, with the press of one button. To me, that would be cool.  ;D

 

 

Link to comment

Archived

This topic is now archived and is closed to further replies.

