unRAID OS version 6.4.0-rc10b available

October 28, 2017

Author

5 minutes ago, tstor said:

1. After downloading/installing the upgrade via web menu and rebooting via Dynamix System Button, unRAID came up with "unclean shutdown" and wanting to do a parity check.

That should not happen. Did everything related to 'reboot' work correctly? Meaning, you didn't have to manually cycle power or press reset button, etc? Also, are you running in gui console mode or normal mode?

6 minutes ago, tstor said:

2. The GUI for entering the passphrase has reverted to the one before rc9, forcing you to enter the passphrase two times for unlocking. Is this intention or a regression? I prefer the rc9 variant.

This also should not happen. Are you talking about the dialog on the Main page or on Encryption Settings page?

Are you using array autostart?

Posting diagnostics.zip would be a great help.

October 28, 2017

Unclean Shutdown: The array is not set to autostart. All disks except one are encrypted, so it has to wait for the keys before becoming useful anyway. Otherwise the reboot worked (no manual intervention necessary). rc9 was running in GUI console mode; however, I was not at the server while it rebooted, so it came up with rc10 in normal console mode.

Passphrase Entry: Being used to rc9 I went to the Encryption Settings page.

diagnostics attached: tower-diagnostics-20171028-1917.zip

October 28, 2017

Author

1 hour ago, tstor said:

Unclean Shutdown: The array is not set to autostart. All disks except one are encrypted, so it has to wait for the keys before becoming useful anyway. Otherwise the reboot worked (no manual intervention necessary). rc9 was running in GUI console mode; however, I was not at the server while it rebooted, so it came up with rc10 in normal console mode.

Passphrase Entry: Being used to rc9 I went to the Encryption Settings page.

diagnostics attached: tower-diagnostics-20171028-1917.zip

Right, OK. Encryption keyfile handling has changed a bit because the s/w does not really know whether an array device is encrypted unless an attempt is made to Start the array. This is to handle the case where an encrypted device is 'disabled' and being emulated. In this case there is no physical device to query, and we can only check for a LUKS header after the md/unraid driver has started, when we can query the 'md' device.
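
For anyone curious what "check for LUKS header" amounts to, here is a minimal sketch, not unRAID's actual code. It only tests whether a file or block device begins with the six-byte LUKS magic ("LUKS" followed by 0xBA 0xBE). The function name has_luks_magic and the device name /dev/md1 in the usage comment are assumptions made for illustration; on a real system, `cryptsetup isLuks <device>` is the usual way to ask the same question.

```shell
#!/bin/bash
# Return 0 if the file or device given as $1 starts with the LUKS magic
# bytes 4c 55 4b 53 ba be ("LUKS" + 0xBA 0xBE), and 1 otherwise.
# has_luks_magic is a hypothetical helper, not part of unRAID.
has_luks_magic() {
    local magic
    # Read the first 6 bytes and render them as one lowercase hex string.
    magic=$(head -c 6 "$1" 2>/dev/null | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "4c554b53babe" ]
}

# Example (device name is an assumption; requires root and a started array):
#   if has_luks_magic /dev/md1; then echo "md1 looks encrypted"; fi
```

An unreadable or missing device simply yields "not LUKS" here, which is roughly the situation described above for a disabled device before the md driver is up.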

What we recommend is, for servers with encrypted devices, you set "Settings/Disk Settings/Enable auto start" to Yes. Following boot of course there will be no keyfile present so this autostart will fail. But in this case s/w also now knows there are encrypted devices and you will see right on the Main page a place to enter the encryption passphrase or upload a keyfile.

Try that, I think you will like the result. Ultimately we want to get rid of the "Settings/Encryption Settings" page entirely.

re: unclean reboot: did it generate a diagnostics zip file on the flash in the 'logs' directory?

October 28, 2017

Yes, and the syslog shows a problem unmounting:

Oct 28 18:29:22 Tower root: rmdir: failed to remove '/mnt/user': Device or resource busy

tower-diagnostics-20171028-1830.zip

October 28, 2017

Author

7 minutes ago, tstor said:

Yes, and the syslog shows a problem unmounting:

Oct 28 18:29:22 Tower root: rmdir: failed to remove '/mnt/user': Device or resource busy

tower-diagnostics-20171028-1830.zip

That explains the unclean shutdown... thanks for posting that, I will have to study a bit and see if there's a good way to solve this.

October 28, 2017

11 minutes ago, limetech said:

That explains the unclean shutdown... thanks for posting that, I will have to study a bit and see if there's a good way to solve this.

You're welcome :-)

While you are thinking about a solution, are you willing to share what the root cause is and - more important - would manually stopping the array before a reboot avoid the issue so that we have a workaround until it is fixed?

October 28, 2017

Author

7 minutes ago, tstor said:

You're welcome :-)

While you are thinking about a solution, are you willing to share what the root cause is and - more important - would manually stopping the array before a reboot avoid the issue so that we have a workaround until it is fixed?

Sure. What's happening is there is some process with an open file descriptor to a file on a mounted device. This prevents 'unmounting' the device. Instead of 'Reboot' if you click 'Stop' the stop would 'hang', well actually it will be stuck in a loop, waiting for you, the user, to eliminate the condition holding up the un-mount. A typical source of this behavior is having a telnet/ssh window open where the 'current directory' is on one of the disk paths or user share. The solution in this case is to close the window.

Other sources of hang ups would be docker containers or VM's not shutting down in a timely manner. The "Settings/Disk Settings/Shutdown time-out" setting configures how long to wait during shutdown for such processes to terminate. If set too short there is possibility of terminating a VM (for example) before it has completely shut down (and possibly causing data loss). If set too long, and shutdown is a result of power loss, the UPS battery could drain before server has a chance to complete clean shutdown.

So it's really a tricky problem: you want shutdown/reboot to proceed quickly without data loss, but there are times when user intervention is necessary because system doesn't automatically know if it's ok to unconditionally kill any given process. You see similar behavior when shutting down Windows. Some conditions exist where windows shutdown will not proceed because there is a "Save/Cancel" open file dialog being presented by some app.
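
To make the "current directory is on one of the disk paths" case concrete, here is a rough sketch for spotting such sessions before clicking Stop or Reboot. It is Linux-only, reads /proc/<pid>/cwd, and needs root to see other users' processes. The function name cwd_holders is made up for this example, and it only catches working directories, not every open file.

```shell
#!/bin/bash
# Print "PID CWD" for every process whose current working directory is at
# or below the directory given as $1. cwd_holders is a hypothetical helper.
cwd_holders() {
    local target link pid cwd
    target=$(readlink -f "$1") || return 1
    for link in /proc/[0-9]*/cwd; do
        # Processes we may not inspect (or that just exited) are skipped.
        cwd=$(readlink "$link" 2>/dev/null) || continue
        case "$cwd" in
            "$target"|"$target"/*)
                pid=${link#/proc/}
                pid=${pid%/cwd}
                echo "$pid $cwd"
                ;;
        esac
    done
}

# Example: list shells and other processes parked under the user shares.
#   cwd_holders /mnt/user
```

Any PID it prints for /mnt/user or a disk path is a candidate for the "close the window" fix described above.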

October 28, 2017

17 minutes ago, limetech said:

What's happening is there is some process with an open file descriptor to a file on a mounted device.

OK, that explains my case. There were no dockers or VMs running, but I had an ssh session open with pwd below /mnt/user. As it was just sitting at the shell prompt, I didn't bother to close it, because I am used to getting thrown out by UNIX/Linux-style operating systems once a shutdown is initiated. I am aware that unmount doesn't work if a file descriptor is open, especially as a Mac user, where the Finder sometimes "thinks" that a file is open even though it isn't (as can be checked with lsof), but I did not expect the ssh session to survive long enough after shutdown initiation to become a problem. I noticed the expected "system shutting down" message in the ssh window immediately after clicking the shutdown button, but didn't watch to see when the session got killed. I ultimately did get kicked out, because the reboot happened, so if I understand correctly, it is basically a question of order: currently unmounting is attempted before remote sessions are forcibly disconnected.

October 28, 2017

I'm still seeing a problem I reported seeing with -rc8 and -rc9, namely the following three ACPI error messages in my syslog every 10 seconds:

Oct 28 23:06:10 Northolt kernel: ACPI Error: SMBus/IPMI/GenericSerialBus write requires Buffer of length 66, found length 32 (20170531/exfield-427)
Oct 28 23:06:10 Northolt kernel: ACPI Error: Method parse/execution failed \_SB.PMI0._PMM, AE_AML_BUFFER_LIMIT (20170531/psparse-550)
Oct 28 23:06:10 Northolt kernel: ACPI Exception: AE_AML_BUFFER_LIMIT, Evaluating _PMM (20170531/power_meter-338)

This is on an HP Microserver Gen 8. To fix it I add the following two lines to the end of the /etc/sensors3.conf file:

chip "power_meter-*"
    ignore power1

and that does the job without having to restart any services. Is there any chance that the stock /etc/sensors3.conf file could be modified in this way to make it permanent, please? In the meantime I'll add a line to my go script to update the file.

My original report:

The bug discussed: https://community.hpe.com/t5/ProLiant-Servers-Netservers/ACPI-Error-SMBus-or-IPMI-write-requires-Buffer-of-length-66/td-p/6943959

The fix: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1079544/comments/18

My diagnostics: northolt-diagnostics-20171028-2302.zip

October 28, 2017

Author

8 minutes ago, John_M said:

Is there any chance that the stock /etc/sensors3.conf file could be modified in this way to make it permanent, please?

If you look at the top of that file we see this comment:

# In general, local changes should not be added to this file, but rather
# placed in custom configuration files located in /etc/sensors.d. This
# approach makes further updates much easier.

Please try this:

Create a file on your USB flash in the config directory called "sensor-quirk" (or whatever you want to name it), with those two lines in it. Next, copy the file into the /etc/sensors.d directory and verify that this also works.

If this is the case, then add this to your 'go' file:

cp /boot/config/sensor-quirk /etc/sensors.d

I don't think we would add this to the stock unRAID OS distro because it might cause problems for other users. But this kind of customization is exactly what the 'go' file was intended to support. What is wrong with continuing to use it this way to solve this problem?
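
Putting the steps from this post together, the whole procedure might look like the sketch below. The helper name write_sensor_quirk is invented for the example; the file name sensor-quirk and both paths come from the post, and the two config lines are the ones John_M posted. Treat it as an illustration rather than a tested recipe for any particular hardware.

```shell
#!/bin/bash
# Write the two-line libsensors override into DIR/sensor-quirk.
# write_sensor_quirk is a hypothetical helper, not part of unRAID.
write_sensor_quirk() {
    local dir=$1
    cat > "$dir/sensor-quirk" <<'EOF'
chip "power_meter-*"
    ignore power1
EOF
}

# One-time setup on the server (run as root):
#   write_sensor_quirk /boot/config
#   cp /boot/config/sensor-quirk /etc/sensors.d
#
# Line for the 'go' file so the override is restored on every boot:
#   cp /boot/config/sensor-quirk /etc/sensors.d
```

Keeping the master copy on the flash matters because /etc lives in RAM on unRAID and is rebuilt at each boot, which is why the 'go' file has to copy it back.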

October 28, 2017

A HUGE THANK YOU to LT for recognising and fast tracking this fix. Much appreciated.

I can't believe the difference.

October 29, 2017

I concur with Beancounter here. This fix has been a real godsend for those who want to use KVM and hated the NPT=0 performance tradeoff.

October 29, 2017

Has this issue been addressed or is there a recommended way to change this parameter in a reboot persistent way:

Oct 13 07:55:08 Tower php-fpm[10268]: [WARNING] [pool www] server reached max_children setting (10), consider raising it

October 29, 2017

Author

42 minutes ago, Videodr0me said:
Has this issue been addressed or is there a recommended way to change this parameter in a reboot persistent way:
Oct 13 07:55:08 Tower php-fpm[10268]: [WARNING] [pool www] server reached max_children setting (10), consider raising it

Not addressed. The setting to change is defined in /etc/php-fpm.d/www.conf:

pm.max_children = 10

To experiment with different settings you can make your own copy of 'www.conf' and save it on the usb flash device in 'config' directory. Next, add this line in your 'go' file before emhttp is started:

cp /boot/config/www.conf /etc/php-fpm.d/

You can also make a change to /etc/php-fpm.d/www.conf directly and then type '/etc/rc.d/rc.php-fpm restart'.

The original default value is 5 and we set to 10. Apparently that's too low. Maybe double it again?
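
For anyone who wants to try the "double it again" idea, one way to prepare the copy on the flash is sketched below. set_max_children is a hypothetical helper, 20 is simply the doubled value, and the sed pattern assumes the setting sits on its own uncommented line as shown above.

```shell
#!/bin/bash
# Rewrite the pm.max_children line in the php-fpm pool file $1 to value $2.
# set_max_children is a hypothetical helper, not part of unRAID or php-fpm.
set_max_children() {
    local conf=$1 value=$2
    # Write to a temporary file first so a failed sed cannot truncate $conf.
    sed -E "s/^[[:space:]]*pm\.max_children[[:space:]]*=.*/pm.max_children = ${value}/" \
        "$conf" > "$conf.tmp" && mv "$conf.tmp" "$conf"
}

# One-time setup on the server (run as root):
#   cp /etc/php-fpm.d/www.conf /boot/config/www.conf
#   set_max_children /boot/config/www.conf 20
#
# Line for the 'go' file, placed before emhttp is started:
#   cp /boot/config/www.conf /etc/php-fpm.d/
#
# To apply without rebooting:
#   cp /boot/config/www.conf /etc/php-fpm.d/ && /etc/rc.d/rc.php-fpm restart
```

Watching the syslog for the "reached max_children" warning afterwards is the simplest way to tell whether the new value is high enough.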

October 29, 2017

I assume there is still no way to access a docker with a separate IP (br0) from the host system?

October 29, 2017

Smooth upgrade from rc9.

Another note about the SSL cert/DNS redirect (carried over from rc9):

I only get the DNS redirection when I go to HTTP://servername or IP; if HTTPS is used, redirection does not occur. Is it possible to get the DNS redirection regardless of which protocol is used to access the webUI?

Thanks!

Edited October 29, 2017 by Dephcon

October 29, 2017

Actually, I'd like the option to disable DNS redirects. They cause issues for me over my VPN.

October 29, 2017

15 hours ago, limetech said:
If you look at the top of that file we see this comment:
# In general, local changes should not be added to this file, but rather
# placed in custom configuration files located in /etc/sensors.d. This
# approach makes further updates much easier.
Please try this:

Create a file on your USB flash in the config directory called "sensor-quirk" (or whatever you want to name it), with those two lines in it. Next, copy the file into the /etc/sensors.d directory and verify that this also works.

If this is the case, then add this to your 'go' file:

cp /boot/config/sensor-quirk /etc/sensors.d

I don't think we would add this to the stock unRAID OS distro because it might cause problems for other users. But this kind of customization is exactly what the 'go' file was intended to support. What is wrong with continuing to use it this way to solve this problem?

OK. Thanks for the reply. I'll do it that way.

October 29, 2017

I'm one of the holdouts still running unRAID as a VM guest (ESXi 6.5).

Upgraded from unRAID interface from 6.4 RC9 to 6.4 RC10 and rebooted to this:

https://imgur.com/a/QQWp3

I tried checking the file system for errors, but none were found. Copied "previous" files over root, unRAID 6.4 RC9 boots normally.

(I'm using USB passthrough to the VM and PLOP ISO to boot the unRAID USB)

October 29, 2017

16 hours ago, limetech said:

Sure. What's happening is there is some process with an open file descriptor to a file on a mounted device. This prevents 'unmounting' the device. Instead of 'Reboot' if you click 'Stop' the stop would 'hang', well actually it will be stuck in a loop, waiting for you, the user, to eliminate the condition holding up the un-mount. A typical source of this behavior is having a telnet/ssh window open where the 'current directory' is on one of the disk paths or user share. The solution in this case is to close the window.

Other sources of hang ups would be docker containers or VM's not shutting down in a timely manner. The "Settings/Disk Settings/Shutdown time-out" setting configures how long to wait during shutdown for such processes to terminate. If set too short there is possibility of terminating a VM (for example) before it has completely shut down (and possibly causing data loss). If set too long, and shutdown is a result of power loss, the UPS battery could drain before server has a chance to complete clean shutdown.

So it's really a tricky problem: you want shutdown/reboot to proceed quickly without data loss, but there are times when user intervention is necessary because system doesn't automatically know if it's ok to unconditionally kill any given process. You see similar behavior when shutting down Windows. Some conditions exist where windows shutdown will not proceed because there is a "Save/Cancel" open file dialog being presented by some app.

Have noticed this behaviour when SSHed into a mounted device while the power was out. Perhaps there could be an option enabling the system to detect processes holding open file descriptors (ssh/telnet sessions, VMs, etc.) and kill them if they can't be closed in a set amount of time. This would put safe array shutdown as the top priority before UPS power gets depleted.

October 29, 2017

2 hours ago, realies said:

Have noticed this behaviour when SSHed into a mounted device while the power was out. Perhaps there could be an option enabling the system to detect processes holding open file descriptors (ssh/telnet sessions, VMs, etc.) and kill them if they can't be closed in a set amount of time. This would put safe array shutdown as the top priority before UPS power gets depleted.

I think it is a little more complicated in that you only want to kill such things if they point to the array. I often keep an SSH session open, but to /boot, and I would rather that stopping the array did not kill such sessions.

Something along these lines will also start being needed if it is ever intended to have VMs that can be left running when the main array is stopped. You would need to detect which ones could not be left running with the array stopped because they have vdisk files on the main array (or the cache at the moment). However, if such a VM was only using a vdisk on a UD-managed drive it could be left running in such a scenario, although it would obviously still need to be stopped if you were doing a system shutdown/reboot.
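
A rough sketch of that distinction: decide from a path whether a session or vdisk "points to the array". The function name is_array_path is hypothetical, and the path list (/mnt/user, /mnt/user0, /mnt/diskN and /mnt/cache counted as array or cache; /boot and UD mounts under /mnt/disks counted as safe) is an assumption about a typical unRAID layout, not something the OS exposes.

```shell
#!/bin/bash
# Return 0 if path $1 lives on the array or cache (so it would block a Stop),
# and 1 for anything else such as /boot or a UD mount under /mnt/disks.
# is_array_path and its path list are illustrative assumptions.
is_array_path() {
    case "$1" in
        /mnt/user|/mnt/user/*)   return 0 ;;
        /mnt/user0|/mnt/user0/*) return 0 ;;
        /mnt/cache|/mnt/cache/*) return 0 ;;
        /mnt/disk[0-9]*)         return 0 ;;  # disk1, disk2/..., but not "disks"
        *)                       return 1 ;;
    esac
}

# Example: an SSH session sitting in /boot would be left alone.
#   is_array_path /boot || echo "safe to leave running"
```

The same test could in principle be applied to a VM's vdisk path to decide whether it can survive an array stop.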

October 29, 2017

It has to be a little bit smarter than "oh, a ssh session, i betta kill that". Perhaps the output of lsof /mnt/user/ could be a starting point? (this discussion should probably be elsewhere)

Edited October 29, 2017 by realies

October 29, 2017

I would try to limit the complexity of the approach. I could easily live with unRAID just deciding between two situations:

1. Stop array only
In that case I would accept that unRAID just tries to stop the array. If any open file prevents that, throw an error message and let the user find out what is going on. No automatic killing of anything.

2. Shutdown
In the shutdown case I would prefer unRAID to wait a little bit, but then kill what needs to be killed early enough that the array can properly shut down.
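
The "wait a little bit, then kill" part of case 2 could be sketched like this. It is only an illustration of the idea, not how unRAID implements shutdown; wait_then_kill and its arguments are names invented here, and the one-second grace period between SIGTERM and SIGKILL is an arbitrary choice.

```shell
#!/bin/bash
# Wait up to $2 seconds for process $1 to exit on its own. If it is still
# alive after that, send SIGTERM, allow one more second, then send SIGKILL.
# wait_then_kill is a hypothetical helper, not part of unRAID.
wait_then_kill() {
    local pid=$1 timeout=$2 waited=0
    while kill -0 "$pid" 2>/dev/null && [ "$waited" -lt "$timeout" ]; do
        sleep 1
        waited=$((waited + 1))
    done
    if kill -0 "$pid" 2>/dev/null; then
        kill -TERM "$pid" 2>/dev/null
        sleep 1
        kill -0 "$pid" 2>/dev/null && kill -KILL "$pid" 2>/dev/null
    fi
    return 0
}

# Example: give a lingering process 30 seconds before forcing it out.
#   wait_then_kill 12345 30
```

For case 1 (Stop array only) nothing like this would run; the user is simply told that something is holding the array open.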

Edited October 29, 2017 by tstor

October 30, 2017

Author

58 minutes ago, tstor said:

I would try to limit the complexity of the approach. I could easily live with unRAID just deciding between two situations:

1. Stop array only
In that case I would accept that unRAID just tries to stop the array. If any open file prevents that, throw an error message and let the user find out what is going on. No automatic killing of anything.

2. Shutdown
In the shutdown case I would prefer unRAID to wait a little bit, but then kill what needs to be killed early enough that the array can properly shut down.

This is exactly how it behaves now.

October 30, 2017

On 10/28/2017 at 11:12 PM, limetech said:

That explains the unclean shutdown... thanks for posting that, I will have to study a bit and see if there's a good way to solve this.

2 minutes ago, limetech said:

This is exactly how it behaves now.

As my example that you previously looked at shows, it did not kill the remote sessions quickly enough, which resulted in an array that was not properly shut down.
