unRAID Server Release 5.0-rc16c Mover spinup disk1 with no target



This has been reported in the past and lost; with the official issue list, hopefully it can be tracked now. The behaviour has been the same across many releases.

 

1) When mover executes, it always spins up Disk#1, even when there is no data slated for it.

2) After executing mover via the WebGui, the page never comes back upon completion of mover. (Maybe it needs something like what was done with New Permissions, where a new window is launched; maybe others have better ideas.)

 

Screenshot attached showing the cache drive spun up and all others spun down.

 

Notice Disk #1 has 12 reads and 14 writes (at this stage)

 

Before-executing-mover.jpg.832022b2b98cdfb3e96b140757b6c39e.jpg

Link to comment

Placed 3 TV shows to be moved (TV Show share consists of disks#11-18) (Movie share consists of disks#1-7)

 

Screenshot attached showing Disks 12/15/16 spun up, as they are a part of the TV Shows share, and Disk 1 spun up, which is not a part of the TV Shows share.

 

Notice Disk #1 has 22 reads and 26 writes

after-mover-completes-Disk1-spinup.jpg.b6e33ab98b0b4d40607c297b1d76bce6.jpg

Link to comment

Looking at disk#1 to see anything out of the ordinary. 

 

Last mover copy to the Movies share was on 6/30/2013 (2 movies), yet 4 of the top-level folders are time stamped 7/7/2013 (all other shares haven't been written to in a long time). My understanding is that mover (rsync) does not modify date/time stamps, so what's causing this?

UPDATE: Well, I just got lucky and found out why. Browsing the "Movies" share, I wanted to delete a movie I got a better quality copy of. Upon deletion I decided to check Disk#1: the date/time stamp changed on the top-level folder, BUT ONLY on Disk#1, and the movie I deleted was NOT on Disk#1. So the question changes to: is this normal behaviour, that deletions via a user share change a date/time stamp on a top-level folder AND only on one disk (and not the disk from which the deletion took place)?

UPDATE#2: Deleted a second movie for the same reason; this time it changed the date/time stamp on Disk#3, and the movie did not reside on Disk#3. This can't be good.

 

Mover ran as stated above and spun up Disk#1, which it should not have. The WebGui shows 10 additional reads and 12 additional writes between when Disk#1 was spun down and when it was spun up by mover.

 

Screenshot attached showing top level folders.

"YNG1GNXA_3TB" is a stub file I have on each disk containing part of the serial number of the drive and its raw size (I have these stub files on the root of each disk)

disk1_toplevel.jpg.1e57e47e493b8301b49d2ec020c38af0.jpg

Link to comment

Placed 3 TV shows to be moved (TV Show share consists of disks#11-18) (Movie share consists of disks#1-7)

This is almost certainly a misconfiguration and not an issue with unRaid s/w.  How are you enforcing the above partitioning?  There are two ways:

 

a) You can use the 'Included disk(s)' mask on the Share Settings page for the share.  For example:

For TV set include mask to "disk11-18"

For Movies set include mask to "disk1-7"

 

b) You can set split level for share to "0" and manually create the top-level "TV" directories on disks 11 through 18, and manually create the top-level "Movies" directory on disks 1 through 7.

 

The best way to analyze this is look at directory listings of the top-level directories on all the disks, as well as the current share settings of all the shares.  The best way to get the former is to type this for each disk:

 

v -a /mnt/disk1

v -a /mnt/disk2

:

 

The best way to get the latter is to click on the 'vars' utility.
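If typing the listing once per disk gets tedious, the same information can be collected in a loop. This is only a rough sketch: the function name `list_disk_toplevels` is made up for illustration, it uses plain `ls -la` rather than the `v` shorthand, and it assumes the data disks are mounted as `disk1`, `disk2`, ... under the given root.

```shell
# List the top-level entries of every data disk under a mount root.
# Assumption: data disks are mounted as <root>/disk1, <root>/disk2, ...
# The glob expands in lexical order (disk1, disk10, disk11, ..., disk2).
list_disk_toplevels() {
    local root="${1:-/mnt}"
    local disk
    for disk in "$root"/disk[0-9]*; do
        [ -d "$disk" ] || continue      # skip if the glob matched nothing
        echo "== $disk"
        ls -la "$disk"
    done
}
```

Running `list_disk_toplevels /mnt > toplevels.txt` would put everything in one file that can be sanitized before posting.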

 

Since this is likely to contain personal info, you can either sanitize the output or email to me [email protected].

Link to comment

I never really noticed this, but my live server is on rc10 (yes, this is an rc16 issues thread) and I see I have the mover disk 1 spinup too. I checked the log this morning: there was one 40 GB file to move. Disk 1 is 90% full and the file was written to disk 8, which is 40% full, but disk 1 still spun up. I should upgrade to the latest rc and test again.

Link to comment

I'm sorry guys, I'm trying to aid, contain and keep threads concise with issues that Tom can accept as 'gotta be fixed in this release', rescheduled or not an issue.

I can rename it. Since we are working on 5.0-rc16c I figured I would prefix it.

Perhaps I'll leave that off next time.

 

If this is one of those long outstanding minor issues that is not 'recent' release related we can rename it or move it.

I'll leave it with Tom to guide me on this one.

 

The spin up can be configuration related or even somehow memory related (if you aren't running cache dirs).

 

I think the part about the interface hanging could be an issue and may need more review.

I'm not sure if it's thrown into the background or running as a child of emhttp and waiting.

 

 

Link to comment

I'm sorry guys, I'm trying to aid, contain and keep threads concise with issues that Tom can accept as 'gotta be fixed in this release', rescheduled or not an issue.

There are no longer any "release stopper" issues left.  As soon as I finish the documentation I will release 5.0 "stable".  If I can throw in a few 'cosmetic' type fixes before then I will, but the primary focus now is finishing docs to aid new users and those moving from 4.7.

Link to comment

I'm sorry guys, I'm trying to aid, contain and keep threads concise with issues that Tom can accept as 'gotta be fixed in this release', rescheduled or not an issue.

I can rename it. Since we are working on 5.0-rc16c I figured I would prefix it.

Perhaps I'll leave that off next time.

 

If this is one of those long outstanding minor issues that is not 'recent' release related we can rename it or move it.

I'll leave it with Tom to guide me on this one.

 

The spin up can be configuration related or even somehow memory related (if you aren't running cache dirs).

 

I think the part about the interface hanging could be an issue and may need more review.

I'm not sure if it's thrown into the background or running as a child of emhttp and waiting.

Nothing to be sorry about, it's in the right place and Tom is working the issue, that's all that matters in the end  ;)

 

...or even somehow memory related (if you aren't running cache dirs).

I am not using cache_dirs, even though I tried it for the first time not too long ago to see if the behaviour would be the same with it running, since so many do. Same behaviour: disk#1 spins up. I did run into a weird issue with cache_dirs though, and I am waiting on Joe L. to review it privately.

 

The spin up can be configuration related

This has been ruled out already with the data provided, and some tests.

Link to comment

Make this change in the mover by editing /usr/local/sbin/mover

 

On the line that reads:

      -exec rsync -i -dIWRpEAXogt --numeric-ids --inplace {} /mnt/user0/ \; -delete

 

Change to:

      -exec rsync -i -dIWRpEAXogtO --numeric-ids --inplace {} /mnt/user0/ \; -delete

 

The difference is the addition of ‘O’ to the option list.  This tells rsync to omit directories from modification time updates.
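For anyone who wants to see the effect of 'O' without touching the array, here is a small sketch that runs rsync on throwaway directories. The function name `dir_mtime_after_sync` and the temp-directory layout are invented for the example, and the flags are cut down to `-dRt` versus `-dRtO` instead of the mover's full option list, so treat it as an illustration rather than a copy of what the mover does. It needs rsync and GNU `touch`/`stat`.

```shell
# Show the destination directory's mtime after a sync, with or without -O.
# $1: rsync flags, e.g. "-dRt" or "-dRtO" (a reduced set, not the mover's full list)
# Prints the mtime of the destination directory in seconds since the epoch.
dir_mtime_after_sync() {
    local flags="$1" work src dst
    work=$(mktemp -d)
    src="$work/cache"; dst="$work/user0"      # stand-ins for /mnt/cache and /mnt/user0
    mkdir -p "$src/myshare" "$dst/myshare"
    touch -d '@978307200' "$src/myshare"      # give the source dir an old mtime (2001-01-01 UTC)
    ( cd "$src" && rsync "$flags" ./myshare "$dst/" )
    stat -c %Y "$dst/myshare"
    rm -rf "$work"
}
```

Calling `dir_mtime_after_sync -dRt` should report the old source time on the destination directory, while `dir_mtime_after_sync -dRtO` should leave the destination directory's own time alone.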

 

Here’s what’s happening.  Let’s say you have a share that is limited to use only disk2, e.g., /mnt/disk2/myshare.  And let’s say you create a directory on the cache drive called /mnt/cache/myshare (you don’t need to put any files in the directory).  Now you invoke the mover.  The mover will see that the modification time of /mnt/cache/myshare is newer than /mnt/disk2/myshare so it will operate on this directory.  Before moving any files it will first update the modification time on the target directory to match the source directory.  It then will see there are no files to move, so it’s done.

 

But how does this result in disk1 getting spun up also?  Well when rsync decides to update the modification time of /mnt/cache/myshare, it also has to update /mnt/cache/. (the parent dir) modification time also.  So it executes a ‘utimens’ operation on /mnt/user/.  (the utimens function just sets a timestamp).  An operation on the root of the user share file system always resolves to the lowest numbered mounted disk, in this case /mnt/disk1.  It’s this operation that is spinning up the drive so it can update the time stamp.

But wait you say, I thought disks are mounted with “nodiratime” option.  Yes they are, but the ‘utimens’ operation taking place is on the mount point.  Now the mount point is in the root file system which for a server booted from the flash, is a tmpfs, which is NOT mounted with “nodiratime” option.  Hence this quirk is causing disk1 to spin up.

To confound the problem, I don’t see this behavior on my software development machine because it uses ‘reiserfs’ file system which IS mounted with ‘nodiratime’ mount option.  So there you have it.  I will include this fix in 5.0 ‘stable’.
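To check which mount points actually carry the “nodiratime” option on a given box, something like the following could be used. It is a sketch only: `has_mount_option` is a made-up helper name, and it assumes the standard Linux /proc/mounts layout (device, mount point, file system type, comma-separated options).

```shell
# Print "yes" if the given mount point lists the given option, else "no".
# $1: mount point, $2: option name,
# $3: mounts table to read (defaults to /proc/mounts on Linux).
has_mount_option() {
    local mountpoint="$1" option="$2" table="${3:-/proc/mounts}"
    awk -v mp="$mountpoint" -v opt="$option" '
        $2 == mp {
            n = split($4, opts, ",")            # field 4 holds the option list
            for (i = 1; i <= n; i++) if (opts[i] == opt) found = 1
        }
        END { print (found ? "yes" : "no") }
    ' "$table"
}
```

For example, `has_mount_option /mnt/disk1 nodiratime` and `has_mount_option / nodiratime` can be compared to see the difference between a data disk and the root file system.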

 

Edit: don't know about the "hang" - can't reproduce that.

Link to comment

I'm sorry guys, I'm trying to aid, contain and keep threads concise with issues that Tom can accept as 'gotta be fixed in this release', rescheduled or not an issue.

There are no longer any "release stopper" issues left.  As soon as I finish the documentation I will release 5.0 "stable".  If I can throw in a few 'cosmetic' type fixes before then I will, but the primary focus now is finishing docs to aid new users and those moving from 4.7.

 

Great to hear, what type of cosmetic changes were you thinking?

Link to comment

Make this change in the mover by editing /usr/local/sbin/mover

 

On the line that reads:

      -exec rsync -i -dIWRpEAXogt --numeric-ids --inplace {} /mnt/user0/ \; -delete

 

Change to:

      -exec rsync -i -dIWRpEAXogtO --numeric-ids --inplace {} /mnt/user0/ \; -delete

 

The difference is the addition of ‘O’ to the option list.  This tells rsync to omit directories from modification time updates.

I just tested with the two tests I performed for you last night, both with an empty directory on the cache drive and with data in a directory. With the empty directory no disks spun up! mover just deleted the empty directory, perfect! In the second test, with a directory containing data, it only spun up the array drive (in my case Disk#15) that was required to move the data to; no more spin up of Disk#1, perfect. The date/time stamp was ONLY reflected on the lowest directory (now with this change) where the data was placed, perfect, perfect, perfect! Also, since it's no longer doing all that stuff you explained (incorrectly), mover is much faster now.

 

The Plex scraper is going to be very happy about this as well (amongst other things). So thank you for taking the time to flush this out once and for all, and for the detailed info on what was occurring.

 

Is there a SED command via GO script I could use to add that 'O' to the mover file, so it's persistent between reboots for now? I need to move on to another test to get you more info for another issue thread that was started and want to start with a fresh reboot.

 

 

I don’t see this behavior on my software development machine because it uses ‘reiserfs’ file system which IS mounted with ‘nodiratime’ mount option.  So there you have it.  I will include this fix in 5.0 ‘stable’.

My understanding is that 5.0 Final will have the updated mover script only, and not the "‘reiserfs’ file system which IS mounted with ‘nodiratime’ mount option" part, as mounting the flash that way is not possible because it is FAT?

 

Edit: don't know about the "hang" - can't reproduce that.

Not sure what you mean, there was no hang with this particular issue (at least I don't see where I stated that), I do see the thread name uses that word.

To that point can this thread please be renamed "Mover spins up disk1 with no target & Date/Time stamp issue; when mover runs" this was far from a RC16c issue as the thread name currently states.

 

Link to comment

So secondary to this main issue is: WebGui stuck displaying what it initially displayed when executing "Move now", never comes back; posted above (third post).

Since this was marked solved, I guess the secondary issue with the WebGui part should be split off as outstanding, please.

Link to comment

Is there a SED command via GO script I could use to add that 'O' to the mover file, so it's persistent between reboots for now?

sed command follows:  You can put it in your config/go script as you wanted.  It will have no impact if the mover command is already corrected so you can leave it there until 5.0 is released.  It only needs be run once per boot of the machine.

 

sed  -i  "s/dIWRpEAXogt /dIWRpEAXogtO /"  /usr/local/sbin/mover
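As a quick sanity check of the "no impact if already corrected" claim, the same substitution can be tried twice on a throwaway copy of the line. The wrapper name `patch_mover` and the temp file are only there for the demonstration; nothing here touches the real /usr/local/sbin/mover. It assumes GNU sed (for `-i`).

```shell
# Apply the substitution to whichever file is passed in.
patch_mover() {
    sed -i "s/dIWRpEAXogt /dIWRpEAXogtO /" "$1"
}

# Try it on a scratch file holding just the rsync line from the mover.
demo=$(mktemp)
printf '%s\n' '      -exec rsync -i -dIWRpEAXogt --numeric-ids --inplace {} /mnt/user0/ \; -delete' > "$demo"
patch_mover "$demo"
patch_mover "$demo"    # second run finds no "t " to match, so the line is left alone
cat "$demo"
rm -f "$demo"
```

The pattern ends in "t" followed by a space, so once the 'O' is in place it no longer matches, which is why leaving the command in the go script is harmless.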

Link to comment

sed command follows:  You can put it in your config/go script as you wanted.  It will have no impact if the mover command is already corrected so you can leave it there until 5.0 is released.  It only needs be run once per boot of the machine.

 

sed  -i  "s/dIWRpEAXogt /dIWRpEAXogtO /"  /usr/local/sbin/mover

Thank you Sir!

Link to comment

So secondary to this main issue is: WebGui stuck displaying what it initially displayed when executing "Move now", never comes back; posted above (third post).

Since this was marked solved, I guess the secondary issue with the WebGui part should be split off as outstanding, please.

I am not sure that is even an error.    For me the GUI is updated to say "Mover started" in that part of the GUI.  If I then click Done the GUI goes back a level.

Link to comment

Sorry, but that is nonsense. The "Apply" and "Done" buttons are for the Mover Settings (when modifying them), and clicking "Done" after "Move now" is just a refresh of the GUI; it's the same as clicking the Main tab after clicking "Move now" or navigating anywhere else.

Link to comment

Screenshot showing WebGui stuck displaying what it initially displayed when executing "Move now", never comes back.

What do you mean by "never comes back"?  The screen shown is normal.  If you refresh the page and it still says "Mover is running", it's because the mover is still running.  If you refresh the page and the 'Move now' button is available, then the mover has finished.  The 'Done' button is really like a "Back" button (it's not a true back() call because of some other screens it appears on though).  If the Mover finishes while that page is being displayed, the page does not refresh itself - you have to manually poll (refresh the page) to see when Mover is done if you care about knowing when the Mover is done.
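For those comfortable with a telnet session, the manual polling described above could also be scripted. This is a sketch under a big assumption: that the mover writes its process ID to a file while it runs. The path /var/run/mover.pid used as the default below is a guess and should be checked against the mover script on your release; the function name `wait_for_mover` is invented.

```shell
# Block until the process named in a PID file is gone, then report it.
# $1: PID file (ASSUMED default /var/run/mover.pid, verify on your system)
# $2: seconds to sleep between checks (default 5)
wait_for_mover() {
    local pidfile="${1:-/var/run/mover.pid}"
    local interval="${2:-5}"
    # Keep waiting while the file exists and its PID still refers to a live process.
    while [ -f "$pidfile" ] && kill -0 "$(cat "$pidfile")" 2>/dev/null; do
        sleep "$interval"
    done
    echo "mover is not running"
}
```

This does the same thing as refreshing the page by hand: it only tells you the mover has finished, not what it moved.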

Link to comment

OK, all your statements are true Tom. Let me try to express my view point a bit differently, and let me know if it makes a bit more sense.

 

Say I am a new unRAID user and/or a non-Linux user.

 

If I click "start array" I see spinning up all disks, mounting array, etc. (may not be in that order), and the WebGui refreshes and I see the disks are mounted, along with temps, reads, writes, etc.

 

If I click "New Permissions" a new window opens I see the status of new permissions chugging along.

 

If I click "log" a new window opens and I see the tail of the syslog, I can leave it open and close anytime I choose.

 

Now back to mover: if I manually execute mover, I just get "Mover is running...". Remember, I don't know anything about Linux. I would like to see what mover is doing (I shouldn't need to telnet/ssh and run commands), so why not, for example, launch the same thing that clicking "log" would, and open a new window tailing the syslog so a user can see what mover is doing? Yes, we could say that the user can click "log" from the "Share Settings" page, but that is something the user might not know to do, and by that time the initial mover starting entries would be lost. If executing mover via the WebGUI itself opened a new window with a tail of the syslog and then proceeded to execute mover, everything would be seen by the user.

 

Hope that makes it a bit more clear. I agree it is not a bug, maybe more of an enhancement/upgrade to an existing feature, like the way you ended up enhancing/updating New Permissions.

 

Link to comment

If I click "New Permissions" a new window opens I see the status of new permissions chugging along, no matter how long it takes, I have the option to close the secondary new permissions window if I choose not to follow it and don't care when it completes.

Not exactly true: if you close the 'New Permissions' window before the script has finished, the script is terminated at the point where you closed the window.  You must leave the window open in this case to let the process finish.

 

Now back to mover, if I manually executed mover, I just get "Mover is running...", remember I don't know anything linux , I would like to see what mover is doing (shouldn't need to telnet/ssh and run commands), so why not for an example launch the same thing as by clicking "log" would do and open a new window and tail syslog, so a user can see what mover is doing. Yes we could say that the user can click "log" from the "Share Settings page" But that is something the user might not know to do and lose the initial mover starting entries by that time, where if by executing mover via WebGUI it itself invoked a new window with a tail of syslog and then proceeded to execute mover all would be seen by the user.

I can see where that might seem inconsistent.  Of course you can click the 'Log' button and see it as well.  The documentation being added in the webGui will clear much of this up.

Link to comment

If I click "New Permissions" a new window opens I see the status of new permissions chugging along, no matter how long it takes, I have the option to close the secondary new permissions window if I choose not to follow it and don't care when it completes.

Not exactly true: if you close the 'New Permissions' window before the script has finished, the script is terminated at the point where you closed the window.  You must leave the window open in this case to let the process finish.

Ouch, ok sorry, I never personally closed it, but scratch that part, will update post not to confuse that part for someone.

 

Now back to mover, if I manually executed mover, I just get "Mover is running...", remember I don't know anything linux , I would like to see what mover is doing (shouldn't need to telnet/ssh and run commands), so why not for an example launch the same thing as by clicking "log" would do and open a new window and tail syslog, so a user can see what mover is doing. Yes we could say that the user can click "log" from the "Share Settings page" But that is something the user might not know to do and lose the initial mover starting entries by that time, where if by executing mover via WebGUI it itself invoked a new window with a tail of syslog and then proceeded to execute mover all would be seen by the user.

I can see where that might seem inconsistent.  Of course you can click the 'Log' button and see it as well.  The documentation being added in the webGui will clear much of this up.

Yep, like I stated, you could if you knew :) and were quick enough. Your choice in the end; I think it is an easy value add that just doesn't hurt. We'll leave the ball in your court :)

Link to comment
  • 2 weeks later...

Strangely, I experienced a mover hang just yesterday - I invoked mover from a telnet session, it reported that it was copying (moving) the file and no more output appeared in my telnet session.  After some time I checked that the file had been moved and simply killed the telnet session.

Link to comment