lbosley

Unexplained Disk Spin-ups


I posted a couple of times in the General Support section about an ongoing issue in my array with unexplained spin-ups of disks.  Most users suspected this was simply related to a loss of directory cache.  I should mention that I have been running cache_dirs for a long time.  But now I am able to recreate this unexplained spin-up behavior, which seems to rule out directory caching as a factor.  It looks to be a bug somewhere in the software, or perhaps something is failing in my system.

 

Here is my test scenario (this was also tested in a bare-bones config with no plug-ins or Dockers):

I have one active share (Movies) on my unRAID system, spread across 18 data drives.  If I spin down all disks and copy a folder and files to the share, it writes the new data to the cache disk.  Sometimes other drives are spun up during this copy.  Once finished, I once again spin down the disks.  Then I start the Mover process.  The data should be written to my only empty drive (disk18).  The system correctly spins up disk18 and the parity drive and begins writing to the array.  Seconds later, disk1 spins up.  The statistics show a modified date on the Movies folder on disk1, with a small handful of reads and writes.  No data is copied to disk1, and the Mover completes the operation to disk18.  I am able to run a find command on the entire Movies share before and after the Mover actions.  The find returns in 4 seconds and does not spin up disks - proving to me that the directory remained fully cached throughout the test.  Also, if this were a problem with directory caching, I cannot explain why the system would be writing anything on disk1 during the move.

 

Every time I repeat this test, disk1 spins up.  I even excluded disk1 from the Movies share, and it still spun up during the Mover operation.  Each time, the system appears to touch a directory on disk1.

 

This is just one function in which I was able to repeatedly demonstrate a drive spinning up without any obvious need.  I believe it has been happening for quite a while in my system.  It is generally observed as a pause in whatever activity I am performing.  Sometimes this pause lasts minutes as multiple drives are sequentially spun up.  I've spent the day testing and flashing components to the latest firmware and BIOS releases.  I also added memory, just to be sure.  I have attached my diagnostics output, and would appreciate your tech support investigating this issue.

 

One other oddity to mention: at times I will see most or all of the drives needing to spin up prior to writing a file, even though the file should end up on the cache disk.  Again, it does not always appear that this happens because of stale directory caching.  Strangely, when it does happen, 90% of the time one of the drives (disk5) remains spun down.  Disk5 is just one of many drives full of media files in the Movies share.

 

Thank you for your help.

unraid-diagnostics-20170225-1318.zip


20-30GB full Blu-ray rips, generally.

 

The files I used in testing this Mover issue were more like 3-4GB.


I just now scanned for new media in my unRAID library with Kodi, and it needed to spin up every drive except disk5 and disk14.  These two drives act rather normally in my opinion - especially disk5.  I rarely see these two spun up when all the rest of the array is awakened.  Not sure if this is important, but it is very noticeable to me.


For sure, the cached inode/dirent information obtained by your background cache_dirs is getting ejected by file transfers.  Linux is very aggressive about caching file data.  If you copy a 4GB file from disk A to disk B, the entire 4GB is going to get into RAM if possible.  This is going to eject cached inode/dirent info.  If you move multiple files in this manner, given the amount of RAM you have, it's easy to see how this may cause disk spin-ups the next time cache_dirs or another scan has to take place.  In general this is a pretty tough problem to solve perfectly, and if the total size of your server is big enough, it would be impossible to solve.  As an interesting exercise, from a telnet/ssh session type this:


 

cd /mnt/user
tree

 

This will spit out your entire user share directory/file structure.  At the end it will produce a summary.  For example, one of my servers produces:

 

31948 directories, 299120 files

 

Each directory requires a minimum of 1 page (4096 bytes) to cache, though it can be more depending on how many file names are in the directory, and each file requires 1 page.  Hence caching this structure would require (31948+299120)x4096 = about 1.4GB.  On this server, if I have 4GB of free RAM, then a transfer of a 4GB file is going to entirely invalidate the cache (meaning those pages will hold file data instead of inode/dirents).
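The arithmetic above can be reproduced in the shell (the directory and file counts are the ones tree reported; one page per entry is the stated minimum, so real usage can only be higher):

```shell
# Rough estimate of the RAM needed to keep this tree's inode/dirent
# info cached: one 4096-byte page per directory and per file.
dirs=31948
files=299120
bytes=$(( (dirs + files) * 4096 ))
mb=$(( bytes / 1000000 ))
echo "$bytes bytes (~${mb} MB)"     # 1356054528 bytes, i.e. about 1.4GB
```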


85102 directories, 274570 files
 

Just under 1.5GB.  

 

So, the disks remained spun down while the tree command was running.  Afterwards I clicked on a root folder in Windows File Explorer... pause... disk5 spun up.  I created a folder and disk8 spun up (the folder was written to the cache drive).  I copied a 1.5GB file into the folder and then started the Mover.  Target disk18 and parity spun up.  Again, seconds later, disk1 spun up.  I spun all drives down again and repeated the test, starting with the tree command.  But this time I copied a 1.5MB file into the new folder and ran the Mover.  Disk1 spun up again.

 

I can think of no reason why this would be so predictable and consistent if it were a simple matter of stale directory caching.  And just like the find command, wouldn't the tree command have needed to spin up any disk whose directory contents were not in memory?  Tree finished both times without needing to spin up a drive.  Doesn't that indicate that cache_dirs is doing its job?  Am I missing something in my thinking here?

 

I accept your explanation, and it may very well explain some of these frustrating spin-ups.  But I don't think it explains this one.  Today I ordered another 8-port SAS controller to begin systematically swapping hardware.  Any other suggestions are appreciated.

 

 

34 minutes ago, lbosley said:

Afterwards I clicked on a root folder in Windows File Explorer

Try the same operations using a different file manager, like https://explorerplusplus.com/

Make sure all thumbnail generation or content parsing is turned off so no files are actually opened for reading during the operation.


Jonathanm - Just to clarify "I clicked on a root folder": I mean that I opened the mapped drive in Explorer to show the two main folders in the share.  Even the next level of the share holds only more folders.  No file was visible or opened, no icons were loaded, no thumbnails, etc.  All I know is that Explorer paused before showing the folders, telling me that a disk had spun up.  I mention this action because I want to show just how prevalent this spin-up activity is in my system.  I had just run the tree command to recursively list all of the directory contents; nothing spun up.  Then the instant I looked at the same directory in Windows, a drive spun up.  I wrote a file to the cache, and another drive spun up.  I moved the files from the cache disk, and disk1 also spun up.

 

Even if you believe this is a client issue (Windows loading additional metadata or opening a file), it doesn't explain what happens with the Mover.


Okay, so I received an additional HBA to see if I could isolate this problem to one of my controllers.  As expected, this test changed nothing.  EVERY TIME I run the Mover, disk1 is spun up and accessed.  Guys, this is a bug.  I even noticed, when starting a quick parity check after changing out a controller, that for a brief moment there was write activity on parity and disk1.  I verified this several times.  I believe I routinely see drives being spun up during write operations, and in my test I can repeat this behavior.  Reading from the array seems to function just fine: if a read requires a drive to spin up, it does so - and just that one drive.  A write (like ripping a new movie to the array) usually requires one or more drives to spin up - even after running a clean (no spin-ups) find command just prior.

 

My unRAID 6.3.2 configuration includes:

SuperMicro X10SLH-F motherboard

Intel i3-4130T CPU

Using 6 SATA on main board

SAS-2LP-MV8 ***

LSI 9207-8e to external enclosure ***

 

*** I used a new LSI 9207-8i to replace my controllers one at a time.  

 

It would be nice if I could hear from other folks using the exact same test process, to see whether you get anything similar occurring in your array.

 

Simple test:

  • stop all Dockers or anything else which might access disks during the test
  • run a find command through your test share to fill the directory cache (e.g. find /mnt/user/Movies)
  • spin down all disks
  • repeat the find command and verify that it finishes without a spin-up
  • create a new folder in your share along with some data - verify that it writes to your cache disk
  • start the Mover and let it finish
  • note whether any disks spin up other than the expected target and parity

13 minutes ago, lbosley said:

Simple test:

  • stop all Dockers or anything else which might access disks during the test
  • run a find command through your test share to fill the directory cache (e.g. find /mnt/user/Movies)
  • spin down all disks
  • repeat the find command and verify that it finishes without a spin-up
  • create a new folder in your share along with some data - verify that it writes to your cache disk
  • start the Mover and let it finish
  • note whether any disks spin up other than the expected target and parity

 

Please repeat this in Safe Mode - this will guarantee no plugin interaction.

In your "start the Mover and let it finish" step, how much data is being moved?

 

Edit: Does the share you are moving to also exist on disk1?


I just now repeated the test in Safe Mode - same results.  Disk1 spun up and shows 1 read and 3 writes.  The folder and file were written to my newest drive - disk19.  The Movies share is present on all of my drives.

 

Also, I had been using 4GB files in my testing, but since you asked I used a 4MB file in this test.  It is definitely not cache or memory related.

33 minutes ago, lbosley said:

I just now repeated the test in Safe mode - same results

 

Ok thanks.  Please try this: from the console or a telnet/ssh session, use vi or nano to edit the file /usr/local/sbin/mover

 

Go to the first line that has this:

      | /usr/local/sbin/move /mnt/cache /mnt/user0 '-i -dlDIWR --inplace -pAXogt --numeric-ids'

That should be line 87.  It's in the section "Check for objects to move from cache to array".

Add an 'O' (uppercase letter O - rsync's --omit-dir-times option) to the end of the string "-pAXogt", like this:

      | /usr/local/sbin/move /mnt/cache /mnt/user0 '-i -dlDIWR --inplace -pAXogtO --numeric-ids'

See if that solves it.


Yes, that seems to have resolved the Mover issue.  Does this mean you were able to reproduce it as well?

 

I will kick things around and see if there is anything more to report.

 

Thanks for your help.


Yes, but this is going to have a side effect on directory 'mtimes' (modification times).

 

For example, suppose we create this file on Mar 1 at 1200:

 

/mnt/cache/share/dir/file

 

The 'mtime' on dir and file will both be set to Mar 1 at 1200.  And that is the time we will see if we access /mnt/user/share/dir/file.

 

But after mover runs with the -O option specified, the mtime of 'file' will remain Mar 1, 1200, while the mtime of 'dir' will change to whatever time the mover ran.
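The two behaviors can be seen with plain coreutils (a toy illustration using a throwaway temp dir, not the actual mover):

```shell
# Toy illustration of the mtime side effect, not the actual mover.
# Creating a file inside a directory bumps the directory's mtime;
# rsync without -O restores the original mtime afterwards, while
# rsync with -O leaves the new (current) time in place.
work=$(mktemp -d)
mkdir "$work/dir"
touch -d '2017-03-01 12:00' "$work/dir"
orig=$(stat -c %Y "$work/dir")        # dir mtime = Mar 1, 1200

touch "$work/dir/file"                # writing into dir changes its mtime
bumped=$(stat -c %Y "$work/dir")      # now "whatever time the mover ran"

touch -d "@$orig" "$work/dir"         # what rsync does without -O: put it back
restored=$(stat -c %Y "$work/dir")

rm -rf "$work"
```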

 

At one time we did include the -O option, but this issue of the 'dir' mtime changing after mover ran caused another problem that someone complained about, and at the moment I can't remember what that was o.O


Also, I don't understand why this would only affect a single drive in this operation.  I would find the Movies folder and the associated subfolder touched on disk1 when this was happening.  My files are arranged as shown below.  When the Mover ran, I would see the time stamp change on Movies and on folder_1 or folder_2.  Nothing was written in the folders.

 

Movies
  |______ folder_1
  |         |_______ movie 1 folder
  |         |_______ movie 2 folder
  |______ folder_2
            |_______ movie 3 folder
            |_______ movie 4 folder
            |_______ movie .....

It's because mover uses 'rsync' to copy a file from cache to array.  After doing the actual copy, rsync will execute a 'utimens()' call to change the mtime of the parent dir on the target to match what it was on the source, in order to preserve the original mtime.

 

In your case it must be true that folder_2 exists on disk1, because when the user share file system sees that 'utimens' call, it searches in disk order until it finds the target dir, and that disk becomes the target of the actual utimens() call.
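That disk-order search can be sketched as a toy model (made-up disk names in a temp dir, just to show why the first disk holding the folder is the one that gets touched):

```shell
# Toy model: folder_2 exists on disk1 and disk3.  The user share file
# system walks the disks in order and stops at the first match, so that
# is the disk whose directory receives the utimens() call.
work=$(mktemp -d)
mkdir -p "$work/disk1/Movies/folder_2"
mkdir -p "$work/disk2/Movies"
mkdir -p "$work/disk3/Movies/folder_2"

target=""
for d in disk1 disk2 disk3; do
    if [ -d "$work/$d/Movies/folder_2" ]; then
        target=$d
        break
    fi
done
echo "utimens() would land on: $target"     # disk1, even though disk3 also has it
rm -rf "$work"
```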


Yes, I see this time stamp inconsistency happening now.  Not sure how big a deal it will be for me, though.  Also, I noticed that the edited mover script gets rewritten upon reboot.  Is this just a temporary fix at this point?

 


Update: we have a fix for this, but it involves significant enough code changes that we are going to implement it in the 6.4 series.


A different, somewhat simpler solution that comes with only one minor cost (or benefit, depending on how you look at it) is this:

Why spin up any drive (except the cache disk - which is preferably an SSD) when writing to the array at all?  Yes, currently it checks whether the directory/file already exists, and therefore has to spin up some array drives if the cached directory information does not hold that info or a directory time stamp needs to be updated.  An alternative would be to let it write to the cache drive (worst case, there now exist two copies of the file: one on the cache drive, one on the array) and let the mover overwrite the array version (if present).  As far as I know, the cache drive version takes precedence over the array versions if accessed through /mnt/user or the network.

 

Here is how I do this with files I get from NzbGet: I just let them unpack to /mnt/cache/sharename.  No spin-up of any drive is involved, and the next mover run will move them to the array.  At the moment I only use this for "new" files, so in my case the mover script does not need to overwrite.  But I imagine the mover script could easily be changed to overwrite, if it does not do so already.

 

P.S.: The two copies of a file which might reside on the cache (until the mover has run) and on the array can also be seen as additional safety.

Edited by Videodr0me

