
Best practices for many (500k+) files


20_100


Hi,

 

I have a growing number of files, around 500,000, on a single share spread across 7 disks (XFS).

The files are between 30 MB and 120 MB.

They are all in the same directory.

I have severe performance issues.

 

I would like to discuss both my specific, local issue and the general best practices for improving the situation.

 

From a local Unraid terminal, the same ls command, which returns a small subset of the files, is instant on any individual drive:

ls /mnt/disk{disknumber}/Myshare/Thefolder/*AAA*.*

but takes forever (almost 1 minute) when I run it on the merged file system:

ls /mnt/user0/Myshare/Thefolder/*AAA*.*
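A rough way to put numbers on the difference is to time the same pattern on each disk branch and on the merged share. This is only a sketch: it assumes Unraid's usual /mnt/diskN and /mnt/user0 mount points and the placeholder share/folder names from the post, and `time_glob` and `compare_share` are hypothetical helper names. It uses `find -name` instead of an `ls` glob so that only the directory read is measured, without sorting.

```shell
#!/usr/bin/env bash
# Sketch only: time_glob and compare_share are hypothetical helpers.

# Print "<dir>: <N> matches in <S>s" for one folder and one glob pattern.
time_glob() {
  local dir="$1" pattern="$2" start end n
  start=$(date +%s.%N)
  # find reads the folder once and does not sort its output.
  n=$(find "$dir" -mindepth 1 -maxdepth 1 -name "$pattern" 2>/dev/null | wc -l)
  end=$(date +%s.%N)
  printf '%s: %d matches in %ss\n' "$dir" "$((n))" \
    "$(awk -v s="$start" -v e="$end" 'BEGIN { printf "%.2f", e - s }')"
}

# Run time_glob on every /mnt/diskN copy of a share folder, then on /mnt/user0.
compare_share() {
  local rel="$1" pattern="$2" d
  for d in /mnt/disk[0-9]*; do
    if [ -d "$d/$rel" ]; then
      time_glob "$d/$rel" "$pattern"
    fi
  done
  time_glob "/mnt/user0/$rel" "$pattern"
}
```

For example, `compare_share Myshare/Thefolder '*AAA*.*'` should print one line per disk that holds the folder, plus one line for /mnt/user0.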

 

The same issue occurs when inserting a new file.

 

What I want to explore in this thread:

  • Is this normal/to be expected?
  • Are there pieces of documentation that cover this topic?
  • Are there figures documented in Unraid that
  • Does it depend on the number of drives?
  • Does it depend on how the files are spread across the disks?
    • If so, how does each share allocation method influence this?
  • The file names are random. Would using sub-directories help?
    • If so, how would the split level influence this?
  • As CPU, RAM, and I/O usage don't seem to react, what tools would you use to investigate the bottleneck?
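On the sub-directory question, one common pattern for random file names is to bucket files by a short hash prefix, so that no single folder holds more than a fraction of the entries. This is an illustrative sketch, not an Unraid feature: `bucket_for` and `shard_dir` are hypothetical helpers, MD5 is just a convenient well-spread hash, and 256 buckets (two hex digits) is an arbitrary choice. Running it separately on each /mnt/diskN/... path keeps every move a rename within the same XFS filesystem, so files stay on the disk they are already on.

```shell
#!/usr/bin/env bash
# Sketch only: bucket_for and shard_dir are hypothetical helpers.
# Try it on a copy of the data before touching the real share.

# Print the two-hex-digit bucket (00..ff) for a file name.
bucket_for() {
  printf '%s' "$1" | md5sum | cut -c1-2
}

# Move each regular file in folder $1 into $1/<bucket>/, keeping its name.
shard_dir() {
  local dir="$1" f name b
  for f in "$dir"/*; do
    [ -f "$f" ] || continue          # skip bucket folders and anything else
    name=${f##*/}
    b=$(bucket_for "$name")
    mkdir -p -- "$dir/$b"
    mv -n -- "$f" "$dir/$b/$name"    # -n: never overwrite an existing file
  done
}
```

Whether this actually speeds up the user share is exactly the open question here. It can only help lookups that can compute the bucket from the full file name; a wildcard such as `*AAA*` would still have to read every bucket.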

 


One thing to try is to spin all the disks up on the array first.

 

Question: How many items are being listed when you experience the "almost 1 minute" delay? In my observation, the output of the ls command is also sorted alphabetically, and sorting takes longer and longer as the number of items increases. (How much longer depends on which sort routine the Linux/Unix(?) developer who first wrote this command implemented, as some sort routines are much faster than others, particularly as the number of items increases!)
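Spinning the disks up can also be approximated from the shell by reading the folder on each disk branch before timing the user share: if the directory data is not already cached, the read forces each drive to spin up, and afterwards the kernel should have that directory's contents cached. A rough sketch: `warm_folder` is a hypothetical helper, and it falls back to Unraid's /mnt/diskN mounts when no roots are passed. It prints the total number of entries it read.

```shell
#!/usr/bin/env bash
# Sketch only: warm_folder is a hypothetical helper.
# Usage: warm_folder <share-relative folder> [disk roots...]
warm_folder() {
  local rel="$1"; shift
  local roots=("$@") d n total=0
  # Default to every /mnt/diskN mount when no roots are given.
  if [ ${#roots[@]} -eq 0 ]; then
    roots=(/mnt/disk[0-9]*)
  fi
  for d in "${roots[@]}"; do
    [ -d "$d/$rel" ] || continue
    # Reading the folder is the point; the entries are only counted.
    n=$(find "$d/$rel" -mindepth 1 -maxdepth 1 | wc -l)
    total=$((total + n))
  done
  echo "$total"
}
```

For example, run `warm_folder Myshare/Thefolder` and then repeat the /mnt/user0 listing; if it is still slow with the disks warm, spin-up latency is probably not the main factor.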

1 hour ago, Frank1940 said:

Question: How many items are being listed when you experience the "almost 1 minute" delay?

100ish results. I can't imagine it takes time to sort that 🙂

1 hour ago, 20_100 said:

100ish results. I can't imagine it takes time to sort that 🙂

I looked at one directory that has 430 items in it. It took ls about 2 seconds to begin the display. Now, as a disclosure, this directory is cached using the Dynamix Cache Directories plugin, so no direct disk access is required to get the file data needed to generate the list.

 

I did go looking for another single directory with hundreds of items in it, but I could not find one. (I learned a long, long time ago that if you want to find a single file quickly, you should have some organizational structure to how you store things...) Now, about the time to sort items, here is a link to a Wikipedia article comparing the various sort algorithms. You will notice that there are trade-offs between time required, memory required, and code size. Oh, and hardware can make a difference too: a faster CPU means less time, for example...

 

     https://en.wikipedia.org/wiki/Sorting_algorithm#Comparison_of_algorithms

 

I have no idea which sort algorithm ls uses.
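One way to check whether sorting matters at all is to time a sort of names that are already in memory, which takes the directory read out of the picture. A minimal sketch, assuming GNU coreutils `shuf` and `sort`; `sort_cost` is a hypothetical helper, and the synthetic names only stand in for real file names. `sort` is not necessarily the same algorithm `ls` uses internally, so this only gives an order of magnitude.

```shell
#!/usr/bin/env bash
# Sketch only: sort_cost is a hypothetical helper.
# Time sorting N synthetic file names held in memory (no directory access).
sort_cost() {
  local n="$1" names
  # shuf -i prints the numbers 1..n in random order; append an extension.
  names=$(shuf -i 1-"$n" | sed 's/$/.bin/')
  local TIMEFORMAT="sorted $n names in %Rs"
  time { printf '%s\n' "$names" | sort > /dev/null; }
}
```

Comparing `sort_cost 100` with `sort_cost 500000` contrasts the ~100 matches from the post with the size of the whole folder.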


Try listing the directory with a different command to see whether it takes as long, as the delay might be caused by ls itself. Try find, or use ls without sorting: ls -U should be quicker.

Also, file coloring in the shell can slow down the output.

It can also be filesystem-dependent, so you may get different results with XFS.
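A quick way to follow this suggestion is to time several methods on the same folder. One detail worth keeping in mind: in `ls -U /mnt/user0/Myshare/Thefolder/*AAA*.*`, bash expands (and sorts) the glob before `ls` even starts, so the whole directory is read regardless of `-U`. The sketch below, using the hypothetical helper name `compare_methods`, times `ls`, `ls -U`, `find -name`, and the glob expansion alone via bash's `compgen -G`. Line counts go to stdout and times to stderr.

```shell
#!/usr/bin/env bash
# Sketch only: compare_methods is a hypothetical helper.
# Usage: compare_methods <folder> <glob pattern>
compare_methods() {
  local dir="$1" pattern="$2" TIMEFORMAT='  %Rs'
  echo "ls (sorted, whole folder):"
  time ls -- "$dir" | wc -l
  echo "ls -U (unsorted, whole folder):"
  time ls -U -- "$dir" | wc -l
  echo "find -name (no shell glob, no sort):"
  time find "$dir" -mindepth 1 -maxdepth 1 -name "$pattern" | wc -l
  echo "shell glob expansion alone:"
  time compgen -G "$dir/$pattern" | wc -l
}
```

If `find` and the bare glob are already slow on /mnt/user0 but fast on /mnt/diskN, that would point at reading the merged directory rather than at `ls` itself.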

 

23 minutes ago, theruck said:

Try listing the directory with a different command to see whether it takes as long, as the delay might be caused by ls itself. Try find, or use ls without sorting: ls -U should be quicker.


The same very high latency happens when FTP services list the directory, and when I list it over SMB, sshfs, or NFS, with the ls command, or with the Get-ChildItem cmdlet of PowerShell running in Docker. I haven't found a way to list this directory without the issue.

 

It also happens when I simply try to insert a new file using any of the above methods.
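To measure insert latency on its own, a file can be created and removed in the folder on a disk branch and then again through the user share. A minimal sketch: `probe_create` is a hypothetical helper, and the probe name `.probe_create_<PID>` is made up for the test, so make sure it cannot clash with a real file.

```shell
#!/usr/bin/env bash
# Sketch only: probe_create is a hypothetical helper.
# Time creating and then deleting one empty probe file in folder $1.
probe_create() {
  local dir="$1"
  local probe="$dir/.probe_create_$$"   # made-up name, unique per shell PID
  local TIMEFORMAT="$dir: create+delete in %Rs"
  time { : > "$probe" && rm -f -- "$probe"; }
}
```

For example, compare `probe_create /mnt/disk3/Myshare/Thefolder` with `probe_create /mnt/user/Myshare/Thefolder` (the disk number is only an example).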

26 minutes ago, theruck said:

It can also be filesystem-dependent, so you may get different results with XFS.

 

Listing 100,000 files from a single drive is instant on the same server, as long as I run the command on /mnt/disk_ instead of /mnt/user.

35 minutes ago, theruck said:

Try listing the directory with a different command to see whether it takes as long, as the delay might be caused by ls itself. Try find, or use ls without sorting: ls -U should be quicker.


Just tried with -U: exactly the same behavior.

