unraid-tunables-tester.sh - A New Utility to Optimize unRAID md_* Tunables



Very interesting results.  I set my tunables back to what I had originally decided were the best values (after 2 weeks of parity checks a few months ago ... probably ran 25 full checks with various settings to find them) ==> and it's back to 8:05.    Guess I'll now have to try a few more combinations to see if I can at least get below 8 hours  :)

 

I did remember one factor that added about 8-10 minutes (and I'm just going to leave it that way) ... I've got cache_dirs running without limits (i.e. nothing's excluded).

 

I've read the details of what these tunables mean (in Tom's post here: 

http://lime-technology.com/forum/index.php?topic=4625.msg42091#msg42091

 

... and I've concluded the following ...

 

=>  The md_write_limit parameter does NOT need to be modified at all.  Once it's set to a value that provides a reasonable write buffer, there's no reason to make it larger, as long as the total number of stripes is sufficiently larger than the md_sync_window setting that the specified number of stripes for writing will always be available.  I've found that 1500 is a good value for my array, although your mileage may vary.  I'll outline how I determined that number at the end of this post.

 

=>  There's an "implied" setting for minimum read stripes -- the total stripes minus md_write_limit minus md_sync_window.  If you make this implied setting large enough ... I've found 1000 is very good ... then read performance will also be fine during parity checks -- and making it larger won't really help read performance.  Note also that if no writes are being done, read operations can also use all the stripes that would be allocated for writes (an extra 1500 stripes if md_write_limit is 1500).

 

BOTH read and write performance can be impacted by simple resource availability (e.g. disk thrashing) if the md_sync_window setting is so "perfect" that the disks are fully engaged at all times.  But changing the number of stripes available for reads or writes won't impact that.  I suppose "de-tuning" the sync performance may, but I can't really think of a good reason to do that !!

 

So ...  what I'd love to have is a version of this utility that lets the user fix the md_write_limit value;  fix the minimum number of stripes available for reading;  and then simply sets md_num_stripes = the sum of those two numbers plus the current md_sync_window value as it's doing its tests.

 

You can independently "tune" the number of write stripes by simply testing writes with various settings ... but since this is a maximum number you don't need to change anything else (as long as md_write_limit < md_num_stripes).    But once you've decided on a value, there's no reason to change it as you tune other parameters (notably md_sync_window).  I don't think this is a reasonable thing to do with a script, since you need to do the copies from your PC ... but it's very easy to do manually.

 

Note that writes to the array are influenced by quite a few things;  so if you're going to "tune" this parameter, you should try to be as consistent on all the other things as possible (i.e. are you writing to inner cylinders or outer cylinders? ... is there any network contention?  ... etc.).    I'd create a folder on your PC's hard drive to hold a few large files (3GB is enough to write -- I used a single 3GB file);  then write it to a specific disk on UnRAID (NOT a user share, which could alter the location of the writes);  then delete what you just wrote; alter the parameter; and repeat the process.
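
If you want to script the repetition on the server side instead of copying from a PC each time, a rough sketch along these lines could work (just an illustration, not part of the tester script -- the 3GB size, the /mnt/disk1 target and the list of test values are assumptions to adjust for your own setup):

for LIMIT in 768 1500 2048 2500; do
    /root/mdcmd set md_write_limit $LIMIT
    sync
    START=$(date +%s)
    # write ~3GB of zeros straight to one data disk (not a user share), so only
    # md_write_limit changes between runs and the network is out of the picture
    dd if=/dev/zero of=/mnt/disk1/writetest.bin bs=1M count=3072 conv=fdatasync
    echo "md_write_limit=$LIMIT took $(( $(date +%s) - START )) seconds"
    rm -f /mnt/disk1/writetest.bin
done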

 

FWIW I found that 1500 stripes worked well for writes.  A 3GB file took about 1:21 with the default 768 stripes;  took 1:01 with 1500; and dropped to 0:57 at 2500 stripes, with no improvement with larger values (I tried up to 5000).  I decided 1500 was a good choice.    I also tried this with a bunch of different values of md_sync_window, and it makes NO difference what it's set at (as I suspected would be the case, since the md_write_limit sets the maximum number of stripes used for writes).

 

With md_write_limit set to 1500, and md_num_stripes set to md_sync_window + 2500, there are up to 2500 stripes available for reads during a sync, and 1500 available for writes.  I see no reason to ever use values higher than those.  Note that with these settings, reading the same 3GB file I used for write testing takes 27 seconds (which may be limited by the write speed on my PC's drive).
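
As a concrete sketch of applying those settings (the md_sync_window value below is just a placeholder -- plug in whatever your own parity check testing settles on):

SYNC_WINDOW=1000    # placeholder value
/root/mdcmd set md_sync_window $SYNC_WINDOW
/root/mdcmd set md_write_limit 1500
/root/mdcmd set md_num_stripes $(( SYNC_WINDOW + 2500 ))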

 

Now all I need to do is find a value of md_sync_window that cuts 5 more minutes off my parity checks !!  :)

 

Interesting, though I don't believe that's necessarily the best approach (for everyone).  Thinking out loud, it would be logical for the driver to provide the 'quality of service' rather than rely so much on 'thrashing the drive', where you start stacking IO waits because the interface is saturated.  The driver works on a FIFO basis regardless of reads, writes, or parity, if I understand the driver correctly.  At any point in time, if a read request comes in it allocates a stripe from the max stripe count; any writes, same; parity, same -- until the max stripe count reaches zero, at which point the request is queued and sleeps until a stripe becomes available.  No matter what you are doing there should be resources available (disk interface bandwidth) to perform the task.  That, I feel, is the sweet spot where the driver is providing your 'quality of service' for read/write/parity stripe throughput.  It falls back to the testing methodology to find that sweet spot.  Going through this effort to shave 5 minutes off a parity check does make one curious  :o

 

An idea for the script is to take 'hdparm -tT' readings from all drives in the array and take inventory of the drive OEM and model numbers.  Would also be helpful to have the script stop nfs/samba/afp and cache_dirs services during these tests to reduce the chance of the results being tainted and skewed.  I feel it may be the cause of some of the 'weirdness' seen in reports.
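
Something along these lines could be bolted onto the script -- a rough sketch only, since the rc.d script names and the cache_dirs quit flag are assumptions about a typical unRAID install and may need adjusting:

# stop services that might touch the array mid-test
for SVC in /etc/rc.d/rc.samba /etc/rc.d/rc.nfsd /etc/rc.d/rc.atalk; do
    [ -x "$SVC" ] && "$SVC" stop
done
/boot/cache_dirs -q 2>/dev/null    # assumes the -q (quit) option of cache_dirs

# inventory the drives: model number plus raw/buffered read speeds
for DEV in /dev/sd[a-z]; do
    echo "=== $DEV ==="
    hdparm -I "$DEV" | grep -i 'model number'
    hdparm -tT "$DEV"
done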

Link to comment

Hey John,

 

Sorry you caught me sleeping. I'm east coast.

 

From earlier tests, I already know that rebuilds are not affected by md_sync_window. [EDIT: I was wrong, rebuilds are affected by the md_sync_window.  Tests performed by John have proven it.] The goal of this test is to see if they are impacted by md_write_limit.

 

Don't need a full spectrum test, not worth it.

 

I recommend extreme length, 128 byte interval, start at 128 and end at 768.

 

If you made the two changes in the code (change NOCORRECT to CORRECT, and WriteLimit=768 to WriteLimit=128) then for byte values 128 and above, md_sync_window and md_write_limit will be set to the exact same test values.  The test results will then show if there is any difference in speed during a parity rebuild.
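
In other words, something like this against a copy of the script (the variable spellings are taken from this post -- double-check them against the actual script before running):

cp unraid-tunables-tester.sh rebuild-test.sh
sed -i 's/NOCORRECT/CORRECT/' rebuild-test.sh
sed -i 's/WriteLimit=768/WriteLimit=128/' rebuild-test.sh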

 

Don't worry about actually trying to find the best value for the write limit, as garycase was suggesting, that's not the goal of this particular test.  We are merely trying to establish that write limit is connected to rebuilds.

 

The test above should only take an hour.

 

Did that answer your questions?

 

-Paul

Link to comment

I already know that rebuilds are not affected by md_sync_window.

 

Interesting ... are you SURE about that?  I'd think the logic would be very close, since both a parity sync (not a check) and a rebuild do almost exactly the same thing as a parity check, with the exception of one disk (the disk being rebuilt).    There is, of course, a timing synchronization difference -- the write is always "one behind" in terms of sector # (since it can't be done until the reads are done and the appropriate bit values determined).

 

Even if it's not directly impacted by the md_sync_window (i.e. just does "reads" instead of reading to the allocated md_sync_window stripes), it would still be indirectly related, since all of the md_sync_window stripes will be available for reads.

 

Link to comment

The driver works on a FIFO basis regardless of reads, writes, or parity, if I understand the driver correctly.  At any point in time, if a read request comes in it allocates a stripe from the max stripe count; any writes, same; parity, same -- until the max stripe count reaches zero, at which point the request is queued and sleeps until a stripe becomes available.  No matter what you are doing there should be resources available (disk interface bandwidth) to perform the task.

 

I agree that with AHCI access is reasonably well optimized; but nevertheless if you have reads and writes queued during a parity check, you will do a good bit of thrashing, as each time slice given to the reads or writes will result in a disk seek on the impacted disk(s), which will effectively stop the parity check while those are satisfied; then the disk will seek back to where it needs to be for the next parity check read ... and this process will continue back-and-forth until there are no pending reads/writes.    The only way to avoid the thrashing would be to stop the parity check for the duration of any pending read/write operations [which, by the way, would completely eliminate any impact of a parity check on system performance  :) ... so from one perspective that wouldn't be a bad idea at all !! ]

 

By the way, I've done quite a bit of testing on desktops to see the impact of various types of simultaneous reads/writes with/without AHCI, and regardless of the setting, one thing is ALWAYS true:  The total time for a series of reads that impact the same disk is always lower if they're done sequentially than if you start them all at the same time.  It's closer with AHCI enabled; but it's still better to do them sequentially.    While my testing was with Windows, I'm sure the Linux results would be the same ... there's simply no way to move the disk heads any differently  :)

 

 

Going through this effort to shave 5 minutes off a parity check does make one curious  :o

 

That was a "tongue in cheek" comment -- I have no plans to do anything else to change my parity times  :)

 

 

Would also be helpful to have the script stop nfs/samba/afp and cache_dirs services during these tests to reduce the chance of the results being tainted and skewed.  I feel it may be the cause of some of the 'weirdness' seen in reports.

 

Agree that stopping services that may access the disk arbitrarily would eliminate a potential for skewing the results.    Booting to UnRAID Safe Mode for performing this test is probably a good idea, although there are still a few things that it doesn't stop.

 

Mentioning cache_dirs reminds me of a result I found some time ago:  I tested the impact of cache_dirs on a parity check, and found that it was relatively minor (less than 10 minutes) on my system (with ~ 300,000 files) UNLESS I started a parity check immediately after a reboot ... in that case it adds over 30 minutes.    If I wait 5 minutes after a reboot and do the check, it finishes much quicker.    The reason is fairly obvious:  cache_dirs will, after a reboot, read all of the directories to fill its cache.    If you wait for this to happen (~ 5 minutes), then it's relatively dormant from then on.  But if you start a parity check before this is done, it then causes a LOT of disk thrashing, which slows down BOTH the caching reads and the parity check reads as the head moves back-and-forth a LOT to satisfy both requirements.

 

Link to comment

garycase, your ideas on the writes and reads are similar to mine.  Let me compare and contrast:

 

=>  The md_write_limit parameter does NOT need to be modified at all.  Once it's set to a value that provides a reasonable write buffer, there's no reason to make it larger, as long as the total number of stripes is sufficiently larger than the md_sync_window setting that the specified number of stripes for writing will always be available.  I've found that 1500 is a good value for my array, although your mileage may vary.

 

Your statement is a little confusing, as you first say md_write_limit doesn't need to be modified at all, then you go on to describe modifying it to positive results.  I think what you intended to say is that you don't want it modified as part of this utility, and that it should be independently tested and set based upon a series of write tests.

 

I agree that write tests are the way to verify the correct setting of md_write_limit, but let me explain why I am setting it in this utility:

 

The md_sync_window is the maximum number of stripes available to the parity check.  The md_write_limit is the maximum number of stripes available to writing operations.

 

My hypothesis is that the same number of stripes that maximizes parity check performance IS THE SAME number of stripes that maximizes write performance.

 

My hypothesis is unproven, of course... it's nothing more than a theory.  But if my theory is correct, then you don't have to test writes at all, just test the parity check.  Once you find the right value, set it to both parameters, sync window and write limit.

 

I think you've almost proven my hypothesis:  You indicated that a md_write_limit of 1500 gave you good results, and 2500 basically maxed out performance.  How do those numbers compare to your most recent FULLAUTO test results?  I would expect that md_sync_window 1500 gives you good results, and about 2500 basically maxes out performance.

 

Can you verify?

 

=>  There's an "implied" setting for minimum read stripes -- the total stripes minus md_write_limit minus md_sync_window.  If you make this implied setting large enough ... I've found 1000 is very good ... then read performance will also be fine during parity checks -- and making it larger won't really help read performance.  Note also that if no writes are being done, read operations can also use all the stripes that would be allocated for writes (an extra 1500 stripes if md_write_limit is 1500).

I mostly agree with your statements here.

 

Yes:  "md_min_read_stripes" = md_num_stripes - md_write_limit - md_sync_window

 

But Also: "md_max_read_stripes" = md_num_stripes (this is slightly different that what you stated)

 

I agree that making it larger than necessary won't further improve performance, but because this is a dynamic allocation of stripes, varying between min_read and max_read depending upon what other tasks are being performed, you have to make a choice as to whether you target optimum read performance on the min_read end of the spectrum, the max_read end of the spectrum, or somewhere in the middle.

 

Extending my hypothesis on md_write_limit, I believe that reads reach max performance at the same number of stripes that writes reach max performance, which is the same number of stripes that parity checks reach max performance.

 

For easy numbers, let's pretend max parity check performance is reached at 1000 stripes.

 

md_sync_window=1000

 

And since my theory states that write stripes = sync stripes:

 

md_write_limit=1000

 

Then, if you want to provide for max read performance while both writing and running a parity check:

 

md_num_stripes=3000 (calculated as md_sync_window + md_write_limit + 1000).

 

But keep in mind, with those settings, if all you are doing is reading (not writing and not running a parity check) then the number of stripes allocated to reading is 3000, more than 3x what you need for optimum performance.  So we can lower it and still get good read performance as long as you are not also writing and running a parity check.

 

So if we set md_num_stripes=2000, you still have 2x what you need for optimum read performance when you are only reading.  If you start up a parity check, it would drop the available read stripes down to 1000, so it would still be optimal.  If instead of running a parity check you were writing files, that too would drop the reads down to 1000, so it would still be optimal.  Problems only occur if you start a parity check and write some files while reading, as that would drop your available read stripes down to 0 - they would get starved and reads would stall.

 

So we want to make md_num_stripes > md_sync_window + md_write_limit.  In the above scenario, I personally think 3000 is excessive, as you now have completely maximized reads, writes and syncs, but since the hard drives are thrashing around trying to do 3 things at once, you are not getting good performance, and you wasted a lot of memory trying to maximize something that can't really be maximized.

 

My formula follows Tom's original formula by making md_num_stripes ~11% bigger (technically 10% of the whole value dedicated to reads).  So instead of 3000, we would set md_num_stripes=2222.

 

That way, when only reading, the read stripes are more than 2x what you need for optimum read performance.  And if you are either reading and writing, or reading while running a parity check, the read stripes are still more than what you need for optimum read performance.  It's only when you are reading, writing and running a sync, all at the same time, that read performance would drop below optimal, with only 222 read stripes available in the above scenario.  I think that is a fair compromise that reins in memory use a bit, but I'm sure others disagree.

 

So ...  what I'd love to have is a version of this utility that lets the user fix the md_write_limit value;  fix the minimum number of stripes available for reading;  and then simply sets md_num_stripes = the sum of those two numbers plus the current md_sync_window value as it's doing its tests.

I hear ya, but if my theories about how these three parameters work are correct, then my utility is already applying the logic I laid out (md_write_limit=md_sync_window, and md_num_stripes=(md_write_limit + md_sync_window)*1.11), which means it is doing a better job than you can do manually.  If anything, I can see making it selectable how much larger to make md_num_stripes.  Some people may want 11%, some may want 50%, etc.
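
In shell terms, the formula the utility applies boils down to something like this (a minimal sketch using the example numbers from above; the integer rounding is mine):

SYNC_WINDOW=1000
WRITE_LIMIT=$SYNC_WINDOW                                    # write limit tracks the sync window
NUM_STRIPES=$(( (SYNC_WINDOW + WRITE_LIMIT) * 111 / 100 ))  # ~11% headroom for reads => 2220

/root/mdcmd set md_num_stripes $NUM_STRIPES
/root/mdcmd set md_write_limit $WRITE_LIMIT
/root/mdcmd set md_sync_window $SYNC_WINDOW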

 

You don't need to do any other tests - no read tests, and no write tests - because the parity check test already told you everything you need to know about the correct number of stripes, and you simply use a formula to set the other two values based upon md_sync_window test results.

 

Note that writes to the array are influenced by quite a few things;  so if you're going to "tune" this parameter, you should try to be as consistent on all the other things as possible (i.e. are you writing to inner cylinders or outer cylinders? ... is there any network contention?  ... etc.).    I'd create a folder on your PC's hard drive to hold a few large files (3GB is enough to write -- I used a single 3GB file);  then write it to a specific disk on UnRAID (NOT a user share, which could alter the location of the writes);  then delete what you just wrote; alter the parameter; and repeat the process.

 

That's the nice thing about my theory - if I'm correct, then you don't have to worry about any of that!

 

Anyway, I think it would be great if some users (like yourself) could independently test my theories.  I don't think people are too willing to believe me if I am both the source of the theories and the tester, as I am biased on the results.

 

-Paul

 

 

Link to comment

I already know that rebuilds are not affected by md_sync_window.

 

Interesting ... are you SURE about that? 

 

Pretty much positive, at least on my server.  Remember how I originally documented the problem with parity checks because they were so much slower than rebuilds.  In addition to testing parity checks with the md_sync_window values, I also tested rebuilds (I upgraded all my Samsung drives to the 3TB Reds) and my rebuild speed never changed when I tried different md_sync_window values.  I had also increased md_num_stripes during those rebuilds as well to no effect, which basically leaves just md_write_limit.

 

I never tested changing the md_write_limit values during my rebuilds, which is why I'm helping John conduct that specific test.  If the rebuild speed changes at all, then most likely it is directly connected to the md_write_limit.

 

Of course, that's just a guess.  For all I know, Tom considered a rebuild so critical that he set the rebuild to use md_num_stripes... all of 'em, to maximize rebuild speed.  Seems a logical thing to do, but again my test results during rebuild didn't hint that md_num_stripes had any impact on rebuild speed, but since that value was so high to begin with, maybe any increase was too small for me to notice.

 

I'm a dumbass who thought I knew what I was talking about.  :-[  Turns out I was wrong wrong wrong.  :o  But I'm not too ashamed to admit it.

 

John just did some tests to check if md_sync_window affects parity rebuilds, and sure enough it does.

 

Interestingly, the rebuild speed appears to scale faster than parity check speed using the same values, and speed maxes out with lower values.  That might be why I didn't observe a relationship in my earlier testing.

 

Link to comment

What is being described is not "thrashing". It's really competition for resources, but not thrashing.

The drives are not being overworked to the point that nothing is getting done.

Each new access will cause the heads to move, thus causing each job to run slower.

 

When the drives are overworked where absolutely nothing is getting done on the system, because the system is so busy doing housekeeping, that is thrashing.

 

By balancing out these tunables, you can alleviate some of the competition and starvation of one type of access with another.

 

FWIW, on my 4 drive micro server system, my values are

 

Tunable (md_num_stripes):  4096

Tunable (md_write_limit):  2048

Tunable (md_sync_window):  1280

 

I'm doing a parity generate (not just a check) to a 4TB parity drive.

 

I'm also copying a file from disk1 to another server at around 60MB/s with Win 7.

 

At this very minute, I have the following stats.

 

Total size:  4  TB

Current position:  749.03  GB (19%)

Estimated speed:  156.88  MB/sec

Estimated finish:  345  minutes

 

I would hardly call this thrashing.

 

Writes will probably drop to about 10-20MB/s as expected. Still, this is not thrashing.

The heads are moving, the drives are doing what they are designed to do, and work is still getting done. Albeit, divided among the tasks.

Link to comment
Remember how I originally documented the problem with parity checks because they were so much slower than rebuilds.

 

I've found rebuilds or generates to be faster than read checks.

It could also be the direction of the traffic on the bus or controller.

The PCIe bus is full duplex. It could also be that write caching on the hardware comes into play.

When I used my Areca with write caching my parity generate speed improvement was measurable.

Link to comment

I agree that with AHCI access is reasonably well optimized; but nevertheless if you have reads and writes queued during a parity check, you will do a good bit of thrashing, as each time slice given to the reads or writes will result in a disk seek on the impacted disk(s), which will effectively stop the parity check while those are satisfied; then the disk will seek back to where it needs to be for the next parity check read ... and this process will continue back-and-forth until there are no pending reads/writes.    The only way to avoid the thrashing would be to stop the parity check for the duration of any pending read/write operations [which, by the way, would completely eliminate any impact of a parity check on system performance  :) ... so from one perspective that wouldn't be a bad idea at all !! ]

 

This is what I would call 'load optimization'.  Basically Tom just needs to give a priority to each of the queues, and if he exposed those priority assignments to us as tunable values, then we could choose which queue gets processed first.  If you set parity checks to the lowest priority, then reads and writes would process through the queue first, minimizing thrashing and providing faster read/write performance.  If you prioritized reads over writes over parity checks, then your Blu-Ray might actually play back smoothly while running other tasks!

 

 

By the way, I've done quite a bit of testing on desktops to see the impact of various types of simultaneous reads/writes with/without AHCI, and regardless of the setting, one thing is ALWAYS true:  The total time for a series of reads that impact the same disk is always lower if they're done sequentially than if you start them all at the same time.  It's closer with AHCI enabled; but it's still better to do them sequentially.    While my testing was with Windows, I'm sure the Linux results would be the same ... there's simply no way to move the disk heads any differently  :)

 

There is a way to move the heads differently - it is called Native Command Queuing (NCQ).  This allows the drive to re-order requests, grouping them together based upon the area of the platter where the data resides, to minimize thrashing and improve overall throughput.  Tom makes this selectable as a disk tunable.  It's quite possible that enabling this will allow you to watch movies while running a parity check, so it's worth playing with.  I haven't tested recently, but in my earlier manual tests, having NCQ enabled changed the parity check performance at various md_sync_window values - so you would want to do a FULLAUTO again with NCQ enabled to find the right values with NCQ.
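
If you want to confirm NCQ is actually active before re-running FULLAUTO, the standard Linux sysfs entry shows the per-drive queue depth (a depth of 1 means NCQ is effectively off; 31 is the usual maximum):

for Q in /sys/block/sd*/device/queue_depth; do
    echo "$Q: $(cat "$Q")"
done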

 

 

Mentioning cache_dirs reminds me of a result I found some time ago:  I tested the impact of cache_dirs on a parity check, and found that it was relatively minor (less than 10 minutes) on my system (with ~ 300,000 files) UNLESS I started a parity check immediately after a reboot ... in that case it adds over 30 minutes.    If I wait 5 minutes after a reboot and do the check, it finishes much quicker.    The reason is fairly obvious:  cache_dirs will, after a reboot, read all of the directories to fill its cache.    If you wait for this to happen (~ 5 minutes), then it's relatively dormant from then on.  But if you start a parity check before this is done, it then causes a LOT of disk thrashing, which slows down BOTH the caching reads and the parity check reads as the head moves back-and-forth a LOT to satisfy both requirements.

 

That's very interesting.  I haven't been stopping anything on my server when doing my tests, and I get very smooth test results.  I don't have a lot running, but I do have cache-dirs, unMENU, and a few other items.

 

I'm not sure if running barebones is the right answer, as running with plug-ins makes the results applicable to your normal, every day use model.  But certainly allowing enough time for your plug-ins to settle down is very wise.

Link to comment

Your statement is a little confusing, as you first say md_write_limit doesn't need to be modified at all, then you go on to describe modifying it to positive results.  I think what you intended to say is ... that it should be independently tested and set based upon a series of write tests.

 

Yes, that's exactly what I was saying ... I don't think it's related to the optimal md_sync_window.

 

 

My hypothesis is that the same number of stripes that maximizes parity check performance IS THE SAME number of stripes that maximizes write performance.

 

My hypothesis is unproven, of course... it's nothing more than a theory.  But if my theory is correct, then you don't have to test writes at all, just test the parity check.  Once you find the right value, set it to both parameters, sync window and write limit.

 

I think you've almost proven my hypothesis:  You indicated that a md_write_limit of 1500 gave you good results, and 2500 basically maxed out performance.  How do those numbers compare to your most recent FULLAUTO test results?  I would expect that md_sync_window 1500 gives you good results, and about 2500 basically maxes out performance.

 

Can you verify?

 

When I was testing the write values I wasn't very granular at all -- I just roughly doubled the default; then added 1000 to it ... so I only tested 1500, then 2500.  Then I doubled it again to see if there was any additional benefit.    So ... I just tried 2048, and it also results in 57 seconds => so on my system I get very good results (1:01) with 1500;  could shave a bit off at 2048; and get zero gain with numbers larger than that (as I noted before, I tested 5000 as well).

 

The last Fullauto showed a balls-to-the-wall value [not called that, of course  :) ]  of 2688.  Just for grins, I did 5 minutes of a parity check with that value;  then did 5 minutes of a parity check with double that (5376) ... and get about 5% improvement with the higher value !!    So for parity checks, I get continued improvement with higher values.  I'm confident that if I set my md_write_limit to ~ 2048 it would be maxed out, with no further improvement ... and at what I've elected to use (1536) it's at the "plenty good enough" point.    But I do NOT think the optimal values for these two parameters are the same.

 

 

Yes:  "md_min_read_stripes" = md_num_stripes - md_write_limit - md_sync_window

 

But Also: "md_max_read_stripes" = md_num_stripes (this is slightly different that what you stated)

 

Obviously md_max_read_stripes = md_num_stripes ... what did I say that sounded different??

... actually, looking at my earlier note, I noted you could also use the write stripes, but didn't note that you could also use the sync stripes (since it was in the context of a parity check -- which would be using those) => I assume that's what you're referring to.  But yes, I'm well aware of how these parameters interact  :)

 

 

Extending my hypothesis on md_write_limit, I believe that reads reach max performance at the same number of stripes that writes reach max performance, which is the same number of stripes that parity checks reach max performance.

 

Definitely don't agree that reads optimize at the same number of stripes as writes.  Reads are FAR faster, since there's no parity involved, so the dynamics are a good bit different.  The stripes can empty much quicker (assuming a Gb network).    I haven't done any testing on what different numbers do here, as I can generally max out my Gb network with reads anyway.  I suspect the optimal number for reads is likely SMALLER than for writes, since the network can likely empty them at almost the same rate they can fill ... but that wouldn't be true on outer cylinders with high areal density disks (like our 1TB/platter Reds), so perhaps the number is actually larger.  In any event, as long as both reads and writes aren't done simultaneously during a parity check, I suspect the stripes allocated for writes, together with some minimal number of "extra" stripes allocated for "md_min_read_stripes" is plenty.

 

 

I agree with your comments about not needing to have too much headroom vis-à-vis a "md_min_read_stripes" ... but do think it should be a reasonable number to account for those who might want to do both read and writes during parity syncs [not something I do].      Assuming a Gb network, 250 or 500 is plenty ... and if you assume that simultaneous reads and writes are unlikely, then even small values would be okay (since the writes stripes are available in that case).    Clearly you do NOT want it to be zero (or, for that matter, I'd say no lower than 100), as that would starve reads if writes were underway.

 

Link to comment

There is a way to move the heads differently - it is called Native Command Queuing (NCQ).

 

This is done in Windows as well whenever you enable AHCI.  In fact, it's probably the most important reason to use AHCI.  [Hot swap capability is another, but I'm "old school" on that and still don't hot swap drives]

 

In any event, the results I noted include the use of NCQ.

 

Link to comment

What is being described is not "thrashing". It's really competition for resources, but not thrashing.

 

I think we are all describing the same thing, just using different words.  Semantics.

 

I'm using the term thrashing to describe the behavior of disk heads inefficiently moving around to different cylinders/sectors to satisfy competing requests.  In all my years of IT, I've never heard the term thrashing used to describe a stall, but perhaps I've just never heard the correct technical definition.

 

I think we all agree work is getting done, it's just inefficient due to competing resources.

 

This is where load prioritization in unRAID, allowing you to put one queue at a higher priority over another, would be the best possible solution.  Work still gets done, but in the order the user prefers, and the work becomes somewhat more efficient since there will be less head movement to satisfy competing requests, as complementary requests with the same priority get bundled together.

 

 

Link to comment

What is being described is not "thrashing". It's really competition for resources, but not thrashing.

The drives are not being overworked to the point that nothing is getting done.

Each new access will cause the heads to move, thus causing each job to run slower.

 

When the drives are overworked where absolutely nothing is getting done on the system, because the system is so busy doing housekeeping, that is thrashing.

 

A semantic question.  Whenever you have multiple disk accesses ongoing that each require a seek, the heads are doing a lot of movement.  The result is a slowdown ... for example, the 30 minutes extra a test takes if I don't wait 5 minutes for cache_dirs to fill its cache after a reboot.    I'd call that thrashing -- you don't, since there IS some work being done.

 

You're getting 60MB/s reads ... compared to ~ double that you'd likely get without the parity check.  You indicated writes slow to 10-20MB/s ... 1/4th to 1/2 what they typically should be.    Those slowdowns are clearly due to excessive seeks ... whether you call them "thrashing" or not is semantics.

 

Link to comment

This is where load prioritization in unRAID, allowing you to put one queue at a higher priority over another, would be the best possible solution.  Work still gets done, but in the order the user prefers, and the work becomes somewhat more efficient since there will be less head movement to satisfy competing requests, as complementary requests with the same priority get bundled together.

 

Definitely agree.  As I noted earlier (almost as an aside), if parity checks, rebuilds, and parity syncs were designed to automatically pause during reads or writes, the system would have maximum performance for the user ... and if reads were prioritized over writes, then there'd be virtually no "stalls" during media streaming, etc.    This set of priorities wouldn't necessarily satisfy everyone ... but IMHO it would work very well.    Providing users the ability to set these priorities would be even nicer.    And of course this would also virtually eliminate "unnecessary head movement" [whatever you want to call it  :) ].

 

Link to comment

Here are the results of running the script with parity rebuild and the following settings: Thorough (4 min), 128 byte increment, start at 8, end at 1920. Is this suitable or would you like me to update the following

/root/mdcmd set md_write_limit $WriteLimit

to

/root/mdcmd set md_write_limit 768

 

ETA: This is with the new 4TB drive replacing the wonky 32GB SSD

 

Tunables Report from  unRAID Tunables Tester v2.2 by Pauven

NOTE: Use the smallest set of values that produce good results. Larger values
      increase server memory use, and may cause stability issues with unRAID,
      especially if you have any add-ons or plug-ins installed.

Test | num_stripes | write_limit | sync_window |   Speed
---------------------------------------------------------
   1 |         776 |         768 |           8 |  43.2 MB/s
   2 |         904 |         768 |         136 |  59.0 MB/s
   3 |        1032 |         768 |         264 |  86.4 MB/s
   4 |        1288 |         768 |         392 | 106.5 MB/s
   5 |        1416 |         768 |         520 | 113.1 MB/s
   6 |        1544 |         768 |         648 | 113.7 MB/s
   7 |        1680 |         776 |         776 | 113.8 MB/s
   8 |        1936 |         904 |         904 | 113.3 MB/s
   9 |        2192 |        1032 |        1032 | 113.8 MB/s
  10 |        2576 |        1160 |        1160 | 114.0 MB/s
  11 |        2832 |        1288 |        1288 | 113.9 MB/s
  12 |        3088 |        1416 |        1416 | 114.1 MB/s
  13 |        3344 |        1544 |        1544 | 114.1 MB/s
  14 |        3600 |        1672 |        1672 | 114.0 MB/s
  15 |        3984 |        1800 |        1800 | 114.2 MB/s

Completed: 1 Hrs 2 Min 59 Sec.

Best Bang for the Buck: Test 6 with a speed of 113.7 MB/s

     Tunable (md_num_stripes): 1544
     Tunable (md_write_limit): 768
     Tunable (md_sync_window): 648

These settings will consume 60MB of RAM on your hardware.


Unthrottled values for your server came from Test 15 with a speed of 114.2 MB/s

     Tunable (md_num_stripes): 3984
     Tunable (md_write_limit): 1800
     Tunable (md_sync_window): 1800

These settings will consume 155MB of RAM on your hardware.
This is 112MB more than your current utilization of 43MB.
NOTE: Adding additional drives will increase memory consumption.

In unRAID, go to Settings > Disk Settings to set your chosen parameter values.

Link to comment

Here are the results of running the script with parity rebuild and the following settings: Thorough (4 min), 128 byte increment, start at 8, end at 1920. Is this suitable or would you like me to update the following

/root/mdcmd set md_write_limit $WriteLimit

to

/root/mdcmd set md_write_limit 768

 

ETA: This is with the new 4TB drive replacing the wonky 32GB SSD

 

John, thank you very much!  Your results have proven very valuable!

 

Your Bizarro server strikes again.  Lower values have produced slower times.  This might be because you replaced the 32GB SSD.  We won't know for sure until your rebuild is done and you can test normal parity checks again.

 

What is interesting is that we see throughput changes even when md_write_limit is static at 768.  I've never seen that in my results.  That means that rebuilds are linked to either md_sync_window or md_num_stripes (or both).

 

At least we know rebuilds are not connected to md_write_limit, and for that I thank you.

 

If you wanted to, you could run one more test, from 8 to 520 (where the values changed quite dramatically), but with the md_num_stripes hard coded to 1544 (you can hard code this in the CalcValues procedure at the top of the program).  This would show whether it is md_sync_window or md_num_stripes that is impacting rebuild speed.

 

-Paul

 

 

Link to comment
Those slowdowns are clearly due to excessive seeks ... whether you call them "thrashing" or not is semantics.

 

If the drives are capable of doing 160MB/s and work is progressing, i.e. the parity check is moving forward while other work is operating, it's not thrashing.

 

If I'm getting the maximum possible performance out of the drives and work is progressing, it's not thrashing.

From the parity check's perspective, it's performing; from a user perspective, it's performing slowly. They're competing.

 

What you define as 'excessive seeks' is incorrect.

IO wait in a queue is not thrashing.

What is being described is not "thrashing". It's really competition for resources, but not thrashing.

 

I think we are all describing the same thing, just using different words.  Semantics.

 

I'm using the term thrashing to describe the behavior of disk heads inefficiently moving around to different cylinders/sectors to satisfy competing requests.  In all my years of IT, I've never heard the term thrashing used to describe a stall, but perhaps I've just never heard the correct technical definition.

 

I think we all agree work is getting done, it's just inefficient due to competing resources.

 

This is where load prioritization in unRAID, allowing you to put one queue at a higher priority over another, would be the best possible solution.  Work still gets done, but in the order the user prefers, and the work becomes somewhat more efficient since there will be less head movement to satisfy competing requests, as complementary requests with the same priority get bundled together.

 

Thrashing, by its technical definition, deals with virtual memory and paging.

 

In our usage, it's competition for resources to the point that the workload has grown so great that no work can be done efficiently.  But technically, that's not the correct definition of the term.

 

If I am reading a disk with 1 process, I get 90MB/s locally.

Then with 2 processes I get 40MB/s.

Then with 3 it gets divided further at a somewhat equal level, i.e. 30MB/s.  It's not thrashing until the point that every process suffers so much that it's almost pointless to continue.

Link to comment

When I was testing the write values I wasn't very granular at all -- I just roughly doubled the default; then added 1000 to it ... so I only tested 1500, then 2500.  Then I doubled it again to see if there was any additional benefit.    So ... I just tried 2048, and it also results in 57 seconds => so on my system I get very good results (1:01) with 1500;  could shave a bit off at 2048; and get zero gain with numbers larger than that (as I noted before, I tested 5000 as well).

 

The last Fullauto showed a balls-to-the-wall value [not called that, of course  :) ]  of 2688.  Just for grins, I did 5 minutes of a parity check with that value;  then did 5 minutes of a parity check with double that (5376) ... and get about 5% improvement with the higher value !!    So for parity checks, I get continued improvement with higher values.  I'm confident that if I set my md_write_limit to ~ 2048 it would be maxed out, with no further improvement ... and at what I've elected to use (1536) it's at the "plenty good enough" point.    But I do NOT think the optimal values for these two parameters are the same.

You make some good points, but I'm concerned about how many variables you have in play here.  Network speed and/or desktop speed might be the limiting factors preventing higher values from improving write performance.

 

I believe some benchmark programs fabricate data out of thin air when performing write tests, eliminating all variables except for HD write performance.  Short of that, having a 10Gb/s LAN connection, transferring data from a RAM drive, might minimize those variables.

 

Since you have not eliminated those variables, I do not think we can conclusively say who is correct.

 

Obviously md_max_read_stripes = md_num_stripes ... what did I say that sounded different??

It read like you thought md_max_read_stripes = md_num_stripes - md_sync_window, which is what you thought I was referring to.

 

Extending my hypothesis on md_write_limit, I believe that reads reach max performance at the same number of stripes that writes reach max performance, which is the same number of stripes that parity checks reach max performance.

 

Definitely don't agree that reads optimize at the same number of stripes as writes.  Reads are FAR faster, since there's no parity involved, so the dynamics are a good bit different.  The stripes can empty much quicker

You make some excellent points, that since reads can empty twice as fast as writes, the stripes empty faster.  Does that mean you can do with half the number of read stripes vs. write stripes, since it is harder to fill them up?  Or do you need to double them up since they empty twice as fast? 

 

Keep this in mind, though:  Parity Checks/Rebuilds and Reads are all unidirectional - each drive is only accessed in a single direction.  Since the number of stripes allocated is 'per drive', then it stands to reason that reads and syncs need a similar number of stripes.

 

That would make writes the only outlier, since writes are bidirectional (data has to be both read from and written to the parity and data disks to calculate the new parity bit on the fly).  So do you need double the write stripes to maintain full speed, or half as many?

 

-Paul

Link to comment

Network speed and/or desktop speed might be the limiting factors preventing higher values from improving write performance.

 

Very unlikely, since it's a Gb network.  I can do writes over 100MB/s between my Windows machines ... so the ~50MB/s I'm getting on UnRAID isn't even close to what the desktop and network can provide.  Just to be sure, I just wrote the same test file I've been using to my wife's machine across the network, and it completed in 28 seconds.

 

Link to comment

Network speed and/or desktop speed might be the limiting factors preventing higher values from improving write performance.

 

Very unlikely, since it's a Gb network.  I can do writes over 100MB/s between my Windows machines ... so the ~50MB/s I'm getting on UnRAID isn't even close to what the desktop and network can provide.  Just to be sure, I just wrote the same test file I've been using to my wife's machine across the network, and it completed in 28 seconds.

 

Try copying a file from the cache drive.

Link to comment

 

If you wanted to, you could run one more test, from 8 to 520 (where the values changed quite dramatically), but with the md_num_stripes hard coded to 1544 (you can hard code this in the CalcValues procedure at the top of the program).  This would show whether it is md_sync_window or md_num_stripes that is impacting rebuild speed.

 

Kicked off, with a 64 byte increment.

 

I can also repeat the test with the SSD to see how much of an impact it had.

 

-John

Link to comment

Kicked off, with a 64 byte increment.

 

I can also repeat the test with the SSD to see how much of an impact it had.

 

-John

 

Awesome, thanks!

 

I don't think you need to repeat with the SSD.  You can analyze its impact on the next parity check.

 

-Paul

Link to comment
