Disk-write speed increase on low-powered unRaid build

October 20, 2009

Greetings everybody!

I'd like to share with you some interesting experiments I did with my unRaid box, which resulted in ~30% speed increase for the writes to the array, and ~12% speed increase for the reads.

Until now, my write speeds to the unRaid array have been about 15 MB/s, and only on a good day, and only when copying to the unRaid from a Linux machine. When copying from an XP machine it's about half that, and it's been bugging the hell out of me.

I noticed that the stock unRaid sets the max_sectors_kb to 512 by default (for disks with max_hw_sectors_kb of 512 and higher). Since people have often reported best read/write results with values much lower than that, I decided to give it a try. Here are the surprising results: The writing speeds went up from ~15MB/s to ~20MB/s! WOW!!

root@Tower:~# echo 3 > /proc/sys/vm/drop_caches
root@Tower:~# dd if=/dev/zero   of=/mnt/disk4/test/noop-512-10    bs=10K count=100000
100000+0 records in
100000+0 records out
1024000000 bytes (1.0 GB) copied, 67.9073 s, 15.1 MB/s
root@Tower:~#
root@Tower:~# echo 3 > /proc/sys/vm/drop_caches
root@Tower:~# dd if=/dev/zero   of=/mnt/disk4/test/noop-512-100    bs=100K count=10000
10000+0 records in
10000+0 records out
1024000000 bytes (1.0 GB) copied, 66.7751 s, 15.3 MB/s
root@Tower:~#
root@Tower:~# echo 3 > /proc/sys/vm/drop_caches
root@Tower:~# dd if=/dev/zero   of=/mnt/disk4/test/noop-512-1000    bs=1000K count=1000
1000+0 records in
1000+0 records out
1024000000 bytes (1.0 GB) copied, 66.5369 s, 15.4 MB/s
root@Tower:~#
root@Tower:~# echo 3 > /proc/sys/vm/drop_caches
root@Tower:~# dd if=/dev/zero   of=/mnt/disk4/test/noop-512-10000    bs=10000K count=100
100+0 records in
100+0 records out
1024000000 bytes (1.0 GB) copied, 67.609 s, 15.1 MB/s
root@Tower:~#
root@Tower:~# echo 128 > /sys/block/sda/queue/max_sectors_kb
root@Tower:~# echo 128 > /sys/block/sde/queue/max_sectors_kb
root@Tower:~# echo 3 > /proc/sys/vm/drop_caches
root@Tower:~# dd if=/dev/zero   of=/mnt/disk4/test/noop-128-10    bs=10K count=100000
100000+0 records in
100000+0 records out
1024000000 bytes (1.0 GB) copied, 50.5722 s, 20.2 MB/s
root@Tower:~#
root@Tower:~# echo 3 > /proc/sys/vm/drop_caches
root@Tower:~# dd if=/dev/zero   of=/mnt/disk4/test/noop-128-100    bs=100K count=10000
10000+0 records in
10000+0 records out
1024000000 bytes (1.0 GB) copied, 50.5701 s, 20.2 MB/s
root@Tower:~#
root@Tower:~# echo 3 > /proc/sys/vm/drop_caches
root@Tower:~# dd if=/dev/zero   of=/mnt/disk4/test/noop-128-1000    bs=1000K count=1000
1000+0 records in
1000+0 records out
1024000000 bytes (1.0 GB) copied, 49.954 s, 20.5 MB/s
root@Tower:~#
root@Tower:~# echo 3 > /proc/sys/vm/drop_caches
root@Tower:~# dd if=/dev/zero   of=/mnt/disk4/test/noop-128-10000    bs=10000K count=100
100+0 records in
100+0 records out
1024000000 bytes (1.0 GB) copied, 49.9764 s, 20.5 MB/s
root@Tower:~#

Look through the output above and notice what happens when we write 128 to max_sectors_kb!

(Note: /dev/sda is my parity disk, and /dev/sde is disk4 of the array)
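Before changing anything it can help to see what each disk is currently set to. The following is a minimal sketch, not something from the original post: the function name show_queue_limits and the SYSFS_ROOT override are made up for illustration, and the override exists only so the function can be tried against a copy of the sysfs tree.

```shell
# Print the current and hardware-maximum request size for each hd?/sd? disk.
# SYSFS_ROOT is a hypothetical override (default /sys) for dry runs.
show_queue_limits() (
    root="${SYSFS_ROOT:-/sys}"
    for q in "$root"/block/[hs]d?/queue; do
        # Skip the unexpanded glob pattern and unreadable entries.
        [ -r "$q/max_sectors_kb" ] || continue
        dev=${q%/queue}; dev=${dev##*/}
        cur=$(cat "$q/max_sectors_kb")
        hw=$(cat "$q/max_hw_sectors_kb" 2>/dev/null || echo "?")
        printf '%s: max_sectors_kb=%s max_hw_sectors_kb=%s\n' "$dev" "$cur" "$hw"
    done
)
```

Calling show_queue_limits with no arguments on the server lists one line per disk, which makes it easy to note the defaults before experimenting.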

Also, the default scheduler in unRaid is "noop". Changing that to "cfq" can bring about 15% increase in the write speeds of the stock unRaid. (Although that doesn't show a cumulative improvement together with tweaking the max_sectors_kb value, at least in my setup). Here is the same set of tests as before, but this time with the cfq scheduler:

root@Tower:~# echo cfq > /sys/block/sda/queue/scheduler
root@Tower:~# echo cfq > /sys/block/sde/queue/scheduler
root@Tower:~# echo 512 > /sys/block/sda/queue/max_sectors_kb
root@Tower:~# echo 512 > /sys/block/sde/queue/max_sectors_kb
root@Tower:~#
root@Tower:~# echo 3 > /proc/sys/vm/drop_caches
root@Tower:~# dd if=/dev/zero   of=/mnt/disk4/test/cfq-512-10    bs=10K count=100000
100000+0 records in
100000+0 records out
1024000000 bytes (1.0 GB) copied, 59.469 s, 17.2 MB/s
root@Tower:~#
root@Tower:~# echo 3 > /proc/sys/vm/drop_caches
root@Tower:~# dd if=/dev/zero   of=/mnt/disk4/test/cfq-512-100    bs=100K count=10000
10000+0 records in
10000+0 records out
1024000000 bytes (1.0 GB) copied, 58.0788 s, 17.6 MB/s
root@Tower:~#
root@Tower:~# echo 3 > /proc/sys/vm/drop_caches
root@Tower:~# dd if=/dev/zero   of=/mnt/disk4/test/cfq-512-1000    bs=1000K count=1000
1000+0 records in
1000+0 records out
1024000000 bytes (1.0 GB) copied, 58.4941 s, 17.5 MB/s
root@Tower:~#
root@Tower:~# echo 3 > /proc/sys/vm/drop_caches
root@Tower:~# dd if=/dev/zero   of=/mnt/disk4/test/cfq-512-10000    bs=10000K count=100
100+0 records in
100+0 records out
1024000000 bytes (1.0 GB) copied, 58.5234 s, 17.5 MB/s
root@Tower:~#
root@Tower:~# echo 128 > /sys/block/sda/queue/max_sectors_kb
root@Tower:~# echo 128 > /sys/block/sde/queue/max_sectors_kb
root@Tower:~#
root@Tower:~# echo 3 > /proc/sys/vm/drop_caches
root@Tower:~# dd if=/dev/zero   of=/mnt/disk4/test/cfq-128-10    bs=10K count=100000
100000+0 records in
100000+0 records out
1024000000 bytes (1.0 GB) copied, 51.6275 s, 19.8 MB/s
root@Tower:~#
root@Tower:~# echo 3 > /proc/sys/vm/drop_caches
root@Tower:~# dd if=/dev/zero   of=/mnt/disk4/test/cfq-128-100    bs=100K count=10000
10000+0 records in
10000+0 records out
1024000000 bytes (1.0 GB) copied, 51.2785 s, 20.0 MB/s
root@Tower:~#
root@Tower:~# echo 3 > /proc/sys/vm/drop_caches
root@Tower:~# dd if=/dev/zero   of=/mnt/disk4/test/cfq-128-1000    bs=1000K count=1000
1000+0 records in
1000+0 records out
1024000000 bytes (1.0 GB) copied, 52.3884 s, 19.5 MB/s
root@Tower:~#
root@Tower:~# echo 3 > /proc/sys/vm/drop_caches
root@Tower:~# dd if=/dev/zero   of=/mnt/disk4/test/cfq-128-10000    bs=10000K count=100
100+0 records in
100+0 records out
1024000000 bytes (1.0 GB) copied, 52.5635 s, 19.5 MB/s
root@Tower:~#

Similar speed increases are also observed for the disk reads, although not as drastic as for the writes. (I'll skip posting those results because this post got too long already) On average, the read speed increase is about 12%.

I ran full sets of tests for all kinds of values for max_sectors_kb: all the way down to 16, and all the way up to 1024. For this particular machine the value of 128 gave the best speeds for both reads and writes.
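The test pattern used throughout this post (drop the cache, then write zeros with dd) can be wrapped in a small helper when many values have to be compared. This is only a sketch under assumptions: bench_write is a hypothetical name, and it is not the script actually used for the results above.

```shell
# Time one sequential write: flush dirty pages, drop the page cache,
# then write zeros with dd and print dd's summary line.
# bench_write is a hypothetical helper, not part of unRaid.
bench_write() (
    target=$1 bs=$2 count=$3
    sync
    # Dropping caches needs root; skip quietly when it is not permitted.
    if [ -w /proc/sys/vm/drop_caches ]; then
        { echo 3 > /proc/sys/vm/drop_caches; } 2>/dev/null || true
    fi
    # dd prints its statistics on stderr; keep only the last (summary) line.
    dd if=/dev/zero of="$target" bs="$bs" count="$count" 2>&1 | tail -n 1
)
```

For a sweep, one would write each candidate value to max_sectors_kb for the parity and data disks, then call something like bench_write /mnt/disk4/test/run-128 1000K 1000, using a fresh file name per run.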

As a result of my experiments I modified the following line in my syslinux.cfg:

    append  elevator=cfq  initrd=bzroot

And also I added the following line to my GO script:

for i in /sys/block/[hs]d?; do echo 128 > $i/queue/max_sectors_kb ; done 2>/dev/null
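A slightly more defensive variant of that loop is sketched below. It assumes, as the kernel does, that a disk cannot accept a max_sectors_kb larger than its max_hw_sectors_kb, and it reports such disks instead of hiding the error. The function name set_max_sectors_kb and the SYSFS_ROOT override are hypothetical.

```shell
# Set max_sectors_kb on every hd?/sd? disk, skipping disks whose hardware
# limit (max_hw_sectors_kb) is lower than the requested value.
# SYSFS_ROOT is a hypothetical override for dry runs; it defaults to /sys.
set_max_sectors_kb() (
    want=$1
    root="${SYSFS_ROOT:-/sys}"
    for q in "$root"/block/[hs]d?/queue; do
        [ -w "$q/max_sectors_kb" ] || continue
        hw=$(cat "$q/max_hw_sectors_kb" 2>/dev/null)
        if [ -n "$hw" ] && [ "$want" -gt "$hw" ]; then
            echo "skipping ${q}: hardware limit is ${hw} KB" >&2
            continue
        fi
        echo "$want" > "$q/max_sectors_kb"
    done
)
```

In a go script this would be called as set_max_sectors_kb 128 after the function definition.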

Now bear in mind that these results may be very different on other machines. (This particular unRaid box has a 1GHz Ultra-Low-Voltage Celeron Mobile without any level-2 cache, 1GB RAM, stock unRaid v.4.5.7. All tests were done with NCQ enabled. Tomorrow I may run all these tests again with NCQ disabled to see if it matters.) Chances are that on more powerful machines the speed increases will not be that significant, if any. But still, it is worth experimenting.

Yours,

Purko

---

edit: At the end I settled on the deadline elevator rather than cfq for this particular server.

October 20, 2009

Author

Greetings again!

Here is an update:

I ran all the tests again, this time with NCQ disabled. That did not do me any good: I saw a ~5% speed decrease across the board. So I abandoned that idea and re-enabled NCQ.
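The post does not say how NCQ was switched off. One common way on Linux, shown here purely as an assumption about a possible method, is to set the per-disk queue depth to 1 through sysfs; a depth of 31 is the usual value with NCQ active. The function name and the SYSFS_ROOT override are hypothetical.

```shell
# Set the command queue depth for every sd? disk. A depth of 1 effectively
# turns NCQ off for SATA disks; writing the old value back re-enables it.
# SYSFS_ROOT is a hypothetical override for dry runs; it defaults to /sys.
set_queue_depth() (
    depth=$1
    root="${SYSFS_ROOT:-/sys}"
    for f in "$root"/block/sd?/device/queue_depth; do
        [ -w "$f" ] || continue
        echo "$depth" > "$f"
    done
)
```

With this sketch, set_queue_depth 1 would disable NCQ for a test run and set_queue_depth 31 would restore it.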

Next, I ran all the tests but with the deadline I/O scheduler. That worked even better than with the cfq scheduler before. Here are the results:

root@Tower:~#
root@Tower:~# for i in /dev/[hs]d? ; do echo 128 > /sys/block/${i:5}/queue/max_sectors_kb ; done 2>null
root@Tower:~# for i in /dev/[hs]d? ; do echo deadline > /sys/block/${i:5}/queue/scheduler ; done 2>null
root@Tower:~# echo 3 > /proc/sys/vm/drop_caches
root@Tower:~# dd if=/dev/zero   of=/mnt/disk4/test/deadline-128-NCQ-10    bs=10K count=100000
100000+0 records in
100000+0 records out
1024000000 bytes (1.0 GB) copied, 52.2128 s, 19.6 MB/s
root@Tower:~# echo 3 > /proc/sys/vm/drop_caches
root@Tower:~# dd if=/dev/zero   of=/mnt/disk4/test/deadline-128-NCQ-100    bs=100K count=10000
10000+0 records in
10000+0 records out
1024000000 bytes (1.0 GB) copied, 51.3745 s, 19.9 MB/s
root@Tower:~# echo 3 > /proc/sys/vm/drop_caches
root@Tower:~# dd if=/dev/zero   of=/mnt/disk4/test/deadline-128-NCQ-1000    bs=1000K count=1000
1000+0 records in
1000+0 records out
1024000000 bytes (1.0 GB) copied, 52.1474 s, 19.6 MB/s
root@Tower:~# echo 3 > /proc/sys/vm/drop_caches
root@Tower:~# dd if=/dev/zero   of=/mnt/disk4/test/deadline-128-NCQ-10000    bs=10000K count=100
100+0 records in
100+0 records out
1024000000 bytes (1.0 GB) copied, 52.265 s, 19.6 MB/s
root@Tower:~# echo 3 > /proc/sys/vm/drop_caches
root@Tower:~# dd if=/mnt/disk4/test/deadline-128-NCQ-10   of=/dev/null    bs=10K count=100000
100000+0 records in
100000+0 records out
1024000000 bytes (1.0 GB) copied, 10.7187 s, 95.5 MB/s
root@Tower:~# echo 3 > /proc/sys/vm/drop_caches
root@Tower:~# dd if=/mnt/disk4/test/deadline-128-NCQ-100   of=/dev/null    bs=100K count=10000
10000+0 records in
10000+0 records out
1024000000 bytes (1.0 GB) copied, 11.2518 s, 91.0 MB/s
root@Tower:~# echo 3 > /proc/sys/vm/drop_caches
root@Tower:~# dd if=/mnt/disk4/test/deadline-128-NCQ-1000   of=/dev/null    bs=1000K count=1000
1000+0 records in
1000+0 records out
1024000000 bytes (1.0 GB) copied, 11.2221 s, 91.2 MB/s
root@Tower:~# echo 3 > /proc/sys/vm/drop_caches
root@Tower:~# dd if=/mnt/disk4/test/deadline-128-NCQ-10000   of=/dev/null    bs=10000K count=100
100+0 records in
100+0 records out
1024000000 bytes (1.0 GB) copied, 11.2136 s, 91.3 MB/s
root@Tower:~#

So now the append line in my syslinux.cfg looks like this:

  append  elevator=deadline  initrd=bzroot
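After a reboot it is worth confirming that the elevator= setting actually took effect. The kernel lists the available schedulers in the scheduler file and marks the active one with square brackets. A minimal sketch for reading it (active_scheduler and SYSFS_ROOT are hypothetical names):

```shell
# Print the active I/O scheduler for one disk. The kernel marks the active
# entry with square brackets, e.g. "noop anticipatory [deadline] cfq".
# SYSFS_ROOT is a hypothetical override for testing; it defaults to /sys.
active_scheduler() {
    sed -n 's/.*\[\(.*\)\].*/\1/p' "${SYSFS_ROOT:-/sys}/block/$1/queue/scheduler"
}
```

For example, active_scheduler sda should print deadline once the append line above is in use.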

Surprisingly, it's not only the write speed that improved as a result of the two tweaks, but also the overall responsiveness of the unraid box. Before all this, a large copy to the unRaid box initiated from a Windows machine over samba could totally freeze the unRaid: a whole bunch of processes on the unRaid were getting stuck in a disk wait state, and I couldn't even telnet into unRaid until all that copying was done. Not anymore.
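To see whether processes are piling up in disk wait during a large copy, the process state can be read from /proc. The sketch below lists processes in state D (uninterruptible sleep, usually waiting on I/O). It is an illustration only: list_disk_wait and the PROC_ROOT override are hypothetical names.

```shell
# List "pid name" for every process in uninterruptible sleep (state D),
# read from /proc/<pid>/status.
# PROC_ROOT is a hypothetical override for testing; it defaults to /proc.
list_disk_wait() (
    root="${PROC_ROOT:-/proc}"
    for s in "$root"/[0-9]*/status; do
        [ -r "$s" ] || continue
        awk -v pid="${s%/status}" '
            /^Name:/  { name = $2 }
            /^State:/ { state = $2 }
            END { sub(/.*\//, "", pid); if (state == "D") print pid, name }
        ' "$s" 2>/dev/null
    done
)
```

Running it a few times while a copy from Windows is in progress gives a rough picture of how many processes are blocked at once.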

And again, all of the above may well apply to only this particular unRaid machine. But still, I find it intriguing enough that I had to share it with you.

Yours,

Purko

October 20, 2009

Interesting tests...

Can you try your tests using the cfq and deadline disk schedulers and various write block sizes (KB) using

/mnt/user/test/deadline-128-NCQ-100

vs.

/mnt/disk4/test/deadline-128-NCQ-100

What impact on performance occurs when you read and write through the user-shares?

Joe L.

October 20, 2009

I think this should be tested with a file that is 2X ram or more in order to completely exhaust the buffer cache.

Even when you run bonnie or bonnie++ it says to use a test file that is 2x ram otherwise you are exercising the buffer cache.
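As a rough illustration of the 2x RAM rule, the dd count for a given block size can be derived from MemTotal in /proc/meminfo. This is a sketch, not a tool from this thread: count_for_2x_ram and the MEMINFO override are hypothetical names.

```shell
# Print a dd "count" that yields a file of at least twice the installed RAM
# for a given block size in KiB (MemTotal is reported in KiB).
# MEMINFO is a hypothetical override for testing; defaults to /proc/meminfo.
count_for_2x_ram() {
    awk -v bs_kb="$1" '
        /^MemTotal:/ { printf "%d\n", int(($2 * 2 + bs_kb - 1) / bs_kb) }
    ' "${MEMINFO:-/proc/meminfo}"
}
```

The result would then be used as, for instance, dd if=/dev/zero of=/mnt/disk4/test/big bs=1000K count=$(count_for_2x_ram 1000).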

unRAID has tweaks in sysctl designed to allow maximum throughput and adjust buffer cache writes.

FWIW, I know that without any tweaks, if I rsync large files from one disk to another and set bwlimit to 12-20 MB/s, it will keep steady because of the timing between how much data is pushed to the other disk and how the buffer cache is flushed.

Without any tweaks I'm getting 18-19MB/s

With unRAID and many media files, how many write only 1GB vs an actual DVD's worth of data?

October 20, 2009

this is interesting.

I set the numbers as recommended using CFQ and 128 and I lost performance.

Parity Seagate 1.5TB 7200RPM 32MB cache

Data Seagate 1.5TB 7200RPM 32MB cache 66% used.

NOOP / 512 / 4GB

4194304000 bytes (4.2 GB) copied, 237.739 s, 17.6 MB/s

CFQ / 128 / 4GB

4194304000 bytes (4.2 GB) copied, 251.965 s, 16.6 MB/s

DEADLINE / 128 / 4GB

4194304000 bytes (4.2 GB) copied, 232.727 s, 18.0 MB/s

October 20, 2009

Author

I set the numbers as recommended using CFQ and 128

Actually, it was deadline and 128 in the later update above.

You too got some increase there with deadline and 128, although not as significant as in my tests.

But then again, your machine is kick-ass compared to my tiny little ULV Mobile processor without L2 cache.

I expected something like this. Still, for me the difference is significant. Jumping from 15 MB/s to 20 MB/s here is not nothing.

Purko

October 20, 2009

Author

Can you try your tests using...

/mnt/user/test/...

That would be testing with too many variables. Besides, I don't use user shares.

If I have some time later today I may enable them and see how it goes.

But first, I'm going to redo all my tests with 2xRAM files as WeeboTech suggested.

Purko

October 20, 2009

I did some testing a while back with the scheduler and went back to the default noop. I could improve HDD performance with cfq, but the network performance hit negated this, and more than two concurrent operations (reads or writes) tanked network throughput.

October 20, 2009

The speed itself is not so important to me - I would be more interested in improved responsiveness - keeping all streams from breaking when I copy a file to the box...

October 20, 2009

Author

I think this should be tested with a file that is 2X ram

Very well. I redid all the tests with 2xRAM and the results still hold:

Parity: WD-RE4 2TB 64MB cache

Data: WD-EADS Green 2TB 32MB cache

noop / 512 / 2GB

2048000000 bytes (2.0 GB) copied, 136.634 s, 15.0 MB/s

deadline / 128 / 2GB

2048000000 bytes (2.0 GB) copied, 103.357 s, 19.8 MB/s
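For reference, the relative gain between those two summary figures can be worked out with a one-line awk helper. The name pct_gain is made up for this sketch.

```shell
# Percentage change from a baseline throughput to a new throughput.
pct_gain() {
    awk -v old="$1" -v new="$2" 'BEGIN { printf "%.1f\n", (new - old) / old * 100 }'
}

# noop/512 (15.0 MB/s) versus deadline/128 (19.8 MB/s), 2 GB writes
pct_gain 15.0 19.8   # prints 32.0
```

So the 2 GB runs show roughly the same one-third improvement as the earlier 1 GB runs.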

Here's the actual output:

root@Tower:~#
root@Tower:~# for i in /dev/[hs]d? ; do echo noop > /sys/block/${i:5}/queue/scheduler ; done 2>null
root@Tower:~# for i in /dev/[hs]d? ; do echo 512 > /sys/block/${i:5}/queue/max_sectors_kb ; done 2>null
root@Tower:~#
root@Tower:~# echo 3 > /proc/sys/vm/drop_caches
root@Tower:~# dd if=/dev/zero   of=/mnt/disk4/test/noop-512-NCQ-20    bs=20K count=100000
100000+0 records in
100000+0 records out
2048000000 bytes (2.0 GB) copied, 136.634 s, 15.0 MB/s
root@Tower:~#
root@Tower:~# echo 3 > /proc/sys/vm/drop_caches
root@Tower:~# dd if=/dev/zero   of=/mnt/disk4/test/noop-512-NCQ-200    bs=200K count=10000
10000+0 records in
10000+0 records out
2048000000 bytes (2.0 GB) copied, 135.068 s, 15.2 MB/s
root@Tower:~#
root@Tower:~# echo 3 > /proc/sys/vm/drop_caches
root@Tower:~# dd if=/dev/zero   of=/mnt/disk4/test/noop-512-NCQ-2000    bs=2000K count=1000
1000+0 records in
1000+0 records out
2048000000 bytes (2.0 GB) copied, 136.284 s, 15.0 MB/s
root@Tower:~#
root@Tower:~# echo 3 > /proc/sys/vm/drop_caches
root@Tower:~# dd if=/dev/zero   of=/mnt/disk4/test/noop-512-NCQ-20000    bs=20000K count=100
100+0 records in
100+0 records out
2048000000 bytes (2.0 GB) copied, 136.268 s, 15.0 MB/s
root@Tower:~#
root@Tower:~# for i in /dev/[hs]d? ; do echo deadline > /sys/block/${i:5}/queue/scheduler ; done 2>null
root@Tower:~# for i in /dev/[hs]d? ; do echo 128 > /sys/block/${i:5}/queue/max_sectors_kb ; done 2>null
root@Tower:~#
root@Tower:~# echo 3 > /proc/sys/vm/drop_caches
root@Tower:~# dd if=/dev/zero   of=/mnt/disk4/test/deadline-128-NCQ-20    bs=20K count=100000
100000+0 records in
100000+0 records out
2048000000 bytes (2.0 GB) copied, 104.492 s, 19.6 MB/s
root@Tower:~#
root@Tower:~# echo 3 > /proc/sys/vm/drop_caches
root@Tower:~# dd if=/dev/zero   of=/mnt/disk4/test/deadline-128-NCQ-200    bs=200K count=10000
10000+0 records in
10000+0 records out
2048000000 bytes (2.0 GB) copied, 102.804 s, 19.9 MB/s
root@Tower:~#
root@Tower:~# echo 3 > /proc/sys/vm/drop_caches
root@Tower:~# dd if=/dev/zero   of=/mnt/disk4/test/deadline-128-NCQ-2000    bs=2000K count=1000
1000+0 records in
1000+0 records out
2048000000 bytes (2.0 GB) copied, 103.357 s, 19.8 MB/s
root@Tower:~#
root@Tower:~# echo 3 > /proc/sys/vm/drop_caches
root@Tower:~# dd if=/dev/zero   of=/mnt/disk4/test/deadline-128-NCQ-20000    bs=20000K count=100
100+0 records in
100+0 records out
2048000000 bytes (2.0 GB) copied, 103.412 s, 19.8 MB/s
root@Tower:~#

I rest my case.

October 20, 2009

Author

The speed itself is not so important to me - I would be more interested in improved responsiveness - keeping all streams from breaking when I copy a file to the box...

I absolutely agree with you!

As I wrote above...

Surprisingly, it's not only the write speed that improved as a result of the two tweaks, but also the overall responsiveness of the unraid box. Before all this, a large copy to the unRaid box initiated from a Windows machine over samba could totally freeze the unRaid: a whole bunch of processes on the unRaid were getting stuck in a disk wait state, and I couldn't even telnet into unRaid until all that copying was done. Not anymore.

For me too this is the more important part.

Purko

October 20, 2009

Very interesting results. My improvement is small, but there is a whole series of variables here.

Thanks.

One of these days I'll compile the driver for my caching controller and put the parity drive there to see what happens.

October 20, 2009

Author

Another interesting observation:

I started a parity check, and it's been running for a while at ~24,000 KB/sec (deadline, 128).

This used to be ~20,000 KB/sec before, with noop and max_sectors_kb 512.

Purko

October 20, 2009

What is the system/hardware you are using?

October 20, 2009

Author

What is the system/hardware you are using?

I mentioned some of it above, but here it is again:

Norco's NS-520, /w embedded 1GHz ULV Mobile Celeron, no L2 cache, 1 GB ram, 2x Gigabit LAN.

8 SATA-II ports on board (Marvell chipset): 5 internal SATA disks + 3 external eSATA ports.

5 disks in the array, 2TB each: One WD-RE4 2TB for parity, and four WD-EADS 2TB Green for data.

One 2.5" WD-Scorpio IDE 300GB (not in the protected array, for torrents and such)

Stock unRaid 4.5.7-beta.

Purko

October 20, 2009

AMD LE-1200 (2.1GHz), 2GB DDR2 6400, Geforce 8300/SB700, 6x SATA, all 7200RPM (4x1TB, 2x500GB). No noticeable difference on either drive type/size. Tried noop, cfq and deadline, with max_sectors_kb settings of 128, 256 and 1024.

Maybe a useful tuning option for lower specced unraid servers, but it made no discernible difference on my system. Once level two testing is done I might drop a slower CPU in to see if I can reproduce.

October 20, 2009

What is the system/hardware you are using?

I mentioned some of it above, but here it is again:

Norco's NS-520, /w embedded 1GHz ULV Mobile Celeron, no L2 cache, 1 GB ram, 2x Gigabit LAN.

8 SATA-II ports on board (Marvell chipset): 5 internal SATA disks + 3 external eSATA ports.

5 disks in the array, 2TB each: One WD-RE4 2TB for parity, and four WD-EADS 2TB Green for data.

One 2.5" WD-Scorpio IDE 300GB (not in the protected array, for torrents and such)

Stock unRaid 4.5.7-beta.

Purko

Ahh.. sorry, I did not see where the whole machine was mentioned.

I have the same machine.. I'll have to run some tests.

I know on my Abit AB9 PRO, there was a very small difference, but not much. After several tests, there was nothing conclusive as to increasing speed, but I do agree that other schedulers would be better for overall smoothness.

That has been a big beef of mine with unRAID, whenever I do massive transfers and the buffer cache flushes, the system pauses (even with multicore).

I resorted to doing ionice on certain jobs to ensure the priority gets dropped so it does not interfere with the whole system.

Perhaps you have found a more elegant solution, thanks!
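For readers unfamiliar with ionice, here is a minimal sketch of the kind of wrapper meant above. It assumes the util-linux ionice command is installed; run_idle is a hypothetical name. As far as I know, on kernels of this era the I/O classes are only honored by the cfq scheduler, so under noop or deadline the wrapper would have little or no effect.

```shell
# Run a command in the "idle" I/O class when ionice is available, so it only
# gets disk time when nothing else wants it. -t tells ionice to carry on even
# if the class cannot be set. Falls back to running the command unchanged.
run_idle() {
    if command -v ionice >/dev/null 2>&1; then
        ionice -c 3 -t "$@"
    else
        "$@"
    fi
}
```

A bulk job would then be started as run_idle followed by the usual command line, for example an rsync between two disks.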

October 21, 2009

My own tests on the original unRAID Intel MB, with 512 MB of RAM. Disk8 is a 750 GB Seagate IDE drive. Parity is a 1TB Seagate SATA drive.

Using the "noop" scheduler:

root@Tower:/boot# for i in /dev/[hs]d? ; do echo 512 > /sys/block/${i:5}/queue/max_sectors_kb ; done 2>/dev/null

root@Tower:/boot# sync;echo 3 > /proc/sys/vm/drop_caches

root@Tower:/boot# dd if=/dev/zero of=/mnt/disk8/data/noop-1000 bs=1000K count=2000

2000+0 records in

2000+0 records out

2048000000 bytes (2.0 GB) copied, 164.057 s, 12.5 MB/s

root@Tower:/boot# sync;echo 3 > /proc/sys/vm/drop_caches

Switching to 128 kb block queue, performance went down

root@Tower:/boot# for i in /dev/[hs]d? ; do echo 128 > /sys/block/${i:5}/queue/max_sectors_kb ; done 2>/dev/null

root@Tower:/boot# dd if=/dev/zero of=/mnt/disk8/data/noop-2000 bs=1000K count=2000

2000+0 records in

2000+0 records out

2048000000 bytes (2.0 GB) copied, 172.385 s, 11.9 MB/s

root@Tower:/boot#

root@Tower:/boot# sync;echo 3 > /proc/sys/vm/drop_caches

Switching back to 512kb transfers, performance back up, but varies from test to test.

root@Tower:/boot# for i in /dev/[hs]d? ; do echo 512 > /sys/block/${i:5}/queue/max_sectors_kb ; done 2>/dev/null

root@Tower:/boot# dd if=/dev/zero of=/mnt/disk8/data/noop-2000 bs=1000K count=2000

2000+0 records in

2000+0 records out

2048000000 bytes (2.0 GB) copied, 163.378 s, 12.5 MB/s

root@Tower:/boot# sync;echo 3 > /proc/sys/vm/drop_caches

root@Tower:/boot# dd if=/dev/zero of=/mnt/disk8/data/noop-2000 bs=1000K count=2000

2000+0 records in

2000+0 records out

2048000000 bytes (2.0 GB) copied, 176.776 s, 11.6 MB/s

root@Tower:/boot# sync;echo 3 > /proc/sys/vm/drop_caches

root@Tower:/boot# dd if=/dev/zero of=/mnt/disk8/data/noop-1000 bs=1000K count=2000

2000+0 records in

2000+0 records out

2048000000 bytes (2.0 GB) copied, 162.08 s, 12.6 MB/s

root@Tower:/boot# sync;echo 3 > /proc/sys/vm/drop_caches

root@Tower:/boot# dd if=/dev/zero of=/mnt/disk8/data/noop-1000 bs=1000K count=2000

2000+0 records in

2000+0 records out

2048000000 bytes (2.0 GB) copied, 178.3 s, 11.5 MB/s

I stopped the running cache_dirs program here.

root@Tower:/boot# sync;echo 3 > /proc/sys/vm/drop_caches

root@Tower:/boot# dd if=/dev/zero of=/mnt/disk8/data/noop-1000 bs=1000K count=2000

2000+0 records in

2000+0 records out

2048000000 bytes (2.0 GB) copied, 120.149 s, 17.0 MB/s

root@Tower:/boot# sync;echo 3 > /proc/sys/vm/drop_caches

root@Tower:/boot# dd if=/dev/zero of=/mnt/disk8/data/noop-1000 bs=1000K count=2000

2000+0 records in

2000+0 records out

2048000000 bytes (2.0 GB) copied, 132.697 s, 15.4 MB/s

root@Tower:/boot# sync;echo 3 > /proc/sys/vm/drop_caches

root@Tower:/boot# dd if=/dev/zero of=/mnt/disk8/data/noop-1000 bs=1000K count=2000

2000+0 records in

2000+0 records out

2048000000 bytes (2.0 GB) copied, 122.51 s, 16.7 MB/s

root@Tower:/boot# sync;echo 3 > /proc/sys/vm/drop_caches

root@Tower:/boot# dd if=/dev/zero of=/mnt/disk8/data/noop-1000 bs=1000K count=2000

2000+0 records in

2000+0 records out

2048000000 bytes (2.0 GB) copied, 135.781 s, 15.1 MB/s

I switched to 128 kb queue here

root@Tower:/boot# for i in /dev/[hs]d? ; do echo 128 > /sys/block/${i:5}/queue/max_sectors_kb ; done 2>/dev/null

root@Tower:/boot# sync;echo 3 > /proc/sys/vm/drop_caches

root@Tower:/boot# dd if=/dev/zero of=/mnt/disk8/data/noop-1000 bs=1000K count=2000

2000+0 records in

2000+0 records out

2048000000 bytes (2.0 GB) copied, 134.366 s, 15.2 MB/s

root@Tower:/boot# dd if=/dev/zero of=/mnt/disk8/data/noop-1000 bs=1000K count=2000

2000+0 records in

2000+0 records out

2048000000 bytes (2.0 GB) copied, 131.226 s, 15.6 MB/s

root@Tower:/boot# sync;echo 3 > /proc/sys/vm/drop_caches

root@Tower:/boot# dd if=/dev/zero of=/mnt/disk8/data/noop-1000 bs=1000K count=2000

2000+0 records in

2000+0 records out

2048000000 bytes (2.0 GB) copied, 129.72 s, 15.8 MB/s

root@Tower:/boot# sync;echo 3 > /proc/sys/vm/drop_caches

root@Tower:/boot# dd if=/dev/zero of=/mnt/disk8/data/noop-1000 bs=1000K count=2000

2000+0 records in

2000+0 records out

2048000000 bytes (2.0 GB) copied, 138.095 s, 14.8 MB/s

Switching to "deadline" scheduler here

root@Tower:/boot# dd if=/dev/zero of=/mnt/disk8/data/deadline-1000 bs=1000K count=2000

2000+0 records in

2000+0 records out

2048000000 bytes (2.0 GB) copied, 129.133 s, 15.9 MB/s

root@Tower:/boot# sync;echo 3 > /proc/sys/vm/drop_caches

root@Tower:/boot# dd if=/dev/zero of=/mnt/disk8/data/deadline-1000 bs=1000K count=2000

2000+0 records in

2000+0 records out

2048000000 bytes (2.0 GB) copied, 138.73 s, 14.8 MB/s

root@Tower:/boot# sync;echo 3 > /proc/sys/vm/drop_caches

root@Tower:/boot# dd if=/dev/zero of=/mnt/disk8/data/deadline-1000 bs=1000K count=2000

2000+0 records in

2000+0 records out

2048000000 bytes (2.0 GB) copied, 130.167 s, 15.7 MB/s

root@Tower:/boot# sync;echo 3 > /proc/sys/vm/drop_caches

root@Tower:/boot# dd if=/dev/zero of=/mnt/disk8/data/deadline-1000 bs=1000K count=2000

2000+0 records in

2000+0 records out

2048000000 bytes (2.0 GB) copied, 139.919 s, 14.6 MB/s

root@Tower:/boot# sync;echo 3 > /proc/sys/vm/drop_caches

root@Tower:/boot# dd if=/dev/zero of=/mnt/disk8/data/deadline-1000 bs=1000K count=2000

2000+0 records in

2000+0 records out

2048000000 bytes (2.0 GB) copied, 129.257 s, 15.8 MB/s

root@Tower:/boot# sync;echo 3 > /proc/sys/vm/drop_caches

root@Tower:/boot# dd if=/dev/zero of=/mnt/disk8/data/deadline-1000 bs=1000K count=2000

2000+0 records in

2000+0 records out

2048000000 bytes (2.0 GB) copied, 139.338 s, 14.7 MB/s

Setting 512 as queue size

root@Tower:/boot# for i in /dev/[hs]d? ; do echo 512 > /sys/block/${i:5}/queue/max_sectors_kb ; done

root@Tower:/boot# sync;echo 3 > /proc/sys/vm/drop_caches

root@Tower:/boot# dd if=/dev/zero of=/mnt/disk8/data/deadline-1000 bs=1000K count=2000

2000+0 records in

2000+0 records out

2048000000 bytes (2.0 GB) copied, 121.216 s, 16.9 MB/s

root@Tower:/boot# sync;echo 3 > /proc/sys/vm/drop_caches

root@Tower:/boot# dd if=/dev/zero of=/mnt/disk8/data/deadline-1000 bs=1000K count=2000

2000+0 records in

2000+0 records out

2048000000 bytes (2.0 GB) copied, 135.237 s, 15.1 MB/s

root@Tower:/boot# sync;echo 3 > /proc/sys/vm/drop_caches

root@Tower:/boot# dd if=/dev/zero of=/mnt/disk8/data/deadline-1000 bs=1000K count=2000

2000+0 records in

2000+0 records out

2048000000 bytes (2.0 GB) copied, 121.143 s, 16.9 MB/s

root@Tower:/boot# sync;echo 3 > /proc/sys/vm/drop_caches

root@Tower:/boot# dd if=/dev/zero of=/mnt/disk8/data/deadline-1000 bs=1000K count=2000

2000+0 records in

2000+0 records out

2048000000 bytes (2.0 GB) copied, 134.568 s, 15.2 MB/s

root@Tower:/boot# sync;echo 3 > /proc/sys/vm/drop_caches

root@Tower:/boot# dd if=/dev/zero of=/mnt/disk8/data/deadline-1000 bs=1000K count=2000

2000+0 records in

2000+0 records out

2048000000 bytes (2.0 GB) copied, 121.319 s, 16.9 MB/s

root@Tower:/boot# sync;echo 3 > /proc/sys/vm/drop_caches

root@Tower:/boot# dd if=/dev/zero of=/mnt/disk8/data/deadline-1000 bs=1000K count=2000

2000+0 records in

2000+0 records out

2048000000 bytes (2.0 GB) copied, 134.197 s, 15.3 MB/s

root@Tower:/boot# sync;echo 3 > /proc/sys/vm/drop_caches

root@Tower:/boot# dd if=/dev/zero of=/mnt/disk8/data/deadline-1000 bs=1000K count=2000

2000+0 records in

2000+0 records out

2048000000 bytes (2.0 GB) copied, 120.154 s, 17.0 MB/s

root@Tower:/boot# sync;echo 3 > /proc/sys/vm/drop_caches

root@Tower:/boot# dd if=/dev/zero of=/mnt/disk8/data/deadline-1000 bs=1000K count=2000

2000+0 records in

2000+0 records out

2048000000 bytes (2.0 GB) copied, 133.511 s, 15.3 MB/s

root@Tower:/boot# sync;echo 3 > /proc/sys/vm/drop_caches

root@Tower:/boot# dd if=/dev/zero of=/mnt/disk8/data/deadline-1000 bs=1000K count=2000

2000+0 records in

2000+0 records out

2048000000 bytes (2.0 GB) copied, 121.352 s, 16.9 MB/s

root@Tower:/boot# sync;echo 3 > /proc/sys/vm/drop_caches

root@Tower:/boot# dd if=/dev/zero of=/mnt/disk8/data/deadline-1000 bs=1000K count=2000

2000+0 records in

2000+0 records out

2048000000 bytes (2.0 GB) copied, 132.467 s, 15.5 MB/s

root@Tower:/boot# sync;echo 3 > /proc/sys/vm/drop_caches

root@Tower:/boot# dd if=/dev/zero of=/mnt/disk8/data/deadline-1000 bs=1000K count=2000

2000+0 records in

2000+0 records out

2048000000 bytes (2.0 GB) copied, 123.903 s, 16.5 MB/s

Notice how the speed goes up and down alternately on each writing of the file. I'm clearing the cache, so the only thing I can think of is that reiserfs is having to re-allocate the data blocks rather than reuse the same blocks, and every other time it involves different operations.
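The alternating pattern is easier to judge when the odd-numbered and even-numbered runs are averaged separately. The sketch below does that for dd summary lines fed on standard input; alt_means is a hypothetical helper name.

```shell
# Read dd summary lines on stdin and print the mean MB/s of the odd-numbered
# and even-numbered runs separately. The MB/s figure is the next-to-last
# field of each "... copied, N s, X MB/s" line.
alt_means() {
    awk '/copied/ {
             n++; v = $(NF - 1)
             if (n % 2) { odd += v; no++ } else { even += v; ne++ }
         }
         END {
             if (no) printf "odd runs:  %.2f MB/s\n", odd / no
             if (ne) printf "even runs: %.2f MB/s\n", even / ne
         }'
}
```

Fed the eleven deadline/512 results above, the two means come out at roughly 16.85 and 15.28 MB/s, which matches the high/low alternation visible by eye.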

So far, on my server, the default block size is better with the anticipatory scheduler. But there is a lot to experiment with...

Joe L.

October 21, 2009

Author

Notice how the speed goes up and down alternately on each writing of the file. I'm clearing the cache, so the only thing I can think of is that reiserfs is having to re-allocate the data blocks rather than reuse the same blocks, and every other time it involves different operations.

I wonder why your results fluctuate so much. Was it a plain stock unRaid 4.5.7 you were running? What else was running on top of it?

Also, was that disk relatively old and fragmented? I ran mine on a freshly added new disk, precleared only a couple of weeks ago. I also output each consecutive dd copy to a separate file, if that made any difference.

Purko

October 21, 2009

Author

Maybe a useful tuning option for lower specced unraid servers

That would be my guess too.

October 21, 2009

Author

...overall smoothness...

That has been a big beef of mine with unRAID, whenever I do massive transfers and the buffer cache flushes, the system pauses (even with multicore).

I resorted to doing ionice on certain jobs to ensure the priority gets dropped so it does not interfere with the whole system.

That's been bugging me even more than the speeds. I never had much success ionicing things. Dropping large files to unRaid from Windows Explorer used to freeze up for long periods of time, often timing out and aborting in the middle of the copy with nasty messages. That was unacceptable.

Now copying a large number of big files from the XP machines works just fine. And if I watch the network with Task Manager, the graph is much smoother now. It doesn't get stuck down at 0% for long periods of time as it used to.

Purko

October 21, 2009

Notice how the speed goes up and down alternately on each writing of the file. I'm clearing the cache, so the only thing I can think of is that reiserfs is having to re-allocate the data blocks rather than reuse the same blocks, and every other time it involves different operations.

I wonder why your results fluctuate so much. Was it a plain stock unRaid 4.5.7 you were running? What else was running on top of it?

It is plain stock 4.5b7, the only thing running is unMENU, but it was not serving pages, so it would not be using any CPU or I/O at all. Originally I had cache_dirs running which would have had a "find" command actively running every few seconds trying to keep the directory entries in the buffer cache, but then I terminated it and saw the resulting increase in apparent I/O speed. There was no other activity at all. I was not writing to the array and I was not watching any movies or playing any media files from the array.

Also, was that disk relatively old and fragmented? I ran mine on a freshly added new disk, precleared only a couple of weeks ago. I also output each consecutive dd copy to a separate file, if that made any difference.

I used the same file, again and again. The disk is about 60% full. Obviously, it makes a difference. I used a 2 Gig file, and I only have 512 Meg of RAM, so I can guarantee it does not fit in the disk buffer memory. What was interesting was the very apparent pattern of a higher speed, followed by a lower speed, followed by a higher speed, etc.

If you only looked at every other test, the values are very consistent.

Joe L.

October 21, 2009

[...] Dropping large files to unRaid from Windows Explorer used to freeze up for long periods of time, often timing out and aborting in the middle of the copy with nasty messages. That was unacceptable.

I have the same problem as described - so what change is required to avoid this? It's the syslinux.cfg entry, adding the elevator=

Is this compatible and can I be sure not to break anything with that? Sorry for asking, I have no knowledge of those things ...

October 21, 2009

Author

[...] Dropping large files to unRaid from Windows Explorer used to freeze up for long periods of time, often timing out and aborting in the middle of the copy with nasty messages. That was unacceptable.

I have the same problem as described - so what change is required to avoid this? It's the syslinux.cfg entry, adding the elevator=

Is this compatible and can I be sure not to break anything with that? Sorry for asking, I have no knowledge of those things ...

Yes, it is pretty safe, you're not going to break anything. It may or may not bring improvement for you though -- you have to try it and see. For my setup it made a big difference! For some other people the change was not very noticeable.

Do the TWO changes: to your syslinux.cfg and to your go script (as described in the first two posts of this thread). Notice that neither one of the two changes alone fixed my problem, but the two TOGETHER did the trick.

Or, you may try it out without changing your syslinux.cfg and your go script: just telnet to your unRaid box and issue the following two commands:

for i in /sys/block/[hs]d?; do echo deadline > $i/queue/scheduler ; done
for i in /sys/block/[hs]d?; do echo 128 > $i/queue/max_sectors_kb ; done 2>/dev/null

(If your telnet client allows you to paste stuff from the clipboard, then just copy the lines from here and paste them there to avoid any typing mistakes)
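Settings written through /sys last only until the next reboot, so a trial like this is easy to undo. For anyone who wants to revert without rebooting, the sketch below records the current values first and can write them back later. The function names and the SYSFS_ROOT override are hypothetical, not part of unRaid.

```shell
# Record the active scheduler and max_sectors_kb of each hd?/sd? disk as
# "device scheduler max_sectors_kb" lines, one per disk.
# SYSFS_ROOT is a hypothetical override for dry runs; it defaults to /sys.
save_queue_settings() (
    root="${SYSFS_ROOT:-/sys}"
    for q in "$root"/block/[hs]d?/queue; do
        [ -r "$q/scheduler" ] || continue
        dev=${q%/queue}; dev=${dev##*/}
        # The active scheduler is the bracketed entry in the scheduler file.
        sched=$(sed -n 's/.*\[\(.*\)\].*/\1/p' "$q/scheduler")
        echo "$dev $sched $(cat "$q/max_sectors_kb")"
    done
)

# Re-apply settings previously written by save_queue_settings (read on stdin).
restore_queue_settings() (
    root="${SYSFS_ROOT:-/sys}"
    while read -r dev sched kb; do
        echo "$sched" > "$root/block/$dev/queue/scheduler"
        echo "$kb" > "$root/block/$dev/queue/max_sectors_kb"
    done
)
```

A typical session would run save_queue_settings > /tmp/queue.before, apply the two commands above, and later run restore_queue_settings < /tmp/queue.before if the change does not help.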

So try it out, copy some big number of large files to your unRaid box, and tell us what happens. I'd love to hear your report.

Purko

October 22, 2009

for i in /dev/[hs]d? ; do echo deadline > /sys/block/${i:5}/queue/scheduler ; done 2>/dev/null
for i in /dev/[hs]d? ; do echo 128 > /sys/block/${i:5}/queue/max_sectors_kb ; done 2>/dev/null

This has improved the responsiveness of my server no end.

I can't comment on raw throughput and, largely, I don't really care.

Even when all the disks are thrashing reading and writing (with parity) I can continue on as normal with other array tasks.

Previously anything approaching a slightly intensive bout of writing would cause serious blocking issues for any other access.

Nice find.
