
Slow and weird copy process to array with reconstruct write on



Hi guys,

 

I'm experiencing some copy behavior I can't explain when writing data to the array (e.g. from a USB HDD):

 

It all starts as expected: all disks except the two writing ones (parity and the disk the data lands on) read at 223-250 MB/s, and the two writing disks write at the same speed. But after a file or so, most disks stop reading, while parity and usually 2 (sometimes 3) other data disks keep reading and writing as if using read/modify/write, only simultaneously on multiple data disks. It looks like this:

 

[Screenshot: disk read/write activity during the copy]

 

I tried it with rsync and with Midnight Commander, with Dockers on and off, always with the same result. From time to time the copy process seems to get back on track and copies as expected, but shortly after it shows this behavior again. The whole system also becomes "weird": stopping/restarting Dockers takes forever or doesn't work at all, the GUI doesn't show that they are stopped, accessing the Dockers' web UIs is horribly slow (although they are on the cache drives), and so on.

 

 

I would think it's some temporary caching-to-RAM thing, but I have no real idea how to work around it (without possibly breaking my unRAID setup 🙂 ). I tried rsync --drop-cache and tried googling "linux" / "unraid disable copy cache", which in the unRAID case mostly brings up stuff about the cache drives... it's like googling for "big hammer". ;-)
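As far as I can tell, --drop-cache isn't in stock rsync anyway (it comes from a separate patch set). The generic way to flush the Linux page cache by hand seems to be the following (run as root; sync first, since drop_caches only discards clean pages):

    # write out dirty data, then drop pagecache, dentries and inodes
    sync
    echo 3 > /proc/sys/vm/drop_caches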

 

So, do you have an idea what's going on and how to solve or work around it? I wanted to copy ~10 TB, and after leaving it running for about 8 h, MC tells me it'll take another 31 h to finish...


Thank you for the fast reply.

 

I'm running all shares in High-Water mode. But investigating in this direction, maybe this information is relevant: 90% of the copy process is replacing existing files (mostly because of changed dates); only 10% is creating new ones.


Ok, I got that, thank you for explaining. 🙂

 

The point I don't really get is why it's writing to multiple disks at once (and whether it's possible to work around that). I'm not starting multiple copy processes; I'm just marking all files and hitting "Copy" (with one share as the target), so something must "make it" multiple. I could guess, but with my 1 h of Google knowledge about Linux copy processes that probably wouldn't be helpful. 🙂

1 hour ago, Torben said:

and if it's possible to work around that

If the source is also multiple disks, you can do what I do when I need to transfer a lot of data: disable parity, then transfer data to 3 or 4 disks at the same time. I limit it to 4, since I usually do this over SSH on 10GbE and more than 4 usually doesn't go any faster. Then I sync parity at the end.
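A minimal sketch of that approach (the paths are made up and parity is assumed to be disabled already), with one rsync per destination disk running in parallel:

    # each rsync targets a different array disk, all running at once
    rsync -a /mnt/usb/part1/ /mnt/disk1/share/ &
    rsync -a /mnt/usb/part2/ /mnt/disk2/share/ &
    rsync -a /mnt/usb/part3/ /mnt/disk3/share/ &
    wait    # block until all transfers finish, then rebuild parity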


Nope, it's just one source and one target share, copying files a, b, c, d, e in a row, not at the same time, and that's why I don't get how there can be multiple copy processes at once causing the issue. 😞

 

I just thought about it, and if there absolutely is no other way, I'll delete the files before copying them over again (and hopefully won't mess up the script). It's kinda stupid, but that's computers. 🙂
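Roughly something like this is what I have in mind (the paths are made up and it's untested, so take it as a sketch):

    # delete the destination copies first, so the re-copied files count as new files
    cd /mnt/usb/source
    find . -type f -exec rm -f /mnt/user/share/{} \;    # GNU find expands {} inside the path
    # then copy everything over fresh
    rsync -a /mnt/usb/source/ /mnt/user/share/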

14 minutes ago, Torben said:

in a row, not at the same time, and that's why I don't get how there can be multiple copy processes at once causing the issue.

Because when writing to multiple disks, it will start writing to another disk before the data has been flushed from RAM to the previous one. If there's no other way for your use case, setting RAM-cached writes to the minimum should alleviate the issue.
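On a stock Linux shell that looks something like the lines below (the values are illustrative; they take effect immediately and revert on reboot):

    # start background writeback once dirty pages reach ~1% of RAM
    sysctl -w vm.dirty_background_ratio=1
    # block writers once dirty pages reach ~2% of RAM
    sysctl -w vm.dirty_ratio=2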


Ok, so it's the RAM cache feature, which normally doesn't bother me (it's probably even useful in certain use cases 🙂 ).

 

I found your post about the opposite problem (too little RAM being used):

So I set vm.dirty_ratio to 2 and vm.dirty_background_ratio to 1, and the copy job runs way better. Most of the time it's at 120 MB/s instead of 250, but that's a third of the time it would have taken otherwise.
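For anyone finding this later: since unRAID runs from RAM, these settings don't survive a reboot. Appending the same commands to the go file on the flash drive should make them stick (assuming the standard /boot/config layout):

    # added to /boot/config/go, which runs at every boot
    sysctl -w vm.dirty_background_ratio=1
    sysctl -w vm.dirty_ratio=2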

 

If I should change something else or adjust these two ratios to other values, please let me know.

 

Anyways, thank you very, very much for your help!

 

6 hours ago, Torben said:

90% of the copy process is replacing existing files (mostly because of changed dates), only 10% is creating new ones.

Normally, if you are overwriting an existing file, it is updated 'in place', which can mean that if the files are spread across multiple drives, you end up with multiple drives receiving writes.
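If you want to check which array disk a given user-share file actually lives on before overwriting it, I believe unRAID's user share filesystem exposes that as an extended attribute (the path here is made up):

    # prints e.g. /mnt/disk3 for the disk that holds the file
    getfattr --absolute-names --only-values -n user.LOCATION /mnt/user/Movies/somefile.mkv; echo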


Sure, I just didn't expect files to be copied simultaneously, since I didn't know about the Linux RAM write cache until now. And when I see that it disables turbo write and almost halves the write speed in normal mode, I ask myself whether it really makes sense. There surely are use cases where you e.g. have no cache SSDs, or work with a source faster than the target and don't want to wait for the copy process to finish, but otherwise I only see cons.

 

Lowering both "dirty ratios" helps big time, although it doesn't completely mitigate the behavior. Further searching brought up that mounting a drive with "-o sync" should disable RAM write caching. Since I don't want to tinker with something that important in the unRAID config without knowing exactly what I'm doing: what do you guys think? Would this change bring about the expected behavior of fully transferring the first file to the array before starting the next? That would speed up copy processes overall for me.
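On a generic Linux box that would look like the line below (illustrative only; I have no idea where unRAID keeps its mount options, and "sync" forces every single write straight to disk, which can cost a lot of throughput):

    # remount an already-mounted filesystem with synchronous writes
    mount -o remount,sync /mnt/disk1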

 

If so, could you advise where to do the changes to give it a try?

On 7/16/2021 at 3:06 PM, JorgeB said:

This happens if the allocation mode is set to most-free: writes will overlap onto multiple disks, and when that happens Unraid disables turbo write. Most-free should never be used if you want the best performance; use fill-up or high-water.

So we get to choose between a parity bottleneck (most-free) and a single-array-disk bottleneck (fill-up / high-water) :(

