
Slow and weird copy process to array with reconstruct write on



Hi guys,

 

I'm experiencing some copy behavior I can't explain when writing data to the array (e.g. from a USB HDD):

 

It all starts as expected: all disks except the two writing ones (parity and the disk the data lands on) read at 223-250 MB/s, and the two writing disks write at the same speed. But after a file or so, most disks stop reading, while parity and usually 2 (sometimes 3) other data disks keep reading and writing as if using read/modify/write, only simultaneously on multiple data disks. It looks like this:

 

[Screenshot: disk read/write activity during the copy]

 

I tried it with rsync and with Midnight Commander, with Dockers on and off, always with the same result. From time to time the copy process seems to get back on track and copies as expected, but shortly after it shows this behavior again. The whole system also becomes "weird": stopping/restarting Dockers takes forever or doesn't work at all, the GUI doesn't show that they are stopped, accessing the Dockers' web UIs is horribly slow (although they are on the cache drives), and so on.

 

 

I would think it's some temporary caching-to-RAM thing, but I have no real idea how to work around it (without possibly breaking my unRAID setup 🙂 ). I tried rsync --drop-cache and tried googling "linux" / "unraid disable copy cache", which in the unRAID case mostly brings up stuff about the cache drives... it's like googling for "big hammer". ;-)
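As far as I can tell, --drop-cache isn't in stock rsync anyway (it comes from a separate patch set). The generic way to flush the Linux page cache by hand seems to be the following (run as root; sync first, since drop_caches only discards clean pages):

    # write out dirty data, then drop pagecache, dentries and inodes
    sync
    echo 3 > /proc/sys/vm/drop_caches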

 

So, do you have an idea what's going on and how to solve or work around it? I wanted to copy ~10 TB, and after leaving it running for about 8 h, MC tells me it'll take another 31 h to finish...


Thank you for the fast reply.

 

I'm running all shares in High-Water mode. But investigating in this direction, maybe this information is relevant: 90% of the copy process is replacing existing files (mostly because of changed dates); only 10% is creating new ones.


Ok, I got that, thank you for explaining. 🙂

 

The point I don't really get is why it's writing to multiple disks at once (and whether it's possible to work around that). I'm not starting multiple copy processes; I'm just marking all files and hitting "Copy" (with one share as the target), so something must "make it" multiple. I could guess, but with my 1 h of Google knowledge about Linux copy processes that probably wouldn't be helpful. 🙂

1 hour ago, Torben said:

and if it's possible to work around that

If the source is also multiple disks, you can do what I do when I need to transfer a lot of data: disable parity, then transfer data to 3 or 4 disks at the same time. I limit it to 4, since I usually do this over SSH on 10GbE and more than 4 usually doesn't go any faster. Then I sync parity at the end.
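A minimal sketch of that approach (the paths are made up and parity is assumed to be disabled already), with one rsync per destination disk running in parallel:

    # each rsync targets a different array disk, all running at once
    rsync -a /mnt/usb/part1/ /mnt/disk1/share/ &
    rsync -a /mnt/usb/part2/ /mnt/disk2/share/ &
    rsync -a /mnt/usb/part3/ /mnt/disk3/share/ &
    wait    # block until all transfers finish, then rebuild parity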


Nope, it's just one source and one target share, copying files a, b, c, d, e in a row, not at the same time, and that's why I don't get how there can be multiple copy processes at once causing the issue. 😞

 

I just thought about it, and if there absolutely is no other way, I'll delete the files before copying them over again (and hopefully won't mess up the script). It's kinda stupid, but that's computers. 🙂
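Roughly something like this is what I have in mind (the paths are made up and it's untested, so take it as a sketch):

    # delete the destination copies first, so the re-copied files count as new files
    cd /mnt/usb/source
    find . -type f -exec rm -f /mnt/user/share/{} \;    # GNU find expands {} inside the path
    # then copy everything over fresh
    rsync -a /mnt/usb/source/ /mnt/user/share/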

14 minutes ago, Torben said:

in a row, not at the same time, and that's why I don't get how there can be multiple copy processes at once causing the issue.

Because when writing to multiple disks, it will start writing to another disk before the data has been flushed from RAM to the previous one. If there's no other way for your use case, setting RAM-cached writes to the minimum should alleviate the issue.
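On a stock Linux shell that looks something like the lines below (the values are illustrative; they take effect immediately and revert on reboot):

    # start background writeback once dirty pages reach ~1% of RAM
    sysctl -w vm.dirty_background_ratio=1
    # block writers once dirty pages reach ~2% of RAM
    sysctl -w vm.dirty_ratio=2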


Ok, so it's the RAM cache feature, which normally doesn't bother me (it's probably even useful in certain use cases 🙂 ).

 

I found your post about the opposite problem (too little RAM being used):

So I set vm.dirty_ratio to 2 and vm.dirty_background_ratio to 1, and the copy job runs way better. Most of the time it's at 120 MB/s instead of 250, but that's a third of the time it would have taken otherwise.
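For anyone finding this later: since unRAID runs from RAM, these settings don't survive a reboot. Appending the same commands to the go file on the flash drive should make them stick (assuming the standard /boot/config layout):

    # added to /boot/config/go, which runs at every boot
    sysctl -w vm.dirty_background_ratio=1
    sysctl -w vm.dirty_ratio=2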

 

If I should change something else or adjust these two ratios to other values, please let me know.

 

Anyways, thank you very, very much for your help!

 

6 hours ago, Torben said:

90% of the copy process is replacing existing files (mostly because of changed dates), only 10% is creating new ones.

Normally, if you are overwriting an existing file, it is updated 'in place', which can mean that if the files are spread across multiple drives, you end up with multiple drives receiving writes.
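If you want to check which array disk a given user-share file actually lives on before overwriting it, I believe unRAID's user share filesystem exposes that as an extended attribute (the path here is made up):

    # prints e.g. /mnt/disk3 for the disk that holds the file
    getfattr --absolute-names --only-values -n user.LOCATION /mnt/user/Movies/somefile.mkv; echo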


Sure, I just didn't expect files to be copied simultaneously, since I didn't know about the Linux RAM write cache until now. And when I see that it disables turbo write and almost halves the write speed in normal mode, I ask myself whether it really makes sense. There surely are use cases where you e.g. have no cache SSDs, or work with a source faster than the target and don't want to wait for the copy process to finish, but otherwise I only see cons.

 

Lowering both "dirty ratios" helps big time, although it doesn't completely mitigate the behavior. Further searching brought up that mounting a drive with "-o sync" should disable RAM write caching. Since I don't want to tinker with something that important in the unRAID config without knowing exactly what I'm doing: what do you guys think? Would this change bring about the expected behavior of fully transferring the first file to the array before starting the next? That would speed up copy processes overall for me.
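On a generic Linux box that would look like the line below (illustrative only; I have no idea where unRAID keeps its mount options, and "sync" forces every single write straight to disk, which can cost a lot of throughput):

    # remount an already-mounted filesystem with synchronous writes
    mount -o remount,sync /mnt/disk1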

 

If so, could you advise where to do the changes to give it a try?

On 7/16/2021 at 3:06 PM, JorgeB said:

This happens if the allocation mode is set to most-free: writes will overlap onto multiple disks, and when that happens Unraid disables turbo write. Most-free should never be used if you want the best performance; use fill-up or high-water.

So we get to choose between a parity bottleneck (most-free) and a single-array-disk bottleneck (fill-up / high-water) :(

