[6.8.3] docker image huge amount of unnecessary writes on cache

limetech · July 14, 2020

15 minutes ago, Dephcon said:

blkdiscard /dev/sdX or blkdiscard /mnt/cache?

blkdiscard /dev/sdX # that is, on the raw device

16 minutes ago, Dephcon said:

can i assume space_cache=v2 is being used for the testing/default in an upcoming release?

correct. That is default now.

Dephcon · July 14, 2020

1 hour ago, limetech said:

blkdiscard /dev/sdX # that is, on the raw device

correct. That is default now.

i might have to install beta25 sometime this week as I'm very curious now lol

Edited July 14, 2020 by Dephcon

CybranNakh · July 14, 2020

5 hours ago, limetech said:

The loopback approach is much better from the standpoint of data management. Once you have a directory dedicated to Docker engine, it's almost impossible to move it to a different volume, especially for the casual user.

Is upgrading to 6.9.0-beta25 all that is needed in order to fix this bug? I see from the change log that this issue has been fixed. I currently have encrypted xfs on my array and my cache drive, but iotop -oa still shows loop2 writing excessively. I'm assuming I am just missing something obvious here....

I see one of the recommendations by @testdasi was to recreate the img as docker-xfs.img as a workaround.
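
For anyone who wants a number to compare before and after a change, here is a minimal sketch of an alternative to watching iotop: reading the per-device write counter from /proc/diskstats. The sample line below is made up for illustration (not a real measurement), and the helper name `bytes_written` is hypothetical.

```python
SECTOR_BYTES = 512  # /proc/diskstats counts in 512-byte sectors


def bytes_written(diskstats_text, device):
    """Return total bytes written to `device` since boot, or None if not listed."""
    for line in diskstats_text.splitlines():
        fields = line.split()
        # Layout: major, minor, device name, then the I/O counters;
        # index 9 is the "sectors written" counter.
        if len(fields) > 9 and fields[2] == device:
            return int(fields[9]) * SECTOR_BYTES
    return None


# Made-up sample line in /proc/diskstats layout (illustrative values only)
sample = "   7       2 loop2 1200 0 9600 50 34000 0 2048000 900 0 800 950"
print(bytes_written(sample, "loop2"))
```

On a live system the text would come from reading /proc/diskstats itself. Taking two readings some minutes apart and subtracting gives the bytes written in that interval.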

bobo89 · July 15, 2020

9-month-old 2x Silicon Power 1TB NVMe in a BTRFS RAID1.

Data units written 395,293,581 [202 TB]

Data units read 107,756,069 [55.1 TB]

2-3 VMs

15+ dockers

lots of unpacking on the drives, however 200 TB does seem excessive...
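
As a sanity check on those SMART figures, here is a small sketch of the conversion. NVMe reports "data units" in thousands of 512-byte blocks, so one unit is 512,000 bytes. The 274-day figure is an assumption standing in for "9 months".

```python
NVME_DATA_UNIT_BYTES = 512 * 1000  # one data unit = 1000 blocks of 512 bytes


def data_units_to_tb(units):
    """Convert an NVMe SMART data-unit count to decimal terabytes."""
    return units * NVME_DATA_UNIT_BYTES / 1e12


written_tb = data_units_to_tb(395_293_581)
read_tb = data_units_to_tb(107_756_069)
print(f"written: {written_tb:.1f} TB, read: {read_tb:.1f} TB")

# Assumption: "9 months" is taken as roughly 274 days of service
days = 274
print(f"average written per day: {written_tb / days * 1000:.0f} GB")
```

The result lands on the bracketed 202 TB that the SMART output already shows, and works out to several hundred GB written per day on average.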

JorgeB · July 15, 2020

Did a test with a Windows VM to see if there was a difference with the new partition alignment, total bytes written after 16 minutes (VM is idling doing nothing, not even internet connected):

space_cache=v1, old alignment - 7.39GB

space_cache=v2, old alignment - 1.72GB

space_cache=v2, new alignment - 0.65GB

So that's encouraging, though I guess that, unlike the v2 space cache, the new alignment might work better for some NVMe devices and not make much difference for others. Still worth testing IMHO, since for some it should also give better performance. For this test I used an Intel 600p.
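
To put the three results side by side, a short sketch that turns them into reduction factors. The figures are the ones from the test above; the helper name `reduction` is just for illustration.

```python
# Total written in 16 minutes, in GB, from the idle Windows VM test
results_gb = {
    "space_cache=v1, old alignment": 7.39,
    "space_cache=v2, old alignment": 1.72,
    "space_cache=v2, new alignment": 0.65,
}


def reduction(before_gb, after_gb):
    """How many times fewer bytes were written after the change."""
    return before_gb / after_gb


v1_old = results_gb["space_cache=v1, old alignment"]
v2_old = results_gb["space_cache=v2, old alignment"]
v2_new = results_gb["space_cache=v2, new alignment"]

print(f"v2 space cache alone: {reduction(v1_old, v2_old):.1f}x fewer writes")
print(f"new alignment on top: {reduction(v2_old, v2_new):.1f}x fewer writes")
print(f"both combined:        {reduction(v1_old, v2_new):.1f}x fewer writes")
```

This is a single run on one device (the Intel 600p), so the factors are indicative rather than something to expect on every SSD.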

TexasUnraid · July 15, 2020

interesting that the alignment would make that much of a difference, anyone have a technical explanation on why this is?

thecode · July 15, 2020

1 hour ago, TexasUnraid said:

interesting that the alignment would make that much of a difference, anyone have a technical explanation on why this is?

Here is a nice explanation why alignment is important:
https://www.minitool.com/lib/4k-alignment.html
https://www.thomas-krenn.com/en/wiki/Partition_Alignment_detailed_explanation

However, I did not find a written reference explaining why 1MiB. I did, however, work on the development of a Linux box with internal eMMC a few years ago, and I remember that the internal controller had a very large erase block size, somewhere between 1 and 3 MiB.

This may explain why, if the FS is not aligned with the erase block size, the amount of data written to the flash increases. There is a brief mention of it here:
https://www.anandtech.com/show/14543/nvme-14-specification-published#:~:text=More Block Size and Alignment,block sizes measured in megabytes.

limetech · July 15, 2020

53 minutes ago, thecode said:

why 1MiB

Because that's what Microsoft chose.

Good reference to other references:

https://superuser.com/questions/1483928/why-do-windows-and-linux-leave-1mib-unused-before-first-partition

Theoretically partitions should be aligned on SSD "erase block size", eg:

https://superuser.com/questions/1243559/is-partition-alignment-to-ssd-erase-block-size-pointless

However "erase block size" is an internal implementation detail of an SSD device and the value is not commonly exported by any transfer protocol. You can write a program to maybe figure it out:

https://superuser.com/questions/728858/how-to-determine-ssds-nand-erase-block-size

But, referring back to "that's what Microsoft chose" - SSD designers are going to make sure their products work well with Windows, and they know how Microsoft aligns partitions. Hence, pretty sure trying to figure out exact alignment is pointless IMHO.
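
A minimal sketch of what "aligned to 1MiB" means in practice: the partition's starting byte offset must be an exact multiple of 1MiB. The start sectors 64 and 2048 used below are assumptions for illustration (they are not stated in this thread), as is the 512-byte logical sector size.

```python
MIB = 1024 * 1024


def is_aligned(start_sector, sector_bytes=512, boundary_bytes=MIB):
    """True if the partition's first byte falls exactly on the boundary."""
    return (start_sector * sector_bytes) % boundary_bytes == 0


# Assumed example start sectors, not taken from this thread:
print(is_aligned(64))    # starts 32 KiB into the device
print(is_aligned(2048))  # starts exactly 1 MiB into the device
```

On a Linux system the start sector of a partition can be read from /sys/block/<device>/<partition>/start, which is expressed in 512-byte units.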

TexasUnraid · July 15, 2020

Thanks for the explanation, makes total sense, just never really chased that tail to the conclusion before.

Just ordered a drive to use for parity so looking forward to 6.9 and being able to move dockers back onto the cache.

TexasUnraid · July 19, 2020

Ok, so as I understand it the fix for the excessive writes is the combo of space_cache=v2 and the new alignment.

The space_cache=v2 I can do now but it will still be roughly 2x the writes without the alignment fix.

The alignment fix can not be done until 6.9 as unraid will not recognize the partition.

Am I on the right track there?

Is there any way to use the new alignment with unraid 6.8?

I just got a drive to use for parity but that means I need to remove the SSD from the array that is currently formatted XFS and has docker/appdata.

Debating options now.

John_M · July 19, 2020

18 minutes ago, TexasUnraid said:

I need to remove the SSD from the array

Forgive me if you mentioned it earlier in this 17-page thread, but why did you assign your SSD to the array? Why not simply assign it as the (single) cache drive and format it XFS? I can't think of any advantage in putting it in the array.

Edited July 19, 2020 by John_M
typo

TexasUnraid · July 19, 2020

6 minutes ago, John_M said:

Forgive me if you mentioned it earlier in this 17-page thread, but why did you assign your SSD to the array? Why not simply assign it as the (single) cache drive and format it XFS? I can't think of any advantage in putting it in the array.

Because I already have a cache pool setup and 6.8 does not support multiple cache pools.

This drive is not even supposed to be in the server, I stole it out of a laptop since using this drive formatted as xfs resulted in 50-100x fewer writes than using the cache pool.

John_M · July 19, 2020

2 minutes ago, TexasUnraid said:

Because I already have a cache pool setup and 6.8 does not support multiple cache pools.

In that case, have you investigated whether the Unassigned Devices plugin would help you in the meantime? It allows devices that are not assigned to the array or the cache to be mounted when the array starts. It's quite possible that that isn't quite early enough to support putting the docker.img there and I haven't tried it, but it might be worth checking the support thread for that plugin.

TexasUnraid · July 19, 2020

I didn't think putting the docker on a UD device would be a good long term option. I always saw UD as a temporary use feature.

That said I could be wrong and it would work perfectly fine for docker and appdata. Anyone have any info on this?

John_M · July 19, 2020

7 minutes ago, TexasUnraid said:

Anyone have any info on this?

It seems like it has worked in the past, then something broke it and it may be fixed again, or not. See this old thread:

TexasUnraid · July 19, 2020

I realized why it won't work, you would have to edit every docker to point to the appdata on UD which would be a real pain and easy to mess up when having to do it x20+.

Although, can symlinks be used on unraid? So could I use a symlink between the cache and a UD device so that I could keep the same paths I have now?

John_M · July 20, 2020

3 hours ago, TexasUnraid said:

can symlinks be used on unraid?

Yes, certainly they can. Why don't you keep your appdata on the cache pool? That's where most people keep it.

TexasUnraid · July 20, 2020

19 minutes ago, John_M said:

Yes, certainly they can. Why don't you keep your appdata on the cache pool? That's where most people keep it.

The writes are massively inflated with appdata as well as docker. Now that we understand why, it makes sense, the tiny writes that both make will cause the writing of at least 2 full blocks on the drive + the filesystem overhead with the free space caching. Even if it just wanted to write 1 byte.

Great, so the symlinks won't cause any issues with the fuse file system?

I simply put a symlink in cache pointing towards the UD drive and everything works as expected, the files will be accessible from the /user file system?

That could work, have not actually used symlinks in linux yet but no time like the present to learn lol. Used them a lot in windows.

Edited July 20, 2020 by TexasUnraid

limetech · July 20, 2020

1 hour ago, TexasUnraid said:

The writes are massively inflated with appdata as well as docker

I don't think this is true anymore if you repartition the SSD device(s).

testdasi · July 20, 2020

11 hours ago, TexasUnraid said:

The writes are massively inflated with appdata as well as docker. Now that we understand why, it makes sense, the tiny writes that both make will cause the writing of at least 2 full blocks on the drive + the filesystem overhead with the free space caching. Even if it just wanted to write 1 byte.

Great, so the symlinks won't cause any issues with the fuse file system?

I simply put a symlink in cache pointing towards the UD drive and everything works as expected, the files will be accessible from the /user file system?

That could work, have not actually used symlinks in linux yet but no time like the present to learn lol. Used them a lot in windows.

No prob with symlinks. I use that to point things everywhere.

I even make a kill switch for my most important data (bash script to remove the symlink takes millisecond to complete and would completely cut off my data from e.g. any cryptovirus doing sinister stuff on the network).
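
A rough sketch of both ideas (pointing one location at another with a symlink, then cutting access by removing the link), using throwaway temporary directories. The directory names are hypothetical stand-ins for a UD device and a cache pool, and this only shows generic symlink behaviour, not anything specific to the Unraid user share file system.

```python
import os
import tempfile


def demo():
    """Create a symlink, read through it, remove it, and report what is reachable."""
    with tempfile.TemporaryDirectory() as root:
        ud_appdata = os.path.join(root, "ud_disk", "appdata")  # stand-in for the UD device
        cache_dir = os.path.join(root, "cache")                # stand-in for the cache pool
        os.makedirs(ud_appdata)
        os.makedirs(cache_dir)
        with open(os.path.join(ud_appdata, "config.txt"), "w") as f:
            f.write("hello")

        # The old path on the "cache" now points at the data on the "UD device"
        link = os.path.join(cache_dir, "appdata")
        os.symlink(ud_appdata, link)
        with open(os.path.join(link, "config.txt")) as f:
            content = f.read()

        # "Kill switch": removing the link deletes only the link, not the data
        os.unlink(link)
        reachable_via_link = os.path.exists(os.path.join(link, "config.txt"))
        data_still_there = os.path.exists(os.path.join(ud_appdata, "config.txt"))
        return content, reachable_via_link, data_still_there


print(demo())
```

After the link is removed the data is untouched on the target, but nothing can reach it through the old path any more.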

Edited July 20, 2020 by testdasi

TexasUnraid · July 20, 2020

10 hours ago, limetech said:

I don't think this is true anymore if you repartition the SSD device(s).

Yeah, it was the same root cause as the docker writes. So fix one and you fix them both.

So are you saying that I can repartition the SSDs on 6.8 and they will work? Basically make my cache like 6.9 will be (and hopefully compatible as well so I don't need to convert again later)?

How would I go about doing this?

In the UD thread something was said about the new partition not being backward compatible.

Edited July 20, 2020 by TexasUnraid

TexasUnraid · July 20, 2020

8 minutes ago, testdasi said:

No prob with symlinks. I use that to point things everywhere.

I even make a kill switch for my most important data (bash script to remove the symlink takes millisecond to complete and would completely cut off my data from e.g. any cryptovirus doing sinister stuff on the network).

Good to know, interesting use case as well. How would the script know that an attack is taking place?

So no gotchas with symlinks on unraid? works just like any other linux system (aka, I can look up generic symlink tutorials online)?

Edited July 20, 2020 by TexasUnraid

JorgeB · July 20, 2020

2 minutes ago, TexasUnraid said:

So are you saying that I can repartition the SSDs on 6.8 and they will work?

No, new alignment only works on v6.9.

TexasUnraid · July 20, 2020

2 minutes ago, johnnie.black said:

No, new alignment only works on v6.9.

That's what I thought, I am waiting for at least the RC of 6.9 to consider upgrading now that the server is in service.

Symlinks / UD sound like a good stopgap, makes it simple to swap over to 6.9 as well since the paths will remain the same.

Edited July 20, 2020 by TexasUnraid

boomam · July 20, 2020

Do we know if there will be a definitive guide created for this issue once 6.9 drops, to help people convert over/transfer data/etc?
