ZFS plugin for unRAID


steini84

Recommended Posts

Yeah, I'm just using image files on unraid.  I found great returns on virtual machines, especially when based on the same install iso - but even on different ones.  I found reasonable returns on isos and documents.  I probably have a few duplicated isos with not so obvious names so it saves having to sort that out.  I don't think there's much benefit in dockers but could be wrong.  And that's correct, I don't use ZVOL's.  I've tried them and found them at best to be non-advantageous and a lot less flexible.  I don't yet understand why anyone would use them really, except maybe for iscsi targets.

Link to comment
2 hours ago, Marshalleq said:

I'm using dedup quite successfully.  What I've learnt is that most people who say it isn't worth it either haven't looked at it for a while (so are just continuing on old stories without checking) or are not applying it to the right type of data.

 

Nobody here said dedup wasn't worth using outright, I said that in my experience running steam folders over network isn't worth it.  If the desired outcome is to tinker and experiment and learn, then I'm all for that.  If the desired outcome is the right tool for the right job, then for the benefit of other people who might be reading this and think this is a good idea for performance reasons - it probably isn't (in my opinion).  A cheap SSD (on the gaming machine itself) will almost certainly perform better.

 

Network speed / ZFS performance are factors but not the only ones - consider all the context switching between drivers and network stacks for every single read and write to storage.

 

A normal gaming computer

application requests read/write data
-> storage driver
-> disk performs read or write
-> storage driver
-> application

 

A gaming computer with iscsi storage

application requests read/write data
-> storage driver (iscsi initiator)
-> network driver (client)
-> transmit request over network
-> network driver (server)
-> storage driver (iscsi target)
-> disk performs read or write
-> storage driver (iscsi target)
-> network driver (server)
-> transmit request over network
-> network driver (client)
-> storage driver (iscsi initiator)

 

This context switching is not without cost.  It's the reason SATA is going away and SSD storage is being placed directly on the PCIE bus now - the fewer protocols/driver stacks that storage needs to go through, the better.

 

2 hours ago, Marshalleq said:

In my case I'm running a special vdev.  It works extremely well for the content that can be deduped (such as VM's).  I've never noticed any extra memory being used either as I do believe this is handled by the special vdev.

 

If you think a bit more about how dedupe works, it absolutely must use more memory.  Every time you issue a write to a dedupe pool, ZFS needs to hash the data you're asking to write and compare it against hashes of data already stored (the DDT table).  If those hashes don't fit in ARC memory, ZFS would need to read the missing hashes from the vdev or the pool every time you issue a write command.  Yikes.  If you throw a bunch of data at ZFS dedupe and haven't planned your memory/metadata requirements, you could find your dedupe storage grinding to a halt.

 

The special vdev just provides a low-latency place for reading/writing metadata/DDT table because they will be constantly thrashed by reads/writes:

https://www.truenas.com/docs/references/zfsdeduplication/

 

So as I stated in previous post, this will add even more context switching/latency than running iscsi alone.

Edited by jortan
Link to comment

Some stats on my setup to give you an indication:

 

I run two configs

 

1 - 4x480G SSD's in RaidZ1 - this hosts docker and virtual machines.  I have only 3 VM's at present totaling about 50G.  I have a bunch of dockers but only the VM's are deduped.  There is 653G free and my dedup ratio across the whole pool is 1.11 (i.e. 11%).

2 - 4x16TB HDD's in RaidZ1 with a 2x150G mirror for a special vdev with small blocks up to 32k enabled.  Most of the pool is unique data that cannot be deduped.  There is 598G free on the array and 70G free on the special vdev.  I dedup my backups folder, documents folders, isos and temp folder, which totals about 530G; I am getting a 1.14 dedup ratio across the whole pool, which is about 14%.
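For anyone curious what a layout like config 2 looks like in ZFS terms, here is a minimal sketch of the commands involved (pool name, dataset name and device names are placeholders, and the exact commands used for this setup may have differed):

zpool create tank raidz1 sda sdb sdc sdd        # 4x 16TB HDDs in RaidZ1
zpool add tank special mirror sde sdf           # 2x 150G SSD mirror as a special vdev
zfs set special_small_blocks=32K tank           # store small blocks (up to 32k) on the special vdev
zfs create -o dedup=on tank/backups             # enable dedup only on selected datasets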

 

I think these numbers are pretty good.  I thoroughly tested the memory usage before and after for the Raidz1 array as I was unsure if all or some of it would go to the special vdev.  I noticed no difference at all.  I did the same for the virtual machines on the array without the special vdev and while this was less scientific, also noticed no perceptible difference (I mention because so many people cry out that dedup uses too much RAM).  Now, I do have 96G of RAM in this system, however before enabling dedup on anything the RAM usage was sitting around 93-96% full.  It didn't change.  I think this speaks well to the issue as I would have had big failures if it did use a lot of RAM.  I've been running it like this for a long time now and no issues yet.

 

I hope that helps!

Link to comment

Just saw reply from @jortan (previous reply was just foreseeing some of the questions and trying to be helpful).

34 minutes ago, jortan said:

Nobody here said dedup wasn't worth using outright, I said that in my experience running steam folders over network isn't worth it.  If the desired outcome is to tinker and experiment and learn, then I'm all for that.  If the desired outcome is the right tool for the right job, then for the benefit of other people who might be reading this and think this is a good idea for performance reasons - it probably isn't (in my opinion).  A cheap SSD (on the gaming machine itself) will almost certainly perform better.

None of my comments were directed at you, just directed at the misinformation lying around the web - which is what you find when you google and get old documents.  Some of the newer stuff now is reflecting the newer state, but unfortunately also some of the newer stuff is still getting written by people who haven't tried it for quite a while and are repeating out-of-date experiences - special vdevs in particular being the main case of change here.

 

34 minutes ago, jortan said:

If you think a bit more about how dedupe works, it absolutely must use more memory.  Every time you issue a write to a dedupe pool, ZFS needs to hash the data you're asking to write and compare it against hashes of data already stored (the DDT table).  If those hashes don't fit in ARC memory, ZFS would need to read the missing hashes from the vdev or the pool every time you issue a write command.

I do believe special vdevs hold all the DDT for the pool, it even says that on the page you linked.  Except for when it's full of course.

 

34 minutes ago, jortan said:

The special vdev just provides a low-latency place for reading/writing metadata/DDT table because they will be constantly thrashed by reads/writes:

https://www.truenas.com/docs/references/zfsdeduplication/

 

So as I stated in previous post, this will add even more context switching/latency than running iscsi alone.

If you read that page a little deeper, it says this thrashing happens when the special vdev gets full, not 'constantly' as you say above - this is because it will start putting the DDT in the main pool instead of the special vdev once it starts getting full.  Of course, that is talking about a busy corporate environment that is worrying about IOPS all the time; for the average person playing around at home (something that @subivoodoo seems to indicate is their scope, i.e. "My aproach is safe some space on clients... and play with IT stuff 😁 ") this would not be an issue.

 

In any case, I have very high IOPS requirements and I am constantly marvelling at how well it does considering I'm just running RaidZ1 on everything, have dedup on, my special vdev is running on good but older Intel SSD's that are actually quite slow, and the box is also handling the mail server, the various web services, automation and undoubtedly a ton of misaligned cluster sizes which are killing it, etc.  It's under constant use and really it's incredible.

 

Can we argue that in a corporate environment we could get more performance?  Absolutely, but if it were one we wouldn't be running unraid, it wouldn't all be on one box, and a whole bunch of other things would be different.

 

Sorry for the laborious post, but I think it's fair to say that the dedup scaremongering that's out there needs some balance - again, not directed at you.

 

Marshalleq.

 

PS, I've tried that lancache, ran it for a few years, it works well sometimes, others not so much.  Definitely worth a try though.

 

Edited by Marshalleq
Link to comment
1 hour ago, Marshalleq said:

1 - 4x480G SSD's in RaidZ1 - this hosts docker and virtual machines.  I have only 3 VM's at present totaling about 50G.  I have a bunch of dockers but only the VM's are deduped.

 

Operating system volumes for virtual machines are a use-case where dedupe is going to be very useful.  Glad it's working for you.

 

1 hour ago, Marshalleq said:

just directed at the misinformation lying around the web - which is what you find when you google and get old documents.  Some of the newer stuff now is reflecting the newer state, but unfortunately also some of the newer stuff is still getting written by people who haven't tried it for quite a while and are repeating out-of-date experiences - special vdevs in particular being the main case of change here.

 

I don't mean to be rude, but the assertions in your post are bordering on misinformation and could lead people to get themselves into very difficult situations:

 

1 hour ago, Marshalleq said:

I thoroughly tested the memory usage before and after for the Raidz1 array as I was unsure if all or some of it would go to the special vdev.  I noticed no difference at all.  I did the same for the virtual machines on the array without the special vdev and while this was less scientific, also noticed no perceptible difference (I mention because so many people cry out that dedup uses too much RAM).  Now, I do have 96G of RAM in this system, however before enabling dedup on anything the RAM usage was sitting around 93-96% full.  It didn't change.  I think this speaks well to the issue as I would have had big failures if it did use a lot of RAM.

 

I strongly recommend you review how ZFS/ARC allocates memory dynamically as you have reached the wrong conclusion from your observations.

 

ZFS ARC is utilising available unused memory in both situations - this is why you aren't seeing any noticeable change.  When other applications or the operating system demand more memory, ZFS will dynamically reduce the size of the ARC.

 

Without dedupe, the ARC is caching ZFS metadata and data inside your pools.  When you have dedupe enabled it additionally needs to cache the DDT.  The DDT is going to occupy some of the space in the ARC that would otherwise be used to cache data/metadata.  If the size of the DDT exceeds the memory that the ARC is able to allocate, you will suddenly run into serious performance issues.

 

1 hour ago, Marshalleq said:

I do believe special vdevs hold all the DDT for the pool, it even says that on the page you linked.  Except for when it's full of course

 

Yes, the special VDEV absolutely holds the DDT and helps with performance for dedupe pools.  However, this is no substitute for carefully considering the memory requirements for the DDT when enabling dedupe with ZFS.

 

It's critically important that the DDT also fits in memory.  Think for a moment about how ZFS dedupe works.  Every block written is hashed and that hash placed in the DDT - or a reference to an existing hash.  The DDT is written to the special vdev, but also dynamically cached in memory by the ZFS ARC.  Every subsequent write to that dataset needs to be hashed and compared against all the hashes in the DDT.  All of them.  For every block you write.  Each write can't occur until it's been compared to all the hashes in the DDT.

 

If those hashes aren't already cached in the ARC, it needs to read them from the special vdev.  That takes far, far more time than if the hash was already in memory.  If reading parts of the DDT from the special vdev pushes another part of the DDT out of the ARC (because you don't have enough memory), then very suddenly your performance is going to tank abysmally.  Every single block written is going to need hundreds or thousands of reads to your special vdev to complete.  Every.  Single.  Write.
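As a rough back-of-envelope for how large the DDT can get (assuming a 128K average block size and roughly 300 bytes per DDT entry, the figure quoted later in this thread - actual numbers depend on recordsize/volblocksize and pool layout):

# 1 TiB of unique data at 128K recordsize:
blocks=$(( 2**40 / 2**17 ))             # 8,388,608 blocks
echo $(( blocks * 300 / 2**30 )) GiB    # ~2.3 GiB of DDT (integer division rounds down); smaller zvol blocks scale this up fast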

 

The reason your ZFS dedupe is working great, is not (only) because of the special vdev, it's because you (currently) have sufficient memory for the DDT to be permanently cached in the ARC.  That could change and very suddenly if you write more data to your dedupe datasets and/or other applications demand memory and ZFS reduces the ARC size.

 

1 hour ago, Marshalleq said:

Can we argue that in a corporate environment we could get more performance?  Absolutely, but if it were one we wouldn't be running unraid, it wouldn't all be on one box, and a whole bunch of other things would be different.

 

This is not about corporate vs. homelab performance, this is about ZFS dedupe working brilliantly, until it very suddenly works terribly (or stops working at all) because the DDT no longer fits in memory.  To say that ZFS dedupe doesn't have additional memory requirements anymore because of special vdevs is simply not true.

 

1 hour ago, Marshalleq said:

If you read that page a little deeper, it says this thrashing happens when the special vdev gets full, not 'constantly' as you say above

 

No, they will become slower when they fill up, but they will always get thrashed, relatively speaking - because writes to a dedupe dataset in the pool require a multiple of writes to the DDT:

 

Quote

Reduced I/O

Deduplication requires almost immediate access to the DDT. In a deduplicated pool, every block potentially needs DDT access. The number of small I/Os can be colossal; copying a 300 GB file could require tens, perhaps hundreds of millions of 4K I/O to the DDT. This is extremely punishing and slow. RAM must be large enough to store the entire DDT and any other metadata and the pool will almost always be configured using fast, high quality SSDs allocated as “special vdevs” for metadata. Data rates of 50,000-300,000 4K I/O per second (IOPS) have been reported by the TrueNAS community for SSDs handling DDT. When the available RAM is insufficient, the pool runs extremely slowly. When the SSDs are unreliable or slow under mixed sustained loads, the pool can also slow down or even lose data if enough SSDs fail.

https://www.truenas.com/docs/references/zfsdeduplication/

Edited by jortan
Link to comment
10 minutes ago, jortan said:

 

This is not about corporate vs. homelab performance, this is about ZFS dedupe working brilliantly, until it very suddenly works terribly (or stops working at all) because the DDT no longer fits in memory.  To say that ZFS dedupe doesn't have additional memory requirements anymore because of special vdevs is simply not true.

Sigh, yes, it absolutely is - the original poster declared that a home scenario was what they were working on and you seem to keep comparing it to disaster scenarios.  No-one here is saying don't be careful, don't plan, or don't back up your data; people need to be given some credit, they're not all morons.

 

10 minutes ago, jortan said:

I don't mean to be rude, but the assertions in your post are bordering on misinformation and could lead people to get themselves into very difficult situations:

 

LOL, if I answer this it's going to get into a flame war, so I'm just going to leave it (and the remainder of the points).  The poster has the information and two opinions on it.  I have given actual evidence, you have given your experience, which I'm sure is also extensive.  They can make their own decision as to whether this works for their lab, or whatever they end up doing.  Thanks for the info, have a great day.

 

Marshalleq.

Edited by Marshalleq
Link to comment
6 minutes ago, Marshalleq said:

I have given actual evidence, you have given your experience

 

5 minutes ago, Marshalleq said:

I thoroughly tested the memory usage before and after for the Raidz1 array as I was unsure if all or some of it would go to the special vdev.  I noticed no difference at all.

 

Your evidence is based on not really understanding how ZFS allocates memory.

 

If you keep adding deduped data or applications with memory requirements, at some point your system will suffer significant performance issues or might stop working completely.  It might be next week or it might be never - you don't know when because you haven't considered/measured it.

 

Quote

 I've never noticed any extra memory being used either as I do believe this is handled by the special vdev. 

 

This isn't evidence or a difference of opinion, it's just factually incorrect.  If you don't want to learn, that's fine, I'm trying to correct misinformation for the benefit of others who might try to learn about ZFS here.  ZFS Dedupe does have additional memory requirements, regardless of the addition of a special vdev.  It's crucial that you have enough memory for the DDT.

 

22 minutes ago, Marshalleq said:

declared that a home scenario was what they were working on and you seem to keep comparing it to disaster scenarios.

 

Implementing ZFS dedupe without measuring/considering memory requirements is inviting a disaster scenario.  You may not care if your ZFS pool/Unraid server could potentially fall off a performance cliff or stop working at all.  For people who would consider that a minor disaster:

 

Quote

ZFS supports deduplication ... It must be considered exceedingly carefully and the implications understood, before being used in a pool.

https://www.truenas.com/docs/references/zfsdeduplication/

Link to comment

My intention was not to start the "to dedup or not" discussion ☺️ But your discussion is interesting...

 

I have a homelab and my intention is just to figure out how iSCSI and ZFS work.  So I started testing with a TrueNAS VM on my existing unraid server... where I get great results (almost max 10G line speed from cache, dedup ratio up to 3 for 4 more or less identical steam libs).

 

So, knowing the drawbacks of dedup, I will decide... bigger cheap SSD's in my kids' PCs, or figure out whether my current 80G of RAM in unraid is enough.  But other than that... the dedup ratio is anyway 1.00 if I run this setup directly on unraid (new sparse zvols).  So it does not work anyway... but I need to understand why, just for myself 😁

 

By the way, it's cool that iSCSI + ZFS is possible on unraid with 2 great plugins from the community, plus one manual command line to create an iSCSI block backstore based on a zvol...
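For reference, a minimal sketch of what that manual step can look like, assuming the iSCSI plugin is built on targetcli/LIO (the pool, zvol and IQN names below are made up, and the portal/ACL setup would still be done in the plugin GUI or targetcli):

zfs create -s -V 500G -o volblocksize=64K testpool/steamlib
targetcli /backstores/block create name=steamlib dev=/dev/zvol/testpool/steamlib
targetcli /iscsi create iqn.2022-01.local.unraid:steamlib
targetcli /iscsi/iqn.2022-01.local.unraid:steamlib/tpg1/luns create /backstores/block/steamlib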

 

 

Link to comment

Update on my testing... zfs with dedup/compression/zvol + iSCSI also works on unraid.  It was my fault, I forgot to disable bitlocker on the test laptop.  Now that I have disabled bitlocker, the dedup ratio begins to rise if I copy the same file multiple times onto the drive... see here (still the imported pool from TrueNAS):

 

# zpool list
NAME       SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
testpool   928G  11.6G   916G        -         -     0%     1%  1.08x    ONLINE  -

 

Edited by subivoodoo
Link to comment

Nice to hear @subivoodoo, yeah, encryption will do it!  There's a video online somewhere about a guy doing something similar for multiple machines in his house.  He configured steam data to be on a zfs pool for all computers and the dedup meant he only had to store one copy.  Cool idea - I would have thought the performance was bad, but apparently not.  How's your RAM usage?

 

@jortan I've seen you've replied, but I'm not going to read it sorry - I can see it's just more of the same and I don't see the value for everyone else of having a public argument.  I get that differences of opinion get annoying and it feels good to be right, so let's just say you're right.  Have a great day and don't stress about it. :)

Link to comment

I'll report when it's done... at the moment I don't see much RAM usage.  But what I can see right now is way more CPU usage for the dedup, sync and/or iSCSI running directly on unraid compared with the TrueNAS VM as the "same backend" (on the same machine).

 

Having just one copy of all the games in my house seems to be possible... probably GPU-P can also be a thing for me now, with only the OS stored locally... Linus has just done a video on that: https://youtu.be/Bc3LGVi5Sio

Just sharing my main gaming rig 3-4 times with VM's, with the full Steam library "loaded" in every VM, and not needing more storage for every new copy!  Not to mention not having to buy new GPU's for all of my kids at the moment... 😑

Link to comment
9 hours ago, Marshalleq said:

@jortan I've seen you've replied, but I'm not going to read it sorry - I can see it's just more of the same and I don't see the value for everyone else of having a public argument ... I get that differences of opinion get annoying

 

I'm not trying to win an argument.  If I post something that's wrong, I appreciate it when someone takes the time to correct me (and will edit the original post containing the incorrect information).  The information you have posted isn't a difference of opinion, it's just incorrect as per openzfs documentation.  If you don't want to learn and don't want to edit your posts then I guess we're done here.  For someone who complains about ZFS misinformation on the internet, I find this bizarre.  Have a great day?

 

Quote

The hash table (also known as the deduplication table, or DDT) must be accessed for every dedup-able block that is written or freed (regardless of whether it has multiple references). If there is insufficient memory for the DDT to be cached in memory, each cache miss will require reading a random block from disk, resulting in poor performance. For example, if operating on a single 7200RPM drive that can do 100 io/s, uncached DDT reads would limit overall write throughput to 100 blocks per second, or 400KB/s with 4KB blocks.

https://openzfs.readthedocs.io/en/latest/performance-tuning.html

Edited by jortan
Link to comment
25 minutes ago, ensnare said:

Is there a way to update yet to OpenZFS 2.1.2? There are some bug fixes that affect my set up. Thank you

What unRAID version are you on?

 

For 6.9.2 and all versions above, OpenZFS 2.1.2 is already compiled and ready to install.  I think you have to delete the old package on your USB boot device and reboot your server (make sure that you've got an active internet connection on boot, otherwise the download will fail).

 

To remove the old package run this command from a unRAID console:

rm /boot/config/plugins/unRAID6-ZFS/packages/*
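After the reboot you can sanity-check which ZFS version actually got installed (the zfs version command exists in OpenZFS 2.x; the exact output naming may differ slightly between builds):

zfs version    # prints something like zfs-2.1.2-1 / zfs-kmod-2.1.2-1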

 

Link to comment

Here are some numbers while I'm still testing my ZFS-dedup + iSCSI game library idea and the pool is not yet exported to unraid.

Specs:

- TrueNAS 12.U7 as a VM on unraid with 32G RAM, 2x2TB SATA SSD passthrough (I decided to use some old cheap consumer SSD's)

- these 2 disks striped together in a pool, sync off, dedup on (data loss doesn't matter, games can be downloaded again and again...)

- 4 game libraries in total (2x800G + 2x200G) fully loaded, each as individual zvol linked via iSCSI to the clients, tested and all is working

- 951G allocated disk space for this 2TB data in total (at the moment)

- Dedup ratio of 2.16, compression off (I don't get more than a 1.01 compression ratio on the game libs)

- 58 million DDT entries at ~300B each => around 16-17GB of dedup table in RAM needed

- 10G network

 

Subjective impressions:

Read speed is astonishing for disks over the network; game loading times don't differ a lot compared with local NVMe SSD's!

Write speeds are good as long as the data fits into the ARC cache... afterwards they drop to 50-100MB/s until the cache is flushed and ready for high speeds again 😁.

 

Some benchmark screenshots and the "10G file copy" test are attached (Windows really can't calculate a correct copy time...).

 

And here are some game loading time differences in seconds (local Gen3 NVMe vs. iSCSI zvol):

- MSFS2020 until menu: 185 vs. 195 (I personally don't understand why this game takes so long to load even on NVMe!!!)

- Battlefield V until menu: 57 vs 63

- Battlefield level loading: 26 vs 28

- Doom Eternal until menu: 46 vs 56 (I hate watching intros and warning text every time 😄)

- Doom Eternal level loading: 6 vs 8

- Cyberpunk 2077 level loading: 6 vs 14 (here I can see the biggest difference)

 

The next step is to export this game library ZFS pool to unraid, configure iSCSI on unraid, and test again.

20220113-ZFS-Dedup-iSCSI-Game-Ladezeiten-CrystalDisk.png

20220113-ZFS-Dedup-iSCSI-Game-Ladezeiten-ATTO-Transfer.png

20220113-ZFS-Dedup-iSCSI-Game-Ladezeiten-ATTO-IO.png

  • Like 1
Link to comment
34 minutes ago, Marshalleq said:

@subivoodoo Great feedback!  Did you by chance try the special vdev (or want to) to see what the difference is in terms of ram usage for dedup?  I figure for this test, any small ssd would do (though typically you'd want it to be mirrored).

 

By default openzfs on Linux will "consume" up to half of system memory for the ARC, subject to other memory requirements of the system.  This memory is allocated to the ARC regardless of whether dedupe is enabled or not and the amount allocated can't be used to measure how much memory is being used by dedupe.  The special vdev provides faster read/write access for permanent storage of the metadata and DDT, but the DDT still needs to fit in ARC memory to avoid significant performance issues.
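If you want to see what the ARC is actually doing on a given box, a quick sketch (these paths are the standard OpenZFS-on-Linux locations, including unRAID with the ZFS plugin):

cat /sys/module/zfs/parameters/zfs_arc_max                                        # ARC ceiling override; 0 = default (roughly half of RAM)
awk '$1 == "size" || $1 == "c_max" {print $1, $3}' /proc/spl/kstat/zfs/arcstats   # current ARC size and ceiling, in bytes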

 

Quote

If there is insufficient memory for the DDT to be cached in memory, each cache miss will require reading a random block from disk, resulting in poor performance.

https://www.truenas.com/docs/references/zfsdeduplication/

 

The amount of memory required to keep DDT cached in ARC ...

 

Quote

... depends on the size of the DDT and how much data will be stored in the pool. Also, the more duplicated the data, the fewer entries and smaller DDT. Pools suitable for deduplication, with deduplication ratios of 3x or more (data can be reduced to a third or less in size), might only need 1-3 GB of RAM per 1 TB of data. The actual DDT size can be estimated by deduplicating a limited amount of data in a temporary test pool, or by using zdb -S in a command line.

https://www.truenas.com/docs/references/zfsdeduplication/

 

zdb doesn't function by default in Unraid due to the lack of a persistent storage location for the zpool cache file.  You can create the cache file in memory using these commands:

 

Quote
mkdir /etc/zfs
zpool set cachefile=/etc/zfs/zpool.cache poolname
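Once the cache file exists, something like the following should work for estimating/inspecting the DDT (a sketch; both options are documented zdb behaviour, 'poolname' is a placeholder, and -S has to walk the whole pool so it can take a long time):

zdb -S poolname     # simulate dedup on existing data and print a DDT histogram with an estimated ratio
zdb -DD poolname    # show actual DDT statistics for a pool that already has dedup enabled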

 

Further recommendations for estimating DDT size in a ZFS pool (and subsequently, the memory required for performant dedupe in ZFS):

https://serverfault.com/questions/533877/how-large-is-my-zfs-dedupe-table-at-the-moment

 

Hope this helps.

Edited by jortan
Link to comment

I changed from 3 HDDs in RAIDZ1 to 2 striped SSDs, purely for higher real-world read speed.

 

The DDT's real RAM use isn't shown anywhere... you have to calculate it from the output of the following command (from that forum entry; I used it to calculate the 16-17GB):

zpool status -D poolname

=> DDT entry count * bytes in core (which, by the way, rises as the dedup ratio goes up)
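A minimal sketch of that calculation using the figures from this thread (the summary line format can vary between OpenZFS versions, and the byte counts below are examples rather than universal constants):

# zpool status -D testpool | grep 'DDT entries'
#  dedup: DDT entries 58000000, size 483B on disk, 300B in core
echo $(( 58000000 * 300 / 2**30 )) GiB    # ~16 GiB of DDT that wants to live in the ARC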

 

I have no more SSDs left to test... and as jortan writes, you need to have the complete DDT in RAM anyway for performance.

 

Edited by subivoodoo
Link to comment
26 minutes ago, subivoodoo said:

I have no more SSDs left to test... and as jortan writes, you need to have the complete DDT in RAM anyway for performance.

 

Have you found some doc that says performance of dedup with special Vdev is bad?  I mean it's going to be slower than RAM in most cases but that doesn't mean we will notice it or that it will be unusable.  The other link that is continuously posted above is ambiguous.  I've heard otherwise and that aligns with my experience.  Or are you just speaking generally from educated guesses? (Genuine question).

I have mine with HDDs so that's probably why I don't notice it.

Edited by Marshalleq
Link to comment
1 hour ago, Marshalleq said:

Have you found some doc that says performance of dedup with special Vdev is bad?

 

Neither I nor the documentation I'm referencing is saying dedupe performance with a special vdev is bad (at least not compared to dedupe performance without a special vdev).  Without the special vdev your normal pool devices will be very busy writing your data, but also all the hashes/references for the deduplication table (DDT).  For spinning-rust disks, this is a lot of additional random I/O and hurts performance significantly.

 

Unless designed very poorly, the special vdev will increase performance because it spreads the DDT writes to separate, fast storage devices. 

 

1 hour ago, Marshalleq said:

I mean it’s going to be slower than ram In most cases but that doesn’t mean we will notice it or make it unusable

 

The DDT is not a cache, and neither is the special vdev ---

 

Quote

--- the DDT is a fundamental ZFS structure. It is treated as part of the pool’s metadata. If a pool (or any dataset in the pool) has ever contained deduplicated data, the pool will contain a DDT, and that DDT is as fundamental to the pool data as any of its other file system tables. Like any other metadata, DDT contents may temporarily be held in the ARC (RAM/memory cache) or L2ARC (disk cache) for speed and repeated use, but the DDT is not a disk cache.

https://www.truenas.com/docs/references/zfsdeduplication/#deduplication-on-zfs

 

The DDT needs to be stored within your pool (in a vdev or special vdev) and constantly updated for every block of data that you write to the pool.  Every write involves more writes to the DDT (either new hashes or references to existing hashes).

 

When ZFS writes new entries to the DDT (or needs to read them from the pool/special vdev) they are cached in memory (ARC) and will push out other information that would otherwise be stored in the ARC.  If your DDT becomes large enough to exceed the amount of memory that ZFS is allocating for the ARC, that's where you will run into significant performance issues.  That's not counting the fact that there is other data that ZFS wants to keep in the ARC (regularly accessed data and ZFS metadata) for performance reasons.

 

Keeping the hashes/references to what has already been written in memory is fundamental to how de-duplicating file systems work.  However, those hashes/references are fundamental structures of the file system, which is why they can't only exist in memory and must also be written to the filesystem.

 

Edited by jortan
Link to comment
43 minutes ago, Marshalleq said:

Have you found some doc that says performance of dedup with special Vdev is bad?

No, just speaking generally from educated guesses.  I could probably test it within the VM... but then it would be a disk image instead of a real SSD.  And before I buy another SATA SSD I will go and grab another 64G of consumer RAM 😉

 

Next I'll figure out whether the performance "native" on unraid (I mean with the great community plugins) is the same as on the TrueNAS VM... or hopefully even higher because no virtio is needed in between.  But I don't have time right now... probably next week.

 

 

  • Thanks 1
Link to comment

To clarify earlier comments about ZFS memory usage - the ARC doesn't show how much memory ZFS needs to function; the ARC will dynamically consume memory that the system doesn't otherwise need.  This is why you can't assume how much memory ZFS "needs" for dedupe/DDT based on ARC size before/after turning on dedupe.  It is expected that the ARC would be the same size in both scenarios.

 

To demonstrate that the ARC is dynamic and doesn't actually show how much memory ZFS "needs", you can artificially reduce the amount of memory available to ZFS by consuming memory in a ram disk

 

mount -t tmpfs -o size=96G tmpfs /mnt/ram/

 

As you copy files to the ram disk, and as available memory approaches 0%, the ZFS ARC will release memory back to the system, dynamically reducing its size:

(attached screenshot: zfs.png - ARC size shrinking as the RAM disk fills)
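If you want to watch this happen live while filling the RAM disk, a quick sketch (the arcstats file is the standard OpenZFS-on-Linux location; the "size" counter is in bytes):

watch -n 2 "grep -w '^size' /proc/spl/kstat/zfs/arcstats"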

 

ZFS will continue functioning without issue (but with less data cached) until the DDT starts getting pushed out of the ARC because it no longer fits.  This can happen if you:

 

- keep adding data (DDT size increases), and/or

- reduce the amount of memory available to ZFS

 

At this point, performance of the ZFS filesystem will reduce, likely by orders of magnitude.

Edited by jortan
Link to comment
