Dedup in zfs

(This is a follow-up to http://lime-technology.com/forum/index.php?topic=4604.0.)

 

De-duplication in zfs has just made its way into the public builds of OpenSolaris. My first impression: it is computationally expensive!

 

Making a duplicate of a 7 GB file that takes up 4 GB of disk space (because of file-system-level compression) pegs the CPU at 100% for around a quarter of the copying time on a quad-core (Q9550). In this concrete example the copy took 45 seconds. The same copy without de-duplication took 3 minutes at under 10% CPU usage. [Both numbers include de- and re-compression, I would imagine.] The good news: the copy with de-duplication consumes only 0.1 GB of space.
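To put those figures side by side, here is a rough back-of-the-envelope calculation. It is only a sketch: it assumes 1 GB = 1000 MB, that throughput is measured against the 7 GB logical size, and that a copy without de-duplication would cost another 4 GB on disk (the post does not state this). The function names are made up for illustration.

```python
# Figures quoted in the post above.
LOGICAL_GB = 7.0        # file size as seen by applications
PHYSICAL_GB = 4.0       # on-disk size after compression
DEDUP_COPY_GB = 0.1     # extra space used by the de-duplicated copy
DEDUP_SECONDS = 45      # copy time with de-duplication
PLAIN_SECONDS = 3 * 60  # copy time without de-duplication


def throughput_mb_s(logical_gb, seconds):
    """Logical copy throughput in MB/s, assuming 1 GB = 1000 MB."""
    return logical_gb * 1000 / seconds


def space_ratio(original_gb, copy_gb):
    """Space two plain copies would need divided by the space actually used.

    Assumption: without de-duplication the copy would occupy the same
    physical space as the original.
    """
    return (original_gb + original_gb) / (original_gb + copy_gb)


if __name__ == "__main__":
    print(f"dedup copy: {throughput_mb_s(LOGICAL_GB, DEDUP_SECONDS):.1f} MB/s")
    print(f"plain copy: {throughput_mb_s(LOGICAL_GB, PLAIN_SECONDS):.1f} MB/s")
    print(f"space ratio: {space_ratio(PHYSICAL_GB, DEDUP_COPY_GB):.2f}x")
```

Under those assumptions the de-duplicated copy moves roughly 156 MB/s against roughly 39 MB/s for the plain copy, and the pair of files occupies a little over half the space two independent copies would.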

 

Additional bad news: even though straight reading from the de-duplicating file system does not appear to take substantially more CPU resources [EDIT: the preceding statement is probably wrong; see also below], there was a speed penalty of around a third compared to the case without de-duplication.

 

 

My tentative conclusion is that either the current dedup implementation in zfs can be improved, or de-duplication is only going to pay for itself in highly specialised cases. RAM caching seems to respect de-duplication, for example, so one clear case that looks like it will benefit is running several copies of a given virtual machine. For my part, I won't jump into anything just yet.

 

 

How does it fare if you disable compression?

  • Author

How does it fare if you disable compression?

 

52 seconds for the copying, with what appears to be the same CPU usage profile, and 0.2 GB of space consumption for the copy. There is a similar speed penalty for straight reading.

 

This is all very uncontrolled, though, and with two variables and a single test it's not really good for any sort of decision making.

 

 

 

EDIT: straight reading does appear to take somewhat more resources in the de-duplicating case, both with and without compression; sorry about that.

  • 3 weeks later...
  • Author

As it turns out, the numbers I reported above are apples to oranges.

 

The issue is this: zfs checksums every block by default. Originally, this was for filesystem-level error detection and repair. For dedup, duplicates are identified by comparing checksums (again by default), which is entirely reasonable for checksums with essentially no collisions. In particular, turning on zfs dedup switches the checksum to sha256 by default; otherwise zfs defaults to fletcher checksums, which are much cheaper to compute. This will account for part of the performance differences I reported.
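For anyone who wants to see the idea of checksum-based duplicate detection in miniature, here is a toy sketch in Python. It is not how zfs is implemented: the `DedupStore` class, its methods, and the in-memory dictionaries are all invented for illustration, and the 128 KiB block size is only assumed to mirror the usual zfs default record size.

```python
import hashlib

BLOCK_SIZE = 128 * 1024  # assumption: mirrors the usual zfs default recordsize


class DedupStore:
    """Toy in-memory block store that keeps one copy per distinct SHA-256."""

    def __init__(self):
        self.blocks = {}    # digest -> block contents
        self.refcount = {}  # digest -> number of references to that block

    def write(self, data):
        """Split data into blocks, store unseen blocks, return the digests."""
        digests = []
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            digest = hashlib.sha256(block).digest()
            if digest not in self.blocks:
                # First time this content is seen: actually store it.
                self.blocks[digest] = block
            # Either way, record one more reference to the block.
            self.refcount[digest] = self.refcount.get(digest, 0) + 1
            digests.append(digest)
        return digests

    def read(self, digests):
        """Reassemble a file from its list of block digests."""
        return b"".join(self.blocks[d] for d in digests)

    def stored_bytes(self):
        """Bytes actually held, counting each distinct block once."""
        return sum(len(block) for block in self.blocks.values())


if __name__ == "__main__":
    store = DedupStore()
    data = b"".join(bytes([i]) * BLOCK_SIZE for i in range(4))
    original = store.write(data)
    duplicate = store.write(data)  # second copy adds references, not blocks
    print(store.stored_bytes(), len(data))
```

Every write pays for a SHA-256 over each block, which is where the extra CPU time goes; the second copy only bumps reference counts, which is why it takes almost no additional space.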

 

However, it appears to be even more important that, up to the still-to-be-released snv_131 build, zfs uses its own non-optimised sha256 code. The numbers reported in various Sun blogs for the speed-up from switching to the optimised sha256 code in the kernel are based on artificial data sets, so it's entirely unclear what the real-world effect will be, but it could be substantial. Additionally, the kernel code can take advantage of hardware acceleration, where available.
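The gap between a cheap checksum and sha256 can be felt in user space with a crude timing sketch like the one below. Big caveats: Python's standard library has no fletcher implementation, so `zlib.adler32` (a close relative of the Fletcher family) stands in for it as an assumption, and the numbers say nothing about the zfs kernel code itself, optimised or not. The helper names are made up.

```python
import hashlib
import time
import zlib


def time_checksum(func, data, repeats=5):
    """Return the best wall-clock time in seconds over a few runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(data)
        best = min(best, time.perf_counter() - start)
    return best


def compare(size=8 * 1024 * 1024):
    """Time a cheap checksum and SHA-256 over the same buffer."""
    data = bytes(size)  # zero-filled buffer; both algorithms scan all of it
    return {
        # Assumption: adler32 stands in for fletcher, which the stdlib lacks.
        "adler32": time_checksum(zlib.adler32, data),
        "sha256": time_checksum(lambda d: hashlib.sha256(d).digest(), data),
    }


if __name__ == "__main__":
    for name, seconds in compare().items():
        print(f"{name}: {seconds * 1000:.2f} ms for 8 MiB")
```

How large the difference is will depend heavily on the machine and on whether the sha256 build in use is hardware-accelerated, which is the same uncertainty that applies to the kernel code.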

 

More later ...

Archived

This topic is now archived and is closed to further replies.
