ZFS Performance Tuning - Getting the most out of your UnRAID server and containers on ZFS


BVD


1 hour ago, BVD said:

@Partizanct (and @Marshalleq if you're interested of course, the more the merrier!) would you have time to give this a once-over?

Why would we want ZFS on UnRAID? What can we do with it?

 

This is much less a 'this technical thing can technically be done X way' doc than a 'here's why you might be interested in it, what problems it solves, and in what ways' one. Given that, and that this type of reference material can be interpreted in numerous different ways by different folks, I just want to make sure it's at least coherent, without going so deep into the weeds that someone newer to ZFS would just click elsewhere after having the Encyclopedia Britannica thrown at 'em as their 'introduction' lol.

 

Open to any and all feedback here - again, this isn't supposed to get super technical, and it has the unique goal of explaining why someone should care, as opposed to the rest of the docs, which go over how to actually do the stuff once you've decided you *do* care enough to put forth the effort. So there's no such thing as 'bad' or 'useless' feedback for this type of thing imo.

 

Anyway, thanks for your time!

I would be interested in hearing your thoughts on the creation of the top-level pool itself and what settings you would recommend (since you can modify compression levels etc. by dataset).

Link to comment

@Partizanct That was actually intentional on my part 😅 

 

This is primarily because there's precious little that applies to 'everything'. For instance, everyone says 'ashift=12' for the pool, right? But that means our physical-layer block size is set to 4K, and there's a huge amount of NAND out there that's 8K, even some that's 16K, leaving a lot to be desired. Or what about setting dnodesize to auto? This is great, but it really works best with xattr set to sa, and if you're not accessing the data primarily over NFS/iSCSI/SMB, you could actually lose (not much, likely, but some) performance. Heck, setting xattr to sa also means that pool is Linux-only now, losing portability to BSD kernels (and others). I'd hate to recommend something like that too broadly, only for the user to find out years later, when they try to move the pool to some hot new BSD-based system with all the new bells and whistles, that they simply can't - because some guy online said it was a good idea and they never looked any further into it, right?
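To make that a bit more concrete, here's a rough sketch of checking what a device actually reports and setting those properties at creation time - pool/dataset names are placeholders, and the values shown are illustrations rather than recommendations, for all the reasons above:

# Check the physical/logical sector sizes the device reports before picking ashift
lsblk -o NAME,PHY-SEC,LOG-SEC /dev/nvme0n1

# ashift is fixed at vdev creation time (12 = 4K, 13 = 8K, 14 = 16K)
zpool create -o ashift=13 tank mirror /dev/nvme0n1 /dev/nvme1n1

# dnodesize/xattr are per-dataset - note the Linux-only caveat on xattr=sa above
zfs create -o dnodesize=auto -o xattr=sa tank/appdata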

 

Better that those values get researched, their implications understood, and folks choose what's best for them and their specific situation. Recommendations differ for HDD vs NAND as well.

 

The other part of my reasoning goes back to what I feel is required for someone to be successful with ZFS (the will to learn, the ability to research, and the time to invest in both). For this one doc at least, the idea isn't to give someone an all-inclusive summary of the best way to use ZFS on UnRAID overall, but to spark that something which gets them into the game - where they read it and find themselves thinking 'this could've saved me hours last week on X, I wonder what else it can do...'

 

I do give more explicit detail where possible though - for instance, postgres has its dataset configuration laid out, with explanations of why for each, same as I hope to continue doing with each other app as I find time to translate them from my deployment notes to the docs GitHub.
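Just to give a flavor of the kind of thing that ends up in there - the below is a generic, commonly cited starting point for postgres on ZFS rather than an excerpt from the doc itself, with a placeholder dataset name:

# postgres uses an 8K page size, so the data dataset's recordsize is often matched to it
zfs create -o recordsize=8K -o atime=off tank/appdata/postgres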

 

I mentioned there's precious little I'd say applies globally, but that which does boils down to:

atime = off
compression = (at least something more than 'off' - again, whether to use lz4 or zstd still kinda depends; if someone's running old Westmere or Nehalem procs, lz4 is probably it for them)
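As a minimal sketch of applying just those two at the pool root (pool name is a placeholder - child datasets inherit unless overridden):

zfs set atime=off tank
zfs set compression=lz4 tank
# or compression=zstd, if the CPU can comfortably afford it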

 

Everything else has sane defaults for most systems, with recommendations differing for specific deployment needs...

 

I'm sorry in advance - I know this isn't super helpful in and of itself! I just hope my reasoning on why I did it this way makes some kind of sense at least!

Link to comment
  • 3 months later...
  • 1 month later...

I just switched from TrueNAS Scale to Unraid and am sharing my ZFS datasets via Samba in smb-extra.conf. Read performance over Samba went from about 1.1 GB/s to 100-300 MB/s with tremendous variance. I can't quite pinpoint the cause of the slowness. This is a 15-wide RAIDZ2 pool on a machine with 512 GB of RAM.

 

Would anyone happen to have sane/optimized ZFS settings for sysctl.conf that I can try?

 

Right now my settings look sort of like this - any input appreciated on how to get read speeds up to the full 10 Gbit.

 

/boot/config/smb-extra.conf

[global]
	server multi channel support = yes
	aio read size = 1
	aio write size = 1
	local master = yes
	preferred master = yes
	dead time = 10
	max smbd processes = 1000
	vfs objects = catia shadow_copy2 fruit streams_xattr
	shadow: snapdir = .zfs/snapshot
	shadow: sort = desc
	shadow: format = -%Y-%m-%d-%H%M
	shadow: snapprefix = ^zfs-auto-snap_\(frequent\)\{0,1\}\(hourly\)\{0,1\}\(daily\)\{0,1\}\(monthly\)\{0,1\}
	shadow: delimiter = -20
	fruit:model = MacSamba
	fruit:posix_rename = yes
	fruit:veto_appledouble = no
	fruit:nfs_aces = no
	fruit:wipe_intentionally_left_blank_rfork = yes
	fruit:delete_empty_adfiles = yes
	fruit:resource = file
	fruit:metadata = stream
	fruit:encoding = native
	fruit:advertise_fullsync = true
	fruit:aapl = yes
	log file = /var/log/samba/%m.log
	max log size = 10000
	log level = 1

[media]
path = /mediapool/media
browseable = yes
guest ok = yes
writeable = yes
read only = no
create mask = 0777
directory mask = 0775
delete veto files = Yes
veto files = /*.DS_Store/.apdisk/.TemporaryItems/.windows/.mac/
zfsacl:acesort = dontcare

 

/boot/config/modprobe.d/zfs.conf

# persist L2ARC contents across reboots
options zfs l2arc_rebuild_enabled=1
# disable ZFS's file-level prefetcher
options zfs zfs_prefetch_disable=1
# allow prefetched buffers to be cached in L2ARC
options zfs l2arc_noprefetch=0
# max bytes fed to L2ARC per fill interval (~500MB)
options zfs l2arc_write_max=524288000
# scan headroom, as a multiple of l2arc_write_max
options zfs l2arc_headroom=12
# cap ARC at ~350GB
options zfs zfs_arc_max=350000000000
# don't issue cache flush commands to disks (risky on unexpected power loss)
options zfs zfs_nocacheflush=1

# increase these so scrub/resilver completes more quickly, at the cost of other work
options zfs zfs_vdev_scrub_min_active=24
options zfs zfs_vdev_scrub_max_active=64

# sync write
options zfs zfs_vdev_sync_write_min_active=8
options zfs zfs_vdev_sync_write_max_active=32

# sync reads (normal)
options zfs zfs_vdev_sync_read_min_active=8
options zfs zfs_vdev_sync_read_max_active=32

# async reads : prefetcher
options zfs zfs_vdev_async_read_min_active=8
options zfs zfs_vdev_async_read_max_active=32

# async write : bulk writes
options zfs zfs_vdev_async_write_min_active=8
options zfs zfs_vdev_async_write_max_active=32

options zfs zfs_dirty_data_max_percent=40

options zfs zfs_txg_timeout=15

# default : 32768
options zfs zfs_immediate_write_sz=131072

 

Link to comment

@ensnare before diving too deeply into the configuration, my recommendation (as @JorgeB alluded to above) would be to narrow down the source a bit (confirming this pool was originally created on Scale, not Core, would also be useful). Can you go over what testing you've done so far to better pinpoint this? In general, assuming you've done nothing yet, a good start would be:

  • Do you experience the same throughput with a generic IO stream? I'd use an extended fio run with fully randomized IO bypassing the L2ARC to start (I don't expect it's storage related, but it doesn't take much time and rules out a ton of other junk), then hit it with bi-directional iperf as well, and finally NFS (a rough sketch of both is below).
  • Assuming you see 900+ MB/s for each of the above, THEN you can start focusing on SMB. You've got a ton of additional configs added for Samba, so I'd first try commenting all those out and simply copying a file over, then see what (if anything) changes, to get a better idea of which direction to take this.
  • Is *all* SMB traffic equally impacted, regardless of which host/application is attempting to write? Are read requests similarly impacted?

Really the biggest thing here is to do some troubleshooting to narrow the focus of your analysis/investigation. If you could share what you've already done on that front, it'll probably help us give you a better idea of where to go next.
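For reference, that first pass might look something like the below - paths, sizes, and hosts are placeholders, and you'd want the working set sized well beyond what ARC/L2ARC can hold to minimize caching effects:

# storage: randomized reads against the dataset backing the share
cd /mediapool/media
fio --name=randread-test --rw=randread --bs=128k --size=32G --runtime=120 \
    --time_based --ioengine=libaio --group_reporting --filename=fio-testfile

# network: single-stream iperf3 in both directions (-R reverses the direction)
iperf3 -c <server-ip>
iperf3 -c <server-ip> -R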

Link to comment
On 1/26/2023 at 8:10 AM, BVD said:

@ensnare before diving too deeply into the configuration, my recommendation (as @JorgeB alluded to above) would be to narrow down the source a bit... If you could share what you've already done on that front, it'll probably help us give you a better idea of where to go next.

What is the best way to install fio for benchmarking? It looks like the NerdTools package isn't active anymore, and I can't find the Slackware 15 binary.

 

On 1/25/2023 at 6:58 AM, JorgeB said:

Did you run a single stream iperf test in both directions to confirm LAN bandwidth is OK?

Yes. iperf3 maxes out at 9.2Gbps in both directions with single stream.

Link to comment
29 minutes ago, ensnare said:

What is the best way to install fio for benchmarking? It looks like the NerdTools package isn't active anymore, and I can't find the Slackware 15 binary.

 

Yes. iperf3 maxes out at 9.2Gbps in both directions with single stream.

 

Ever since the nerd/dev packs went the way of the dodo, I've just set up mirrors for all the tools I've ended up using and compiled my own, so while I'm not really certain anymore where the Slackware stuff is, you can just build from source instead:

https://git.kernel.dk/cgit/fio/
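Something like the below should do it, assuming you've got a build toolchain available (gcc/make - stock UnRAID doesn't ship one, so a build container may be easier; the GitHub mirror of the repo is used here for the clone URL):

git clone https://github.com/axboe/fio.git
cd fio
./configure
make
# run straight out of the build directory
./fio --version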

If you want to avoid mucking around with the hypervisor, you can always use a container of course. I pushed a little container up to GitHub that I've used in such situations in the past, in case it's helpful - just clone the repo and build, then run the command noted:

https://github.com/teambvd/docker-alpine_fio

 

Just be sure you've cd'd into the mountpoint path for your SMB share prior to running to ensure the test is valid 👍
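In case it saves anyone a step, a rough sketch of the clone/build/run flow - check the repo's README for the exact image name and fio arguments, as the ones below are just placeholders:

git clone https://github.com/teambvd/docker-alpine_fio.git
cd docker-alpine_fio
docker build -t alpine-fio .
# bind-mount the dataset backing the SMB share and run fio against it
docker run --rm -it -v /mediapool/media:/data -w /data alpine-fio \
    fio --name=randread --rw=randread --bs=128k --size=16G --runtime=120 \
        --time_based --filename=fio-testfile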

Link to comment
  • 5 months later...
  • 1 month later...
On 8/20/2022 at 6:58 PM, Marshalleq said:

...

My main gripes are not so much with the web pages, more to do with load times, e.g. startup from docker and the forever chugging away in the background. It may just be that my library is big. Plex says I have 114000 tracks / 1092 artists / 8463 albums. I hadn't seen ioztat before - I'm guessing that's better than zpool iostat by going down to the dataset level or something?

...

 

So I apparently finally hit the tipping point towards experiencing what you were seeing with Lidarr - it seems to be somewhere in the 65-70k track range, where the way the queries to the DB are formulated means the SQLite DB just absolutely chonks in protest. I finished converting Lidarr over to postgres last night, and while it's still sub-optimal IMO from a query perspective, pg is basically able to just brute-force its way through. Start-up times are cut down to maybe a tenth of what they were previously, and all UI pages populate within a couple of seconds at most 👍

Link to comment
  • 1 month later...

Wow, this is going back a bit. I have stopped paying attention to it and can't really say if it's still happening - I assume it is. Last week I changed my docker back from folder type to image type. There are just too many bugs with the folder type on the new native ZFS implementation - and I never liked how it made a dataset for each docker container either. Finally, since ZFS went native, I have a properly functioning docker again. Personally, I preferred the plugin; for one thing, I didn't have to stop the array to add or repair disks.

Link to comment
  • 1 month later...
On 10/24/2023 at 4:07 PM, Marshalleq said:

Personally, I preferred the plugin; for one thing, I didn't have to stop the array to add or repair disks.

 

+1 !!!!

 

Whole reason I'm still on 6.11.5 😅

 

Really wish I'd thought of this being an issue beforehand - I'd not have agreed so strongly with its implementation lol 🤣

Link to comment

Yeah, I wish Unraid would find another way of enforcing their license. It actually ruins their product a bit. I sort of thought it was OK to do it on their proprietary Unraid array, but doing it on open-source ZFS is a bit low.
 

This, and a few other things, have me wondering lately about running Unraid as a VM inside Proxmox or TrueNAS Scale. For virtualisation, backups, and especially networking, Unraid is left in the dust by these platforms. Unraid wins in some other areas though, particularly the user interface for Docker and VMs, and the Docker app store.
 

I haven't used the Unraid array for many years now, so that's not an issue. In fact, using ZFS in those other products would be a better experience. It'd be good to know whether Unraid has any plans to improve how ZFS and array integration interact with licensing, and if so, what they are.

Link to comment
15 hours ago, Marshalleq said:

Yeah, I wish Unraid would find another way of enforcing their license. It actually ruins their product a bit.

 

I came to the same conclusion myself... I'd previously taken issue with the fact that Unraid's design means 'everything' must be taken offline in order to address the inevitable eventualities for a NAS - adding storage, replacing a failed drive, or expanding capacity with larger disks. These are the kinds of things you expect to undertake semi-regularly with any NAS, and Unraid is the only OS I've seen that *requires* downtime for even the most basic maintenance. The fact that it could be solved any number of ways (several of which have already been proposed and are completely viable), but hasn't had a whisper from LT, has been... difficult.

Link to comment
