How to optimize array migration to Zpool and preserve hardlinks?

October 22, 2024

I currently have 36 4 TB SSDs in my array. I will be transferring about 80 TB of data from them onto 5x 20 TB HDDs that I've borrowed from a friend, in order to empty the SSDs, create a zpool of 4x 9-drive vdevs, and then transfer the data back to the zpool. (I know the risks of this operation and the pros and cons of running ZFS in Unraid.)

1. What is the best way to move these files onto the drives to begin with? I used Dynamix File Manager and Unassigned Devices to mount the 20 TB drive and was getting something like 250-300 mbps on a test run. Is there a better way to move them? Krusader? CLI?

2. How should I format the attached 20 TB drive: ZFS to match? XFS? Doesn't matter?

3. I have a large amount of hard-linked files (unsure if on same drive). How will copying files affect these hard links? What is the best way to preserve the linking across multiple file copies? Do I need to pool all 5 drives in order to ensure one filesystem to preserve the links?

Thank you in advance!

Edited October 22, 2024 by bs.king

October 23, 2024

Community Expert
Solution

Copy methods vary.
I personally would use rsync in this case. I have used mc from the terminal in the past.
If the main volumes are ZFS, then I would stay with ZFS...

In Summary:

Use rsync with the proper flags to ensure speed and integrity during transfer.

Format the temporary drives with XFS unless you need ZFS-specific features.

If hard links are important, ensure that all 5 drives are pooled together in a single filesystem.

Here's how you can manage your data migration and handle the specifics of your ZFS setup:

Best Way to Move the Files:

CLI with rsync is often the best tool for this kind of large transfer, especially for preserving permissions, timestamps, and hard links. It's also very reliable for resuming transfers in case of interruptions. Example command:

rsync -aHAXv --progress /source/directory/ /destination/directory/

-a for archive (preserve permissions, symlinks, etc.)

-H for hard links

-A for ACLs

-X for extended attributes

--progress to monitor the transfer speed.

Krusader: It’s a good GUI option, but not ideal for large-scale transfers where hard links must be preserved. As far as I know, it isn't available on Unraid without Docker or an extra install.

Dynamix File Manager/Unassigned Devices: The speeds you're seeing are reasonable for HDDs, but for fine control and reliability, I’d still recommend rsync.
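For what it's worth, here is a minimal sketch of how the rsync command above could be wrapped for a long-running copy. The function name copy_tree and the example paths are made up for illustration; --partial, --info=progress2 (rsync 3.1 or newer), and --log-file are standard rsync options, but check them against the rsync version on your system.

```bash
#!/usr/bin/env bash
# Sketch: copy one source tree onto one destination filesystem with rsync.
# copy_tree and all paths below are hypothetical; adjust to your mount points.
set -euo pipefail

copy_tree() {
    local src="$1" dst="$2" log="$3"
    # -a archive, -H hard links, -A ACLs, -X extended attributes
    # --partial keeps partly transferred files so a rerun can continue
    # --info=progress2 shows overall progress instead of per-file progress
    # --log-file records what was transferred for later review
    rsync -aHAX --partial --info=progress2 --log-file="$log" "$src"/ "$dst"/
}

# Example call (hypothetical paths):
# copy_tree /mnt/disk1/data /mnt/disks/temp20tb/data /tmp/rsync-disk1.log
```

If a run is interrupted, calling the function again with the same arguments skips files that are already complete on the destination.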

How to Format the 20TB Drive:

You don’t necessarily need to format them with ZFS unless you want them to be part of a ZFS pool (for snapshots or other ZFS-specific features). If they are only temporary drives to hold data, XFS is a fine choice as it is widely supported and reliable for general-purpose storage.

If you decide to use ZFS, you’ll be able to take advantage of ZFS’s checksumming, but that’s likely overkill for temporary storage.

Preserving Hard Links:

Hard links are multiple names that refer to the same inode in a filesystem. If you copy files with rsync and use the -H option, it will preserve the hard links among the files in that transfer, as long as they all land on the same destination filesystem.

If you spread linked files across multiple filesystems (i.e., if you don’t pool the 5 drives into one filesystem), you’ll break the hard links, because hard links can't exist across filesystems.

Pooling the 5 drives: If you pool the drives (e.g., using LVM or even ZFS), you’ll ensure that all the files remain within one filesystem and that hard links are preserved.
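Since you mentioned not knowing where your hard links live, here is a rough sketch for inspecting them before and after the copy. It assumes GNU find and stat (which is what Linux systems normally ship); the function names and the example paths are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: inspect hard links with GNU find/stat. Paths are hypothetical.
set -euo pipefail

# List regular files that have more than one link as "inode links path",
# sorted so that names sharing an inode end up next to each other.
# -xdev keeps find on a single filesystem.
list_hardlinks() {
    find "$1" -xdev -type f -links +1 -printf '%i %n %p\n' | sort -n
}

# Succeeds only if both paths are the same file (same device and inode).
same_file() {
    [ "$(stat -c '%d:%i' "$1")" = "$(stat -c '%d:%i' "$2")" ]
}

# Example calls (hypothetical paths):
# list_hardlinks /mnt/disk1
# same_file /mnt/disk1/media/a.mkv /mnt/disk1/downloads/a.mkv && echo linked
```

Running list_hardlinks on the source before the copy and on the destination afterwards gives a quick way to see whether the same groups of names still share an inode.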

October 23, 2024

Community Expert

This is a large assumption though... And it is recommended to keep currently running services off until the file transfers are done.

The more I think about it, I assume all 36 SSDs are in one giant pool, in which case you want to treat that as the old pool and keep the ZFS datasets.

So a dd-style or zfs send/receive migration onto the new 5-drive ZFS pool may be better.

Yes, you can efficiently transfer the data, datasets, and ZFS properties from one ZFS pool to another using ZFS's native snapshot and send/receive capabilities. This method will not only preserve the files but also the structure, datasets, snapshots, properties, and even hard links if you have them.

Here’s how you can do it:

Steps for Transferring from One ZFS Pool to Another

Create a Snapshot of the Current Pool:

First, you need to create a snapshot of the existing pool. Let’s assume your current pool with the 36 SSDs is named oldpool and contains a dataset called data. To create a snapshot of the dataset:

zfs snapshot oldpool/data@migration

Transfer the Snapshot Using zfs send and zfs receive:

You can then use the zfs send command to send the snapshot and use zfs receive to receive it on the new pool (newpool).

Assuming your new pool with the 5x 20TB HDDs is called newpool, you can transfer the snapshot like this:

zfs send oldpool/data@migration | zfs receive newpool/data

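This copies the contents of the dataset as of the @migration snapshot, and hard links within the dataset are preserved. Note that a plain zfs send carries only that one snapshot; to also bring along child datasets, earlier snapshots, and dataset properties, send a replication stream with zfs send -R.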
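A slightly fuller sketch of the send/receive step, in case it helps. It reuses the oldpool/data, newpool/data and @migration names from the example above; the replicate function name is made up. To my understanding, -R makes zfs send produce a replication stream (descendant datasets, their snapshots, and properties), -n -v does a dry run that only prints the estimated stream size, and -u tells zfs receive not to mount what it receives. Please verify these against the zfs man pages for your OpenZFS version before relying on them.

```bash
#!/usr/bin/env bash
# Sketch: replicate a snapshot to another pool with zfs send/receive.
# Names follow the example above; replicate is a hypothetical helper.
set -euo pipefail

replicate() {
    local snap="$1" target="$2"
    # Dry run: print the estimated stream size without sending anything.
    zfs send -n -v -R "$snap"
    # Real transfer: replication stream into the target dataset,
    # left unmounted (-u) until you are ready to switch over.
    zfs send -R "$snap" | zfs receive -u "$target"
}

# Nothing runs until the function is called explicitly, e.g.:
# replicate oldpool/data@migration newpool/data
```

Checking the size estimate first is a cheap way to confirm the stream will fit on the borrowed drives before committing to a multi-day transfer.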

Incremental Transfers (Optional):

If you need to do this transfer in stages or if you plan to transfer updates later, you can use incremental snapshots to send only the differences between snapshots:

# Create a new snapshot after changes
zfs snapshot oldpool/data@migration2
# Send everything between the two snapshots
zfs send -I oldpool/data@migration oldpool/data@migration2 | zfs receive newpool/data

This incremental approach allows for faster transfers since it only sends the differences.

Verify the Data:

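After the transfer is done, confirm that the snapshot arrived on the new pool. Note that zfs diff only compares a snapshot with a later snapshot or the live filesystem of the same dataset, so it cannot compare oldpool against newpool. Listing the received snapshots and comparing their sizes against the source is a quick sanity check:

zfs list -t snapshot -r newpool/data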
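If you want an independent file-level check once both datasets are mounted, something like the following sketch compares the two trees by content checksum. The function names and mount points are hypothetical, it assumes GNU find, sort, xargs and sha256sum, and it reads every file on both sides, so expect it to take a long time on tens of terabytes.

```bash
#!/usr/bin/env bash
# Sketch: compare two mounted directory trees by content checksum.
# manifest/compare_trees and the example paths are hypothetical.
set -euo pipefail

# Print "checksum  ./relative/path" for every regular file under $1,
# ordered by path so two manifests can be diffed line by line.
manifest() {
    (cd "$1" && find . -type f -print0 | sort -z | xargs -0 -r sha256sum)
}

# Succeeds and prints nothing when both trees contain the same files
# with the same content; otherwise prints the differing manifest lines.
compare_trees() {
    local a b rc=0
    a=$(mktemp)
    b=$(mktemp)
    manifest "$1" > "$a"
    manifest "$2" > "$b"
    diff "$a" "$b" || rc=$?
    rm -f "$a" "$b"
    return "$rc"
}

# Example call (hypothetical mount points):
# compare_trees /mnt/oldpool/data /mnt/newpool/data && echo "trees match"
```

This only checks file contents and names, not permissions, ACLs, or hard link structure.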

Advantages:

This method will retain all ZFS-specific features, including datasets, snapshots, compression, permissions, ACLs, and hard links.

It will be more efficient than copying files with dd or rsync, especially for large datasets.

A Note on dd:

Using dd to copy data between ZFS pools isn't ideal, as it works at the block level rather than understanding ZFS structures. This method would copy raw data but not preserve ZFS-specific features like datasets, snapshots, or properties. Using ZFS's send/receive mechanism is much more efficient and preserves all the ZFS metadata and features.

This approach should be much faster and more reliable than using dd or traditional file copy methods like rsync. Let me know if you need more details on any step!

October 23, 2024

Author

@bmartino1 Thank you so much for all the details. I was afraid I would have to pool them all, but did not think about rsync. Appreciate it.

Solved by bmartino1
