BTRFS Raid1 pool issue


Gnomuz


1 minute ago, jonathanm said:

Theoretically, forcing things to be mapped to a specific drive instead of a user share shouldn't cause major issues, but like you said, there can be hidden consequences.

 

Since you have CA Backup already doing its scheduled thing, personally I'd use the procedures already set forth in CA Backup's disaster recovery and use your daily backup to restore appdata after formatting the drive. Revert the appdata share back to cache:only if you have /mnt/cache mapped anywhere, as leaving it at the default cache:prefer could end up with files on an array disk under some circumstances. Before you start the Docker and VM services, do an audit of all the shares involved and make sure the data is all where it needs to be; domains and system should properly move back with the mover after setting them to cache:prefer.

appdata is currently being restored from the last backup (2:39 pm). If I see any issue, I will restore the daily one. I'll keep you posted!

10 minutes ago, Gnomuz said:

appdata is currently being restored from the last backup (2:39 pm). If I see any issue, I will restore the daily one. I'll keep you posted!

Make sure you clean up after the mover. Having duplicate files and paths under /mnt/diskX/appdata and /mnt/cache/appdata won't end well if you are referencing /mnt/user/appdata anywhere.
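If you want to double-check, something like this could do the audit for you; it lists any relative path that exists both on the cache and on an array disk. A rough Python sketch, untested, assuming the standard /mnt/cache and /mnt/diskN mount points:

```python
#!/usr/bin/env python3
# Rough sketch: flag relative paths that exist both under /mnt/cache/appdata
# and under any /mnt/diskN/appdata, i.e. duplicates left behind by the mover.
import glob
import os

cache_root = "/mnt/cache/appdata"
cache_paths = set()
for dirpath, _dirs, files in os.walk(cache_root):
    for name in files:
        cache_paths.add(os.path.relpath(os.path.join(dirpath, name), cache_root))

for disk_root in sorted(glob.glob("/mnt/disk*/appdata")):
    for dirpath, _dirs, files in os.walk(disk_root):
        for name in files:
            rel = os.path.relpath(os.path.join(dirpath, name), disk_root)
            if rel in cache_paths:
                print(f"duplicate: {rel} (also on {disk_root})")
```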

 

You didn't make it clear whether you were moving ahead with reformatting, or trying to revert to a working system with the existing pool.

8 minutes ago, jonathanm said:

Make sure you clean up after the mover. Having duplicate files and paths under /mnt/diskX/appdata and /mnt/cache/appdata won't end well if you are referencing /mnt/user/appdata anywhere.

 

You didn't make it clear whether you were moving ahead with reformatting, or trying to revert to a working system with the existing pool.

Sorry for not being clear; I must say I'm a bit worried, if not upset ...

I have formatted the single cache device with XFS, deleted all the appdata that had ended up on the array (disk2 in my case), and restored the backup I had just made. So far, it seems OK: all the data of the appdata share is on the SSD, and only on the SSD! And I checked one of the hardlinks that wouldn't move, which I gave as an example earlier; it has been restored correctly by CA Restore.

I'm now moving the system and domains shares back to the cache with the mover, which hopefully shouldn't raise any issues.

Once Docker is running, I'll check all the containers and revert them to /mnt/user/appdata to avoid any problems in the future ...
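To make sure nothing gets left behind on the array once the mover is done, I'll run a quick check along these lines before starting the services. A rough sketch, untested, assuming the standard /mnt/diskN mount points; no output should mean the three shares are back on the cache only:

```python
#!/usr/bin/env python3
# Quick audit: print any file still sitting on an array disk for the
# appdata, domains and system shares. No output = shares are cache-only.
import glob
import os

for share in ("appdata", "domains", "system"):
    for root in sorted(glob.glob(f"/mnt/disk*/{share}")):
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                print(os.path.join(dirpath, name))
```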

33 minutes ago, jonathanm said:

Don't do that without being mentally prepared to undo it. Switching could break the apps.

Well, the pressure is going down. Thanks for the mental coaching 😉

VMs and containers restarted properly after the restoration of appdata and the move of both domains and system. Moving from the array to the SSD was of course way faster than the other way around, so I didn't have to wait too long.

The only expected difference I can see is that the appdata, domains and system shares are now tagged with "Some or all files unprotected" in the Shares tab, which makes sense, as they are on a non-redundant XFS SSD.

I checked the containers for their appdata mapping, and only found Plex using /mnt/cache/appdata. But I remember now that I changed that over time: first I read that /mnt/cache was the way to go, then that it was no longer useful and could raise issues. I think I now understand what the author had in mind ...

Anyway, I also conclude that the containers which were created with /mnt/cache/appdata and switched afterwards to /mnt/user/appdata have this hardlink problem (krusader and speedtest-tracker for me), and that it can only be resolved by a fresh install of the container. As I just switched Plex and it restarted properly, I now have three containers whose appdata can't be moved, only backed up and restored.
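For anyone hitting the same issue, a quick way to spot the affected containers is to list the files under appdata with a link count above 1, since those are exactly the ones the mover skips. A small Python sketch, untested, assuming the pool is mounted at /mnt/cache:

```python
#!/usr/bin/env python3
# Sketch: list files under /mnt/cache/appdata with more than one hard link,
# grouped by inode, to see which containers the mover would skip.
import os
from collections import defaultdict

root = "/mnt/cache/appdata"
by_inode = defaultdict(list)

for dirpath, _dirs, files in os.walk(root):
    for name in files:
        path = os.path.join(dirpath, name)
        st = os.lstat(path)
        if st.st_nlink > 1:
            by_inode[(st.st_dev, st.st_ino)].append(path)

for (_dev, ino), paths in by_inode.items():
    print(f"inode {ino}:")
    for p in paths:
        print(f"  {p}")
```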

But enough for today. Again, I learnt a lot thanks to the community, even if it was a bit the hard way 😆


Everything seems to be running fine now. Just a little feedback on the I/O load after switching from a BTRFS RAID1 pool to a single XFS device, over the same 4-hour period yesterday and today, with similar overall loads (2 active VMs and 3 active containers):

XFS   : Average Write = 285 kB/s, Average Read = 13 kB/s

BTRFS : Average Write = 968 kB/s, Average Read = 7 kB/s (I/O load per SSD, of course)

 

So the write load, which is the one we all watch on SSDs, is 3.4 times higher with BTRFS RAID1, and of course it wears both SSDs equally. The two SSDs were both MBR 1MiB aligned, as recommended.
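Just to show where the 3.4 comes from, and what it means across the whole pool since RAID1 mirrors every write to both devices (trivial arithmetic, illustration only, using the averages above):

```python
# Illustration only, using the 4-hour averages reported above.
xfs_write = 285.0    # kB/s, single XFS device
btrfs_write = 968.0  # kB/s, per SSD in the BTRFS RAID1 pool

print(f"per-SSD write load: {btrfs_write / xfs_write:.1f}x")               # ~3.4x
print(f"pool total vs single device: {2 * btrfs_write / xfs_write:.1f}x")  # ~6.8x, RAID1 writes each block twice
```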

 

I anticipated a decrease in the write load when switching from a redundant pool to a single device, and I already had the feeling BTRFS RAID1 caused significant I/O amplification anyway, but not by that much.

I'll provide stats over a longer period so that everyone can have a better idea of the impact of BTRFS redundancy on SSD wear.

 

