Unassigned Devices - Replace btrfs RAID-0 failing disk



I have a RAID-0 btrfs array using unassigned devices that I configured via command line. I have now received two warnings about Current pending sector showing 62, and then again at 148, so I'm assuming it needs to be removed.

I'm trying to follow steps here for replacement, but it seems to assume the drive has completely failed, and the steps are for a RAID1. I'm assuming it should be somewhat similar, but I don't want to lose the data. It's not important, but I'd rather not.

The directions say to use something similar to 

btrfs replace start 7 /dev/sdf1 /mnt

Obviously replacing /mnt with the real mount path. I'm also assuming my path should be /mnt/disks/PVEData (since that's the name of the btrfs share)?

 

Thanks!
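
For anyone following along: in the example command above, the 7 is the btrfs devid of the device being replaced, and a device path can be given instead. A minimal sketch for keeping an eye on the pending sector count and looking up devids follows. The helper name pending_sectors is made up for illustration, and it assumes the usual ten-column `smartctl -A` attribute table with the raw value in the last column.

```shell
# Print the raw Current_Pending_Sector value from `smartctl -A` output on stdin.
# Assumes the standard attribute table layout (raw value in column 10).
pending_sectors() {
    awk '$2 == "Current_Pending_Sector" { print $10 }'
}

# Usage on a real system (needs root; /dev/sdj is the failing drive in this thread):
#   smartctl -A /dev/sdj | pending_sectors
#   btrfs filesystem show /mnt/disks/PVEData   # lists the devid of each pool member
```

A count that keeps rising between checks, as it does later in this thread, suggests the drive is getting worse rather than holding steady.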

Edited by joshbgosh10592
Link to post
On 1/14/2021 at 2:48 AM, JorgeB said:

Thank you! As a note for future me/Googlers, I used 

btrfs replace start -f /dev/sdj1 /dev/sdp1 /mnt/disks/PVEData

Where sdj is the failing drive and sdp is the replacement drive. I also learned that you cannot have a trailing "/", or it'll error with:

ERROR: source device must be a block device or a devid
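
A small sketch of one way to avoid the trailing-slash mistake: build the command from cleaned-up arguments and print it for review before running it. The helper names strip_slash and replace_cmd are hypothetical, and since it isn't stated which argument carried the slash, this simply strips it from all three.

```shell
# Strip one trailing "/" from a path (hypothetical helper).
strip_slash() {
    printf '%s\n' "${1%/}"
}

# Print the replace command for review instead of running it.
# Arguments: failing device, replacement device, mount point.
replace_cmd() {
    printf 'btrfs replace start -f %s %s %s\n' \
        "$(strip_slash "$1")" "$(strip_slash "$2")" "$(strip_slash "$3")"
}

replace_cmd /dev/sdj1/ /dev/sdp1 /mnt/disks/PVEData/

# Once the replace is running, progress can be checked with:
#   btrfs replace status /mnt/disks/PVEData
```

The -f flag tells btrfs to overwrite the target even if it already holds a filesystem, so it is worth double-checking the device names in the printed command.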

 

Link to post
On 1/14/2021 at 2:48 AM, JorgeB said:

So, I ran this last night and it was progressing pretty well (about 10% an hour). However, it's still not finished and it's stuck on 94.5%, and in UD, it shows "command timedout" for sdj, and the "Current pending sector" count is climbing like crazy (yesterday it was 148, right now it's at 2171).

Is there something special that I should have done instead of the normal replace because of the errors the drive was throwing?

Link to post

Since there's no redundancy in the pool there's no other way of replacing it, and if the device is failing the operation might also fail; in that case you'd need to destroy and re-create the pool.

Link to post
22 hours ago, JorgeB said:

Since there's no redundancy in the pool there's no other way of replacing it, and if the device is failing the operation might also fail; in that case you'd need to destroy and re-create the pool.

Is there a way to flag the repair as something like "accept loss" when it comes across data it can't read? I'm just figuring I'm missing something, as it's not really failing, but rather it's run out of sectors to write to (as far as I understand), so there's probably a file or two that is corrupted and I'm willing to accept that.

Link to post
15 hours ago, JorgeB said:

If the device has pending sectors the problem is reading, not writing; you can try cloning it with ddrescue and then use the clone in the pool.

Thank you! I'm working on cleaning anything off that pool that I can, but how would I swap the failing with the new? The replacement drive is larger than the existing, but I'm assuming that's just a btrfs resize command.
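
For reference, a rough sketch of what a two-pass GNU ddrescue clone could look like, printed as a plan rather than executed. It assumes a whole-disk clone from sdj to sdp with the pool unmounted first. The function name ddrescue_plan and the map file path are made up for illustration.

```shell
# Print a two-pass ddrescue plan. Arguments: source disk, destination disk, map file.
ddrescue_plan() {
    src=$1 dst=$2 map=$3
    # Pass 1: copy everything that reads cleanly, skipping the slow scraping phase.
    printf 'ddrescue -f -n %s %s %s\n' "$src" "$dst" "$map"
    # Pass 2: return to the bad areas with direct disc access and up to 3 retries.
    printf 'ddrescue -f -d -r3 %s %s %s\n' "$src" "$dst" "$map"
}

# /root/sdj.map is a hypothetical location for the map file.
ddrescue_plan /dev/sdj /dev/sdp /root/sdj.map
```

The map file is what lets the second pass (or a restart after an interruption) pick up where the first left off, so both passes need to point at the same file.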

Link to post
16 hours ago, JorgeB said:

To use ddrescue you'd need to clone to a same size device, to keep the partition valid for Unraid.

I figured there would be a way to clone to a larger disk and just resize the pool afterwards?

Regardless though, how would I tell the pool that instead of sdj, use sdp?

Thank you so far!!

Link to post
7 hours ago, joshbgosh10592 said:

I figured there would be a way to clone to a larger disk and just resize the pool afterwards?

Not easily, because the partition would remain the original size, you'd need to extend that first before extending the filesystem.

 

7 hours ago, joshbgosh10592 said:

how would I tell the pool that instead of sdj, use sdp?

Stop the array, if Docker/VM services are using the cache pool disable them, unassign all cache devices, start array to make Unraid "forget" current cache config, stop array, disconnect old failing device, assign the remaining old and new pool devices (there can't be an "All existing data on this device will be OVERWRITTEN when array is Started" warning for any cache device), re-enable Docker/VMs if needed, start array.

Link to post
12 hours ago, JorgeB said:

Not easily, because the partition would remain the original size, you'd need to extend that first before extending the filesystem.

 

Stop the array, if Docker/VM services are using the cache pool disable them, unassign all cache devices, start array to make Unraid "forget" current cache config, stop array, disconnect old failing device, assign the remaining old and new pool devices (there can't be an "All existing data on this device will be OVERWRITTEN when array is Started" warning for any cache device), re-enable Docker/VMs if needed, start array.

I'd have to do all of that to tell btrfs to use a different drive in one of my unassigned devices btrfs pools? I don't have a cache in that. I'm confused.

Link to post
11 hours ago, joshbgosh10592 said:

I'd have to do all of that to tell btrfs to use a different drive in one of my unassigned devices btrfs pools?

Sorry, forgot it was an unassigned pool; in that case you just need to unmount and remount with the clone and the remaining old device (still disconnect the bad device). Also, in this case there's no problem cloning to a larger device.

Link to post
1 hour ago, JorgeB said:

Sorry, forgot it was an unassigned pool; in that case you just need to unmount and remount with the clone and the remaining old device (still disconnect the bad device). Also, in this case there's no problem cloning to a larger device.

No worries! So would I do that via the unRAID webUI or with btrfs commands, and how? I thought that with Unassigned Devices you only mount one of the disks in the btrfs pool and ignore the other, since it will be mounted along with it? Just trying to get it straight so I don't lose anything.

 

Link to post
1 minute ago, joshbgosh10592 said:

I thought that with Unassigned Devices you only mount one of the disks in the btrfs pool and ignore the other, since it will be mounted along with it?

Yes, still the same, just make sure old device is disconnected.

Link to post
15 minutes ago, JorgeB said:

Yes, still the same, just make sure old device is disconnected.

I'm curious how btrfs knows the new disk is its replacement, because that worked! Thank you!

I'm assuming to expand the btrfs filesystem (original was 2x2TB, and I just swapped one out with a 3TB), I just use this, correct? Since the total usable should be 5TB.

btrfs filesystem resize max /mnt/disks/PVEData

Currently,

root@NAS:/var/log# df -h /mnt/disks/PVEData/
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdi1       3.7T  1.7T  2.0T  47% /mnt/disks/PVEData

 

Link to post

So, I did more research and thought the resize would work, but after I ran it the size didn't change: "df -h /mnt/disks/PVEData" still shows the same output as before.

Thoughts?

Link to post
2 hours ago, joshbgosh10592 said:

I'm curious how btrfs knows the new disk is its replacement, because that worked!

btrfs recognizes the devices by the UUID stored on them, which is why it will recognize a clone.

 

1 hour ago, joshbgosh10592 said:

Thoughts?

As mentioned, the clone will have the same partition size as the source, so you first need to expand the partition, and only then the filesystem.
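
A hedged sketch of that order of operations, printed as a plan rather than executed. It assumes the clone is /dev/sdp with a single partition 1, and that `btrfs filesystem show` reports it as devid 2. Both numbers are placeholders to check on the real system, and the function name grow_plan is made up. Note that `btrfs filesystem resize` without a devid: prefix acts on devid 1 only, so on a multi-device pool the devid of the enlarged device has to be given explicitly.

```shell
# Print the two-step grow plan. Arguments: disk, partition number, btrfs devid, mount point.
grow_plan() {
    disk=$1 part=$2 devid=$3 mnt=$4
    # Step 1: extend the partition to the end of the disk.
    printf 'parted %s resizepart %s 100%%\n' "$disk" "$part"
    # Step 2: grow the filesystem on that specific pool member.
    printf 'btrfs filesystem resize %s:max %s\n' "$devid" "$mnt"
}

# devid 2 is an assumption; confirm it with: btrfs filesystem show /mnt/disks/PVEData
grow_plan /dev/sdp 1 2 /mnt/disks/PVEData
```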

Link to post
14 hours ago, JorgeB said:

Also note that with raid0 the extra space won't be usable anyway, so not much point.

Because it's RAID-0, wouldn't I be able to get the full size of both disks added, since there's nothing wasted to parity?

I'm trying to see how to expand it, but all documentation refers to the pool as inside /dev/. I'm assuming it would be the first disk in the pool, /dev/sdi.

Looking at the results of parted list, it only shows sdi1 and nothing about sdp, and the size is only 2TB, when the original size was 4TB. I'm wondering if the pool didn't accept the replacement?

Link to post
6 hours ago, joshbgosh10592 said:

Because it's RAID-0, wouldn't I be able to get the full size of both disks added, since there's nothing wasted to parity?

No, raid0 needs to stripe the data across at least two devices, so once the smaller one is full it can't write anymore. Only the single profile can use multiple devices with different capacities.

 

It's not the pool you need to expand, it's the partition on the cloned device, /dev/sdX1, but as mentioned it won't make any difference.
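
To put numbers on that for the 2TB + 3TB pool in this thread, here is a small worked example. It assumes, as stated above, that every raid0 chunk must be striped across both devices. The function names are made up and sizes are in whole TB.

```shell
# Usable capacity of a two-device pool, sizes given in whole TB.
# raid0: each chunk is striped across both devices, so allocation stops
# when the smaller device is full.
raid0_usable() {
    if [ "$1" -lt "$2" ]; then min=$1; else min=$2; fi
    echo $(( min * 2 ))
}

# single: chunks go to whichever device has free space, so the sizes add up.
single_usable() {
    echo $(( $1 + $2 ))
}

raid0_usable 2 3    # prints 4
single_usable 2 3   # prints 5
```

So under that assumption the pool stays at 4TB usable with raid0, and the extra 1TB on the new drive would only become available with the single profile.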

 

 

Link to post
