Unassigned Devices - Replace btrfs RAID-0 failing disk

joshbgosh10592 · January 14, 2021

I have a RAID-0 btrfs array using unassigned devices that I configured via command line. I have now received two warnings about Current pending sector showing 62, and then again at 148, so I'm assuming it needs to be removed.

I'm trying to follow steps here for replacement, but it seems to assume the drive has completely failed, and the steps are for a RAID1. I'm assuming it should be somewhat similar, but I don't want to lose the data. It's not important, but I'd rather not.

The directions say to use something similar to

btrfs replace start 7 /dev/sdf1 /mnt

Obviously replacing /mnt with the real mount path. I'm also assuming my path should be /mnt/disks/PVEData (since that's the name of the btrfs share)?

Thanks!

Edited January 14, 2021 by joshbgosh10592

JorgeB · January 14, 2021

Some info here:

https://forums.unraid.net/topic/46802-faq-for-unraid-v6/?do=findComment&comment=462135

joshbgosh10592 · January 18, 2021

On 1/14/2021 at 2:48 AM, JorgeB said:

Some info here:

https://forums.unraid.net/topic/46802-faq-for-unraid-v6/?do=findComment&comment=462135

Thank you! As a note for future me/Googlers, I used

btrfs replace start -f /dev/sdj1 /dev/sdp1 /mnt/disks/PVEData

Where sdj is the failing drive and sdp is the replacement drive. I also learned that you cannot have the trailing "/" or it'll error saying:

ERROR: source device must be a block device or a devid
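Note for later readers: below is a minimal sketch of the same replace plus a one-off progress check, not a tested procedure. The device names and mount point are the ones from this post; the `run` dry-run wrapper and the `replace_disk` helper are hypothetical names added for illustration. The post doesn't say which argument carried the trailing "/", so the sketch simply strips one from each argument.

```shell
#!/bin/bash
# Dry-run by default: commands are printed, not executed.
# Set DRY_RUN=0 only after double-checking the device names.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# replace_disk SRC DST MOUNTPOINT  (hypothetical helper)
replace_disk() {
    # Strip a single trailing "/" from each argument
    local src="${1%/}" dst="${2%/}" mnt="${3%/}"
    # -f overwrites any filesystem already present on the target device
    run btrfs replace start -f "$src" "$dst" "$mnt"
    # -1 prints the progress once instead of monitoring continuously
    run btrfs replace status -1 "$mnt"
}

replace_disk /dev/sdj1 /dev/sdp1 /mnt/disks/PVEData/
```

With the default dry-run setting this only prints the two btrfs commands, which makes it easy to confirm the paths before running anything for real.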

joshbgosh10592 · January 19, 2021

On 1/14/2021 at 2:48 AM, JorgeB said:

Some info here:

https://forums.unraid.net/topic/46802-faq-for-unraid-v6/?do=findComment&comment=462135

So, I ran this last night and it was progressing pretty well (about 10% an hour). However, it's still not finished and it's stuck on 94.5%, and in UD, it shows "command timedout" for sdj, and the "Current pending sector" count is climbing like crazy (yesterday it was 148, right now it's at 2171).

Is there something special that I should have done instead of the normal replace because of the errors the drive was throwing?

JorgeB · January 19, 2021

Since there's no redundancy in the pool there's no other way of replacing it, and if the device is failing the operation might also fail, in that case you'd need to destroy and re-create the pool.

joshbgosh10592 · January 20, 2021

22 hours ago, JorgeB said:

Since there's no redundancy in the pool there's no other way of replacing it, and if the device is failing the operation might also fail, in that case you'd need to destroy and re-create the pool.

Is there a way to flag the repair as something like "accept loss" when it comes across data it can't read? I'm just figuring I'm missing something, as it's not really failing, but rather it's run out of sectors to write to (as far as I understand), so there's probably a file or two that is corrupted and I'm willing to accept that.

JorgeB · January 20, 2021

If the device has pending sectors the problem is reading, not writing, you can try cloning it with ddrescue and then use the clone in the pool.
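Note for later readers: a rough sketch of what such a clone could look like with GNU ddrescue, assuming the pool is unmounted first and the whole disk (not just the partition) is cloned. The device names are the ones used in this thread; the map file path, the `run` dry-run wrapper and the `clone_disk` helper are hypothetical and only here for illustration.

```shell
#!/bin/bash
# Dry-run by default: commands are printed, not executed.
# Set DRY_RUN=0 only after double-checking source and destination.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# clone_disk SRC DST MAPFILE  (hypothetical helper)
clone_disk() {
    local src="$1" dst="$2" map="$3"
    # Pass 1: copy everything that reads cleanly and skip scraping bad areas (-n);
    # -f is required because the output is a block device
    run ddrescue -f -n "$src" "$dst" "$map"
    # Pass 2: return to the bad areas using direct disc access (-d), up to 3 retry passes (-r3)
    run ddrescue -f -d -r3 "$src" "$dst" "$map"
}

# The map file lets ddrescue resume and records which sectors could not be read
clone_disk /dev/sdj /dev/sdp /boot/ddrescue-sdj.map
```

Swapping source and destination would overwrite the only copy of the data, which is why the sketch prints the commands instead of running them unless `DRY_RUN=0` is set.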

joshbgosh10592 · January 20, 2021

15 hours ago, JorgeB said:

If the device has pending sectors the problem is reading, not writing, you can try cloning it with ddrescue and then use the clone in the pool.

Thank you! I'm working on cleaning anything off that pool that I can, but how would I swap the failing with the new? The replacement drive is larger than the existing, but I'm assuming that's just a btrfs resize command.

JorgeB · January 21, 2021

To use ddrescue you'd need to clone to a same size device, to keep the partition valid for Unraid.

joshbgosh10592 · January 22, 2021

16 hours ago, JorgeB said:

To use ddrescue you'd need to clone to a same size device, to keep the partition valid for Unraid.

I figured there would be a way to clone to a larger disk and just resize the pool afterwards?

Regardless though, how would I tell the pool that instead of sdj, use sdp?

Thank you so far!!

JorgeB · January 22, 2021

7 hours ago, joshbgosh10592 said:

I figured there would be a way to clone to a larger disk and just resize the pool afterwards?

Not easily, because the partition would remain the original size, you'd need to extend that first before extending the filesystem.

7 hours ago, joshbgosh10592 said:

how would I tell the pool that instead of sdj, use sdp?

Stop the array, if Docker/VM services are using the cache pool disable them, unassign all cache devices, start array to make Unraid "forget" current cache config, stop array, disconnect old failing device, assign the remaining old and new pool devices (there can't be an "All existing data on this device will be OVERWRITTEN when array is Started" warning for any cache device), re-enable Docker/VMs if needed, start array.

joshbgosh10592 · January 22, 2021

12 hours ago, JorgeB said:

Not easily, because the partition would remain the original size, you'd need to extend that first before extending the filesystem.

Stop the array, if Docker/VM services are using the cache pool disable them, unassign all cache devices, start array to make Unraid "forget" current cache config, stop array, disconnect old failing device, assign the remaining old and new pool devices (there can't be an "All existing data on this device will be OVERWRITTEN when array is Started" warning for any cache device), re-enable Docker/VMs if needed, start array.

I'd have to do all of that to tell btrfs to use a different drive in one of my unassigned devices btrfs pools? I don't have a cache in that. I'm confused.

JorgeB · January 23, 2021

11 hours ago, joshbgosh10592 said:

I'd have to do all of that to tell btrfs to use a different drive in one of my unassigned devices btrfs pools?

Sorry, forgot it was an unassigned pool, in that case you just need to unmount and remount with the clone and the remaining old device (still disconnect the bad device), also in this case no problem cloning to a larger device.
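Note for later readers: in Unassigned Devices the unmount and mount are normally done with the buttons in the webUI. A command-line equivalent might look roughly like the sketch below, assuming /dev/sdi1 is the remaining healthy member (as in the df output later in the thread). The `run` wrapper and `remount_pool` helper are hypothetical names.

```shell
#!/bin/bash
# Dry-run by default: commands are printed, not executed.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# remount_pool MOUNTPOINT MEMBER_PARTITION  (hypothetical helper)
remount_pool() {
    local mnt="$1" member="$2"
    run umount "$mnt"
    # The failing disk has to be physically disconnected before continuing:
    # the clone carries the same filesystem UUID as the original.
    # Re-scan so the kernel registers the current set of btrfs member devices
    run btrfs device scan
    # Mounting any one member brings up the whole multi-device filesystem
    run mount -t btrfs "$member" "$mnt"
    # List the members to confirm the clone is now part of the pool
    run btrfs filesystem show "$mnt"
}

remount_pool /mnt/disks/PVEData /dev/sdi1
```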

joshbgosh10592 · January 23, 2021

1 hour ago, JorgeB said:

Sorry, forgot it was an unassigned pool, in that case you just need to unmount and remount with the clone and the remaining old device (still disconnect the bad device), also in this case no problem cloning to a larger device.

No worries! So then via the unRAID webUI or btrfs commands, and how? I thought that when telling unassigned devices, you only tell one of the disks in the btrfs pool to mount and ignore the other, as it'll just mount with it? Just trying to get it straight so I don't lose anything.

JorgeB · January 23, 2021

1 minute ago, joshbgosh10592 said:

I thought that when telling unassigned devices, you only tell one of the disks in the btrfs pool to mount and ignore the other, as it'll just mount with it?

Yes, still the same, just make sure old device is disconnected.

joshbgosh10592 · January 23, 2021

15 minutes ago, JorgeB said:

Yes, still the same, just make sure old device is disconnected.

I'm curious how btrfs knows the new disk is its replacement, because that worked! Thank you!

I'm assuming to expand the btrfs filesystem (original was 2x2TB, and I just swapped one out with a 3TB), I just use this, correct? Since the total usable should be 5TB.

btrfs filesystem resize max /mnt/disks/PVEData

Currently,

root@NAS:/var/log# df -h /mnt/disks/PVEData/
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdi1       3.7T  1.7T  2.0T  47% /mnt/disks/PVEData

joshbgosh10592 · January 23, 2021

So, I did more research on it and I thought that would work, but after I ran it, the size didn't seem to change - it still shows the same as before in "df -h /mnt/disks/PVEData".

Thoughts?

JorgeB · January 23, 2021

2 hours ago, joshbgosh10592 said:

I'm curious how btrfs knows the new disk is its replacement, because that worked!

btrfs recognizes the devices by the UUID, that's why it will recognize a clone.

1 hour ago, joshbgosh10592 said:

Thoughts?

Like mentioned, the clone will have the same partition size as the source, so you first need to expand the partition, only then the filesystem.
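Note for later readers: a sketch of the two-step order described above, under several assumptions. It assumes the clone is /dev/sdp with a single partition, that `parted` is available, and that the clone is devid 2 in the pool (check with `btrfs filesystem show`). On a multi-device filesystem a plain `resize max` only acts on devid 1, so the devid is given explicitly. The `run` wrapper and `grow_clone` helper are hypothetical names.

```shell
#!/bin/bash
# Dry-run by default: commands are printed, not executed.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# grow_clone DISK MOUNTPOINT DEVID  (hypothetical helper)
grow_clone() {
    local disk="$1" mnt="$2" devid="$3"
    # Step 1: extend partition 1 to the end of the larger disk
    # (safest with the pool unmounted; parted may ask to fix the partition table first)
    run parted -s "$disk" resizepart 1 100%
    # Step 2: with the pool mounted, grow the btrfs device with that devid to fill the partition
    run btrfs filesystem resize "${devid}:max" "$mnt"
    # Show per-device allocation to see whether the new size was picked up
    run btrfs filesystem usage "$mnt"
}

grow_clone /dev/sdp /mnt/disks/PVEData 2
```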

JorgeB · January 23, 2021

Also note that with raid0 the extra space won't be usable anyway, so not much point.

joshbgosh10592 · January 24, 2021

14 hours ago, JorgeB said:

Also note that with raid0 the extra space won't be usable anyway, so not much point.

Because it's RAID-0, wouldn't I be able to get the full size of both disks added, since there's nothing wasted to parity?

I'm trying to see how to expand it, but all documentation refers to the pool as inside /dev/. I'm assuming it would be the first disk in the pool, /dev/sdi.

Looking at the results of parted list, it only shows sdi1 and nothing about sdp, and the size is only 2TB, when the original size was 4TB. I'm wondering if the pool didn't accept the replacement?

JorgeB · January 24, 2021

6 hours ago, joshbgosh10592 said:

Because it's RAID-0, wouldn't I be able to get the full size of both disks added, since there's nothing wasted to parity?

No, raid0 needs to stripe the data to at least two devices, so when the first one is full it can't write anymore, only the single profile can use multiple devices with different capacities.

It's not the pool you need to expand, it's the partition on the cloned device, /dev/sdX1, but like mentioned it won't make any difference.
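Note for later readers: the capacity point can be checked with quick arithmetic. This sketch assumes a two-device pool, whole-terabyte sizes, and ignores metadata overhead; the function names are made up for illustration.

```shell
#!/bin/bash
# raid0_usable SIZE_A SIZE_B
# Two-device raid0 stripes every chunk across both devices,
# so usable space is twice the size of the smaller one.
raid0_usable() {
    local a="$1" b="$2"
    local min=$(( a < b ? a : b ))
    echo $(( 2 * min ))
}

# single_usable SIZE_A SIZE_B
# The single profile allocates chunks on one device at a time,
# so it can use the full capacity of both.
single_usable() {
    echo $(( $1 + $2 ))
}

echo "raid0:  $(raid0_usable 2 3) TB"    # prints "raid0:  4 TB"
echo "single: $(single_usable 2 3) TB"   # prints "single: 5 TB"
```

So with a 2TB and a 3TB disk, raid0 still tops out at 4TB, which matches the 3.7T that df reports. Getting at the fifth terabyte would mean converting the data profile to single (for example with `btrfs balance start -dconvert=single` on the mount point), which nobody in this thread tried.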
