dannygreg

Replacing missing drive in btrfs pool

Hey folks,

I had a cache drive go bad a couple of days ago, which put the pool into btrfs's read-only failure mode. I had plenty of headroom in the pool, so my initial thought was to just shrink it from 4 disks to 3. That didn't help: the filesystem was still mounted read-only.

I just put a fresh drive in (same size as before) and set the pool back to 4 disks, but I was still seeing the same result.

I ran `btrfs fi show` and was greeted with our friend `*** Some devices missing`, despite also seeing the new drive added to the pool. This made sense in retrospect, as the GUI had performed an addition, not a replace.

I stopped the array and mounted the pool in degraded mode, then deleted the new drive from the pool and ran `btrfs replace start 6 /dev/sdp1 /mnt/cache` (/dev/sdp1 being the new drive). This didn't give an error, which I thought was great news… until I ran `btrfs replace status /mnt/cache` and saw the output `Never started`.
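For anyone following along, the sequence described above can be written out as a dry-run sketch. It only prints the commands unless `DRY_RUN=0` is set. The devid `6`, the target `/dev/sdp1`, and the mount point `/mnt/cache` come from this post; using `/dev/sdm1` as the surviving member to mount from, and the helper names `run` and `replace_missing`, are assumptions for illustration.

```shell
#!/bin/sh
# Dry-run by default: commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical wrapper around the steps from the post.
# Args: <missing devid> <new device> <mount point> <surviving member>
replace_missing() {
    devid="$1"; target="$2"; mnt="$3"; member="$4"
    # Mount the pool degraded from a surviving member (assumed device).
    run mount -o degraded "$member" "$mnt" &&
    # Start the replace by devid. Without -B this returns immediately
    # and the replace runs in the background.
    run btrfs replace start "$devid" "$target" "$mnt" &&
    # Check progress separately; the post saw "Never started" here.
    run btrfs replace status "$mnt"
}
```

Calling `replace_missing 6 /dev/sdp1 /mnt/cache /dev/sdm1` in dry-run mode prints the three commands in order, which makes it easy to check the devid and device paths before running anything for real.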

No matter what I do, I can't convince btrfs to replace the dead drive.

At this point, I'm more than happy to nuke the cache entirely and build a new filesystem. Everything is backed up, and it wouldn't take long to copy it back over and rebuild the Docker image etc. I'm mostly just looking to get back up and running as quickly as possible.

Would love any suggestions, either for nuking the entire cache, or getting a replace operation to run.

Here's the output of `btrfs fi show`:

Label: none  uuid: b0707364-0ca2-443d-a5cf-ca6cf64e5bbb
        Total devices 4 FS bytes used 101.96GiB
        devid    3 size 232.89GiB used 90.03GiB path /dev/sdm1
        devid    4 size 232.89GiB used 60.00GiB path /dev/sdn1
        devid    5 size 232.89GiB used 91.03GiB path /dev/sdo1
        *** Some devices missing
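
The gap between `Total devices` and the listed `devid` lines can be checked mechanically. This is a minimal sketch, assuming the output format shown above and a single filesystem in the input; the function name `count_missing` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads `btrfs fi show` output for ONE filesystem on
# stdin and prints how many devices are expected but not listed.
count_missing() {
    awk '
        /Total devices/ { total = $3 }   # "Total devices 4 FS bytes used ..."
        $1 == "devid"   { present++ }    # one line per device that was found
        END { print total - present }
    '
}
```

Used as `btrfs fi show /mnt/cache | count_missing`, the output above gives `1`: four devices expected, three listed (devids 3, 4 and 5). It does not tell you which devid is the missing one.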

If the filesystem gets mounted read-only when attempting to delete the missing device, there's likely corruption, though we'd need the diagnostics captured after doing that to confirm. If that's the case, it's best to re-format the pool and start over.

You can use

wipefs -a /dev/sdX1

then

wipefs -a /dev/sdX

or

blkdiscard /dev/sdX (for SSDs only)

Do this for every cache device, after backing up any remaining data.
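
As a rough sketch of doing that for several devices in one go: the loop below wipes partition 1 first, then the whole disk, matching the order above. It prints the commands by default and only runs them with `DRY_RUN=0`. The function name `wipe_cache_devices` is hypothetical, and appending `1` for the partition assumes `sdX`-style names (it would not fit NVMe naming).

```shell
#!/bin/sh
# Dry-run by default: prints the wipefs commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper. Args: whole-disk device paths, e.g. /dev/sdX
wipe_cache_devices() {
    for dev in "$@"; do
        # Partition first, then the whole disk (assumes sdX-style naming).
        for target in "${dev}1" "$dev"; do
            if [ "$DRY_RUN" = "1" ]; then
                echo "+ wipefs -a $target"
            else
                wipefs -a "$target" || return 1
            fi
        done
    done
}
```

For example, `wipe_cache_devices /dev/sdm /dev/sdn /dev/sdo /dev/sdp` would cover the devices named earlier in the thread, assuming the drive letters have not changed since; check them before setting `DRY_RUN=0`, since `wipefs -a` is destructive.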
