
Cache Pool - Remove a disk


Solved by JorgeB.


Good morning all,   

 

I accidentally mis-allocated a new NVMe drive I put in the system. I meant to make it a new cache pool to use for VMs only, but instead added it to the existing cache pool, which houses mover/appdata/etc.

 

I would like to remove it, but it seems that breaks the entire cache pool if I do that. What is the procedure for removing a disk from a cache pool?

 

Screenshots attached of the cache details.  

12 hours ago, johntdyer said:

I'm having the same issue

Not really the same, your cache was xfs:

 

Jan 13 11:50:17 vault  emhttpd: shcmd (71): mkdir -p /mnt/cache
Jan 13 11:50:17 vault  emhttpd: shcmd (72): mount -t xfs -o noatime,nouuid /dev/sdc1 /mnt/cache
Jan 13 11:50:17 vault kernel: XFS (sdc1): Mounting V5 Filesystem
Jan 13 11:50:17 vault kernel: XFS (sdc1): Starting recovery (logdev: internal)
Jan 13 11:50:17 vault kernel: XFS (sdc1): Ending recovery (logdev: internal)
Jan 13 11:50:17 vault kernel: xfs filesystem being mounted at /mnt/cache supports timestamps until 2038 (0x7fffffff)

 

You cannot add a device to an XFS pool; XFS pools support a single device only. You can go back to the single XFS cache pool, or back up the data and then reformat the pool as btrfs.
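
As an aside, a quick way to confirm which filesystem a pool device is actually using (besides reading the syslog above) is blkid; this is just an illustrative check, using /dev/sdc1 only as an example device:

# blkid /dev/sdc1

The TYPE= field in the output will report xfs or btrfs.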

  • 1 year later...
Posted (edited)

Stop the Array

Remove the device in the cache pool

Go into Array Operations

There will be a note next to the Start button (under it there is a checkbox to start the array with a missing device in the cache pool)
Select the checkbox and start the array

[screenshots: Array Operations page showing the checkbox to start the array with a missing cache device]

 

Start the array

Edited by ximian
  • 3 weeks later...
Posted (edited)

I'm not sure if it's the exact same issue, but I am having a problem that sounds very similar to this. Context (as best as I can remember the specific steps/etc):

 

I had an existing (legacy) BTRFS cache drive (250GB SSD), and I wanted to add a new 1TB SSD alongside it.

 

At first I hadn't looked deeply into the new 'pools' config and how it works, so just expanded the slots of my existing 'cache' pool, and added the new 1TB drive to it. I then started up my array, and noticed that it seemed to be combining both of those drives together; which didn't seem to be what I wanted.

I stopped the array, watched a video about the pools (Ref), and decided I wanted to create a new pool to add the 1TB drive to. So I:

  • unassigned the 1TB disk from the existing 'cache' pool
  • set the 'cache' pool slots back to 1
  • renamed the 'cache' pool to 'cache_250gb'
  • created a new 'cache_1tb' pool with a single slot
  • added the 1TB disk to the 'cache_1tb' pool (I think as XFS, but not 100% sure)
  • started my array

At this point, I expected both pools to work, or maybe the 'cache_1tb' to need to be formatted or similar, but both pools ended up with an error message like this:

  • Unmountable: Unsupported or no file system

My guess is that somewhere in that process I confused Unraid about what the old 250GB disk in 'cache_250gb' (previously just 'cache') is, and that perhaps it's still expecting to load 2 disks in that pool. I'm hoping it will just need a little metadata tweak/repair, and not that the drive is actually fully corrupted (as I stupidly didn't think I would need to make a full backup of it just to add a new disk).

I also tried unassigning the 1TB drive from both pools and restarting the array with the "Start will remove the missing cache disk and then bring the array on-line" option enabled, but it still didn't seem to help. My guess is that if I had done that when I first unassigned the 1TB drive from the 'cache_250gb' pool, and started the array before creating the 'cache_1tb' pool and assigning the 1TB drive to it, it might have 'just worked'.

 

Attached diagnostics (Ref) :
dalekanium-diagnostics-20240524-1725.zip
 

 

I don't necessarily know what I'm looking for within it, but I noticed that in those diagnostics `./config/pools/cache_250gb.cfg` has `diskFsProfile="raid1"`, and I wonder if that should be `diskFsProfile=""` now that there is only a single slot in that pool.
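
(As an aside: once the pool does mount, one way to see which profile the on-disk btrfs filesystem is actually using, single vs raid1, is `btrfs filesystem df`. Illustrative only, assuming the pool mounts at /mnt/cache_250gb:)

# btrfs filesystem df /mnt/cache_250gb

The Data and Metadata lines in the output report the profile, e.g. "Data, single: ..." or "Data, RAID1: ...".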

 

Edit: Further info:

 

I was skimming through this thread:

 

And saw that one of the commands was `btrfs filesystem show`, so I decided to SSH into my Unraid server (while the array was still stopped) and run it, which gave me the following:

 

# btrfs filesystem show
Label: none  uuid: f71d2fa8-91a0-4495-8baf-e1f77bd40eb2
	Total devices 2 FS bytes used 227.24GiB
	devid    1 size 232.89GiB used 232.89GiB path /dev/sdb1
	devid    2 size 931.51GiB used 0.00B path /dev/sdj1

 

That would seem to correlate with the 2 SSDs, and it looks as though they may still be seen as a single 'combined pool', at least at this level.

 

To confirm that theory:

# smartctl -i /dev/sdb | grep "Device Model"
Device Model:     Samsung SSD 850 EVO 250GB

# smartctl -i /dev/sdj | grep "Device Model"
Device Model:     Samsung SSD 870 EVO 1TB

 

Edit 2: Looking through some other related threads led me to the FAQ, which in hindsight explains the 'proper' way to do this; though it's a little frustrating that the caveats/process it describes for how/when to remove things seemingly aren't checked/enforced by the Unraid GUI at all, to prevent issues like this from occurring in the first place:

 

There are some commands there for how to change the mode, though I'm hesitant to run them at the moment, and they look like they might need to be done while the array is online (at least the ones that reference `/mnt/cache`):
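
(The FAQ's exact commands aren't quoted here, but a btrfs profile conversion generally looks something like the following; a sketch only, assuming the pool is mounted at /mnt/cache_250gb, which does indeed require the array to be started:)

# btrfs balance start -f -dconvert=single -mconvert=single /mnt/cache_250gb

The -dconvert/-mconvert filters change the data and metadata profiles, and -f is needed when reducing metadata redundancy (e.g. raid1 to single).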

 

Clicking through to the 'cache_250gb' pool in the Unraid GUI, in the 'Balance Status' section, says that 'Balance is only available when array is Started'.

 

This forum thread suggested that "You have to wipe or disconnect the SSD you want to remove, or you'll get an unmountable cache pool":

 

Since wiping sounds scary at this point, I just removed it (hot swappable drive bays ftw), which showed an Unraid GUI warning:

Unraid Cache_250gb disk message: 24-05-2024 18:10
Warning [REDACTED] - Cache pool BTRFS missing device(s)
Samsung_SSD_850_EVO_250GB_S3NYNF0J906569J (sdb)

 

After which, I ran the following again:

# btrfs filesystem show
Label: none  uuid: f71d2fa8-91a0-4495-8baf-e1f77bd40eb2
	Total devices 2 FS bytes used 227.24GiB
	devid    1 size 232.89GiB used 232.89GiB path /dev/sdb1
	*** Some devices missing

 

Since this is now looking a little more positive (relatively), I decided to try starting the array again to see what would happen; this time my 'cache_250gb' pool was able to mount (though seemingly with an incorrect size that still included the space of the removed disk):

[screenshot: the cache_250gb pool mounted in the Unraid GUI]

 

And running this command again:

# btrfs filesystem show
Label: none  uuid: f71d2fa8-91a0-4495-8baf-e1f77bd40eb2
	Total devices 2 FS bytes used 227.24GiB
	devid    1 size 232.89GiB used 232.89GiB path /dev/sdb1
	devid    2 size 0 used 0 path /dev/sdj1 MISSING

 

Since the array is running, and the disk mounted, I thought I might be able to follow this, and remove the device:

 

 

But it seems not:

 

# btrfs device remove /dev/sdj1 /mnt/cache_250gb
ERROR: not a block device: /dev/sdj1

 

Edit 3: Reading `btrfs device remove --help`:

 

# btrfs device remove --help
usage: btrfs device remove <device>|<devid> [<device>|<devid>...] <path>

    Remove a device from a filesystem

    Remove a device from a filesystem, specified by a path to the device or
    as a device id in the filesystem. The btrfs signature is removed from
    the device.
    If 'missing' is specified for <device>, the first device that is
    described by the filesystem metadata, but not present at the mount
    time will be removed. (only in degraded mode)
    If 'cancel' is specified as the only device to delete, request cancellation
    of a previously started device deletion and wait until kernel finishes
    any pending work. This will not delete the device and the size will be
    restored to previous state. When deletion is not running, this will fail.

    --enqueue                 wait if there's another exclusive operation running, otherwise continue

 

It sounds like I could do this manually by mounting the filesystem in degraded mode and then removing the 'missing' device. According to ChatGPT, the following should mount in degraded mode and then remove the 'missing' device:

sudo mount -o degraded /mnt/cache

sudo btrfs device remove missing /mnt/cache
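
(Note: as written, that mount line relies on an fstab entry for /mnt/cache; in practice mount usually needs the device too. A more complete sketch, assuming /dev/sdb1 is the remaining pool member and using this pool's actual mountpoint:)

# mount -o degraded /dev/sdb1 /mnt/cache_250gb
# btrfs device remove missing /mnt/cache_250gb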

 

So, with the 1TB drive still removed, and the array started:

# btrfs filesystem show
Label: none  uuid: f71d2fa8-91a0-4495-8baf-e1f77bd40eb2
	Total devices 2 FS bytes used 227.24GiB
	devid    1 size 232.89GiB used 232.89GiB path /dev/sdb1
	devid    2 size 0 used 0 path /dev/sdj1 MISSING
    
# btrfs device remove missing /mnt/cache_250gb

# btrfs filesystem show
Label: none  uuid: f71d2fa8-91a0-4495-8baf-e1f77bd40eb2
	Total devices 1 FS bytes used 227.24GiB
	devid    1 size 232.89GiB used 232.89GiB path /dev/sdb1

 

This is looking good... so I stopped the array and inserted the 1TB drive again, but that just led to this again:

# btrfs filesystem show
Label: none  uuid: f71d2fa8-91a0-4495-8baf-e1f77bd40eb2
	Total devices 1 FS bytes used 227.24GiB
	devid    1 size 232.89GiB used 232.89GiB path /dev/sdb1
	devid    2 size 931.51GiB used 0.00B path /dev/sdj1

And if I start it in this state, I just get back to this error:

  • "Unmountable: Unsupported or no file system"

Stopping the array again, I decided to set the 'cache_250gb' pool back to 2 slots, and just add the 1TB device back to it, which shows this warning:

  • "All existing data on this device will be OVERWRITTEN when array is Started"

Then started the array again, but that just leads to both the 250GB and 1TB devices showing as:

  • "Unmountable: Unsupported or no file system"

Stopping the array, and removing the 1TB device from slot 2, then attempting to start the array again, ensuring to tick the box next to this:

  • "Start will remove the missing cache disk and then bring the array on-line. Yes, I want to do this"

Leads to this error (with the array failing to start):

  • "Wrong Pool State: cache_250gb - too many missing/wrong devices"

Removing the 1TB SSD from the drive bay and then trying again still results in the same error.

Edited by devalias
46 minutes ago, devalias said:
  • unassigned the 1TB disk from the existing 'cache' pool
  • set the 'cache' pool slots back to 1
  • renamed the 'cache' pool to 'cache_250gb'
  • created a new 'cache_1tb' pool with a single slot
  • added the 1TB disk to the 'cache_1tb' pool (I think as XFS, but not 100% sure)
  • started my array

That would never work.

 

Unassign both pool devices, start the array, stop the array, set one of the pools to 2 slots, assign both devices there, start the array, and post the diags.

1 hour ago, JorgeB said:

That would never work.


Well, it's unfortunate that the GUI didn't give me any indication/warning about that before it happened. But since it has already happened, let's work with what we have now.

 

Unassigned both devices:

# btrfs filesystem show
Label: none  uuid: f71d2fa8-91a0-4495-8baf-e1f77bd40eb2
	Total devices 1 FS bytes used 227.24GiB
	devid    1 size 232.89GiB used 232.89GiB path /dev/sdb1

 

Started the array, making sure to select this:

  • "Start will remove the missing cache disk and then bring the array on-line. Yes, I want to do this"

Stopped the array, assigned the 250GB to the 'cache_250gb' pool's first slot, and the 1TB to the 2nd slot, then started the array:

# btrfs filesystem show
Label: none  uuid: f71d2fa8-91a0-4495-8baf-e1f77bd40eb2
	Total devices 1 FS bytes used 227.24GiB
	devid    1 size 232.89GiB used 232.89GiB path /dev/sdb1
	devid    2 size 931.51GiB used 0.00B path /dev/sdj1

And the Unraid GUI shows the following against both drives:

  • "Unmountable: Unsupported or no file system"

Diagnostics:

1 hour ago, devalias said:

Well, it's unfortunate that the GUI didn't give me any indication/warning about that before it happened.

I think some step is missing from your description, since after removing a pool device, you should not be allowed to change the cache slots.

 

The pool failed to balance previously because the 250GB SSD didn't have enough space (note that the size and used space are the same). Not sure it's going to mount now, but try this:

 

With the array stopped type:

 

echo 1 > /sys/block/sdj/device/delete

 

Wait 10 secs and refresh the GUI; the 1TB cache device should drop offline (to get it back you just need to reboot later). Start the array with that pool device missing (leaving the slots set to two, as they are) and post new diags.
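
(Before starting the array, a quick illustrative sanity check that the kernel really dropped the device, assuming it was /dev/sdj:)

# ls -l /dev/sdj
# grep sdj /proc/partitions

The first should fail with "No such file or directory" and the second should print nothing once the device has been deleted.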

 

 

 

 

 

 

Posted (edited)
13 hours ago, JorgeB said:

I think some step is missing from your description, since after removing a pool device, you should not be allowed to change the cache slots.

Interesting... I can't guarantee that my description isn't missing any steps, but as a 'current' example, here is the UI showing the pool with 2 slots and the missing device, and I can definitely change those slots. I believe this is the same sort of state it was in when I first got myself into this situation. Maybe a bug, if that's not meant to be possible?

[screenshot: the cache_250gb pool showing 2 slots with the 1TB device missing, slots still selectable]

 

Stopped the array, ran that command, and can see the 1TB is offline (see screenshot above). Started the array (confirming "Start will remove the missing cache disk and then bring the array on-line. Yes, I want to do this"), and the single drive is able to mount:

[screenshot: array started with the single 250GB device mounted in cache_250gb]

 

# btrfs filesystem show
Label: none  uuid: f71d2fa8-91a0-4495-8baf-e1f77bd40eb2
	Total devices 1 FS bytes used 227.24GiB
	devid    1 size 232.89GiB used 232.89GiB path /dev/sdb1

 

[screenshot]

 

Diagnostics:

 

Looking at `./config/pools/cache_250gb.cfg` I can see that it is `diskFsProfile="single"` now, which seems better than the earlier `diskFsProfile="raid1"`.

 

After stopping the array again, the 2nd slot is no longer showing as a 'missing' disk:

[screenshot: the cache_250gb pool with the 2nd slot no longer showing a missing disk]

 

I think I got to this state before (or something that looked like it) on my own, but then when the 1TB disk was re-introduced, it seemed to break things. I'll wait for your next steps so as not to assume; but wondering if that will be the case again.

Edited by devalias
9 hours ago, devalias said:

Interesting.. I can't guarantee that my description isn't missing any steps; but as a 'current' example, here is the UI showing the pool with 2 slots, and the missing device, and I can definitely change those slots. Which I believe is the same sort of state it was in when I first got myself into this situation. Maybe a bug, if that's not meant to be possible?

Hmm, not sure how that is happening, if I unassign a pool device I see this:

 

[screenshot: a pool with an unassigned device, the slots selector greyed out]

 

Note that the number of slots is greyed out, it's not selectable. I thought it might be different if the device is missing vs. unassigned, but I see the same if I make the device drop. Can you detail all the steps you did, to see if you are doing something different?

 

Regarding the pool, good that it's mounting now but I do see a warning:

 

May 25 09:54:29 Dalekanium kernel: BTRFS warning (device sdb1): devid 1 physical 0 len 4194304 inside the reserved space

 

It's just a warning, not an error, so it may not be serious, and from what I can see a balance may fix it. You have two options:

 

- backup that pool somewhere else and then reformat

- free up some space on that filesystem, since it's very full (at least 20GB), then run a balance to see if the warning goes away; before doing this I would recommend making sure anything important in the pool is backed up somewhere else, in case it goes wrong.
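
(For reference, the second option can also be done from the command line; a minimal sketch, assuming the pool is mounted at /mnt/cache_250gb and enough space has already been freed up:)

# btrfs balance start /mnt/cache_250gb

A filtered balance such as `btrfs balance start -dusage=75 /mnt/cache_250gb` only rewrites partly-filled data chunks and is usually quicker than a full balance.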

 

 

 

19 hours ago, JorgeB said:

Note that the number of slots is greydout, it's not selectable, I thought that it might be different if the device is missing, vs you unassign it, but I see the same if I make the device drop, can you detail all the steps done to see if you are doing something different?


I can't look deeper into this at the moment; but can see if I can mimic that state later on. Immediately off the top of my head, my first wonder was whether there is some special handling for a pool called 'cache' (yours) vs something else (mine); and/or if you're running a different version of Unraid to me, that may have better protections that mine doesn't.

 

19 hours ago, JorgeB said:

It's just a warning, not an error, so it may not be serious, and from what I can see a balance may fix it, so you have two options:

 

- backup that pool somewhere else and then reformat

- free up some space on that filesystem, since it's very full, at least 20GB, then run a balance to see if the error goes away, before doing this I would recommend at least making sure anything important in the pool is backed up somewhere else, in case it goes wrong.

So with this, my high-level plan was to reduce the space used on this drive (hence adding the new 1TB drive). The Docker image is ~120GB, so I can easily move that off to free up the space, but ideally I would like to understand a bit of the 'why' of things before doing so, in case it's helpful to others (or me again) here in future.

 

I'm not sure what the 'balance' is meant to do, but given my desired end goal is to have 2 completely independent drives in their own 1 slot pools; I wonder if the 'balance' is needed/useful/etc. What is expected to happen if/when it is able to 'balance' properly?

 

Also, I guess my main question at the moment relates to the 1TB drive. We told the system to 'remove' it, which will be undone at the next reboot, but I haven't rebooted yet. Previously, when I achieved something similar (I think) by removing the 1TB drive from the hot-swap bay, it somehow seemed to get 'reconnected' into the BTRFS RAID when I added it back (even though I had removed it from the pool); I'm wondering/worried whether that is likely to happen again, and if it does, what the expected 'solution' would be. Is there something on that 1TB disk that 'remembers' it was part of a BTRFS RAID that would 're-add' it now (and that we need to clean up/clear)?

 

I don't have time to do the reboot/check right now, so it may be that it will all 'just work' and not be an issue; but I figured I would ask the question up front to save another back and forth delay later on when I get a chance to try it properly.

Thanks for your assistance so far! 🖤

5 hours ago, devalias said:

is some special handling for a pool called 'cache' (yours) vs something else (mine)

It's the same for any pool, name doesn't matter.

 

5 hours ago, devalias said:

if you're running a different version of Unraid to me

I used the same one to confirm, v6.12.10

 

5 hours ago, devalias said:

I'm not sure what the 'balance' is meant to do, but given my desired end goal is to have 2 completely independent drives in their own 1 slot pools; I wonder if the 'balance' is needed/useful/etc. What is expected to happen if/when it is able to 'balance' properly?

The balance would be to see if it resolves that warning.

 

When you reboot you should disable array auto-start, and before array start wipe the 1TB device with 

blkdiscard -f /dev/sdX

 

Replace X with the correct letter; this way it will no longer interfere with the current pool.
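
(One illustrative way to double-check which letter the 1TB SSD currently has before wiping anything; not part of the instructions above:)

# lsblk -d -o NAME,MODEL,SIZE

The MODEL column should make it obvious which entry is the Samsung SSD 870 1TB.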

 

 

Posted (edited)
On 5/26/2024 at 7:49 PM, JorgeB said:

It's the same for any pool, name doesn't matter.

_nods_ fair enough. Was just thinking along the lines of perhaps there was some leftover legacy code/bug where it was handled that way for the 'old style' cache pool, but may not have been for others.

 

On 5/26/2024 at 7:49 PM, JorgeB said:

I used the same one to confirm, v6.12.10

Good to know. I always like to double check rather than make assumptions.

 

On 5/26/2024 at 7:49 PM, JorgeB said:

The balance would be to see if it resolves that warning.

Deeper explanation via ChatGPT:

 

The warning message you're seeing from the BTRFS file system indicates an issue with a device (`sdb1`) where the physical location (`physical 0`) of a certain block (`len 4194304`) is inside the reserved space. This could point to a corruption or inconsistency in the file system's metadata.

**Explanation of the Warning:**
- `devid 1`: Refers to the device ID within the BTRFS file system.
- `physical 0`: The physical offset on the device where the issue is detected.
- `len 4194304`: Length of the block in bytes.
- `inside the reserved space`: Indicates that this block is located in a region reserved for file system metadata or other internal purposes.

**Why a 'Balance' Could Be a Potential Solution:**

A BTRFS balance operation is designed to reallocate the chunks of data and metadata across the available devices. This can help to:
1. **Redistribute Data and Metadata:** The balance operation redistributes data and metadata chunks, which can help to resolve allocation issues or inconsistencies.
2. **Repair Fragmentation:** Over time, as data is written and deleted, fragmentation can occur. Balancing helps to defragment the file system.
3. **Correct Metadata Issues:** The operation can potentially correct inconsistencies in the metadata, as it moves and rewrites chunks.

 

On 5/26/2024 at 7:49 PM, JorgeB said:

When you reboot you should disable array auto-start


To do this, we can go to Unraid -> Settings -> Disk Settings -> Enable auto start; and set it to 'No'.

 

On 5/26/2024 at 7:49 PM, JorgeB said:

When you reboot you should disable array auto-start, and before array start wipe the 1TB device with 

blkdiscard -f /dev/sdX

 

replace X with the correct letter, this way it will no longer interfere with the current pool.

 

Deeper explanation via ChatGPT:
 

**`blkdiscard -f /dev/sdX`**

- **`blkdiscard`:** This command is used to discard (or trim) all blocks on a device. It is commonly used with SSDs to inform the device which blocks of data are no longer considered in use and can be erased internally.
- **`-f`:** Force the discard operation. This option forces the command to proceed without prompting for confirmation.
- **`/dev/sdX`:** The device to be wiped. Replace `X` with the appropriate letter corresponding to the 1TB device (e.g., `/dev/sdb`).

 

And in asking ChatGPT if there was a less 'scorched earth' approach to checking/removing the BTRFS metadata from the 1TB drive:

Yes, there are less drastic ways to inspect and deal with BTRFS metadata or similar data on the drive before resorting to `blkdiscard`. Here are some methods you can use to check for BTRFS metadata and potentially clean the drive in a less destructive manner:

### Checking for BTRFS Metadata

1. **Identify the Device:**
   First, make sure you know the correct device identifier (e.g., `/dev/sdb`). You can use `lsblk` or `fdisk -l` to list all block devices and their partitions:

   ```bash
   lsblk
   ```

   or

   ```bash
   fdisk -l
   ```

2. **Check for BTRFS Filesystem:**
   Use the `btrfs` command to check if a BTRFS filesystem exists on the device:

   ```bash
   btrfs check /dev/sdX
   ```

   Replace `sdX` with the correct device identifier. This command will perform a read-only check of the BTRFS filesystem on the specified device.

3. **Inspect BTRFS Superblocks:**
   You can inspect the BTRFS superblocks to get information about the filesystem:

   ```bash
   btrfs inspect-internal dump-super /dev/sdX
   ```

   This command will display detailed information about the BTRFS superblock, which can help you determine if the device contains BTRFS metadata.

### Removing BTRFS Metadata

If you confirm that the device contains BTRFS metadata, but you want a less destructive way to deal with it, you can use the following methods:

1. **Delete the BTRFS Filesystem:**
   You can delete the BTRFS filesystem on the device without wiping the entire drive:

   ```bash
   mkfs.btrfs -f /dev/sdX
   ```

   This will create a new, empty BTRFS filesystem on the device, effectively removing the old BTRFS metadata. The `-f` option forces the creation of the new filesystem, overwriting any existing data.

2. **Zero the Beginning of the Drive:**
   If you want to ensure that the partition table and initial sectors are cleared but leave the rest of the drive intact, you can use the `dd` command to zero out the beginning of the drive:

   ```bash
   dd if=/dev/zero of=/dev/sdX bs=1M count=100
   ```

   This command writes zeros to the first 100 megabytes of the device, which is usually enough to clear the partition table and any initial filesystem metadata.

3. **Wipe Filesystem Signatures:**
   Use the `wipefs` command to remove filesystem signatures from the device:

   ```bash
   wipefs -a /dev/sdX
   ```

   The `-a` option removes all signatures from the specified device.

### Summary

- **Identify the Device:** Use `lsblk` or `fdisk -l` to find the correct device identifier.
- **Check for BTRFS Metadata:** Use `btrfs check` and `btrfs inspect-internal dump-super` to inspect the device.
- **Remove BTRFS Metadata:**
  - Use `mkfs.btrfs -f` to create a new BTRFS filesystem, effectively removing old metadata.
  - Use `dd` to zero out the beginning of the drive to clear partition tables and initial metadata.
  - Use `wipefs -a` to remove filesystem signatures.

These methods provide a way to inspect and clean the drive in a less destructive manner than using `blkdiscard`, allowing for more targeted actions based on what is found during the inspection.

 

---

 

These are the steps I followed just now:

  • Made a backup of all the files on my 250GB drive: `rsync -av --delete /mnt/cache_250gb/ /mnt/disk6/TempDumpBackups/cache_250gb/`
  • Set 'Unraid -> Settings -> Disk Settings -> Enable auto start' to 'No'.
  • Rebooted my NAS
  • Logged back in to the Unraid web GUI
  • Confirmed that the 250GB drive is showing as the only device in the 'cache_250gb' pool
  • Confirmed that the 1TB drive is showing in 'Unassigned disk devices'
  • Checking the BTRFS filesystem, the 1TB drive is still showing:
    • # btrfs filesystem show
      Label: none  uuid: f71d2fa8-91a0-4495-8baf-e1f77bd40eb2
      	Total devices 1 FS bytes used 226.42GiB
      	devid    1 size 232.89GiB used 232.89GiB path /dev/sdb1
      	devid    2 size 931.51GiB used 0.00B path /dev/sde1
  • Checking the mapping for the 1TB drive after rebooting (and noting that it has changed from what it was previously)
    • # fdisk -l | grep -B 1 -A 5 "Samsung"
      Disk /dev/sdb: 232.89 GiB, 250059350016 bytes, 488397168 sectors
      Disk model: Samsung SSD 850
      Units: sectors of 1 * 512 = 512 bytes
      Sector size (logical/physical): 512 bytes / 512 bytes
      I/O size (minimum/optimal): 512 bytes / 512 bytes
      Disklabel type: dos
      Disk identifier: 0x00000000
      --
      Disk /dev/sde: 931.51 GiB, 1000204886016 bytes, 1953525168 sectors
      Disk model: Samsung SSD 870
      Units: sectors of 1 * 512 = 512 bytes
      Sector size (logical/physical): 512 bytes / 512 bytes
      I/O size (minimum/optimal): 512 bytes / 512 bytes
      Disklabel type: dos
      Disk identifier: 0x00000000

       

  • Tried checking the BTRFS filesystem on the 1TB drive
    • # btrfs check /dev/sde
      Opening filesystem to check...
      No valid Btrfs found on /dev/sde
      ERROR: cannot open file system
      
      # btrfs check /dev/sde1
      Opening filesystem to check...
      parent transid verify failed on 166051840 wanted 100181326 found 100181352
      parent transid verify failed on 166051840 wanted 100181326 found 100181352
      Ignoring transid failure
      ERROR: root [2 0] level 0 does not match 2
      
      ERROR: could not setup extent tree
      ERROR: cannot open file system
  • Ran `blkdiscard` to wipe the 1TB drive:
    • # blkdiscard -f /dev/sde
      blkdiscard: Operation forced, data will be lost!
  • Confirmed that the 1TB drive is no longer showing in BTRFS
    • # btrfs filesystem show
      Label: none  uuid: f71d2fa8-91a0-4495-8baf-e1f77bd40eb2
      	Total devices 1 FS bytes used 226.42GiB
      	devid    1 size 232.89GiB used 232.89GiB path /dev/sdb1
  • Checked that no filesystem was shown for the 1TB drive in Unraid web GUI
  • Set the 'cache_250gb' slots back to 1
  • Checked/updated all 'Shares' that used to be on 'cache' were now updated to properly be set to 'cache_250gb'
  • Started the array
  • Confirmed the data on the 'cache_250gb' pool mounted correctly
  • Set 'Unraid -> Settings -> Disk Settings -> Enable auto start' to 'Yes'.
  • Moved the `docker.img` file out of the 'backup directory' so it doesn't accidentally get wiped out with a future sync
    • /mnt/disk6/TempDumpBackups/cache_250gb# mv docker.img ../
  • Set 'Unraid -> Settings -> Docker -> Docker vDisk location' to '/mnt/disk6/TempDumpBackups/docker.img'
  • Set 'Unraid -> Settings -> Docker -> Default appdata storage location' to '/mnt/cache_250gb/appdata/'
  • Set 'Unraid -> Settings -> VM Manager -> Default VM storage path' to '/mnt/user/libvirt/libvirt.img'
  • Set 'Unraid -> Settings -> VM Manager -> Default ISO storage path' to '/mnt/user/VMs/'
  • Ran the 'Fix Common Problems' plugin, and checked/remedied any outstanding issues that seemed relevant
  • Checked if/how it's possible to start the Docker service without letting my autostart containers run + fixed their appdata path mapping for the new cache pool name
    • https://forums.unraid.net/topic/143270-how-to-enable-docker-without-having-the-containers-i-have-set-to-auto-run-run/
    • Made another backup of the `docker.img` file, just in case
    • Mounted the `docker.img`, and removed the containers from autostart:
      • # mkdir /mnt/docker_img
        
        # mount -o loop docker.img /mnt/docker_img
        
        # cd /mnt/docker_img/
        
        /mnt/docker_img# ls
        btrfs/    buildkit/    containers/  image/    plugins/   swarm/  trust/   unraid-autostart           volumes/
        builder/  containerd/  engine-id    network/  runtimes/  tmp/    unraid/  unraid-update-status.json
        
        /mnt/docker_img# file unraid-autostart
        unraid-autostart: ASCII text
        
        /mnt/docker_img# cat unraid-autostart
        this
        has
        a
        list
        of
        autostart
        docker
        containers
        
        /mnt/docker_img# cp unraid-autostart unraid-autostart.bkup
        
        /mnt/docker_img# echo "" > ./unraid-autostart
        
        /mnt/docker_img# cd /mnt
        
        /mnt# umount /mnt/docker_img
    • Set 'Unraid -> Settings -> Docker -> Enable Docker' to 'Yes'

    • On the 'Docker' tab, for each container, checked/updated any mappings pointing directly at `/mnt/cache/*` (eg. `/mnt/cache/appdata`) and updated them to `/mnt/cache_250gb/*` (eg. `/mnt/cache_250gb/appdata`), clicked 'Apply' (since 'Save' doesn't seem to update the mappings in the Unraid Web UI), then manually stopped the container again.

    • Set 'Unraid -> Settings -> Docker -> Enable Docker' to 'No'

    • Mounted the `docker.img`, and restored the previous autostart containers:

      • /mnt# umount /mnt/docker_img
        
        /mnt# mount -o loop /mnt/disk6/TempDumpBackups/docker.img /mnt/docker_img
        
        /mnt# cd /mnt/docker_img/
        
        /mnt/docker_img# rm unraid-autostart
        
        /mnt/docker_img# mv unraid-autostart.bkup unraid-autostart
        
        /mnt/docker_img# cd ..
        
        /mnt# umount /mnt/docker_img
        
        /mnt# rm -r /mnt/docker_img/
    • Set 'Unraid -> Settings -> Docker -> Enable Docker' to 'Yes'

    • Confirmed that all of the appropriate containers auto-started, and that their path mappings are all still properly updated to the new cache pool name/mount.

    • Updated all the Docker containers to the latest versions for good measure

  • Went to the 'cache_250gb' pool settings, and ran the 'Balance' command (which is still running now)

    • ## From the Unraid Web UI
      
      Balance Status
      
      btrfs filesystem df:
        Data, single: total=228.87GiB, used=225.71GiB
        System, single: total=4.00MiB, used=48.00KiB
        Metadata, single: total=4.01GiB, used=1.24GiB
        GlobalReserve, single: total=304.28MiB, used=16.00KiB
      
      btrfs balance status:
        Balance on '/mnt/cache_250gb/' is running
        5 out of about 236 chunks balanced (6 considered),  98% left
      
      ## From SSH / Terminal
      
      ```
      # btrfs balance status /mnt/cache_250gb/
      Balance on '/mnt/cache_250gb/' is running
      5 out of about 236 chunks balanced (6 considered),  98% left
      ```

I'll edit this post with further info once the balance command finishes running; and then potentially post another diagnostics .zip to go along with it.

 

Edit: Part way through the balance I realised I hadn't removed the old `docker.img` file from the 'cache_250gb' pool, so I have done that now... we'll see if that causes any issues that require another balance after this, I guess.

Edited by devalias

Balance finished:

Balance Status

btrfs filesystem df:
  Data, single: total=106.00GiB, used=105.37GiB
  System, single: total=32.00MiB, used=16.00KiB
  Metadata, single: total=2.00GiB, used=1.22GiB
  GlobalReserve, single: total=292.67MiB, used=0.00B

btrfs balance status:
  No balance found on '/mnt/cache_250gb'

Current usage ratio: 99.4 % --- No Balance required

 

Diagnostics: dalekanium-diagnostics-20240528-2105.zip

11 hours ago, JorgeB said:

can now also create the new pool with the other device.

Sweet, sounds good. Added the 1TB drive to the 'cache_1tb' pool (with 1 slot), set it to xfs, started the array, got "Unmountable: Unsupported or no file system", selected to format the drive, and all looks good now.

 

Thanks for your help!
