jbrukardt Posted January 10, 2023

Good morning all, I accidentally mis-allocated a new NVMe drive I put in the system. I meant to make it a new cache pool that I was going to use for VMs only, but accidentally added it to the existing cache pool which houses mover/appdata/etc. I would like to remove it... but it seems that breaks the entire cache pool if I do that. What is the procedure for removing a disk from the cache pool? Screenshots attached of the cache details.
JorgeB Posted January 10, 2023 (Solution)

Unassign the device you want to remove (leave it unassigned for now) and start the array; once the removal is complete you can assign it to a new pool.
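For reference, the removal Unraid performs on array start is roughly equivalent to a manual btrfs device removal. A hedged sketch only — the device name and mount point below are placeholders, not taken from this thread, and you should normally let the GUI do this:

```shell
# Example only: removing a second device from a mounted btrfs pool.
# /dev/nvme1n1p1 and /mnt/cache are placeholder names.

# Confirm the pool currently spans two devices
btrfs filesystem show /mnt/cache

# Migrate all data/metadata off the device and drop it from the pool
# (blocks until the migration finishes; can take a while)
btrfs device remove /dev/nvme1n1p1 /mnt/cache

# Verify only one device remains
btrfs filesystem show /mnt/cache
```

This only works while the pool is mounted and has enough free space on the remaining device(s) to absorb the data.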
jbrukardt Posted January 10, 2023 (Author)

36 minutes ago, JorgeB said: Unassign the device you want to remove (leave it unassigned for now) and start the array, once the removal is complete you can assign to a new pool.

I'll try again, but that resulted in the cache pool not mounting due to a missing device.
trurl Posted January 10, 2023

attach diagnostics to your NEXT post in this thread
johntdyer Posted January 13, 2023

I'm having the same issue, here are my diagnostics: vault-diagnostics-20230113-1516.zip
JorgeB Posted January 14, 2023

12 hours ago, johntdyer said: I'm having the same issue

Not really the same, your cache was XFS:

```
Jan 13 11:50:17 vault emhttpd: shcmd (71): mkdir -p /mnt/cache
Jan 13 11:50:17 vault emhttpd: shcmd (72): mount -t xfs -o noatime,nouuid /dev/sdc1 /mnt/cache
Jan 13 11:50:17 vault kernel: XFS (sdc1): Mounting V5 Filesystem
Jan 13 11:50:17 vault kernel: XFS (sdc1): Starting recovery (logdev: internal)
Jan 13 11:50:17 vault kernel: XFS (sdc1): Ending recovery (logdev: internal)
Jan 13 11:50:17 vault kernel: xfs filesystem being mounted at /mnt/cache supports timestamps until 2038 (0x7fffffff)
```

You cannot add a device to an XFS pool; XFS pools support single devices only. You can go back to the single XFS cache pool, or back up the data and then reformat as btrfs.
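The back-up-and-reformat route could look something like the following. This is a hedged sketch under assumptions — the device, mount point, and backup path are placeholders, and the filesystem change itself is done through the Unraid GUI rather than by hand:

```shell
# Example only: converting a single-device XFS cache to btrfs so more
# devices can be added later. All paths/devices are placeholders.

# 1. Back up everything on the cache to an array disk
rsync -av /mnt/cache/ /mnt/disk1/cache_backup/

# 2. In the Unraid GUI: stop the array, change the pool's filesystem
#    to btrfs, start the array, and format the device when prompted.
#    The manual equivalent of that format step would be:
# mkfs.btrfs -f /dev/sdc1

# 3. Restore the data onto the freshly formatted pool
rsync -av /mnt/disk1/cache_backup/ /mnt/cache/
```

Once the pool is btrfs, additional slots/devices can be added to it.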
ximian Posted May 8

1. Stop the array
2. Unassign the device from the cache pool
3. Go into Array Operations
4. Next to the Start button there is a checkbox to start the array with a missing device in the cache pool
5. Select the checkbox and start the array
devalias Posted May 24

I'm not sure if it's the exact same issue, but I am having a problem that sounds very similar to this. Context (as best as I can remember the specific steps/etc):

I had an existing (legacy) btrfs cache drive (250GB SSD), and I wanted to add a new 1TB SSD alongside it. At first I hadn't looked deeply into the new 'pools' config and how it works, so I just expanded the slots of my existing 'cache' pool and added the new 1TB drive to it. I then started up my array, and noticed that it seemed to be combining both of those drives together, which didn't seem to be what I wanted.

I stopped the array, watched a video about the pools (Ref), and decided I wanted to create a new pool to add the 1TB drive to. So I:

- unassigned the 1TB disk from the existing 'cache' pool
- set the 'cache' pool slots back to 1
- renamed the 'cache' pool to 'cache_250gb'
- created a new 'cache_1tb' pool with a single slot
- added the 1TB disk to the 'cache_1tb' pool (I think as XFS, but not 100% sure)
- started my array

At this point, I expected both pools to work, or maybe the 'cache_1tb' to need to be formatted or similar, but both pools ended up with an error message like this: "Unmountable: Unsupported or no file system"

My guess is that somehow in that process, I confused Unraid about what the old 250GB disk in 'cache_250gb' (previously just 'cache') is, and that perhaps it's still expecting to try and load 2 disks in that pool or similar. I'm hoping it'll just be a little metadata tweak/repair/similar, and not actually a fully corrupted drive (as I stupidly didn't think I would need to make a full backup of it just to add a new disk in).

I also tried unassigning the 1TB drive from both pools, and restarting the array with the "Start will remove the missing cache disk and then bring the array on-line" option enabled; but it still didn't seem to help.
My guess is that maybe if I had done that when I first unassigned the 1TB drive from the 'cache_250gb' pool, and started up the array before creating the 'cache_1tb' pool and assigning the 1TB to it, it would have 'just worked'.

Attached diagnostics (Ref): dalekanium-diagnostics-20240524-1725.zip

I don't necessarily know what I'm looking for within it, but I noticed that `./config/pools/cache_250gb.cfg` has `diskFsProfile="raid1"`; and I wonder if maybe that should be `diskFsProfile=""` now that there is only a single slot in that pool.

Edit: Further info: I was skimming through this thread: https://www.reddit.com/r/btrfs/comments/1507tl4/can_i_recover_a_btrfs_raid1_disk_that_was_partly/ and saw that one of the commands was `btrfs filesystem show`; so I decided to SSH onto my Unraid server (while the array is still stopped) and run it, which gave me the following:

```
# btrfs filesystem show
Label: none  uuid: f71d2fa8-91a0-4495-8baf-e1f77bd40eb2
    Total devices 2 FS bytes used 227.24GiB
    devid 1 size 232.89GiB used 232.89GiB path /dev/sdb1
    devid 2 size 931.51GiB used 0.00B path /dev/sdj1
```

That would seem to correlate with the 2 SSDs, and looks as though they may still be seen as a single 'combined pool', at least at this level.
To confirm that theory:

```
# smartctl -i /dev/sdb | grep "Device Model"
Device Model:     Samsung SSD 850 EVO 250GB
# smartctl -i /dev/sdj | grep "Device Model"
Device Model:     Samsung SSD 870 EVO 1TB
```

Edit 2: Looking through some other related threads led me to the FAQ, which in hindsight explains the 'proper' way to do this; though it's a little frustrating that the caveats/process it describes for how/when to remove things seemingly isn't checked/enforced by the Unraid GUI at all, to prevent issues like this from occurring in the first place: https://forums.unraid.net/topic/46802-faq-for-unraid-v6/#comment-480418

There are some commands here for how to change the mode, though I'm hesitant to run them at the moment, and they look like they might need to be done while the array is online (at least for the ones that reference `/mnt/cache`): https://forums.unraid.net/topic/46802-faq-for-unraid-v6/?do=findComment&comment=480421

Clicking through to the 'cache_250gb' pool in the Unraid GUI, the 'Balance Status' section says that 'Balance is only available when array is Started'.
This forum thread suggested that I "have to wipe or disconnect the SSD you want to remove, or you'll get unmountable cache pool": https://forums.unraid.net/topic/51133-remove-a-drive-from-a-cache-pool/?do=findComment&comment=504089

Since wiping sounds scary at this point, I just removed it (hot-swappable drive bays ftw), which showed an Unraid GUI warning:

Unraid Cache_250gb disk message: 24-05-2024 18:10
Warning [REDACTED] - Cache pool BTRFS missing device(s) Samsung_SSD_850_EVO_250GB_S3NYNF0J906569J (sdb)

After which, I ran the following again:

```
# btrfs filesystem show
Label: none  uuid: f71d2fa8-91a0-4495-8baf-e1f77bd40eb2
    Total devices 2 FS bytes used 227.24GiB
    devid 1 size 232.89GiB used 232.89GiB path /dev/sdb1
    *** Some devices missing
```

Since this is now looking a little more positive (relatively), I decided to try starting the array again to see what would happen; and this time my 'cache_250gb' pool seemed to be able to mount (though seemingly with an incorrect size that still included the space of the old disk). Running the command again:

```
# btrfs filesystem show
Label: none  uuid: f71d2fa8-91a0-4495-8baf-e1f77bd40eb2
    Total devices 2 FS bytes used 227.24GiB
    devid 1 size 232.89GiB used 232.89GiB path /dev/sdb1
    devid 2 size 0 used 0 path /dev/sdj1 MISSING
```

Since the array is running and the disk mounted, I thought I might be able to follow this, and remove the device: https://forums.unraid.net/topic/51133-remove-a-drive-from-a-cache-pool/?do=findComment&comment=507082

But it seems not:

```
# btrfs device remove /dev/sdj1 /mnt/cache_250gb
ERROR: not a block device: /dev/sdj1
```

Edit 3: Reading `btrfs device remove --help`:

```
# btrfs device remove --help
usage: btrfs device remove <device>|<devid> [<device>|<devid>...] <path>

    Remove a device from a filesystem

    Remove a device from a filesystem, specified by a path to the device or
    as a device id in the filesystem. The btrfs signature is removed from
    the device.

    If 'missing' is specified for <device>, the first device that is
    described by the filesystem metadata, but not present at the mount
    time will be removed. (only in degraded mode)

    If 'cancel' is specified as the only device to delete, request
    cancellation of a previously started device deletion and wait until
    kernel finishes any pending work. This will not delete the device and
    the size will be restored to previous state. When deletion is not
    running, this will fail.

    --enqueue  wait if there's another exclusive operation running,
               otherwise continue
```

It sounds like I could do this manually by mounting the filesystem in degraded mode and then removing the 'missing' device. According to ChatGPT, the commands would be something like:

```
sudo mount -o degraded /mnt/cache
sudo btrfs device remove missing /mnt/cache
```

So, with the 1TB drive still removed, and the array started:

```
# btrfs filesystem show
Label: none  uuid: f71d2fa8-91a0-4495-8baf-e1f77bd40eb2
    Total devices 2 FS bytes used 227.24GiB
    devid 1 size 232.89GiB used 232.89GiB path /dev/sdb1
    devid 2 size 0 used 0 path /dev/sdj1 MISSING

# btrfs device remove missing /mnt/cache_250gb

# btrfs filesystem show
Label: none  uuid: f71d2fa8-91a0-4495-8baf-e1f77bd40eb2
    Total devices 1 FS bytes used 227.24GiB
    devid 1 size 232.89GiB used 232.89GiB path /dev/sdb1
```

This is looking good..
So I stopped the array, inserted the 1TB drive again, but that just led to this again:

```
# btrfs filesystem show
Label: none  uuid: f71d2fa8-91a0-4495-8baf-e1f77bd40eb2
    Total devices 1 FS bytes used 227.24GiB
    devid 1 size 232.89GiB used 232.89GiB path /dev/sdb1
    devid 2 size 931.51GiB used 0.00B path /dev/sdj1
```

And if I started it in this state, I just got back to this error: "Unmountable: Unsupported or no file system"

Stopping the array again, I decided to set the 'cache_250gb' pool back to 2 slots, and just add the 1TB device back to it, which shows this warning: "All existing data on this device will be OVERWRITTEN when array is Started"

Then I started the array again, but that just led to both the 250GB and 1TB devices showing as: "Unmountable: Unsupported or no file system"

Stopping the array, removing the 1TB device from slot 2, then attempting to start the array again, ensuring to tick the box next to "Start will remove the missing cache disk and then bring the array on-line. Yes, I want to do this", leads to this error (with the array failing to start): "Wrong Pool State: cache_250gb - too many missing/wrong devices"

Removing the 1TB SSD from the drive slot then trying again still results in the same error.
JorgeB Posted May 24

46 minutes ago, devalias said: unassigned the 1TB disk from the existing 'cache' pool; set the 'cache' pool slots back to 1; renamed the 'cache' pool to 'cache_250gb'; created a new 'cache_1tb' pool with a single slot; added the 1TB disk to the 'cache_1tb' pool (I think as XFS, but not 100% sure); started my array

That would never work. Unassign both pool devices, start the array, stop the array, set one of the pools to 2 slots, assign both devices there, start the array, and post the diags.
devalias Posted May 24

1 hour ago, JorgeB said: That would never work.

Well, it's unfortunate that the GUI didn't give me any indication/warning about that before it happened. But since it has already happened, let's work with what we have now.

Unassigned both devices:

```
# btrfs filesystem show
Label: none  uuid: f71d2fa8-91a0-4495-8baf-e1f77bd40eb2
    Total devices 1 FS bytes used 227.24GiB
    devid 1 size 232.89GiB used 232.89GiB path /dev/sdb1
```

Started the array, making sure to select this: "Start will remove the missing cache disk and then bring the array on-line. Yes, I want to do this"

Stopped the array, assigned the 250GB to the 'cache_250gb' pool's first slot and the 1TB to the 2nd slot, then started the array:

```
# btrfs filesystem show
Label: none  uuid: f71d2fa8-91a0-4495-8baf-e1f77bd40eb2
    Total devices 1 FS bytes used 227.24GiB
    devid 1 size 232.89GiB used 232.89GiB path /dev/sdb1
    devid 2 size 931.51GiB used 0.00B path /dev/sdj1
```

And in the Unraid GUI it shows the following against both drives: "Unmountable: Unsupported or no file system"

Diagnostics: dalekanium-diagnostics-20240524-1933.zip
JorgeB Posted May 24

1 hour ago, devalias said: Well, it's unfortunate that the GUI didn't give me any indication/warning about that before it happened.

I think some step is missing from your description, since after removing a pool device you should not be allowed to change the cache slots.

The pool failed to balance previously because the 250GB SSD didn't have enough space (note the size and used space are the same). Not sure it's going to mount now, but try this. With the array stopped, type:

```
echo 1 > /sys/block/sdj/device/delete
```

Wait 10 secs, refresh the GUI, and the 1TB cache should drop offline (to get it back you just need to reboot later). Start the array with that pool device missing (leaving the slots set to two as they are) and post new diags.
devalias Posted May 24

13 hours ago, JorgeB said: I think some step is missing from your description, since after removing a pool device, you should not be allowed to change the cache slots.

Interesting.. I can't guarantee that my description isn't missing any steps; but as a 'current' example, here is the UI showing the pool with 2 slots and the missing device, and I can definitely change those slots. Which I believe is the same sort of state it was in when I first got myself into this situation. Maybe a bug, if that's not meant to be possible?

Stopped the array, ran that command, can see the 1TB is offline (see screenshot above), started the array (confirming "Start will remove the missing cache disk and then bring the array on-line. Yes, I want to do this"), and the single drive is able to mount:

```
# btrfs filesystem show
Label: none  uuid: f71d2fa8-91a0-4495-8baf-e1f77bd40eb2
    Total devices 1 FS bytes used 227.24GiB
    devid 1 size 232.89GiB used 232.89GiB path /dev/sdb1
```

Diagnostics: dalekanium-diagnostics-20240525-0956.zip

Looking at `./config/pools/cache_250gb.cfg` I can see that `diskFsProfile="single"` now, which seems better than the earlier `diskFsProfile="raid1"`.

After stopping the array again, the 2nd slot is no longer showing as a 'missing' disk. I think I got to this state before (or something that looked like it) on my own, but then when the 1TB disk was re-introduced, it seemed to break things. I'll wait for your next steps so as not to assume; but wondering if that will be the case again.
JorgeB Posted May 25

9 hours ago, devalias said: Interesting.. I can't guarantee that my description isn't missing any steps; but as a 'current' example, here is the UI showing the pool with 2 slots, and the missing device, and I can definitely change those slots. Which I believe is the same sort of state it was in when I first got myself into this situation. Maybe a bug, if that's not meant to be possible?

Hmm, not sure how that is happening. If I unassign a pool device I see this:

Note that the number of slots is greyed out; it's not selectable. I thought that it might be different if the device is missing vs. unassigned, but I see the same if I make the device drop. Can you detail all the steps done, to see if you are doing something different?

Regarding the pool, good that it's mounting now, but I do see a warning:

```
May 25 09:54:29 Dalekanium kernel: BTRFS warning (device sdb1): devid 1 physical 0 len 4194304 inside the reserved space
```

It's just a warning, not an error, so it may not be serious, and from what I can see a balance may fix it, so you have two options:

- back up that pool somewhere else and then reformat
- free up some space on that filesystem, since it's very full (at least 20GB), then run a balance to see if the error goes away; before doing this I would recommend at least making sure anything important in the pool is backed up somewhere else, in case it goes wrong.
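The free-space-then-balance option could be sketched as follows. A hedged example only — the mount point matches the pool discussed in this thread, but back the pool up first and treat the commands as a sketch rather than a prescribed procedure:

```shell
# Example only: check fullness, then run a full balance on the pool.
# Back up the pool before doing this, in case it goes wrong.

# Check how the space is allocated and how full the filesystem is
btrfs filesystem df /mnt/cache_250gb
btrfs filesystem usage /mnt/cache_250gb

# After freeing up 20GB+ of space, run a balance over all chunks
# (--full-balance makes the no-filter intent explicit on newer btrfs-progs)
btrfs balance start --full-balance /mnt/cache_250gb

# Watch progress from another shell
btrfs balance status /mnt/cache_250gb
```

A balance rewrites every chunk, so on a very full filesystem it needs that freed headroom to make progress.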
devalias Posted May 26

19 hours ago, JorgeB said: Note that the number of slots is greyed out, it's not selectable, I thought that it might be different if the device is missing, vs you unassign it, but I see the same if I make the device drop, can you detail all the steps done to see if you are doing something different?

I can't look deeper into this at the moment, but I can see if I can mimic that state later on. Immediately off the top of my head, my first wonder was whether there is some special handling for a pool called 'cache' (yours) vs something else (mine); and/or whether you're running a different version of Unraid to me, one that may have better protections that mine doesn't.

19 hours ago, JorgeB said: It's just a warning, not an error, so it may not be serious, and from what I can see a balance may fix it, so you have two options: - backup that pool somewhere else and then reformat - free up some space on that filesystem, since it's very full, at least 20GB, then run a balance to see if the error goes away, before doing this I would recommend at least making sure anything important in the pool is backed up somewhere else, in case it goes wrong.

So with this, my high-level plan was to reduce the space used on this drive (hence adding the new 1TB). The docker image is ~120GB, so I can easily move that off to free up the space; but I guess ideally I would like to understand a bit of the 'why' of things before doing so, in case it's helpful to others (or me again) here in future.

I'm not sure what the 'balance' is meant to do, but given my desired end goal is to have 2 completely independent drives in their own 1-slot pools, I wonder if the 'balance' is needed/useful/etc. What is expected to happen if/when it is able to 'balance' properly?

Also, I guess my main question at the moment relates to the 1TB drive. We told the system to 'remove' it, which would be restored next reboot; but I haven't rebooted yet.
Previously, when I achieved similar (I think) by removing the 1TB drive from the hot-swap bay, when I added it back it somehow seemed to get 'reconnected' into the btrfs RAID/etc (even though I had removed it from it); and I was sort of wondering/worried if that was likely to happen again; and if it does, what would be the expected 'solution' to remedy that? Like, is there something on that 1TB disk that 'remembers' it was part of a btrfs RAID/similar that would 're-add' it now (that we need to clean up/clear/etc)?

I don't have time to do the reboot/check right now, so it may be that it will all 'just work' and not be an issue; but I figured I would ask the question up front to save another back-and-forth delay later on when I get a chance to try it properly. Thanks for your assistance so far! 🖤
JorgeB Posted May 26

5 hours ago, devalias said: is some special handling for a pool called 'cache' (yours) vs something else (mine)

It's the same for any pool; the name doesn't matter.

5 hours ago, devalias said: if you're running a different version of Unraid to me

I used the same one to confirm, v6.12.10.

5 hours ago, devalias said: I'm not sure what the 'balance' is meant to do, but given my desired end goal is to have 2 completely independent drives in their own 1 slot pools; I wonder if the 'balance' is needed/useful/etc. What is expected to happen if/when it is able to 'balance' properly?

The balance would be to see if it resolves that warning.

When you reboot you should disable array auto-start, and before array start wipe the 1TB device with:

```
blkdiscard -f /dev/sdX
```

Replace X with the correct letter; this way it will no longer interfere with the current pool.
devalias Posted May 28

On 5/26/2024 at 7:49 PM, JorgeB said: It's the same for any pool, name doesn't matter.

_nods_ fair enough. Was just thinking along the lines of perhaps there was some leftover legacy code/bug where it was handled that way for the 'old style' cache pool, but may not have been for others.

On 5/26/2024 at 7:49 PM, JorgeB said: I used the same one to confirm, v6.12.10

Good to know. I always like to double check rather than make assumptions.

On 5/26/2024 at 7:49 PM, JorgeB said: The balance would be to see if it resolves that warning.

Deeper explanation via ChatGPT:

The warning message you're seeing from the BTRFS file system indicates an issue with a device (`sdb1`) where the physical location (`physical 0`) of a certain block (`len 4194304`) is inside the reserved space. This could point to a corruption or inconsistency in the file system's metadata.

**Explanation of the Warning:**

- `devid 1`: Refers to the device ID within the BTRFS file system.
- `physical 0`: The physical offset on the device where the issue is detected.
- `len 4194304`: Length of the block in bytes.
- `inside the reserved space`: Indicates that this block is located in a region reserved for file system metadata or other internal purposes.

**Why a 'Balance' Could Be a Potential Solution:**

A BTRFS balance operation is designed to reallocate the chunks of data and metadata across the available devices. This can help to:

1. **Redistribute Data and Metadata:** The balance operation redistributes data and metadata chunks, which can help to resolve allocation issues or inconsistencies.
2. **Repair Fragmentation:** Over time, as data is written and deleted, fragmentation can occur. Balancing helps to defragment the file system.
3. **Correct Metadata Issues:** The operation can potentially correct inconsistencies in the metadata, as it moves and rewrites chunks.
On 5/26/2024 at 7:49 PM, JorgeB said: When you reboot you should disable array auto-start

To do this, we can go to Unraid -> Settings -> Disk Settings -> Enable auto start, and set it to 'No'.

On 5/26/2024 at 7:49 PM, JorgeB said: When you reboot you should disable array auto-start, and before array start wipe the 1TB device with blkdiscard -f /dev/sdX replace X with the correct letter, this way it will no longer interfere with the current pool.

Deeper explanation via ChatGPT:

**`blkdiscard -f /dev/sdX`**

- **`blkdiscard`:** This command is used to discard (or trim) all blocks on a device. It is commonly used with SSDs to inform the device which blocks of data are no longer considered in use and can be erased internally.
- **`-f`:** Force the discard operation. This option forces the command to proceed without prompting for confirmation.
- **`/dev/sdX`:** The device to be wiped. Replace `X` with the appropriate letter corresponding to the 1TB device (e.g., `/dev/sdb`).

And in asking ChatGPT if there was a less 'scorched earth' approach to checking/removing the BTRFS metadata from the 1TB drive:

Yes, there are less drastic ways to inspect and deal with BTRFS metadata or similar data on the drive before resorting to `blkdiscard`. Here are some methods you can use to check for BTRFS metadata and potentially clean the drive in a less destructive manner:

### Checking for BTRFS Metadata

1. **Identify the Device:** First, make sure you know the correct device identifier (e.g., `/dev/sdb`). You can use `lsblk` or `fdisk -l` to list all block devices and their partitions:

```bash
lsblk
```
or
```bash
fdisk -l
```

2. **Check for BTRFS Filesystem:** Use the `btrfs` command to check if a BTRFS filesystem exists on the device:

```bash
btrfs check /dev/sdX
```

Replace `sdX` with the correct device identifier. This command will perform a read-only check of the BTRFS filesystem on the specified device.

3. **Inspect BTRFS Superblocks:** You can inspect the BTRFS superblocks to get information about the filesystem:

```bash
btrfs inspect-internal dump-super /dev/sdX
```

This command will display detailed information about the BTRFS superblock, which can help you determine if the device contains BTRFS metadata.

### Removing BTRFS Metadata

If you confirm that the device contains BTRFS metadata, but you want a less destructive way to deal with it, you can use the following methods:

1. **Delete the BTRFS Filesystem:** You can delete the BTRFS filesystem on the device without wiping the entire drive:

```bash
mkfs.btrfs -f /dev/sdX
```

This will create a new, empty BTRFS filesystem on the device, effectively removing the old BTRFS metadata. The `-f` option forces the creation of the new filesystem, overwriting any existing data.

2. **Zero the Beginning of the Drive:** If you want to ensure that the partition table and initial sectors are cleared but leave the rest of the drive intact, you can use the `dd` command to zero out the beginning of the drive:

```bash
dd if=/dev/zero of=/dev/sdX bs=1M count=100
```

This command writes zeros to the first 100 megabytes of the device, which is usually enough to clear the partition table and any initial filesystem metadata.

3. **Wipe Filesystem Signatures:** Use the `wipefs` command to remove filesystem signatures from the device:

```bash
wipefs -a /dev/sdX
```

The `-a` option removes all signatures from the specified device.

### Summary

- **Identify the Device:** Use `lsblk` or `fdisk -l` to find the correct device identifier.
- **Check for BTRFS Metadata:** Use `btrfs check` and `btrfs inspect-internal dump-super` to inspect the device.
- **Remove BTRFS Metadata:**
  - Use `mkfs.btrfs -f` to create a new BTRFS filesystem, effectively removing old metadata.
  - Use `dd` to zero out the beginning of the drive to clear partition tables and initial metadata.
  - Use `wipefs -a` to remove filesystem signatures.
These methods provide a way to inspect and clean the drive in a less destructive manner than using `blkdiscard`, allowing for more targeted actions based on what is found during the inspection.

---

These are the steps I followed just now:

- Made a backup of all the files on my 250GB drive: `rsync -av --delete /mnt/cache_250gb/ /mnt/disk6/TempDumpBackups/cache_250gb/`
- Set 'Unraid -> Settings -> Disk Settings -> Enable auto start' to 'No'.
- Rebooted my NAS
- Logged back in to the Unraid web GUI
- Confirmed that the 250GB drive is showing as the only device in the 'cache_250gb' pool
- Confirmed that the 1TB drive is showing in 'Unassigned disk devices'
- Checked the BTRFS filesystem; the 1TB drive is still showing:

```
# btrfs filesystem show
Label: none  uuid: f71d2fa8-91a0-4495-8baf-e1f77bd40eb2
    Total devices 1 FS bytes used 226.42GiB
    devid 1 size 232.89GiB used 232.89GiB path /dev/sdb1
    devid 2 size 931.51GiB used 0.00B path /dev/sde1
```

- Checked the mapping for the 1TB drive after rebooting (noting that it has changed from what it was previously):

```
# fdisk -l | grep -B 1 -A 5 "Samsung"
Disk /dev/sdb: 232.89 GiB, 250059350016 bytes, 488397168 sectors
Disk model: Samsung SSD 850
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x00000000
--
Disk /dev/sde: 931.51 GiB, 1000204886016 bytes, 1953525168 sectors
Disk model: Samsung SSD 870
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x00000000
```

- Tried checking the BTRFS filesystem on the 1TB drive:

```
# btrfs check /dev/sde
Opening filesystem to check...
No valid Btrfs found on /dev/sde
ERROR: cannot open file system
# btrfs check /dev/sde1
Opening filesystem to check...
parent transid verify failed on 166051840 wanted 100181326 found 100181352
parent transid verify failed on 166051840 wanted 100181326 found 100181352
Ignoring transid failure
ERROR: root [2 0] level 0 does not match 2
ERROR: could not setup extent tree
ERROR: cannot open file system
```

- Ran `blkdiscard` to wipe the 1TB drive:

```
# blkdiscard -f /dev/sde
blkdiscard: Operation forced, data will be lost!
```

- Confirmed that the 1TB drive is no longer showing in BTRFS:

```
# btrfs filesystem show
Label: none  uuid: f71d2fa8-91a0-4495-8baf-e1f77bd40eb2
    Total devices 1 FS bytes used 226.42GiB
    devid 1 size 232.89GiB used 232.89GiB path /dev/sdb1
```

- Checked that no filesystem was shown for the 1TB drive in the Unraid web GUI
- Set the 'cache_250gb' slots back to 1
- Checked/updated all 'Shares' that used to be on 'cache' so they are now properly set to 'cache_250gb'
- Started the array
- Confirmed the data on the 'cache_250gb' pool mounted correctly
- Set 'Unraid -> Settings -> Disk Settings -> Enable auto start' to 'Yes'.
- Moved the `docker.img` file out of the 'backup directory' so it doesn't accidentally get wiped out with a future sync: `/mnt/disk6/TempDumpBackups/cache_250gb# mv docker.img ../`
- Set 'Unraid -> Settings -> Docker -> Docker vDisk location' to '/mnt/disk6/TempDumpBackups/docker.img'
- Set 'Unraid -> Settings -> Docker -> Default appdata storage location' to '/mnt/cache_250gb/appdata/'
- Set 'Unraid -> Settings -> VM Manager -> Default VM storage path' to '/mnt/user/libvirt/libvirt.img'
- Set 'Unraid -> Settings -> VM Manager -> Default ISO storage path' to '/mnt/user/VMs/'
- Ran the 'Fix Common Problems' plugin, and checked/remedied any outstanding issues that seemed relevant
- Checked if/how it's possible to start the Docker service without letting my autostart containers run, and fixed their appdata path mapping for the new cache pool name: https://forums.unraid.net/topic/143270-how-to-enable-docker-without-having-the-containers-i-have-set-to-auto-run-run/
- Made another backup of the `docker.img` file, just in case
- Mounted the `docker.img`, and removed the containers from autostart:

```
# mkdir /mnt/docker_img
# mount -o loop docker.img /mnt/docker_img
# cd /mnt/docker_img/
/mnt/docker_img# ls
btrfs/ buildkit/ containers/ image/ plugins/ swarm/ trust/ unraid-autostart volumes/ builder/ containerd/ engine-id network/ runtimes/ tmp/ unraid/ unraid-update-status.json
/mnt/docker_img# file unraid-autostart
unraid-autostart: ASCII text
/mnt/docker_img# cat unraid-autostart
this has a list of autostart docker containers
/mnt/docker_img# cp unraid-autostart unraid-autostart.bkup
/mnt/docker_img# echo "" > ./unraid-autostart
/mnt/docker_img# cd /mnt
/mnt# umount /mnt/docker_img
```

- Set 'Unraid -> Settings -> Docker -> Enable Docker' to 'Yes'
- On the 'Docker' tab, for each container, checked/updated any mappings directly on `/mnt/cache/*` (eg. `/mnt/cache/appdata`) and updated them to `/mnt/cache_250gb/*` (eg. `/mnt/cache_250gb/appdata`), clicked 'Apply' (since 'Save' doesn't seem to update the mappings in the Unraid Web UI), then manually stopped the container again.
- Set 'Unraid -> Settings -> Docker -> Enable Docker' to 'No'
- Mounted the `docker.img`, and restored the previous autostart containers:

```
/mnt# umount /mnt/docker_img
/mnt# mount -o loop /mnt/disk6/TempDumpBackups/docker.img /mnt/docker_img
/mnt# cd /mnt/docker_img/
/mnt/docker_img# rm unraid-autostart
/mnt/docker_img# mv unraid-autostart.bkup unraid-autostart
/mnt/docker_img# cd ..
/mnt# umount /mnt/docker_img
/mnt# rm -r /mnt/docker_img/
```

- Set 'Unraid -> Settings -> Docker -> Enable Docker' to 'Yes'
- Confirmed that all of the appropriate containers auto-started, and that their path mappings are all still properly updated to the new cache pool name/mount.
- Updated all the Docker containers to the latest versions for good measure
- Went to the 'cache_250gb' pool settings, and ran the 'Balance' command (which is still running now):

```
## From the Unraid Web UI Balance Status

btrfs filesystem df:
Data, single: total=228.87GiB, used=225.71GiB
System, single: total=4.00MiB, used=48.00KiB
Metadata, single: total=4.01GiB, used=1.24GiB
GlobalReserve, single: total=304.28MiB, used=16.00KiB

btrfs balance status:
Balance on '/mnt/cache_250gb/' is running
5 out of about 236 chunks balanced (6 considered), 98% left

## From SSH / Terminal

# btrfs balance status /mnt/cache_250gb/
Balance on '/mnt/cache_250gb/' is running
5 out of about 236 chunks balanced (6 considered), 98% left
```

I'll edit this post with further info once the balance command finishes running; and then potentially post another diagnostics .zip to go along with it.

Edit: Part way through the balance I realised I hadn't removed the old `docker.img` file from the 'cache_250gb' pool; so I have done that now.. we'll see if that causes any issues that require another balance after this, I guess..
JorgeB Posted May 28

7 minutes ago, devalias said: and then potentially post another diagnostics .zip to go along with it.

Yes, re-start the array once the balance finishes and post new diags.
devalias Posted May 28

Balance finished:

```
Balance Status

btrfs filesystem df:
Data, single: total=106.00GiB, used=105.37GiB
System, single: total=32.00MiB, used=16.00KiB
Metadata, single: total=2.00GiB, used=1.22GiB
GlobalReserve, single: total=292.67MiB, used=0.00B

btrfs balance status:
No balance found on '/mnt/cache_250gb'

Current usage ratio: 99.4 % --- No Balance required
```

Diagnostics: dalekanium-diagnostics-20240528-2105.zip
JorgeB Posted May 28

3 hours ago, JorgeB said: re-start the array once the balance finishes and post new diags.
devalias Posted May 28

It's been a long day and I was trying to rush this out; my bad.. I have now made sure to stop and start the array (I assume you meant that, and not a literal reboot; but if not, let me know).

Diagnostics: dalekanium-diagnostics-20240528-2249.zip
JorgeB Posted May 28

The btrfs warning is gone, so everything should be OK now; you can now also create the new pool with the other device.
devalias Posted May 29

11 hours ago, JorgeB said: can now also create the new pool with the other device.

Sweet, sounds good. Added the 1TB drive to the 'cache_1tb' pool (with 1 slot), set it to XFS, started the array, got "Unmountable: Unsupported or no file system", selected to format the drive, and all looks good now. Thanks for your help!