November 10, 2024

Had an issue with one of the cache SSDs in my cache pool that forced a replacement. My cache filled, the mover was failing to move things to the array, and the cache went offline and would not restart. One of the disks was reporting SMART issues.

I replaced the failing Samsung 870 EVO with a new WD Red SSD. Powering back on showed that one of the pool disks was wrong (the Samsung, as it was no longer there), so I used the New Config tool, preserving all assignments, assigned the new disks back to the pool and started the array. The cache pool now shows as unmountable.

If I look at the disk log for the original cache disk it shows:

Nov 10 21:19:32 Bigbox kernel: BTRFS warning (device sdf1): devid 2 uuid cea00b5f-c218-446f-9216-033af2036ddf is missing
Nov 10 21:19:32 Bigbox kernel: BTRFS warning (device sdf1): devid 1 uuid b8b1e94b-87ea-4c44-84cb-65b545316d8c is missing
Nov 10 21:19:32 Bigbox kernel: BTRFS error (device sdf1): failed to read chunk root
Nov 10 21:19:32 Bigbox kernel: BTRFS error (device sdf1): open_ctree failed
Nov 10 21:19:32 Bigbox emhttpd: /usr/sbin/zpool import -f -d /dev/sdf1 2>&1

The btrfs utilities show the following:

root@Bigbox:~# btrfs check -s 1 /dev/sdf1
using SB copy 1, bytenr 67108864
Opening filesystem to check...
warning, device 2 is missing
warning, device 1 is missing
bad tree block 4358586368, bytenr mismatch, want=4358586368, have=0
ERROR: cannot read chunk root
ERROR: cannot open file system

root@Bigbox:~# btrfs fi show
Label: none  uuid: f5a61371-857b-4fd5-aac2-572a1d3f2038
    Total devices 1 FS bytes used 360.00KiB
    devid    1 size 20.00GiB used 536.00MiB path /dev/loop2

warning, device 2 is missing
warning, device 1 is missing
ERROR: cannot read chunk root
Label: none  uuid: c0524add-7ef5-45cf-a8be-f02d1d47a168
    Total devices 3 FS bytes used 928.41GiB
    devid    3 size 931.51GiB used 681.48GiB path /dev/sdf1
    *** Some devices missing

I'm unsure where to go next to bring things back online.

bigbox-diagnostics-20241110-2213.zip
November 11, 2024 | Community Expert
Was sdg the previous pool member device? If yes, post the output from:
fdisk -l /dev/sdg
November 11, 2024 | Author
sdf was the previous member. Output for both devices:

root@Bigbox:~# fdisk -l /dev/sdg
Disk /dev/sdg: 931.51 GiB, 1000204886016 bytes, 1953525168 sectors
Disk model: WDC WDS100T1R0A
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

root@Bigbox:~# fdisk -l /dev/sdf
Disk /dev/sdf: 931.51 GiB, 1000204886016 bytes, 1953525168 sectors
Disk model: WDC WDS100T1R0A
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x00000000

Device     Boot Start        End    Sectors   Size Id Type
/dev/sdf1        2048 1953525167 1953523120 931.5G 83 Linux
November 11, 2024 | Community Expert
chockymonster said: "sdf was the previous member."
sdf is the current member; was it sdg?
November 11, 2024 | Author
There were originally 2 Samsungs in the cache pool; one was replaced a few months ago.
November 11, 2024 | Community Expert
The pool is missing at least one device. If you no longer have it, or if it was wiped, it won't be possible to recover. Was the failed Samsung completely dead?
November 11, 2024 | Author
No, it still powers up, so I can add it back in. I thought I had the pool configured as RAID 1, so I'm a bit confused about how you recover if a disk fails.
November 11, 2024 | Community Expert
chockymonster said:
Nov 10 21:19:32 Bigbox kernel: BTRFS warning (device sdf1): devid 2 uuid cea00b5f-c218-446f-9216-033af2036ddf is missing
Nov 10 21:19:32 Bigbox kernel: BTRFS warning (device sdf1): devid 1 uuid b8b1e94b-87ea-4c44-84cb-65b545316d8c is missing

According to this, the pool is missing two devices; RAID1 can only recover from one missing device.
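A minimal sketch of the check above, assuming the kernel messages look exactly like the two quoted lines: it extracts each devid reported as missing, counts the distinct ones, and compares the count with the single missing device that btrfs RAID1 (two copies of every chunk) can survive. The here-document is a stand-in for real `dmesg` or syslog output.

```shell
#!/bin/sh
# The two kernel messages quoted above; on a live system you would feed
# in `dmesg` output or the syslog instead of this here-document.
log=$(cat <<'EOF'
Nov 10 21:19:32 Bigbox kernel: BTRFS warning (device sdf1): devid 2 uuid cea00b5f-c218-446f-9216-033af2036ddf is missing
Nov 10 21:19:32 Bigbox kernel: BTRFS warning (device sdf1): devid 1 uuid b8b1e94b-87ea-4c44-84cb-65b545316d8c is missing
EOF
)

# Pull out each devid reported as missing and count the distinct ones.
# tr strips the padding some wc implementations add.
missing=$(printf '%s\n' "$log" \
    | sed -n 's/.*devid \([0-9][0-9]*\) uuid .* is missing$/\1/p' \
    | sort -u | wc -l | tr -d ' ')

# btrfs RAID1 keeps two copies of every chunk on different devices, so at
# most one device may be absent for a degraded mount to find every chunk.
tolerated=1
if [ "$missing" -gt "$tolerated" ]; then
    echo "$missing devices missing: more than RAID1 can survive"
else
    echo "$missing device(s) missing: a degraded mount may work"
fi
```

For the log above this reports two missing devids, which is the expert's point: RAID1 has no second copy to fall back on when two members are gone.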
November 11, 2024 | Author
There's only one missing disk. At shutdown yesterday:

Nov 10 20:29:21 Bigbox emhttpd: import 30 cache device: (sdf) WDC_WDS100T1R0A-68A4W0_22471C800876
Nov 10 20:29:21 Bigbox emhttpd: import 31 cache device: (sdg) Samsung_SSD_870_EVO_1TB_S626NF0R253545D

Currently it looks like this:

Nov 10 20:52:34 Bigbox emhttpd: import 30 cache device: (sdf) WDC_WDS100T1R0A-68A4W0_22471C800876
Nov 10 20:52:34 Bigbox emhttpd: import 31 cache device: (sdg) WDC_WDS100T1R0A-68A4W0_232467442312

sdf has always been in the pool.
November 11, 2024 | Community Expert
Not according to that output. Post the output from:
mkdir /x
mount -v -t btrfs /dev/sdf1 /x
November 11, 2024 | Author
Sure:

mount: /x: wrong fs type, bad option, bad superblock on /dev/sdf1, missing codepage or helper program, or other error.
dmesg(1) may have more information after failed mount system call.

Latest entries in dmesg:

[69360.257899] BTRFS info (device sdf1): first mount of filesystem c0524add-7ef5-45cf-a8be-f02d1d47a168
[69360.257909] BTRFS info (device sdf1): using crc32c (crc32c-intel) checksum algorithm
[69360.257913] BTRFS info (device sdf1): using free space tree
[69360.259019] BTRFS error (device sdf1): failed to read chunk root
[69360.259196] BTRFS error (device sdf1): open_ctree failed
November 11, 2024 | Community Expert
The pool was not redundant, at least not fully, or it would mount degraded. You can try connecting the previous device to see if it can be recovered.
November 11, 2024 | Author
Sure. What's the best way to add it back in? Should I remove the new drive, or add the old one to the pool?
November 11, 2024 | Community Expert
For now you don't need to add it to the pool, just have it connected. Also leave sdf connected; you can disconnect the current cache2.
November 11, 2024 | Author
OK, the old disk is in. All 3 SSDs are in the box. The old disk is showing as /dev/sdh. It will mount with the degraded option.
November 11, 2024 | Community Expert
After mounting, post the output from:
btrfs fi usage -T /x
If you are not using /x, substitute the correct mount point.
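A sketch of the degraded mount followed by the usage command, written as a dry run by default so it only prints what it would do. The device and mount point are the ones used in this thread; the `ro` option is an assumption added as a precaution (the thread only mentions `degraded`), and `DRY_RUN`, `run` and `mount_degraded` are hypothetical helper names.

```shell
#!/bin/sh
# DRY_RUN defaults to 1 (print only); run with DRY_RUN=0 as root to execute.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# $1: any present pool member (the thread used /dev/sdh1)
# $2: mount point (the thread used /x)
mount_degraded() {
    run mkdir -p "$2"
    # 'degraded' lets btrfs mount with a member missing; 'ro' is an added
    # precaution (assumption) so nothing is written to the damaged pool.
    run mount -v -t btrfs -o degraded,ro "$1" "$2"
    run btrfs fi usage -T "$2"
}

mount_degraded /dev/sdh1 /x
```

Printing first makes it easy to check the device name before anything touches the pool.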
November 11, 2024 | Author
root@Bigbox:~# btrfs fi usage -T /x
Overall:
    Device size:                   2.73TiB
    Device allocated:              1.82TiB
    Device unallocated:          931.51GiB
    Device missing:              931.51GiB
    Device slack:                    0.00B
    Used:                          1.81TiB
    Free (estimated):            468.49GiB    (min: 468.49GiB)
    Free (statfs, df):             2.73GiB
    Data ratio:                       2.00
    Metadata ratio:                   2.00
    Global reserve:              512.00MiB    (used: 0.00B)
    Multiple profiles:                  no

             Data      Metadata System
Id Path      RAID1     RAID1    RAID1     Unallocated Total     Slack
-- --------- --------- -------- --------- ----------- --------- -----
 1 missing   248.00GiB  2.00GiB  32.00MiB   681.48GiB 931.51GiB     -
 2 /dev/sdh1 929.48GiB  2.00GiB  32.00MiB     1.02MiB 931.51GiB     -
 3 /dev/sdf1 681.48GiB        -         -   250.03GiB 931.51GiB     -
-- --------- --------- -------- --------- ----------- --------- -----
   Total     929.48GiB  2.00GiB  32.00MiB   931.51GiB   2.73TiB 0.00B
   Used      926.75GiB  1.69GiB 176.00KiB
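The Data column can be cross-checked with a little arithmetic. Assuming every btrfs RAID1 chunk has exactly two copies on two different devices, the allocated space splits into three device pairs, and each pair's share is the logical total minus the third device's allocation. Values are kept in hundredths of a GiB so POSIX shell integer arithmetic stays exact; `gib` is a hypothetical helper for printing.

```shell
#!/bin/sh
# Data (RAID1) column from `btrfs fi usage -T`, in hundredths of a GiB.
missing=24800   # devid 1, missing
sdh=92948       # devid 2, /dev/sdh1 (the old Samsung)
sdf=68148       # devid 3, /dev/sdf1

# Print a hundredths-of-GiB integer as "N.NN GiB".
gib() { printf '%d.%02d GiB\n' "$(($1 / 100))" "$(($1 % 100))"; }

# Each chunk is counted once per copy, so the logical size is half the sum.
total=$(( (missing + sdh + sdf) / 2 ))

printf 'logical data:   '; gib "$total"
printf 'missing + sdh1: '; gib "$((total - sdf))"
printf 'missing + sdf1: '; gib "$((total - sdh))"
printf 'sdh1 + sdf1:    '; gib "$((total - missing))"
```

Under that assumption the pairs come out as 248.00 GiB (missing + sdh1), 0.00 GiB (missing + sdf1) and 681.48 GiB (sdh1 + sdf1), and the Metadata and System columns show allocations only on the missing device and sdh1. That would fit sdf1 on its own failing with "failed to read chunk root", and the pool mounting degraded once sdh1 was connected.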
November 11, 2024 | Community Expert
So there were 3 members in total. If the Samsung drive is not good, it may be better to manually copy the data from the pool to another place and then recreate the pool using the two new devices. If the Samsung should still be OK, you can mount the pool degraded with the GUI using those two drives only, then wait for the balance to finish removing the missing disk; when only two devices remain, you can replace the Samsung with the new device.
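The second route is done through the Unraid GUI in this thread. As a rough sketch only, the generic btrfs-progs commands for the same two steps look like the following, assuming the pool is mounted read-write and degraded at /x with sdf1 and the Samsung (devid 2 in the usage table). `/dev/sdX1` is a placeholder for the new SSD, `remove_then_replace` is a hypothetical helper, and this is not necessarily what Unraid runs internally. Note that the pool in this thread ended up read-only, in which case these steps are not possible.

```shell
#!/bin/sh
# DRY_RUN defaults to 1 (print only); set DRY_RUN=0 to execute as root.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# $1: mount point, $2: devid of the device to retire, $3: new device
remove_then_replace() {
    # Re-mirror every chunk that had a copy on the missing device onto the
    # remaining members, then drop the missing device from the pool.
    run btrfs device remove missing "$1"
    # Copy the retiring device's contents onto the new SSD; -B stays in
    # the foreground until the replace finishes.
    run btrfs replace start -B "$2" "$3" "$1"
    run btrfs fi usage -T "$1"
}

remove_then_replace /x 2 /dev/sdX1
```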
November 11, 2024 | Author
I don't understand the 3 members. There have never been 3 disks in the pool. It's always been a RAID 1 mirror, so unless I've screwed up somewhere I have no clue where it came from.

Moving the Samsung into the pool in place of the new disk has allowed the pool to mount, but it's been forced into read-only mode. I've stopped all Docker containers. The Samsung is showing a large number of SMART errors. I'm opting to copy the data to somewhere on the main array and recreate the pool, hopefully correctly this time!
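A minimal sketch of the copy-off step, assuming the read-only pool is mounted at /x and that /mnt/disk1/cache_backup is a hypothetical destination on an array disk with room for roughly 928 GiB (the "FS bytes used" figure from `btrfs fi show`). `copy_off` is a hypothetical helper, and the script only prints the commands unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# DRY_RUN defaults to 1 (print only); set DRY_RUN=0 to execute as root.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# $1: mounted (read-only) pool, $2: destination directory on the array
copy_off() {
    src=$1
    dest=$2
    run mkdir -p "$dest"
    # -a archive mode, -H keep hard links, -X keep extended attributes.
    # The trailing slash on src copies its contents, not the directory itself.
    run rsync -aHX "$src/" "$dest/"
    # Verification pass: compare by checksum (-c) without copying (-n) and
    # itemize (-i) anything that still differs.
    run rsync -aHXnci "$src/" "$dest/"
}

copy_off /x /mnt/disk1/cache_backup
```

Once the verification pass lists nothing unexpected, the pool can be recreated from the GUI with the two new devices and the data copied back.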