Failed Upgrade from 1TB to 8TB SSDs for cache pool. no partitions exist and unmountable.

July 31, 2023

Hello, I believe I royally screwed up my migration from 1TB to 8TB cache drives.

Apppooldisk    Samsung_SSD_870_EVO_1TB_S625NJ0R291316Y - 1 TB (sdh)    26 C    0    0    0    btrfs    Unmountable: Unsupported or no file system
Apppooldisk 2  Samsung_SSD_870_EVO_1TB_S6PTNJ0R512642Z - 1 TB (sdi)    26 C    0    0    0             Unmountable: Unsupported or no file system
Pool of two devices                                                    26 C    0    0    0

All my steps were done in the GUI, following this guide.

stopped the array in the UI
replaced sdi with sdb
started the array
saw a btrfs operation
it looked okay to me but appeared to complete quickly
stopped the array
replaced sdh with sdc
started the array
now got errors about an unmountable filesystem
panicked and stopped the array
replaced sdc with sdh
started the array
saw unmountable errors
stopped the array again
replaced sdb with sdi
started the array again
saw unmountable errors
rebooted the server
still unmountable errors and zero partitions

So basically, my array is started with the old 1TB drives, but it doesn't appear to mount the filesystem or contain any data.

Not sure if the actions above wiped out the partitions and my data. Before I proceed to data recovery, I wanted to see if you think there's a way to resolve this. I was looking at the recovery page; however, it doesn't appear to relate to my situation since my partitions are completely gone.

tower-diagnostics-20230730-1825.zip

July 31, 2023

Community Expert

Post the output of:

btrfs fi show

sdi and sdh were the original pool devices?

August 1, 2023

Author

Thanks for the prompt response! Yes, sdi and sdh were the original pool devices; sdb and sdc are the new ones.

My array is currently shut down since I'm running ddrescue and testdisk, and the command gives no output. I will run it again after I start the array.

I've created backups of both disks using ddrescue and was looking at this thread.

I'm currently running a testdisk deep search of the partitions. I already see the primary partition and am hoping it's possible to repair the partition this way.

ddrescue running on sdh

Scraping failed blocks... (forwards)
     ipos:   18390 MB, non-trimmed:        0 B,  current rate:       0 B/s
     opos:   18390 MB, non-scraped:        0 B,  average rate:  27851 kB/s
non-tried:        0 B,  bad-sector:   61167 kB,    error rate:    3072 B/s
  rescued:    1000 GB,   bad areas:     9724,        run time:  9h 58m 30s
pct rescued:   99.99%, read errors:   123690,  remaining time:  7d  8h 59m
                              time since last successful read:         10s

testdisk deepsearch running on sdi

TestDisk 7.1, Data Recovery Utility, July 2019
Christophe GRENIER <grenier@cgsecurity.org>
https://www.cgsecurity.org

Disk /dev/sdi - 1000 GB / 931 GiB - CHS 121601 255 63
Analyse cylinder 35285/121600: 29%
Read error at 33360/1/7 (lba=535928469)

  Linux                    0  32 33 121601  80 63 1953523120

Thanks for looking into this!

August 1, 2023

Author

Here is a screenshot of my disks after the array has been started.

TestDisk 7.1, Data Recovery Utility, July 2019
Christophe GRENIER <grenier@cgsecurity.org>
https://www.cgsecurity.org

Disk /dev/sdi - 1000 GB / 931 GiB - CHS 121601 255 63
Analyse cylinder 105745/121600: 86%
Read error at 82729/130/32 (lba=1329049606)

  Linux                    0  32 33 121601  80 63 1953523120
  Linux                35941 137 37 37208 116 28   20353024
  Linux                61096 208 41 62363 187 32   20353024
Invalid NTFS or exFAT boot
 0 D HPFS - NTFS          176135 148 23 301524 101 50 2014371352
  HPFS - NTFS          176135 148 23 301524 101 50 2014371352
  Linux                102076  82 23 103343  61 14   20353024

August 1, 2023

Author

Another update: testdisk was successful, my filesystem is mounted, and I see my files again!

At this point, I'm copying all my cache files over to one of my array disks just for extra safety.

Afterwards, I will back up the ddrescue images to another RAID in my desktop PC and wipe them.

I think I'm okay now; however, I'd like to know what thread you would recommend for upgrading my old 1TB SSDs (sdh, sdi) to my 8TB SSDs (sdb, sdc). This was the latest I could find; do you think this would work in my situation?

August 1, 2023

Community Expert

The procedure is here, but don't use the failing device; even with the ddrescue clone it may fail, since there may be checksum errors.

August 2, 2023

Author

I am following the prescribed procedure; however, my partitions seem to have been deleted again after these steps:

Procedure:

stop the array

on the main page click on the pool device you want to replace/upgrade and select the new one from the drop down list (any data on the new device will be deleted)

start the array

a btrfs device replace will begin, wait for pool activity to stop, the stop array button will be inhibited during the operation, this can take some time depending on how much data is on the pool and how fast your devices are.

After starting the array, I'm left with "Unmountable: Unsupported or no file system" on both drives.

August 2, 2023

Author

Not sure if this is a display bug; I just noticed that my apps are still running without any issues.

Here is the output of btrfs fi show now.

root@Tower:~# btrfs fi show
Label: none uuid: a00421a0-fcf7-44a7-b7be-4c1f5558b844
Total devices 1 FS bytes used 8.92GiB
devid 1 size 20.00GiB used 16.52GiB path /dev/loop2

warning, device 2 is missing
warning, device 1 is missing
ERROR: cannot read chunk root
Label: none uuid: 910ac9e5-2ccb-448f-928d-7e05075ab121
Total devices 3 FS bytes used 539.54GiB
devid 3 size 931.51GiB used 51.00GiB path /dev/sdh1
*** Some devices missing

August 2, 2023

Author

Found some related logs, also attached diagnostics.

root@Tower:~# cat /var/log/syslog | grep BTRFS
Aug 1 16:52:30 Tower kernel: BTRFS: device fsid 910ac9e5-2ccb-448f-928d-7e05075ab121 devid 3 transid 953941 /dev/sdh1 scanned by udevd (879)
Aug 1 16:52:30 Tower kernel: BTRFS: device fsid 910ac9e5-2ccb-448f-928d-7e05075ab121 devid 2 transid 953941 /dev/sdi1 scanned by udevd (892)
Aug 1 16:52:53 Tower kernel: BTRFS info (device sdi1): using crc32c (crc32c-intel) checksum algorithm
Aug 1 16:52:53 Tower kernel: BTRFS info (device sdi1): allowing degraded mounts
Aug 1 16:52:53 Tower kernel: BTRFS info (device sdi1): using free space tree
Aug 1 16:52:53 Tower kernel: BTRFS warning (device sdi1): devid 1 uuid ad06e349-c8bc-4d10-803b-c8057fef3306 is missing
Aug 1 16:52:53 Tower kernel: BTRFS warning (device sdi1): devid 1 uuid ad06e349-c8bc-4d10-803b-c8057fef3306 is missing
Aug 1 16:52:53 Tower kernel: BTRFS info (device sdi1): bdev /dev/sdi1 errs: wr 0, rd 200, flush 0, corrupt 0, gen 0
Aug 1 16:52:53 Tower kernel: BTRFS info (device sdi1): enabling ssd optimizations
Aug 1 16:52:53 Tower kernel: BTRFS info (device sdi1: state M): allowing degraded mounts
Aug 1 16:52:53 Tower kernel: BTRFS info (device sdi1: state M): turning on async discard
Aug 1 16:52:53 Tower kernel: BTRFS info (device sdi1): relocating block group 1677282246656 flags data|raid1
Aug 1 16:52:55 Tower kernel: BTRFS error (device sdi1): bdev /dev/sdi1 errs: wr 0, rd 201, flush 0, corrupt 0, gen 0
Aug 1 16:52:58 Tower kernel: BTRFS: device fsid a00421a0-fcf7-44a7-b7be-4c1f5558b844 devid 1 transid 812324 /dev/loop2 scanned by mount (6195)
Aug 1 16:52:58 Tower kernel: BTRFS info (device loop2): using crc32c (crc32c-intel) checksum algorithm
Aug 1 16:52:58 Tower kernel: BTRFS info (device loop2): using free space tree
Aug 1 17:02:15 Tower kernel: BTRFS info (device sdh1): using crc32c (crc32c-intel) checksum algorithm
Aug 1 17:02:15 Tower kernel: BTRFS info (device sdh1): allowing degraded mounts
Aug 1 17:02:15 Tower kernel: BTRFS info (device sdh1): using free space tree
Aug 1 17:02:15 Tower kernel: BTRFS error (device sdh1): failed to read chunk root
Aug 1 17:02:15 Tower kernel: BTRFS error (device sdh1): open_ctree failed
Aug 1 17:02:16 Tower kernel: BTRFS: device fsid a00421a0-fcf7-44a7-b7be-4c1f5558b844 devid 1 transid 812340 /dev/loop2 scanned by mount (24097)
Aug 1 17:02:16 Tower kernel: BTRFS info (device loop2): using crc32c (crc32c-intel) checksum algorithm
Aug 1 17:02:16 Tower kernel: BTRFS info (device loop2): using free space tree

tower-diagnostics-20230801-1715.zip

Edited August 2, 2023 by squirtyburger.io

August 2, 2023

Author

I was able to restore the partition again with testdisk, and the array came back after a reboot. I now get different output from the following command:

root@Tower:~# btrfs fi show
Label: none uuid: 910ac9e5-2ccb-448f-928d-7e05075ab121
Total devices 3 FS bytes used 539.61GiB
devid 1 size 0 used 0 path MISSING
devid 2 size 931.51GiB used 549.03GiB path /dev/sdi1
devid 3 size 931.51GiB used 51.00GiB path /dev/sdh1

Label: none uuid: a00421a0-fcf7-44a7-b7be-4c1f5558b844
Total devices 1 FS bytes used 8.93GiB
devid 1 size 20.00GiB used 16.52GiB path /dev/loop2

August 2, 2023

Community Expert

It failed because the pool already had a missing device, so without the other one it had two missing devices:

Aug  1 17:02:15 Tower emhttpd: warning, device 2 is missing
Aug  1 17:02:15 Tower emhttpd: warning, device 1 is missing

You'd need to fix that first, before attempting another upgrade. Reboot and post new diags after array start with the old pool assigned.
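
A quick way to check for this condition before swapping a device is to count how many pool members btrfs reports as missing. The sketch below is only illustrative: `count_missing_devices` is a hypothetical helper name (not part of Unraid or btrfs-progs), and it only recognizes the two line formats that appear in this thread.

```shell
#!/bin/sh
# Hypothetical helper: count pool members reported as missing.
# Reads text on stdin and counts lines in either format seen in this thread:
#   "warning, device 2 is missing"            (emhttpd / btrfs warning)
#   "devid 1 size 0 used 0 path MISSING"      (btrfs fi show device line)
count_missing_devices() {
    # grep -c prints 0 but exits non-zero when nothing matches,
    # so "|| true" keeps the function usable under "set -e".
    grep -c -E 'device [0-9]+ is missing|path MISSING' || true
}
```

It could be fed with something like `btrfs filesystem show 910ac9e5-2ccb-448f-928d-7e05075ab121 2>&1 | count_missing_devices`, the same command form emhttpd logs in this thread. Any result above 0 suggests the pool should be repaired before another replace is attempted.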

August 2, 2023

Author

Thanks, uploaded files below. Any steps on how to fix?

root@Tower:~# btrfs fi show
Label: none uuid: 910ac9e5-2ccb-448f-928d-7e05075ab121
Total devices 3 FS bytes used 539.65GiB
devid 1 size 0 used 0 path MISSING
devid 2 size 931.51GiB used 548.03GiB path /dev/sdi1
devid 3 size 931.51GiB used 50.00GiB path /dev/sdh1

Label: none uuid: a00421a0-fcf7-44a7-b7be-4c1f5558b844
Total devices 1 FS bytes used 8.92GiB
devid 1 size 20.00GiB used 16.52GiB path /dev/loop2

root@Tower:~# cat /var/log/syslog | egrep -i "btrfs|BTRFS"
Aug 2 07:41:04 Tower kernel: Btrfs loaded, crc32c=crc32c-generic, zoned=no, fsverity=no
Aug 2 07:41:04 Tower kernel: BTRFS: device fsid 910ac9e5-2ccb-448f-928d-7e05075ab121 devid 3 transid 955314 /dev/sdh1 scanned by udevd (898)
Aug 2 07:41:04 Tower kernel: BTRFS: device fsid 910ac9e5-2ccb-448f-928d-7e05075ab121 devid 2 transid 955314 /dev/sdi1 scanned by udevd (898)
Aug 2 07:41:27 Tower emhttpd: /sbin/btrfs filesystem show 910ac9e5-2ccb-448f-928d-7e05075ab121 2>&1
Aug 2 07:41:27 Tower emhttpd: shcmd (42): mount -t btrfs -o noatime,space_cache=v2,degraded -U 910ac9e5-2ccb-448f-928d-7e05075ab121 /mnt/apppooldisk
Aug 2 07:41:27 Tower kernel: BTRFS info (device sdi1): using crc32c (crc32c-intel) checksum algorithm
Aug 2 07:41:27 Tower kernel: BTRFS info (device sdi1): allowing degraded mounts
Aug 2 07:41:27 Tower kernel: BTRFS info (device sdi1): using free space tree
Aug 2 07:41:27 Tower kernel: BTRFS warning (device sdi1): devid 1 uuid ad06e349-c8bc-4d10-803b-c8057fef3306 is missing
Aug 2 07:41:27 Tower kernel: BTRFS warning (device sdi1): devid 1 uuid ad06e349-c8bc-4d10-803b-c8057fef3306 is missing
Aug 2 07:41:27 Tower kernel: BTRFS info (device sdi1): bdev /dev/sdi1 errs: wr 0, rd 202, flush 0, corrupt 0, gen 0
Aug 2 07:41:27 Tower kernel: BTRFS info (device sdi1): enabling ssd optimizations
Aug 2 07:41:27 Tower kernel: BTRFS info (device sdi1: state M): allowing degraded mounts
Aug 2 07:41:27 Tower kernel: BTRFS info (device sdi1: state M): turning on async discard
Aug 2 07:41:27 Tower emhttpd: shcmd (44): /sbin/btrfs device delete missing /mnt/apppooldisk &
Aug 2 07:41:27 Tower kernel: BTRFS info (device sdi1): relocating block group 1677282246656 flags data|raid1
Aug 2 07:41:32 Tower kernel: BTRFS error (device sdi1): bdev /dev/sdi1 errs: wr 0, rd 203, flush 0, corrupt 0, gen 0
Aug 2 07:41:33 Tower kernel: BTRFS: device fsid a00421a0-fcf7-44a7-b7be-4c1f5558b844 devid 1 transid 813895 /dev/loop2 scanned by mount (6187)
Aug 2 07:41:33 Tower kernel: BTRFS info (device loop2): using crc32c (crc32c-intel) checksum algorithm
Aug 2 07:41:33 Tower kernel: BTRFS info (device loop2): using free space tree
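
The `errs:` lines in these logs carry the per-device error counters, and the `rd` value for sdi1 goes up between mounts (200, 201, 202, 203 across the excerpts in this thread). Here is a small sketch for pulling those counters out of a saved syslog; the function name `btrfs_read_errors` is hypothetical, and the pattern only covers the line format shown here.

```shell
#!/bin/sh
# Hypothetical helper: print "<device> <read-error count>" for every
# "bdev ... errs:" line on stdin, in the format seen in this thread:
#   BTRFS info (device sdi1): bdev /dev/sdi1 errs: wr 0, rd 202, flush 0, corrupt 0, gen 0
btrfs_read_errors() {
    # -n suppresses default output; the trailing "p" prints only lines
    # where the substitution matched.
    sed -n -E 's/.*bdev ([^ ]+) errs: wr [0-9]+, rd ([0-9]+),.*/\1 \2/p'
}
```

Usage would be along the lines of `btrfs_read_errors < /var/log/syslog`. On a mounted pool, `btrfs device stats /mnt/apppooldisk` should report the same counters directly.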

tower-diagnostics-20230802-0750.zip

August 2, 2023

Community Expert
Solution

It's failing to delete the missing device because sdi is failing. You will need to copy anything you can manually, then create the new pool with the larger devices and restore the data.
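
For the manual copy, the main concern with a failing source device is that one unreadable file should not abort the whole run. Below is a minimal sketch of that idea, not an official procedure: `salvage_copy` is a hypothetical helper, and the paths in the usage note are assumptions (only `/mnt/apppooldisk` appears in the logs in this thread).

```shell
#!/bin/sh
# Hypothetical helper: copy every regular file under SRC into DEST,
# keep going when a file cannot be read, and record failures in LOG.
# Assumes file names contain no newlines. A file that fails part-way
# may leave a partial copy in DEST; it is still listed in LOG.
salvage_copy() {
    src=$1
    dest=$2
    log=$3
    : > "$log"    # start with an empty failure list
    ( cd "$src" && find . -type f ) | while IFS= read -r rel; do
        # Recreate the directory layout, then copy one file at a time
        # so a read error only affects that file.
        mkdir -p "$dest/$(dirname "$rel")"
        cp -p "$src/$rel" "$dest/$rel" 2>/dev/null || printf '%s\n' "$rel" >> "$log"
    done
}
```

For example, `salvage_copy /mnt/apppooldisk /mnt/disk1/pool_backup /mnt/disk1/pool_backup_failed.log` (destination and log paths are placeholders). Files listed in the log are the ones to retry or to restore from the ddrescue images.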

August 3, 2023

Author

I've copied everything to my array disks. I assume that after creating the new pool I can copy it back as-is. Can I create a pool with xfs this time, or do you recommend sticking with btrfs? I haven't had an issue up until now, but I've heard xfs is more stable. Thanks for your help, Jorge!!

August 3, 2023

Community Expert

5 hours ago, squirtyburger.io said:

Can I create a pool with xfs this time?

Multi-device pools can only be btrfs or, since v6.12, zfs; zfs is usually more robust.

August 15, 2023

Author

Thanks again for your help Jorge!


Solved by JorgeB
