Failed upgrade from 1TB to 8TB SSDs for cache pool: no partitions exist and the pool is unmountable


Go to solution Solved by JorgeB,


Hello, I believe I royally screwed up my migration from 1TB to 8TB cache drives.

Apppooldisk    Samsung_SSD_870_EVO_1TB_S625NJ0R291316Y - 1 TB (sdh)    26 C    0    0    0    btrfs    Unmountable: Unsupported or no file system
Apppooldisk 2  Samsung_SSD_870_EVO_1TB_S6PTNJ0R512642Z - 1 TB (sdi)    26 C    0    0    0             Unmountable: Unsupported or no file system
Pool of two devices                                                    26 C    0    0    0

All of my steps were done in the GUI, following this guide.


Stopped the array in the UI.
Replaced sdi with sdb.
Started the array.
Saw a btrfs operation start.
It looked okay to me, but appeared to complete quickly.
Stopped the array.
Replaced sdh with sdc.
Started the array.
Now got errors about an unmountable filesystem.
Panicked and stopped the array.
Replaced sdc with sdh (put the old drive back).
Started the array.
Saw the unmountable errors.
Stopped the array again.
Replaced sdb with sdi (put the other old drive back).
Started the array again.
Saw the unmountable errors.
Rebooted the server.
Still unmountable errors and zero partitions.

So basically, my array is started with the old 1TB drives, but the pool doesn't mount the filesystem or appear to contain any data.

I'm not sure whether the actions above wiped out the partitions and my data. Before I proceed to data recovery, I wanted to see if you think there's a way to resolve this. I was looking at the recovery page; however, it doesn't appear to apply to my situation, since my partitions are completely gone.

tower-diagnostics-20230730-1825.zip

Link to comment

 

Thanks for the prompt response! Yes, sdi/sdh were the original devices; sdb and sdc are the new pool devices.

 

My array is currently stopped since I'm running ddrescue and testdisk, so I don't get any output from the command. After I start the array I will run the command again.

I've created backups of both disks using ddrescue and was looking at this thread.

I'm currently running a testdisk deep search of the partitions. I can already see the primary partition and am hoping that it's possible to repair the partition this way.

 

ddrescue running on sdh

Scraping failed blocks... (forwards)
     ipos:   18390 MB, non-trimmed:        0 B,  current rate:       0 B/s
     opos:   18390 MB, non-scraped:        0 B,  average rate:  27851 kB/s
non-tried:        0 B,  bad-sector:   61167 kB,    error rate:    3072 B/s
  rescued:    1000 GB,   bad areas:     9724,        run time:  9h 58m 30s
pct rescued:   99.99%, read errors:   123690,  remaining time:  7d  8h 59m
                              time since last successful read:         10s
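
For a sense of scale, the bad-sector figure in the status screen above can be expressed as a fraction of the disk. This is only a rough sketch: the helper name `to_bytes` and the unit table are made up for illustration, the "rescued" value is the rounded figure ddrescue displays, and it assumes ddrescue's default decimal (SI) units.

```python
import re

# Decimal (SI) multipliers; assumes ddrescue was run without --binary-prefixes.
UNITS = {"B": 1, "kB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}

def to_bytes(text):
    """Convert a size such as '61167 kB' to a byte count."""
    m = re.fullmatch(r"\s*(\d+)\s*([kMGT]?B)\s*", text)
    if m is None:
        raise ValueError(f"unrecognised size: {text!r}")
    return int(m.group(1)) * UNITS[m.group(2)]

# Values copied from the status screen above (rescued is a rounded display value).
rescued = to_bytes("1000 GB")
bad = to_bytes("61167 kB")

bad_fraction = bad / (rescued + bad)
print(f"unreadable: {bad / 10**6:.1f} MB ({bad_fraction:.4%} of the disk)")
```

That works out to roughly 61 MB that could not be read, which is consistent with the 99.99% "pct rescued" ddrescue reports.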


testdisk deepsearch running on sdi

TestDisk 7.1, Data Recovery Utility, July 2019
Christophe GRENIER <grenier@cgsecurity.org>
https://www.cgsecurity.org

Disk /dev/sdi - 1000 GB / 931 GiB - CHS 121601 255 63
Analyse cylinder 35285/121600: 29%
Read error at 33360/1/7 (lba=535928469)

  Linux                    0  32 33 121601  80 63 1953523120

 

Thanks for looking into this!

Link to comment

Here is a screenshot of my disks after the array has been started.
image.thumb.png.e8decf76ccbdae3bf353cb41293c526b.png

 

TestDisk 7.1, Data Recovery Utility, July 2019
Christophe GRENIER <grenier@cgsecurity.org>
https://www.cgsecurity.org

Disk /dev/sdi - 1000 GB / 931 GiB - CHS 121601 255 63
Analyse cylinder 105745/121600: 86%
Read error at 82729/130/32 (lba=1329049606)

  Linux                    0  32 33 121601  80 63 1953523120
  Linux                35941 137 37 37208 116 28   20353024
  Linux                61096 208 41 62363 187 32   20353024
Invalid NTFS or exFAT boot
 0 D HPFS - NTFS          176135 148 23 301524 101 50 2014371352
  HPFS - NTFS          176135 148 23 301524 101 50 2014371352
  Linux                102076  82 23 103343  61 14   20353024
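
The cylinder/head/sector triples in the TestDisk output can be cross-checked against the LBA values it prints. Below is a minimal sketch of the usual CHS-to-LBA conversion, assuming the logical geometry TestDisk reports for this disk (255 heads, 63 sectors per track). The function name `chs_to_lba` is just illustrative.

```python
HEADS = 255     # heads per cylinder, from "CHS 121601 255 63"
SECTORS = 63    # sectors per track, from the same line

def chs_to_lba(cylinder, head, sector):
    """Usual CHS -> LBA mapping; sectors are 1-based, cylinder and head 0-based."""
    return (cylinder * HEADS + head) * SECTORS + (sector - 1)

# Read errors reported by TestDisk, next to the LBA it printed for each.
print(chs_to_lba(33360, 1, 7))      # 535928469
print(chs_to_lba(82729, 130, 32))   # 1329049606

# First "Linux" candidate: start 0/32/33, end 121601/80/63.
start = chs_to_lba(0, 32, 33)
end = chs_to_lba(121601, 80, 63)
print(start, end - start + 1)       # 2048 1953523120
```

The first candidate therefore starts at sector 2048 and its length matches the 1953523120 sectors TestDisk lists, so it spans the rest of the disk, which is what you would expect for the single pool partition.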

 

Link to comment

Another update: testdisk was successful, my filesystem is mounted, and I can see my files again!
image.thumb.png.afbd319868f7379f4b7ee19f9fb8aa3b.png

 

At this point, I'm copying all my cache files over to one of my array disks just for extra safety.

Afterwards, I will back up the ddrescue images to another RAID in my desktop PC and wipe them.

I think I'm okay now. However, I'd like to know which thread you would recommend for upgrading my old 1TB SSDs (sdh, sdi) to my 8TB SSDs (sdb, sdc). This was the latest I could find; do you think it would work in my situation?

 

Link to comment

I am following the prescribed procedure; however, my partitions seem to have been deleted again after following these steps:

Procedure:

 

stop the array

on the main page click on the pool device you want to replace/upgrade and select the new one from the drop down list (any data on the new device will be deleted)

start the array

a btrfs device replace will begin, wait for pool activity to stop, the stop array button will be inhibited during the operation, this can take some time depending on how much data is on the pool and how fast your devices are.

After starting the array, I'm left with "Unmountable: Unsupported or no file system" on both drives.
image.thumb.png.d4142ebd3b89a1f40ecab22939b36ee3.png

Link to comment

Not sure if this is a display bug; I just noticed that my apps are still running without any issues.

Here is the output of btrfs fi show now. 

 

root@Tower:~# btrfs fi show
Label: none  uuid: a00421a0-fcf7-44a7-b7be-4c1f5558b844
        Total devices 1 FS bytes used 8.92GiB
        devid    1 size 20.00GiB used 16.52GiB path /dev/loop2

warning, device 2 is missing
warning, device 1 is missing
ERROR: cannot read chunk root
Label: none  uuid: 910ac9e5-2ccb-448f-928d-7e05075ab121
        Total devices 3 FS bytes used 539.54GiB
        devid    3 size 931.51GiB used 51.00GiB path /dev/sdh1
        *** Some devices missing

Link to comment

Found some related logs; diagnostics are also attached.

root@Tower:~# cat /var/log/syslog | grep BTRFS
Aug  1 16:52:30 Tower kernel: BTRFS: device fsid 910ac9e5-2ccb-448f-928d-7e05075ab121 devid 3 transid 953941 /dev/sdh1 scanned by udevd (879)
Aug  1 16:52:30 Tower kernel: BTRFS: device fsid 910ac9e5-2ccb-448f-928d-7e05075ab121 devid 2 transid 953941 /dev/sdi1 scanned by udevd (892)
Aug  1 16:52:53 Tower kernel: BTRFS info (device sdi1): using crc32c (crc32c-intel) checksum algorithm
Aug  1 16:52:53 Tower kernel: BTRFS info (device sdi1): allowing degraded mounts
Aug  1 16:52:53 Tower kernel: BTRFS info (device sdi1): using free space tree
Aug  1 16:52:53 Tower kernel: BTRFS warning (device sdi1): devid 1 uuid ad06e349-c8bc-4d10-803b-c8057fef3306 is missing
Aug  1 16:52:53 Tower kernel: BTRFS warning (device sdi1): devid 1 uuid ad06e349-c8bc-4d10-803b-c8057fef3306 is missing
Aug  1 16:52:53 Tower kernel: BTRFS info (device sdi1): bdev /dev/sdi1 errs: wr 0, rd 200, flush 0, corrupt 0, gen 0
Aug  1 16:52:53 Tower kernel: BTRFS info (device sdi1): enabling ssd optimizations
Aug  1 16:52:53 Tower kernel: BTRFS info (device sdi1: state M): allowing degraded mounts
Aug  1 16:52:53 Tower kernel: BTRFS info (device sdi1: state M): turning on async discard
Aug  1 16:52:53 Tower kernel: BTRFS info (device sdi1): relocating block group 1677282246656 flags data|raid1
Aug  1 16:52:55 Tower kernel: BTRFS error (device sdi1): bdev /dev/sdi1 errs: wr 0, rd 201, flush 0, corrupt 0, gen 0
Aug  1 16:52:58 Tower kernel: BTRFS: device fsid a00421a0-fcf7-44a7-b7be-4c1f5558b844 devid 1 transid 812324 /dev/loop2 scanned by mount (6195)
Aug  1 16:52:58 Tower kernel: BTRFS info (device loop2): using crc32c (crc32c-intel) checksum algorithm
Aug  1 16:52:58 Tower kernel: BTRFS info (device loop2): using free space tree
Aug  1 17:02:15 Tower kernel: BTRFS info (device sdh1): using crc32c (crc32c-intel) checksum algorithm
Aug  1 17:02:15 Tower kernel: BTRFS info (device sdh1): allowing degraded mounts
Aug  1 17:02:15 Tower kernel: BTRFS info (device sdh1): using free space tree
Aug  1 17:02:15 Tower kernel: BTRFS error (device sdh1): failed to read chunk root
Aug  1 17:02:15 Tower kernel: BTRFS error (device sdh1): open_ctree failed
Aug  1 17:02:16 Tower kernel: BTRFS: device fsid a00421a0-fcf7-44a7-b7be-4c1f5558b844 devid 1 transid 812340 /dev/loop2 scanned by mount (24097)
Aug  1 17:02:16 Tower kernel: BTRFS info (device loop2): using crc32c (crc32c-intel) checksum algorithm
Aug  1 17:02:16 Tower kernel: BTRFS info (device loop2): using free space tree

tower-diagnostics-20230801-1715.zip

Edited by squirtyburger.io
Link to comment

I was able to restore the partition again with testdisk, and the array came back after a reboot. This time I got a different output from the following command:
 

root@Tower:~# btrfs fi show
Label: none  uuid: 910ac9e5-2ccb-448f-928d-7e05075ab121
        Total devices 3 FS bytes used 539.61GiB
        devid    1 size 0 used 0 path  MISSING
        devid    2 size 931.51GiB used 549.03GiB path /dev/sdi1
        devid    3 size 931.51GiB used 51.00GiB path /dev/sdh1

Label: none  uuid: a00421a0-fcf7-44a7-b7be-4c1f5558b844
        Total devices 1 FS bytes used 8.93GiB
        devid    1 size 20.00GiB used 16.52GiB path /dev/loop2
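
The pool state in the output above can also be read mechanically. What follows is a rough sketch, not an official tool: it parses pasted `btrfs fi show` text (the sample is copied from the output above) and lists which devids are present and which are reported as MISSING. The function name `summarize` and the dictionary layout are assumptions made for illustration.

```python
import re

SAMPLE = """\
Label: none  uuid: 910ac9e5-2ccb-448f-928d-7e05075ab121
        Total devices 3 FS bytes used 539.61GiB
        devid    1 size 0 used 0 path  MISSING
        devid    2 size 931.51GiB used 549.03GiB path /dev/sdi1
        devid    3 size 931.51GiB used 51.00GiB path /dev/sdh1
"""

def summarize(show_output):
    """Return {uuid: {"total": int, "present": [...], "missing": [...]}}."""
    pools, current = {}, None
    for line in show_output.splitlines():
        if m := re.search(r"uuid:\s*(\S+)", line):
            current = pools.setdefault(
                m.group(1), {"total": 0, "present": [], "missing": []})
        elif current is None:
            continue  # e.g. warnings printed before any filesystem header
        elif m := re.search(r"Total devices\s+(\d+)", line):
            current["total"] = int(m.group(1))
        elif m := re.search(r"devid\s+(\d+)\s.*\bpath\s+(\S+)", line):
            devid, path = int(m.group(1)), m.group(2)
            key = "missing" if path == "MISSING" else "present"
            current[key].append(devid)
    return pools

pool = summarize(SAMPLE)["910ac9e5-2ccb-448f-928d-7e05075ab121"]
print(pool)
```

For this pool it reports three devices in total, devids 2 and 3 present and devid 1 missing. A "Total devices" count larger than the number of devid lines shown, as in the earlier output with only sdh1 listed, points at the same problem.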
 

Link to comment

It failed because the pool already had a missing device, so without the other one it had two missing devices:

 

Aug  1 17:02:15 Tower emhttpd: warning, device 2 is missing
Aug  1 17:02:15 Tower emhttpd: warning, device 1 is missing

 

You'd need to fix that first, before attempting another upgrade. Reboot and post new diags after array start with the old pool assigned.
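
To illustrate the reasoning above, here is a minimal sketch that counts the distinct "device N is missing" warnings in the quoted log lines and compares the count with what a raid1 pool can tolerate. It assumes the pool uses the raid1 profile (the kernel log mentions `data|raid1` block groups). The names `missing_devids` and `RAID1_TOLERATED_MISSING` are illustrative only.

```python
import re

LOG = """\
Aug  1 17:02:15 Tower emhttpd: warning, device 2 is missing
Aug  1 17:02:15 Tower emhttpd: warning, device 1 is missing
"""

# btrfs raid1 keeps two copies of each block, so the pool can lose at most
# one device and still be mounted (degraded).
RAID1_TOLERATED_MISSING = 1

def missing_devids(log_text):
    """Collect the distinct devids reported as missing."""
    pattern = re.compile(r"warning, device (\d+) is missing")
    return sorted({int(n) for n in pattern.findall(log_text)})

missing = missing_devids(LOG)
mountable = len(missing) <= RAID1_TOLERATED_MISSING
print(f"missing devids: {missing}, degraded mount possible: {mountable}")
```

With devids 1 and 2 both missing, the count exceeds what raid1 tolerates, which matches the "failed to read chunk root" mount failure in the log.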

 

 

 

Link to comment

Thanks, uploaded files below. Any steps on how to fix this?

 

root@Tower:~# btrfs fi show
Label: none  uuid: 910ac9e5-2ccb-448f-928d-7e05075ab121
        Total devices 3 FS bytes used 539.65GiB
        devid    1 size 0 used 0 path  MISSING
        devid    2 size 931.51GiB used 548.03GiB path /dev/sdi1
        devid    3 size 931.51GiB used 50.00GiB path /dev/sdh1

Label: none  uuid: a00421a0-fcf7-44a7-b7be-4c1f5558b844
        Total devices 1 FS bytes used 8.92GiB
        devid    1 size 20.00GiB used 16.52GiB path /dev/loop2
 

root@Tower:~# cat /var/log/syslog | egrep -i "btrfs|BTRFS"
Aug  2 07:41:04 Tower kernel: Btrfs loaded, crc32c=crc32c-generic, zoned=no, fsverity=no
Aug  2 07:41:04 Tower kernel: BTRFS: device fsid 910ac9e5-2ccb-448f-928d-7e05075ab121 devid 3 transid 955314 /dev/sdh1 scanned by udevd (898)
Aug  2 07:41:04 Tower kernel: BTRFS: device fsid 910ac9e5-2ccb-448f-928d-7e05075ab121 devid 2 transid 955314 /dev/sdi1 scanned by udevd (898)
Aug  2 07:41:27 Tower emhttpd: /sbin/btrfs filesystem show 910ac9e5-2ccb-448f-928d-7e05075ab121 2>&1
Aug  2 07:41:27 Tower emhttpd: shcmd (42): mount -t btrfs -o noatime,space_cache=v2,degraded -U 910ac9e5-2ccb-448f-928d-7e05075ab121 /mnt/apppooldisk
Aug  2 07:41:27 Tower kernel: BTRFS info (device sdi1): using crc32c (crc32c-intel) checksum algorithm
Aug  2 07:41:27 Tower kernel: BTRFS info (device sdi1): allowing degraded mounts
Aug  2 07:41:27 Tower kernel: BTRFS info (device sdi1): using free space tree
Aug  2 07:41:27 Tower kernel: BTRFS warning (device sdi1): devid 1 uuid ad06e349-c8bc-4d10-803b-c8057fef3306 is missing
Aug  2 07:41:27 Tower kernel: BTRFS warning (device sdi1): devid 1 uuid ad06e349-c8bc-4d10-803b-c8057fef3306 is missing
Aug  2 07:41:27 Tower kernel: BTRFS info (device sdi1): bdev /dev/sdi1 errs: wr 0, rd 202, flush 0, corrupt 0, gen 0
Aug  2 07:41:27 Tower kernel: BTRFS info (device sdi1): enabling ssd optimizations
Aug  2 07:41:27 Tower kernel: BTRFS info (device sdi1: state M): allowing degraded mounts
Aug  2 07:41:27 Tower kernel: BTRFS info (device sdi1: state M): turning on async discard
Aug  2 07:41:27 Tower emhttpd: shcmd (44): /sbin/btrfs device delete missing /mnt/apppooldisk &
Aug  2 07:41:27 Tower kernel: BTRFS info (device sdi1): relocating block group 1677282246656 flags data|raid1
Aug  2 07:41:32 Tower kernel: BTRFS error (device sdi1): bdev /dev/sdi1 errs: wr 0, rd 203, flush 0, corrupt 0, gen 0
Aug  2 07:41:33 Tower kernel: BTRFS: device fsid a00421a0-fcf7-44a7-b7be-4c1f5558b844 devid 1 transid 813895 /dev/loop2 scanned by mount (6187)
Aug  2 07:41:33 Tower kernel: BTRFS info (device loop2): using crc32c (crc32c-intel) checksum algorithm
Aug  2 07:41:33 Tower kernel: BTRFS info (device loop2): using free space tree
image.thumb.png.10817727d0adac4d32d2426b8cb939d5.png

tower-diagnostics-20230802-0750.zip
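
The per-device error counters in the syslog lines above are easy to overlook. Here is a small sketch that pulls them out of pasted log lines and keeps the latest values per device. The two sample lines are copied from the log above; the regex and the function name `latest_counters` are my own and assume the kernel keeps printing the counters in this `errs: wr …, rd …` form.

```python
import re

LINES = [
    "Aug  2 07:41:27 Tower kernel: BTRFS info (device sdi1): bdev /dev/sdi1 errs: wr 0, rd 202, flush 0, corrupt 0, gen 0",
    "Aug  2 07:41:32 Tower kernel: BTRFS error (device sdi1): bdev /dev/sdi1 errs: wr 0, rd 203, flush 0, corrupt 0, gen 0",
]

ERRS = re.compile(
    r"bdev (?P<dev>\S+) errs: wr (?P<wr>\d+), rd (?P<rd>\d+), "
    r"flush (?P<flush>\d+), corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)

def latest_counters(lines):
    """Map each device to the most recent error counters seen in the log."""
    latest = {}
    for line in lines:
        if m := ERRS.search(line):
            counters = {k: int(v) for k, v in m.groupdict().items() if k != "dev"}
            latest[m["dev"]] = counters
    return latest

for dev, counters in latest_counters(LINES).items():
    nonzero = {k: v for k, v in counters.items() if v}
    print(dev, nonzero or "no errors recorded")
```

In the logs in this thread the read-error count on /dev/sdi1 climbs from 200 to 203, which fits with the read errors TestDisk reported on sdi earlier.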

Link to comment