Endy Posted December 12, 2023

Generally I have a rule that I don't work on things when I'm really tired, so that I don't make stupid mistakes. Last night I was really tired and ignored that rule, and now I'm paying the price.

I swapped the motherboard/processor/RAM in my Unraid server yesterday, and that went well. I also added 2 NVMe drives to replace the single NVMe drive I had been using for my cache pool, so that I would have a little redundancy there. (I know it's not a proper backup; I'm actually in the process of setting up a proper backup system.) I added the first of the new NVMe drives to the cache pool, and that seemed fine; it balanced. This is where I should have stopped and waited until after I had slept. I was impatient and thought it was a good idea to just swap the old NVMe drive for the new one directly in the GUI. Bad idea. The btrfs pool is now gone.

I turned off the server, went to bed, and here I am the next day. I've tried the restore options in the FAQ up to the last option. Currently I have the original NVMe drive and the one I added first yesterday assigned to the cache pool, like it was when it was working before I made the mistake. I have a feeling that I'm screwed, but is there any way to restore from here?
JorgeB Posted December 12, 2023

Please post the diagnostics after array start, and the output of:

btrfs fi show
Endy Posted December 12, 2023 (Author)

Label: none  uuid: 849f7ded-6fbb-4f5f-9627-fa781a175567
        Total devices 2 FS bytes used 714.96GiB
        devid    1 size 931.51GiB used 826.48GiB path /dev/sdd1
        devid    2 size 931.51GiB used 826.48GiB path /dev/sdb1

Label: none  uuid: db14842a-e91c-4154-bdd8-fbe89dfc7ce3
        Total devices 1 FS bytes used 340.00KiB
        devid    1 size 20.00GiB used 536.00MiB path /dev/loop2

Label: none  uuid: 6331b51d-add2-421b-a015-22c674718eb5
        Total devices 1 FS bytes used 412.00KiB
        devid    1 size 1.00GiB used 126.38MiB path /dev/loop3

sdd and sdb are part of another btrfs pool, unrelated to the cache.

turtle-diagnostics-20231212-1040.zip
JorgeB Posted December 12, 2023

There's no valid btrfs filesystem on the NVMe devices, which suggests they were wiped or fully trimmed. Post the output of:

fdisk -l nvme2n1
fdisk -l nvme0n1
Endy Posted December 12, 2023 (Author)

root@Turtle:~# fdisk -l nvme2n1
fdisk: cannot open nvme2n1: No such file or directory
root@Turtle:~# fdisk -l nvme0n1
fdisk: cannot open nvme0n1: No such file or directory
JorgeB Posted December 12, 2023

Sorry, that should be:

fdisk -l /dev/nvme2n1
fdisk -l /dev/nvme0n1
Endy Posted December 12, 2023 (Author)

root@Turtle:~# fdisk -l /dev/nvme2n1
Disk /dev/nvme2n1: 931.51 GiB, 1000204886016 bytes, 1953525168 sectors
Disk model: CT1000P1SSD8
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

root@Turtle:~# fdisk -l /dev/nvme0n1
Disk /dev/nvme0n1: 1.82 TiB, 2000398934016 bytes, 3907029168 sectors
Disk model: WD_BLACK SN850X 2000GB
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
JorgeB Posted December 12, 2023

That confirms no partition exists. If the devices were wiped with wipefs the data may still be recoverable; if a full device trim was done with blkdiscard, it won't be. You can try this: type

sfdisk /dev/nvme2n1

then type 2048, hit return, and post a screenshot of the results.
Endy Posted December 12, 2023 (Author)

root@Turtle:~# sfdisk /dev/nvme2n1

Welcome to sfdisk (util-linux 2.38.1).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.

Checking that no-one is using this disk right now ... OK

Disk /dev/nvme2n1: 931.51 GiB, 1000204886016 bytes, 1953525168 sectors
Disk model: CT1000P1SSD8
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

sfdisk is going to create a new 'dos' disk label.
Use 'label: <name>' before you define a first partition
to override the default.

Type 'help' to get more information.

>>> 2048
Created a new DOS disklabel with disk identifier 0xfc45e933.
Created a new partition 1 of type 'Linux' and of size 931.5 GiB.
/dev/nvme2n1p1 :         2048   1953525167 (931.5G) Linux
/dev/nvme2n1p2:
JorgeB Posted December 12, 2023

Hit Control+C to abort, then repeat, but this time with 64:

sfdisk /dev/nvme2n1

then type 64, hit return, and post a screenshot of the results.
Endy Posted December 12, 2023 (Author)

root@Turtle:~# sfdisk /dev/nvme2n1

Welcome to sfdisk (util-linux 2.38.1).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.

Checking that no-one is using this disk right now ... OK

Disk /dev/nvme2n1: 931.51 GiB, 1000204886016 bytes, 1953525168 sectors
Disk model: CT1000P1SSD8
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

sfdisk is going to create a new 'dos' disk label.
Use 'label: <name>' before you define a first partition
to override the default.

Type 'help' to get more information.

>>> 64
Created a new DOS disklabel with disk identifier 0xe90b059d.
Created a new partition 1 of type 'Linux' and of size 931.5 GiB.
Partition #1 contains a btrfs signature.

Do you want to remove the signature? [Y]es/[N]o:
JorgeB Posted December 12, 2023

Type N plus return to keep the signature, then type w and hit return to save, then post the output of:

btrfs fi show
Endy Posted December 12, 2023 (Author)

It's not letting me do the save part.

Do you want to remove the signature? [Y]es/[N]o: N

/dev/nvme2n1p1 :         64   1953525167 (931.5G) Linux
/dev/nvme2n1p2: w
unsupported command
/dev/nvme2n1p2: W
unsupported command
/dev/nvme2n1p2:
JorgeB Posted December 12, 2023

The write command is still typed inside sfdisk, not outside, but you need to actually type write and hit enter, not just w. Sorry. You can start again; see here for a step-by-step:

https://forums.unraid.net/topic/141033-lost-drive-after-creating-pool-v612/?do=findComment&comment=1276199
Endy Posted December 12, 2023 (Author) (edited)

That worked.

/dev/nvme2n1p2: write

New situation:
Disklabel type: dos
Disk identifier: 0xe90b059d

Device         Boot Start        End    Sectors   Size Id Type
/dev/nvme2n1p1         64 1953525167 1953525104 931.5G 83 Linux

The partition table has been altered.
Calling ioctl() to re-read partition table.
Syncing disks.

root@Turtle:~# btrfs fi show
Label: none  uuid: 849f7ded-6fbb-4f5f-9627-fa781a175567
        Total devices 2 FS bytes used 714.96GiB
        devid    1 size 931.51GiB used 826.48GiB path /dev/sdd1
        devid    2 size 931.51GiB used 826.48GiB path /dev/sdb1

Label: none  uuid: db14842a-e91c-4154-bdd8-fbe89dfc7ce3
        Total devices 1 FS bytes used 340.00KiB
        devid    1 size 20.00GiB used 536.00MiB path /dev/loop2

Label: none  uuid: 6331b51d-add2-421b-a015-22c674718eb5
        Total devices 1 FS bytes used 412.00KiB
        devid    1 size 1.00GiB used 126.38MiB path /dev/loop3

warning, device 2 is missing
Label: none  uuid: 9bceeb35-e26e-47d8-9ef3-3534abbaa204
        Total devices 2 FS bytes used 383.72GiB
        devid    1 size 931.51GiB used 884.05GiB path /dev/nvme2n1p1
        *** Some devices missing

*edited to show the part after 'write' as well

Edited December 12, 2023 by Endy
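The key line in the output above is the last btrfs filesystem: it now appears with devid 1 present and "Some devices missing", i.e. the superblock survived and the pool is detectable again, just degraded. If you ever want to check this from a script rather than by eye, a minimal sketch (the parsing logic is illustrative, not an Unraid tool; the sample text is taken from the output above) could look like:

```python
import re

# Sketch: scan `btrfs fi show` output and flag filesystems that report
# missing devices, keyed by filesystem UUID.
SAMPLE = """\
warning, device 2 is missing
Label: none  uuid: 9bceeb35-e26e-47d8-9ef3-3534abbaa204
        Total devices 2 FS bytes used 383.72GiB
        devid    1 size 931.51GiB used 884.05GiB path /dev/nvme2n1p1
        *** Some devices missing
"""

def missing_devices(text):
    results = {}
    uuid = None
    for line in text.splitlines():
        m = re.search(r"uuid:\s*(\S+)", line)
        if m:
            uuid = m.group(1)
            results[uuid] = False          # assume complete until proven otherwise
        if uuid and "Some devices missing" in line:
            results[uuid] = True
    return results

print(missing_devices(SAMPLE))
# {'9bceeb35-e26e-47d8-9ef3-3534abbaa204': True}
```

In practice you would feed it the real output via subprocess; here the hardcoded sample keeps the sketch self-contained.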
JorgeB Posted December 13, 2023 (Solution)

I assume nvme2n1, the Crucial device, was the original cache? If yes, we can try to mount it alone; if needed, you can then still try to recover the other device.

To mount that device alone: unassign both pool members from the pool, start the array, stop the array, re-assign only the Crucial device, start the array, and post new diags.
Endy Posted December 13, 2023 (Author)

It's alive! As far as I can tell, everything seems to be there. Docker started up and seems to be working. Am I out of the woods now?

If so, since I plan on using just the 2 new drives for cache and removing the Crucial drive, would the next step be to temporarily move all data off the cache pool, then delete and recreate the cache pool using just the 2 new drives, and move the data back?

turtle-diagnostics-20231213-0735.zip
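The copy step in that migration plan can be sketched as below. The real mount points depend on the final setup (on Unraid the old pool would be something like /mnt/cache and the new pool its own /mnt/<poolname>), so this sketch substitutes temp directories to stay safe to run; on the server itself, rsync -a between the real mount points is the usual equivalent.

```shell
# Sketch of "move everything off, then back": copy the old pool's
# contents to the new pool while preserving attributes. Temp dirs stand
# in for the real mount points, which are assumptions, not thread facts.
old=$(mktemp -d)   # stands in for the old cache pool's mount point
new=$(mktemp -d)   # stands in for the new pool's mount point

mkdir -p "$old/appdata"
echo "container config" > "$old/appdata/settings.conf"

cp -a "$old/." "$new/"    # -a preserves permissions, ownership, timestamps

diff -r "$old" "$new" && echo "copies match"
```

Verifying with diff -r (or rsync -ac --dry-run) before deleting the old pool is cheap insurance, since the whole point of the exercise is not to lose the data a second time.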
JorgeB Posted December 13, 2023

Everything looks good for now, but the pool is still balancing. When the pool activity stops, post new diags to confirm all is OK; if so, you can then assign one of the new devices to the pool.
Endy Posted December 13, 2023 (Author)

Ok, it finished. If I'm planning on removing the Crucial drive, do I want/need to add one of the new drives? Just want to make sure I don't mess up again.

Also, thank you so much for your help, JorgeB. While I wouldn't have lost any irreplaceable data, you have saved me countless hours recreating and setting everything up. I truly appreciate it.

turtle-diagnostics-20231213-0936.zip
JorgeB Posted December 13, 2023

Dec 13 09:35:19 Turtle kernel: BTRFS error (device nvme2n1p1): bdev /dev/nvme2n1p1 errs: wr 0, rd 0, flush 0, corrupt 6, gen 0
Dec 13 09:35:33 Turtle kernel: BTRFS info (device nvme2n1p1): balance: ended with status: -5

The balance didn't finish because some data corruption was detected, possibly also why you had issues before adding the device. In this case you should not add the new device; instead, create a new pool with the new device(s), copy everything you can, then remove the old device.

23 minutes ago, Endy said:
Also, thank you so much for your help JorgeB.

You're welcome.
JorgeB Posted December 13, 2023

Forgot to mention: it would also be a good idea to run memtest, just to make sure there are no obvious RAM issues.
Endy Posted December 13, 2023 (Author)

Hopefully a small hiccup... I stopped the array and created a new pool for the 2 new drives, and when I tried to start the array I got a message saying:

Wrong Pool State cache - too many missing/wrong devices

I didn't touch the original cache pool. I tried deleting the new pool and got the same message.
JorgeB Posted December 13, 2023

Post new diags; the devices may need to be wiped first.
Endy Posted December 13, 2023 (Author)

New diagnostics

turtle-diagnostics-20231213-1125.zip
JorgeB Posted December 13, 2023

1 hour ago, Endy said:
I tried deleting the new pool and same message.

Missed this part, so this is about the old pool, possibly because the missing device failed to be removed. Try re-importing the pool again: unassign the old cache device, assign the new ones to a new pool now (since you may hit the same issue later after an array stop), start the array, stop the array, re-assign the old cache, and start the array.