hawgorn Posted July 14, 2023

Short version: I had bad SMART readings on my 500GB cache SSD (drive A) and decided to replace both pool drives with 1TB drives. I replaced the bad drive with a new one, and the pool (apparently) rebuilt. But when I replaced the second drive (drive B), the result was "Unmountable: Too many missing/misplaced device" on both drives. I managed to get back to a working pool by putting the original drive B back [1]. But now, after a lot of fiddling around with the old and new cache drives (yeah, right... bad move), I have "Unmountable: No pool uuid" with both of the old drives in place. Am I foobar?

Long version: After I got the pool working [1, above] I decided to play it safe and try moving everything to the array first. The mover did not move everything; I'd say the VMs and Docker did not move (yes, they were shut down and disabled in settings). I tried copying the contents of the cache to a backup share with Dynamix File Manager. I let it run overnight, but when I went to bed its copy status was at something like 120% and growing?? That copy did not work either; not all of the cache was copied (comparing with Calculate occupied space). I then tried using MC, copying straight to the disk where the mover had moved some files, but it stalled on several VM images and there were also some errors (I didn't record them, silly me). At this stage I still had a working cache pool using both of the original drives.

I then cleared the partition tables of both 1TB disks with gparted on a Linux machine, and tried replacing the drives the way the FAQ describes. I put one new disk on a spare SATA port and assigned it to the cache, but received "Unmountable: Too many missing/misplaced device" straight away. I then tried clearing the cache config and different drive combinations (but always keeping the old A and B drives in their respective slots). Nothing worked; I got either "too many missing..." or "No pool uuid".
And now I have both of the old cache drives in place with "Unmountable: No pool uuid". How screwed am I? I do have recent backups of everything (appdata, docker.img, VMs). Should I just give in and restore from those, or can the cache be saved?

ziggy-nas-diagnostics-20230714-1326.zip
JorgeB Posted July 14, 2023

Post the output of:

btrfs fi show

sdb and sdd were the original pool members, correct?
hawgorn Posted July 14, 2023

Yes, sdb and sdd were the original pool members.

warning, device 4 is missing
warning, device 3 is missing
ERROR: cannot read chunk root
Label: none  uuid: f7747906-ca16-4858-b91f-8f600a5b448d
	Total devices 3 FS bytes used 108.99GiB
	devid    5 size 476.94GiB used 17.00GiB path /dev/sdb1
	*** Some devices missing
JorgeB Posted July 14, 2023

The pool has 2 missing devices; I understood the original pool was 2 devices? Post the output of:

btrfs-select-super -s 1 /dev/sdd1

and again:

btrfs fi show
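(For later readers: btrfs keeps up to three superblock copies at fixed byte offsets, and `btrfs-select-super -s 1` restores the primary superblock from copy 1. A small sketch of where those copies live on disk:)

```shell
# btrfs superblock copies sit at fixed byte offsets; -s 1 tells
# btrfs-select-super to restore the primary from copy 1.
primary=$((64 * 1024))                # 64KiB
copy1=$((64 * 1024 * 1024))           # 64MiB
copy2=$((256 * 1024 * 1024 * 1024))   # 256GiB (only on large devices)
echo "primary superblock at bytenr $primary"
echo "copy 1 at bytenr $copy1"
echo "copy 2 at bytenr $copy2"
# "using SB copy 1, bytenr 67108864" in the reply below matches copy 1.
```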
hawgorn Posted July 14, 2023

11 minutes ago, JorgeB said:
btrfs-select-super -s 1 /dev/sdd1

warning, device 4 is missing
using SB copy 1, bytenr 67108864

11 minutes ago, JorgeB said:
btrfs fi show

warning, device 4 is missing
Label: none  uuid: f7747906-ca16-4858-b91f-8f600a5b448d
	Total devices 3 FS bytes used 108.99GiB
	devid    3 size 465.76GiB used 153.03GiB path /dev/sdd1
	devid    5 size 476.94GiB used 17.00GiB path /dev/sdb1
	*** Some devices missing

The original pool was 2 devices: a Samsung EVO 500GB and a Pro 512GB.
JorgeB Posted July 14, 2023

48 minutes ago, hawgorn said:
Original pool was 2 devices:

Then something went wrong with the replacement, because it now thinks there should be three devices. Let's see if it mounts with these two alone, though I doubt it, since it doesn't look like a fully redundant pool based on the data distribution:

- stop the array
- unassign all pool devices
- start the array to reset the pool
- stop the array
- reassign both pool devices
- start the array and post new diags
hawgorn Posted July 14, 2023

I'll be damned, the two original disks came online as a pool. And I did the same procedure once or twice before calling for help. Third time's the charm? Here are the new diags.

ziggy-nas-diagnostics-20230714-2041.zip
JorgeB Posted July 14, 2023

Not out of the woods yet:

             Data      Metadata  System
Id Path      RAID1     RAID1     RAID1    Unallocated Total     Slack
-- --------- --------- --------- -------- ----------- --------- -----
 3 /dev/sdd1 150.00GiB   3.00GiB 32.00MiB   312.73GiB 465.76GiB     -
 4 missing   133.00GiB   3.00GiB 32.00MiB   340.91GiB 476.94GiB     -
 5 /dev/sdb1  17.00GiB         -        -   459.94GiB 476.94GiB     -

sdd has most of the data and it appears to be failing; let's see if it can finish removing the missing device.
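(In case it helps later readers: the step that is running here is equivalent to dropping the missing device by hand, and the usage table shows how much data is still stuck on the missing devid. The mount point below is an assumption; Unraid normally starts this itself when a pool device is unassigned.)

```shell
# Manual equivalent of the in-progress step (mount point assumed):
#   btrfs device remove missing /mnt/cache
#
# Anything still allocated on the "missing" devid must be migrated
# to the remaining members before the device can be dropped. Parsing
# a captured copy of the relevant table rows:
table='3 /dev/sdd1 150.00GiB
4 missing 133.00GiB
5 /dev/sdb1 17.00GiB'
echo "$table" | awk '$2 == "missing" { print $3 " still to migrate" }'
```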
hawgorn Posted July 14, 2023

31 minutes ago, JorgeB said:
sdd has most of the data and it appears to be failing; let's see if it can finish removing the missing device.

sdb is/was the one that SMART flagged. How should I proceed, what do you suggest?
JorgeB Posted July 14, 2023

For now, wait some time to see if it can finish the balance.
hawgorn Posted July 15, 2023

No balancing happening, still sitting at the same numbers. And you are right, sdd IS the failing one, I misread.

             Data      Metadata  System
Id Path      RAID1     RAID1     RAID1    Unallocated Total     Slack
-- --------- --------- --------- -------- ----------- --------- -----
 3 /dev/sdd1 150.00GiB   3.00GiB 32.00MiB   312.73GiB 465.76GiB     -
 4 missing   133.00GiB   3.00GiB 32.00MiB   340.91GiB 476.94GiB     -
 5 /dev/sdb1  17.00GiB         -        -   459.94GiB 476.94GiB     -
JorgeB Posted July 15, 2023

The best option might be trying to copy anything you can from the pool and then recreating it.
hawgorn Posted July 15, 2023

Would adding a third drive to the pool and hoping it would balance be asking for more headache?
JorgeB Posted July 15, 2023

It won't help; the problem is that the failing drive cannot rebuild the new drive correctly due to read errors. The same would happen if you added a third.
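(One way to confirm read errors like these is the per-device error counters. The mount point below is an assumption, and the sample output is illustrative, not from this system.)

```shell
# Per-device error counters (mount point assumed):
#   btrfs device stats /mnt/cache
# A failing member shows non-zero read_io_errs / corruption_errs.
# Filtering sample output down to the non-zero counters:
stats='[/dev/sdd1].read_io_errs 42
[/dev/sdd1].write_io_errs 0
[/dev/sdd1].corruption_errs 7'
echo "$stats" | awk '$2 > 0 { print $1 " -> " $2 }'
```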
hawgorn Posted July 15, 2023

2 hours ago, JorgeB said:
Best option might be trying to copy anything you can from the pool and then recreate.

OK, thanks. How is this for a plan?

- delete the old pool
- create a new pool with the new disks
- point the cache shares to the new pool (the cache shares are now cache:yes, so in theory everything is on the array)
- copy the backups to the cache shares on the array
- change to cache:prefer
- invoke the mover
- cross fingers
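(The copy step of that plan could look roughly like this. Share names and the backup path are assumptions; the sketch is a dry run that only prints the commands, since the pool steps happen in the Unraid GUI.)

```shell
# Dry run of the "copy backups back into the array-side shares" step;
# share names and /mnt/user/backups are assumptions.
for share in appdata domains system; do
  echo "cp -a /mnt/user/backups/$share/. /mnt/user/$share/"
done
# The remaining steps (recreate the pool, switch shares to
# cache:prefer, invoke the mover) are done in the Unraid GUI.
```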
JorgeB Posted July 15, 2023

If I'm understanding correctly what you mean, it should work.
hawgorn Posted July 15, 2023

Jorge, you are awesome, thank you!! I'm up and running again. Two of the Debian VMs don't work, but at least one of them should just be a matter of fixing grub.
hawgorn Posted July 18, 2023

Apparently I'm still not out of the woods. Last night I started working on the VMs and got "cache is full or read only". I let the server sit for the night. This morning the primary cache drive was in standby mode. I stopped the array and changed the spin down delay to 'never'. After starting the array the cache disks were 'Unmountable: no file system'. Clearing the pool config did not work.

ziggy-nas-diagnostics-20230718-0944.zip
JorgeB Posted July 18, 2023

Is cache the new pool or the old one? Try running:

btrfs rescue zero-log /dev/sdb1

Then re-start the array and post new diags.
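(For context: `btrfs rescue zero-log` discards only the filesystem's fsync log tree, so at worst the last few seconds of fsync'd writes are lost; the rest of the filesystem is untouched. A dry-run sketch of the sequence, using the device name from this thread:)

```shell
# Dry run: print the sequence rather than run it. /dev/sdb1 is the
# pool member from this thread, and the pool must be unmounted
# (array stopped) before zero-log touches it.
dev=/dev/sdb1
echo "btrfs rescue zero-log $dev   # discard only the fsync log tree"
echo "btrfs fi show                # sanity-check before restarting"
```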
hawgorn Posted July 18, 2023

It's the new pool. Clearing the log brought the pool back online. Here are the new diags after re-starting the array.

ziggy-nas-diagnostics-20230718-1218.zip
JorgeB Posted July 18, 2023

It's not normal for the pool to have issues so soon, so there might be an underlying problem, but for now it looks OK again.
hawgorn Posted July 18, 2023

14 minutes ago, JorgeB said:
It's not normal for the pool to have issues so soon, so there might be an underlying problem, but for now it looks OK again.

I agree. I was planning to move the innards to a new chassis soon, but might now just buy a new motherboard, CPU and memory to install in the new chassis (as I'm running 4th-gen Intel and DDR3 now).
hawgorn Posted July 18, 2023

Working on the pfSense XML brought the pool/disks down again, and clearing the btrfs log brought them back online. Could this be a memory problem? Maybe I'll run memtest later this week.
JorgeB Posted July 18, 2023

Could be, but when you get this log tree issue it often re-occurs, so run memtest, but it might still be a good idea to re-create the pool anyway.
hawgorn Posted July 18, 2023

3 hours ago, JorgeB said:
Could be, but when you get this log tree issue it often re-occurs, so run memtest, but it might still be a good idea to re-create the pool anyway.

OK, I'll do that. For future reference, and as I'd like to learn a bit: what might be causing this log tree issue, and what would I be looking for in the diags?