October 2, 2017

Hi,

I'm following the guide to replace 1 of my 2 SSD cache pool devices, which has stopped working. I've taken out the broken drive and put the new one in, then started the array so I could do a backup of the working cache drive. I did not change any assignments.

The new blank drive is /dev/sdj. When I run the status command, it says "Never started".

Diagnostics attached. I can see these entries in the logs. What should I do now?

Thanks!

Oct 3 00:54:40 Tower kernel: BTRFS info (device sde1): found 4352 extents
Oct 3 00:54:40 Tower kernel: BTRFS info (device sde1): relocating block group 2682845265920 flags 17
Oct 3 00:54:43 Tower kernel: BTRFS info (device sde1): found 2792 extents
Oct 3 00:55:09 Tower kernel: BTRFS info (device sde1): found 2792 extents
Oct 3 00:55:09 Tower kernel: BTRFS info (device sde1): relocating block group 2681771524096 flags 17
Oct 3 00:55:11 Tower kernel: BTRFS info (device sde1): found 4112 extents
Oct 3 00:55:39 Tower kernel: BTRFS info (device sde1): found 4112 extents
Oct 3 00:55:39 Tower kernel: BTRFS info (device sde1): relocating block group 2680697782272 flags 17
Oct 3 00:55:42 Tower kernel: BTRFS info (device sde1): found 2827 extents

root@Tower:/mnt/cache# sfdisk /dev/sdj

Welcome to sfdisk (util-linux 2.28.2).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.

Checking that no-one is using this disk right now ... OK

Disk /dev/sdj: 931.5 GiB, 1000204886016 bytes, 1953525168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

sfdisk is going to create a new 'dos' disk label.
Use 'label: <name>' before you define a first partition
to override the default.

Type 'help' to get more information.

>>> 64
Created a new DOS disklabel with disk identifier 0x20e308b5.
Created a new partition 1 of type 'Linux' and of size 931.5 GiB.
/dev/sdj1 : 64 1953525167 (931.5G) Linux
/dev/sdj2: write

New situation:

Device     Boot Start        End    Sectors   Size Id Type
/dev/sdj1          64 1953525167 1953525104 931.5G 83 Linux

The partition table has been altered.
Calling ioctl() to re-read partition table.
Syncing disks.

root@Tower:/mnt/cache# btrfs fi show /mnt/cache
Label: none  uuid: 6110ce92-70ca-4bac-be5d-15972207af53
    Total devices 2 FS bytes used 603.77GiB
    devid 1 size 931.51GiB used 930.51GiB path /dev/sde1
    *** Some devices missing

root@Tower:/mnt/cache# btrfs device usage /mnt/cache
/dev/sde1, ID: 1
    Device size:       931.51GiB
    Device slack:          0.00B
    Data,single:       455.47GiB
    Data,RAID1:        471.00GiB
    Metadata,single:     2.00GiB
    Metadata,RAID1:      2.00GiB
    System,single:      11.00MiB
    System,RAID1:       32.00MiB
    Unallocated:         1.00GiB

missing, ID: 2
    Device size:           0.00B
    Device slack:       16.00EiB
    Data,RAID1:        471.00GiB
    Metadata,RAID1:      2.00GiB
    System,RAID1:       32.00MiB
    Unallocated:       458.48GiB

root@Tower:/mnt/cache# btrfs replace start 2 /dev/sdj1 /mnt/cache
root@Tower:/mnt/cache# btrfs replace status /mnt/cache
Never started
root@Tower:/mnt/cache# btrfs replace status /mnt/cache

Attachment: tower-diagnostics-20171003-0053.zip
October 3, 2017 Community Expert

You need to wait for the current balance (going from 2 devices to 1) to finish before you can add the new device. Once it's done, add the new device to the pool instead of doing a replace.
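For anyone wondering how to tell from the syslog that a balance is what's running: the "relocating block group" lines quoted in the first post are the giveaway. Below is a minimal Python sketch, not an official tool, that pulls those entries out of a log excerpt. The function name `relocated_block_groups` is made up for illustration, and the regular expression only assumes the message format exactly as it appears in the log lines above.

```python
import re

# Matches kernel messages like the ones quoted in the first post, e.g.
# "BTRFS info (device sde1): relocating block group 2682845265920 flags 17"
RELOCATE = re.compile(r"BTRFS info \(device (\w+)\): relocating block group (\d+)")


def relocated_block_groups(log_text):
    """Return {device: [block group offsets]} for every relocation message."""
    groups = {}
    for match in RELOCATE.finditer(log_text):
        device = match.group(1)
        offset = int(match.group(2))
        groups.setdefault(device, []).append(offset)
    return groups


# Log excerpt copied from the first post (only some of the lines).
sample = """\
Oct 3 00:54:40 Tower kernel: BTRFS info (device sde1): found 4352 extents
Oct 3 00:54:40 Tower kernel: BTRFS info (device sde1): relocating block group 2682845265920 flags 17
Oct 3 00:54:43 Tower kernel: BTRFS info (device sde1): found 2792 extents
Oct 3 00:55:09 Tower kernel: BTRFS info (device sde1): relocating block group 2681771524096 flags 17
Oct 3 00:55:39 Tower kernel: BTRFS info (device sde1): relocating block group 2680697782272 flags 17
"""

result = relocated_block_groups(sample)
offsets = result["sde1"]
# Distance in bytes between consecutive relocated block groups.
strides = [a - b for a, b in zip(offsets, offsets[1:])]
print(result)
print(strides)
```

If new relocation lines keep appearing for the pool device, the balance is still going. `btrfs balance status /mnt/cache` should report the same thing directly from the command line.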
October 3, 2017 Author

Thanks - new one is adding now.

Why did it start a balance on its own? Should I have done anything differently?

Cheers!
October 3, 2017 Community Expert

9 minutes ago, al_uk said:
Why did it start a balance on its own?

When you start the array with a missing device, that device is deleted from the pool and the pool is rebalanced onto the single remaining device.

10 minutes ago, al_uk said:
Should I have done anything differently?

If you have enough ports you could've done a direct replacement without removing the old device, but the end result will be the same.
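The `btrfs device usage` output in the first post already shows this rebalance part-way through: the remaining device holds both "single" and "RAID1" data chunks. Here is a rough Python sketch that reads those figures and estimates how much of the data allocation was in the single profile at that moment. It assumes the single chunks are the ones the automatic balance had already converted, which the output itself doesn't prove. The helper name `allocation_by_profile` is hypothetical.

```python
import re

# Binary unit multipliers as printed by btrfs-progs.
UNITS = {"B": 1, "KiB": 2**10, "MiB": 2**20, "GiB": 2**30, "TiB": 2**40}

# Matches allocation lines such as "Data,single: 455.47GiB".
ALLOC = re.compile(r"(Data|Metadata|System),(\w+):\s+([\d.]+)(B|KiB|MiB|GiB|TiB)")


def allocation_by_profile(usage_text):
    """Return {(kind, profile): bytes} summed over all allocation lines."""
    totals = {}
    for kind, profile, number, unit in ALLOC.findall(usage_text):
        key = (kind, profile)
        totals[key] = totals.get(key, 0.0) + float(number) * UNITS[unit]
    return totals


# Figures for /dev/sde1 (ID 1) copied from the first post.
sde1_usage = """\
/dev/sde1, ID: 1
Device size: 931.51GiB
Device slack: 0.00B
Data,single: 455.47GiB
Data,RAID1: 471.00GiB
Metadata,single: 2.00GiB
Metadata,RAID1: 2.00GiB
System,single: 11.00MiB
System,RAID1: 32.00MiB
Unallocated: 1.00GiB
"""

totals = allocation_by_profile(sde1_usage)
single = totals[("Data", "single")]
raid1 = totals[("Data", "RAID1")]
fraction_single = single / (single + raid1)
print(f"Data allocated as single: {fraction_single:.1%}")
```

By this measure roughly half of the data allocation was single when the command was run, which fits with a balance that was still in progress.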
October 3, 2017 Author

OK, that clears that up, thanks!

I have enough ports, but the SSD had failed and was not being detected.

Even with the balance that is happening now, there are lots of writes to the remaining working SSD. My worry with all this balancing is that it does lots of writes to the remaining SSD, which could then fail. Should I be concerned?

sde is the remaining working SSD.
sdf is the new blank SSD that is being balanced at the moment.

Edited October 3, 2017 by al_uk
October 3, 2017 Community Expert

4 minutes ago, al_uk said:
Should I be concerned?

No, that's normal.
October 5, 2017 Author

Just to close this off, all completed successfully. Thanks for the replies.

So that is a full recovery with no data loss from a catastrophically failed SSD. It failed after a reboot, when it was no longer detected by the controller.

This SSD is a 1TB Samsung 850 EVO which is just under 2 years old, and for the first year I was writing CCTV to it at 5MB per second. I think I've exceeded the write cycle warranty.

Does the FAQ need updating to show that the balance will start automatically?

Label: none  uuid: 6110ce92-70ca-4bac-be5d-15972207af53
    Total devices 2 FS bytes used 602.05GiB
    devid 1 size 931.51GiB used 800.03GiB path /dev/sde1
    devid 2 size 931.51GiB used 800.03GiB path /dev/sdf1

Data, RAID1: total=796.00GiB, used=600.93GiB
System, RAID1: total=32.00MiB, used=144.00KiB
Metadata, RAID1: total=4.00GiB, used=1.12GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
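As a back-of-the-envelope check on the write volume mentioned above, this short Python sketch works out how much a constant 5MB per second adds up to over a year. The rate and duration come from the post. The 150 TBW warranty figure is an assumption not stated anywhere in this thread, so check it against Samsung's datasheet before relying on it. The sum also ignores write amplification and any writes other than the CCTV stream.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# ASSUMPTION: warranty endurance for the 1TB 850 EVO, in terabytes written.
# This number is not from the thread; verify it against the datasheet.
ASSUMED_WARRANTY_TBW = 150


def terabytes_written(rate_mb_per_s, days):
    """Total data written in decimal terabytes (1 TB = 1,000,000 MB)."""
    return rate_mb_per_s * SECONDS_PER_DAY * days / 1_000_000


# 5 MB/s sustained for one year, as described in the post.
written = terabytes_written(5, 365)
over_assumed_limit = written > ASSUMED_WARRANTY_TBW

print(f"Written in one year: {written:.2f} TB")
print(f"Over the assumed {ASSUMED_WARRANTY_TBW} TBW limit: {over_assumed_limit}")
```

Under those assumptions the CCTV stream alone comes to about 157.68 TB in the first year, so the suspicion that the drive is past its rated writes looks plausible.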
October 6, 2017 Community Expert

8 hours ago, al_uk said:
Does the FAQ need updating to show that the balance will start automatically?

It does need some updating, especially because v6.4rc8 includes several cache enhancements. I'm just waiting for v6.4 stable to update it.