[SOLVED] Problem with btrfs cache pool



Hi,

 

So I did something stupid :/... I've been using Unraid for several years now and never had any issues until now.

 

I started with a single cache disk; at the beginning of this year I added 2 more cache disks for safety.

The last few weeks I've experienced poor performance from the cache pool, with read / write speeds not going higher than 30 MB/s.

I realized that this problem has been occurring since I added those 2 extra SSDs for caching, so I thought I'd just remove the 2 SSDs and see what that does to the performance.

 

So I just stopped my array, unassigned those 2 extra SSDs and started my array again, but Unraid gave the error that the btrfs filesystem is Unmountable: No file system. So I quickly stopped the array again, put back the original config of those 3 SSDs and tried to start the array again. But yes, I had already fucked up the btrfs filesystem, because it is still Unmountable.

 

cachepool-error.png

 

So then I started searching online for how to recover my btrfs cache pool, and quickly realized that I had not done enough reading about btrfs and its procedures.

What I've tried so far: starting the array in maintenance mode and trying to check / repair the filesystem. But somehow the btrfs RAID is missing those 2 extra devices (warning, device 2 is missing).

 

I am able to mount the filesystem in degraded,ro mode, but the data seems to be corrupt when I try to access or copy it.

Also, when I try a restore, it partially restores files, but for something like 50% of the data I get an error: Trying another mirror ERROR: exhausted mirrors trying to read (3 > 2).
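
For anyone following along, the degraded mount and the restore attempt described above look roughly like the sketch below. It only prints the commands so they can be reviewed first. The device name, mount point and destination directory are placeholders I made up, not values from this system, and the exact options I used may have differed.

```shell
#!/bin/sh
# Sketch only: prints a read-only rescue sequence for review, runs nothing.
rescue_plan() {
    dev=$1   # one member of the damaged pool (placeholder)
    mnt=$2   # temporary mount point (placeholder)
    dest=$3  # directory on a different, healthy filesystem (placeholder)
    # Degraded + read-only: nothing gets written to the damaged pool.
    echo "mount -o degraded,ro $dev $mnt"
    # Copy whatever is still readable, then unmount again.
    echo "cp -a $mnt/. $dest/"
    echo "umount $mnt"
    # btrfs restore works on the unmounted device; -D only lists files.
    echo "btrfs restore -D -v $dev $dest"
    # -i keeps going past read errors instead of stopping at the first one.
    echo "btrfs restore -v -i $dev $dest"
}

# Hypothetical names, not taken from the thread:
rescue_plan /dev/sdX1 /mnt/rescue /mnt/disk1/cache-rescue
```

Both steps only read from the pool, so they are safe to try before anything that writes to the devices.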

 

In theory I think it can be fixed, because the disks are fine and I did nothing to the disks themselves, but I seem to be stuck here.

Do you guys have some experience or tips for me? 

 

My cache setup: 

 

Disk 1 : Samsung 840 EVO 250GB 

Disk 2 : Intel SSD SA2CW160 160GB

Disk 3 : Intel SSD SA2CW160 160GB

 

So if you need more info or debugging or logging let me know! 

Link to comment
52 minutes ago, Robb3rt said:

So I quickly stopped the array again, put back the original config of those 3 SSDs and tried to start the array again. But yes, I had already fucked up the btrfs filesystem, because it is still Unmountable.

Doing this deleted the superblock from the other 2 devices. You should have seen a warning like "data on these disks will be deleted at array start". There was a way around this, but only if you had asked for help earlier; now there is not much to do, at least not with the normal btrfs recovery options. You might be able to get more advanced help on the btrfs mailing list or in #btrfs on IRC.

Link to comment
Is the superblock something like a partition? I have some experience with partition recovery from the past, so can I just try to restore the btrfs superblock?

On my Samsung SSD there is still a btrfs partition, but on the other 2 devices the partition shows as Linux / unknown.

Link to comment

Some updates from my side: I've been searching online and found something. On some forums people mention that a device holds more than one superblock copy, and with the command btrfs inspect-internal dump-super -s 1 /dev/sdx1 I did find a backup superblock.
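
To make the "-s 1" part a bit clearer: btrfs keeps up to three superblock copies at fixed offsets (64 KiB, 64 MiB and 256 GiB), and a copy only exists if the device is big enough to reach that offset. A small sketch, where the function name is my own and the 160 GB size is assumed to mean 160 * 10^9 bytes:

```shell
#!/bin/sh
# List which btrfs superblock copies can exist on a device of a given size.
# Copy 0 sits at 64 KiB, copy 1 at 64 MiB, copy 2 at 256 GiB.
super_copies() {
    size_bytes=$1
    idx=0
    for offset in 65536 67108864 274877906944; do
        # A copy only exists if the 4 KiB superblock fits on the device.
        if [ $((offset + 4096)) -lt "$size_bytes" ]; then
            echo "$idx"
        fi
        idx=$((idx + 1))
    done
}

# A 160 GB device (assumed 160 * 10^9 bytes) only has copies 0 and 1.
super_copies 160000000000

# Reading a backup copy is read-only (device name is a placeholder):
#   btrfs inspect-internal dump-super -s 1 /dev/sdX1
```

So on the small Intel SSDs only copy 1 is available as a backup; dump-super just prints it and changes nothing on disk.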

 

But I don't really know how to restore it in the correct way.

Link to comment
Let us know if you get to the bottom of your original problem (cache is very slow with multiple SSDs in the cache pool). I seem to be running into a similar issue since I added to my cache pool.

Link to comment

So after a few days I was able to remove those 2 Intel SSDs.

 

First I converted the RAID1 to a single-disk config: btrfs balance start -f -dconvert=single -mconvert=single /mnt/cache

Then I was able to remove the first SSD, and after that balance was complete I removed the last SSD: btrfs device remove /dev/intel-ssd /mnt/cache
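
Put together, the two steps above can be sketched as a small script. This is a dry-run wrapper of my own, not something Unraid ships: the run and shrink_pool helpers and the /dev/sdY1 and /dev/sdZ1 names are hypothetical, and by default it only prints what it would do.

```shell
#!/bin/sh
# Sketch: shrink a btrfs RAID1 pool down to a single device.
# DRY_RUN defaults to 1, so commands are only printed; set DRY_RUN=0 to execute.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

shrink_pool() {
    mnt=$1; shift
    # Convert data and metadata to the single profile first; -f is needed
    # because this reduces metadata redundancy.
    run btrfs balance start -f -dconvert=single -mconvert=single "$mnt" || return 1
    # Remove the extra devices one at a time; stop at the first failure.
    for dev in "$@"; do
        run btrfs device remove "$dev" "$mnt" || return 1
    done
    # Show which devices are still part of the pool.
    run btrfs filesystem show "$mnt"
}

# Placeholder device names for the two SSDs being removed:
shrink_pool /mnt/cache /dev/sdY1 /dev/sdZ1
```

The order matters: with the pool still in RAID1, btrfs will not let you drop below two devices, which is why the conversion to single comes first.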

 

The performance seems better, but it is not yet where I'd hoped.

Those 3 SSDs are like 5/6 years old now, so just to be sure I ordered a new SSD.

Link to comment
Just to check - do you have the Dynamix SSD Trim plugin (or an equivalent script of your own) to trim the SSD at regular intervals?

Link to comment

So I replaced the Samsung SSD with a new Samsung SSD. While migrating from the old disk to the new one, I saw a few btrfs filesystem errors being corrected; also, the SMART values of the old SSD are not great.
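
If you want to check for this kind of thing yourself, btrfs keeps per-device error counters that can be read with btrfs device stats. A minimal sketch that filters that output down to the non-zero counters; the helper name is my own and the sample lines are made up for illustration, not captured from my pool:

```shell
#!/bin/sh
# Print only the btrfs error counters that are non-zero.
# Expects `btrfs device stats` style lines ("<counter> <value>") on stdin.
nonzero_stats() {
    awk 'NF == 2 && $2 != 0 { print $1, $2 }'
}

# Real use (mount point from the thread):
#   btrfs device stats /mnt/cache | nonzero_stats

# Illustrative input only:
printf '%s\n' \
    '[/dev/sdX1].write_io_errs    0' \
    '[/dev/sdX1].read_io_errs     0' \
    '[/dev/sdX1].corruption_errs  12' | nonzero_stats
```

Counters that keep climbing on one device, together with poor SMART values, are a reasonable hint that the disk itself is the problem.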

 

I've been using the new SSD for a few days now, and it seems a lot better. For the first time in a long while I see read / write speeds above 200 MB/s.

 

So in my case I think it was just a bad SSD.. :) 

Link to comment
  • 8 months later...

I made a very similar mistake - or at least ended up with the same issue😖

 

However - the procedure above with

btrfs-select-super -s 1 /dev/sdX1 and 
btrfs-select-super -s 1 /dev/sdY1

saved me 😅

 

Thank you so much @Robb3rt for posting here 👌🤘

Link to comment
  • JorgeB changed the title to [SOLVED] Problem with btrfs cache pool
