How to "Un-raid" a cache pool?



Hey, so I had my cache drive start failing almost immediately after I installed a brand new drive and set up a BTRFS RAID1 pool. I was under the impression that this meant the data is wholly complete on both drives. However, when I pull the bad drive, the array won't start. I tried putting the good new drive in the pool alone, but it still uses the old drive and won't start without it plugged in. Even though Unraid doesn't have the old drive mounted, for some reason it is still using it in the BTRFS RAID1, and I don't know what to do to make my cache pool a single drive using ONLY the new drive.

 

IMO, the GUI lets me set it as a single drive again, so it seems like a pretty big oversight that I can't un-raid the pool. I assume I need to do some terminal FU to fix this, but I don't know what that would be. Oh, and the old drive is in right now; it does boot up and work temporarily, but once it starts having write errors the whole server freezes. Since the server runs my pfSense VM for my internet and I work from home, this is a major issue that I need fixed as soon as humanly possible.

 

Please and thank you to anyone who can help me here!

Link to comment

I really wish that worked. As I previously stated, I already did this, and I just tested it again one step at a time based on the link you posted, and no dice. Simply put, the drive is unmountable when the old failing drive is removed from the pool, and there is nothing I can do about that. I am at my wits' end here. The data is there, I just can't get rid of this failing drive, and it's driving me bananas.

Link to comment

Is there any way to use the command line to pull the data off? I mean, redoing every docker from scratch is just a ludicrous amount of work I shouldn't have to do at all. Is this a bug in Unraid where the GUI is unable to remove a drive from a pool? Because I work tomorrow, I had to pull the old bedroom HTPC and install pfSense on that, since the server is 100% down. I know the data is there, which is why I won't be doing anything else at all until I can get it mounted and at least pull the data off of it.

Link to comment

It's not mounting because you converted the pool to the single profile and then removed a device. That's not possible: you can only remove devices from a redundant pool. This might work:

 

-stop array

-unassign all cache devices

-start array

-type on the console (if you rebooted since the diags make sure the ADATA SSD is still sdb):


 

btrfs-select-super -s 1 /dev/sdb1

 

-stop array

-assign both cache devices; there must not be an "all data on this device will be deleted" warning for any of the cache devices

-start array

-post new diags.
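For the "make sure the ADATA SSD is still sdb" check in the steps above, here is a rough sketch of one way to look it up instead of guessing. It assumes `lsblk` is available on the console and that the drive's model string contains "ADATA"; the helper name `devices_matching_model` is made up for illustration and is not part of Unraid or btrfs.

```shell
# Hypothetical helper: print /dev/<name> for every disk whose model text
# contains the given string (case-insensitive).
# Expects the output of `lsblk -dno NAME,MODEL` on stdin.
devices_matching_model() {
    awk -v pat="$1" '
        NF >= 2 {
            name = $1
            $1 = ""                      # leave only the model text in $0
            if (index(tolower($0), tolower(pat)) > 0)
                print "/dev/" name
        }'
}

# Usage on the server (assumption: the model string contains "ADATA"):
#   lsblk -dno NAME,MODEL | devices_matching_model ADATA
```

If this prints something other than /dev/sdb, the partition in the `btrfs-select-super` command would presumably need to change to match, so it is worth checking before typing it.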

 

 

 

Link to comment

connollyserver-diagnostics-20210703-0244.zip

 

Whatever I just did, it mounted at least. I assume, by looking at the command, that it forced the ADATA back to being the primary drive? Also, holy heck, I just realized it's almost 3 AM and I work in the morning... I am gonna go hit the shower and check back in here once more tonight, but I needed to be in bed like two hours ago. lol

 

Thanks a bunch for trying to help me out here, it is appreciated! I just know so very little about BTRFS. :D

Link to comment

Since the pool is now in single mode and has a possibly failing device, you can try to remove that device now instead of converting back to raid1 and then removing it. Removing a device from a single profile pool can only be done manually, though. Before starting, it's a good idea to make sure backups are up to date, then:

 

-with the array started type in the console:

btrfs dev del /dev/sdb1 /mnt/cache

-if the command aborts with errors, post new diags; if it completes without errors and you get the cursor back, stop the array

-unassign both cache devices

-start array

-stop array

-assign the Samsung cache device only

-start array

-done
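As an optional sanity check between the `btrfs dev del` step and unassigning the devices, one could list which devices the pool still contains. This is only a sketch: it assumes the pool is mounted at /mnt/cache as in this thread, and `pool_member_devices` is a made-up helper name that simply filters the output of `btrfs filesystem show`.

```shell
# Hypothetical helper: print the path of every device that is still a
# member of the pool. Expects `btrfs filesystem show <mountpoint>` on stdin
# and picks out the "devid ... path /dev/..." lines.
pool_member_devices() {
    awk '$1 == "devid" && $(NF - 1) == "path" { print $NF }'
}

# Usage on the server:
#   btrfs filesystem show /mnt/cache | pool_member_devices
```

If the removal worked, only the Samsung's partition should be listed. If /dev/sdb1 still shows up, the delete did not complete and the remaining steps should wait.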

 

 

Link to comment

As much as I would LOVE to do this right now, it's bedtime for me. At least I have the internet working on the bedroom HTPC. You may find it interesting that, once upon a time, it was the pfSense box, long before I ever put the server together and before I knew what Unraid even was. It's like homelab resurrection. :P At least I can work without fear in the morning; once I'm off and all that jazz, I will do exactly what you recommend.

 

Question though: should I balance as RAID1 again and then re-follow the proper steps, or should I do it this way from the command line as you suggest? Is there any reason it might be better to balance as RAID1 again? Either way, you have gotten it mounted for me, and I will confirm that all my backups are truly up to date. I have been slacking a bit on this, and my most recent backup on cloud storage is over a month old. No bueno... :)

 

I'll let you know how it goes, thanks my dude!

Link to comment
34 minutes ago, cammelspit said:

I do it this way from the command line as you suggest?

I would suggest this since there's a suspect device, so the quicker it's done the better. Note that if there are read errors there will be problems, but it would be the same if you tried to convert to raid1.

Link to comment

Hey, so I tried running the command, and it seems that it thinks it's in RAID1

ERROR: error removing device '/dev/sdb1': unable to go below two devices on raid1

 

which is weird because I swear I converted it to single. However, now that it is mounted, I just noticed that I have this (screenshot attached: image.png).

 

So is it confused as to whether this is RAID1 or single? Sorry it took so long to test this, but I've been real busy at work, and I also forgot how long it takes to back up my appdata. With Plex and its ten and a half billion little files it takes forever, especially with a library as big as mine.

 

Also here is a new log for good measure.  

connollyserver-diagnostics-20210704-2011.zip

Link to comment
5 hours ago, cammelspit said:

Hey, so I tried running the command, and it seems that it thinks it's in RAID1

Sorry, my fault, I forgot about the metadata: it's still raid1. First convert it to single as well:

 

btrfs balance start -f -mconvert=single /mnt/cache

Then do the above.
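To see for yourself whether any part of the pool is still raid1 before retrying the removal, the profiles can be read from `btrfs filesystem df`. A minimal sketch, assuming the pool is mounted at /mnt/cache; `non_single_profiles` is a hypothetical helper name, not a btrfs command.

```shell
# Hypothetical helper: print every block group type whose profile is not
# "single". Expects `btrfs filesystem df <mountpoint>` on stdin, with lines
# such as "Metadata, RAID1: total=..., used=...".
# GlobalReserve is skipped because it is not a convertible profile.
non_single_profiles() {
    awk -F'[,:]' '
        NF >= 2 && $1 != "GlobalReserve" {
            gsub(/ /, "", $2)            # strip the space before the profile
            if ($2 != "single")
                print $1, $2
        }'
}

# Usage on the server:
#   btrfs filesystem df /mnt/cache | non_single_profiles
```

No output would suggest data, metadata, and system chunks are all single. Any line it prints (for example a Metadata entry) points at a profile that still needs converting.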

 

Link to comment
root@connollyserver:~# btrfs balance start -f -mconvert=single /mnt/cache
Done, had to relocate 9 out of 739 chunks
root@connollyserver:~# btrfs dev del /dev/sdb1 /mnt/cache
ERROR: error removing device '/dev/sdb1': Input/output error

connollyserver-diagnostics-20210705-0233.zip

 

I dunno about you, but for me this feels like it should be way more straightforward. Like, drive broke, take out drive, done! Right? :P

Link to comment
1 minute ago, cammelspit said:

I dunno about you, but for me this feels like it should be way more straightforward. Like, drive broke, take out drive, done! Right?

 

It cannot be quite that simple as the default assumption would always be that a 'failed' drive will be replaced.

Link to comment

Sure, I see your point, but that really isn't a reason for things to not be straightforward as a rule. Maybe I am looking at it from the 'filthy pleb' sort of perspective. It is just weird that one can't simply remove the drive; all things being equal, it does sound pretty simple. I got that Samsung drive barely a couple of weeks ago, so I am confident it's reasonably workable, and I do intend to get another for a RAID1 pair, but I had my TV die and a few other unforeseen expenses. I don't think it's unusual to assume someone may want to pull a drive out of a pool at some point, or have a very good reason for doing so, but hey, that's just me I guess. 🤷‍♂️

Link to comment
2 hours ago, cammelspit said:

ERROR: error removing device '/dev/sdb1': Input/output error

There are read errors on the failing device, and because of that some data can't be moved to the other one. There's still a lot of data remaining on that device, so you'll need to back up whatever you can to the array or another device, then recreate the cache.
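If you want to see those errors without digging through the diagnostics, btrfs keeps per-device error counters that `btrfs device stats` prints. A small sketch, again assuming the pool is mounted at /mnt/cache; `nonzero_device_stats` is a made-up helper that only hides the counters that are still zero.

```shell
# Hypothetical helper: print only the error counters that are above zero.
# Expects `btrfs device stats <mountpoint>` on stdin, with lines such as
# "[/dev/sdb1].read_io_errs    0".
nonzero_device_stats() {
    awk 'NF == 2 && $2 ~ /^[0-9]+$/ && $2 + 0 > 0 { print $1, $2 }'
}

# Usage on the server:
#   btrfs device stats /mnt/cache | nonzero_device_stats
```

Counters that show up here for /dev/sdb1 would be consistent with the Input/output error from the removal attempt. An empty result only means btrfs has not recorded errors since the counters were last reset, not that the drive is healthy.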

Link to comment

I was thinking that might be what was needed now. I just have to say thanks though, you have been very helpful and have gone above and beyond in helping and for that I am grateful. I already have my appdata backed up which is the important bit and there is maybe one or two small things for convenience I will copy off and I'll just recreate the cache from scratch. Again, you have been great. 👍

Link to comment
