
Cache Devices, disk issue



Hello all,

 

I'm having some issues after finally getting my Unraid box back up following some disk problems. I initially moved my disks over to a new motherboard, and everything came back online.

I started seeing that the spinning disks were getting hot, so I decided to rearrange things inside the case a bit. That issue is resolved; they never got THAT hot, and they seem to be just fine.

The issue is with the cache devices. I had two disks set up as a pool of two devices, and it was my understanding that I had them properly set up as a mirror. The pool is made up of a 500GB M.2 drive and a 500GB SATA disk. It seems the SATA disk is the one having an issue: it no longer seems to be recognized, and it's showing up in Unassigned Devices.

The array starts; however, it initially didn't start Docker, saying that it could not find files in /mnt/user/appdata, which are on the cache devices. A few minutes later, though, Docker did manage to start up.

Looking at the Main page, I was going to stop the array when I saw that a btrfs check was running, but that has finished. The sizes shown for the cache drives were incorrect before: it said there was 1TB of space instead of the normal 500GB. That has corrected itself, and used and free space now show properly.

So it seems like my 850 EVO has just gotten kicked out of the cache pool and won't go back in on its own. What's the best way to fix this without losing all the data in the pool? Thanks for any help!

Here is the screenshot I took before moving the disks around.

[Screenshot: cache pool before moving the disks]

Here is where I am now.

[Screenshot: cache pool now]

I guess I would just like some confirmation that I'm going about this the correct way.

Can I just start the array back up after adding the 850 EVO as a Cache disk? Will that erase the data on that single disk, but not on the Cache 2 disk?

Yes, the warning says "This device", which seems correct, but I'm just trying to validate my thinking before I wipe out the array, in case the warning turns out to cover more than that one device. Thanks.
[Screenshot: Cache device assignment showing the data-erase warning]

13 hours ago, Donk303 said:

I guess I would just like some confirmation here that i'm going about this the correct way.

No. If you start the array like that, I'm pretty sure your cache pool data will be gone.

 

I believe the correct thing to do would be to set all cache devices to none, and start the array, then stop the array and assign both devices.

 

However, I'm not at all sure about it, so if I were in your shoes, I'd put it back the way it was, with a single device in Cache 2, and hopefully you can see your data. If so, I'd follow the normal procedure to replace a cache disk: disable the Docker and VM services so they don't show up in the GUI, set all shares to cache "Yes", and run the mover.

 

My best advice is to wait for @johnnie.black to chime in, he's much more familiar with BTRFS cache pool issues than I am.

 

WHATEVER YOU DO, don't start the array with that "All existing data..." message showing. It WILL erase your cache pool.

 

I stand corrected, thanks to johnnie.black. There have been so many "OMG I JUST LOST MY DATA" threads involving cache pools that I'm overly paranoid it's not going to work as advertised.


Since the pool was started with a single device, it was converted to the single profile, so the old device can't just be re-assigned and used with its existing data. You can, however, re-add it to the pool, and it will be balanced back to raid1. The warning is correct: all data on that device will be deleted, but that's OK. If the pool had never been started with a single device, there would be a way to pick up the old pool without reconverting.


Thanks for the replies, guys, and sorry for the late response. I did add the Cache device back in, and all seemed well.

Fast forward to today, when I finally got to come back to this. My original goal was to add another disk to increase my parity drive, but we took a detour when I had to buckle down and find out why some SATA ports were not working. Now that I've moved the M.2 disk to the other slot, all my SATA ports are working as expected. That said, my cache pool is broken once again.

Now, when I start up, it doesn't see the M.2 drive (a 970 EVO) as part of the pool. The SATA SSD (an 850 EVO) shows up in the pool properly. Ish. So it's essentially the opposite of before.

The "ish" part is that it says there is no file system on the 850, which is puzzling. How did that happen? That just can't be accurate... So even if I were to follow the same procedure that worked before, my file system now seems to be completely toast.

So, I'm looking for guidance on the next step. I'm attaching a new picture and my diagnostics (unraid-diagnostics-20200317-1659.zip).

On 3/14/2020 at 1:35 AM, johnnie.black said:

Since the pool was started with a single device it was converted to single profile

This seems to be what's causing my grief with the disk dropping out of the pool, and I do understand that. Right now the pool starts up automatically when the server boots. Is that what you recommend? I guess a good rule whenever messing with the physical disks is to make sure the pool does not start automatically. But I don't see how that got me to an invalid FS on the disk now.

If the FS did somehow get corrupted while the disk was not connected, is it possible to start the pool and tell it to use the 970 as the cache drive, hopefully with its file system still intact, without going through the format, of course? I'm baffled as to how the disk could have been corrupted.

[Screenshot: current cache pool assignments]

[Attachment: unraid-diagnostics-20200317-1659.zip]


A little good news, I guess. I found the info in the FAQ about mounting the disk read-only. I did that successfully, and I'm copying the data over to the array for another backup.

I had a backup of all my appdata, but not of the vdisks for my VMs, so I'm very glad to get this.

Next, I'll just have to see about getting it properly mounted, if possible, so I don't have to restore from backup. So, yay there.

I'd still like to know more about why it's doing this. Yes, I know it's reacting to me moving disks, but I thought this solution was a bit more resilient than that. The cache pool really doesn't seem to like me. I do get why it drops the one disk when the pool starts up. However, I don't understand why it force-starts when it sees that the cache is essentially broken.

Anyway, thanks.

 


lol, this is just funny...

 

So, I got the data backed up. I proceed with the recommendation of formatting and moving on with life. I stop the array, and before just formatting the 850, I decide I want the 970 to be Cache and the 850 to be the backup. So I re-order the disks and start the disk pool, fully expecting it to barf and say FORMAT. Well, it didn't. No warning of data loss, nothing like that. Poof, she just started up and pretended like nothing ever happened. It is running a BTRFS operation, but the data is there.

I have very little faith that the data is redundant. I just saw there is another thread on this, but I need to find a way to validate it. It seems I can't just pull a disk and test.
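One way to check redundancy without pulling a disk is to look at the allocation profiles the pool reports. `btrfs filesystem df /mnt/cache` prints one line per chunk type (Data, System, Metadata, GlobalReserve) together with its profile, and a fully mirrored pool should show RAID1 for Data, System and Metadata. The Python sketch below is hypothetical, not an Unraid tool: it assumes the pool is mounted at `/mnt/cache`, parses that text, and flags any chunk type that is not RAID1. The sample output is illustrative only and was not taken from this server. GlobalReserve is always reported as `single`, so it is skipped.

```python
import re
import subprocess

# Matches "<ChunkType>, <Profile>:" at the start of each
# `btrfs filesystem df` line, e.g. "Data, RAID1: total=..., used=..."
PROFILE_LINE = re.compile(r"^(\w+), (\w+):")


def read_df(mountpoint: str = "/mnt/cache") -> str:
    """Run `btrfs filesystem df` on the pool (mount point is an assumption)."""
    result = subprocess.run(
        ["btrfs", "filesystem", "df", mountpoint],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_profiles(df_output: str) -> dict[str, str]:
    """Map each chunk type (Data, System, Metadata, ...) to its upper-cased profile."""
    profiles: dict[str, str] = {}
    for line in df_output.splitlines():
        match = PROFILE_LINE.match(line.strip())
        if match:
            chunk_type, profile = match.groups()
            profiles[chunk_type] = profile.upper()
    return profiles


def non_raid1(profiles: dict[str, str]) -> list[str]:
    """Chunk types that are not mirrored; GlobalReserve is always 'single', so skip it."""
    return sorted(
        chunk for chunk, profile in profiles.items()
        if chunk != "GlobalReserve" and profile != "RAID1"
    )


# Illustrative output only (not from this server): System chunks are still single.
SAMPLE = """\
Data, RAID1: total=200.00GiB, used=150.00GiB
System, single: total=32.00MiB, used=48.00KiB
Metadata, RAID1: total=2.00GiB, used=1.20GiB
GlobalReserve, single: total=256.00MiB, used=0.00B
"""

print(non_raid1(parse_profiles(SAMPLE)))  # ['System']
```

On the server itself, `non_raid1(parse_profiles(read_df()))` returning an empty list suggests Data, System and Metadata are all mirrored. If something is listed, `btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt/cache` is the usual manual conversion, though as noted earlier in the thread, Unraid balances the pool back to raid1 itself when a device is re-added.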

 
