Unmountable cache disk pool (After trying to replace a dead SSD)


REllU


Hey all, I'll try to keep this one as short and informative as possible.

 

The problem

- Cannot mount cache SSD (Unmountable: not mounted)

 

The cause

- Replaced a dying Kingston SSD, by un-assigning the SSD from the pool (didn't change the pool size to 1)

 

The system

- UnRaid 6.9.2

- Two-ish months old Kingston SE9 USB 2 flash drive

- Two old Kingston 240gb SSD's (btrfs, since these were both mounted at the same time)

- Non-ecc RAM, 8gigs, tested for 48 hours with MemTest before deployment

- DuckDNS installed in Docker.

- New, replacement WD Blue SSD, that wasn't formatted after previous use

 

What I did

- Un-assigned the dying SSD from the cache pool

- Started the array (without changing the cache pool size to 1)

- Shut down the server from GUI

- Replaced the dying SSD with a new one (from an updated computer, that no-longer needed the SSD)

- Started the server, and assigned the new SSD to the Cache 2 slot, where the old SSD was

- UnRaid seemed to be happy with this, but the Cache 1 SSD was throwing an error about not finding the old SSD

- I trimmed the SSD, and then I tried to perform a full balance

- Restarted the server, and noticed the Cache 1 SSD was still throwing the error about the old SSD

- I un-assigned the Cache 1 SSD from the pool, because I wanted to make sure everything worked, or not

- At this point, UnRaid wasn't able to mount the new SSD anymore. I tried to change the cache pool size to 1, and assigning the new SSD there

- Didn't work. Un-assigned the new SSD, removed the cache pool

- With the Unassigned Devices plugin, I was able to peek into the new SSD, finding that it was still full of Windows 10 stuff from the previous PC

- Shut down the server, removed the new SSD, and put the old one back.

- Powered on and tried to assign the original SSDs to their original slots, but UnRaid couldn't mount them anymore.

 

What I want

- I'd like to restore the original SSD Cache pool, so that I'd be able to replace the cache "the right way" (moving everything away from cache, and then back to the new cache pool)

- As far as I know, the data isn't gone. Raid 1 information is just missing, and the SSD's are unmountable

 

Misc info and rambling

- I've done a bit of an oopsie when I was building my first UnRaid server, so I knew _not to_ format the SSD's at any point.

- This is a backup server for our main server, so it's not a huge loss if the data is gone. But with that said, I want to learn how to handle this.

- I really thought that the point of having the RAID 1 cache pool, was that my data would be safe, even if one of the SSD's dies. Now, I'm a little worried about the whole situation, and how UnRaid is handling the whole thing. I'm aware that the 6.9.X version is having issues with replacing the Cache drives, but still.

- The server is currently offline, as I didn't want to risk anything weird happening to it, as it is connected with our main server for automatic backups.

- I can post a diagnostics log file later today, when I get back home, if it's of any use. However, right now, the server is set up without cache pool, so if you want any meaningful information from the log, please tell me what to do before I download the log and post it here.

 

I hope I didn't miss anything. I tried to look for something similar through the forums for a few hours, and the closest I was able to find, was someone going "oops, I formatted the SSD, now the data is gone" after trying to troubleshoot the issue.

 

Any help would be appreciated, thank you.

Edited by REllU
Link to comment
4 minutes ago, JorgeB said:

Pool device replacement is broken on v6.9+.

 

You didn't post the diagnostics, to see if there's anything that can still be done.

 

35 minutes ago, REllU said:

Misc info and rambling

 

- I can post a diagnostics log file later today, when I get back home, if it's of any use. However, right now, the server is set up without cache pool, so if you want any meaningful information from the log, please tell me what to do before I download the log and post it here.

 

 

EDIT: As you probably noticed from my original message, I am aware that 6.9.x has issues with pool device replacements, and I am willing to post diagnostics as soon as I know what to do before I download the diagnostics log.

 

As of right now, the server is shut down and the cache pool doesn't exist, so I don't think downloading any diagnostics right now would really help much.

Edited by REllU
Link to comment
21 minutes ago, REllU said:

As you probably noticed from my original message, I am aware that the 6.9.x has issues with pool device replacements,

I'm sorry but I don't understand why you did it if you knew it wasn't going to work.

 

Diags would be most helpful grabbed before shutting down the server, but if those are not available, boot the server, stop the array if set to auto-start, if Docker/VM services are using the cache pool disable them, unassign all cache devices, start array to make Unraid "forget" current cache config, stop array, reassign both the original cache devices (there can't be an "All existing data on this device will be OVERWRITTEN when array is Started" warning for any cache device), start array, grab and post the diags.

 

 

Link to comment
10 minutes ago, JorgeB said:

I'm sorry but I don't understand why you did it if you knew it wasn't going to work.

 

I saw a workaround here

Which I wanted to test.

This system is our backup server, and there's not a lot of data in the cache drives, so it's not a huge issue if the data is gone. But as stated in the original message, I just want to learn how to deal with situations like this.

I'm also planning to upgrade our main server with bigger SSD cache (from 500gigs to 1tb), so I want to test and see what works and what doesn't with the backup server, before I do anything that cannot be fixed.

 

The bug also seemed to be out there for quite a while, despite how serious it seems.

 

10 minutes ago, JorgeB said:

Diags would be most helpful grabbed before shutting down the server, but if those are not available, boot the server, stop the array if set to auto-start, if Docker/VM services are using the cache pool disable them, unassign all cache devices, start array to make Unraid "forget" current cache config, stop array, reassign both the original cache devices (there can't be an "All existing data on this device will be OVERWRITTEN when array is Started" warning for any cache device), start array, grab and post the diags.

 

 

 

I feel like I've read this exact message somewhere already haha

Anyway, I'll do this once I get back home. That'll be in 6-8 hours or so. Appreciate the help.

 

EDIT:

Oh also, out of curiosity. What _should_ I do, if a disk / SSD dies on a pool?

I have 2 SSD's on both of my servers, as well as a parity drive for the HDD's. Has there been any word from LimeTech about this whole issue? Just seems rather weird to me.

Edited by REllU
Link to comment
2 minutes ago, REllU said:

I saw a workaround here

Yes, and that would work, but that's not what you did:

 

1 hour ago, REllU said:

- Shut down the server from GUI

- Replaced the dying SSD with a new one (from an updated computer, that no-longer needed the SSD)

- Started the server, and assigned the new SSD to the Cache 2 slot, where the old SSD was

You needed to first start the array with only the remaining cache device assigned, then after the balance completed stop the array to add the new device.
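
A rough sketch of how "after the balance completed" could be checked from the terminal. The helper below only inspects the text printed by `btrfs balance status`; the name `balance_finished` and the `/mnt/cache` mount point are assumptions for illustration, and the exact status wording may differ between btrfs-progs versions.

```shell
# balance_finished STATUS_TEXT
# Succeeds (exit 0) if the given `btrfs balance status` output reports
# that no balance is running, and fails (exit 1) otherwise.
# Hypothetical helper, not part of Unraid or btrfs-progs.
balance_finished() {
    case "$1" in
        *"No balance found"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example (assumes the pool is mounted at /mnt/cache):
#   if balance_finished "$(btrfs balance status /mnt/cache)"; then
#       echo "No balance running, the array can be stopped"
#   fi
```

The check is deliberately conservative: anything other than an explicit "No balance found" message is treated as "not finished yet".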

Link to comment
2 minutes ago, JorgeB said:

Yes, and that would work, but that's not what you did:

 

You needed to first start the array with only the remaining cache device assigned, then after the balance completed stop the array to add the new device.

 

Ah, sorry, I missed a step while I was writing the message.

I did un-assign the dying SSD, started the array, and then shut down the server.

I guess the oopsie here, was that I didn't change the cache pool to be the size of 1 when I started the array again?

If so, that might be worth adding to the workaround message, or maybe an "official" workaround post could be made somewhere?

Link to comment
11 minutes ago, JorgeB said:

That's not important.

 

In that case, the only thing I can think of is that the new SSD was already formatted and filled with (old) data.

Like I said in the original post, UnRaid seemed to be OK after I replaced the SSD. Everything was green, and the new SSD was part of the cache pool just fine.

 

Is there anything else you could think of, why this would've happened?

Link to comment
5 hours ago, JorgeB said:

I'm sorry but I don't understand why you did it if you knew it wasn't going to work.

 

Diags would be most helpful grabbed before shutting down the server, but if those are not available, boot the server, stop the array if set to auto-start, if Docker/VM services are using the cache pool disable them, unassign all cache devices, start array to make Unraid "forget" current cache config, stop array, reassign both the original cache devices (there can't be an "All existing data on this device will be OVERWRITTEN when array is Started" warning for any cache device), start array, grab and post the diags.

 

 

 

Hey again,

so I did what you asked here, and here's the log file.

 

The dying SSD is on Cache 2, the sdd one (Kingston_SV)

 

(EDIT: Removed the log file, in case there was anything sensitive)

 

Edited by REllU
Removed the log file, in case there was anything sensitive
Link to comment

- Stopped the array

- While the SSD's were un-assigned, I ran the command above on both SSD's

- Only the known-good SSD seemed to be fine with it, the dying one threw some errors (I'll throw a pic of the terminal in attachments)

- Rebooted the system (with auto-start disabled), grabbed the diags

 

Both SSDs look like they can be mounted with the Unassigned Devices plugin. So that's a good sign.

Terminal.PNG

 

Edited by REllU
Uploaded syslog instead of diagnostics.
Link to comment
5 hours ago, JorgeB said:

if Docker/VM services are using the cache pool disable them, unassign all cache devices, start array to make Unraid "forget" current cache config, stop array, reassign both the original cache devices (there can't be an "All existing data on this device will be OVERWRITTEN when array is Started" warning for any cache device), start array, grab and post the diags.

Repeat the first part of the above so Unraid will forget the pool again but now assign only the good device (sde) and start the array, if it doesn't mount post new diags.

Link to comment
1 minute ago, REllU said:

it's throwing out errors in log.

OK, but now at least it's throwing an error that makes sense, strange it didn't show the same in the pool, it's not normal:

 

Aug 23 19:02:32 DataStriver kernel: BTRFS error (device sdd1): devid 2 uuid 4b126ecf-990b-43f5-a5fd-1f31a15ab285 is missing
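
For anyone hunting for this kind of line in their own syslog, here is a minimal sketch that pulls the devid and UUID out of BTRFS "is missing" errors. The function name `missing_btrfs_devices` is made up for this example, and the pattern is based only on the single log line quoted above, so other kernel versions may format the message differently.

```shell
# missing_btrfs_devices
# Reads syslog text on stdin and prints "<devid> <uuid>" for every
# BTRFS "devid N uuid ... is missing" error line. Other lines are ignored.
# Hypothetical helper; the pattern is derived from the one log line above.
missing_btrfs_devices() {
    sed -n 's/.*BTRFS error.*devid \([0-9][0-9]*\) uuid \([0-9a-f-][0-9a-f-]*\) is missing.*/\1 \2/p'
}

# Example (assumes the syslog lives at /var/log/syslog):
#   missing_btrfs_devices < /var/log/syslog
```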

 

Since the other device is out of sync, try this:


 

mkdir /x
mount -o degraded,ro /dev/sdd1 /x

 

If it doesn't mount, post the output; if it does, you can browse /x to back up the contents.
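
If the degraded, read-only mount does succeed, copying the contents off could look something like the sketch below. `backup_tree` is a hypothetical helper, and the destination in the usage comment (`/mnt/disk1/cache_backup`) is only an assumed location on an array disk; pick whatever target has enough free space.

```shell
# backup_tree SRC DST
# Copies everything under SRC into DST, preserving ownership,
# permissions and timestamps. SRC is only read, never written to.
# Hypothetical helper for illustration.
backup_tree() {
    bt_src=$1
    bt_dst=$2
    if [ ! -d "$bt_src" ]; then
        echo "source $bt_src is not a directory" >&2
        return 1
    fi
    mkdir -p "$bt_dst" || return 1
    # "SRC/." copies the directory contents, including hidden files
    cp -a "$bt_src"/. "$bt_dst"/
}

# Example (destination path is an assumption):
#   backup_tree /x /mnt/disk1/cache_backup
```

Since /x is mounted read-only, the copy cannot make things worse on the damaged pool; the only writes go to the destination.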

Link to comment
18 minutes ago, JorgeB said:

It's still not mounting degraded, that means the missing device is not the only problem, you can try the other recovery options here to see if any of them work.

 

Interesting.. I'll give these a go tomorrow.

In the meantime, would you have any ideas as to what actually happened here? And if I manage to recover the data from the SSD, should I replace this SSD as well, or would I be able to continue to use it after a format?

I am planning to replace both SSD's in the near-ish future with WD Red 500gig ones, that are coming off from the main server, once I've replaced them with 1tb ones.

 

Thanks for all the help so far, and hats off to you with your replying times!

Link to comment
