Unraid Hung when adding NVMe to Cache -- Now Array won't Start

inva_zimm · March 13, 2021

Earlier in the week I updated to 6.9.1 and everything seemed fine.

Then I tried adding an NVMe drive to my system as a cache drive. The array appeared to get stuck in the starting state, with the new drive showing 'Mounting' for over an hour. I'm not sure whether it was formatting and just not reporting that back. Because of that I restarted the machine, and now I can no longer start the array. The new drive also no longer shows up as a device that can be added to the cache pool, or even under Unassigned Devices.

Before attempting any changes to the Unraid array or cache, I made sure the motherboard BIOS was updated to the latest version.

With a monitor plugged in, I see the following errors:

```
BTRFS: error (device sdh1) in cleanup_transaction:1942: errno=-28 No space left
BTRFS: error (device sdh1) in reset_balance_state:3327: errno=-28 No space left
```

That partly makes sense, since sdh1 appears to be my original 250 GB Samsung Evo SSD, but the fact that I can't assign the NVMe to any slot makes me worry whether I can salvage this. According to the terminal, `/mnt/cache` still contains all the data it had beforehand, and I obviously don't want to lose any of it. Backing up the cache to an array disk with rsync from the terminal is taking forever.
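For anyone reading the log lines above: `errno=-28` is the kernel reporting a negative errno value, and errno 28 is `ENOSPC` ("No space left on device"). On btrfs this can happen even when `df` still shows free space, for example when metadata space is exhausted, and the `reset_balance_state` frame suggests a balance was in progress when it failed. The following is a minimal Python sketch (standard library only) that pulls the device, kernel function, and errno name out of lines in this format; the regex and the `decode_btrfs_error` helper are hypothetical and written for exactly these two lines, not as a parser for every BTRFS message.

```python
import errno
import os
import re

# Matches lines such as:
# BTRFS: error (device sdh1) in cleanup_transaction:1942: errno=-28 No space left
BTRFS_ERR = re.compile(
    r"BTRFS: error \(device (?P<dev>\S+)\) in (?P<func>\w+):(?P<line>\d+): "
    r"errno=(?P<errno>-?\d+)"
)


def decode_btrfs_error(line):
    """Return (device, kernel function, errno symbol, message), or None if no match."""
    m = BTRFS_ERR.search(line)
    if m is None:
        return None
    code = abs(int(m.group("errno")))  # the kernel logs errno values as negatives
    symbol = errno.errorcode.get(code, "UNKNOWN")
    return m.group("dev"), m.group("func"), symbol, os.strerror(code)


log = """\
BTRFS: error (device sdh1) in cleanup_transaction:1942: errno=-28 No space left
BTRFS: error (device sdh1) in reset_balance_state:3327: errno=-28 No space left
"""

for entry in log.splitlines():
    print(decode_btrfs_error(entry))
```

Both lines decode to the same device (`sdh1`) and the same `ENOSPC` error, only from different kernel functions.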

I suspect my mistake was increasing the cache pool to 2 slots and then swapping the positions of the cache drives before first clicking Start.

I'm attaching one of my more recent diagnostics from after this happened. The only downside is that it was taken before starting the array, since I can't get it to run while the array is stuck starting.

Does anyone have thoughts on how to handle this, even if it's just getting back to where I was before trying to add the NVMe?

Thanks for any help or suggestions! Let me know if there's anything else I can provide.

alexandria-diagnostics-20210312-1452.zip

JorgeB · March 13, 2021

The diags were taken with the array stopped, so I can't see the problem. But if you can't start the array with the cache device assigned, you'll likely need to re-format it; there are some recovery options here if needed.

inva_zimm · March 13, 2021

Thanks for the heads-up, Jorge! Sadly, I figured that might be the case. I'm going to see what I can do about at least getting the smaller drive backed up before moving forward.

That said, is there a way to make it recognize only the smaller drive? It seems to be trying to access the NVMe, or at least something of that size. For example, could I reset the pool slots to the original 1 instead of the current 2?

JorgeB · March 14, 2021

It's difficult to say without seeing the actual error, but you can reset the cache pool by starting the array without any cache devices assigned. Then stop the array and assign just one device; if a valid btrfs filesystem is detected, it will be imported.

inva_zimm · March 17, 2021

I've been able to back up the smaller SSD, though it took a little longer than I would have liked. I actually did it a couple of times just to be cautious. I definitely appreciate the link about restoring and handling btrfs drives; it helped with the backups.

Setting the cache pool to a single slot and leaving it empty seems to have let me start the array normally. This obviously removes access to my Docker containers and VM, but that's not really surprising.

When I re-add the original Evo SSD to the cache (in its current state), I get the same errors as above:

```
BTRFS: error (device sdh1) in cleanup_transaction:1942: errno=-28 No space left
BTRFS: error (device sdh1) in reset_balance_state:3327: errno=-28 No space left
```

I thought about checking the NVMe and either re-adding it to the cache by itself first or just reformatting it. But checking the drive with `sgdisk` gave me the following output, which has me hesitating:

```
***************************************************************
Found invalid GPT and valid MBR; converting MBR to GPT format
in memory.
***************************************************************

Warning! Secondary partition table overlaps the last partition by
33 blocks!
You will need to delete this partition or resize it in another utility.
Disk /dev/nvme0n1: 1953525168 sectors, 931.5 GiB
Model: WDBRP********-****
Sector size (logical/physical): 512/512 bytes
Disk identifier (GUID): FA9DBD11-2D26-****-****-************
Partition table holds up to 128 entries
Main partition table begins at sector 2 and ends at sector 33
First usable sector is 34, last usable sector is 1953525134
Partitions will be aligned on 2048-sector boundaries
Total free space is 2014 sectors (1007.0 KiB)

Number  Start (sector)    End (sector)  Size       Code  Name
   1            2048      1953525167   931.5 GiB   8300  Linux filesystem
```
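The warning is consistent with the numbers sgdisk prints. Assuming the standard GPT layout (protective MBR in sector 0, primary header in sector 1, a 128-entry table of 128-byte entries occupying 32 sectors, and a backup table plus backup header at the very end of the disk), the last usable sector is 1953525167 - 1 - 32 = 1953525134. The old MBR partition, however, runs to sector 1953525167, the disk's final sector, which is 33 sectors past that limit. The Python sketch below reproduces the arithmetic; `gpt_geometry` is a hypothetical helper that only handles the single-partition case shown here.

```python
def gpt_geometry(total_sectors, part_start, part_end,
                 sector_size=512, entries=128, entry_size=128):
    """Recompute sgdisk's GPT layout numbers for a disk holding one partition.

    Assumes the standard GPT layout: LBA 0 = protective MBR, LBA 1 = primary
    header, primary entry array right after it, and the backup header in the
    last LBA with the backup entry array immediately before it.
    """
    table_sectors = entries * entry_size // sector_size  # 128 * 128 B / 512 B = 32
    first_usable = 2 + table_sectors                      # after MBR, header, table
    last_lba = total_sectors - 1                          # LBAs are numbered from 0
    last_usable = last_lba - 1 - table_sectors            # before backup table + header
    overlap = max(0, part_end - last_usable)              # sectors the partition intrudes
    # Free space = usable sectors not covered by the single partition.
    free = (part_start - first_usable) + max(0, last_usable - part_end)
    return {
        "first_usable": first_usable,
        "last_usable": last_usable,
        "overlap": overlap,
        "free_sectors": free,
        "free_kib": free * sector_size / 1024,
        "size_gib": total_sectors * sector_size / 2**30,
    }


# Values taken from the sgdisk output for /dev/nvme0n1 above.
geo = gpt_geometry(total_sectors=1953525168, part_start=2048, part_end=1953525167)
for key, value in geo.items():
    print(f"{key}: {value}")
```

Each computed value (first usable 34, last usable 1953525134, 33-sector overlap, 2014 free sectors or 1007.0 KiB, about 931.5 GiB) lines up with a line of the sgdisk output, which points to the MBR partition spanning the entire disk as the cause of the warning.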

Would it be better to clear this drive before attempting to do anything with it in the Cache?

Thanks again!

JorgeB · March 17, 2021

6 hours ago, inva_zimm said:

Would it be better to clear this drive before attempting to do anything with it in the Cache?

Yes, you can wipe it with `blkdiscard`.
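For reference, `blkdiscard` (from util-linux) discards every sector on the device it is given, so everything on it, including the old MBR and any GPT, should be treated as gone afterwards. Below is a minimal, guarded Python sketch, assuming a Linux host with `blkdiscard` on the PATH and that `/dev/nvme0n1` is the NVMe from the sgdisk output; the helper names are hypothetical. It refuses to proceed if the device or one of its partitions appears in `/proc/mounts`, and it only prints the command unless `dry_run=False` is passed.

```python
import subprocess


def mounted_sources(mounts_text):
    """Return the source-device field of each line in /proc/mounts-style text."""
    return [line.split()[0] for line in mounts_text.splitlines() if line.strip()]


def is_in_use(device, mounts_text):
    # Prefix match so /dev/nvme0n1 also catches /dev/nvme0n1p1. This is
    # deliberately conservative: /dev/sda would also match /dev/sdab.
    return any(src.startswith(device) for src in mounted_sources(mounts_text))


def wipe_with_blkdiscard(device, dry_run=True, mounts_path="/proc/mounts"):
    """Discard every sector on `device` (destroys all data). Dry run by default."""
    with open(mounts_path) as f:
        mounts_text = f.read()
    if is_in_use(device, mounts_text):
        raise RuntimeError(f"{device} or one of its partitions is mounted; refusing")
    cmd = ["blkdiscard", device]
    if dry_run:
        print("would run:", " ".join(cmd))
    else:
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return cmd
```

Calling `wipe_with_blkdiscard("/dev/nvme0n1")` only prints the command; `wipe_with_blkdiscard("/dev/nvme0n1", dry_run=False)` actually runs it. The mount check is a basic safety net, not a substitute for confirming in the Unraid GUI that the drive is unassigned.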

inva_zimm · March 31, 2021

Thanks again, Jorge!

I've had a few other things to deal with and was away for a while, but I've been working on it slowly here and there. I've at least got the array back up and running, along with my Docker services. The NVMe shows up under Unassigned Devices, since I removed it from the cache pool for now while doing some cleanup and reworking. I think I'm back in a good spot to continue with everything.

Thanks again! I'm considering this solved for now.
