Unraid Hung when adding NVMe to Cache -- Now Array won't Start

Earlier in the week I updated to 6.9.1 and everything seemed fine.


I then tried adding an NVMe drive to my system as a cache drive, and the array appeared to get stuck while starting, with this particular drive showing 'Mounting' for over an hour.  I'm not sure whether it was formatting and just not reporting that back.  Because of that I restarted the machine, and now I can no longer start the array. The new drive also no longer shows up as one that can be added to the cache pool, or even under Unassigned Devices.


I made sure the BIOS was updated to the latest version for the board before attempting any changes to the Unraid array or cache.


Plugging in a monitor I see the following error:


BTRFS: error (device sdh1) in cleanup_transaction:1942: errno=-28 No space left
BTRFS: error (device sdh1) in reset_balance_state:3327: errno=-28 No space left


That makes sense, I guess, given that sdh1 appears to be my previous 250 GB Samsung Evo SSD, but since I cannot assign the NVMe to any slot I'm worried about whether I can salvage this.  According to the terminal, the original `/mnt/cache` still contains all the data it had beforehand, and I obviously do not want to lose any of it.  Backing up the cache to an array disk with rsync in the terminal is taking forever.
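
For reference, the `errno=-28` in the BTRFS messages above is the kernel's negated error number. A minimal sketch for decoding it with the Python standard library follows; the helper name `describe_errno` is just an illustration and not part of any Unraid tooling.

```python
import errno
import os


def describe_errno(value: int) -> str:
    """Return 'NAME: message' for a kernel-style (possibly negative) errno."""
    code = abs(value)  # kernel logs report errors as negative numbers
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


# errno=-28 from the BTRFS log lines
print(describe_errno(-28))
```

On Linux, 28 maps to `ENOSPC`, which matches the "No space left" text in the log.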


I suspect I made a mistake by increasing the cache pool to 2 slots and then swapping the positions of the cache drives before clicking Start the first time.


I'm attaching one of my more recent diagnostics, taken since this happened.  The only downside is that it's from before starting the array, as I cannot run diagnostics while it's stuck starting.


Does anyone have any thoughts on how to handle this, even if it's just to get back to the state before I tried to add the NVMe?


Thanks for any help, suggestions, etc.!  Let me know if there might be anything else that I can provide if needed.


Link to post

Diags are with the array stopped, so I can't see the problem, but if you can't start it with the cache device assigned you'll likely need to re-format it. There are some recovery options here if needed.

Link to post

Thanks for the heads up, Jorge!  I kind of figured that might be the case sadly.  I'm going to see what I can do about at least getting the smaller one backed up before continuing forward.


That said, is there a way to make it recognize only the smaller one? It seems to be attempting to access the NVMe, or at least something of that size. For example, what if I reset the pool slots to the original 1 rather than the current 2?

Link to post

Without seeing the actual error it's difficult to say, but you can reset the cache pool by starting the array without any cache devices assigned. Then stop the array and assign just one device; if a valid btrfs filesystem is detected it will be imported.

Link to post

I've been able to back up the smaller SSD, though it took a little longer than I would have liked.  I actually did it a couple of times just to be cautious.  I definitely appreciate the link on restoring and handling btrfs drives; it helped a bit with the backups.


Setting the cache pool to a single, empty slot allowed me to start the array normally.  This obviously removes access to my Docker containers and VM, but that's not really surprising.


When I try to re-add the original Evo SSD to the cache (in its current state), I end up with the same error as above:


BTRFS: error (device sdh1) in cleanup_transaction:1942: errno=-28 No space left
BTRFS: error (device sdh1) in reset_balance_state:3327: errno=-28 No space left



I thought about checking the NVMe and either re-adding it to the cache by itself first or just reformatting it.  But when I checked the drive with `sgdisk`, I received the following, which has me hesitating:


Found invalid GPT and valid MBR; converting MBR to GPT format
in memory.

Warning! Secondary partition table overlaps the last partition by
33 blocks!
You will need to delete this partition or resize it in another utility.
Disk /dev/nvme0n1: 1953525168 sectors, 931.5 GiB
Model: WDBRP********-****
Sector size (logical/physical): 512/512 bytes
Disk identifier (GUID): FA9DBD11-2D26-****-****-************
Partition table holds up to 128 entries
Main partition table begins at sector 2 and ends at sector 33
First usable sector is 34, last usable sector is 1953525134
Partitions will be aligned on 2048-sector boundaries
Total free space is 2014 sectors (1007.0 KiB)

Number  Start (sector)    End (sector)  Size       Code  Name
   1            2048      1953525167   931.5 GiB   8300  Linux filesystem
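
The "overlaps the last partition by 33 blocks" figure can be reproduced from the numbers in the output above. This is only a sketch of the arithmetic: the 128-byte GPT entry size is the usual default and is an assumption here, since `sgdisk` does not print it. The variable names are illustrative.

```python
SECTOR_SIZE = 512            # logical sector size from the sgdisk output
TOTAL_SECTORS = 1953525168   # disk size in sectors from the sgdisk output
PARTITION_END = 1953525167   # end sector of partition 1
GPT_ENTRIES = 128            # "Partition table holds up to 128 entries"
GPT_ENTRY_BYTES = 128        # assumed default entry size, not shown in the output

# The backup GPT sits at the end of the disk: entry array plus one header sector.
entry_sectors = GPT_ENTRIES * GPT_ENTRY_BYTES // SECTOR_SIZE
backup_gpt_sectors = entry_sectors + 1

last_sector = TOTAL_SECTORS - 1                  # sectors are numbered from 0
last_usable = last_sector - backup_gpt_sectors   # what sgdisk calls "last usable sector"
overlap = PARTITION_END - last_usable

print(last_usable, overlap)
```

The existing MBR partition runs to the very last sector of the disk, so a GPT converted in memory has no room for its backup copy at the end. That seems to be all the warning is reporting.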


Would it be better to clear this drive before attempting to do anything with it in the Cache?


Thanks, again!


Link to post
6 hours ago, inva_zimm said:

Would it be better to clear this drive before attempting to do anything with it in the Cache?

Yes, you can wipe it with blkdiscard
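
Since `blkdiscard` discards every block on the device, it may be worth confirming the target is not mounted before running it. The sketch below only builds the command and does not execute anything. The helper names and the sample mounts text are hypothetical; on a real system the text would come from `/proc/mounts`.

```python
import shlex


def is_mounted(device: str, mounts_text: str) -> bool:
    """True if the device or one of its partitions is listed as a mount source.

    Uses a simple prefix match, so it can over-report (e.g. /dev/sda also
    matches /dev/sdaa), which errs on the side of caution.
    """
    for line in mounts_text.splitlines():
        fields = line.split()
        if fields and fields[0].startswith(device):
            return True
    return False


def wipe_command(device: str, mounts_text: str) -> list:
    """Build (but do not run) a blkdiscard command for an unmounted device."""
    if is_mounted(device, mounts_text):
        raise RuntimeError(f"{device} is mounted; unmount it first")
    return ["blkdiscard", device]


# Hypothetical mounts text; read /proc/mounts on a real system instead.
sample_mounts = "/dev/sdh1 /mnt/cache btrfs rw 0 0\n"
print(shlex.join(wipe_command("/dev/nvme0n1", sample_mounts)))
```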

Link to post
  • 2 weeks later...

Thanks again, Jorge!


I've had a few other things to deal with and was away for a little while, but I've been working on it slowly here and there.  It looks like I've at least got the server array back up and running, along with my Docker services.  The NVMe shows up under Unassigned Devices, as I removed it from the cache pool for the moment while doing some cleanup and reworking.  So I think I'm back in a good spot to continue with everything.


Thanks again!  Considering this solved for now!

Link to post
