Keep certain VMs running without the array started



52 minutes ago, johner said:

Array = parity array…

 

Docker containers and VMs should be able to continue to run on a cache pool, etc.

 

Basically, make each array independent, with its own start/stop/check buttons, etc.

 

this might then also support multiple parity arrays…

How do you handle user shares that have part of their storage go away when that pool is down? For instance, I keep some VM vdisks on the parity array and some in a pool. They are all seamlessly accessible via /mnt/user/domains, I move them as needed for speed or space, and they all just work no matter which disk actually holds the file.

Link to comment
12 minutes ago, JonathanM said:

How do you handle user shares that have part of their storage go away when that pool is down? For instance, I keep some VM vdisks on the parity array and some in a pool. They are all seamlessly accessible via /mnt/user/domains, I move them as needed for speed or space, and they all just work no matter which disk actually holds the file.

In THAT case, whatever uses the array would obviously go offline and not work, for your particular scenario. But what WE'RE asking for is the VMs/Dockers that live entirely on a standalone or cache drive. There could also be a way to mark a VM/Docker as 'keep on cache/keep alive' or whatever, so that it is not moved to the array. Plenty of ways to implement it, really.

(Edit) When you really look at it, what we want is to be able to keep our internet going while doing whatever requires the array to go down. pfSense, Ubiquiti, Home Assistant, etc. That's what we're really asking here. These things run mainly in RAM too, so I'm not seeing what the big problem is.

Edited by JustOverride
Link to comment
25 minutes ago, JonathanM said:

How do you handle user shares that have part of their storage go away when that pool is down? For instance, I keep some VM vdisks on the parity array and some in a pool. They are all seamlessly accessible via /mnt/user/domains, I move them as needed for speed or space, and they all just work no matter which disk actually holds the file.

That’s a design and user experience question.

 

Obviously those specific VMs will not be able to run if ‘their’ storage is offline.

 

What experience do ‘you’ want? With this requirement enabled, it could be as simple as preventing any VM/Docker from using the parity array (maybe an easy MVP release), or more complex, where the stop logic traces back which services use that array and stops those specific services first (individual VMs, specific shares, etc.). I'd suggest the latter as a later/subsequent release if people want that experience - a rough sketch of the idea is below.
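To make the trace-back option a bit more concrete, here's a rough Python sketch. The /mnt/diskN and pool layout assumptions are mine, the share name is just an example, and nothing here is an existing Unraid API - it's only meant to show how the stop logic could work out which backing branches of a share live on the array:

```python
#!/usr/bin/env python3
"""Rough sketch of the 'trace back' idea, not an official Unraid tool: for a given
user share, report which backing branches are array disks vs. pools, so a smarter
stop button could know which services to stop first."""
from pathlib import Path

SHARE = "domains"   # hypothetical share holding VM vdisks

def backing_branches(share):
    array, pools = [], []
    for branch in sorted(Path("/mnt").iterdir()):
        # Skip the fuse views of the shares themselves (/mnt/user, /mnt/user0).
        if branch.name.startswith("user") or not branch.is_dir():
            continue
        if not (branch / share).is_dir():
            continue
        # Assumption: array members mount at /mnt/diskN; anything else is a pool.
        (array if branch.name.startswith("disk") else pools).append(branch.name)
    return array, pools

if __name__ == "__main__":
    on_array, on_pools = backing_branches(SHARE)
    print(f"'{SHARE}' has data on array disks: {on_array or 'none'}")
    print(f"'{SHARE}' has data on pools:       {on_pools or 'none'}")
    if on_array:
        print("stopping the array takes part of this share offline;")
        print("VMs/Dockers using it would need to be stopped first")
```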

 

First things first: agree the requirements, accept them, scope a release, then design it.

  • Upvote 1
Link to comment
  • 4 weeks later...

Quite an interesting thread. Came here through Google while searching for what to do in my situation. I thought it would somehow be possible to do this, or that at least it might get implemented.

 

I have to migrate my Unraid build to an mATX case when we move out for two months to escape an apartment water-pipe renovation. I can't take the array disks with me (power consumption, and there's no room for a large tower), but I need to run the VMs.

 

Currently I'm thinking of moving the mobo (a small B660 with a 12100) and only the VM SSD + OS SSD to the small case and installing Proxmox to run the VMs. After I move back, I'll probably run Unraid as a VM in Proxmox with NIC & HBA passthrough going forward, to get around this VM debacle.

 

I have had my fights with the array before, having to run repairs etc. on disks, and I can't have the downtime for the VMs during those periods.

Link to comment
  • 2 weeks later...

I'm interested in this feature as well. 

 

My VMs & Dockers run on their own pool (NVMe mirror), so they have nothing to do with the main array. Some of them may access an SMB/NFS share as a mount, but if the array is down, that mount will simply become unreachable in the VM.

 

There are times I do storage maintenance, like the last two weeks, where there shouldn't be a need for me to stop every Docker & VM just to replace/upgrade a drive in the primary array.

 

I get that this might be complicated, though, when users have VMs with vdisks on /mnt/user.

Link to comment
  • 1 month later...
  • 1 month later...
  • 2 weeks later...

+1

I am constantly modifying my Unraid setup as I learn new things.
 

I rarely need to turn off my HA VM or my Win 10 VM, yet taking down the array does exactly that.

 

Worse - starting the array doesn't auto-start the VMs, only the Dockers :(

 

---

 

Skimming through this thread, I've read numerous ideas about how to implement this.

I also see questions about licensing, what to do when a mapped share is down, how to handle different locations for Dockers/VMs, etc.

 

The devs really would be the experts on the 'how', as there are numerous obstacles to overcome:

array parity behavior, array licensing behavior, access to drives if the array is down, etc.

I get it, there's a lot to think through.

 

---

 

Here's my take.
Requirement - a working cache pool

This forces users not to use the array for their Dockers/VMs, and limits the feature to more experienced users.

 

Licensing - Pro version enables support for multiple offline pools.

Solves all the licensing issues. All the licensing talk is too complicated; why make code changes when you can make policy changes that make sense?

 

Warning - any mapped shares that a Docker/VM uses will not work, and can crash your Docker/VM if its internal software cannot handle it.

Why make this a Limetech issue? Push the responsibility onto the user & container developer.

 

Ex. 1) Plex - libraries mapped to spinning rust on the array - okay, who cares. Plex handles missing library locations gracefully.

Ex. 2) Windows VM - can't find a share. Windows will handle that gracefully (or not), lol.

Ex. 3) HA, FW, Uptime Kuma, Unifi, etc. - these reside entirely inside their Docker/VM container. They only read or write externally if they have a mapped syslog folder or something. Too complicated to check whether that's on the array, etc. Who cares - responsibility is on the user.

 

New Buttons & GUI changes

Array-Start/Stop button (1) - main array

Array-Start/Stop button (2) - cache pool 1

Array-Start/Stop button (n) - cache pool n [drop down menu button]

 

[Attached screenshot: mock-up of the array and multiple-pool spin-down buttons]

 

---

 

Implementation 0.1
'All or nothing.'

Shares cannot point to the 'DockVM' pool [they can, but if they do, you can't keep the DockVM pool online]

All online Dockers' volume mappings point to the DockVM pool

All online VMs' disk mappings point to the DockVM pool [this simplifies the programming required - sure, there are dedicated disks, USB disks, etc.; who cares, start simple]

 

Offline Dockers/VMs are not considered [if they're offline and you're turning off the array but not the DockVM pool... who cares]

 

(3 pools) [You could forgo the cache pool of course, but most pro users are going to have a setup like this]

Array [unraid] [hdd]

Cache [whatever; raid, probably 1 or 0+1] [probably ssd or nvme]

DockVM [whatever; raid, probably 1 or 0+1] [probably ssd or nvme]

 

Clicking stop on the array - "error: docker.list & VM.list are online & use array volume mappings" (see the sketch below)
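To make that error concrete, here's a minimal sketch of what the stop button could check. The /mnt/diskN and /mnt/user path assumptions are mine, the docker and virsh commands are standard, and nothing here is an existing Unraid hook:

```python
#!/usr/bin/env python3
"""Sketch of the 0.1 pre-stop check: build the docker.list / VM.list that would
block an array stop."""
import json
import re
import subprocess

# Assumption: anything under /mnt/diskN or the /mnt/user fuse view may live on the array.
ARRAY_RE = re.compile(r"^/mnt/(disk\d+|user0?)(/|$)")

def sh(*cmd):
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

def docker_offenders():
    offenders = []
    for name in filter(None, sh("docker", "ps", "--format", "{{.Names}}").splitlines()):
        mounts = json.loads(sh("docker", "inspect", "--format", "{{json .Mounts}}", name))
        if any(ARRAY_RE.match(m.get("Source", "")) for m in mounts):
            offenders.append(name)
    return offenders

def vm_offenders():
    offenders = []
    for vm in filter(None, sh("virsh", "list", "--name").splitlines()):
        # domblklist prints a two-line header, then "target source" rows.
        for row in sh("virsh", "domblklist", vm).splitlines()[2:]:
            cols = row.split()
            if len(cols) >= 2 and ARRAY_RE.match(cols[1]):
                offenders.append(vm)
                break
    return offenders

if __name__ == "__main__":
    docker_list, vm_list = docker_offenders(), vm_offenders()
    if docker_list or vm_list:
        print("error: online and using array volume mappings")
        print("docker.list:", ", ".join(docker_list) or "-")
        print("VM.list:", ", ".join(vm_list) or "-")
    else:
        print("nothing online depends on the array; safe to stop it")
```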

I would use this right now. I'd make my current cache drive Docker & VM only until implementation 1.0 can roll out.

I don't need to copy files to my cache and then use the mover to move to spinning rust.

I'd rather have my dockers & vms stay up right now.

Plex docker running? I'd have to shut it down. Who cares - if the array is offline, no one is watching movies anyway.

But my HA & W10 VM (with its internal database on the vdisk) stay up; woot.

 

---

 

Implementation 1.0
'Almost All or nothing.'

 

All online Dockers' volume mappings point to the cache

All online VMs' disk mappings do not point to the array

 

Offline Dockers cannot be started unless both the array & cache are 'online'

Offline VMs cannot be started unless both the array & cache are 'online'

 

The only reason this is separate from 0.1 is that the shares are mapped to the same cache pool as the Dockers & VMs, so I'm assuming there will be some code changes & checks needed to ensure compatibility - see the path-resolution sketch below.
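For what it's worth, here's the kind of path-resolution check I imagine that would involve, as a hedged sketch: the /mnt layout assumptions are mine and the example mapping is made up, but it shows how a /mnt/user path could be resolved to whichever branches actually hold it.

```python
#!/usr/bin/env python3
"""Sketch of the extra check 1.0 would need: resolve a /mnt/user/... mapping to
the physical branch(es) that hold it, so the GUI can tell whether an online
Docker/VM really depends on the array."""
from pathlib import Path

def physical_branches(user_path):
    """Return the /mnt/<branch> names that contain this user-share path."""
    rel = Path(user_path).relative_to("/mnt/user")
    hits = []
    for branch in Path("/mnt").iterdir():
        if branch.name.startswith("user") or not branch.is_dir():
            continue  # skip the fuse views /mnt/user and /mnt/user0
        if (branch / rel).exists():
            hits.append(branch.name)
    return hits

def blocks_array_stop(user_path):
    # Assumption: array members are mounted as /mnt/diskN; everything else is a pool.
    return any(b.startswith("disk") for b in physical_branches(user_path))

if __name__ == "__main__":
    mapping = "/mnt/user/appdata"   # hypothetical container mapping
    print(mapping, "->", physical_branches(mapping))
    print("blocks array stop:", blocks_array_stop(mapping))
```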

 

Plex docker - must be shut down in order to shut down the array [but the cache stays online]

 

---

 

Implementation 2.0

'Granular approach'

 

All of the above but:

 

Checks whether the primary storage for a Docker or VM is on the cache pool.

Ignores mappings to the array (but reports a warning)

 

Plex docker - stays running; volume mappings to the array are offline/not working.

Hopefully the container doesn't crash. A rough sketch of this block-vs-warn rule is below.
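Here's a sketch of that decision. The docker call is standard, but the "appdata = primary storage" heuristic and the container name are purely illustrative assumptions; /mnt/user paths would also still need resolving to a physical branch as in the earlier sketch.

```python
#!/usr/bin/env python3
"""Sketch of the granular 2.0 rule: only a container's primary storage (here
guessed as its appdata mapping) must live off the array; other array mappings
only raise a warning."""
import json
import re
import subprocess

# Note: /mnt/user paths would still need resolving to a physical branch.
ARRAY_RE = re.compile(r"^/mnt/(disk\d+|user0?)(/|$)")

def mount_sources(name):
    out = subprocess.run(["docker", "inspect", "--format", "{{json .Mounts}}", name],
                         capture_output=True, text=True, check=True)
    return [m.get("Source", "") for m in json.loads(out.stdout)]

def verdict(name):
    sources = mount_sources(name)
    primary = [s for s in sources if "appdata" in s]      # crude heuristic
    secondary = [s for s in sources if s not in primary]
    if any(ARRAY_RE.match(s) for s in primary):
        return "BLOCK: primary storage is on the array"
    warn = [s for s in secondary if ARRAY_RE.match(s)]
    return f"OK, but warn: these mappings go offline -> {warn}" if warn else "OK"

if __name__ == "__main__":
    print("plex:", verdict("plex"))   # 'plex' is just an example container name
```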

 

---

 

Implementation 3.0

'Granular Auto Approach'

 

All of 2.0, plus the ability to auto-start/stop Dockers & VMs based on which pool is going up or down.

A global setting would control whether Dockers/VMs auto-start on an Unraid OS restart, or when their array/pool(s) are started or stopped - a hook sketch is below.
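As a sketch of how that auto behaviour could hang together: a small hook invoked with a pool name and an up/down event. The pool names and service mapping are entirely hypothetical and user-maintained; Unraid has no such hook today, though the docker and virsh commands themselves are standard.

```python
#!/usr/bin/env python3
"""Sketch of the 3.0 'auto' behaviour: start or stop the services tied to a pool
when that pool goes up or down."""
import subprocess
import sys

# Which services depend on which pool (assumed names).
SERVICES = {
    "array":  {"docker": ["plex"], "vm": []},
    "dockvm": {"docker": ["homeassistant"], "vm": ["Win10"]},
}

def run(*cmd):
    # Best effort; a real implementation would log failures and wait for shutdown.
    subprocess.run(cmd, check=False)

def handle(pool, event):
    deps = SERVICES.get(pool, {"docker": [], "vm": []})
    if event == "down":
        for vm in deps["vm"]:
            run("virsh", "shutdown", vm)   # graceful ACPI shutdown
        for ct in deps["docker"]:
            run("docker", "stop", ct)
    elif event == "up":
        for ct in deps["docker"]:
            run("docker", "start", ct)
        for vm in deps["vm"]:
            run("virsh", "start", vm)

if __name__ == "__main__":
    # e.g. pool_hook.py dockvm down
    handle(sys.argv[1], sys.argv[2])
```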

Link to comment
  • 5 months later...
  • 3 months later...

As people have finally gotten to (read the whole thread), the issue is a licensing issue, so in a sense this thread is somewhat misnamed: a whole bunch of implications are created by the method chosen to enforce the licence. Personally, I think the value in Unraid has now far surpassed the Unraid array, which is undoubtedly where it all started, and really the array should have nothing to do with how they apply the licence anyway - it was probably just the most convenient method at the time.

 

Unraid has a unique customer focus that nobody else has - which unfortunately has meant some of the more typical NAS features haven't been well implemented yet.

 

These impacts are not what you'd normally expect from hosts that provide virtualisation capabilities and need to deliver high uptime. Presently, if you want high uptime and you know what you're doing, you would certainly not use Unraid.

 

ESXi, Proxmox and TrueNAS SCALE, to name a few, and also QNAP- and Synology-style NASes, all avoid these issues - I can't think of one product offering virtualisation capabilities that requires you to stop the array for these kinds of changes. Anyone got one?

 

I would suggest we bring visibility to the impacts and ask for fixes to them; while that's sort of done here, it's hidden in a big long thread. By raising it like that, perhaps we can get Limetech to understand that it's a bit of a negative for their product compared to other offerings, and they might do something about it.

 

Perhaps someone can summarise them in the first post by editing it?

 

As a starter for ten, here are some unexpected impacts that I have noticed (I'm doing this half asleep, so please correct any you think I'm mistaken on; also note that by 'system' below I mean any customer-facing services, not the core OS):

  • Having to stop the whole system to replace a failed disk and this now includes ZFS
  • Having to stop the whole system to change the name of a ZFS array
  • Having to stop the whole system to change the mount point of a ZFS array
  • Having to stop the whole system to make simple networking changes

I think there are quite a few more scenarios; some of them are fair, like isolating CPU cores.

 

When you look at it, it's mostly about disk management, I think - which is a bit embarrassing, as that is fundamental to a NAS. And this is the point, right: in this day and age we expect a bit better, and it's possible to do if Limetech gets the message properly.

 

Open to alternative suggestions.  Great discussion in this thread!

 

Edited by Marshalleq
  • Like 1
Link to comment
  • 1 month later...

Just updating here, as I came across a feature request post from @jonp (ex-Limetech) which pre-dates this one. I completely missed it until now, as it's in the "Unscheduled" sub-forum... indicating, I guess, that it's something on the Limetech radar to be implemented at some point, right?

I sincerely hope so! I really would like to move away from virtualizing UnRAID some day 😅

Link to comment
On 1/16/2024 at 4:47 PM, JonathanM said:

Yes, the thread is relevant, but no postings by jonp. I was wondering whether he had multiple threads he was looking at and posted the wrong link.

 

I was referring to this:

 

<NIXED - I misread>

 

There are several other threads in there related to this that jonp posted in, though this one wasn't one of them - apologies, all!

Edited by BVD
Link to comment
