unVMs Status Update


jonp


All,

 

I wanted to take a moment to provide an update on this for those who are wondering where this stands. In short, the VMs have been used in my personal home environment for the last several months. For the most part, they work perfectly, but there are a few concerns we need to address before releasing to the masses.

 

First and foremost is hardware compatibility. Not everyone's hardware is going to work with this, and it's not just about checking your motherboard/processor for virtualization support. We've found that some motherboards/CPUs, even though they claim full virtualization and IOMMU support (which is required for hardware pass-through), don't implement it properly enough to work reliably. This isn't an unRAID OS or even a Linux kernel issue, but rather a hardware one. Secondly, not all GPUs work properly when assigned to a guest VM. Case in point is the GTX 550Ti from nVIDIA. This is the biggest tease of a video card ever. Quite simply, the first time you boot a VM with GPU pass through using the 550Ti, it works and works just fine, but upon reboot of the VM, the card fails to continue working. The card is just not supporting a bus level reset, which is needed to "cycle" the card when a VM is rebooted. We've tried a number of advanced tactics to "trick" this card but in the end, the card just won't come back after the host is rebooted. So why did we spend so much time on this card? Because it's not the only one that will suffer from this, so we wanted to see what we could do to resolve it.
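One quick sanity check you can run on a card is whether it even advertises a Function Level Reset in its PCIe capabilities; `lspci -vv` prints `FLReset+` or `FLReset-` in the DevCap line. FLR is only one of the reset methods the kernel can fall back on, so this is a hint rather than a verdict. A rough Python sketch over a captured dump (the excerpt below is invented for illustration):

```python
import re

def supports_flr(lspci_vv_text: str) -> bool:
    """Return True if the device capabilities advertise Function Level Reset.
    lspci prints 'FLReset+' when FLR is supported, 'FLReset-' when it is not."""
    return bool(re.search(r"FLReset\+", lspci_vv_text))

# Made-up excerpt of `lspci -vv` output, for illustration only
sample = """
01:00.0 VGA compatible controller: NVIDIA Corporation GF116 [GeForce GTX 550 Ti]
        Capabilities: [78] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s unlimited
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 75.000W
"""
print(supports_flr(sample))  # False: a card printing FLReset- lacks a function-level reset
```

A `False` here doesn't doom the card (the kernel can try other reset methods), but it does line up with the "won't come back after a VM reboot" behavior described above.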

 

Our second concern is to provide a safe way for folks to test this without putting their data at risk. The good news is I already have a solution for this, but I need to have Eric implement it. The solution is to let folks download the unVM for OpenELEC and start it without starting the array. By doing this, folks can test out the capabilities of the VM, and if it causes a host crash, you won't need to worry about an unclean shutdown (since the array won't have been running anyway). The OpenELEC VM is ~300MB, so we could literally put the entire virtual disk image in RAM or on the USB flash device for testing purposes.
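As a sketch of what the RAM-staged test boot could look like: copy the image into tmpfs, then launch a throwaway guest. The flags and paths here are illustrative guesses, not the exact command unRAID would use:

```python
import os
import shlex
import shutil

def stage_in_ram(image_path: str, tmpfs_dir: str = "/dev/shm") -> str:
    """Copy a small virtual disk image into RAM-backed storage (tmpfs),
    so test boots never touch the array.  Returns the staged path."""
    staged = os.path.join(tmpfs_dir, os.path.basename(image_path))
    shutil.copy2(image_path, staged)
    return staged

def qemu_cmdline(disk: str, memory_mb: int = 1024) -> list:
    """Build a minimal qemu-kvm command line for a throwaway test boot."""
    return ["qemu-system-x86_64", "-enable-kvm",
            "-m", str(memory_mb),
            "-drive", f"file={disk},format=raw,if=virtio",
            "-snapshot"]  # -snapshot discards all guest writes on exit

print(shlex.join(qemu_cmdline("/dev/shm/openelec.img")))
```

The `-snapshot` flag is the interesting part for safety: even the staged image itself stays pristine, so a crashed test VM leaves nothing to clean up.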

 

And last of all, we're considering implementing a way for folks who test this to submit their hardware profile to us as "validated" so we can start building a database of hardware that we know is compatible with these VMs.  I don't think we absolutely have to have this done before we can release the VMs, but to be fair, that would be really nice.
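If we do build that reporting, the client side could be as simple as scraping `/proc/cpuinfo` and friends into JSON for submission. A hypothetical sketch (the cpuinfo excerpt below is made up, and the field names are my own invention, not a real unRAID schema):

```python
import json
import re

def parse_cpu_flags(cpuinfo_text: str) -> dict:
    """Extract the fields a 'validated hardware' report would care about:
    CPU model plus whether the virtualization flags (vmx/svm) are present."""
    model = re.search(r"model name\s*:\s*(.+)", cpuinfo_text)
    flags = re.search(r"flags\s*:\s*(.+)", cpuinfo_text)
    flagset = set(flags.group(1).split()) if flags else set()
    return {
        "cpu_model": model.group(1).strip() if model else "unknown",
        "vt_x": "vmx" in flagset,    # Intel VT-x
        "amd_v": "svm" in flagset,   # AMD-V
    }

# Made-up /proc/cpuinfo excerpt for illustration
sample = """processor : 0
model name : Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
flags : fpu vme de pse msr pae vmx aes avx
"""
print(json.dumps(parse_cpu_flags(sample), indent=2))
```

Note the flags only tell you the CPU claims support; as discussed above, a broken IOMMU implementation can still sink pass-through, which is exactly why collecting real-world reports matters.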

 

Ok, so where does this leave us? First and foremost, all efforts right now are being focused on the integration of Dynamix into the next release of the unRAID beta. This has been a major effort, but we're rapidly approaching completion so we can get this out to everyone. After that, we can start planning for a release schedule around the unVMs.

 

Definitely curious to hear everyone's feedback here on the plan of attack I'm proposing above.  Thanks everyone and I promise, your patience will pay off.  This stuff is just so freaking cool!!


Plan sounds good. Could I also ask you to keep track of the type of filesystem people are running the VM off. I noticed on a KVM site that you shouldn't be running VMs off btrfs (which I am currently doing), as I am also using docker. I think this is causing issues when updating the library and playing video in xbmcbuntu (so openelec should be similar...)
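For what it's worth, one mitigation I've seen suggested for VM images on btrfs is disabling copy-on-write on the image directory with `chattr +C` (it only takes effect for newly created files), since CoW tends to fragment large, randomly written files. A tiny hypothetical helper that just checks an `lsattr` output line for the flag (the paths below are invented examples):

```python
def has_nocow(lsattr_line: str) -> bool:
    """lsattr shows a 'C' in the attribute field when chattr +C (no
    copy-on-write) is set; on btrfs this avoids CoW fragmentation
    under heavily rewritten files such as VM disk images."""
    attrs = lsattr_line.split()[0]
    return "C" in attrs

# Made-up lsattr output: first file has +C set, second does not
print(has_nocow("---------------C---- /mnt/cache/vms/xbmcbuntu.img"))  # True
print(has_nocow("-------------------- /mnt/cache/vms/other.img"))      # False
```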


Plan sounds good. Could I also ask you to keep track of the type of filesystem people are running the VM off.

 

What do you mean by "keep track"? You mean, along with the hardware reporting, also gather info on the filesystem in use? I suppose we could do that...

 

I noticed on a KVM site that you shouldn't be running VMs off btrfs (which I am currently doing), as I am also using docker.

 

I actually think that the KVM site may be outdated on that recommendation against btrfs. I only use btrfs for my VMs right now...

 

I think this is causing issues when updating the library and playing video in xbmcbuntu (so openelec should be similar...)

 

Explain please...


Plan sounds good. Could I also ask you to keep track of the type of filesystem people are running the VM off.

 

What do you mean by "keep track"? You mean, along with the hardware reporting, also gather info on the filesystem in use? I suppose we could do that...

Yes, sorry for the lack of clarity.

 

I noticed on a KVM site that you shouldn't be running VMs off btrfs (which I am currently doing), as I am also using docker.

 

I actually think that the KVM site may be outdated on that recommendation against btrfs. I only use btrfs for my VMs right now...

 

I think this is causing issues when updating the library and playing video in xbmcbuntu (so openelec should be similar...)

 

Explain please...

 

So, I am currently running my dockers and xbmcbuntu VM on my cache drive. I am not running a separate SQL DB but using the xbmcbuntu VM library for my TV series and movies. Whenever I run a library refresh, or it updates at startup, the VM playback stutters and skips. This occasionally also happens when the Plex docker is refreshing (although I have now deleted this docker as I am getting comfortable with xbmc). Other than that I have had no issues; playback is perfect for full Blu-ray rips and the flirc passthrough works perfectly.

 


So, I am currently running my dockers and xbmcbuntu VM on my cache drive. I am not running a separate SQL DB but using the xbmcbuntu VM library for my TV series and movies. Whenever I run a library refresh, or it updates at startup, the VM playback stutters and skips. This occasionally also happens when the Plex docker is refreshing (although I have now deleted this docker as I am getting comfortable with xbmc). Other than that I have had no issues; playback is perfect for full Blu-ray rips and the flirc passthrough works perfectly.

 

Could be your XML for your XBMCbuntu VM, and it probably has nothing to do with btrfs, to be honest. A couple of things:

 

1)  are you storing your library data inside the VM itself or redirecting it using VirtFS to a share on the array/cache?

 

2)  are you pinning any cores specifically to the VM or just letting it run wild with vCPUs?

 

3)  how many vCPUs do you have assigned to the VM?
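For reference on question 2, pinning in libvirt is done with a `<cputune>` block of `<vcpupin>` elements in the domain XML. A hypothetical little generator, just to show the shape of the fragment:

```python
from xml.etree import ElementTree as ET

def cputune_xml(pins: dict) -> str:
    """Build a libvirt <cputune> fragment pinning each vCPU to a host core.
    e.g. {0: 2, 1: 3} pins vCPU 0 to host core 2 and vCPU 1 to core 3."""
    cputune = ET.Element("cputune")
    for vcpu, core in sorted(pins.items()):
        ET.SubElement(cputune, "vcpupin", vcpu=str(vcpu), cpuset=str(core))
    return ET.tostring(cputune, encoding="unicode")

print(cputune_xml({0: 2, 1: 3}))
```

Letting the VM "run wild" instead (no `<cputune>`) means the scheduler can bounce vCPUs across cores, which is one plausible source of playback stutter during heavy host I/O.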


The card is just not supporting a bus level reset, which is needed to "cycle" the card when a VM is rebooted.  We've tried a number of advanced tactics to "trick" this card but in the end, the card just won't come back after the host is rebooted.  So why did we spend so much time on this card?  Because it's not the only one that will suffer from this, so we wanted to see what we could do to resolve it.

 

Our second concern is to provide a safe way for folks to test this without putting their data at risk. The good news is I already have a solution for this, but I need to have Eric implement it. The solution is to let folks download the unVM for OpenELEC and start it without starting the array. By doing this, folks can test out the capabilities of the VM, and if it causes a host crash, you won't need to worry about an unclean shutdown (since the array won't have been running anyway). The OpenELEC VM is ~300MB, so we could literally put the entire virtual disk image in RAM or on the USB flash device for testing purposes.

 

Definitely curious to hear everyone's feedback here on the plan of attack I'm proposing above.  Thanks everyone and I promise, your patience will pay off.  This stuff is just so freaking cool!!

 

I like the idea of the slow path to implementation; however, most people want something to play with, and for that there's no reason not to let others try the UnElec VM image.

Keep in mind that many early adopters (even more so than just the ones running the betas) are messing with a variety of VMs with pass-thru, with various levels of success.

 

This part is what I've noticed to be the most important when working with GPU pass-thru: "Because it's not the only one that will suffer from this, so we wanted to see what we could do to resolve it."

 

I have done testing with both an AMD 5450 and also (now) a GeForce 210 card, both with less-than-repeatable results.

 

The 5450 will absolutely hard-freeze the entire server during a reset, or after a complete shutdown when trying to start the VM again.

 

The 210 has never, in any case I have tried, locked up the entire server (which is fantastic!). However, it has been very inconsistent with reboots, and with complete shutdowns followed by starting again. Sometimes they work; other times the VM says it's running, but it never actually does anything.

 

The one thing I want to stress, and sorry for sounding negative, is that a lot of your posts give the impression that pass-thru is working great: reboots, off/on conditions, etc. However, I certainly haven't experienced this with two (what I'd call) very popular cards for this kind of usage (mainly HTPC). It seems that others have had limited success also. It's one thing to say it's working, versus saying it's working but to expect issues when starting the VM again, or when completely shutting down and starting again.

 

All the issues I hit are fixed (for that occurrence of the problem) by completely powering down the server, switching off the power (to force the GPU to reset), and then powering everything back up.
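One software-level trick worth trying before the full power cycle is dropping the card off the PCI bus and rescanning via sysfs; whether that actually resets a stubborn GPU varies by card. A sketch of the equivalent of the usual `echo 1 > remove` / `echo 1 > rescan` dance, run here against a scratch directory standing in for `/sys` so it's safe to execute:

```python
import os
import tempfile

def pci_remove_rescan(slot: str, sysfs: str = "/sys") -> None:
    """Ask the kernel to drop a PCI device and rediscover it.  Equivalent to:
         echo 1 > /sys/bus/pci/devices/<slot>/remove
         echo 1 > /sys/bus/pci/rescan
    Whether the GPU truly resets afterwards depends on the card."""
    with open(os.path.join(sysfs, "bus/pci/devices", slot, "remove"), "w") as f:
        f.write("1")
    with open(os.path.join(sysfs, "bus/pci/rescan"), "w") as f:
        f.write("1")

# Dry run against a scratch tree instead of the real /sys
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "bus/pci/devices/0000:01:00.0"))
# create writable stand-ins for the sysfs attribute files
open(os.path.join(root, "bus/pci/devices/0000:01:00.0/remove"), "w").close()
open(os.path.join(root, "bus/pci/rescan"), "w").close()
pci_remove_rescan("0000:01:00.0", sysfs=root)
print(open(os.path.join(root, "bus/pci/rescan")).read())  # 1
```

On real hardware you'd point it at `/sys` (as root) with the card's actual slot address; no promises it helps with a card that ignores resets, but it's cheaper than walking to the power switch.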


I think the 'cautious' approach is a good one. I would suggest that the plan to restrict the OpenELEC VM to a non-running array is a good 'option', but it should not be the only way one can run it. If one tests and has no issues, why not allow it to be used on a running array? With that said, I know you're planning on building it so that it 'just works', but I suspect that means you might use some 'default settings' that might not match everyone's needs, like install location, library location, etc. The more flexible, the better in the long run, but the harder to ensure a quality experience. Just thinking out loud on that.

 

As for the 550Ti, you know I've had issues with it, and I know you and Eric have seen the same/similar results. I managed to get a Windows 8 VM built, and can confirm that this card works just fine through a reboot on Windows 8, so it's as much an OS issue as a video card issue.

 

I do like the idea of having a 'database' of 'successful' setups. Those of us who already have hardware may not benefit from it, but for those thinking of jumping in, it would be nice to know what's worked well for others.

 

As for the implementation of Dynamix, I'm excited for this. I'd like to suggest that you get it released as soon as possible, even if it's not 'finished' yet, and let us get to testing what you have. Maybe even as an 'official plugin', which we can upgrade as you get it more and more updated.

 

I had written in another thread that it kinda feels like you guys are always 'just about ready to release' a new beta, but you keep adding 'one more thing' first, then have to test, over and over. I obviously don't know if this is accurate; it just looks like it from the outside. I'd just suggest you get something out for testing, and add the updates/upgrades in small, regular releases, instead of trying to get it 'finished' first.

 

Just thinking out loud :)

 

Regardless, I'm looking forward to the next release!



Yeah, I don't have the issues you have on the numerous devices I've tested. There are some that don't work well, like the GTX 550Ti I mentioned, but we have a bunch of others that have worked great.

 

I should probably post a demonstration video this week, huh?


Very glad to hear Dynamix is the focus for the next beta. For me, that is the essential update; VMs, while very nice, are not, and could be delivered anytime.

 

I feel most of the users here would agree with that.

 

Luckily, it does sound like the priority is Dynamix, with unVMs coming later, according to this:

First and foremost, all efforts right now are being focused on the integration of Dynamix into the next release of the unRAID beta. This has been a major effort, but we're rapidly approaching completion so we can get this out to everyone. After that, we can start planning for a release schedule around the unVMs.

 

And last of all, we're considering implementing a way for folks who test this to submit their hardware profile to us as "validated" so we can start building a database of hardware that we know is compatible with these VMs.  I don't think we absolutely have to have this done before we can release the VMs, but to be fair, that would be really nice.

 

I think the implementation of hardware profile reporting is more important than a "nice to have". I think unVMs shouldn't be released without it. Why? I feel that people will try the unVM when it comes out, and if it fails to work, that data won't be recorded. Once an update is released to include hardware profile reporting, I doubt as many people will rerun the unVM. Collecting "invalid" hardware profiles will, I feel, be just as important as collecting "valid" ones. Support will also be easier: if multiple people have trouble with the same hardware, it's likely just not going to work, and it's not just a configuration problem. This will also allow people to narrow down which component isn't cooperating.

 

Thanks everyone and I promise, your patience will pay off.  This stuff is just so freaking cool!!

I'm not in a big rush, but I would like to convert my configuration to docker containers, and I don't really want to do that until things stop changing and settle down a bit. unVMs are definitely something I would love to play with, but I think core improvements need to make it out the door soon/first. Hopefully you guys don't get caught up too much adding cool things; we all have the temptation of just "one more thing". I do get the feeling that unRAID wants to have a "set" feature list once 6.0 gets pushed to final, for the purpose of showing off the capabilities of unRAID. UI improvements and things the community feels should have been core functionality long ago just don't sound as cool or exciting as running VMs.

 


 

Could be your XML for your XBMCbuntu VM, and it probably has nothing to do with btrfs, to be honest. A couple of things:

 

1)  are you storing your library data inside the VM itself or redirecting it using VirtFS to a share on the array/cache?

 

2)  are you pinning any cores specifically to the VM or just letting it run wild with vCPUs?

 

3)  how many vCPUs do you have assigned to the VM?

 

Dropped you a PM with the info (worried I'm going off-topic in your thread).


Secondly, not all GPUs work properly when assigned to a guest VM.  Case in point is the GTX 550Ti from nVIDIA.  This is the biggest tease of a video card ever.  Quite simply, the first time you boot a VM with GPU pass through using the 550Ti, it works and works just fine, but upon reboot of the VM, the card fails to continue working.  The card is just not supporting a bus level reset, which is needed to "cycle" the card when a VM is rebooted.  We've tried a number of advanced tactics to "trick" this card but in the end, the card just won't come back after the host is rebooted.  So why did we spend so much time on this card?  Because it's not the only one that will suffer from this, so we wanted to see what we could do to resolve it.

 

jonp, appreciate all the hard work and how creatively you are pushing the limits of unRAID. Can't wait for the unVMs to be available.

 

A remark related to the display adapters: shouldn't the adapters that failed testing here be expected not to work properly either: http://wiki.xen.org/wiki/Xen_VGA_Passthrough_Tested_Adapters

 

In particular, NVIDIA is very strict, and I explicitly exchanged my Quadro 600 for a Quadro 2000 because of this:

Note that Nvidia officially supports only Quadro FX 3800, 4800 and 5800 for graphics passthrough usage (they've tested and verified their binary drivers for these graphics cards in combination with graphics passthrough).

In addition, Nvidia lists the following graphics adapters as "Multi-OS" capable: Quadro 6000, 5000, 4000, 2000. "Multi-OS" allows VGA passthrough to fully virtualized guests. Note: The Nvidia Quadro 600 is not supported.



That's Xen. KVM is like that song from the movie Aladdin: "a whole new woooorrrrrlllllddd!!!"


I saw that; however, my understanding was that some of the engine under the hood is similar. Sorry if I caused any confusion.

 

No, totally different.  VFIO (virtual function IO) is not available for Xen.  You didn't create any confusion either.  If anything, some folks may have been thinking the same thing as you, read this thread, and now realize that where they may have struggled with Xen, they could have success with KVM.

 

To be completely transparent and fair though, sometimes the reverse is true.  I was able to get the Radeon R9 290 to pass through in Xen to a VM with drivers better than I could on KVM, but that was a while back.
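For anyone checking whether KVM/VFIO has actually claimed their card, the "Kernel driver in use" line from `lspci -k` tells you if `vfio-pci` has it rather than the regular GPU driver. A small parser over a captured dump (the sample output below is invented):

```python
import re

def driver_in_use(lspci_k_text, slot):
    """Return the kernel driver bound to the given PCI slot per `lspci -k`,
    or None if the slot is not listed."""
    # Grab this slot's block: from its header line up to the next header
    block = re.search(rf"^{re.escape(slot)}.*?(?=^\S|\Z)",
                      lspci_k_text, re.S | re.M)
    if not block:
        return None
    m = re.search(r"Kernel driver in use:\s*(\S+)", block.group(0))
    return m.group(1) if m else None

# Made-up `lspci -k` output for illustration
sample = """01:00.0 VGA compatible controller: Advanced Micro Devices [Radeon R9 290]
\tKernel driver in use: vfio-pci
\tKernel modules: radeon
02:00.0 Ethernet controller: Intel Corporation I210
\tKernel driver in use: igb
"""
print(driver_in_use(sample, "01:00.0"))  # vfio-pci
```

Seeing `vfio-pci` there means the host driver is out of the way and the card is available for assignment to a guest.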


I think I read this somewhere else but want to make sure...

 

We can have both XEN and KVM VMs on the same unRAID host at the same time, correct?

 

John

 

I don't think so. When unRAID boots up, you select the 'normal' version (which enables KVM) or the Xen version. I believe this makes them mutually exclusive.


Someone else will have to clarify that. I know that early on, in the first build after KVM was added, they were mutually exclusive. If that's changed, I don't know.

 

I doubt you would want both to run even if you could. Either Xen or KVM would act as a manager between the physical hardware and the virtual machines. If KVM was trying to manage VMs and Xen was stealing hardware without KVM knowing, I can only imagine bad things happening. It's like offering two people the same management position, but not telling either that there is someone else doing the same job.

  • 2 months later...

Looking to whip up an OpenELEC VM. Any gotchas, or is it pretty straightforward? Or should I wait for the unVM release?

 

If you have any success with this please let us know!  I'm pretty certain that Jon and team had to create a custom build to incorporate the needed drivers and such.

 

John

