Best RAID 0 NVMe drive solution for hosting VMs


Zoroeyes

Recommended Posts

Ok, so I've been going around and around trying to decide on the best way to make use of my new unRaid server, and every avenue I've explored so far has hit some kind of wall.

 

The setup consists of an AMD Threadripper 2950X (16-core) CPU, 32GB RAM, 24 spinning disks (LSI 9305-24i HBA to a 24-port backplane) and five 1TB NVMe drives (ADATA SX8200 Pros). Four of those NVMe drives are on an x16 PCIe adapter card, so you can effectively stripe them if you want. The fifth NVMe drive is on the motherboard.

 

When I initially specced the machine in my head, it all sounded great. AMD provides the capability (with Threadripper) to do NVMe RAID out of the box, so I was going to configure the drives on the PCIe adapter as a super-fast RAID 0 for hosting VMs and to act as a cache drive for unRaid. I suppose I should have done some more reading before getting to this point, but excitement got the better of me.

 

The reality is that the AMD RAID solution is (mostly) Windows only, with no native support in unRaid (not sure what I was expecting to find, tbh). Ok, no worries, I'll use UEFI boot, configure the AMD RaidXpert2 volumes in the BIOS, pass the raw drive controllers through to the VM and just employ AMD RAID at the VM level with the Windows drivers. Ah, well: with UEFI enabled, my unRaid GUI won't fire up (the old blinking cursor issue - reported), and there is a problem with the controllers on my NVMe drives that prevents them being passed through to VMs. Excellent. So I've hit two walls so far!

 

So, after some further research, other forum members helpfully suggested I create an NVMe (RAID 0) cache pool in unRaid, format it as BTRFS and use that for my VMs and cache. Great, it sounds like we're getting somewhere. But wait: a bit of reading later, it seems BTRFS performance can be very poor compared to most other file systems, and it certainly wouldn't do my investment in NVMe drives the justice they deserve. AMD RAID 0 over four of these drives can achieve in the region of 4GB/s write speed, whereas similar online tests have reported BTRFS maxing out at around 1.5GB/s. Obviously the proof is in the pudding, but I'd rather not invest a bunch of time setting something up just to confirm my disappointment.
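
Rather than relying on other people's numbers, I suppose a quick sequential-write test with fio on whatever pool I end up with would settle it for my hardware. A rough sketch of what I mean, assuming the pool is mounted at /mnt/cache (the path, size and job count are just placeholders for illustration):

    # Hypothetical sequential-write test against a pool mounted at /mnt/cache.
    # Adjust the path, size and job count; this writes ~16GB of test data in total.
    fio --name=seqwrite --directory=/mnt/cache \
        --rw=write --bs=1M --iodepth=32 --ioengine=libaio --direct=1 \
        --size=4G --numjobs=4 --group_reporting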

 

I even looked into dual-booting Windows/unRaid, so I could get all the performance out of the hardware when on Windows and benefit from unRaid when I needed those features, but then I lose the ability to copy from the Windows machine to unRaid, which is a massive requirement for me.

 

So there I am, pulling my hair out, not knowing which direction to take. A final approach I've looked into is to use the ZFS plugin to create a striped ZFS pool from the 4x NVMe drives to host my VM, in the hope that ZFS performance will be better than BTRFS. I'd then use the 5th NVMe drive as a dedicated cache (XFS formatted) so I can fast-copy from the Windows VM to the underlying unRaid array.
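
Roughly what I have in mind for the striped pool, in case it helps clarify (the device paths, pool name and mount point below are just placeholders, not my actual layout):

    # Rough idea of the striped ZFS pool I'm considering - four drives as plain
    # top-level vdevs (i.e. the RAID 0 equivalent). Device paths are placeholders,
    # and ashift=12 assumes 4K-sector NVMe drives.
    zpool create -o ashift=12 -m /mnt/nvme vmpool \
        /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1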

 

But am I looking at this right? Is ZFS a good choice for the striped RAID to host the VM, and is it faster than BTRFS? If configured as a striped ZFS pool, will the passthrough issue with my specific NVMe drives be mitigated, since the VM would effectively be using an existing volume? Also, if I'm not going to use AMD RAID (because I'm using ZFS instead), can I go back to normal (non-UEFI) BIOS and still host Windows VMs? Is the UEFI requirement of Windows 10 dealt with by unRaid's virtual BIOS settings in the VM?

 

I've talked about a lot of stuff up there, and I'm most likely mis-informed on most of it, so please don't flame me if I've offended anyone who prefers a certain technology over another. I'm just trying to make sense of what my options are and understand the best way to extract maximum performance out of a considerable investment in (what should be) really fast hardware.

 

Thanks in advance for any useful info you can provide. 

 

 

Link to comment
2 hours ago, Zoroeyes said:

can I go back to normal (non-UEFI) BIOS and still host Windows VMs? Is the UEFI requirement of Windows 10 dealt with by unRaid's virtual BIOS settings in the VM?

You can boot Unraid in UEFI or non-UEFI mode. It doesn't matter; you'll still be able to set up your VMs as UEFI or legacy. A couple of people have had issues with Unraid booting in UEFI mode when it comes to PCI device passthrough to VMs. I've been running Unraid in non-UEFI mode since day one on a 1950X with 2 VMs, each with its own GPU. Works fine.
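
If you're ever unsure which mode the host actually booted in, a quick generic Linux check (nothing Unraid-specific) is:

    # If this directory exists the host booted via UEFI, otherwise it's legacy/CSM.
    # The VM's firmware (OVMF vs SeaBIOS) is chosen separately in the VM template.
    [ -d /sys/firmware/efi ] && echo "host booted UEFI" || echo "host booted legacy BIOS"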

 

As for the RAID questions you have and the AMD RAID idea: I personally wouldn't use that solution. I looked into it when the platform came out, read a couple of articles and watched a couple of videos, and from what I remember it was a mess. First of all, there was no way to configure the RAID in the BIOS. You had to install the Windows-only software, which installed a webserver running all the time, and as some reports showed, that webserver was vulnerable and easy to "hack". Not sure if this was ever fixed, but it was a huge tradeoff for a lot of people. The benchmarks (from Wendell at Level1Techs, I think) showed that you get almost 4 times the read and write speeds from 4 drives, BUT you won't see any big improvement in daily use compared to a single NVMe. Sure, if you're always transferring tons of data around - video editing, for example - it's an improvement, but the general user doesn't really need it.

 

I have a single NVMe with all my dockers on it and a couple of VMs. Most of the VMs are off, with only 1-2 running all the time. A third VM, my daily driver, is a Windows 10 VM with a dedicated NVMe passed through. If I want, I can restart the server and boot directly from that NVMe. In case something is wrong with unRaid, I can still use the PC with all the RAM and double the core count. In general, 8 cores / 16 threads, 16GB RAM and a 1080 Ti in a VM is more than enough for anything I do: office work, CAD software, gaming.

 

Your use case might be different, but if you're the only user on the system - especially on the VMs - you don't really benefit from RAID 0 for your VMs. Software won't start faster, and even the difference in games compared to a single SSD isn't really noticeable. Keep in mind that if you're working inside one VM, the other VMs will likely be idling anyway and won't use many resources. If you need the space of 4 NVMes, go with BTRFS RAID 0, and if you want some more safety I would mirror the drives. Keep in mind that in any scenario you'll still have to do backups to other storage - the array, an external drive, or over the network - and there you have the next bottleneck, so you can't really benefit from the speed.
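
If you do go the BTRFS route, Unraid builds the multi-device cache pool for you when you assign the drives; the profile can then be switched with a balance afterwards. A rough sketch of what I mean, assuming the pool is mounted at the usual /mnt/cache (adjust if yours differs):

    # Convert the data profile of an existing multi-device BTRFS pool to RAID0,
    # keeping metadata mirrored (raid1) as a middle ground for a bit of safety.
    btrfs balance start -dconvert=raid0 -mconvert=raid1 /mnt/cache

    # Check the resulting profiles afterwards
    btrfs filesystem df /mnt/cache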

 

In your case I would go with Unraid's default options and let Unraid manage the 4 drives, without an extra layer of possible failure such as a RAID you can't really manage, or an extra plugin like ZFS support.

 

Ask yourself: do you really need the speed of 4 NVMe drives all the time? Are you constantly reading and writing large amounts of data? Using a single NVMe as cache for the shares, for example, when I transfer some files onto a share I get the full write speed of a single NVMe. The general daily use case is a couple of small documents, maybe pictures or video from a phone. Is the source even fast enough to match the speed of the cache? Probably not.

 

It's another story if you have 10 people on the network using the server, all connected via 10-gigabit networking. For accessing photos or documents you don't need the speed, but if they're all transferring large amounts of data to the server, or reading it back, OK, I'll go with it. 😁

Link to comment
2 hours ago, Zoroeyes said:

So there I am, pulling my hair out, not knowing which direction to take. A final approach I've looked into is to use the ZFS plugin to create a striped ZFS pool from the 4x NVMe drives to host my VM, in the hope that ZFS performance will be better than BTRFS. I'd then use the 5th NVMe drive as a dedicated cache (XFS formatted) so I can fast-copy from the Windows VM to the underlying unRaid array.

 

But am I looking at this right? Is ZFS a good choice for the striped RAID to host the VM, and is it faster than BTRFS? If configured as a striped ZFS pool, will the passthrough issue with my specific NVMe drives be mitigated, since the VM would effectively be using an existing volume? Also, if I'm not going to use AMD RAID (because I'm using ZFS instead), can I go back to normal (non-UEFI) BIOS and still host Windows VMs? Is the UEFI requirement of Windows 10 dealt with by unRaid's virtual BIOS settings in the VM?

 

I've talked about a lot of stuff up there, and I'm most likely mis-informed on most of it, so please don't flame me if I've offended anyone who prefers a certain technology over another. I'm just trying to make sense of what my options are and understand the best way to extract maximum performance out of a considerable investment in (what should be) really fast hardware.

 

Thanks in advance for any useful info you can provide. 

 

 

 

Firstly, you have a very important misunderstanding about NVMe performance.

"Hosting" a VM on a RAID-0 pool (by that I assume you mean having vdisk on the RAID-0 pool), by itself, will provide you with virtually no noticeable benefit (other than perhaps shaving a few seconds off boot time - and only "perhaps").

The benefit of NVMe (RAID-0 or not) is heavily workload dependent and so you need to first define what your workload is. For example:

  • Booting a VM only benefits from it once (at boot time)
  • Photo editing is a workload that would benefit from non-QLC NVMe drives (QLC is currently terrible with random I/O)
  • Video editing would benefit from NVMe, even QLC (it's more sequential)
  • Gaming barely benefits from NVMe at all (compared to a good SATA SSD).

So I would say your effort to get NVMe RAID-0 to work (without first defining your workload) is rather misguided. Yes, you'll get some bragging rights with benchmarks, but in real life it's more pain than it's worth.

 

 

 

Secondly, you mentioned "VMs", i.e. plural. Do ALL of your VMs really need NVMe RAID-0 performance?

Outside of Linus' 7-gamers-1-PC kind of projects, very few users would ever need multiple high-performance VMs (that simultaneously need the benefit of NVMe, let alone NVMe RAID-0). Specifically for Threadripper, there are only 3 general kinds of VM:

  • Workstation VM (more cores for best performance)
  • Gaming VM (only cores from the same die connected to the GPU slot for lowest latency).
  • Miscellaneous VM (mostly accessed remotely for miscellaneous uses such as a proxy, router, etc.)

Of those 3 types, only the 1st (assuming you have defined the workload as per my first major point) would actually benefit from NVMe. And most users don't need more than one workstation VM, so why not just pass through the 4 NVMe drives on the x16 adapter directly to the VM and use Windows-based striping?

Based on a recent test by Puget Systems (https://www.pugetsystems.com/labs/articles/NVMe-RAID-0-Performance-in-Windows-10-Pro-1369/), Windows 10 striping performance is pretty good; the only catch is that you can't boot from it (which is easy to work around, since your VM can boot from a vdisk image on the 5th NVMe if that's what you want).
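
If you do try the passthrough route, it's worth first checking that each NVMe controller sits in its own IOMMU group, and noting the vendor:device IDs you'd bind to vfio-pci. A rough sketch of the kind of check I mean (generic Linux, nothing Unraid-specific; your PCI addresses will obviously differ):

    # List the IOMMU group of every NVMe controller along with its vendor:device ID.
    # Ideally, each controller you want to pass through is alone in its group.
    for dev in /sys/bus/pci/devices/*; do
        if [ "$(cat "$dev/class")" = "0x010802" ]; then    # 0x010802 = NVMe controller
            addr=$(basename "$dev")
            group=$(basename "$(readlink "$dev/iommu_group")")
            ids=$(lspci -ns "$addr" | awk '{print $3}')
            echo "NVMe $addr  IOMMU group $group  IDs $ids"
        fi
    done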

 

 

Thirdly, and I can't emphasize this enough, RAID-0 is a terrible idea!

The pursuit of pure (theoretical) speed at the risk of complete data loss is not advisable for 99.99% of users.

The 0.01% are mostly tech Youtubers who don't use it as a daily driver.

 

Link to comment

Thanks for the great feedback guys, really appreciate you taking the time to comment.

 

It's good to hear that I don't need UEFI boot on the host to run UEFI in the VM, so that should hopefully fix one problem. I may also give BTRFS a try (although ZFS is still tempting). Either way, the idea would be to regularly snapshot the VM to array storage.

 

I'd love to pass the NVMes through to the VM for 'bare metal' performance, but I understand the type of drive I have has issues with hardware passthrough on unRaid, which is a shame.

 

As for the workload, I'm a developer who also does some 3D modelling/rendering and CAM (computer-aided manufacturing) in my spare time. I've spent so many years throwing huge amounts of money at super-fast CPUs, RAM, GPUs etc., only to see them snoozing for most of the day while they sit around waiting for the file or the data they asked for from slow storage. So when I built this rig I decided that storage was going to get as much investment as the rest, hence the NVMe overkill! In truth, no, I won't get any payback for having such fast storage, but then, as an enthusiast, I honestly don't care about return on investment. I mean, is there ever a use case for overclocking a CPU to 6GHz on liquid N2? Nope, but we do it because it's fun! All I want to know is, given the potential performance of my kit, what's the best approach to extract the most from it.

Link to comment
5 hours ago, Zoroeyes said:

the idea would be to regularly snapshot the VM to array storage.

You can use the BTRFS snapshot feature if the source and the target both use BTRFS. I have an extra 1.5TB drive that I snapshot all my VMs to on a daily basis. No issues in over a year, and recovery has never shown an issue either. A small script run via the User Scripts plugin and scheduled via cron does all the work. Every morning I get a Telegram message from my server confirming the job was successful. 😁
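
The script itself isn't anything special; the general shape is just a read-only snapshot followed by btrfs send/receive to the backup drive. A rough outline with placeholder paths (not my actual script; both source and target must be BTRFS, and the source has to be a subvolume):

    #!/bin/bash
    # Daily VM backup: read-only snapshot, then send it to the backup drive.
    SRC=/mnt/cache/domains           # subvolume holding the vdisks (placeholder)
    DST=/mnt/disks/backup/vm-snaps   # backup drive mount point (placeholder)
    STAMP=$(date +%Y-%m-%d)

    # Create a read-only snapshot of the VM subvolume...
    btrfs subvolume snapshot -r "$SRC" "$SRC-$STAMP"

    # ...and send it over (full send; incremental would use -p <parent snapshot>)
    btrfs send "$SRC-$STAMP" | btrfs receive "$DST"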

 

From my experience with CAD software there isn't that much of a difference between NVMe and a SATA SSD. Some projects are 13-15GB in size, and no matter which underlying storage I use, I have to wait for them to load into the software. For such large projects RAM is more important in most scenarios, unless the software is caching something to disk. If you have thousands or even hundreds of thousands of objects to load, you have to wait. 10-15 minute load times are normal for some of my projects, and they're almost the same on SATA SSD storage as on a single NVMe. HDD is a different story. But as with everything, it varies with every piece of software.

Link to comment
