Extremely slow VM disk performance on SSD (cache)


Solved by Vr2Io.

(Cross-posted to Reddit.)

Hey everyone, 

The full story is over on Reddit, but I'll recap here: 

I have a Dell R520 running unRAID 6.11.5 with:

  • 4 x 4 TB array SAS disks
  • 2 x 4 TB parity SAS disks
  • 2 x 1 TB SSD (SATA) cache disks
  • 2 x Xeon E5-2450 v2 CPUs (16 cores / 32 threads across two physical sockets)
  • 160 GB DDR3 memory

In my mind, this setup should be plenty fast for running multiple VMs, Docker containers, and so on. This is all part of a home lab, so I run a Linux VM along with a few Windows Server/Windows 10 VMs.

Recently, I upgraded my cache drives from 200 GB Intel SSDs to 1 TB Micron M600 SSDs. I was previously running several VMs on my array and, while performance wasn't great, it was moderately usable. I was eager to move all my VMs onto my cache drives, alongside my Docker containers, for increased speed and room to add even more VMs.

Since I moved the VMs to my new cache drives, read/write speeds are unusably slow. I'm talking 2-3 MB/s (megabytes, not megabits). On the Windows VMs, Task Manager reports 100% disk active time almost constantly, with response times often in the hundreds to thousands of milliseconds. That's really, really bad. If I run the Linux VM and try to boot three Windows VMs alongside it, Linux slows to a crawl and almost completely stops responding.
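
For anyone who wants to put a number on speeds like these from inside a guest, here is a minimal Python sketch (not from the original thread) that times a sequential write and forces it to disk with `os.fsync()`. The function name, the 64 MiB test size and the 1 MiB block size are illustrative assumptions. Results are in MiB/s, and depending on the VM's virtual disk cache settings the host may still buffer some of the data, so treat the figure as a rough indicator rather than a proper benchmark.

```python
import os
import tempfile
import time


def measure_write_mib_per_s(path: str, total_mib: int = 64, block_kib: int = 1024) -> float:
    """Write total_mib MiB to `path` in block_kib KiB chunks and return MiB/s.

    os.fsync() is called before the timer stops so the data has to leave the
    guest's page cache; without it the figure would mostly measure RAM.
    """
    block = os.urandom(block_kib * 1024)
    n_blocks = (total_mib * 1024) // block_kib
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(n_blocks):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())
    elapsed = time.perf_counter() - start
    return n_blocks * len(block) / (1024 * 1024) / elapsed


if __name__ == "__main__":
    # Hypothetical scratch location (the system temp dir); point it at the disk under test.
    fd, scratch = tempfile.mkstemp()
    os.close(fd)
    try:
        print(f"sequential write: {measure_write_mib_per_s(scratch):.1f} MiB/s")
    finally:
        os.remove(scratch)
```

Running it once on a cache-backed vdisk and once on an array-backed one gives a like-for-like comparison between the two storage locations.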

After watching a Red Hat-based VM take 9 mins 44 secs to boot from a powered-off state, I opened the logs window just for the heck of it... and saw this:

[Screenshot: the log window]

I had already run a BTRFS filesystem check when the array was started in maintenance mode, but didn't see any issues. I also don't see any errors listed next to the cache pool drives:

[Screenshot: the cache pool drives, with no errors listed]
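
Besides the filesystem check and the error column in the GUI, btrfs keeps per-device error counters that can be read with `btrfs device stats <mountpoint>`, which prints one `[device].counter value` line per counter. Below is a minimal Python sketch (an illustration, not part of the thread) that picks out the non-zero counters. It assumes the pool is mounted at `/mnt/cache`, which is where Unraid normally mounts the cache pool, and that the partitions are named like `/dev/sdc1`; the sample output is made-up data, not taken from this system.

```python
import re
import subprocess
from typing import Dict

# One line of `btrfs device stats` output, e.g. "[/dev/sdc1].write_io_errs    0"
STAT_LINE = re.compile(r"^\[(?P<dev>[^\]]+)\]\.(?P<counter>\w+)\s+(?P<value>\d+)$")


def read_btrfs_stats(mountpoint: str = "/mnt/cache") -> str:
    """Run `btrfs device stats` on the pool (needs root and btrfs-progs)."""
    result = subprocess.run(
        ["btrfs", "device", "stats", mountpoint],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def nonzero_btrfs_errors(stats_output: str) -> Dict[str, Dict[str, int]]:
    """Return {device: {counter: value}} for every counter that is not zero."""
    errors: Dict[str, Dict[str, int]] = {}
    for line in stats_output.splitlines():
        m = STAT_LINE.match(line.strip())
        if m is None:
            continue  # skip blank or unrecognised lines
        value = int(m.group("value"))
        if value:
            errors.setdefault(m.group("dev"), {})[m.group("counter")] = value
    return errors


# Hypothetical sample output (made-up counts) for a two-device pool.
SAMPLE = """\
[/dev/sdc1].write_io_errs    12
[/dev/sdc1].read_io_errs     3
[/dev/sdc1].flush_io_errs    0
[/dev/sdc1].corruption_errs  0
[/dev/sdc1].generation_errs  0
[/dev/sdh1].write_io_errs    0
[/dev/sdh1].read_io_errs     0
[/dev/sdh1].flush_io_errs    0
[/dev/sdh1].corruption_errs  0
[/dev/sdh1].generation_errs  0
"""

print(nonzero_btrfs_errors(SAMPLE))
# → {'/dev/sdc1': {'write_io_errs': 12, 'read_io_errs': 3}}
```

On a live system, `nonzero_btrfs_errors(read_btrfs_stats())` does the same for the real pool. The counters persist until reset with `btrfs device stats -z <mountpoint>`, so a non-zero value can reflect older events; comparing readings over time gives a clearer picture.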

I'm in the process of moving my domains, appdata, and system shares to the array to see if performance is any better there. If it is, then I suppose I'll be replacing these SSD cache drives.

Would you (like me) suspect a hardware issue (bad SSD) at this point?

ih-nas01-diagnostics-20230724-1048.zip

Edited by crescentwire
Added diagnostics file

Thank you, that seems to match with what I'm seeing.

I stopped the array, unassigned sdc (the cache pool drive showing errors), and kept sdh assigned. After starting the array, speeds are now in the hundreds of MB/s, which is exactly what I would expect to see.

So, sdc is definitely a bad drive. Thank you for the help and confirmation!