Incredibly slow I/O on cache, seems to only affect VMs and Dockers



I am having an issue on my brother's server that I just can't figure out. I am trying to determine whether it is hardware related. At this point it seems it is not, but I am running out of ideas.

 

Diagnostic file attached.

 

What is most noticeable is how sluggish his Windows VM feels. Containers are very slow to stop, update, and start.

I simulated some disk activity by copying files to and from the cache in this manner:

/mnt/cache/tmp - dd if=/dev/zero of=loadfile bs=1M count=4096

 

4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 0.994332 s, 4.3 GB/s

 

/mnt/cache/tmp - for i in {1..10}; do cp loadfile loadfile1; done

 

This yields impressive speeds and very little I/O wait according to top: approximately 8-12% while running, which is very similar to my own system, even though mine is running significantly slower SSDs.
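
One caveat on the dd figure above: without a sync flag, dd can end up reporting how fast data lands in RAM (the page cache) rather than on the SSDs, which may be why 4.3 GB/s looks so high. A minimal sketch of a variant that flushes to disk before reporting, assuming GNU dd. TEST_DIR and SIZE_MB are placeholder variables for illustration, not what was actually run:

```shell
#!/bin/sh
# Sequential write test that waits for the data to reach the device before
# dd reports its rate. TEST_DIR and SIZE_MB are hypothetical knobs; on the
# server described here they would be something like /mnt/cache/tmp and 4096.
TEST_DIR="${TEST_DIR:-$(mktemp -d)}"
SIZE_MB="${SIZE_MB:-16}"
TEST_FILE="$TEST_DIR/loadfile_sync"

# conv=fdatasync makes GNU dd call fdatasync() once before it exits, so the
# elapsed time it prints includes flushing the page cache to disk.
dd if=/dev/zero of="$TEST_FILE" bs=1M count="$SIZE_MB" conv=fdatasync

# Confirm the full amount was written, then clean up.
WRITTEN_BYTES=$(wc -c < "$TEST_FILE" | tr -d ' ')
echo "wrote $WRITTEN_BYTES bytes to $TEST_FILE"
rm -f "$TEST_FILE"
```

If the rate reported this way is far below the original figure, the first test was mostly measuring memory.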

 

iotop on my two Unraid servers rarely yields an IO percentage above single digits for an individual process. Even when it does, it is only present for one refresh before settling back down. Under normal operating circumstances, mine typically shows sub-1% figures on all processes.

On my brother's server, as soon as I start containers that do a lot of disk access, such as Duplicati, or even leave all of those shut down and start a single Windows VM, iotop goes crazy with huge percentages on each line, typically between 25% and 99% for all of the top-talking processes.

See this screenshot as an example. This is after the system had been booted for some time and basically everything was idle:

[Screenshot: iotop output]

 

As you can see, the actual amount of data being transferred is quite low from a throughput/MBps perspective, but the IO percentage is high. This never changes; it looks similar to this all the time.
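
The IO column in iotop is the share of time a process spent waiting on I/O, so high percentages with low throughput would be consistent with individual requests being slow. A rough way to check that per device is to divide the time counters in /proc/diskstats by the request counters. This is only a sketch: the function name and the optional file argument are made up for illustration, and the numbers are averages since boot, not a live view.

```shell
#!/bin/sh
# Average per-request latency since boot for each block device, computed from
# /proc/diskstats. Column meanings follow the kernel's iostats documentation:
#   $3 device name, $4 reads completed, $7 ms spent reading,
#   $8 writes completed, $11 ms spent writing.
# disk_latency is a hypothetical helper; its optional argument lets it parse
# a saved copy of diskstats instead of the live file.
disk_latency() {
    awk '{
        r = ($4 > 0) ? $7 / $4 : 0      # mean ms per completed read
        w = ($8 > 0) ? $11 / $8 : 0     # mean ms per completed write
        printf "%s read_ms=%.2f write_ms=%.2f\n", $3, r, w
    }' "${1:-/proc/diskstats}"
}

# Print the live numbers when the file is available (Linux only).
if [ -r /proc/diskstats ]; then
    disk_latency
fi
```

Comparing the output for the cache devices on both servers should show whether the slow one really has much higher per-request latency.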

 

Inside Windows, it shows 100% disk activity time, with extremely low throughput:

[Screenshot: Windows disk activity]

 

This is what it looks like regardless of which Windows VM we try. The long-term desktop VM he has used for years looks just the same as the screenshot above, which was taken from a fresh install of Windows 10 with nothing installed and no Windows updates running: just an idle Windows 10 VM.

 

I have tried various disk cache settings for the Windows VMs, mostly testing with 'none' and the default 'writeback'. That doesn't seem to make any noticeable difference.

 

There doesn't seem to be any obvious resource contention going on. Plenty of CPU and memory is available and, as I alluded to above, the disks are capable of FAR more than they are doing, generally speaking.

 

It doesn't feel like a hardware problem because of the sequential speeds I can get, but perhaps these drives are just no good for random I/O... though they claim to be.
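
A sequential copy says little about how the drives handle many small, synchronous writes, which is closer to what a VM disk image generates. A quick probe of that, assuming GNU dd, is sketched here. It writes small blocks sequentially with a flush after each one, so it measures sync-write latency rather than true random I/O. TEST_DIR and COUNT are placeholder variables:

```shell
#!/bin/sh
# Rough sync-write latency probe: many small writes, each forced to stable
# storage before the next one starts. TEST_DIR and COUNT are hypothetical
# knobs; on the server described here TEST_DIR would be something like
# /mnt/cache/tmp.
TEST_DIR="${TEST_DIR:-$(mktemp -d)}"
COUNT="${COUNT:-200}"
SYNC_FILE="$TEST_DIR/syncfile"

# oflag=dsync opens the output with O_DSYNC, so every 4 KiB write waits for
# the device to acknowledge it. The rate dd prints is dominated by latency.
dd if=/dev/zero of="$SYNC_FILE" bs=4k count="$COUNT" oflag=dsync

# Confirm the expected amount was written, then clean up.
SYNC_BYTES=$(wc -c < "$SYNC_FILE" | tr -d ' ')
echo "wrote $SYNC_BYTES bytes in $COUNT synchronous 4 KiB writes"
rm -f "$SYNC_FILE"
```

Running the same command on both servers gives a like-for-like comparison. A dedicated benchmark tool would be needed for a proper random I/O test.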

By the way, the drives in use here are a bit out of the ordinary: they are Eluktronics PRO-X-1TB-G2. They are formatted with BTRFS and are in a mirror.

 

Anyone have any thoughts on things we could try before throwing different hardware at it? I am running out of ideas.

unraid-diagnostics-20210317-2009.zip

Edited by harshl