
Tracking down IOwait cause


I need some help... I have been having high IOwaits for a few months now. It's driving me up the wall...

 

Specs:

Unraid system: Unraid server Pro, version 6.8.3

Model: Custom

Motherboard: Supermicro - X8DT6

Processor: Intel® Xeon® CPU E5620 @ 2.40GHz

HVM: Enabled

IOMMU: Disabled

Cache: L1-Cache = 256 kB (max. capacity 256 kB)

L2-Cache = 1024 kB (max. capacity 1024 kB)

L3-Cache = 12288 kB (max. capacity 12288 kB)

Memory: 48 GB (max. installable capacity 384 GB)

Network: bond0: fault-tolerance (active-backup), mtu 1500

eth0: 1000Mb/s, full duplex, mtu 1500

Kernel: Linux 4.19.107-Unraid x86_64

OpenSSL: 1.1.1d

P + Q algorithm: 5892 MB/s + 8187 MB/s

 

Steps to fix issues:

I have been having a lot of IOwait issues. I have tried everything that turned out to be the fix for others:

  • Added a second SSD to cache
  • Moved Deluge to its own SSD via unmounted drives
  • Switched from Plex to Emby
  • Added an Nvidia 1660 Ti for Emby
  • Disabled and removed Dynamix Cache Directories, waited a week and re-added it
  • Ran the DiskSpeed docker on all drives
  • Changed the Parity drive to a Seagate IronWolf Pro (the second one is still in shipping, so I put in an HP Enterprise drive as temporary Parity)
  • Replaced all drives with errors (even ones with only 1 error; 3 drives in total)
  • Disabled the Spectre and Meltdown protections
  • Replaced both CPU coolers with dual Noctua fans
  • Added memory fans to the system, to solve memory heat issues
  • Changed to a new case (Fractal Design Define 7 XL) from a 24-bay Supermicro
    • Added 8 case fans
    • Using 18 x 3.5" drives and 5 x 2.5" drives
  • Removed all 1 TB drives
  • Watched NetData, htop and Glances like a hawk for any signs as to what is causing the issue

 

I am still getting the IOwait error messages, and the system is very slow.

Screen Shot 2020-05-02 at 8.27.14 PM.png

tower2-diagnostics-20200502-2038.zip

  • Community Expert

The Docker image is on the array. There are similar reports when this is the case, so try moving it to cache.

  • Author

I will look into that now.

Edited by Exilepc

  • Author

Moved docker vm to a cache-only share, still getting high IOwait...

  • Community Expert

And no difference accessing/using the dockers during those i/o waits?

  • Author

Most of the dockers have not been slow, only Plex, SMB and the webGUI. Parity checks and the like have been running at 90-160 MB/s.

  • Author

561952334_ScreenShot2020-05-03at4_45_48PM.png.4e8a3f5724dfd1858f90b88f0fbf9955.png

1461636339_ScreenShot2020-05-03at4_45_49PM.thumb.png.178b4324bf2e3ef0acad9d36091e2845.png

564829570_ScreenShot2020-05-03at4_45_53PM.thumb.png.8dc84ca036c4085e3d24bd03ee5859b9.png

  • Author

Screen Shot 2020-05-04 at 3.16.46 PM.png

Screen Shot 2020-05-04 at 3.16.41 PM.png

  • 10 months later...

@Exilepc - Did you ever find a fix for this? I am getting the same issues, and copying large amounts of data to the array kills the WebUIs for the server itself and for all the docker containers.

  • 7 months later...
  • Author

Nope. The issue had not been so pronounced until tonight... So I googled it tonight and this is one of the top posts... It's kinda sad...

  • 4 weeks later...

Having similar issues here.  Sometimes my webgui just doesn't respond (meaning unraid's; Plex's and etc. are fine), and IO performance takes a negative hit.  I just installed Glances (thanks!) and will monitor iowait now. 

  • 11 months later...

Has anyone ever figured this out? My system appears to be getting worse. I download and process on my cache drives (4x 240 GB SSDs). Then the data is moved to my media share (which also uses the cache).

 

During these operations my system comes to a standstill. Anything streaming off the array chokes, my Dockers become unresponsive, VMs time out and I even get ssh shells dropped.

 

It's a dual Xeon 5660 w/ 96 GB of RAM.

 

htop doesn't show anything out of the ordinary but the unRAID Dashboard shows cpu use through the roof. I finally stumbled upon this thread and it seems I have the same iowait issues.

 

I've got my dockers all pinned to limit what they can use. I have my docker image on the cache as well as my appdata share. 

 

Very frustrating!

  • 1 month later...

Same problem here. My system is strong enough, however I get IOWAIT up in the 50-60% ranges.

  • 2 weeks later...

I have been dealing with this as well for a couple of weeks. I can't seem to figure out what the exact cause is. When I check the logs, it seems one or the other of my SSD cache drives resets ("failed scmd"), and I also get I/O errors along the lines of "sector xxxx op 0x0:(READ) flags 0x80700 phys_seg 1 (or 2, 3 or 4) prio class 0".

I'm having this issue when trying to export data from the array to another server for backup.  Speeds start around 30MB/s and drop to around 5-7MB/s.  IOWAIT sitting around 33.3 according to glances.  No other activity on the server at the same time.  

 

What's interesting is that the array shows ~100MB/s of reads, but there is only a trickle going over the wire.
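One way to see which devices those reads are actually hitting is to sample /proc/diskstats twice and compare the counters. The following is only a rough Python sketch, not something from this thread: the function names and the one-second interval are my own illustrative choices, and it assumes a kernel that reports at least the classic 11 statistics fields per device.

```python
import os
import time

DISKSTATS_PATH = "/proc/diskstats"
SECTOR_BYTES = 512  # /proc/diskstats counts 512-byte sectors regardless of the drive


def parse_diskstats(text):
    """Map device name -> (sectors read, sectors written, ms spent doing I/O)."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 14:
            continue  # skip malformed or unexpectedly short lines
        stats[parts[2]] = (int(parts[5]), int(parts[9]), int(parts[12]))
    return stats


def device_rates(before, after, interval_s):
    """Return name -> (read MiB/s, write MiB/s, busy %) between two samples."""
    mib = 1024.0 * 1024.0
    rates = {}
    for name, (rd, wr, busy_ms) in after.items():
        if name not in before:
            continue
        rd0, wr0, busy0 = before[name]
        rates[name] = (
            (rd - rd0) * SECTOR_BYTES / mib / interval_s,
            (wr - wr0) * SECTOR_BYTES / mib / interval_s,
            # busy ms divided by interval ms, times 100
            (busy_ms - busy0) / (interval_s * 10.0),
        )
    return rates


def read_sample(path=DISKSTATS_PATH):
    with open(path) as handle:
        return parse_diskstats(handle.read())


if __name__ == "__main__" and os.path.exists(DISKSTATS_PATH):
    first = read_sample()
    time.sleep(1.0)
    second = read_sample()
    for name, (rd, wr, busy) in sorted(device_rates(first, second, 1.0).items()):
        if busy > 0:
            print(f"{name}: read {rd:.1f} MiB/s, write {wr:.1f} MiB/s, busy {busy:.0f}%")
```

A device that sits near 100% busy while moving little data is a likely candidate for whatever the rest of the system is waiting on.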

 

image.thumb.png.296f4b8bfd61d7c89845a7935de0f76d.png

 

image.thumb.png.26f3970252036df8196214e3b8156d6e.png

 

It's like the system is spinning its wheels trying to get the data ready to send, but can only send really slowly. For reference, I am "pulling" data from Unraid to macOS.  I am running the rsync commands on macOS, connected to Unraid via the network.  I have been trying with rsync over SSH and just via SMB but no real difference.

 

image.thumb.png.4b06fe326adb280930af20b5bb3b5229.png

 

  • Author

Sadly I have not been able to track down the issue… I wish I had an answer

  • 2 weeks later...

It doesn't seem like there is a solution to the problem at all. I've been dealing with this for months and it only gets worse.

I have a large Unraid server with 17 array drives, 2 parity drives and a RAID1 SSD cache pool. On that I run 3 VMs and up to 12 dockers, some of which are IO intensive. I often see high iowait %; however, in my case I know that it's because I am simply demanding too much from my disks, which would be fine if it didn't cripple every other part of the system.

 

Years back I found a way around iowait consuming the whole CPU. Linux allows you to isolate CPU cores from the system so you can dedicate them to other tasks (VM/Docker). This way when the system is crippled by iowait, your VM's and Docker containers can continue to function happily on the isolated CPU cores, although IO may still suffer if accessing the array/pool causing the iowait.

 

As I understand it, my situation is different than yours, but hopefully this trick will still help you work around some of the headaches.

 

In order to isolate the cores, you have to go into your flash drive and edit /syslinux/syslinux.cfg

 

Here is my default boot mode, which I have edited to include "isolcpus=4-9,14-19" on the append line. This option forces the system to run on cores 0-3 and 10-13, leaving the isolated cores idle.

 

label Unraid OS
  menu default
  kernel /bzimage
  append isolcpus=4-9,14-19 initrd=/bzroot

 

I have an old hyperthreaded 10 core Xeon so I have 20 virtual cores 0-19. I chose to keep 4 cores for my system as plugins still run on the system, and I have isolated 6 cores for VM's and Docker containers. For this to work properly you must pin each VM and Docker to the isolated cores of your choosing.
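To double-check which cores end up isolated and which are left for the system, here is a small Python sketch. It is an illustration only: the helper names are made up, and it assumes the kernel exposes the isolated set in /sys/devices/system/cpu/isolated, which it skips if the file is missing.

```python
import os

ISOLATED_PATH = "/sys/devices/system/cpu/isolated"  # assumed present on this kernel


def parse_cpu_list(spec):
    """Turn a kernel-style CPU list such as '4-9,14-19' into a set of ints."""
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-", 1)
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return cpus


def format_cpu_list(cpus):
    """Collapse a set of CPU numbers back into ranges, e.g. '0-3,10-13'."""
    ranges = []
    for cpu in sorted(cpus):
        if ranges and cpu == ranges[-1][1] + 1:
            ranges[-1][1] = cpu  # extend the current run
        else:
            ranges.append([cpu, cpu])  # start a new run
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)


def housekeeping_cpus(total_cpus, isolated_spec):
    """CPUs left for the host once the isolated ones are removed."""
    return set(range(total_cpus)) - parse_cpu_list(isolated_spec)


if __name__ == "__main__" and os.path.exists(ISOLATED_PATH):
    with open(ISOLATED_PATH) as handle:
        isolated = handle.read().strip()
    total = os.cpu_count() or 0
    print("isolated:", isolated or "(none)")
    print("left for the system:", format_cpu_list(housekeeping_cpus(total, isolated)))
```

With the values from this post, housekeeping_cpus(20, "4-9,14-19") formats back to "0-3,10-13", matching the cores the system keeps.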

 

Now when you are plagued by iowait, your Dockers and VM's will still have processing power.

 

I hope this helps.

 

 

Edit: After looking into this a bit further, I found that this has been implemented in the GUI. Now you simply go to Settings->CPU Pinning.

Edited by lonnie776

Pinning CPUs makes sense in your case where the performance of other things isn’t acceptable when the issue occurs. My experience is different though…I get the high IOWait but the rest of the system doesn’t hang. 

  • 1 month later...

Bumping this thread. My server's been rendered unusable even without a parity drive and with docker disabled. Tried all the steps in the OP, and they helped, but not enough.

nas-diagnostics-20230301-1643.zip

  • 4 months later...

More or less have followed the same thing as OP with similar "results" but no permanent fix.  Issue became much more apparent after upgrading to 6.11.5 from 6.9.

 

Consistently sitting between 5-10% IOWAIT.  Anytime qbittorrent or any other container really does any large scale file operations it shoots the IOWAIT upwards to 30-50%, sometimes sitting there for hours at a time.  This causes all network traffic to grind to a halt.

 

Some of what I have tried:

 

  • Swapped Cache drives. 
  • Tried adding more cache drives
  • Tried Splitting the workloads between cache drives.
  • Switched all cache drives from BTRFS to XFS (greatly improved the baseline IOWait, but issues continue to persist)
  • Switched Docker.img from BTRFS to XFS (again, improved IOWait issues but they continue to persist)
  • Rebuilt Docker.img from 150GB -> 50GB after fixing naughty containers (no performance changes)
  • Ensured docker containers were not writing to Docker.img after build (no performance changes)
  • Switched Docker.img from XFS to Directory (no change)
  • Tried adding better, faster, pool drives (no perceived difference)
  • Replaced both CPUs with E5-2650v2 from E5-2650

 

What I am working to try:

  • Replacing all RAM with higher capacity sticks (128GB -> 384GB)

 

Things that really trigger the IOWait:

  • Qbittorrent Cache flushing (get a better IO and system performance if all qbittorrent caching is disabled)
  • Mover (even with/without Nice)
  • Radarr/Sonarr (file analysis)
  • Sonarr (Every 30 seconds on Finished Download Check, typically causes 5-6% IOWait every 30 seconds for ~10 seconds)
  • Sabnzbd (no longer an issue once Nice was adjusted)
  • Unzip/unrar (any kind, have to be incredibly harsh on the nice values to get it to not choke the server)
  • NFSv3 (full stop, any remote NFSv3 actions cause massive IOWait, talking upwards of 40-50% IOWait on just READ ONLY)
  • BTRFS (literally anything BTRFS causes issue on my R720XD, I do not experience this issue on my other servers)
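For the nice adjustments mentioned in this list, a wrapper along these lines can be used when launching a heavy job from a script. This is a hedged sketch, not what the poster actually ran: pairing nice with ionice is my own assumption, the helper name and archive.rar are hypothetical, and the ionice idle class only has an effect with I/O schedulers that honor I/O priorities.

```python
import shlex


def low_priority_command(command, nice_level=19, io_class=3):
    """Prefix a command so it runs at the lowest CPU and I/O priority.

    nice -n 19 is the lowest CPU priority; ionice -c 3 is the idle I/O class.
    """
    prefix = ["ionice", "-c", str(io_class), "nice", "-n", str(nice_level)]
    return prefix + list(command)


# Example: show the full command line for a hypothetical unrar job.
print(shlex.join(low_priority_command(["unrar", "x", "archive.rar"])))
```

The returned list can be passed straight to subprocess.run. Whether this helps will depend on the workload, since iowait caused by writeback of already-buffered data is not attributed to the process that wrote it.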

 

Specs:

  • R720XD
  • E5-2650V2
  • 128GB DDR3-1600 MHz
  • Parity - 2 Drives
    • 16TB WD Red
    • 18TB WD Gold
  • Array (not including Parity) - 16 Drives - 236TB Usable, all tested with DiskSpeed, monitored with  
    • Seagate 16TB Exos x7
    • WD x2
    • 14TB x4
    • 12TB x3
  • Cache Pools
    • Team 1TB (Weekly Appdata Backups)
    • P31 1TB (Appdata)
    • 1TB - WD Black NVME (Blank)
    • 4TB - Samsung 870 EVO (for download caching)
  • Dell Compellent SC200
  • Dell 165T0 BROADCOM 57800S QUAD PORT SFP+
  • Dell H200 6Gbps HBA LSI 9211

 

Working Hypothesis:

Monitoring with NetData, I am noticing that IOWait jumps typically correlate with memory writeback, specifically dirty memory writeback. All my research points to either bad or insufficient RAM (I will be swapping all of it out for 384GB) or tunables that need further adjustment.
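The writeback tunables in question live under /proc/sys/vm. As a rough illustration (not the poster's own tooling; the helper names are made up), the Python sketch below prints the current values and estimates how much dirty data a given ratio allows. The kernel applies the ratios to available memory rather than total RAM, so treat the result as an upper bound.

```python
import os

VM_DIR = "/proc/sys/vm"
TUNABLES = (
    "dirty_background_ratio",     # % at which background writeback starts
    "dirty_ratio",                # % at which writing processes are throttled
    "dirty_expire_centisecs",     # age after which dirty data must be written
    "dirty_writeback_centisecs",  # how often the flusher threads wake up
)


def threshold_mib(base_kib, ratio_percent):
    """Approximate dirty-data threshold in MiB for a memory size given in KiB."""
    return base_kib * ratio_percent / 100.0 / 1024.0


def read_tunables(directory=VM_DIR):
    """Read the current values of the writeback tunables listed above."""
    values = {}
    for name in TUNABLES:
        with open(os.path.join(directory, name)) as handle:
            values[name] = int(handle.read().strip())
    return values


if __name__ == "__main__" and os.path.isdir(VM_DIR):
    for name, value in read_tunables().items():
        print(f"{name} = {value}")
```

As an example only, threshold_mib(128 * 1024 * 1024, 20) comes to roughly 25 GiB: with 128GB of RAM and a dirty_ratio of 20, a lot of data can pile up in memory before writers are throttled, and adding RAM raises that ceiling rather than lowering it.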

Edited by maust

On 7/26/2023 at 5:37 PM, maust said:

More or less have followed the same thing as OP with similar "results" but no permanent fix.  Issue became much more apparent after upgrading to 6.11.5 from 6.9.

 

Consistently sitting between 5-10% IOWAIT.  Anytime qbittorrent or any other container really does any large scale file operations it shoots the IOWAIT upwards to 30-50%, sometimes sitting there for hours at a time.  This causes all network traffic to grind to a halt.


This behavior also occurs anytime Mover runs as well.

 

Additionally, tried swapping Cache drives, tried adding more cache drives, tried splitting the load between multiple cache drives.  Switched from BTRFS to XFS (which seemed to help throughput and lowered the baseline IOWAIT but it persists).

 

Things that really trigger the IOWait:
Qbittorrent Cache (somehow get a better IO and system performance if all qbittorrent caching is disabled)
Mover (even with/without Nice)

Radarr (file analysis)

Sabnzbd (no longer an issue once Nice was adjusted)


Interestingly, I only seem to have this issue on my main Poweredge R720XD Unraid 6.11.5 Server, none of my other servers that are still running 6.10 are experiencing this issue.
 

I think I'm having the same issue with 6.11.5, but I don't know how to see the IOwaits. I think I'm going to try Netdata.

I also use my cache drive for qBittorrent, and when that is slamming the Samsung SSD pool my dockers become unresponsive.
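For anyone else unsure how to see iowait without installing anything: the kernel keeps cumulative CPU time counters in /proc/stat, and the fifth value on the aggregate "cpu" line is the time spent in iowait. Below is a minimal Python sketch that samples the file twice and reports the iowait share over the interval. The function names and the one-second interval are illustrative choices, not something taken from this thread.

```python
import os
import time

STAT_PATH = "/proc/stat"  # standard Linux location of the CPU time counters


def parse_cpu_line(stat_text):
    """Return the counters from the aggregate 'cpu' line as a list of ints."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(value) for value in fields[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def iowait_percent(before, after):
    """Share of CPU time spent in iowait between two samples, in percent."""
    deltas = [b - a for a, b in zip(before, after)]
    # Order: user nice system idle iowait irq softirq steal guest guest_nice.
    # Guest time is already counted in user/nice, so only sum the first eight.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total


def sample(path=STAT_PATH):
    with open(path) as handle:
        return parse_cpu_line(handle.read())


if __name__ == "__main__" and os.path.exists(STAT_PATH):
    first = sample()
    time.sleep(1.0)
    second = sample()
    print(f"iowait over the last second: {iowait_percent(first, second):.1f}%")
```

This is the same figure that tools like Glances and Netdata chart, just without the history.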

  • 3 weeks later...
On 7/26/2023 at 5:37 PM, maust said:

...

 

Working Hypothesis:

Monitoring with NetData, I am noticing that IOWait jumps typically correlate with memory writeback, specifically dirty memory writeback. All my research points to either bad or insufficient RAM (I will be swapping all of it out for 384GB) or tunables that need further adjustment.

As I am just another one in this long list:

Could you kindly guide me to how you monitored Dirty Memory Writeback? That is a bit out of my depth, but I have tried almost everything else suggested in this thread and many others that I read over the past months. So if nothing else, maybe I could at least support your working hypothesis
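The quoted post mentions Netdata for this. The underlying numbers are also available directly from the kernel in /proc/meminfo, as the "Dirty" and "Writeback" fields. Here is a minimal Python sketch for watching them; the function names and the five one-second samples are arbitrary choices for illustration.

```python
import os
import time

MEMINFO_PATH = "/proc/meminfo"


def parse_meminfo(text):
    """Parse /proc/meminfo into a dict of field name -> integer value."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            values[key.strip()] = int(parts[0])  # size fields are reported in kB
    return values


def dirty_and_writeback_kib(text):
    """Return (Dirty, Writeback): data waiting to be written, and being written."""
    info = parse_meminfo(text)
    return info.get("Dirty", 0), info.get("Writeback", 0)


if __name__ == "__main__" and os.path.exists(MEMINFO_PATH):
    for _ in range(5):
        with open(MEMINFO_PATH) as handle:
            dirty, writeback = dirty_and_writeback_kib(handle.read())
        print(f"Dirty: {dirty / 1024:.1f} MiB  Writeback: {writeback / 1024:.1f} MiB")
        time.sleep(1.0)
```

If Dirty climbs into the gigabytes during a download or Mover run and iowait spikes as it drains, that would be consistent with the hypothesis above.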

  • 2 months later...

So nobody has solved this?

 

When qBittorrent is doing a "little" work all my dockers become unresponsive.

I even enabled "exclusive shares" to appdata on my 2 x Samsung SSDs

 

image.thumb.png.654acc8f7990c6fecf955437c980670f.png

  • Community Expert

And the torrents/downloads are also going to exclusive shares?
