Exilepc Posted May 3, 2020

I need some help... I have been having high IOwait for a few months now. It's driving me up the wall...

Specs:
Unraid system: Unraid Server Pro, version 6.8.3
Model: Custom
Motherboard: Supermicro X8DT6
Processor: Intel Xeon CPU E5620 @ 2.40GHz
HVM: Enabled
IOMMU: Disabled
Cache: L1 = 256 kB, L2 = 1024 kB, L3 = 12288 kB
Memory: 48 GB (max. installable capacity 384 GB)
Network: bond0: fault-tolerance (active-backup), mtu 1500; eth0: 1000Mb/s, full duplex, mtu 1500
Kernel: Linux 4.19.107-Unraid x86_64
OpenSSL: 1.1.1d
P + Q algorithm: 5892 MB/s + 8187 MB/s

Steps tried so far. I have been having a high amount of IOwait, and I have tried everything that has fixed it for others:
- Added a second SSD to cache
- Moved Deluge to its own SSD via unmounted drives
- Switched from Plex to Emby
- Added an Nvidia 1660 Ti for Emby
- Disabled and removed Dynamix Cache Directories, waited a week and re-added it
- Ran the DiskSpeed docker on all drives
- Changed the parity drive to a Seagate IronWolf Pro (second one still in shipping; put in an HP Enterprise drive as temporary parity)
- Replaced all drives with errors, even ones with a single error (3 drives)
- Disabled the Spectre and Meltdown protections
- Replaced both CPU coolers with dual Noctua fans
- Added memory fans to the system to solve memory heat issues
- Changed to a new case (Fractal Design Define 7 XL) from a 24-bay Supermicro
- Added 8 case fans
- Using 18 3.5" drives and 5 2.5" drives
- Removed all 1 TB drives
- Watched NetData, htop and Glances like a hawk for any sign of what is causing the issue

I am still getting the IOwait messages, and the system is very slow.

tower2-diagnostics-20200502-2038.zip
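As an aside for anyone following along: the iowait figure that NetData, htop and Glances graph comes straight from /proc/stat, so it can be checked without installing anything. A minimal sketch, assuming a standard Linux /proc layout (field 6 of the "cpu" line is cumulative iowait ticks, per proc(5)):

```shell
# Print cumulative iowait ticks vs. total CPU ticks since boot.
# Taking two samples a second apart and differencing them gives the
# iowait percentage that top/Glances report.
awk '/^cpu /{total=0; for (i=2; i<=NF; i++) total+=$i;
             printf "iowait ticks: %d of %d total\n", $6, total; exit}' /proc/stat
```

A steadily climbing iowait counter while throughput stays low is the signature described throughout this thread.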
JorgeB Posted May 3, 2020 Docker image is on the array, there are similar reports when this is the case, try moving it to cache.
Exilepc Posted May 3, 2020 I will look into that now.
Exilepc Posted May 3, 2020 Moved the docker/VM share to a cache-only share; still getting high IOwait...
JorgeB Posted May 3, 2020 And no difference accessing/using the dockers during those i/o waits?
Exilepc Posted May 3, 2020 Most of the dockers have not been slow; only Plex, SMB and the webGUI. Parity checks and the like have been at 90-160 MB/s.
IKWeb Posted March 5, 2021 @Exilepc - Did you ever find a fix for this? I am getting the same issues, and while copying large amounts of data to the array it kills the WebUIs for the server itself and for all the docker containers.
Exilepc Posted November 1, 2021 Nope. The issue had not been so pronounced until tonight... So I googled it tonight and this is one of the top posts... It's kinda sad...
dclive Posted November 28, 2021 Having similar issues here. Sometimes my webGUI just doesn't respond (meaning Unraid's; Plex's etc. are fine), and IO performance takes a negative hit. I just installed Glances (thanks!) and will monitor iowait now.
Fffrank Posted November 9, 2022 Has anyone ever figured this out? My system appears to be getting worse. I download and process on my cache drives (4x 240GB SSD), then the data is moved to my media share (which also uses the cache). During these operations my system comes to a standstill: anything streaming off the array chokes, my dockers become unresponsive, VMs time out, and I even get SSH shells dropped. It's a dual Xeon 5660 with 96GB of RAM. htop doesn't show anything out of the ordinary, but the Unraid dashboard shows CPU use through the roof. I finally stumbled upon this thread and it seems I have the same iowait issues. I've got my dockers all pinned to limit what they can use. I have my docker image on the cache, as well as my appdata share. Very frustrating!
Gruffydd Posted January 1, 2023 Same problem here. My system is strong enough, yet I get IOWAIT up in the 50-60% range.
brendan399 Posted January 9, 2023 I have been dealing with this as well for a couple of weeks. I can't seem to figure out what the exact cause is. When I check the logs, it seems one or the other of my SSD cache drives resets (failed scmd), and I also get I/O errors: sector xxxx op 0x0: (read) flags 0x80700 phys_seg 1/2/3/4 prio class 0
Andiroo2 Posted January 17, 2023 I'm having this issue when trying to export data from the array to another server for backup. Speeds start around 30MB/s and drop to around 5-7MB/s, with IOWAIT sitting around 33.3 according to Glances. No other activity on the server at the same time. What's interesting is that the array shows ~100MB/s of reads, but there is only a trickle going over the wire. It's like the system is spinning its wheels trying to get the data ready to send, but can only send really slowly. For reference, I am "pulling" data from Unraid to macOS: I am running the rsync commands on macOS, connected to Unraid over the network. I have tried rsync over SSH and via SMB, but no real difference.
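When the array shows heavy reads but only a trickle reaches the wire, it can help to see which device the kernel is actually waiting on. A rough sketch, assuming the standard /proc/diskstats layout from the kernel's iostats documentation (field 12 is I/Os currently in flight, field 13 is total milliseconds spent doing I/O):

```shell
# Snapshot per-device busyness from /proc/diskstats, skipping loop/ram
# pseudo-devices. Sample this twice: a disk whose io_ms grows by
# roughly 1000 per wall-clock second is saturated, and is the one
# everything else is queued behind.
awk '$3 !~ /^(loop|ram)/ {printf "%-10s in_flight=%-4s io_ms=%s\n", $3, $12, $13}' \
    /proc/diskstats
```

On an Unraid box during one of these stalls, the saturated device is typically either a single array disk or the parity drive(s).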
Exilepc Posted January 17, 2023 Sadly I have not been able to track down the issue… I wish I had an answer
dankulo Posted January 28, 2023 It doesn't seem like there is a solution to the problem at all. I've been dealing with this for months and it only gets worse.
lonnie776 Posted January 31, 2023

I have a large Unraid server with 17 array drives, 2 parity drives and a RAID1 SSD cache pool. On that I run 3 VMs and up to 12 dockers, some of which are I/O intensive. I often see high iowait %; however, in my case I know it's because I am simply demanding too much from my disks, which would be fine if it didn't cripple every other part of the system.

Years back I found a way around iowait consuming the whole CPU. Linux allows you to isolate CPU cores from the system so you can dedicate them to other tasks (VM/Docker). This way, when the system is crippled by iowait, your VMs and Docker containers can continue to function happily on the isolated cores, although I/O may still suffer if they access the array/pool causing the iowait. As I understand it, my situation is different than yours, but hopefully this trick will still help you work around some of the headaches.

In order to isolate the cores, you have to go onto your flash drive and edit /syslinux/syslinux.cfg. Here is my default boot mode, which I have edited to include "append isolcpus=4-9,14-19". This option forces the system to run on cores 0-3 and 10-13, leaving the isolated cores idle:

label Unraid OS
  menu default
  kernel /bzimage
  append isolcpus=4-9,14-19 initrd=/bzroot

I have an old hyperthreaded 10-core Xeon, so I have 20 virtual cores, 0-19. I chose to keep 4 cores for my system, as plugins still run on the system, and I have isolated 6 cores for VMs and Docker containers. For this to work properly you must pin each VM and Docker container to the isolated cores of your choosing. Now, when you are plagued by iowait, your dockers and VMs will still have processing power. I hope this helps.

Edit: After looking into this a bit further, I found that this has been implemented in the GUI. Now you simply go to Settings -> CPU Pinning.
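As a quick sanity check on the isolcpus approach above, the kernel exposes the isolated set in sysfs on reasonably modern kernels, and util-linux's taskset (present on Unraid) shows which cores a process may actually use. A sketch, with the example core ranges taken from the post above:

```shell
# Cores the kernel booted with isolcpus= (prints an empty line if none
# are isolated, or if the kernel predates this sysfs file):
cat /sys/devices/system/cpu/isolated 2>/dev/null
# Affinity of the current shell. On a box isolated as described above,
# this should list only the non-isolated cores, e.g. 0-3,10-13:
command -v taskset >/dev/null && taskset -cp $$ || true
```

If the shell's affinity still includes the cores you meant to isolate, the append line on the flash drive did not take effect.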
Andiroo2 Posted January 31, 2023 Pinning CPUs makes sense in your case, where the performance of other things isn't acceptable when the issue occurs. My experience is different though… I get the high IOWait but the rest of the system doesn't hang.
JPAchilles Posted March 2, 2023 Bumping this thread. My server's been rendered unusable even without a parity drive and with docker disabled. Tried all the steps in the OP, and they helped, but not enough. nas-diagnostics-20230301-1643.zip
maust Posted July 26, 2023

More or less have followed the same steps as the OP with similar "results" but no permanent fix. The issue became much more apparent after upgrading to 6.11.5 from 6.9. Consistently sitting between 5-10% IOWAIT. Any time qbittorrent or any other container does large-scale file operations, it shoots the IOWAIT up to 30-50%, sometimes sitting there for hours at a time. This causes all network traffic to grind to a halt.

Some of what I have tried:
- Swapped cache drives
- Tried adding more cache drives
- Tried splitting the workloads between cache drives
- Switched all cache drives from BTRFS to XFS (greatly improved the baseline IOWait, but the issues persist)
- Switched docker.img from BTRFS to XFS (again, improved the IOWait issues but they persist)
- Rebuilt docker.img from 150GB -> 50GB after fixing naughty containers (no performance change)
- Ensured docker containers were not writing to docker.img after build (no performance change)
- Switched docker.img from XFS to a directory (no change)
- Tried adding better, faster pool drives (no perceived difference)
- Replaced both CPUs with E5-2650v2 (from E5-2650)

What I am working to try:
- Replacing all RAM with higher-capacity sticks (128GB -> 384GB)

Things that really trigger the IOWait:
- qbittorrent cache flushing (I get better IO and system performance if all qbittorrent caching is disabled)
- Mover (with or without nice)
- Radarr/Sonarr (file analysis)
- Sonarr (every 30 seconds on Finished Download Check; typically causes 5-6% IOWait every 30 seconds for ~10 seconds)
- Sabnzbd (no longer an issue once nice was adjusted)
- Unzip/unrar (any kind; I have to be incredibly harsh on the nice values to get it to not choke the server)
- NFSv3 (full stop: any remote NFSv3 action causes massive IOWait, talking upwards of 40-50% IOWait on READ ONLY)
- BTRFS (literally anything BTRFS causes issues on my R720XD; I do not experience this on my other servers)

Specs:
- R720XD
- E5-2650v2
- 128GB DDR3-1600 MHz
- Parity: 2 drives (16TB WD Red, 18TB WD Gold)
- Array (not including parity): 16 drives, 236TB usable, all tested with DiskSpeed, monitored with Seagate: 16TB Exos x7, WD x2, 14TB x4, 12TB x3
- Cache pools: Team 1TB (weekly appdata backups), P31 1TB (appdata), 1TB WD Black NVMe (blank), 4TB Samsung 870 EVO (download caching)
- Dell Compellent SC200
- Dell 165T0 Broadcom 57800S quad-port SFP+
- Dell H200 6Gbps HBA
- LSI 9211

Working hypothesis: Monitoring with NetData, I notice the IOWait jumps typically correlate with memory writeback, specifically dirty memory writeback. All my research comes back to either bad/lacking RAM (which I will be swapping out for 384GB) or tunables needing further adjustment.
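For anyone wanting to check the dirty-writeback correlation on their own box: the numbers NetData graphs come from /proc/meminfo, and the kernel's writeback thresholds are plain sysctls. A minimal sketch, assuming a stock Linux kernel (values in /proc/meminfo are always reported in kB):

```shell
# Pages dirtied in RAM but not yet flushed, and pages being flushed now:
grep -E '^(Dirty|Writeback):' /proc/meminfo
# Kernel thresholds: background flushing starts at dirty_background_ratio
# percent of reclaimable RAM; at dirty_ratio percent, writers are blocked
# outright, which shows up as exactly the iowait stalls described above.
echo "dirty_background_ratio=$(cat /proc/sys/vm/dirty_background_ratio)%"
echo "dirty_ratio=$(cat /proc/sys/vm/dirty_ratio)%"
```

Watching Dirty: climb toward the ratio limits during a qbittorrent or Mover run, right as IOWait spikes, would support the hypothesis; note that adding more RAM raises the absolute thresholds those percentages translate to.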
DanielPT Posted August 2, 2023

On 7/26/2023 at 5:37 PM, maust said: ...

I think I'm having the same issue on 6.11.5, but I don't know how to see the IOwaits. I think I'm going to try Netdata. I also use my cache drive for qbittorrent, and when that is slamming the Samsung SSD pool I get unresponsive dockers.
Mbeco Posted August 17, 2023

On 7/26/2023 at 5:37 PM, maust said: ... Working hypothesis: Monitoring with NetData, I notice the IOWait jumps typically correlate with memory writeback, specifically dirty memory writeback. ...

As I am just another one on this long list: could you kindly guide me to how you monitored dirty memory writeback? That is a bit out of my depth, but I have tried almost everything else suggested in this thread and in many others I have read over the past months. So if nothing else, maybe I could at least support your working hypothesis.
DanielPT Posted November 8, 2023 So nobody has solved this? When qbittorrent is doing a "little" work, all my dockers get unresponsive. I even enabled "exclusive shares" for appdata on my 2x Samsung SSDs.
JorgeB Posted November 8, 2023 And the torrents/downloads are also going to exclusive shares?