big iowait issues


pappaq
Solved by pappaq

Hello everyone,

I'm at the end of my tether... for weeks now I've been trying to fix iowait problems with my Unraid system. Whenever I think I have found the cause of the iowait, the problem reappears after a short time. I don't know what to do anymore.

 

First I converted my BTRFS cache to ZFS (along with upgrading from 6.9.2 to 6.12.4), which improved performance a bit at first.

 

Then, when high iowait reappeared, I suspected my download cache SSD. It was a Crucial P2 with QLC NAND, which was very slow on writes, so I replaced it with a WD Black, which has worked fine so far.

 

Tonight I am suddenly seeing completely absurd iowait numbers that I can't explain. Unraid is becoming less and less usable for me, and the hours of troubleshooting I've put in over the last few weeks are piling up.

 

Please, could someone help me with this? Whatever information you need, I'll try to provide it.


One or more cores are always spiking to 100% because of iowait.

dringenet-ms-diagnostics-20231115-1952.zip

Edited by pappaq

I usually see that when my disks are spun down and something tries to access multiple disks all at once (Plex, etc.). It spikes the CPU until the disks are all spun up and it gets the data it is looking for. I have a script that keeps them spun up during the day, then I let them spin down at night when nobody is using the server.

 

Suggestion: set the spin-down delay to "never".
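For reference, a minimal sketch of the kind of keep-awake script mentioned above (the device names and the 08:00-23:00 daytime window are placeholder assumptions; adjust both to your own array):

```shell
#!/bin/bash
# Keep array disks awake during the day so a burst of access (Plex scans
# etc.) doesn't hit a pile of spun-down drives at once.
# ASSUMPTION: array disks are /dev/sd[b-e] -- adjust to your system.
# Run every few minutes from cron or the User Scripts plugin.
hour=$(date +%H)
if [ "$hour" -ge 8 ] && [ "$hour" -lt 23 ]; then
    for disk in /dev/sd[b-e]; do
        # One direct read of a single sector keeps the drive awake
        # without writing anything and without going via the page cache.
        dd if="$disk" of=/dev/null bs=512 count=1 iflag=direct 2>/dev/null || true
    done
fi
```

Outside the time window the script does nothing, so the disks can still spin down overnight.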


[screenshot: iowait graph during Infuse playback]

I think I've isolated one of the issues down to my EmbyServer Docker. The high iowait only occurs when playing a really big movie directly via Infuse. The thing is, it does not occur when I play back the same file via VLC over SMB...

 

The funny thing is that the red line marks the point where I stopped playback via Infuse, and it takes quite some time for iowait to go back to normal. I can rule out transcoding, because it's a direct play.

 

So my thought is that the use of /mnt/user is the issue? But actually, the VLC-over-SMB playback goes through a rootshare network share (set up in Unraid's SMB config) that also uses /mnt/user...

This is the config of the EmbyServer Docker:

[screenshot: EmbyServer Docker configuration]

 

Maybe the fault is not on the Unraid side but with Emby, which keeps occupying the system even after playback has stopped?
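One way to check that theory: the iowait counter measures time spent with tasks in uninterruptible sleep (state "D"), so listing those tasks right after stopping playback shows which process is actually stuck on I/O. A quick sketch:

```shell
# List processes currently in uninterruptible sleep (state "D").
# These are the tasks blocked on I/O that drive the iowait percentage;
# if an Emby process still shows up here after playback has stopped,
# the container really is still waiting on the disk.
ps -eo state,pid,comm | awk '$1 == "D" { print $2, $3 }'
```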

 

Does anybody have a thought on this?

Edited by pappaq

This is iowait when I invoke the mover to move files from my encrypted ZFS cache pool to my encrypted XFS array:

[screenshot: iowait graph during mover run]

I could understand the CPU usage being high because of decrypting and encrypting while moving, but I do not understand the high iowait.
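To put an exact number on it while the mover runs (rather than eyeballing the dashboard), a rough sketch that samples the aggregate "cpu" line of /proc/stat twice and computes the iowait share of the interval:

```shell
# Field order on the first /proc/stat line: cpu user nice system idle iowait ...
# (values are in clock ticks; differences over an interval give the share).
read -r _ u n s i w _ < /proc/stat
sleep 5
read -r _ u2 n2 s2 i2 w2 _ < /proc/stat
dw=$((w2 - w))
dt=$(( (u2 + n2 + s2 + i2 + w2) - (u + n + s + i + w) ))
awk -v dw="$dw" -v dt="$dt" 'BEGIN { printf "iowait: %.1f%%\n", 100 * dw / dt }'
```

Running this once with the mover idle and once mid-move makes the comparison concrete.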

 

Could a reason for this be that I'm using too many of the system's PCIe lanes?

Quote

00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Root Complex
00:00.2 IOMMU: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) I/O Memory Management Unit
00:01.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge
00:01.3 PCI bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe GPP Bridge (prog-if 00 [Normal decode])
                LnkSta: Speed 8GT/s, Width x4
00:02.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge
00:03.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge
00:03.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe GPP Bridge (prog-if 00 [Normal decode])
                LnkSta: Speed 5GT/s, Width x8
00:03.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe GPP Bridge (prog-if 00 [Normal decode])
                LnkSta: Speed 2.5GT/s, Width x4
00:03.4 PCI bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe GPP Bridge (prog-if 00 [Normal decode])
                LnkSta: Speed 8GT/s, Width x4
00:04.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge
00:07.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge
00:07.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Internal PCIe GPP Bridge 0 to Bus B (prog-if 00 [Normal decode])
                LnkSta: Speed 8GT/s, Width x16
00:08.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge
00:08.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Internal PCIe GPP Bridge 0 to Bus B (prog-if 00 [Normal decode])
                LnkSta: Speed 8GT/s, Width x16
00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller (rev 59)
00:14.3 ISA bridge: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge (rev 51)
00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 0
00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 1
00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 2
00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 3
00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 4
00:18.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 5
00:18.6 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 6
00:18.7 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 7
01:00.0 USB controller: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset USB 3.1 xHCI Compliant Host Controller (rev 01) (prog-if 30 [XHCI])
                LnkSta: Speed 8GT/s, Width x4
01:00.1 SATA controller: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset SATA Controller (rev 01) (prog-if 01 [AHCI 1.0])
                LnkSta: Speed 8GT/s, Width x4
01:00.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Bridge (rev 01) (prog-if 00 [Normal decode])
                LnkSta: Speed 8GT/s, Width x4
02:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Port (rev 01) (prog-if 00 [Normal decode])
                LnkSta: Speed 2.5GT/s, Width x1
02:01.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Port (rev 01) (prog-if 00 [Normal decode])
                LnkSta: Speed 5GT/s, Width x1
02:04.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Port (rev 01) (prog-if 00 [Normal decode])
                LnkSta: Speed 5GT/s, Width x4
03:00.0 Ethernet controller: Intel Corporation I211 Gigabit Network Connection (rev 03)
                LnkSta: Speed 2.5GT/s, Width x1
04:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9215 PCIe 2.0 x1 4-port SATA 6 Gb/s Controller (rev 11) (prog-if 01 [AHCI 1.0])
                LnkSta: Speed 5GT/s, Width x1
pcilib: sysfs_read_vpd: read failed: No such device
05:00.0 Serial Attached SCSI controller: Broadcom / LSI SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] (rev 03)
                LnkSta: Speed 5GT/s, Width x4 (downgraded)
06:00.0 VGA compatible controller: NVIDIA Corporation GK208B [GeForce GT 710] (rev a1) (prog-if 00 [VGA controller])
                LnkSta: Speed 5GT/s, Width x8
06:00.1 Audio device: NVIDIA Corporation GK208 HDMI/DP Audio Controller (rev a1)
                LnkSta: Speed 5GT/s, Width x8
07:00.0 VGA compatible controller: NVIDIA Corporation GP107 [GeForce GTX 1050 Ti] (rev a1) (prog-if 00 [VGA controller])
                LnkSta: Speed 2.5GT/s (downgraded), Width x4 (downgraded)
07:00.1 Audio device: NVIDIA Corporation GP107GL High Definition Audio Controller (rev a1)
                LnkSta: Speed 2.5GT/s (downgraded), Width x4 (downgraded)
08:00.0 Non-Volatile memory controller: Sandisk Corp Western Digital WD Black SN850X NVMe SSD (rev 01) (prog-if 02 [NVM Express])
                LnkSta: Speed 8GT/s (downgraded), Width x4

09:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Zeppelin/Raven/Raven2 PCIe Dummy Function
                LnkSta: Speed 8GT/s, Width x16
09:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Platform Security Processor (PSP) 3.0 Device
                LnkSta: Speed 8GT/s, Width x16
09:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) USB 3.0 Host Controller (prog-if 30 [XHCI])
                LnkSta: Speed 8GT/s, Width x16
0a:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Zeppelin/Renoir PCIe Dummy Function
                LnkSta: Speed 8GT/s, Width x16
0a:00.2 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] (rev 51) (prog-if 01 [AHCI 1.0])
                LnkSta: Speed 8GT/s, Width x16

This is the output of:

sudo lspci -vv | grep -P "[0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f]|LnkSta:"

Some of the LnkSta values are marked as downgraded, but that should still be plenty of bandwidth for the job, right?
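To pull just the downgraded links out of that wall of lspci output, each device header line can be paired with its following LnkSta line:

```shell
# Print only devices whose link lspci flags as "(downgraded)",
# together with the offending LnkSta line.
lspci -vv 2>/dev/null | awk '
    /^[0-9a-f]+:[0-9a-f]+\.[0-9a-f]/ { dev = $0 }         # remember current device header
    /LnkSta:/ && /downgraded/       { print dev; print "   " $0 }'
```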


That was not the solution. Currently I'm trying to pin it down by booting in safe mode.

 

I've moved the docker image and the VMs to an encrypted BTRFS cache pool, and the iowait has been gone again for over 12 hours now. I will keep a close eye on it.

 

Steps so far:

- Updated the BIOS to the newest version

- Increased the ZFS ARC from 4 to 16 GB

- Moved the docker image (not appdata) and the VMs off the encrypted ZFS cache pool to an encrypted BTRFS cache pool

 


Set the ZFS ARC back down to 8 GB, because my RAM was filling up and swapping caused a lot of CPU stress. Also disabled Duplicati, which was another source of unwanted high CPU usage. Getting there. The system takes array and cache loads much better now, with pretty much no unwanted iowait. I'll keep an eye on it for a few more days. Maybe it is now gone for good.
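For anyone following along, a sketch of how an ARC cap like this can be applied and verified at runtime (assumption: the standard OpenZFS tunable path; on Unraid this could go in the go file or a User Script, and the guards just skip the steps on systems without ZFS):

```shell
# Cap the ZFS ARC at 8 GiB (the tunable takes its value in bytes).
ARC_MAX=$((8 * 1024 * 1024 * 1024))          # 8589934592 bytes
if [ -w /sys/module/zfs/parameters/zfs_arc_max ]; then
    echo "$ARC_MAX" > /sys/module/zfs/parameters/zfs_arc_max
fi

# Verify: current ARC size ("size") and the new ceiling ("c_max").
if [ -r /proc/spl/kstat/zfs/arcstats ]; then
    awk '$1 == "size" || $1 == "c_max" { print $1, $3 }' /proc/spl/kstat/zfs/arcstats
fi
```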
