pappaq Posted November 15, 2023

Hello everyone, I'm at the end of my tether... for weeks now I've been trying to fix iowait problems on my Unraid system. Whenever I think I've found the cause, the problem reappears after a short time. I don't know what to do anymore.

First I converted my BTRFS cache to ZFS (along with an upgrade from 6.9.2 to 6.12.4), which improved performance a bit at first. Then, when high iowait reappeared, I suspected my download cache SSD and replaced it. It was a Crucial P2 with QLC NAND, which was very slow at writing; now I have a WD Black in it, which has worked well so far.

Tonight I'm suddenly seeing completely absurd iowait numbers that I can't explain. Unraid is becoming less and less usable for me, and the hours of troubleshooting I've put in over the last few weeks are adding up. Please, could someone help me with this? Whatever information you need, I'll try to provide it. One or more cores keep spiking to 100% because of iowait.

dringenet-ms-diagnostics-20231115-1952.zip
pappaq Posted November 15, 2023

It's reaching ridiculous levels...
B_Sinn3d Posted November 15, 2023

I usually see that when my disks are spun down and something tries to access multiple disks all at once (Plex, etc.). CPU spikes until the disks have all spun up and it gets the data it's looking for. I have a script that keeps them spun up during the day, then I let them spin down at night when nobody is using the server.

Suggestion: set the spin-down delay to "never".
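For reference, a keep-awake script like that can be as simple as forcing a tiny read on every array disk from a schedule. This is only a sketch under the assumption that the array disks enumerate as /dev/sd?; adjust the glob to match your controller.

```shell
#!/bin/bash
# Keep array disks spun up by reading one sector from each.
# Schedule this (e.g. via the User Scripts plugin) every few minutes
# during the day; stop scheduling it at night to let disks spin down.
shopt -s nullglob
touched=0
for disk in /dev/sd[a-z]; do
  # iflag=direct bypasses the page cache so the read really hits the disk
  if dd if="$disk" of=/dev/null bs=512 count=1 iflag=direct 2>/dev/null; then
    touched=$((touched + 1))
  fi
done
echo "touched $touched disks"
```

Note this trades idle power draw for responsiveness, and it only masks spin-up latency; it won't help with iowait that occurs while disks are already spinning.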
pappaq Posted November 15, 2023

Not a solution in my opinion. And it also appears when nothing is trying to access the disks... the wa value goes up to 90 and beyond, rendering the system completely unresponsive... Yesterday it worked normally for hours on end... I don't know what happened. I've changed nothing.
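One way to narrow down cases like this: high %wa with no obvious disk activity can usually be pinned to a specific task, because anything sitting in state "D" (uninterruptible sleep) is blocked on I/O. A quick sketch to catch the culprit while wa is spiking:

```shell
# List tasks currently in uninterruptible sleep (state "D" = blocked on I/O).
# Run this repeatedly while %wa is high; names that recur are the likely cause.
blocked=$(ps -eo pid,stat,comm | awk 'NR == 1 || $2 ~ /^D/')
echo "$blocked"
```

Pairing this with `iostat -x 1` (if sysstat is available) then shows which device the await time is piling up on.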
pappaq Posted November 16, 2023

I think I've isolated one of the issues down to my EmbyServer Docker. The high iowait only occurs when playing back a really big movie directly via Infuse. The thing is, it does not occur when I play the same file back via VLC over SMB... The funny thing is that the red line marks the point where I stopped playback in Infuse, and it takes quite some time for the system to return to normal. I rule out transcoding, because it's a direct play. So my thought is that the use of /mnt/user is the issue? But actually the VLC-over-SMB playback goes through a rootshare network share set up in Unraid's SMB config, which utilizes /mnt/user as well...

This is the config of the EmbyServer Docker:

Maybe the fault is not on the Unraid side but on Emby's, occupying the system even after playback has stopped? Does anybody have a thought on this?
pappaq Posted November 16, 2023

This is iowait when I invoke the mover to move files from my encrypted ZFS cache pool to my encrypted XFS array. I could understand CPU usage being high because of decrypting and encrypting while moving, but I do not understand the high iowait?! Could a reason for this be using too many PCIe lanes of the system?

This is the output of `sudo lspci -vv | grep -P "[0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f]|LnkSta:"`:

00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Root Complex
00:00.2 IOMMU: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) I/O Memory Management Unit
00:01.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge
00:01.3 PCI bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe GPP Bridge (prog-if 00 [Normal decode])
  LnkSta: Speed 8GT/s, Width x4
00:02.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge
00:03.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge
00:03.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe GPP Bridge (prog-if 00 [Normal decode])
  LnkSta: Speed 5GT/s, Width x8
00:03.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe GPP Bridge (prog-if 00 [Normal decode])
  LnkSta: Speed 2.5GT/s, Width x4
00:03.4 PCI bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe GPP Bridge (prog-if 00 [Normal decode])
  LnkSta: Speed 8GT/s, Width x4
00:04.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge
00:07.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge
00:07.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Internal PCIe GPP Bridge 0 to Bus B (prog-if 00 [Normal decode])
  LnkSta: Speed 8GT/s, Width x16
00:08.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge
00:08.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Internal PCIe GPP Bridge 0 to Bus B (prog-if 00 [Normal decode])
  LnkSta: Speed 8GT/s, Width x16
00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller (rev 59)
00:14.3 ISA bridge: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge (rev 51)
00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 0
00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 1
00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 2
00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 3
00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 4
00:18.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 5
00:18.6 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 6
00:18.7 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 7
01:00.0 USB controller: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset USB 3.1 xHCI Compliant Host Controller (rev 01) (prog-if 30 [XHCI])
  LnkSta: Speed 8GT/s, Width x4
01:00.1 SATA controller: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset SATA Controller (rev 01) (prog-if 01 [AHCI 1.0])
  LnkSta: Speed 8GT/s, Width x4
01:00.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Bridge (rev 01) (prog-if 00 [Normal decode])
  LnkSta: Speed 8GT/s, Width x4
02:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Port (rev 01) (prog-if 00 [Normal decode])
  LnkSta: Speed 2.5GT/s, Width x1
02:01.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Port (rev 01) (prog-if 00 [Normal decode])
  LnkSta: Speed 5GT/s, Width x1
02:04.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Port (rev 01) (prog-if 00 [Normal decode])
  LnkSta: Speed 5GT/s, Width x4
03:00.0 Ethernet controller: Intel Corporation I211 Gigabit Network Connection (rev 03)
  LnkSta: Speed 2.5GT/s, Width x1
04:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9215 PCIe 2.0 x1 4-port SATA 6 Gb/s Controller (rev 11) (prog-if 01 [AHCI 1.0])
  LnkSta: Speed 5GT/s, Width x1
pcilib: sysfs_read_vpd: read failed: No such device
05:00.0 Serial Attached SCSI controller: Broadcom / LSI SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] (rev 03)
  LnkSta: Speed 5GT/s, Width x4 (downgraded)
06:00.0 VGA compatible controller: NVIDIA Corporation GK208B [GeForce GT 710] (rev a1) (prog-if 00 [VGA controller])
  LnkSta: Speed 5GT/s, Width x8
06:00.1 Audio device: NVIDIA Corporation GK208 HDMI/DP Audio Controller (rev a1)
  LnkSta: Speed 5GT/s, Width x8
07:00.0 VGA compatible controller: NVIDIA Corporation GP107 [GeForce GTX 1050 Ti] (rev a1) (prog-if 00 [VGA controller])
  LnkSta: Speed 2.5GT/s (downgraded), Width x4 (downgraded)
07:00.1 Audio device: NVIDIA Corporation GP107GL High Definition Audio Controller (rev a1)
  LnkSta: Speed 2.5GT/s (downgraded), Width x4 (downgraded)
08:00.0 Non-Volatile memory controller: Sandisk Corp Western Digital WD Black SN850X NVMe SSD (rev 01) (prog-if 02 [NVM Express])
  LnkSta: Speed 8GT/s (downgraded), Width x4
09:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Zeppelin/Raven/Raven2 PCIe Dummy Function
  LnkSta: Speed 8GT/s, Width x16
09:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Platform Security Processor (PSP) 3.0 Device
  LnkSta: Speed 8GT/s, Width x16
09:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) USB 3.0 Host Controller (prog-if 30 [XHCI])
  LnkSta: Speed 8GT/s, Width x16
0a:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Zeppelin/Renoir PCIe Dummy Function
  LnkSta: Speed 8GT/s, Width x16
0a:00.2 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] (rev 51) (prog-if 01 [AHCI 1.0])
  LnkSta: Speed 8GT/s, Width x16

Some of the LnkSta entries show downgraded links, but that should still be plenty of bandwidth for the job, right?
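As a sanity check on the bandwidth question, the theoretical one-direction throughput of a link follows directly from the LnkSta speed and width. A small sketch (the efficiency factors are the PCIe spec encoding overheads: 8b/10b below 8GT/s, 128b/130b at 8GT/s):

```shell
# Theoretical one-direction bandwidth for a PCIe link.
pcie_bw() {  # usage: pcie_bw <GT/s> <lane width>
  LC_ALL=C awk -v gt="$1" -v w="$2" 'BEGIN {
    eff = (gt >= 8) ? 128 / 130 : 0.8   # encoding efficiency
    printf "%.2f GB/s\n", gt * w * eff / 8
  }'
}
pcie_bw 5 4   # the downgraded SAS2008 link -> 2.00 GB/s
pcie_bw 8 4   # the SN850X NVMe link       -> 3.94 GB/s
```

Even the downgraded 5GT/s x4 SAS link leaves roughly 2 GB/s, far more than a handful of spinning disks can saturate, so raw PCIe bandwidth looks unlikely to be the bottleneck here.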
pappaq Posted November 16, 2023

I've tried CPU pinning... every Docker container I pinned is now an orphaned image and deleted... this is ridiculous.
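One way to experiment with pinning without the template edits that recreate containers (and orphan the old images) is `docker update`, which changes the cpuset of an already-running container. The container name and core list here are placeholders, not taken from this system:

```shell
# Pin a running container to specific cores without touching its template.
# "EmbyServer" and the core list "2,3" are examples - substitute your own.
cmd="docker update --cpuset-cpus=2,3 EmbyServer"
echo "$cmd"   # run this from the Unraid console
```

A change made this way does not survive the container being recreated, so it is only useful for testing whether pinning helps at all.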
pappaq Posted November 16, 2023

I've raised my ZFS ARC from 4GB to 16GB. The I/O issue seems to be gone... time will tell.
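For anyone following along, the ARC ceiling is the `zfs_arc_max` module parameter, set in bytes. A sketch of the arithmetic and the sysfs knob (the actual write needs root and a loaded zfs module, so it is shown as a comment):

```shell
# Compute the ARC ceiling in bytes for a 16 GiB cap.
arc_gib=16
arc_bytes=$((arc_gib * 1024 ** 3))
# On a live system, as root:
#   echo $arc_bytes > /sys/module/zfs/parameters/zfs_arc_max
# The setting does not survive a reboot unless it is reapplied at boot
# (e.g. from the go file).
echo "zfs_arc_max -> $arc_bytes bytes"
```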
pappaq Posted November 17, 2023

It was not the solution. Currently I'm trying to pin it down by booting in safe mode. I've moved the Docker image and the VMs to a BTRFS encrypted cache pool and the iowait is gone again, for over 12 hours now. Will keep a close eye on it.

Steps so far:
- Updated the BIOS to the newest version
- Increased the ZFS ARC from 4GB to 16GB
- Moved the Docker image (not appdata) and the VMs off of the ZFS encrypted cache pool to a BTRFS encrypted cache pool
pappaq Posted November 21, 2023

Reduced the ZFS ARC to 8GB because swapping was putting a lot of stress on the CPU as my RAM filled up. Disabled Duplicati, as it was another source of unwanted high CPU usage. Getting there. The system seems to handle array and cache loads much better now, with pretty much no unwanted iowait. I'll keep an eye on it for a few more days; maybe it's gone for good now.
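A quick way to judge whether the ARC ceiling is what is squeezing applications into swap is to compare the live ARC size against total RAM and remaining swap. The paths are standard Linux/ZFS ones; `/proc/spl` only exists while the zfs module is loaded, so that part is guarded:

```shell
# Current ARC size (only if the zfs module is loaded)...
if [ -r /proc/spl/kstat/zfs/arcstats ]; then
  awk '$1 == "size" { printf "ARC %.1f GiB\n", $3 / 2^30 }' \
    /proc/spl/kstat/zfs/arcstats
fi
# ...against total RAM and remaining swap (/proc/meminfo values are in kB).
meminfo=$(awk '/^MemTotal|^SwapFree/ { printf "%s %.1f GiB\n", $1, $2 / 2^20 }' /proc/meminfo)
echo "$meminfo"
```

If ARC plus the working set of containers and VMs approaches MemTotal, shrinking the ARC (as done above) is the right call.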
Solution pappaq Posted December 15, 2023

I've changed my whole system from Ryzen to Intel and switched all the old controllers to new ASM1166 ones. All the iowait issues are gone now, so I assume the old hardware had something to do with it.