chris smashe Posted October 21, 2022 Share Posted October 21, 2022 (edited)

Latest information at the bottom.

System
Motherboard: ASUSTeK COMPUTER INC. B150M-C, Version Rev X.0x
CPU: Intel® Core™ i7-6700 CPU @ 3.40GHz
Memory: 32GB DDR4
GPU: NVIDIA GeForce GTX 1070

Hard Drives
Cache: Crucial/Micron Client SSD, 480 GB, SATA 3.3
Parity: WD-Plus 12 TB, 7200 RPM, SATA 3.1
Array Disk 1: WD-Plus 8 TB, 5640 RPM, SATA 3.1
Array Disk 2: Seagate Barracuda 1.5 TB, 7200 RPM, SATA 2.6, 3.0 Gb/s (only holds music)
Array Disk 3: WD-Plus 8 TB, 5640 RPM, SATA 3.1

I'm having an issue with iowait locking up the system with high CPU usage. I turned everything off one by one and narrowed it down to Deluge: once I stopped it, iowait went to 0 and stayed there. While this was going on, torrents were downloading but nothing finished, so no unpacking was being done (that I am aware of). Deluge downloads to the downloads share, which is set to cache pool = yes. Once a file is done, the *arr apps are set to move it to the array; that was not happening during this test.

My questions are:
1. How do I tell which disk is causing the iowait?
2. At first I suspected a problem with my cache drive (or Deluge), but while typing this out I realized that most of the files Deluge is downloading have been going long enough that at least part of each download has already been moved off the cache drive to the array by the mover. Is this part of my problem? Do I need to set the downloads share to cache pool = only, so nothing moves to the array until one of the *arr apps does it?

smashenas-diagnostics-20221021-0909.zip Edited October 22, 2022 by chris smashe 1 Quote Link to comment
chris smashe Posted October 21, 2022 Author Share Posted October 21, 2022 Additional note. I just spun my Lubuntu VM back up; it was installing an upgrade and triggered iowait as well. Not as bad as Deluge, but it does cause it. Quote Link to comment
chris smashe Posted October 21, 2022 Author Share Posted October 21, 2022 Additional note 2. I restricted Deluge to downloads that exist only on the cache drive (moved anything partly migrated to the array to the bottom of the queue) and there is no change; I still have an iowait issue. I reduced the number of active torrents from 8 down to 1 and re-added one every 2 minutes; the iowait issues started at 4 active. The first 3 had slow download speeds, while the 4th was doing up to 15 MiB/s, nowhere near enough to matter for the SSD. I stopped all other downloads except that one and still had iowait. That torrent is faster than the rest and further along (22 GB total with 66.73% downloaded, where the others are under 15%). I then ran a 46.8 GB torrent at 43% downloaded by itself; it downloads slowly and shows no iowait issue. Finally, a torrent with nothing downloaded yet and a higher download speed showed no iowait until it reached about 8% downloaded, then the issues started again. Quote Link to comment
chris smashe Posted October 21, 2022 Author Share Posted October 21, 2022 Additional note 3. OK, it's not Deluge-specific. I copied a 23 GB file from my computer to a share that saves to the cache drive. It was fine for a bit, but then had ongoing iowait for roughly the last 3/4 of the file. The max transfer rate was 113 MB/s; the iowait issues started at about 13% complete, and the transfer would drop as low as 7 MB/s. Closing the VM had no impact on this. I ran DiskSpeed against the cache drive, and while it was scanning it also had iowait issues, along with a strange drop-off and rise; a second run was even stranger. I also ran DiskSpeed against one of the 8 TB spinning drives: max iowait of 12.3%. That is all of the tests I can think of to run to give you data. Thanks for any help you can give me. Quote Link to comment
chris smashe Posted October 21, 2022 Author Share Posted October 21, 2022 I changed the share for the 23 GB file-copy test so it writes straight to the array instead of the cache drive. It stayed at 113 MB/s longer before dropping, and when it did drop it went as low as 53 MB/s. Not as bad as writing to the cache drive, but still not great. The server is doing nothing else. iowait hovers around 16% in this configuration, lower than when writing to the cache drive, but still not great. Nothing should be saturating the array, and definitely not the SSD. I'm not 100% sure what causes iowait other than the disk, so if anyone can help me figure out why this is happening, it would be greatly appreciated. Quote Link to comment
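For reference, the system-wide iowait percentage that tools like top and Glances report comes from the kernel's CPU accounting in /proc/stat. A minimal sketch of sampling it yourself, assuming a Linux host such as Unraid (the function name is just for illustration):

```shell
# Report the system-wide iowait share of CPU time between two samples
# of /proc/stat. Field 6 of the aggregate "cpu" line counts iowait
# jiffies; summing fields 2-9 gives total jiffies.
iowait_pct() {
    set -- $(awk '/^cpu /{print $6, $2+$3+$4+$5+$6+$7+$8+$9}' /proc/stat)
    wa1=$1; tot1=$2
    sleep 1
    set -- $(awk '/^cpu /{print $6, $2+$3+$4+$5+$6+$7+$8+$9}' /proc/stat)
    awk -v w=$(($1 - wa1)) -v t=$(($2 - tot1)) \
        'BEGIN { printf "iowait: %.1f%%\n", (t > 0) ? 100 * w / t : 0 }'
}

iowait_pct
```

Running this in a loop while reproducing the slow transfer would show whether the stall lines up with the iowait spikes.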
chris smashe Posted October 21, 2022 Author Share Posted October 21, 2022 (edited) I changed out the sata cable. No change. I stopped docker and ran the file transfer above and when it hung i ran pf auxf and the only thing that has a stat of D is [kworker/u16:5+btrfs-worker] I dont know what this means but hopefully this helps USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND root 2 0.0 0.0 0 0 ? S 15:57 0:00 [kthreadd] root 3 0.0 0.0 0 0 ? I< 15:57 0:00 \_ [rcu_gp] root 4 0.0 0.0 0 0 ? I< 15:57 0:00 \_ [rcu_par_gp] root 5 0.0 0.0 0 0 ? I< 15:57 0:00 \_ [slub_flushwq] root 6 0.0 0.0 0 0 ? I< 15:57 0:00 \_ [netns] root 8 0.0 0.0 0 0 ? I< 15:57 0:00 \_ [kworker/0:0H-events_highpri] root 10 0.0 0.0 0 0 ? I< 15:57 0:00 \_ [kworker/0:1H-kblockd] root 11 0.0 0.0 0 0 ? I< 15:57 0:00 \_ [mm_percpu_wq] root 12 0.0 0.0 0 0 ? I 15:57 0:00 \_ [rcu_tasks_kthread] root 13 0.0 0.0 0 0 ? I 15:57 0:00 \_ [rcu_tasks_rude_kthread] root 14 0.0 0.0 0 0 ? I 15:57 0:00 \_ [rcu_tasks_trace_kthread] root 15 0.0 0.0 0 0 ? S 15:57 0:00 \_ [ksoftirqd/0] root 16 0.1 0.0 0 0 ? I 15:57 0:02 \_ [rcu_preempt] root 17 0.0 0.0 0 0 ? S 15:57 0:00 \_ [migration/0] root 18 0.0 0.0 0 0 ? S 15:57 0:00 \_ [cpuhp/0] root 19 0.0 0.0 0 0 ? S 15:57 0:00 \_ [cpuhp/1] root 20 0.0 0.0 0 0 ? S 15:57 0:00 \_ [migration/1] root 21 0.0 0.0 0 0 ? S 15:57 0:00 \_ [ksoftirqd/1] root 23 0.0 0.0 0 0 ? I< 15:57 0:00 \_ [kworker/1:0H-events_highpri] root 24 0.0 0.0 0 0 ? S 15:57 0:00 \_ [cpuhp/2] root 25 0.0 0.0 0 0 ? S 15:57 0:00 \_ [migration/2] root 26 0.0 0.0 0 0 ? S 15:57 0:00 \_ [ksoftirqd/2] root 28 0.0 0.0 0 0 ? I< 15:57 0:00 \_ [kworker/2:0H-events_highpri] root 29 0.0 0.0 0 0 ? S 15:57 0:00 \_ [cpuhp/3] root 30 0.0 0.0 0 0 ? S 15:57 0:00 \_ [migration/3] root 31 0.0 0.0 0 0 ? S 15:57 0:00 \_ [ksoftirqd/3] root 33 0.0 0.0 0 0 ? I< 15:57 0:00 \_ [kworker/3:0H-events_highpri] root 34 0.0 0.0 0 0 ? S 15:57 0:00 \_ [cpuhp/4] root 35 0.0 0.0 0 0 ? S 15:57 0:00 \_ [migration/4] root 36 0.0 0.0 0 0 ? 
S 15:57 0:00 \_ [ksoftirqd/4] root 37 0.0 0.0 0 0 ? I 15:57 0:00 \_ [kworker/4:0-mm_percpu_wq] root 38 0.0 0.0 0 0 ? I< 15:57 0:00 \_ [kworker/4:0H-events_highpri] root 39 0.0 0.0 0 0 ? S 15:57 0:00 \_ [cpuhp/5] root 40 0.0 0.0 0 0 ? S 15:57 0:00 \_ [migration/5] root 41 0.0 0.0 0 0 ? S 15:57 0:00 \_ [ksoftirqd/5] root 43 0.0 0.0 0 0 ? I< 15:57 0:00 \_ [kworker/5:0H-events_highpri] root 44 0.0 0.0 0 0 ? S 15:57 0:00 \_ [cpuhp/6] root 45 0.0 0.0 0 0 ? S 15:57 0:00 \_ [migration/6] root 46 0.0 0.0 0 0 ? S 15:57 0:00 \_ [ksoftirqd/6] root 48 0.0 0.0 0 0 ? I< 15:57 0:00 \_ [kworker/6:0H-kblockd] root 49 0.0 0.0 0 0 ? S 15:57 0:00 \_ [cpuhp/7] root 50 0.0 0.0 0 0 ? S 15:57 0:00 \_ [migration/7] root 51 0.0 0.0 0 0 ? S 15:57 0:00 \_ [ksoftirqd/7] root 53 0.0 0.0 0 0 ? I< 15:57 0:00 \_ [kworker/7:0H-kblockd] root 54 0.0 0.0 0 0 ? S 15:57 0:00 \_ [kdevtmpfs] root 55 0.0 0.0 0 0 ? I< 15:57 0:00 \_ [inet_frag_wq] root 57 0.0 0.0 0 0 ? I 15:57 0:00 \_ [kworker/1:1-events] root 59 0.0 0.0 0 0 ? S 15:57 0:00 \_ [oom_reaper] root 61 0.0 0.0 0 0 ? I< 15:57 0:00 \_ [writeback] root 62 0.0 0.0 0 0 ? S 15:57 0:00 \_ [kcompactd0] root 63 0.0 0.0 0 0 ? SN 15:57 0:00 \_ [ksmd] root 64 0.0 0.0 0 0 ? SN 15:57 0:00 \_ [khugepaged] root 65 0.0 0.0 0 0 ? I< 15:57 0:00 \_ [kintegrityd] root 66 0.0 0.0 0 0 ? I< 15:57 0:00 \_ [kblockd] root 67 0.0 0.0 0 0 ? I< 15:57 0:00 \_ [blkcg_punt_bio] root 68 0.0 0.0 0 0 ? I< 15:57 0:00 \_ [ata_sff] root 69 0.0 0.0 0 0 ? I< 15:57 0:00 \_ [devfreq_wq] root 71 0.0 0.0 0 0 ? I< 15:57 0:00 \_ [kworker/1:1H-kblockd] root 94 0.2 0.0 0 0 ? I 15:57 0:07 \_ [kworker/u16:4-btrfs-endio-write] root 116 0.0 0.0 0 0 ? S 15:57 0:00 \_ [kswapd0] root 120 0.0 0.0 0 0 ? I< 15:57 0:00 \_ [kthrotld] root 125 0.0 0.0 0 0 ? I 15:57 0:00 \_ [kworker/7:2-rcu_gp] root 171 0.0 0.0 0 0 ? S 15:57 0:00 \_ [xenbus_probe] root 241 0.0 0.0 0 0 ? I< 15:57 0:00 \_ [target_completi] root 242 0.0 0.0 0 0 ? I< 15:57 0:00 \_ [target_submissi] root 243 0.0 0.0 0 0 ? 
I< 15:57 0:00 \_ [xcopy_wq] root 250 0.0 0.0 0 0 ? I< 15:57 0:00 \_ [vfio-irqfd-clea] root 268 0.2 0.0 0 0 ? I 15:57 0:05 \_ [kworker/u16:6-btrfs-endio-write] root 274 0.0 0.0 0 0 ? I< 15:57 0:00 \_ [kworker/2:1H-kblockd] root 335 0.0 0.0 0 0 ? I< 15:57 0:00 \_ [kstrp] root 380 0.0 0.0 0 0 ? I< 15:57 0:00 \_ [kworker/4:1H-kblockd] root 381 0.0 0.0 0 0 ? I< 15:57 0:00 \_ [kworker/3:1H-kblockd] root 385 0.0 0.0 0 0 ? S 15:57 0:00 \_ [scsi_eh_0] root 386 0.0 0.0 0 0 ? I< 15:57 0:00 \_ [scsi_tmf_0] root 388 0.0 0.0 0 0 ? S 15:57 0:00 \_ [usb-storage] root 422 0.0 0.0 0 0 ? I< 15:57 0:00 \_ [kworker/7:1H-events_highpri] root 423 0.0 0.0 0 0 ? I< 15:57 0:00 \_ [kworker/6:1H-events_highpri] root 764 0.0 0.0 0 0 ? I< 15:59 0:00 \_ [acpi_thermal_pm] root 774 0.0 0.0 0 0 ? S 15:59 0:00 \_ [scsi_eh_1] root 775 0.0 0.0 0 0 ? I< 15:59 0:00 \_ [kworker/5:1H-kblockd] root 776 0.0 0.0 0 0 ? I< 15:59 0:00 \_ [scsi_tmf_1] root 777 0.0 0.0 0 0 ? S 15:59 0:00 \_ [scsi_eh_2] root 778 0.0 0.0 0 0 ? I< 15:59 0:00 \_ [scsi_tmf_2] root 779 0.0 0.0 0 0 ? S 15:59 0:00 \_ [scsi_eh_3] root 780 0.0 0.0 0 0 ? I< 15:59 0:00 \_ [scsi_tmf_3] root 781 0.0 0.0 0 0 ? S 15:59 0:00 \_ [scsi_eh_4] root 782 0.0 0.0 0 0 ? I< 15:59 0:00 \_ [scsi_tmf_4] root 784 0.0 0.0 0 0 ? S 15:59 0:00 \_ [scsi_eh_5] root 785 0.0 0.0 0 0 ? I< 15:59 0:00 \_ [scsi_tmf_5] root 786 0.0 0.0 0 0 ? S 15:59 0:00 \_ [scsi_eh_6] root 787 0.0 0.0 0 0 ? I< 15:59 0:00 \_ [scsi_tmf_6] root 794 0.0 0.0 0 0 ? I< 15:59 0:00 \_ [cryptd] root 829 0.0 0.0 0 0 ? I 15:59 0:00 \_ [kworker/4:2-rcu_gp] root 836 0.0 0.0 0 0 ? S 15:59 0:00 \_ [nv_queue] root 837 0.0 0.0 0 0 ? S 15:59 0:00 \_ [nv_queue] root 844 0.0 0.0 0 0 ? S 15:59 0:00 \_ [nvidia-modeset/kthread_q] root 845 0.0 0.0 0 0 ? S 15:59 0:00 \_ [nvidia-modeset/deferred_close_kthread_q] root 987 0.0 0.0 0 0 ? I< 15:59 0:00 \_ [mld] root 988 0.0 0.0 0 0 ? I< 15:59 0:00 \_ [ipv6_addrconf] root 999 0.0 0.0 0 0 ? I< 15:59 0:00 \_ [bond0] root 1024 0.0 0.0 0 0 ? 
I 15:59 0:00 \_ [kworker/3:3-events] root 1136 0.0 0.0 0 0 ? I< 15:59 0:00 \_ [wg-crypt-wg0] root 4581 0.0 0.0 0 0 ? I< 16:00 0:00 \_ [md] root 4582 0.0 0.0 0 0 ? S 16:00 0:00 \_ [mdrecoveryd] root 5163 0.0 0.0 0 0 ? S 16:00 0:00 \_ [unraidd0] root 5164 0.0 0.0 0 0 ? S 16:00 0:00 \_ [unraidd1] root 5165 0.0 0.0 0 0 ? S 16:00 0:00 \_ [unraidd2] root 5166 0.0 0.0 0 0 ? S 16:00 0:00 \_ [unraidd3] root 5199 0.0 0.0 0 0 ? I< 16:00 0:00 \_ [xfsalloc] root 5200 0.0 0.0 0 0 ? I< 16:00 0:00 \_ [xfs_mru_cache] root 5202 0.0 0.0 0 0 ? I< 16:00 0:00 \_ [xfs-buf/md1] root 5203 0.0 0.0 0 0 ? I< 16:00 0:00 \_ [xfs-conv/md1] root 5204 0.0 0.0 0 0 ? I< 16:00 0:00 \_ [xfs-reclaim/md1] root 5205 0.0 0.0 0 0 ? I< 16:00 0:00 \_ [xfs-blockgc/md1] root 5206 0.0 0.0 0 0 ? I< 16:00 0:00 \_ [xfs-inodegc/md1] root 5207 0.0 0.0 0 0 ? I< 16:00 0:00 \_ [xfs-log/md1] root 5208 0.0 0.0 0 0 ? I< 16:00 0:00 \_ [xfs-cil/md1] root 5209 0.0 0.0 0 0 ? S 16:00 0:00 \_ [xfsaild/md1] root 5220 0.0 0.0 0 0 ? I< 16:00 0:00 \_ [xfs-buf/md2] root 5221 0.0 0.0 0 0 ? I< 16:00 0:00 \_ [xfs-conv/md2] root 5222 0.0 0.0 0 0 ? I< 16:00 0:00 \_ [xfs-reclaim/md2] root 5223 0.0 0.0 0 0 ? I< 16:00 0:00 \_ [xfs-blockgc/md2] root 5224 0.0 0.0 0 0 ? I< 16:00 0:00 \_ [xfs-inodegc/md2] root 5225 0.0 0.0 0 0 ? I< 16:00 0:00 \_ [xfs-log/md2] root 5226 0.0 0.0 0 0 ? I< 16:00 0:00 \_ [xfs-cil/md2] root 5227 0.0 0.0 0 0 ? S 16:00 0:00 \_ [xfsaild/md2] root 5238 0.0 0.0 0 0 ? I< 16:00 0:00 \_ [xfs-buf/md3] root 5239 0.0 0.0 0 0 ? I< 16:00 0:00 \_ [xfs-conv/md3] root 5240 0.0 0.0 0 0 ? I< 16:00 0:00 \_ [xfs-reclaim/md3] root 5241 0.0 0.0 0 0 ? I< 16:00 0:00 \_ [xfs-blockgc/md3] root 5242 0.0 0.0 0 0 ? I< 16:00 0:00 \_ [xfs-inodegc/md3] root 5243 0.0 0.0 0 0 ? I< 16:00 0:00 \_ [xfs-log/md3] root 5244 0.0 0.0 0 0 ? I< 16:00 0:00 \_ [xfs-cil/md3] root 5245 0.0 0.0 0 0 ? S 16:00 0:00 \_ [xfsaild/md3] root 5257 0.0 0.0 0 0 ? I< 16:00 0:00 \_ [btrfs-worker] root 5259 0.0 0.0 0 0 ? 
I< 16:00 0:00 \_ [btrfs-worker-hi] root 5260 0.0 0.0 0 0 ? I< 16:00 0:00 \_ [btrfs-delalloc] root 5261 0.0 0.0 0 0 ? I< 16:00 0:00 \_ [btrfs-flush_del] root 5262 0.0 0.0 0 0 ? I< 16:00 0:00 \_ [btrfs-cache] root 5263 0.0 0.0 0 0 ? I< 16:00 0:00 \_ [btrfs-fixup] root 5264 0.0 0.0 0 0 ? I< 16:00 0:00 \_ [btrfs-endio] root 5265 0.0 0.0 0 0 ? I< 16:00 0:00 \_ [btrfs-endio-met] root 5266 0.0 0.0 0 0 ? I< 16:00 0:00 \_ [btrfs-endio-met] root 5267 0.0 0.0 0 0 ? I< 16:00 0:00 \_ [btrfs-endio-rai] root 5268 0.0 0.0 0 0 ? I< 16:00 0:00 \_ [btrfs-rmw] root 5269 0.0 0.0 0 0 ? I< 16:00 0:00 \_ [btrfs-endio-wri] root 5270 0.0 0.0 0 0 ? I< 16:00 0:00 \_ [btrfs-freespace] root 5271 0.0 0.0 0 0 ? I< 16:00 0:00 \_ [btrfs-delayed-m] root 5272 0.0 0.0 0 0 ? I< 16:00 0:00 \_ [btrfs-qgroup-re] root 5273 0.5 0.0 0 0 ? S 16:00 0:12 \_ [btrfs-cleaner] root 5274 0.0 0.0 0 0 ? S 16:00 0:01 \_ [btrfs-transaction] root 7859 0.0 0.0 0 0 ? S 16:01 0:00 \_ [irq/128-nvidia] root 7860 0.0 0.0 0 0 ? S 16:01 0:00 \_ [nvidia] root 7861 0.0 0.0 0 0 ? S 16:01 0:00 \_ [nv_queue] root 8129 0.0 0.0 0 0 ? I< 16:01 0:00 \_ [dio/sdd1] root 9607 0.0 0.0 0 0 ? I 16:02 0:00 \_ [kworker/0:3-mm_percpu_wq] root 9720 0.0 0.0 0 0 ? I< 16:02 0:00 \_ [btrfs-worker] root 9722 0.0 0.0 0 0 ? I< 16:02 0:00 \_ [btrfs-worker-hi] root 9723 0.0 0.0 0 0 ? I< 16:02 0:00 \_ [btrfs-delalloc] root 9724 0.0 0.0 0 0 ? I< 16:02 0:00 \_ [btrfs-flush_del] root 9725 0.0 0.0 0 0 ? I< 16:02 0:00 \_ [btrfs-cache] root 9726 0.0 0.0 0 0 ? I< 16:02 0:00 \_ [btrfs-fixup] root 9727 0.0 0.0 0 0 ? I< 16:02 0:00 \_ [btrfs-endio] root 9728 0.0 0.0 0 0 ? I< 16:02 0:00 \_ [btrfs-endio-met] root 9729 0.0 0.0 0 0 ? I< 16:02 0:00 \_ [btrfs-endio-met] root 9730 0.0 0.0 0 0 ? I< 16:02 0:00 \_ [btrfs-endio-rai] root 9731 0.0 0.0 0 0 ? I< 16:02 0:00 \_ [btrfs-rmw] root 9732 0.0 0.0 0 0 ? I< 16:02 0:00 \_ [btrfs-endio-wri] root 9733 0.0 0.0 0 0 ? I< 16:02 0:00 \_ [btrfs-freespace] root 9734 0.0 0.0 0 0 ? 
I< 16:02 0:00 \_ [btrfs-delayed-m] root 9735 0.0 0.0 0 0 ? I< 16:02 0:00 \_ [btrfs-qgroup-re] root 9767 0.0 0.0 0 0 ? S 16:02 0:00 \_ [btrfs-cleaner] root 9768 0.0 0.0 0 0 ? S 16:02 0:00 \_ [btrfs-transaction] root 12543 0.0 0.0 0 0 ? S 16:02 0:00 \_ [UVM global queue] root 12544 0.0 0.0 0 0 ? S 16:02 0:00 \_ [UVM deferred release queue] root 12546 0.0 0.0 0 0 ? S 16:02 0:00 \_ [UVM Tools Event Queue] root 16555 0.0 0.0 0 0 ? I 16:03 0:00 \_ [kworker/2:4-events] root 16557 0.0 0.0 0 0 ? I 16:03 0:00 \_ [kworker/2:6-rcu_gp] root 27361 0.0 0.0 0 0 ? I 16:09 0:00 \_ [kworker/7:0-events] root 31055 0.0 0.0 0 0 ? I 16:11 0:00 \_ [kworker/0:0-inet_frag_wq] root 31982 0.0 0.0 0 0 ? I 16:11 0:00 \_ [kworker/1:0-events] root 1258 0.0 0.0 0 0 ? I 16:12 0:00 \_ [kworker/3:1-mm_percpu_wq] root 1949 0.0 0.0 0 0 ? I 16:13 0:00 \_ [kworker/6:11-cgroup_destroy] root 1974 0.0 0.0 0 0 ? I 16:13 0:00 \_ [kworker/5:12-cgroup_destroy] root 1975 0.0 0.0 0 0 ? I 16:13 0:00 \_ [kworker/5:13-events] root 29577 0.0 0.0 0 0 ? I 16:27 0:00 \_ [kworker/6:0-mm_percpu_wq] root 29912 0.2 0.0 0 0 ? I 16:27 0:01 \_ [kworker/u16:3-btrfs-endio-write] root 29958 0.2 0.0 0 0 ? I 16:27 0:01 \_ [kworker/u16:7-btrfs-endio-write] root 30289 0.0 0.0 0 0 ? I< 16:28 0:00 \_ [kworker/u17:1-btrfs-worker-high] root 30290 0.0 0.0 0 0 ? I< 16:28 0:00 \_ [kworker/u17:2-btrfs-worker-high] root 30293 0.0 0.0 0 0 ? I< 16:28 0:00 \_ [kworker/u17:5-btrfs-worker-high] root 30331 0.3 0.0 0 0 ? I 16:28 0:02 \_ [kworker/u16:13-events_unbound] root 6970 0.3 0.0 0 0 ? I 16:33 0:01 \_ [kworker/u16:0-bond0] root 6971 0.3 0.0 0 0 ? I 16:33 0:01 \_ [kworker/u16:1-events_unbound] root 8726 0.1 0.0 0 0 ? I 16:34 0:00 \_ [kworker/u16:2-btrfs-worker] root 12431 0.0 0.0 0 0 ? I 16:36 0:00 \_ [kworker/6:1-cgroup_pidlist_destroy] root 12573 0.0 0.0 0 0 ? I 16:36 0:00 \_ [kworker/5:0-events] root 12586 0.0 0.0 0 0 ? I 16:36 0:00 \_ [kworker/5:1-mm_percpu_wq] root 12696 0.0 0.0 0 0 ? 
I 16:36 0:00 \_ [kworker/1:2-mm_percpu_wq] root 12730 0.0 0.0 0 0 ? I 16:36 0:00 \_ [kworker/7:1] root 12739 0.0 0.0 0 0 ? I 16:36 0:00 \_ [kworker/2:0-mm_percpu_wq] root 13085 0.0 0.0 0 0 ? I 16:36 0:00 \_ [kworker/0:1-events] root 13094 0.0 0.0 0 0 ? I 16:36 0:00 \_ [kworker/4:1-rcu_gp] root 13306 0.0 0.0 0 0 ? D 16:36 0:00 \_ [kworker/u16:5+btrfs-worker] root 13860 0.0 0.0 0 0 ? I< 16:37 0:00 \_ [kworker/u17:0-btrfs-worker-high] root 13861 0.0 0.0 0 0 ? I< 16:37 0:00 \_ [kworker/u17:3] root 13862 0.0 0.0 0 0 ? I< 16:37 0:00 \_ [kworker/u17:4] root 1 0.0 0.0 2592 920 ? Ss 15:57 0:00 init root 734 0.0 0.0 18016 3540 ? Ss 15:59 0:00 /sbin/udevd --daemon root 926 0.0 0.0 211388 4168 ? Ssl 15:59 0:00 /usr/sbin/rsyslogd -i /var/run/rsyslogd.pid dhcpcd 1034 0.0 0.0 3032 1976 ? S 15:59 0:00 dhcpcd: br0 [ip4] root 1035 0.0 0.0 3056 2140 ? S 15:59 0:00 \_ dhcpcd: [privileged proxy] br0 [ip4] dhcpcd 1047 0.0 0.0 3044 288 ? S 15:59 0:00 | \_ dhcpcd: [BPF ARP] br0 192.168.1.104 dhcpcd 1087 0.0 0.0 3056 292 ? S 15:59 0:00 | \_ dhcpcd: [network proxy] 192.168.1.104 dhcpcd 1036 0.0 0.0 3032 272 ? S 15:59 0:00 \_ dhcpcd: [control proxy] br0 [ip4] root 1178 0.0 0.0 3184 164 ? Ss 15:59 0:00 /usr/sbin/mcelog --daemon message+ 1188 0.0 0.0 5208 2080 ? Ss 15:59 0:00 /usr/bin/dbus-daemon --system root 1201 0.0 0.0 4272 2416 ? S 15:59 0:00 elogind-daemon ntp 1234 0.0 0.0 74552 4512 ? Ssl 15:59 0:00 /usr/sbin/ntpd -g -u ntp:ntp root 1241 0.0 0.0 2600 104 ? Ss 15:59 0:00 /usr/sbin/acpid root 1255 0.0 0.0 2576 1788 ? Ss 15:59 0:00 /usr/sbin/crond daemon 1259 0.0 0.0 2656 1540 ? Ss 15:59 0:00 /usr/sbin/atd -b 15 -l 1 daemon 2483 0.0 0.0 5148 3080 ? S 16:00 0:00 \_ /usr/sbin/atd -b 15 -l 1 root 2486 0.0 0.0 3980 2992 ? S 16:00 0:00 \_ sh root 2487 0.0 0.0 4344 2756 ? S 16:00 0:00 \_ sh root 2488 0.0 0.0 2772 860 ? 
S 16:00 0:00 \_ inotifywait -q /boot/changes.txt -e move_self,delete_self,modify root 4537 0.0 0.0 2608 916 tty1 Ss+ 16:00 0:00 /sbin/agetty --noclear 38400 tty1 linux root 4538 0.0 0.0 2608 932 tty2 Ss+ 16:00 0:00 /sbin/agetty 38400 tty2 linux root 4539 0.0 0.0 2608 916 tty3 Ss+ 16:00 0:00 /sbin/agetty 38400 tty3 linux root 4540 0.0 0.0 2608 888 tty4 Ss+ 16:00 0:00 /sbin/agetty 38400 tty4 linux root 4541 0.0 0.0 2608 908 tty5 Ss+ 16:00 0:00 /sbin/agetty 38400 tty5 linux root 4542 0.0 0.0 2608 912 tty6 Ss+ 16:00 0:00 /sbin/agetty 38400 tty6 linux root 4574 0.0 0.0 4344 240 ? Ss 16:00 0:00 /usr/sbin/inetd root 4575 0.3 0.0 412916 6640 ? Ssl 16:00 0:06 /usr/local/sbin/emhttpd root 4603 0.0 0.0 142600 2176 ? Ssl 16:00 0:00 /sbin/apcupsd root 4706 0.0 0.0 88096 9984 ? Ss 16:00 0:00 php-fpm: master process (/etc/php-fpm/php-fpm.conf) root 10164 0.1 0.0 91736 14100 ? S 16:35 0:00 \_ php-fpm: pool www root 11758 0.1 0.0 91864 14608 ? S 16:35 0:00 \_ php-fpm: pool www root 12837 0.1 0.0 91672 13516 ? S 16:36 0:00 \_ php-fpm: pool www root 13632 0.1 0.0 91672 13704 ? S 16:37 0:00 \_ php-fpm: pool www root 14014 0.1 0.0 91672 13504 ? S 16:37 0:00 \_ php-fpm: pool www root 14518 0.1 0.0 91672 13516 ? S 16:37 0:00 \_ php-fpm: pool www root 14609 0.1 0.0 91672 13516 ? S 16:37 0:00 \_ php-fpm: pool www root 4919 0.0 0.0 147708 4388 ? Ss 16:00 0:00 nginx: master process /usr/sbin/nginx -c /etc/nginx/nginx.conf root 4920 0.5 0.0 149504 10508 ? S 16:00 0:12 \_ nginx: worker process root 4951 0.9 0.4 11417656 157852 ? SLsl 16:00 0:21 /usr/local/bin/unraid-api/unraid-api /snapshot/api/dist/unraid-api.cjs start root 4972 0.0 0.0 5172 3632 ? S 16:00 0:00 /bin/bash /etc/rc.d/rc.flash_backup watch root 14638 0.0 0.0 2576 872 ? S 16:37 0:00 \_ sleep 60 root 5371 0.0 0.0 91220 29040 ? SL 16:00 0:00 /usr/bin/php -q /usr/local/emhttp/webGui/nchan/notify_poller root 5373 0.0 0.0 91220 29016 ? 
SL 16:00 0:00 /usr/bin/php -q /usr/local/emhttp/webGui/nchan/session_check root 5375 0.0 0.0 91284 29728 ? SL 16:00 0:00 /usr/bin/php -q /usr/local/emhttp/plugins/dynamix.system.temp/nchan/system_temp root 5377 0.1 0.0 91456 29240 ? SL 16:00 0:03 /usr/bin/php -q /usr/local/emhttp/webGui/nchan/device_list root 5379 0.0 0.0 4016 3036 ? S 16:00 0:01 /bin/bash /usr/local/emhttp/webGui/nchan/disk_load root 16043 0.0 0.0 2576 932 ? S 16:38 0:00 \_ sleep 2 root 5381 0.1 0.0 91220 29188 ? SL 16:00 0:02 /usr/bin/php -q /usr/local/emhttp/webGui/nchan/parity_list root 5775 0.0 0.0 91284 29540 ? SL 16:01 0:01 /usr/bin/php -q /usr/local/emhttp/webGui/nchan/wg_poller root 5777 0.0 0.0 91220 29216 ? SL 16:01 0:00 /usr/bin/php -q /usr/local/emhttp/webGui/nchan/update_1 root 5779 0.0 0.0 91416 29616 ? SL 16:01 0:01 /usr/bin/php -q /usr/local/emhttp/webGui/nchan/update_2 root 5781 0.1 0.0 91220 29264 ? SL 16:01 0:03 /usr/bin/php -q /usr/local/emhttp/webGui/nchan/update_3 root 7727 0.0 0.0 140236 304 ? Ssl 16:01 0:00 /usr/local/sbin/shfs /mnt/user0 -disks 14 -o default_permissions,allow_other,noat root 7739 3.1 0.0 693092 21496 ? Ssl 16:01 1:08 /usr/local/sbin/shfs /mnt/user -disks 15 -o default_permissions,allow_other,noati root 9785 0.0 0.0 31864 9560 ? S 16:02 0:00 /usr/sbin/virtlockd -d -f /etc/libvirt/virtlockd.conf -p /var/run/libvirt/virtloc root 9790 0.0 0.0 31936 10460 ? S 16:02 0:00 /usr/sbin/virtlogd -d -f /etc/libvirt/virtlogd.conf -p /var/run/libvirt/virtlogd. root 9809 0.0 0.0 1391124 21664 ? Sl 16:02 0:00 /usr/sbin/libvirtd -d -l -f /etc/libvirt/libvirtd.conf -p /var/run/libvirt/libvir nobody 9918 0.0 0.0 7484 3096 ? S 16:02 0:00 /usr/sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/default.conf --leasefile-r root 9919 0.0 0.0 7352 332 ? S 16:02 0:00 \_ /usr/sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/default.conf --leasefi root 4564 0.0 0.0 12324 5000 ? 
Sl 16:32 0:00 /usr/bin/ttyd -d0 -t disableLeaveAlert true -t theme {'background':'black'} -t fo root 4569 0.0 0.0 7604 4308 pts/0 Ss 16:32 0:00 \_ bash --login root 16077 0.0 0.0 5172 3004 pts/0 R+ 16:38 0:00 | \_ ps auxf root 14384 0.0 0.0 7604 4240 pts/1 Ss 16:37 0:00 \_ bash --login root 14460 0.1 0.0 5580 3608 pts/1 S+ 16:37 0:00 \_ top root 12001 0.0 0.0 79008 13516 ? Ss 16:36 0:00 /usr/sbin/smbd -D root 12006 0.0 0.0 77216 9116 ? S 16:36 0:00 \_ /usr/sbin/smbd -D root 12007 0.0 0.0 77208 4352 ? S 16:36 0:00 \_ /usr/sbin/smbd -D csmashe 14228 40.1 0.0 80940 17064 ? S 16:37 0:30 \_ /usr/sbin/smbd -D root 12014 0.0 0.0 2588 2004 ? Ss 16:36 0:00 /usr/sbin/wsdd2 -d root 12017 0.0 0.0 75032 11940 ? Ss 16:36 0:00 /usr/sbin/winbindd -D root 12020 0.0 0.0 75232 12792 ? S 16:36 0:00 \_ /usr/sbin/winbindd -D root 13492 0.0 0.0 75860 10448 ? S 16:37 0:00 \_ /usr/sbin/winbindd -D avahi 12049 0.0 0.0 5640 3404 ? S 16:36 0:00 avahi-daemon: running [SmasheNas.local] avahi 12052 0.0 0.0 5344 268 ? S 16:36 0:00 \_ avahi-daemon: chroot helper root 12060 0.0 0.0 2676 120 ? S 16:36 0:00 /usr/sbin/avahi-dnsconfd -D Edited October 21, 2022 by chris smashe Quote Link to comment
chris smashe Posted October 22, 2022 Author Share Posted October 22, 2022 So it is definitely an issue with the cache drive or btrfs. I ran iotop, and here is a video of it running during a transfer with Deluge active. A good portion of the time the btrfs worker is at 99% IO but lists no reads or writes. At 31 seconds it is the only thing showing (I used the -o flag to show only processes with IO), since it appears to be blocking everything else from running and causing the wait. In my tests I have also seen nothing listed at all for a couple of seconds, though that is not shown in this video. An iostat image is also attached. My next steps are to swap the drive for another one and, if that does not work, to convert the drive from btrfs to XFS and see if that helps. That will be later, so if anyone has ideas before I start, they would be helpful. Quote Link to comment
Solution chris smashe Posted October 22, 2022 Author Solution Share Posted October 22, 2022

I have fixed the problem, I think. So far, so good. I wanted to try everything before updating the hardware, so I found and flashed the latest available BIOS for my motherboard. I still see iowait climb into the 50s in certain situations, but so far nothing I have thrown at it has locked up the server while waiting on IO. In most of my tests I could still transfer my test file at full speed with no degradation even when iowait was reported. The two situations that did slow the transfer were Tdarr transcoding multiple files (3 at a time, plus up to 1 copy running) and Plex detecting intros; both dropped my transfer speed by about 20% at most, but it no longer drops to 0 and locks up the server until the IO catches up. I would like little to no iowait, but I will take a server that always responds without issue as a win for now.

While trying to figure this out over the past 2 days, I found multiple posts from people having the same issue, but none of them have solutions, so I want to post what helped me here.

First, update your BIOS if an update is available. Something about the old BIOS my server was running and btrfs did not work well together. This fixed my issue, so save yourself 2 days and try this first.

If that does not fix your issue, here are the things you can do to diagnose what is causing it.

First, install Glances from Apps. It is an easy way to view iowait and the tasks using your CPU, and you can watch/log iowait while you try different scenarios. You can get the same information with the top command, but Glances makes it easier to view and logs the last 9 warning/critical issues.
Next, run iostat with the following command:

iostat -hymx 1 4

This gives output like the screenshot below. Look for any disk at 100% utilization, as I had; if you find one, that is the disk having issues.

Next, install DiskSpeed from Apps and run the benchmark on that drive. In my case the drive looked fine when benchmarked with everything else turned off, which helped me rule out the disk (somewhat, since I still suspected a disk issue).

Lastly, install NerdTools, and then inside NerdTools install iotop. After installing it, run:

iotop -o

This lists, in real time, only the tasks that are doing IO. Keep it open and do the things that drive your iowait up to see which task is causing the problem. In my case it was the btrfs worker (shown in the YouTube video linked in this thread), so if the BIOS update had not worked, my plan was to swap out the cache drive and convert it from btrfs to XFS as a last resort.

If iotop shows "unavailable" instead of percentages, run the following command and then start iotop again (it may still show unavailable for a minute before the numbers appear):

echo 1 > /proc/sys/kernel/task_delayacct

Hopefully this helps anyone else having this issue. If it starts happening again, I will reopen this thread. 2 Quote Link to comment
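If iostat is not installed, the 100%-busy check its %util column provides can be approximated straight from /proc/diskstats. A rough sketch, assuming a standard Linux /proc layout (field 13 is the milliseconds each device spent doing IO):

```shell
# Approximate iostat's %util per device: sample /proc/diskstats twice
# one second apart; the delta of field 13 (ms spent doing IO) divided
# by the wall-clock interval is the device's busy fraction.
disk_util() {
    interval=1
    s1=$(awk '{print $3, $13}' /proc/diskstats)
    sleep "$interval"
    s2=$(awk '{print $3, $13}' /proc/diskstats)
    awk -v ms=$((interval * 1000)) '
        /^---$/      { second = 1; next }      # separator between samples
        !second      { busy[$1] = $2; next }   # first sample: remember baseline
        ($1 in busy) { printf "%-12s %5.1f%%\n", $1, 100 * ($2 - busy[$1]) / ms }
    ' <<EOF
$s1
---
$s2
EOF
}

disk_util
```

A device pinned near 100% here while everything else idles is the same signal the iostat screenshot above shows.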
huapi Posted December 3, 2022 Share Posted December 3, 2022 On 10/23/2022 at 4:57 AM, chris smashe said: [quoted solution post above, trimmed] hi chris, how do you run iotop and iostat on Unraid? I can't use yum to install these commands. Thanks. Quote Link to comment
chris smashe Posted December 3, 2022 Author Share Posted December 3, 2022 42 minutes ago, huapi said: how do you run iotop and iostat on Unraid? Apps > NerdTools. After that is installed, go to Settings > NerdTools and you can install them from there. Quote Link to comment
damndaewoo Posted March 10 Share Posted March 10 Thanks for all the work you put into this, Chris. I recently converted my old AMD FX-4300 Black based Unraid server to an i7-6700K based platform with 32GB RAM. I added a Samsung SATA SSD, a Crucial NVMe drive, and an LSI HBA to attach my stack of spinning rust. I had been struggling with iowait issues ever since first booting the new platform and had been struggling to diagnose them. After reading your post, I moved all my appdata and VMs off my NVMe drive, converted it from btrfs to XFS, then moved everything back. Now my iowait issues seem to be solved. I occasionally see 5-6% iowait in top, but only when doing something file-intensive like a Plex library scan. Thanks for pointing me in the right direction. Quote Link to comment
shpitz461 Posted March 11 Share Posted March 11 I've also seen this video about the IO Wait issue: Quote Link to comment