High cpu iowait causing unresponsive docker containers



I'm having some trouble with Unraid and I'm hoping someone can help. It is related to high cpu_iowait on the cache pool. I have a share that is used by sabnzbd / sonarr / radarr / plex and contains all my media files. This share has "cache pool" set to "Yes", so the mover will later move the files to the array.

 

The appdata and system shares are both on the same "cache pool" (as the media), set to "Prefer".

 

My array consists entirely of 8TB WD Red drives and my cache pool is a Samsung 860 EVO 2TB. Both are xfs encrypted. I first noticed this issue when my cache pool was 2x 2TB SSD with btrfs encrypted. In other threads, people with a similar issue were advised to try xfs, but that did not fix it. I am still on xfs encrypted for both the array and the cache pool.

 

Sabnzbd maps the host path /mnt/user/plex/usenet to /storage/usenet in the container.
- /storage/usenet/incomplete
- /storage/usenet/completed

 

Sonarr maps the host path /mnt/user/plex/ to /storage in the container.
- /storage/media/TVShows/Someshow

 

So sonarr moves downloaded media from /storage/usenet/completed to /storage/media/TVShows/Someshow, which on the host translates to a move from /mnt/user/plex/usenet/completed to /mnt/user/plex/media/TVShows/Someshow.
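To double-check which host paths an import actually touches, the volume mappings above can be written down and applied mechanically. This is only a sketch: the helper name `to_host_path` and the two dictionaries are made up for illustration, and the mappings are copied from the container settings described in this post.

```python
from pathlib import PurePosixPath

# Volume mappings taken from the post: container path -> host path.
SABNZBD_VOLUMES = {"/storage/usenet": "/mnt/user/plex/usenet"}
SONARR_VOLUMES = {"/storage": "/mnt/user/plex"}


def to_host_path(container_path: str, volumes: dict[str, str]) -> str:
    """Translate a path seen inside a container to the matching host path."""
    path = PurePosixPath(container_path)
    # Try the longest container mount point first so nested mounts win.
    for mount in sorted(volumes, key=len, reverse=True):
        try:
            relative = path.relative_to(mount)
        except ValueError:
            continue  # the path is not under this mount point
        return str(PurePosixPath(volumes[mount]) / relative)
    raise ValueError(f"{container_path} is not inside any mapped volume")


if __name__ == "__main__":
    # Source and destination of a sonarr import, as seen from the host.
    print(to_host_path("/storage/usenet/completed", SONARR_VOLUMES))
    print(to_host_path("/storage/media/TVShows/Someshow", SONARR_VOLUMES))
```

Both results are under /mnt/user/plex, so the download and the import land on the same user share, and with it on the same cache pool.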


When I download a season with sonarr, I notice that after a download completes in sabnzbd and gets imported, `cpu_iowait` goes up to 30% or higher and docker containers start to freeze. Also, when the mover runs and moves all the episodes from the cache to the array, everything freezes and plex direct play stops working as well. cpu_iowait is high in both cases. This is not the only situation in which cpu_iowait goes up. For example, when I write a 13GB file to the same share using samba, the same thing happens.

 

If you are playing a movie on plex that is located on the array, playback stops and starts buffering. Every container becomes really unresponsive.


Progress

 

I have made some progress on my issue. What helped was understanding what iowait means and how it occurs. It happens when something requires I/O from a disk while that disk is under full load from another process, so it has to wait.
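For anyone who wants to put a number on iowait without extra tools, here is a minimal sketch that samples the aggregate "cpu" line of /proc/stat twice and reports the share of CPU time spent in iowait in between. The function names are my own, and the 5 second interval is an arbitrary choice. It assumes the standard Linux field order (user, nice, system, idle, iowait, irq, softirq, steal).

```python
import time


def parse_cpu_line(line: str) -> tuple[int, int]:
    """Return (iowait_ticks, total_ticks) from the aggregate "cpu" line."""
    fields = line.split()
    if fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    # Order: user nice system idle iowait irq softirq steal [guest guest_nice]
    # Guest time is already counted in user, so only sum the first eight.
    values = [int(v) for v in fields[1:9]]
    return values[4], sum(values)


def iowait_percent(before: str, after: str) -> float:
    """Percentage of CPU time spent in iowait between two samples."""
    io0, total0 = parse_cpu_line(before)
    io1, total1 = parse_cpu_line(after)
    elapsed = total1 - total0
    return 100.0 * (io1 - io0) / elapsed if elapsed > 0 else 0.0


def read_cpu_line(path: str = "/proc/stat") -> str:
    """Read the first line of /proc/stat, which is the aggregate cpu line."""
    with open(path) as f:
        return f.readline()


if __name__ == "__main__":
    first = read_cpu_line()
    time.sleep(5)
    second = read_cpu_line()
    print(f"iowait over the last 5 s: {iowait_percent(first, second):.1f}%")
```

Running it from the Unraid terminal while a download is being imported, and again while the system is idle, should show whether the freezes line up with iowait.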
 

This gave me the idea to split my cache pool into two.

  • One cache pool (a) is used for appdata / system / domains (vms)
  • Other cache pool (b) is used for all shares that will eventually get moved to the array, such as plex media / downloads

Doing this has fixed my issue of plex (or other containers) becoming unresponsive when media is downloaded, because it is a totally separate disk.
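To see which disk is the one being saturated, a rough per-disk "busy" percentage can be derived from /proc/diskstats. This is a sketch under assumptions: the function names are hypothetical, and it relies on the documented layout where the 13th whitespace-separated column is the number of milliseconds the device spent doing I/O.

```python
import time


def parse_busy_ms(diskstats_text: str) -> dict[str, int]:
    """Map device name -> cumulative milliseconds spent doing I/O."""
    busy = {}
    for line in diskstats_text.splitlines():
        fields = line.split()
        if len(fields) < 13:
            continue  # skip blank or unexpectedly short lines
        # fields: major minor name, then the per-device counters;
        # index 12 is "time spent doing I/Os (ms)".
        busy[fields[2]] = int(fields[12])
    return busy


def busy_percent(before: dict[str, int], after: dict[str, int],
                 interval_s: float) -> dict[str, float]:
    """Percentage of the interval each device was busy, capped at 100."""
    window_ms = interval_s * 1000.0
    return {
        name: min(100.0, 100.0 * (after[name] - ms) / window_ms)
        for name, ms in before.items()
        if name in after
    }


def read_diskstats(path: str = "/proc/diskstats") -> str:
    with open(path) as f:
        return f.read()


if __name__ == "__main__":
    interval = 5.0  # seconds, arbitrary choice
    first = parse_busy_ms(read_diskstats())
    time.sleep(interval)
    second = parse_busy_ms(read_diskstats())
    for name, pct in sorted(busy_percent(first, second, interval).items()):
        print(f"{name}: {pct:.0f}% busy")
```

If the pool holding appdata sits near 100% busy during a download, that would be consistent with containers stalling on it.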

 

Remaining issue

 

While this has solved some issues, I still have issues with the mover. When files are moved from cache pool (b) to my array, reading anything from the array disk that the data is being moved to just does not work until the mover is finished. I tried to fix this by changing the mover CPU & IO priority with my own plugin (CA Mover Tuning fork). This seemed to help a little, but after the mover finishes I notice high usage across my entire array (updating parity?), which still causes lock-ups (cpu iowait) until that is finished, and this puts load on all disks.
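For reference, lowering the CPU and I/O priority of a running process can be done by hand with the standard `renice` and `ionice` tools. The sketch below is not what the plugin does internally; it only builds and runs those two commands for a given PID, and the function names are made up. Note that, as far as I know, the idle I/O class is only honoured by I/O schedulers that support priorities (such as BFQ), so it may have little effect with other schedulers.

```python
import subprocess


def deprioritize_commands(pid: int, nice: int = 19,
                          io_class: int = 3) -> list[list[str]]:
    """Build the renice and ionice command lines for a running process."""
    if pid <= 0:
        raise ValueError("pid must be positive")
    if not -20 <= nice <= 19:
        raise ValueError("nice must be between -20 and 19")
    if io_class not in (1, 2, 3):
        raise ValueError("io_class must be 1 (realtime), 2 (best-effort) or 3 (idle)")
    return [
        ["renice", "-n", str(nice), "-p", str(pid)],      # lowest CPU priority
        ["ionice", "-c", str(io_class), "-p", str(pid)],  # idle I/O class
    ]


def deprioritize(pid: int) -> None:
    """Apply both priority changes; raises if either command fails."""
    for cmd in deprioritize_commands(pid):
        subprocess.run(cmd, check=True)
```

Calling `deprioritize(pid)` with the PID of the mover process would apply both changes. It does nothing about the parity writes that follow, since those are not issued by that process.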

 

I have already tried both available md_write_method settings. Reconstruct write (aka turbo write) finishes the mover faster, but the parity process afterwards still causes issues.

 

2022_02_12-07_02-chrome_hTEstVAsfC.png.7a24f926dfee5a71e2fb376d359c30c9.png 2022_02_12-07_05-chrome_lhwUXnBDgJ.thumb.png.35648ef6c226451c6fa54120932297b7.png

 

 

On 2/12/2022 at 3:45 PM, AeonLucid said:

This gave me the idea to split my cache pool into two.

  • One cache pool (a) is used for appdata / system / domains (vms)
  • Other cache pool (b) is used for all shares that will eventually get moved to the array, such as plex media / downloads

I have almost the same setup as you, but with 2 x Samsung 870 EVO.

When I download at around 60 MB/s, my docker containers get really slow.

How can I see whether this is iowait?

So the only good solution is to totally separate things? I don't understand this.

With my old Windows Server 2012 R2 this was never a problem.

Why can't my SSDs handle this small load?

On 8/2/2023 at 7:23 AM, DanielPT said:

With my old Windows Server 2012 R2 this was never a problem.

Why can't my SSDs handle this small load?

 

This is essentially my question, too. I was running on a Windows box, had some docker problems, and moved to Unraid. I thought that since docker runs so much better in Linux than in Windows, performance would be better. But it's not, it's much worse. Basically any time any docker container or VM does something intensive, the whole box locks up and I can't even get the web GUI to respond, etc. until it's done. Frustrating.

