CPU_IOWAIT

Michal · February 7, 2017

I've been searching for a solution to this problem for weeks without success, so hopefully this post will help.

I have a new server, about a month old now:

ASUS Z12PE-D16 WS, Xeon E5-2697 v4 @ 2.3 GHz, 128 GB RAM @ 2133 MHz (not exactly a budget build), with 16 x 6 TB drives in the array so far.

During copies from my old server over the network, everything will be going fine at 60+ MB/s (a limitation of the PC doing the transfer, both in and out). Then suddenly everything drops to 1 MB/s, and access to all files on the array and the entire web interface come to a halt. Copies can run at full speed for hours and hours before this happens, and sometimes only for minutes.

I've just finished a parity check at 130+ MB/s, and I can read from each drive at 150 MB/s or more.

The only error I'm getting is in the Glances docker (screenshot attached), which shows CPU_IOWAIT going through the roof.

With the cache drive disabled for the partition: 60+ MB/s copy normally, about 1 MB/s copy during the CPU_IOWAIT issues.

With the cache drive enabled for the partition: 60+ MB/s copy normally, and still 60+ MB/s copy during the CPU_IOWAIT issues, but the server is still very sluggish. That makes me think this is happening all the time and I only notice it when copies are running.

Most drives are WD Red 6 TB connected directly to the motherboard, including parity, with 4 drives being enterprise Seagate SAS connected through an LSI 9311-8I.

Any thoughts? This is driving me crazy!

Thanks

tower-diagnostics-20170207-1917.zip

ninjabucket · February 7, 2017

Unrelated, but how did you set up your cache? I have a similar build, and when I enable cache on a share that I am copying multiple TBs of data to, the cache fills up and things go wonky. It seems like the Mover process gets stuck and never finishes, then things bog down. For now I have cache disabled. I have the same LSI and am only getting 30 MB/s in, but my drives are not as good.

John_M · February 7, 2017

I don't know how to interpret the "Warning or critical alerts" section, but assuming the information in the top right-hand corner is based on the output of the top command, I don't see a problem. The CPU is spending 4.8% of its time waiting for disk I/O, which is not an abnormal figure. A load average of around 8 on a 36-core machine means it's essentially idle: at any given time there are more processing cores available than processes waiting to be executed.
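To put rough numbers on that reasoning, here is a minimal Python sketch. The function names are made up for illustration, the counter order is the one Linux documents for the aggregate "cpu" line of /proc/stat, and the two counter samples are invented purely to show the arithmetic. It is not what top or Glances actually run.

```python
def load_per_core(load_avg, logical_cpus):
    """Average runnable tasks per logical CPU; well below 1.0 means spare capacity."""
    if logical_cpus <= 0:
        raise ValueError("logical_cpus must be positive")
    return load_avg / logical_cpus


def parse_cpu_line(line):
    """Return the first eight counters of the aggregate 'cpu' line from /proc/stat.

    Order: user, nice, system, idle, iowait, irq, softirq, steal.
    """
    fields = line.split()
    if len(fields) < 9 or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line with at least 8 counters")
    return [int(value) for value in fields[1:9]]


def iowait_percent(before, after):
    """Share of CPU time spent in iowait between two samples, as a percentage."""
    deltas = [new - old for old, new in zip(before, after)]
    total = sum(deltas)
    return 100.0 * deltas[4] / total if total > 0 else 0.0


if __name__ == "__main__":
    # Figures from the thread: load average of about 8 on 36 logical CPUs.
    print(f"load per core: {load_per_core(8.0, 36):.2f}")

    # Invented counter samples, chosen only to illustrate the calculation.
    first = parse_cpu_line("cpu 100 0 50 800 40 5 5 0 0 0")
    second = parse_cpu_line("cpu 150 0 70 1700 88 6 6 0 0 0")
    print(f"iowait: {iowait_percent(first, second):.1f}%")
```

With the thread's figures the load works out to roughly 0.22 runnable tasks per logical CPU, which is why it reads as essentially idle. On a live system the two samples would come from reading /proc/stat a few seconds apart.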

How do you interpret the "Warning or critical alerts"? If it's a history of snapshots from top and the figures are percentages, then they seem quite reasonable for a file server, depending on what it was doing at the time. Perhaps you're using the wrong profile for what is considered critical.

What is the nature of the files you're copying? Many small files will produce much worse write speeds than few large files because of the significant overheads per file.
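As a rough illustration of why file count matters, here is a toy Python model. The function name and the per-file overhead of 0.05 s are assumptions picked for the example, not measurements from this server. It simply adds a fixed cost per file to the raw transfer time.

```python
def effective_throughput_mb_s(total_mb, file_count, bandwidth_mb_s, per_file_overhead_s):
    """Effective copy speed when every file adds a fixed overhead.

    per_file_overhead_s is a hypothetical lump sum for open/create/close,
    metadata updates and protocol round trips.
    """
    if total_mb <= 0 or bandwidth_mb_s <= 0:
        raise ValueError("total_mb and bandwidth_mb_s must be positive")
    transfer_time = total_mb / bandwidth_mb_s
    overhead_time = file_count * per_file_overhead_s
    return total_mb / (transfer_time + overhead_time)


if __name__ == "__main__":
    # Same 6000 MB payload over a 60 MB/s link, with an assumed 0.05 s per file.
    for count in (10, 600_000):
        speed = effective_throughput_mb_s(6000, count, 60, 0.05)
        print(f"{count} files: {speed:.2f} MB/s")
```

Under these assumed numbers, ten large files stay close to the 60 MB/s link speed, while the same amount of data split into 600,000 small files falls well below 1 MB/s.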

gizmer · December 11, 2018

I know it's an old post, but did you manage to fix the problem? It seems I have the same issue with my server.

Abigel · February 23, 2019

Me too.

[email protected] · May 13, 2020

I've got this issue too.

TiSpork · August 6, 2020

If you are only seeing this alert in Glances, it could simply be due to the Glances default config. It sets the threshold based on core count, so if you have a higher core count it's going to trigger too often.

https://github.com/nicolargo/glances/issues/1214
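To show how a core-count-based default could produce these alerts, here is a minimal Python sketch. It assumes the critical level is 100% divided by the number of logical CPUs, with the careful and warning levels at 80% and 90% of that value. That rule and the function names are my assumptions for illustration, so check the linked issue and the source of your Glances version for the exact behaviour.

```python
def default_iowait_thresholds(logical_cpus):
    """Assumed default iowait limits (in percent) derived from the CPU count."""
    if logical_cpus <= 0:
        raise ValueError("logical_cpus must be positive")
    bottleneck = 100.0 / logical_cpus  # assumed critical level
    return {
        "careful": 0.8 * bottleneck,
        "warning": 0.9 * bottleneck,
        "critical": bottleneck,
    }


def classify_iowait(iowait_pct, logical_cpus):
    """Return the highest alert level reached, or 'ok' if none is."""
    limits = default_iowait_thresholds(logical_cpus)
    for level in ("critical", "warning", "careful"):
        if iowait_pct >= limits[level]:
            return level
    return "ok"


if __name__ == "__main__":
    # 4.8% iowait is the figure quoted earlier in the thread.
    for cpus in (4, 36):
        limits = default_iowait_thresholds(cpus)
        print(cpus, round(limits["critical"], 2), classify_iowait(4.8, cpus))
```

Under that assumption, the 4.8% iowait mentioned earlier would be flagged as critical on a 36-thread machine (limit of about 2.78%) while counting as fine on a 4-core machine (limit of 25%). Glances reads its thresholds from a configuration file, so setting explicit iowait limits in its CPU section should stop the false alarms. See the Glances documentation for the exact key names.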
