CPU_IOWAIT


Michal


I've been searching for a solution to this problem for weeks without success, so hopefully this post will help.

I have a new server, about a month old now:

ASUS Z12PE-D16 WS, Xeon E5-2697 v4 @ 2.3 GHz, 128 GB RAM @ 2133 MHz. Not exactly a budget build. 16 x 6 TB drives in the array so far.

During copies from my old server over the network, everything runs fine at 60+ MB/s (the in-and-out limit of the PC doing the transfer). Then suddenly everything drops to 1 MB/s, and access to all files on the array and the entire web interface grinds to a halt. Sometimes it runs at full speed for hours and hours before this happens, and sometimes only for minutes.
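To pin down exactly when the slowdown starts (and line it up with the iowait graph), it can help to log the cumulative bytes copied at a fixed interval and flag the intervals that fall below a threshold. A minimal Python sketch, assuming you already collect (timestamp, bytes) samples from some source such as polling the destination file's size; `find_stalls` and the 5 MB/s threshold are hypothetical choices, not part of any tool mentioned in this thread:

```python
from typing import List, Tuple

def find_stalls(samples: List[Tuple[float, int]],
                threshold_mb_s: float = 5.0) -> List[Tuple[float, float, float]]:
    """Return (start_s, end_s, rate_mb_s) for every interval below the threshold.

    samples: (timestamp in seconds, cumulative bytes copied), sorted by time.
    Rates use decimal megabytes (1 MB = 1,000,000 bytes).
    """
    stalls = []
    for (t0, b0), (t1, b1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue  # skip duplicate or out-of-order timestamps
        rate = (b1 - b0) / dt / 1_000_000
        if rate < threshold_mb_s:
            stalls.append((t0, t1, rate))
    return stalls

# Hypothetical log: two 10 s intervals at 60 MB/s, then one at 1 MB/s.
log = [(0, 0), (10, 600_000_000), (20, 1_200_000_000), (30, 1_210_000_000)]
print(find_stalls(log))  # prints [(20, 30, 1.0)]
```

Only the last interval is reported, so the timestamp of a stall can be matched against the iowait history.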


I've just finished a parity check at 130+ MB/s, and I can read from each drive at a minimum of 150 MB/s.

The only error I'm getting is in the Glances Docker container (screenshot attached), which shows CPU_IOWAIT going through the roof.
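On Linux, iowait is one of the CPU time counters the kernel exposes in /proc/stat. A rough way to check the figure yourself, independently of Glances, is to take two samples of the aggregate `cpu` line and compute the iowait share of the elapsed time. This is a sketch, not how Glances is implemented; the function names are hypothetical:

```python
import time
from typing import List

def parse_cpu_line(line: str) -> List[int]:
    """Parse the aggregate 'cpu' line of /proc/stat into its counters (jiffies)."""
    fields = line.split()
    if fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(x) for x in fields[1:]]

def iowait_percent(before: List[int], after: List[int]) -> float:
    """Percentage of CPU time spent in iowait between two samples."""
    # Order: user nice system idle iowait irq softirq steal guest guest_nice.
    # guest/guest_nice are already counted inside user/nice, so sum only 8 fields.
    deltas = [a - b for a, b in zip(after[:8], before[:8])]
    total = sum(deltas)
    return 100.0 * deltas[4] / total if total > 0 else 0.0

def sample_iowait(interval_s: float = 1.0) -> float:
    """Read /proc/stat twice (Linux only) and return iowait % over the interval."""
    def read() -> List[int]:
        with open("/proc/stat") as f:
            return parse_cpu_line(f.readline())  # first line is the 'cpu' total
    first = read()
    time.sleep(interval_s)
    return iowait_percent(first, read())
```

Calling `sample_iowait(5.0)` while a copy is stalled and again while it runs normally gives two numbers that can be compared directly.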


With the cache drive disabled for the partition: 60+ MB/s copies normally, dropping to 1+ MB/s during the CPU_IOWAIT episodes.

With the cache drive enabled for the partition: 60+ MB/s copies normally, and still 60+ MB/s during the IOWAIT episodes, but the server is very sluggish. That makes me think this is happening all the time and I only notice it when copies are running.

Most drives are 6 TB WD Reds connected directly to the motherboard, including parity; 4 drives are enterprise Seagate SAS drives connected through an LSI 9311-8i.
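Since the drives sit on two different paths (onboard SATA and the LSI HBA), it is worth checking whether a single disk is pegged at 100% busy during a stall while the others idle. `iostat -x` from the sysstat package reports this as %util; the sketch below computes the same idea straight from /proc/diskstats using the "time spent doing I/Os (ms)" counter. The function names and sampling interval are assumptions for illustration:

```python
import time
from typing import Dict

def parse_io_ticks(text: str) -> Dict[str, int]:
    """Map device name -> ms spent doing I/O, from /proc/diskstats content.

    Columns are major, minor, name, then the stat fields; "time spent doing
    I/Os (ms)" is stat field 10, i.e. index 12 after split().
    """
    ticks: Dict[str, int] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 14:
            continue  # skip blank or truncated lines
        ticks[fields[2]] = int(fields[12])
    return ticks

def disk_utilization(before: Dict[str, int], after: Dict[str, int],
                     interval_ms: float) -> Dict[str, float]:
    """Approximate % of the interval each device was busy (like iostat's %util)."""
    return {
        dev: min(100.0, 100.0 * (after[dev] - before[dev]) / interval_ms)
        for dev in after
        if dev in before
    }

def sample_disk_utilization(interval_s: float = 1.0) -> Dict[str, float]:
    """Read /proc/diskstats twice (Linux only) and return %util per device."""
    with open("/proc/diskstats") as f:
        first = parse_io_ticks(f.read())
    time.sleep(interval_s)
    with open("/proc/diskstats") as f:
        second = parse_io_ticks(f.read())
    return disk_utilization(first, second, interval_s * 1000.0)
```

Sorting the result of `sample_disk_utilization(5.0)` by value during a stall shows which device (and therefore which controller) is saturated.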


Any thoughts? This is driving me crazy!

Thanks

CPU_Wait.PNG

tower-diagnostics-20170207-1917.zip


Unrelated, but how did you set up your cache? I have a similar build, and when I enable the cache on a share that I'm copying multiple TBs of data to, the cache fills up and things go wonky. It seems like the Mover process gets stuck and never finishes, and then things bog down. For now I have the cache disabled. I have the same LSI card and am only getting 30 MB/s in, but my drives are not as good.
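A quick pre-flight check before a multi-TB copy onto a cached share: if the data is larger than the free space on the cache, the cache will fill before the copy finishes, which is the situation described above. A minimal sketch using Python's `shutil.disk_usage`; the mount point `/mnt/cache` and the 50 GB reserve are assumptions, so adjust them to your setup:

```python
import shutil

def cache_free_bytes(path: str) -> int:
    """Free bytes on the filesystem that contains path (e.g. a cache mount)."""
    return shutil.disk_usage(path).free

def fits_in_cache(transfer_bytes: int, free_bytes: int,
                  reserve_bytes: int = 0) -> bool:
    """True if the transfer fits while still leaving reserve_bytes free."""
    return transfer_bytes + reserve_bytes <= free_bytes
```

For example, `fits_in_cache(2 * 10**12, cache_free_bytes("/mnt/cache"), reserve_bytes=50 * 10**9)` returns False on any cache with less than 2.05 TB free, which suggests copying that batch with the cache disabled or splitting it up.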


I don't know how to interpret the "Warning or critical alerts" section, but assuming the information in the top right-hand corner is based on the output of top, I don't see a problem. The CPU is spending 4.8% of its time waiting for disk I/O, which is not an abnormal figure. A load average of around 8 on a 36-thread machine means it's essentially idle: at any given time there are more processing cores available than processes waiting to be executed.
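The arithmetic behind that reading is load divided by logical CPUs: about 8 / 36 ≈ 0.22 tasks per CPU. One caveat worth knowing: on Linux the load average also counts tasks blocked in uninterruptible disk I/O, so a load that rises while CPU usage stays low can itself be a hint of disk waits. A small sketch using only the standard library (`os.getloadavg` is Unix only):

```python
import os

def load_per_cpu(load: float, cpus: int) -> float:
    """Load average divided by the number of logical CPUs."""
    if cpus <= 0:
        raise ValueError("cpus must be positive")
    return load / cpus

def current_load_per_cpu() -> float:
    """1-minute load average per logical CPU on this machine (Unix only)."""
    one_minute, _, _ = os.getloadavg()
    return load_per_cpu(one_minute, os.cpu_count() or 1)

# Figures from the post: a load of about 8 on 36 logical CPUs.
print(f"{load_per_cpu(8, 36):.2f}")  # prints 0.22
```

Values well below 1.0 per CPU mean the CPUs are not the bottleneck.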


How do you interpret the "Warning or critical alerts"? If it's a history of snapshots from top and the figures are percentages, then they seem quite reasonable for a file server, depending on what it was doing at the time. Perhaps you're using the wrong profile for what is considered critical.

What is the nature of the files you're copying? Many small files will produce much worse write speeds than a few large files because of the significant per-file overhead.
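The per-file overhead argument can be made concrete with a toy cost model in which every file pays a fixed cost (metadata updates, open/close, protocol round trips) on top of the time needed to stream its bytes. The 10 ms overhead and the 6 GB data set below are illustrative assumptions, not measurements from this server:

```python
def effective_throughput_mb_s(total_bytes: int, n_files: int,
                              per_file_overhead_s: float,
                              bandwidth_mb_s: float) -> float:
    """Effective MB/s when each file adds a fixed overhead to the streaming time.

    Toy model: time = n_files * per_file_overhead_s + total_bytes / bandwidth.
    """
    streaming_s = total_bytes / (bandwidth_mb_s * 1_000_000)
    overhead_s = n_files * per_file_overhead_s
    return (total_bytes / 1_000_000) / (streaming_s + overhead_s)

# Hypothetical: 6 GB over a 60 MB/s link, 10 ms of overhead per file.
one_big_file = effective_throughput_mb_s(6_000_000_000, 1, 0.010, 60)
many_small_files = effective_throughput_mb_s(6_000_000_000, 600_000, 0.010, 60)
print(f"{one_big_file:.1f} MB/s vs {many_small_files:.2f} MB/s")
# prints 60.0 MB/s vs 0.98 MB/s
```

With 600,000 files of 10 KB each, the fixed costs (6,000 s) dwarf the 100 s of actual data transfer.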

 

Replying to Michal's original post (2/7/2017 at 8:10 PM):

I know it's an old post, but did you manage to fix the problem? It seems I have the same issue with my server.
