Michal Posted February 7, 2017
I've been searching for weeks for how to solve this problem without finding a solution, so hopefully this post will help.

I have a new server, about a month old now: ASUS Z12PE-D16 WS, Xeon E5-2697 v4 @ 2.3 GHz, 128 GB RAM @ 2133 MHz (not exactly a budget build), with 16 x 6 TB drives in the array so far.

During copies from my old server over the network, everything will be going fine at 60+ MB/s (the in-and-out limitation of the PC doing the transfer). Then suddenly everything drops to 1 MB/s, and access to all files on the array and the entire web interface comes to a halt. Copies can go on for hours and hours at full speed before this happens, and sometimes only minutes.

I've just finished a parity check at 130+ MB/s, and I can pull data off each drive at 150+ MB/s. The only error I'm getting as such is in the Glances docker (screenshot attached), which shows CPU_IOWAIT going through the roof.

With the cache drive disabled for the partition: 60+ MB/s copy normally, 1+ MB/s copy during the CPU_IOWAIT issues. With the cache drive enabled for the partition: 60+ MB/s copy normally, and also 60+ MB/s copy during the CPU_IOWAIT issues, but the server is still very sluggish. That makes me think this is happening all the time and I only notice it when copies are happening.

Most drives are WD Red 6TB connected directly to the motherboard, including parity, with 4 drives being enterprise Seagate SAS through an LSI 9311-8I.

Any thoughts? This is driving me crazy!! Thanks

tower-diagnostics-20170207-1917.zip
ninjabucket Posted February 7, 2017
Unrelated, but how did you set up your cache? I have a similar build, and when I enable cache on a share that I am copying multiple TBs of data to, the cache fills up and things go wonky. It seems like the Mover process gets stuck and never finishes, then things bog down. For now I have cache disabled. I have the same LSI and am only getting 30 MB/s in, but my drives are not as good.
John_M Posted February 7, 2017
I don't know how to interpret the "Warning or critical alerts" section, but assuming the information in the top right-hand corner is based on the output of top, I don't see a problem. The CPU is spending 4.8% of its time waiting for disk I/O, which is not an abnormal figure. A load average of around 8 on a 36-core machine means it's essentially idle: at any given time there are more processing cores available than processes waiting to be executed.

How do you interpret the "Warning or critical alerts"? If it's a history of snapshots from top and the figures are percentages, then they seem quite reasonable for a file server, depending on what it was doing at the time. Perhaps you're using the wrong profile for what is considered critical.

What is the nature of the files you're copying? Many small files will produce much worse write speeds than a few large files because of the significant per-file overheads.
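The two checks described above can be sketched in a few lines of Python. This is only an illustration, not something posted in the thread: the helper names `load_per_core` and `iowait_percent` are made up for this example, and the two sample `cpu` lines are invented counters chosen to reproduce the 4.8% figure. The field order (user, nice, system, idle, iowait, irq, softirq, steal) is the one Linux uses for the aggregate `cpu` line in /proc/stat.

```python
def load_per_core(load_average: float, cores: int) -> float:
    """Normalise a load average by core count; below 1.0 means cores are spare."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return load_average / cores


def iowait_percent(before: str, after: str) -> float:
    """Share of CPU time spent in iowait between two aggregate 'cpu' lines
    sampled from /proc/stat (iowait is the fifth counter on that line)."""
    a = [int(x) for x in before.split()[1:]]
    b = [int(x) for x in after.split()[1:]]
    deltas = [y - x for x, y in zip(a, b)]
    # Guest time is already counted inside user/nice, so sum only the first 8.
    total = sum(deltas[:8])
    return 100.0 * deltas[4] / total if total else 0.0


if __name__ == "__main__":
    # Invented sample counters (not taken from the poster's diagnostics).
    first = "cpu 1000 0 500 8000 400 0 100 0 0 0"
    second = "cpu 1100 0 550 8790 448 0 112 0 0 0"
    print(f"load per core: {load_per_core(8.0, 36):.2f}")
    print(f"iowait: {iowait_percent(first, second):.1f}%")
```

On a live Linux system the two lines would come from reading the first line of /proc/stat twice, a few seconds apart.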
gizmer Posted December 11, 2018
On 2/7/2017 at 8:10 PM, Michal said: I've been searching for how to solve this problem for weeks to no solution, so hopefully this post will help. [...]

I know it's an old post, but did you manage to fix the problem? It seems I have the same issue with my server.
Abigel Posted February 23, 2019 (edited)
Me too.
[email protected] Posted May 13, 2020
I've also got this issue.
TiSpork Posted August 6, 2020
If you are only seeing this alert in Glances, it could just be due to Glances' default config. It sets the threshold based on core count, so if you have a higher core count it's going to trigger too often. https://github.com/nicolargo/glances/issues/1214
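A rough illustration of why a threshold tied to core count can fire too often on a big machine. The scaling rule below is hypothetical and is not taken from Glances' source or the linked issue: it simply assumes the alert level is "one core's worth of waiting" expressed as a share of all cores. The function names are made up for this sketch, and the 4.8% and 36-core figures come from earlier in the thread.

```python
def scaled_threshold(cores: int, single_core_percent: float = 100.0) -> float:
    """Hypothetical rule (an assumption, not Glances' actual formula):
    one fully waiting core expressed as a percentage of all cores."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return single_core_percent / cores


def would_alert(iowait_percent: float, cores: int) -> bool:
    """True if the measured iowait exceeds the hypothetical scaled threshold."""
    return iowait_percent > scaled_threshold(cores)


if __name__ == "__main__":
    for cores in (4, 36):
        print(cores, round(scaled_threshold(cores), 2), would_alert(4.8, cores))
```

Under this assumed rule, the same 4.8% iowait that stays far below the limit on a 4-core box is already above it on a 36-core one, which would match the behaviour described in the post.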