Very Slow System Performance During Write



This is a weird one to me. Basically, whenever I copy a large file to the array, whether across the network or locally (as I was doing when converting some drives to unRAID), overall system performance becomes unusable. Several of my Docker applications become unresponsive during large file transfers.

 

I'm running dual Xeon L5430 CPUs with 24GB of RAM on a Supermicro X7DVL-E motherboard, using an IBM M1015 SAS card cross-flashed to an LSI 9211. All drives are SATA. I have md_num_stripes upped to 3840 and md_sync_window upped to 1152.

 

I realize the array has to do parity calculations, but even as old as this hardware is, I wouldn't think the performance hit from simply copying a large file to the array would be that huge. Is there something I'm missing here?

unraid-diagnostics-20161024-1006.zip


That's an odd one.  The CPUs should be more than sufficient to keep things going.  Does the copying occur at full speed?

 

It seems to depend on the drive, but transfers appear to start out at full speed, about 100MB/s, then slow down. Slower drives bottom out at about 13MB/s, while faster drives hold about 45-80MB/s. It takes about 15-20 seconds to reach the slowdown point; I assume that's when the RAM write cache fills up.
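That ramp-then-slowdown pattern is consistent with the Linux page cache absorbing the first part of the copy before dirty-page writeback throttling kicks in. A rough order-of-magnitude sketch (the 10% dirty ratio and the write rates below are assumptions for illustration, not values taken from this system's diagnostics):

```python
# Rough estimate of how long the RAM write cache can absorb a copy
# before writeback throttling slows the transfer to array speed.
# vm.dirty_ratio = 10% is an assumed kernel setting, not measured here.
ram_gb = 24
dirty_ratio = 0.10            # assumed fraction of RAM allowed to hold dirty pages
network_mb_s = 100            # observed initial transfer speed
array_mb_s = 45               # assumed sustained array write speed

cache_mb = ram_gb * 1024 * dirty_ratio        # dirty-page budget, ~2458 MB
fill_rate = network_mb_s - array_mb_s         # cache grows at the difference
seconds_to_fill = cache_mb / fill_rate        # roughly 45 s

print(f"cache budget: {cache_mb:.0f} MB, fills in ~{seconds_to_fill:.0f} s")
```

With these assumed numbers the cache lasts tens of seconds, the same order of magnitude as the observed 15-20 second ramp; the exact point depends on the actual dirty-ratio settings and disk speeds.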


How is the LSI 9211 generally with unRAID in JBOD?

 

I just joined the forum and this post caught my attention.

 

Anyone else have good or bad feedback?

 

I have been reading conflicting reports from all sorts of sources - I'll start searching here now too.


Those sound like normal write speeds. So the issue is that other server functions hang or go unresponsive while the copy is underway? Is it just some Docker containers, or is the UI unresponsive too? Also, is Docker on a cache drive or on the array?

 

The web UI also becomes really slow, although SSH stays responsive.

 

When I get home from work this evening I'm going to make sure the SAS card is in the correct slot. This motherboard has two physical x8 slots, but one is only electrically x4, so I need to make sure the card is in the electrical x8 slot. One thing that might be slowing the bus down is that this SAS card is a PCIe 2.0 card while the motherboard only supports PCIe 1.1. I don't think that should be a show stopper, since PCIe is backwards compatible, but it would cut the card's maximum throughput.
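The generation and lane-width differences can be put in rough numbers using nominal per-lane PCIe throughput (250 MB/s per lane per direction for gen 1.x, 500 MB/s for gen 2.0, after 8b/10b encoding overhead); a quick sketch:

```python
# Nominal usable bandwidth per PCIe lane, per direction, in MB/s,
# after 8b/10b encoding overhead (gen 1.x and 2.0).
per_lane = {"1.1": 250, "2.0": 500}

def slot_bandwidth(gen: str, lanes: int) -> int:
    """Approximate one-direction bandwidth of a slot in MB/s."""
    return per_lane[gen] * lanes

# The 9211 is a PCIe 2.0 x8 card, but this board tops out at PCIe 1.1.
print(slot_bandwidth("1.1", 4))   # electrical x4 slot: 1000 MB/s
print(slot_bandwidth("1.1", 8))   # electrical x8 slot: 2000 MB/s
print(slot_bandwidth("2.0", 8))   # what the card could do: 4000 MB/s
```

Even the worst case here, ~1 GB/s on the x4 PCIe 1.1 slot, is an order of magnitude above a single ~100 MB/s file copy, so the slot choice alone shouldn't explain the slowdown.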



 

It's been a great card. It was previously in a Windows 10 system where I had been having performance issues with the integrated SATA controller and my SageTV software; moving my drives to this card fixed all of the performance problems I had been having.



At least part of the problem is the tunables you are using. Write performance would have dropped dramatically when you increased md_sync_window too far above md_sync_thresh, a new tunable in 6.2 that defaults to 192. Try setting md_sync_thresh to half of md_sync_window (576), or to about 30 less than md_sync_window (1122), or to values a little above and below that. Whether half of md_sync_window works better or not depends on which disk controllers you are using, so try both half and 30 below.
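To make those candidate values concrete, here is the plain arithmetic for an md_sync_window of 1152 (this is just a calculation sketch, not unRAID code; the values themselves are applied on the Settings > Disk Settings page):

```python
# Candidate md_sync_thresh values relative to the current md_sync_window.
md_sync_window = 1152

half = md_sync_window // 2           # half of the window: 576
near_window = md_sync_window - 30    # about 30 below the window: 1122

print(half, near_window)
```

Which of the two works better is controller-dependent, so it's worth timing a large copy with each.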



Neither setting seems to improve performance. I do wonder if one of my drives is having read issues. I'm going to see if I can get the funds to replace it and see if that improves performance.



AFAIK those settings don't really affect write speed, but they can have a very big influence on parity check speed.



There are a large number of read errors being reported on disk1 in your syslog.  That is going to kill performance.

  • 3 weeks later...

So the last few weeks have been eventful. I had ordered a replacement drive for the one with read errors. That took 2 weeks to get to me. In the meantime I had a different drive fail. I was able to pull a nearly new drive from a different system to replace it.

 

I also suspected that my power supply was too weak. On a warm reboot the system would not detect the full 24GB of installed RAM; only on a cold boot would it see all of it. I was running a dual-12V-rail 520W Seasonic. Supermicro recommends a minimum of 500W for that board, but I now realize that's probably for a minimal configuration. I temporarily hot-wired a different power supply and hooked 4 of the drives up to it. That seemed to improve things, but not by much, so I ended up ordering a Seasonic 850W single-12V-rail PSU.

 

Got the power supply and installed it today. Preliminarily, that appears to have solved my performance problems during heavy write operations. My only guess is that the 520W unit just couldn't keep up with all the drives active during a heavy write. It obviously wasn't tripping the overload protection, but the load might have been high enough to produce dirty power.
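A back-of-the-envelope 12V load estimate supports that guess. Using typical 3.5" HDD datasheet figures (the per-drive currents below are assumptions, not measurements from this system):

```python
# Rough 12V rail load for the array's drives. Per-drive currents are
# typical 3.5" HDD datasheet values (assumed, not measured here).
drives = 9
spinup_amps = 2.0      # 12V surge at spin-up, per drive
active_amps = 0.8      # 12V draw during heavy seeking, per drive

spinup_watts = drives * spinup_amps * 12     # ~216 W surge on 12V
active_watts = drives * active_amps * 12     # ~86 W sustained on 12V

print(f"spin-up surge: ~{spinup_watts:.0f} W, heavy I/O: ~{active_watts:.0f} W")
```

The totals look comfortable for a 520W unit on paper, but a dual-rail design splits its 12V capacity between rails, so a surge concentrated on one rail (plus CPUs and hot FB-DIMMs on the other loads) can sag even when total capacity seems adequate. A single-rail 850W unit sidesteps that.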

 

I'll do more testing tomorrow, but so far it looks like the power supply solved it. I've never had a problem with PSU brownouts before, so this is a good learning experience for me.

  • 4 weeks later...

I spoke too soon; I'm still having the same performance problems. My current motherboard is a Supermicro X7DVL-E, an Intel 5000V based board. The MCH itself doesn't provide any PCIe lanes and only has 2 memory channels; all PCIe goes through the ESB2 (Enterprise South Bridge). I believe the 5000V is an entry-level server chipset not meant for a whole lot of bus traffic, so I assume the ESB2 is the bottleneck.

 

With that assumption I've ordered a Supermicro X7DWE, an Intel 5400B based board. Most of its PCIe lanes come off the MCH and are PCIe 2.0, and it has 4 memory channels instead of 2.

 

Either that theory is right, or the board is defective, or both. Originally I had VMware ESXi running on it with a different SAS controller and seemed to be having controller performance issues that I assumed were the controller itself. Maybe that wasn't true.

 

I'm crossing my fingers that this new motherboard will fix these weird performance issues.



Without looking at the specifics of the board and chipset, my money would still be on defective hardware. Many users on this forum have 15+ drives running on consumer-level boards with CPUs as slow as Sempron 140s (though I imagine those are slowly becoming extinct!) and 2GB of RAM. Obviously a system with those specs is going to be a bit poky, but given the PassMark score (6600) of dual L5430s, plus 24GB of RAM and a server-class mobo, I highly doubt poor performance here is simply a limitation of the hardware you have.

 

You've already swapped out the PS, which is almost always my first guess. Have you since replaced Disk 1 as well as the second drive that failed?



Yes, I swapped out the first drive. With 9 drives total, that one is actually on a separate 6Gb/s SATA controller. Some day I'll get a SAS expander so I can put all the drives on the SAS controller. It's definitely a weird problem.


Ok, I got the new motherboard Monday and was able to install it yesterday while our new laminate floors were being put in. I also took the chance to update the firmware on my LSI card to the latest version and to upgrade the motherboard's BIOS.

 

I had an issue with the 2-port 6Gb RocketRAID controller: a drive that worked fine on the other motherboard wasn't working on it. I moved the drive to the onboard SATA controller and it works, although only at 3Gb. I also had an issue where the system was completely locking up. After running Memtest I came to the conclusion that the memory was overheating; I guess that's the peril of running fully buffered server memory in a desktop case. The AMBs on FB-DIMMs get so dang hot. I ended up building a shroud out of some foam board I had lying around to direct airflow over the memory, and that seems to have fixed the problem. Before the shroud, Memtest didn't get past 20% of a single pass before locking up, about 5 minutes in. After installing the shroud it did eventually lock up, but only after nearly completing an entire pass, over 45 minutes of running. I'm fairly confident the shroud provides enough air guidance to cool the memory for our use case.

 

So, it seems to be working better now. With the old board, watching Dynamix System Stats while copying a large file, I would see "humps" of storage access, both reading and writing. With the new board it's a solid block with relatively minor variation in read and write rates. Performance on the client side is fairly steady, starting out around 90MB/s and then settling to about 65-80MB/s after the cache fills.

 

Still a bit concerned about the memory temps but for now it appears the shroud should be good enough for our purposes.

