Transfers stalling, two most common remedies not working


klomasdo


Hi,

 

Migrated all of the virtualization and most of the storage to a brand new unRAID 6.8.2 installation during the weekend. Four 8TB drives are connected to the internal SATA-ports on a Supermicro server, three data, one parity.

 

Dockerization went well.

 

Trouble started when feeding the array from external USB drives. After copying about 8-11 GB the transfer starts stalling, and if not stopped quickly enough the whole system will grind to a halt for some time. After a while it will then heal itself. Tried several times with the same result. Copying over the network also ends the same way.

 

Since the docker containers paused for a while as well, a lot of things go out of sync and weird after-effects show up. Not good.

 

Things I tried:

  • "tunable md_write_method: reconstruct write" - no difference
  • write cache via the hdparm command - all drives already had write caching enabled; re-enabled anyway for good measure - no difference
  • going through the BIOS settings for any apparent misconfigurations - none found as far as I can see
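For reference, the write-cache check in the second bullet would be along these lines (the device name /dev/sdb is an example; run it against each array drive):

```shell
# Show whether the drive's volatile write cache is currently enabled
hdparm -W /dev/sdb

# Enable the write cache explicitly, for good measure
hdparm -W1 /dev/sdb
```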

 

Thing to try:

  • Tomorrow I expect to get hold of a Perc-card just to see if there is anything funny going on with the internal SATA-ports.


Currently feeding the array with "rsync --bwlimit=30M ..." to keep things going for now. That has been stable for some 20 hours, but limiting transfer rates still should not be a requirement.
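The actual rsync paths were elided above; a representative form of the throttled copy (source and destination here are placeholders, not the real paths) would be:

```shell
# Cap throughput at ~30 MB/s so the array never reaches the stall point
# (paths are placeholders for illustration)
rsync -a --progress --bwlimit=30M /mnt/disks/usb-drive/ /mnt/user/archive/
```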

 

Any ideas? Diagnostics zip-file attached.

 

 

Thanks in advance...

unraid02-diagnostics-20200210-2253-v6-anon.zip


The setting for Reconstruct write has been restored as it was from the beginning.

 

Those disk cache settings are a clear improvement, in that the server does not completely choke any more. It might go on for as long as 20GB at full speed now before slowing down. Then it fluctuates more or less wildly between 15 and 60 MB/sec. Other read/write operations will crawl and the response times from various docker containers are quite choppy, but they are still working.

 

Thanks very much for the hints. Will be interesting to see tomorrow if the Perc-card can do any more improvements.


Short update:

 

  • The Perc card was a complete fail; it refused to present more than two drives, so back to internal SATA again.
  • Used the downtime to upgrade BIOS as well, no noticeable difference.

 

Guess I'll just sit back and think things over for a while, that usually helps for some strange reason...


If left to its own devices it will do about 40 MB/sec*, fluctuating 10 up or down. I've seen it stay around 20-25 MB/sec for a few minutes before going back up again. At least the dockers keep running fairly well. There is some choppiness in the response times, though.

 

* MegaBytes per second (just for clarity)

On 2/11/2020 at 3:31 PM, klomasdo said:

Trouble started when feeding the array from external USB-drives.

Have you checked whether your external USB drive controller is overheating?

 

An overheating controller can throttle itself down, causing high IO wait, which manifests as lower speed / choppiness.

A dying controller will also do the same thing.

 

If your external drive is not branded (i.e. a 3rd-party enclosure + an internal 3.5" HDD), it would be best to just take the drive out and plug it into the server via SATA.
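One way to check the drive's own temperature sensor from the shell, assuming smartmontools is installed (the device name is an example; USB enclosures often need the SAT passthrough flag):

```shell
# Query SMART attributes through the USB bridge and show the temperature line
smartctl -d sat -A /dev/sdd | grep -i temperature
```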


Well... one learns new things every day. Also worth mentioning is that the same behavior happens when copying over the network.

 

The external USB drive is a brand new "WD My Book WDBBGB0080HBK", and what I can see remotely is this:

  • USB-drive is 41 degrees C (from unassigned devices tab)
  • Internal drives average is 31 degrees C (from array devices tab)
  • Room temperature is 13 degrees C (unfurnished basement)

 

3 weeks later...

An update is in order, I think...

 

Perc card reinstalled, and now only the parity disk is connected to it, to spread out the bus utilization. The system feels a little bit snappier in response. With the settings provided by @PeteAsking this installation is now usable, although still a bit sluggish at times.

 

Then, after some other have-to-do-things, I finally got around to convert the old Zabbix database. Thank you very much for quickly providing that @ich777.

 

 

[attached screenshot: Zabbix monitoring graph]

 

I can see quite high IO wait, which could explain the sluggishness. For the last few days it usually hovers around 10%, with a 6-hour plateau around 40% one night. This feels a bit too high...
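One way to put a number on that IO wait from the shell, assuming the sysstat package is available (sample count and interval are arbitrary):

```shell
# Overall CPU view: %iowait is the share of time the CPUs sat idle
# waiting on outstanding disk I/O
iostat -c 5 3

# Extended per-device view: a high await or %util on one disk
# singles out the bottleneck device
iostat -dx 5 3
```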

 

 


Hi there,

 

I'm not entirely sure what the expectation is here.  Copying over a network or locally to a parity-protected array will have a bottleneck.  Write performance to an array using the normal read/modify/write method is typically between 30-60 MB/s.  With reconstruct write, performance can be higher, but it's dependent on the hardware you use.  For example, if you mix and match drives of different performance, you'll tend to see reconstruct write cap out at the speed of the slowest drives in the array.  To improve performance when copying to the array, you can configure a cache pool.
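The read/modify/write cost can be sketched with the parity arithmetic itself. In this toy shell example, single bytes stand in for whole sectors (all values are made up): updating one sector costs a read of the old data and old parity plus two writes, because the new parity is old_parity XOR old_data XOR new_data.

```shell
# Toy model: one data byte per disk instead of whole sectors
old_data=$(( 0xA5 ))      # sector being overwritten
new_data=$(( 0x3C ))      # incoming data
other=$(( 0x0F ))         # corresponding sector on the other data disk

# Parity before the write is the XOR of all data disks
old_parity=$(( old_data ^ other ))

# Read/modify/write: derive the new parity from old data and old parity,
# without touching the other data disk at all
new_parity=$(( old_parity ^ old_data ^ new_data ))

# Sanity check: parity must again equal the XOR of all current data
printf 'new parity: 0x%02X\n' "$new_parity"     # → new parity: 0x33
[ "$new_parity" -eq $(( new_data ^ other )) ] && echo 'parity consistent'
```

This is why a single write to the array costs two reads and two writes on spinning disks, which is roughly where the 30-60 MB/s figure comes from.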

 

What's hard to diagnose is when the complaint is "I am seeing high IO wait".  Well, did the copy / transfer finish or not?  What was the average speed?  If you're in the 30-60 MB/s range, all is normal.  If you're expecting higher, I would need to understand why you think the transfer should be going faster.

 

As of your last post on 2/28, you mention that the system feels snappier.  What is the average write speed to the array?  Focusing on CPU iowait is probably not worth the time.  Focusing on read/write speed to the array both locally and over the network is what's important here.


Hi @jonp,

 

Expectations are quite simple

  • nice docker management (check, nice work on that).
  • smooth storage for 'slow media' that also can take occasional periods of heavy duty usage (not impressed).

 

Writing to an array with simple parity calculations will of course have some impact, but with a decent CPU and quite a lot of memory the resulting throughput is way lower than it should be for such simple operations. Transfers DO finish, and after the tips given in the thread they run stably without choking the whole system as they initially did. During some testing yesterday evening it went down to a stable 15 MB per second for some unknown reason.

 

All drives are the same: WD Red 8TB 5400 rpm (various ages). They have in the past been well capable of sustained transfers at 1-gigabit wire speed for hours and days on end.

 

Did some comparisons with the colleague who introduced me to unRAID. He has the same hardware spec on two rigs; one has a way better CPU and the other is quite weak on the CPU front. It clearly shows that both of those rigs get what can be expected. For example, he is currently preparing a backup system and has been copying at wire speed for a couple of days now...

 

 

 

Regarding IO wait: high IO wait has never been normal on a system that is idle or lightly loaded. I have a steady average of around 10% when idle and way more under load. My colleague sits closer to 0.1% when idle and, of course, higher under load. We can't figure out why, but there it is.

 

The snappier feeling was likely an effect of a newly rebooted system, since that went back to normal after a while.

 

 

 

After browsing the forum for a while I can see other recent posts mentioning performance issues. In summary, unRAID performance is either good or it isn't. Guess I'm just having bad luck or something...

 

Since I've spent WAY more time than intended, I'll convert back to VMware. When that is done I'll be requesting full refunds.

 

On 3/9/2020 at 5:01 AM, klomasdo said:

Hi @jonp, [...] Since I've spent WAY more time than intended, I'll convert back to VMware. When that is done I'll be requesting full refunds.

 

The fixes I proposed that you used indicate that you do not have enough RAM in the system, or you have used up all your RAM on other things like dockers or VMs. I note you did not actually post the amount of RAM installed or its usage, but you have likely left too little free for copying files if you want the fastest NAS in the world. I would suggest something like 16GB of free RAM if you want speed to be super fast.

 

Pete

On 3/9/2020 at 5:01 AM, klomasdo said:

Did some comparisons with the colleague who introduced me to unRAID. He has the same hardware spec on two rigs, one has way better CPU and the other is quite weak on the CPU front. It clearly shows that both those two rigs get what can be expected. For example he is currently preparing a backup system and has been copying at wire speed for a couple of days now...

Does he have a cache disk or cache pool?


Hi there,

 

Sorry to hear that performance wasn't meeting your expectations within your setup.  There are a few things you mentioned though that I think need to be addressed:

 

On 3/9/2020 at 4:01 AM, klomasdo said:

Writing to an array with simple parity calculations will of course have some impact, but with a decent CPU and quite a lot of memory the resulting throughput is way lower than it should be for such simple operations

 

Let's start with the bolded part.  What "should be" the right speed for these operations?  I'm trying to figure out what your actual expectations are here for performance on average.  You've mentioned that you've been able to attain speeds on average of 40MB/s to the array (that is on par with the average), but not what you are expecting speeds to be.  What's important to note is you'll never saturate your 1gbps link writing directly to the array.  Only the cache pool can do that.

 

On 3/9/2020 at 4:01 AM, klomasdo said:

Transfers DO finish and after the tips given in the thread they run stable without choking the whole system as it initially did. During some testing yesterday evening it went down to (stable) 15MB per second for some unknown reason.

 

OK, was this just a temporary slowdown that then picked back up?  What was the average write speed during this transfer?  Copies can ebb and flow a bit depending on the setup, so that's not entirely unusual.

 

On 3/9/2020 at 4:01 AM, klomasdo said:

Did some comparisons with the colleague who introduced me to unRAID. He has the same hardware spec on two rigs, one has way better CPU and the other is quite weak on the CPU front. It clearly shows that both those two rigs get what can be expected. For example he is currently preparing a backup system and has been copying at wire speed for a couple of days now...

If he is copying data directly to the array, the only possible way he can get near 1gbps performance is if he has reconstruct-write turned on.  Without that setting, it is impossible to sustain a 1gbps network write speed for gobs and gobs of data.  The penalty for parity calculations on dedicated disks is too great for that.  Expected write performance to the array can vary based on hardware, but most modern setups with ideal configurations can attain between 40-60 MB/s to the array directly using the standard read/modify/write method.

 

On 3/9/2020 at 4:01 AM, klomasdo said:

After browsing the forum for a while I can see other posts lately mentioning performance issues. In summary unRAID performance is either good or not. Guess I'm just having bad luck or something...

Of course you'll find some posts from users talking about performance issues.  That's probably the most common topic you'll find in ANY forum for a server solution like ours (I bet the FreeNAS forums are loaded with this too).  The simple truth is that the overwhelming majority of our users are getting expected performance, and it's only outliers like yourself that have these kinds of problems.

 

If you're in a situation where you're able to break the array configuration for testing purposes, here's what I would do to narrow down the root cause of the issue:

 

1)  Reset your array configuration (from the Tools > New Config page).  This will reset all drive assignments back to null.

2)  Add one parity and two data disks to the array that are attached to a standard HBA storage controller (no USB/addon controllers)

3)  Test write operation performance to the disks.

 

If you're not able to attain 30-60 MB/s to the array with that setup, I'd be shocked.  Next we'd have to test a direct wired connection from your server to another machine to rule out a network-specific issue.  If you do get good speeds with this test, then you can slowly begin to add drives back to the system and retest until you discover a problem.  That's the best way to narrow down issues like these.
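For step 3, a simple direct write test gives a clean number, assuming a disk mount at /mnt/disk1 (filename and size are arbitrary; oflag=direct bypasses the RAM page cache so the result reflects the disks, not memory):

```shell
# Write 4 GiB of zeros straight to the first array disk and report throughput
dd if=/dev/zero of=/mnt/disk1/speedtest.bin bs=1M count=4096 oflag=direct

# Remove the test file afterwards
rm /mnt/disk1/speedtest.bin
```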

 

Oh, and it's also worth mentioning that mixing USB-attached and SATA-attached storage devices in an array could also be the cause of some of your issues.

On 3/9/2020 at 5:01 AM, klomasdo said:

Did some comparisons with the colleague who introduced me to unRAID. He has the same hardware spec on two rigs, one has way better CPU and the other is quite weak on the CPU front. It clearly shows that both those two rigs get what can be expected. For example he is currently preparing a backup system and has been copying at wire speed for a couple of days now...

CPU usually has little or no effect, it is mostly going to be the disks and how they are used.

 

If you have cache, then you can write at whatever speed cache will support, since parity isn't involved.

 

In addition to using cache, another possibility is to simply not have a parity disk. If you write to a disk in the parity array and you have a parity disk, you will get lower speed than a simple write to a single disk, because parity has to be updated.

 

See here:

 

 


Hi all,

 

This post is a summary response to the latest posts.

 

My colleague's servers are pure array, no cache pools. Reconstruct write is not activated, so 1 gbit is apparently possible, since some people can do that.

 

 

The array has no USB-drives involved. USB-drives have only been used for tests and moving data back and forth.

 

Regarding further tests, I'll have to cut my losses at this time. I've already spent way too much time on this and it has made too big an impact on other commitments. When all data has been exported, the server will be reverted to the old system again.

19 minutes ago, klomasdo said:

Reconstruct write is not activated, so 1 gbit is apparently possible, since some people can do that.

I don't know of any that get that speed, with or without reconstruct write, with commonly used HDDs. If you read that link you can see why writes to the parity array cannot be as fast as writes to a single disk.

 

Write speed isn't the primary focus of the Unraid design. Unlike RAID, there is no striping, but there is parity, which has to be maintained in addition to any write to a data disk.

 

But, the fact that there is no striping means each data disk can be read independently on any Linux, and different sized disks can be used, and drives can be easily replaced or added without rebuilding the entire array.

 

Storage in the parity array is often write-once, read-many, archived data and media files. Cache pool allows for faster access and redundancy for those other cases where that is needed.

1 minute ago, trurl said:

I don't know of any that get that speed, with or without reconstruct write, with commonly used HDDs. If you read that link you can see why writes to the parity array cannot be as fast as writes to a single disk.

It is always possible he is running without parity in which case the speed WOULD be attainable?

2 minutes ago, itimpi said:

It is always possible he is running without parity in which case the speed WOULD be attainable?

Yes, I wouldn't be surprised if that is what he is comparing to. He didn't specifically mention that, just said no cache.

On 3/10/2020 at 8:13 PM, trurl said:

another possibility is to simply not have a parity disk.

And we often recommend doing the initial data load before adding the parity disk.

On 3/9/2020 at 5:01 AM, klomasdo said:

For example he is currently preparing a backup system and has been copying at wire speed for a couple of days now...

  

 

1 month later...

Hi,

 

The server has been running VMware again for a few weeks now. Everything is running as it used to, and no more unstable performance in sight.

 

I'm not likely to use this software again, so I used the contact form to ask the unRAID team about refunds. No response at all after some two to three weeks.

 

Is there anyone in particular that I should contact?


Thanks in advance...

 
