SMB writes stalling at 0 bytes/sec (RAM flush issue?)



I am in the process of moving ~15TB of data to a 28TB array and seem to be seeing some strange behavior. I believe it might be explained by how Linux fundamentally manages memory, but I wanted to double-check before going down a rabbit hole...

 

Setup:

  • Unraid machine has 72GB of RAM and a 10Gbit NIC.
  • Caching is Disabled
  • Parity is Enabled
  • All Disks are Spinning

 

Transfer Methods:

  • Windows Server 2012 Explorer drag and drop from a local disk to an Unraid share (causes the stall)
  • MS Robocopy (works fine unless another machine uses drag and drop and causes a stall)

 

Experienced Behavior:

 

I attempt a copy using MS Explorer via drag and drop or copy and paste. The transfer begins at 150-180MB/sec. This part strikes me as odd because it seems to be the max read speed of the source disk. However, I would expect the write speeds to be closer to ~50MB/sec with parity on and no Unraid cache drive.

 

The only explanation that makes sense is that Unraid is caching my transfer in a slice of the system's 72GB of RAM. This theory is also supported by the array write activity chart, which sits near 0MB/s at the start of the SMB transfer (while the RAM cache is filling).

 

The problem seems to be that once the RAM cache fills, it begins to flush its data to the array at a much slower write speed. This process bogs down the whole system and causes any SMB writes from other machines on the network to drop to 0.

I also see the array write speed chart spike up to maximum write speed. Since all the SMB writes have stalled at that point, this must be the RAM cache trying to flush?
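One way to test the flush theory is to watch the kernel's own counters while a transfer runs. On Linux, `/proc/meminfo` reports `Dirty` (data written into RAM but not yet on disk) and `Writeback` (data currently being flushed), both in kB. If the theory holds, `Dirty` should climb while Explorer shows high speeds, and `Writeback` should rise when the SMB writes stall. This is a minimal sketch, assuming a Python 3 interpreter is available on the server (stock Unraid may not ship one); the function names are hypothetical.

```python
import time
from pathlib import Path

MEMINFO = Path("/proc/meminfo")


def parse_dirty_writeback(text: str) -> dict:
    """Return the Dirty and Writeback counters (in kB) from /proc/meminfo text."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        if key in ("Dirty", "Writeback"):
            # Lines look like "Dirty:            123456 kB"
            values[key] = int(rest.split()[0])
    return values


def watch(interval_s: float = 2.0, samples: int = 30) -> None:
    """Print dirty/writeback page-cache sizes every interval_s seconds (Linux only)."""
    for _ in range(samples):
        counters = parse_dirty_writeback(MEMINFO.read_text())
        dirty_mib = counters.get("Dirty", 0) / 1024
        writeback_mib = counters.get("Writeback", 0) / 1024
        print(f"dirty={dirty_mib:10.1f} MiB  writeback={writeback_mib:8.1f} MiB")
        time.sleep(interval_s)
```

Calling `watch()` on the server during a drag-and-drop copy prints one line every two seconds. Without Python, `grep -E '^(Dirty|Writeback):' /proc/meminfo` shows the same two numbers.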

 

It feels like I'm filling my RAM cache with a fire hose and then trying to empty it with a straw. This behavior would be completely fine if I were transferring ~50GB or less... but not so much with large bulk loads.
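The fire hose/straw picture can be put into rough numbers. The cache only grows by the difference between the incoming and outgoing rates, and once it is full, emptying it runs at the array's write speed alone. This is a back-of-the-envelope sketch under stated assumptions: a dirty-cache limit of 20% of 72GB, ~170MB/s in from SMB, ~50MB/s out to the parity-protected array, decimal units, and constant rates. The kernel actually applies its ratio to available (free plus reclaimable) memory rather than total RAM, so treat the result as an order of magnitude only.

```python
def cache_timing(dirty_limit_gb: float, fill_mb_s: float, drain_mb_s: float) -> tuple:
    """Return (seconds until the dirty limit is reached, seconds to drain a full cache).

    Uses decimal units (1 GB = 1000 MB) and assumes constant rates.
    """
    limit_mb = dirty_limit_gb * 1000
    net_fill = fill_mb_s - drain_mb_s  # the cache grows only by the difference
    if net_fill <= 0:
        # The array keeps up with the sender, so the cache never fills.
        return float("inf"), limit_mb / drain_mb_s
    return limit_mb / net_fill, limit_mb / drain_mb_s


# Assumed figures from the post: 20% of 72 GB, ~170 MB/s in, ~50 MB/s out.
fill_s, drain_s = cache_timing(0.20 * 72, 170, 50)
print(f"cache full after ~{fill_s:.0f} s, full drain takes ~{drain_s:.0f} s")
```

With these assumptions it prints `cache full after ~120 s, full drain takes ~288 s`, which suggests why a 50GB copy finishes before anything bad happens while a multi-terabyte copy keeps hitting the limit.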

 

My MS Robocopy has been running like a champ for the last 15 hours. It copies slower than MS Explorer and that slower speed seems to avoid the stalling situation. However, I can instantly replicate the stall if I start a drag and drop transfer in MS Explorer from another machine.
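The Robocopy observation suggests that pacing the sender below the array's write speed keeps the cache from filling. Robocopy has its own `/IPG:n` switch (an inter-packet gap in milliseconds) for slowing a copy down. The same idea as a generic sketch: the helper below is hypothetical, the rate is in decimal MB/s, and it only paces its own copy, so it cannot stop another machine's Explorer copy from filling the cache.

```python
import shutil
import time
from pathlib import Path


def throttled_copy(src: Path, dst: Path, max_mb_s: float, chunk_mb: int = 8) -> int:
    """Copy src to dst at no more than roughly max_mb_s; return bytes copied."""
    chunk = chunk_mb * 1024 * 1024
    budget = max_mb_s * 1_000_000  # allowed bytes per second
    copied = 0
    start = time.monotonic()
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while block := fin.read(chunk):
            fout.write(block)
            copied += len(block)
            # Sleep until elapsed time catches up with what the byte budget allows.
            ahead = copied / budget - (time.monotonic() - start)
            if ahead > 0:
                time.sleep(ahead)
    shutil.copystat(src, dst)  # keep timestamps, like a normal copy
    return copied
```

For the setup in this post, something like `throttled_copy(src, dst, max_mb_s=45)` would hold the sender just under the ~50MB/s the array is expected to sustain without turbo write.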

 

How can I better manage some settings to stop my transfers from stalling? I'm fine with the slower write speeds because all this stop and go nonsense is probably just as slow.

 

Or do I have it completely wrong?

 

13 hours ago, CallOneTech said:

it begins to flush its data to the array at a much slower write speed. This process bogs down the whole system and causes any SMB writing from other machines in the network to drop to 0.

This is somewhat normal but usually only lasts a couple of seconds. It can last much longer with large amounts of RAM. You can decrease the amount of RAM used for write cache; the default is 20% of free RAM.
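The "20% of free RAM" default appears to correspond to the Linux kernel's `vm.dirty_ratio` sysctl (upstream default 20; its companion `vm.dirty_background_ratio`, where background flushing starts, defaults to 10). The kernel also accepts absolute limits through `/proc/sys/vm/dirty_bytes` and `/proc/sys/vm/dirty_background_bytes`; writing one of these makes the matching ratio read back as 0. A minimal sketch, assuming root access: the helper name and the 4 GiB / 1 GiB values are illustrative only, not a tested recommendation, and the change is lost on reboot.

```python
from pathlib import Path

GiB = 1024 ** 3


def cap_dirty_cache(limit_bytes: int, background_bytes: int,
                    vm_dir: Path = Path("/proc/sys/vm")) -> None:
    """Cap the kernel's dirty page cache at an absolute size (needs root).

    Writing dirty_bytes / dirty_background_bytes overrides the *_ratio
    settings. Not persistent across reboots.
    """
    if not 0 < background_bytes < limit_bytes:
        raise ValueError("background threshold must be positive and below the limit")
    # Start background flushing early so writers are throttled less often.
    (vm_dir / "dirty_background_bytes").write_text(f"{background_bytes}\n")
    # Hard limit: processes writing past this point must wait for writeback.
    (vm_dir / "dirty_bytes").write_text(f"{limit_bytes}\n")


# Example values only: background flush at 1 GiB, throttle writers at 4 GiB.
# cap_dirty_cache(4 * GiB, 1 * GiB)
```

A smaller limit means less data piles up in RAM, so each flush (and each SMB stall) should be shorter. The shell equivalent for the hard limit is `sysctl -w vm.dirty_bytes=4294967296`.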

 

Also look into enabling turbo write for the initial load; it should go much faster.
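Why turbo write helps: Unraid's single parity is an XOR across the data disks. In the default read/modify/write mode, each write reads the old data and the old parity, then writes the new data and the new parity back to the same spots, so two disks each do a read followed by a write. Turbo write (Unraid's "reconstruct write" setting) instead reads the same block from every other data disk and recomputes parity from scratch, which needs all disks spinning but avoids the read-then-rewrite cycle. This is a conceptual sketch of the parity arithmetic only, not Unraid's md driver; the function names are hypothetical.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity_read_modify_write(old_parity: bytes, old_data: bytes, new_data: bytes) -> bytes:
    """Default mode: XOR out the old data and XOR in the new data.

    Touches only the target disk and the parity disk, but both must be
    read first and then rewritten at the same location.
    """
    return xor_blocks(old_parity, old_data, new_data)


def parity_reconstruct_write(other_disks: list, new_data: bytes) -> bytes:
    """Turbo write: recompute parity from every other data disk plus the new data.

    Needs all disks spinning, but no disk has to read and rewrite the same spot.
    """
    return xor_blocks(*other_disks, new_data)
```

Both functions produce the same parity; they differ only in which disks must be read, which is what changes the write speed.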


In the past I experienced lots of SMB issues with big files or large amounts of data. What helped here:

 

1.) Fetch the data - don't push it. What I mean is: Initiate the copy from the Unraid machine, not from the Windows machine.

 

2.) Don't write to a user share; write to a disk share.

 

3.) If possible, combine both 1.) and 2.).

 

This is my own experience. Since using this workaround I never had problems again.

 


I'm not sure if I would call this 100% fixed, but it's definitely a usable work around if anyone with a similar issue is reading this thread in the future.

 

I haven't touched the RAM cache settings yet, but I did turn on Turbo Write. Since enabling it I have been able to run an MS Explorer SMB drag-and-drop transfer alongside one from another machine (2 huge writes total) without either transfer failing.

 

The Explorer-based copy has slowed a bit. Most of it runs at 80-100MB/s, which tells me it is still caching heavily in RAM. It still drops to near 0B/s when flushing, and with a single huge file it bottoms out completely at 0B/s. The good news is that these stalls no longer last long enough to make the transfers fail 🥳

 

Turbo Write seems to let it flush fast enough to prevent the major stalls I was seeing before. My attached graph seems to confirm this, because the gaps between bursts of array write activity are MUCH tighter than they were before. I haven't timed it, but I estimate it was stalling at 0 for ~60-90 seconds, versus maybe 5-15 seconds with Turbo Write enabled.

 

I assume I could reduce the stalls even more by changing the max RAM cache to 4GB as per @JorgeB, but I don't want to touch anything because it is finally running without me having to babysit it.

 

Big thanks to everyone for the helpful insights.

[Attached graph: unraid001.jpg]


I ran into this same issue. I tried everything I could, but what I found was that it stalled only when using the built-in transfer tool, so I downloaded FastCopy. When I have a stall, I try again, and usually it stalls again right away. At that point I run it with FastCopy and it works perfectly every time.

 
