Unraid 6.3.2 becomes unresponsive with big file transfers



Hello,

 

I have a strange problem...

If I transfer a big file (about 3GB or more) to Unraid, the web interface, Dockers, VMs and shares stop responding.

The file transfer (to a cache drive) copies about 3GB at full speed and then drops to 0KB/s. After a minute or two everything is OK again: the transfer goes up to about 50MB/s, and after 10 seconds it drops to ~14KB/s (the web interface is unresponsive again), and so on...

After the transfer is complete I have to wait about 1 minute, and then everything works without problems.

I also tested transferring such a file from within the VM, with the same behavior: Docker, VMs and the web interface become unresponsive...

Even if one of my Dockers does a big file transfer, the whole system becomes unresponsive as I described above.

 

I already checked a few threads but found nothing with the same error...

 

I also downloaded the TipsAndTweaks app and set 'vm.dirty_background_ratio' to '2' and 'vm.dirty_ratio' to '3'. Nothing changed, so I set them back to '10' and '20'.
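For a sense of scale, here is a rough sketch of what those two percentages (the sysctls `vm.dirty_background_ratio` and `vm.dirty_ratio`) could mean on a 64GB machine like this one. It is only an approximation: the kernel applies the ratios to the memory it considers dirtyable (roughly free memory plus reclaimable page cache), not to total RAM, so the real thresholds will be lower. The function name and the use of total RAM as a stand-in are assumptions made for illustration.

```python
def dirty_thresholds_gb(ram_gb, background_ratio, dirty_ratio):
    """Approximate write-back thresholds in GB.

    Assumption: total RAM is used as an upper bound. The kernel really
    applies the percentages to "dirtyable" memory, which is smaller.
    """
    background = ram_gb * background_ratio / 100  # background flushing starts here
    blocking = ram_gb * dirty_ratio / 100         # writing processes are throttled here
    return background, blocking


if __name__ == "__main__":
    # The two settings mentioned in the post: 10/20 and 2/3.
    for bg, hard in [(10, 20), (2, 3)]:
        start, block = dirty_thresholds_gb(64, bg, hard)
        print(f"{bg}/{hard}: background flush at ~{start:.2f}GB, "
              f"writers throttled at ~{block:.2f}GB")
```

With 10/20 this upper bound is several gigabytes of written data that may sit in RAM before the drives have actually received it, which is in the same range as the point where the transfers in this thread stall.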

 

My Server:

Unraid 6.3.2 Pro

ASUS Z9PA-D8

2x Intel Xeon 2670v1

64GB DDR3 1333

850W Corsair Power Supply

1x Dell PERC H310 (IT mode)

1x DigitalDevices Octopus CI PCIe (passthrough to Windows 10 Pro)

1x DigitalDevices CineV6 DVB-C Tuner PCIe (passthrough to Windows 10 Pro)

1x DigitalDevices CineV6 DVB-C Tuner add-on card (passthrough to Windows 10 Pro)

Array: 6x WD Red of various sizes (one of them for parity)

Cache: 2x SanDisk 120GB + 2x SanDisk 240GB

Unassigned Devices app (1x SanDisk 32GB, 1x WD Blue 500GB)

2 Dockers (1x McMyAdmin, 1x jDownloader2)

4 VMs (always running: 1x Ubuntu 14.04, 1x Windows 10 Pro; running if needed: 2x Windows 7)

NICs: 2x onboard Intel 82574L Gigabit LAN (network is in bonding mode, balance-alb 6)

(I already turned off bonding to see if it helps, but there was no change, so I turned it back on)

 

 

I attached the diagnostics file from my Server.

 

Please help, I don't know what else to do. Sorry for my poor English, I hope someone understands what I wrote... xD

 

chipsserver-diagnostics-20170309-2025.zip

Link to comment

Maybe unrelated to your stated problem, but it looks like you added a cache pool after you had already set up your cache-prefer shares, and they still have files on the array. I recommend you stop the Docker service and VMs, then run mover to get them moved to cache. You might also install the Fix Common Problems plugin, which would have told you about this and might find other problems you don't know about.

 

Then, to simplify things, go ahead and try it with one NIC, no dockers or VMs and see how the transfer goes.

Link to comment

First I want to thank you for the quick answer!

You are right, I removed the entire cache pool last week and rebuilt it because I upgraded it to SSDs.

But everything from the shares with the 'Prefer' option should be on the cache pool. I double-checked and looked at all the disks, and everything is where it belongs.

I downloaded the Fix Common Problems app, and it only finds that automatic updates are turned off (turned off by me) and that a second registration key was found. I fixed this immediately.

58c1d1683941a_FixCommonProblems.thumb.jpg.7b072218f6a625db3689d2ac97f868a0.jpg

 

 

Then I started my testing (the test file is an 8.5GB img file). I turned everything off (Docker, VMs): still the same behavior.

copyspeed.thumb.jpg.91ae328775399a00a417d972f9157e5c.jpg
system.thumb.jpg.b4eb8e592fde72b20325d715dfde4fc1.jpg

 

 

Then with Dockers on: the copy speed is OK, but everything else (Dockers and web GUI) is not responding.

58c1d29a76fd9_withdockeron.jpg.a276d842461111f2d27ebff7289d441d.jpg

 

 

Then with everything on: the copy speed is OK and everything responds great, but for the last 300MB the speed goes down to 15MB/s. After 10 seconds it goes back up to 105MB/s, the transfer finishes, and then nothing responds for about 1 minute.

58c1d30ab30b0_everythingon.thumb.jpg.4ffc905025dcc942d7595d26f492f374.jpg

 

 

 

I think it has something to do with the memory:

stats.thumb.jpg.f1f07ed0547a118ba76e452ebc087c79.jpg

 

 

If all my Dockers and two VMs are turned on, the memory is almost full and the copy speed is as I described in the first post. (I know that Linux keeps a lot in the cache, but is this right?)

I don't understand why nothing responds for about a minute. Can copying from system memory to the SSDs make the system this unresponsive?
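One way to check this would be to watch how much written data is still waiting in RAM while a copy is running. A minimal sketch, assuming a Linux system where `/proc/meminfo` exposes the `Dirty` and `Writeback` fields; the helper names `parse_meminfo_kb` and `read_pending_writes_mb` are made up for this example.

```python
def parse_meminfo_kb(text, fields=("Dirty", "Writeback")):
    """Return the requested /proc/meminfo fields, in kB, as a dict."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if name in fields and parts:
            values[name] = int(parts[0])  # first token is the number, second is "kB"
    return values


def read_pending_writes_mb(path="/proc/meminfo"):
    """Read the file once and convert the selected fields to MB."""
    with open(path) as f:
        stats = parse_meminfo_kb(f.read())
    return {name: kb / 1024 for name, kb in stats.items()}
```

Calling `read_pending_writes_mb()` once per second during a transfer and printing the result would show whether `Dirty` climbs to several gigabytes and then drains slowly while everything else hangs.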

Link to comment

Hi, and sorry to hear you are having difficulty. Have you eliminated the network itself as the problem? Have you tried a new network cable to the unRAID system or your test machine? How about new ports on the router / switch it's attached to? RAM is definitely not the problem here.

Link to comment

Hi thank you for the reply.

 

Yes, the network isn't the problem.

Before I put Unraid on the server it ran Windows Server 2012 R2 Essentials and never had such problems.

I also tried a direct connection between the server and 2 different computers with 2 new Cat6 cables: same again.

The strange thing is that if I connect to my VM through RDP and start the file transfer there (the file was 20GB because of the much faster copy speed), I experience the same as described above: the copy speed goes down to 0KB/s, and Unraid, the VM and Docker stop responding. After ~30 seconds the copy continues and the VM and everything else respond normally. After a few more seconds the copy speed drops again and everything is unresponsive again.

Maybe it's the Unassigned Devices plugin?

Should I try to make a 'New Config' under the Tools section?

Edited by ich777
Link to comment

Hi,

 

I've copied a VM image to another share (on the cache disk), with the same problem as described above, and I discovered this in the log:

Mar 18 20:23:24 chipsServer kernel: perf: interrupt took too long (2591 > 2500), lowering kernel.perf_event_max_sample_rate to 77000
Mar 18 20:23:47 chipsServer kernel: perf: interrupt took too long (3322 > 3238), lowering kernel.perf_event_max_sample_rate to 60000

What does this mean? Is this part of my problem?
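Read literally, the message says that the kernel's perf sampling interrupt ran longer than its time budget, so the kernel lowered the sampling rate. The two log lines above can be picked apart mechanically, which helps when counting how often this happens during a transfer. A small sketch: the regular expression is written against the two messages quoted in this post only, and the labels `took`, `limit` and `new_rate` are invented for the example. The two numbers in parentheses are most likely nanoseconds, but treat that as an assumption.

```python
import re

# Matches the kernel message quoted above; any syslog prefix is ignored.
PERF_RE = re.compile(
    r"perf: interrupt took too long \((\d+) > (\d+)\), "
    r"lowering kernel\.perf_event_max_sample_rate to (\d+)"
)


def parse_perf_line(line):
    """Return (took, limit, new_rate) as ints, or None if the line does not match."""
    match = PERF_RE.search(line)
    if match is None:
        return None
    return tuple(int(group) for group in match.groups())
```

Feeding it the syslog line by line and comparing the timestamps of the matches with the moments the copy stalls would show whether the two are related.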

 

 

Edit:

Another strange thing I discovered: when I copy files from my cache pool to my Unassigned Devices drives, the copy speed is constant. Something must be wrong with the cache disks...

 

58cdbf93ade96_UnnassignedDrives.jpg.f9a15a452d23c58a4db7cccbf10fe0a9.jpg

Edited by ich777
Attached screenshot
Link to comment

Hi ich777,

 

So after running into the same problem and pulling my hair out for more than a few days with about 100 different configurations, I think I figured out what's going on. It looks like it's a problem with the way the SSDs are made, and how much actually 'fast' (SLC) memory they have vs the slower (TLC) memory.

I've got an OCZ Trion 150 500GB, and it looks like there is ~7GB of SLC memory on there which moves at around 400-500MB/s. After that 7GB it drops down to HDD speeds of around 70-80MB/s. This is rather pitiful in my opinion. It wasn't a problem until I started downloading Blu-rays via Sonarr/SABnzbd, and my entire system would lock up whenever they were unpacking/moving.
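To put numbers on that, here is a back-of-the-envelope sketch of how long a large write would take on such a drive. It assumes a simple two-speed model: the first few GB land in the fast region and everything after that is written at the slow speed. The function name, the 1GB = 1000MB conversion and the empty-cache starting point are assumptions for the example, and the speeds are just the midpoints of the ranges quoted above.

```python
def estimate_write_seconds(size_gb, cache_gb, fast_mb_s, slow_mb_s):
    """Estimate seconds to write size_gb to an SSD with a fast cache region.

    Assumptions: the first cache_gb are written at fast_mb_s, the rest
    at slow_mb_s, 1 GB = 1000 MB, and the fast region starts out empty.
    """
    fast_part_mb = min(size_gb, cache_gb) * 1000
    slow_part_mb = max(size_gb - cache_gb, 0) * 1000
    return fast_part_mb / fast_mb_s + slow_part_mb / slow_mb_s


if __name__ == "__main__":
    # 7GB fast region, 450 MB/s fast, 75 MB/s slow (midpoints of the quoted ranges).
    for size_gb in (3, 20):
        seconds = estimate_write_seconds(size_gb, 7, 450, 75)
        print(f"{size_gb}GB: about {round(seconds)} s")
```

Under this model a 20GB file takes roughly 189 seconds, of which only about 16 are spent at full speed. That would explain slow transfers, but not by itself why the whole system stops responding.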

 

As to why it 'locks up' and makes unraid unresponsive I couldn't tell you, but I'm almost certain this is the cause of our troubles. Let me know if that makes sense for you and if you come to a solution. 

 

JonP, any idea why a really slow cache drive would lock things up?

 

Here is an article from pcworld about the exact issue: http://www.pcworld.com/article/2947864/storage/ocz-trion-100-review-an-affordable-ssd-with-a-problem.html

Edited by psparks
forgot link
Link to comment

Hi psparks,

 

thanks for the reply.

Yeah, I read a similar article about the SanDisk SSDs too. But in that article the write speed (in Windows, connected to a SATA3 port) drops from ~450MB/s to about ~150MB/s when the SSD moves data from the SLC to the TLC memory. So why does the speed in my Unraid config drop to 0MB/s for about a minute, with everything completely unresponsive and locked up?

Before I changed my cache pool to SSDs I had 2x 500GB WD Blue in the pool. The write speed was good, but after 2GB of transfer the copy speed dropped to ~30MB/s and stayed at that level.

This is very strange and I don't understand why it is happening.

After some more testing with the cache SSDs, I discovered that the lock-ups are not consistent. For example, if I copy a file (~20GB), the write speed drops after 3GB, recovers for about 2GB, drops again, and so on. If I wait a few hours and copy the same file again, the write speed is constant for about 6GB, then drops, recovers for 4GB, drops again, and so on. If I copy the file again without waiting (once everything responds as it should), the speed is constant for about 10GB, then drops and stays really slow for the rest of the copy.

 

Can someone help us out?

Link to comment
  • 2 weeks later...
  • 1 month later...

Okay, it was the two cache drives in my case (2x 120GB SanDisk).

 

I ordered 2x 240GB SanDisk drives and once again got the same behaviour. My last attempt was to put 2x 500GB WD Blue in, and now it works.

But can you please look into why unRAID becomes unresponsive with disks like these, which have about 10GB of cache on them (like the SanDisks, or the OCZ drives in @psparks' case)?

If I put the SSD in a Windows or Ubuntu machine there is no noticeable speed drop.

Thank you all for the help! Appreciate it!

Link to comment
  • 1 month later...

I'm running into a similar issue. 

When I transfer to my cache drive, it is extremely slow, my Docker Containers become unresponsive, and the Web UI stops responding.

 

cache copy.PNG

 

The cache drive is a SanDisk SDSSDA120G.

I get better speeds when copying to a share on a WD red.

 

no cache copy.PNG

One thing I noticed was that the drive gets pretty warm, so I bumped the warning to 55°C and the threshold to 60°C. (The manufacturer rates it up to 70°C.)

Unfortunately this hasn't done much. (I'm not even sure if that only changes when I get notifications, or if it throttles based on temp.)

 

I may need to look into getting a different SSD for my cache.

Edited by klhutchins
Link to comment