Data ingestion performance issue


hoff


Hi everyone.

 

Just looking for a quick reality check on anything I have missed, or anything I can tune to make this rocket along a bit more, as I have spent close to 5 days just trying to copy data in without hangs/timeouts.

 

Some of my source shares have large files, which are okay on a single thread, as they spool up and max out the link for 10-120 seconds.

Extremely large files that take longer than 4-5 mins seem to choke the box's writes, and the whole platform dies.

Small files such as my photo galleries are fine on a single thread, but with millions of small files I am looking at 2 weeks to copy, because if I use multiple threads in RCLONE they hang in the same way.

 

Once the data is in there, it's going to be consumed mainly via Docker services and the off desktop mount point, so I don't see long-term reads/writes having issues, but this initial seed load is driving me insane :)

 

Are there any tunables around RAM/schedulers that I haven't found yet and could tweak to help?

 

Just finished building out a new Dell R430/Dell SC200 JBOD tray system.

  • 2 x 6-core E5 CPUs
  • 96GB RAM

 

The system has

  • 4 x SSD Cache Pool in the R430.
  • 2 x 12TB Parity Drives (disabled at this time)
  • 5 x 4TB 7200rpm
  • 2 x 8TB 7200rpm

 

Details of workload

  • I am copying data from my old Synology units, which I have mounted at /tmp/source using NFS.
  • I am copying data into /mnt/user/destination.
  • The share is set up with the maximum distribution, which I can see if I browse the share in the UI.
  • Cache tier is disabled on this share due to the data sizes.

 

Settings

  • Docker - Disabled
  • VM Manager - Disabled
  • Turbo Write - Enabled
  • Parity Drives - Removed
  • Version: 6.9.0-rc2 

 

Looking at the above, I should have the most efficient setup available to me for copying data in.

 

  • RSYNC slows down from 100MB/s to 0.0001MB/s after about 30 mins of work, i.e. it is basically dead. (Single thread, no compression, local-to-local copy)
  • RCLONE seems to work better, provided I limit it to 1 transfer and bandwidth-limit it to about 70MB/s.
  • If I use multiple threads in RCLONE, it basically melts and hangs after about 20-25 mins.
  • Restarting any of the copies above resolves the issues.
  • iowait stays in single digits with a single copy thread and a bandwidth limit of 80MB/s.
  • With 2 transfers I see iowait of 3-10 on half the CPUs (top threads: rclone, shfs and unraidd*).
  • With 4 transfers I see iowait of 5-30 on half the CPUs (top threads: rclone, shfs and unraidd*).
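To compare runs more precisely than eyeballing top, per-CPU iowait can be computed from two snapshots of /proc/stat. This is a minimal Python sketch assuming a Linux host (Unraid is Linux-based) and the field order documented in the proc(5) man page; the function names are illustrative and not part of any Unraid tool.

```python
import time

# Field order of each "cpuN" line in /proc/stat (see proc(5)):
# user nice system idle iowait irq softirq steal [guest guest_nice]
IOWAIT_FIELD = 4


def parse_cpu_lines(text):
    """Return {"cpu0": [jiffies, ...], ...}, skipping the aggregate "cpu" line."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0].startswith("cpu") and fields[0] != "cpu":
            stats[fields[0]] = [int(v) for v in fields[1:]]
    return stats


def iowait_percent(before, after):
    """Percentage of elapsed jiffies each CPU spent waiting on I/O."""
    result = {}
    for cpu, old in before.items():
        new = after[cpu]
        # Sum only the first 8 fields: guest time is already included in user/nice.
        total = sum(new[:8]) - sum(old[:8])
        waited = new[IOWAIT_FIELD] - old[IOWAIT_FIELD]
        result[cpu] = 100.0 * waited / total if total else 0.0
    return result


def sample(interval_s=5.0, path="/proc/stat"):
    """Take two snapshots interval_s apart and return per-CPU iowait %."""
    with open(path) as f:
        first = parse_cpu_lines(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        second = parse_cpu_lines(f.read())
    return iowait_percent(first, second)
```

Calling sample(10) once during a 1-transfer copy and again during a 4-transfer copy would show whether the "3-10 vs 5-30 on half the CPUs" pattern is consistent between runs.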

 

My initial thoughts were:

  • Parity calculation, so I removed the parity drives from the array.
  • Network speeds. I ran iperf and it sat at 900/900 on a 1Gbps port for almost 8 hours last night.
  • Turbo write mode, which is mentioned a lot, so I enabled it.
  • Disk issues. I have checked SMART on everything. The 12TBs and one of the 8TBs are brand new. The 8TBs are 6 months old, and the 4TBs are 16 months old and will be replaced with 8TBs that are 6 months old as soon as the Synology sources are empty.
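The iperf result is in megabits per second, while the copy speeds quoted in this thread are in megabytes per second. A trivial conversion, assuming iperf was reporting Mbit/s (its usual default unit), shows that ~100MB/s is already close to what a gigabit link can carry:

```python
def link_mb_per_s(megabits_per_s):
    """Convert a link rate in Mbit/s to MB/s (8 bits per byte, no protocol overhead)."""
    return megabits_per_s / 8


print(link_mb_per_s(900))   # 112.5 (the iperf result in this post)
print(link_mb_per_s(1000))  # 125.0 (1Gbps line rate, before TCP/NFS overhead)
```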

Edited by hoff

To be clear, guys: I appreciate that 100MB/s is not happening due to the limitations of Unraid etc.

The bit that's driving me insane is the process hangs/IO timeouts causing the apps to fail, and losing 8-10 hours of copy time while I'm asleep.

I am going to leave RCLONE on a single transfer at 20MB/s tonight and see what happens. That's 3 days of copy time, but better than nothing...
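To sanity-check estimates like this, copy time is simply data size divided by sustained rate. The Python sketch below works backwards from "3 days at 20MB/s" to the amount of data that implies, assuming decimal megabytes (10^6 bytes). The per-file overhead parameter is a hypothetical knob, not a measured value, included only to show why millions of small files can take far longer than their total size suggests.

```python
SECONDS_PER_DAY = 86_400
MB = 10**6  # decimal megabyte (an assumption; some tools report MiB)


def copy_days(total_bytes, rate_mb_per_s, n_files=0, per_file_overhead_s=0.0):
    """Rough copy duration in days: streaming time plus an optional per-file cost.

    per_file_overhead_s is a hypothetical fixed cost per file (create, open,
    metadata update); it is not measured anywhere in this thread.
    """
    streaming_s = total_bytes / (rate_mb_per_s * MB)
    overhead_s = n_files * per_file_overhead_s
    return (streaming_s + overhead_s) / SECONDS_PER_DAY


# Work backwards from "3 days at 20MB/s".
implied_bytes = 3 * SECONDS_PER_DAY * 20 * MB
print(implied_bytes / 10**12)        # 5.184 (TB)
print(copy_days(implied_bytes, 20))  # 3.0

# Hypothetical: 2 million files at 0.5 s each add ~11.6 days regardless of bandwidth.
print(round(copy_days(0, 20, n_files=2_000_000, per_file_overhead_s=0.5), 2))  # 11.57
```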

 

I also tried all the RX/TX and flow control settings tonight, on both the source and destination.

It seems to all be down to threads: the second I run more than 1 parallel copy, it explodes and dies. With 1 copy, 1 thread and a bwlimit of 80MB/s it seems stable... but if ANYTHING else happens on the box, including my Docker container backups, it basically explodes.
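Since one transfer with a bandwidth cap is the combination that stays stable here, pinning those settings in a small wrapper may make overnight runs more reproducible. This Python sketch only builds and (optionally) runs an rclone copy command. --transfers, --checkers and --bwlimit are standard rclone flags; the wrapper function names and the 80M default are assumptions based on the numbers in this post. rclone reads a bare M suffix on --bwlimit as MiB/s (bytes, not bits).

```python
import shlex
import subprocess


def rclone_copy_cmd(src, dst, transfers=1, bwlimit="80M", checkers=None):
    """Build an `rclone copy` argument list with conservative parallelism.

    transfers=1 and bwlimit="80M" mirror the combination reported stable in
    this thread; rclone reads "80M" as 80 MiB/s.
    """
    cmd = ["rclone", "copy", src, dst,
           "--transfers", str(transfers),
           "--bwlimit", bwlimit]
    if checkers is not None:
        # --checkers controls how many files rclone checks in parallel (default 8).
        cmd += ["--checkers", str(checkers)]
    return cmd


def run_copy(cmd):
    """Run the command, raising CalledProcessError if rclone exits non-zero."""
    return subprocess.run(cmd, check=True)


cmd = rclone_copy_cmd("/tmp/source", "/mnt/user/destination")
print(shlex.join(cmd))
# rclone copy /tmp/source /mnt/user/destination --transfers 1 --bwlimit 80M
```

Lowering --checkers as well might reduce background load during the checking phase, though whether that helps on this hardware is untested.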

 

Yes, it's a single SAS path, as the disks are not dual-port (DP) etc. I am going to investigate this part of it tomorrow.

 

Edited by hoff

Okay, in case someone else finds/reads this: I think this is all down to my SAS connection...

 

[   28.369041] mpt2sas_cm0: LSISAS2308: FWVersion(17.00.01.00), ChipRevision(0x05), BiosVersion(07.24.01.00)

All my disks are desktop SATA disks, so they are single-channel only, and I only have a single SAS cable connected.


Single link on a Dell H310 (1200MB/s*), not my controller but close enough to be honest:

  • 8 disks: 137.5MB/s per disk
  • 12 disks: 92.5MB/s per disk
  • 16 disks: 70MB/s per disk
  • 20 disks: 55MB/s per disk
  • 24 disks: 47.5MB/s per disk
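The per-disk figures are not a plain 1200MB/s divided by N split. Multiplying them back out, using only the numbers quoted in this post, shows each row implies roughly 1100-1140MB/s in total, so the table already allows for some link overhead. A small Python sketch of that arithmetic (the even-split column is a zero-overhead upper bound for comparison):

```python
NOMINAL_LINK_MB_S = 1200  # single-link figure quoted in the post

# (number of disks, quoted max speed per disk in MB/s), copied from the table
QUOTED = [(8, 137.5), (12, 92.5), (16, 70), (20, 55), (24, 47.5)]


def summarize(rows=QUOTED, link=NOMINAL_LINK_MB_S):
    """Return (disks, per_disk, aggregate, even_split) for each row."""
    return [(n, per_disk, n * per_disk, link / n) for n, per_disk in rows]


for n, per_disk, aggregate, even in summarize():
    print(f"{n:2d} disks: {per_disk:5.1f} MB/s each = {aggregate:6.1f} MB/s total "
          f"(even split of {NOMINAL_LINK_MB_S}: {even:5.1f})")
```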

 

At the moment my array is made up of 8 disks in the tray, and from a little more testing based on the data above, I seem to be able to run as many threads as I want as long as I don't go over ~70MB/s... This puts me basically in the ballpark of the figures above...

EDIT: after 15 mins it died again... I wonder how low I need to set it with multiple threads for it to survive...

So this is what the array can give me :) No more sooking about it.

 

Edited by hoff

For copying small files, the fastest option is FTP. You can also resume FTP transfers easily.
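Resuming works because FTP has a REST command that tells the server to start a transfer at a byte offset. A minimal Python sketch using the standard-library ftplib, assuming the server supports REST (most do, but not all); the host, credentials and file names are placeholders, not values from this thread:

```python
import os
from ftplib import FTP


def resume_offset(local_path):
    """Bytes already on disk; 0 if the local file does not exist yet."""
    try:
        return os.path.getsize(local_path)
    except FileNotFoundError:
        return 0


def resume_download(host, user, password, remote_name, local_path):
    """Continue a partial binary download from where the local copy stopped.

    ftplib's retrbinary(rest=...) sends "REST <offset>" before RETR, so the
    server only sends the remaining bytes, which are appended to the file.
    """
    offset = resume_offset(local_path)
    with FTP(host) as ftp, open(local_path, "ab") as out:
        ftp.login(user, password)
        ftp.retrbinary(f"RETR {remote_name}", out.write, rest=offset or None)
```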

This is my test results table, if you want to test something else:
[Image: test results table]

 

My experience is similar: after a few minutes of a fast start, it chokes down to ridiculous speeds with certain protocols or config combinations. I'm still finding the optimal setup.

My setup is way simpler though. I have only 3 SATA drives, 2 x 4TB (one for parity, one for data), plus a 120GB SSD cache.
The hardware is capable of doing better, so the software is the suspect here.

