X9SCM-F slow write speed, good read speed



I understand that unRAID (and perhaps, the underlying Linux OS) will use all unused memory to buffer disk writes.

 

Yes, I wasn't suggesting that there is anything wrong with this behaviour - just that it will confuse matters when we ask lots of individuals to report their write speeds.

Link to comment

It could also mean that if this mechanism is failing on some systems, it might be the cause of slow copying (e.g. the system trying to address the higher memory and constantly failing). Could be worth a test to see what happens if we turn off this behaviour. Anyone know if that is possible?

 

So not reduce the memory, but block the kernel's behaviour of using it while copying.
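One blunt way to test that idea (a sketch only - these are the standard Linux vm writeback tunables, and the values are purely illustrative) would be to cap how much dirty data the kernel is allowed to buffer:

# keep no more than ~64 MB of dirty (unwritten) data in RAM;
# start background writeback once ~16 MB has accumulated
sysctl vm.dirty_background_bytes=16777216
sysctl vm.dirty_bytes=67108864

Setting both back to 0 should revert the kernel to its ratio-based defaults.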

Link to comment

This may not go anywhere or identify a problem, but it may prove useful to gather some data on the systems actively affected by the slow transfer problem.  To that end, I'd appreciate it if someone experiencing the slow write problem could run the following command and post the "data.txt" file:

sysctl -a |grep vm > data.txt;hdparm -W /dev/sd* >> data.txt;cat /proc/meminfo >> data.txt;free -l >> data.txt;ifconfig >> data.txt;ethtool eth0 >> data.txt;ethtool -i eth0 >> data.txt;ethtool -S eth0 >> data.txt

 

EDIT: The above command will show an error when executed.  It comes from the hdparm -W command when used with the wildcard and can be ignored.

 

Another bit of info to collect is a high-resolution transfer capture of a _single_ binary file transfer of at least 1GB to the array - the larger the better.  This will be done using bwm-ng.  Bwm-ng can be installed by unMENU if you don't already have it, or I'm sure someone can make the package available or suggest an equivalent alternative (with text output).  Telnet (or however you roll) into your server, issue the command below and then start your transfer to a share from another computer.  Hit CTRL+C to cancel the logging after the transfer completes.  When you reply with the attachment, please note whether it was via SMB or NFS and what hardware the remote computer has (processor, RAM, network card chipset or at least make/model, operating system, hard drive/SSD brand and model).

bwm-ng --output csv -F transfer_log.csv --count 1000 --interfaces eth0 --type rate

 

As with the first data collection, change "eth0" to the active interface if you have multiple interfaces.  If multi-linked/trunked, replace "--interfaces eth0" with "--allif" so it collects everything.  Also please include a current copy of the syslog when attaching the files.  I have not read the entire thread, so I'm not sure whether this write issue also affects in-array disk-to-disk writes; the above only covers network writes.  The goal with both is to see how the vm system is handling the caching and the physical data writes to disk.  It would be interesting to see a pre_bwm_data.txt and a post_bwm_data.txt for those capable who have read this - one way to sequence that is sketched below.
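Something like this (a sketch; adjust the interface name as above, and the filenames are simply the ones suggested) would capture both snapshots around the transfer:

# snapshot the vm tunables just before starting the copy
sysctl -a | grep vm > pre_bwm_data.txt
# log the transfer itself (CTRL+C once the copy completes)
bwm-ng --output csv -F transfer_log.csv --count 1000 --interfaces eth0 --type rate
# snapshot again afterwards
sysctl -a | grep vm > post_bwm_data.txt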

Link to comment

Command gives an error:

 

error: permission denied on key 'net.ipv4.route.flush'

SG_IO: bad/missing sense data, sb[]:  70 00 05 00 00 00 00 0a 00 00 00 00 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

SG_IO: bad/missing sense data, sb[]:  70 00 05 00 00 00 00 0a 00 00 00 00 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

root@Tower:~#

 

Link to comment

Yes it will, that is ok - forgot to mention that.  Most likely from the USB.  Would require more than what I wanted to include in a single command line to exclude it.  Perhaps someone will roll it into a script to run for data collection purposes if it proves useful.  Once I get some transfer data logs I will plot it up and post it.
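A rough sketch of what such a wrapper might look like (untested; the output path is just an example, and the per-disk loop simply keeps the SG_IO noise in the file rather than on the console):

#!/bin/sh
# collect_data.sh - hypothetical wrapper around the one-liner above
OUT=/boot/data.txt
sysctl -a | grep vm > $OUT
# write-cache state per disk; the USB flash device will error harmlessly
for d in /dev/sd?; do
    hdparm -W $d >> $OUT 2>&1
done
cat /proc/meminfo >> $OUT
free -l >> $OUT
ifconfig >> $OUT
ethtool eth0 >> $OUT
ethtool -i eth0 >> $OUT
ethtool -S eth0 >> $OUT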

Link to comment

First post here. Unfortunately I have the same or similar bug as a lot of you.

 

But I thought that I could bring some new data points. This is my first install of unRAID on a new server built specifically for running unRAID.

 

I started searching the forums after I noticed the behaviour: running FTP from my old Win 2003 server started at 120 mbps but always slowed down to below 10 mbps after approx. an hour. I found that stopping the array and starting it again got me back to 120 mbps.

 

I then tried the "sysctl vm.highmem_is_dirtyable=1" fix and got 120 mbps again for approx. 2 hours before a full stop that hung my Win 2003 server.

 

Finally, I have tried the "mem=4095M" fix; now I only get a max of 40-60 mbps, and changing the "dirtyable" setting does not have any effect.
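For reference (a sketch only - the exact layout of the file varies, so check your own flash drive before editing), the mem=4095M limit goes on the kernel append line in syslinux.cfg:

# syslinux.cfg on the flash drive (illustrative excerpt)
label unRAID OS
  kernel bzimage
  append mem=4095M initrd=bzroot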

 

System:

CPU i3 3220

Motherboard ASUS P8H77-i

RAM 2x8 GB Corsair Vengeance

Only using on-board SATA ports

NIC On-board Realtek 8111F

Using unRAID 5.0 RC11

Link to comment

Here's my script for testing... I have not tested network writes, as I have isolated the problem to the local unRAID box.  I run the following as root on my box:

 

#!/bin/sh
# write ~8GB of zeros to the cache drive, an array disk and a non-array disk,
# then remove the test files
dd if=/dev/zero of=/mnt/cache/output.dat bs=1024 count=8000000
dd if=/dev/zero of=/mnt/disk1/output.dat bs=1024 count=8000000
dd if=/dev/zero of=/mnt/myscratch/output.dat bs=1024 count=8000000
rm /mnt/cache/output.dat
rm /mnt/disk1/output.dat
rm /mnt/myscratch/output.dat

 

the "myscratch" disk is not part of my array.  With dirtyable on, I get great performance until my RAM fills up... in the neighbourhood of 140-200 MB/s.  With dirtyable off, I get the same speed on myscratch, but horrible results on the cache and disk1 outputs.

 

Link to comment

System spec as in my .sig (with Sandy Bridge Xeon E3-1230).

 

I achieve a sustained write speed of around 26MB/s, writing to a disk share, from Ubuntu desktop, over nfs.  I believe that this is a 'normal' figure, not subject to the slowdown which is being reported.

 

Now, I made an error in placing my order - I meant to get the Ivy Bridge processor (I deliberately purchased 1600 memory), and my Ivy Bridge replacement should arrive in about a week.  I will then be in a position to confirm whether the simple swap to IB introduces the slowdown.

 

Okay, I have now swapped from Sandy Bridge Xeon to Ivy Bridge Xeon (E3-1230V2), and there is no appreciable difference in write speed (over the network [nfs] to a disk share).  Also, at 2%, it's still reporting 110+MB/s on parity check.

Link to comment

Well, I am having the same issue, and I'm glad I found this thread.  I am not using the same or a similar motherboard: my unRAID is on an AMD platform, using a Phenom X3 705e low-energy chip.  I have 18 drives, including several 3TB ones, and have noticed this issue on 5.0 rc10 and 5.0 rc11.  I previously used some earlier versions of 5.0 but did not really notice the issue - it could have been happening, but I wasn't alert to it because I was using a cache drive, which I have since removed.  The only thing I have in common with others in this thread is having 16GB of RAM.

 

When I reboot Unraid, it can't write faster than 30-40 Mbytes/sec, but over time it starts to slow down - after a day it's down to about 10 Mbytes/sec, after a few days 3-4 Mbytes/sec, and after a week or more it can get down to less than 1 Mbyte/sec. 

 

The screenshot below is me testing a random file copy to unRAID, direct to disk9, that's going at 10Mbytes/sec (rebooted yesterday and it's already that slow).  As soon as I issued the 'sysctl vm.highmem_is_dirtyable=1' command, it spiked to ~110Mbytes/sec sustained - about 11x faster than it had been moving, and more than 3 times my max write speed even on a fresh reboot.

 

Do I need to add this command to my go script?  What does this command really do?  Is it making the extra RAM beyond 4GB in my system worthless?
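If you do decide to make it persistent (a sketch, not a recommendation either way), the usual place is the go script on the flash drive:

# /boot/config/go (excerpt) - re-apply the workaround on every boot
sysctl vm.highmem_is_dirtyable=1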

 

[Screenshot: transfer speed during the disk9 copy, before and after the sysctl change]

Link to comment

FYI,

 

I also use the "sysctl vm.highmem_is_dirtyable=1" fix.

 

However, I'd like to report that when this is set to 1, it is not possible to run the preclear_disk.sh utility on the system.  Trying it on 3TB drives, the box runs out of memory (memory fragmentation?) and a kernel oops (reiserfs) occurs after ~11 hours, still during step 1.

 

Setting the variable back to 0 allows me to complete the preclear of the drive.

 

This is 100% reproducible on my system by toggling the setting on/off.

 

 

Link to comment

You could try preclear_disk.sh with the -b, -r and -w options to reduce its memory footprint, as in the sketch below.
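For example (the values below are purely illustrative - check the script's own help output for the exact syntax on your version):

# smaller read/write block sizes and block counts reduce preclear's memory use
preclear_disk.sh -r 65536 -w 65536 -b 200 /dev/sdX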
Link to comment

My preclear is still running with these parameters - slow, though; it will take more than a week at this rate...

 

It may not be obvious from within the preclear_disk script's screen dialog that the process is having hiccups.  For me, I had to check the kernel ring buffer (dmesg) to see the repeated dumps occurring.
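For anyone wanting to check for the same thing, something as simple as the following, run in another session while the preclear is going, will show whether the kernel is logging dumps:

# show the most recent kernel messages (look for reiserfs/oom traces)
dmesg | tail -n 40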

Link to comment

I still continue to think about this problem, even though I'm not affected.  It seems to me that it is very 'hit and miss' whether you are affected.  Logically, there is no reason why X9SCM-ii and X9SCL/X9SCL+ boards should not be affected - there is little difference in the hardware, yet my SCM-ii and Tom's SCL don't seem to be affected.  Also, a number of X9SCM users do not seem to be affected.  The larger user base of SCM boards may explain why we have no reports of SCM-ii and SCL boards being affected.  Of course, the SCL uses a different variation of the C20x (Cougar Point) chip, and this might make a difference.

 

However, it may be that the BIOS version affects the result.  We still don't have a table of results as I was proposing earlier.  Then another thought crossed my mind.  There are reports of the 82574L network interface chips behaving very strangely because the board manufacturer is responsible for programming their on-chip memory and getting it wrong sometimes.  Is it possible that other support chips (C202/204?) also have to be programmed in a similar fashion, and an error in this programming can lead to the slow writes?

 

Getting more obscure, perhaps different steppings of the CPU or support chips are affecting this.  However, we have reports of both iCore and Xeon users experiencing the problem, so that may discount the CPU as the culprit.

 

Of course, there still has to be a change in the unRAID software (most likely the kernel or drivers), which has exposed this problem.

Link to comment

To be perfectly honest, I feel like I am kind of done with this issue (I am affected). The dirtyable parameter solves the problem 95% of the way when I combine it with my 4GB of RAM (4GB of RAM alone did not solve it). And in the event I really have to move a motherload of data and cannot wait, I will revert back to the kernel in RC5 for a couple of hours... I have spent so much time on this already.

 

This is of course not to say that others should not pursue it... but with the current fixes I am content.

Link to comment

Guess I was not as done as I thought I was...

 

I solved this issue for myself in a way I have not seen so far, and I am wondering if it might work for other people:

 

- I am not running with the MEM=4095 parameter

- I am not running with the dirtyable=1 parameter

- I am running with only 4GB at the moment, but please note I -had- the issue when running with 4GB.

 

Basically, all that used to work for me in a more or less structural way was the dirtyable=1 parameter.

 

I have now added the following parameters to my go script to change the default CPU governor settings, and this is now giving me 20 to 30 MB/s copying within the array. Speeds also do not appear to drop after some time.

 

# Change scaling governor settings for performance
# load the AMD frequency-scaling driver
modprobe powernow-k8
# ramp the CPU up once load passes 50%
echo 50 > /sys/devices/system/cpu/cpufreq/ondemand/up_threshold
# hold the higher frequency longer (400 sampling periods) before scaling back down
echo 400 > /sys/devices/system/cpu/cpufreq/ondemand/sampling_down_factor

 

Would be interested to know if this works for you too. At the earliest reboot opportunity I will also put my 16GB back in to see if the effect remains the same.
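For anyone wanting to confirm the tweak took effect, the active governor and current clock speed can be read back from the standard cpufreq sysfs entries:

# which governor is active, and what speed the first core is currently running at
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq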

Link to comment
