unRAID Server Release 5.0-rc10 Available


limetech


If you want to test your system performance, it must be done with a completely "stock" configuration; that is, disable/remove all plugins and stop all other network activity.

 

Also, I have found that sending non-read/write commands down to the drives during a parity check/sync or data streaming can have larger effects than one might think.  For example, it used to be that unRAID employed a background thread that would check drive spin up/down status every 10 seconds, mainly to determine if it was time to spin a drive down.  This was implemented using the 'hdparm' command, which in turn sends a CHECKPOWERMODE command to the drive.  To implement this command, the driver has to let all I/Os currently sent down to the drive or waiting in the driver queue finish, synchronously send down the CHECKPOWERMODE command, then restart the I/O queues.  This is a rather harsh disruption of the I/O stream; bottom line, it hurts the aforementioned performance.  As of -rc9 it is no longer done this way, but I don't know what any plugins might be doing.
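For anyone curious what that looks like in practice: hdparm -C is the standard way to issue CHECKPOWERMODE from the shell, and the old polling behaviour amounted to roughly the loop below (a sketch only; /dev/sdX stands in for an actual drive):

# Report the drive's power mode (sends CHECKPOWERMODE down to the drive)
hdparm -C /dev/sdX

# Rough sketch of the old 10-second background poll
while true; do
  hdparm -C /dev/sdX > /dev/null
  sleep 10
done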

Link to comment


 

The SimpleFeatures Disk Health plugin is also using hdparm's check power mode at 10-second intervals in the background.

 

This might be better in the customisation forum, but is there a way that SimpleFeatures could suspend all stats-gathering / health-check plugins when it realises a parity check is running?

Link to comment


It's more important to not disrupt I/O streams in order to avoid media stuttering.  I should have mentioned above that sending down SMART commands can be even more disruptive, because this causes the disk to drive the heads to an inner track in order to retrieve saved SMART data (with a subsequent seek back when I/O resumes).
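For illustration, this is the sort of SMART query in question; smartctl is the usual tool for it (an example only, using the thread's /dev/sdX placeholder):

# Read the saved SMART attribute data -- per the explanation above, the drive
# must seek to retrieve it, interrupting any streaming I/O in progress
smartctl -A /dev/sdX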

Link to comment


We may have to reconsider the mechanism in SF, but it isn't really a topic to be discussed here...

 

Link to comment

How much usable RAM is in your machine for unRAID's use?

In the other thread we've been discussing setting a limit of 4GB for unRAID's usage.

Read from here -> http://lime-technology.com/forum/index.php?topic=22675.msg220296#msg220296

 

Is this exclusively related to this motherboard? I've got 8GB installed in a Supermicro C2SEA and haven't noticed this problem... Will have to go and test this now!

 

On another note, I've successfully upgraded to RC10. The parity check has just completed, and took pretty much the same time as on RC8a.

 

EDIT: Forgot to mention, my board has a Realtek RTL8111C NIC, which seems to be working fine too.

 


 

I've got 24GB of memory and no virtualization (I tried, but no luck with passthrough).

I installed the RC10 test version, and I also added the mem=4095M parameter to syslinux.cfg.
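For anyone wanting to replicate this, the parameter goes on the append line of syslinux.cfg on the flash drive; a typical unRAID 5 entry would look something like this (your existing file may differ slightly):

label unRAID OS
  kernel bzimage
  append mem=4095M initrd=bzroot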

Here are some results:

Writing - Console Command

Cache Drive - dd if=/dev/zero of=./testhd2 bs=1024M count=1 1,1GB - (81,5; 110; 112)MB/s

Cache Drive - dd if=/dev/zero of=./testhd2 bs=1024M count=10 11GB - (110; 99,4; 113)MB/s

Cache Drive - dd if=/dev/zero of=./testhd2 bs=1024M count=4 4,3GB - (99,5; 119; 99,4)MB/s

Cache Drive - dd if=/dev/zero of=./testhd2 bs=1M count=1000 1GB - ( 91,5; 110; 119)MB/s

Normal Drive - dd if=/dev/zero of=./testhd2 bs=1024M count=1 1,1GB - (23,3; 26,6; 33)MB/s

Normal Drive - dd if=/dev/zero of=./testhd2 bs=1024M count=10 11GB - (26,8; 31,2; 26,7)MB/s

Normal Drive - dd if=/dev/zero of=./testhd2 bs=1024M count=4 4,3GB - (29,1; 24,9; 26,9)MB/s

Normal Drive - dd if=/dev/zero of=./testhd2 bs=1M count=1000 1GB - (27,1; 32,1; 29)MB/s
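One caveat on dd tests like these: writes from /dev/zero can be flattered by the page cache, especially with 24GB of RAM. If you want numbers that include the flush to disk, GNU dd's conv=fdatasync option forces the data out before the rate is reported, e.g.:

# Same 1GB test, but the reported rate includes syncing data to disk
dd if=/dev/zero of=./testhd2 bs=1M count=1000 conv=fdatasync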

 

Upload - Network Transfer - smb - win7 > /mnt/{cache|disk2}

Cache Drive - 1 File ~ 1GB - (30; 27; 29)MB/s

Cache Drive - 1 File ~ 4GB - (31; 30; 31)MB/s

Cache Drive - 1 File ~ 8GB - (30; 31; 30)MB/s

Cache Drive - 1055 Files ~ 6GB - (32; 33; 32)MB/s

Normal Drive - 1 File ~ 1GB - (20; 25; 23)MB/s

Normal Drive - 1 File ~ 4GB - (25; 23; 24)MB/s

Normal Drive - 1 File ~ 8GB - (24; 24; 23)MB/s

Normal Drive - 1055 Files ~ 6GB - (23; 22; 22)MB/s

 

Download - Network Transfer - smb - /mnt/disk2 > win7

1 File ~ 1GB - (51; 46; 47)MB/s

1 File ~ 4GB - (42; 42; 41)MB/s

1 File ~ 8GB - (39; 41; 36)MB/s

1055 Files ~ 6GB - (32; 31; 32)MB/s

 

Link to comment

I upgraded from RC8 on Monday and have been having one rather major issue since. After a few hours, my server simply stops responding to anything. I can't telnet in (it states the host is down), I cannot type anything directly into the server, there's no web GUI/unmenu, all my shares stop, all apps stop, and I have no way of gathering a system log (to my knowledge).

 

This is all I'm ever able to see when I check the monitor.

 

[attached photo of the console screen]

 

Originally I upgraded the server, didn't run permissions or anything, and just let it go. About 6 hours later, it stopped working.

 

I hard shut it down, rebooted, and let a parity check run while permissions were being run. Same thing: after a few hours, it stopped working.

 

Another hard shutdown: I disabled all apps, stopped the parity check, and let permissions run. They went through just fine, I started a parity check, and sometime in the night it stopped working again.

 

Right now I've downgraded to RC8 and am letting a parity check run with no apps enabled. Anyone have ideas on what's going on? I should mention, my server had been running for 20 days straight with no issues, and it's only about a year and 3 months old, built from all new parts.

Link to comment


If possible, we would like to see the syslog from just before the hard crash, so try capturing it by opening a Telnet/PuTTY session and repeatedly (every 15 minutes?) running a capture command such as cp /var/log/syslog /boot/syslog.txt.  To repeat it, just press the up arrow and Enter.  Then also start a tail running on the console of the server, as tail -f --lines=100 /var/log/syslog.  If you happen to spot something suspicious on the console, try capturing another syslog immediately, and more often thereafter.
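If you would rather not babysit the session, a simple loop does the same periodic capture (a sketch using the same paths as above):

# Copy the syslog to the flash drive every 15 minutes until interrupted
while true; do
  cp /var/log/syslog /boot/syslog.txt
  sleep 900
done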

 

Sometimes these crashes seem to happen very suddenly and randomly, with no warning in the syslog (thankfully rarely).  But other times we may be able to see the problem as it develops.  After the crash, retrieve the syslog from the flash drive and post it here, zipped if necessary.

 

We perhaps should move this to the RC support forum...

Link to comment

That looks a lot like a reiserfs file system error leading to a kernel panic.

 

I would perform a reiserfsck on each of your data disks.  You can do this most easily by starting unRAID in maintenance mode (so you do not have to un-mount the disks to perform the tests); you can then run the checks on /dev/md1, /dev/md2, /dev/md3, etc.
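For example, with the array started in maintenance mode:

reiserfsck --check /dev/md1    # then /dev/md2, /dev/md3, and so on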

 

Joe L.

 

Link to comment

Unevent, could you by chance post how to create those temp (RAM) drives on unRAID?  I'll be moving data to and from a Win7 box, but at least it is an SSD, and mostly I want to confirm that unRAID and my unRAID link aren't the problems.  I can move data to and from my two Win7 boxen at link speed.

Thanks.

 

You can do it two ways: create a mount point under an existing SMB or NFS export (under an existing share), or create a new export.  NFS will be faster than SMB, but if you're on Windows there's not much choice.

 

On unRaid:

mkdir -m 777 /tmp/ramdrv    # roll your own export manually
mkdir /mnt/disk1/ramdrv     # or piggyback an existing export

 

_Stop all addons_ (unmenu is fine), then clear the caches and see what is left when choosing the size:

sync && echo 3 > /proc/sys/vm/drop_caches

free -lm

 

Create the ram drive:

mount -t tmpfs -o size=3G tmpfs /tmp/ramdrv

The size is important and should be chosen conservatively if you don't have a swap partition enabled.  G = gig, M = meg, and so on.  The tmpfs will be created with a fixed maximum size; it will not grow beyond what you tell it.  It will also be swappable if you have a swap partition enabled, which is nice for those "oops" moments, as it keeps the server from crashing.
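To confirm the mount and its size took effect:

df -h /tmp/ramdrv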

 

If you piggybacked an existing export, browse (in Windows) to the network, then choose disk1 and it will be there.  If you want to roll your own export, edit the exports file (NFS) or the Samba config and restart the service.
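If you do roll your own, hypothetical examples of what the entries might look like (share name, path, and options are placeholders; adjust for your setup):

# /etc/exports (NFS) -- then reload with: exportfs -ra
/tmp/ramdrv *(rw,no_root_squash,async)

# Samba -- add a stanza like this to the config, then restart Samba
[ramdrv]
    path = /tmp/ramdrv
    read only = no
    guest ok = yes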

 

To capture a CSV file of the transfers with bwm-ng on unRAID (install bwm-ng via unmenu), telnet in and run:

bwm-ng --output csv -F transfer_log.csv --count 1000 --interfaces eth0 

 

Start the command and then run your transfer.  Hit CTRL-C if the transfer finishes and bwm-ng is still logging, or increase the count if logging ends early (scary).

 

Edit the file to add the headings below at the top:

unix_timestamp;iface_name;bytes_out;bytes_in;bytes_total;packets_out;packets_in;packets_total;errors_out;errors_in

 

Import it into your favorite spreadsheet program.  Delete the last columns with no headings.  Sort to separate the "total" rows and delete them, then sort ascending by timestamp.  Plot against timestamp however you want.  I'd post one of my NFS plots, but I don't have a pic hosting site.  The plots are only exciting when they are crazy with hiccups and dropouts in the transfers.
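Instead of sorting, you can also strip the "total" rows before importing (assuming bwm-ng labels them "total" in the iface_name column, as it normally does):

grep -v ';total;' transfer_log.csv > transfer_log_clean.csv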

 

Another tidbit: to watch for dropped packets if you suspect network trouble, run this in another telnet session:

watch -n 0.1 'ifconfig | grep dropped'

 

Drops while nothing is going on are most likely framing errors being reported as dropped packets.  Drops during a transfer are the more important thing to watch for.

 

Wow, OK, that will take some trial and error for this duffer, I'm sure :o hahah.  I get my new Intel NIC today, but before I install it I will do my best to run a test and then repeat it with the Intel.

Link to comment

Tom, consider a "debugging" option to "disable drive temperature/spin-status polling."

 

I'd use it!

The only time it actually does this, as of 5.0-rc9, is when you do an operation via the webGui, such as refreshing the page.  If nothing is going on with the webGui, it will not hit the disks with any SMART or other ioctl()'s.

Link to comment

Joe,

 

r.e. "... I would perform a reiserfsck on each of your data disks."

 

I've seen you suggest this several times; and although (thankfully) I've never had a need to do it, I have a few questions that would be handy for a non-Linux guy to know the answer to:

 

=>  I presume you run this via a telnet session ... i.e. "reiserfsck --check /dev/sdx"

      ... if that's not correct, how do you run it?

 

=>  Can multiple instances be run at the same time?  (i.e. to check multiple drives)

 

=>  if it finds errors, I know you need to do "reiserfsck --check --fix-fixable /dev/sdx"

                                                        or  "reiserfsck --check --rebuild-tree /dev/sdx"

 

Two questions about this:  (1)  Can you do both the "--fix-fixable" and "--rebuild-tree" at the same time, or does the rebuild-tree negate the need for the fix-fixable?

 

(2)  Since these are direct Linux commands, is unRAID still tracking parity ... or does running these commands cause parity to no longer be valid?

 

=>  Is there any particular reason to run this on a periodic basis (i.e. as preventative maintenance) ?

 

Link to comment


See check file systems in my sig.

Link to comment

Joe,

 

r.e. "... I would perform a reiserfsck on each of your data disks."

 

I've seen you suggest this several times; and although (thankfully) I've never had a need to do it, I have a few questions that would be handy for a non-Linux guy to know the answer to:

 

=>  I presume you run this via a telnet session ... i.e. "reiserfsck --check /dev/sdx"

      ... if that's not correct, how do you run it?

It is done via telnet or on the system console, but NEVER on the /dev/sdX devices.

 

Instructions are in the wiki here:

http://www.lime-technology.com/wiki/index.php/Check_Disk_Filesystems

 

In the most recent 5.0-rc releases there is a checkbox on the main page to start the array in maintenance mode (without the disks being mounted), so the step of un-mounting the disk in the instructions can be skipped, as it is not mounted to begin with.

=>  Can multiple instances be run at the same time?  (i.e. to check multiple drives)

I suppose so, but you'll need to use multiple system consoles or telnet sessions.  Typically it does not take that long.

 

=>  if it finds errors, I know you need to do "reiserfsck --check --fix-fixable /dev/sdx"

                                                        or  "reiserfsck --check --rebuild-tree /dev/sdx"

 

Two questions about this:  (1)  Can you do both the "--fix-fixable" and "--rebuild-tree" at the same time, or does the rebuild-tree negate the need for the fix-fixable?

I think you can only do one step at a time.  Each time you run it, it will suggest the option to use next.

 

(2)  Since these are direct Linux commands, is unRAID still tracking parity ... or does running these commands cause parity to no longer be valid?

If you run these commands on the /dev/md1, /dev/md2, /dev/md3, etc. devices, parity IS maintained.  If, for some reason, you elect to run reiserfsck directly on the first partition of the physical disk device (the /dev/sdX1 devices), parity is not maintained.  Running a file system check on the /dev/sdX1 devices is not normally needed (or desired), since it breaks parity.
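In other words:

reiserfsck --check /dev/md1     # goes through the unRAID md driver; parity IS maintained
reiserfsck --check /dev/sda1    # raw partition; any repairs made here invalidate parity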

=>  Is there any particular reason to run this on a periodic basis (i.e. as preventative maintenance) ?

Probably only needed after any "write error" that turns a disk indicator red, or after any non-clean shutdown.
Link to comment

I have run 3-4 telnet/SSH sessions at the same time successfully (each one running --check on an individual disk); on 2-3TB drives that are almost full, it took quite a bit of time. You will see it stepping through stages; once you monitor one, you will get an idea of how long it will take on a particular disk going forward.

 

Link to comment

People that had speed issues with SAS2LP and SASLP cards in RC8a: have they been resolved for you?

 

The problems I'm having on RC8a:

Transfer to the server (bypassing the cache) is only 14MB/s, down from about 28MB/s in 4.7. I currently have a 2TB cache drive that is full and moving files over; in the last 4 hours it's only transferred about 60GB. It's going to take days. A fresh parity build goes about 110MB/s, but a parity sync/check with existing parity only goes about 40MB/s. Downgrading to beta12 fixes all these issues, so it's definitely not hardware, and others have reported the same findings.

 

Will update whenever my cache drive decides to finish moving... but can anyone give me good/bad news before then?

Link to comment

How much memory is in your system?

 

8GB.

 

I just updated my 2nd server to RC10 and the speed issues are still there: 63MB/s at 2% into the parity sync, with an estimated 13 hours to finish. My other server, which has far more drives, took 26 hours to finish last time. If I invalidated parity and then rebuilt it from scratch, I'd get about 100-110MB/s... so it's definitely not a bandwidth issue. When I upgrade to 4TB drives, it's just going to get worse. It's already too long. :(

 

It's sad, because I spent over a grand building these 2 servers (not including drives) just so I could make parity syncs faster... and then a month later unRAID updated, which slowed both systems down to speeds below my old servers'. It seems like these speed issues are going to make it into 5.0 final because not enough people seem to have them.

Link to comment
