unRAID Server Release 5.0-rc6-r8168-test Available



How many drives do the people who aren't experiencing any parity speed slowdowns have? I think I am seeing a pattern of people with a smaller number of drives not having parity speed issues. I have 20 data drives, and a check definitely takes 2x longer than on B14.

Does anyone here have over 15 drives and is not experiencing the slowdown?


How many drives do the people who aren't experiencing any parity speed slowdowns have? I think I am seeing a pattern of people with a smaller number of drives not having parity speed issues. I have 20 data drives, and a check definitely takes 2x longer than on B14.

Does anyone here have over 15 drives and is not experiencing the slowdown?

Did you experience the same slowdown with 5.0rc5?

 

I have a server at home running 5.0rc5 and can run a parity check.  I have parity+14 data+cache and all drives are on SASLP cards.

 

 

Note: I will likely NOT be updating to 5.0rc6, as it uses a newer kernel.  The 3.0.* kernels work fine for me, but the 3.1.* and 3.3.* (and possibly the 3.4.*) series of kernels don't seem to play as well with my motherboard.


How many drives do the people who aren't experiencing any parity speed slowdowns have? I think I am seeing a pattern of people with a smaller number of drives not having parity speed issues. I have 20 data drives, and a check definitely takes 2x longer than on B14.

Does anyone here have over 15 drives and is not experiencing the slowdown?

 

Could it be that people running all their drives from the motherboard SATA ports are NOT experiencing issues, whereas people using add-on controller cards ARE? Just a thought in case it helps.


How many drives do the people who aren't experiencing any parity speed slowdowns have? I think I am seeing a pattern of people with a smaller number of drives not having parity speed issues. I have 20 data drives, and a check definitely takes 2x longer than on B14.

Does anyone here have over 15 drives and is not experiencing the slowdown?

Did you experience the same slowdown with 5.0rc5?

 

I have a server at home running 5.0rc5 and can run a parity check.  I have parity+14 data+cache and all drives are on SASLP cards.

 

 

Note: I will likely NOT be updating to 5.0rc6, as it uses a newer kernel.  The 3.0.* kernels work fine for me, but the 3.1.* and 3.3.* (and possibly the 3.4.*) series of kernels don't seem to play as well with my motherboard.

 

I'm ~90% sure I experienced parity slowdowns on RC1-RC5, but it was more like 65MB/s; RC6 is around 40MB/s, and B14 is 80-100MB/s.

 

However, I reported something weird that I noticed with B14 during parity sync here. This doesn't seem to happen as drastically in the RCs. *shrug*

 

Could it be that people running all their drives from the motherboard SATA ports are NOT experiencing issues, whereas people using add-on controller cards ARE? Just a thought in case it helps.

 

I run all 22 of my drives off of 3x SAS2LP-MV8 cards. The top 2 cards are in PCI-E 2.0 x8 slots, and the bottom one is in a PCI-E 2.0 x4 slot with only 6 drives on it. There's no way it's a bandwidth/bus issue (it works fine on older betas, and many others have reported parity speed loss). This setup should be limited by the drives themselves.
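As a rough sanity check of that bandwidth claim, here is a minimal Python sketch comparing each slot's usable PCI-E 2.0 bandwidth against what its drives could stream flat out. The per-lane and per-drive throughput figures are assumed approximations, not measurements, and the 8/8/6 drive split follows the layout described above.

```python
# Rough sanity check of the "bus saturation" theory for the layout above:
# 3x SAS2LP-MV8, two in PCI-E 2.0 x8 slots (8 drives each, assumed) and one
# in a PCI-E 2.0 x4 slot with 6 drives. Throughput figures are approximations.

PCIE2_PER_LANE_MB_S = 500   # ~500 MB/s per PCI-E 2.0 lane before protocol overhead
DRIVE_STREAM_MB_S = 130     # generous sustained read for a 7200 rpm drive

cards = [
    ("SAS2LP #1 (x8)", 8, 8),
    ("SAS2LP #2 (x8)", 8, 8),
    ("SAS2LP #3 (x4)", 4, 6),
]

for name, lanes, drives in cards:
    slot_bw = lanes * PCIE2_PER_LANE_MB_S
    drive_bw = drives * DRIVE_STREAM_MB_S
    verdict = "not a bottleneck" if slot_bw > drive_bw else "possible bottleneck"
    print(f"{name}: slot ~{slot_bw} MB/s vs drives ~{drive_bw} MB/s -> {verdict}")
```

Even the x4 slot (~2000 MB/s) comfortably exceeds six drives streaming at once (~780 MB/s), which supports the "limited by the drives" conclusion.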

 

The good news is I have not experienced any SAS2LP-MV8 issues so far on RC6. They may have been fixed in the latest kernel, but the parity speed is a deal breaker for me. Hopefully I won't have to spend the next year on B14. :P

 

I have four data drives, plus parity, plus cache.  RC6 is much slower for me than RC5.  Currently four drives are on the mobo and two on a non-LSI card.

 

Well there goes my theory. :(

 

Hopefully some of the more technically savvy people can figure out what's causing it.


How many drives do the people who aren't experiencing any parity speed slowdowns have? I think I am seeing a pattern of people with a smaller number of drives not having parity speed issues. I have 20 data drives, and a check definitely takes 2x longer than on B14.

Does anyone here have over 15 drives and is not experiencing the slowdown?

Probably just as important is to list the disk controllers/kernel modules involved.  Granted, people with over 6 or 8 drives have additional disk controllers installed.  It is highly likely that only one piece of hardware is slower with the newer release.

 

In any case, I have only 6 data drives + parity in my newer unRAID array.  It showed no change in parity calc speed.  It is using the "sata_sil" and "jmicron" SATA controller kernel modules (all the Supermicro C2SEE motherboard ports plus a no-name add-in card).
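(For those wondering how to find these module names: `lspci -k`, where available, prints the "Kernel driver in use" for each controller, and the same names can be read from sysfs. Below is a minimal sketch, assuming a standard Linux /sys layout; run it on the server itself.)

```python
# List the kernel driver behind each SCSI/SATA host adapter by reading sysfs.
# Each /sys/class/scsi_host/hostN/proc_name holds the driver name, e.g.
# "ahci", "sata_sil", "mvsas".
import glob
import os

for host in sorted(glob.glob("/sys/class/scsi_host/host*")):
    try:
        with open(os.path.join(host, "proc_name")) as f:
            driver = f.read().strip()
    except OSError:
        driver = "unknown"
    print(f"{os.path.basename(host)}: {driver}")
```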


Regarding Parity check speeds: 

 

One thing that I noticed was that the speed was not constant.  (You have to hit "Refresh" in your browser to update the status during parity checks when using the Simple Features interface.)  During my testing of parity check speeds on beta 14 and this release candidate, the speed would occasionally drop back to about 2/3 of the maximum rate.  I am not sure what causes this, as I don't have any plugins running other than apcupsd (the UPS monitoring plugin).  I would think that something running in the background occasionally 'steals' enough CPU cycles to slow down the parity check.  My suggestion is that when you check parity speed, you run it long enough to actually establish what your rate is -- somewhere in the range of 5% to 10% completion would probably be about right.  I also suspect that time-to-completion should be checked as well as the instantaneous rate, as time-to-completion may be a more accurate indicator of how things are going.
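To illustrate that suggestion, here is a minimal sketch of the arithmetic; the parity size and the two position/time readings are made-up example numbers, not from any real run. It derives the average rate and an estimated time to completion from two refreshes of the position counter, rather than trusting a single instantaneous figure.

```python
# Estimate the average parity-check rate and time to completion from two
# manual readings of the position counter. All numbers are example values.

parity_size_mb = 2_000_000           # 2 TB parity disk, in MB (example)

pos_1_mb, t_1_s = 60_000, 600        # position/elapsed time at first refresh
pos_2_mb, t_2_s = 210_000, 2_400     # position/elapsed time around 10% complete

avg_rate = (pos_2_mb - pos_1_mb) / (t_2_s - t_1_s)    # MB/s over the interval
eta_s = (parity_size_mb - pos_2_mb) / avg_rate        # seconds left at that rate

print(f"average rate: {avg_rate:.1f} MB/s")
print(f"estimated time to completion: {eta_s / 3600:.1f} h")
```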

 

Question for Joe L. ---  How do you find the names of SATA controller kernel modules? 

 

Underline italics added in edit of post


One thing that I noticed was that the speed was not constant.

As far as I have seen, the speed of the parity check has never been constant if it is limited by disk speed - this is due to the variable density of data across disk cylinders.  My checks always begin at a high rate (115-120MB/sec) and slow down as the check progresses.  I would think that is normal - certainly I would not expect anything different.  As you say, one does have to refresh the display to get an up-to-date figure.

 

When I had much slower hardware (Atom CPU, PCI-bus SATA controller as opposed to PCI-e), the speed was pretty much constant throughout the parity check.


One thing that I noticed was that the speed was not constant.

As far as I have seen, the speed of the parity check has never been constant if it is limited by disk speed - this is due to the variable density of data across disk cylinders.  My checks always begin at a high rate (115-120MB/sec) and slow down as the check progresses.  I would think that is normal - certainly I would not expect anything different.  As you say, one does have to refresh the display to get an up-to-date figure.

 

When I had much slower hardware (Atom CPU, PCI-bus SATA controller as opposed to PCI-e), the speed was pretty much constant throughout the parity check.

 

I can add a piece of data: I am running RC6 with 23 total drives in the array plus a cache drive.  I have 3x AOC-SASLP-MV8s.  I was getting slower parity checks in the RC3 and RC5 builds, around 65MB/sec at the start of a check vs. closer to 80-90MB/sec in prior betas.  With RC6 I just ran a parity check, let it go for a bit, and am averaging 25MB/sec.  Something's up.  Could we be saturating the bus or something with so many drives going at once?

 

G

 

P.S. And yes, Joe L, how do you find the kernel/module names you were asking about?


There are some tuneables related to parity sync on the Disk Settings page:

md_num_stripes

md_write_limit

md_sync_window

 

For each of these, it will either say "default" or "user-set" to the right of the input field.  If you set an input field to blank and click Apply, it sets that value back to the default.

 

Current defaults are:

md_num_stripes 1280

md_write_limit 768

md_sync_window 384

 

md_num_stripes - is going to impact total memory used by the unRAID driver.  This memory is used to perform the parity calculations for normal writes, for reconstruct writes (writing to an array with a missing/disabled disk), and for parity sync/check.  Roughly, each stripe requires 4096 x N bytes, where N is the number of disks in the array.  You can leave this number at its default unless you want to really increase the other two values.  This value must always be bigger than either of the other two.

 

md_write_limit - determines the maximum number of stripes allocated for write operations.  This is to prevent the entire stripe pool from getting allocated when a large write is taking place, so that reads can still take place.  Increasing this number will increase write throughput, but only up to a limit.

 

md_sync_window - the one we're interested in for parity sync/check.  You can think of this as the number of parity sync/check stripes permitted to be "in-process" at any time.  The larger this number, the faster parity sync/check will occur, again up to a limit.  Making this too big however, may introduce unacceptable latencies for normal read/write occurring during parity sync/check.

 

So I suggest experimenting with increasing md_sync_window - I have this set to 512 for in-house servers.
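To put rough numbers on the memory impact described above, here is a quick back-of-envelope sketch using the 4096 x N bytes-per-stripe estimate quoted in this post; the 21-disk array in the example is an arbitrary assumption for illustration.

```python
# Back-of-envelope memory use of the stripe pool, per the description above:
# roughly 4096 * N bytes per stripe, N = number of disks in the array.

def stripe_pool_bytes(md_num_stripes: int, num_disks: int) -> int:
    """Approximate bytes consumed by the unRAID driver's stripe pool."""
    return md_num_stripes * 4096 * num_disks

# Example: a 21-disk array (20 data + parity) -- an illustrative assumption.
print(f"default 1280 stripes: {stripe_pool_bytes(1280, 21) / 2**20:.0f} MiB")
print(f"bumped 1536 stripes:  {stripe_pool_bytes(1536, 21) / 2**20:.0f} MiB")
```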

 


Hi Tom

 

this version is a big Lime for a BR10i controller

 

I've been running r8168 for weeks without any issue ... I installed this version yesterday to help you with testing, and last night when the mover went into action, all drives on the BR10i gave errors and were somehow kicked off the array with a lot of errors.

Funny thing: even with 44,000+ errors, the disks were not red-balled.

 

I went back to r8168 without any issues.

 

Please note I don't have a parity disk running yet on this server... I just moved to a two-unRAID setup and need to spread the budget a bit :P

I also attached the inventory image just so you can see that only the BR10i disks got the errors.

 

 

The syslog attached to this post (2012-07-12) is from r8168, for comparison ...

r8168-test syslog (2012-07-13) uploaded to my skydrive -> https://skydrive.live.com/redir?resid=96282FB49FCE9B37!148

 

syslog-2012-07-12.zip

Lian-Li.Movie.Unraid.zip


There are some tuneables related to parity sync on the Disk Settings page:

md_num_stripes

md_write_limit

md_sync_window

 

For each of these, it will either say "default" or "user-set" to the right of the input field.  If you set an input field to blank and click Apply, it sets that value back to the default.

 

Current defaults are:

md_num_stripes 1280

md_write_limit 768

md_sync_window 384

 

md_num_stripes - is going to impact total memory used by the unRAID driver.  This memory is used to perform the parity calculations for normal writes, for reconstruct writes (writing to an array with a missing/disabled disk), and for parity sync/check.  Roughly, each stripe requires 4096 x N bytes, where N is the number of disks in the array.  You can leave this number at its default unless you want to really increase the other two values.  This value must always be bigger than either of the other two.

 

md_write_limit - determines the maximum number of stripes allocated for write operations.  This is to prevent the entire stripe pool from getting allocated when a large write is taking place, so that reads can still take place.  Increasing this number will increase write throughput, but only up to a limit.

 

md_sync_window - the one we're interested in for parity sync/check.  You can think of this as the number of parity sync/check stripes permitted to be "in-process" at any time.  The larger this number, the faster parity sync/check will occur, again up to a limit.  Making this too big however, may introduce unacceptable latencies for normal read/write occurring during parity sync/check.

 

So I suggest experimenting with increasing md_sync_window - I have this set to 512 for in-house servers.

 

I let it go to 3% completed before checking the speed, refreshing 5 times for each configuration.

 

md_num_stripes 1280

md_write_limit 768

md_sync_window 384

 

40-45MB/s

 

--

 

md_num_stripes 1280

md_write_limit 768

md_sync_window 512

 

45-55MB/s

 

--

 

md_num_stripes 1536

md_write_limit 768

md_sync_window 768

 

With the above... one of my disks seems to have died the second I pressed Parity Sync. I can't even get SMART results. Syslog below...

 

EDIT: Restarted the server and the drive is detected. No SMART errors, but the drive is still red-balled. It seems like unRAID did not like those values. Am I forced to rebuild this drive?

syslog_errors.txt

this version is a big Lime for a BR10i controller

 

... and for all other LSI-based controllers

 

This is the fault that has been discussed so many times - with the 3.1 and later Linux kernels, accessing a drive while it is spun down will generate these errors.


this version is a big Lime for a BR10i controller

 

... and for all other LSI-based controllers

This is the fault that has been discussed so many times - with the 3.1 and later Linux kernels, accessing a drive while it is spun down will generate these errors.

Hi Peter,

 

I knew it would happen... I was just trying to be helpful and provide Tom with extra logs to go through, which might or might not help...

Since this server has no parity disk, the damage was non-existent, as I knew it would be...

But there might be some info in the logs that helps Tom... although I just saw that syslogd restarted at 04:00 AM and the mover started at 03:45 AM.

I checked the previous log in the logs folder, but that stops at the reboot where I installed r8168-test...

So this might have been for nothing...

I knew it would happen... I was just trying to be helpful and provide Tom with extra logs to go through...

 

Okay, that wasn't clear from your original post.  I had also posted a log previously, with a controlled provocation of the bug, so that my syslog didn't overflow.

 

Anyway, as Tom posted quite recently, the problem has been traced back to a particular commit in the Linux kernel tree, and it is hoped that those involved will soon address the issue ... there is light at the end of the tunnel!

