Joe L. Posted July 12, 2012 My parity check speed using rc6 is 85MB/sec, and everything seems to be running fine so far. The speed is about the same as on prior releases; no slowdown evident. Jul 11 20:39:36 Tower2 kernel: md: sync done. time=28257sec
spylex Posted July 12, 2012 r8168 doesn't work at all on one of my servers, RC5 or RC6, so I'd need a separate version?
prostuff1 Posted July 12, 2012 r8168 doesn't work at all on one of my servers, RC5 or RC6, so I'd need a separate version? That tells us absolutely nothing. 1. What is your hardware? 2. We need a syslog! 3. We need to know what you mean by "doesn't work at all"!
spylex Posted July 12, 2012 Apologies, I will get everything ready and post here!
tyrindor Posted July 12, 2012 How many drives do people have that aren't experiencing any parity speed slowdowns? I think I am seeing a pattern of people with a smaller number of drives not having parity speed issues. I have 20 data drives, and it definitely takes 2x longer than on B14. Does anyone here have over 15 drives and is not experiencing the slowdown?
prostuff1 Posted July 12, 2012 How many drives do people have that aren't experiencing any parity speed slowdowns? I think I am seeing a pattern of people with a smaller number of drives not having parity speed issues. I have 20 data drives, and it definitely takes 2x longer than on B14. Does anyone here have over 15 drives and is not experiencing the slowdown? Did you experience the same slowdown with 5.0-rc5? I have a server at home running 5.0-rc5 and can run a parity check. I have parity + 14 data + cache, and all drives are on SASLP cards. Note: I will likely NOT be updating to 5.0-rc6 as it uses a newer kernel. The 3.0.* kernels work fine for me, but the 3.1.* and 3.3.* (and possibly the 3.4.*) series of kernels seem to not play as well with my motherboard.
jaybee Posted July 12, 2012 How many drives do people have that aren't experiencing any parity speed slowdowns? I think I am seeing a pattern of people with a smaller number of drives not having parity speed issues. I have 20 data drives, and it definitely takes 2x longer than on B14. Does anyone here have over 15 drives and is not experiencing the slowdown? Could it be that people running drives all from the motherboard SATA ports are NOT experiencing issues, whereas people using external add-on cards ARE? Just a thought in case it helps.
S80_UK Posted July 12, 2012 I have 7 plus cache, not all on the mobo. No slowdown.
PeterB Posted July 12, 2012 I have four data, plus parity, plus cache. RC6 is much slower for me than RC5. Currently four drives on the mobo, two on a non-LSI card.
tyrindor Posted July 12, 2012 How many drives do people have that aren't experiencing any parity speed slowdowns? I think I am seeing a pattern of people with a smaller number of drives not having parity speed issues. I have 20 data drives, and it definitely takes 2x longer than on B14. Does anyone here have over 15 drives and is not experiencing the slowdown? Did you experience the same slowdown with 5.0-rc5? I have a server at home running 5.0-rc5 and can run a parity check. I have parity + 14 data + cache, and all drives are on SASLP cards. Note: I will likely NOT be updating to 5.0-rc6 as it uses a newer kernel. The 3.0.* kernels work fine for me, but the 3.1.* and 3.3.* (and possibly the 3.4.*) series of kernels seem to not play as well with my motherboard. I'm ~90% sure I experienced parity slowdowns on RC1-RC5, but it was more like 65MB/s; RC6 is around 40MB/s. B14 is 80-100MB/s. However, I reported something weird that I noticed with B14 during parity sync here. This doesn't seem to happen as drastically in the RCs. *shrug* Could it be that people running drives all from the motherboard SATA ports are NOT experiencing issues, whereas people using external add-on cards ARE? Just a thought in case it helps. I run all 22 of my drives off of 3x SAS2LP-MV8 cards. The top 2 cards are on PCI-E x8 2.0 bandwidth, and the bottom one is on PCI-E x4 2.0 with only 6 drives on it. There's no way it's a bandwidth/bus issue (it works fine on older betas, and many others have reported parity speed loss). This setup should be limited by the drives. The good news is I have not experienced any SAS-MV8 issues so far on RC6. It may have been fixed in the latest kernel, but the parity speed is a game breaker for me. Hopefully I won't have to spend the next year on B14. I have four data, plus parity, plus cache. RC6 is much slower for me than RC5. Currently four drives on the mobo, two on a non-LSI card. Well, there goes my theory. Hopefully some of the higher-tech people can figure out what's causing it.
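As a back-of-the-envelope check on the bus-saturation question, the sketch below uses two assumed round numbers (roughly 500 MB/s of usable bandwidth per PCIe 2.0 lane and about 130 MB/s sustained per drive; neither figure comes from this thread) against the slot widths and drive counts described above, taking the two x8 cards as fully populated with 8 drives each.

```python
# Rough bus check for the 3x SAS2LP-MV8 layout described above.
# Assumptions (round numbers, not measurements from this thread):
#   - one PCIe 2.0 lane carries ~500 MB/s of usable data (5 GT/s, 8b/10b encoding)
#   - one spinning drive sustains ~130 MB/s
PCIE2_MB_PER_LANE = 500
DRIVE_MB_S = 130

for lanes, drives in ((8, 8), (8, 8), (4, 6)):   # two x8 cards, one x4 card with 6 drives
    available = lanes * PCIE2_MB_PER_LANE
    needed = drives * DRIVE_MB_S
    print("x%d slot, %d drives: ~%d MB/s needed of ~%d MB/s available (%.0f%%)"
          % (lanes, drives, needed, available, 100.0 * needed / available))
```

On those assumptions the x8 slots sit at roughly a quarter of capacity and the x4 slot under half, which supports the view that the drives, not the bus, set the ceiling here.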
Joe L. Posted July 12, 2012 How many drives do people have that aren't experiencing any parity speed slowdowns? I think I am seeing a pattern of people with a smaller number of drives not having parity speed issues. I have 20 data drives, and it definitely takes 2x longer than on B14. Does anyone here have over 15 drives and is not experiencing the slowdown? Probably as important is to list the disk controller/kernel modules involved. Granted, people with over 6 or 8 drives have additional disk controllers installed. It is highly likely it is only one piece of hardware that is slower with the newer release. In any case, I have only 6 data drives + parity in my newer unRAID array. It had no change in parity calc speed. It is using the "sata_sil" and "jmicron" SATA controller kernel modules (all the Supermicro C2SEE motherboard ports plus a no-name add-in card).
Frank1940 Posted July 12, 2012 Regarding parity check speeds: One thing that I noticed was that the speed was not constant. (You have to hit "Refresh" in your browser to update the status during parity checks using the Simple Features interface.) During my testing of parity check speeds on beta 14 and this release candidate, the speed would occasionally drop back to about 2/3 of the maximum rate. I am not sure what causes this, as I don't have any plugins running other than apcupsd (UPS monitoring plugin). I would think that it is something running in the background that occasionally steals enough CPU cycles to slow down the parity check. My suggestion is that when you check parity speed, you run the check long enough to actually establish what your rate is; somewhere in the range of 5% to 10% complete would probably be about right. I also suspect that time-to-completion should be checked as well as the actual rate, as the time-to-completion may be a more accurate indicator of how things are going than the instantaneous rate. Question for Joe L. --- How do you find the names of SATA controller kernel modules? (Underline italics added in edit of post.)
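A quick way to see this on a running Linux box is lsmod, which lists every loaded module; to tie each storage controller to its driver, sysfs also exposes the low-level driver name per host adapter. A minimal sketch follows (it assumes a standard Linux sysfs layout and is not unRAID-specific).

```python
# Print the low-level driver behind each SCSI/SATA host adapter.
# /sys/class/scsi_host/hostN/proc_name holds names such as ahci, sata_sil24, mvsas.
import glob
import os

for host in sorted(glob.glob("/sys/class/scsi_host/host*")):
    try:
        with open(os.path.join(host, "proc_name")) as f:
            driver = f.read().strip()
    except (IOError, OSError):
        driver = "(unknown)"
    print("%s: %s" % (os.path.basename(host), driver))
```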
S80_UK Posted July 12, 2012 One thing that I noticed was that the speed was not constant. As far as I have seen, the speed of the parity check has never been constant if it is limited by disk speed; this is due to the variable density of data across disk cylinders. My checks always begin at a high rate (115-120MB/sec) and slow down as the check progresses. I would think that is normal; certainly I would not expect anything different. As you say, one does have to refresh the display to get an up-to-date figure. When I had much slower hardware (Atom CPU, PCI-bus SATA controller as opposed to PCI-e), the speed was pretty much constant throughout the parity check.
limetech (Author) Posted July 12, 2012 Apologies, I will get everything ready and post here! Please don't post bug reports in this thread. Use the unRAID OS 5.0-rc board instead.
speeding_ant Posted July 12, 2012 FYI - Mountain Lion (10.8) working nicely with unRAID
JonathanM Posted July 12, 2012 LOL at "(10.8)" turning into "(10.😎"
gbdesai Posted July 12, 2012 One thing that I noticed was that the speed was not constant. As far as I have seen, the speed of the parity check has never been constant if it is limited by disk speed; this is due to the variable density of data across disk cylinders. My checks always begin at a high rate (115-120MB/sec) and slow down as the check progresses. I would think that is normal; certainly I would not expect anything different. As you say, one does have to refresh the display to get an up-to-date figure. When I had much slower hardware (Atom CPU, PCI-bus SATA controller as opposed to PCI-e), the speed was pretty much constant throughout the parity check. I can add a piece of data: I am running RC6 with 23 total drives in the array plus a cache drive. I have 3 x AOC-SASLP-MV8s. I was getting slower parity checks in the RC3 and RC5 builds, at around 65MB/sec at the start of a check vs. closer to 80-90MB/sec in prior betas. With RC6 I just ran a parity check, let it go for a bit, and am averaging 25MB/sec. Something's up. Could we be saturating the bus or something with so many drives going at one time? G P.S. And yes, Joe L., how do you find the kernel/module names you were asking about?
mbryanr Posted July 12, 2012 I noticed a slowdown from 4.7 to RC5. I need to test further. MB ports and 2 JMB362 controllers. http://lime-technology.com/forum/index.php?topic=21269.msg188995.msg#188995 Sent from my SAMSUNG-SGH-I897 using Tapatalk 2
limetech (Author) Posted July 13, 2012 There are some tuneables related to parity sync on the Disk Settings page:

md_num_stripes
md_write_limit
md_sync_window

For each of these, it will either say "default" or "user-set" to the right of the input field. If you set an input field to blank and click Apply, it sets that value back to the default. Current defaults are:

md_num_stripes 1280
md_write_limit 768
md_sync_window 384

md_num_stripes - is going to impact total memory used by the unRAID driver. This memory is used to perform the parity calculations both for normal writes and for reconstruct writes (writing to an array with a missing/disabled disk), and for parity sync/check. Roughly, each stripe requires 4096 x N bytes, where N is the number of disks in the array. You can leave this number at its default unless you want to really increase the other two values. This value must always be bigger than either of the other two.

md_write_limit - determines the maximum number of stripes allocated for write operations. This is to prevent the entire stripe pool from getting allocated when a large write is taking place, so that reads can still take place. Increasing this number will increase write throughput, but only up to a limit.

md_sync_window - the one we're interested in for parity sync/check. You can think of this as the number of parity sync/check stripes permitted to be "in-process" at any time. The larger this number, the faster parity sync/check will occur, again up to a limit. Making this too big, however, may introduce unacceptable latencies for normal reads/writes occurring during parity sync/check.

So I suggest experimenting with increasing md_sync_window - I have this set to 512 for in-house servers.
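To put the "4096 x N bytes per stripe" figure in perspective, here is a rough sketch of the stripe pool's memory footprint; the disk counts and the bumped stripe values are assumed examples, not recommendations from the thread.

```python
# Rough RAM estimate for the unRAID md driver stripe pool, using the figure
# quoted above: each stripe needs about 4096 * N bytes, N = disks in the array.
# The disk counts and larger stripe values below are assumed examples only.

def stripe_pool_bytes(num_stripes, num_disks, bytes_per_disk=4096):
    """Approximate memory consumed by the stripe pool."""
    return num_stripes * bytes_per_disk * num_disks

for disks in (7, 15, 23):                 # small, medium, large arrays
    for stripes in (1280, 1536, 2048):    # the default plus two bumped values
        mib = stripe_pool_bytes(stripes, disks) / (1024.0 * 1024.0)
        print("disks=%2d md_num_stripes=%4d -> ~%.0f MiB" % (disks, stripes, mib))
```

On those numbers, even a 23-disk array at the default 1280 stripes uses only about 115 MiB, so experimenting with md_sync_window as suggested above does not require touching md_num_stripes until the window starts approaching the stripe count.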
sacretagent Posted July 13, 2012 Hi Tom, this version is a big Lime for a br10i controller. I'd been running r8168 for weeks without any issue. I installed this version yesterday to help you with testing, and last night when the mover went into action all drives on the br10i gave errors and were somehow kicked off the array, with a lot of errors. Funny thing: even with 44000+ errors the disks were not red-balled. I went back to r8168 without any issues. Please note I don't have a parity disk running yet on this server... I just moved to a two-unRAID setup and need to spread the budget a bit. I also attached the inventory image just so you can see that it is only the br10i disks that got the errors. The syslog attached to this post (2012.07.12) is from r8168 for comparison; the r8168-test syslog (2012-07-13) is uploaded to my SkyDrive -> https://skydrive.live.com/redir?resid=96282FB49FCE9B37!148 syslog-2012-07-12.zip Lian-Li.Movie.Unraid.zip
tyrindor Posted July 13, 2012 There are some tuneables related to parity sync on the Disk Settings page: md_num_stripes, md_write_limit, md_sync_window [...] So I suggest experimenting with increasing md_sync_window - I have this set to 512 for in-house servers.

I let it get to 3% completed before checking the speed, and refreshed 5 times each.

md_num_stripes 1280, md_write_limit 768, md_sync_window 384: 40-45MB/s

md_num_stripes 1280, md_write_limit 768, md_sync_window 512: 45-55MB/s

md_num_stripes 1536, md_write_limit 768, md_sync_window 768: with these values, one of my disks seems to have died the second I pressed Parity Sync. I can't even get SMART results. Syslog below...

EDIT: Restarted the server and the drive is detected. No SMART errors, but the drive is still red-balled. It seems unRAID did not like those values; am I forced to rebuild this drive?

syslog_errors.txt
PeterB Posted July 13, 2012 this version is a big Lime for a br10i controller ... and for all other LSI-based controllers. This is the fault which has been discussed so many times: with 3.1 and later Linux kernels, access to a drive while it is spun down will generate these errors.
sacretagent Posted July 13, 2012 Hi Peter, I knew it would happen. I was just trying to be helpful and provide Tom with extra logs to go through, which might or might not help. Since this server has no parity disk, there was no damage, as I knew, but there might be some info in the logs that helps Tom. Although I just saw that syslogd restarted at 04:00 AM and the mover started at 03:45 AM; I checked the previous log in the logs folder, but that stops at the reboot where I installed r8168-test, so this might have been for nothing.
PeterB Posted July 13, 2012 I knew it would happen. I was just trying to be helpful and provide Tom with extra logs to go through... Okay, that wasn't clear from your original post. I had also posted a log previously, with a controlled provocation of the bug so that my syslog didn't overflow. Anyway, as Tom posted quite recently, the problem has been traced back to a particular commit in the Linux kernel tree, and it is hoped that those involved will soon address the issue ... there is light at the end of the tunnel!