Frank1940 Posted August 19, 2016

Here is mine from the Testbed server:

raid6: using algorithm sse2x4 gen() 7386 MB/s

and for my Media server:

raid6: using algorithm sse2x4 gen() 7644 MB/s

I don't know what the speed numbers really mean at this point (except that higher is always better).
John_M Posted August 19, 2016

Quoting: "Even on what I consider to be a P.O.S. CPU in my main server, I don't think it's that bad: raid6: using algorithm sse2x4 gen() 13394 MB/s"

Interesting. By all accounts my A6 6400K ought to be even more of a POS than your A8 6600K, except for its power consumption, but at stock speed I get:

raid6: using algorithm sse2x4 gen() 13824 MB/s

I'm actually rather fond of those AMD APUs, and the AM1 Kabinis are great value for everyday use.

With RC4 I just got my best dual parity check speed so far (2 parity + 6 data, all 5 TB Toshiba MD04ACA500 disks):

Last check completed on Friday, 19 August 2016, 08:58 (today), finding 0 errors.
Duration: 10 hours, 3 seconds. Average speed: 138.9 MB/sec
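As a quick sanity check on that last result (assuming the usual decimal counting, 5 TB = 5 × 10^12 bytes, and a check that spans the full disk), the reported duration and average speed are consistent with each other:

```python
# Back-of-the-envelope check of John_M's numbers -- not from unRAID itself.
disk_bytes = 5 * 10**12        # 5 TB, decimal, as drive vendors count it
avg_speed = 138.9 * 10**6      # reported average: 138.9 MB/s
seconds = disk_bytes / avg_speed
print(round(seconds / 3600, 2))  # hours; close to the reported 10 h 0 m 3 s
```

So the average speed and the duration agree to within a few seconds, which suggests the GUI is reporting both honestly.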
Squid Posted August 19, 2016

Quoting John_M: "Interesting. By all accounts my A6 6400K ought to be even more of a POS than your A8 6600K, except for its power consumption, but at stock speed I get raid6: using algorithm sse2x4 gen() 13824 MB/s"

Memory configuration and speed will also play a part.
John_M Posted August 19, 2016

Quoting Squid: "Memory configuration and speed will also play a part"

I agree, but is the quoted speed per core or total? Our processors are essentially the same, except that yours has twice as many cores and a better GPU. Two 8 GB sticks of DDR3-1600, btw. Asus A88M-A mobo (again, very similar to yours but less capable), SAS-LP HBA (compared with your SAS2).
Squid Posted August 19, 2016

Tom just posted a good explanation of how the numbers correlate to the real world in the RC4 thread.

Sent from my LG-D852 using Tapatalk
John_M Posted August 19, 2016

Thanks, I'll have a read. Yikes! The RC4 thread has grown by two pages since I last looked.
JorgeB Posted August 19, 2016

Quoting: "No. Actually you can confirm that. Just do diffs of /usr/src/linux*/drivers/md/ on each version of unRAID. Pay attention to md.c and unraid.c."

Yes, but there could be changes that have nothing to do with parity calculation speed. I know nothing about programming, so I wouldn't be able to tell. Tom already confirmed on the RC4 thread that there are no changes that would affect speed, as I suspected.
Squid Posted August 19, 2016

Just for the hell of it, I posted a 6.2 / 6.1 compatible unraid-tunables-tester (http://lime-technology.com/forum/index.php?topic=29009.msg492140#msg492140) to see if it helps out any.
Frank1940 Posted August 19, 2016

Quoting Squid: "Just for the hell of it, I posted a 6.2 / 6.1 compatible unraid-tunables-tester (http://lime-technology.com/forum/index.php?topic=29009.0) to see if it helps out any."

It's actually found in the last post of that thread. For reference, here is a direct link: http://lime-technology.com/forum/index.php?topic=29009.msg492140#msg492140 But it is a good idea to read the first few posts in the thread to figure out what the script does and how to use it!

EDIT: Running it now to see what the results are...
Squid Posted August 19, 2016

Quoting Frank1940: "Actually found in the last post of this thread. For reference, here is a direct link: http://lime-technology.com/forum/index.php?topic=29009.msg492140#msg492140"

thx, posted the wrong link
Frank1940 Posted August 21, 2016

Quick update. I downloaded Squid's modified unraid-tunables-test.sh script and ran it to get settings for the tunables. I changed the settings on the 'Disk Settings' page to the 'Best Bang-for-the-Buck' suggestions and ran the non-correcting parity check again. Here are the results:

Event: unRAID Parity check
Subject: Notice [ROSE] - Parity check finished (0 errors)
Description: Duration: 12 hours, 59 minutes, 20 seconds. Average speed: 64.2 MB/s
Importance: normal

The change was about one minute (longer). Except for the knowledge that was gained, it doesn't seem worth the effort!
Squid Posted August 21, 2016

Quoting Frank1940: "The change was about one minute (longer). Except for the knowledge that was gained, it doesn't seem worth the effort!"

If you still have the table of results, it would probably be worthwhile to post them in the tunables thread, as it might help out Pauven with the new modifications he's been working on.
Frank1940 Posted August 26, 2016

OK, after having suffered through a game of whack-a-mole (on this thread and the RC4 one) about my AMD Sempron 140, I decided to upgrade to a dual core processor. Looking around on eBay, I found an AMD Athlon II X2 260 3.2GHz CPU for $14.00 and got it a couple of days ago. (Note: the unRAID version used for all of these tests is 6.2-rc4.)

Let's begin by comparing specs:

CPU                      Pass Mark   P + Q Algorithm
Sempron 140 @ 2.7GHz     739         prefetch64-sse (10844 MB/s) + sse2x4 (7386 MB/s)
Athlon II 260 @ 3.2GHz   1894        prefetch64-sse (12848 MB/s) + sse2x4 (8082 MB/s)

Perhaps the first thing you will note is that there isn't much increase in the P + Q numbers. For a 20% increase in clock speed, there was only a 9% increase in the P + Q algorithm speed. Measured against the increase in the Pass Mark number, it is even worse!

So much for the preliminary look at the numbers. The real question is what happens in the real world. For this part, I ran both the single and dual parity non-correcting parity checks. I also had a comparable dual parity number for the Sempron, which I will include. (I don't have a single parity test for rc4 because I never thought I would need it!)

CPU                      Parity type   Average Speed
Athlon II 260 @ 3.2GHz   Single        105.8 MB/s
Athlon II 260 @ 3.2GHz   Dual          77.8 MB/s
Sempron 140 @ 2.7GHz     Dual          64.2 MB/s

An interesting observation is that the dual parity check was 21% faster on the Athlon II than on the Sempron. Now that is a most interesting observation: it is almost identical to the increase in clock frequency! And it corresponds with something I observed while the dual parity test was running. One of the cores was pegged at 100% while the other one was running between (say) 10% and 60%. Remember, this is still a low-end CPU and the GUI can really suck up CPU resources. (That is what it looks like to me was happening on that second core.)
Since this is a dual core processor, the Pass Mark for each core is 947, which is only a 28% increase over the Sempron. What I suspect is happening (at least for AMD processors) is that the parity checking subroutine runs only on a single core! This would mean that one would need an AMD processor with a Pass Mark of about 1300 per core for dual parity speeds to be approximately equal to the single parity speeds. (I am not sure what happens with Intel processors.)

Of course, this issue probably disappears when one has a processor that supports the AVX2 extensions. And I found that I made a mistake when I said earlier that support started with Sandy Bridge. (That was AVX.) For AVX2, it is actually Haswell (2013) for Intel and Carrizo (2015) for AMD.
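For anyone curious why the second (Q) parity is so much heavier than the first (P), here is a rough pure-Python sketch of the underlying math. This is illustrative only, not unRAID's actual driver, which uses the vectorized kernel routines benchmarked above. P is a plain XOR across the data disks; Q additionally multiplies each disk's byte by a power of the generator in GF(2^8), which is the part that chews CPU without SIMD help:

```python
# Illustrative RAID-6 P+Q parity math in pure Python -- NOT unRAID's code.
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) using the RAID-6 polynomial 0x11d."""
    r = 0
    for _ in range(8):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11d
    return r

def pq_bytes(data):
    """P and Q parity for one byte position across the data disks.
    data[i] is the byte stored on data disk i."""
    p, q, g = 0, 0, 1            # g walks through 2^0, 2^1, ... in GF(2^8)
    for d in data:
        p ^= d                   # P: simple XOR parity
        q ^= gf_mul(g, d)        # Q: Reed-Solomon weighted sum
        g = gf_mul(g, 2)
    return p, q
```

With a single data disk, Q degenerates to the same byte as P, which makes a handy sanity check; with more disks, every byte of Q costs a field multiply on top of the XOR, which is why dual parity checks lean so much harder on the CPU.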
JorgeB Posted August 27, 2016

That's surprising; based on my tests I would expect a much bigger improvement. I used a slower Sempron and got worse results than yours, as expected, but a less powerful dual core got much better results, so maybe there's something else affecting the results. Do you mind posting your current diagnostics? If possible, start a parity check, wait a couple of minutes and then grab them to capture CPU usage so I can compare.
Frank1940 Posted August 27, 2016

Quoting JorgeB: "Do you mind posting your current diagnostics, if possible start a parity check, wait a couple of minutes and grab them to capture CPU usage so I can compare."

No problem. Here is the diagnostics file. There will be a delay in the screen capture, as the array was still set up for single parity after running the single parity tests. The array is currently rebuilding the P + Q disks; that will take about three to four hours.

rose-diagnostics-20160827-0650.zip
JorgeB Posted August 27, 2016

I meant grab the diagnostics during a parity check. You can just start one, grab the diagnostics and abort; CPU usage will be in the diags.
Frank1940 Posted August 27, 2016

OK, sorry about the confusion. The rebuild of P + Q completed (3 hrs, 7 min, 56 sec). I started the non-correcting parity check, shut down the GUI, and logged in via PuTTY. I ran 'diagnostics' from the command line at 10:25. (I figured the CLI would have less impact on CPU usage than the GUI!) A few minutes later, I restarted the GUI and collected the diagnostics file using the GUI at 10:29. Both files are attached. As a piece of information for an inquiring mind: where is the CPU usage information stored in these diagnostics files?

rose-diagnostics-20160827-1025.zip
rose-diagnostics-20160827-1029.zip
JorgeB Posted August 27, 2016

It's in system\ps.txt:

UID    PID    PPID  C   STIME  TTY  TIME      CMD
...
root   10626  2     10  06:54  ?    00:22:19  [mdrecoveryd]
...
root   10763  2     30  06:54  ?    01:04:34  [unraidd]
...

At this time the parity check is using only 40% CPU, rather low.
Frank1940 Posted August 27, 2016

Quoting JorgeB: "At this time parity check is using only 40%, rather low."

Interesting, but difficult to determine exactly what the numbers mean. I went googling for the column headings; apparently 'STIME' is the start time of the process. Both of those processes started at 06:54 today (at least, that is what I got from the googled description), and that is when I began the rebuild of P + Q. The parity check started at about 10:22. And the 'C' value may be calculated over the entire time that the process has been running.

Do you (or anyone else) know of a command that lets us look at CPU usage on a 'snap-shot' basis, especially from the command line? If not, I wonder if a reboot is in order to clarify things a bit?
JorgeB Posted August 27, 2016

I'm not seeing anything obvious, but CPU usage is low, so it's not that. Two things you can try if you're willing, one easy, one more complicated:

1. Change the tunables. You're using the defaults, which are usually sufficient for a small array, but try these:

md_num_stripes: 4096
md_sync_window: 2048
md_sync_thresh: 2000

No need to do a full check; start one and look at the GUI a couple of times for the current speed.

2. This one will leave you with an unprotected array. Since this is your test server it may not be an issue, but don't do it if you have critical data here: do a new config, re-assign all disks as data disks, start the array and do a read check. This is a new option in v6.2 when there's no parity; since there are no calculations, it's faster on CPU-limited servers. If the speed remains the same, something else is limiting you. When done, you can do another new config, re-assign your parity disks and trust parity, though it's normal for the next check to correct a few sync errors.
Frank1940 Posted August 27, 2016

Suggestion (1): changed the values and started the parity check. The speed (at around 9 minutes in) was about 53.5 MB/s, which is slower than with the defaults. I also took some time and watched the GUI Dashboard CPU usage for a couple of minutes. Total CPU usage varied from 67% to 97%; CPU 0 ranged from 53% to 99%; CPU 1 ranged from 94% to 100%. Shut down the GUI for a bit... Restarted the GUI: 54.7 MB/s at 19 minutes in. So no real improvement. (This confirms what I found earlier when I changed them.)

I will copy the critical data on this server over to the Media server and then see what happens with suggestion (2). (I also have off-site storage of the irreplaceable stuff, but that is never quite current.) It will take a while...
RobJ Posted August 27, 2016

Quoting Frank1940: "Do you (or anyone else) know of a command that allows us to look at the CPU usage at a 'snap-shot' basis? Especially from the command line..."

As far as I can remember, the 'C' value is a current CPU activity level, running from 00 to 99: a 'snap-shot' type value. The cumulative CPU usage column is under 'TIME'.
Frank1940 Posted August 27, 2016

Interesting find! Using the top command, you can see the CPU % on a real-time basis, with a snapshot every few seconds. This is a copy of a portion of the output screen that I got after I killed the top process:

Tasks: 169 total, 3 running, 166 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.5 us, 55.5 sy, 0.0 ni, 21.7 id, 0.0 wa, 0.0 hi, 22.3 si, 0.0 st
KiB Mem : 6114940 total, 248304 free, 104628 used, 5762008 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 5249896 avail Mem

  PID USER  PR  NI  VIRT   RES   SHR  S  %CPU  %MEM     TIME+ COMMAND
10763 root  20   0      0     0     0 R  98.7   0.0 106:23.68 unraidd
10626 root  20   0      0     0     0 D  34.9   0.0  39:58.82 mdrecoveryd
17323 root   0 -20      0     0     0 S  25.3   0.0  16:06.83 kworker/0:2H
 1474 root  20   0   9680  2504  2052 S   0.7   0.0  14:02.18 cpuload
 2398 root  20   0  16604  2852  2112 R   0.7   0.0   0:00.10 top
15051 root  20   0  90904  4700  2888 S   0.7   0.1  22:59.99 emhttp
  960 root   0 -20      0     0     0 S   0.3   0.0  13:12.26 kworker/1:1H

The mdrecoveryd process would peak as high as 51%, and the value listed here was close to its low. There were two other processes, kworker/0:2H and kworker/1:1H, that seemed to combine for about 25% CPU; it seemed to vary which of the two was the higher one. (I would assume that %CPU is the percentage of the CPU core that the process is running on.) unraidd was always the highest-percentage process, and it seemed to be locked at about 98%+ virtually all of the time. The remaining processes seemed to be sleeping; at least, they were never above one or two percent at any time.
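For scripting this kind of snapshot, top can also be run non-interactively. A sketch, assuming the standard procps tools (note that ps's %CPU, and top's first sample, are averaged over the process lifetime or since boot, so for a truer point-in-time figure take top's second iteration):

```shell
# Two top iterations in batch mode, one second apart; read the second
# sample for current (not since-boot) CPU usage.
top -b -n 2 -d 1 | tail -n 20

# Quick list of the busiest processes, sorted by lifetime-average CPU share:
ps -eo pid,pcpu,comm --sort=-pcpu | head -n 6
```

Either of these can be redirected to a file and attached alongside the diagnostics zip.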
Videodr0me Posted June 9, 2017 (edited)

I can corroborate your findings on my J1900 box. Dual parity + 5 data disks (7 8TB disks in total) is bottlenecked by one core (the CPU has 4 real cores, not hyperthreaded ones) being maxed out by the unraidd process. This is with 6.3.5. The Passmark of my CPU is 1834, and the relevant internal speed tests show:

Jun 6 13:53:13 Tower kernel: prefetch64-sse: 9376.000 MB/sec
Jun 6 13:53:13 Tower kernel: generic_sse: 8332.000 MB/sec
Jun 6 13:53:13 Tower kernel: xor: using function: prefetch64-sse (9376.000 MB/sec)
...
Jun 6 13:53:13 Tower kernel: raid6: sse2x1 gen() 605 MB/s
Jun 6 13:53:13 Tower kernel: raid6: sse2x1 xor() 2052 MB/s
Jun 6 13:53:13 Tower kernel: raid6: sse2x2 gen() 1289 MB/s
Jun 6 13:53:13 Tower kernel: raid6: sse2x2 xor() 2687 MB/s
Jun 6 13:53:13 Tower kernel: raid6: sse2x4 gen() 2039 MB/s
Jun 6 13:53:13 Tower kernel: raid6: sse2x4 xor() 2457 MB/s
Jun 6 13:53:13 Tower kernel: raid6: using algorithm sse2x4 gen() 2039 MB/s
Jun 6 13:53:13 Tower kernel: raid6: .... xor() 2457 MB/s, rmw enabled
Jun 6 13:53:13 Tower kernel: raid6: using ssse3x2 recovery algorithm

My conclusion is that the parity calculation code is probably single-threaded. Which is good news, as it should be simple to switch to a multithreaded implementation: either by assigning different groups of stripes to different threads or, not quite as efficient, by using one thread per parity disk. As unRAID is geared towards reusing old/low-power hardware to build NAS storage, I would very much welcome this getting some priority with the devs. After all, it's much better to use hardware to its fullest potential than to just throw it away and buy new stuff all the time (sustainability, save the planet, bla bla).

Disclaimer: despite our findings, there is still a chance that something else is the culprit for the unraidd process saturating one core, but I ruled out PCI saturation/A-Link/SATA/tunables etc. as far as I could.
On 27.8.2016 at 9:25 PM, Frank1940 said: "The mdrecoveryd process would peak as high as 51% and this listed value was close to its low. There were two other processes, kworker/0:2H and kworker/1:1H, that seemed to combine for about 25% of %CPU. ... unraidd was always the highest percentage process and it seemed it was locked at about 98%+ virtually all of the time. The remaining processes seemed to be sleeping."

Edited June 9, 2017 by Videodr0me
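The stripe-partitioning idea above can be sketched in a few lines, purely as an illustration (plain Python, nothing to do with the actual md driver): each stripe's parity is independent of every other stripe's, so workers can each take their own slice with no locking at all.

```python
# Toy sketch of per-stripe parallel parity -- illustration only, not the
# unRAID md driver. Each stripe is independent, so map() can partition
# the work across workers with no shared state.
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from operator import xor

def p_parity(stripe):
    """P parity for one stripe: XOR of the byte from each data disk."""
    return reduce(xor, stripe, 0)

def parity_parallel(stripes, workers=4):
    """Results come back in input order, so they can be written straight
    to the parity disk without any reassembly bookkeeping."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(p_parity, stripes))
```

(In CPython the GIL means a real speedup would need native code or multiprocessing; the point is only that the work partitions cleanly, which is what makes the multithreading suggestion plausible.)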