UhClem

Members
  • Posts

    282
  • Joined

  • Last visited

Everything posted by UhClem

  1. I agree, but (most probably) not pertinent to the goals of the testing.
  2. Thanks. The first set are the speeds for the drives, tested individually. That is, with no contention (with other drives) for the controller (and its x2 [pcie v2] limit of ~750 MiB/s), and, most important in this case, no contention for the single Sata port (which all 4 are multiplied off of) and its 6Gbps [~550 MiB/s] limit. Those numbers serve as reference points for further testing ... The second set are the speeds for the drives, tested concurrently; all competing for the resource(s). In this example, the total bandwidth provided (by the 9705 port multiplier) was only ~480 MiB/s [vs 550], hence exposing it as the constraining component. That may turn out to be typical overhead for port multipliers in general. (When I subject a SiI3726 port multiplier [10+ years older; only Sata II] to a similar test, it provides a maximum of ~240 MiB/s.) There is some specific significance in all of this for you, Zippi: you should probably put only 2 of your unRAID array data drives on the multiplied ports, along with the SSD. The other 3 (native) ports can each have an array data drive. Your parity drive(s) should go on mobo ports, as can additional data drives. What do you think, johnnie.black? (You've played in this sandbox, right?) --UhClem
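     (For anyone wanting to reproduce this style of test, here is a minimal sketch, assuming dd with direct I/O and hypothetical device names sda-sdd -- this is not the dskt script itself:)

         # individual: one drive at a time, no contention for the controller or the multiplied port
         for d in sda sdb sdc sdd; do
             dd if=/dev/$d of=/dev/null bs=64M count=16 iflag=direct 2>&1 | tail -1
         done

         # concurrent: all four at once, competing for the shared bandwidth
         for d in sda sdb sdc sdd; do
             dd if=/dev/$d of=/dev/null bs=64M count=16 iflag=direct 2>&1 | tail -1 &
         done
         wait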
  3. Proceed with the download, install, and testing of dskt. Thank you.
  4. Yes, that will be fine for our testing purposes. The installation & testing commands will instead be:

         mv /dskt.txt /dskt
         chmod 755 /dskt
         /dskt X a b c d     (that is a capital X followed by the 4 letters [on paper] [e.g. sda => a, etc.])
         /dskt a b c d       (similarly)

     [After we're all finished with testing, you should move /dskt elsewhere; it's not good to clutter root.] This is a quick and painless procedure (like the dentist says ...) --UhClem
  5. Zippi, one more test please: ./dskt b c d (Only the 3 Reds; omit the SSD.) Thanks. --UhClem
  6. Everything's cool. Here's the deal: The first line (ending in SControl 300) is the solid connection (ie, link up 6.0 Gbps) from the multiplied (remember my earlier post?) Sata port on the 9235 to the single upstream port on the 9705. The 2nd line is the 9705 actually identifying itself. The 3rd line is the 9705 initializing its first (of 5, downstream) Sata ports; perfectly normal. The 4th line is the 9705 reporting that that link is down (because nothing is connected to it). The 5th line is the 9705 initializing its second Sata port. The 6th line is the 9705 reporting that it has a solid (and full speed [6Gbps]) connection to a Sata drive on that 2nd (of 5) Sata port. The next pair of lines are the same explanation as lines 3 & 4 (except relating to the 9705's 3rd port). And the same for the next two pairs in your post, for the 4th & 5th empty ports. Then, in your post, the typical successful report for the device (SSD) itself.

     Now, I want you to do something for me. OK? Just to be safe, shut down the system. Connect at least two, and preferably three, of your WD Reds to the new card. Do it this way please: connect one Red to the port that is a "partner" of the SSD's port, and the other two Reds to the partnered pair of ports adjoining that first pair. Now, reboot. Verify that all looks cool in this new syslog (based on the concepts in my above explanation). If not, post the snippet from the new syslog. But if all is OK, make note of the 4 device names assigned to the SSD and the 3 Reds (on a piece of paper). Download the attached (below) script (dskt.txt) to your current directory. Then:

         mv dskt.txt dskt
         chmod 755 dskt
         ./dskt X a b c d     (that is a capital X followed by the 4 letters [on paper] [e.g. sda => a, etc.])
         ./dskt a b c d       (similarly)

     Please reply with the two 4-line outputs ... and we will all learn something. Thank you. --UhClem PS Note that the script only reads from the disks; worry not. dskt.txt
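     (A side suggestion, not part of the procedure above: to double-check which device name belongs to which drive before writing them down, lsblk can show the model next to each name:)

         # list whole disks (no partitions) with model and size, to match sda/sdb/... against the SSD and the Reds
         lsblk -d -o NAME,MODEL,SIZE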
  7. You put it in your signature. But I bet that you thought signatures stopped working since you weren't seeing them. You have to enable the viewing of signatures, as a reader. Go: "3 horiz bars" (in upper right of the forum window [not your browser]) -> Account -> Account Settings -> Signature ; then Enable "View Signatures". [ To get the details I wanted, to provide you advice, I looked at the specs for your mobo (model# in your sig). It looked "weird" to me, but what do I know--I did my most recent (and probably last) build in '08. So, figuring everyone else probably knew all about it, I GOOG'd for it, filtering for this forum. And, there was the back-story. ] Since your sig says your 850 EVO is mSATA, is it not using the mobo's "mini-sata"? Or, does other weirdness preclude that? If it can use it, do so; and put the 3TB in that (now-free) Sata port. The goal is to offload the bandwidth "stress" from the MV8. I'm betting that the QM87 chipset (on the mobo; supplying the satas, etc.) can handle at least 700 MB/s. --UhClem "Are we having fun yet?"
  8. Almost certainly, you should put your 3 x 6TB drives and the 3TB on the 4 Sata ports on the motherboard. By the way, that is one strange motherboard -- but, as they say, "Don't look a gift horse in the mouth." Right, shooga? ["Strange", as in PCIe v1 and USB3, and more rs-232 ports than disk ports; but it does have 2 x GbE LANs.] --UhClem
  9. Thank you for your summary. (I hadn't bothered -- once I saw the CERN paper was one of the references, I lost all confidence ["...baby...bath water." ]). A+ for you. I'm surprised it was even published ... but I appreciate CERN's openness (or was it ignorance?). Personally, I would be totally embarrassed to admit that I had purchased, and deployed into production, 600 RAID controllers and 3000 drives, without first getting 3-4 controllers & 15-20 drives and beating the sh*t out of it all for a week or two (and not just 2GB every 2 hours). But, why should they care ... it's just the(ir) taxpayers' money. [And, in 2006, that probably represented ~US$750,000+ (in 2006 euros).] Did they even get competitive bids? [Make that $1M+] Those data path issues were formally addressed in 2007 when they were added to SMART, but had probably been implemented in drive firmware even earlier by the competent manufacturer(s). --UhClem (almost accepted a job offer from CERN in 1968 ... then my draft deferment came through)
  10. Not that CERN "study" again. c3, did you actually read it ? (not just casually) I invite everyone who has participated in this thread to read it (this version is only 7 pages). See if you can find the numerous flaws in his presentation, and conclusions. Extra credit if you deduce the overall premise/motivation. -- UhClem "Gird your grid for a big one ..."
  11. Please convince me (not being argumentative--I'm sincere!). But, to convince me, you'll need a rigorous presentation, using a valid evidence trail. I appreciate your time/effort. Oh yeah, I do recall seeing that, but always in a casual perusal of one of the HGST Ultrastar manuals. Thanks for pointing it out to me in the context of a technical discussion, where I'm motivated to dig into it deeper. My first two minutes of digging has already added a few crumbs of new knowledge to my quest to understand the factory/low-level formatting. Agreed ... since the reduction in drive data capacity when going from 4k sectors to (4k+128) sectors is only 3.5%. I believe they do it so you'll buy (the same # of) (much!) more expensive drives. After all, these are the same execs/bureaucrats that spent ~$500 billion on the Y2K scam; why not soak them for a measly extra $5-10B to protect their data (and cover their hiney). Remember, fear and lawyers (and fear of lawyers) are great motivators in such finagles. Presently, I'm about 75% serious in the above. But I'm waiting, and very open, to be convinced otherwise. -- UhClem
  12. Untrue! Rather than (try to) get into the theory of error detection and correction [at the level implemented in HDD firmware] (which I doubt ANY reader of this forum is competent to do--I'm surely not!), consider this: if a collision was easy (Hell! forget easy; if a collision was even *possible*), hard drives would only be used by careless hobbyists. (Regarding HDDs) I agree with RobJ (and I've written the same 2+ times on this board in the last few years): note that "few" is pretty large (8-12+, I think) for a 512/4096-byte sector. And the firmware will make many retry attempts to get a good read; I've seen evidence of 20. And then the OS driver will usually retry several times. Only then does the OS throw a UCE.

     As for johnnie.black's original experience/report, I'm intrigued/disturbed. Whose controller does Sandisk use? [ Added: Note that in the original post, there appear (I don't know unRAID's logging methodology) to have been 26 UCEs Reported by this drive (over its lifetime, prior to 12Mar2017:2253) (that is SMART code 187) and 1 Reallocated sector (SMART code 5; that "somebody" is labeling `retired'). Do I assume, because of the way this logging is done, and the way that you are monitoring it, that you *know* that all of this bogosity (the 26 & 1) happened very recently? And that there was no sign of it in dmesg etc? If so, and if this SSD's firmware implements SMART correctly, where were those 26 errors REPORTED? As I understand it, there is a different category of SMART error for logging "implicit" errors (from self-diagnosis, Trim, etc.) that can't be "reported". Speaking of which ... what does a "smartctl -a /dev/sdX" show in the Error_Log? Or, does Sandisk (mis)behave like Western Digital and not bother to Log Errors? (Hitachi/HGST has spoiled me--they do [almost] everything right.) (Hey, who remembers the Samsung fiasco when they had a serious firmware bug in the F4 series (HD204, etc.)? As if the bug wasn't bad enough, when they released a fixed firmware, they HAD NOT CHANGED THE FIRMWARE VERSION/REVISION #!) ] -- UhClem
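     (For reference -- an aside, not from the post above -- the attribute table and the drive's error log can also be pulled separately:)

         # SMART attribute table (includes 187 Reported_Uncorrect and 5 Reallocated_Sector_Ct, where the firmware exposes them)
         smartctl -A /dev/sdX
         # the drive's own SMART error log, if the vendor bothers to populate it
         smartctl -l error /dev/sdX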
  13. Since you've identified that there are two chips under the heat-sink, I believe johnnie.black is correct, with a minor correction. The 9235 is the actual SATA controller chip, and the 9705 is a port-multiplier chip. On that, I do agree. But the 9705 is on one of the 9235's 4 ports, (port-)multiplying out to 5 SATA drive connections, with the other 3 (of the 9235's ports) each providing a (direct/native) SATA drive connection. [5 + 1 + 1 + 1 = 8]

     Here's the not-so-good news. Being a PCIe x2 v2 chip, the 9235 is only capable of a maximum throughput of 720-780 MiB/sec [I have a 9230-based card and have measured this; the 9230 & 9235 are "non-identical twins"]. Once you put more than, for example, 5 of those WD Red 4TB drives on this controller, the speed of unRAID parity checks and rebuilds will be limited by this card. Alternatively, if your PCIe x4 slot [since I've never heard of an actual PCIe x2 (electrical) slot] is physically x8 or x16, and there's room for it, you could use any one of the many LSI-9211 8-port cards (popular, common, and [in the US at least] inexpensive (< $100 used/surplus)). Then, at x4, [even though not able to perform up to its full x8 potential] you will be able to sustain a maximum throughput of 1400+ MiB/sec.

     [Edit (added on 23Mar2017):] Following some productive, and enlightening, testing [Thank you, Zippi], I hereby retract my skeptical/pessimistic opinions/prognostications regarding the reliability prospects for combining the 9235/30/20 & the 9705/15 in a Port Multiplication set-up. I've left my original comments below (between the XXXXX's) as an example of "speaking too soon". Anyone interested in this subject should be sure to read through this thread to the end. Note, however, that the favorable results achieved should only be associated with the specific combination of the 92xx and the 97x5. A 92xx in combination with any other PM chip is very likely to be painful. And the 97x5 in combination with any other (supposedly) PM-capable controller may also end badly. Caveat utilitor.

     XXXXXXXXXXXXXXXXX
     (More not-so-good news ...) Even if you can live with the performance sacrifice of the PEX40071, reports/reviews from users indicate (to me) a potential for issues/problems once you are using more than 1 of the 5 "multiplied" ports. While I found mention (here) of an unRAID user using the 40071, there was no mention of performance or of how many ports. This mating of host controllers and port multipliers is best left to OEMs and system integrators who have full control of the hardware/software environment, and are committed to supporting their customers. As an example, it looks like the current Backblaze pods use 3 9235-based cards, each connected to 3 9715 multipliers, with 5 drives on each 9715 [3 x 3 x 5 = 45 drives; the 9715 is the same as the 9705, but adds temp & fan goodies], but you can bet they're NOT running unmodified sata ("ata/ahci") drivers.
     XXXXXXXXXXXXXXXXXX
     -- UhClem
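     (A rough back-of-envelope for that 720-780 MiB/sec ceiling -- my own numbers, assuming PCIe v2's ~500 MB/s of raw payload per lane and roughly 20-25% protocol/DMA overhead:)

         # x2 link: 2 lanes * 500 MB/s raw, minus an assumed ~22% overhead, converted to MiB/s
         echo 'scale=1; (2 * 500) * 0.78 / 1.048576' | bc    # ~743 MiB/sec, squarely in the measured 720-780 range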
  14. Me (as Bart at the blackboard): I will not discuss the precise meaning of "precisely". I will not discuss ... But I will continue my endeavor to have precisely performing hardware and software. -- UhClem
  15. Please allow me to rebut. ... In the context of performance testing of current-era disk drives (2010-now), test results really are precisely (within a fraction of a percent) reproducible, provided the following conditions are met:

     1. The system being used must be controlled absolutely -- no Windows :) -- and, on Unix flavors, no other users, no daemons, cron jobs, download agents, etc.
     2. The disk drive(s) must be healthy. Not just SMART-healthy, *healthy*. (Consider: upon receiving a distressing diagnosis, the patient says "I can't be sick, Doc; I walked in here, didn't I?")
     3. The rest of the hardware should be non-flaky, and up to the demands of the tests being performed (ie, bus bandwidths, etc).

     To illustrate (and to also provide data to support something I mentioned in a previous post), I performed the following test on 4 drives (all are 4TB HGST HDS724040ALE640 [non-NAS 7200rpm]) [call them a, b, c & d]. For each drive, consecutively (not concurrently), read the 5GiB from 500G-504G, measuring the speed for each GiB. Do the test 4 times [call them 1, 2, 3 & 4]. The actual results follow [using cat & paste to get the 16 outputs together]:

             --- a1 ---    --- b1 ---    --- c1 ---    --- d1 ---
     500G   156.6 M/sec   158.2 M/sec   156.3 M/sec   160.2 M/sec
     501G   159.6 M/sec   154.8 M/sec   160.3 M/sec   157.7 M/sec
     502G   155.3 M/sec   159.7 M/sec   154.1 M/sec   159.1 M/sec
     503G   161.1 M/sec   154.1 M/sec   163.1 M/sec   159.9 M/sec
     504G   154.8 M/sec   158.0 M/sec   153.3 M/sec   156.5 M/sec

             --- a2 ---    --- b2 ---    --- c2 ---    --- d2 ---
     500G   157.1 M/sec   158.8 M/sec   157.0 M/sec   160.8 M/sec
     501G   159.6 M/sec   154.8 M/sec   160.3 M/sec   157.7 M/sec
     502G   155.3 M/sec   159.7 M/sec   154.1 M/sec   159.1 M/sec
     503G   161.1 M/sec   154.1 M/sec   163.1 M/sec   159.9 M/sec
     504G   154.8 M/sec   158.0 M/sec   153.3 M/sec   156.5 M/sec

             --- a3 ---    --- b3 ---    --- c3 ---    --- d3 ---
     500G   157.1 M/sec   158.8 M/sec   157.0 M/sec   160.8 M/sec
     501G   159.6 M/sec   154.8 M/sec   160.3 M/sec   157.7 M/sec
     502G   155.3 M/sec   159.7 M/sec   154.1 M/sec   159.1 M/sec
     503G   161.1 M/sec   154.1 M/sec   163.1 M/sec   159.9 M/sec
     504G   154.8 M/sec   158.0 M/sec   153.3 M/sec   156.5 M/sec

             --- a4 ---    --- b4 ---    --- c4 ---    --- d4 ---
     500G   157.1 M/sec   158.8 M/sec   157.0 M/sec   160.8 M/sec
     501G   159.6 M/sec   154.8 M/sec   160.3 M/sec   157.7 M/sec
     502G   155.3 M/sec   159.7 M/sec   154.1 M/sec   159.1 M/sec
     503G   161.1 M/sec   154.1 M/sec   163.1 M/sec   159.9 M/sec
     504G   154.8 M/sec   158.0 M/sec   153.3 M/sec   156.5 M/sec

     This is exemplified best by comparing 502G & 503G for drive c (above). Options are good ... (just be very prudent when choosing the default behavior). Maybe, if you believe this example is representative (not an anomaly), you'd consider adding a flag to expand the sample size. Hypothetically (since I neither use unRAID nor employ a GUI when running Linux -- and, hence, probably can't get the direct benefit of diskspeed.sh), because I know how to control my test environment, I would never use -i3, but I might welcome/use -e3. -- UhClem
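     (A minimal sketch of that per-GiB sampling, assuming dd with direct I/O and a hypothetical /dev/sdX -- not the exact commands used for the table above:)

         # four passes; each pass reads a 1 GiB sample (16 x 64M) at offsets 500..504 GiB, one at a time
         for pass in 1 2 3 4; do
             for off in 500 501 502 503 504; do
                 dd if=/dev/sdX of=/dev/null bs=64M count=16 skip=$((off * 16)) iflag=direct 2>&1 | tail -1
             done
         done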
  16. In hindsight, I should have just made a non-specific suggestion (to circumvent the memory-hog thing) like "Try reducing the buffer size (bs=N) and increasing the count=N." You'd have arrived at satisfactory specifics (without my bothersome meddling :)).

     Aside: The specific options I did suggest [bs=64M count=15] were intended to accomplish the following:

     1) [Personal flaw] As a hardcore software person, I shun the whole 10^N game, so I wasn't going to even utter nMB. And I didn't want to appear ignorant of the guidelines for use of Unix/Linux Direct I/O that (strongly) suggest [but not insist] that operations stay on "block boundaries" [both memory and disk]. But, note that I was mistaken about this second concern, because 64,000,000 (64MB) *IS* evenly divisible by 4096!

     2) Since this is all "behind the scenes" for the users of diskspeed.sh, any changes should go unnoticed (as much as possible). E.g., to reproduce the sampling points, it worked out that (15*64M) was *very* close to 1GB [fraction of 1%]. (But even a slight change in sampling points can affect the (precise) reproducibility of results. Low-level disk layout/formatting is deeply complex--I'm still trying to learn [but I can't get into the sausage factory :)].)

     John, other than reducing the memory footprint, I wouldn't change anything else (in the context of this hubbub). [But do continue to nurture and develop "your baby".] -- UhClem
  17. Thank you jonathanm, and johnnie.black . [10^6 rules this kingdom, I guess.] -- UhClem
  18. For true papal purity, you need to use bs=64M (not MB). That [with count=16] will give a true 1 gig (2^30 = 1,073,741,824 bytes). Make special note that dd reports its speed in ("fake") MB/sec (10^6), and you'll want to divide that by 1.048576 (to "pass through the gates"). diskspeed.sh users will have to accept that (superficially) their numerical results will drop by ~5%; maybe change the speed units from MB/sec to MiB/sec. [Ask yourself this: have you produced (and are your users using) a technical tool or a marketing tool?] -- UhClem
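     (A quick worked conversion, with a made-up figure: if dd reports 160 MB/sec, the "true" number is:)

         # dd's MB/sec is 10^6 bytes/sec; divide by 1.048576 to get MiB/sec
         echo 'scale=1; 160 / 1.048576' | bc    # 152.5 MiB/sec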
  19. John, I suggest you make a slight change to your methodology. Note that the Linux kernel splits *large* O_DIRECT (iflag=direct) requests into 512k chunks anyway. Also, in a testing environment, all i/o requests (and especially O_DIRECT requests) should be in (integer) multiples of 4k. So, if you really want your sample size to be [in dd units] 1GB (ie, 10^9) vs 1G (ie, 2^30), then, instead of the above "dd ...", maybe use:

         dd if=/dev/sdb of=/dev/null bs=64M count=15 skip=0 iflag=direct

     That will result in samples of 1.006 GB (close enough?) but a much lower RAM burden. Of course, you'll need to scale your "skip=N" args by 15x as you march through the drives. [My personal preference, as an old-school software guy (Unix kernel development 40+ yrs ago [v4-v6]), is to stick to the 2^N path ("count=16" above), but ...]

     I've done a lot of disk testing (recently, even) and find that sample sizes of 32M-128M meet the "principle of diminishing returns". Have you experimented with this? You could achieve faster completions and/or finer granularity with negligible, if any, loss of result quality. --UhClem

     "Base-8 arithmetic is just like base-10 ... if you're missing two fingers." --Tom Lehrer
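     (To illustrate the skip scaling -- hypothetical sample points, not diskspeed.sh's actual ones -- sampling near 0, 100, 200, 300 GB would become:)

         # skip is in bs-sized (64M) blocks; 15 such blocks ~= 1.006 GB, so scale the GB offset by 15
         for gb in 0 100 200 300; do
             dd if=/dev/sdb of=/dev/null bs=64M count=15 skip=$((gb * 15)) iflag=direct 2>&1 | tail -1
         done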
  20. As John_M has pointed out, the Startech card you referenced is only PCIe x1 (a single lane). The important ramification is that you will not be able to get the full bandwidth potential of that IcyCube's SataIII (6Gbps) connection. Instead of the 500-525 MB/s that John_M gets with his 9235-based card, the referenced card (9128-based) will only get 350-375 MB/s. The Syba card (9215-based) you reference is also x1, with the same limitation. You might want to consider the alternate Syba card [9230-based] (with AmazonUK ASIN B00AZ9T264). True, it doesn't have eSata connections, BUT a Sata-to-eSata cable will solve that if the G8 has any way to accommodate it. On the G7, it can be wedged out of the (even occupied) PCIe rear panel opening. According to the Marvell data sheet [http://www.marvell.com/storage/system-solutions/assets/Marvell-88SE92xx-002-product-brief.pdf], the 88se9230 is actually a superset of the 9235, adding HyperDuo & hardware RAID [neither of which is probably of any benefit to you presently]. (Hope this msg isn't "too late") -- UhClem
  21. Interesting ... 111.25GB in 22:30 = ~85 MB/sec Since that is well below the speed of those drives [in the outer zones (130-150 MB/sec)], that means the test result was constrained by the controller/expander combination. The result shows a max throughput of ~1600 MB/sec (19 x 85). Now, since the M1015 itself has a maximum sustained throughput of 2000+ MB/sec, the expander is imposing a fairly significant inefficiency. Either that or Intel's claim of a single cable connection getting 4 x 6Gbps [theoretical] throughput--ie, 2000-2200 MB/s [realistic]--is optimistic/exaggerated. Maybe that test with 1xM1015<=>2xcables<=>RES2sv240<=>16x(fast)drives will pinpoint the source of the degradation. [Note that seeing 2000+ MB/s from a ("stand-alone") M1015 requires using 4-8 SSDs, since the fastest mechanical drives, today, max out at ~200 ea.]
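     (The arithmetic behind those figures, spelled out -- 22:30 is 1350 seconds:)

         # per-drive speed during the check, then the implied aggregate across the 19 drives
         echo 'scale=1; 111.25 * 1000 / 1350' | bc    # ~82 MB/sec per drive (roughly the ~85 cited above)
         echo '19 * 85' | bc                          # 1615 -> the ~1600 MB/sec total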
  22. BobP, Upon looking a little closer at the numbers for your recent Test#1 [M1015<=single cable=>expander w/15 array drives], I think they're a little low. With 12 drives, I had been expecting 150-160--with the (revised) 15 drives, I would reduce that to 120-130. However, that test yielded a speed of only 100 MB/sec. [it smells like there might be a PCIe misconfig.] If you have a chance to perform that same test on your 19-drive WD Red array, pls do so. That will help determine what the max bandwidth for a M1015<=>RES2SV240 set-up should be. If it doesn't jibe with the above Test#1, we can investigate the PCIe further. With 19 array drives, I expect ~100 MB/s; but if it's actually ~80 (which approximates Test#1's 15-drive @ 100), then I'm "wrong", and the "slowdown" is an inherent factor of using an expander.
  23. Nice. Good test data leads to good information ... which allows for informed decisions.