Everything posted by UhClem

  1. Thanks for that info. Unfortunately, it does not support my theory. Since the only reports of the "out of memory" condition seemed to occur with 3TB+ drives, (and since drives > 2TB might not fit into fdisk's notion of "geometry"), I was expecting to see a "Units=" value much greater than that 8225280, which would have caused preclear to suck up a correspondingly greater amount of memory. But, alas ... It is still a fact that no good will come from preclear using that "Units=" value for determining the block-size it will pass to dd using the bs=N argument. The whole notion of disk geometry is meaningless in any post-DOS (& CP/M ) OpSys. Now, as for your result using the override parameters. By specifying a 64KB block-size, you will definitely reduce dd's memory usage, and it should have only a negligible impact on the performance of dd itself. *BUT*, because you were advised to use "-b 200" (in conjunction with that 64KB block-size), that has caused your (overall) read performance to deteriorate. That is because each invocation of dd (by preclear) is only transferring ~12.8 MB (200*64KB) instead of the previous 1.6GB (200 * 8225280). That is the (household) equivalent of emptying your bathtub with a tablespoon (1/2 oz) instead of a pitcher (~64oz). The end result is the same (finished read cycle/empty tub), but not the time to perform it. The fact that the write cycle did not seem to suffer also makes sense, but is harder (for me) to explain (to you) [my shortcoming!] If I were talking to a colleague, I would say "the write cycle doesn't suffer because sufficient write-behind data is pending to absorb the added [dd] invocation overheads." You can get the safety of the (reduced) "-w 65536 -r 65536" and not have any performance penalty by increasing the -b parameter to compensate. Ie, -b (block-count) should be increased to ((8225280 / 65536) * 200), which is (~) 25600 (ie, 128x). So, "-w 65536 -r 65536 -b 25600" should do it. (Then you'll be emptying by the pitcherful again.)
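     To put rough numbers on the bathtub analogy (plain shell arithmetic, purely illustrative; the exact command lines preclear builds are not reproduced here):
        echo $(( 8225280 * 200 ))    # 1645056000 -- ~1.6 GB per dd invocation (original block-size, -b 200)
        echo $(( 65536 * 200 ))      # 13107200   -- ~12.8 MB per dd invocation (-w/-r 65536 with -b 200)
        echo $(( 65536 * 25600 ))    # 1677721600 -- ~1.6 GB per dd invocation again (-w/-r 65536 with -b 25600)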
  2. Could you identify that 3TB disk (by device-name: /dev/sdX) and report the output of: fdisk -l /dev/sdX replace X appropriately, and that -l is "dash (lower-case) ell" [Feel free to mask out the (reported) Serial #] I suspect this whole problem derives from a misconception about disk geometry, and (maybe) the fix is simple, and efficient.
  3. I started to reply in your other thread [user shares across 2 ...], but thought this thread was more on-topic. [Disclaimer: this posting does not constitute an offer of ongoing assistance/advice (as explained below).] 2 x 50MBps is pretty poor, but at least the port-multiplication is functioning. (Many people never get more than one drive active.) Let me add my own example. I have a Rocket 620A card (Marvell 9125 chip) in a PCIe x1v2 slot (lowly HP N40L) [I route a SATA-eSATA cable out the back] to a SansDigital TR4MB (SiI3726-based). With 4 drives in the TR4MB, and doing simultaneous sustained reads on all 4, I get 62-67 MiB/s on each drive. That's a total sustained throughput of 260+ MiB/s (ie, 272+ MB/s), seemingly limited by the SATA2 connection (the best the [old-timer] 3726 is capable of). [With 3 drives, or 2 [fast] drives, the same 260+ total.] I don't run unRAID, but the above was on Linux (2.6.37). (Before you get excited, note that after I upgraded to Linux 3.2.29, it stopped working entirely; I'll put the relevant old driver code in the newer kernel when I get around to it.) Just an example of the s**t that can happen. The major caveat is that getting PMs to work well is totally unpredictable (exception being the SiI3132<=>SiI3726 combination, which is very stable, but slow--max bandwidth of ~140 MiB/s). Personally => I <= won't recommend that anyone pursue port-multipliers, because then => I <= would be obligated to assist them in what may very likely be a futile endeavor. In other words, it's not only YMMV, but also PAYOR (proceed at your own risk). Damn shame too, because the concept, and at least some implementations, are fairly attractive. PM needs better OS support, and it definitely needs a "modernization". The most commonly-used PM chip is the SiI3726, which was released 8+ years ago! Some company needs to produce a SATA3-capable (4/5-drive) PM chip (that can really sustain ~550 MB/s) *AND* a PCIe x2/x4 controller that can cleanly interoperate with it (at max speed), *AND* offer solid driver support to make it all a useful reality. Will we ever see it? ... Is hell gonna freeze over? [/rant]
  4. Since a very important counter-example was the successful Syba-x2 behavior on the Z77/Windows combo, the Z77/unRAID tests should include one with the Syba in the same Z77 slot as was used for the Windows win. (In case that was not the x16 unshared which is in your stated plan.) I commend, and appreciate, your perseverance.
  5. OK. I did not see you mention the x4, so just tried to cover all bases. You did even better ... Those tests should tell the story. One other thing: I didn't see mention of which Linux version you're running (u4.7==L2.6.32; ... u5.0rc8==L3.4.11), but I would also try those barebones tests with the opposite end of the above "spectrum". For instance, my SiI3726 port multipliers (connected to a Mrv 88se9125) stopped working when I went from Linux 2.6.37 to 3.2.29. (I don't use unRAID myself.) Good luck.
  6. Well, that's somewhat encouraging. But, it is not conclusively proven to be a Linux problem. Since both the OS and the motherboard are different in that experiment, it lacks rigor. But, rather than loading Windows on your Z68 guy, you could just boot the Z77 w/ your unRAID flash (in combo with the mod'd enable_ahci script [and a neutered unRAID config (??)]), and then try putting the pedal to the metal [and checking lspci]. Because of all the PCIe bandwidth sharing and dropbacks on your Z68 mobo, in combo with a newly-introduced card using a "mongrel" PCIe lane-grab (x2 is rarely seen), the above observation (re: concern about a two-variable experiment) is (possibly) more than anally academic. Speaking of your Z68 mobo, I know that you tested the Syba in the X16 & X8, but just for grins, did you try it in the X4? (Note that if you do, you need to remove the 3132 that is in the PCIeX1_2 slot, because of this note from Gigabyte: [quote omitted].) You mean, so it isn't just a half-fast card? [couldn't resist] If the above tests don't improve the scene, I don't know. I tried searching around, but there appears to be nothing applicable regarding the 9230 on any *nix. This Syba card is the first instance of the chip being used in a Marvell-Id'd card (ie, 1b4b:9230) [as opposed to, eg, a Highpoint-Id'd one]. What is surprising is that I find nothing *nix-related regarding the several mobo's that incorporate the 9230 onboard. Amusingly, the only hit was in a syslog attached to a forum posting here (July '12) but it, expectedly, didn't ID.
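     For reference, checking what the slot actually negotiated looks something like this from the console (the 03:00.0 address is just a placeholder; use whatever lspci -nn reports for the Marvell controller):
        lspci -nn | grep -iE 'sata|marvell'                # find the controller's bus address
        lspci -vv -s 03:00.0 | grep -E 'LnkCap|LnkSta'     # LnkCap = what the card claims; LnkSta = the width/speed it actually got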
  7. [To you "youngsters": Please allow us two old farts to reminisce; maybe it'll be of some interest to a few of you.] Before I respond, allow me to put this in a chronological perspective (using a human lifecycle): I'd say that today, Unix is solidly in middle-age; it's healthy, accomplished, and has a long future ahead. The PWB/Unix that you referred to was first released in 1976, which I would characterize as Unix's early puberty; the time period you went hands-on, 1979-80, is getting close to its adolescence. My first hands-on with Unix and the kernel was Nov. 1974; that was the first release of the code from Western Electric (aka AT&T). Seven sets of tapes were mailed from Murray Hill; one came to me (on behalf of my employer). I would call that Unix's early weaning. Were you one of those who used "adb -w" on the kernel while loaded in memory? I fully respect anyone with that skill. adb? That's like solid food [cf: weaning]; there was no adb. There was only db, and truthfully, it is a misnomer to call db a debugger. Its limited functionality was to examine a (binary) executable (pre-execution), or to analyze a core-dump (post-execution). It could do nothing to help you in debugging a running program, or (obviously) a live kernel. [I will always remember my phone call to Ken Thompson, asking him where the (real) debugger was. When he told me there was none, I was honestly incredulous. I instinctively asked him "How did you and Dennis get all of this stuff working?" His answer: "We read each other's code."] And that is part of what identifies true software deities. But, as a mere mortal, I still needed to debug my modified kernel. So, I hacked the header of DEC's ODT-11 (octal debugging tool for PDP-11) so that it was compatible with Unix ld (the [link-]loader). Adding this odt11.o to the rest of the kernel object files, ld gave me a kernel that I could debug, albeit painfully (no symbols--constantly referring to an nm listing for a symbol map). I believe I was the first person to actually debug (breakpoints, single-step, etc.) on Unix (kernel or user-space). Not to evade your question, but by the time adb came around (1976-77), I was winding down my kernel hacking. [I guess I don't get your respect, huh? :)] Oh my, time-traveling is tiring ... this geezer has to take a nap. I'll follow up the on-topic discussion of this (computer) memory exhaustion issue a little later. --UhClem "Forward! Into the past... back to the shadows, again..."
  8. That's a major disappointment! (Not to be making excuses for Syba [or whoever engineered this card], but) I wonder if the x2 aspect of the 9230 chip, and the trace assignments of this card, relative to the layouts of "standard" PCIe x4, x8, x16 slots might have flummoxed this designer. I could find no substantive reports about this particular card anywhere. The Highpoint Rocket 640L uses the same 9230 chip, but I can't find any quantitatively satisfying reviews of it either. I did find a useful review of the RocketRaid 640L (which uses the 88SE9235, which lacks some [non-germane] frills of the 9230) here [link]. His tests show not only x2 @gen2, but also that a single active SATA3 port can utilize both lanes. Something that is puzzling is that the RocketRaid 640L uses the less expensive Marvell 88SE9235 chip (which lacks Marvell's RAID capability), but the less expensive Rocket 640L uses the more expensive 88SE9230 (which does feature Marvell's RAID). "There's something going on here, but I don't know what it is. (Do I, Mr Jones?)" [Ref:88SE92XX Product Brief [link]] Brandon, good work on documenting (via lspci) the root cause of the sub-par performance tests on the PEX40054. [Funny that lspci doesn't get the full Manufacturer ID when the card is id'd/added to the kernel post-boot; not relevant to this x1 vs x2 problem, but something that could be fixed in Linux] Another possible explanation for the x1 vs x2 could be that there is some "quirk" in the 9230 (and maybe 9235) that needs to be addressed specifically in the Linux driver at AHCI set-up time. I've seen examples of this, but I don't have the know-how to create such a quirk-handler. One way to possibly determine whether this x1 misbehavior is inherent to the PEX40054 itself would be to try it on Win7. IF Syba did any testing of this new card, it's much more likely it was done on Win7 than Linux. A standard HDTune/Crystal read seq-xfer test of one of the fast SSDs to see if it can achieve 450+ would be conclusive. It would also determine whether this whole exercise remains a (geek) science experiment (only affecting *nix), or escalates to a mini- consumer reports expose'. Thanks for the excellent reporting of your results; that test script is a real quick-and-dirty, but it did unequivocally expose what neither of us wanted to see. Ignorance is NOT bliss.
  9. Linux ... feh!! ... FYI, I first used, and did kernel development on, Unix before it even had dd (v4, 1973). The important point about dd, as it pertains to this discussion, is that the bs= and count= options have not changed. And, the (general) lesson to be learned by all (self included) is that it is important to read (and comprehend) the man pages for commands we hope to use correctly and constructively. The description of the count= option has not changed in its entire 38+ year lifetime (except in a negligibly semantic sense): May 1974: copy only n input records August 2012: copy only N input blocks The fact that it is a copy precludes any concern about buffering. It isn't really keeping it (in an active sense); it has just not yet overwritten it with anything else. Regardless, this can not be the source (nor excuse/explanation) for any shortage of user-space memory. [Yes, a perverse, and privileged, user can "manufacture" a problem by setting excessively aggressive memory tuning parameters. If so ... "you make your bed, you have to lie in it."] No, I don't believe it really is "in the same way". In the dd example, only the buffer cache is in play, and in a very simple/straightforward manner. In the case of your find/cache_dirs example, there is likely some "race condition" provoked by an interplay of the buffer cache, the inode cache, and the directory-entry cache. If you can really cause an error condition this way, then it is a system bug (technically). [But nobody is both willing and able to fix it. (You know, like the national budget problem.)] --UhClem "(readme) (doctor memory)"
  10. Great to hear that. I'm real interested to learn what the card's maximum throughput is, and I'd appreciate your help. Just for the test, if you could connect your 4 fastest drives--especially useful if you have one or two SSDs available. (This is quick [~5 minutes], and strictly read-only.) Download the little script attached to this post [link]. Identify the drive letters of the 4 connected drives, and run the script twice, first with the X as first arg (ie, dskt X g h i j), and second without the X (ie, dskt g h i j). The first run will take a measure of each drive's max speed individually, as a sort of baseline. The second run will measure the max speed (attainable) of each drive, but simultaneously, all 4 drives transferring at once. (It is important that the four test drives be idle/unaccessed during the measurements.) One or two SSDs really help to push that upper limit, since even 4 of the fastest (transfer rate) mechanical drives (eg, Seagate ST[23]000DM001) will only total ~700 MB/s, and I would hope to see this card do even better (~800). What do you say, brandon?? In the interest of geek science ... Thanks. P.S. The 4 test drives can be full or empty, even unformatted--doesn't matter what, if any, filesystems are on them.
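     (For anyone who can't grab that attachment: the idea is nothing more exotic than timed raw reads, run one at a time and then all at once. A bare-bones sketch of the idea -- not the actual dskt script -- with placeholder drive letters:
        for d in g h i j; do
            dd if=/dev/sd$d of=/dev/null bs=1M count=2048 iflag=direct &   # ~2 GB raw read per drive, all four in parallel
        done
        wait        # each dd prints its MB/s summary as it finishes
     Drop the trailing '&' and the wait to get the one-at-a-time baseline numbers.)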
  11. No. (In the chronological context of this forum) Totally forget about (disregard!) DISK GEOMETRY. No. dd read 8225280 bytes at a time. (the entire dd run consisted of 200 such reads). Each read() (re-)used the same ~8MB (user-space) buffer. I don't have any >2TB drives, so I don't know how fdisk -l reports them, but if you do get a much larger "Unit", you have no grounds for complaint. Instead, consider yourself fortunate that you didn't get bit earlier (for following the folly of disk geometry). Practical difference or not, it is just good practice to use the same basic unit (and multiples thereof) as the OS. (it's a corollary of the Law of least surprises )
  12. Cylinder? As a design criterion (such as you allude), the cylinder has been obsolete for ~20 years, even moreso in the last 5-8 years. Just change your default to something that "feels" right. And, totally forget about "disk geometry"--all that does for you is create a chunk size that is NOT a multiple of 4K. --UhClem "The times they are a'changing."
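     The arithmetic behind that last remark (shell, illustrative only): the classic fake geometry is 255 heads x 63 sectors/track x 512 bytes/sector, and the product is not 4K-aligned:
        echo $(( 255 * 63 * 512 ))   # 8225280 -- the geometry-derived "cylinder" size
        echo $(( 8225280 % 4096 ))   # 512     -- ie, NOT a multiple of 4K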
  13. Thank you! The Toshiba 3TB does appear to be a noticeable update of the Hitachi 7K3000 3TB (HDS723030ALA640) [which would typically measure ~155]. Note that the hdparm (-t) measurement only looks at a very narrow portion of the drive, and is not appropriate for any decision making. For example, your Seagate 3TB measurement seems somewhat low; I would have expected 170-175. But do not worry; if hdparm allowed a user-specified offset for the -t test, you could verify this variance directly. Since it doesn't, you can use the dd command instead, e.g., dd if=/dev/sdX of=/dev/null bs=1M count=256 skip=5K will time 256 MB of reading (256 * 1M) starting at an offset of 5GB (5K * 1M). By varying that 5K parameter, you can witness the variance, even within the "outer" (ie, fastest) cylinder range. Note that dd reports MB/sec whereas hdparm reports MiB/sec.
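     If you want to see that variance for yourself, a quick offset sweep (read-only; the device name is a placeholder, and make sure the drive is otherwise idle):
        for off in 0 5K 50K 500K 1500K 2500K; do
            echo "=== skip=$off (offset in MiB) ==="
            dd if=/dev/sdX of=/dev/null bs=1M count=256 skip=$off    # dd prints the MB/s for each offset
        done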
  14. You need to add the proper identifier that matches your controller to the script. Do this by amending the PCI_IDS= string to include your new identifier. Your identifier should be one of: 1. 1b4b9485 2. 11ab9485 3. 11032720 Try them, one at a time (separate trials), in the order of preference above. The first one that results in you "seeing" your drives will (probably) work best [in the unRAID scenario]. Good luck.
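     I haven't reproduced elkay14's script here, but assuming PCI_IDS= is just a quoted, space-separated list of vendor+device IDs (check the script itself for the exact format), each trial is a one-token edit along these lines, followed by re-running the script:
        PCI_IDS="...whatever is already there... 1b4b9485"    # first trial; swap in 11ab9485, then 11032720, for the later trials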
  15. I'm curious about these new Toshachi 3TB drives. For those who have them, could you run the following command on each of them and report the results (nnn MB/sec): hdparm -t /dev/sdX (replace sdX as appropriate) And, for luca, could you do the same command on your Seagate 3TB (ST3000DM001) and the (older) Hitachi 2TB (HDS722020ALA330). Thanks.
  16. I tore the tag off my mattress, and never got arrested. ["Are we learning yet?"] --UhClem "Life is such monotony ... without a good lobotomy."
  17. It should be very simple. You will need elkay14's enable_ahci script, which you will update by adding 1b4b9230 to the PCI_IDS= string definition. (Look for a post from elkay14; the link is in the .sig) More important is that Syba says it's supported. In the product manual, page1, FEATURES_AND_SPEC=>General=>LastItem. [if that turns out untrue, it becomes your basis for a no-cost/-hassle return.] Do note that, while the card requires a x4 (or x8 or x16) pcie slot, it is only a 2-lane (ie "x2") implementation; at (pcie) gen2 that will still get you total sustained throughput of ~750-800 MB/sec--more than enough for 4 of the fastest existing (mech.) drives. But, you must have gen2 pcie-- else, only ~350-400. This card looks like excellent value, especially so if you like to mess with port-multiplier enclosures. The card supports FBS, and can be configured for 1 or 2 eSATA ports (replacing 1 or 2 SATAs). Unfortunately, (for me) it does not include a low-profile bracket, nor can I tell from pictures whether the card itself is "narrow" enough to allow fitting/fashioning an LP [for a H-P N40L]. brandon, if you get one, pls post on this thread. I'll have questions . tnx --UhClem "Doing Base-8 arithmetic is easy ... if you're missing two fingers."
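     For anyone wondering where those throughput figures come from, the rough arithmetic (PCIe gen2 = 5 GT/s per lane with 8b/10b encoding, so ~500 MB/s of usable payload per lane; gen1 is half that):
        echo $(( 2 * 500 ))   # 1000 MB/s raw for x2 at gen2; ~750-800 MB/s after protocol overhead
        echo $(( 2 * 250 ))   # 500 MB/s raw for x2 at gen1; hence the ~350-400 figure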
  18. Unfortunately there is no such tool. "Unfortunately" is an understatement. In my opinion, this is a flagrant omission in a product such as unRAID. It might be that Reiser FS lacks such a facility; if that is the case, then that, alone, would make it a bad choice to be limited to. (Or, if there [really] were sufficient advantages to Reiser FS, otherwise, then unRAID should have implemented the missing feature itself.) Even the very first release of Unix (1973) had a (admittedly obtuse) mechanism for this (using ncheck & icheck). I know that BSD/SunOS provided a mechanism. I'm not familiar enough with ext[234], but I'd be very surprised if they lacked a (straightforward) mechanism. (there is a bass-ackwards hack that starts with find, but ... ) OP: even if you could determine those (up to 12) files [on some drives, the sector in question might be unused, or part of FS metadata], you could still be stuck figuring out which one was actually damaged. This is a situation where hashing/checksumming comes into play. Some people use ZFS; others SnapRAID; others roll their own. Also, to OP: just because a possibly suspect file "works" does not exonerate it from being damaged (hence, mistakenly pointing the blame on the parity drive). A single bit-flip (in a single file) will correctly provoke a parity check error report, but that single bit may be totally extraneous to the correct functioning of the file containing it. 'Tis a tangled web we (must un-)weave ...
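     For what it's worth, ext[234] does have such a mechanism, via debugfs (run it against the partition, not the whole disk, and first convert the reported absolute sector into a filesystem-relative block number: subtract the partition's start sector, then divide by the filesystem block size in sectors). The block and inode numbers below are placeholders:
        debugfs -R "icheck 123456789" /dev/sdX1    # which inode owns fs-block 123456789
        debugfs -R "ncheck 3141592" /dev/sdX1      # which pathname(s) that inode maps to
     Whether Reiser FS offers anything comparable, I can't say.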
  19. One downside of this motherboard is that the PCI-E implementation is only v1; this means that the bandwidth for a PCI-E v2 disk controller card is cut in half.
  20. Why spend an hour+ test-writing 500+ GB? Try .... bs=64k count=8k oflag=direct Oops... I did not think to reduce the "count" when I increased the block size... Guess it would take a while... But mine is no prize either ... the oflag=direct messes up. It was an attempt to eliminate the system-buffer (write-behind) effect. (I never do a write test when a read test [on the appropriate /dev/xxx, or /mnt/xxx] would suffice. Does that fit here?) If a write-test is called for, change the above to ... bs=64k count=8k conv=fdatasync (Have your cake; and eat it -- short and accurate) Aside: Maybe one of you youngsters can enlighten me (I was using Unix before there was a dd command) -- but since when does dd use Marketing Megabytes in its summary line? (And with no way to request otherwise!) This is sacrilege. Maybe it is a "new feature" of Linux's dd.
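     Spelled out in full, with a made-up target path (point it at whichever disk/share you're testing, to a file you don't mind creating and deleting):
        dd if=/dev/zero of=/mnt/disk1/ddtest.bin bs=64k count=8k conv=fdatasync   # ~512 MB written; timing includes the final flush to disk
        rm /mnt/disk1/ddtest.bin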
  21. Why spend an hour+ test-writing 500+ GB ? Try .... bs=64k count=8k oflag=direct
  22. You're missing a very important distinction here. What is key is not reading it again, but reading it (the newly-written sector) the first time. In other words: "never being good in the first place" is quite different from "going bad (over time)". Also, you are putting too much (blind) faith in the (ATA) write command returning OK. In practice, the only time a write command will return ERR is if the spare sectors are fully depleted (or if faulty memory or controller has sent a mis-constructed ATA write command [ie, LBA out-of-range]). (In contrast, a read returning OK is reliable, because of the ECC, etc.) Exactly!!! And one shouldn't proofread their entire thesis every time they revise only a single paragraph. But they better proofread the changed paragraph (lest they embarrass themselves to their advisor(s)). --UhClem "We're all bozos on this bus."
  23. Incorrect. Both parity and data blocks are read before being written in order to update parity. "You are correct, sir." Thank you. But, remember, this only partially eliminates susceptibility to (lateral) misalignment; and there are numerous additional ways for a write to go wrong. Trust, but verify. Right? And the sooner you verify, the narrower the window of vulnerability on the worthiness of your trust. (Unless they are used/read during day-to-day operation) All of these normal data block writes will never have had their readability confirmed, not to mention the correctness of their contents verified, until a (full) parity check is performed. At that time, I agree, either of those flaws will be detected and corrected. But, how often does a typical user perform a parity check? Yeah, there might be one or two fanatics that do it nightly, but I would guess typical is monthly, with a sizable minority doing it weekly. This is the window of vulnerability. If you have a drive failure AND you have any of those untested/unverified newly-written data blocks (on a non-failed drive) turn out to be bad, your replacement/rebuild of the failed drive will be less than perfect. Since it is normal, and recommended, to do a (full) parity check directly following a (full) build/rebuild, why would it not be prudent to do a (partial/targeted) parity check directly following a cache-to-array "mover session" (ie, a partial rebuild)? Granted, a proper partial parity check will require driver mods (but they're not difficult--couple of new [fs/io]ctl's). But, just reading those newly-written blocks will get 90+% of the effect, which is to eliminate that (broad) window of vulnerability. Just trying to help you keep your data safe® ...
  24. That's an important point. But not just for new disks. It also applies at the block level, ... Well, once parity is established a "read" is performed immediately prior to a write of a sector in order to calculate the needed change to parity. Therefore, there is a better chance the sectors involved are readable sectors. Yes, I am aware of all of that. I should have communicated the crux of my point more precisely. In the context of this discussion, there are two notions of a sector. One is the location (LBA / "lot # on a tax map"). The other is the current contents (raw bit-stream (PRML-encoded) + ECC / actual house currently occupying the lot (construction/build-quality, etc)). Iow, even though the lot is level, solid, well-drained etc., the builder might be sub-par [even, if only on the day he built this house]. As you pointed out, unRAID is less susceptible in one regard. Because it reads the parity block (just) prior to re-writing it, the heads are already on track, so alignment issues are reduced. But there is still the possibility of a miscue/botch in the write process. ==> It is for this reason that read-after-write gets so much attention when (super-)high-data-integrity is discussed. Back to unRAID ... while the parity block re-write has the "benefit" of a reduced likelihood of mis-alignment, that is not the case for the newly-written data block. A targeted parity check will give added assurance for both. Actually, just a read of both sectors might suffice, if a re-construct were implicit upon either failing. Murphy-alert!: what about read-after-write following a re-construct? So, what I am suggesting, is a batched post-facto selective read-after-write. Either of us could write it in a few hours for unRAID (but only one of us might be motivated ). I don't use unRAID. I prefer SnapRAID for my needs; I've added this feature for my use (peace of mind) and suggested it to SnapRAID's author [link].
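     For what it's worth, the data-block half of that idea needs no driver mods at all. A rough sketch (just forcing the freshly-moved file data to be re-read from the platters; it says nothing about the parity drive, and the /mnt/diskN paths and the timestamp file are the obvious placeholders):
        touch /tmp/mover_start                          # do this just before the mover session
        # ... mover runs ...
        sync; echo 1 > /proc/sys/vm/drop_caches         # drop the page cache so the re-read really hits the disks
        find /mnt/disk[0-9]* -type f -newer /tmp/mover_start -exec cat {} + > /dev/null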
  25. That's an important point. But not just for new disks. It also applies at the block level, for any write to the array. I believe there is merit in being able to perform a selective/targeted parity check. That is, parity checking only those sectors/blocks/stripes "recently" written. I'll leave it to you to define "recently".