Posts posted by eweitzman

  1. Problem is, the "md" driver still needs to go through the reiserfs and disk's own driver.

     

    That really changes things... My ignorance of the big picture shows.

     

    One could hope that if md batched small, apparently sequential requests and asked for one large transfer down the chain, it would be more efficient. That would need to be tested before spending time on serious coding.
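
    To make the idea concrete, here's a minimal user-space sketch of that kind of coalescing. The io_req struct and coalesce() function are invented for illustration; this is not how md actually represents or merges requests.

        /* Hypothetical sketch of request coalescing -- not md driver code.
         * A request is just a starting sector plus a length in sectors. */
        #include <stdio.h>
        #include <stdlib.h>

        struct io_req {
            unsigned long sector;   /* starting sector */
            unsigned long nr;       /* length in sectors */
        };

        static int cmp_sector(const void *a, const void *b)
        {
            const struct io_req *x = a, *y = b;
            return (x->sector > y->sector) - (x->sector < y->sector);
        }

        /* Sort pending requests by sector, then merge runs that are contiguous
         * on disk into single, larger transfers.  Returns the new count. */
        static int coalesce(struct io_req *reqs, int n)
        {
            int out = 0;
            qsort(reqs, n, sizeof(*reqs), cmp_sector);
            for (int i = 0; i < n; i++) {
                if (out > 0 &&
                    reqs[out - 1].sector + reqs[out - 1].nr == reqs[i].sector)
                    reqs[out - 1].nr += reqs[i].nr;   /* extend previous run */
                else
                    reqs[out++] = reqs[i];            /* start a new run */
            }
            return out;
        }

        int main(void)
        {
            struct io_req reqs[] = { {8, 8}, {0, 8}, {16, 8}, {64, 8} };
            int n = coalesce(reqs, 4);
            for (int i = 0; i < n; i++)   /* one 24-sector run, one 8-sector run */
                printf("transfer at sector %lu, %lu sectors\n",
                       reqs[i].sector, reqs[i].nr);
            return 0;
        }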

     

    - Eric

  2. I'm migrating data to my new array following this procedure:

     

    - preclearing disks

    - inserting them into the array one at a time

    - loading data onto each disk

     

    There is no parity drive in the array yet, which reduces loading time. I'll pay the cost of building parity once, after all the disks are loaded.

     

    I had two disks with data on them in the array and added a third. After the array was stopped, the precleared disk added via the Devices page, and the Start button pressed on the Main page, the "Free" column for every drive in the array said "Formatting". This scared the heck out of me, since two of the drives were loaded with data and didn't need formatting.

     

    Fortunately, this is a bug in the web page and not the system. After pressing the Refresh button, the "Free" column showed correct numeric values for the two loaded disks, while the third disk stayed in the "Formatting" state for a few minutes.

     

    - Eric

     

    Note to admin: This happened under version 4.5 Plus. I didn't find a forum for 4.5 issues so I used this one. Please move this thread if there's a better place for it.

     

  3. ... if the requested stripes have some sort of addressing that can be mapped to drive geometry...

     

    With modern drives, you can't GET physical drive geometry.  It is translated by the drive.

     

    Okay, that was sloppy. Let me rephrase it:

     

    ...if the requested stripes have some sort of addressing that can be used to order and group them so they can be retrieved sequentially in batches...

     

    A cursory glance shows a large buffer (unsigned int memory) allocated for an array of MD_NUM_STRIPE stripe_head structs. stripe_head has sector and state members. state could be used to prepare a list of stripes waiting to be read or written (i.e., that are in the same state), while sector could be used to order them into batches that can be read or written sequentially.
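
    Roughly what I have in mind, as a toy user-space sketch. The struct below only borrows the sector and state members mentioned above; the real stripe_head has many more fields, and the state values and the build_batch() helper are invented.

        /* Toy illustration only -- not the md driver's real data structures. */
        #include <stdlib.h>

        #define MD_NUM_STRIPE 1200          /* rough figure seen in the source */

        struct toy_stripe_head {
            unsigned long sector;           /* starting sector of the stripe */
            int state;                      /* e.g. waiting-to-read, waiting-to-write */
        };

        static int by_sector(const void *a, const void *b)
        {
            const struct toy_stripe_head *x = a, *y = b;
            return (x->sector > y->sector) - (x->sector < y->sector);
        }

        /* Copy every stripe that is in 'wanted_state' into 'batch', then sort
         * the batch by sector so the I/O can be issued in disk order. */
        static int build_batch(const struct toy_stripe_head *pool, int n,
                               int wanted_state, struct toy_stripe_head *batch)
        {
            int count = 0;
            for (int i = 0; i < n; i++)
                if (pool[i].state == wanted_state)
                    batch[count++] = pool[i];
            qsort(batch, count, sizeof(*batch), by_sector);
            return count;
        }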

     

    At this point, of course, I don't understand the pattern of calls to the driver or how such batching would be set up. :( One strategy could be to wait a bit after the first IO request -- perhaps the approximate time of one drive rotation? -- and then process the first and subsequent calls as a single read/write request if the sectors are numbered sequentially. Overhead and delay like this may be verboten in a driver, though. I dunno. :(

     

    - Eric

     

  4. bubbaQ,

     

    Browsing through the driver code, I see it can hold on to ~1200 "stripes" of data. This term must be a legacy from the days when this code was really a RAID driver, right?

     

    Anyways, if the driver is aware of 1200 simultaneous IO requests, perhaps some of them can be grouped, reordered and processed so that a long series of data reads on adjacent tracks runs in parallel with a similar series of parity drive reads. That is, if the requested stripes have some sort of addressing that can be mapped to drive geometry, there is the possibility of disk-sequential, deterministic reading instead of "non-determinative optimistic read-ahead." After those reads complete -- with minimal seeks and no wasted rotations -- parity can be computed, and then both drives can be written in the same order as the reads.

     

    ------

     

    Can anyone recommend a pithy summary/overview of Linux driver architecture and programming? I'll look at this reordering and batching idea in more detail once I understand the overall picture better.

     

    - Eric

  5. IIRC, the code is multithreaded and blocks can be processed out of order -- and as you noted, the drive cache will reorder disk I/O to be more efficient ... but the read of a particular sector, must take place before the write to that sector and the read of both the data disk and the parity disk must take place before the write to parity.  So there is a bundle of 4 operations done by the driver, that are driven by a single write by the OS.

    I see. I've been looking at this as if the unRAID code were higher up, i.e., not a driver, and had knowledge of what needed to be written beyond a single block or atomic disk operation. If each call to the driver by the OS has no knowledge of previous and forthcoming calls (that is, of the data to read/write), then it would be very gnarly to have the driver coordinate with other invocations of itself.

     

    From what I've gleaned since last night, there are three main parts to unRAID:

     

    md driver - unRAID-modified kernel disk driver

    shfs - shared file system (user shares?) built on FUSE

    emhttp - management utility

     

    Any others?

     

    - Eric

  6. First, an introduction. I've just started using unRAID Plus (not Pro) and I like it a lot. It has the right trade-offs for me, and is replacing a slow, dedicated RAID NAS box and some unprotected drives in a PC.

     

    I'm a developer. I worked on various unixes (and other OSes) from the mid 80s to the mid 90s. Getting ps -aux, ls -lRt, top, and even vi back into L2 has been a trip. (vi doubly so for an emacs guy.) I dug up a 1989 spiral-bound O'Reilly Nutshell book on BSD 4.3 that I bought back in the day. I'm not a kernel programmer, driver programmer, or hardware guy, so the following thoughts may be naive. Clue me in if you can.

     

    I've read that unRAID has to do two disk reads and two disk writes for each data chunk when writing a file. See http://lime-technology.com/forum/index.php?topic=4390.60 for Tom's description. That description, and Joe L.'s and bubbaQ's posts, make it sound like these operations must be done sequentially, with seeks between each op, waiting for the start sector to spin back around to the heads, and so on. All this waiting seems unnecessary to me, except for files that only occupy part of a track, and you'll never get high throughput with those anyways because of all the directory activity. With large files, the parts don't have to be read, written, or processed sequentially along the length of the file.
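
    For reference, my understanding of why it's two reads and two writes per chunk (this is the generic read-modify-write parity update; I haven't verified it against unRAID's actual code): when one data block changes, the new parity can be computed from the old data, the new data, and the old parity alone, without reading any other disk. Something like:

        /* Generic read-modify-write parity update, shown for one block:
         *   1. read D_old from the data disk and P_old from the parity disk
         *   2. compute P_new = P_old ^ D_old ^ D_new
         *   3. write D_new to the data disk and P_new to the parity disk
         * Two reads plus two writes, and no other data disk is touched. */
        #include <stddef.h>

        static void update_parity(unsigned char *parity,          /* P_old in, P_new out */
                                  const unsigned char *old_data,  /* D_old, just read */
                                  const unsigned char *new_data,  /* D_new, being written */
                                  size_t len)
        {
            for (size_t i = 0; i < len; i++)
                parity[i] ^= old_data[i] ^ new_data[i];
        }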

     

    Let me illustrate. Multithreaded code could issue 20 synchronous reads, each from a separate thread, at different locations in a file. The drive will optimize how it retrieves that data. When the IO requests complete in somewhat random order and the threads unblock, each can work with its chunk of data to compute or update parity. After this, the write commands can be issued in any order and, again, the drive will reorder them to write with the fewest seeks and the least rotational latency. It seems to me that out-of-order processing in the code, coupled with smart IO request reordering in the drive firmware, could keep the heads moving smoothly through a file for relatively long stretches. Of course, there will be limits imposed by memory and by interleaved block operations in the code, but if 20 tracks can be processed at a time this way, with only one or two seeks instead of 20, it's a big win.
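
    Here is a rough user-space sketch of that pattern using POSIX threads and pread(). The file path, thread count, chunk size and offsets are made up, and real code would handle short reads and pick offsets from the file's actual layout; the point is just many blocking reads in flight at once, with each thread free to process its own chunk when its read returns.

        /* Sketch: 20 synchronous reads issued from 20 threads, letting the
         * drive schedule them.  Compile with: cc -pthread sketch.c */
        #include <fcntl.h>
        #include <pthread.h>
        #include <stdio.h>
        #include <sys/types.h>
        #include <unistd.h>

        #define NTHREADS   20
        #define CHUNK_SIZE (1 << 20)        /* 1 MiB per chunk (arbitrary) */

        struct job {
            int fd;
            off_t offset;
            unsigned char buf[CHUNK_SIZE];
        };

        static void *read_chunk(void *arg)
        {
            struct job *j = arg;
            /* Blocks until this chunk arrives; completion order is up to the drive. */
            if (pread(j->fd, j->buf, CHUNK_SIZE, j->offset) < 0)
                perror("pread");
            /* ... compute or update parity for this chunk here ... */
            return NULL;
        }

        int main(void)
        {
            static struct job jobs[NTHREADS];   /* static: buffers too big for the stack */
            pthread_t tid[NTHREADS];
            int fd = open("/mnt/disk1/bigfile", O_RDONLY);   /* hypothetical path */

            if (fd < 0) { perror("open"); return 1; }
            for (int i = 0; i < NTHREADS; i++) {
                jobs[i].fd = fd;
                jobs[i].offset = (off_t)i * CHUNK_SIZE;
                pthread_create(&tid[i], NULL, read_chunk, &jobs[i]);
            }
            for (int i = 0; i < NTHREADS; i++)
                pthread_join(tid[i], NULL);
            close(fd);
            return 0;
        }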

     

    I'm sure this has been investigated, or that there are underlying reasons in unRAID's architecture that make it unfeasible. Anybody know the reasons?

     

    Also, I'm very interested in reading an overview of unRAID's architecture: custom daemons, drivers, executables, and so on. Any pointers would be appreciated.

     

    Thanks,

    - Eric