Posts posted by jbuszkie

  1. Here is an updated version that conforms more closely to the Berkeley standard for mail.  It will take either the old parameters or the standard ones.

     

    So these sample usages:

      cat filename.txt | mail

      echo 'this is a test' | mail

      echo 'this is a test' | mail --subject 'ATTENTION'

      echo 'this is a test' | mail --subject 'ATTENTION' --rcpt [email protected]

     

    All still work or...

     

      cat filename.txt | mail

      echo 'this is a test' | mail

      echo 'this is a test' | mail -s 'ATTENTION'

      echo 'this is a test' | mail -s 'ATTENTION' [email protected]

     

    Will also work.

     

    This still requires unraid_notify to be installed.

    The program is attached; just remove the .txt extension.  The whole package is attached too.
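
    For anyone curious how one script can take both option styles, here is a minimal sketch.  It is not the attached program: the function name parse_mail_args and the default subject and recipient are assumptions made up for illustration.

    ```shell
    # Hypothetical sketch, not the attached script.
    # Accepts the old long options (--subject, --rcpt) and the
    # Berkeley-style form (-s SUBJECT ADDRESS).
    parse_mail_args() {
        subject="unRAID notification"   # assumed default
        rcpt="root"                     # assumed default
        while [ $# -gt 0 ]; do
            case "$1" in
                -s|--subject|--rcpt)
                    # Every option here needs a value after it.
                    [ $# -ge 2 ] || { echo "missing value for $1" >&2; return 1; }
                    if [ "$1" = "--rcpt" ]; then rcpt="$2"; else subject="$2"; fi
                    shift 2
                    ;;
                -*)
                    echo "unknown option: $1" >&2
                    return 1
                    ;;
                *)
                    rcpt="$1"           # bare address, Berkeley style
                    shift
                    ;;
            esac
        done
    }
    ```

    After a call such as parse_mail_args -s 'ATTENTION' someone@example.com, the variables subject and rcpt hold the parsed values and the message body can be read from stdin as before.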

     

     

     

  2. I'm looking to replace my laptop drive with a bigger one, and I'm curious whether laptop drives have regular SATA power connectors.  I want to run Joe's preclear script on it to put it through its paces, but I don't know if I'll need any special adapters.  I know the older PATA ones had a different connector, and I had an adapter for that.  But I haven't pulled the old one out of my laptop to check yet.

     

    My guess is they have regular connectors.

     

    That will also help me transfer the data to the new drive.

     

     

    Jim

  3. Whew!  56 hours of testing later, my replacement 1.5T disk passed 3 cycles!

     

    Thanks again for a great script!  It helped catch the first disk being bad!!

     

     

    ===========================================================================
    =                unRAID server Pre-Clear disk /dev/sdd
    =                       cycle 3 of 3
    = Disk Pre-Clear-Read completed                                 DONE
    = Step 1 of 10 - Copying zeros to first 2048k bytes             DONE
    = Step 2 of 10 - Copying zeros to remainder of disk to clear it DONE
    = Step 3 of 10 - Disk is now cleared from MBR onward.           DONE
    = Step 4 of 10 - Clearing MBR bytes for partition 2,3 & 4       DONE
    = Step 5 of 10 - Clearing MBR code area                         DONE
    = Step 6 of 10 - Setting MBR signature bytes                    DONE
    = Step 7 of 10 - Setting partition 1 to precleared state        DONE
    = Step 8 of 10 - Notifying kernel we changed the partitioning   DONE
    = Step 9 of 10 - Creating the /dev/disk/by* entries             DONE
    = Step 10 of 10 - Testing if the clear has been successful.     DONE
    = Disk Post-Clear-Read completed                                DONE
    Elapsed Time:  56:18:58
    ============================================================================
    ==
    == Disk /dev/sdd has been successfully precleared
    ==
    ============================================================================
    S.M.A.R.T. error count differences detected after pre-clear
    note, some 'raw' values may change, but not be an indication of a problem
    19,20c19,20
    < Offline data collection status:  (0x80)       Offline data collection activity
    <                                       was never started.
    ---
    > Offline data collection status:  (0x84)       Offline data collection activity
    >                                       was suspended by an interrupting command from host.
    54c54
    <   1 Raw_Read_Error_Rate     0x002f   100   253   051    Pre-fail  Always       -       0
    ---
    >   1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
    ============================================================================
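
    The report at the end of that output is in ordinary diff format (the "19,20c19,20" and "54c54" lines).  A similar comparison can be done by hand; the sketch below is only that, with a made-up function name (smart_changes) and made-up file names, and it is not taken from the preclear script.

    ```shell
    # Hypothetical helper: show what changed between two saved SMART reports.
    # Usage idea (file names are assumptions):
    #   smartctl -a -d ata /dev/sdd > smart_before.txt
    #   ... run the preclear ...
    #   smartctl -a -d ata /dev/sdd > smart_after.txt
    #   smart_changes smart_before.txt smart_after.txt
    smart_changes() {
        before="$1"
        after="$2"
        # diff prints the changed lines itself and exits 0 only when
        # the two files are identical.
        if diff "$before" "$after"; then
            echo "no SMART differences"
        fi
    }
    ```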

  4. Remember, all the answers to any programming questions are available at google.com  ;D

    So you are saying you are also known as google.com? :D

    I have no interest in adding any e-mail alert at this time myself.  I can use the progress rate it shows and take a pretty good guess how long it will take before it will end.  A simple exercise on your calculator should let you do the same.

    Awww..  come on!  Where's your thirst for adventure!  Think of it as a challenge!  I dare you to try! :)  just kidding!  Yeah..  Adding something to display_progress() function might be easy enough..  I'll tinker once I have some time..  but my guess is that I'll be done with the parity upgrade and forget about this until the next time!  ::)

     

    Oh..  and it is well commented..   good job! 8)
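
    The calculator exercise mentioned above can be sketched in a few lines of shell, assuming you read the percent complete and the transfer rate in MB/s off the screen.  The function name eta_minutes and the sample rate are made up for illustration.

    ```shell
    # Hypothetical helper: estimate minutes remaining in the current pass.
    # Arguments: disk size in MB, percent complete, current rate in MB/s.
    # Integer arithmetic only, so the result is rounded down.
    eta_minutes() {
        size_mb="$1"; pct_done="$2"; rate="$3"
        remaining_mb=$(( size_mb * (100 - pct_done) / 100 ))
        echo $(( remaining_mb / rate / 60 ))
    }

    # Example with assumed numbers: a 1500301 MB disk, 40% done, at 75 MB/s.
    # eta_minutes 1500301 40 75    # prints 200
    ```

    The rate usually changes as the heads move across the disk, so treat the result as a rough guess.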

  5. To add e-mail notifications is easy... step-by-step instructions follow  ;):

    1.  Perform a functional decomposition of the source code of the preclear_disk.sh script.

    2.  Identify the temporal positions where an e-mail notification would be desired.

    3.  Design, develop, and code an appropriate e-mail function that will allow you to capture the screen output at the appropriate intervals... (This is the tricky part)

    4.  Add the e-mail function at the desired notification points, sending it the desired content to be mailed.

    5.  Install an e-mail program on your unRAID server.  (As delivered, unRAID has no e-mail capability)

    6.  Configure the e-mail program to interface with your service provider and test it is able to send status e-mail.

    7.  Run a series of tests, to ensure the e-mail is being routed correctly with appropriate subject lines to identify itself in your mailbox.

    8.  Clear your disk.

     

    Joe L.

    Nice! :D 

    Let me rephrase.. 

    How hard would it be for you to modify this script to e-mail updates as the test is progressing? :-)

     

    I've already got unraid notify working..  Maybe I will dig into that..  someday..  For now I'm having to connect back into the session every couple of hours.. :-)

    hmm..  redirect the output to tee and send the contents to bashmail..  hhmm...  all with variables to disable it...

     

    Jim
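
    A rough sketch of the tee idea, with a variable to switch it off.  This is not part of preclear_disk.sh; the names NOTIFY_ENABLED, NOTIFY_RCPT, notify and run_and_notify are all made up, and it assumes a mail command on the PATH that takes -s SUBJECT ADDRESS and reads the body from stdin.

    ```shell
    # Hypothetical sketch, not part of preclear_disk.sh.
    NOTIFY_ENABLED="${NOTIFY_ENABLED:-yes}"   # set to "no" to disable e-mail
    NOTIFY_RCPT="${NOTIFY_RCPT:-root}"        # assumed recipient

    # Mail the contents of a file with the given subject, unless disabled.
    notify() {
        subject="$1"
        file="$2"
        [ "$NOTIFY_ENABLED" = "yes" ] || return 0
        mail -s "$subject" "$NOTIFY_RCPT" < "$file"
    }

    # Run a command, show its output on screen, keep a copy via tee,
    # then mail the copy.
    run_and_notify() {
        subject="$1"; shift
        log=$(mktemp)
        "$@" 2>&1 | tee "$log"
        notify "$subject" "$log"
        rm -f "$log"
    }
    ```

    Something like run_and_notify "preclear step done" some_command would then print the output as usual and mail a copy when the command finishes.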

     

  6. Dumb question here..  I got my replacement drive and it's on its second pass. (18 hours for the first pass!)

     

    Since this is my new parity drive, do I really gain anything by running this script? (other than testing the drive)  It's going to have to write all the parity anyway when I assign it, correct?

     

    oh..  And how hard would it be to modify this script to e-mail me updates as the test is progressing?

     

    Jim

     

     

  7. Here is my hdparm info and SMART test info.  Interestingly, after the power cycle my disk 2 was "missing".  I power cycled again and it came back?

     

    Now I just have to see if disk 2 is on the same controller as my new disk..

     

    /dev/sdd:
    
    ATA device, with non-removable media
            Model Number:       WDC WD15EADS-00H7B0
            Serial Number:      WD-WCAUP0018631
            Firmware Revision:  05.00K05
            Transport:          Serial, SATA 1.0a, SATA II Extensions, SATA Rev 2.5
    Standards:
            Supported: 8 7 6 5
            Likely used: 8
    Configuration:
            Logical         max     current
            cylinders       16383   16383
            heads           16      16
            sectors/track   63      63
            --
            CHS current addressable sectors:   16514064
            LBA    user addressable sectors:  268435455
            LBA48  user addressable sectors: 2930277168
            device size with M = 1024*1024:     1430799 MBytes
            device size with M = 1000*1000:     1500301 MBytes (1500 GB)
    Capabilities:
            LBA, IORDY(can be disabled)
            Queue depth: 32
            Standby timer values: spec'd by Standard, with device specific minimum
            R/W multiple sector transfer: Max = 16  Current = 16
            Recommended acoustic management value: 128, current value: 254
            DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 *udma5 udma6
                 Cycle time: min=120ns recommended=120ns
            PIO: pio0 pio1 pio2 pio3 pio4
                 Cycle time: no flow control=120ns  IORDY flow control=120ns
    Commands/features:
            Enabled Supported:
               *    SMART feature set
                    Security Mode feature set
               *    Power Management feature set
               *    Write cache
               *    Look-ahead
               *    Host Protected Area feature set
               *    WRITE_BUFFER command
               *    READ_BUFFER command
               *    NOP cmd
               *    DOWNLOAD_MICROCODE
                    Power-Up In Standby feature set
               *    SET_FEATURES required to spinup after power up
                    SET_MAX security extension
                    Automatic Acoustic Management feature set
               *    48-bit Address feature set
               *    Device Configuration Overlay feature set
               *    Mandatory FLUSH_CACHE
               *    FLUSH_CACHE_EXT
               *    SMART error logging
               *    SMART self-test
               *    General Purpose Logging feature set
               *    64-bit World wide name
               *    {READ,WRITE}_DMA_EXT_GPL commands
               *    Segmented DOWNLOAD_MICROCODE
               *    SATA-I signaling speed (1.5Gb/s)
               *    SATA-II signaling speed (3.0Gb/s)
               *    Native Command Queueing (NCQ)
               *    Host-initiated interface power management
               *    Phy event counters
                    DMA Setup Auto-Activate optimization
               *    Software settings preservation
               *    SMART Command Transport (SCT) feature set
               *    SCT Long Sector Access (AC1)
               *    SCT LBA Segment Access (AC2)
               *    SCT Error Recovery Control (AC3)
               *    SCT Features Control (AC4)
               *    SCT Data Tables (AC5)
                    unknown 206[12] (vendor specific)
                    unknown 206[13] (vendor specific)
    Security:
            Master password revision code = 65534
                    supported
            not     enabled
            not     locked
            not     frozen
            not     expired: security count
                    supported: enhanced erase
            412min for SECURITY ERASE UNIT. 412min for ENHANCED SECURITY ERASE UNIT.
    Logical Unit WWN Device Identifier: 50014ee2ad0035dd
            NAA             : 5
            IEEE OUI        : 14ee
            Unique ID       : 2ad0035dd
    Checksum: correct
    root@Tower:~#
    root@Tower:~#
    root@Tower:~# smartctl -a -d ata /dev/sdd
    smartctl version 5.38 [i486-slackware-linux-gnu] Copyright (C) 2002-8 Bruce Allen
    Home page is http://smartmontools.sourceforge.net/
    
    === START OF INFORMATION SECTION ===
    Device Model:     WDC WD15EADS-00H7B0
    Serial Number:    WD-WCAUP0018631
    Firmware Version: 05.00K05
    User Capacity:    1,500,301,910,016 bytes
    Device is:        Not in smartctl database [for details use: -P showall]
    ATA Version is:   8
    ATA Standard is:  Exact ATA specification draft version not indicated
    Local Time is:    Thu May 14 13:33:22 2009 GMT+5
    SMART support is: Available - device has SMART capability.
    SMART support is: Enabled
    
    === START OF READ SMART DATA SECTION ===
    SMART overall-health self-assessment test result: PASSED
    
    General SMART Values:
    Offline data collection status:  (0x84) Offline data collection activity
                                            was suspended by an interrupting command from host.
                                            Auto Offline Data Collection: Enabled.
    Self-test execution status:      (   0) The previous self-test routine completed
                                            without error or no self-test has ever
                                            been run.
    Total time to complete Offline
    data collection:                 (40500) seconds.
    Offline data collection
    capabilities:                    (0x7b) SMART execute Offline immediate.
                                            Auto Offline data collection on/off support.
                                            Suspend Offline collection upon new
                                            command.
                                            Offline surface scan supported.
                                            Self-test supported.
                                            Conveyance Self-test supported.
                                            Selective Self-test supported.
    SMART capabilities:            (0x0003) Saves SMART data before entering
                                            power-saving mode.
                                            Supports SMART auto save timer.
    Error logging capability:        (0x01) Error logging supported.
                                            General Purpose Logging supported.
    Short self-test routine
    recommended polling time:        (   2) minutes.
    Extended self-test routine
    recommended polling time:        ( 255) minutes.
    Conveyance self-test routine
    recommended polling time:        (   5) minutes.
    SCT capabilities:              (0x303f) SCT Status supported.
                                            SCT Feature Control supported.
                                            SCT Data Table supported.
    
    SMART Attributes Data Structure revision number: 16
    Vendor Specific SMART Attributes with Thresholds:
    ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
      1 Raw_Read_Error_Rate     0x002f   139   139   051    Pre-fail  Always       -       14844
      3 Spin_Up_Time            0x0027   100   253   021    Pre-fail  Always       -       0
      4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       9
      5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
      7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
      9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       0
     10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
     11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
     12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       7
    192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       1
    193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       9
    194 Temperature_Celsius     0x0022   127   121   000    Old_age   Always       -       25
    196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
    197 Current_Pending_Sector  0x0032   195   195   000    Old_age   Always       -       1311
    198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
    199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
    200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0
    
    SMART Error Log Version: 1
    No Errors Logged
    
    SMART Self-test log structure revision number 1
    No self-tests have been logged.  [To run self-tests, use: smartctl -t]
    
    
    SMART Selective self-test log data structure revision number 1
    SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
        1        0        0  Not_testing
        2        0        0  Not_testing
        3        0        0  Not_testing
        4        0        0  Not_testing
        5        0        0  Not_testing
    Selective self-test flags (0x0):
      After scanning selected spans, do NOT read-scan remainder of disk.
    If Selective self-test is pending on power-up, resume after 0 minute delay.
    
    

    Now, do I have to be concerned about the "Current_Pending_Sector" number?  It seems like that should be 0 for a new, good drive.

     

    Could a bad controller have any effect on that number?
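
    To keep an eye on that one number between runs, something like the sketch below could pull the RAW_VALUE column out of the attribute table.  The column positions are taken from the output above; the function name smart_raw is made up.

    ```shell
    # Hypothetical helper: print the RAW_VALUE of one SMART attribute.
    # Reads an attribute table like the one above on stdin; the ID is in
    # column 1 and RAW_VALUE is in column 10.
    smart_raw() {
        awk -v id="$1" '$1 == id { print $10 }'
    }

    # Example on a live system (device name as in this post):
    #   smartctl -A -d ata /dev/sdd | smart_raw 197
    ```

    Fed the table above, smart_raw 197 would print 1311 and smart_raw 5 would print 0.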

  8. Assuming that /dev/sdd is your new disk... it looks like it stopped responding.  Might be a loose cable, either power or data...  It is very easy for some sata cables to come loose.

    If not loose, then odds are the disk died an early death.

     

    Can you do a

    hdparm -I /dev/sdd

    or

    smartctl -a -d ata /dev/sdd

    and get anything back at all?

     

    If the disk did die an early death... sorry, but the script did exactly as designed... it helped identify an early failure.

    Be happy it failed before you added it to your array... It takes a lot more time to replace it once it has data on it.

    After the reboot it seemed to be running happily; I was at 98% of the pre-read...  OK, change that.  I guess it isn't happy: I'm getting more of those errors on the zeroing.

    Here is a snippet of the log.  The snippet starts close to the end of the pre-read and captures the start of the zeroing.  I'm in the middle of a power cycle (remotely, so I may not get it back).  I'll have to look to see if the very first pass of this test behaved well.  I'll post the SMART results when the computer reboots.

     

     

     

     

  9. I am running this on my new 1.5TB Maxtor Green.  It did one full pass that seemed to work.  On the second pass, it didn't finish.

    I looked in /tmp for the SMART logs, but they appear to have been deleted.  Where should I look to see what happened?  I remember seeing something about not being able to do something with the MBR.  My putty session got killed when I rebooted.  I'm going to try again and see if it was just some sort of fluke.  My syslog file is 500 MB, with a whole ton of these:

     

    May 14 04:48:23 Tower kernel: end_request: I/O error, dev sdd, sector 2930137216
    May 14 04:48:23 Tower kernel: sd 6:0:0:0: [sdd] Result: hostbyte=0x04 driverbyte=0x00

     

    And I see this:

    May 14 04:48:22 Tower kernel: sd 6:0:0:0: [sdd] Result: hostbyte=0x04 driverbyte=0x00
    May 14 04:48:22 Tower kernel: end_request: I/O error, dev sdd, sector 2930043648
    May 14 04:48:22 Tower kernel: __ratelimit: 78016 callbacks suppressed
    May 14 04:48:22 Tower kernel: Buffer I/O error on device sdd, logical block 366255456
    May 14 04:48:22 Tower kernel: lost page write due to I/O error on sdd
    May 14 04:48:22 Tower kernel: Buffer I/O error on device sdd, logical block 366255457
    May 14 04:48:22 Tower kernel: lost page write due to I/O error on sdd
    May 14 04:48:22 Tower kernel: Buffer I/O error on device sdd, logical block 366255458
    May 14 04:48:22 Tower kernel: lost page write due to I/O error on sdd
    May 14 04:48:22 Tower kernel: Buffer I/O error on device sdd, logical block 366255459
    May 14 04:48:22 Tower kernel: lost page write due to I/O error on sdd
    May 14 04:48:22 Tower kernel: Buffer I/O error on device sdd, logical block 366255460
    May 14 04:48:22 Tower kernel: lost page write due to I/O error on sdd
    May 14 04:48:22 Tower kernel: Buffer I/O error on device sdd, logical block 366255461
    May 14 04:48:22 Tower kernel: lost page write due to I/O error on sdd
    May 14 04:48:22 Tower kernel: Buffer I/O error on device sdd, logical block 366255462
    May 14 04:48:22 Tower kernel: lost page write due to I/O error on sdd
    May 14 04:48:22 Tower kernel: Buffer I/O error on device sdd, logical block 366255463
    May 14 04:48:22 Tower kernel: lost page write due to I/O error on sdd
    May 14 04:48:22 Tower kernel: Buffer I/O error on device sdd, logical block 366255464
    May 14 04:48:22 Tower kernel: lost page write due to I/O error on sdd
    May 14 04:48:22 Tower kernel: Buffer I/O error on device sdd, logical block 366255465
    May 14 04:48:22 Tower kernel: lost page write due to I/O error on sdd
    May 14 04:48:22 Tower kernel: sd 6:0:0:0: [sdd] Result: hostbyte=0x04 driverbyte=0x00
    May 14 04:48:22 Tower kernel: end_request: I/O error, dev sdd, sector 2930044672
    

    as well.

     

    Is there anything I should look for?

     

    Jim
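
    With a syslog that size, a summary is easier to read than the raw flood.  Here is a sketch that counts the "end_request: I/O error" lines for one device and reports the lowest and highest failing sector.  The pattern is copied from the log lines above; the function name io_error_summary and the syslog path in the example are assumptions.

    ```shell
    # Hypothetical helper: summarize "end_request: I/O error" lines for one
    # device.  Reads syslog text on stdin and prints the error count plus
    # the lowest and highest failing sector.
    io_error_summary() {
        awk -v dev="$1" '
            /end_request: I\/O error/ && $(NF-2) == dev"," {
                s = $NF + 0
                if (n == 0 || s < min) min = s
                if (s > max) max = s
                n++
            }
            END { printf "errors=%d min_sector=%.0f max_sector=%.0f\n", n, min, max }
        '
    }

    # Example (path is an assumption):
    #   io_error_summary sdd < /var/log/syslog
    ```

    In the excerpt above, logical block 366255456 x 8 = sector 2930043648, which is consistent with 4 KiB buffer blocks made of eight 512-byte sectors, so the "Buffer I/O error" and "end_request" lines are reporting the same region of the disk.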

  10. Is there an easy way to post a link to the wiki on the forum page?  I have the forum bookmarked, and I'm constantly having to go back to the root lime-tech page to get to the wiki.  It would be nice if there was an easy way to get to it from the forum main page! :-)

     

    Jim "I know..  I should just bookmark the wiki in my browser!"

  11. I'm not against adding an option for a second parity drive, but I do think users should be aware of the downside, another significant hit on write performance.  Writing to a parity protected drive now takes about 2 to 3 times as long as reads do, because there are 4 disk I/O's involved in a write, only 1 I/O in a read.  Adding a second parity drive means adding a third drive to the equation, effectively causing 6 disk I/O's for each write.  I am going to speculate on write times dropping to about 2.5 to 4 times as long as reads.  And since this involves one more drive being buffered, it is possible that the stream delays to empty full buffers are going to be even more of a problem.  I don't think the slower writes are going to be acceptable to most users.

     

    Well, for me it's only an issue when upgrading the parity drive.  I would use only one parity drive at all other times.  And in my case I'm using a cache drive, so write speed isn't important.

     

    Just to add some perspective, I think we have gotten spoiled pretty quickly by the advantages of parity protection.  But we have probably grown up without missing it most of our lives.  An extremely small percentage of the drives we have ever used are parity protected or mirrored, certainly very few of the system drives in our desktops.  Very very few systems are sold today with mirrored or parity protected drives, although more and more are now capable of it.  We have expected our drives to either fail quickly, or give us years of reliable operation.  And we have relied on backups to save us when they do very unexpectedly fail.

    True...  I agree that we have gotten a little spoiled..  But now that we have it, we are trying to be able to use it all the time. :D

     

    Which brings me to the last point, parity protection is NOT a substitute for backups.  I would prefer more effort being put into backup procedures, and additional copies of our data, than improving the safety of any one copy of our data.  Please see the UnRAID Topical Index, Backups section (especially the later links) for more thoughts on backups.

     

    UnRaid is where I store all my incremental backups (along with my music, TV shows and DVDs)!  Do I need a backup of my backup? :)  Probably not.

     

    One last thought, you can upgrade the parity drive to a larger drive, without endangering the current parity protection (but you can't keep the array online), by a manual procedure.

    That's the problem..  I have to have the array offline for many hours while the new parity builds or risk being unprotected!
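
    For reference, the four I/Os per write quoted above match the usual read-modify-write update for XOR parity: read the old data, read the old parity, write the new data, write the new parity.  Below is a minimal sketch of that arithmetic on a single byte, assuming plain XOR parity (nothing in this thread spells out unRAID's internals); update_parity is a made-up name.

    ```shell
    # Sketch of an XOR parity update on one byte (values 0-255).
    # new_parity = old_parity XOR old_data XOR new_data
    update_parity() {
        old_parity="$1"; old_data="$2"; new_data="$3"
        echo $(( old_parity ^ old_data ^ new_data ))
    }

    # Example: three data bytes 15, 240 and 60 give parity 15^240^60 = 195.
    # Changing the 240 to 170 needs only the old parity and the old byte:
    # update_parity 195 240 170    # prints 153, the same as 15^170^60
    ```

    The other data disks are never read, which is why the cost is counted in I/Os on the target disk and the parity disk only.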

  12. I'd like to add my support for addition of the 2 parity system... 

    My biggest complaint is that in upgrading the parity drive to a bigger one, you run the risk of being unprotected while it's being built (which could take many hours, right?).

    Or you keep your old parity and don't use the system until it's done building the new parity disk.  So I sacrifice either reliability or availability.

    I don't want to lose either!

     

    For me, if we had the ability to build the 2nd parity disk while the first was still in the system, then I could upgrade my parity disk and keep my system on-line AND be protected.  After it's done building then I'd probably take the original one off-line and add it as a new drive.  But at least there is no down time or unprotected time.

     

    Just my $0.02

  13. I just added a cache drive to my system and I'd like to update Brainbone's unraid notify script to report the cache drive status as well.

    Brainbone's script parses /proc/mdcmd to get the disk list.  The cache drive isn't in there.  I could probably hard-code my cache drive, but I was hoping for something a bit more generic.

     

    Any thoughts??  I'm sure it's probably easy.. but I'm not an expert with the internals of UnRaid

     

    Thanks,

     

    Jim
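
    One generic possibility would be to look the cache drive up in the mount table instead of /proc/mdcmd.  This is an untested sketch: it assumes the cache drive is mounted at /mnt/cache, and the function name cache_device is made up.

    ```shell
    # Hypothetical helper: print the device mounted at a given mount point.
    # Reads a mount table in /proc/mounts format (device mountpoint fstype ...).
    # The second argument lets you point it at a different file for testing.
    cache_device() {
        mounts="${2:-/proc/mounts}"
        awk -v mp="$1" '$2 == mp { print $1 }' "$mounts"
    }

    # Example (mount point is an assumption):
    #   cache_device /mnt/cache
    ```

    If nothing is mounted there the function prints nothing, so the notify script could simply skip the cache section in that case.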