Posts posted by Joe L.

  1. @Joe L.

     

    Since tonight I have these entries in my logfile. Running unRAID 5.0.

     

    Sep 13 23:26:25 Server cache_dirs: scheduling via at ./cache_dirs -w -m 1 -M 10 -d 9999 -p 10 -a "-noleaf"

    Sep 13 23:27:00 Server cache_dirs: ==============================================

    Sep 13 23:27:00 Server cache_dirs: command-args=-w -m 1 -M 10 -d 9999 -p 10 -a -noleaf

    Sep 13 23:27:00 Server cache_dirs: vfs_cache_pressure=10

    Sep 13 23:27:00 Server cache_dirs: max_seconds=10, min_seconds=1

    Sep 13 23:27:00 Server cache_dirs: max_depth=9999

    Sep 13 23:27:00 Server cache_dirs: command=find -noleaf

    Sep 13 23:27:00 Server cache_dirs: version=1.6.7

    Sep 13 23:27:00 Server cache_dirs: ---------- caching directories ---------------

    Sep 13 23:27:00 Server cache_dirs: server

    Sep 13 23:27:00 Server cache_dirs: ----------------------------------------------

    Sep 13 23:27:00 Server cache_dirs: cache_dirs process ID 24124 started, To terminate it, type: cache_dirs -q

     

     

    Where did these entries come from? I only started cache_dirs -w normally at startup.

     

    Eisi

    If the array is off-line, cache_dirs will re-schedule itself to be run when the array is possibly online. The log entry is entirely normal when this occurs.
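
    The mechanism is visible in the first log line above ("scheduling via at"). A minimal sketch of the idea, not the actual cache_dirs code (the mount-point test is an assumption):

        # re-schedule this same invocation if the array is not mounted yet
        if ! grep -q ' /mnt/' /proc/mounts; then
            echo "$0 $*" | at now + 1 minute
            exit 0
        fi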
  2. Is this disk unsafe to use, with the Raw_Read_Error_Rate being so high?

    All disks have raw read errors; some report them, some do not.

     

    The failure threshold of 006 for that parameter in SMART is not close to the current normalized value of 116; however, the "worst" normalized value has been as low as 99, so I'd keep an eye on the disk over the next months/years. It looks great otherwise.

     

      1 Raw_Read_Error_Rate    0x000f  116  099  006    Pre-fail  Always      -      105202000

     

    The normalized value actually improved from 114 to 116 over the course of the preclear.  That is a good sign.
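
    For anyone who wants to read these columns themselves, smartctl's attribute table puts the normalized value, worst value, and failure threshold in columns 4-6 (the device name below is only an example):

        smartctl -A /dev/sdb | awk '$1 == 1 { print "value=" $4, "worst=" $5, "thresh=" $6 }'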

  3. It will NOT find the un-readable sectors when using that option, so just be aware of the risks.

    (Far more likely to suffer read errors (un-correctable-media-errors) once put online)

     

    True -- however, if it's a "known working drive" that's been in use and is known to be in good shape, it's MUCH faster to (a) do a preclear with the -n parameter; (b) add it to the array; and then (c) run a parity check => something you should do anyway when adding a new drive, and that WILL find any unreadable sectors  :)

    good point.
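
    For readers following along, the sequence being described looks roughly like this (device name hypothetical; the array steps happen in the unRAID web GUI):

        preclear_disk.sh -n /dev/sdX    # (a) the faster preclear variant discussed above
        # (b) assign the disk to the array in the web GUI and start the array
        # (c) run a parity check, which reads every sector and will surface un-readable ones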
  4. This may have already been mentioned, but I can't seem to find it with the search.

     

    Just got a new drive and went to preclear it. I ran preclear_disk.sh -l and it is showing ALL my disks, not just the unassigned ones. Is this a known issue, or is there something wrong that might cause this?

     

    Cheers,

     

    whiteatoms

    In the latest 5.X releases the array must be started for preclear to find the assigned disks. Yes, it is a known issue with the latest unRAID. It used to keep a "readable" file with the disk assignments in the /boot/config folder; apparently, it no longer does.

     

    MAKE SURE YOU KNOW THE DISK YOU WILL BE CLEARING BY THE SERIAL NUMBER...
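
    One way to map a device name to its serial number (hdparm and smartctl both appear elsewhere in this thread; /dev/sdX is a placeholder):

        hdparm -i /dev/sdX | grep -i serial    # or: smartctl -i /dev/sdX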

     

    Joe L.

  5. RobJ,

     

    Thanks very much for the insight!

     

    Yeah, this drive is a little wonky, I'd only had it for a few months, running in my MBP, and all of a sudden it started taking FOREVER to read.  I was watching a movie with XBMC and it stuttered every few seconds.  I couldn't agree more on not using it in the array, I was just hoping that the combined stress level of preclearing and the drive's own mechanisms for fixing its problems would make it useful for something besides a coaster.  (now there's an idea, old drives encased in lexan for drink coasters?  who doesn't need a frosty adult beverage whilst constructing the perfect parity-protected server?!)

     

    I'm running the fsck now, but I do have a few more questions.

     

    If the kernel detected a lost packet, why did the 3TB drive (md12) not redball?  As you see from the dates, this was a couple of days ago, and the array still showed all green after the issue.

     

    What could, and I know this is probably all guesswork, cause the 1TB drive to "fail" in this manner?

     

    And thirdly, I'd like to second the respectful request to Joe L. to take a look at the preclear script. These don't look like 'PASS'es to me! Don't get me wrong here: I, like probably everyone else on this forum, am indebted to Joe multiple times over for his advice and unMENU and everything else. Just a question/bug (maybe).

     

    Thanks!

     

    Pengrus

     

    P.S. Oh, I forgot one more... is there any way to tell what "command c21f9f00" is?

    I agree, the preclear script should do better at detecting the complete failure of a drive. It is not easy, however, as the "dd" commands used do not always give clear indications of a failure. (preclear does not look in the syslog)
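
    Since preclear does not look in the syslog, a manual check after a suspicious run is worthwhile. One possible approach (the pattern list is only an example):

        # scan the syslog for low-level disk errors that dd's output would not show
        grep -iE 'ata[0-9]+|i/o error|uncorrect' /var/log/syslog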
  6. Hi Joe, found a quirk. Running unRAID 5.0-rc16c and preclear 1.13.

     

    In the unRAID disk settings, set Enable Auto Start to No, then reboot. Run preclear_disk.sh -l.

     

    All disks, whether assigned to the array or not, are listed as available for clearing.

     

    Start the array.  Only the correct disks are listed for clearing.

     

    Stop the array.  The correct disks are still listed.

    I'll check it out.  I have a server running 5.0rc16c.  Thanks for the report.
  7. I'm about to buy another two 4TB drives and intend to preclear them both at the same time.

     

    I've got a machine that can be dedicated just to preclearing and it will be running virtually stock unRAID 5.x apart from the addition of a few small-footprint apps like unMENU and screen.

     

    It has 3GB of RAM. The last time I precleared a single 4TB drive, on a machine running a ton of add-ons, but with 8GB of RAM, I was getting lots of memory errors (but no crash of the preclear process). I imagine that's because I was running out of lowmem while using the other apps.

    The command line I used was:

     

    preclear_disk.sh -r 65536 -w 65536 -b 25600 /dev/sdb

     

    Can someone please advise what would be the optimum or suggested command line parameters to preclear 2x 4TB drives with 3GB RAM?

    I guess you'd want a combination of speed + memory safety.

     

    Cheers!

    I wrote it, and I would not even guess. Personally, I would not use more than -b 2000. With -r 65536 and -b 25600 you are asking it to read in chunks of 65536 × 25600 = 1,677,721,600 bytes, roughly 1.6 GB at a time.
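
    For illustration only (same -r/-w values the poster used, hypothetical device name), dropping -b to 2000 keeps each read near 65536 × 2000 = 131,072,000 bytes, about 128 MB:

        preclear_disk.sh -r 65536 -w 65536 -b 2000 /dev/sdb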

     

     

  8. ALL disks have hardware error correction errors; some report them, some do not. It is normal. The important thing to check is whether the "normalized" value is above the affiliated error threshold. If it is, the drive is working as expected.

     

    All that said, the current normalized value is 031, the worst value it has been is 006, and the failure threshold is 0. Keep an eye on the disk; if the value continues to drop into the single digits, it might be an indication of a drive nearing the end of its life. (it has been spinning for 16,127 hours)

  9. Not that I'm aware of...  But which download link are you referring to?

     

    Thanks, Joe L., for writing! The link is at the bottom of your original post in this thread, shown as cache_dirs.zip:

     

    http://lime-technology.com/forum/index.php?action=dlattach;topic=4500.0;attach=14315

     

    I appreciate the help!

    Nothing at all wrong with the zip file. It has no MS-DOS carriage returns in it. (I just downloaded it and un-zipped it on my server.)
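
    For anyone who suspects DOS line endings in a downloaded script, a quick check and fix (the todos/fromdos utilities appear elsewhere in this thread; tr works anywhere):

        grep -c $'\r' cache_dirs                      # prints 0 when no carriage returns are present
        tr -d '\r' < cache_dirs > cache_dirs.unix     # strip them if any are found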
  10. I thought that the latest version of the pre-clear script would, when run on a system with UnRAID v5, automatically always use sector 64 => but perhaps that's not the case. 

    Incorrect, if no option is specified, then the unRAID configuration preference setting is used.

    Look under

    Settings->Disk Settings->Default-partition-format

    to set your default preference.

     

    Thanks for the details, Joe. But just for clarification, isn't it true that the default setting for v5 is 4K-aligned? ... in which case I'd think you don't need the -A switch (with v5) UNLESS you've changed that setting. Correct?

    I do not know the default...
  11. On my second preclear I forgot to add the -A switch when executing the command. So now I have my first disk precleared with a starting sector of 64 and the second one with a starting sector of 63. Any issues here?? Do I need to run another preclear on the 2nd disk with the -A option?

     

    I thought that the latest version of the pre-clear script would, when run on a system with UnRAID v5, automatically always use sector 64 => but perhaps that's not the case. 

    Incorrect, if no option is specified, then the unRAID configuration preference setting is used.

    Look under

    Settings->Disk Settings->Default-partition-format

    to set your default preference.

     

    One more thing...

    Unless your disk is a WD "EARS" drive with the firmware that performs poorly when partitioned to start on sector 63, then odds are you would not notice any difference in performance at all regardless of where the partition starts.  (and even that drive would work just fine with the partition starting on sector 63 unless you were really anal about performance)    You likely could have left things as they were.
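
    For anyone wondering why sector 63 vs. 64 matters on drives with 4K physical sectors ("Advanced Format", such as the WD EARS series), the arithmetic is simple:

        63 × 512 = 32,256 bytes; 32,256 mod 4,096 = 3,584  →  NOT 4K-aligned
        64 × 512 = 32,768 bytes; 32,768 mod 4,096 = 0      →  4K-aligned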

  12. I'm in the process of pre-clearing several new 4TB Seagate NAS drives, and have the following results from the first 2 drives (the two sets of results are nearly identical, so I've only listed one, as it shows what I'd like to know).

     

    I'd appreciate some feedback on the results.    My thoughts/questions are as follows:

     

    (1)  The Raw Read Error Rate actually shows improvement (from 100 to 110), so I assume it's fine, and I should simply ignore the raw value.    Is that correct?

    Correct

    (2)  Both the Spin Retry Count and End-to-End Error values are the same (100 before, 100 after), but the status shows "Near Threshold".  Any comment on these?  ... or are they fine and "ignorable" ??

    Anything with a current normalized value within 25 of its affiliated failure threshold is "near threshold". For some attributes, the manufacturer sets the failure threshold within a count or two of the factory initial value.

    (4)  The temps seem fine.  Not sure why the airflow is shown as "Near Threshold", but I don't see anything to worry about -- agree?

    Same as above.
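
    As a rough sketch of that rule (not the preclear script's actual code; /dev/sdb is only an example), the attributes within 25 of their thresholds can be listed with:

        smartctl -A /dev/sdb | awk '$1 ~ /^[0-9]+$/ && ($4 - $6) <= 25 { print $2, "value=" $4, "thresh=" $6 }'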
  13. Is my hard drive ready?

    Where's the report showing the changes?

     

    The high-fly-writes went from 100 to 75 during the pre-clear.  Although nowhere near the failure threshold, keep an eye on it in the next few months.

     

    Thanks, Joe... I will add it to the array now to replace another failing HD.

     

    What is a high-fly write? Should I be worried? I will keep an eye on it over the next few months. Sorry, here is the report; I forgot there were 3 files.

    From the SMART Wiki:

    High Fly Writes

    HDD producers implement a Fly Height Monitor that attempts to provide additional protections for write operations by detecting when a recording head is flying outside its normal operating range. If an unsafe fly height condition is encountered, the write process is stopped, and the information is rewritten or reallocated to a safe region of the hard drive. This attribute indicates the count of these errors detected over the lifetime of the drive.

     

    This feature is implemented in most modern Seagate drives[1] and some of Western Digital’s drives, beginning with the WD Enterprise WDE18300 and WDE9180 Ultra2 SCSI hard drives, and will be included on all future WD Enterprise products.[20]

     

  14. They are just the statistics reported by the "dd" command. I never cared what they meant (I was more interested in the number of blocks read). They probably indicate some partial blocks of bytes. (nothing to worry about) You would have to look up the "dd" command to learn more.

     

    Joe L.

    I did some research. The number after the "+" is the number of partial blocks read or written. Since the pre-clear script periodically sends a query to the "dd" process that is actually writing the disk, and has no way to synchronize the request, it is possible to get a response while it is in the middle of writing a "block" of data. For that reason you'll see the +n increment over time. It is just a measure of how many times you managed to ask for its percentage complete while it was in the middle of writing a block of data.
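
    The behavior is easy to see with plain dd: GNU dd prints its current statistics, including the "blocks+partial" counts, when sent SIGUSR1 (presumably something like this is how the script queries progress). A minimal illustration, writing to /dev/null:

        dd if=/dev/zero of=/dev/null bs=1M count=100000 &
        DD_PID=$!
        sleep 2
        kill -USR1 $DD_PID    # dd prints e.g. "2048+0 records in" to stderr and keeps going
        sleep 1
        kill $DD_PID          # end the demonstration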

     

    Joe L.

  15. Here is the output:

     

    root@Tower3:/boot/scripts# hdparm -i /dev/sdh

     

    /dev/sdh:

    HDIO_DRIVE_CMD(identify) failed: Input/output error

    HDIO_GET_IDENTITY failed: No message of desired type

     

    I think unRAID has an issue with the Supermicro AOC-SASLP-MV8. Is this SATA add-on card supported? I thought it was, or I would not have spent over $100 on it.

     

    Can anyone confirm that the Supermicro AOC-SASLP-MV8 is supported in unRAID?

     

    I posted a syslog of startup with the SATA board installed. Can someone confirm it is ok?

     

    http://lime-technology.com/forum/index.php?topic=28406.0

    The errors you are getting are consistent with a drive that has stopped responding to commands.

    It could be the disk itself, or a cable to it, or a disk controller port.  It may even start responding again if power cycled. (it is locked up by buggy firmware on the disk)

     

    In any case, you can isolate the issue by re-seating/replacing the cables to the drive, trying a different disk controller port, etc.

     

     

  16. I am wondering if this Supermicro AOC-SASLP-MV8 is causing me issues. They are supposed to be great SATA controllers.

     

    I tried this:

     

    smartctl  -a  -d  sat  /dev/sdh | todos >/boot/data/smart_report_disk6.txt

     

    Gave me this:

     

    smartctl 5.40 2010-10-16 r3189 [i486-slackware-linux-gnu] (local build)

    Copyright © 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

     

    === START OF INFORMATION SECTION ===

    Device Model:    [No Information Found]

    Serial Number:    [No Information Found]

    Firmware Version: [No Information Found]

    Device is:        Not in smartctl database [for details use: -P showall]

    ATA Version is:  [No Information Found]

    ATA Standard is:  [No Information Found]

    Local Time is:    Tue Jul  9 20:56:20 2013 EDT

    SMART support is: Ambiguous - ATA IDENTIFY DEVICE words 82-83 don't show if SMART supported.

    SMART support is: Ambiguous - ATA IDENTIFY DEVICE words 85-87 don't show if SMART is enabled.

    A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.

    Looks to me like the drive is not responding at all.

     

    Have you tried

    hdparm -i /dev/sdh
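
    As the smartctl output itself suggests, the query can also be retried with -T permissive, though if the drive has truly stopped responding this will likely fail as well:

        smartctl -a -d sat -T permissive /dev/sdh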

  17. Thanks very much for the rapid reply.

    I think I'll run another pre-clear on that drive, and then monitor it closely.

    Something happened prior to the preclear to mark those sectors as having a checksum at the end of the sector that did not match the contents of the sector.

    As already mentioned, they were re-written in place, and the contents then matched the expected checksum for those 7 sectors.

     

    A subsequent preclear cycle will tell you more.  With any luck you'll not see any additional un-readable sectors

    (all checksums at ends of sectors will match the contents of their affiliated sector)
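
    Between cycles, the relevant counter can also be checked directly (device name is a placeholder):

        smartctl -A /dev/sdX | grep -i pending    # Current_Pending_Sector should stay at 0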

     

    Joe L.

  18. If it completed, it would have logged the results in the /boot/preclear_reports directory.

    Ah-ha!

    ========================================================================1.13
    == invoked as: ./preclear_disk.sh -A /dev/sdb
    ==  ST4000DM000-1F2168    W3009TM3
    == Disk /dev/sdb has been successfully precleared
    == with a starting sector of 1 
    == Ran 1 cycle
    ==
    == Using :Read block size = 8225280 Bytes
    == Last Cycle's Pre Read Time  : 9:32:25 (116 MB/s)
    == Last Cycle's Zeroing time   : 8:22:56 (132 MB/s)
    == Last Cycle's Post Read Time : 19:03:40 (58 MB/s)
    == Last Cycle's Total Time     : 37:00:07
    ==
    == Total Elapsed Time 37:00:07
    ==
    == Disk Start Temperature: 29C
    ==
    == Current Disk Temperature: 36C, 
    ==
    ============================================================================
    ** Changed attributes in files: /tmp/smart_start_sdb  /tmp/smart_finish_sdb
                    ATTRIBUTE   NEW_VAL OLD_VAL FAILURE_THRESHOLD STATUS      RAW_VALUE
          Raw_Read_Error_Rate =   118     100            6        ok          182808288
             Spin_Retry_Count =   100     100           97        near_thresh 0
             End-to-End_Error =   100     100           99        near_thresh 0
      Airflow_Temperature_Cel =    64      71           45        near_thresh 36
          Temperature_Celsius =    36      29            0        ok          36
    No SMART attributes are FAILING_NOW
    
    0 sectors were pending re-allocation before the start of the preclear.
    0 sectors were pending re-allocation after pre-read in cycle 1 of 1.
    0 sectors were pending re-allocation after zero of disk in cycle 1 of 1.
    0 sectors are pending re-allocation at the end of the preclear,
        the number of sectors pending re-allocation did not change.
    0 sectors had been re-allocated before the start of the preclear.
    0 sectors are re-allocated at the end of the preclear,
        the number of sectors re-allocated did not change. 
    ============================================================================
    

    Looks great!!!

     

    Joe L.
