
Guzzi


Posts posted by Guzzi

  1. Did anybody find a solution to AVOID the spinup of the drives after WOL?

     

    No, and you won't... ACPI doesn't provide for it. Even with drives whose power management lets them power up in spun-down mode, it won't work.

     

    ...so the script you posted elsewhere for WD drives to get staggered spin-up doesn't work for this either? So the reinitialization process includes a command to the disk that ALWAYS spins up the drive?

    I am not experienced, nor do I know enough about the ATA protocol and that stuff; I just want to make sure I properly understand this and why it works or doesn't.

    Thanks, Guzzi

  2. When I wake my server from S3, it always spins up all drives and does some reinitialization on the ports.

    I have two questions:

    1.) Do others also get these error messages on initialization? Is it something to worry about, or is it harmless?

    2.) Did anybody find a solution to AVOID the spin-up of the drives after WOL, so that only the drives being accessed spin up, as in normal operation? I would be interested in a solution or hints in the right direction.

     

    Thanks, Guzzi

     

     

    Nov 17 22:37:04 XMS-GMI-02 kernel: ata1.05: SATA link up 1.5 Gbps (SStatus 113 SControl 320)

    Nov 17 22:37:04 XMS-GMI-02 kernel: ata2.00: failed to IDENTIFY (I/O error, err_mask=0x11)

    Nov 17 22:37:04 XMS-GMI-02 kernel: ata2.00: revalidation failed (errno=-5)

    Nov 17 22:37:04 XMS-GMI-02 kernel: ata2.15: hard resetting link

    Nov 17 22:37:04 XMS-GMI-02 kernel: ata2: controller in dubious state, performing PORT_RST

    Nov 17 22:37:04 XMS-GMI-02 kernel: ata1.00: failed to IDENTIFY (I/O error, err_mask=0x11)

    Nov 17 22:37:04 XMS-GMI-02 kernel: ata1.00: revalidation failed (errno=-5)

    Nov 17 22:37:04 XMS-GMI-02 kernel: ata1.15: hard resetting link

    Nov 17 22:37:04 XMS-GMI-02 kernel: ata1: controller in dubious state, performing PORT_RST

    Nov 17 22:37:04 XMS-GMI-02 kernel: ata2.15: SATA link up 3.0 Gbps (SStatus 123 SControl 0)

    Nov 17 22:37:04 XMS-GMI-02 kernel: ata1.15: SATA link up 3.0 Gbps (SStatus 123 SControl 0)

    Nov 17 22:37:04 XMS-GMI-02 kernel: ata2.00: SATA link up 3.0 Gbps (SStatus 123 SControl 320)

    Nov 17 22:37:04 XMS-GMI-02 kernel: ata1.00: SATA link up 3.0 Gbps (SStatus 123 SControl 320)

    Nov 17 22:37:04 XMS-GMI-02 kernel: ata2.01: SATA link up 3.0 Gbps (SStatus 123 SControl 300)

    Nov 17 22:37:04 XMS-GMI-02 kernel: ata1.01: SATA link up 3.0 Gbps (SStatus 123 SControl 300)

  3. For completeness, here's <s3_notHddHrsNet.sh>; it does not need adjustment to work, but it is [CUSTOMIZABLE] in two places.

     

    Been running the s3 script for several days and these new changes look great.  What would make this perfect for me was if, as its final check, it pinged a list of IPs and if it couldn't reach any of them, then it would go to sleep.  I don't know enough to do it myself and am looking for a bit of help on this.  I believe someone else mentioned this earlier in the thread, but I don't think it got fleshed out far enough.

    It's just a matter of adding another section after the time-frame check. But I'd better not post it; I can only do "if-then-else". Those here who know better can easily set a variable containing the IP addresses and have them processed. It should be very easy.

    In the meantime, you can do something like

        ping -c 1 1.2.3.4 | grep -q "ttl="
        if [ $? -eq 0 ]
        then
            ...
        fi

    so you can set a variable that prohibits (or allows) the machine to fall asleep...
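    A minimal sketch of the "variable containing the IP addresses" idea, assuming bash and a Linux ping that accepts -c (count) and -W (reply timeout in seconds). WAKE_HOSTS, host_is_up, any_host_up and sleep_allowed are hypothetical names, not part of the S3 script; the sketch only decides whether sleeping is allowed and leaves the actual suspend to the existing script. It tests ping's exit status, which is non-zero when no reply arrives, instead of grepping for "ttl=":

```bash
#!/bin/bash
# Hypothetical list of clients (IPs or DNS names) that should keep the
# server awake while they answer pings. Adjust to your own network.
WAKE_HOSTS=(192.168.1.20 192.168.1.21)

# host_is_up HOST: succeeds if HOST answers one ping within 1 second.
# ping exits non-zero when no reply arrives, so no grep for "ttl=" is needed.
host_is_up() {
    ping -c 1 -W 1 "$1" >/dev/null 2>&1
}

# any_host_up HOST...: succeeds as soon as one host answers; fails if none do
# (or if the list is empty).
any_host_up() {
    local host
    for host in "$@"; do
        if host_is_up "$host"; then
            return 0
        fi
    done
    return 1
}

# sleep_allowed: prints "no" if any host in WAKE_HOSTS is reachable, else "yes".
sleep_allowed() {
    if any_host_up "${WAKE_HOSTS[@]}"; then
        echo "no"
    else
        echo "yes"
    fi
}
```

    In the S3 script, the existing sleep command could then be wrapped in something like: if [ "$(sleep_allowed)" = "yes" ]; then ...; fi, placed after the time-frame check. This only helps with clients whose network stack really goes down when they are switched off.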

  4. Maybe the smbstatus program can be used to reveal open files and/or locks on the SMB share.

     

    Sorry. You guys are far... far ahead of me when it comes to Linux, which is to say that I don't know enough about Linux to write a competent script on my own.

     

    Rereading Guzzi's post and looking at the S3.SH script that I am using from awg's wiki, I now understand that Guzzi is referring to modifications he made to his script to check whether his HTPCs were on, not to the basic script I am using.

     

    So Guzzi, if you would be kind enough to share your code I would like to give it a try.

     

    [edit]If all the added code does is ping the client for a response, I don't think it will help with the WDTV, because the WDTV does not power off completely. It only "soft-offs" so the network controller remains on. You can even telnet and FTP to it while it's in the "off" state.[/edit]

     

    And, WeeboTech, if you could illustrate how to incorporate your idea into the script, I would also appreciate it. Though, please note, I am using NFS shares, not SMB, so maybe it wouldn't help in my case.

     

    My apologies for being at the beginning of the learning curve on this stuff and relatively useless at helping myself.

     

    Any help would be appreciated.

     

    OK, in fact my script is useless for you if your network device keeps TCP/IP up while "switched off". It works fine for devices that enter, e.g., S3 mode: the network adapter stays active and responds to its MAC address, but the TCP/IP stack is not running, so it does NOT respond to a ping.

    I'm certainly not good at scripting, so my script is not elegant, but it does what I need. Here is what I added as a "check":

     

        if ping -c 1 "DNSNameOfDevice or IP-Address" | grep -q "ttl="
        then
            echo "HTPC is active"
        else
            echo "HTPC is not active"
        fi

     

    One more question from my side: I read somewhere that it is possible not to spin up the hard drives when waking the unRAID machine from S3. Has anybody done this successfully? As I understand it, at least the devices need some reinitialization, which usually causes the drives to spin up. Or does this depend on the mobo/BIOS?

     

  5. I was able to get S3 sleep working on my new unRAID box quickly, thanks to agw's wiki and the info I found here. So I thought I had everything working the way I wanted... that is until I experienced a couple of incidents where unRAID went to sleep while serving music to my media player!

     

    I believe the problem is that my RAM buffer (I have 2 GB installed) is able to store more music than will play in the time period I have set before the hard drives spin down + the delay I have set in the S3.SH script.

     

    Obviously one solution is to significantly increase the time periods so that they exceed the maximum amount of "play time" the cache will hold. (Can anyone tell me approximately how much of my 2GB is dedicated to buffering?) But I suspect this would mean the server would stay awake a lot longer than would otherwise be necessary in the vast majority of circumstances.
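    A rough way to answer the buffering question, assuming a Linux /proc/meminfo: the page cache is reported on its "Cached:" line, and dividing it by the stream's bitrate gives an upper bound on how much playback it could cover. The function names and the 320 kbit/s figure are illustrative assumptions; the cache also holds other files, so the real figure for one album will be lower.

```bash
#!/bin/bash
# cached_kb: size of the kernel page cache in kB (the "Cached:" line of
# /proc/meminfo). This covers every cached file, not just music.
cached_kb() {
    awk '/^Cached:/ {print $2}' /proc/meminfo
}

# playtime_minutes KB KBPS: minutes of audio at a constant bitrate of KBPS
# kbit/s that fit into KB kB of cache. /proc/meminfo "kB" are 1024-byte units,
# bitrates use 1000 bits per kbit; integer arithmetic, rounded down.
playtime_minutes() {
    local kb=$1 kbps=$2
    echo $(( kb * 1024 * 8 / (kbps * 1000) / 60 ))
}

# Hypothetical usage: playtime_minutes "$(cached_kb)" 320
```

    For example, playtime_minutes 1481180 320 prints 631: about 1.4 GiB of cache could hold roughly ten and a half hours of 320 kbit/s audio, far longer than typical spin-down plus sleep delays.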

     

    I think the optimal solution would be to modify the S3 script to check to see if there is any network activity going on before implementing the sleep state.
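    One way to approximate "network activity", assuming Linux sysfs: sample the interface's rx_bytes/tx_bytes counters twice and compare the average rate with a threshold. Even when the music is served from RAM it still has to cross the network, so this would catch the case described above. The function names, interface, interval and threshold are hypothetical and need tuning, since background traffic (broadcasts, pings) also counts.

```bash
#!/bin/bash
# iface_bytes IFACE: total bytes received + transmitted on IFACE (from sysfs).
iface_bytes() {
    local dir=/sys/class/net/$1/statistics
    echo $(( $(cat "$dir/rx_bytes") + $(cat "$dir/tx_bytes") ))
}

# is_idle BEFORE AFTER SECONDS MAX_BPS: succeeds if the average traffic between
# two samples taken SECONDS apart stayed below MAX_BPS bytes per second.
# A counter that went backwards (e.g. a 32-bit wrap) is treated as busy.
is_idle() {
    local before=$1 after=$2 secs=$3 max_bps=$4
    (( after >= before && (after - before) / secs < max_bps ))
}

# network_idle IFACE SECONDS MAX_BPS: takes two samples SECONDS apart.
# Hypothetical usage: network_idle eth0 60 2000 && echo "network idle"
network_idle() {
    local before after
    before=$(iface_bytes "$1")
    sleep "$2"
    after=$(iface_bytes "$1")
    is_idle "$before" "$after" "$2" "$3"
}
```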

     

    Do you think I have diagnosed the problem correctly?

     

    Can anyone suggest how to change the script to accomplish what I describe?

     

    Thanks!

     

    The S3 script already checks whether all disks are spun down.

    I have added checks whether the HTPC machines are up and available (powered on), so unRAID doesn't go to sleep as long as the HTPCs are running, even if the disks are spun down. This can be done easily, e.g. with ping, and works fine.

    Disadvantage: Unraid will never go to S3 if HTPC is running, even if HTPC is not using it.
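    For reference, a spun-down check could look roughly like this; it is not necessarily how the S3 script does it. It assumes hdparm is installed: "hdparm -C" sends CHECK POWER MODE, which reports the drive state without spinning the drive up. The function names and device list are hypothetical.

```bash
#!/bin/bash
# state_is_spun_down: reads "hdparm -C" output on stdin and succeeds if the
# reported state is "standby" or "sleeping" (platters not spinning).
state_is_spun_down() {
    grep -Eq 'drive state is:[[:space:]]+(standby|sleeping)'
}

# all_spun_down DEV...: succeeds only if every listed device is spun down.
# If hdparm fails or prints nothing, the drive counts as NOT spun down.
all_spun_down() {
    local dev
    for dev in "$@"; do
        hdparm -C "$dev" 2>/dev/null | state_is_spun_down || return 1
    done
    return 0
}

# Hypothetical usage: all_spun_down /dev/sdb /dev/sdc && echo "all disks spun down"
```

    Treating errors as "not spun down" means a failed check keeps the server awake rather than putting it to sleep.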

     

  6. No problems at all.  Since the rate of zero (the RAW number) is the same, there is no real change here.  For some reason, the scaled numbers, VALUE and WORST, have been reset.  The number 253 usually seems to indicate "Not Used Yet".

    Thanks. I mounted the disk in the machine and got kernel errors. It took me some time to find the cause: it was the 16+ drive bug in the current beta release, even though I only had 16 drives (15+1, no cache drive). So it seems the 16+ bug is related not only to the number of drives but also to the slots! After deleting super.dat (otherwise unRAID always crashed on startup when trying to sync) and moving all drives to the lower slots, it works fine... now syncing...

  7. Hi, I ran another preclear, now with new cables, on a disk that had even shown syslog errors before. The syslog is clean now and the preclear completed, but it reports one difference in the seek error rate. Is that something to worry about?

     

    Thanks, Guzzi

     

    ===========================================================================

    =                unRAID server Pre-Clear disk /dev/sdg

    =                      cycle 1 of 1

    = Disk Pre-Clear-Read completed                                DONE

    = Step 1 of 10 - Copying zeros to first 2048k bytes            DONE

    = Step 2 of 10 - Copying zeros to remainder of disk to clear it DONE

    = Step 3 of 10 - Disk is now cleared from MBR onward.          DONE

    = Step 4 of 10 - Clearing MBR bytes for partition 2,3 & 4      DONE

    = Step 5 of 10 - Clearing MBR code area                        DONE

    = Step 6 of 10 - Setting MBR signature bytes                    DONE

    = Step 7 of 10 - Setting partition 1 to precleared state        DONE

    = Step 8 of 10 - Notifying kernel we changed the partitioning  DONE

    = Step 9 of 10 - Creating the /dev/disk/by* entries            DONE

    = Step 10 of 10 - Testing if the clear has been successful.    DONE

    = Disk Post-Clear-Read completed                                DONE

    Disk Temperature: 32C, Elapsed Time:  15:00:48

    ============================================================================

    ==

    == Disk /dev/sdg has been successfully precleared

    ==

    ============================================================================

    S.M.A.R.T. error count differences detected after pre-clear

    note, some 'raw' values may change, but not be an indication of a problem

    58c58

    <  7 Seek_Error_Rate        0x000e  200  200  051    Old_age  Always      -      0

    ---

    >  7 Seek_Error_Rate        0x000e  100  253  051    Old_age  Always      -      0

    ============================================================================
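    The preclear report diffs whole lines, so a change in the normalized VALUE/WORST columns (here 200/200 before, 100/253 after) shows up even when the RAW value stays at 0. A minimal sketch that compares only the RAW column, assuming two saved outputs of "smartctl -A /dev/sdX" (the file and function names are hypothetical). RAW encodings are vendor-specific, so this is a filter for eyeballing, not a health verdict.

```bash
#!/bin/bash
# raw_values FILE: from saved "smartctl -A" output, prints "ID NAME RAW" for each
# attribute row (rows start with a numeric ID). RAW is the 10th column; any
# trailing annotation such as "(Min/Max ...)" is ignored.
raw_values() {
    awk '$1 ~ /^[0-9]+$/ && NF >= 10 {print $1, $2, $10}' "$1"
}

# raw_changes BEFORE AFTER: prints "NAME: old -> new" for attributes whose RAW
# value differs; normalized VALUE/WORST changes are ignored.
# join emits: ID NAME RAW_before NAME RAW_after
raw_changes() {
    join <(raw_values "$1" | sort -k1,1) <(raw_values "$2" | sort -k1,1) |
        awk '$3 != $5 {print $2 ": " $3 " -> " $5}'
}

# Hypothetical usage: raw_changes before.txt after.txt
```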

     

    It's just that I didn't expect all that extra trouble; my initial plan was just to move the drives from Windows to unRAID, move the data, and be done ;-)

    Well... sorry about causing you extra "trouble" but then I figure you might want to avoid extra issues that can be uncovered before you move your files...   The cost of a few new drives is small compared to the amount of time and effort needed otherwise.

     

    I hope your data transfer goes smoothly once you have a set of disks to move it to.   From what you've said, your RAID-5 array would have had to deal with the defects on those two old disks at some point... and it might not have been as easy to swap in a new larger drive.

     

    Joe L.

    I appreciate the help and the capabilities of your tools; I wasn't complaining, just reporting back my experience. Please don't misunderstand me: I am happy to discover the problems in advance instead of having trouble later, and yes, you're completely right, the price of a new disk is nothing compared to the trouble with a machine and the data on it. That's why I quickly replaced the failing drives with new ones...

     

  9. I'm not sure I wasn't happier overall before I started thinking about my HDs: just installing and being surprised if something fails

    Don't you mean... just installing and being surprised when (not if) something fails  :( :( :(

    [...]

    Yes, you're absolutely right, but you also noticed my smiley, didn't you...

    It IS a positive thing to get such detailed information, and I appreciate it. As you may have seen in my last posts, at least 2 drives from my former Windows RAID-5 are not behaving well, and I am more than happy to identify them and throw them out of my box. It's just that I didn't expect all that extra trouble; my initial plan was just to move the drives from Windows to unRAID, move the data, and be done ;-)

  10. Thanks for the info; I did some reading, lots of details. Hmm, I'm not sure I wasn't happier overall before I started thinking about my HDs: just installing and being surprised if something fails ;-) Just kidding. I like the concept of the preclear script very much: reading and writing the whole HD once before putting it into production IS a help in discovering problems in advance. At least I found 2 hard disks behaving strangely; I will take a closer look at them after migrating to the healthy drives.

  11. Hi, I ran another 4 preclears on disks formerly used in another Windows RAID-5. The script reports some differences between pre and post. Can you have a look at the seek error rate messages and comment on whether it is something to worry about? Thanks, Guzzi

     

    ===========================================================================

    =                unRAID server Pre-Clear disk /dev/sdb

    =                       cycle 1 of 1

    = Disk Pre-Clear-Read completed                                 DONE

    = Step 1 of 10 - Copying zeros to first 2048k bytes             DONE

    = Step 2 of 10 - Copying zeros to remainder of disk to clear it DONE

    = Step 3 of 10 - Disk is now cleared from MBR onward.           DONE

    = Step 4 of 10 - Clearing MBR bytes for partition 2,3 & 4       DONE

    = Step 5 of 10 - Clearing MBR code area                         DONE

    = Step 6 of 10 - Setting MBR signature bytes                    DONE

    = Step 7 of 10 - Setting partition 1 to precleared state        DONE

    = Step 8 of 10 - Notifying kernel we changed the partitioning   DONE

    = Step 9 of 10 - Creating the /dev/disk/by* entries             DONE

    = Step 10 of 10 - Testing if the clear has been successful.     DONE

    = Disk Post-Clear-Read completed                                DONE

    Elapsed Time:  21:41:19

    ============================================================================

    ==

    == Disk /dev/sdb has been successfully precleared

    ==

    ============================================================================

    S.M.A.R.T. error count differences detected after pre-clear

    note, some 'raw' values may change, but not be an indication of a problem

    58c58

    <   7 Seek_Error_Rate         0x000e   100   253   051    Old_age   Always       -       0

    ---

    >   7 Seek_Error_Rate         0x000e   200   200   051    Old_age   Always       -       0

    63c63

    < 193 Load_Cycle_Count        0x0032   176   176   000    Old_age   Always       -       72598

    ---

    > 193 Load_Cycle_Count        0x0032   176   176   000    Old_age   Always       -       72599

     

     

     

    ===========================================================================

    =                unRAID server Pre-Clear disk /dev/sdc

    =                       cycle 1 of 1

    = Disk Pre-Clear-Read completed                                 DONE

    = Step 1 of 10 - Copying zeros to first 2048k bytes             DONE

    = Step 2 of 10 - Copying zeros to remainder of disk to clear it DONE

    = Step 3 of 10 - Disk is now cleared from MBR onward.           DONE

    = Step 4 of 10 - Clearing MBR bytes for partition 2,3 & 4       DONE

    = Step 5 of 10 - Clearing MBR code area                         DONE

    = Step 6 of 10 - Setting MBR signature bytes                    DONE

    = Step 7 of 10 - Setting partition 1 to precleared state        DONE

    = Step 8 of 10 - Notifying kernel we changed the partitioning   DONE

    = Step 9 of 10 - Creating the /dev/disk/by* entries             DONE

    = Step 10 of 10 - Testing if the clear has been successful.     DONE

    = Disk Post-Clear-Read completed                                DONE

    Elapsed Time:  23:20:10

    ============================================================================

    ==

    == Disk /dev/sdc has been successfully precleared

    ==

    ============================================================================

    S.M.A.R.T. error count differences detected after pre-clear

    note, some 'raw' values may change, but not be an indication of a problem

    58c58

    <   7 Seek_Error_Rate         0x000e   100   253   051    Old_age   Always       -       0

    ---

    >   7 Seek_Error_Rate         0x000e   200   200   051    Old_age   Always       -       0

    63c63

    < 193 Load_Cycle_Count        0x0032   176   176   000    Old_age   Always       -       73100

    ---

    > 193 Load_Cycle_Count        0x0032   176   176   000    Old_age   Always       -       73101

     

    ===========================================================================

    =                unRAID server Pre-Clear disk /dev/sdd

    =                       cycle 1 of 1

    = Disk Pre-Clear-Read completed                                 DONE

    = Step 1 of 10 - Copying zeros to first 2048k bytes             DONE

    = Step 2 of 10 - Copying zeros to remainder of disk to clear it DONE

    = Step 3 of 10 - Disk is now cleared from MBR onward.           DONE

    = Step 4 of 10 - Clearing MBR bytes for partition 2,3 & 4       DONE

    = Step 5 of 10 - Clearing MBR code area                         DONE

    = Step 6 of 10 - Setting MBR signature bytes                    DONE

    = Step 7 of 10 - Setting partition 1 to precleared state        DONE

    = Step 8 of 10 - Notifying kernel we changed the partitioning   DONE

    = Step 9 of 10 - Creating the /dev/disk/by* entries             DONE

    = Step 10 of 10 - Testing if the clear has been successful.     DONE

    = Disk Post-Clear-Read completed                                DONE

    Elapsed Time:  26:25:24

    ============================================================================

    ==

    == Disk /dev/sdd has been successfully precleared

    ==

    ============================================================================

    S.M.A.R.T. error count differences detected after pre-clear

    note, some 'raw' values may change, but not be an indication of a problem

    58c58

    <   7 Seek_Error_Rate         0x000e   100   253   051    Old_age   Always       -       0

    ---

    >   7 Seek_Error_Rate         0x000e   200   200   051    Old_age   Always       -       0

    63c63

    < 193 Load_Cycle_Count        0x0032   173   173   000    Old_age   Always       -       81301

    ---

    > 193 Load_Cycle_Count        0x0032   173   173   000    Old_age   Always       -       81306

     

    ===========================================================================

    =                unRAID server Pre-Clear disk /dev/sde

    =                      cycle 1 of 1

    = Disk Pre-Clear-Read completed                                DONE

    = Step 1 of 10 - Copying zeros to first 2048k bytes            DONE

    = Step 2 of 10 - Copying zeros to remainder of disk to clear it DONE

    = Step 3 of 10 - Disk is now cleared from MBR onward.          DONE

    = Step 4 of 10 - Clearing MBR bytes for partition 2,3 & 4      DONE

    = Step 5 of 10 - Clearing MBR code area                        DONE

    = Step 6 of 10 - Setting MBR signature bytes                    DONE

    = Step 7 of 10 - Setting partition 1 to precleared state        DONE

    = Step 8 of 10 - Notifying kernel we changed the partitioning  DONE

    = Step 9 of 10 - Creating the /dev/disk/by* entries            DONE

    = Step 10 of 10 - Testing if the clear has been successful.    DONE

    = Disk Post-Clear-Read completed                                DONE

    Elapsed Time:  25:00:20

    ============================================================================

    ==

    == Disk /dev/sde has been successfully precleared

    ==

    ============================================================================

    S.M.A.R.T. error count differences detected after pre-clear

    note, some 'raw' values may change, but not be an indication of a problem

    19,20c19,20

    < Offline data collection status:  (0x82)      Offline data collection activity

    <                                      was completed without error.

    ---

    > Offline data collection status:  (0x84)      Offline data collection activity

    >                                      was suspended by an interrupting command from host.

    63c63

    < 193 Load_Cycle_Count        0x0032  176  176  000    Old_age  Always      -      72723

    ---

    > 193 Load_Cycle_Count        0x0032  176  176  000    Old_age  Always      -      72724

    ============================================================================

     

     

  12. ... you couldn't see it because I didn't access the other drive; as soon as I run e.g. a preclear on it, I get the same messages in the syslog. I do NOT get any of those errors with any of the other drives (e.g. I ran reiserfsck on all drives except parity).

    Cabling is always a mess. I had those problems in the pre-unRAID era as well (Windows RAID-5 with the free Veritas solution): changed SATA cables, changed power cabling, changed power supply, etc.

    The worst problem is those splitters: you just touch them and you hear the drive spin down and up again, just because the voltage dropped a bit. This also depends on the brand of the drives; some are more sensitive, some less. Back then I replaced my standard PC power cabling with a more solid power distribution, and that helped a lot.

     

    Anyway, regarding the current situation: I ordered a new drive yesterday; it will be delivered today and will replace those two questionable drives.

    I can then test those drives in another box when I have time, to decide whether I can continue using them. If they check out OK, I will put them in my backup box later.

    Currently my focus is on getting (or keeping) my main box stable and error-free, so I can "put it in the corner and forget about it" ;-)

  13. That syslog is a mess!  And it's only the latter part too, it is missing the 600 to 900 odd lines of system setup at the beginning.

     

    The drive with ID of sdn probably has a poor quality cable.  I would replace it if at all possible.

     

    And Joe is right, there were page allocation failures for many subsystems, including the share file system, Samba, and possibly involving the networking and Reiser file system modules, which is worrying.  In this piece of the syslog, I don't see any kernel panics, so I don't think we can say for sure that there is any damage, such as evidence of flaky memory, or corrupted Reiser file systems, but I never fully trust a system that has crashed.  Always better to restart fresh.  I certainly would not try to run anything important, once I saw the first sign of suspicious system operation.  Those 'Call Traces' definitely qualify as suspicious system operation.  Grabbing the syslog and waiting for advice was the correct thing to do.

     

    Even though I saw no 'panics' here, to be safe, I would reboot and run a full memory test first, then run reiserfsck on each of the data drives (see the Check Disk File systems page for instructions).  I'm sorry, it is somewhat time-consuming, but it is better to be safe.  The memory test is probably not needed, so you can postpone it if you wish, but I like to be thorough, and know whether a system is truly trustworthy, especially when I have just had extensive memory-related problems.  I would like to say test only the data drives you were actually using, but it appears that there were numerous spin downs to many drives, and the mover ran at least twice, so it looks like all or most of your drives may have been written to.

     

    2 GB of memory should have been more than enough.  I can't see any reason so far for the problems, at least not from this syslog.

     

    ... I'm done... I ran the memory test overnight; it passed 8 times without errors. I also ran reiserfsck on all data drives, and all runs completed without any errors reported. I checked the syslog too: no errors, neither after boot nor after all those activities.

     

    Anything else I can or should do? So it seems those problems are all centered on those 2 drives? If so, I'd probably rather dispose of them and order 2 new ones; that's much cheaper than the time it took me to check the whole server... ;-)

  14. I am worried too when I see such things. I think I will remove both drives, test them separately, and see if they need to be RMAed.

    I will also run a memory test and chkdsk on all drives, as recommended, to make sure everything is fine.

    And yes, there is already data on almost all drives, since I have been moving data over the last few weeks.

    I will post after running the tests.

    Guzzi

    The preclear_disk script is very good at exercising (thrashing) a disk. As already said, it is far easier to RMA the drives before they are loaded with your data if you find they do not test well. The errors you saw could be because of bad SATA cables or bad power cables/splitters, or even a bad disk controller. But...

     

    Remember, your SMART report showed an emergency retraction of the heads to a safe landing spot when it thought the drive was losing power in the middle of the preclearing process.  That is pretty drastic as it tries to save itself from a head crash.

     

    Is your power supply being overloaded? Are you using a backplane for power distribution? Lots to check out, but at least you are more informed than most Windows OS users. They just blue-screen.

     

    Joe L.

     

    Maybe I wasn't completely clear: I have NO data on those 2 "suspicious" drives (they're unassigned, and I didn't mount them except to temporarily check that they're empty). Only the array is filled with data, and I haven't encountered problems with those drives so far.

    The drives are not new; most of them come from my former Windows box and had been running there as RAID-5 for 1-2 years (I hope the warranty isn't over yet...).

    I never got BSODs on the Windows box, but I remember that once or twice drives were showing "yellow", which was probably the same CRC problem as now.

    Nevertheless, I have to admit that unRAID and the Linux tools give much more transparency about what's "really" happening; Windows doesn't help you much with that (you just "reactivate" the drive, and errors are corrected by the RAID layer anyway).

     

    BTW: I ran the memory test overnight; it passed 8 times without errors. I will chkdsk the drives when I find the time (currently working with my son on his motorcycle ;-))

     

    The biggest hassle with these "many-disk machines" (regardless of Windows, Linux, or something else) is power and cabling, and it's very difficult to diagnose.

     

    Power might be fine for all normal operations, but if you are accessing a disk while 20 other disks spin up at the same time, it might pull the voltage down. In the past I have found that HDs are VERY sensitive to voltages below 4.8 V on the 5 V rail, measured at the drive itself, not somewhere else, because you lose voltage along the cables.

     

    Anyway, I thought I was safe, because I ran the Windows box, and now the unRAID box, with the same power supply but 8 fewer drives... so maybe I should check the cables again; the problem seems to be focused on those two ports...

    So I don't think it's overloaded power-wise, but unRAID is in a different box with different cables and no power backplane, so there might be issues. I have no choice but to check and fix them, because I plan to add the remaining 4 disks from the Windows RAID to the unRAID array as soon as the 17+ bug is solved...

    I hope to soon reach the stage to put the box back in the corner and forget it for the next years ;-)

  15. Thanks, Rob and Joe, for the feedback.

    sdn and sdq are the two drives I don't have in the array yet, because they were both showing those errors when I first tried setting up the empty array some weeks ago.

    All other drives are in the array and were fine, showing no errors.

    Because I didn't trust those 2 drives, I ran the preclear script on them to be safe, with the result above.

    It was the very first time I encountered such memory-related errors; I never had them before. But you're right, I even had problems accessing Samba shares after this.

     

    I restarted the box and everything is fine so far, no errors at all in the syslog (except this DMA-stuff on the IDE-port - " kernel: atiixp 0000:00:14.1: simplex device: DMA disabled").

    BTW: starting preclear on either of those 2 unassigned drives gives me those above "ata12.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0" - errors in the log. They do NOT appear during startup.

     

    cache_dirs was not started at all; I removed it from the go script and rebooted before I moved the files. So it definitely cannot be responsible for any memory-related issues.

     

    I am worried too when I see such things. I think I will remove both drives, test them separately, and see if they need to be RMAed.

    I will also run a memory test and chkdsk on all drives, as recommended, to make sure everything is fine.

    And yes, there is already data on almost all drives, since I have been moving data over the last few weeks.

    I will post after running the tests.

    Guzzi

  16. You have several drives with errors, not just the one you are trying to clear... and it looks like you are running out of memory too. 

    Are you running any add-on packages? (other than the pre-clear)  The user-share file system is constantly reporting it cannot allocate memory.

    How much RAM are you running?

     

    I can't go into detail now...  Perhaps RobJ can take a look and provide his input.  Perhaps send him a PM and ask him to take a look.

     

    Joe L. 

     

    I have 2 GB RAM in the box:

    (from /usr/bin/top -b -n1)

     

    top - 01:15:04 up  1:13,  0 users,  load average: 3.94, 4.00, 3.73

    Tasks:  73 total,   2 running,  71 sleeping,   0 stopped,   0 zombie

    Cpu(s):  7.8%us, 60.5%sy,  0.0%ni, 22.3%id,  5.0%wa,  0.6%hi,  3.7%si,  0.0%st

    Mem:   1943344k total,  1617648k used,   325696k free,    39868k buffers

    Swap:        0k total,        0k used,        0k free,  1481180k cached

     

    (Did a reboot after I saw those kernel things in syslog - never had that before, just during this specific preclear)

     

    Add-ons: I have disabled cache_dirs to keep memory free while moving data to the box. Here is the go script:

     

    #!/bin/bash
    # Start the Management Utility
    /usr/local/sbin/emhttp &
    # Run each unMENU package installer (*.auto_install) in /boot/packages, in sorted order
    cd /boot/packages && find . -name '*.auto_install' -type f -print | sort | xargs -n1 sh -c

    # Unraid_Notify (E-Mail Notification), currently disabled
    #installpkg /boot/packages/socat-1.7.0.0-i486-2bj.tgz
    #installpkg /boot/packages/unraid_notify-2.30-noarch-unRAID.tgz
    # Install the acpitool package
    installpkg /boot/packages/acpitool-0.4.7-i486-1goa.tgz
    #unraid_notify start

    sleep 30

    # Enable wakeup: Wake-on-LAN on eth0, "wol g" = wake on magic packet
    /usr/sbin/ethtool -s eth0 wol g

     

    # Start UnMenu

    /boot/unmenu/uu
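    To confirm that the "ethtool -s eth0 wol g" line actually took effect, the current setting can be read back from "ethtool eth0", whose "Wake-on:" line lists the active flags ("g" = magic packet, "d" = disabled). A small sketch with hypothetical function names:

```bash
#!/bin/bash
# wol_mode: reads "ethtool IFACE" output on stdin and prints the current
# "Wake-on:" setting (the "Supports Wake-on:" line is skipped).
wol_mode() {
    awk -F': ' '/^[ \t]*Wake-on:/ {print $2}'
}

# check_wol IFACE: succeeds if IFACE will wake on a magic packet ("g").
check_wol() {
    local mode
    mode=$(/usr/sbin/ethtool "$1" 2>/dev/null | wol_mode)
    case "$mode" in
        *g*) echo "$1: Wake-on: $mode (magic packet enabled)" ;;
        *)   echo "$1: Wake-on: ${mode:-unknown} (magic packet NOT enabled)" >&2
             return 1 ;;
    esac
}

# Hypothetical usage: check_wol eth0
```

    Running the check right before the S3 script suspends, rather than only once at boot, would also catch a setting that was changed in the meantime.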

     

    I should mention that I was constantly moving data to the box while clearing the disk; maybe the problems with the disk blocked the copy process?

     

    Do I need to upgrade the RAM to 4 GB?
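    On the top figures above, most of the "used" memory is page cache, which the kernel can largely reclaim when applications need it. Subtracting buffers and cache, as older versions of free(1) did in their "-/+ buffers/cache" line, gives 1617648 - 39868 - 1481180 = 96600 kB, about 94 MiB actually in use. A small sketch of that arithmetic (function names hypothetical); on its own it doesn't explain the allocation failures in the syslog.

```bash
#!/bin/bash
# app_used_kb USED BUFFERS CACHED: "used" memory minus buffers and page cache
# (all in kB), i.e. the "-/+ buffers/cache" figure of older free(1).
app_used_kb() {
    echo $(( $1 - $2 - $3 ))
}

# app_used_now_kb: the same figure computed live from /proc/meminfo (kB).
app_used_now_kb() {
    awk '/^MemTotal:/ {t = $2}
         /^MemFree:/  {f = $2}
         /^Buffers:/  {b = $2}
         /^Cached:/   {c = $2}
         END          {print t - f - b - c}' /proc/meminfo
}

# Using the top output above: app_used_kb 1617648 39868 1481180
```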

  17. Hi, I have successfully precleared a disk, but got SMART differences as below. Is this something I have to worry about, or can I use this disk? I noticed some interface errors in the log at the very beginning, but no errors in the script.

    Thanks, Guzzi

     

    ============================================================================

    ==

    == Disk /dev/sdq has been successfully precleared

    ==

    ============================================================================

    S.M.A.R.T. error count differences detected after pre-clear

    note, some 'raw' values may change, but not be an indication of a problem

    62,63c62,63

    < 192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       31

    < 193 Load_Cycle_Count        0x0032   192   192   000    Old_age   Always       -       25344

    ---

    > 192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       32

    > 193 Load_Cycle_Count        0x0032   192   192   000    Old_age   Always       -       25345

    ============================================================================
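The two changed lines can be reduced to per-attribute deltas, which makes it easier to see that only two raw counters moved, each by one. A minimal sketch, assuming two saved copies of `smartctl -A` output (taken before and after the preclear) in the usual layout where the ID is the first column and the raw value the tenth; `smart_deltas` and the file names are hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: compare two saved `smartctl -A` reports and print
# every attribute whose raw value changed, with old value, new value and delta.
# Assumes attribute rows start with a numeric ID and the raw value is field 10.
smart_deltas() {
    local before=$1 after=$2
    awk '
        # First file: remember name and raw value by attribute ID.
        NR == FNR { if ($1 ~ /^[0-9]+$/) { name[$1] = $2; old[$1] = $10 }; next }
        # Second file: report attributes whose raw value differs.
        $1 ~ /^[0-9]+$/ && ($1 in old) && old[$1] != $10 {
            printf "%s %s %s -> %s (%+d)\n", $1, $2, old[$1], $10, $10 - old[$1]
        }
    ' "$before" "$after"
}
```

For example, after `smartctl -A /dev/sdq > before.txt` and later `smartctl -A /dev/sdq > after.txt`, running `smart_deltas before.txt after.txt` on the values above would print `192 Power-Off_Retract_Count 31 -> 32 (+1)` and `193 Load_Cycle_Count 25344 -> 25345 (+1)`.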

     

    This is a new one to me... According to a "google" search on "Power-Off_Retract_Count", I got the following

    [pre]

    # Power-Off_Retract_Count = No of times drive was powered off in an emergency, called Emergency Unload.

    # Load_Cycle_Count = This number is highly affected by your power management policies. For example, a too aggressive power management might put the hard disk to sleep too often. This number is indicative of when your hard disk parks, unparks, spins up, spins down.

    [/pre]

    So, reading between the lines... unless you powered down the disk while it was being cleared, it *thought* it had lost power, or it really did lose power. 

    It retracted the disk heads in an emergency-unload, thinking it had lost power, then loaded them again once it thought power had been restored.

     

    I'd check the system log for any other errors while the drive was being cleared.   I'd also check any power connectors or "Y" splitters.  They can be intermittent.
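One way to pull the relevant lines for a single drive out of the syslog. A minimal sketch; the helper name `disk_errors` is hypothetical, the drive's `ataN` port has to be looked up separately (for example from the boot messages), and the keyword list is only a starting point:

```shell
#!/bin/bash
# Hypothetical helper: print syslog lines that mention a given disk
# (e.g. sdq) or libata port (e.g. ata10) and look like errors or resets.
# The "[.:]" after the port keeps ata1 from also matching ata10.
disk_errors() {
    local dev=$1 port=$2 log=${3:-/var/log/syslog}
    grep -E "\[$dev\]| $port[.:]" "$log" |
        grep -Ei 'error|exception|failed|reset|BadCRC'
}
```

`disk_errors sdq ata5` would print the error-looking lines for sdq, where `ata5` stands in for whatever port that drive is actually on.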

     

    Joe L.

     

    Hi Joe,

     

    Checking the power connectors is no problem; I can do that.

    I checked the syslog several times during the preclear, and except for the very first minutes (some drive not ready) there was nothing special.

    But it seems that a lot happened during the post-read, which I do not understand. Could you have a look at the log? It covers the complete preclear process from beginning to end.

     

    Thanks, Guzzi

     

  18. Hi, I have successfully precleared a disk, but got the SMART differences below. Is this something I have to worry about, or can I use this disk? I noticed some interface errors in the log at the very beginning, but the script reported no errors.

    Thanks, Guzzi

     

    ============================================================================

    ==

    == Disk /dev/sdq has been successfully precleared

    ==

    ============================================================================

    S.M.A.R.T. error count differences detected after pre-clear

    note, some 'raw' values may change, but not be an indication of a problem

    62,63c62,63

    < 192 Power-Off_Retract_Count 0x0032  200  200  000    Old_age  Always      -      31

    < 193 Load_Cycle_Count        0x0032  192  192  000    Old_age  Always      -      25344

    ---

    > 192 Power-Off_Retract_Count 0x0032  200  200  000    Old_age  Always      -      32

    > 193 Load_Cycle_Count        0x0032  192  192  000    Old_age  Always      -      25345

    ============================================================================

     

  19. I did 2 "preclears" on WD 1 TB drives, on two different servers. Both hung at 88% after approx. 25 hours (that's what makes it difficult to just "retest" ;-))

    I understand; it is also what makes it difficult for me to test... Combine that with the fact that the only WD 1TB drive I own is already part of my array (and nearly full), and that I have no desire to clear it, and you can see why testing is taking as long as it is.

     

    Can you do me a favor and let me know the "geometry" of the drive that fails to clear?

     

    You can do that by typing:

    fdisk -l /dev/sdX

     

    where sdX = the actual drive in your array.  (replace the X with the correct drive letter)

     

    Joe L.

     


     

    Sure - here you go:

     

    Disk /dev/sdc: 1000.2 GB, 1000204886016 bytes

    1 heads, 63 sectors/track, 31008336 cylinders

    Units = cylinders of 63 * 512 = 32256 bytes

    Disk identifier: 0x00000000

     

       Device Boot      Start         End      Blocks   Id  System

    /dev/sdc1               2    31008336   976762552+  83  Linux

    Partition 1 does not end on cylinder boundary.
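The geometry above is internally consistent, which can be checked with integer arithmetic: 1000204886016 bytes / 512 = 1953525168 sectors; with 1 head and 63 sectors/track a cylinder is 63 * 512 = 32256 bytes, giving 31008336 cylinders; and a partition running from sector 63 to the last sector holds (1953525168 - 63) / 2 = 976762552.5 1K blocks, which fdisk prints as `976762552+`. A small sketch of that check; the function name `geometry` is hypothetical, and the start sector of 63 is an assumption inferred from the partition starting at cylinder 2:

```shell
#!/bin/bash
# Hypothetical helper: derive the figures fdisk prints for a disk of a given
# size, assuming 512-byte sectors, 1 head, 63 sectors/track, and one partition
# from the start sector (default 63) to the end of the disk.
geometry() {
    local bytes=$1 start=${2:-63}
    local sectors=$(( bytes / 512 ))
    local cyl_bytes=$(( 63 * 512 ))
    local part_sectors=$(( sectors - start ))
    local blocks=$(( part_sectors / 2 ))   # fdisk counts 1K blocks
    local plus=""
    if (( part_sectors % 2 )); then
        plus="+"                           # odd sector count: half a block left over
    fi
    echo "sectors=$sectors cylinders=$(( bytes / cyl_bytes )) blocks=$blocks$plus"
}
```

`geometry 1000204886016` prints `sectors=1953525168 cylinders=31008336 blocks=976762552+`, matching the fdisk listing above.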

     

    The "funny" thing is that the pre-read always runs to 100%, so maybe you could check your code for differences in how the pre-read and post-read are handled?

     

    cheers, Guzzi

  20. I did 2 "preclears" on WD 1 TB drives, on two different servers. Both hung at 88% after approx. 25 hours (that's what makes it difficult to just "retest" ;-))

    One board has a 780G chipset, the other a 690. I'm not sure whether the drives were connected to the onboard SATA ports (there is some workaround in the kernel for those chipsets, isn't there?) or to the sil3114.

    Maybe this info helps?

    cheers, Guzzi

    PS: I ran it in a telnet session... and yes, it stopped updating the screen. I'm using the latest unRAID beta.

  21. The BadCRC error flag is usually associated with a poor cable, not the drive.  Try replacing/upgrading the cable to sdl on ata10.00.  The Devices tab or your syslog should help you determine which drive that is.

     

    Thanks for the hint. Argh, I hate those cables. I replaced all the SATA cables some time ago because of problems; maybe I reused some of the old ones, since this is my 2nd unRAID server...

    Thanks anyway, I will have a look at this.

  22. I have a question: I started preclear_disk on a drive I wanted to add to my array.

    Came back tonight expecting it to be finished, but it seems stuck.

    The telnet screen shows:

    ===========================================================================

    =                unRAID server Pre-Clear disk /dev/sdc

    =                       cycle 1 of 1

    = Disk Pre-Clear-Read completed                                 DONE

    = Step 1 of 10 - Copying zeros to first 2048k bytes             DONE

    = Step 2 of 10 - Copying zeros to remainder of disk to clear it DONE

    = Step 3 of 10 - Disk is now cleared from MBR onward.           DONE

    = Step 4 of 10 - Clearing MBR bytes for partition 2,3 & 4       DONE

    = Step 5 of 10 - Clearing MBR code area                         DONE

    = Step 6 of 10 - Setting MBR signature bytes                    DONE

    = Step 7 of 10 - Setting partition 1 to precleared state        DONE

    = Step 8 of 10 - Notifying kernel we changed the partitioning   DONE

    = Step 9 of 10 - Creating the /dev/disk/by* entries             DONE

    = Step 10 of 10 - Testing if the clear has been successful.     DONE

    = Post-Read in progress: 88% complete.

    (  888,330,240,000  of  1,000,204,886,016  bytes read )

    Elapsed Time:  25:16:35
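The byte count where the post-read stops can be converted into a sector number and a percentage, which makes it easier to compare drives or to re-read just that region by hand. With 512-byte sectors, 888330240000 / 512 = 1735020000, and 888330240000 * 100 / 1000204886016 is about 88.8, which matches the 88% on the screen. A minimal sketch, assuming 512-byte sectors; `read_position` is a hypothetical helper name:

```shell
#!/bin/bash
# Hypothetical helper: convert a byte offset on a disk into the 512-byte
# sector (LBA) it falls in and an integer percentage of the disk size.
read_position() {
    local offset=$1 disk_bytes=$2
    echo "lba=$(( offset / 512 )) percent=$(( offset * 100 / disk_bytes ))"
}

# Usage, with the figures from the screen above:
#   read_position 888330240000 1000204886016
# The region just past that point could then be re-read on its own, e.g.
#   dd if=/dev/sdX of=/dev/null bs=512 skip=1735020000 count=2048
# (sdX is a placeholder for the drive being tested.)
```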

     

    ps shows:

    root@XMS-GMI-01:~# ps -ef | grep preclear

    root     20752 27552 11 14:19 pts/0    00:44:03 /bin/bash ./preclear_disk.sh /dev/sdc

    root     21116 21101  0 20:40 pts/1    00:00:00 grep preclear

    root     27552 27244  0 Jul13 pts/0    00:01:06 /bin/bash ./preclear_disk.sh /dev/sdc

    root@XMS-GMI-01:~#

     

    Is there anything I can do other than restarting the whole thing from the beginning?

    The unRAID server is alive; I can read from and write to it.

    Thanks, Guzzi

     

     

    This seems to happen once in a while.  Most of the time if you start another pass on the drive it will finish as it should.

     

    Hmm, well OK, I cancelled the process and started it on another drive of the same size (1 TB WD Green), and exactly the same thing happens: it hangs at 88% of the post-read, at the same position (888,330,240,000 of .... bytes read).

    Is this a problem with the WD drives? The pre-read is OK and all preclear steps, including writing zeros, are OK; only the last pass (the "post-read") always hangs at the same position. Any ideas?

    The only thing I saw in the log was some errors at the very beginning, while preclear didn't give me any messages or errors.

    Besides the preclear hanging: should I be worried about the log entries, even though preclear didn't report any errors?

     

    Log:

    Jul 15 03:44:27 XMS-GMI-01 kernel: ata10.00: exception Emask 0x0 SAct 0x0 SErr 0x280000 action 0x0

    Jul 15 03:44:27 XMS-GMI-01 kernel: ata10.00: BMDMA2 stat 0x6d0009

    Jul 15 03:44:27 XMS-GMI-01 kernel: ata10: SError: { 10B8B BadCRC }

    Jul 15 03:44:27 XMS-GMI-01 kernel: ata10.00: cmd 25/00:00:4f:3b:4f/00:04:4c:00:00/e0 tag 0 dma 524288 in

    Jul 15 03:44:27 XMS-GMI-01 kernel: res 51/04:3f:10:3e:4f/00:01:4c:00:00/f0 Emask 0x1 (device error)

    Jul 15 03:44:27 XMS-GMI-01 kernel: ata10.00: status: { DRDY ERR }

    Jul 15 03:44:27 XMS-GMI-01 kernel: ata10.00: error: { ABRT }

    Jul 15 03:44:27 XMS-GMI-01 kernel: ata10.00: configured for UDMA/100

    Jul 15 03:44:27 XMS-GMI-01 kernel: ata10: EH complete

    Jul 15 03:44:27 XMS-GMI-01 kernel: sd 10:0:0:0: [sdl] 1953525168 512-byte hardware sectors: (1.00 TB/931 GiB)

    Jul 15 03:44:27 XMS-GMI-01 kernel: sd 10:0:0:0: [sdl] Write Protect is off

    Jul 15 03:44:27 XMS-GMI-01 kernel: sd 10:0:0:0: [sdl] Mode Sense: 00 3a 00 00

    Jul 15 03:44:27 XMS-GMI-01 kernel: sd 10:0:0:0: [sdl] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

    [...]

    Jul 15 04:24:55 XMS-GMI-01 kernel: ata10.00: exception Emask 0x0 SAct 0x0 SErr 0x280000 action 0x0

    Jul 15 04:24:55 XMS-GMI-01 kernel: ata10.00: BMDMA2 stat 0x6d0009

    Jul 15 04:24:55 XMS-GMI-01 kernel: ata10: SError: { 10B8B BadCRC }

    Jul 15 04:24:55 XMS-GMI-01 kernel: ata10.00: cmd 25/00:00:cf:86:33/00:04:1d:00:00/e0 tag 0 dma 524288 in

    Jul 15 04:24:55 XMS-GMI-01 kernel: res 51/04:2f:a0:87:33/00:03:1d:00:00/f0 Emask 0x1 (device error)

    Jul 15 04:24:55 XMS-GMI-01 kernel: ata10.00: status: { DRDY ERR }

    Jul 15 04:24:55 XMS-GMI-01 kernel: ata10.00: error: { ABRT }

    Jul 15 04:24:55 XMS-GMI-01 kernel: ata10.00: configured for UDMA/100

    Jul 15 04:24:55 XMS-GMI-01 kernel: ata10: EH complete

    Jul 15 04:24:55 XMS-GMI-01 kernel: sd 10:0:0:0: [sdl] 1953525168 512-byte hardware sectors: (1.00 TB/931 GiB)

    Jul 15 04:24:55 XMS-GMI-01 kernel: sd 10:0:0:0: [sdl] Write Protect is off

    Jul 15 04:24:55 XMS-GMI-01 kernel: sd 10:0:0:0: [sdl] Mode Sense: 00 3a 00 00

    Jul 15 04:24:55 XMS-GMI-01 kernel: sd 10:0:0:0: [sdl] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

    Jul 15 05:21:20 XMS-GMI-01 emhttp: shcmd (103): /usr/sbin/hdparm -y /dev/sdm >/dev/null
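Since a BadCRC flag usually points at the cable rather than the drive, it helps to know which port the errors come from and how often they occur. A minimal sketch that counts `SError` lines containing `BadCRC` per libata port in a syslog; the helper name `badcrc_by_port` is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: count kernel "SError: { ... BadCRC ... }" lines per
# libata port (ata1, ata10, ...) in a syslog file.
badcrc_by_port() {
    local log=${1:-/var/log/syslog}
    grep 'SError:.*BadCRC' "$log" |
        grep -o 'ata[0-9]*: SError' |   # keep just "ataN: SError"
        cut -d: -f1 |                   # keep just "ataN"
        sort | uniq -c
}
```

Run against the excerpt above, `badcrc_by_port` would count two BadCRC events, both on ata10.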

  23. I have a question: I started preclear_disk on a drive I wanted to add to my array.

    Came back tonight expecting it to be finished, but it seems stuck.

    The telnet screen shows:

    ===========================================================================

    =                unRAID server Pre-Clear disk /dev/sdc

    =                      cycle 1 of 1

    = Disk Pre-Clear-Read completed                                DONE

    = Step 1 of 10 - Copying zeros to first 2048k bytes            DONE

    = Step 2 of 10 - Copying zeros to remainder of disk to clear it DONE

    = Step 3 of 10 - Disk is now cleared from MBR onward.          DONE

    = Step 4 of 10 - Clearing MBR bytes for partition 2,3 & 4      DONE

    = Step 5 of 10 - Clearing MBR code area                        DONE

    = Step 6 of 10 - Setting MBR signature bytes                    DONE

    = Step 7 of 10 - Setting partition 1 to precleared state        DONE

    = Step 8 of 10 - Notifying kernel we changed the partitioning  DONE

    = Step 9 of 10 - Creating the /dev/disk/by* entries            DONE

    = Step 10 of 10 - Testing if the clear has been successful.    DONE

    = Post-Read in progress: 88% complete.

    (  888,330,240,000  of  1,000,204,886,016  bytes read )

    Elapsed Time:  25:16:35

     

    ps shows:

    root@XMS-GMI-01:~# ps -ef | grep preclear

    root    20752 27552 11 14:19 pts/0    00:44:03 /bin/bash ./preclear_disk.sh /dev/sdc

    root    21116 21101  0 20:40 pts/1    00:00:00 grep preclear

    root    27552 27244  0 Jul13 pts/0    00:01:06 /bin/bash ./preclear_disk.sh /dev/sdc

    root@XMS-GMI-01:~#

     

    Is there anything I can do other than restarting the whole thing from the beginning?

    The unRAID server is alive; I can read from and write to it.

    Thanks, Guzzi

     
