Preclear.sh results - Questions about your results? Post them here.


Okay, thanks.  Since I don't really know that much about the details of the workings of hard drives, nor preclear, I'm still a bit confused why they are still pending.  If preclear has identified them as needing to be re-allocated, why weren't they re-allocated?

 

Also, the post you linked me to says that it might be a bad PSU.

 

Those 2 drives bring me up to 10 drives total in this system, which is using a 550W power supply (XFX - Core Edition PRO550W - 80 PLUS BRONZE Certified Active PFC)

 

I'm not positive if that is enough power for 10 drives (nor am I sure how to calculate that).

Link to comment

It is not "preclear" that identified the sectors; the drive's SMART firmware did, during the post-read phase.

 

They are now pending re-allocation when next written.  If they had been identified in the pre-read phase, they would have been re-allocated when written with zeros in the "write" phase.

 

Sectors pending re-allocation after a preclear are not a great sign.  It indicates the drive should be cleared once more, and if the sectors are still not re-allocated, an RMA is likely in its future.

 

There is a possibility that the power supply cannot keep up with the drive's demands during the "writing" phase, in which case a replacement drive could well behave the same way.

 

Your power supply is a single rail supply rated at 44 amps.  It should be plenty powerful.  However, if you have lots of splitters in between it and the drives, you might have poor voltages at the drives.

 

Joe L.
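
On the question of how to calculate it: the figure that matters most is the peak current on the 12 V rail when all the drives spin up together. The sketch below multiplies the drive count by an assumed per-drive spin-up current. The 2 A (2000 mA) figure is an assumption for illustration only, not a number from this thread; the drive datasheets list the real value. It also ignores the motherboard, CPU and fans, which draw from the same rail. The function name is made up for this example.

```shell
#!/bin/sh
# Rough 12 V budget check. The per-drive current is an ASSUMPTION for
# illustration, not a measured value; check the drive datasheets.
rail_amps=44      # 12 V rail rating quoted above
drives=10         # drives in the system
spinup_ma=2000    # ASSUMED peak spin-up draw per drive, in mA

# usage: drive_budget_ma <drive count> <mA per drive>
# Prints the combined spin-up draw in mA.
drive_budget_ma() {
    echo $(( $1 * $2 ))
}

total_ma=$(drive_budget_ma "$drives" "$spinup_ma")
echo "Estimated worst-case drive load: $(( total_ma / 1000 )) A of $rail_amps A"
```

Under that assumption, ten drives come to about 20 A of the 44 A rail, which leaves room for the rest of the system. It says nothing about voltage drop across adaptors or splitters.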

Link to comment

Can anyone shed some light on what's happening here?

 

background:

Unraid 5-rc; i have simplefeatures plugin installed; Array is not started.

I am using the latest version of the script.

these are those damn western digital EARX drives. (I got the retail version from Bestbuy).

I did not use the wdidle3 utility on these;

they are connected to a Br10i and are passed through to the UnRaid VM

I started a pre_clear of 4 drives at once; two are what you see here, and the other two are Samsung Spinpoint 1TB drives (which seem to be progressing as expected).

On my prod UnRaid, I've pre_cleared 3 drives at a time (WD EARS, with idle set to max); they were attached directly to the mobo and given to UnRaid via the RDM method.

 

 

Also have a parity check going on my production UnRaid VM, which is also moving as expected.

 

Am I experiencing a dual failure here? Should I wait till the other two are finished, and try to run these, alone?

 

 

 

sdc.PNG

sdb.PNG

Link to comment

You are probably experiencing resource contention of some kind.    They are probably each waiting on some resource the other has. 

 

Since you did not attach a syslog, I can assume you've already looked there for clues and found nothing.

 

Joe L.

Link to comment

I don't have 'lots of splitters', but I do have a couple of drives connected through an old-style power connector adaptor to a new-style SATA power connector.  I honestly can't remember if this drive is connected with such an adaptor, and will have to take the server apart to find out for sure.  The server has 8 120mm fans connected to the power supply (5 to one power connector, and 3 to another), if that matters.  The server runs very cool, so I can disconnect at least 3 of the fans without issue, I'm sure.

 

I stopped SABnzbd while I ran the preclear on only one drive, so preclear should have been the only thing running on this server all night.  Below are the results, which show sectors still pending re-allocation.  This drive is a few years old, and had served as my cache drive for the last couple of years.  It is out of warranty, so there is no RMA available.

 

So, do I throw the drive away even though it hasn't actually failed, or preclear again, or put it into the array, be aware that it's likely to fail in the near future, and get a replacement ordered and precleared to be ready the day it does fail on me?

 

== invoked as: ./preclear_disk.sh /dev/sdk
==  SAMSUNG HD103UJ    S13PJDWS337885
== Disk /dev/sdk has been successfully precleared
== with a starting sector of 64 
== Ran 1 cycle
==
== Using :Read block size = 8225280 Bytes
== Last Cycle's Pre Read Time  : 3:13:07 (86 MB/s)
== Last Cycle's Zeroing time   : 2:58:54 (93 MB/s)
== Last Cycle's Post Read Time : 7:15:07 (38 MB/s)
== Last Cycle's Total Time     : 13:28:07
==
== Total Elapsed Time 13:28:07
==
== Disk Start Temperature: 26C
==
== Current Disk Temperature: 25C, 
==
============================================================================
** Changed attributes in files: /tmp/smart_start_sdk  /tmp/smart_finish_sdk
                ATTRIBUTE   NEW_VAL OLD_VAL FAILURE_THRESHOLD STATUS      RAW_VALUE
  Airflow_Temperature_Cel =    75      74            0        ok          25
      Temperature_Celsius =    76      74            0        ok          24
No SMART attributes are FAILING_NOW

10 sectors were pending re-allocation before the start of the preclear.
11 sectors were pending re-allocation after pre-read in cycle 1 of 1.
0 sectors were pending re-allocation after zero of disk in cycle 1 of 1.
9 sectors are pending re-allocation at the end of the preclear,
    a change of -1 in the number of sectors pending re-allocation.
0 sectors had been re-allocated before the start of the preclear.
0 sectors are re-allocated at the end of the preclear,
    the number of sectors re-allocated did not change. 

Link to comment

I'd run a non-destructive read/write badblocks cycle on it.  This will take many many many hours...  It will read and then re-write every sector.

badblocks -c 1024 -b 65536  -o /boot/badblocks_out.txt -svn /dev/sdk

If anything will get it to settle down, it will.

 

If you are absolutely certain of the device name of the disk

AND IT IS NOT ASSIGNED TO YOUR ARRAY AND DOES NOT HOLD ANY DATA YOU WISH TO KEEP

you can run the 4 pass badblocks write test on the disk. 

It will erase everything on the disk, including the preclear signature.

This is an even longer test...  (probably 80 hours or more on a 2TB drive)  You will need to leave the telnet session open for this duration (or you can run this under "screen", or on the system console).

badblocks -c 1024 -b 65536  -o /boot/badblocks_out.txt -svw /dev/sdk

Be absolutely certain you have the correct device name.  It has no "are you sure" to prevent you from erasing the wrong drive.
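
One way to double-check the device name before a destructive run is to compare the serial number the drive reports with the one printed on its label or in an earlier report. The sketch below reads `smartctl -i` output on standard input and compares the "Serial Number:" line with an expected value. The function names are made up for this example, and the guard is only an illustration, not part of badblocks or preclear.

```shell
#!/bin/sh
# Print the serial number found in `smartctl -i` output read from stdin.
serial_from_smartctl() {
    awk -F: '/^Serial Number:/ { gsub(/^[ \t]+|[ \t]+$/, "", $2); print $2 }'
}

# Succeed only if the serial reported on stdin equals the expected one ($1).
serial_matches() {
    [ "$(serial_from_smartctl)" = "$1" ]
}

# Example guard, using the device and serial from the preclear report above.
# badblocks only runs if the serial matches:
# smartctl -i /dev/sdk | serial_matches S13PJDWS337885 \
#     && badblocks -c 1024 -b 65536 -o /boot/badblocks_out.txt -svw /dev/sdk
```

A matching serial only confirms which physical disk /dev/sdk is. It does not tell you whether that disk is assigned to the array, so that still has to be checked by hand.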

 

Joe L.

Link to comment

thanks Joe.  Which would you run?  you gave 2 commands, so I'm not sure which is 'most likely' to give me a usable drive, or determine that it's definitely not worth putting to use?  Since it sounds like it's gonna take a couple of days, plus another preclear cycle, I'd really only want to do it once, so which one do you recommend?

 

badblocks -c 1024 -b 65536  -o /boot/badblocks_out.txt -svn /dev/sdk

 

or

 

badblocks -c 1024 -b 65536  -o /boot/badblocks_out.txt -svw /dev/sdk

Link to comment

The "svn" (non destructive) test will not destroy the preclear signature and will take less time (I think)

 

The "svw" (write four values, ending with all zeros) will take longer, but is a more thorough test.  It will need a subsequent preclear if you intend to add it as an additional disk in the array.  If you intend to use it as a replacement of a failed/failing drive, the preclear signature is not necessary.  If you have the time, this is the one I would perform.

 

Depending on your needs, if you do wish to have a preclear signature you can skip the pre-read  phase to save a bit of time since the badblocks would have just read all the sectors.

To run only the writing of zeros and post-read-verify phases (skipping the pre-read):

    preclear_disk.sh -W -A  /dev/sdk

 

Link to comment

Thanks again Joe!  i really appreciate all your help, and the fact that you are usually so very quick to respond to help requests, it means a lot!

 

Since it will take longer to order a new drive than to run any tests, I'm going to run this one now...

 

badblocks -c 1024 -b 65536  -o /boot/badblocks_out.txt -svw /dev/sdk

 

then I'll run this when it finishes (unless the drive explodes because of the first one :))

 

preclear_disk.sh -W -A  /dev/sdk

Link to comment

Thanks - no i didn't look - actually didn't even think to attach the log, thinking that since the array isn't on, what could the log show.. sorry that was nooby of me.

 

anyhoo - after a little while, it took off again, and is in the 90MB/s range - i guess knowing it was an EARX, i panicked too quickly. chugging along now. the parity check also finished on my other VM. Glad that the machine didn't croak with all these disks chugging at the same time. *whew*.

Link to comment

OK, are you saying you had a parity check running in another VM on the SAME physical machine?  That would be major resource contention!  UnRAID, especially during a parity check/build, is I/O bound, meaning it will be making maximum use of the available I/O bandwidth, and everything else has to wait its turn.  A VM is great for sharing unused resources, so multiple VMs can use idle CPU time, and can use unused RAM, but NOT any unused I/O because there isn't any!  Running 2 VMs will not double your available I/O capabilities!  So of course it sped up close to normal once the parity check finished.

Link to comment

yea - it was going at the same time. as for the I/O bandwidth, i wasn't expecting it to  be an issue, since the test array was running off a BR10i card and the other is running off the mobo headers. either way - lesson learned. like i said i figured the wd drive was defective until proved otherwise.

Link to comment

okay - so the issue is back - and i am trying to get a log. What's the best way to do that? Simple features seems to freeze on opening it. I ran this command:

cp /var/log/syslog /boot/syslog-2008-04-10.txt

 

on the console, via putty and ended up with the attached.. no carriage returns, etc.

 

I have four pre-clears running at the same time; only one of them is running at full speed (about 130MB/s); the others are all sub 5MB/s.

 

is there a better way to get a syslog? I had to zip it, it was too large

syslog-2013-02-17.zip
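
A file that shows up as one long line in Windows Notepad usually has Unix line endings (a linefeed only, no carriage return). A minimal sketch for converting a copy while writing it to the flash drive is below. The function name and the destination file name are made up for this example.

```shell
#!/bin/sh
# Convert Unix (LF) line endings on stdin to DOS/Windows (CRLF) on stdout.
lf_to_crlf() {
    awk '{ printf "%s\r\n", $0 }'
}

# Example: copy the syslog to the flash drive in a Notepad-friendly form.
# The destination file name is only an illustration.
# lf_to_crlf < /var/log/syslog > /boot/syslog.txt
```

Run it only on files that still have Unix line endings. A file that already has carriage returns would end up with doubled ones.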

Link to comment

Feb 17 10:26:25 unraid5 kernel: read_file: error 2 opening /boot/config/super.dat

Feb 17 10:26:25 unraid5 kernel: md: could not read superblock from /boot/config/super.dat

 

Your flash drive is not readable (and probably not writable)

 

Run scandisk/checkdisk on it on your Windows PC to fix it.

 

Joe L.

Link to comment

26 hours later, it's done.  It says "Pass completed.  9 bad blocks found"

 

it doesn't say if it 'fixed' the bad blocks, nor how I can fix them myself.  The log it created is useless to me, it says...

 

14973952

14974005

14968832

14969495

14973568

14974006

10838016

10838939

14970760

 

which didn't have line breaks in the actual log, but now I suppose that is the list of the 9 bad blocks.  is there some way to mark them as bad, and continue with using the rest of the disk?

 

I don't want to continue with the preclear until i know if there's anything I can do about these bad blocks, or if it's necessary.
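
For reference, the numbers in that file are badblocks block numbers in units of the -b value (65536 bytes here), not disk sector numbers. A rough sketch of the conversion to 512-byte LBAs is below. It assumes the drive uses 512-byte logical sectors, and the function name is made up for this example.

```shell
#!/bin/sh
# Map a badblocks block number to the range of 512-byte LBAs it covers.
# usage: block_to_lba_range <block number> [block size in bytes]
# The block size must match the -b value given to badblocks (65536 above).
block_to_lba_range() {
    block=$1
    block_size=${2:-65536}
    per_block=$(( block_size / 512 ))      # sectors per badblocks block
    first=$(( block * per_block ))
    echo "$first $(( first + per_block - 1 ))"
}

# Example: translate every block listed in the badblocks output file.
# while read -r b; do block_to_lba_range "$b"; done < /boot/badblocks_out.txt
```

For example, block 14973952 covers LBAs 1916665856 through 1916665983, so each reported block stands for 128 sectors, of which possibly only one is actually bad.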

Link to comment

i thought that error was related to not having the array started.

 

If i start the shutdown from the webui, will it cancel the pre-clears? or should i wait for them to finish?

Link to comment

26 hours later, it's done.  It says "Pass completed.  9 bad blocks found"

Good, you originally had 9 blocks marked for re-allocation in your prior SMART report.  Now, get a new SMART report and see what the current statistics show.

it doesn't say if it 'fixed' the bad blocks, nor how I can fix them myself.

That is what will be shown on the SMART report.

The log it created is useless to me, it says...

 

14973952

14974005

14968832

14969495

14973568

14974006

10838016

10838939

14970760

 

which didn't have line breaks in the actual log, but now I suppose that is the list of the 9 bad blocks.

It does have linefeeds, but not carriage returns.  MS-DOS uses both.  You can read the file easily if you use an editor that recognizes UNIX/Linux files. (Many seem to like notepad2.)

is there some way to mark them as bad, and continue with using the rest of the disk?

The SMART firmware on the disk should have already done just that.

I don't want to continue with the preclear until i know if there's anything I can do about these bad blocks, or if it's necessary.

Get a new smart report.  It will take just a few seconds and let you know what has happened on the disk.  With any luck you'll see 9 sectors re-allocated, and none pending re-allocation.

 

smartctl -a /dev/sdk

 

Joe L.

Link to comment

Unless you've never started the array, the error is unexpected.  The file is created the first time you start the array.
Link to comment

I don't see any mention of re-allocating sectors, so I'm going to preclear again while I'm at work today.

 

thanks again for all your help with this...

 

smartctl 5.40 2010-10-16 r3189 [i486-slackware-linux-gnu] (local build)

Copyright © 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

 

=== START OF INFORMATION SECTION ===

Model Family:    SAMSUNG SpinPoint F1 DT series

Device Model:    SAMSUNG HD103UJ

Serial Number:    S13PJDWS337885

Firmware Version: 1AA01113

User Capacity:    1,000,204,886,016 bytes

Device is:        In smartctl database [for details use: -P show]

ATA Version is:  8

ATA Standard is:  ATA-8-ACS revision 3b

Local Time is:    Mon Feb 18 08:06:58 2013 CST

SMART support is: Available - device has SMART capability.

SMART support is: Enabled

 

=== START OF READ SMART DATA SECTION ===

SMART overall-health self-assessment test result: PASSED

 

General SMART Values:

Offline data collection status:  (0x00) Offline data collection activity

                                        was never started.

                                        Auto Offline Data Collection: Disabled.

Self-test execution status:      ( 121) The previous self-test completed having

                                        the read element of the test failed.

Total time to complete Offline

data collection:                (11566) seconds.

Offline data collection

capabilities:                    (0x7b) SMART execute Offline immediate.

                                        Auto Offline data collection on/off support.

                                        Suspend Offline collection upon new

                                        command.

                                        Offline surface scan supported.

                                        Self-test supported.

                                        Conveyance Self-test supported.

                                        Selective Self-test supported.

SMART capabilities:            (0x0003) Saves SMART data before entering

                                        power-saving mode.

                                        Supports SMART auto save timer.

Error logging capability:        (0x01) Error logging supported.

                                        General Purpose Logging supported.

Short self-test routine

recommended polling time:        (  2) minutes.

Extended self-test routine

recommended polling time:        ( 194) minutes.

Conveyance self-test routine

recommended polling time:        (  21) minutes.

SCT capabilities:              (0x003f) SCT Status supported.

                                        SCT Error Recovery Control supported.

                                        SCT Feature Control supported.

                                        SCT Data Table supported.

 

SMART Attributes Data Structure revision number: 16

Vendor Specific SMART Attributes with Thresholds:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   099   081   051    Pre-fail  Always       -       326
  3 Spin_Up_Time            0x0007   083   083   011    Pre-fail  Always       -       5960
  4 Start_Stop_Count        0x0032   098   098   000    Old_age   Always       -       1658
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   100   100   051    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0025   100   100   015    Pre-fail  Offline      -       9598
  9 Power_On_Hours          0x0032   095   095   000    Old_age   Always       -       25623
 10 Spin_Retry_Count        0x0033   100   100   051    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x0012   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       139
 13 Read_Soft_Error_Rate    0x000e   099   081   000    Old_age   Always       -       322
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0033   100   100   000    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       5311
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   075   054   000    Old_age   Always       -       25 (Min/Max 24/28)
194 Temperature_Celsius     0x0022   075   054   000    Old_age   Always       -       25 (Min/Max 23/31)
195 Hardware_ECC_Recovered  0x001a   100   100   000    Old_age   Always       -       105263424
196 Reallocated_Event_Count 0x0032   096   096   000    Old_age   Always       -       150
197 Current_Pending_Sector  0x0012   100   099   000    Old_age   Always       -       6
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       1
199 UDMA_CRC_Error_Count    0x003e   100   100   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x000a   100   100   000    Old_age   Always       -       0
201 Soft_Read_Error_Rate    0x000a   099   098   000    Old_age   Always       -       8

 

SMART Error Log Version: 1

No Errors Logged

 

SMART Self-test log structure revision number 1

Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%     25457         1915825454
# 2  Short offline       Completed: read failure       20%     25457         1915971623
# 3  Extended offline    Completed: read failure       90%     24432         1806590219
# 4  Extended offline    Completed: read failure       90%     24416         1916093552
# 5  Short offline       Completed: read failure       20%     24415         1806590219
# 6  Extended offline    Completed: read failure       90%     24379         1806590219
# 7  Extended offline    Completed: read failure       90%     24345         1806590219
# 8  Short offline       Completed: read failure       20%     24345         1916119474
# 9  Extended offline    Completed: read failure       90%     23791         1858603687
#10  Short offline       Completed: read failure       20%     23790         1858603687
#11  Extended offline    Completed: read failure       90%     20232         1916099552
#12  Short offline       Completed: read failure       20%     20150         1916001625
#13  Short offline       Completed: read failure       20%     18923         1916033547
#14  Extended offline    Completed: read failure       90%     18923         1916055629
#15  Extended offline    Aborted by host               90%     18917         -
#16  Short offline       Completed: read failure       20%     17515         1916088511

 

SMART Selective self-test log data structure revision number 1

SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS

    1        0        0  Not_testing

    2        0        0  Not_testing

    3        0        0  Not_testing

    4        0        0  Not_testing

    5        0        0  Not_testing

Selective self-test flags (0x0):

  After scanning selected spans, do NOT read-scan remainder of disk.

If Selective self-test is pending on power-up, resume after 0 minute delay.

 

Link to comment

We'll see what the disk looks like after the preclear. 

 

It shows 0 sectors re-allocated.  (That is good, as it indicates the disk has been able to write successfully to the original sector.)

It shows 6 sectors pending re-allocation.  (This is bad, as it indicates they were identified in the most recent pass of the badblocks program...)

It shows 150 reallocation events, which again indicates a constant trickle of sectors that are unreadable, but can be written in place.  (the original "writes" to those blocks were marginal)
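
The three numbers above come from attributes 5, 196 and 197 in the report. A minimal sketch for pulling just those out, so they are easy to compare before and after each test, is below. It reads `smartctl -A` output (the attribute table only) on standard input; the function name is made up for this example.

```shell
#!/bin/sh
# Print name=raw_value for the re-allocation attributes (IDs 5, 196, 197)
# from `smartctl -A` output read on stdin. The 0x check on the FLAG column
# skips any non-attribute line that happens to start with one of those numbers.
realloc_summary() {
    awk '($1 == 5 || $1 == 196 || $1 == 197) && $3 ~ /^0x/ { print $2 "=" $10 }'
}

# Example: smartctl -A /dev/sdk | realloc_summary
```

Saving the output after each badblocks or preclear run makes it easy to see whether the pending count is shrinking or whether new sectors keep turning up.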

 

Now, this can all be explained by either a defective disk drive, OR a drive that is sensitive to power supply noise or low voltages.  (as when supplied on either a marginal supply, or connected through a number of high-resistance connectors/splitters, or sharing a power supply rail with a lot of other drives)    In other words, if you can, try a different power connection.

 

Did I ask you yet? What specific make/model power supply are you using?  And what mix of disks are you powering?

 

Joe L.

 

 

Link to comment

Okay.  I'm hoping the new preclear will be finished when I get home from work today, but probably not until late this evening (depending on how much time skipping the pre-read saves me).

 

You did ask about the power supply (model in signature), and responded that it seemed good enough...

 

I will disconnect the extra/unnecessary fans tonight, and review exactly how this drive is connected to power.  If I can, I'll connect it directly to a SATA power connector, but that will just force me to connect another drive to the adaptor instead (assuming this is currently connected this way).

 

If the preclear shows pending re-allocations, and switching the power around still doesn't resolve the situation, does that just mean this drive isn't worth using in unRAID?  If so, would it be reasonable/okay to use in an external case as a long-term backup drive?  Or is it just a paper weight at that point?

 

Maybe I'll upgrade to a new power supply with modular connections so I can just connect/use more SATA connectors, and not use the IDE type with adaptors.

Link to comment

Thanks Joe - this array has never been started, and has nothing on it, data-wise.

 

The two WD green drives are still going, on pass 2 of 3. The Samsungs finished, with what appears to be some red flags. Also, it looks like the slowdown is gone again. It seems to happen during the pre-read step when another disk is being written to. I thought the BR10i could handle that kind of bandwidth, but perhaps it maxes out at about 150MB/s?

 

Can I trouble you to look at these two logs and let me know what I should be freaked out about?

Samsung_sdd.txt

Samsung_sde.txt

Link to comment

Nothing too bad, other than the second drive looks like it has been bounced a bit at some point in its past:

G-Sense_Error_Rate      0x0022  100  100  000    Old_age  Always      -      3

Link to comment
