Re: preclear_disk.sh - a new utility to burn-in and pre-clear disks for quick add


root@Tower:~# hdparm -i /dev/hdc

 

/dev/hdc:

 

Model=ST31500341AS, FwRev=CC1H, SerialNo=9VS2L1L4

Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs RotSpdTol>.5% }

RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=4

BuffType=unknown, BuffSize=0kB, MaxMultSect=16, MultSect=16

CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=18446744072344861488

IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}

PIO modes:  pio0 pio1 pio2 pio3 pio4

DMA modes:  mdma0 mdma1 mdma2

UDMA modes: udma0 udma1 udma2 udma3 udma4 *udma5 udma6

AdvancedPM=no WriteCache=enabled

Drive conforms to: unknown:  ATA/ATAPI-4,5,6,7

 

* signifies the current active mode
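
The huge LBAsects figure in this output looks alarming, but it appears to be a display artifact rather than a drive fault. A quick check in Python, assuming 512-byte sectors and using the 1,500,301,910,016-byte capacity that preclear reports for this same drive, suggests the real sector count was held in a signed 32-bit field and then sign-extended to 64 bits. The helper name `sign_extend_32_to_64` is made up for this sketch.

```python
# Capacity reported by preclear for this drive, and the odd figure from hdparm -i.
capacity_bytes = 1_500_301_910_016
reported_lbasects = 18446744072344861488

# Assumption: 512-byte logical sectors.
true_sectors = capacity_bytes // 512


def sign_extend_32_to_64(n):
    """Treat the low 32 bits of n as a signed value, then view it as unsigned 64-bit."""
    low = n & 0xFFFFFFFF
    if low >= 2**31:
        low -= 2**32
    return low % 2**64


print(true_sectors)                                            # 2930277168
print(sign_extend_32_to_64(true_sectors) == reported_lbasects)  # True
```

The sector count only overflows the signed 32-bit range on drives larger than 1 TiB, which would explain why a 1.5 TB drive shows it.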

 

Is it possible your BIOS has an option like this (in red) for the last two SATA ports?  (take note how it defaults to IDE mode)

10nywja.jpg

 

 

I started preclear to get an idea of speeds and everything looks normal (I think).

 

unRAID server Pre-Clear disk /dev/hdc

=                      cycle 1 of 1

= Disk Pre-Read in progress: 0% complete

= ( 1,645,056,000  bytes of  1,500,301,910,016  read ) 116 MB/s

=

Disk Temperature: 30C, Elapsed Time:  0:00:15

Speed looks good. I would not worry too much about how it presents itself to the Linux OS, but I'll bet it is pretending to be a PATA drive.

 

Joe L.
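
For a feel of what 116 MB/s means in practice, here is a back-of-the-envelope estimate in Python. It assumes 1 MB = 1,000,000 bytes (the script's exact unit is not stated here) and a constant transfer rate, so treat the result as a lower bound: drives usually slow down toward the inner tracks. The function name `hours_for_pass` is made up for this sketch.

```python
def hours_for_pass(total_bytes, mb_per_s, bytes_per_mb=1_000_000):
    """Hours needed to read or write total_bytes once at a constant rate."""
    return total_bytes / (mb_per_s * bytes_per_mb) / 3600


# Figures taken from the pre-read status display above.
print(round(hours_for_pass(1_500_301_910_016, 116), 2))  # 3.59
```

A full cycle makes roughly three passes over the disk (pre-read, zeroing, post-read), which is consistent with the 19 to 21 hours reported later in the thread for 1.5 TB drives once the slower inner tracks are taken into account.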

Link to comment

I would have never even thought of anything like that :)

 

I suspect there is an option in the BIOS to change this.

 

Assuming I could find this option in the BIOS and if I changed this AFTER the preclear completed, would that create any issues?

If you change it, and a disk was already assigned to the array, you might need to re-assign it on the devices page if unRAID was unable to figure it out on its own.  It probably will not care at all, or work any faster... so you might just leave it for now.

 

Joe L.

Link to comment

I've got some form of OCD/perfectionist in me and it's hard to just leave it  :)

 

My pre_clear died on me last night ( Windows can be thanked for that as it did an automatic update and rebooted my computer ... that's fixed and won't happen again! ).

 

So, I got adventurous this morning and went into the BIOS.   I've located the part of the BIOS that relates to this.

 

There are two options (defaults listed after):

 

OnChip SATA Type - Native IDE

OnChip SATA Port4/5 Type - IDE  (this cannot be changed unless the option before is changed)

 

My options for OnChip SATA Type include:

Native IDE

RAID

AHCI

 

Once I pick either RAID or AHCI, I then can edit OnChip SATA Port4/5 Type and my options are:

As SATA Type

IDE

 

I'm googling and I have found similar questions, but the answers are for Windows and most say if I select RAID or AHCI I would need drivers.

 

===============================================================

 

EDIT:  So I came across an article here:

 

Here’s a possible solution if you’re in this kind of "SATA hard drives not found" situation. The trick is related to AHCI (Advanced Host Controller Interface), a hardware mechanism designed to allow software to communicate with Serial ATA (SATA) devices such as host bus adapters, with features not offered by Parallel ATA (PATA) controllers such as higher speeds, hot-plugging and native command queuing (NCQ). Windows XP (and any OS older than Vista or Linux kernel 2.6.19) does not come pre-packaged with a driver to support AHCI/SATA mode, thus creating a very common error.....

 

As the Linux kernel unRAID uses is newer than that, the AHCI driver should be built in.   I updated the BIOS to "AHCI" and "As SATA Type".

 

Now, it comes up with the right name.   Hopefully this change doesn't create any other issues.   Opinions?

 

My drive output:

 

root@Tower:~# ls -l /dev/disk/by-id

total 0

lrwxrwxrwx 1 root root  9 Oct 15 06:10 ata-ST31000528AS_9VP0H3TD -> ../../sdd

lrwxrwxrwx 1 root root 10 Oct 15 06:10 ata-ST31000528AS_9VP0H3TD-part1 -> ../../sdd1

lrwxrwxrwx 1 root root  9 Oct 15 06:10 ata-ST31500341AS_9VS2L1L4 -> ../../sde

lrwxrwxrwx 1 root root  9 Oct 15 06:10 ata-ST31500341AS_9VS2LXAF -> ../../sda

lrwxrwxrwx 1 root root 10 Oct 15 06:10 ata-ST31500341AS_9VS2LXAF-part1 -> ../../sda1

lrwxrwxrwx 1 root root  9 Oct 15 06:10 ata-ST320011A_3HT0GTF1 -> ../../hda

lrwxrwxrwx 1 root root 10 Oct 15 06:10 ata-ST320011A_3HT0GTF1-part1 -> ../../hda1

lrwxrwxrwx 1 root root  9 Oct 15 06:10 ata-ST3750330AS_5QK00GT5 -> ../../sdc

lrwxrwxrwx 1 root root 10 Oct 15 06:10 ata-ST3750330AS_5QK00GT5-part1 -> ../../sdc1

lrwxrwxrwx 1 root root  9 Oct 15 06:10 ata-WDC_WD15EADS-00P8B0_WD-WMAVU0072618 -> ../../sdb

lrwxrwxrwx 1 root root 10 Oct 15 06:10 ata-WDC_WD15EADS-00P8B0_WD-WMAVU0072618-part1 -> ../../sdb1

lrwxrwxrwx 1 root root  9 Oct 15 06:10 scsi-SATA_ST31000528AS_9VP0H3TD -> ../../sdd

lrwxrwxrwx 1 root root 10 Oct 15 06:10 scsi-SATA_ST31000528AS_9VP0H3TD-part1 -> ../../sdd1

lrwxrwxrwx 1 root root  9 Oct 15 06:10 scsi-SATA_ST31500341AS_9VS2L1L4 -> ../../sde

lrwxrwxrwx 1 root root  9 Oct 15 06:10 scsi-SATA_ST31500341AS_9VS2LXAF -> ../../sda

lrwxrwxrwx 1 root root 10 Oct 15 06:10 scsi-SATA_ST31500341AS_9VS2LXAF-part1 -> ../../sda1

lrwxrwxrwx 1 root root  9 Oct 15 06:10 scsi-SATA_ST3750330AS_5QK00GT5 -> ../../sdc

lrwxrwxrwx 1 root root 10 Oct 15 06:10 scsi-SATA_ST3750330AS_5QK00GT5-part1 -> ../../sdc1

lrwxrwxrwx 1 root root  9 Oct 15 06:10 scsi-SATA_WDC_WD15EADS-00_WD-WMAVU0072618 -> ../../sdb

lrwxrwxrwx 1 root root 10 Oct 15 06:10 scsi-SATA_WDC_WD15EADS-00_WD-WMAVU0072618-part1 -> ../../sdb1

lrwxrwxrwx 1 root root  9 Oct 15 06:10 usb-Lexar_JD_FireFly_AA04011000035322-0:0 -> ../../sdf

lrwxrwxrwx 1 root root 10 Oct 15 06:10 usb-Lexar_JD_FireFly_AA04011000035322-0:0-part1 -> ../../sdf1

 

Link to comment

Ran preclear on the first disk (a brand new WD10EADS) in my new system last night.

 

Came back with a few errors. Below is a C&P from the telnet session.

 

System specs are as follows:

Asus P5Q-Deluxe

E6400 underclocked to 1600 MHz

2 GB G.Skill DDR2-800

2x Adaptec 1430SA raid cards

3xNorco SS-500 Hotswap modules.

Seasonic M12-700w

Lian Li PC-A17 case

AHCI enabled in BIOS

Adaptec RAID BIOSes disabled

 

 

===========================================================================

=                unRAID server Pre-Clear disk /dev/sda

=                      cycle 1 of 1

= Disk Pre-Clear-Read completed                                DONE

= Step 1 of 10 - Copying zeros to first 2048k bytes            DONE

= Step 2 of 10 - Copying zeros to remainder of disk to clear it DONE

= Step 3 of 10 - Disk is now cleared from MBR onward.          DONE

= Step 4 of 10 - Clearing MBR bytes for partition 2,3 & 4      DONE

= Step 5 of 10 - Clearing MBR code area                        DONE

= Step 6 of 10 - Setting MBR signature bytes                    DONE

= Step 7 of 10 - Setting partition 1 to precleared state        DONE

= Step 8 of 10 - Notifying kernel we changed the partitioning  DONE

= Step 9 of 10 - Creating the /dev/disk/by* entries            DONE

= Step 10 of 10 - Testing if the clear has been successful.    DONE

= Disk Post-Clear-Read completed                                DONE

Disk Temperature: 27C, Elapsed Time:  15:01:48

============================================================================

==

== Disk /dev/sda has been successfully precleared

==

============================================================================

S.M.A.R.T. error count differences detected after pre-clear

note, some 'raw' values may change, but not be an indication of a problem

54c54
<   1 Raw_Read_Error_Rate     0x002f   100   253   051    Pre-fail  Always       -       0
---
>   1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
63c63
< 193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       6
---
> 193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       7
67c67
< 199 UDMA_CRC_Error_Count    0x0032   200   253   000    Old_age   Always       -       0
---
> 199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0

============================================================================

 

These errors are bad, right?

Link to comment

Where do you see any errors? 

Did you read how to interpret the SMART report results?  See here for some help http://en.wikipedia.org/wiki/S.M.A.R.T

 

There are three SMART parameters that changed from the report taken before the pre-clearing process. 

 

One of those, the "Load Cycle Count", increased from 6 to 7.  This indicates the disk heads were loaded from their parked position onto the disk surface so it could be read and written.  If it had NOT incremented it would have been a problem.

The other two have just had a normalized value move off the factory-initialized 253 (in the WORST column) to 200 now that statistics are being collected.  Neither is anywhere near its "failure threshold" (the THRESH column, the third of the three-digit numbers in each row).

 

Enjoy your new disk...

 

Joe L.
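
To make the reading above concrete, here is a minimal Python sketch that splits a smartctl attribute row into its columns and reports which columns differ between the "before" and "after" rows of the preclear diff. The names `Attr`, `parse_attr` and `changed_fields` are made up for this sketch; the column order (ID, name, flag, VALUE, WORST, THRESH, type, updated, when-failed, raw) is the one smartctl prints.

```python
from collections import namedtuple

Attr = namedtuple(
    "Attr", "id name flag value worst thresh type updated when_failed raw"
)


def parse_attr(line):
    """Parse one smartctl attribute row, ignoring a leading '<' or '>' diff marker."""
    parts = line.lstrip("<> ").split(None, 9)
    return Attr(int(parts[0]), parts[1], parts[2],
                int(parts[3]), int(parts[4]), int(parts[5]),
                parts[6], parts[7], parts[8], parts[9].strip())


def changed_fields(before, after):
    """Names of the columns whose contents differ between two rows."""
    return [f for f in Attr._fields if getattr(before, f) != getattr(after, f)]


# Rows from the report above.
lcc_before = parse_attr("< 193 Load_Cycle_Count 0x0032 200 200 000 Old_age Always - 6")
lcc_after = parse_attr("> 193 Load_Cycle_Count 0x0032 200 200 000 Old_age Always - 7")
crc_before = parse_attr("< 199 UDMA_CRC_Error_Count 0x0032 200 253 000 Old_age Always - 0")
crc_after = parse_attr("> 199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0")

print(changed_fields(lcc_before, lcc_after))  # ['raw']
print(changed_fields(crc_before, crc_after))  # ['worst']
print(crc_after.value > crc_after.thresh)     # True
```

Only the raw load-cycle count moved, and the CRC row changed nothing but its WORST column, so the diff lists these rows without any of them reporting an actual error.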

 

Link to comment

Heh.. Thanks heaps for the explanation Joe.. Guess I saw the CRC error count line, panicked and assumed it was bad... I will have a bit more of a read of the wiki link..

 

Link to comment

Hi Joe L.,

 

I am wondering if it would be easy for you to include an option for completely wiping the disk instead of preclearing it.

Or can the preclearing method in its current form be used for this purpose? (At the least, the readback seems unnecessary in this case.)

That would be extremely useful on a disk replacement, when the old disk is going to be sold.

 

Thank you in advance for your feedback.

I think you intended this post to be in the thread on the preclear_disk.sh script...

 

If you are just interested in clearing a disk prior to selling it, you can just use the existing "-n" option to cause it to skip the pre-reading and post-reading phases of the pre-clear script.  It will then run in about a third of the time.

 

You would invoke it as

preclear_disk.sh -n /dev/???

(with ??? being the drive designation for the drive to be cleared: sda, sdb, ... hda, hdb, etc.)

 

 

 

Link to comment

Can clearing with the -n option be considered safe against undelete/unerase/unformat tools?

The ReiserFS file system is not friendly to undelete/unerase/unformat tools to begin with. So writing all zeros on top of that should be safe enough.

As far as recovering deleted files from a reiserfs file-system goes, it is actually pretty easy if you have not overwritten the data by writing other files to the disk.

However... the preclear_disk.sh script does not write to the reiser file-system.  It works directly on the raw disk itself.  As far as I know, there is no un-erase when you write all zeros to a raw disk, regardless of the file-system on it.  ;D

But if you are paranoid enough, then you can always zero the disk twice :)

You could use -c 20 to perform 20 cycles of zeroing the disk if you wanted.  It will just take 20 times longer...  ;)

BTW, it would be nice if we could use /dev/random instead of /dev/zero for this. But from what I gather they don't work quite the same, so it may not be trivial.

If you are absolutely certain of the disk device you can simply type

dd if=/dev/urandom of=/dev/your_device

But be VERY careful... one error and you WILL overwrite the wrong disk... and there is no un-do command.
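
One way to reduce the risk of picking the wrong device is to check the target against the serial number printed on the drive label before typing the dd command. The Python sketch below assumes the /dev/disk/by-id naming shown in the listing earlier in the thread; the function names `read_by_id_links` and `device_matches_serial` are made up for illustration, and the check is only an aid, not a guarantee.

```python
import os


def read_by_id_links(by_id_dir="/dev/disk/by-id"):
    """Map each by-id name to the device node its symlink resolves to."""
    return {name: os.path.realpath(os.path.join(by_id_dir, name))
            for name in os.listdir(by_id_dir)}


def device_matches_serial(device, serial, links):
    """True if some by-id name that resolves to `device` contains `serial`."""
    return any(target == device and serial in name
               for name, target in links.items())


# Sample mapping copied from the by-id listing earlier in the thread.
# On a live system you would use links = read_by_id_links() instead.
links = {
    "ata-ST31500341AS_9VS2L1L4": "/dev/sde",
    "ata-ST31500341AS_9VS2LXAF": "/dev/sda",
    "ata-ST31500341AS_9VS2LXAF-part1": "/dev/sda1",
}

print(device_matches_serial("/dev/sde", "9VS2L1L4", links))  # True
print(device_matches_serial("/dev/sda", "9VS2L1L4", links))  # False
```

Device letters such as sda and sde can change between boots or BIOS settings, as the IDE-to-AHCI switch above shows, whereas the serial number stays with the drive.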

 

One more thing...  for the "really" paranoid.  If a disk has re-allocated a sector because it was unable to read it, then the disk probably still has the original contents of that sector somewhere on its disk.  Now, it was re-located because it could not be read, so odds are you'll have a really tough time reading it in its original location on the disk, even if you could somehow get to it using low-level disk diagnostics.

 

One way to ensure the data on the entire disk is un-readable (including original locations of re-allocated sectors) is to heat the platters within it to a temperature above a point known as the Curie temperature. At this point, the energy being put into the magnetic material on the platters from the heat will permanently disrupt the magnetic domain structure of the material, turning it into a paramagnetic material.  See here for detailed instructions, if you are really paranoid.

 

Now, the disk might lose some of its re-sale value if you use heat to erase it, but if you are really paranoid, the drop in value will not be a significant issue.

 

Joe L.

(The Curie temperature of iron is 768°C, well above the normal temperature range recommended for disks in an unRAID server ;)  Combine that with 660.32 °C being the melting point of aluminum, and you are guaranteed both the erasure of the data and the loss of value on ebay.  Best to sell as-is, untested.  ;D  )

 

Edited to add links...

Link to comment

I've just whacked in a Seagate 1.5 TB 11 series drive to use as my parity drive and ran preclear on it overnight. Thought I'd get the results below checked before I assign the drive, as it's reported back a few more things than my WD10EADS drives did when I ran them through preclear.

 

Does the drive appear OK to use or is there something I should be looking at?

 

===========================================================================

=                unRAID server Pre-Clear disk /dev/sdg

=                      cycle 1 of 1

= Disk Pre-Clear-Read completed                                DONE

= Step 1 of 10 - Copying zeros to first 2048k bytes            DONE

= Step 2 of 10 - Copying zeros to remainder of disk to clear it DONE

= Step 3 of 10 - Disk is now cleared from MBR onward.          DONE

= Step 4 of 10 - Clearing MBR bytes for partition 2,3 & 4      DONE

= Step 5 of 10 - Clearing MBR code area                        DONE

= Step 6 of 10 - Setting MBR signature bytes                    DONE

= Step 7 of 10 - Setting partition 1 to precleared state        DONE

= Step 8 of 10 - Notifying kernel we changed the partitioning  DONE

= Step 9 of 10 - Creating the /dev/disk/by* entries            DONE

= Step 10 of 10 - Testing if the clear has been successful.    DONE

= Disk Post-Clear-Read completed                                DONE

Disk Temperature: 32C, Elapsed Time:  19:23:44

============================================================================

==

== Disk /dev/sdg has been successfully precleared

==

============================================================================

S.M.A.R.T. error count differences detected after pre-clear

note, some 'raw' values may change, but not be an indication of a problem

54c54

<  1 Raw_Read_Error_Rate    0x000f  100  100  006    Pre-fail  Always      -      17417

---

>  1 Raw_Read_Error_Rate    0x000f  118  100  006    Pre-fail  Always      -      181712693

58c58

<  7 Seek_Error_Rate        0x000f  100  253  030    Pre-fail  Always      -      732

---

>  7 Seek_Error_Rate        0x000f  100  253  030    Pre-fail  Always      -      141171

63,66c63,66

< 188 Unknown_Attribute      0x0032  100  253  000    Old_age  Always      -      0

< 189 High_Fly_Writes        0x003a  100  100  000    Old_age  Always      -      0

< 190 Airflow_Temperature_Cel 0x0022  069  069  045    Old_age  Always      -      31 (Lifetime Min/Max 26/31)

< 195 Hardware_ECC_Recovered  0x001a  100  100  000    Old_age  Always   

---

> 188 Unknown_Attribute      0x0032  100  100  000    Old_age  Always      -      0

> 189 High_Fly_Writes        0x003a  088  088  000    Old_age  Always      -      12

> 190 Airflow_Temperature_Cel 0x0022  068  060  045    Old_age  Always      -      32 (Lifetime Min/Max 26/40)

> 195 Hardware_ECC_Recovered  0x001a  052  044  000    Old_age  Always   

69,72c69,72

< 199 UDMA_CRC_Error_Count    0x003e  200  253  000    Old_age  Always      -      0

< 240 Head_Flying_Hours      0x0000  100  253  000    Old_age  Offline      -      256289288486912

< 241 Unknown_Attribute      0x0000  100  253  000    Old_age  Offline      -      0

< 242 Unknown_Attribute      0x0000  100  253  000    Old_age  Offline      -      776

---

> 199 UDMA_CRC_Error_Count    0x003e  200  200  000    Old_age  Always      -      0

> 240 Head_Flying_Hours      0x0000  100  253  000    Old_age  Offline      -      210333138419731

> 241 Unknown_Attribute      0x0000  100  253  000    Old_age  Offline      -      3172977585

> 242 Unknown_Attribute      0x0000  100  253  000    Old_age  Offline      -      120762575

============================================================================

 

Link to comment

There are no re-allocated sectors, or sectors pending re-allocation, or rather, if there are, no additional sectors were re-allocated in the pre-clear script.  Everything else looks very normal. 
Link to comment

Thanks Joe.. Just what I wanted to hear. Was a bit worried by the rather large numbers at the end of the read and seek error lines. Still don't know enough about interpreting these SMART reports...

Link to comment

Nobody but the manufacturer knows how to interpret the "raw" values.

 

The normalized "read" value is 118, the worst it has ever been is 100, and if it ever gets down to 6 the drive will be considered to be in a failure mode by the SMART tests.   So, in reality, during the preclear, the raw-read-rate improved.  (118 is a better value than 100.  The "100" is probably a factory initialized starting value, as are all the "253" values on other variables you see.)

 

Joe L.

Link to comment

Just FYI, I have 8 WD-EADS, and none of them has ever shown such crazy numbers. They've always had those raw numbers at 0. (so far).

 

Purko

 

Link to comment

Interesting... as I said, only the manufacturer knows... and it might differ by firmware version even within the same drive. Some drives only start keeping statistics after they have some number of running hours logged on them.

 

Many manufacturers show the raw numbers, many do not, some only show them on some parameters.  About the only raw value that can be understood (most of the time) is the temperature.  It is not always correctly calibrated, as at least one drive has reported a temperature "lower" than the ambient room temperature, and that is just not possible, no matter how many fans are used... (unless evaporative cooling is in effect, and if the drives are "wet" there's another problem that needs attention that the "SMART" report might not alert you to.)

 

Joe L.

Link to comment
Just FYI, I have 8 WD-EADS, and none of them has ever shown such crazy numbers. They've always had those raw numbers at 0. (so far).

 

This looks like a classic case of whether you show lots of info and risk confusing and alarming the large base of non-technical users, or show very little info and frustrate your power users.  Seagate chose to show more info, probably assuming this is usually only visible to more technical and knowledgeable users.  WD went the other way, like Microsoft with dumbed-down interfaces, and Detroit with idiot lights.  This is the difference between an oil gauge and a Service Engine indicator.  Both are valid at times.  I as a more technical user like to know what is really happening behind the scenes, but I don't apply that to everything.  While I prefer more info when it is related to computers, I'm tired of auto info, and don't need to know the current compression ratios and oil pressure readings, I'd rather let the mechanic get greasy.  Before crossing a bridge, I don't need the current metal and concrete stress readings or thermal expansion/contraction measurements.

 

We need a balance between the two choices of more info or less info.  The best interface would be one that provides a simple and intuitive initial display, plus a link or button for the detailed technical info, PLUS links to explanatory info (what do these numbers mean).  There are a growing number of tools able to read SMART info, but in general they seem to go to one extreme or the other.  They either are limited to a PASS or FAIL indicator only, or they show large complex tables of raw information, with little to no explanation.  I think most users would like to see more information, but only if it is accompanied with an appropriate explanation that provides some perspective on the numbers.

 

All drives have soft errors (often in the millions), and read and seek error rates, whether they choose to reveal them or not.  If the VALUE or WORST numbers are still near 100 (or higher) or 200, then the manufacturer considers them completely nominal, within their expected range, whatever the size or behavior of the RAW VALUEs.

Link to comment

We need a balance between the two choices of more info or less info.  The best interface would be one that provides a simple and intuitive initial display, plus a link or button for the detailed technical info, PLUS links to explanatory info (what do these numbers mean).

 

I very much agree with that.

Maybe we can write some kind of script that goes through smartctl's output and makes sense of it...

 

Here is what I wish to have:

root@Tower:~#
root@Tower:~# smartchk /dev/sda

/dev/sda:
4: This disk has little wear and is still in good condition.
root@Tower:~#
root@Tower:~#

....where the code returned is on Purko's scale from 0 to 5

 

5: This disk is in perfectly healthy condition.

4: This disk has little wear and is still in good condition.

3: This disk is past half of its life expectancy.

2: This disk is showing all signs of old age. Consider replacing it now.

1: This disk is in critical condition. Should be replaced immediately

0: This code is not even needed: When the disk is dead you'll know it.

 

Can somebody write such a tool?

 

If we had this tool, then preclear_disk.sh could be much more understandable and helpful upon completion.

 

Purko
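
A very rough sketch of what the scoring core of such a `smartchk` could look like, in Python. Everything about the ranking is an assumption made for illustration: the function name `smartchk_score`, the `MANY_REALLOCATED` cut-off, and the idea of ranking on re-allocated and pending sector counts do not come from any real tool. The only rule taken from SMART itself is that a normalized VALUE at or below a non-zero THRESH counts as failing.

```python
MESSAGES = {
    5: "This disk is in perfectly healthy condition.",
    4: "This disk has little wear and is still in good condition.",
    3: "This disk is past half of its life expectancy.",
    2: "This disk is showing all signs of old age. Consider replacing it now.",
    1: "This disk is in critical condition. Should be replaced immediately",
}

# Hypothetical cut-off, invented for this sketch only.
MANY_REALLOCATED = 50


def smartchk_score(attrs, reallocated=0, pending=0):
    """Score a disk from 1 to 5.

    attrs is an iterable of (VALUE, THRESH) normalized pairs; reallocated and
    pending are the raw re-allocated and pending-re-allocation sector counts.
    """
    if any(thresh > 0 and value <= thresh for value, thresh in attrs):
        return 1  # SMART itself would report this attribute as failing
    if pending > 0:
        return 2  # invented rule: unreadable sectors waiting to be re-mapped
    if reallocated >= MANY_REALLOCATED:
        return 3  # invented rule
    if reallocated > 0:
        return 4  # invented rule
    return 5


# (VALUE, THRESH) pairs from the Seagate report earlier in the thread.
score = smartchk_score([(118, 6), (100, 30)])
print(f"{score}: {MESSAGES[score]}")  # 5: This disk is in perfectly healthy condition.
```

A score like this says nothing about how long the disk will actually last; it only compresses a few counters into one number.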

 

 

Link to comment

If we had this tool (to interpret the results), then preclear_disk.sh could be much more understandable and helpful upon completion.

 

Purko

If it were only so easy...

 

If you are lucky, you might be able to narrow a given drive down to three or four categories:

 

1.  Dead... no sign of life...  Cannot read or write.

2.  Use for write only operations... since you'll never read it back.  ;)

3.  Failing in some way now...  could be mechanical, electrical, or comically aural  (making a funny noise)

4.  Working now, but extremely high odds of failure within the next 175000 hours.

 

Most drives will fall into category 4, with high odds of failure at some future point.  ;D ;D

 

Joe L.

 

 

 

Link to comment

If you are lucky, you might be able to narrow a given drive down to three or four categories:

1.  Dead... no sign of life...  Cannot read or write.

You don't need any tool to tell you that. You see it yourself.

 

Seriously, if these numbers are different for different manufacturers, and if only manufacturers know what these numbers mean, and if no mortal can interpret these numbers, then what's the whole point of having smartctl?

 

Purko

 

Link to comment

Thanks for the script Joe, just used it for the first time but plan on getting into the habit.

 

I have just cleared a brand new 1.5TB EADS WD drive, took a little over 21 hours!

 

Here are the details from the SMART tests; the raw read error rate has increased during the test from 0 to 1.  None of my other WD non-EADS drives have any errors of this type so I don't think it is an anomaly due to first use.  Is it something to be concerned about?

 

> Offline data collection status:  (0x84)       Offline data collection activity
>                                       was suspended by an interrupting command from host.
54c54
<   1 Raw_Read_Error_Rate     0x002f   100   253   051    Pre-fail  Always       -       0
---
>   1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       1
58c58
<   7 Seek_Error_Rate         0x002e   100   253   000    Old_age   Always       -       0
---
>   7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
63c63
< 193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       14
---
> 193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       16
67c67
< 199 UDMA_CRC_Error_Count    0x0032   200   253   000    Old_age   Always       -       0
---
> 199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
=========================================================================

Link to comment

About the only RAW values you can interpret yourself are those for re-allocated sectors, sectors pending re-allocation, and drive temperature.

 

Your drive is currently in category 4 of those listed in this post
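
As a small illustration of that advice, the Python sketch below pulls only the self-explanatory raw values out of a list of smartctl attribute rows. The attribute names in `INTERPRETABLE` are the labels smartctl commonly prints for these counters; drives may label them differently, so treat the set as an assumption. The function name `interpretable_raw` is made up for this sketch.

```python
# Assumed attribute labels; adjust to whatever your drive actually reports.
INTERPRETABLE = {
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Temperature_Celsius",
    "Airflow_Temperature_Cel",
}


def interpretable_raw(report_lines):
    """Return {attribute name: leading integer of its raw value} for known rows."""
    found = {}
    for line in report_lines:
        parts = line.lstrip("<> ").split(None, 9)
        # A complete attribute row has ten columns; skip diff markers and stubs.
        if len(parts) == 10 and parts[1] in INTERPRETABLE:
            found[parts[1]] = int(parts[9].split()[0])
    return found


# Rows copied from reports earlier in the thread.
rows = [
    "54c54",
    ">   1 Raw_Read_Error_Rate     0x000f   118   100   006    Pre-fail  Always       -       181712693",
    "> 190 Airflow_Temperature_Cel 0x0022   068   060   045    Old_age   Always       -       32 (Lifetime Min/Max 26/40)",
    "> 195 Hardware_ECC_Recovered  0x001a   052   044   000    Old_age   Always",
]
print(interpretable_raw(rows))  # {'Airflow_Temperature_Cel': 32}
```

The preclear diff only lists attributes that changed, so a missing re-allocated sector row there means the count did not move during the run, not that it is zero.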

Link to comment
