Preclear plugin



Since the primary purpose for the preclear function has been made obsolete (FINALLY!)

 

OK... what did I miss? I haven't been following too closely. Did Limetech do something in an update that changes how we add a drive?

 

Jim

 

Yep, the array will remain online while the disk is zeroed in the background.


Since the primary purpose for the preclear function has been made obsolete (FINALLY!), I would like to propose a few changes to this project. Firstly, I can't see a need for the abbreviated write-zeros-and-signature function any more, so efforts IMHO should be focused on testing and rehabilitating new and possibly marginal older drives. To that end, I would like to see more effort on a possibly non-destructive badblocks run option, plus the ability to mix and match the current read/write dd routine with the more aggressive badblocks options. Having the ability to zero and prepare the drive for unRAID inclusion could be an optional last step.

 

Comments? Additions? Modifications?

 

When I studied Joe L.'s code, I realized that the primary function of the original script has, for a long time, been detecting malfunctions in hard drives. Little coding is needed to clear a disk and write the clear signature; in fact, it can be done in less than 40 lines of code. All the remaining code is there to produce pleasant user output, retrieve and compare SMART attributes, generate reports, keep the wrong disks from being cleared, stress the disk heads, etc.

 

Of course I'm open to any suggestions. We can easily add other methods to complement those in place. If you can point me to proven badblocks tests we could add, that would be great.

 

But I would never add a disk to my array before taking a good look at its SMART attributes, and that kind of awareness is not officially offered yet.
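As a quick sanity check along those lines, something like the snippet below could pull the usual red-flag attributes out of smartctl output. This is just a sketch: the smart_flags name is made up, /dev/sdX is a placeholder, and IDs 5, 187, 197, 198 and 199 are the commonly watched pre-failure indicators, not anything the plugin does today.

```shell
# Sketch: print critical SMART attributes whose raw value is nonzero.
# The helper name is invented; the attribute-ID list is the usual set of
# pre-failure indicators (reallocated, reported-uncorrect, pending,
# offline-uncorrectable, CRC errors).
smart_flags() {
  awk '$1 ~ /^(5|187|197|198|199)$/ && $NF + 0 > 0 { print $1, $2, "raw=" $NF }'
}

# Usage (device path is a placeholder):
#   smartctl -A /dev/sdX | smart_flags
```

An empty result means none of the watched attributes have a nonzero raw count.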

These are my favorite versions of the badblocks commands:

- Read-only method:      badblocks -b 4096 -c 256 -sv /dev/sdX

- Destructive read/write method:    badblocks -b 4096 -c 256 -wsv /dev/sdX

- Non-destructive read/write method:    badblocks -b 4096 -c 256 -nsv /dev/sdX

Each of the above uses 4K blocks, tested 256 at a time (1MB segments).
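For the mix-and-match idea proposed above, a thin front end over those three commands might look like this. Purely a sketch: the function and mode names are invented, and the echo keeps it a dry run — drop the echo to actually execute.

```shell
# Sketch of a mode-selectable wrapper for the three badblocks variants.
# Function and mode names are invented; echo makes this a dry run.
run_badblocks() {
  mode=$1; dev=$2
  case $mode in
    read)           opts=-sv  ;;   # read-only test
    destructive)    opts=-wsv ;;   # write test, erases all data
    nondestructive) opts=-nsv ;;   # read/write test, preserves data
    *) echo "unknown mode: $mode" >&2; return 1 ;;
  esac
  echo badblocks -b 4096 -c 256 "$opts" "$dev"
}

# Example (placeholder device): run_badblocks read /dev/sdX
```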

 

Add an option for a SMART long test in there, and you are close to the proposal I made a while ago (last paragraph).  ;)  I still like my idea of user-controlled strategies.



This program has a lot of caveats, like its output; there's no reliable way of retrieving the speed or how many blocks it has already read.


I'd really like to see an option to display slow sectors. I've been using MHDD; it's the first thing I run on any new disk, before preclear. Any disk with abnormally slow sectors doesn't go in my servers, because in my experience these turn into bad blocks sooner rather than later, and until that happens SMART looks perfect.

 

I also use it when I suspect a disk has developed some, e.g. when a parity check slows down considerably. Unfortunately, MHDD is not practical to run on disks installed in a server, because it only works with onboard ports in IDE mode; it doesn't work with AHCI or HBAs.

 

Here's an example of a recent one, a WD Green 3TB: a parity check took 3 hours longer than usual, and after identifying the disk, it had several zones with read speeds <1MB/s. I replaced it and used it for a while on my test server; after 3 or 4 parity checks it had a lot of pending sectors.

 

Perfect SMART report before it was replaced:

 

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       1512
  3 Spin_Up_Time            0x0027   176   174   021    Pre-fail  Always       -       6200
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       102
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       683
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       55
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       4
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       98
194 Temperature_Celsius     0x0022   126   117   000    Old_age   Always       -       24
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

 

[attached screenshot: MHDD scan of the disk]


 

[screenshot: MHDD scan]

 

Cool graphic; can't say the advertising is something I want to see.


Is this something to be worried about during a preclear? I've never seen these in the logs when preclearing a drive. sdg is the drive that I'm currently preclearing, and viewing the status, it seems to still be preclearing OK.

 

Dec  2 00:12:50 Tower udevd[3424]: timeout: killing 'ata_id --export /dev/sdg' [3425]
Dec  2 00:12:51 Tower udevd[3424]: timeout: killing 'ata_id --export /dev/sdg' [3425]
Dec  2 00:12:52 Tower udevd[3424]: timeout: killing 'ata_id --export /dev/sdg' [3425]
Dec  2 00:12:53 Tower udevd[3424]: 'ata_id --export /dev/sdg' [3425] terminated by signal 9 (Killed)
Dec  2 00:12:53 Tower udevd[3424]: timeout 'scsi_id --export --whitelisted -d /dev/sdg'
Dec  2 00:13:35 Tower kernel: sdg: sdg1

 

 

EDIT: Okay, just to add a little more info...

 

root@Tower:~# cat /var/log/syslog |grep udevd
Dec  2 00:12:49 Tower udevd[3424]: timeout 'ata_id --export /dev/sdg'
Dec  2 00:12:50 Tower udevd[3424]: timeout: killing 'ata_id --export /dev/sdg' [3425]
Dec  2 00:12:51 Tower udevd[3424]: timeout: killing 'ata_id --export /dev/sdg' [3425]
Dec  2 00:12:52 Tower udevd[3424]: timeout: killing 'ata_id --export /dev/sdg' [3425]
Dec  2 00:12:53 Tower udevd[3424]: 'ata_id --export /dev/sdg' [3425] terminated by signal 9 (Killed)
Dec  2 00:12:53 Tower udevd[3424]: timeout 'scsi_id --export --whitelisted -d /dev/sdg'
Dec  2 14:06:25 Tower udevd[30612]: timeout 'ata_id --export /dev/sdg'
Dec  2 14:06:26 Tower udevd[30612]: timeout: killing 'ata_id --export /dev/sdg' [30613]
Dec  2 14:06:27 Tower udevd[30612]: timeout: killing 'ata_id --export /dev/sdg' [30613]
Dec  2 14:06:28 Tower udevd[30612]: 'ata_id --export /dev/sdg' [30613] terminated by signal 9 (Killed)
Dec  2 14:06:28 Tower udevd[30612]: timeout 'scsi_id --export --whitelisted -d /dev/sdg'
Dec  3 03:59:05 Tower udevd[18972]: timeout 'ata_id --export /dev/sdg'
Dec  3 03:59:06 Tower udevd[18972]: timeout: killing 'ata_id --export /dev/sdg' [18973]
Dec  3 03:59:07 Tower udevd[18972]: 'ata_id --export /dev/sdg' [18973] terminated by signal 9 (Killed)
Dec  3 03:59:07 Tower udevd[18972]: timeout 'scsi_id --export --whitelisted -d /dev/sdg'

 

I started the preclear on Dec 1 at around 1-2pm; the times of these errors are about Dec 2 00:12, Dec 2 14:06, and Dec 3 03:59. I'm thinking this aligns with some part of the preclear process: I've run 3 cycles, and the errors are about 11-14 hours apart, which by my estimate is about how long one cycle takes.

 

And lastly, my preclear report ended up fine, no issues.

 

root@Tower:/boot/preclear_reports# cat preclear_rpt_WD-WCC4M4LCZXJY_2015-12-03
========================================================================1.15
== invoked as: /boot/config/plugins/preclear.disk/preclear_disk.sh -c 3 /dev/sdg
== WDCWD20EFRX-68EUZN0   WD-WCC4M4LCZXJY
== Disk /dev/sdg has been successfully precleared
== with a starting sector of 64
== Ran 3 cycles
==
== Using :Read block size = 1000448 Bytes
== Last Cycle's Pre Read Time  : 5:36:55 (98 MB/s)
== Last Cycle's Zeroing time   : 4:37:50 (119 MB/s)
== Last Cycle's Post Read Time : 9:15:10 (60 MB/s)
== Last Cycle's Total Time     : 13:53:59
==
== Total Elapsed Time 47:17:09
==
== Disk Start Temperature: 26C
==
== Current Disk Temperature: 27C,
==
============================================================================
** Changed attributes in files: /tmp/smart_start_sdg  /tmp/smart_finish_sdg
                ATTRIBUTE   NEW_VAL OLD_VAL FAILURE_THRESHOLD STATUS      RAW_VALUE
      Temperature_Celsius =   120     121            0        ok          27
No SMART attributes are FAILING_NOW

0 sectors were pending re-allocation before the start of the preclear.
0 sectors were pending re-allocation after pre-read in cycle 1 of 3.
0 sectors were pending re-allocation after zero of disk in cycle 1 of 3.
0 sectors were pending re-allocation after post-read in cycle 1 of 3.
0 sectors were pending re-allocation after zero of disk in cycle 2 of 3.
0 sectors were pending re-allocation after post-read in cycle 2 of 3.
0 sectors were pending re-allocation after zero of disk in cycle 3 of 3.
0 sectors are pending re-allocation at the end of the preclear,
    the number of sectors pending re-allocation did not change.
0 sectors had been re-allocated before the start of the preclear.
0 sectors are re-allocated at the end of the preclear,
    the number of sectors re-allocated did not change.
============================================================================

 

Has anyone precleared any drives with the plugin recently and seen the same lines in their logs? Is this normal? A bug?

 

 

I'm getting the same thing happening, once with a Hitachi drive and now with a WD Red. These are being precleared from an IBM M1015. AFAIK this timeout/signal 9 (Killed) log message shows up right before the post-read starts. Guess I'll have to try my onboard Intel ports next time.


Just wanted to add 2 more things I forgot to mention about MHDD:

 

There's a Linux clone, http://whdd.org/demo/; there's even a Slackware package, but it's 32-bit.

 

While the graphic part of MHDD is pretty, what I really care about are the stats. Below are two examples: a good disk, and a bad one that at 4% of the scan already has several slow sectors.

 

[screenshot: MHDD stats, good disk]

 

[screenshot: MHDD stats, bad disk with several slow sectors at 4% of the scan]

 

Good disks usually have very few sectors >10ms, and it's normal to see a handful between 10ms and 150ms, but on a healthy disk there shouldn't be any >150ms.
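There's no MHDD-style timing in the plugin, but a crude slow-zone scan can be approximated from the shell. The sketch below is my own rough approach, not anything MHDD or the plugin actually does: the function name, the 8 MiB chunk size and the speed floor are all arbitrary choices, and on a real device you would add iflag=direct to the dd to bypass the page cache.

```shell
# Sketch: time sequential fixed-size reads and flag slow zones.
# Chunk size (8 MiB) and the speed floor are arbitrary knobs; add
# iflag=direct to dd when scanning a real device to avoid cache effects.
scan_zones() {
  dev=$1; chunks=$2; floor=$3
  bs=$((8 * 1024 * 1024)); i=0
  while [ "$i" -lt "$chunks" ]; do
    t0=$(date +%s%N)
    dd if="$dev" of=/dev/null bs="$bs" count=1 skip="$i" 2>/dev/null || return 1
    t1=$(date +%s%N)
    awk -v i="$i" -v ns="$((t1 - t0))" -v bs="$bs" -v floor="$floor" 'BEGIN {
      if (ns <= 0) ns = 1                 # guard against zero elapsed time
      mbs = (bs / 1048576) / (ns / 1e9)   # MB/s for this chunk
      printf "zone %d: %.1f MB/s%s\n", i, mbs, (mbs < floor ? "  <- SLOW" : "")
    }'
    i=$((i + 1))
  done
}

# Example (placeholder device): scan the first 100 zones, flag anything under 1 MB/s
#   scan_zones /dev/sdX 100 1
```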


I'm getting the tput: unknown terminal "screen" error with the beta version. The non-beta version appears to work correctly. I did verify that screen is set up and working when I log in via SSH as root. Any suggestions?

Which version of unRAID are you using?



6.1.9


I see there has been some discussion on this already... but given the changes in 6.2 that are coming down the pike, I think this plugin should entirely strip out the zeroing and post-read verification steps (basically all of the clearing...) and not worry about having a completely clear disk when it's done its stress testing.

 

This should let you run more stress-test cycles in a shorter time, since you don't have that whole 100% write and 100% verification taking up huge swaths of time...

 

Edit: Addition: currently, if you want to run this 3 times on a disk, it runs the clear and post-read 3 whole times... it would be cool IMO if you could pick the number of times each test runs (with options).

 

AKA badblocks 3 times, pre-SMART 1, post-SMART 1, etc., so you could mix and match instead of having to run through the whole script every single time... (if this is already an option... my bad... I really didn't know that)


I am building a new test unRAID server with 6.2 beta 21, a fresh install. I have 4 harvested 8TB Seagate drives. I found the beta preclear plugin and am running it on the drives, 2 cycles per drive. It appears to be working. But I am reading a few comments saying preclear isn't necessary anymore. I am assuming that 6.2 beta clears the drive without bringing the array down, but doesn't actually predict reliability as well as preclear does? By the way, thanks for providing the beta plugin and all of your time that went into it.


I was wondering why the plugin installs on the Settings page of the GUI. Wouldn't Tools be a more appropriate home for it?

For me it is installed with the Settings under Settings and the Control under Tools. If you have something different, maybe you do not have the latest version of the plugin.


 

Well, that is indeed the case for the Dynamix File Integrity plugin, but that's a different thread :)

 

I have the latest versions of both the Preclear Disks plugin (2016.03.22) and the Preclear Disks Beta plugin (2016.03.24a) but neither is "split" as you suggest.

 


I also have the latest Preclear Plugin 2016.03.22 and only have a single icon on the Settings page under User Utilities.

 

In addition, when I open it I have the option to clear my 2nd cache drive (I am using 2 cache drives, BTRFS).

 

Shouldn't this NOT be listed? It could f*ck up the data on my cache drive.

 

 

 


When I go to preclear a drive, it pops up the preclear options page. I leave the defaults and click Start. It pauses for quite a long time and then goes back to the page with the list of drives, and the drive says "starting" indefinitely. When I click on the preview button I get this:

 

/boot/config/plugins/preclear.disk/preclear_disk.sh  -c 1 /dev/sdh 2>/tmp/preclear.log
root@Tower:/usr/local/emhttp# /boot/config/plugins/preclear.disk/preclear_disk.sh  -c 1 /dev/sdh 2>/tmp/preclear.log
Sorry: Device /dev/sdh is busy.: 1
root@Tower:/usr/local/emhttp#

 

I've tried rebooting several times. No idea what could be keeping it busy.



 

Do you have the 6.2 beta? The line in the preclear script that triggers the "Device X is busy" calls sfdisk -R, but the version of sfdisk in 6.2 doesn't have the -R option.
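For anyone patching the script by hand until a fixed release lands: newer util-linux exposes the same re-read-partition-table request through blockdev, so a fallback shim along these lines might work. This is a sketch — the function name is mine, and /dev/sdX stands in for a whole-disk device.

```shell
# Sketch: ask the kernel to re-read a disk's partition table, preferring
# blockdev (newer util-linux) and falling back to sfdisk -R (older).
# Function name is invented; pass a whole-disk device, e.g. /dev/sdX.
reread_partitions() {
  if blockdev --rereadpt "$1" 2>/dev/null; then
    echo "re-read via blockdev"
  elif sfdisk -R "$1" 2>/dev/null; then
    echo "re-read via sfdisk -R"
  else
    echo "could not re-read partition table on $1" >&2
    return 1
  fi
}
```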



 

Yes, I do. That must be the problem; I needed dual parity. I looked around for a bit but didn't have time to dig through the whole thread for the cause. The beta version of the plugin works fine, though. Are there any major differences between the normal and beta version scripts? I looked through them and they are obviously different scripts, but I didn't go through the functions to see what the actual differences are, task-wise.

