Jump to content

Preclear Plugin seems to have stopped on Post Read


rh535

Recommended Posts

I am trying to Preclear a brand new WD Red 8TB drive and it seems to have stopped. This happend once at 23% on post read and i thought i did something so i just started again. This time it stopped on 83% post read. I looked at the log and this seems to be the warning that is popping up.

 

Mar 04 01:07:50 preclear_disk_VK0UEERY_12419: Zeroing: dd command failed, exit code: 1

Any thoughts on what i should do?

 

 

Here is the entire log if that helps

 

Feb 28 21:33:52 preclear_disk_VK0UEERY_1111: Command: /usr/local/emhttp/plugins/preclear.disk/script/preclear_disk.sh --notify 7 --frequency 4 --cycles 1 --no-prompt /dev/sdd
Feb 28 21:33:52 preclear_disk_VK0UEERY_1111: Preclear Disk Version: 0.8.4-beta
Feb 28 21:33:52 preclear_disk_VK0UEERY_1111: S.M.A.R.T. info type: default
Feb 28 21:33:52 preclear_disk_VK0UEERY_1111: S.M.A.R.T. attrs type: default
Feb 28 21:33:55 preclear_disk_VK0UEERY_1111: Pre-Read: dd if=/dev/sdd of=/dev/null bs=2097152  iflag=direct
Feb 28 21:33:55 preclear_disk_VK0UEERY_1111: Pre-Read: dd pid [3598]
Feb 28 22:30:32 preclear_disk_86GGSA5MS_13748: Command: /usr/local/emhttp/plugins/preclear.disk/script/preclear_disk.sh --notify 7 --frequency 3 --cycles 1 --no-prompt /dev/sdb
Feb 28 22:30:32 preclear_disk_86GGSA5MS_13748: Preclear Disk Version: 0.8.4-beta
Feb 28 22:30:34 preclear_disk_86GGSA5MS_13748: S.M.A.R.T. info type: default
Feb 28 22:30:34 preclear_disk_86GGSA5MS_13748: S.M.A.R.T. attrs type: default
Feb 28 22:30:34 preclear_disk_86GGSA5MS_13748: xargs: unmatched double quote; by default quotes are special to xargs unless you use the -0 option
Feb 28 22:30:40 preclear_disk_86GGSA5MS_13748: Pre-Read: dd if=/dev/sdb of=/dev/null bs=2097152  iflag=direct
Feb 28 22:30:40 preclear_disk_86GGSA5MS_13748: Pre-Read: dd pid [16336]
Mar 01 00:20:15 preclear_disk_86GGSA5MS_13748: Pre-Read: dd - read 1000204886016 of 1000204886016.
Mar 01 00:20:15 preclear_disk_86GGSA5MS_13748: : 0
Mar 01 00:20:15 preclear_disk_86GGSA5MS_13748: Zeroing: dd if=/dev/zero of=/dev/sdb bs=2097152 seek=1 conv=fdatasync,noerror oflag=direct
Mar 01 00:20:15 preclear_disk_86GGSA5MS_13748: Zeroing: dd pid [15361]
Mar 01 02:06:52 preclear_disk_86GGSA5MS_13748: Zeroing: dd - wrote 1000202788864 of 1000204886016.
Mar 01 02:06:53 preclear_disk_86GGSA5MS_13748: Zeroing: dd command failed, exit code: 1
Mar 01 02:06:57 preclear_disk_86GGSA5MS_13748: Post-Read: dd if=/dev/sdb bs=512 count=4096 skip=1 iflag=direct
Mar 01 02:06:58 preclear_disk_86GGSA5MS_13748: Post-Read: dd pid [17998]
Mar 01 02:06:58 preclear_disk_86GGSA5MS_13748: Post-Read: dd if=/dev/sdb bs=2097152  skip=1 iflag=direct 2>/tmp/.preclear/sdb/dd_output | cmp - /dev/zero &>/tmp/.preclear/sdb/cmp_out 
Mar 01 02:06:58 preclear_disk_86GGSA5MS_13748: Post-Read: dd pid [18020]
Mar 01 03:56:24 preclear_disk_86GGSA5MS_13748: Post-Read: dd - read 1000202788864 of 1000204886016.
Mar 01 03:56:24 preclear_disk_86GGSA5MS_13748: Zeroing: 1
Mar 01 13:13:45 preclear_disk_VK0UEERY_1111: Pre-Read: dd - read 8001563222016 of 8001563222016.
Mar 01 13:13:45 preclear_disk_VK0UEERY_1111: : 0
Mar 01 13:13:45 preclear_disk_VK0UEERY_1111: Zeroing: dd if=/dev/zero of=/dev/sdd bs=2097152 seek=1 conv=fdatasync,noerror oflag=direct
Mar 01 13:13:45 preclear_disk_VK0UEERY_1111: Zeroing: dd pid [27973]
Mar 02 04:41:26 preclear_disk_VK0UEERY_1111: Zeroing: dd - wrote 8001561124864 of 8001563222016.
Mar 02 04:41:27 preclear_disk_VK0UEERY_1111: Zeroing: dd command failed, exit code: 1
Mar 02 04:41:31 preclear_disk_VK0UEERY_1111: Post-Read: dd if=/dev/sdd bs=512 count=4096 skip=1 iflag=direct
Mar 02 04:41:31 preclear_disk_VK0UEERY_1111: Post-Read: dd pid [16632]
Mar 02 04:41:31 preclear_disk_VK0UEERY_1111: Post-Read: dd if=/dev/sdd bs=2097152  skip=1 iflag=direct 2>/tmp/.preclear/sdd/dd_output | cmp - /dev/zero &>/tmp/.preclear/sdd/cmp_out 
Mar 02 04:41:31 preclear_disk_VK0UEERY_1111: Post-Read: dd pid [16654]
Mar 02 18:00:53 preclear_disk_VK0UEERY_12419: Command: /usr/local/emhttp/plugins/preclear.disk/script/preclear_disk.sh --notify 7 --frequency 4 --cycles 1 --no-prompt /dev/sdd
Mar 02 18:00:53 preclear_disk_VK0UEERY_12419: Preclear Disk Version: 0.8.4-beta
Mar 02 18:00:53 preclear_disk_VK0UEERY_12419: S.M.A.R.T. info type: default
Mar 02 18:00:53 preclear_disk_VK0UEERY_12419: S.M.A.R.T. attrs type: default
Mar 02 18:00:55 preclear_disk_VK0UEERY_12419: Pre-Read: dd if=/dev/sdd of=/dev/null bs=2097152  iflag=direct
Mar 02 18:00:55 preclear_disk_VK0UEERY_12419: Pre-Read: dd pid [14898]
Mar 03 09:40:17 preclear_disk_VK0UEERY_12419: Pre-Read: dd - read 8001563222016 of 8001563222016.
Mar 03 09:40:17 preclear_disk_VK0UEERY_12419: : 0
Mar 03 09:40:18 preclear_disk_VK0UEERY_12419: Zeroing: dd if=/dev/zero of=/dev/sdd bs=2097152 seek=1 conv=fdatasync,noerror oflag=direct
Mar 03 09:40:18 preclear_disk_VK0UEERY_12419: Zeroing: dd pid [7424]
Mar 04 01:07:49 preclear_disk_VK0UEERY_12419: Zeroing: dd - wrote 8001561124864 of 8001563222016.
Mar 04 01:07:50 preclear_disk_VK0UEERY_12419: Zeroing: dd command failed, exit code: 1
Mar 04 01:07:55 preclear_disk_VK0UEERY_12419: Post-Read: dd if=/dev/sdd bs=512 count=4096 skip=1 iflag=direct
Mar 04 01:07:55 preclear_disk_VK0UEERY_12419: Post-Read: dd pid [11144]
Mar 04 01:07:55 preclear_disk_VK0UEERY_12419: Post-Read: dd if=/dev/sdd bs=2097152  skip=1 iflag=direct 2>/tmp/.preclear/sdd/dd_output | cmp - /dev/zero &>/tmp/.preclear/sdd/cmp_out 
Mar 04 01:07:55 preclear_disk_VK0UEERY_12419: Post-Read: dd pid [11166]

 

Link to comment

I have run into the same trouble. My post read seems to have stopped at 54%, though CPU usage does continue at 100% on CPU 1 (CPU 0 at 0%)

 

Mar 03 16:52:13 preclear_disk_Z840RRGX_8870: Command: /usr/local/emhttp/plugins/preclear.disk/script/preclear_disk.sh --notify 2 --frequency 2 --cycles 2 --no-prompt /dev/sdb
Mar 03 16:52:13 preclear_disk_Z840RRGX_8870: Preclear Disk Version: 0.8.4-beta
Mar 03 16:52:13 preclear_disk_Z840RRGX_8870: S.M.A.R.T. info type: default
Mar 03 16:52:13 preclear_disk_Z840RRGX_8870: S.M.A.R.T. attrs type: default
Mar 03 16:52:18 preclear_disk_Z840RRGX_8870: Pre-Read: dd if=/dev/sdb of=/dev/null bs=2097152 iflag=direct
Mar 03 16:52:18 preclear_disk_Z840RRGX_8870: Pre-Read: dd pid [11375]
Mar 04 07:41:07 preclear_disk_Z840RRGX_8870: Pre-Read: dd - read 8001563222016 of 8001563222016.
Mar 04 07:41:07 preclear_disk_Z840RRGX_8870: : 0
Mar 04 07:41:08 preclear_disk_Z840RRGX_8870: Zeroing: dd if=/dev/zero of=/dev/sdb bs=2097152 seek=1 conv=fdatasync,noerror oflag=direct
Mar 04 07:41:08 preclear_disk_Z840RRGX_8870: Zeroing: dd pid [4742]
Mar 04 22:18:19 preclear_disk_Z840RRGX_8870: Zeroing: dd - wrote 8001561124864 of 8001563222016.
Mar 04 22:18:20 preclear_disk_Z840RRGX_8870: Zeroing: dd command failed, exit code: 1
Mar 04 22:18:23 preclear_disk_Z840RRGX_8870: Post-Read: dd if=/dev/sdb bs=512 count=4096 skip=1 iflag=direct
Mar 04 22:18:24 preclear_disk_Z840RRGX_8870: Post-Read: dd pid [1981]
Mar 04 22:18:24 preclear_disk_Z840RRGX_8870: Post-Read: dd if=/dev/sdb bs=2097152 skip=1 iflag=direct 2>/tmp/.preclear/sdb/dd_output | cmp - /dev/zero &>/tmp/.preclear/sdb/cmp_out 
Mar 04 22:18:24 preclear_disk_Z840RRGX_8870: Post-Read: dd pid [2015]

Link to comment
  • 2 weeks later...
On 3/9/2017 at 11:09 AM, rh535 said:

After uninstalling then reinstalling the plugin with changing the sata port - I was able to complete the preclear.

Thank you for updating your post. I was trying to preclear 2-4tb disks at the same time. Both stopped and after looking at the report I noticed that even the time elapsed had not changed in hours. I stopped the preclear and had some pretty outrageous CPU utilization still so I uninstalled the plugin, rebooted the server, and reinstalled the plugin. Now I am preclearing one drive at a time. It seems to be making good progress. Thanks again.

Link to comment

I now need to follow up, it seems to have hung again. The time is now 16:10 so when looking at the screenshot, the time elapsed has not updated in 40 minutes. I failed to write down the time/percent when this happened yesterday but I'm fairly certain this drive's preclearing also frozen at 88% during the post-read process. Can anyone help me out?

Screenshot.PNG

Link to comment
On 3/24/2017 at 8:47 PM, geonerdist said:

It is still showing 88% complete at 150MB/s and an elapsed time of 6:30:47...there is also zero flashing going on on the hard drive activity LED on my server. 

 

I think the biggest change for me was changing the sata port. Have you tried that?

Link to comment
15 hours ago, rh535 said:

 

I think the biggest change for me was changing the sata port. Have you tried that?

 

I wish it were that easy. My drives are using a SAS backplane connected to on board RAID controller set to AHCI mode. The best I was able to do was swap drives around to use a different connector which did nothing.

Link to comment
  • 2 months later...

@gfjardim - PLEASE INCLUDE INSTRUCTIONS TO INSTALL MY TWEAKED VERSION OF JOEL'S PRECLEAR SCRIPT (AT LEAST) INTO YOUR PLUGIN. Should be installed by default until this issue is resolved.

 

Users - the current version of the script installed with the pluggin is a rewrite by @gfjardimof the script that @Joe L. wrote and I tweaked by me @bjp999 (to make it faster on the post read and provide the real time status updates (that you now see in dynamix and in the pluggin page)). GF's rewrite is an ambitious project, with good potential, but it is failing. There is no problem running either Joe's original or my tweaked version as far as I know. So if your desire is to preclear a disk and not to help gf debug, please follow the instructions to install the proven scripts. The earlier scripts do work with the plugin (iI think), but were originally written to run from the command line (in a "screen" season) which DEFINITELY works flawlessly! I'd guess more than 500 disks were processed with those scripts over the years without issue with the scripting.

 

If people are trying to get the preclear to work with gf's script, or even with the originals, I suggest completely shutting down the unRaid GUI after you kick off the preclear. The GUI does background smart reports and maybe the preclear plugin does also (not sure). The script also does them to get the temperature. Too many smart reports, possibly running in parallel, may be related to the failure, so shutting down the GUI may allow the process to complete successfully. So users running with the gui not running and reporting success or failure would give valuable data points to help gf debug the problem. The GUI has a special configuration around stopping background refreshes (including smart reports) during parity checks/builds that could be enhanced to also stop then during preclears. But first we need some hard evidence which is lacking now.

 

If you turn off the gui, you can monitor whether the preclear is running properly by observing the disk lights on the server itself, or running "top" from the command prompt and seeing it the preclear prices is pegged at or near 100% CPU.

 

You can also check a file called /tmp/preclear_stat_sdX (where sdX is the device for the disk being precleared). This includes the exact message that appears in the plugin and GUI. Adding this string was another of my ideas/tweaks to preclear. Previously you had tho go to the command line to check on its status.

 

The simple command:

cat /tmp/preclear_stat_sd*

 

Should be a way to monitor one or more preclears. If it stops updating, the script is hung.

Link to comment
35 minutes ago, bjp999 said:

@gfjardim - PLEASE INCLUDE INSTRUCTIONS TO INSTALL MY TWEAKED VERSION OF JOEL'S PRECLEAR SCRIPT (AT LEAST) INTO YOUR PLUGIN. Should be installed by default until this issue is resolved.

I'm going to add a mod comment to this plugin within CA pointing to this post.  When these issues are resolved, can someone point it out to me as I don't really follow the goings on with PreClear.

Link to comment
  • 2 weeks later...

I think I got this fixed:

 

a) I've added to my preclear script a function that detects hanging smartctl and hdparm commands and pauses all reading/writing to the disk until they exit.

b) Now all webui info is collected in the background using a daemon, so updated info should be available instantaneously.

c) I've tweaked Unassigned Devices to be smarter when probing SMART info, and soon it could use the same daemon above.

Link to comment
18 minutes ago, bjp999 said:

Interesting. Seems like it might not be a perfect solution - as the smartctl / hdparm run independently, and possibly start an instant after the check and still wind up hanging the script. :( Hope not.

 

The check is done every ~ 7 seconds and I didn't experienced any hangs so far.

Link to comment

Archived

This topic is now archived and is closed to further replies.

×
×
  • Create New...