gfjardim

Preclear plugin

Guys, I was so busy these last two months that I couldn't even read this topic. I see many are having trouble with my script, but I need some time to compile all the info and see if I can reproduce the bugs. If someone could help me compile all the relevant info, I'd greatly appreciate it.

I also need someone to share the development with me. Anyone interested, please send me a PM.

On 2017-4-8 at 0:56 PM, Switchblade said:

Please answer this question

The 2017.03.31 version removed the "Install Statistics" prompt that appeared on every new update. A few minor code changes were made too, but no bug fixes.

On 2017-6-9 at 3:58 PM, aptalca said:

I am also having an issue with the plugin stalling. I am in the post-read step of a preclear of an 8TB disk and, 22% in, it seems to have stalled. The percentage hasn't changed in hours, and the elapsed time is also stuck.

htop shows the preclear script as running (pegging the CPU core at 100% constantly) and there are two other related active processes: "cmp - /dev/zero" and "dd if=/dev/sdb bs=2097152 skip=1 iflag=direct".

Is the preclear still ongoing? Is it only the reporting aspect that is borked? Should I wait another 10 hours for it to finish?

The preclear log only shows that the post-read has started, nothing after that.

I checked the folder /tmp/.preclear/sdb/ and the cmp_out, dd_output and display_output files haven't been modified in hours.

Thanks

Still having the issue, @aptalca? I could use your help to debug it.

After seeing reports from others in other threads, I canceled that stalled preclear and started a new one using JoeL's script in a screen session. It is currently in post-read and has another 15 hours or so to go (8TB drives take forever). After that's done, I'd be happy to help debug.
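
For anyone following the post-read discussion: the two processes named in the report above (dd reading the disk in 2097152-byte blocks with skip=1, piped into "cmp - /dev/zero") amount to checking that everything after the first block reads back as zeros. Below is a minimal Python sketch of that idea, not the plugin's actual code. The function name and the in-memory demo are made up for illustration, and the sketch simply mirrors the skip without assuming why the first block is skipped.

```python
import io

BLOCK_SIZE = 2097152  # the bs= value from the quoted dd command


def first_nonzero_offset(stream, block_size=BLOCK_SIZE, skip_blocks=1):
    """Return the offset of the first non-zero byte after the skipped blocks.

    Returns None when everything after the skipped blocks is zero.
    `stream` is any seekable binary file object (hypothetical helper).
    """
    offset = skip_blocks * block_size
    stream.seek(offset)
    while True:
        chunk = stream.read(block_size)
        if not chunk:
            return None  # reached the end without finding a non-zero byte
        remainder = chunk.lstrip(b"\x00")
        if remainder:
            # Leading zeros stripped; the difference locates the first bad byte.
            return offset + len(chunk) - len(remainder)
        offset += len(chunk)


# Tiny in-memory demo with a 4-byte block size instead of a real disk.
clean = io.BytesIO(b"\xff" * 4 + bytes(8))
dirty = io.BytesIO(bytes(9) + b"\x01" + bytes(2))
print(first_nonzero_offset(clean, block_size=4))  # None
print(first_nonzero_offset(dirty, block_size=4))  # 9
```

Because the loop tracks its own offset, it would also be a natural place to log progress, which is what goes missing when the status files stop updating.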


@gfjardim

Suggestions ...

1 - Add logging to your script so you can figure out where the hang is occurring when it happens again. (Joe's script has some subshells to work around an unexplained bug. Might want to look at that; you may be able to wrap some of your code in subshells and avoid your bug.)

2 - Add a "resume" feature, so if people use your script and it does hang, they can simply resume their preclear from the plugin page without having to start from scratch. There may already be enough info in the /tmp files to do so, but if not, a "resume" file could be created. People would be more likely to use the plugin if they knew they weren't going to waste all that preclearing time. This would actually be a very nice feature in preclear: being able to stop it and resume later, even after a reboot. I was able to resume my failed preclear with a quick patch to my version of the preclear script, but few would know enough to do that, and they would instead be forced to restart at least the current stage.

3 - Add a resumer process that detects the hang, kills the old process, and resumes. Users would never know except for a very brief slowdown. You could use your "phone home" system to share the logs when the resumer has to do its thing.

BTW - I pretty much hate shell scripting. Awk is so much easier to use. I had thought about rewriting preclear in awk, but for a variety of reasons decided to let the preclear scripting stay as it was. In awk you can use the "system()" function to do the dirty little I/O commands, and still use awk's simple syntax and structuring features to organize the code. Awk might have a small performance impact in the control logic, but IMO it would be well worth it given the added maintainability.

Sorry, I don't have the patience to get into the weeds with you on this. I muddle through shell scripting but have no interest in wasting the brain cells I have left on becoming an expert. :)

Good luck. If I have other preclears to do, I will help test if you've got something in place to find bugs.
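
To make suggestion 2 a bit more concrete, here is a minimal sketch of a "resume" file, assuming the script can report which stage it is in and how many bytes it has processed. The function names, the JSON layout and the file location are all hypothetical; nothing here is taken from the plugin. Note that a file under /tmp generally will not survive a reboot, so resuming after a reboot would need a path on persistent storage.

```python
import json
import os
import tempfile


def save_checkpoint(path, stage, offset):
    """Atomically record the current stage and byte offset (hypothetical format)."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        json.dump({"stage": stage, "offset": offset}, handle)
        handle.flush()
        os.fsync(handle.fileno())
    # Rename over the old file so a crash never leaves a half-written checkpoint.
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return (stage, offset), or None when there is nothing usable to resume."""
    try:
        with open(path) as handle:
            data = json.load(handle)
        return data["stage"], data["offset"]
    except (FileNotFoundError, ValueError, KeyError):
        return None
```

A script using this would call save_checkpoint every so often while working, and call load_checkpoint at startup to decide whether to seek ahead or start from the beginning.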

1) I just figured out that this is a much more complicated problem to deal with. Sometimes dd saturates the disk's I/O to the point where a simple S.M.A.R.T. probe takes as long as 1 minute to complete, and if multiple probes are launched at the same time, that time increases dramatically. I'll have to think of a workaround for this problem, since it involves emhttp and Unassigned Devices too.

2) I've already thought about that and it's on my TODO list.

3) I would prefer users to be alerted to any issues, but I'll think about it.

Thanks a lot for your reply and your kind words!
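
As an illustration of point 1, here is a minimal sketch of one way to stop SMART probes from piling up: take a non-blocking lock before probing and give up if another probe is already in flight. This is only a sketch under assumptions, not the plugin's code. The function name and the lock file path are hypothetical, and it would only help if every prober (emhttp, Unassigned Devices, the preclear script) honoured the same lock, which this sketch cannot enforce. The `smartctl -A` call (print the attribute table) is the only real command used.

```python
import fcntl
import subprocess


def probe_smart(device, lock_path, timeout=30.0, runner=subprocess.run):
    """Return smartctl's attribute output, or None if busy or too slow.

    `lock_path` is a hypothetical lock file shared by all cooperating probers.
    `runner` exists only so the sketch can be exercised without a real disk.
    """
    with open(lock_path, "w") as lock_file:
        try:
            # Non-blocking: if someone else is probing, skip instead of queueing.
            fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError:
            return None
        try:
            result = runner(
                ["smartctl", "-A", device],
                capture_output=True,
                text=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return None  # the disk is too busy to answer in time
        return result.stdout
    # Closing the lock file on leaving the with-block releases the lock.
```

Skipping rather than waiting is a deliberate choice here: a missed temperature reading is cheap, while a queue of probes against a saturated disk is what makes the delay grow.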


Actually I have had problems with frequent smart reports and heavy I/O. It's worse on add-on controllers than on motherboard ports. I can't prove it, but a new Seagate 2TB drive I had years ago, which I was preclearing on an add-on controller while pulling frequent smart reports, got all screwed up and had the freakiest problems I've ever seen, with very long delays responding. I ultimately returned it as defective, but I have always thought it was due in some way to preclearing with smart reports. I experimented with something called the "permissive" flag in smartctl on that Seagate and that did not help; it maybe made things worse. I do know it's not a solution that allows one to pull constant smart reports. Since then I never preclear on anything but a motherboard port, and I am very gentle on pulling smart reports.

I kinda forgot all that as I avoided the problems for years. I had disabled updates of the stock GUI (it was causing hangs with my version of unRAID, 6.0.1), so background smart checks were not happening. And with myMain, I never let it do the auto-updates, and I implemented a refresh button that remembered the temperatures from refresh to refresh and avoided pulling a new smart report. This causes nearly no smart reports unless I explicitly ask for one. Asking for one every few hours is very low risk.

But the machine I used to preclear yesterday was a new build running 6.3.5, with the stock GUI using default settings, so it was doing its background updates. Putting all the facts together with your conclusion, it makes a lot of sense that the constant smart reports would screw things up.

It would be an interesting test to do with the GUI turned off or the settings adjusted. It probably would not hang.
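
The "remember the temperatures from refresh to refresh" approach described above can be sketched as a small cache with a maximum age. This is a minimal illustration, not the myMain code; the class name and the `fetch` callable are hypothetical stand-ins for whatever actually pulls the smart report.

```python
import time


class CachedReading:
    """Reuse the last reading until it is older than `max_age_seconds`."""

    def __init__(self, fetch, max_age_seconds, clock=time.monotonic):
        self._fetch = fetch          # hypothetical callable that pulls a report
        self._max_age = max_age_seconds
        self._clock = clock
        self._value = None
        self._stamp = None

    def get(self, force=False):
        """Return the cached value, refetching only when stale or forced."""
        now = self._clock()
        stale = self._stamp is None or now - self._stamp >= self._max_age
        if force or stale:
            self._value = self._fetch()
            self._stamp = now
        return self._value
```

With a max age of a few hours, a page refresh reuses the stored value and the disk is only queried when the reading has expired or the user explicitly asks with force=True.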


In V6, generation of SMART reports is decoupled from the GUI and is determined by the setting of TUNABLE (POLL_ATTRIBUTES). The default value is every 30 minutes, though most people have it reduced to a couple of minutes.
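
To put rough numbers on that, here is a small back-of-the-envelope sketch. The 20-hour run length is a made-up example, the 60-second probe time is the worst case mentioned earlier in the thread, and the intervals are expressed in seconds as an assumption of this sketch.

```python
def polls_during(run_hours, poll_interval_seconds):
    """Number of SMART polls that fall inside a run of the given length."""
    return int(run_hours * 3600 // poll_interval_seconds)


def busy_fraction(probe_seconds, poll_interval_seconds):
    """Share of the time a probe is in flight, capped at 100%."""
    return min(1.0, probe_seconds / poll_interval_seconds)


print(polls_during(20, 30 * 60))  # 40 polls at the 30-minute default
print(polls_during(20, 2 * 60))   # 600 polls at a 2-minute interval
print(busy_fraction(60, 2 * 60))  # 0.5
```

So under these assumptions, a 2-minute interval combined with probes that take a minute each would leave a probe outstanding about half the time.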

 


I'm having the same issue as everyone else, trying to preclear an 8TB drive. It seems to hang after about the same amount of time every time. I can get through 1 full phase and about 92-95% of the next phase before preclear completely freezes and won't update the time or progress. It also causes the webui to stop updating: I can navigate the webui, but no information is populated (for instance, all of my disks disappear). I can ssh in and see progress using the preclear command once the webui freezes, but eventually that freezes too.

 
Hi @gfjardim

I started another preclear through the plugin and it got stuck during post read. One cpu core is pegged at 100%. I'll leave it as is for now. Let me know what you need me to do to debug.

I've made some changes in the code. If you can, please cancel this preclear instance, upgrade the plugin and start a new one.

 

Thanks a lot!


Plugin version 2017-06-15a got stuck on pre-read 99% on an 8tb drive

Aaarrgghh... I feel your pain.

On 6/15/2017 at 0:36 AM, gfjardim said:

I've made some changes in the code. If you can, please cancel this preclear instance, upgrade the plugin and start a new one.

Thanks a lot!

Same issue: it freezes about halfway through the zeroing process on my 8TB Red.


Hi

v2017.06.15a preclear got stuck at 26% on an 8TB Seagate Archive drive in the pre-read pass. The elapsed time also got stuck.

At the same time I'm doing a Parity Rebuild.

BR Søren

Edited by SørenBM


Hello. I added a new disk to the array and it showed up under unassigned disks. I selected the preclear option and it just stays stuck on "retrieving information". Please advise.

On 2017-6-16 at 9:30 AM, aptalca said:

Plugin version 2017-06-15a got stuck on pre-read 99% on an 8tb drive

Any webui locks?

 
No, the webui is functional. Everything else works. It's just that one CPU core is pegged at 100% by the dd process and the preclear status is no longer updated (the temp files that hold the status are no longer written to).
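
The symptom described here (status files that stop being written) is something a watchdog could check for by looking at modification times. Below is a minimal sketch, assuming the status files live in a single directory such as the /tmp/.preclear/sdb/ folder mentioned earlier in the thread. The function name and the one-hour threshold in the usage note are hypothetical.

```python
import os
import time


def stale_files(directory, max_age_seconds, now=None):
    """Return names of regular files not modified within `max_age_seconds`."""
    now = time.time() if now is None else now
    stale = []
    for entry in sorted(os.listdir(directory)):
        path = os.path.join(directory, entry)
        if os.path.isfile(path) and now - os.path.getmtime(path) > max_age_seconds:
            stale.append(entry)
    return stale
```

For example, stale_files("/tmp/.preclear/sdb", 3600) would list every status file untouched for more than an hour. A non-empty result while dd is still running would be a hint that the run has stalled, though a slow disk could also trigger it.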


Guys, to start I need a copy of your plugin logs and, if possible, the contents of the /tmp/.preclear directory.

Thanks a lot for your replies.
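
For anyone wanting to grab that information in one go, here is a minimal sketch that bundles a list of paths into a single compressed archive. The function name is made up, and the location of the plugin log is not stated in this thread, so the paths are left for the caller to supply.

```python
import os
import tarfile


def bundle_diagnostics(archive_path, paths):
    """Pack every existing path into a .tar.gz; return the paths that were added."""
    added = []
    with tarfile.open(archive_path, "w:gz") as archive:
        for path in paths:
            if os.path.exists(path):
                # Directories are added recursively; store names relative to "/".
                archive.add(path, arcname=path.lstrip("/"))
                added.append(path)
    return added
```

A call might look like bundle_diagnostics("/boot/preclear-debug.tar.gz", ["/tmp/.preclear"]), where the output location is only an example. Missing paths are skipped rather than treated as errors, so the same call works whether or not a given file exists on a particular system.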


So something weird happened: a full pre-read, zeroing, and post-read completed successfully on an 8TB Red. Nothing changed other than that I made sure not to touch anything else on the unRAID box while it processed. I checked the status at several points in the process and the webui was functional each time. Not sure what changed for me.


Updated preclear and I don't know if this is related or not, but I'm having issues. I'm unable to remove any plugins: the check box does not appear and the "remove" button stays unselectable. At the same time I am unable to select a notification type or trigger when starting a preclear. Tried a reboot. Any advice? Thanks, Andrew

preclear snip.JPG

Plugin snip.JPG

Probably not related to this plugin. Please open the page in another browser to confirm the bug.

Sorry, I did not try another browser. I have never used anything other than Firefox for my unRAID dealings. Everything works with Chrome. This seems to be an issue with Firefox as documented in this post.

 

Edited by allischalmersman


I tried to upgrade to the 2017.06.23a version this morning, but received this error:

 

plugin: updating: preclear.disk.plg

Fatal error: escapeshellarg(): Argument exceeds the allowed length of 4096 bytes in /usr/local/emhttp/plugins/dynamix.plugin.manager/scripts/plugin on line 342

 

I'm running unRAID 6.3.5 and I'm upgrading from preclear 2017.06.21a.

 

Thanks.

Edited by jademonkee
Added version info.
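
The error above says that a string longer than 4096 bytes was handed to PHP's escapeshellarg(). One general way around argument-length limits is to pass large data to a child process on standard input instead of on the command line. The sketch below shows that idea in Python rather than the plugin manager's PHP, so it is an illustration of the technique only, not a patch and not a claim about how the plugin manager should be fixed. The function name and the demo child process are made up.

```python
import subprocess
import sys


def send_via_stdin(command, payload):
    """Run `command`, feeding `payload` on stdin instead of as an argument."""
    result = subprocess.run(
        command, input=payload, capture_output=True, text=True, check=True
    )
    return result.stdout


# Demo: a child process that just reports how much it received on stdin.
child = [sys.executable, "-c", "import sys; print(len(sys.stdin.read()))"]
print(send_via_stdin(child, "x" * 10000).strip())  # 10000
```

Since nothing large ever appears in the argument list, the size of the payload no longer matters to the shell-escaping step.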

