"Almost" stuck on Parity Check-in


xtrips


Hello,

This is my second run of the parity check since yesterday. I aborted the previous one and rebooted.
When it reaches about 70% it slows to a crawl, as you can see in the screenshot.
Note that during the first run Disk 1 reported some errors; since the reboot they haven't shown up yet.
I stopped the Docker engine to ease the load on the server.
What can I do now?

I don't know if it is a coincidence, but 2 days ago I upgraded to 6.8.0-rc1.

[Screenshot attached: Untitled.jpg]


You should post your system diagnostics zip file (obtained via Tools->Diagnostics) to see if we can spot any reason for the slowdown.
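If you prefer the console, Unraid also has a diagnostics command that produces the same zip (a sketch from memory; I believe it writes into the logs folder on the flash drive, but check where your version puts it):

diagnostics
ls /boot/logs/        # the generated zip should land here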

 

If the diagnostics show no obvious reason, you can also consider using the DiskSpeed Docker container to check whether any of your drives have ‘slow’ spots that might cause such symptoms.
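If you want a quick manual spot check before that, hdparm can time a sequential read on a single drive (a rough sketch, not what DiskSpeed itself does; replace /dev/sdX with the device of a suspect drive and run it while the array is otherwise idle):

hdparm -t /dev/sdX    # buffered sequential read timing; repeat a few times and average

A drive that is developing bad areas will often read noticeably slower than its healthy siblings.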


DiskSpeed - Disk Diagnostics & Reporting tool
Version: 2.3
 

Scanning Hardware
10:21:19 Spinning up hard drives
10:21:19 Scanning system storage

Lucee 5.2.9.31 Error (application)

Message: timeout [90000 ms] expired while executing [/usr/sbin/hwinfo --pci --bridge --storage-ctrl --disk --ide --scsi]

Stacktrace: The error occurred in
/var/www/ScanControllers.cfm: line 243

241: <CFOUTPUT>#TS()# Scanning system storage<br></CFOUTPUT><CFFLUSH>
242: <CFFILE action="write" file="#PersistDir#/hwinfo_storage_exec.txt" output=" /usr/sbin/hwinfo --pci --bridge --storage-ctrl --disk --ide --scsi" addnewline="NO" mode="666">
243: <cfexecute name="/usr/sbin/hwinfo" arguments="--pci --bridge --storage-ctrl --disk --ide --scsi" variable="storage" timeout="90" /><!--- --usb-ctrl --usb --hub --->
244: <CFFILE action="delete" file="#PersistDir#/hwinfo_storage_exec.txt">
245: <CFFILE action="write" file="#PersistDir#/hwinfo_storage.txt" output="#storage#" addnewline="NO" mode="666">
 

called from /var/www/ScanControllers.cfm: line 242

240:
241: <CFOUTPUT>#TS()# Scanning system storage<br></CFOUTPUT><CFFLUSH>
242: <CFFILE action="write" file="#PersistDir#/hwinfo_storage_exec.txt" output=" /usr/sbin/hwinfo --pci --bridge --storage-ctrl --disk --ide --scsi" addnewline="NO" mode="666">
243: <cfexecute name="/usr/sbin/hwinfo" arguments="--pci --bridge --storage-ctrl --disk --ide --scsi" variable="storage" timeout="90" /><!--- --usb-ctrl --usb --hub --->
244: <CFFILE action="delete" file="#PersistDir#/hwinfo_storage_exec.txt">
 

Java Stacktrace: lucee.runtime.exp.ApplicationException: timeout [90000 ms] expired while executing [/usr/sbin/hwinfo --pci --bridge --storage-ctrl --disk --ide --scsi]
  at lucee.runtime.tag.Execute._execute(Execute.java:241)
  at lucee.runtime.tag.Execute.doEndTag(Execute.java:252)
  at scancontrollers_cfm$cf.call_000006(/ScanControllers.cfm:243)
  at scancontrollers_cfm$cf.call(/ScanControllers.cfm:242)
  at lucee.runtime.PageContextImpl._doInclude(PageContextImpl.java:933)
  at lucee.runtime.PageContextImpl._doInclude(PageContextImpl.java:823)
  at lucee.runtime.listener.ClassicAppListener._onRequest(ClassicAppListener.java:66)
  at lucee.runtime.listener.MixedAppListener.onRequest(MixedAppListener.java:45)
  at lucee.runtime.PageContextImpl.execute(PageContextImpl.java:2464)
  at lucee.runtime.PageContextImpl._execute(PageContextImpl.java:2454)
  at lucee.runtime.PageContextImpl.executeCFML(PageContextImpl.java:2427)
  at lucee.runtime.engine.Request.exe(Request.java:44)
  at lucee.runtime.engine.CFMLEngineImpl._service(CFMLEngineImpl.java:1090)
  at lucee.runtime.engine.CFMLEngineImpl.serviceCFML(CFMLEngineImpl.java:1038)
  at lucee.loader.engine.CFMLEngineWrapper.serviceCFML(CFMLEngineWrapper.java:102)
  at lucee.loader.servlet.CFMLServlet.service(CFMLServlet.java:51)
  at javax.servlet.http.HttpServlet.service(HttpServlet.java:729)
  at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:292)
  at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:207)
  at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
  at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:240)
  at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:207)
  at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:212)
  at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:94)
  at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:492)
  at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:141)
  at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:80)
  at org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:620)
  at org.apache.catalina.valves.RemoteIpValve.invoke(RemoteIpValve.java:684)
  at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:88)
  at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:502)
  at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1152)
  at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:684)
  at org.apache.tomcat.util.net.AprEndpoint$SocketWithOptionsProcessor.run(AprEndpoint.java:2464)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
  at java.lang.Thread.run(Thread.java:748)
 

Timestamp: 10/16/19 10:22:49 AM IDT


Not sure what can cause that ☹️. I just checked, and that container is working fine on my 6.8.0-rc1 system. I wonder if it is related to your current issue in any way; the fact that it mentions a timeout suggests it could be. It might be worth making a post in the support thread for that container to see if the maintainer has any ideas.
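One thing that might be worth trying in the meantime: run the exact hwinfo command from the error message by hand and see whether it hangs. Doing it inside the container via docker exec avoids assuming hwinfo is installed on the Unraid host (the container name DiskSpeed is my guess; substitute whatever yours is called):

docker exec -it DiskSpeed /usr/sbin/hwinfo --pci --bridge --storage-ctrl --disk --ide --scsi    # same flags as in the error

If that also sits there well past the 90-second mark, the hardware scan itself is what is slow, not the container.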

 

BTW:  I looked at the diagnostics and did not spot any obvious problem.

1 minute ago, itimpi said:

I just checked, and that container is working fine on my 6.8.0-rc1 system. [...] It might be worth making a post in the support thread for that container to see if the maintainer has any ideas.

What container?


I aborted the parity check for the second time now. It was not worth continuing.
DiskSpeed is running now. I guess it crashed before because it couldn't run alongside the parity check.

Disk 5 has shown 1008 errors so far,
and Disk 1 showed around 70 errors yesterday.
I won't hold my breath for DiskSpeed; I suppose the diagnosis is bad.
What is the recommended procedure for me now?
I suppose those 2 HDDs are nearly dead.
I can still access the files on each of those disks.
I was thinking:
- stop parity completely
- remove Disks 1 and 5
- replace Disks 1 and 5 with new drives
- connect old Disk 1 and then old Disk 5 as external drives and mount them
- copy old 1/5 to new 1/5 (see the sketch below)
- after checking that all is fine, re-enable parity and run a parity check.
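For the copy step I had something like this in mind (a sketch only; /dev/sdX1 and the paths are placeholders, and the old disk is mounted read-only so nothing can write to it):

mkdir -p /mnt/old1
mount -o ro /dev/sdX1 /mnt/old1                  # old Disk 1, attached externally
rsync -avh --progress /mnt/old1/ /mnt/disk1/     # copy everything onto the new Disk 1
umount /mnt/old1

(I understand the Unassigned Devices plugin can also handle the mounting from the GUI.)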

 

What do you think?

9 minutes ago, johnnie.black said:

Also not looking good. Your plan sounds good to me; alternatively, you can use ddrescue to clone the failing disks and recover as much data as possible. If there are read errors, some files will likely be corrupt, but you can identify which ones.

Thanks for checking.
I've never used ddrescue. Would it be any different from using WinSCP from Windows and viewing the source and target drives?
Using the command line makes me nervous, but I will do it "carefully" if there is a serious advantage.

  • 2 weeks later...
On 10/16/2019 at 6:36 PM, johnnie.black said:

[...] Alternatively, you can use ddrescue to clone the failing disks and recover as much data as possible. [...]

Hello,

Did you ever use ddrescue? I am asking because I find it odd that I am almost done cloning the second and last of the HDDs that raised so many errors during the parity check, yet as ddrescue sources they haven't raised even one error or bad sector.
What does this mean?

Yes, many times. Sometimes disk read errors are intermittent; if there are no read errors with ddrescue, all the data will be OK.
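For reference, a typical two-pass run looks like this (a sketch; /dev/sdX is the failing source, /dev/sdY the replacement, and the mapfile records which sectors could not be read so you can resume or inspect later):

ddrescue -f -n /dev/sdX /dev/sdY /root/disk.map       # pass 1: grab the easy data, skip scraping
ddrescue -f -d -r3 /dev/sdX /dev/sdY /root/disk.map   # pass 2: retry the bad areas 3 times, direct access

Double-check that the destination really is the new, empty disk; ddrescue will happily overwrite whatever you point it at.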
 
 

I got that; I didn't make myself clear enough. I meant: what does it mean for that source disk? Is it failing or is it not? What tool do you suggest I use to make that clear once and for all?


