DiskSpeed, hard drive benchmarking (unRAID 6+), version 2.9.2



39 minutes ago, papnikol said:

There is more bandwidth due to the move from PCIe 1.0 to PCIe 2.x. But still, the bandwidth is not balanced across all disks. The bandwidth usage is maximized for each drive until the last 2-3 drives have very little bandwidth available.

I'm seeing the same using the SASLP; it looks like some issue between DiskSpeed and this specific controller. Real-world bandwidth of the SASLP is around 640 MB/s, so just divide that by the number of disks you have connected and that will be the max speed per disk during a check/rebuild.
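As a rough worked example of that division (the 640 MB/s figure is the estimate above; the disk counts and the function name are only illustrations, and an even split across disks is assumed):

```python
# Rough per-disk throughput when a controller's bandwidth is shared evenly.
# 640 MB/s is the real-world SASLP estimate quoted above; disk counts are examples.

def per_disk_speed(controller_mb_s: float, disks: int) -> float:
    """Return the approximate MB/s each disk gets when all are read at once."""
    if disks < 1:
        raise ValueError("need at least one disk")
    return controller_mb_s / disks

for n in (4, 8):
    print(f"{n} disks: ~{per_disk_speed(640, n):.0f} MB/s each")
```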

 


 

 

Link to comment

Updated to today's version, tried to run a benchmark, got this error:

 

Lucee 5.2.6.59 Error (application)
Message	source file [/tmp/DiskSpeedTmp/benchmark_0000_01_00.0.txt] is not a file
Stacktrace	The Error Occurred in
/var/www/Benchmark.cfm: line 117 
115: <CFIF URL.Restart EQ "Y">
116: <CFLOOP index="CR" from="1" to="#BenchCheck.RecordCount#">
117: <CFFILE action="Delete" file="/tmp/DiskSpeedTmp/#BenchCheck.Name[CR]#">
118: </CFLOOP>
119: <CFLOCATION URL="Benchmark.cfm" addtoken="NO">

 

Link to comment
5 hours ago, JustinAiken said:

Updated to today's version, tried to run a benchmark, got this error

 

Is this reproducible? I can't see how this would have happened unless there was a race condition where two processes were deleting the files at the same time.

Link to comment
7 hours ago, johnnie.black said:

I'm seeing the same using the SASLP; it looks like some issue between DiskSpeed and this specific controller. Real-world bandwidth of the SASLP is around 640 MB/s, so just divide that by the number of disks you have connected and that will be the max speed per disk during a check/rebuild.

 

What the app does is start a balls-to-the-wall read of each drive using dd (to null) with the progress and cache-bypass switches, as background tasks running simultaneously. After 15 seconds, it kills the PIDs of the background processes and then analyzes the log files. To the OS, it should look like x number of applications requesting disk data. This is different from unRAID, because unRAID will read one block from each drive, check it, read the next block, check that block, and repeat, so the drives should naturally balance out the reads as unRAID is (likely) self-throttling.
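For anyone who wants to try the same kind of test by hand, here is a minimal Python sketch of that approach. It is not the app's actual code: the function names are made up, and the exact dd options (GNU dd's `iflag=direct` for cache bypass, `status=progress` for progress output, `bs=1M`) are my assumption about what "progress & cache bypass switches" maps to.

```python
import subprocess
import time

def build_dd_command(device: str) -> list[str]:
    """Build a dd command that reads a device to /dev/null, bypassing the page cache.

    iflag=direct and status=progress are GNU dd options; whether DiskSpeed uses
    exactly these is an assumption.
    """
    return ["dd", f"if={device}", "of=/dev/null", "bs=1M",
            "iflag=direct", "status=progress"]

def read_all_drives(devices: list[str], seconds: float = 15.0) -> dict[str, str]:
    """Start one dd per device at the same time, stop them after `seconds`, return logs."""
    procs = {
        dev: subprocess.Popen(build_dd_command(dev),
                              stdout=subprocess.DEVNULL,
                              stderr=subprocess.PIPE, text=True)
        for dev in devices
    }
    time.sleep(seconds)
    logs = {}
    for dev, proc in procs.items():
        proc.terminate()             # stop the background reader
        _, err = proc.communicate()  # dd writes its progress lines to stderr
        logs[dev] = err
    return logs
```

Something like `read_all_drives(["/dev/sdb", "/dev/sdc"])` would need root and real device names; the returned text per device is what you would then parse for throughput.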

 

I'm not sure how I can work around this.

Link to comment
Quote

Is this reproducible? I can't see how this would have happened unless there was a race condition where two processes were deleting the files at the same time.

 

Just tried to "Purge Everything and Start Over" (which seemed to work), then ran the benchmark again... got the same thing:

Lucee 5.2.6.59 Error (application)
Message	source file [/tmp/DiskSpeedTmp/benchmark_0000_01_00.0.txt] is not a file
Stacktrace	The Error Occurred in
/var/www/Benchmark.cfm: line 117 
115: <CFIF URL.Restart EQ "Y">
116: <CFLOOP index="CR" from="1" to="#BenchCheck.RecordCount#">
117: <CFFILE action="Delete" file="/tmp/DiskSpeedTmp/#BenchCheck.Name[CR]#">
118: </CFLOOP>
119: <CFLOCATION URL="Benchmark.cfm" addtoken="NO">

 

Link to comment
48 minutes ago, JustinAiken said:

Just tried to "Purge Everything and Start Over" (which seemed to work), then ran the benchmark again... got the same thing

 

Update your Docker and try the benchmark again and let me know if it worked; you don't need to purge again. I moved the temp files from the exported "DiskSpeed" directory to one inside the Docker container (which gets cleaned up whenever you update the Docker) but missed a path reference. It should only have happened if you had aborted a benchmark prior to updating the Docker app.
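For illustration only, the failure mode is "delete every file named in a listing" where a listed name does not exist at the path being deleted. A defensive version of that loop, sketched in Python rather than the app's CFML (the helper name is hypothetical and this is not the actual fix), skips anything that is already gone:

```python
from pathlib import Path

def purge_benchmark_files(tmp_dir: str, names: list[str]) -> list[str]:
    """Delete the named files under tmp_dir, skipping any that are already gone.

    Returns the names that were actually deleted.
    """
    deleted = []
    for name in names:
        target = Path(tmp_dir) / name
        if target.is_file():
            # missing_ok (Python 3.8+) tolerates a concurrent delete between
            # the check above and the unlink itself.
            target.unlink(missing_ok=True)
            deleted.append(name)
    return deleted
```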

Link to comment

Hey, amazing disk speed plugin. It works perfectly on my test box, but not on my main unRAID box. This is what I get.

 

I first tried it when I had a failed drive, so I assumed it was due to the emulated drive. After replacing the drive I get the same error. I've tried removing the container and removing the appdata folder, but the issue remains.

 

Any ideas?

 

 

DiskSpeed - Disk Diagnostics & Reporting tool
Version: Beta 3a
 

Scanning Hardware
21:11:32 Spinning up hard drives
21:11:32 Scanning storage controllers
21:11:34 Found Controller 82371AB/EB/MB PIIX4 IDE (2 ports)
21:11:34 Found Controller SATA AHCI controller (30 ports)
21:11:34 Found Controller SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] (8 ports)
21:11:34 Found Controller SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] (8 ports)
21:11:34 Found Controller SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] (8 ports)
21:11:34 Found Controller SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] (8 ports)
21:11:34 Scanning controllers for hard drives

Lucee 5.2.6.59 Error (expression)
Message Element at position [9] does not exist in list
Stacktrace The Error Occurred in
/var/www/ScanControllers.cfm: line 646
644: <CFLOOP index="Key" list="#StructKeyList(HW)#">
645: <CFLOOP index="PortNo" from="1" to="#ArrayLen(HW[Key].Ports)#">
646: <CFIF HW[Key].Ports[PortNo].DevicePath NEQ "">
647: <cfexecute name="/bin/ls" arguments="-l #HW[Key].Ports[PortNo].DevicePath#" timeout="300" variable="tmp" />
648: <CFSET dir=ListToArray(tmp,Chr(10))>
 
called from /var/www/ScanControllers.cfm: line 641
639: </CFIF>
640: </CFIF>
641: </CFLOOP>
642: <!--- <cfdump var=#hw#> --->
643: <!--- Fetch drive attributes --->
 
Java Stacktrace lucee.runtime.exp.ExpressionException: Element at position [9] does not exist in list
  at lucee.runtime.type.wrap.ListAsArray.getE(ListAsArray.java:110)
  at lucee.runtime.type.wrap.ListAsArray.get(ListAsArray.java:275)
  at lucee.runtime.type.wrap.ListAsArray.get(ListAsArray.java:280)
  at lucee.runtime.type.util.ArraySupport.get(ArraySupport.java:327)
  at lucee.runtime.util.VariableUtilImpl.get(VariableUtilImpl.java:263)
  at lucee.runtime.util.VariableUtilImpl.getCollection(VariableUtilImpl.java:257)
  at lucee.runtime.PageContextImpl.getCollection(PageContextImpl.java:1447)
  at scancontrollers_cfm$cf.call_000048(/ScanControllers.cfm:646)
  at scancontrollers_cfm$cf.call(/ScanControllers.cfm:641)
  at lucee.runtime.PageContextImpl._doInclude(PageContextImpl.java:939)
  at lucee.runtime.PageContextImpl._doInclude(PageContextImpl.java:833)
  at lucee.runtime.listener.ClassicAppListener._onRequest(ClassicAppListener.java:63)
  at lucee.runtime.listener.MixedAppListener.onRequest(MixedAppListener.java:44)
  at lucee.runtime.PageContextImpl.execute(PageContextImpl.java:2405)
  at lucee.runtime.PageContextImpl._execute(PageContextImpl.java:2395)
  at lucee.runtime.PageContextImpl.executeCFML(PageContextImpl.java:2363)
  at lucee.runtime.engine.Request.exe(Request.java:44)
  at lucee.runtime.engine.CFMLEngineImpl._service(CFMLEngineImpl.java:1091)
  at lucee.runtime.engine.CFMLEngineImpl.serviceCFML(CFMLEngineImpl.java:1039)
  at lucee.loader.engine.CFMLEngineWrapper.serviceCFML(CFMLEngineWrapper.java:102)
  at lucee.loader.servlet.CFMLServlet.service(CFMLServlet.java:51)
  at javax.servlet.http.HttpServlet.service(HttpServlet.java:729)
  at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:292)
  at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:207)
  at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
  at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:240)
  at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:207)
  at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:212)
  at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:94)
  at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:141)
  at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:79)
  at org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:620)
  at org.apache.catalina.valves.RemoteIpValve.invoke(RemoteIpValve.java:676)
  at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:88)
  at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:502)
  at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1132)
  at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:684)
  at org.apache.tomcat.util.net.AprEndpoint$SocketWithOptionsProcessor.run(AprEndpoint.java:2464)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
  at java.lang.Thread.run(Thread.java:748)
 
Timestamp 5/22/18 9:11:39 PM EDT
Link to comment

@jbartlett, Thank you for this tool. I very much appreciate what you are doing here. 

 

Update:  Looks like this is the same error message Simon021 posted.

 

I have downloaded the latest docker version and can't seem to get past the initial hardware scan. This is the error message I received:

 

DiskSpeed - Disk Diagnostics & Reporting tool
Version: Beta 3a

Scanning Hardware
12:01:51 Spinning up hard drives
12:01:51 Scanning storage controllers
12:01:53 Found Controller SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] (8 ports)
12:01:53 Found Controller C602 chipset 4-Port SATA Storage Control Unit (4 ports)
12:01:53 Found Controller C600/X79 series chipset 6-Port SATA AHCI Controller (6 ports)
12:01:53 Scanning controllers for hard drives
Lucee 5.2.6.59 Error (expression)
Message	Element at position [9] does not exist in list
Stacktrace	The Error Occurred in
/var/www/ScanControllers.cfm: line 646 
644: <CFLOOP index="Key" list="#StructKeyList(HW)#">
645: <CFLOOP index="PortNo" from="1" to="#ArrayLen(HW[Key].Ports)#">
646: <CFIF HW[Key].Ports[PortNo].DevicePath NEQ "">
647: <cfexecute name="/bin/ls" arguments="-l #HW[Key].Ports[PortNo].DevicePath#" timeout="300" variable="tmp" />
648: <CFSET dir=ListToArray(tmp,Chr(10))>

There was a stack trace after this section. I've attached a txt file with the full output. 

 

I'm concerned because the message seems to indicate that there is a disk in position 9 that does not exist. If that is the case, that may explain some problems I've been having. If you could tell me if this message means that my Unraid server thinks it has a HD that is not really there, it might be the key to figuring out what's going on with my server.

 

Thank you!

Kamhighway

DiskSpeed.txt

Link to comment
  • 2 weeks later...
On 6/7/2018 at 8:36 PM, paradigmevo said:

Here is what my docker displays when started.


Is this repeatable? Can you verify that Privileged is enabled in the Docker settings? If it is repeatable and Privileged is enabled, alter this URL to match the IP and port of your unRAID system and open it. Create a debug file through this URL and email it to hddb@strangejourney.net

 

http://[ip]:[port]/isolated/CreateDebugInfo.cfm

Link to comment

Sorry for the lack of updates for the past month, real life intervened.

 

Beta 4 has been pushed. Those who couldn't even get past the controller/drive scanning should be able to now. For those whose SAS controllers aren't being detected, I'm still investigating, but drives attached to them should fall into the "Unknown Controller" section so you can still benchmark them.

 

Fixed drive/controller scanning issue when scanning up the bus tree and not finding a drive
Do not display the "Unknown Controller" if only the UNRAID Flash drive is under it
Do not benchmark the UNRAID Flash drive
Mitigated a race condition where multiple threads might try to write to the same file at the same time during a benchmark
Flag the UNRAID Flash drive as nonsubmittable to the HDDB
Flag drives with invalid characters in their vendor/model/rev as nonsubmittable to the HDDB
Group NVMe drives together after non-NVMe drives

 

Link to comment

Beta 4a released.

 

This version adds support to display a red box around drives that are currently showing activity. This utilizes the data from /var/local/emhttp/diskload.ini which seems to update every 5 seconds so the page checks it every 5 minutes.

 

The red box doesn't display properly (it's too wide) around drives that have a rotated text layer; I'll address that later.

 

I may opt to utilize the collectl command, which can update more frequently and would allow support for installation on non-UNRAID boxes, but its apt-get install routine asks for input in a dialog that I'll need to figure out a way around.

Link to comment

I also get this error on 1 of my 2 servers:

 

DiskSpeed - Disk Diagnostics & Reporting tool
Version: Beta 4
 

Scanning Hardware
19:47:47 Spinning up hard drives
19:47:47 Scanning storage controllers
19:49:26 Found Controller SAS3008 PCI-Express Fusion-MPT SAS-3 (8 ports)
19:49:26 Found Controller SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] (8 ports)
19:49:26 Found Controller Sunrise Point-H SATA controller [AHCI mode] (8 ports)
19:49:26 Scanning controllers for hard drives

Lucee 5.2.7.62 Error (expression)
Message Element at position [18] does not exist in list
Stacktrace The Error Occurred in
/var/www/ScanControllers.cfm: line 650
648: <CFLOOP index="Key" list="#StructKeyList(HW)#">
649: <CFLOOP index="PortNo" from="1" to="#ArrayLen(HW[Key].Ports)#">
650: <CFIF HW[Key].Ports[PortNo].DevicePath NEQ "">
651: <cfexecute name="/bin/ls" arguments="-l #HW[Key].Ports[PortNo].DevicePath#" timeout="300" variable="tmp" />
652: <CFSET dir=ListToArray(tmp,Chr(10))>
 
called from /var/www/ScanControllers.cfm: line 645
643: </CFIF>
644: </CFIF>
645: </CFLOOP>
646: <!--- <cfdump var=#hw#> --->
647: <!--- Fetch drive attributes --->
 
Java Stacktrace lucee.runtime.exp.ExpressionException: Element at position [18] does not exist in list
  at lucee.runtime.type.wrap.ListAsArray.getE(ListAsArray.java:118)
  at lucee.runtime.type.wrap.ListAsArray.get(ListAsArray.java:284)
  at lucee.runtime.type.wrap.ListAsArray.get(ListAsArray.java:289)
  at lucee.runtime.type.util.ArraySupport.get(ArraySupport.java:326)
  at lucee.runtime.util.VariableUtilImpl.get(VariableUtilImpl.java:263)
  at lucee.runtime.util.VariableUtilImpl.getCollection(VariableUtilImpl.java:257)
  at lucee.runtime.PageContextImpl.getCollection(PageContextImpl.java:1496)
  at scancontrollers_cfm$cf.call_000048(/ScanControllers.cfm:650)
  at scancontrollers_cfm$cf.call(/ScanControllers.cfm:645)
  at lucee.runtime.PageContextImpl._doInclude(PageContextImpl.java:933)
  at lucee.runtime.PageContextImpl._doInclude(PageContextImpl.java:823)
  at lucee.runtime.listener.ClassicAppListener._onRequest(ClassicAppListener.java:64)
  at lucee.runtime.listener.MixedAppListener.onRequest(MixedAppListener.java:45)
  at lucee.runtime.PageContextImpl.execute(PageContextImpl.java:2464)
  at lucee.runtime.PageContextImpl._execute(PageContextImpl.java:2454)
  at lucee.runtime.PageContextImpl.executeCFML(PageContextImpl.java:2427)
  at lucee.runtime.engine.Request.exe(Request.java:44)
  at lucee.runtime.engine.CFMLEngineImpl._service(CFMLEngineImpl.java:1091)
  at lucee.runtime.engine.CFMLEngineImpl.serviceCFML(CFMLEngineImpl.java:1039)
  at lucee.loader.engine.CFMLEngineWrapper.serviceCFML(CFMLEngineWrapper.java:102)
  at lucee.loader.servlet.CFMLServlet.service(CFMLServlet.java:51)
  at javax.servlet.http.HttpServlet.service(HttpServlet.java:729)
  at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:292)
  at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:207)
  at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
  at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:240)
  at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:207)
  at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:212)
  at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:94)
  at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:496)
  at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:141)
  at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:79)
  at org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:620)
  at org.apache.catalina.valves.RemoteIpValve.invoke(RemoteIpValve.java:676)
  at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:88)
  at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:502)
  at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1132)
  at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:684)
  at org.apache.tomcat.util.net.AprEndpoint$SocketWithOptionsProcessor.run(AprEndpoint.java:2464)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
  at java.lang.Thread.run(Thread.java:748)
 
Timestamp 6/17/18 7:49:32 PM CEST

 

Both are almost the same hardware: 

- MB: supermicro x11ssh-ctf

- 8 drives through MB LSI SAS3008

 

The working server has (24 array drives )

- a M1015 with 8 drives

- 8 drives on the MB sata ports

 

The non working server has (20 array drives + 2)

- a M1015 and intel expander with 12 drives (4 slots not used, 16 max) (says Found Controller SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] (8 ports), should that not be 16 ports??)

- 2,  non array drives, that are connected to the MB sata ports. They are also not mounted.

 

I just mounted those 2, non array drives, but I still get the error.

 

I stopped the docker, started it again, and still get the error.

 

Privileged is ON

 

I can't seem to get a debug log, this doesn't work:

http://172.17.0.2:8888/isolated/CreateDebugInfo.cfm

Never mind, this does work: http://192.168.157.78:18888/isolated/CreateDebugInfo.cfm

 

I'll mail you the file...

 

Link to comment
6 hours ago, Wimpie said:

The non working server has (20 array drives + 2)

- a M1015 and intel expander with 12 drives (4 slots not used, 16 max) (says Found Controller SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] (8 ports), should that not be 16 ports??)

- 2,  non array drives, that are connected to the MB sata ports. They are also not mounted.

 

It seems quite a few people who are getting errors with the recent version are running expanders. I couldn't make much headway with one debug file I looked at. I'll take a look at yours. The program logic analyzes the file system to identify controllers, ports, the drives attached to those ports, and the PCI slots they're plugged into for nesting identification. I haven't found any program out there that reliably gives me this information, so I have to study the file system and figure it out manually. In your case, it's likely that there are two sets of 8 ports and my program was only able to understand one of them. I don't have any expanders on hand to run tests on and look at the file system. The debug file gives me a file tree of the devices and text files that contain certain keywords, but I may need to expand the files logged.

 

Link to comment
17 minutes ago, jbartlett said:

I don't have any expanders on hand to run tests on and look at the file system. The debug file gives me a file tree of the devices and text files that contain certain keywords, but I may need to expand the files logged.

 

When you have a new version with 'extra files logged', I'll send you a new debug log...

 

 

PS: Thanks for this docker, I like it very much.

 

Link to comment
7 hours ago, Wimpie said:

When you have a new version with 'extra files logged', I'll send you a new debug log...

 

PS: Thanks for this docker, I like it very much.

 

I just pushed beta 4b which modifies the Debug Creation tool to include all files under 100 bytes with read permissions under /sys/devices/pci*
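If you want to see roughly what that collects on your own box, here is a hedged Python sketch (not the Debug Creation tool itself; the function name and the read-then-measure approach are my own). sysfs files usually report a nominal size via stat, so the sketch reads up to the limit and checks how much came back instead of trusting the reported size.

```python
import glob
import os

def small_readable_files(root_glob: str = "/sys/devices/pci*",
                         max_bytes: int = 100) -> dict[str, str]:
    """Collect the contents of readable regular files smaller than max_bytes."""
    found = {}
    for root in glob.glob(root_glob):
        # os.walk does not follow directory symlinks by default, which avoids
        # the symlink loops that are common in sysfs.
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                path = os.path.join(dirpath, name)
                if not os.path.isfile(path) or not os.access(path, os.R_OK):
                    continue
                try:
                    with open(path, "rb") as fh:
                        data = fh.read(max_bytes)
                except OSError:
                    continue  # some sysfs attributes fail when read
                if len(data) < max_bytes:
                    found[path] = data.decode("utf-8", errors="replace").strip()
    return found
```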

Link to comment
2 hours ago, jbartlett said:

 

I just pushed beta 4b which modifies the Debug Creation tool to include all files under 100 bytes with read permissions under /sys/devices/pci*

 

A new debug file has been created and sent to you.

 

Hope you find the problem...

 

Link to comment

I think I'll be reworking how the text div overlay rotates. It's currently a free rotate, but the issue is that it maintains the same shape while rotated. That's why the red box is so wide on rotated text layers: it has the height of the original frame but now also the width of the height of the frame rotated 90 degrees. It also has the issue of not being able to put the text at the top of the drive when rotated, because the text layer is not as wide as the height is. I'll ponder changing it so that you can rotate 90 degrees left/right or 180 (upside down), and change the height/width when it is rotated 90 degrees so it stays the same height and width as the drive image.

 

I've been studying the /sys tree for PCI buses, controllers, and drives, and I'm going to write a test script to see if I can work at it the other way around. Right now, the program tries to identify the controllers, then figure out the ports on those controllers, then the drives attached to those ports. I can get a list of all attached drives from /sys/block and work up the system tree to find what port each is connected to and then which controller - and from there see which PCI slot that controller is plugged into or if it's part of another device.
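A minimal sketch of that drive-first direction, in Python and purely as an illustration (the function names and the regular expression are mine, not the test script mentioned above). It assumes each entry in /sys/block resolves to a path under /sys/devices whose components include the PCI addresses (domain:bus:device.function) of every bridge and controller above the drive.

```python
import os
import re

# PCI addresses in sysfs paths look like 0000:00:1f.2 (domain:bus:device.function).
PCI_ADDR = re.compile(r"^[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]$")

def pci_chain(resolved_path: str) -> list[str]:
    """Return the PCI addresses found in a resolved sysfs path, outermost first.

    The last entry is the controller closest to the drive; earlier entries are
    the bridges or slots it sits behind.
    """
    return [part for part in resolved_path.split("/") if PCI_ADDR.match(part)]

def drives_by_controller(sys_block: str = "/sys/block") -> dict[str, list[str]]:
    """Group block devices by the PCI address of their nearest controller."""
    groups: dict[str, list[str]] = {}
    for dev in sorted(os.listdir(sys_block)):
        chain = pci_chain(os.path.realpath(os.path.join(sys_block, dev)))
        if chain:  # virtual devices such as loop or md have no PCI parent
            groups.setdefault(chain[-1], []).append(dev)
    return groups
```

Drives behind an expander would still resolve to the HBA's PCI address this way, so they would group under the controller even if the port count is not known up front.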

Link to comment