DiskSpeed, hard drive benchmarking (unRAID 6+), version 2.9.2


Recommended Posts

11 hours ago, MMW said:

On its way

 

 

Thank you. Your spinners have the expected "port_no" files for reference; I suspect the app tried to identify port numbers for your multiple USB drives. I added logic to exclude USB controllers when sorting the ports.

 

Beta 5a

  • Do not attempt to identify & sort by Port No for USB Controllers
  • Do not list or benchmark USB drives when benchmarking all drives
Link to comment
3 minutes ago, MMW said:

Hi John,

 

Ran the update but I still get the same error message.

 

Any suggestions?

 

It worked before the last beta update.

 

The whole drive & controller detection logic was re-written from scratch, so there are new gotchas involved. Previously, the program identified the controllers, then the ports attached to them, and lastly the drives attached to the ports. This proved troublesome with server-grade setups with multiple controllers, backplanes, etc., which were really hard to get working since I didn't have access to such hardware to figure out how they're represented in the /sys/devices tree.

 

Now it detects the drives working the other way around: it finds the drives, then the controller each one is attached to, and then the drive's port.

 

I made some modifications to the port detection & assignment and added a whole metric crapton of logging to trace through the entire process. Please update to Beta 5b and rerun the scan. If you get another error, please recreate the debug file and email it.

Link to comment

Thanks for your work on this. More and better diagnostics are just what unRAID needs.

 

I just ran DiskSpeed and got the following result. I have three WD30EZRX drives. Two of them (disk2 and disk4) behave normally and virtually identically, but disk6 is weird. What could cause the massive dip at 2.5 TB? Have you seen this behavior before? Should I replace it?

chart.jpeg

Link to comment
59 minutes ago, CaptainTivo said:

What could cause the massive dip at 2.5 TB?  Have you seen this behavior before?  Should I replace it?

See if the result is repeatable. If something was attempting to access the drive at the exact same time the benchmark ran, it could cause that kind of result.

Link to comment
10 hours ago, jonathanm said:

See if the result is repeatable. If something was attempting to access the drive at the exact same time the benchmark ran, it could cause that kind of result.

Unfortunately, it seems very repeatable.  Also, I had to disable the SpeedGap detection to get it to finish at all.

Checked the SMART values and these stand out:

 

9    Power on hours    0x0032    051    051    000    Old age    Always    Never    35911 (4y, 1m, 4d, 7h)
197    Current pending sector    0x0032    200    200    000    Old age    Always    Never    2

 

Drive was put into service May 2014.  Could the speed drop be explained by the current pending sector count?

chart (2).jpeg

Link to comment
6 hours ago, CaptainTivo said:

Could the speed drop be explained by the current pending sector count?

Not always directly correlated, but since the drive is showing current pending, and you are seeing the drastic drop in performance, I'd say it's time to retire the drive from array duty.

 

Both together give me a very bad feeling about the drive. It's bad enough having one known-iffy drive; if a drive you had no warning about suddenly goes bad too, you have to cross your fingers and hope the iffy drive survives long enough to rebuild the failed one. I would rather replace proactively than limp along and hope.

Link to comment

The SpeedGap logic detects drive activity by comparing the min & max read speeds. If the difference exceeds the threshold, it increases the threshold a little bit and retests. If you were getting wildly varying results, that would explain why you had to disable it to even finish the drive. It looks like a LOT of the drive's surface is giving varying read results.
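A minimal sketch of that adaptive retry loop (my assumptions, not the actual DiskSpeed source): `read_spot` is a hypothetical callable that samples read speeds in MB/s at one spot, and each retry loosens the allowed gap by a fixed step, mirroring the "max allowed is 45 MB", "...50 MB" log lines further down.

```python
def measure_with_speedgap(read_spot, base_gap_mb=45.0, gap_step_mb=5.0, max_retries=10):
    """Re-sample one spot until the min/max read-speed spread falls under an
    adaptive threshold; each retry loosens the threshold a little.
    `read_spot` is a hypothetical callable returning a list of MB/s samples."""
    gap = base_gap_mb
    for _ in range(max_retries):
        samples = read_spot()
        spread = max(samples) - min(samples)
        if spread <= gap:
            # Spot is stable enough: report avg/min/max like the log lines do
            return sum(samples) / len(samples), min(samples), max(samples)
        gap += gap_step_mb  # loosen the allowed gap and retry this spot
    return None  # still too noisy; caller may have to disable SpeedGap
```

With a drive as noisy as the one in the log below, the spread keeps beating the loosened threshold, which is why the retries pile up spot after spot.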

 

Kick off a benchmark from the main screen and select the drive in question plus another drive on the same controller, but do not disable the SpeedGap logic. When the benchmark starts, you'll see the following under the graphs: "Click on a drive label to hide or show it." The period at the end is a hidden mouse-click trigger that unhides the iframes running the tests. It'll let you see what the SpeedGap detection is finding.

Link to comment

My next project with DiskSpeed will be handy here: do a full read of the entire drive and graph, on a heat map, how much data each section reads in one second.

 

For example, I start a balls-to-the-wall read of the drive using dd with its progress tracking being logged. It reports how many bytes it read each second along with the read speed. The sectors those blocks cover will hold the read speed for that second; then repeat for the next second, and so on. I'll likely do an "overhead" scan first that reads a couple seconds' worth at big gaps, like every GB or 100 GB, then go back through and fill it in.
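As a rough illustration of the idea (my sketch, not DiskSpeed code): dd's per-second progress lines with `status=progress` look roughly like `1234567890 bytes (1.2 GB, 1.1 GiB) copied, 8 s, 154 MB/s`, and each sample can be mapped onto the sector range it covered so the heat map knows which sectors get that second's speed. The exact line format varies by dd version, so the regex here is an assumption.

```python
import re

# Hypothetical parser for dd "status=progress" lines, e.g.:
#   "157286400 bytes (157 MB, 150 MiB) copied, 1 s, 157 MB/s"
PROGRESS_RE = re.compile(r"(\d+) bytes .*copied, (\d+(?:\.\d+)?) s, (\d+(?:\.\d+)?) MB/s")

def parse_progress(lines, sector_size=512):
    """Return (start_sector, end_sector, mb_per_s) for each progress sample,
    so each second's read speed can be painted over the sectors it covered."""
    prev_bytes = 0
    spans = []
    for line in lines:
        m = PROGRESS_RE.search(line)
        if not m:
            continue
        total_bytes, speed = int(m.group(1)), float(m.group(3))
        spans.append((prev_bytes // sector_size, total_bytes // sector_size, speed))
        prev_bytes = total_bytes
    return spans
```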

 

I'm also trying to programmatically determine how many platters & read heads a drive has. That one has been very elusive. I've tried about five different methods to get timings and have one more left to try before calling it quits and crowd-sourcing the data. The last method is to read the first two sectors of the drive, then move forward one sector and read two, and continue until I've stepped over 20 sectors, reading two sectors at each spot. Each two-sector read will be repeated 200 times and the read speed averaged. The thought is that when both sectors are on the same track, they'll give nearly identical speeds, but when the pair spans a track boundary, requiring the drive head to move, there will be a slight bump. A single-platter drive may reveal a spike when reading sectors 1 & 2 (first sector = 0), a dual-platter drive with four heads would show a spike at sectors 3 & 4, and a dual-platter drive with three heads would show a spike at sectors 2 & 3. *crosses fingers*
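The stepping scheme could be sketched like this (my assumption of the method, not the actual implementation; in practice the path would be a raw device such as /dev/sdX and the drive's cache would have to be defeated for the timings to mean anything):

```python
import os
import statistics
import time

SECTOR = 512

def two_sector_timings(path, step_count=20, repeats=200):
    """At each of `step_count` successive one-sector offsets, time `repeats`
    two-sector reads and average them. A spike in the average at a given spot
    would suggest the pair straddles a track boundary (head movement)."""
    fd = os.open(path, os.O_RDONLY)
    timings = []
    try:
        for spot in range(step_count):
            offset = spot * SECTOR
            samples = []
            for _ in range(repeats):
                start = time.perf_counter()
                os.pread(fd, 2 * SECTOR, offset)  # read two sectors at this spot
                samples.append(time.perf_counter() - start)
            timings.append(statistics.mean(samples))
    finally:
        os.close(fd)
    return timings
```

The index of the slowest spot would then hint at where the track boundary falls, per the sector-1&2 / 3&4 / 2&3 reasoning above.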

 

Hard drives read all heads on a given track in sequence, so if you read 4K of data with 512-byte sectors, the drive will read track 0, heads 0-3 in sequence to give you the first 2K of data, then move the head assembly to track 1 and read heads 0-3 again for the next 2K.

Link to comment
7 hours ago, jonathanm said:

Not always directly correlated, but since the drive is showing current pending sectors and you are seeing the drastic drop in performance, I'd say it's time to retire the drive from array duty.

 

Both together give me a very bad feeling about the drive. It's bad enough having one known-iffy drive; if a drive you had no warning about suddenly goes bad too, you have to cross your fingers and hope the iffy drive survives long enough to rebuild the failed one. I would rather replace proactively than limp along and hope.

 

Hmm.  Here is what I found about "Current Pending Sector Count" (damn, I just heard that in Siri's voice :-)

"Count of "unstable" sectors (waiting to be remapped, because of unrecoverable read errors)."

If the reads of these sectors have unrecoverable errors, haven't I lost data already?

Or is it that the read error triggers a seek to a different track, and that would explain the sudden drop in read data rate?

 

Anyway, I decided to remove the drive.  I just installed a new 8TB drive shucked from a WD My Book, so I have lots of extra space.  I just wish there were a less nerve-racking way to move the data from disk to disk.  I use rsync, but I am never completely convinced that every single bit is copied.

 

7 hours ago, jbartlett said:

The SpeedGap logic detects drive activity by comparing the min & max read speeds. If the difference exceeds the threshold, it increases the threshold a little bit and retests. If you were getting wildly varying results, that would explain why you had to disable it to even finish the drive. It looks like a LOT of the drive's surface is giving varying read results.

 

Kick off a benchmark from the main screen and select the drive in question plus another drive on the same controller, but do not disable the SpeedGap logic. When the benchmark starts, you'll see the following under the graphs: "Click on a drive label to hide or show it." The period at the end is a hidden mouse-click trigger that unhides the iframes running the tests. It'll let you see what the SpeedGap detection is finding.

 

I did that and here is what it spit out.  BTW, I kept looking for a way to abort the test and finally started clicking around on the scroll bars of this tiny little window:

image.png.8414487f55bf7f716842233620854d30.png

 

And it finally showed a sliver of the Abort button.  Maybe you could enlarge the window a bit so that the Abort button can be seen?

 

 

Here is the test output:

06:45:20 Spot: [100] ScanLoc: [2288211] SizeLoc: [3 TB] SkipLines: [1] Avg: [75.86 MB] AvgMin: [72.09 MB] AvgMax: [78.64 MB] 
06:45:39 Spinning up sde (3TB)
06:45:39 Performing random seek tests
06:45:49 Performing sequential seek tests
06:46:01 Performing drive latency tests
Random Seek: 568
SequentialSeek: 5869
DriveLatency: 1145113
06:46:11 Spot: [0] ScanLoc: [0] SizeLoc: [0 GB] Avg: [151.24 MB] AvgMin: [141.56 MB] AvgMax: [159.91 MB] 
06:46:28 Spot: [10] ScanLoc: [228927] SizeLoc: [300 GB] Avg: [147.10 MB] AvgMin: [141.56 MB] AvgMax: [152.04 MB] 
06:46:45 Spot: [20] ScanLoc: [457854] SizeLoc: [600 GB] Avg: [142.57 MB] AvgMin: [136.31 MB] AvgMax: [149.42 MB] 
06:47:02 Spot: [30] ScanLoc: [686781] SizeLoc: [900 GB] Avg: [139.44 MB] AvgMin: [135.00 MB] AvgMax: [145.49 MB] 
06:47:19 Spot: [40] ScanLoc: [915708] SizeLoc: [1.2 TB] Avg: [129.76 MB] AvgMin: [120.59 MB] AvgMax: [138.94 MB] 
06:47:36 Spot: [50] ScanLoc: [1144635] SizeLoc: [1.5 TB] Speed Gap of 117.96 MB (max allowed is 45 MB), retrying
06:47:53 Spot: [50] ScanLoc: [1144635] SizeLoc: [1.5 TB] Speed Gap of 61.60 MB (max allowed is 50 MB), retrying
06:48:10 Spot: [50] ScanLoc: [1144635] SizeLoc: [1.5 TB] Speed Gap of 95.68 MB (max allowed is 55 MB), retrying
06:48:27 Spot: [50] ScanLoc: [1144635] SizeLoc: [1.5 TB] Speed Gap of 79.95 MB (max allowed is 60 MB), retrying
06:48:44 Spot: [50] ScanLoc: [1144635] SizeLoc: [1.5 TB] Avg: [116.25 MB] AvgMin: [69.47 MB] AvgMax: [131.07 MB] 
06:49:01 Spot: [60] ScanLoc: [1373562] SizeLoc: [1.8 TB] Speed Gap of 116.65 MB (max allowed is 45 MB), retrying
06:49:18 Spot: [60] ScanLoc: [1373562] SizeLoc: [1.8 TB] Speed Gap of 117.96 MB (max allowed is 50 MB), retrying
06:49:35 Spot: [60] ScanLoc: [1373562] SizeLoc: [1.8 TB] Speed Gap of 117.96 MB (max allowed is 55 MB), retrying
06:49:52 Spot: [60] ScanLoc: [1373562] SizeLoc: [1.8 TB] Speed Gap of 112.72 MB (max allowed is 60 MB), retrying
06:50:09 Spot: [60] ScanLoc: [1373562] SizeLoc: [1.8 TB] Speed Gap of 117.96 MB (max allowed is 65 MB), retrying
06:50:26 Spot: [60] ScanLoc: [1373562] SizeLoc: [1.8 TB] Speed Gap of 106.17 MB (max allowed is 70 MB), retrying
06:50:43 Spot: [60] ScanLoc: [1373562] SizeLoc: [1.8 TB] Speed Gap of 110.10 MB (max allowed is 75 MB), retrying
06:51:00 Spot: [60] ScanLoc: [1373562] SizeLoc: [1.8 TB] Speed Gap of 82.58 MB (max allowed is 80 MB), retrying
06:51:17 Spot: [60] ScanLoc: [1373562] SizeLoc: [1.8 TB] Avg: [101.33 MB] AvgMin: [52.43 MB] AvgMax: [123.21 MB] 
06:51:34 Spot: [70] ScanLoc: [1602489] SizeLoc: [2.1 TB] Speed Gap of 98.30 MB (max allowed is 45 MB), retrying
06:51:51 Spot: [70] ScanLoc: [1602489] SizeLoc: [2.1 TB] Speed Gap of 108.79 MB (max allowed is 50 MB), retrying
06:52:08 Spot: [70] ScanLoc: [1602489] SizeLoc: [2.1 TB] Speed Gap of 110.10 MB (max allowed is 55 MB), retrying
06:52:25 Spot: [70] ScanLoc: [1602489] SizeLoc: [2.1 TB] Speed Gap of 110.10 MB (max allowed is 60 MB), retrying
06:52:42 Spot: [70] ScanLoc: [1602489] SizeLoc: [2.1 TB] Speed Gap of 108.79 MB (max allowed is 65 MB), retrying
06:52:59 Spot: [70] ScanLoc: [1602489] SizeLoc: [2.1 TB] Speed Gap of 110.10 MB (max allowed is 70 MB), retrying
06:53:16 Spot: [70] ScanLoc: [1602489] SizeLoc: [2.1 TB] Speed Gap of 102.24 MB (max allowed is 75 MB), retrying
06:53:33 Spot: [70] ScanLoc: [1602489] SizeLoc: [2.1 TB] Speed Gap of 106.17 MB (max allowed is 80 MB), retrying
06:53:50 Spot: [70] ScanLoc: [1602489] SizeLoc: [2.1 TB] Speed Gap of 107.48 MB (max allowed is 85 MB), retrying
06:54:07 Kill flag found

 

Link to comment
  • 4 weeks later...
On 9/21/2018 at 3:38 AM, jbartlett said:

It's been a month or so and I haven't heard of anyone mentioning that this utility isn't detecting their controller and/or drives. Is anyone still having issues in that regard?

I get this error on Unraid 6.6 on my Threadripper setup:

 

Lucee 5.2.8.50 Error (application)
Message Error invoking external process
Detail lsusb: gconv.c:74: __gconv: Assertion `outbuf != NULL && *outbuf != NULL' failed.
Stacktrace The Error Occurred in
/var/www/CustomTags.cfm: line 555 
553: 
554: <!--- Get LSUSB info --->
555: <cfexecute name="/usr/bin/lsusb" arguments="-v -s #Arguments.Bus#:#PortInfo.DevNum#" variable="LSUSB" timeout="30" />
556: <!--- <cfoutput>lsusb -v -s #Arguments.Bus#:#PortInfo.DevNum#<pre>#lsusb#</pre><hr></cfoutput> --->
557: <CFLOOP index="Line" list="#LSUSB#" delimiters="#Chr(10)#">
 
called from /var/www/CustomTags.cfm: line 579 
577: <CFLOOP index="DeviceIdx" from="1" to="#Devices.RecordCount#">
578: <CFIF Devices.Name[DeviceIdx] EQ "device">
579: <CFSET PortInfo.Ports[PortNo]=ParseUSB(Devices.Link[DeviceIdx],Arguments.Bus)>
580: </CFIF>
581: <CFIF Devices.Name[DeviceIdx] EQ "peer">
 
called from /var/www/ScanControllers.cfm: line 372 
370: <CFLOOP index="CurrLine" list="#USBList#" delimiters="#Chr(10)#">
371: <CFSET i=i+1>
372: <CFSET USBTree=ParseUSB(CurrLine)>
373: </CFLOOP>
374: 
 
called from /var/www/ScanControllers.cfm: line 328 
326: <CFSET OK=1>
327: </CFIF>
328: </CFLOOP>
329: <!--- Set top level ChildDrives --->
330: <CFLOOP index="i" from="1" to="#ArrayLen(HWTree)#">
 
Java Stacktrace lucee.runtime.exp.ApplicationException: Error invoking external process
  at lucee.runtime.tag.Execute.doEndTag(Execute.java:258)
  at customtags_cfm$cf.udfCall2(/CustomTags.cfm:555)
  at customtags_cfm$cf.udfCall(/CustomTags.cfm)
  at lucee.runtime.type.UDFImpl.implementation(UDFImpl.java:107)
  at lucee.runtime.type.UDFImpl._call(UDFImpl.java:357)
  at lucee.runtime.type.UDFImpl.call(UDFImpl.java:226)
  at lucee.runtime.type.scope.UndefinedImpl.call(UndefinedImpl.java:803)
  at lucee.runtime.util.VariableUtilImpl.callFunctionWithoutNamedValues(VariableUtilImpl.java:756)
  at lucee.runtime.PageContextImpl.getFunction(PageContextImpl.java:1716)
  at customtags_cfm$cf.udfCall2(/CustomTags.cfm:579)
  at customtags_cfm$cf.udfCall(/CustomTags.cfm)
  at lucee.runtime.type.UDFImpl.implementation(UDFImpl.java:107)
  at lucee.runtime.type.UDFImpl._call(UDFImpl.java:357)
  at lucee.runtime.type.UDFImpl.call(UDFImpl.java:226)
  at lucee.runtime.type.scope.UndefinedImpl.call(UndefinedImpl.java:803)
  at lucee.runtime.util.VariableUtilImpl.callFunctionWithoutNamedValues(VariableUtilImpl.java:756)
  at lucee.runtime.PageContextImpl.getFunction(PageContextImpl.java:1716)
  at scancontrollers_cfm$cf.call_000022(/ScanControllers.cfm:372)
  at scancontrollers_cfm$cf.call(/ScanControllers.cfm:328)
  at lucee.runtime.PageContextImpl._doInclude(PageContextImpl.java:931)
  at lucee.runtime.PageContextImpl._doInclude(PageContextImpl.java:821)
  at lucee.runtime.listener.ClassicAppListener._onRequest(ClassicAppListener.java:64)
  at lucee.runtime.listener.MixedAppListener.onRequest(MixedAppListener.java:45)
  at lucee.runtime.PageContextImpl.execute(PageContextImpl.java:2462)
  at lucee.runtime.PageContextImpl._execute(PageContextImpl.java:2452)
  at lucee.runtime.PageContextImpl.executeCFML(PageContextImpl.java:2425)
  at lucee.runtime.engine.Request.exe(Request.java:44)
  at lucee.runtime.engine.CFMLEngineImpl._service(CFMLEngineImpl.java:1091)
  at lucee.runtime.engine.CFMLEngineImpl.serviceCFML(CFMLEngineImpl.java:1039)
  at lucee.loader.engine.CFMLEngineWrapper.serviceCFML(CFMLEngineWrapper.java:102)
  at lucee.loader.servlet.CFMLServlet.service(CFMLServlet.java:51)
  at javax.servlet.http.HttpServlet.service(HttpServlet.java:729)
  at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:292)
  at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:207)
  at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
  at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:240)
  at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:207)
  at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:212)
  at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:94)
  at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:492)
  at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:141)
  at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:80)
  at org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:620)
  at org.apache.catalina.valves.RemoteIpValve.invoke(RemoteIpValve.java:684)
  at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:88)
  at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:502)
  at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1152)
  at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:684)
  at org.apache.tomcat.util.net.AprEndpoint$SocketWithOptionsProcessor.run(AprEndpoint.java:2464)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
  at java.lang.Thread.run(Thread.java:748)
  Timestamp 9/22/18 9:28:44 AM CST
Link to comment
13 hours ago, Dazog said:

lsusb: gconv.c:74: __gconv: Assertion `outbuf != NULL && *outbuf != NULL' failed.

This error is happening in the lsusb command.

 

Can you enter Terminal mode in the docker (click on the DiskSpeed Docker icon & select "Console") and type in "lsusb -v" and see if it errors? If it does, try it outside of the docker app.

 

Do you have any audio USB devices connected or anything other than keyboard/mouse/drives?

 

(no need for the long java stack trace)

Link to comment
58 minutes ago, jbartlett said:

This error is happening in the lsusb command.

 

Can you enter Terminal mode in the docker (click on the DiskSpeed Docker icon & select "Console") and type in "lsusb -v" and see if it errors? If it does, try it outside of the docker app.

 

Do you have any audio USB devices connected or anything other than keyboard/mouse/drives?

 

(no need for the long java stack trace)

Here is the output of lsusb -v

 

# lsusb -v
lsusb: invalid option -- '�'
lsusb: invalid option -- '�'
lsusb: invalid option -- '�'

 

No usb audio devices connected.

Link to comment
37 minutes ago, Dazog said:

Here is the output of lsusb -v

 

# lsusb -v
lsusb: invalid option -- '�'

Are you on DiskSpeed version 5c? If not, upgrade and try again. If you are on the most recent or you updated and still get the error, please do the following:

 

Click on the DiskSpeed Docker icon and select "Console".

Enter the following 3 commands:

apt-get update

apt-get -y upgrade

/usr/sbin/update-usbids

 

Close out the console & stop/start the Docker and try again. If it continues to error, I'll add logic to trap this error but it'll either prevent displaying the USB Bus tree or prevent some of it from displaying.

Edited by jbartlett
Link to comment
23 minutes ago, jbartlett said:

Are you on DiskSpeed version 5c? If not, upgrade and try again. If you are on the most recent or you updated and still get the error, please do the following:

 

Click on the DiskSpeed Docker icon and select "Console".

Enter the following 3 commands:

apt-get update

apt-get -y upgrade

/usr/sbin/update-usbids

 

Close out the console & stop/start the Docker and try again. If it continues to error, I'll add logic to trap this error but it'll either prevent displaying the USB Bus tree or prevent some of it from displaying.

Tried what you suggested and still the same error.

 

Yes I am running current version 5c.

Link to comment

Taking drive image requests? I have 2x Samsung NVMe SSD 970 EVO drives (my cache pool) that get a generic mechanical-disk image.

Drive ID: nvme1n1 (Cache 2) 
Vendor: Samsung  
Model: SSD 970 EVO
Serial Number: S467NF0K6------
 
Revision: 1B2QEXE7
Capacity: 1TB
RPM: Solid State Device
Logical/Physical Sector Size: 512/512

 

I have attached a generic Amazon pic of the device. There is a Pro version as well.
 

20-147-691-V01.jpg

Link to comment
  • jbartlett changed the title to DiskSpeed, hard drive benchmarking (unRAID 6+), version 2.9.2
