DiskSpeed, hdd/ssd benchmarking (unRAID 6+), version 2.10.7


Recommended Posts

On 12/8/2019 at 7:29 AM, sota said:

Curious why the cache disks performed so poorly during simultaneous vs. individual.

Only thing notable is they're in the 2 rear chassis bays instead of in the 12 up front.

My guess is that the read speed of the faster drives was capped by the OS in order to maximize the output of the other, slower drives, which gives better overall performance. You're only utilizing 2 GB/s of the stated 7.8 GB/s bandwidth, so you may have a bottleneck elsewhere in the PCI chain.

Link to comment

Guess it could be the expander in the drive cage; the H240 is a 2x4-lane 12Gb/s controller, but the cage is 12+2.

Trying to dig up some info on the cage/expander and see if there's a possible bottleneck there.

Not that it's a big screaming deal; throughput is well above my normal needs as it is.

Edited by sota
Link to comment
  • 2 months later...

New user here; installed on two similar systems, the only difference being the number of drives.

The default port does not work for the host; I needed to change the port number from 18888 to 8888, otherwise I kept getting "connection refused".
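
For reference, the remap can be expressed in the container's port mapping. A minimal sketch of the equivalent docker run, assuming the image name and that the web UI listens on 18888 inside the container (check your Docker template for the actual values):

```
# Host port 8888 mapped to the container's web UI port (18888 assumed).
# The image name below is an assumption; use whatever your template specifies.
docker run -d --name DiskSpeed -p 8888:18888 jbartlett777/diskspeed
```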

The first server had no problem running the tests.

The second server crashes in what appears to be a timeout while waiting for the 20 spinning drives to spin up:

```

DiskSpeed - Disk Diagnostics & Reporting tool
Version: 2.4
 

Scanning Hardware
08:25:25 Spinning up hard drives
08:25:25 Scanning system storage

Lucee 5.2.9.31 Error (application)

Message: timeout [90000 ms] expired while executing [/usr/sbin/hwinfo --pci --bridge --storage-ctrl --disk --ide --scsi]

Stacktrace: The Error Occurred in
/var/www/ScanControllers.cfm: line 243

241: <CFOUTPUT>#TS()# Scanning system storage<br></CFOUTPUT><CFFLUSH>
242: <CFFILE action="write" file="#PersistDir#/hwinfo_storage_exec.txt" output=" /usr/sbin/hwinfo --pci --bridge --storage-ctrl --disk --ide --scsi" addnewline="NO" mode="666">
243: <cfexecute name="/usr/sbin/hwinfo" arguments="--pci --bridge --storage-ctrl --disk --ide --scsi" variable="storage" timeout="90" /><!--- --usb-ctrl --usb --hub --->
244: <CFFILE action="delete" file="#PersistDir#/hwinfo_storage_exec.txt">
245: <CFFILE action="write" file="#PersistDir#/hwinfo_storage.txt" output="#storage#" addnewline="NO" mode="666">
 

called from /var/www/ScanControllers.cfm: line 242

240:
241: <CFOUTPUT>#TS()# Scanning system storage<br></CFOUTPUT><CFFLUSH>
242: <CFFILE action="write" file="#PersistDir#/hwinfo_storage_exec.txt" output=" /usr/sbin/hwinfo --pci --bridge --storage-ctrl --disk --ide --scsi" addnewline="NO" mode="666">
243: <cfexecute name="/usr/sbin/hwinfo" arguments="--pci --bridge --storage-ctrl --disk --ide --scsi" variable="storage" timeout="90" /><!--- --usb-ctrl --usb --hub --->
244: <CFFILE action="delete" file="#PersistDir#/hwinfo_storage_exec.txt">
 

Java Stacktrace: lucee.runtime.exp.ApplicationException: timeout [90000 ms] expired while executing [/usr/sbin/hwinfo --pci --bridge --storage-ctrl --disk --ide --scsi]
  at lucee.runtime.tag.Execute._execute(Execute.java:241)
  at lucee.runtime.tag.Execute.doEndTag(Execute.java:252)
  at scancontrollers_cfm$cf.call_000006(/ScanControllers.cfm:243)
  at scancontrollers_cfm$cf.call(/ScanControllers.cfm:242)
  at lucee.runtime.PageContextImpl._doInclude(PageContextImpl.java:933)
  at lucee.runtime.PageContextImpl._doInclude(PageContextImpl.java:823)
  at lucee.runtime.listener.ClassicAppListener._onRequest(ClassicAppListener.java:66)
  at lucee.runtime.listener.MixedAppListener.onRequest(MixedAppListener.java:45)
  at lucee.runtime.PageContextImpl.execute(PageContextImpl.java:2464)
  at lucee.runtime.PageContextImpl._execute(PageContextImpl.java:2454)
  at lucee.runtime.PageContextImpl.executeCFML(PageContextImpl.java:2427)
  at lucee.runtime.engine.Request.exe(Request.java:44)
  at lucee.runtime.engine.CFMLEngineImpl._service(CFMLEngineImpl.java:1090)
  at lucee.runtime.engine.CFMLEngineImpl.serviceCFML(CFMLEngineImpl.java:1038)
  at lucee.loader.engine.CFMLEngineWrapper.serviceCFML(CFMLEngineWrapper.java:102)
  at lucee.loader.servlet.CFMLServlet.service(CFMLServlet.java:51)
  at javax.servlet.http.HttpServlet.service(HttpServlet.java:729)
  at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:292)
  at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:207)
  at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
  at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:240)
  at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:207)
  at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:212)
  at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:94)
  at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:492)
  at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:141)
  at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:80)
  at org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:620)
  at org.apache.catalina.valves.RemoteIpValve.invoke(RemoteIpValve.java:684)
  at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:88)
  at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:502)
  at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1152)
  at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:684)
  at org.apache.tomcat.util.net.AprEndpoint$SocketWithOptionsProcessor.run(AprEndpoint.java:2464)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
  at java.lang.Thread.run(Thread.java:748)
 

Timestamp: 2/20/20 8:29:32 AM PST

```

Link to comment

I have found it useful to first "spin up all drives" for anything that needs to check every drive (like DiskSpeed does). Even reboots are faster if you just spin them all up first. It avoids both possible timeouts or errors while waiting for drives to respond, as well as contention between reading from and spinning up multiple drives.

P.S. 15-drive array. It may not be as much of an issue with fewer drives?

Edited by interwebtech
Link to comment

I tried to implement unRAID's version of spinning up a drive but had issues with it; that was a while ago, though, so I should try again. The way it works now is to issue a dd command that reads random sectors on the drive, but I've noticed that even when those seem to execute without issue, the drives aren't always spun up.
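
For illustration, a minimal sketch of that kind of spin-up read, assuming a placeholder device /dev/sdX and arbitrary offsets (not necessarily the exact command the plugin issues):

```
# Read a few 1 MiB blocks at scattered offsets directly from the raw device
# to force the platters to spin up; skip= is in units of bs (1 MiB here).
for offset in 0 40960 81920; do
    dd if=/dev/sdX of=/dev/null bs=1M count=1 skip=$offset iflag=direct 2>/dev/null
done
```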

Link to comment

Great plugin, it really is very cool. I installed it to see if I could get a performance indicator to help decide which drive to pull and replace with a 12TB IronWolf. Unfortunately, I couldn't determine which of the 4TB drives to pull since they all look pretty consistent.

 

Which leads me to a feature request: would it be possible to put a roll-up of the SMART data for the drives on the view? I think if the main view showed things like drive age, errors, and indicators, combined with the performance data, it could be a "replace this drive" decision-making plugin.

 

drives.thumb.JPG.723f6623e6ce076c91d9378567b1e340.JPG

 

Still not sure which drive to replace with the new IronWolf, although the internet always says to replace the Seagate 😁😂

 

Quote

I have no problem using the spin-up or spin-down buttons, so whatever unRAID is doing does work.

I have loads of problems with spinning up the drives; it rarely works on the Areca SAS controller, for example.

Edited by spamalam
Link to comment
On 2/25/2020 at 4:02 AM, spamalam said:

Which leads me to a feature request: would it be possible to put a roll-up of the SMART data for the drives on the view? I think if the main view showed things like drive age, errors, and indicators, combined with the performance data, it could be a "replace this drive" decision-making plugin.

That is an excellent suggestion.

 

I've been getting some time to work on things again (who knew that having your fiancée move in with you would also mean a lot less evening time available? 😉), so I've been working on moving the backend website & database to a cloud server and off of my local server. After that, I'll work on adding drive images for drives released in the past year and adding features/fixing bugs to the DiskSpeed app & the Hard Drive Database app.

Link to comment
22 minutes ago, gfjardim said:

Hi @jbartlett, I've just now tested your app, and it's magnificent! Well done, pal!

 

One thing I've observed is that disk reads apparently are being cached. There are options in dd that prevent this behavior; did you use them?

Yes. All calls to dd use the iflag=direct parameter.

 

Can you show the evidence that leads you to think that things are being cached?
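
For anyone wanting to reproduce the comparison, a rough sketch of a direct-I/O read plus a cache check, with /dev/sdX and the sizes as placeholder values (not the plugin's exact invocation):

```
# Sequential read of 1 GiB with direct I/O; iflag=direct asks the kernel
# to bypass the page cache for this read.
dd if=/dev/sdX of=/dev/null bs=1M count=1024 iflag=direct

# Compare the buff/cache figure before and after the read to see whether
# the page cache actually grew.
free -m
```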

Link to comment
1 hour ago, jbartlett said:

Yes. All calls to dd use the iflag=direct parameter.

 

Can you show the evidence that leads you to think that things are being cached?

I didn't take a screenshot, but the controller benchmark showed higher speed values when all disks were read together, probably because they had already been read one at a time. When I dropped the cache, it returned to normal behavior.

 

I had better luck setting the "iflag=nocache" parameter rather than iflag=direct before. I think it's worth trying.

Link to comment

image.png.3c97f34f8130488e55ff603936770d79.png

 

I have a weird one.

Drive ID: sdk
Vendor: SEAGATE  
Model: DKS2E
Serial Number: Z1Z6Y0KM0000R5191JBJ
 

Revision: 0
Capacity: 4TB
Logical/Physical Sector Size: 512/512

 

Two different runs, both times it "glitched".

Disk wasn't even assigned to anything or partitioned at the time.

Link to comment
3 hours ago, gfjardim said:

I didn't take a screenshot, but the controller benchmark showed higher speed values when all disks were read together, probably because they had already been read one at a time. When I dropped the cache, it returned to normal behavior.

 

I had better luck setting the "iflag=nocache" parameter rather than iflag=direct before. I think it's worth trying.

While I do not see any utilization of the cache on my system when I use iflag=direct, the data is still being cached. I also verified that executing

"dd if=/dev/[device] iflag=nocache count=0" also dropped the cache for the drive. I'll add that command prior to any dd read.
 

Edited by jbartlett
Link to comment
5 hours ago, jbartlett said:

While I do not see any utilization of the cache on my system when I use iflag=direct, the data is still being cached. I also verified that executing

"dd if=/dev/[device] iflag=nocache count=0" also dropped the cache for the drive. I'll add that command prior to any dd read.
 

 

Direct bypasses both the data cache and I/O queueing; nocache bypasses only the data cache. The performance can be dramatically different depending on the amount of I/O the device supports and how it manages its I/O requests without the kernel I/O queue:

 

No caches, I/O queue, 512-byte read block:

```
root@Servidor:~# echo 3 > /proc/sys/vm/drop_caches
root@Servidor:~# dd if=/dev/sdh of=/dev/null count=1048576 iflag=nocache
1048576+0 records in
1048576+0 records out
536870912 bytes (537 MB, 512 MiB) copied, 2.14517 s, 250 MB/s
```

No caches, I/O queue, 1M read block:

```
root@Servidor:~# echo 3 > /proc/sys/vm/drop_caches
root@Servidor:~# dd if=/dev/sdh of=/dev/null bs=1M count=512 iflag=nocache
512+0 records in
512+0 records out
536870912 bytes (537 MB, 512 MiB) copied, 1.98191 s, 271 MB/s
```

No caches, no I/O queue, 512-byte read block:

```
root@Servidor:~# echo 3 > /proc/sys/vm/drop_caches
root@Servidor:~# dd if=/dev/sdh of=/dev/null count=1048576 iflag=direct
1048576+0 records in
1048576+0 records out
536870912 bytes (537 MB, 512 MiB) copied, 217.961 s, 2.5 MB/s
```

No caches, no I/O queue, 1M read block:

```
root@Servidor:~# echo 3 > /proc/sys/vm/drop_caches
root@Servidor:~# dd if=/dev/sdh of=/dev/null bs=1M count=512 iflag=direct
512+0 records in
512+0 records out
536870912 bytes (537 MB, 512 MiB) copied, 2.12653 s, 252 MB/s
```

 

 

 

Link to comment
6 hours ago, jbartlett said:

While I do not see any utilization of the cache on my system when I use iflag=direct, the data is still being cached. I also verified that executing

"dd if=/dev/[device] iflag=nocache count=0" also dropped the cache for the drive. I'll add that command prior to any dd read.
 

Maybe the hard disks' internal cache?

Link to comment
7 hours ago, gfjardim said:

echo 3 > /proc/sys/vm/drop_caches

Interesting. I'll add this to the cache popping logic.

 

7 hours ago, gfjardim said:

Maybe hard disks internal cache?

Cache sizes are in MB, which should be exhausted in just a couple of seconds of read time, and each location on the drive is read for 15 seconds.
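
As a back-of-envelope check, with assumed figures for a typical drive:

```
# Assumed: 256 MB on-drive cache, ~150 MB/s sustained read speed.
# The drive's own cache would then be exhausted in well under 2 seconds
# of the 15-second read at each test location.
awk 'BEGIN { printf "%.1f seconds\n", 256 / 150 }'
```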

Link to comment

Version 2.5 pushed. Due to internal server logs being stored in the container, I recommend updating to reduce Docker container bloat.

  • Modified the new-drive detection logic to submit to the top frame for rescanning drive hardware
  • Added vendor cleanup to match the case of known vendors ("SEAGATE" becomes "Seagate", for example)
  • Added cache-popping logic for systems that don't respect the cache-bypass flag on dd
  • Purged internal Tomcat & Lucee logs, which could cause Docker bloat
  • If an incomplete drive scan leaves a partial hardware profile due to a timeout from a spun-down drive, rescan the hardware
  • Display the drive ID on the benchmark random/sequential seek & latency tests
  • Added an informational message that is displayed if a Speed Gap is detected

 

Link to comment

Upgraded to v2.5 this morning and thought I'd re-run the benchmarks. 

I was receiving the 'speed gap' notification, so I aborted, disabled all Dockers, and re-ran. I still got the 'speed gap' notification, so I aborted again and re-ran with the 'disable speed gap detection' option checked.

The final results page looks vastly different from the last time I ran it; however, going back to the main page, the curves look just as they did before (almost as if the main page didn't update to the latest results, even after I refreshed the page with Ctrl+R).

Please see attached. 

So: is it something I should be worried about? I haven't received any notification from Unraid that anything is wrong with SMART, but should I do something to confirm that my disks (or controller?) are ok? 

Thanks for your help.

 

EDIT: I just ran the controller benchmark, which showed erratic results, and as per the recommendation I restarted the Docker and ran it again: same erratic results (although I think the disks with low speeds were different this time). Attached is a screenshot. 

DiskSpeed_main_20200304.png

DiskSpeed_results_20200304.png

DiskSpeed_controller_20200304.png

Edited by jademonkee
added controller benchmark
Link to comment
