wgstarks Posted August 18, 2019

13 minutes ago, BRiT said: Replace your slowest drives?

I'm sure at some point I'll be upgrading all the 4TB drives to 8TB, but at $250US+ each, that probably will only happen as the old drives fail. Was hoping to be able to improve parity check speeds by reconfiguring the system since it seems that my speeds are abnormally low.
jbartlett (Author) Posted August 18, 2019

1 hour ago, wgstarks said: Sorry. Realized I had run the wrong benchmark tests. Controller benchmarking shows a variation of 3.3% on the onboard controller (Your controller is not bottlenecking) and a variation of 0.0% on the Dell H310 controller (Your controller is not bottlenecking). Not sure what the next step would be to resolve the speed issues?

Can you share the graphs of the controller benchmarks? Drive 9 was found to have a steady read speed over a large portion of the drive when it should be declining for a spinner; this indicates that the drive can send data faster than the controller it's attached to can handle.

You also have a drive that has a wave to it. Can you tell me what make/model drive that is? Waves can be a sign of degraded areas. At a minimum, perform a benchmark on it at least once a month to see if it holds steady. If it does, it may be how that drive was designed to operate. I'm working on updating the Hard Drive Database web site so it displays the latest benchmark from everyone who has the same drive (instead of averaging all tests, which can cause big spikes) to see if that's just how that drive operates.

Edited August 18, 2019 by jbartlett
BRiT Posted August 18, 2019

1 hour ago, wgstarks said: I'm sure at some point I'll be upgrading all the 4TB drives to 8TB, but at $250US+ each, that probably will only happen as the old drives fail. Was hoping to be able to improve parity check speeds by reconfiguring the system since it seems that my speeds are abnormally low.

If it isn't an issue, any chance you can switch PCI Express ports for the controller? Just to rule out any inaccuracies in the motherboard manual on which port is wired to which?
jbartlett (Author) Posted August 18, 2019

@wgstarks - can you also include a screenshot of your System Bus Tree?
wgstarks Posted August 18, 2019

2 hours ago, jbartlett said: Can you share the graphs of the controller benchmarks?

2 hours ago, jbartlett said: You also have a drive that has a wave to it. Can you tell me what make/model drive that is?

I think you mean this one:

I also uploaded the drive benchmarks to your db.
wgstarks Posted August 18, 2019

This one also has a wave form. Looks like the same model.

Edit: Not the same but similar.

Edited August 18, 2019 by wgstarks
wgstarks Posted August 18, 2019

2 hours ago, jbartlett said: @wgstarks - can you also include a screenshot of your System Bus Tree?
wgstarks Posted August 18, 2019

2 hours ago, BRiT said: If it isn't an issue, any chance you can switch PCI Express ports for the controller? Just to rule out any inaccuracies in the motherboard manual on which port is wired to which?

If I understand the controller benchmark report correctly, it's showing that the controller is installed in an x8 slot, right? I could switch slots later this week if there is a need for it.
BRiT Posted August 18, 2019

It does look like it's in an x8 slot.

Well, your system does seem a bit of a mystery as to why your parity check speeds are limited, unless it's truly limited by the speed of the smaller capacity drives. So the thought was that there's no harm in trying the other slot to see if it impacts the benchmarks at all. If it's put into a less capable slot, the impact should show immediately, but there's a slight chance it might end up in a more capable slot, or at least one that behaves differently.

It's more of a "this doesn't make much sense, so try something" idea.
wgstarks Posted August 18, 2019

14 minutes ago, BRiT said: It does look like it's in an x8 slot. Well, your system does seem a bit of a mystery as to why your parity check speeds are limited, unless it's truly limited by the speed of the smaller capacity drives. So the thought was that there's no harm in trying the other slot to see if it impacts the benchmarks at all. If it's put into a less capable slot, the impact should show immediately, but there's a slight chance it might end up in a more capable slot, or at least one that behaves differently. It's more of a "this doesn't make much sense, so try something" idea.

Can't do any harm I guess. I'll give it a shot later this week when I can take the server offline.
DanielCoffey Posted August 20, 2019

Hmm... do I need to be concerned about Disk 1 in this array? All drives are 8TB WD Reds of the same age...

It has a consistent slow spot at the start of its platters and more wobbles than the rest of the disks. In addition, when I benchmarked the controller, the first run through showed a VERY low result for it (so much so that DiskSpeed thought the controller was saturated). I redid the benchmark and the result is consistent between tests now.

I have also attached the Quick SMART for that drive.

WDC_WD80EFZX-68UW8N0_VK1DZHAY-20190820-1724.txt
dalben Posted August 20, 2019

I've got some weird results that I'm hoping to get some explanation for. Here's the controller benchmark followed by the disk benchmark. I've colour coded the matching disk models in the benchmark test. I'm surprised the same model drive on the same controller can show such wild results.
jbartlett (Author) Posted August 21, 2019

13 hours ago, DanielCoffey said: Hmm... do I need to be concerned about Disk 1 in this array?

Perform a benchmark on it every week or so to see if it returns an identical test. If it does return an identical test over the span of a month, it may just be how that drive is.
jbartlett (Author) Posted August 21, 2019

8 hours ago, dalben said: I've got some weird results that I'm hoping to get some explanation for.

It's weird that you're seeing such a variance. It's the same command given to each drive - read balls-to-the-wall starting from the start of the drive and 15 seconds later a kill command is sent - regardless of whether it's running on its own or all at once.

Did you click off of it or otherwise leave the benchmarking page at any point prior? I added code to stop reads if such an abort takes place.

If you have the "Dynamix System Statistics" plugin installed, check to see if there's constant drive activity, such as one of those tasks running wild. If stopping & starting the DiskSpeed docker clears it up, then I have more work to do on that front.
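For readers following along, the benchmark jbartlett describes (read from the start of the drive, stop after 15 seconds) can be sketched roughly as below. This is a minimal Python illustration, not DiskSpeed's actual code: `timed_read` and `mb_per_second` are hypothetical names, and the sketch simply stops reading in-process once the deadline passes instead of launching a command and sending it a kill.

```python
import time


def timed_read(path, seconds=15.0, chunk_size=1024 * 1024):
    """Read sequentially from the start of `path` until the time limit or EOF.

    Returns a tuple (bytes_read, elapsed_seconds).
    """
    bytes_read = 0
    start = time.monotonic()
    deadline = start + seconds
    # buffering=0 skips Python-level buffering; the OS page cache still applies.
    with open(path, "rb", buffering=0) as dev:
        while time.monotonic() < deadline:
            chunk = dev.read(chunk_size)
            if not chunk:  # reached the end of the file or device
                break
            bytes_read += len(chunk)
    elapsed = time.monotonic() - start
    return bytes_read, elapsed


def mb_per_second(bytes_read, elapsed):
    """Average throughput in MB/s, taking 1 MB as 1,000,000 bytes."""
    if elapsed <= 0:
        return 0.0
    return bytes_read / elapsed / 1_000_000
```

Pointing `timed_read` at a raw device such as `/dev/sdX` would need root, and cached data can inflate repeat runs; the sketch does not handle either case.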
dalben Posted August 21, 2019

OK, let me do another test in a more controlled manner (making sure I don't surf out to another page etc.) and I'll fire up system stats in another window. I'll send it through once done. Thanks
DanielCoffey Posted August 21, 2019

1 hour ago, jbartlett said: Perform a benchmark on it every week or so to see if it returns an identical test. If it does return an identical test over the span of a month, it may just be how that drive is.

The problem is that the server locks up on about every fourth shutdown/sleep and drops Parity 1 and Disk 1 every time. I was looking for anomalies and spotted that Disk 1 stood out under the DiskSpeed tests. I have my own thread started about the dropouts and have added the DiskSpeed results to it.
Harro Posted August 21, 2019

2 hours ago, DanielCoffey said: The problem is that the server locks up on about every fourth shutdown/sleep and drops Parity 1 and Disk 1 every time. I was looking for anomalies and spotted that Disk 1 stood out under the DiskSpeed tests. I have my own thread started about the dropouts and have added the DiskSpeed results to it.

Have you run multiple tests on that drive with the same results? If so, and the tests are comparable, I would replace it.

On a side note, I replaced my two HP220 controllers with a new LSI 16i and have jumped 38 MB/s in my parity check, decreasing my time by 5.5 hours (23 hrs old, 17.5 hrs new). I am hoping to shave off another couple of hours once all my 8TB drives are replaced with 10TB drives.

Thanks to @jbartlett for this DiskSpeed test; it gave me a graphic view of where my slowdowns were occurring and on which controllers.

Edited August 21, 2019 by Harro
DanielCoffey Posted August 21, 2019

I have run multiple tests, yes. The really low result was a one-off and the c. 162 MB/s was its regular result, which coincides with the highest speed I can get at the beginning of a parity check. I don't think my 8 drives are even close to saturating the 9201-8i I have, as it allows 4 GB/s transfer and the eight drives are less than half of that.

I will be pulling that drive today anyway once the rebuild completes and I can write to the array again.
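The headroom estimate in the post above can be checked with a couple of lines of arithmetic. The sketch below is only an illustration: the function names are made up, the per-drive figure is the roughly 162 MB/s quoted in the post, and reading the controller figure as 4 GB/s (4000 MB/s) is an assumption.

```python
def aggregate_mb_s(per_drive_mb_s, drive_count):
    """Combined throughput if every drive reads at the same speed at once."""
    return per_drive_mb_s * drive_count


def link_share(aggregate, link_mb_s):
    """Fraction of the controller's bandwidth the drives would use."""
    return aggregate / link_mb_s


# Figures from the post; 4 GB/s taken as 4000 MB/s (assumption).
total = aggregate_mb_s(162, 8)
share = link_share(total, 4000)
print(f"{total} MB/s combined, {share:.1%} of the link")
```

With those inputs the eight drives together come to 1296 MB/s, roughly a third of the assumed link bandwidth, which agrees with "less than half".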
Harro Posted August 21, 2019

Have you tried to mount that disk (Disk 1) on the onboard controller? Something just looks off with a 20 MB/s difference compared to your other drive tests.
DanielCoffey Posted August 21, 2019

I agree the speed is odd, but it is the one disk that seems to trigger the hard lock on power down or sleep. It is outside the array now and undergoing a full SMART test. My other unassigned 8TB drive is back in the array undergoing a rebuild.
dalben Posted August 23, 2019

On 8/21/2019 at 3:42 PM, jbartlett said: It's weird that you're seeing such a variance. It's the same command given to each drive - read balls-to-the-wall starting from the start of the drive and 15 seconds later a kill command is sent - regardless of whether it's running on its own or all at once. Did you click off of it or otherwise leave the benchmarking page at any point prior? I added code to stop reads if such an abort takes place. If you have the "Dynamix System Statistics" plugin installed, check to see if there's constant drive activity, such as one of those tasks running wild. If stopping & starting the DiskSpeed docker clears it up, then I have more work to do on that front.

OK, here's another run in a more controlled environment. The weirdness has gone and the disk models seem to give the same performance.
jbartlett (Author) Posted August 27, 2019

On 8/23/2019 at 2:42 PM, dalben said: OK, here's another run in a more controlled environment. The weirdness has gone and the disk models seem to give the same performance.

I added code to invalidate a test result if a drive gives a 10% or greater improvement in speed over its single-drive speed.
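The rule described in the post above (throw out a result when a drive appears 10% or more faster than its single-drive speed) could look something like the following. This is a hedged sketch of the stated rule only; `is_suspect` and its parameters are hypothetical names, and how DiskSpeed actually implements the check is not shown in the thread.

```python
def is_suspect(single_mb_s, concurrent_mb_s, threshold=0.10):
    """Return True when a drive's speed in a multi-drive run exceeds its
    single-drive speed by `threshold` (10% by default) or more.

    A drive should not get faster when it shares a controller with others,
    so a large gain suggests the measurement itself is unreliable.
    """
    if single_mb_s <= 0:
        raise ValueError("single-drive speed must be positive")
    gain = (concurrent_mb_s - single_mb_s) / single_mb_s
    return gain >= threshold


# Example values are made up for illustration.
print(is_suspect(100, 109))  # gain of 9%: kept
print(is_suspect(100, 110))  # gain of 10%: invalidated
```

Expressing the gain as a fraction of the single-drive speed keeps the threshold independent of how fast the drive is.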
electron286 Posted September 8, 2019

On 7/24/2019 at 1:54 PM, jbartlett said: Apologies that this slipped through the cracks. Does unRAID return the Serial Numbers for the drives?

Sorry, been a little while since I checked for a reply. Here are the drive details of the first 3 drives, with the complete serial numbers for ease, all viewed with unRAID...

Under Drive details for the drives I see:

(Note using as the SMART controller type: 3Ware 2 /dev/twa1)
Model family: SAMSUNG SpinPoint F3
Device model: SAMSUNG HD502HJ
Serial number: S27FJ9FZ404491

(Note using as the SMART controller type: 3Ware 1 /dev/twa1)
Model family: SAMSUNG SpinPoint F3
Device model: SAMSUNG HD502HJ
Serial number: S27FJ9FZ404504

(Note using as the SMART controller type: 3Ware 1 /dev/twa0)
Model family: Seagate Barracuda 7200.7 and 7200.7 Plus
Device model: ST3160828AS
Serial number: 5MT44SV6

And under the MAIN tab in Unraid, under Devices, I see this:

Device      Identification
Parity      1AMCC_FZ404491000000000000 - 500 GB (sdf)
Parity 2    1AMCC_FZ404504000000000000 - 500 GB (sdg)
Disk 1      1AMCC_5MT44SV6000000000000 - 160 GB (sdc)
electron286 Posted September 8, 2019

I just saw there have been a few updates to the tool.
Downloaded the latest version and this is what I now get. It no longer stalls, but I have this:

DiskSpeed - Disk Diagnostics & Reporting tool
Version: 2.1

Scanning Hardware
12:44:12 Spinning up hard drives
12:44:12 Scanning system storage
12:44:25 Scanning USB Bus
12:44:32 Scanning hard drives

Lucee 5.2.9.31 Error (application)
Message: Error invoking external process
Detail: /usr/bin/lspci: option requires an argument -- 's'
Usage: lspci [<switches>]

Basic display modes:
  -mm   Produce machine-readable output (single -m for an obsolete format)
  -t    Show bus tree

Display options:
  -v    Be verbose (-vv for very verbose)
  -k    Show kernel drivers handling each device
  -x    Show hex-dump of the standard part of the config space
  -xxx  Show hex-dump of the whole config space (dangerous; root only)
  -xxxx Show hex-dump of the 4096-byte extended config space (root only)
  -b    Bus-centric view (addresses and IRQ's as seen by the bus)
  -D    Always show domain numbers

Resolving of device ID's to names:
  -n    Show numeric ID's
  -nn   Show both textual and numeric ID's (names & numbers)
  -q    Query the PCI ID database for unknown ID's via DNS
  -qq   As above, but re-query locally cached entries
  -Q    Query the PCI ID database for all ID's via DNS

Selection of devices:
  -s [[[[<domain>]:]<bus>]:][<slot>][.[<func>]]  Show only devices in selected slots
  -d [<vendor>]:[<device>][:<class>]  Show only devices with specified ID's

Other options:
  -i <file>  Use specified ID database instead of /usr/share/misc/pci.ids.gz
  -p <file>  Look up kernel modules in a given file instead of default modules.pcimap
  -M         Enable `bus mapping' mode (dangerous; root only)

PCI access options:
  -A <method>     Use the specified PCI access method (see `-A help' for a list)
  -O <par>=<val>  Set PCI access parameter (see `-O help' for a list)
  -G              Enable PCI access debugging
  -H <mode>       Use direct hardware access (<mode> = 1 or 2)
  -F <file>       Read PCI configuration dump from a given file

Stacktrace: The Error Occurred in /var/www/ScanControllers.cfm: line 456

  454: <CFSET tmpbus=Replace(Key,":","-","ALL")>
  455: <CFFILE action="write" file="#PersistDir#/lspci-vmm-s_#tmpbus#_exec.txt" output="/usr/bin/lspci -vmm -s #Key#" addnewline="NO" mode="666">
  456: <cfexecute name="/usr/bin/lspci" arguments="-vmm -s #Key#" timeout="300" variable="lspci" />
  457: <CFFILE action="delete" file="#PersistDir#/lspci-vmm-s_#tmpbus#_exec.txt">
  458: <CFFILE action="write" file="#PersistDir#/lspci-vmm_#tmpbus#.txt" output="#lspci#" addnewline="NO" mode="666">

called from /var/www/ScanControllers.cfm: line 455

  453: <!--- Get the controller information --->
  454: <CFSET tmpbus=Replace(Key,":","-","ALL")>
  455: <CFFILE action="write" file="#PersistDir#/lspci-vmm-s_#tmpbus#_exec.txt" output="/usr/bin/lspci -vmm -s #Key#" addnewline="NO" mode="666">
  456: <cfexecute name="/usr/bin/lspci" arguments="-vmm -s #Key#" timeout="300" variable="lspci" />
  457: <CFFILE action="delete" file="#PersistDir#/lspci-vmm-s_#tmpbus#_exec.txt">

Java Stacktrace:
lucee.runtime.exp.ApplicationException: Error invoking external process
  at lucee.runtime.tag.Execute.doEndTag(Execute.java:258)
  at scancontrollers_cfm$cf.call_000046(/ScanControllers.cfm:456)
  at scancontrollers_cfm$cf.call(/ScanControllers.cfm:455)
  at lucee.runtime.PageContextImpl._doInclude(PageContextImpl.java:933)
  at lucee.runtime.PageContextImpl._doInclude(PageContextImpl.java:823)
  at lucee.runtime.listener.ClassicAppListener._onRequest(ClassicAppListener.java:66)
  at lucee.runtime.listener.MixedAppListener.onRequest(MixedAppListener.java:45)
  at lucee.runtime.PageContextImpl.execute(PageContextImpl.java:2464)
  at lucee.runtime.PageContextImpl._execute(PageContextImpl.java:2454)
  at lucee.runtime.PageContextImpl.executeCFML(PageContextImpl.java:2427)
  at lucee.runtime.engine.Request.exe(Request.java:44)
  at lucee.runtime.engine.CFMLEngineImpl._service(CFMLEngineImpl.java:1090)
  at lucee.runtime.engine.CFMLEngineImpl.serviceCFML(CFMLEngineImpl.java:1038)
  at lucee.loader.engine.CFMLEngineWrapper.serviceCFML(CFMLEngineWrapper.java:102)
  at lucee.loader.servlet.CFMLServlet.service(CFMLServlet.java:51)
  at javax.servlet.http.HttpServlet.service(HttpServlet.java:729)
  at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:292)
  at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:207)
  at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
  at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:240)
  at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:207)
  at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:212)
  at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:94)
  at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:492)
  at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:141)
  at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:80)
  at org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:620)
  at org.apache.catalina.valves.RemoteIpValve.invoke(RemoteIpValve.java:684)
  at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:88)
  at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:502)
  at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1152)
  at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:684)
  at org.apache.tomcat.util.net.AprEndpoint$SocketWithOptionsProcessor.run(AprEndpoint.java:2464)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
  at java.lang.Thread.run(Thread.java:748)

Timestamp: 9/8/19 12:44:32 PM PDT
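The detail message in the error above (option requires an argument -- 's') is what `lspci` prints when `-s` is passed with nothing after it, which would be consistent with `#Key#` being empty when line 456 ran. That reading is an inference from the error text and is not confirmed in the thread. As an illustration of guarding against it, here is a small Python sketch (not the tool's CFML): `build_lspci_args` is a hypothetical helper, and its address pattern is deliberately stricter than the forms `lspci -s` itself accepts.

```python
import re

# Accepts "bb:dd.f" or "dddd:bb:dd.f" (hex digits). lspci -s also allows
# partial addresses; this sketch rejects those on purpose.
_SLOT_RE = re.compile(r"^(?:[0-9a-fA-F]{4}:)?[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]$")


def build_lspci_args(slot):
    """Build the argument list for `lspci -vmm -s <slot>`.

    Raises ValueError for an empty or malformed slot address, so the
    command is never launched with a dangling `-s`.
    """
    slot = (slot or "").strip()
    if not slot:
        raise ValueError("empty PCI slot address; skipping lspci call")
    if not _SLOT_RE.match(slot):
        raise ValueError(f"unexpected PCI slot address: {slot!r}")
    return ["/usr/bin/lspci", "-vmm", "-s", slot]
```

The returned list could then be handed to something like `subprocess.run`; a caller that catches the `ValueError` can log the controller with the missing address and continue the scan.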
electron286 Posted September 8, 2019

I also see what looks like the same results on the 2nd server.