ptr727

Everything posted by ptr727

  1. New user; I installed on two similar systems, the only difference being the number of drives. The default port does not work for the host; I need to change the port number from 18888 to 8888, else I keep getting connection refused. The first server has no problem running tests. The second server crashes in what appears to be a timeout waiting for the 20 spinning drives to spin up:
```
DiskSpeed - Disk Diagnostics & Reporting tool
Version: 2.4

Scanning Hardware
08:25:25 Spinning up hard drives
08:25:25 Scanning system storage

Lucee 5.2.9.31 Error (application)
Message: timeout [90000 ms] expired while executing [/usr/sbin/hwinfo --pci --bridge --storage-ctrl --disk --ide --scsi]
Stacktrace: The Error Occurred in /var/www/ScanControllers.cfm: line 243
241: <CFOUTPUT>#TS()# Scanning system storage<br></CFOUTPUT><CFFLUSH>
242: <CFFILE action="write" file="#PersistDir#/hwinfo_storage_exec.txt" output=" /usr/sbin/hwinfo --pci --bridge --storage-ctrl --disk --ide --scsi" addnewline="NO" mode="666">
243: <cfexecute name="/usr/sbin/hwinfo" arguments="--pci --bridge --storage-ctrl --disk --ide --scsi" variable="storage" timeout="90" /><!--- --usb-ctrl --usb --hub --->
244: <CFFILE action="delete" file="#PersistDir#/hwinfo_storage_exec.txt">
245: <CFFILE action="write" file="#PersistDir#/hwinfo_storage.txt" output="#storage#" addnewline="NO" mode="666">

called from /var/www/ScanControllers.cfm: line 242
240:
241: <CFOUTPUT>#TS()# Scanning system storage<br></CFOUTPUT><CFFLUSH>
242: <CFFILE action="write" file="#PersistDir#/hwinfo_storage_exec.txt" output=" /usr/sbin/hwinfo --pci --bridge --storage-ctrl --disk --ide --scsi" addnewline="NO" mode="666">
243: <cfexecute name="/usr/sbin/hwinfo" arguments="--pci --bridge --storage-ctrl --disk --ide --scsi" variable="storage" timeout="90" /><!--- --usb-ctrl --usb --hub --->
244: <CFFILE action="delete" file="#PersistDir#/hwinfo_storage_exec.txt">

Java Stacktrace: lucee.runtime.exp.ApplicationException: timeout [90000 ms] expired while executing [/usr/sbin/hwinfo --pci --bridge --storage-ctrl --disk --ide --scsi]
  at lucee.runtime.tag.Execute._execute(Execute.java:241)
  at lucee.runtime.tag.Execute.doEndTag(Execute.java:252)
  at scancontrollers_cfm$cf.call_000006(/ScanControllers.cfm:243)
  at scancontrollers_cfm$cf.call(/ScanControllers.cfm:242)
  at lucee.runtime.PageContextImpl._doInclude(PageContextImpl.java:933)
  at lucee.runtime.PageContextImpl._doInclude(PageContextImpl.java:823)
  at lucee.runtime.listener.ClassicAppListener._onRequest(ClassicAppListener.java:66)
  at lucee.runtime.listener.MixedAppListener.onRequest(MixedAppListener.java:45)
  at lucee.runtime.PageContextImpl.execute(PageContextImpl.java:2464)
  at lucee.runtime.PageContextImpl._execute(PageContextImpl.java:2454)
  at lucee.runtime.PageContextImpl.executeCFML(PageContextImpl.java:2427)
  at lucee.runtime.engine.Request.exe(Request.java:44)
  at lucee.runtime.engine.CFMLEngineImpl._service(CFMLEngineImpl.java:1090)
  at lucee.runtime.engine.CFMLEngineImpl.serviceCFML(CFMLEngineImpl.java:1038)
  at lucee.loader.engine.CFMLEngineWrapper.serviceCFML(CFMLEngineWrapper.java:102)
  at lucee.loader.servlet.CFMLServlet.service(CFMLServlet.java:51)
  at javax.servlet.http.HttpServlet.service(HttpServlet.java:729)
  at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:292)
  at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:207)
  at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
  at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:240)
  at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:207)
  at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:212)
  at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:94)
  at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:492)
  at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:141)
  at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:80)
  at org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:620)
  at org.apache.catalina.valves.RemoteIpValve.invoke(RemoteIpValve.java:684)
  at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:88)
  at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:502)
  at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1152)
  at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:684)
  at org.apache.tomcat.util.net.AprEndpoint$SocketWithOptionsProcessor.run(AprEndpoint.java:2464)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
  at java.lang.Thread.run(Thread.java:748)

Timestamp: 2/20/20 8:29:32 AM PST
```
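For reference, running the same command by hand from the container console should show how long it actually takes with 20 drives; my assumption is that 90 seconds is simply not enough once all the disks have to spin up:
```
# Time the exact hwinfo invocation from the error above; if it takes longer
# than 90 seconds, the cfexecute timeout in ScanControllers.cfm will trip
time /usr/sbin/hwinfo --pci --bridge --storage-ctrl --disk --ide --scsi
```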
  2. This still happens on my 6.8.2; I had to disable WSD.
  3. I tried with DirectIO yes, and with DirectIO yes plus case insensitive yes; no difference (see attached results). Given that a disk share over SMB showed good performance, I am sceptical that it is an SMB issue; my money is on a performance problem in the shfs write path. DiskSpeedResult_Ubuntu_Cache.xlsx
  4. I have spent a significant amount of time and effort chasing the SMB performance problem (I immediately noticed the slowdown when I switched from W2K16 to Unraid), so I do think my side of the street has been well worn. I referenced the tool I wrote to automate the tests in the last three of my blog posts, where I detail my troubleshooting, and in every one posted in this thread. For completeness, here it is again: https://github.com/ptr727/DiskSpeedTest
  5. Ok, but why would an SMB option make a difference if it looks as if it is a "shfs" write problem, i.e. SMB over a disk share performed well, SMB over a user share performed badly, and read performance was always good? I'll give it a try (case sensitive SMB will break Windows), but I won't be able to test until next week. I believe it should be easy to reproduce the results using the tool I've written, so I would suggest you profile the code yourself rather than wait for my feedback on the experiments.
  6. The same is happening to me, running 6.8.2.
  7. Thank you for the info. Would it then be accurate to say that the read/write and write performance problems shown in the ongoing SMB test results are caused by shfs? Can you comment on why write performance is so massively impacted compared to read, especially since the target is the cache and needs no parity computations on write, i.e. it could be read-through and write-through?
  8. Some more googling, and I now assume that when you say shfs you are referring to Unraid's FUSE filesystem, which happens to be similarly named to the better-known shfs, https://wiki.archlinux.org/index.php/Shfs. A few questions and comments:
     - Is Unraid's FUSE filesystem proprietary, or open source, or GPL such that we can request the source?
     - For operations hitting just the cache, with no parity and no spanning, why the big disparity between read and write for what should be a no-op?
     - Logically, cache-only shares should bypass FUSE and go direct to disk, avoiding the performance problem.
     - All appdata usage on a cache-only share will suffer from the same IO write performance problem observed via SMB, unless users explicitly change the appdata path for containers from /mnt/user/appdata to /mnt/cache/appdata.
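To illustrate the last point (the container name, image, and paths here are placeholders, not my actual setup), pointing a container's appdata directly at the cache device rather than the user share would look something like this:
```
# Illustrative only: bind appdata from /mnt/cache instead of /mnt/user,
# bypassing the shfs/FUSE layer for that container's config path
docker run -d --name example-app \
  -v /mnt/cache/appdata/example-app:/config \
  example/image
```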
  9. So, you are absolutely right, a "disk" share's performance is on par with that of Ubuntu. Can you tell me more about "shfs"? As far as I can google, shfs was abandoned in 2004 and replaced by SSHFS, but I don't understand why a remote ssh filesystem would be used, or are we talking vanilla libfuse as integrated into the kernel? DiskSpeedResult_Ubuntu_Cache.xlsx
  10. Testing now, about an hour left to go. Did you try to reproduce the results I see? The instructions should be clear: https://github.com/ptr727/DiskSpeedTest
  11. See: https://github.com/ptr727/DiskSpeedTest https://github.com/Microsoft/diskspd/wiki/Command-line-and-parameters -Srw means disable local caching and enable remote write-through (try to disable remote caching). What I found is that Unraid SMB is much worse at mixed read/write and write compared to Ubuntu on the exact same hardware, where the expectation is a similar performance profile. Are you speculating that the problem is caused by FUSE?
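For context on the test itself, a diskspd invocation along these lines is roughly what gets run (illustrative parameters and share path, not the exact ones my tool uses; see the repo for those):
```
# Illustrative diskspd run against an SMB share (server/share/file are examples):
# -b128K block size, -d60 seconds, -t4 threads, -o8 outstanding IOs,
# -w50 = 50% writes, -Srw = disable local caching and enable remote write-through,
# -c16G = create a 16GB test file
diskspd.exe -b128K -d60 -t4 -o8 -w50 -Srw -c16G \\server\share\testfile.dat
```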
  12. I am more convinced than ever that this is an Unraid problem. I've done several rounds of tests; in my latest I ran Ubuntu Server on exactly the same hardware as Unraid, and the performance is significantly better than Unraid's. See: https://blog.insanegenius.com/2020/02/02/unraid-vs-ubuntu-bare-metal-smb-performance/
  13. You are welcome to run a test on your own setup for comparison; I describe my test method. By my testing the Unraid numbers really are bad; I attached my latest set of data. DiskSpeedResult_Ubuntu_Cache.xlsx Btw, 500MB/s is nearly 4Gbps; are you running 10Gbps ethernet?
  14. I have now tested Unraid vs. a W2K19 VM, an Ubuntu VM, and now Ubuntu bare metal on the same hardware. There is no reason why Unraid should be slower on the cache drive, but the ReadWrite and Write performance is abysmal. https://blog.insanegenius.com/2020/02/02/unraid-vs-ubuntu-bare-metal-smb-performance/
  15. Ah, I was about to say that it only allows the top level to be excluded, not child folders, but then I noticed the edit box, and I can add my own path instead of using the GUI. It would be cool if the GUI allowed sub-folder selection, but I'll wait for the restore to complete and then try to exclude the Plex metadata folder. Thx
  16. Ah, so verify is comparing the files, not just verifying archive integrity, got it. I bet it is the Plex metadata that is taking forever; is there a way to exclude the metadata from the backup (it can be redownloaded), maybe a path exclusion?
  17. Hi, not sure if it is related to this plugin, but this AM I noticed that none of my dockers are running. Actually, only one is running, postfix, but that does not use any appdata storage. I looked at the log, and it looks like the backup ran and last reported verifying the backup, but I'm not sure why the containers were not restarted after the backup.
```
Feb 2 02:00:01 Server-1 Plugin Auto Update: Checking for available plugin updates
Feb 2 02:00:03 Server-1 Plugin Auto Update: community.applications.plg version 2020.02.01 does not meet age requirements to update
Feb 2 02:00:04 Server-1 Plugin Auto Update: Community Applications Plugin Auto Update finished
Feb 2 03:00:01 Server-1 CA Backup/Restore: #######################################
Feb 2 03:00:01 Server-1 CA Backup/Restore: Community Applications appData Backup
Feb 2 03:00:01 Server-1 CA Backup/Restore: Applications will be unavailable during
Feb 2 03:00:02 Server-1 CA Backup/Restore: this process. They will automatically
Feb 2 03:00:02 Server-1 CA Backup/Restore: be restarted upon completion.
Feb 2 03:00:02 Server-1 CA Backup/Restore: #######################################
Feb 2 03:00:02 Server-1 CA Backup/Restore: Stopping Duplicacy
Feb 2 03:00:02 Server-1 CA Backup/Restore: docker stop -t 60 Duplicacy
Feb 2 03:00:02 Server-1 CA Backup/Restore: Stopping nginx
Feb 2 03:00:06 Server-1 kernel: veth8a1bd58: renamed from eth0
Feb 2 03:00:06 Server-1 CA Backup/Restore: docker stop -t 60 nginx
Feb 2 03:00:06 Server-1 CA Backup/Restore: Stopping plex
Feb 2 03:00:10 Server-1 kernel: vethf123b74: renamed from eth0
Feb 2 03:00:10 Server-1 CA Backup/Restore: docker stop -t 60 plex
Feb 2 03:00:10 Server-1 CA Backup/Restore: postfix set to not be stopped by ca backup's advanced settings. Skipping
Feb 2 03:00:10 Server-1 CA Backup/Restore: Stopping radarr
Feb 2 03:00:14 Server-1 kernel: veth3440e0e: renamed from eth0
Feb 2 03:00:14 Server-1 CA Backup/Restore: docker stop -t 60 radarr
Feb 2 03:00:14 Server-1 CA Backup/Restore: Stopping sabnzbd
Feb 2 03:00:18 Server-1 kernel: vethd3efd80: renamed from eth0
Feb 2 03:00:18 Server-1 CA Backup/Restore: docker stop -t 60 sabnzbd
Feb 2 03:00:18 Server-1 CA Backup/Restore: Stopping sonarr
Feb 2 03:00:22 Server-1 kernel: veth323bd38: renamed from eth0
Feb 2 03:00:22 Server-1 CA Backup/Restore: docker stop -t 60 sonarr
Feb 2 03:00:22 Server-1 CA Backup/Restore: Stopping vouch-proxy
Feb 2 03:00:23 Server-1 kernel: veth21f55b3: renamed from eth0
Feb 2 03:00:23 Server-1 CA Backup/Restore: docker stop -t 60 vouch-proxy
Feb 2 03:00:23 Server-1 CA Backup/Restore: Backing up USB Flash drive config folder to
Feb 2 03:00:23 Server-1 CA Backup/Restore: Using command: /usr/bin/rsync -avXHq --delete --log-file="/var/lib/docker/unraid/ca.backup2.datastore/appdata_backup.log" /boot/ "/mnt/user/backup/Unraid/USB" > /dev/null 2>&1
Feb 2 03:00:23 Server-1 CA Backup/Restore: Changing permissions on backup
Feb 2 03:00:23 Server-1 CA Backup/Restore: Backing up libvirt.img to /mnt/user/backup/Unraid/libvirt/
Feb 2 03:00:23 Server-1 CA Backup/Restore: Using Command: /usr/bin/rsync -avXHq --delete --log-file="/var/lib/docker/unraid/ca.backup2.datastore/appdata_backup.log" "/mnt/user/system/libvirt/libvirt.img" "/mnt/user/backup/Unraid/libvirt/" > /dev/null 2>&1
Feb 2 03:00:27 Server-1 CA Backup/Restore: Backing Up appData from /mnt/user/appdata/ to /mnt/user/backup/Unraid/appdata/[email protected]
Feb 2 03:00:27 Server-1 CA Backup/Restore: Using command: cd '/mnt/user/appdata/' && /usr/bin/tar -cvaf '/mnt/user/backup/Unraid/appdata/[email protected]/CA_backup.tar.gz' --exclude 'docker.img' * >> /var/lib/docker/unraid/ca.backup2.datastore/appdata_backup.log 2>&1 & echo $! > /tmp/ca.backup2/tempFiles/backupInProgress
Feb 2 04:00:01 Server-1 Docker Auto Update: Community Applications Docker Autoupdate running
Feb 2 04:00:01 Server-1 Docker Auto Update: Checking for available updates
Feb 2 04:00:10 Server-1 Docker Auto Update: Installing Updates for code-server nginx sabnzbd
Feb 2 04:00:40 Server-1 Docker Auto Update: Community Applications Docker Autoupdate finished
Feb 2 04:40:01 Server-1 apcupsd[10670]: apcupsd exiting, signal 15
Feb 2 04:40:01 Server-1 apcupsd[10670]: apcupsd shutdown succeeded
Feb 2 04:40:04 Server-1 apcupsd[13274]: apcupsd 3.14.14 (31 May 2016) slackware startup succeeded
Feb 2 04:40:04 Server-1 apcupsd[13274]: NIS server startup succeeded
Feb 2 06:18:13 Server-1 CA Backup/Restore: Backup Complete
Feb 2 06:18:13 Server-1 CA Backup/Restore: Verifying backup
Feb 2 06:18:13 Server-1 CA Backup/Restore: Using command: cd '/mnt/user/appdata/' && /usr/bin/tar --diff -C '/mnt/user/appdata/' -af '/mnt/user/backup/Unraid/appdata/[email protected]/CA_backup.tar.gz' > /var/lib/docker/unraid/ca.backup2.datastore/appdata_backup.log & echo $! > /tmp/ca.backup2/tempFiles/verifyInProgress
```
The CA Backup status tab says verifying, but it has been more than 2 hours of verifying. Is it really verifying? If so, should the containers not be restarted after the backup rather than after the verify, else they will be offline much longer than needed? Any ideas how to find out if the verify is really running, or if something went wrong?
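For anyone else wondering the same, my assumption is that the verify can be checked against the PID file the plugin writes (taken from the last log line above):
```
# The verify command backgrounds the tar --diff and writes its PID to this file
# (see the log above); if ps still shows the process, the verify is still running
cat /tmp/ca.backup2/tempFiles/verifyInProgress
ps -fp "$(cat /tmp/ca.backup2/tempFiles/verifyInProgress)"
```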
  18. Thx, I'll give it a try tonight when I get home.
  19. I see, does this imply that the /dev/foo identifiers need to match, or can btrfs figure out how to match the UUID with the devid? E.g. what if I swap drive bay positions, or a different controller changes the device identifier?
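I assume I can sanity-check this myself with something like the following (sketch; output and device names depend on the system):
```
# btrfs records the filesystem UUID and a per-device devid in each member's
# superblock, so list the members and devids the kernel sees for the pool
sudo btrfs filesystem show
# Cross-check the filesystem UUID reported per partition
sudo blkid
```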
  20. Thx, out of curiosity, how does the OS know about the other partitions?
  21. How can I mount the BTRFS cache volume from Ubuntu? My cache consists of 4 x 1TB SSD drives. Looking at the btrfs docs, it looks like I need to know the disk layout when I mount the disks. What would the btrfs mount options be for a 4 drive cache volume created by Unraid?
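From reading the docs, my current guess is something like this (device names are examples for my 4 SSDs, and I have not tested it yet):
```
# Register all btrfs member devices with the kernel, then mount any one member;
# the multi-device volume is assembled automatically (sdb1..sde1 are examples)
sudo btrfs device scan
sudo mkdir -p /mnt/cache
sudo mount /dev/sdb1 /mnt/cache

# Alternatively, list the other members explicitly via mount options
sudo mount -o device=/dev/sdc1,device=/dev/sdd1,device=/dev/sde1 /dev/sdb1 /mnt/cache
```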
  22. Looks to me like a disk IO problem in Unraid, not a Samba problem. https://blog.insanegenius.com/2020/01/18/unraid-vs-ubuntu-smb-performance/
  23. Cache performance in v6.8.1 is worse than v6.7.2. See: https://blog.insanegenius.com/2020/01/16/unraid-smb-performance-v6-7-2-vs-v6-8-1/