About dgirard
  1. I'm seeing this now on a system that is brand new to Unraid, running 6.8.3. It's a trial key with only 2 drives and no additional setup: no VMs, no Docker setup, no plugins installed, and no data or shares loaded. So the cause is likely more basic than suggested above. Note that I don't have this issue on my primary production system, so maybe it's CPU-generation based or some other hardware interaction. FWIW, the old system is 2x AMD 2431 on a Supermicro H8DM8-2; the new system is 1x AMD 6274 on a Supermicro H8DG6.
  2. Update: Appears to be related to the Floppy Drive that's detected (even though I don't have one). I updated ScanControllers to skip it and it gets through scanning. David
  3. Hello! I'm having a problem similar to interwebtech's. The web interface never gets past "scanning hard drives". When I look at the Docker log (icon on the right in Unraid), I see several Java errors. Here's the first one:

     lucee.runtime.exp.ApplicationException: Error invoking external process
         at lucee.runtime.tag.Execute.doEndTag(Execute.java:258)
         at scancontrollers_cfm$cf.call_000046(/ScanControllers.cfm:456)
         at scancontrollers_cfm$cf.call(/ScanControllers.cfm:455)
         at lucee.runtime.PageContextImpl._doInclude(PageContextImpl.java:933)
         at lucee.runtime.PageContextImpl._doInclude(PageContextImpl.java:823)
         at lucee.runtime.listener.ClassicAppListener._onRequest(ClassicAppListener.java:66)
         at lucee.runtime.listener.MixedAppListener.onRequest(MixedAppListener.java:45)
         at lucee.runtime.PageContextImpl.execute(PageContextImpl.java:2464)
         at lucee.runtime.PageContextImpl._execute(PageContextImpl.java:2454)
         at lucee.runtime.PageContextImpl.executeCFML(PageContextImpl.java:2427)
         at lucee.runtime.engine.Request.exe(Request.java:44)
         at lucee.runtime.engine.CFMLEngineImpl._service(CFMLEngineImpl.java:1090)
         at lucee.runtime.engine.CFMLEngineImpl.serviceCFML(CFMLEngineImpl.java:1038)
         at lucee.loader.engine.CFMLEngineWrapper.serviceCFML(CFMLEngineWrapper.java:102)
         at lucee.loader.servlet.CFMLServlet.service(CFMLServlet.java:51)
         at javax.servlet.http.HttpServlet.service(HttpServlet.java:729)
         at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:292)
         at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:207)
         at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
         at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:240)
         at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:207)
         at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:212)
         at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:94)
         at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:492)
         at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:141)
         at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:80)
         at org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:620)
         at org.apache.catalina.valves.RemoteIpValve.invoke(RemoteIpValve.java:684)
         at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:88)
         at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:502)
         at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1152)
         at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:684)
         at org.apache.tomcat.util.net.AprEndpoint$SocketWithOptionsProcessor.run(AprEndpoint.java:2464)
         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
         at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
         at java.lang.Thread.run(Thread.java:748)

     There are several more, but I suspect they're all related to this one. I swapped out ScanControllers.cfm for CreateDebugInfo.cfm as previously suggested and am emailing the output. Any ideas? Running Unraid 6.6.7 and the latest DiskSpeed container.
  4. Just add hard drives and an Unraid license/flash drive. Asking $250.

     I can meet up anywhere in the Metro Detroit area (within 50 miles of zip 48111), or anywhere along the I-75 corridor between Detroit and Cincinnati (I make that drive at least once a month for work).

     This is a used setup. If I recall correctly, one of the drive bays didn't work. I think it was a backplane or cabling problem. It's obvious in that it just doesn't work, so consider this to be a 19-drive system, or take some time and try to fix what's wrong (probably something simple, but I really don't know).

     Also included in this setup:

       • ABit AB9 PTO motherboard
       • 2 GB DDR2-800 RAM
       • PC Power and Cooling Silencer 750 EPS12V Quad power supply (was enough to power 18+ drives in this system)
       • Intel CPU (honestly, I can't remember which CPU is in this guy; I can fire it up and check if you'd like, but it works)
       • 1x Supermicro AOC-SASLP-MV8 SAS/SATA card with breakout cables for 8 SATA drives
       • Total of 19 drive bays connected to SATA controllers (8 on the Supermicro card and 11 on the motherboard)
       • All 7 fans appear to be working, but are used; easy to replace if there's any issue there

     Also to note, I seem to be missing ONE of the hot-plug drive trays (I guess to go with the one slot that doesn't work). I have it here somewhere, and if I find it before delivery I'll add it (and I'll update this listing if I find it).

     I know the motherboard, RAM and CPU are several generations old at this point, but this system manages Unraid just fine! It's probably not the best for lots of Dockers or KVM guests due to limited RAM and CPU cores (that's why I upgraded).

     Let me know if you have any questions. This system is pretty much ready to go! David
  5. Hello all. I had previously set up my VMs via the GUI and had edited the XML to set custom port numbers for VNC (the objective is to have consistent port numbers for specific VMs instead of them being assigned in the order that the VMs are started). I also need to have a password on some of the VMs. This was working fine until the last Unraid update. Now it seems that I can set a password in the GUI, but I still need to edit the XML to add the custom port number for VNC. Unfortunately, as soon as I edit the XML for VNC, it "forgets" the password, so I go back and set the password in the GUI and it "forgets" the custom port number. I can't figure out how to get both to stick, and it just worked with previous versions. Any suggestions? What am I missing here? Thanks
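For reference, libvirt's domain XML keeps both settings as attributes on the same `<graphics type='vnc'>` element (`port`, `autoport`, and `passwd`). The Python sketch below sets all three in one pass so that neither is dropped by a second edit. The sample XML, the port 5905, and the password are made up for illustration, and whether Unraid's GUI preserves these attributes when it regenerates the XML is not something this sketch can confirm.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down domain XML; a real definition has many more elements.
SAMPLE_DOMAIN_XML = """
<domain type='kvm'>
  <name>example-vm</name>
  <devices>
    <graphics type='vnc' port='-1' autoport='yes' listen='0.0.0.0'/>
  </devices>
</domain>
"""


def set_vnc_port_and_password(domain_xml: str, port: int, password: str) -> str:
    """Return domain XML with a fixed VNC port and a VNC password set together."""
    root = ET.fromstring(domain_xml)
    graphics = root.find("./devices/graphics[@type='vnc']")
    if graphics is None:
        raise ValueError("no <graphics type='vnc'> element found")
    graphics.set("port", str(port))
    graphics.set("autoport", "no")  # ask libvirt not to pick the port itself
    graphics.set("passwd", password)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(set_vnc_port_and_password(SAMPLE_DOMAIN_XML, 5905, "example-password"))
```

Editing both attributes in the XML view and not touching the GUI form afterwards may be enough to keep them together, but that is an assumption about the GUI's behaviour, not something verified here.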
  6. It's the BadCRC and ICRC error flags that specifically indicate corrupted packets, usually from a bad SATA cable. Since you have repeated ICRC error flags, which cause the pauses and resets, and cause the SATA link speed to be slowed down to hopefully improve communications integrity, I suspect you also have an increased UDMA_CRC_Error_Count on the SMART report for that drive. I know you said you replaced the SATA cable, but it doesn't look like a good cable from here. There's still a small chance that it may be a bad power situation instead.

     Rob: My UDMA_CRC_Error_Count is 2, so it does not seem to be CRC errors from the drive's perspective. I'm still happy to try another SATA cable if that still makes sense now that the errors seem to have stopped (I'm going to monitor for a couple of days before confirming) with the NCQ setting change. Also, it's possible that rjscott's problem *is* a cable or power

     Yes, I was referring to the ICRC in rjstott's syslog extract. DRDY just means 'Drive ReaDY', a good flag. The important part of your exception is the 'internal error' message, unfortunately. When a programmer maps something odd or unexpected to 'internal error', it usually means they either don't expect it to happen, or don't want to deal with it, or don't know how to deal with it. There's no further information available, so you are kind of stuck! Any firmware updates available for that disk controller?

     Thanks robj: It makes more sense to me now. I'll look into controller firmware. I'm using the SATA controller on the motherboard (it's a Supermicro) for this drive. The drive itself has the latest. I checked the NCQ settings on several of my drives and they're all set to 31, so it looks like the all-off setting isn't working (at least in my setup). So it's either a bug or something strange with my setup. I'll look closer tomorrow. I'll also keep watching to see if the error messages occur now that the SSD is set to NCQ=1. I suspect this is the bug that's alluded to in the bugzilla report (it doesn't look like it's been fixed by the Linux kernel guys yet). Thanks again! David
  7. Rob: My UDMA_CRC_Error_Count is 2, so it does not seem to be CRC errors from the drive's perspective. I'm still happy to try another SATA cable if that still makes sense now that the errors seem to have stopped (I'm going to monitor for a couple of days before confirming) with the NCQ setting change. Also, it's possible that rjscott's problem *is* a cable or power, since when I look back at my log, I see a different message before the failed command. I see:

     May 9 23:14:43 Tower kernel: ata16.00: cmd 61/00:50:e0:4f:b1/38:00:03:00:00/40 tag 10 ncq 7340032 out
     May 9 23:14:43 Tower kernel: res 40/00:a8:d8:a7:ae/00:00:03:00:00/40 Emask 0x40 (internal error)
     May 9 23:14:43 Tower kernel: ata16.00: status: { DRDY }
     May 9 23:14:43 Tower kernel: ata16.00: failed command: WRITE FPDMA QUEUED

     I do not see the CRC error message you pointed out. I don't know what DRDY means, but could it mean that we're overflowing the buffers sending data to the drive? (That would also explain the "out of IOMMU space" I reported earlier, I would assume.)

     It's the BadCRC and ICRC error flags that specifically indicate corrupted packets, usually from a bad SATA cable. Since you have repeated ICRC error flags, which cause the pauses and resets, and cause the SATA link speed to be slowed down to hopefully improve communications integrity, I suspect you also have an increased UDMA_CRC_Error_Count on the SMART report for that drive. I know you said you replaced the SATA cable, but it doesn't look like a good cable from here. There's still a small chance that it may be a bad power situation instead.
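To tell the two cases apart across a long syslog, the exception lines can be grouped by their `Emask` description, while also noting whether any ICRC or BadCRC flag shows up. A rough Python sketch follows; the regular expressions are written against the four log lines in this post and are an assumption about the general format, not a complete parser for libata messages.

```python
import re
from collections import Counter

# Matches the result line, e.g. "... Emask 0x40 (internal error)".
EMASK_RE = re.compile(r"Emask (0x[0-9a-fA-F]+) \(([^)]+)\)")
# Matches a port/device prefix such as "ata16.00:".
PORT_RE = re.compile(r"\b(ata\d+(?:\.\d+)?):")


def summarize_ata_errors(lines):
    """Count Emask descriptions and note whether any line mentions ICRC/BadCRC."""
    emasks = Counter()
    ports = set()
    crc_seen = False
    for line in lines:
        emask_match = EMASK_RE.search(line)
        if emask_match:
            emasks[emask_match.group(2)] += 1
        port_match = PORT_RE.search(line)
        if port_match:
            ports.add(port_match.group(1))
        if "ICRC" in line or "BadCRC" in line:
            crc_seen = True
    return {"emasks": dict(emasks), "ports": sorted(ports), "crc_seen": crc_seen}


# The four lines from the post, used as sample input.
SAMPLE = [
    "May 9 23:14:43 Tower kernel: ata16.00: cmd 61/00:50:e0:4f:b1/38:00:03:00:00/40 tag 10 ncq 7340032 out",
    "May 9 23:14:43 Tower kernel: res 40/00:a8:d8:a7:ae/00:00:03:00:00/40 Emask 0x40 (internal error)",
    "May 9 23:14:43 Tower kernel: ata16.00: status: { DRDY }",
    "May 9 23:14:43 Tower kernel: ata16.00: failed command: WRITE FPDMA QUEUED",
]

if __name__ == "__main__":
    print(summarize_ata_errors(SAMPLE))
```

On the sample, `crc_seen` stays false and the only Emask description is "internal error", which matches the observation that no CRC flag appears in this log.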
  8. One more interesting observation: I have "force NCQ disabled=yes" on the disk configuration screen. Yet it appears (maybe I'm looking the wrong way?) that NCQ is still enabled for all my drives, including this cache drive that's having the problems. If I run

     cat /sys/block/sdc/device/queue_depth

     it reports a value of 31, which indicates NCQ is in play if I understand this correctly (I believe it should report 0 or 1 if NCQ is disabled?). Now, if I change the queue_depth to 1 with

     echo 1 >/sys/block/sdc/device/queue_depth

     it appears that my errors with this SSD no longer occur (based on a quick test: set to 1, copy large file to SSD, no errors; set back to 31, recopy same file, errors occur).

     Am I understanding this right? I thought about looking at this because of this: https://bugzilla.kernel.org/show_bug.cgi?id=89261 and this document, which explains how to dynamically change NCQ settings: https://exemen.wordpress.com/2011/05/16/enabling-disabling-and-checking-ncq/
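Checking every drive by hand gets tedious, so here is a small Python sketch that reads the same `queue_depth` file for a list of devices and flags which ones still look like they have NCQ active. It assumes, as the post does, that a depth of 1 means NCQ is effectively off; the device names are examples, and the `sysfs_root` parameter exists only so the function can be pointed at a test directory.

```python
from pathlib import Path


def read_queue_depth(device: str, sysfs_root: str = "/sys/block") -> int:
    """Read the current queue depth for a block device such as 'sdc'."""
    path = Path(sysfs_root) / device / "device" / "queue_depth"
    return int(path.read_text().strip())


def ncq_in_use(queue_depth: int) -> bool:
    """Treat a depth above 1 as NCQ active (assumption: 1 means one command at a time)."""
    return queue_depth > 1


def report(devices, sysfs_root: str = "/sys/block"):
    """Map each device name to (queue_depth, ncq_in_use); skip devices without the file."""
    result = {}
    for dev in devices:
        try:
            depth = read_queue_depth(dev, sysfs_root)
        except (FileNotFoundError, ValueError):
            continue
        result[dev] = (depth, ncq_in_use(depth))
    return result


if __name__ == "__main__":
    # Device names here are examples only.
    print(report(["sdb", "sdc"]))
```

This only reads the setting; writing a new depth still needs root and the `echo` shown in the post.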
  9. Interesting article. Sounds like I need to pull out the Samsung SSD or be faced with performance problems at some point. I do not think this is the cause of our current problem (rjscott's and mine), as I reformatted my SSD and re-copied all the data to it and the errors continued immediately. I'm also not certain it's the SATA cable or power (I'm not ruling it out, however). I did replace the SATA cable and even changed the SATA port that it was connected to. Power seems stable (it's in a Supermicro 24-drive server with the dual power supplies) and I have no other power problems. In addition, it seems this problem started with beta 15. It IS making the entire server pause while it tries to reset the SATA port. If this problem is just limited to Samsung SSDs as cache drives, then I'll just pull mine, but since the problem seems to have started with a release, is it possible this is a driver or kernel problem?
  10. binhex: Looks like a similar problem exists with delugevpn...

     2015-05-06 06:04:34,620 DEBG 'setip' stderr output: /home/nobody/setip.sh: line 4: netstat: command not found

     and

     2015-05-06 06:04:34,730 DEBG 'setip' stderr output: /home/nobody/setip.sh: line 4: netstat: command not found
     2015-05-06 06:04:34,733 DEBG 'webui' stderr output: /home/nobody/webui.sh: line 4: netstat: command not found
     2015-05-06 06:04:34,734 DEBG 'setport' stderr output: /home/nobody/setport.sh: line 4: netstat: command not found

     Are we doing something wrong? Or did they change the upstream OS distro on you, perhaps? I connected to the container while it was running and couldn't find a netstat command anywhere... Thanks! David
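When a container log is full of these, it can help to list which commands are missing and which scripts need them. The Python sketch below does that for the "command not found" lines in this post; the regular expression is an assumption based on those four lines only.

```python
import re
from collections import defaultdict

# Matches shell errors such as
# "/home/nobody/setip.sh: line 4: netstat: command not found".
NOT_FOUND_RE = re.compile(r"(\S+): line (\d+): (\S+): command not found")


def missing_commands(log_lines):
    """Return {command: sorted list of scripts that failed to find it}."""
    found = defaultdict(set)
    for line in log_lines:
        match = NOT_FOUND_RE.search(line)
        if match:
            script, _line_no, command = match.groups()
            found[command].add(script)
    return {cmd: sorted(scripts) for cmd, scripts in found.items()}


# The four log lines from the post, used as sample input.
SAMPLE = [
    "2015-05-06 06:04:34,620 DEBG 'setip' stderr output: /home/nobody/setip.sh: line 4: netstat: command not found",
    "2015-05-06 06:04:34,730 DEBG 'setip' stderr output: /home/nobody/setip.sh: line 4: netstat: command not found",
    "2015-05-06 06:04:34,733 DEBG 'webui' stderr output: /home/nobody/webui.sh: line 4: netstat: command not found",
    "2015-05-06 06:04:34,734 DEBG 'setport' stderr output: /home/nobody/setport.sh: line 4: netstat: command not found",
]

if __name__ == "__main__":
    for command, scripts in missing_commands(SAMPLE).items():
        print(command, "is missing for:", ", ".join(scripts))
```

For the sample, the only missing command is `netstat`, needed by all three scripts.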
  11. OK, here's the smart report. Looks OK to me, but maybe I'm missing something?
  12. I'll start by apologizing for changing the subject. I didn't realize I was changing the entire thread. Other boards create a sub-subject within a thread if the subject is edited. Strange that I can even edit it. Thanks, Mods, for fixing it. I meant no harm.

     I was hoping I'd distilled the problem down to these errors. This is even more confusing now: if IOMMU is an Intel VT-d error then I must have some real problems, since I have AMD CPUs and wasn't running any virtualization at the time these occurred. I also only see these errors when writing to my SSD (one out of 19 drives). Perhaps the IOMMU messages are just a side effect; the real problem seems to be the "exception emask" and "failed command" messages. There are hundreds of these.

     I'll start at the beginning, since perhaps the original snippet isn't enough to distill the problem. Here's the history: the night after I upgraded to beta 15 (from beta 14), the mover ran and tried to move all my files that were on cache onto a drive that was full. It generated a ton of the messages I quoted above, along with "out of space" errors, and blew away the files it wasn't able to move (I lost data). It wasn't anything critical I'd lost, and it appeared that the problem was likely with the SSD or my share configuration, so I updated those as I indicated (reformatted, flashed firmware, added full drives to the "excluded" disks in the share configuration, etc.), but the errors writing to the SSD didn't go away.

     Here are some extracts from that first night's log: These errors occurred while I was copying one of my disk image files (to and from my cache drive) so I could test KVM vs Xen: This exact sequence appears approx 197 times in a row (sometimes with a different number of WRITE QUEUED messages) from timestamp 22:17:31 to 23:51:11 (the time I was copying a 10 GB file). That also seems like a long time to copy 10 GB to and from an SSD...

     I didn't notice these messages until the next day, when I went looking to find out why a bunch of files were missing that should have been moved over from the cache drive. Here's part of the log from the Mover script: There is one set of these error messages for each movie that mover tried to move.

     Here's the really strange thing: each folder had a movie file (a file with extension MKV, if it makes any difference) and an NFO file. I don't see any reference to the MKV files in the log, yet they were completely deleted from the cache drive, not moved. The NFO files were "moved", but after the move (and during it, according to the log) if you try to access them you get the "Exec format error". I had attributed the exec format error to corruption in the reiserfs file system. Rather than try to repair it, I reformatted the drive as xfs (I've been systematically doing that anyway) and restored the files that were there.

     The other strange thing with this is that these files had not been moved from cache previously (I don't have the older log files, so I don't know whether there were previous errors). They'd all been there at least a week or two (perhaps since beta 14?). Also, disk9, where mover was trying to move them to, was *almost* full (just a few K free) and was NOT part of the share configuration (not included or excluded), and there were included drives that had plenty of free space. I've updated all my share configs to explicitly indicate which drives to include and exclude, so perhaps that was my mistake in configuration.

     Finally, the mover didn't nuke all my files; some of them failed like these (share was different).

     I had originally thought these mover errors stemmed from the IOMMU messages earlier. Perhaps I was wrong and it was simply configuration and reiserfs corruption. If that's the case then I think I'm fixed regarding lost files, but the IOMMU messages seem troubling since I don't have an Intel CPU, and they do seem to be impacting performance (all those ATA bus resets can't be good for performance).

     Lastly, here's a snip from the start of the log where the ata30 device starts up: Is there anything else I should be looking at? Or are the remaining IOMMU messages and failed command messages just informational? Thanks!
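As a rough check on the "long time to copy 10 GB" remark in this post, the two timestamps give an elapsed time and an average rate. The Python sketch below does the arithmetic; it assumes the file is 10 GB in decimal units and counts the data only once, so a copy "to and from" the SSD would double the figure.

```python
from datetime import datetime

TIME_FORMAT = "%H:%M:%S"


def elapsed_seconds(start: str, end: str) -> int:
    """Seconds between two same-day HH:MM:SS timestamps."""
    delta = datetime.strptime(end, TIME_FORMAT) - datetime.strptime(start, TIME_FORMAT)
    return int(delta.total_seconds())


def throughput_mb_per_s(total_bytes: int, seconds: int) -> float:
    """Average rate in decimal megabytes per second."""
    return total_bytes / seconds / 1_000_000


if __name__ == "__main__":
    seconds = elapsed_seconds("22:17:31", "23:51:11")
    rate = throughput_mb_per_s(10 * 1_000_000_000, seconds)
    print(f"{seconds} s elapsed, about {rate:.2f} MB/s")
```

That works out to 5620 seconds (1 h 33 min 40 s) and roughly 1.8 MB/s, far below what a SATA SSD would normally sustain, which is consistent with the port spending much of that time in resets.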
  13. Hello: After upgrading to beta16 (possibly before, but I didn't notice these errors; I think they're new with beta16), I'm getting a bunch of errors talking to my cache drive (it's an SSD). The errors in my log look like this:

     At first I thought maybe it was the firmware on the SSD, so I upgraded that, and while I was in there I replaced the SATA cable to the SSD. I also saw some strange storage behavior, with error messages on some folders when viewed via share0 indicating "wrong exec format". I suspect those are a side effect of this problem. Since this was my cache drive and it was formatted with btrfs, I moved everything off, reformatted with xfs, and put the files back. I saw a TON of these errors when putting the files back, so it wasn't related to the file system. I have NCQ turned off in the Unraid settings, and smartctl shows the cache drive is clean.

     Where should I look next? Or is this a bug in the sata_nv driver again?
  14. OpenVPN! OK, I can confirm that OpenVPN works out-of-the-box with this Arch OS image (OK, not out-of-the-box, but without anything special other than pacman and configuration…). I don't think there's a need to add it to the Unraid repository, since the one in the Arch repositories works just fine. All I had to do was:

     pacman -S openvpn

     and it installed the package and dropped the sample configuration files. I'm using it as a client to connect to "Private Internet Access" (that's the company that provides my anon-internet access service) and I just followed their guide for Linux OpenVPN setup and it worked great! Next step is to try to set up an inbound private VPN so I can get access to my systems from outside via VPN… It appears everything is already installed to make that happen as well; it just needs configuration. I'm happy to document that process once I get it going if anyone is interested. David
  15. Is anyone else having trouble with NFS mounts from the ArchVM to Unraid? I'm having a problem where one folder on a share (it seems to be my sabtemp folder) becomes inaccessible: it shows ownership and permissions as "? ? ? ? ? ? ? ? ? ? ? ?". It seems to resolve itself after some time (hours?), but in the meantime sabnzbd returns all kinds of errors and basically either loses the download or is unable to run the sickbeard post-process script, leaving me to clean up. It's happening on a regular basis now (daily). I'm not sure if this is the stale NFS file issue that's been seen in the past due to FUSE inode cleanup. I haven't tried remounting to see if it clears. I tried setting up the mount as SMB, but ran into permissions issues that just seemed silly to have to work through (why use SMB to mount Linux to Linux?). I'm just doing a basic mount with no options, so perhaps there's something I need to add to help prevent this? Any suggestions?
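One way to test the stale-handle theory the next time the folder goes bad: `ls` prints question marks when it cannot stat an entry, and the error code behind that failure says why. The Python sketch below stats a path and reports whether the failure is `ESTALE` ("Stale file handle") or something else. The path in the example is hypothetical; substitute wherever the share is mounted.

```python
import errno
import os


def classify_path(path: str) -> str:
    """Return 'ok', 'missing', 'stale', or 'error:<NAME>' based on os.stat()."""
    try:
        os.stat(path)
    except FileNotFoundError:
        return "missing"
    except OSError as exc:
        if exc.errno == errno.ESTALE:
            return "stale"  # the NFS server no longer recognises the file handle
        return "error:" + errno.errorcode.get(exc.errno, str(exc.errno))
    return "ok"


if __name__ == "__main__":
    # Hypothetical mount point for the affected folder.
    print(classify_path("/mnt/unraid/sabtemp"))
```

If this prints "stale" while the folder is inaccessible, that would support the stale NFS handle explanation; any other result points elsewhere.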