greg_gorrell

Members
172 posts
Everything posted by greg_gorrell

  1. Just a quick look at the logs, and your drive config shows some sort of BTRFS corruption. Without digging in further I cannot say for sure, but there is a possibility that the memory problems led to a corrupt filesystem, or that you have a bad drive/controller. I say this because of the high number of errors in the SMART stats and system log entries like this one: I would start by using a known good drive for cache, or testing this drive on the array, to rule the drive out.
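As a hedged illustration (the attribute values below are invented, not taken from the poster's diagnostics), these are the SMART counters worth checking first when ruling a drive out: pending or reallocated sectors implicate the drive itself, while a climbing UDMA CRC error count usually points at cabling or the controller instead.

```shell
# Fabricated sample of `smartctl -A /dev/sdX` output, for illustration only
cat > /tmp/smart_sample.txt <<'EOF'
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       112
EOF
# Pull out the raw values of the three attributes worth checking first;
# on a live system you would feed this from `smartctl -A` directly
awk '$2 ~ /Reallocated_Sector_Ct|Current_Pending_Sector|UDMA_CRC_Error_Count/ {print $2"="$NF}' /tmp/smart_sample.txt
```

In this made-up sample, the pending sectors would point at the drive, while the large CRC count would have me reseating or swapping the SATA cable before condemning the disk.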
  2. Yes, that is the name of the zpool. I did notice that everything works fine with a fresh docker.img file created on the cache or array via the settings and the appdata folders on the zpool, so it is definitely some weird little bug with using ZFS. It works for now; I'll see what happens when the official ZFS implementation comes around. Apologies for the delay in responding, a drive went on the other server and that kinda took priority lately, but thank you for taking the time to check out the diagnostics and offer input.
  3. Perhaps I am not asking in the correct way. Could somebody please explain to me how the web interface interacts with the underlying services? When I click "reboot server" on the main page, what has to happen for the "shutdown -r" command to be executed by the system? Is it possible that the web server component of Unraid sends a command to the system and will not move on until that command has completed? Is it possible that I have an issue with Docker or ZFS, and that issue is why the timeout is occurring, making the timeout more of a symptom than the cause of the problem? Thanks again.
  4. Thanks for the reply Squid. I honestly am not sure what is causing this issue, but the "upstream timeout" is in the log every time it happens. To clarify, though, I don't suspect CA has anything to do with it; it just happens to be related to the job I was performing when this occurred most recently. Generally the timeout occurs when I am accessing the Docker page, not the CA Apps page, as if it times out while querying the service. Since I am getting no other information from the logs I have no clue where to start, but it seems the issue lies in the query itself between the web GUI and Docker. Maybe not, but when I start the array and run it with either Docker disabled completely or Docker enabled with no containers running, the issue does not manifest itself. Once a couple of Docker containers are running, at some point this timeout will occur, and any subsequent commands to control a service will not execute, whether sent over SSH or the web interface. I am going to attempt to move all of the Docker-related stuff off the zpool and onto a cache drive managed by Unraid, but any assistance on how to better troubleshoot this would be greatly appreciated.
  5. Hello, I have an HP ML350p with plenty of resources and have been attempting to upgrade from 6.8.2 to 6.9.2 for quite some time now. Each time I do, I am unable to start more than a few Docker containers before the web GUI starts acting erratically. I am not sure what the cause is, and the only thing I see in the logs in common with each occurrence is the following "upstream timed out" error message:
Jan 29 09:49:19 ML350P nginx: 2022/01/29 09:49:19 [error] 9325#9325: *902 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 10.0.0.51, server: , request: "POST /plugins/community.applications/scripts/notices.php HTTP/1.1", upstream: "fastcgi://unix:/var/run/php5-fpm.sock", host: "10.0.0.101", referrer: "http://10.0.0.101/Main"
After this error shows in the log, no commands to the server will work. While I can still navigate the web interface, aside from the Docker page which just tries to load endlessly, I am unable to send a command to stop or restart the Docker service. The machine will not reboot either, whether from the button in the GUI or the command line. The syslog indicates the system is going down for reboot, but nothing happens after that. I have not been able to pin this down to a particular container, and everything seems to be fine when no containers are running. As soon as I fire up three or so, I can expect the issue to occur again. Also, I should note that I am using ZFS and that is where my Docker config and containers are located. I have also tried deleting the docker image file; now I cannot even get the containers to run from the templates. I am thinking this may be a ZFS issue, but is there anywhere else I can look for some clues?
Here is what happened when I tried to add my dockers back:
Jan 29 10:20:44 ML350P nginx: 2022/01/29 10:20:44 [error] 7556#7556: *2249 upstream timed out (110: Connection timed out) while reading upstream, client: 10.0.0.51, server: , request: "POST /Docker/AddContainer?xmlTemplate=user%3A%2Fboot%2Fconfig%2Fplugins%2FdockerMan%2Ftemplates-user%2Fmy-UniFi-Video.xml&rmTemplate= HTTP/1.1", upstream: "fastcgi://unix:/var/run/php5-fpm.sock:", host: "10.0.0.101", referrer: "http://10.0.0.101/Docker/AddContainer?xmlTemplate=user%3A%2Fboot%2Fconfig%2Fplugins%2FdockerMan%2Ftemplates-user%2Fmy-UniFi-Video.xml&rmTemplate="
Thanks in advance! ml350p-diagnostics-20220129-1040.zip
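For what it's worth, a quick way to read these errors (an illustrative sketch only; the sample line is copied from the log excerpt above) is to pull out the `upstream:` field. It shows that nginx is waiting on the php-fpm socket rather than failing on its own, which is consistent with the timeout being a symptom of whatever the PHP backend was blocked on:

```shell
# Sample nginx error line, copied from the syslog excerpt above
line='2022/01/29 09:49:19 [error] 9325#9325: *902 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 10.0.0.51, server: , request: "POST /plugins/community.applications/scripts/notices.php HTTP/1.1", upstream: "fastcgi://unix:/var/run/php5-fpm.sock", host: "10.0.0.101"'
# The upstream field identifies which backend nginx was waiting on
echo "$line" | grep -o 'upstream: "[^"]*"'
# prints: upstream: "fastcgi://unix:/var/run/php5-fpm.sock"
```

So the web server itself is fine; it is the PHP worker behind the socket (which in turn talks to the docker daemon for the Docker page) that never answered.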
  6. No, the Pi-hole is on the same untagged, native 10.0.0.0/24 VLAN as the Unraid eth0 interface. There are no firewall rules or other networking issues at play either. I just tested on my second Unraid server, say at 10.0.0.3. I created the VLAN bridges exactly as I did on the first server, and I can successfully ping the Pi-hole. This leads me to believe that it isn't normal behavior and that there is a configuration issue with the first Unraid box.
  7. Hey all, I was hoping someone would be able to explain some behavior that seems a little odd to me. I am currently on the latest production release of Unraid, and this particular server runs a pfSense VM and a Manjaro VM. The pfSense VM is, say, 10.0.0.1 and has a dual-port NIC passed through, connected on the WAN side to my modem and on the LAN side to a managed L3 Cisco switch. The other day I created some VLANs on my network to segment some traffic, like most do. In Unraid I set up the VLANs as well, each with its own br0.vlan00 interface, and I moved the Docker containers which are exposed to the internet onto their own VLAN. I have a Pi-hole running at, say, 10.0.0.80 which currently provides DNS for the whole network. Before creating the VLANs, my Unraid server and all the Docker containers would resolve DNS through the Pi-hole. After creating the VLANs, though, nothing on the Unraid box can reach it. Keep in mind that although I have created VLANs, I have not moved the Pi-hole yet; both Unraid and the Pi-hole are on the same 10.0.0.0/24 LAN network, and I have only added additional br0.vlan00 interfaces. Is there a reason that I am unable to even ping the Pi-hole IP, either from the Unraid host at 10.0.0.2 or from the containers or VMs using br0? After moving the containers to the "DMZ" VLAN, obviously a different subnet, they are able to resolve requests from the Pi-hole and ping it as well. Perhaps this is more a Linux behavior than an Unraid one, but I have not encountered it before; this is my first foray into VLANs on a Linux box, so could someone confirm this is typical? Thanks in advance!
  8. What a moron, I skimmed the whole thread before posting and I still missed that. Sorry guys! Edit: I wouldn't say it is "ignored," just not reflected in the GUI in my case.
  9. I began testing this build on my HP ML350 Gen8 in hopes that the temperature values would be fixed in the GUI, given the smartctl changes I noticed in the code. Unfortunately it hasn't done anything to resolve the problem of the default "Automatic" setting not pulling the SMART data (incorrect syntax error), and when set manually I am still not getting the temperature data on the Main or Dashboard tabs. I have also noticed a new problem in this build that wasn't present on 6.8.x: if I go to an individual disk and set the SMART controller manually, after clicking "apply" and reloading the page, the SMART data will update and reflect the change, but the GUI still shows "default." Playing around with some of these settings, I have noticed they can be somewhat buggy as well. On two occasions now I have hit apply and refreshed the page, only to have the settings revert to the default. Perhaps someone could try to reproduce this error.
  10. Does anyone know why I would be unable to access Heimdall all of a sudden? It worked fine for a year, but over the past week, since we had a power outage, when I try to log in I get a 419 page informing me that my session expired. I just access this container inside my network via IP, so there are no reverse proxying or DNS issues. I also tried deleting the keys and still had no luck. What might be causing this, and how can I force a new session?
  11. Yes, I was getting all SMART data when running manually. Looking at the files in /var/local/emhttp/smart, it is clear that the underlying command hard-coded into the Dynamix webgui is not running properly, nor is it affected by the settings entered for each disk or globally. See my last post, though; I explored the code a little last night and believe this is something currently being resolved, which I would expect to see in the next beta release. Lots of lines were removed and only one added, and I am not even sure how to interpret it correctly: if (file_exists("$file") && exec("grep -Pom1 '^SMART.*: \K[A-Z]+' ".escapeshellarg($file)." |tr -d '\n' 2>/dev/null", $ssa) && in_array("$ssa",$failed)) { as referenced in this commit on July 12: https://github.com/limetech/webgui/commit/6f8507e5474e9b77fef836ee7379a1bee25a7a5b
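For anyone else reading that one-liner, here is a rough sketch of what the grep inside it does, run against a stand-in file (the real code reads the per-disk files under /var/local/emhttp/smart; the sample line and /tmp path here are made up for illustration):

```shell
# Stand-in for a per-disk SMART output file; this line is the usual
# smartctl overall-health verdict
printf 'SMART overall-health self-assessment test result: PASSED\n' > /tmp/disk_sample.txt
# -P Perl regex, -o print only the match, -m1 stop at the first match;
# \K discards everything matched so far, leaving just the verdict word
ssa=$(grep -Pom1 '^SMART.*: \K[A-Z]+' /tmp/disk_sample.txt | tr -d '\n')
echo "$ssa"   # prints: PASSED
```

So presumably the in_array("$ssa",$failed) part just compares that extracted verdict word against a list of failure strings.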
  12. I just spent the whole evening trying to figure out what is going on here. After playing around in the Global Disk Settings, trying various SMART controller types and testing, I think I have some answers. I set the SMART controller to HP cciss globally and it wouldn't work. I tried a few others, eventually landing on "SAT," whatever the hell that may mean. To my surprise it returned all of the data I was looking for. Every one of the fields is populated and temperature data works, although, as in the case of OP, it doesn't transfer to the Main page. As mentioned before, the Dynamix webGUI pulls the data from the /var/local/emhttp/disks.ini file. You would think this information might be related to the data in the files contained in the /var/local/emhttp/smart/ directory, which seem to hold the SMART data queried from the disk, but it appears that the files in the smart folder are not connected in any way. In my case, the directory contains a basic text file for each disk with what would be the smartctl output for that device, as well as a file of the same name with a .ssa extension. No matter what SMART controller I select in the disk settings, the information in these files does not seem to change and just reports the following: After doing some more searching, none of these files have anything to do with the Dashboard or Main pages. It seems that others have had issues with the Areca controllers, where it has been stated that the SMART reporting on the Dashboard and Main pages is hard-coded in emhttp and the parameters cannot be defined by the user. I checked out the webgui code, and I want to say they are currently working on a fix for this, as there was a commit last month removing the smartctl command from the monitor script. The extent of my PHP knowledge is reading a couple chapters of PHP in 24 Hours 20 years ago when I was in middle school, so I could be full of bs here.
I just spent way too much time on this last night when I could have been implementing a script to alert me to issues via another method. Hopefully some devs can chime in on what is going on, or anyone here who is familiar with the codebase wants to check it out. It definitely seems like something too simple to leave broken, especially when that's kind of an important feature for this OS.
  13. I just picked up an ML350p Gen8 and am going through the same issue currently. I put the P420i controller into HBA mode after getting all the firmware up to date and have just been doing some testing before I try to migrate everything over. I have a hodgepodge of SATA drives installed, different brands, sizes, etc., and haven't noticed any issues with the fan speeds since running Unraid, so I am taking that as a good sign. Unraid lists no temps or SMART data for any of the drives, but I am able to retrieve them via the smartctl command as you mention. One of my ideas was to use iLO and set an alarm for this, but of course there is no easy way to install the agent that reports this data into Unraid, although I am considering trying to convert the rpm into a txz if there is no easy way to obtain the data in the GUI. If you are still working on this, I would be glad to exchange notes here and see if we can't figure out a way to solve it. It's so strange to me that Unraid can be so polished in many aspects and fall flat in others, especially when it's paid software and this mature. Regardless, I am going to try some things tonight and will share what I come up with if noteworthy.
  14. That is odd; in Chrome it does not work either, and I simply get ERR_SSL_PROTOCOL_ERROR. My configuration is pretty much the same, although some of the directives are defined in the ssl.conf and proxy.conf files. Just to verify, I removed the proxy-conf file I created for mediawiki and added the config you shared above. I now get the exact same results in Firefox and Chrome, though without the ability to connect via IP internally. Any thoughts there? Could it be an issue with letsencrypt and/or the cert maybe?
  15. I am using this container on Unraid behind the Linuxserver.io letsencrypt container. I see that you recommended that in your documentation, which is very good I might add. I have learned a lot from your notes, so thanks for that. I am still new to Nginx, though, and am having some issues getting it to work properly with mediawiki. I have tried using the dokuwiki proxy config in the letsencrypt container, changing the proto, IP, and port as needed, but am still having no luck. Currently I am able to access the mediawiki container via IP internally, but when attempting to use the domain name I end up with an error: SSL received a record that exceeded the maximum permissible length. Error code: SSL_ERROR_RX_RECORD_TOO_LONG Can you share a configuration that works please? I would assume I am just directing it to the container IP:PORT with proxy_pass, but I can't seem to figure out the issue. I will note that I have a password enabled as well, just in case it is relevant. Thanks!
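For reference, a minimal proxy-conf sketch along the lines of the samples bundled with the letsencrypt container (the server_name, container name, and port below are assumptions; adjust them to your setup). SSL_ERROR_RX_RECORD_TOO_LONG usually means the browser received plain HTTP where it expected TLS, so it is worth checking that proxy_pass uses http:// when the backend serves plain HTTP, and that the domain resolves to the proxy's 443 mapping rather than the container's own port:

```nginx
server {
    listen 443 ssl;
    server_name wiki.example.com;  # assumption: replace with your domain

    # these includes ship with the linuxserver letsencrypt container
    include /config/nginx/ssl.conf;

    location / {
        include /config/nginx/proxy.conf;
        # the mediawiki container speaks plain HTTP, hence http:// here;
        # the upstream name and port are placeholders for your container
        proxy_pass http://mediawiki:80;
    }
}
```

TLS terminates at the proxy and the hop to the backend stays plain HTTP; pointing proxy_pass at https:// when the backend only listens on HTTP is a classic cause of the record-too-long error.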
  16. I came here with the exact same problem after noticing that my /books folder had no content and the database and books were written to the /appdata path. If you read the last page of this thread (this page), you will find the answer just as I did, without having to ask the same question. Searching and reading the threads helps keep the forums from being cluttered with questions that have already been answered. As stated before: when you run the wizard upon setup, make sure you have the /disk/books path set correctly. Your content is being written to the image because the path for the library was changed via the wizard.
  17. Well, I tried both, going in and editing the existing entries and adding http/https to the beginning of each link. It didn't work until I removed the container and appdata and reinstalled. Only then was I able to use http for each link and have it work. Thanks for the reply!
  18. Anyone know why every item I create on the page ends up with a hyperlink of "10.0.0.2:8443/10.0.0.100:80" instead of simply the address to the container or URI the item is for, 10.0.0.100:80 in this instance? I have tried completely wiping the container and various ways of entering the correct information, as well as searching for some file to edit in the configs with no luck.
  19. I think development has ceased on this product. I was using malvarez00's docker, and while I didn't have the logs filling up, three different times in the past month I tried to log in and it would hang on starting database services. I found no issues in the logs, and each solution I tried ended in a different problem. That support thread was way worse than this one, and no one was even checking it. I attempted to install this Docker and it will not even let me log in. I fire it up for the first time and create a local account, then it boots me to the login screen. Once I enter my creds, it hangs. Unfortunately, I think our investment in these cameras was a bad idea, and with no more support for the software, Ubiquiti has really lost my respect. I was starting to turn clients onto them for the ease of use, great price point, and amazing community, but I have to say I will be searching for alternatives now. What a joke.
  20. Anyone still monitoring this thread? I upgraded to an SSD for my cache drive and migrated all my Dockers over, my only issue occurring with this one. When booting, it was hanging on the "Starting Database Services" screen; now it only loads up to the screen "Error Starting Software Update Service, Read Operation to Server 127.0.0.1:7441 failed on database av." I assumed there was an issue with port 7441 so I added that, and still no luck. If this were a normal UniFi NVR I could SSH in and run some commands to manually update the database; unfortunately I am not familiar enough with Docker to get those same commands to run. Any suggestions without wiping and starting fresh? Below is the most relevant info I can find in the logs:
1580054336.913 2020-01-26 10:58:56.913/EST: ERROR Error starting service: Read operation to server 127.0.0.1:7441 failed on database av in main
com.mongodb.MongoException$Network: Read operation to server 127.0.0.1:7441 failed on database av
at com.mongodb.DBTCPConnector.innerCall(DBTCPConnector.java:302) ~[mongo-java-driver-2.13.2.jar:?]
at com.mongodb.DBTCPConnector.call(DBTCPConnector.java:273) ~[mongo-java-driver-2.13.2.jar:?]
at com.mongodb.DBCollectionImpl.find(DBCollectionImpl.java:84) ~[mongo-java-driver-2.13.2.jar:?]
at com.mongodb.DBCollectionImpl.find(DBCollectionImpl.java:66) ~[mongo-java-driver-2.13.2.jar:?]
at com.mongodb.DBCursor._check(DBCursor.java:498) ~[mongo-java-driver-2.13.2.jar:?]
at com.mongodb.DBCursor._hasNext(DBCursor.java:621) ~[mongo-java-driver-2.13.2.jar:?]
at com.mongodb.DBCursor._fill(DBCursor.java:726) ~[mongo-java-driver-2.13.2.jar:?]
at com.mongodb.DBCursor.toArray(DBCursor.java:763) ~[mongo-java-driver-2.13.2.jar:?]
at org.mongojack.DBCursor.toArray(DBCursor.java:404) ~[mongojack-2.5.1.jar:?]
at org.mongojack.DBCursor.toArray(DBCursor.java:389) ~[mongojack-2.5.1.jar:?]
at com.ubnt.common.super.new.D.o00000(Unknown Source) ~[airvision.jar:?]
at com.ubnt.common.super.new.D.o00000(Unknown Source) ~[airvision.jar:?]
at com.ubnt.common.super.new.D.new(Unknown Source) ~[airvision.jar:?]
at com.ubnt.common.super.new.D.new(Unknown Source) ~[airvision.jar:?]
at com.ubnt.airvision.data.AbstractManager.findAll(Unknown Source) ~[airvision.jar:?]
at com.ubnt.airvision.service.OoOO.A.Ö00000(Unknown Source) ~[airvision.jar:?]
at com.ubnt.airvision.service.update.UpdateService.new.super(Unknown Source) ~[airvision.jar:?]
at com.ubnt.airvision.service.update.UpdateService.Ó00000(Unknown Source) ~[airvision.jar:?]
at com.ubnt.airvision.service.D.Ò00000(Unknown Source) ~[airvision.jar:?]
at com.ubnt.airvision.service.D.Ó00000(Unknown Source) [airvision.jar:?]
at com.ubnt.airvision.Main.o00000(Unknown Source) [airvision.jar:?]
at com.ubnt.airvision.Main.start(Unknown Source) [airvision.jar:?]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_181]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_181]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_181]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_181]
at org.apache.commons.daemon.support.DaemonLoader.start(DaemonLoader.java:243) [commons-daemon-1.0.15.jar:1.0.15]
Caused by: com.fasterxml.jackson.databind.exc.InvalidFormatException: Can not construct instance of com.ubnt.airvision.data.Camera$Platform from String value 'GEN4L': value not one of declared Enum instance names: [GEN1, GEN2, GEN3L, GEN3LM, AIRVISION, AIRCAM, UVC]
  21. Hey guys, for two days in a row now my VMs have crashed. I am not sure what is causing this, but I have to stop and restart the array to get them to run again. I will post the relevant log info here and attach the diagnostics from right when it happened. Please note that for some reason, the VM logs show 18:34 for the time when everything else is showing 13:34.
Windows 7:
2019-01-24 18:34:04.562+0000: shutting down, reason=crashed
pfsense:
2019-01-24 18:33:23.053+0000: shutting down, reason=crashed
Syslog:
Jan 24 07:52:21 Tower avahi-daemon[5409]: Joining mDNS multicast group on interface vnet2.IPv6 with address fe80::fc54:ff:feaf:f563.
Jan 24 07:52:21 Tower avahi-daemon[5409]: New relevant interface vnet2.IPv6 for mDNS.
Jan 24 07:52:21 Tower avahi-daemon[5409]: Registering new address record for fe80::fc54:ff:feaf:f563 on vnet2.*.
Jan 24 07:53:44 Tower avahi-daemon[5409]: Interface vnet2.IPv6 no longer relevant for mDNS.
Jan 24 07:53:44 Tower avahi-daemon[5409]: Leaving mDNS multicast group on interface vnet2.IPv6 with address fe80::fc54:ff:feaf:f563.
Jan 24 07:53:44 Tower kernel: br0: port 4(vnet2) entered disabled state
Jan 24 07:53:44 Tower kernel: device vnet2 left promiscuous mode
Jan 24 07:53:44 Tower kernel: br0: port 4(vnet2) entered disabled state
Jan 24 07:53:44 Tower avahi-daemon[5409]: Withdrawing address record for fe80::fc54:ff:feaf:f563 on vnet2.
Jan 24 09:45:40 Tower kernel: mdcmd (43): spindown 1
Jan 24 13:33:22 Tower avahi-daemon[5409]: Interface vnet0.IPv6 no longer relevant for mDNS.
Jan 24 13:33:22 Tower avahi-daemon[5409]: Leaving mDNS multicast group on interface vnet0.IPv6 with address fe80::fc54:ff:feb3:52ec.
Jan 24 13:33:22 Tower kernel: br0: port 2(vnet0) entered disabled state
Jan 24 13:33:22 Tower kernel: device vnet0 left promiscuous mode
Jan 24 13:33:22 Tower kernel: br0: port 2(vnet0) entered disabled state
Jan 24 13:33:22 Tower avahi-daemon[5409]: Withdrawing address record for fe80::fc54:ff:feb3:52ec on vnet0.
Jan 24 13:33:22 Tower emhttpd: error: shcmd_test, 1188: Resource temporarily unavailable (11): system
Jan 24 13:33:23 Tower kernel: pci-stub 0000:03:04.0: claimed by stub
Jan 24 13:33:23 Tower kernel: pci-stub 0000:03:04.1: claimed by stub
Jan 24 13:34:04 Tower avahi-daemon[5409]: Interface vnet1.IPv6 no longer relevant for mDNS.
Jan 24 13:34:04 Tower avahi-daemon[5409]: Leaving mDNS multicast group on interface vnet1.IPv6 with address fe80::fc54:ff:fe49:dbdb.
Jan 24 13:34:04 Tower kernel: br0: port 3(vnet1) entered disabled state
Jan 24 13:34:04 Tower kernel: device vnet1 left promiscuous mode
Jan 24 13:34:04 Tower kernel: br0: port 3(vnet1) entered disabled state
Jan 24 13:34:04 Tower avahi-daemon[5409]: Withdrawing address record for fe80::fc54:ff:fe49:dbdb on vnet1.
libvirt:
2019-01-24 18:33:22.852+0000: 6528: error : qemuMonitorIO:718 : internal error: End of file from qemu monitor
2019-01-24 18:34:04.320+0000: 6528: error : qemuAgentIO:598 : internal error: End of file from agent monitor
2019-01-24 18:34:04.361+0000: 6528: error : qemuMonitorIO:718 : internal error: End of file from qemu monitor
tower-diagnostics-20190124-1336.zip
  22. Yeah, you definitely want a standalone pc for that.
  23. Yes, I do have the Advanced Buttons plugin installed, and like I said above, it just won't uninstall. I deleted it off the flash drive and will reboot my system when I get home, as I am remoted into a VM right now and can't risk losing access. Thanks Squid, I commend you for all you do here. Your dedication to these forums is admirable.
  24. Currently on unRAID version 6.5, I have a few plugins at the top of the list showing that they have updates available:
CA Backup/Restore - 2018.03.15 (current version) - 2018.07.15 (update)
ControlR - v2018.03.21
Tips and Tweaks - 2018.03.21
Custom Tab - 2017.12.13
Unassigned Devices - 2018.03.21
User Scripts - 2018.02.16
These will not update at all. When I go to update them, I see the notification box pop up with this:
plugin: updating: cabackup2:.plg
plugin: not installed
Then it says Plugin Update has finished. This has only recently occurred, and I am not sure if it is only these plugins or if it is something with my system. I will also note that I have the Advanced Buttons plugin installed, and I do know it is not compatible with my version of unRAID. I am unable to delete it; it says it is uninstalling and then tells me it was successfully uninstalled, but it still remains in the list of plugins. I will also note that other plugins have updated to more recent versions; my latest updated plugin is Fix Common Problems, which was just updated on 8/5/2018. Has anyone had issues like this before or know what I can do? Is there a way to manually remove plugins and reinstall them?
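In case it helps anyone, manual removal is generally a matter of deleting the plugin's .plg file from the flash drive so it is not reinstalled on the next boot. The sketch below only mimics Unraid's usual locations (/boot/config/plugins/<name>.plg on the flash, /usr/local/emhttp/plugins/<name>/ for the installed webgui files) under /tmp, and the plugin name is a placeholder, so nothing real is touched:

```shell
# Stand-ins for /boot/config/plugins and /usr/local/emhttp/plugins
mkdir -p /tmp/flash/config/plugins /tmp/emhttp/plugins/advanced.buttons
touch /tmp/flash/config/plugins/advanced.buttons.plg

# Deleting the .plg from the flash share is what stops the plugin from
# being installed again on the next boot
rm /tmp/flash/config/plugins/advanced.buttons.plg
# Optionally clear the currently installed copy too; a reboot also drops
# it, since the live filesystem is rebuilt from the flash drive at boot
rm -rf /tmp/emhttp/plugins/advanced.buttons

ls /tmp/flash/config/plugins   # now empty
```

After removing the .plg and rebooting, reinstalling from CA should give you a clean copy.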
  25. Works just fine for me.. sounds like you have some other issues going on.