MarkUK

Everything posted by MarkUK

  1. Same problem here - when adding the "2nd Unraid Share" it also becomes stuck on 'updating'. The VM can otherwise be updated normally - only this causes the problem... HOWEVER - when I manually edited the XML to add the second filesystem bus (which is put on 0x06, I think, as the first 'bus' ID available) I can now 'save' the settings even in GUI mode...
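     For reference, the block I added looks roughly like this (the share paths are just examples, and the slot/bus numbering is simply the first free one libvirt reported for my VM, so adjust to whatever yours shows):
         <filesystem type='mount' accessmode='passthrough'>
           <source dir='/mnt/user/share2'/>   <!-- example path for the 2nd Unraid share -->
           <target dir='share2'/>
           <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
         </filesystem>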
  2. Hey, Just an update; the previous issue surrounding the cache drive filling up was actually completely my fault; the downloads directory was on a share that had cache enabled, so it wasn't leaking out of the appdata directory like I had suspected. The other issue (files remaining in temp) probably persists but also isn't such a problem. The only other thing I've had problems with is the port mapping; sometimes I need to manually change the port mapping, and then change it back, before I can access the web GUI. Anyway, for the time being we're using another method of downloading (hence how I discovered the "cache" issue) but I'll probably try this again now I've worked out what caused the previous one. Cheers.
  3. The 6-camera limit is imposed by the browser when viewing more than 6 cameras (I believe a browser will only run 6 simultaneous streams to a single host as a hard limit; ZM themselves are trying to resolve it on their future roadmap), so I just use a hosts file with "zoneminder1", "zoneminder2", etc., pointing to the IP, and then a PHP page on another server pulls together links to the ZM preview images, but using the custom hostname for each one. Nicely avoids the 6-camera limit. Completely agree with security, of course... I suppose for the most part (for my usage), even when using ZMNinja, it's all internal traffic. Will have to look at the Docker certs shortly, thanks. Cheers
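     In case it's useful to anyone, the hosts entries on the viewing machine are just multiple names for the same box, something like this (the IP is only an example):
         10.0.0.50   zoneminder1
         10.0.0.50   zoneminder2
         10.0.0.50   zoneminder3
         10.0.0.50   zoneminder4
     ...and the PHP page then embeds each camera's preview using a different hostname, so the browser treats them as separate hosts.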
  4. Hey, Thanks for the continued update! Is there a reason why insecure access was removed? Only, for the most part, these are run on local servers behind firewalls with no external access, and even with external access the extra security (to some) may simply not be of interest, depending on the purpose. For me, it's just a pain as I'll have to set up self-signed certs - and to avoid the 6-camera limit I use "hosts" (and a custom camera viewing page) - so it's not even just a single host I'd need to set this up for. Having the option for both would be nice - I'll manually re-open HTTP access for my system but that'll potentially be lost with future upgrades! Thanks again for your work 🙂
  5. Hey Markus & all, I've started using this recently and have found it to be pretty good so far! Only, I've got two (very likely related) problems: - Sometimes there is data left in /temp/ even for files that have completely finished downloading. Perhaps this is by design and I just need to leave it for longer, but otherwise I'm having to clear out /temp/ occasionally? - The cache drive filled up today because the mount must have disappeared (i.e., what was /mnt/user/downloads/ -> /downloads/ disappeared). There has been an outstanding issue, which I believe hasn't occurred again, whereby /mnt/user disappears off the entire server - but this was relating to just this single docker. Even after the mount was restored, a number of files began downloading to the 'internal' version of the path (within the Docker) and not the 'external' Unraid share. Any idea why? And any idea if there's a straightforward way to make a kill switch or other system that prevents downloads if the mount has disappeared?
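     Something along these lines is the sort of kill switch I had in mind - completely untested, and the container name and paths are just placeholders - run from cron on the host every minute or so:
         #!/bin/bash
         # Stop the download container if the array share has gone away.
         if ! mountpoint -q /mnt/user || [ ! -d /mnt/user/downloads ]; then
             docker stop downloader-container   # placeholder container name
         fi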
  6. Wanted to add that I haven't seen this behaviour since my last post (21 days ago). Hopefully resolved, at least from my end! Cheers
  7. It has just happened again - diagnostics attached. This happened specifically when transferring about 3 GB of files (from a Windows box) over SMB (the share I was copying to is not exported over NFS or AFP at all). unraid-diagnostics-20181010-0041.zip
  8. Throwing another one in the ring here... Unraid 6.6.1 - it had been running 6.5.3 without problems (well, that's a lie, but unrelated problems) and I'm now seeing the same: the /mnt/user share disappearing over NFS and, as such, VMs and Dockers dying very quickly. It has happened twice since upgrading to 6.6.1. Attached are diagnostics taken during a 'crash'. I haven't tried manually remounting the shares, but a reboot has solved it in both cases. I've set the fuse_remember parameter to -1 because I had a problem with stale file handles in one of the VMs (which has happened a few times, too). unraid-diagnostics-20181008-1813.zip
  9. The discussion looks to be very similar in nature to what I've seen (except their problem only lasts 10 - 15 minutes - although it could just be a matter of magnitude; perhaps mine would free up after 10 - 20 hours or more?! Never left it that long so far)... https://www.spinics.net/lists/linux-xfs/msg06058.html Towards the end, the discussion moves towards the mass deletion of file structures, similar to what I've seen. Their solution was, effectively, to slow the deletions down by making them less parallel...
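     If it is the same issue, then for my case the equivalent of 'slowing the deletions down' would be something like a single low-priority pass instead of several parallel rm jobs (path is just an example):
         ionice -c3 nice -n19 find /mnt/user/cctv/events -type f -name '*.jpg' -delete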
  10. My guess is that some resource is being exhausted within either XFS itself or one of the supporting techs interfacing with the raid array or the Docker/VMs (although I'm pretty certain I've had a crash with a regular Unraid-only rm - no virtual mount / 9p / etc). I had hoped to find the culprit by examining the XFS stats during a crash and seeing an exhaustion of inodes or something, but nothing looked obviously wrong from that. Honestly, I'm at a loss as to what's causing this - my next step, though, is to try and make it reproducible (and consistent) and remove all the extra factors such as the software (ZM), using a VM, a docker, etc. - just boil it down to the minimum steps that cause the problem... You're right that the level of file access doesn't seem to be abnormal - although the level of deletions may be. The hang is definitely total - even if the system is left for most of the day to recover, it never does. The hang is also purely IO (from what I can see) as the system is still trying to work (e.g., I can run simple SSH commands until one tries to do any IO, at which point that SSH session will also hang).
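      For what it's worth, the XFS stats I'm looking at are just periodic snapshots of /proc/fs/xfs/stat, roughly this, left running in a screen session:
          while true; do
              echo "=== $(date '+%F %T') ==="
              cat /proc/fs/xfs/stat
              sleep 10
          done >> /var/log/xfs_stat_snapshots.log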
  11. Hey Frank, appreciate the reply - I had considered this, but the memory usage is unaffected, plus the rm can run for many hours (happily deleting files at the time - not just enumerating the list of files to delete) before it crashes! The files are also structured 5 - 8 folders deep, so each folder only has a couple of thousand files in it, or fewer.
  12. Final update for a while. I've reverted back to Zoneminder. I couldn't get on with Motion/Motioneye at all - it was not going to work for us long-term. I've deleted all the old Zoneminder image files - there were about 50 million files and the server crashed once whilst deleting them. I've reduced the storage of ZM images to just events, plus added extra filters to periodically delete older events to remove the 'bulk deleting' that happens when it starts running out of space. Lastly, I'm going to create one (or two) test Unraid boxes running a parity-protected XFS array and try creating tens of millions of tiny files and subsequently deleting them; I'm fairly sure this is where the problem is happening and (for my own sanity, if nothing else) want to prove/disprove this and possibly even create something reproducible from it. Until then, this machine shouldn't fill up for 10 - 20 weeks so it could be a while before I naturally experience this issue again - thanks for your input!
  13. Further to add: the logging server can accept any command that can be executed over SSH; so if there's any good logging commands I could be running during/before a crash let me know and I'll add them to the test!
  14. Hey pwm - I've got various logging happening (namely, I run a continuous "ps" to show only D-state blocking, plus a separate server monitors the xfs_stat output, the CPU load, the CPU usage and memory (free/total/etc)). Before last night's crash the last recorded values were: Memory free: 85%; Load: 13 (1 min), 12 (5 min), 9 (15 min); CPU usage: 2%. I didn't get the last ps output (because it's done using PuTTY, which wasn't running at the time). I've also got the xfs_stat output if it's useful, but it seems the metrics in that are just ever-growing counters rather than a decent snapshot of state. So, short answer: no, nothing is consuming any memory (85% is roughly the normal amount with no deletes running whatsoever!).
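      The 'continuous ps' is nothing clever, by the way - roughly this, logging anything stuck in uninterruptible sleep along with its wait channel:
          while true; do
              date '+%F %T'
              ps -eo pid,stat,wchan:32,cmd | awk '$2 ~ /^D/'
              sleep 30
          done >> /var/log/dstate.log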
  15. I can't believe I'm back on this - again... I changed to Motion/Motioneye (which, by the way, I really didn't get on with - another story, though!). So there is NO Zoneminder in the mix whatsoever. And it happened again. But - I was mass-deleting files (ironically, the Zoneminder files - a couple of million 100KB files). I am - nearly - certain that this error is being caused by the mass deletion of millions of files on XFS (or on a parity-protected XFS array, or some other factor of the setup). To 'check' this I've disabled all Dockers and VMs and am just mass-deleting files. I'll update this if that crashes - and, if so, I'll then run my regular setup (including CCTV - NOT ZM though) with no mass deleting - in theory, the server should stay up indefinitely...
  16. Right, yet another drastic change to try and resolve this. Firstly, I wrote a test case to generate tens of thousands of images repeatedly and delete them continuously, to try and provoke 'the problem' above. No joy - even with the load peaking above 20, it still didn't trigger the same problem. So, for now, we're going to use Motion for our CCTV. I don't like it as much (partly because we're so familiar with ZM) but it stores events as video files rather than images, so far fewer deletes are required (ZM supports this only in its test branch; when it reaches mainstream I'll try this whole thing again!). Thanks all for your input - I'll update this, again, if the problem comes back even without Zoneminder! Cheers
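      For anyone curious, the test case was essentially just this sort of loop (the paths and counts are arbitrary) - batches of small random files created on the array and then deleted again:
          while true; do
              for i in $(seq 1 50000); do
                  head -c 100K /dev/urandom > /mnt/user/testshare/file_$i.jpg
              done
              find /mnt/user/testshare -name 'file_*.jpg' -delete
          done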
  17. Hi Squid, thanks for your reply. It was my understanding that the total free memory on Linux was (roughly) Free + Buffers + Cached (minus the portion of that which can't be freed immediately), in which case there's ~10GB free? Good spot on the swapfile plugin; I'll disable that right away! The problem was definitely happening beforehand, but I doubt the plugin can be doing any good if it's not supported (nor, I now see, updated in 3 years!). Cheers!
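      (For reference, the ~10GB figure is just the rough sum from /proc/meminfo - give or take whatever can't actually be dropped immediately:
          awk '/^(MemFree|Buffers|Cached):/ {sum += $2} END {printf "%.1f GiB\n", sum/1048576}' /proc/meminfo
      ...which, on the meminfo dump from my earlier post, comes out at roughly 10.5 GiB.)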
  18. Sadly, this is back AGAIN. So... before I do something like replace further hardware (namely the PSU, and possibly the USB stick - I don't think it's that, as it's a new stick, but nonetheless)... what can I do to see what's happening here? I've checked /proc/*/stack, which didn't exist (I'm not really familiar with process debugging, so any thoughts would help). iostat shows very little activity - no disk activity, only tiny (< 2%) iowait, mostly idle (> 90%). cat /proc/meminfo (mid-crash):
        MemTotal: 16376368 kB
        MemFree: 212004 kB
        MemAvailable: 10024248 kB
        Buffers: 10472 kB
        Cached: 10778832 kB
        SwapCached: 0 kB
        Active: 6876176 kB
        Inactive: 8830668 kB
        Active(anon): 5383916 kB
        Inactive(anon): 135492 kB
        Active(file): 1492260 kB
        Inactive(file): 8695176 kB
        Unevictable: 0 kB
        Mlocked: 0 kB
        SwapTotal: 0 kB
        SwapFree: 0 kB
        Dirty: 20 kB
        Writeback: 0 kB
        AnonPages: 4917608 kB
        Mapped: 132476 kB
        Shmem: 602156 kB
        Slab: 243832 kB
        SReclaimable: 125428 kB
        SUnreclaim: 118404 kB
        KernelStack: 10144 kB
        PageTables: 26628 kB
        NFS_Unstable: 0 kB
        Bounce: 0 kB
        WritebackTmp: 0 kB
        CommitLimit: 8188184 kB
        Committed_AS: 6774596 kB
        VmallocTotal: 34359738367 kB
        VmallocUsed: 0 kB
        VmallocChunk: 0 kB
        AnonHugePages: 4577280 kB
        ShmemHugePages: 0 kB
        ShmemPmdMapped: 0 kB
        CmaTotal: 0 kB
        CmaFree: 0 kB
        DirectMap4k: 343504 kB
        DirectMap2M: 16312320 kB
      If I'm SSH'ed in at the time I'll usually get a few minutes where I can run a few commands (if I happen to get the outage text in time - I usually do, but it can be in the middle of the night). I will have to wait between 6 and 48 hours for another failure, based on typical failure rates. I'm left with these options: move Zoneminder to another server (I very much don't want to do this - this single server can EASILY cope with the load, it sits at ~10% CPU usage most of the time), try 'randomly' replacing hardware, or fix this damn issue. Any ideas at all that would steer this towards a fix are appreciated - thank you!
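      One thing I'm planning to try next time it hangs, while the SSH session still responds (assuming sysrq is enabled - check /proc/sys/kernel/sysrq first): dump the blocked-task stacks to the kernel log and read them back out:
          dmesg | tail -n 100              # any hung-task / XFS warnings already logged
          echo w > /proc/sysrq-trigger     # stack traces of all tasks stuck in D state
          dmesg | tail -n 200              # read the dump back out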
  19. Hey III_D, thanks for your reply. I have been keeping an eye on eBay for some well-priced PSUs, fearing the same! As of yesterday, I've replaced the Docker version of Zoneminder with a VM and installed it directly on there - perhaps some odd behaviour was happening in the Docker. If the problem comes back I may have to follow your advice and replace the PSU! As it happens, it's a few years old - it was relatively alright when new, but it could definitely be a culprit! Cheers!
  20. New update: the problem is BACK. Problem is, I've changed motherboard (and thus CPU and RAM) and I'm no longer using the PCI-e LSI controller, which I had started to suspect was the problem. Attached are my new diagnostics. Drive 3 has a single SMART error that, I believe, was the result of a bad cable (I changed cabling when I moved machines again): it happened immediately after changing systems and hasn't occurred again since fitting a new cable. I'm at a complete loss here - I still think this is likely to be caused by Zoneminder deleting large volumes of files, but now the only commonality between the two setups is the hard drives and the software installation (and, I suppose, the USB stick... ). Any thoughts?! unraid-diagnostics-20180807-0016.zip
  21. I've now moved the entire Unraid setup back to my previous mobo&cpu (one with 8 SATA ports on board) - the processor is about half the overall speed, but no additional controller is needed. Despite some issues it has gone back now and is working - the faster motherboard (using the LSI card) is no longer in use. I may well build a temporary LSI-only setup with a few spare/smaller drives and see if I can replicate and/or fix the issue - but the card cost about £70 and I don't know how much energy I want to put into solving this. So, the problem is 'fixed'/side-stepped rather than fully solved, but will have to do for now! Will re-visit this if the problem starts again on the test setup. Thanks for all the help
  22. Last update for now: I'm not 100% sure but I feel that the LSI SAS2008 card may have, between a few moves, become slightly loose in the socket. I've re-seated it, have asked Unraid to rebuild that drive, and will post an update in a few days if no crashes happen before then! Thanks, all!
  23. You could just run "netstat --listen" on your box, look for the Telnet or FTP ports being open, then manually edit /etc/inetd.conf (commenting out the line for FTP, which - at least on my box - appears to have been enabled by default), and finally run /etc/rc.d/rc.inetd restart. Edited to add: my box is only 2 months old and FTP was enabled by default; Telnet wasn't.
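      The rough sequence, for anyone else doing the same (back up inetd.conf first):
          netstat --listen | grep -E 'ftp|telnet'   # check whether the services are actually listening
          vi /etc/inetd.conf                        # comment out the ftp (and/or telnet) line
          /etc/rc.d/rc.inetd restart                # restart inetd so the change takes effect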
  24. Attached is a new diag report (different from the above, obviously, in that one of the data drives has failed) unraid-diagnostics-20180801-1924.zip