Vcent
Members · 8 posts

  1. Update, in case someone else hits this lovely snag in the future: the problem was indeed a RAM stick, which worked perfectly fine until stressed juuust right, at which point it started outputting garbage and crashed the system. Unfortunately, none of the memtests identified the RAM as faulty; every run passed with no errors (unless I started faffing about in the options mid-test, which would sometimes make a ton of errors appear - I'm guessing that's a memtest bug, though, rather than indicative of this particular flaw). I ended up finding it by removing one stick, running the system for a day, stressing it, and when it didn't crash, swapping the sticks to verify the problem - and indeed, it promptly crashed again once everything got loaded hard enough. Curiously, all memtests, regardless of runtime, still insist that everything is peachy and that there's nothing wrong with either the working or the faulty stick. The system has been both stable and dependable since removing the problematic stick.
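For anyone wanting a quick-and-dirty userspace sanity check before pulling sticks: a pattern-write/verify loop can occasionally flush out marginal RAM that a dedicated memtest misses, simply because it runs under a real OS workload. This is only an illustrative sketch, not a replacement for memtest - the buffer size and pattern here are arbitrary assumptions:

```python
def stress_pass(n_bytes=64 * 1024 * 1024, pattern=0xA5):
    """Fill a buffer with a known byte pattern, then verify it.

    Returns the number of corrupted bytes (0 means this pass saw no
    corruption). Marginal RAM often only fails under concurrent load,
    so running several of these in parallel processes is more realistic.
    """
    buf = bytearray(bytes([pattern]) * n_bytes)  # write phase
    return n_bytes - buf.count(pattern)          # verify phase

if __name__ == "__main__":
    for i in range(4):
        print(f"pass {i}: {stress_pass()} bad bytes")
```

Any nonzero result points at corruption somewhere between CPU, cache, and DIMM; a zero result proves nothing, as this thread demonstrates.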
  2. Probably too late to help you, but the solution to that is to type 'digikam' without the quotes, then be slightly patient while it launches. Unfortunately, launching it that way leads to digiKam opening in a tiny window, since Guacamole for some godforsaken reason assumes the window size should remain static, and squashes digiKam into the same dimensions as a terminal window. I found some way to fix that, but unfortunately it's been long enough by now that I've forgotten how I did it.
  3. It's currently on the last 25% of the final (fourth) pass of MemTest86 Free Version 9.3, which, much like the memtest86+ included with unRaid, has found ..diddly squat, except that there are no issues with the RAM. I'd be highly surprised if that changes during the last hour of the test, seeing as the included memtest was run several times previously and didn't find anything during any of its passes either - unless unRaid is somehow significantly harder on the RAM, since it manages to kill the machine in a much shorter period than any of the test runs. Update: it's done with its run, having found nothing. The advice it gives to run again in multi-CPU mode is nice, but not possible on my motherboard/CPU combo, due to some UEFI limitation on it. I don't have any other AMD boards lying around to test with either. I'm letting it run once more through the night, but I'd be gobsmacked if that changed the result.
  4. Did you ever figure this out? I'm having the same problem, and have had it for a while now - tried changing my OVPN file to a different endpoint, which didn't change anything. I figure it's the torrent client trying to contact itself through the VPN, which OpenVPN doesn't like (for obvious reasons).
  5. I've had the syslog server (or rather mirror) up for a while now; the problem is that it rarely captures anything particularly interesting - a docker will drop a net connection, then make a new one, and so on, until eventually the server just stops responding to anything, usually starts producing heat, and spins up the fans. Oftentimes the last message will be about the dropped (local) IPv6 address, or about the disks spinning down, and then nothing else gets logged. For once it actually managed to log ..something, rather than stopping around 13:25:59 (withdrawing address, making a new one, yadda yadda, server unresponsive). DM-3 shows up in the log, which is as interesting as it is annoying, since I still don't have a cache device - although it might come from the system having a misconception somewhere about having a swap file (which it doesn't anymore, and swap is indicated at 0 kB)? For a while I actually thought I had figured out the problem, as a misbehaving docker got corrupted and started spewing files into its config directory, and since I was running minimal plugins, nothing ever reported that it was doing so, or filling up. I'm fairly sure I've fixed said docker, or at least pointed it at a suitable target, but still - the server goes space heater less often now, yet it's still happening far too often to be useful. I'm guessing the near-constant "old network address died, making a new one" is due to a docker VPN, which itself keeps detecting a loopback and kills off the packet. This happens not infrequently, but interestingly enough everything works OK (or at least as expected), and I can't quite figure out how to stop it from happening - the issue is that a client ends up trying to send packets to itself, through the tunnel, which OpenVPN obviously doesn't like, so it kills the packets.
Currently I'm guessing the shutdown is due to a thermal issue, but I can't conclusively say that it is - there's fairly decent cooling overall, although the area around the USB slots/north bridge does seem to get fairly hot for some reason. I do have some pictures of the on-screen/console output once the server goes down; sadly they're pretty much all the tail end of a trace, with a bunch of register addresses that mean ..nothing to me. The most understandable part was:
> Kernel panic - not syncing: Fatal exception in interrupt
> Kernel Offset: disabled
> ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
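Sifting a mirrored syslog for the moment things went quiet can be automated. This sketch (the classic syslog timestamp format is an assumption - adjust if your log uses ISO timestamps) finds the largest silent gap between consecutive entries, which usually brackets the freeze:

```python
from datetime import datetime

def largest_gap(lines, fmt="%b %d %H:%M:%S"):
    """Return (gap_seconds, last_line_before_gap) for the longest
    silence between consecutive syslog entries.

    Assumes classic syslog timestamps like 'Dec  6 13:25:59 tower ...'
    (the first 15 characters); year rollover is ignored for simplicity.
    """
    stamps = []
    for line in lines:
        try:
            stamps.append((datetime.strptime(line[:15], fmt), line))
        except ValueError:
            continue  # skip lines without a parsable timestamp
    best = (0.0, None)
    for (t0, l0), (t1, _) in zip(stamps, stamps[1:]):
        gap = (t1 - t0).total_seconds()
        if gap > best[0]:
            best = (gap, l0)
    return best
```

Run it over the mirror file after a reboot and the returned line is the last thing logged before the hang - handy when the log is otherwise thousands of routine address-withdrawn messages.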
  6. And I guess more bumping. The system is by now killing itself multiple times a day, although I did manage to trace the php killing to a docker container, which has been removed. At this point even thinking about a parity check is a joke, as it just slows down the server before the next kill restarts the process. There's nothing particularly consistent about it - sometimes I'm doing something that uses a good amount of resources and it works fine; other times the server dies; often it's just left on its own, then dies. One time I even left it at the unRaid login prompt, with nothing mounted or done (not even logged in), and it still managed to kill itself in the ~10 hours it was just ..standing at an idle login prompt, with no workload at all. The logs are of little help to me, and I can't claim to be able to decipher the kernel panic that remains on screen whenever the server crashes - it's not even exactly the same every time, although it does consistently appear to be of the "not syncing: Fatal exception in interrupt" type; beyond meaning a fatal error in an interrupt, I haven't the foggiest what that implies.
  7. Update, I guess - not going to be helpful for anyone with a similar problem, I suspect. I got parity upgraded by rebuilding it, then upgraded the drive by rebuilding onto the old parity. So a parity swap, manual style. Quite annoying, but it worked the first time I tried it. Currently the server is busy crashing/freezing itself about every 1-2 days, killing some flavour of php (php-7, I think?) for using too much memory, due to a pathetically low limit being set ..somewhere that I can't find. Apparently it can only use ~270 something MB of RAM, despite the system having 16 GB available, yet that is above the limit, so it gets reaped by the OOM killer. I'm also getting errors for a DM-3 device, which is curious, as I don't have a cache drive installed (never have), and I can only find mentions of that address/designation in threads about cache SSDs. Dockers have been nerfed, all running memory limits, none of them approaching them; only running a handful of dockers changes ..nothing, and at this point essentially all plugins have been uninstalled, to no avail. The system still kills itself randomly, with regularity, displaying the same symptoms each time: fans running at a decent clip, pumping hot air out of the case (despite unRaid being frozen and unresponsive to even a basic ping); interestingly the network interface lights still light up, but apart from functioning as a space heater, the server is not functional.
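One guess about that "~270 something MB" cap: if the exact figure was around 268 MB, that is exactly 256 MiB in decimal units, which smells like a memory_limit of 256M set in some php.ini rather than anything host-side - this is speculation, but the unit arithmetic is easy to check:

```python
def mib_to_mb(mib):
    """Convert binary mebibytes (MiB) to decimal megabytes (MB)."""
    return mib * 1024 * 1024 / 1_000_000

# A 256M limit (a value commonly seen for php's memory_limit),
# expressed in the decimal MB an OOM-killer log line might show:
print(mib_to_mb(256))  # -> 268.435456
```

So a log reporting a process reaped at roughly 268 MB is consistent with a binary 256M cap configured somewhere.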
  8. Right. I recently pretty much filled my initial unRaid array, and had a disk that was throwing some errors, although it has been stable for a fair while (I'm not entirely convinced the disk is the problem, rather than a temporary issue where I bumped the cable with the array up, reseated it, and it racked up a ton of errors in the meantime), but I digress. I figured I'd swap that disk for a bigger one, and since I got a deal on a 14 TB drive, with my current 12 TB parity, I had to upgrade parity first. I had learnt about the parity swap procedure before that, so I figured it would make sense to do that instead, and use the old disk with the errors as a scratch drive for something. Anyhow, I ran a pre-clear on the new parity drive, which came up fine, and so followed the process for the parity swap, which goes fine ... until it doesn't. The progress will at some point just stop, and stay stuck wherever it got to, never progressing. Specifically, once this appears in the syslog, there's pretty much a 100% chance of parts of the server being locked up: the parity swap stuck, the relevant disks spinning down at their designated spin-down time, and zero chance of a clean shutdown or anything like that. Most functionality is still retained, at least insofar as a server with no mounted array can be said to have functionality - usually the webGUI works, although it can crash as well. Logs are accessible, but shutdown commands just get logged, without shutting down. So far I've tried several times to get the process completed; most successfully it got to 100%, then ..got stuck, of course. The syslog did actually manage to capture the successful termination of the old->new parity copy, but since whatever needed to run afterward wasn't run, this was overall not a success, and the array didn't recognize the new parity drive as correct. Things I've done to try to keep it from happening: run memtest - several times, at varying lengths (including just running it all night) - and all came up clear. Most attempts were made while running in safe mode, to ensure the problem didn't come from a plugin. So far I'm at my wit's end, as I can't find any clear indication of what or why something is trying to access places it shouldn't, or how to prevent it - the PID listed in any error message is long dead by the time I see it. For some interesting reason, the drives will all show as "Device encrypted and unlocked" once the server fails in this manner, regardless of whether or not I actually mounted/unlocked the array before initiating the parity swap.
tower-diagnostics-20211206-1716.zip
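Since the failure mode is "progress just stops", a tiny watchdog that polls the copy's position and flags a stall can at least timestamp the hang for cross-referencing with the syslog. A generic sketch - the poll interval and patience are arbitrary, and what you measure is an assumption (here it's any zero-argument callable, which in practice could read the rebuild position from the GUI or a status file):

```python
import time

def watch_for_stall(read_progress, interval=1.0, patience=3):
    """Poll read_progress() until it stops advancing.

    read_progress is any zero-arg callable returning a monotonically
    increasing number (e.g. bytes copied so far). Returns the value at
    which progress stalled, after `patience` consecutive polls with no
    change - the moment to grab diagnostics before the server wedges.
    """
    last, unchanged = read_progress(), 0
    while unchanged < patience:
        time.sleep(interval)
        cur = read_progress()
        unchanged = unchanged + 1 if cur == last else 0
        last = cur
    return last
```

Logging the wall-clock time when this returns makes it much easier to line the stall up against the last syslog entries.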