codefaux

Everything posted by codefaux

  1. That doesn't support any of the other metrics from any of the other dozen or so scripts at the link I mentioned, a few of which I'm interested in or using manually right now. That's why I suggested adding the textfile collector as a supported feature of the plugin by default. It won't impact anyone except the people who want to use it if it merely watches a folder for scripts to execute. Could you be bothered to explain why? I don't mean to sound dismissive or aggressive, but would you follow a recommendation from a stranger with no justification or explanation? I'm not looking for suggestions; I'm not the plugin author here, lol. I'm fine with them being run from anywhere you require them to be placed. I'm suggesting that you add support for the textfile collector to your prometheus_node_exporter plugin, since prometheus_node_exporter already supports it, and I'm suggesting where you could place them. You disagree; place them anywhere you like.
  2. Could you be convinced to add a variable to pipe stdout to a specific path? My target: prometheus_node_exporter contains a textfile collector. There are a handful of community scripts designed to:
     - be run from cron (i.e. your plugin on a custom schedule)
     - collect useful (I'd argue critical) system metrics (e.g. smartmon stats like temperatures and failure indicators) for prometheus-node-exporter
     - output data to stdout, to be piped to a file
     The inconvenience is that using these is not straightforward: they would require extensive modification that most users can't grok. Thanks to your plugin, an alternative is to manually save them to disk someplace, then use your plugin to write a script which runs them with stdout piped to a file, scheduled cron-alike. (EDIT: These are the same script; ignore the name difference, I used an old screenshot for one of them by mistake.) This is workable, but some users may still find it unapproachable. If the user could paste the entire script (in my case, /boot/config/scripts/smartmon.sh aka smartmon.sh [raw], among others) into a User Script, schedule it cron-alike, and mark its output to be sent to a specific path, it would make these far more approachable. It could be implemented similarly to the variables you've recently added; an STDOUT variable could be changed to request redirection. Regardless of your decision, keep up the great work -- the plugin has been quite valuable for many of us!
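     As an illustration of the manual workaround described above, here is a minimal sketch of a User Script that runs a saved collector script and redirects its stdout for node_exporter's textfile collector. Both paths (the saved script location and the textfile collector directory) are assumptions; adjust them to your own layout.
        #!/bin/bash
        # Minimal sketch: run a saved collector script and redirect its stdout to a
        # .prom file that node_exporter's textfile collector can read.
        # Both paths below are assumptions -- adjust to your own setup.
        SCRIPT=/boot/config/scripts/smartmon.sh      # saved once to flash; no repeated writes
        OUTDIR=/var/lib/node_exporter                # whatever --collector.textfile.directory points at

        mkdir -p "$OUTDIR"
        # Write to a temp file first, then rename, so the exporter never reads a half-written file.
        "$SCRIPT" > "$OUTDIR/smartmon.prom.$$" && mv "$OUTDIR/smartmon.prom.$$" "$OUTDIR/smartmon.prom"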
  3. Hey, here's another suggestion for an easy addition that delivers a lot of features. There exists a handful of community scripts to amend the data provided by node_exporter. Most of them are intended to be piped to a text file and read via node_exporter's textfile collector: the metrics go into a file somewhere, then you add `--collector.textfile.directory=/var/lib/node_exporter` (for example) as a launch parameter to the exporter, and all of the readable files within that directory will be exported as metrics if they're in the correct format. For example, smartmon.sh writes smartmon statistics such as temperature and unallocated blocks. nvme_metrics.sh might be of interest, btrfs_stats.py, maybe even directory-size.sh for some folks. The simplest way I can think of is for your plugin to create a directory in the RAM filesystem at /var/log, add `--collector.textfile.directory=/var/log/node_exporter`, and suggest users execute the desired scripts, writing into /var/log/node_exporter in per-script files. I can see two ways of doing this. One, users copy script files to someplace like /boot/config/scripts (a one-time write for the scripts, no flash wear) and execute them via the User Scripts plugin, scheduled similarly. The /var/log filesystem will exist on any system, won't cause flash wear, and is wiped on reboot. The path should have plenty of space for a few kB (or even a dozen MB) of metrics being rewritten every few minutes. If it doesn't, the failure case is that logging fails on the system -- not ideal, but it's mounted with 128MB by Unraid and should never be near full unless a user has a serious problem elsewhere. If it is filling, the absence of this plugin's proposed metrics won't prevent that or even delay it by much. These metrics are designed to be overwritten, not appended, so they should never grow beyond a few dozen MB in the most obscene scenario. Plugins seem to run as root, so permissions shouldn't be a problem. I'm also going to ping the User Scripts developer to allow stdout to be piped to a file per-script, so users can simply paste the scripts into User Scripts and forward the stdout, instead of needing to save them to /boot/config/scripts manually and write a User Script to run them manually.
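     To make the suggestion concrete, a rough sketch of the ramdisk approach described above follows; the exporter invocation is an assumption about how the plugin might wire this up, and only the `--collector.textfile.directory` flag itself comes from the post.
        #!/bin/bash
        # Rough sketch of the /var/log ramdisk idea above; not the plugin's actual code.
        TEXTFILE_DIR=/var/log/node_exporter
        mkdir -p "$TEXTFILE_DIR"                      # lives in RAM, wiped on reboot, no flash wear

        # The plugin (or the user) would launch the exporter with the extra flag, e.g.:
        #   node_exporter --collector.textfile.directory="$TEXTFILE_DIR"

        # Each community script then overwrites (never appends to) its own file:
        /boot/config/scripts/smartmon.sh > "$TEXTFILE_DIR/smartmon.prom"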
  4. Might be worth running it from the console to see if there's any ignored error spam like in my comment. Could be an error being dropped instead of forwarded to system logs, like on my (and everyone else's) system using the Node Exporter.
  5. lol it's ok, I was just very confused. I'm not even using the Docker containers, I JUST needed the exporter; I'm running the rest on another system. That's the main reason I went into this so sideways; I only wanted the exporter, but it had all these extra steps and I had to assume which went to what. I did so incorrectly, but that's ok. For sure! I thought the file went with the plugin, that's all. Yup, but I didn't know that and it was all in one giant block so I assumed (incorrectly) and here we are, lol That's very much what I had expected, literally everyone using this plugin is silently ignoring repeated errors because the stdout isn't directed anywhere people know about... Synonymous with "remember that" -- it's something you say to people who know the thing you're about to say. COMPLETELY did not know that, but given the reply upstream that does not surprise me anymore lol. I'll remember that. Since the logs aren't kept, it won't be filling up a logfile. I haven't restarted my array in a few months and I don't intend to soon, so I'll likely just leave them running in a detached tmux until the issue is properly resolved. I'd already opened a bug report upstream, I'll link them here and add this information but it seems for now the best bet would be to patch the plugin to disable the md collector on your end. Edit: Github issue https://github.com/prometheus/node_exporter/issues/2642
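     For anyone wanting to do the same, a hedged example of the detached-tmux approach mentioned above; the binary path, flags, and log location are assumptions, not what the plugin actually runs.
        # Keep the hand-started exporter alive in a detached tmux session and capture its output.
        # Binary path, flags, and log path are assumptions; --no-collector.mdadm is node_exporter's
        # standard way to disable the mdadm collector and silence those errors.
        tmux new-session -d -s node_exporter \
          '/usr/local/bin/node_exporter --no-collector.mdadm 2>&1 | tee /tmp/node_exporter.log'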
  6. I already explained that, but I'll be more verbose. I downloaded the plugin before I wrote its config file. You wrote the instructions. Steps four and five. I did not perform steps four and five until after I had installed the plugin. I also did not do anything beyond those steps regarding modifying settings, configuration, or parameters. There's no "restart" control that I could find, and I didn't feel like restarting my entire server (or the array and all of my services) simply to restart a single plugin. Thus, I used the console, killed the running process, and restarted it by hand. No custom parameters; I didn't change any settings. You'll note that I never said there was a user-facing issue, or that I couldn't connect to it or retrieve metrics from it. It functions just fine, but on my system it's burping a line at the console about an internal (non-critical) error every time Prometheus connects to it and retrieves metrics. The only difference between now and if I uninstall/reinstall/reboot is that the errors will be sent to a logfile someplace or discarded entirely -- I have no idea which -- instead of being sent to the console, since I ran it by hand. What I'm realizing, though, is that this is above your head, so to speak. If you run it by hand yourself, does it throw an error every time Prometheus polls it? I'll file a bug upstream with the actual node exporter's maintainer, as it's now clear to me that the actual collector is mishandling the output of /proc/mdstat on my system and it has nothing to do with the small wrapper plugin you wrote. Mmmmmmm, no, though. It's not "only meant to be installed." It's meant to be learned about, your system is meant to be manually prepared for it, and then it's meant to be installed. Steps four and five could/should be handled by either the previous step's container, or by this plugin itself if it notices the configuration file is not present. Furthermore, installing the plugin starts the collector, which already expects the config file to be present, so steps four and five should actually come before step three. If the installation process weren't so complicated, I would've noticed that this wasn't your problem earlier. I installed the plugin by finding it in the CA app and going "hey, that does what I need" and then discovering that it wasn't working. And in NO situation do you merely "install the plugin and that's it," so that's just a flatly inaccurate thing to claim.
  7. Literally just run the binary from the ssh terminal. I hadn't written its config file yet so I noted how it was executed, killed it, wrote the config file, and executed it by hand in a terminal.
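     Roughly, that procedure looks like the sketch below; the binary path is an assumption -- copy whatever command line the plugin actually used.
        # Note the exact command line the plugin used, stop that instance, then re-run it
        # in the foreground so any errors print straight to the console.
        ps aux | grep '[n]ode_exporter'
        pkill -f node_exporter
        /usr/local/bin/node_exporter      # path/flags are assumptions; reuse the original command line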
  8. Bug report: Prometheus Node Exporter throws an error every single time it's polled by Prometheus. EDIT: As of version 2023.02.26 -- first time using this plugin, unsure when the bug first appeared. The text is:
     ts=2023-03-24T17:51:45.859Z caller=collector.go:169 level=error msg="collector failed" name=mdadm duration_seconds=0.000470481 err="error parsing mdstatus: error parsing mdstat \"/proc/mdstat\": not enough fields in mdline (expected at least 3): sbName=/boot/config/super.dat"
  9. Glad you solved it, something to always keep in mind when building a system. To the general public who winds up here looking for similar solutions; (Posting it here because they won't go back and experience it like you did. Seriously not trying to rub it in. This needs to be seen *anywhere* the conclusion is "overclocked RAM" and is *NOT* JUST FOR RYZEN SYSTEMS but they do it worst. We've had this talk, I know you know.) Never overclock RAM, at all, regardless of what it says on the box, INCLUDING XMP, without testing it. Gaming system, Facebook system, ESPECIALLY A SERVER. XMP IS OVERCLOCKING. I'll say it again because people argue it so much; before you argue, Google it. XMP IS OVERCLOCKING. It's a "factory supported overclock" but it IS OVERCLOCKING, and you HAVE TO TEST FOR STABILITY when overclocking. Read Intel's documentation on the XMP spec. I DO NOT CARE if your BIOS defaults it on, VIGILANTLY turn it back *OFF* unless you're going to run an extremely extended *(DOZENS OF HOURS) RAM test when you're building a server. RAM overclocking leads to SILENT DATA CORRUPTION, which is *RIDICULOUS* on a SERVER, which is explicitly present to HANDLE DATA. I should also note that I personally have never seen a literal server-intended motherboard which supports *any* means of overclocking, and I feel like that's due to the causal link between overclocking and data corruption. Overclocking server hardware is NOT a good decision, unless you also test for stability. Overclocking RAM without testing it is literally identical to knowingly using a stick of failing RAM. You're running a piece of engineered electronics faster than the engineers who built it said to, often at higher voltages than it's designed for, to get it to go fractionally faster. Does that move a bottleneck? Not even a little bit, not since the numbered Pentium era. It HELPS on a high end system, IF YOU TEST IT THOROUGHLY, but I would NEVER overclock RAM on a server. I feel like NOT having silently corrupted data on a randomly unstable system is better overall than "the benchmark number is a little higher and I SWEAR programs load faster!" Stop thinking overclocking RAM is safe. Stop using XMP on "server" systems. Any potential speed gain is not worth it. Be safer with your systems, and your data.
  10. You fundamentally misunderstand Host Access. Host Access allows the HOST (unRAID) to reach the Docker container; with Host Access off, the HOST (and only the HOST) cannot reach the container. The containers can reach the outside Internet with no Host Access and no container-specific IP. The containers can receive connections from anywhere EXCEPT the host with Host Access turned off, and from anywhere INCLUDING the host with Host Access on. When it crashes, bring the array up in Maintenance Mode and do a manual xfs_repair on every drive (/dev/md* from the terminal, or manually per drive from the GUI). I used to do this every time unRAID crashed, and still do when we have power interruptions, and I no longer have to fix parity. I still scan as a test every so often on my 113TB 30-disk array, but it no longer requires sync fixes, ever. I'm unsubscribing from this thread; I've posted my fix (still 100% stable on my hardware, where it was unstable regularly and repeatedly, and verified by reverting into instability and re-applying the fix to regain stability) and it's not helping and/or nobody is listening. If anyone needs direct access to my information, or wishes to work one-on-one with me to look into issues privately, or to verify this is the same crash, etc., send me a private message and I'll gladly help; the chatter here is going in circles and I've said my part. I hope at least someone was helped by my presence; good luck.
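     A hypothetical sketch of that repair pass (not the exact commands used above), assuming the array is started in Maintenance Mode and the drives are unencrypted XFS:
        #!/bin/bash
        # Run xfs_repair against every md device while the array is in Maintenance Mode.
        # Add -L only if xfs_repair refuses due to a dirty log AND you accept the risk of
        # losing whatever was in that log -- the "okay with losing a file or two" trade-off.
        for dev in /dev/md*; do
            echo "=== $dev ==="
            xfs_repair "$dev"
        done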
  11. As an afterthought; because it's probably relevant, here's my network config page. Heading down for the night, will check in when I wake up.
  12. Try 6.8.3, enable vlans even if you're not using them. That seems to be the part which fixed my issues. If that isn't stable I'll screenshot my configurations and we'll try to figure it out -- I was crashing every few days with five or so containers. Now I'm stable with easily a dozen running right now. Good catch, I forgot about that actually. I had disabled host access at the time, expecting it to have been a part of the fix. Currently host access is enabled, still stable, on 6.8.3 -- other things may have changed since then since I accidentally reverted a version, but stability is unaffected. Here's a screenshot of my currently running system.
  13. Up until a week ago I was still running 6.9.x as mentioned earlier, with host access, with flawless uptime, given the workaround I indicated. I recently accidentally downgraded to 6.8.3 (flash device problems, long story) and I'm still stable. Perhaps try 6.8.3? PS, anyone on the 6.10 rc series who can verify stability? I'm unwilling to touch anything until Limetech has this issue figured out. I tried briefly with a brand new flash device, but it issued me a trial license without telling me that's what it was going to do, and since keys are no longer locally managed I can't fix it without contacting support (GREAT JOB LIMETECH), so capital F that until I know it's gonna work.
  14. I still haven't updated, because the workaround I posted still works for me. I'm using Docker with 20+ containers, each with a dedicated IP, all from one NIC. I never even tried the 6.10-rc because after it was released, I read a few posts from folks using the 6.10-rc that the problem still existed even with ipvlan. Honestly, I'm stable, and I'm not going to upgrade until I stop hearing about this bug.
  15. Sorry to hear it -- I'm still running the configuration posted and I'm still stable, save for the recent power bumps in our area; I had nearly a month of uptime at one point, and even that power cycle was scheduled. I've also moved additional Docker containers onto the system so I could shut down the other one for power/heat savings during the heat wave. I haven't had the spoons to convince myself to change anything since it became stable, so I'm not on the RC yet. I might suggest that if there is a panic, it could be unrelated -- post its details just to be sure. Various hardware has errata that can cause panics, including Intel and AMD C-state bugs on various generations, and even filesystem corruption from previous panics can be a factor -- something Unraid doesn't forward to the UI from the kernel logs by default. Good luck, all.
  16. Yes, I am well aware of the significance of VLAN 0. I'm also well aware that it appears when VLANs are enabled for an interface. However, upon reading my message, the following things stand out as unusual -- which may be why I wrote them in detail and provided screenshots:
     1 -- There are no currently enabled interfaces with VLANs enabled, so... "perfectly normal when" goes right out the window, yeah?
     2 -- Before (when I was crashing) I did not have VLAN 0 messages in my log.
     3 -- After (now that I'm NOT crashing) I DO have VLAN 0 messages in my log.
     The first point I was trying to raise is that I'm not using VLANs on any of my enabled network interfaces, and none of my disabled interfaces have VLANs configured (though VLANs are enabled on one of them) -- yet the system seems to still be activating VLAN 0 as if preparing for VLAN operation. That seems abnormal, hence commenting on it; if it were normal I wouldn't have taken the time. The second point I was trying to raise, specifically, is that I seem to have stumbled upon a potential workaround for this crash problem while Docker/kernel/Limetech find a solution. What I'd like to see is whether anyone experiencing these crashes could do the following:
     A) Enable VLANs, but do not configure them, on an unused network interface
     B) DISABLE that interface (IPv4/IPv6 Address Assignment: None; not a member of a bridge or bond)
     C) Reboot, start your array, and check if you see "VLAN 0" related log messages
     D) Report whether your system becomes stable
     I've got uptime of around a month now, perfectly stable, with no changes to my system beyond what's mentioned above and in greater detail in my previous message.
  17. That's unfortunate; I'm still stable and I really would love to help figure out how. Normally, within twenty-four hours (I check my logs like ten times a day during unstable periods) I'd have one non-fatal panic attributed to nf_nat or conntrack or similar, later (several hours) followed by a metadata checksum error on a random Docker-active volume (assuming a container touched a file, the kernel panic caused the thread to drop before metadata was updated, etc.?), which would worsen (checksum logspam) until I got another nf_nat or similar subsystem panic which would actually be fatal and lock the system entirely. I actually got very good at the reboot/maintenance-mount/fsck routine. This explicitly has not happened since the day I switched off bridging and bonding. Another thing I noticed is that my kernel logs contain a few new lines that I haven't ...explicitly noticed before? I don't know if it's due to the no-bonding/no-bridging configuration or not. It's likely due to the fact that I started to enable VLANs on eth1 before disabling eth1, but my logs include VLAN 0 references. So, to be clear, the day I fixed this crash, my configuration changed FROM:
     eth0 + eth1 in 802.3ad active aggregation (bonding) with bridging enabled, VLANs off
     TO:
     eth0: bonding/bridging/VLANs off
     eth1: bonding/bridging off, VLANs ON (unconfigured), interface disabled
     My suspicion is that despite eth1 being disabled, a script is detecting that "an interface has VLANs enabled" and is triggering default VLAN 0 handling on all interfaces...? Anyway, relevant kernel logs:
     [597058.181899] docker0: port 1(vethcd65992) entered blocking state
     [597058.181902] docker0: port 1(vethcd65992) entered disabled state
     [597058.181961] device vethcd65992 entered promiscuous mode
     [597058.182040] IPv6: ADDRCONF(NETDEV_UP): vethcd65992: link is not ready
     [597058.751447] eth0: renamed from veth8c778f7
     [597058.762734] IPv6: ADDRCONF(NETDEV_CHANGE): vethcd65992: link becomes ready
     [597058.762825] docker0: port 1(vethcd65992) entered blocking state
     [597058.762828] docker0: port 1(vethcd65992) entered forwarding state
     [597058.762934] IPv6: ADDRCONF(NETDEV_CHANGE): docker0: link becomes ready
     [597091.715908] vethbcba7b5: renamed from eth0
     [597091.745044] igb 0000:01:00.0 eth0: mixed HW and IP checksum settings.
     [597093.591117] igb 0000:01:00.0 eth0: mixed HW and IP checksum settings.
     [597093.591546] eth0: renamed from vethc557863
     [597093.602749] 8021q: adding VLAN 0 to HW filter on device eth0
     [597094.179544] veth8c778f7: renamed from eth0
     [597094.215339] docker0: port 1(vethcd65992) entered disabled state
     [597094.292125] docker0: port 1(vethcd65992) entered disabled state
     [597094.294522] device vethcd65992 left promiscuous mode
     [597094.294523] docker0: port 1(vethcd65992) entered disabled state
     This happens any time I cycle a container; note the VLAN 0 reference. I'm almost positive that was not present before. Relevant network configuration:
     Perhaps repeat this by enabling eth1, enabling VLANs on it, disabling the interface, then restarting your array? I'm not sure if it's worth the effort, but I suppose during your next crash cycle you could probably do that without a lot of fanfare. Unrelated: anyone interested in the one-liner I wrote to unconditionally scan-and-repair every drive in your (Maintenance Mode mounted) array, assuming A) you're okay losing files which are corrupt and would otherwise require extensive filesystem-level repair to recover, and B) you're using XFS with no encryption, or can modify the script to accommodate either of the above?
     I just paste it in an ssh terminal after every non-graceful power cycle and typically lose a logfile, if anything at all, BUT it's definitely not for mission-critical arrays. Frankly, unless you're going to do filesystem repair or pay someone else to, this is what you're gonna wind up doing to get your filesystem to either mount or stop complaining about checksum errors anyway. So far it has also resulted in no sync errors caused by kernel panics when I re-scan while starting the array normally (removing Maintenance Mode) afterward. I may lose a file or two (you can find out -- scrolling back to read the logs or capturing them to a file is an option), but this is non-critical data for me. I prefer the parity safety and ease of use to the ability to pay someone to save a file or two, lol.
  18. I FINALLY have something I'm fully confident posting. It's been over two weeks since my last crash. I haven't seen a single panic in over a week and a half, but I DID reboot once during that time for an unrelated issue. Host Access to networks was disabled throughout; this did not improve the situation. My previous configuration used eth0 and eth1 in an Active 802.3AD bond, with Bridging enabled as well. This suffered the kernel panics described in this post. Docker containers were on dedicated IPs on interface br0. My CURRENT configuration uses eth0, Bonding off, Bridging off, with Docker containers configured "Network Type" as "Custom : eth0" -- Host Access is still disabled; I have not tested enabling it as I do not require it. My kernel logs have been clean* since. I don't know if this will help anyone, but it has finally and conclusively fixed the problem for me, using Unraid 6.8.3 (Yes I know it's out of date, I'm trying to chase down a problem so I avoid changing two things at once.) *I have a recurring GPU-related message about copying the VBIOS which comes up regularly, no clue why, but it's unrelated.
  19. While I understand that they are harmless and related to the Docker engine, we're also having an nf_conntrack related crash with the Docker engine -- some of us, at least, in another thread, which the devs recently mentioned they can't reproduce. Maybe this should be looked into instead of dismissed? What is the actual location of the misconfiguration? How would one go about setting it to conform, despite it being harmless? Is this within the reach of an experienced Linux user, or is this developer land? Kernel parameters? Module parameters? Sysctl pokes? I cannot find a reference to this error that isn't an unRaid forum, and none of the ones I've found have a fix.
  20. @SimonF For what it's worth, my bug report is about SMART settings in smart-one.cfg being erased by a badly written config handler; any potential relation here is that my RAID controllers (and MOST RAID controllers) don't pass most SCSI Generic commands through (that's the invalid opcode stuff) and block the spindown requests. Looking at your logs, it took me about two seconds to notice that your diagnostic is absolutely massive. Syslog is bloated with messages from Docker about renaming interfaces, which it does on start/stop. Further inspection of your docker.txt log shows the same. You have Docker containers starting/stopping CONSTANTLY. Like, constantly.
     time="2021-05-14T15:01:00.939164349+02:00" level=info msg="starting signal loop" namespace=moby path=/var/run/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/9daba11afa5b1539423fbfcaa502e85ca37f44511d3757ceb9fe8056091dfd6e pid=9625
     time="2021-05-14T15:01:02.395197081+02:00" level=info msg="starting signal loop" namespace=moby path=/var/run/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/8815f62b37eff81428302a45ff320d6c76751e8ca3308c2263daf9115b40f74b pid=10397
     time="2021-05-14T15:01:03.293534909+02:00" level=info msg="starting signal loop" namespace=moby path=/var/run/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/7e3fb4527bdc2a445aef61e6b3f9af809ab9cb36d611eefda72fdde9ef079bda pid=11011
     time="2021-05-14T15:01:04.999049158+02:00" level=info msg="starting signal loop" namespace=moby path=/var/run/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/ff736b719aff2ebb8ce67c46d004d045a18c587fbc87d633be5a5d2c8996b119 pid=12017
     time="2021-05-14T15:01:07.716413644+02:00" level=info msg="starting signal loop" namespace=moby path=/var/run/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/64c3621669d986744d9125dc9a5b55726019b51292b61e48df796ce3d68be50d pid=13557
     time="2021-05-14T15:01:08.773384420+02:00" level=info msg="starting signal loop" namespace=moby path=/var/run/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/efa3d7d9abf05c6ecbbf0aed96416403c1379fd4f36591b98b0682ef774c8f7d pid=14225
     time="2021-05-14T15:01:12.045508457+02:00" level=info msg="starting signal loop" namespace=moby path=/var/run/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/4c813ad16fe529af568973fdc650a9bd3de57bd6e55d6367a93ddd4084665c4f pid=15538
     time="2021-05-14T15:01:15.746417852+02:00" level=info msg="starting signal loop" namespace=moby path=/var/run/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/1e52f4e57041c2f7893d8a6e1abf7e37e27fcf818fab52effef960f5db12aa34 pid=17061
     time="2021-05-14T15:01:16.448800458+02:00" level=info msg="starting signal loop" namespace=moby path=/var/run/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/839c48777be381e1f8f1b714c9d14abf5f40311930fb276cecda12a2cdb4fea8 pid=18076
     time="2021-05-14T15:01:17.235544165+02:00" level=info msg="starting signal loop" namespace=moby path=/var/run/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/fa6e0d3a14af579e27803b51d02dabddca368af83821a8d6c945e8787b781b33 pid=18530
     time="2021-05-14T15:01:18.245830784+02:00" level=info msg="starting signal loop" namespace=moby path=/var/run/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/7dac976c24b8e028fe6dd0b01eddc81d0f0cd81e9f6cb8dab8d5ecec2f8300f8 pid=19635
     time="2021-05-14T15:01:44.028428540+02:00" level=info msg="starting signal loop" namespace=moby path=/var/run/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/55a2bd752bd9b5168e218a67fc6b6e2235701169c4a90731bd35bc5e1cf42005 pid=25056
     time="2021-05-14T15:01:44.949010842+02:00" level=info msg="starting signal loop" namespace=moby path=/var/run/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/fe519f83d40335b9cef8c78ff77d6280695288cf150e408c7c1ddac80672c77c pid=26477
     time="2021-05-14T15:01:48.531324200+02:00" level=info msg="starting signal loop" namespace=moby path=/var/run/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/ef6a7ac83c489c6571bfc5840666293ee6e989f0908abfece313ac818d78b6b9 pid=28876
     That's one minute and 14 container restarts. The array could be failing to spin down because a container is hammering it with constant IO while trying to start up. Look at your Docker containers page, under the Uptime column, and see which one never grows old. Check its logs, repair the cause, reboot, and see if it still fails to spin down -- if it does, please submit another diagnostic afterward.
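     A quick console-side way to spot the looping container, equivalent to watching the Uptime column in the GUI (standard Docker CLI, nothing Unraid-specific; the container name is a placeholder):
        # List containers with their status; the one whose Status keeps resetting to
        # "Up X seconds" (or shows "Restarting") is the culprit.
        docker ps --format 'table {{.Names}}\t{{.Status}}'
        # Then inspect why it keeps dying (replace the name with the offender):
        docker logs --tail 50 <container-name>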
  21. If your board has a dedicated management port (and you're using it), the disappearing IPMI access means the BMC died too -- the BMC should run sans RAM on most systems. Hell, the BMC in some systems will run without a CPU. The only thing I can imagine faulting the BMC and shutting down a system would be failing hardware (CPUs, motherboard, RAM, power supply, cooling) -- but even in an overheat situation, the BMC will normally stay up. I've never heard of this "3.3v electrical tape trick" and it horrifies me. Electrical tape is awful and frankly I would petition distributors to stop selling it entirely. There is no situation in which electrical tape is the best option, literally ever, especially for things like this. Please use Kapton tape for mods like this in the future -- it's not easily punctured, its adhesive doesn't turn into a greasy black snot smear after brief exposure to warmth and/or moisture, it doesn't stretch or shrink or slide out from beneath pressure -- all qualities "electrical tape" revels in. Kapton tape. Polyimide tape. Also, having to work around a problem related to the power supply not being good enough for hard disks, on a dual-CPU server-grade system whose power supply we've been suspecting -- gotta say, it doesn't instill confidence. Anyway -- if you're still suspecting it's an Unraid problem (despite your report of it doing the exact same thing in the memtest before, according to previous messages, which 100% clears Unraid of responsibility if that's the case), feel free to boot that Windows environment you built, use the storage drivers you slipstreamed into it to assemble a striped Storage Space, and run a batch of benchmarks and/or stress tests on it, so Windows is doing more than just sitting idle at the equivalent of a stopped-array screen and never crashing... I'd imagine you have your own benchmarking tools selected, since the sort of person who can build a PE with that sort of integration packed in tends to have related tools picked out already. If Windows runs for several hours but Unraid still power-faults, you have your answer. Actually, do yourself a favor and do that anyway. It seems to be a looming question in your mind; it'd be best to solve it instead of continuing to be offline for so long with no motion. Ideally something here brings a solution one step closer. Good luck, and let us know. I apologize if I'm intermittently slow; things don't look to be quieting down for me in the nearish future. I'll try to check back in regularly.
  22. Being able to connect to your domain from within the LAN but reaching your Unifi WebGUI (which I'm left to infer is what you're using for a router...? I've only ever used Unifi for wireless AP management, so I'm left to assume A WHOLE LOT here) means your router is not applying port forwarding to your request. This is normal behavior -- you're trying to connect to your external IP from behind its router, and many, many routers handle this case exactly as they should, which is to say they silently ignore the request. Picture it like throwing a package through a window -- it shouldn't come flying back in unless you did something to make it do that. You need to set up NAT Reflection. Also, it is a wildly horrible idea to expose the Unraid web interface to the internet; it is absolutely not designed for that sort of exposure. Very, very risky. If you want to use a domain name to connect from inside only, you need some manner of DNS override, unless you have the ability to set up and properly administer your own local domain. I don't know about Unifi's capabilities; I tend to stray away from proprietary/closed-source/overly expensive systems, so I'm not the guy for that. As an example of implementation, Unbound (a software service, available on Linux and BSD-based router software) is capable of silently redirecting any request for any single (or wildcard) host/domain to any other IP address. Obviously SSL breaks horribly when doing this, but it works for local-only domain resolution. This is how I access the things I have which should not be exposed to the big wide world -- instead of taking over *.cookies.us I would instead take over *.cookies, so that I'm not walking over any public namespaces. If you need access from both inside *and* outside, NAT Reflection is required, and it will even keep your SSL certificates working. The fact that you're able to connect when the VPN is turned off (of course you can't with it turned on; the VPN is intercepting and tunneling all of your non-local traffic, because that's what VPNs do, and your external IP address is non-local) means your request for your domain name is either A) resolving to your router's local IP, which should never happen, or B) resolving to your external IP, which then connects to the router, which is accepting the connection instead of applying port forwarding to it. If the router is accepting that from the public side, it also means you're exposing that web UI to the internet, which I would not consider a good thing. Ideally that helps somewhat -- let me know if not, but things in my life have been ...bumpy, so I may be delayed.
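     As an illustration of the Unbound override described above, a sketch of what the configuration might look like; the IP address and the config file path are assumptions for your own setup, and the ".cookies" zone is just the example from the post.
        # Append a local-only override so anything under .cookies resolves to a LAN address.
        # Config path and IP address are assumptions; reload or restart Unbound afterward.
        {
          echo 'server:'
          echo '  # Answer everything under .cookies locally instead of forwarding it upstream'
          echo '  local-zone: "cookies." redirect'
          echo '  local-data: "cookies. A 192.168.1.10"'
        } >> /etc/unbound/unbound.conf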
  23. This was my solution also, due to both this and a bug which eats SMART config files. 6.8.3 -- still saw one trace for nf_xxxxx but it was non-fatal and I've been stable for a week and a half ish, for the first time in kinda a while.
  24. Fair, I wasn't specific enough in that statement. Yes, that socket is only disabled when using PCIe-based M.2 storage in M2_2 -- I left that unsaid since, as you also pointed out, it was already on the screen and I felt no need to handhold the information any further. Thank you for adding to my specificity, though that does not address my original point, which was that yours was lacking. And perhaps it was overstated, but my point was clearly that there is no more accurate definition of a vendor's specific implementation than their own documentation, provided they wrote it correctly and keep it up to date with regard to changes. You can throw attention at my tone and word choice all you wish; that's not the action of a person with strong standing in a conversation. Your statements were generalized based on the chipset's overarching capacity. My overplaying a single phrase might make me overdramatic, but it doesn't make you correct. Generalizations about a chipset's capability don't apply to every version of a board using that chipset -- demonstrated doubly so in that each brand has several tiers of each chipset available as different motherboard models. Specificity matters, especially in the context of the computing world. I'm not sure how adding less specific information to what had been an essentially motionless thread, or arguing about its merits, has helped matters. In any case, it's getting a bit off-topic. If you feel the need to dispute the merits of accuracy further, feel free to send me a message. There's clearly nothing more to be gained here otherwise; I'm clocking out of this thread until OP needs feedback I can provide. Ideally there was enough to be gained here to have made it worth everyone's while. Good luck!
  25. @John_M That may be factually accurate, but I was specifically quoting the breakdown the motherboard manual indicates actually reaches the PCIe slots, not hypothetically educated-guessing based on what lanes should or can be available to a chipset vendor. Implementation-derived specs are always better than application-derived specs because, while any vendor can use an X470 chipset, vendors can choose to implement PCIe lane distribution, weighting, switching, and layout in very different manners, due to differing choices in included/supported peripherals, as well as how widely they wish to let the user map PCIe lanes onto physical slots, i.e. how many different "bank switch sets" can be accomplished per board. I'm not saying you're wrong; I'm only saying that you're making judgements based upon what CAN be done, rather than what the manufacturer who implemented/built/sells the finished product says WAS done. This is the Word Of GOD on what this specific model of motherboard can do, barring solely A) improper documentation or B) updates via UEFI image/etc. which change the capabilities of the board. Other boards USING an X470 can and will have drastically different layouts and implementations within the constraints of what the CPU/APU itself provides, but every X470 board does not have the same layout, and pretending it does only adds confusion. I'm not trying to argue; I'm telling you, for a fact, that of the PCIe x16 connectors physically available, this board has only two which can achieve at best x8, and one which will never exceed x4 and is also disabled when using PCIe-based M.2 storage, and that is it. Your clarification on which CPU is in use is appreciated and seems to preclude the presence of a graphics processor, but I don't have generation-to-model shorthand memorized, so I'd actually have to spend more time to look it up and frankly it's not worth it. The information has been covered.