Posts posted by codefaux

  1. On 3/29/2023 at 10:57 PM, ich777 said:

    You can also use my plugin

    That doesn't support any of the other metrics from any of the other dozen or so scripts at the link I mentioned, a few of which I'm interested in/using now manually. That's why I suggested adding the text collector as a supported feature on the plugin by default. It won't impact anyone except people who want to use it if it merely watches a folder for scripts to execute.

     

     

    On 3/29/2023 at 10:57 PM, ich777 said:

    I would not recommend the boot drive for this, not because of wear reasons

    Could you be bothered to explain why? I don't mean to sound dismissive or aggressive or anything, but would you follow a recommendation from a stranger with no justification or explanation?


     

    On 3/29/2023 at 10:57 PM, ich777 said:

    but maybe consider maybe switching over to /mnt/user/appdata/scripts and run it from there

    I'm not looking for suggestions; I'm not the plugin author here lol

     

I'll allow them to be run from wherever you require them to be placed. I'm suggesting that you add support for the text collector to your prometheus_node_exporter plugin, since prometheus_node_exporter already supports it. I suggested where they could be placed; you disagree, so place them anywhere you like.

  2. Could you be convinced to add a variable to pipe stdout to a specific path?

     

    My target: prometheus_node_exporter contains a textfile collector.

     

    There are a handful of community scripts designed to

    - be run from cron (ie your plugin on a custom schedule)

    - collect useful (I'd argue critical) system metrics (ie smartmon stats like temps and failure indicators) for prometheus-node-exporter
    - output data to stdout, to be piped to a file

     

    The inconvenience is that using these isn't straightforward; they'd require extensive modification that most users can't grok.

     

    Thanks to your plugin, an alternative is to manually save them to disk someplace, then use your plugin to write a script which runs them with stdout piped to a file, scheduled cron-alike.
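
    A minimal sketch of the wrapper I mean -- the paths are just my own layout, adjust to taste:

    #!/bin/bash
    # User Script body: run the saved collector script, then publish atomically
    # so node_exporter never reads a half-written file
    OUT=/var/log/node_exporter/smartmon.prom
    /boot/config/scripts/smartmon.sh > "$OUT.tmp" && mv "$OUT.tmp" "$OUT"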

    [screenshot]

    EDIT: These are the same script; ignore the name difference, I used an old screenshot for one of them by mistake

    [screenshot]

     

    This is workable but some users may still find this unapproachable.

     

    If users could paste the entire script (in my case, /boot/config/scripts/smartmon.sh aka smartmon.sh [raw], among others) into a User Script, schedule it cron-alike, and mark its output to be sent to a specific path, it would make these far more approachable.

     

    It could be implemented similarly to the variables you've recently added; an STDOUT variable could be set to request redirection.

     

    Regardless of your decision, keep up the great work. The plugin has been quite valuable for many of us!

  3. Hey, here's another suggestion: an easy addition that delivers a lot of functionality.

     

    There's a handful of community scripts that amend the data provided by node_exporter. Most of them are intended to be piped to a text file and read via node_exporter's textfile collector.

     

    The metrics go into a file somewhere, then you add `--collector.textfile.directory=/var/lib/node_exporter` (for example) as a launch parameter to the exporter, and every readable *.prom file within that directory gets exported as metrics, provided it's in the correct format.
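
    For anyone unfamiliar, the whole flow is roughly this (directory is illustrative; the flag and the *.prom naming are stock node_exporter behavior):

    mkdir -p /var/lib/node_exporter
    # a collector script writes somewhere temporary, then renames into place
    # so the exporter never scrapes a half-written file
    smartmon.sh > /var/lib/node_exporter/smartmon.prom.tmp
    mv /var/lib/node_exporter/smartmon.prom.tmp /var/lib/node_exporter/smartmon.prom
    # the exporter launch parameter picks up every *.prom file in that directory
    node_exporter --collector.textfile.directory=/var/lib/node_exporter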

     

    For example, smartmon.sh writes SMART statistics such as temperature and reallocated sectors. nvme_metrics.sh might be of interest, as might btrfs_stats.py, and maybe even directory-size.sh for some folks.

     

    The simplest way I can think of is for your plugin to create a directory in the RAM filesystem at /var/log, add `--collector.textfile.directory=/var/log/node_exporter`, and suggest users execute the desired scripts, writing into /var/log/node_exporter as per-script files. I can see two ways of doing this.

    - One, users copy script files to someplace like /boot/config/scripts (one-time write for the scripts, no flash wear) and execute them via the User Scripts plugin as such;

     

    [screenshot]

     

    Scheduled similarly;

    [screenshot]

     

     

    The /var/log filesystem exists on any system, won't cause flash wear, and is wiped on reboot. The path should have plenty of space for a few KB (or even a few dozen MB) of metrics being rewritten every few minutes. If it doesn't, the failure case is that logging fails on the system -- not ideal, but Unraid mounts it with 128MB and it should never be near full unless the user has a serious problem elsewhere. If it is filling, the absence of this plugin's proposed metrics wouldn't prevent that, or even delay it by much.

     

    These metrics are designed to be overwritten, not appended, so they should never grow beyond a few dozen MB even in the most obscene scenario. Plugins seem to run as root, so permissions shouldn't be a problem.

     

    I'm also going to ping the User Scripts developer about letting stdout be piped to a file per-script, so users can simply paste the scripts into User Scripts and forward the stdout, instead of manually saving them to /boot/config/scripts and writing a wrapper User Script to run them.

     

  4. 4 hours ago, gambler32k said:

    Anyone elses Prometheus AdGuard Exporter plugin stopped working?

    i had everything setup and working for months but now nothing happends when pressing start button.

    Tried reinstalling the plugin with no luck.

    Might be worth running it from the console to see if there's any ignored error spam like in my comment. Could be an error being dropped instead of forwarded to system logs, like on my (and everyone else's) system using the Node Exporter.

  5. On 3/25/2023 at 9:32 PM, ich777 said:

    Oh, sorry now I get it... :)

     

    Steps four and five for the Docker container not the plugin, sorry for that...

    lol it's ok, I was just very confused. I'm not even using the Docker containers, I JUST needed the exporter; I'm running the rest on another system. That's the main reason I went into this so sideways; I only wanted the exporter, but it had all these extra steps and I had to assume which went to what. I did so incorrectly, but that's ok.

     

     

    On 3/25/2023 at 9:32 PM, ich777 said:

    The plugin can't install a configuration file for a Docker container because first I don't like changing things which are not directly controlled by the plugin and second the plugin can't know where the configuration for your Prometheus Docker container is, there are simply too many variables (name, paths,...).

    For sure! I thought the file went with the plugin, that's all.

     

     

    On 3/25/2023 at 9:32 PM, ich777 said:

    Usually you don't have to restart the plugin itself because the config that you are changing is related to the Docker container not the plugin.

    Yup, but I didn't know that and it was all in one giant block so I assumed (incorrectly) and here we are, lol

     

     

    On 3/25/2023 at 9:32 PM, ich777 said:

    Yes, it's actually exactly the same on my system but TBH, I've never noticed it because I've always ran it as a service:

    That's very much what I had expected -- literally everyone using this plugin is silently ignoring repeated errors, because stdout isn't directed anywhere people know about...

     

     

    On 3/25/2023 at 9:32 PM, ich777 said:

    keep in mind

    Synonymous with "remember that" -- it's something you say to people who know the thing you're about to say.

     

     

    On 3/25/2023 at 9:32 PM, ich777 said:

    Unraid doesn't use the default md driver

    COMPLETELY did not know that, but given the reply upstream that does not surprise me anymore lol.

     

     

     

    On 3/25/2023 at 9:32 PM, ich777 said:

    For now you can, of course if you want to, disable the mdadm metrics entirely.

    I'll remember that. Since the logs aren't kept, it won't be filling up a logfile. I haven't restarted my array in a few months and don't intend to soon, so I'll likely just leave it running in a detached tmux until the issue is properly resolved.
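
    For anyone parking it the same way, roughly what I mean -- the binary path is an assumption (adjust to wherever the plugin installs it); the --no-collector.mdadm flag is stock node_exporter:

    tmux new-session -d -s node_exporter \
      '/usr/local/bin/node_exporter --no-collector.mdadm'
    tmux attach -t node_exporter   # reattach later to read the console output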

     

     

    I'd already opened a bug report upstream; I'll link it here and add this information, but for now the best bet seems to be patching the plugin to disable the md collector on your end.

     

    Edit: Github issue https://github.com/prometheus/node_exporter/issues/2642

  6. 35 minutes ago, ich777 said:

    Why are you doing that by hand? What settings did you change?

    I already explained that, but I'll be more verbose.

     

    I downloaded the plugin before I wrote its config file. You wrote the instructions -- steps four and five. I did not perform steps four and five until after I had installed the plugin. I also did not do anything beyond those steps regarding modifying settings, configuration, or parameters.

     

    There's no "restart" control that I could find, and I didn't feel like restarting my entire server (or the array and all of my services) simply to restart a single plugin. Thus, I used the console, killed the running process, and restarted it by hand. No custom parameters, I didn't change some settings.

     

    42 minutes ago, ich777 said:

    I installed the plugin and have no issue whatsoever and can connect to it just fine.

    You'll note that I never said there was a user-facing issue, or that I couldn't connect or report metrics from it. It functions just fine, but on my system it's burping a line at the console about an internal (non-critical) error, every time Prometheus connects to it and retrieves metrics.

     

    The only difference between now and if I uninstall/reinstall/reboot is that the errors will be sent to a logfile someplace or discarded entirely -- I have no idea which -- instead of being sent to the console, since I ran it by hand.

     

    What I'm realizing though, is this is above your head, so to speak. If you run it by hand yourself, does it throw an error every time Prometheus polls it? I'll file a bug upstream with the actual node collector's maintainer, as it's now clear to me that the actual collector is mishandling the output of /proc/mdstat on my system and it has nothing to do with the small wrapper plugin you wrote.

     

    49 minutes ago, ich777 said:

    The plugin is only meant to be installed and that's it.

    Mmmmmmm no, though. It's not "only meant to be installed."  It's meant to be learned about, your system is meant to be manually prepared for it, and then it's meant to be installed. Steps four and five could/should be handled by either the previous step's container, or this plugin itself if it notices the configuration file is not present. Furthermore, installing the plugin starts the collector, which already expects the config file to be present, so steps four and five should actually be before step three.

     

    If the installation process weren't so complicated, I would've noticed that this wasn't your problem earlier. I installed the plugin by finding it in the CA app and going "hey that does what I need" and then discovering that it wasn't working. And in NO situation do you merely "install the plugin and that's it" so that's just a flat inaccurate thing to claim.

  7. 22 hours ago, ich777 said:

    A little bit more information would be helpful.

    Where do you get that error?

    I can't reproduce that.

    Literally just run the binary from the ssh terminal. I hadn't written its config file yet so I noted how it was executed, killed it, wrote the config file, and executed it by hand in a terminal.

     

    [screenshot]

  8. Bug report. Prometheus Node Exporter throws an error every single time it's polled by Prometheus.

    EDIT: As of version 2023.02.26 -- first time using this plugin, unsure when bug first appeared

     

    Text is;

    ts=2023-03-24T17:51:45.859Z caller=collector.go:169 level=error msg="collector failed" name=mdadm duration_seconds=0.000470481 err="error parsing mdstatus: error parsing mdstat \"/proc/mdstat\": not enough fields in mdline (expected at least 3): sbName=/boot/config/super.dat"

     

    [screenshot]

  9. Glad you solved it, something to always keep in mind when building a system.

     

     

    To the general public who winds up here looking for similar solutions;

    (Posting it here because they won't go back and experience it like you did. Seriously not trying to rub it in. This needs to be seen *anywhere* the conclusion is "overclocked RAM" and is *NOT* JUST FOR RYZEN SYSTEMS but they do it worst. We've had this talk, I know you know.)

     

    Never overclock RAM, at all, regardless of what it says on the box, INCLUDING XMP, without testing it. Gaming system, Facebook system, ESPECIALLY A SERVER. XMP IS OVERCLOCKING.

     

    I'll say it again because people argue it so much; before you argue, Google it. XMP IS OVERCLOCKING. It's a "factory supported overclock" but it IS OVERCLOCKING, and you HAVE TO TEST FOR STABILITY when overclocking. Read Intel's documentation on the XMP spec. I DO NOT CARE if your BIOS defaults it on, VIGILANTLY turn it back *OFF* unless you're going to run an extremely extended (DOZENS OF HOURS) RAM test when you're building a server. RAM overclocking leads to SILENT DATA CORRUPTION, which is *RIDICULOUS* on a SERVER, which is explicitly present to HANDLE DATA. I should also note that I personally have never seen a literal server-intended motherboard which supports *any* means of overclocking, and I feel that's due to the causal link between overclocking and data corruption. Overclocking server hardware is NOT a good decision, unless you also test for stability.
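
    If you want a concrete starting point for testing from a running Linux box, memtester is one option (MemTest86 from boot media is the more thorough offline route; the 8G figure is just an example, size it below your free RAM):

    memtester 8G 10    # lock 8 GiB of RAM and run 10 full test passes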

     

    Overclocking RAM without testing it is literally identical to knowingly using a stick of failing RAM. You're running a piece of engineered electronics faster than the engineers who built it said to, often at higher voltages than it's designed for, to get it to go fractionally faster. Does that move a bottleneck? Not even a little bit, not since the numbered Pentium era. It HELPS on a high end system, IF YOU TEST IT THOROUGHLY, but I would NEVER overclock RAM on a server. I feel like NOT having silently corrupted data on a randomly unstable system is better overall than "the benchmark number is a little higher and I SWEAR programs load faster!" Stop thinking overclocking RAM is safe. Stop using XMP on "server" systems. Any potential speed gain is not worth it. Be safer with your systems, and your data.

  10. While I understand that they are harmless and related to the Docker engine, we're also having an nf_conntrack-related crash with the Docker engine -- some of us, at least, in another thread, which the devs recently mentioned they can't reproduce.

     

    Maybe this should be looked into instead of dismissed?

    What is the actual location of the misconfiguration? How would one go about setting it to conform, despite it being harmless? Is this within the reach of an experienced Linux user, or is this developer land? Kernel parameters? Module parameters? Sysctl pokes? I cannot find any reference to this error outside the Unraid forums, and none of the threads I've found have a fix.
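
    (For context, by "sysctl pokes" I mean something like the below -- the key is real, the value purely illustrative; I still don't know which knob, if any, this error actually wants.)

    sysctl net.netfilter.nf_conntrack_max            # read the current table limit
    sysctl -w net.netfilter.nf_conntrack_max=262144  # example adjustment, not a known fix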

  11.  

    @SimonF For what it's worth, my bug report is about SMART settings in smart-one.cfg being erased by a badly written config handler. Any potential relation here is that my RAID controllers (and MOST RAID controllers) don't pass most SCSI Generic commands through (that's the invalid opcode stuff) and block the spindown requests.

     

     

    Looking at your logs, it took me about two seconds to notice that your diagnostic is absolutely massive. Syslog is bloated with messages from Docker regarding renaming interfaces, which it does on start/stop. Further inspection from your docker.txt log shows the same. You have Docker containers starting/stopping CONSTANTLY. Like, constantly.

    time="2021-05-14T15:01:00.939164349+02:00" level=info msg="starting signal loop" namespace=moby path=/var/run/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/9daba11afa5b1539423fbfcaa502e85ca37f44511d3757ceb9fe8056091dfd6e pid=9625
    time="2021-05-14T15:01:02.395197081+02:00" level=info msg="starting signal loop" namespace=moby path=/var/run/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/8815f62b37eff81428302a45ff320d6c76751e8ca3308c2263daf9115b40f74b pid=10397
    time="2021-05-14T15:01:03.293534909+02:00" level=info msg="starting signal loop" namespace=moby path=/var/run/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/7e3fb4527bdc2a445aef61e6b3f9af809ab9cb36d611eefda72fdde9ef079bda pid=11011
    time="2021-05-14T15:01:04.999049158+02:00" level=info msg="starting signal loop" namespace=moby path=/var/run/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/ff736b719aff2ebb8ce67c46d004d045a18c587fbc87d633be5a5d2c8996b119 pid=12017
    time="2021-05-14T15:01:07.716413644+02:00" level=info msg="starting signal loop" namespace=moby path=/var/run/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/64c3621669d986744d9125dc9a5b55726019b51292b61e48df796ce3d68be50d pid=13557
    time="2021-05-14T15:01:08.773384420+02:00" level=info msg="starting signal loop" namespace=moby path=/var/run/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/efa3d7d9abf05c6ecbbf0aed96416403c1379fd4f36591b98b0682ef774c8f7d pid=14225
    time="2021-05-14T15:01:12.045508457+02:00" level=info msg="starting signal loop" namespace=moby path=/var/run/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/4c813ad16fe529af568973fdc650a9bd3de57bd6e55d6367a93ddd4084665c4f pid=15538
    time="2021-05-14T15:01:15.746417852+02:00" level=info msg="starting signal loop" namespace=moby path=/var/run/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/1e52f4e57041c2f7893d8a6e1abf7e37e27fcf818fab52effef960f5db12aa34 pid=17061
    time="2021-05-14T15:01:16.448800458+02:00" level=info msg="starting signal loop" namespace=moby path=/var/run/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/839c48777be381e1f8f1b714c9d14abf5f40311930fb276cecda12a2cdb4fea8 pid=18076
    time="2021-05-14T15:01:17.235544165+02:00" level=info msg="starting signal loop" namespace=moby path=/var/run/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/fa6e0d3a14af579e27803b51d02dabddca368af83821a8d6c945e8787b781b33 pid=18530
    time="2021-05-14T15:01:18.245830784+02:00" level=info msg="starting signal loop" namespace=moby path=/var/run/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/7dac976c24b8e028fe6dd0b01eddc81d0f0cd81e9f6cb8dab8d5ecec2f8300f8 pid=19635
    time="2021-05-14T15:01:44.028428540+02:00" level=info msg="starting signal loop" namespace=moby path=/var/run/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/55a2bd752bd9b5168e218a67fc6b6e2235701169c4a90731bd35bc5e1cf42005 pid=25056
    time="2021-05-14T15:01:44.949010842+02:00" level=info msg="starting signal loop" namespace=moby path=/var/run/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/fe519f83d40335b9cef8c78ff77d6280695288cf150e408c7c1ddac80672c77c pid=26477
    time="2021-05-14T15:01:48.531324200+02:00" level=info msg="starting signal loop" namespace=moby path=/var/run/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/ef6a7ac83c489c6571bfc5840666293ee6e989f0908abfece313ac818d78b6b9 pid=28876

     

    That's one minute and 14 container restarts. The array could be failing to spin down because a container is hammering it with constant IO while trying to start up. Look at your Docker containers page, under the Uptime column, and see which one never grows old. Check logs, repair the cause, reboot, and see if it still fails to spin down -- if it does, please submit another diagnostic afterward.
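
    A quick console-side way to spot the offender, using nothing but the stock docker CLI:

    docker ps -a --format 'table {{.Names}}\t{{.Status}}\t{{.RunningFor}}'
    # whatever keeps showing "Up X seconds" is your restart loop; then:
    docker logs --tail 50 <container-name>    # read why it keeps dying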

  12. If your board has a dedicated management port (and you're using it) the disappearing IPMI access means the BMC died too -- BMC should run sans RAM, on most systems. Hell, the BMC in some systems will run without a CPU. The only thing I can imagine faulting the BMC and shutting down a system would be failing hardware (CPUs, motherboard, RAM, power supply, cooling) -- but even with an overheat situation, the BMC will normally stay up.

     

    I've never heard of this "3.3v electrical tape trick" and it horrifies me. Electrical tape is awful, and frankly I would petition distributors to stop selling it entirely. There is no situation in which electrical tape is the best option, literally ever, especially for things like this. Please use Kapton tape for mods like this in the future -- it's not easily punctured, its adhesive doesn't turn into a greasy black snot smear after brief exposure to warmth and/or moisture, and it doesn't stretch or shrink or slide out from beneath pressure -- all qualities "electrical tape" revels in. Kapton tape. Polyimide tape.

     

    Also, having to work around the power supply not being good enough for hard disks, on a dual-CPU server-grade system whose power supply we've already been suspecting -- gotta say, it doesn't instill confidence.

     

    Anyway - if you're still suspecting it's an Unraid problem (despite your report of it doing the exact same thing in memtest before, according to previous messages, which 100% clears Unraid of responsibility if accurate), feel free to boot that Windows environment you built, use the storage drivers you slipstreamed into it to assemble a striped Storage Space, and run a batch of benchmarks and/or stress tests on it -- so Windows is doing more than sitting idle at the equivalent of a stopped-array screen and never crashing. I'd imagine you have your own benchmarking tools picked out; the sort of person who can build a PE with that kind of integration usually does. If Windows runs for several hours but Unraid still power-faults, you have your answer. Actually, do yourself a favor and do that anyway. It seems to be a looming question in your mind, and it'd be best to solve it instead of staying offline for so long with no motion.

     

    Ideally something here brings a solution one step closer. Good luck, and let us know. I apologize if I'm intermittently slow; things don't look to be quieting down for me in the nearish future. I'll try to check back in regularly.

  13. Being able to connect to your domain from within the LAN but reaching your Unifi WebGUI (which I'm left to infer is what you're using for a router..? I've only ever used Unifi for wireless AP management, so I'm assuming A WHOLE LOT here) means your router is not applying port forwarding to your request. This is normal behavior -- you're trying to connect to your external IP from behind its router, and many, many routers handle this case exactly as they should, which is to say they silently ignore the request. Picture it like throwing a package through a window: it shouldn't come flying back in unless you did something to make it do that. You need to set up NAT Reflection. Also, it is a wildly horrible idea to expose the Unraid web interface to the internet; it is absolutely not designed for that sort of exposure. Very, very risky.

     

    If you want to use a domain name to connect from inside only, you need some manner of DNS override, unless you have the ability to set up and properly administer your own local domain. I don't know Unifi's capabilities; I tend to stray away from proprietary/closed-source/overly expensive systems, so I'm not the guy for that. As an example of implementation, Unbound (a software service, available on Linux and BSD-based router software) can silently redirect any request for any single (or wildcard) host/domain to any other IP address. Obviously SSL breaks horribly when doing this, but it works for local-only domain resolution. This is how I access the things I have which should not be exposed to the big wide world -- instead of taking over *.cookies.us I instead take over *.cookies, so I'm not walking over any public namespace.
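
    For the record, the Unbound version of that looks roughly like this (zone name and IP are placeholders):

    server:
      local-zone: "cookies." redirect
      local-data: "cookies. IN A 192.168.1.50"   # every *.cookies name resolves here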

     

    If you need access from both inside *and* outside, NAT Reflection is required, and will even maintain SSL certification.

     

    The fact that you're able to connect when the VPN is turned off (of course you can't with it turned on; the VPN intercepts and tunnels all of your non-local traffic, because that's what they do, and your external IP address is non-local) means your request for your domain name is either A) resolving to your router's local IP, which should never happen, or B) resolving to your external IP, which then connects to the router, which accepts the connection instead of applying port forwarding to it. If the router is accepting that from the public side, you're also exposing that web UI to the internet, which I would not consider a good thing.

     

    Ideally that helps somewhat -- let me know if not, but things in my life have been ...bumpy, so I may be delayed.

  14. 2 hours ago, John_M said:

    That is explicitly referring to the second M.2 socket (hence M2_2). The first M.2 socket (M2_1) has four lanes direct from the AM4 socket.

     

    Fair, I wasn't specific enough in that statement. Yes, that socket is only disabled when using PCIe-based M.2 storage in M2_2 -- I left that unsaid since, as you also pointed out, it was already on the screen and I felt no need to handhold the information any further. Thank you for adding to my specificity, though it doesn't address my original point: yours was still lacking.

     

     

    And perhaps it was overstated, but my point was that there is no more accurate definition of a vendor's specific implementation than their own documentation, provided they wrote it correctly and keep it up to date. You can throw attention at my tone and word choice all you wish; that's not the action of a person with strong standing in a conversation. Your statements were generalized based on the chipset's overarching capacity. My overplaying a single phrase might make me overdramatic, but it doesn't make you correct. Generalizations about a chipset's capability don't apply to every board using that chipset -- doubly so given that each brand offers several tiers of each chipset as different motherboard models. Specificity matters, especially in the computing world. I'm not sure how adding less specific information to an essentially motionless thread, or arguing about its merits, has helped matters.

     

    In any case, it's getting a bit off-topic. If you feel the need to dispute the merits of accuracy further, feel free to send me a message. There's clearly nothing more to be gained here otherwise, I'm clocking out of this thread until OP needs feedback I can provide. Ideally there was enough to be gained here to have made it worth everyone's while. Good luck!

  15. @John_M That may be factually accurate, but I was specifically quoting the breakdown the motherboard manual indicates actually reaches the PCIe slots, not edu-guessing based on what lanes should/can be available to a chipset vendor. Implementation-derived specs are always better than application-derived specs: while any vendor can use an X470 chipset, vendors can implement PCIe lane distribution, weighting, switching, and layout very differently, due to differing choices in included/supported peripherals, and in how widely they let the user map PCIe lanes onto physical slots, i.e. how many different "bank switch sets" a board can accomplish.

     

    I'm not saying you're wrong, I'm only saying that you're making judgements based upon what CAN be done, rather than what the actual manufacturer who implemented/built/sells the finished product based around the application says WAS done.

     

    This is the Word Of GOD on what this specific model of motherboard can do, barring solely A) improper documentation or B) updates via UEFI image/etc which change the capabilities of the board;

    [screenshot]

     

    Other boards USING an X470 can and will have drastically different layouts and implementations within the constraints of what the CPU/APU itself provides; not every X470 board has the same layout, and pretending otherwise only adds confusion.

     

    I'm not trying to argue; I'm telling you, for a fact: of the PCIe x16 connectors physically available, this board has only two which can achieve at best x8, plus one which will never exceed x4 and is also disabled when using PCIe-based M.2 storage, and that is it. Your clarification on which CPU is in use is appreciated, and seems to preclude the presence of a graphics processor, but I don't have generation-to-model shorthand memorized, so I'd have to spend more time looking it up and frankly it's not worth it. The information has been covered.

  16. I agree -- get your important data off. After that, even if you had to start from scratch, you didn't lose anything really painful.

     

    Regarding what's next; Parity only needs to be as large as the largest disk in your array. If you intend to step up a size in disks, Parity should/has to be increased first, but having an extra-large parity drive has no specific benefit otherwise. I suspect you knew that but I just wanted to be sure.

  17. @John_M I reviewed the specification of their motherboard before I made the post. Its contents are factual for this exact context. Depending on which CPU they have installed, the board runs either x8/x8, x8/x0, or x4/x0. If their CPU allows it, simply loading the board should have let it work, if it was going to work at all; no configuration should be required within the UEFI setup.

     

     

    Also, I didn't mention it, but on this board PCIE_6 is a PCIe 2.0 x4 interface despite having an x16 physical connector; I wouldn't suggest using it for your GPU regardless of how much space there is.

     

    It is quite correct that the use case specified will run pretty much flawlessly at x8, assuming "everything else" on the other card doesn't require x16 bandwidth -- which is why I was seeking clarification on the matter. I suppose it's moot since the hardware isn't capable of maxing out an x16 interface anyway.

     

    If both GPUs worked in Windows with this board, they should also work in Linux (and thus Unraid) on this board -- there should be no need to use a different one, if that's too much trouble. That still leaves some discrepancies: both logs show the same physical address (as pointed out by John_M), and the kernel itself shows no indication of errors or of anything misbehaving on a PCIe lane. It might be worth ensuring everything is installed correctly, fully seated and powered, and capturing another diagnostic.zip for us to peruse if you're still having issues.

  18. Sorry for disappearing, life has been a lot lately.

     

    I'm gonna be honest - I don't use the Unraid webUI all that much for system admin -- I'm a terminal guy. It would appear that there is no "user-facing" location to find those logs, but I could be mistaken.

     

    Samba's logs are stored in the filesystem at /var/log/samba

    Unbalance stores its logs, I believe, in /boot/logs
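
    If you're comfortable at a prompt, a quick way to eyeball them (exact file names vary; Samba usually writes per-service and per-client logs):

    ls -l /var/log/samba/                  # see which log files exist
    tail -n 50 /var/log/samba/log.smbd     # recent smbd activity, if that file is present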

     

    I'm not sure where any of the other relevant logs would be kept. You mentioned using Secure for a specific folder, and that Cache is set to Yes for it. Impulsively, a part of me suspects that the permissions rewrite is causing trouble...

     

    I don't recall you mentioning the volume of your data, just the disks -- would it be realistic to move essentially all data off the server, erase and recreate the shares with proper security/cache settings, then transfer the data back through the shares? That might more effectively ensure all files are rewritten with the correct permissions, but it also could be a waste of time.

     

    This really feels like one of those issues where the webUI could be a limiting factor. Are you experienced with Linux at the terminal-prompt level?

  19. It's important to understand what the options are and why.

     

    Turbo Write updates parity by reading the block from every disk in your array, doing the math, then writing to parity. This requires all of your disks to be spun up. Some people run their disks spun up 24/7, some drive models refuse to spin down, etc etc -- Turbo Write is faster, but only if it doesn't have to wait for a disk to spin up.

     

    The non-Turbo write updates parity by reading the old data block and the old parity block, doing math to work out what parity should become, and writing both back. This works even if most of your drives are spun down, but it can slow things down due to the time it takes to update parity.
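
    If the "doing math" part sounds mystical, it's just XOR -- here's a toy one-byte, three-data-disk example showing both write modes land on the same parity:

    #!/bin/bash
    d1=0xA5; d2=0x3C; d3=0x0F
    p_old=$(( d1 ^ d2 ^ d3 ))        # parity as it sits on the parity disk
    new_d2=0x77                      # we overwrite disk 2's block

    p_turbo=$(( d1 ^ new_d2 ^ d3 ))  # Turbo: read every data disk, XOR them all
    p_rmw=$(( p_old ^ d2 ^ new_d2 )) # non-Turbo: XOR out the old block, XOR in the new

    printf 'turbo=%#x rmw=%#x\n' "$p_turbo" "$p_rmw"   # prints the same value twice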

     

     

    The Auto Turbo Write plugin switches back and forth depending on whether or not your disks are all spinning. Technically it should not cause any negative effects and should use the best case at all times, but nothing is perfect. It's unlikely you'll need/want to remove it specifically unless you know better.

     

     

    Parity corruption only happens when a disk is written to but parity is not updated, when data is corrupted in transit (bad RAM/etc), or when a disk is failing somewhere. Hard shutdowns, direct disk access outside the array, improper procedure repairing filesystem damage, etc etc.

  20. Looking at your kernel logs, neither syslog nor the lspci output indicates that the Linux kernel can see both devices at the same time. lspci.txt shows one GPU or the other -- never both. This means the PCI subsystem itself cannot see both devices, which points to hardware. Even with vfio hardware passthrough, lspci would still show the hardware as present. It also means something changed between those boots besides installing drivers; drivers cannot mask hardware from the kernel.
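
    If you want to verify this yourself at the console, plain lspci is enough (no plugin involved):

    lspci -nnk | grep -iA3 'vga\|3d controller'   # every GPU the kernel can see, plus driver bindings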

     

    Your problem seems to be motherboard device support, physical installation, or some related issue -- from what I can see, no operating system would be able to see both devices.

     

    First - Your motherboard does not have the bandwidth to run two GPUs at PCIe 3.0 x16 link speed, so you may not see ideal performance; depending on your configuration it may be running as low as x4 bandwidth. At best, with that board, you will get PCIe 3.0 x8 link speed, but only if you connect your GPUs to the physical slots marked PCIE_1 and PCIE_4 AND you're not using a "with Vega Radeon Graphics" or "with Radeon Graphics" variant CPU. Any other ports run at PCIe 2.0 x4 bandwidth or lower regardless of their physical size. If you ARE running a "with xx Graphics" variant CPU, only one PCIE slot will work, as the onboard lane switch will run x8/x0 (or x4/x0) on PCIE_1/PCIE_4 respectively.

     

    Second - Check for BIOS updates, as this may improve compatibility slightly. The above CPU/link-width restrictions will still apply, however, as they're the result of using a relatively low-end gaming motherboard with insufficient PCIe lane capacity.

     

    What use case do you have for two GPUs? If there's significant need to discuss upgrade paths, we could look into it.

  21. I'm not familiar with using Wireguard or related with Unraid, or of its implications, but it sounds like you might be running afoul of something pretty common. To clarify, are you having problems accessing your server via domain name from inside the LAN, or outside the LAN?

     

    What level of experience do you have with DNS troubleshooting? It would also help to know what DDNS service you're using, both the company hosting it and the software/website you're using to configure IP addresses and so forth.

     

    Also, are you expecting inbound connections to use your VPN's tunnel, or use your ISP-provided IP address for inbound connections?

  22. I've had limited experience with non-fatal MCE errors (i.e., most of my MCE errors have historically been crash-causing), but here are the immediate suspects;

     

    - Overclocking (XMP is overclocking) - turn it off if you're overclocking.

    - BIOS/microcode updates - check your motherboard support website for updates and apply them carefully.

    - Not sure if related, but during loading/init of the Intel GPU modules your kernel explodes (see the log in the spoiler below) -- very likely worthwhile to disable this if you can; one way is sketched just after this list.
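
    One hedged way to do that on Unraid -- recent releases read modprobe configs off the flash drive at boot, but treat the exact path as an assumption for your version:

    echo "blacklist i915" >> /boot/config/modprobe.d/i915.conf   # takes effect on next reboot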

    Spoiler
    
    
    May  4 15:36:34 Watchtower root: ---------Enabling Intel Kernel Module----------
    May  4 15:36:34 Watchtower kernel: Linux agpgart interface v0.103
    May  4 15:36:34 Watchtower kernel: ------------[ cut here ]------------
    May  4 15:36:34 Watchtower kernel: i915 0000:00:02.0: drm_WARN_ON(!IS_PLATFORM(dev_priv, INTEL_TIGERLAKE) && !IS_PLATFORM(dev_priv, INTEL_ROCKETLAKE))
    May  4 15:36:34 Watchtower kernel: WARNING: CPU: 11 PID: 5292 at drivers/gpu/drm/i915/intel_pch.c:123 intel_pch_type+0x86e/0x8d3 [i915]
    May  4 15:36:34 Watchtower kernel: Modules linked in: i915(+) iosf_mbi i2c_algo_bit drm_kms_helper drm intel_gtt agpgart syscopyarea sysfillrect sysimgblt fb_sys_fops nct6775 hwmon_vid iptable_nat xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libblake2s blake2s_x86_64 libblake2s_generic libchacha ip6table_filter ip6_tables iptable_filter ip_tables x_tables intel_wmi_thunderbolt wmi_bmof x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel crypto_simd cryptd btusb btrtl btbcm glue_helper btintel rapl i2c_i801 intel_cstate bluetooth nvme ahci i2c_smbus intel_uncore nvme_core i2c_core libahci e1000e input_leds ecdh_generic led_class ecc video wmi backlight button acpi_pad
    May  4 15:36:34 Watchtower kernel: CPU: 11 PID: 5292 Comm: modprobe Not tainted 5.10.28-Unraid #1
    May  4 15:36:34 Watchtower kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z590 Phantom Gaming 4/ac, BIOS P1.30 02/01/2021
    May  4 15:36:34 Watchtower kernel: RIP: 0010:intel_pch_type+0x86e/0x8d3 [i915]
    May  4 15:36:34 Watchtower kernel: Code: 4c 8b 67 50 4d 85 e4 75 03 4c 8b 27 e8 ca 40 ff e0 48 c7 c1 ea 1b 54 a0 4c 89 e2 48 c7 c7 4e 15 54 a0 48 89 c6 e8 2c 9e 25 e1 <0f> 0b b8 09 00 00 00 eb 58 48 8b 7b 18 48 c7 c2 4b 1c 54 a0 be 04
    May  4 15:36:34 Watchtower kernel: RSP: 0018:ffffc9000114f9f0 EFLAGS: 00010286
    May  4 15:36:34 Watchtower kernel: RAX: 0000000000000000 RBX: ffff888145780000 RCX: 0000000000000027
    May  4 15:36:34 Watchtower kernel: RDX: 00000000ffffefff RSI: 0000000000000001 RDI: ffff8886648d8920
    May  4 15:36:34 Watchtower kernel: RBP: ffff888145784385 R08: 0000000000000000 R09: 00000000ffffefff
    May  4 15:36:34 Watchtower kernel: R10: ffffc9000114f820 R11: ffffc9000114f818 R12: ffff888101ac4760
    May  4 15:36:34 Watchtower kernel: R13: 0000000045784380 R14: ffff888145786eb8 R15: ffff888145786d70
    May  4 15:36:34 Watchtower kernel: FS:  00001537c1288740(0000) GS:ffff8886648c0000(0000) knlGS:0000000000000000
    May  4 15:36:34 Watchtower kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    May  4 15:36:34 Watchtower kernel: CR2: 000000000105e918 CR3: 0000000138340005 CR4: 00000000007706e0
    May  4 15:36:34 Watchtower kernel: PKRU: 55555554
    May  4 15:36:34 Watchtower kernel: Call Trace:
    May  4 15:36:34 Watchtower kernel: intel_detect_pch+0x6b/0x22f [i915]
    May  4 15:36:34 Watchtower kernel: i915_driver_probe+0x270/0xb32 [i915]
    May  4 15:36:34 Watchtower kernel: ? rpm_resume+0x9a/0x3d6
    May  4 15:36:34 Watchtower kernel: i915_pci_probe+0xf8/0x118 [i915]
    May  4 15:36:34 Watchtower kernel: ? _raw_spin_unlock_irqrestore+0xd/0xe
    May  4 15:36:34 Watchtower kernel: ? __pm_runtime_resume+0x64/0x71
    May  4 15:36:34 Watchtower kernel: local_pci_probe+0x3c/0x7a
    May  4 15:36:34 Watchtower kernel: pci_device_probe+0x140/0x19a
    May  4 15:36:34 Watchtower kernel: ? sysfs_do_create_link_sd.isra.0+0x6b/0x98
    May  4 15:36:34 Watchtower kernel: really_probe+0x157/0x341
    May  4 15:36:34 Watchtower kernel: driver_probe_device+0x63/0x92
    May  4 15:36:34 Watchtower kernel: device_driver_attach+0x37/0x50
    May  4 15:36:34 Watchtower kernel: __driver_attach+0x95/0x9d
    May  4 15:36:34 Watchtower kernel: ? device_driver_attach+0x50/0x50
    May  4 15:36:34 Watchtower kernel: bus_for_each_dev+0x70/0xa6
    May  4 15:36:34 Watchtower kernel: bus_add_driver+0xfe/0x1af
    May  4 15:36:34 Watchtower kernel: driver_register+0x99/0xd2
    May  4 15:36:34 Watchtower kernel: ? 0xffffffffa05e4000
    May  4 15:36:34 Watchtower kernel: i915_init+0x58/0x6b [i915]
    May  4 15:36:34 Watchtower kernel: do_one_initcall+0x71/0x162
    May  4 15:36:34 Watchtower kernel: ? do_init_module+0x19/0x1eb
    May  4 15:36:34 Watchtower kernel: ? kmem_cache_alloc+0x108/0x130
    May  4 15:36:34 Watchtower kernel: do_init_module+0x51/0x1eb
    May  4 15:36:34 Watchtower kernel: load_module+0x1b18/0x20cf
    May  4 15:36:34 Watchtower kernel: ? map_kernel_range_noflush+0xdf/0x255
    May  4 15:36:34 Watchtower kernel: ? __do_sys_init_module+0xc4/0x105
    May  4 15:36:34 Watchtower kernel: ? _cond_resched+0x1b/0x1e
    May  4 15:36:34 Watchtower kernel: __do_sys_init_module+0xc4/0x105
    May  4 15:36:34 Watchtower kernel: do_syscall_64+0x5d/0x6a
    May  4 15:36:34 Watchtower kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
    May  4 15:36:34 Watchtower kernel: RIP: 0033:0x1537c13cb09a
    May  4 15:36:34 Watchtower kernel: Code: 48 8b 0d f9 7d 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 af 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d c6 7d 0c 00 f7 d8 64 89 01 48
    May  4 15:36:34 Watchtower kernel: RSP: 002b:00007fffcf1b7de8 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
    May  4 15:36:34 Watchtower kernel: RAX: ffffffffffffffda RBX: 0000000000429e40 RCX: 00001537c13cb09a
    May  4 15:36:34 Watchtower kernel: RDX: 000000000041c368 RSI: 00000000002fa9b8 RDI: 0000000000d63f60
    May  4 15:36:34 Watchtower kernel: RBP: 0000000000d63f60 R08: 000000000042701a R09: 0000000000000000
    May  4 15:36:34 Watchtower kernel: R10: 0000000000427010 R11: 0000000000000246 R12: 000000000041c368
    May  4 15:36:34 Watchtower kernel: R13: 0000000000000000 R14: 0000000000429f50 R15: 0000000000429e40
    May  4 15:36:34 Watchtower kernel: ---[ end trace 59cc9b3ebf49a875 ]---
    May  4 15:36:34 Watchtower kernel: i915 0000:00:02.0: [drm] VT-d active for gfx access
    May  4 15:36:34 Watchtower kernel: checking generic (80000000 7e9000) vs hw (a2000000 1000000)
    May  4 15:36:34 Watchtower kernel: checking generic (80000000 7e9000) vs hw (80000000 10000000)
    May  4 15:36:34 Watchtower kernel: fb0: switching to inteldrmfb from EFI VGA
    May  4 15:36:34 Watchtower kernel: Console: switching to colour dummy device 80x25
    May  4 15:36:34 Watchtower kernel: i915 0000:00:02.0: vgaarb: deactivate vga console
    May  4 15:36:34 Watchtower kernel: i915 0000:00:02.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=io+mem
    May  4 15:36:34 Watchtower kernel: i915 0000:00:02.0: [drm] *ERROR* crtc 51: Can't calculate constants, dotclock = 0!
    May  4 15:36:34 Watchtower kernel: ------------[ cut here ]------------
    May  4 15:36:34 Watchtower kernel: i915 0000:00:02.0: drm_WARN_ON_ONCE(drm_drv_uses_atomic_modeset(dev))
    May  4 15:36:34 Watchtower kernel: WARNING: CPU: 11 PID: 5292 at drivers/gpu/drm/drm_vblank.c:722 drm_crtc_vblank_helper_get_vblank_timestamp_internal+0x125/0x2a1 [drm]
    May  4 15:36:34 Watchtower kernel: Modules linked in: i915(+) iosf_mbi i2c_algo_bit drm_kms_helper drm intel_gtt agpgart syscopyarea sysfillrect sysimgblt fb_sys_fops nct6775 hwmon_vid iptable_nat xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libblake2s blake2s_x86_64 libblake2s_generic libchacha ip6table_filter ip6_tables iptable_filter ip_tables x_tables intel_wmi_thunderbolt wmi_bmof x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel crypto_simd cryptd btusb btrtl btbcm glue_helper btintel rapl i2c_i801 intel_cstate bluetooth nvme ahci i2c_smbus intel_uncore nvme_core i2c_core libahci e1000e input_leds ecdh_generic led_class ecc video wmi backlight button acpi_pad
    May  4 15:36:34 Watchtower kernel: CPU: 11 PID: 5292 Comm: modprobe Tainted: G        W         5.10.28-Unraid #1
    May  4 15:36:34 Watchtower kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z590 Phantom Gaming 4/ac, BIOS P1.30 02/01/2021
    May  4 15:36:34 Watchtower kernel: RIP: 0010:drm_crtc_vblank_helper_get_vblank_timestamp_internal+0x125/0x2a1 [drm]
    May  4 15:36:34 Watchtower kernel: Code: 4c 8b 6f 50 4d 85 ed 75 03 4c 8b 2f e8 74 5c 06 e1 48 c7 c1 b0 bb 3f a0 4c 89 ea 48 c7 c7 d7 b2 3f a0 48 89 c6 e8 d6 b9 2c e1 <0f> 0b e9 4d 01 00 00 41 57 48 8d 55 b0 48 8b 45 90 4c 89 f7 89 75
    May  4 15:36:34 Watchtower kernel: RSP: 0018:ffffc9000114f7d0 EFLAGS: 00010086
    May  4 15:36:34 Watchtower kernel: RAX: 0000000000000000 RBX: ffff888145780000 RCX: 0000000000000027
    May  4 15:36:34 Watchtower kernel: RDX: 00000000ffffefff RSI: 0000000000000001 RDI: ffff8886648d8920
    May  4 15:36:34 Watchtower kernel: RBP: ffffc9000114f840 R08: 0000000000000000 R09: 00000000ffffefff
    May  4 15:36:34 Watchtower kernel: R10: ffffc9000114f600 R11: ffffc9000114f5f8 R12: 0000000000000000
    May  4 15:36:34 Watchtower kernel: R13: ffff888101ac4760 R14: ffff8881012d2000 R15: ffff88810560d4a8
    May  4 15:36:34 Watchtower kernel: FS:  00001537c1288740(0000) GS:ffff8886648c0000(0000) knlGS:0000000000000000
    May  4 15:36:34 Watchtower kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    May  4 15:36:34 Watchtower kernel: CR2: 000000000105e918 CR3: 0000000138340005 CR4: 00000000007706e0
    May  4 15:36:34 Watchtower kernel: PKRU: 55555554
    May  4 15:36:34 Watchtower kernel: Call Trace:
    May  4 15:36:34 Watchtower kernel: ? __intel_get_crtc_scanline+0x19f/0x19f [i915]
    May  4 15:36:34 Watchtower kernel: ? drm_property_destroy+0xb3/0xb3 [drm]
    May  4 15:36:34 Watchtower kernel: drm_get_last_vbltimestamp+0x8b/0xab [drm]
    May  4 15:36:34 Watchtower kernel: drm_reset_vblank_timestamp+0x58/0xc0 [drm]
    May  4 15:36:34 Watchtower kernel: ? drm_vblank_get+0xc2/0xcf [drm]
    May  4 15:36:34 Watchtower kernel: drm_crtc_vblank_on+0xaf/0x10c [drm]
    May  4 15:36:34 Watchtower kernel: intel_modeset_setup_hw_state+0x845/0x11ae [i915]
    May  4 15:36:34 Watchtower kernel: ? _cond_resched+0x1b/0x1e
    ### [PREVIOUS LINE REPEATED 1 TIMES] ###
    May  4 15:36:34 Watchtower kernel: ? ww_mutex_lock+0x10/0x73
    May  4 15:36:34 Watchtower kernel: ? drm_modeset_lock_all_ctx+0x81/0xc7 [drm]
    May  4 15:36:34 Watchtower kernel: intel_modeset_init_nogem+0x1113/0x1707 [i915]
    May  4 15:36:34 Watchtower kernel: ? _raw_spin_unlock_irqrestore+0xd/0xe
    May  4 15:36:34 Watchtower kernel: ? intel_irq_postinstall+0x40a/0x506 [i915]
    May  4 15:36:34 Watchtower kernel: i915_driver_probe+0x885/0xb32 [i915]
    May  4 15:36:34 Watchtower kernel: ? rpm_resume+0x9a/0x3d6
    May  4 15:36:34 Watchtower kernel: i915_pci_probe+0xf8/0x118 [i915]
    May  4 15:36:34 Watchtower kernel: ? _raw_spin_unlock_irqrestore+0xd/0xe
    May  4 15:36:34 Watchtower kernel: ? __pm_runtime_resume+0x64/0x71
    May  4 15:36:34 Watchtower kernel: local_pci_probe+0x3c/0x7a
    May  4 15:36:34 Watchtower kernel: pci_device_probe+0x140/0x19a
    May  4 15:36:34 Watchtower kernel: ? sysfs_do_create_link_sd.isra.0+0x6b/0x98
    May  4 15:36:34 Watchtower kernel: really_probe+0x157/0x341
    May  4 15:36:34 Watchtower kernel: driver_probe_device+0x63/0x92
    May  4 15:36:34 Watchtower kernel: device_driver_attach+0x37/0x50
    May  4 15:36:34 Watchtower kernel: __driver_attach+0x95/0x9d
    May  4 15:36:34 Watchtower kernel: ? device_driver_attach+0x50/0x50
    May  4 15:36:34 Watchtower kernel: bus_for_each_dev+0x70/0xa6
    May  4 15:36:34 Watchtower kernel: bus_add_driver+0xfe/0x1af
    May  4 15:36:34 Watchtower kernel: driver_register+0x99/0xd2
    May  4 15:36:34 Watchtower kernel: ? 0xffffffffa05e4000
    May  4 15:36:34 Watchtower kernel: i915_init+0x58/0x6b [i915]
    May  4 15:36:34 Watchtower kernel: do_one_initcall+0x71/0x162
    May  4 15:36:34 Watchtower kernel: ? do_init_module+0x19/0x1eb
    May  4 15:36:34 Watchtower kernel: ? kmem_cache_alloc+0x108/0x130
    May  4 15:36:34 Watchtower kernel: do_init_module+0x51/0x1eb
    May  4 15:36:34 Watchtower kernel: load_module+0x1b18/0x20cf
    May  4 15:36:34 Watchtower kernel: ? map_kernel_range_noflush+0xdf/0x255
    May  4 15:36:34 Watchtower kernel: ? __do_sys_init_module+0xc4/0x105
    May  4 15:36:34 Watchtower kernel: ? _cond_resched+0x1b/0x1e
    May  4 15:36:34 Watchtower kernel: __do_sys_init_module+0xc4/0x105
    May  4 15:36:34 Watchtower kernel: do_syscall_64+0x5d/0x6a
    May  4 15:36:34 Watchtower kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
    May  4 15:36:34 Watchtower kernel: RIP: 0033:0x1537c13cb09a
    May  4 15:36:34 Watchtower kernel: Code: 48 8b 0d f9 7d 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 af 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d c6 7d 0c 00 f7 d8 64 89 01 48
    May  4 15:36:34 Watchtower kernel: RSP: 002b:00007fffcf1b7de8 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
    May  4 15:36:34 Watchtower kernel: RAX: ffffffffffffffda RBX: 0000000000429e40 RCX: 00001537c13cb09a
    May  4 15:36:34 Watchtower kernel: RDX: 000000000041c368 RSI: 00000000002fa9b8 RDI: 0000000000d63f60
    May  4 15:36:34 Watchtower kernel: RBP: 0000000000d63f60 R08: 000000000042701a R09: 0000000000000000
    May  4 15:36:34 Watchtower kernel: R10: 0000000000427010 R11: 0000000000000246 R12: 000000000041c368
    May  4 15:36:34 Watchtower kernel: R13: 0000000000000000 R14: 0000000000429f50 R15: 0000000000429e40
    May  4 15:36:34 Watchtower kernel: ---[ end trace 59cc9b3ebf49a876 ]---
    May  4 15:36:34 Watchtower kernel: ------------[ cut here ]------------
    May  4 15:36:34 Watchtower kernel: i915 0000:00:02.0: drm_WARN_ON(!pll->info->funcs->get_hw_state(dev_priv, pll, &pipe_config->dpll_hw_state))
    May  4 15:36:34 Watchtower kernel: WARNING: CPU: 11 PID: 5292 at drivers/gpu/drm/i915/display/intel_display.c:11160 hsw_get_pipe_config+0x88f/0xd34 [i915]
    May  4 15:36:34 Watchtower kernel: Modules linked in: i915(+) iosf_mbi i2c_algo_bit drm_kms_helper drm intel_gtt agpgart syscopyarea sysfillrect sysimgblt fb_sys_fops nct6775 hwmon_vid iptable_nat xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libblake2s blake2s_x86_64 libblake2s_generic libchacha ip6table_filter ip6_tables iptable_filter ip_tables x_tables intel_wmi_thunderbolt wmi_bmof x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel crypto_simd cryptd btusb btrtl btbcm glue_helper btintel rapl i2c_i801 intel_cstate bluetooth nvme ahci i2c_smbus intel_uncore nvme_core i2c_core libahci e1000e input_leds ecdh_generic led_class ecc video wmi backlight button acpi_pad
    May  4 15:36:34 Watchtower kernel: CPU: 11 PID: 5292 Comm: modprobe Tainted: G        W         5.10.28-Unraid #1
    May  4 15:36:34 Watchtower kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z590 Phantom Gaming 4/ac, BIOS P1.30 02/01/2021
    May  4 15:36:34 Watchtower kernel: RIP: 0010:hsw_get_pipe_config+0x88f/0xd34 [i915]
    May  4 15:36:34 Watchtower kernel: Code: 85 d2 75 03 48 8b 17 48 89 14 24 e8 ba 1b f8 e0 48 8b 14 24 48 c7 c1 92 37 55 a0 48 c7 c7 e7 f8 54 a0 48 89 c6 e8 1b 79 1e e1 <0f> 0b 41 80 be ca 06 00 00 08 77 57 41 83 ff 04 75 51 49 81 c6 08
    May  4 15:36:34 Watchtower kernel: RSP: 0018:ffffc9000114f740 EFLAGS: 00010286
    May  4 15:36:34 Watchtower kernel: RAX: 0000000000000000 RBX: ffff888145799000 RCX: 0000000000000027
    May  4 15:36:34 Watchtower kernel: RDX: 00000000ffffefff RSI: 0000000000000001 RDI: ffff8886648d8920
    May  4 15:36:34 Watchtower kernel: RBP: ffff888145780000 R08: 0000000000000000 R09: 00000000ffffefff
    May  4 15:36:34 Watchtower kernel: R10: ffffc9000114f570 R11: ffffc9000114f568 R12: ffff8881012d2000
    May  4 15:36:34 Watchtower kernel: R13: 0000000000000202 R14: ffff888145780000 R15: 0000000000000001
    May  4 15:36:34 Watchtower kernel: FS:  00001537c1288740(0000) GS:ffff8886648c0000(0000) knlGS:0000000000000000
    May  4 15:36:34 Watchtower kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    May  4 15:36:34 Watchtower kernel: CR2: 000000000105e918 CR3: 0000000138340005 CR4: 00000000007706e0
    May  4 15:36:34 Watchtower kernel: PKRU: 55555554
    May  4 15:36:34 Watchtower kernel: Call Trace:
    May  4 15:36:34 Watchtower kernel: ? __intel_display_power_put_domain+0x15/0x126 [i915]
    May  4 15:36:34 Watchtower kernel: ? ktime_get_mono_fast_ns+0x61/0x7d
    May  4 15:36:34 Watchtower kernel: ? skl_ddi_pll_get_hw_state+0xa2/0xb0 [i915]
    May  4 15:36:34 Watchtower kernel: ? verify_single_dpll_state.isra.0+0x79/0x27d [i915]
    May  4 15:36:34 Watchtower kernel: intel_atomic_commit_tail+0xc1d/0x101e [i915]
    May  4 15:36:34 Watchtower kernel: ? flush_workqueue+0x29b/0x2bf
    May  4 15:36:34 Watchtower kernel: intel_atomic_commit+0x272/0x280 [i915]
    May  4 15:36:34 Watchtower kernel: intel_modeset_init+0x106/0x1b8 [i915]
    May  4 15:36:34 Watchtower kernel: i915_driver_probe+0x8b1/0xb32 [i915]
    May  4 15:36:34 Watchtower kernel: ? rpm_resume+0x9a/0x3d6
    May  4 15:36:34 Watchtower kernel: i915_pci_probe+0xf8/0x118 [i915]
    May  4 15:36:34 Watchtower kernel: ? _raw_spin_unlock_irqrestore+0xd/0xe
    May  4 15:36:34 Watchtower kernel: ? __pm_runtime_resume+0x64/0x71
    May  4 15:36:34 Watchtower kernel: local_pci_probe+0x3c/0x7a
    May  4 15:36:34 Watchtower kernel: pci_device_probe+0x140/0x19a
    May  4 15:36:34 Watchtower kernel: ? sysfs_do_create_link_sd.isra.0+0x6b/0x98
    May  4 15:36:34 Watchtower kernel: really_probe+0x157/0x341
    May  4 15:36:34 Watchtower kernel: driver_probe_device+0x63/0x92
    May  4 15:36:34 Watchtower kernel: device_driver_attach+0x37/0x50
    May  4 15:36:34 Watchtower kernel: __driver_attach+0x95/0x9d
    May  4 15:36:34 Watchtower kernel: ? device_driver_attach+0x50/0x50
    May  4 15:36:34 Watchtower kernel: bus_for_each_dev+0x70/0xa6
    May  4 15:36:34 Watchtower kernel: bus_add_driver+0xfe/0x1af
    May  4 15:36:34 Watchtower kernel: driver_register+0x99/0xd2
    May  4 15:36:34 Watchtower kernel: ? 0xffffffffa05e4000
    May  4 15:36:34 Watchtower kernel: i915_init+0x58/0x6b [i915]
    May  4 15:36:34 Watchtower kernel: do_one_initcall+0x71/0x162
    May  4 15:36:34 Watchtower kernel: ? do_init_module+0x19/0x1eb
    May  4 15:36:34 Watchtower kernel: ? kmem_cache_alloc+0x108/0x130
    May  4 15:36:34 Watchtower kernel: do_init_module+0x51/0x1eb
    May  4 15:36:34 Watchtower kernel: load_module+0x1b18/0x20cf
    May  4 15:36:34 Watchtower kernel: ? map_kernel_range_noflush+0xdf/0x255
    May  4 15:36:34 Watchtower kernel: ? __do_sys_init_module+0xc4/0x105
    May  4 15:36:34 Watchtower kernel: ? _cond_resched+0x1b/0x1e
    May  4 15:36:34 Watchtower kernel: __do_sys_init_module+0xc4/0x105
    May  4 15:36:34 Watchtower kernel: do_syscall_64+0x5d/0x6a
    May  4 15:36:34 Watchtower kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
    May  4 15:36:34 Watchtower kernel: RIP: 0033:0x1537c13cb09a
    May  4 15:36:34 Watchtower kernel: Code: 48 8b 0d f9 7d 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 af 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d c6 7d 0c 00 f7 d8 64 89 01 48
    May  4 15:36:34 Watchtower kernel: RSP: 002b:00007fffcf1b7de8 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
    May  4 15:36:34 Watchtower kernel: RAX: ffffffffffffffda RBX: 0000000000429e40 RCX: 00001537c13cb09a
    May  4 15:36:34 Watchtower kernel: RDX: 000000000041c368 RSI: 00000000002fa9b8 RDI: 0000000000d63f60
    May  4 15:36:34 Watchtower kernel: RBP: 0000000000d63f60 R08: 000000000042701a R09: 0000000000000000
    May  4 15:36:34 Watchtower kernel: R10: 0000000000427010 R11: 0000000000000246 R12: 000000000041c368
    May  4 15:36:34 Watchtower kernel: R13: 0000000000000000 R14: 0000000000429f50 R15: 0000000000429e40
    May  4 15:36:34 Watchtower kernel: ---[ end trace 59cc9b3ebf49a877 ]---
    May  4 15:36:34 Watchtower kernel: ------------[ cut here ]------------
    May  4 15:36:34 Watchtower kernel: crtc active state doesn't match with hw state (expected 0, found 1)
    May  4 15:36:34 Watchtower kernel: WARNING: CPU: 4 PID: 5292 at drivers/gpu/drm/i915/display/intel_display.c:14330 intel_atomic_commit_tail+0xc73/0x101e [i915]
    May  4 15:36:34 Watchtower kernel: Modules linked in: i915(+) iosf_mbi i2c_algo_bit drm_kms_helper drm intel_gtt agpgart syscopyarea sysfillrect sysimgblt fb_sys_fops nct6775 hwmon_vid iptable_nat xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libblake2s blake2s_x86_64 libblake2s_generic libchacha ip6table_filter ip6_tables iptable_filter ip_tables x_tables intel_wmi_thunderbolt wmi_bmof x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel crypto_simd cryptd btusb btrtl btbcm glue_helper btintel rapl i2c_i801 intel_cstate bluetooth nvme ahci i2c_smbus intel_uncore nvme_core i2c_core libahci e1000e input_leds ecdh_generic led_class ecc video wmi backlight button acpi_pad
    May  4 15:36:34 Watchtower kernel: CPU: 4 PID: 5292 Comm: modprobe Tainted: G        W         5.10.28-Unraid #1
    May  4 15:36:34 Watchtower kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z590 Phantom Gaming 4/ac, BIOS P1.30 02/01/2021
    May  4 15:36:34 Watchtower kernel: RIP: 0010:intel_atomic_commit_tail+0xc73/0x101e [i915]
    May  4 15:36:34 Watchtower kernel: Code: b5 48 01 00 00 0f b6 90 48 01 00 00 40 38 d6 74 1b 80 3d 02 9e 0e 00 00 48 c7 c7 53 4d 55 a0 0f 84 97 03 00 00 e8 64 0d 1e e1 <0f> 0b 41 0f b6 97 ec 03 00 00 0f b6 b5 48 01 00 00 40 38 f2 74 1b
    May  4 15:36:34 Watchtower kernel: RSP: 0018:ffffc9000114f8c8 EFLAGS: 00010282
    May  4 15:36:34 Watchtower kernel: RAX: 0000000000000000 RBX: ffff8881012d5000 RCX: 0000000000000027
    May  4 15:36:34 Watchtower kernel: RDX: 00000000ffffefff RSI: 0000000000000001 RDI: ffff888664718920
    May  4 15:36:34 Watchtower kernel: RBP: ffff88814579d000 R08: 0000000000000000 R09: 00000000ffffefff
    May  4 15:36:34 Watchtower kernel: R10: ffffc9000114f6f8 R11: ffffc9000114f6f0 R12: ffff8881012d5000
    May  4 15:36:34 Watchtower kernel: R13: ffff888145780000 R14: ffff888145780000 R15: ffff8881012d2000
    May  4 15:36:34 Watchtower kernel: FS:  00001537c1288740(0000) GS:ffff888664700000(0000) knlGS:0000000000000000
    May  4 15:36:34 Watchtower kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    May  4 15:36:34 Watchtower kernel: CR2: 0000000000673158 CR3: 0000000138340002 CR4: 00000000007706e0
    May  4 15:36:34 Watchtower kernel: PKRU: 55555554
    May  4 15:36:34 Watchtower kernel: Call Trace:
    May  4 15:36:34 Watchtower kernel: ? flush_workqueue+0x29b/0x2bf
    May  4 15:36:34 Watchtower kernel: intel_atomic_commit+0x272/0x280 [i915]
    May  4 15:36:34 Watchtower kernel: intel_modeset_init+0x106/0x1b8 [i915]
    May  4 15:36:34 Watchtower kernel: i915_driver_probe+0x8b1/0xb32 [i915]
    May  4 15:36:34 Watchtower kernel: ? rpm_resume+0x9a/0x3d6
    May  4 15:36:34 Watchtower kernel: i915_pci_probe+0xf8/0x118 [i915]
    May  4 15:36:34 Watchtower kernel: ? _raw_spin_unlock_irqrestore+0xd/0xe
    May  4 15:36:34 Watchtower kernel: ? __pm_runtime_resume+0x64/0x71
    May  4 15:36:34 Watchtower kernel: local_pci_probe+0x3c/0x7a
    May  4 15:36:34 Watchtower kernel: pci_device_probe+0x140/0x19a
    May  4 15:36:34 Watchtower kernel: ? sysfs_do_create_link_sd.isra.0+0x6b/0x98
    May  4 15:36:34 Watchtower kernel: really_probe+0x157/0x341
    May  4 15:36:34 Watchtower kernel: driver_probe_device+0x63/0x92
    May  4 15:36:34 Watchtower kernel: device_driver_attach+0x37/0x50
    May  4 15:36:34 Watchtower kernel: __driver_attach+0x95/0x9d
    May  4 15:36:34 Watchtower kernel: ? device_driver_attach+0x50/0x50
    May  4 15:36:34 Watchtower kernel: bus_for_each_dev+0x70/0xa6
    May  4 15:36:34 Watchtower kernel: bus_add_driver+0xfe/0x1af
    May  4 15:36:34 Watchtower kernel: driver_register+0x99/0xd2
    May  4 15:36:34 Watchtower kernel: ? 0xffffffffa05e4000
    May  4 15:36:34 Watchtower kernel: i915_init+0x58/0x6b [i915]
    May  4 15:36:34 Watchtower kernel: do_one_initcall+0x71/0x162
    May  4 15:36:34 Watchtower kernel: ? do_init_module+0x19/0x1eb
    May  4 15:36:34 Watchtower kernel: ? kmem_cache_alloc+0x108/0x130
    May  4 15:36:34 Watchtower kernel: do_init_module+0x51/0x1eb
    May  4 15:36:34 Watchtower kernel: load_module+0x1b18/0x20cf
    May  4 15:36:34 Watchtower kernel: ? map_kernel_range_noflush+0xdf/0x255
    May  4 15:36:34 Watchtower kernel: ? __do_sys_init_module+0xc4/0x105
    May  4 15:36:34 Watchtower kernel: ? _cond_resched+0x1b/0x1e
    May  4 15:36:34 Watchtower kernel: __do_sys_init_module+0xc4/0x105
    May  4 15:36:34 Watchtower kernel: do_syscall_64+0x5d/0x6a
    May  4 15:36:34 Watchtower kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
    May  4 15:36:34 Watchtower kernel: RIP: 0033:0x1537c13cb09a
    May  4 15:36:34 Watchtower kernel: Code: 48 8b 0d f9 7d 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 af 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d c6 7d 0c 00 f7 d8 64 89 01 48
    May  4 15:36:34 Watchtower kernel: RSP: 002b:00007fffcf1b7de8 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
    May  4 15:36:34 Watchtower kernel: RAX: ffffffffffffffda RBX: 0000000000429e40 RCX: 00001537c13cb09a
    May  4 15:36:34 Watchtower kernel: RDX: 000000000041c368 RSI: 00000000002fa9b8 RDI: 0000000000d63f60
    May  4 15:36:34 Watchtower kernel: RBP: 0000000000d63f60 R08: 000000000042701a R09: 0000000000000000
    May  4 15:36:34 Watchtower kernel: R10: 0000000000427010 R11: 0000000000000246 R12: 000000000041c368
    May  4 15:36:34 Watchtower kernel: R13: 0000000000000000 R14: 0000000000429f50 R15: 0000000000429e40
    May  4 15:36:34 Watchtower kernel: ---[ end trace 59cc9b3ebf49a878 ]---
    May  4 15:36:34 Watchtower kernel: i915 0000:00:02.0: [drm] Finished loading DMC firmware i915/kbl_dmc_ver1_04.bin (v1.4)
    May  4 15:36:34 Watchtower kernel: [drm] Initialized i915 1.6.0 20200917 for 0000:00:02.0 on minor 0
    May  4 15:36:34 Watchtower kernel: ACPI: Video Device [GFX0] (multi-head: yes  rom: no  post: no)
    May  4 15:36:34 Watchtower kernel: input: Video Bus as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/LNXVIDEO:00/input/input10
    May  4 15:36:34 Watchtower kernel: i915 0000:00:02.0: [drm] Cannot find any crtc or sizes
    May  4 15:36:34 Watchtower root: 
    May  4 15:36:34 Watchtower root: ---Intel Kernel Module successfully enabled!---
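
    For what it's worth: the WARN traces above ("crtc active state doesn't match with hw state" and its sibling) fire during the display probe, but the tail of the log shows the driver still finished loading ("[drm] Initialized i915"), and I believe the "Cannot find any crtc or sizes" line is normal on a headless box with no monitor attached. If you want to double-check from a console that the module actually came up, standard commands like these will do it (nothing Unraid-specific here):

        # is the module loaded?
        lsmod | grep i915
        # did the kernel expose a render node for transcoding?
        ls -l /dev/dri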

     

     

  23. Alrighty - first and foremost: Your data probably isn't gone.

     

    So long as you're careful, it probably won't be.

     

    Now, scanning your logs I don't see a direct cause - walk me through how this happened in more detail. When you say you made a backup and restored the files, what exactly did you do? How did you handle the License change? Is your Array still starting, and does it still contain all of the same disks?

     

    Unrelated: Unraid can't see SMART data for your large drive, because it's behind what's technically a RAID controller. People will be quick to point out that this isn't ideal, for a few reasons: there are performance implications, and Unraid can't read SMART data from the drive to warn you of pre-failure states. If you can, connect the disk to a motherboard port - it'll likely perform just as well, sometimes better. If you can't use a motherboard port, the specific controller you're using can be cross-flashed to IT-mode firmware. If you didn't understand that, don't try to do it - maybe just buy a dedicated HBA card; I could link you to some suggestions. If you did understand the bit about cross-flashing but need more information, let me know.
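
    As a hedged aside - depending on the exact controller, smartctl can sometimes still reach a disk behind it if you tell it the device type. These are real smartmontools options, but the device name and drive number below are placeholders you'd have to fill in for your hardware:

        # SAT-capable controllers/enclosures (the common case):
        smartctl -a -d sat /dev/sdX
        # LSI/Broadcom MegaRAID controllers, where N is the drive's ID on the controller:
        smartctl -a -d megaraid,N /dev/sdX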

     

    Also I'm assuming you're not using the RAID controller to expose a RAID array to Unraid -- if you are, this goes against the intended function of Unraid and we should discuss things further.

     

    (Just FYI, it's near the end of my night and I'm only going to be here a little longer tonight. I apologize - I'll be back tomorrow if someone hasn't helped you resolve this before then.)

  24. 2 hours ago, JKunraid said:

    My understanding is Unraid supports btrfs rather than zfs natively and that it also has the ability to add drives of different sizes to the array because it uses parity drives.

    Not quite. I broke down a lot of this information in another post, but it was more aimed at cache comprehension. The writeup will still help, but here's the TL;DR --

    - Unraid stores data in a regular single-partition-per-drive XFS filesystem, by default. It combines the drives into a single "storage pool" using a sort of layered filesystem controller. It provides Parity for a jagged array by requiring the Parity disk to be the largest and padding the smaller disks in virtual space for parity calculations - i.e. pretending a disk reads back zeros (0x00) any time it's accessed beyond its bounds, effectively (virtually) making all drives the same size as the Parity drive. It uses a separate process (the in-kernel md system) to maintain the Parity disks. (There's a toy parity calculation right after these bullets.)

     

    - Unraid does not meaningfully use the BTRFS or ZFS filesystems in any multi-disk way, but it won't stop you from doing so separate from the array itself using third-party tools installed via various plugins. Honestly though, I've used both BTRFS and ZFS for massive multi-disk filesystems, and power failures eventually caused complete data loss on both. Unraid will never fail in that manner because it only uses single-disk filesystems. At worst (the ABSOLUTE worst, and this is by far an unusual case) you lose only the data and files physically stored on the disks that fail beyond Parity's capacity to recover. I.e., if you have two parity disks and smash three data disks with a hammer, you lose the data on those three disks. If you have two parity disks and only smash two data disks with a hammer, you lose nothing but time while the array rebuilds -- it'll even emulate the missing disks (at a performance hit) until you GET new ones.
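
    Here's the toy parity calculation I mentioned above - just a sketch in plain bash, NOT Unraid's actual code, to show why zero-padding the shorter disks works. XOR one byte from each data disk to get the parity byte; to rebuild a dead disk, XOR the parity byte with the survivors:

        # one byte from each of three data "disks"; disk 3 is smaller,
        # so a read past its end returns 0x00 (the virtual padding)
        d1=0xA5; d2=0x3C; d3=0x00
        parity=$(( d1 ^ d2 ^ d3 ))
        printf 'parity byte:  0x%02X\n' "$parity"
        # disk 1 dies: recover its byte from parity plus the survivors
        printf 'recovered d1: 0x%02X\n' $(( parity ^ d2 ^ d3 ))

    Run it and you'll see the recovered byte matches the original - that's the entire trick, repeated across every byte of every stripe.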

     

     

     

    2 hours ago, JKunraid said:

    1.  Can you create triple fault tolerance with Unraid?

     

    - NO. (To my knowledge.) Unraid supports up to two parity disks, which protect against losing any combination of up to two array disks. My current understanding is that there is no Unraid configuration which will survive three simultaneous failed devices.

     

    That said;

    2 hours ago, JKunraid said:

    2. And if so, up to how many drives in a single server can Unraid support

    Unraid supports 30 managed storage devices. That's up to 2 parity disks, with the rest being data. It also supports cache pools. If you have more disks connected, you can access them individually using the Unassigned Devices plugin to create per-disk shares, or perhaps pool them manually, but clearly the "Good Stuff" stops at 30.

     

    2 hours ago, JKunraid said:

    3.  And if so how many drives would I lose to parity drives

    None, one, or two. The SOLE requirement of your Parity disk(s) is size: a disk used as Parity must be equal to or greater in size than every other disk in the array, and it should be able to keep pace with the other disks' IO. There are no per-terabyte or per-disk requirements on the quantity or size of Parity drives. Two parity drives will protect one to 28 data disks of any size.
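
    If it helps, the rule is mechanical enough to script. A throwaway bash sketch with made-up sizes in TB - it just checks that no data disk outgrows the parity disk:

        parity_tb=12
        for data_tb in 12 8 4 2; do
          if [ "$data_tb" -gt "$parity_tb" ]; then
            echo "NOT OK: ${data_tb}TB data disk is larger than ${parity_tb}TB parity"
          else
            echo "OK: ${data_tb}TB data disk fits under ${parity_tb}TB parity"
          fi
        done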

     

     

     

    21 minutes ago, itimpi said:

    you can add extra drives as pools using the BTRFS specific implementation of RAID support and each pool can support up to 30 drives. The number of drives you lose to parity information depends on the RAID variant supported, but it tends to be at least half if you want redundancy - so much more than in the main array.

    Good information - I did not cover this and was also unaware of it. (To be clear though, these pools sit "outside" the Unraid array, i.e. in addition to it, if I understand correctly.)

     

     

    Here's the post where I went on a deep dive through how Unraid works to explain why Cache matters to someone who was asking an entirely different question (lol). It should help with comprehension if you're interested:

     

     

     

    If you have any other questions, I'll be happy to help.
