civic95man

Members
  • Posts

    224

Everything posted by civic95man

  1. Do these errors only come up when the VM is active? Maybe add a second cheap video card if you have room on the board. Have you tried the 6.9 beta yet? I would only proceed down that route with caution, as it isn't declared stable yet. Maybe try 6.9-beta1 so you aren't messing with pools yet. As a last resort, you could use a phone or tablet, if available, to manage the VMs.
  2. Your MCE seems to stem from this MCA: Generic CACHE Level-2 Eviction Error. It was corrected, though, so I guess there is no real issue, and it seems to be the only occurrence of it in your system. You could try accessing the IPMI logs to see if anything else was reported there. I would monitor it further and make sure that it doesn't show up again. I would also make sure that nothing is overclocked (CPU). You could try updating the BIOS to see if the updated microcode offers any improvement (you're two releases behind), but I'm not a PowerEdge user, so I don't know if that would open up another can of worms.
  3. Could you try booting the unraid flash drive on a different computer to rule out a flash drive issue?
  4. I didn't see anything listed for an order, so I can only assume it doesn't matter. Mine (Supermicro mobo) was accessed via the BMC IP address. Within that, it gave me the option to update both the BIOS and the firmware. It may also include its own installer/updater (haven't checked), but that would most likely require a Windows/DOS environment.
  5. That's from the nvidia plugin, which calls nvidia-smi to get a listing of the available cards. Since you're on stock, that program is no longer there, hence the error. It's harmless. It will go away if you uninstall the plugin.
  6. It might have been related to when you were using the nvidia build, or it could be from the plugin. If you don't plan on ever using that build again, then you can delete the plugin. I see, yes, I would never have noticed that! So I had another thought just now: you probably need to update both the BIOS **and** the BMC firmware at the same time for the onboard video to work again. And then hunt through the BIOS menus for that hidden option.
  7. The stubbing is a kernel parameter which is passed at the time the kernel is loaded. So the kernel shouldn't try to touch that card, other than binding vfio to it so that nothing except the VM will use it. No stupid questions.
  8. This should prevent any drivers from loading for the card, and therefore the kernel will ignore this card as an option. It *should* grab the next available video adapter, which would be the onboard one. If, like you say, the BIOS refuses to make the onboard video the primary adapter, then you might lose any POST messages and boot menu options. It could also be that the option for selecting the onboard video has moved to another menu in the BIOS. With that said, I looked up your mobo and didn't see that it had any onboard video?!?!?
  9. I assume you are passing that card to your VM? Is it not stubbed?
  10. or the batteries need to be calibrated
  11. Well, the next step in troubleshooting would be to boot the system in safe mode, which prevents any add-ons from loading. You could also disable VMs and docker. Then, once your system runs stable with no further page allocation failures, you slowly enable one thing at a time, run for a while to check stability, and repeat. Have you checked if you're using the latest BIOS for your board? It looks like there is a newer version available. This could very well be a BIOS issue in the way the memory is mapped.
  12. Looking at your logs, you need to decode the reason why it was tainted. The letters after the word "Tainted" in the call trace indicate why. G means that only GPL-licensed modules are loaded (a P in that position would indicate a proprietary module), W says that the kernel issued a warning at some earlier point, and O means that an externally built (out-of-tree) module was loaded. Looking further back in your logs, to when your system comes up, shows that the "igb" module is loaded and a warning is issued that it taints the kernel. That module is the driver for your network adapter and was added by limetech to the build. Seeing that such a large number of people use that same adapter, this can be safely ignored. It basically just lets the kernel developers (not the unraid developers) know, if you submit a bug report about the kernel, that you have an unapproved configuration. In this case, it does not affect your system.
  13. Your logs are filled with the multiple BAR issue: "SERVERUS kernel: resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000c0000-0x000dffff window]". While it is supposed to be harmless and just informative, it may cause an issue if the resources overlap. It seems to be caused by a buggy BIOS. Have you tried updating the BIOS? If that fails, are you able to move your video card to another slot?
  14. From what I understand, this isn't something to really "worry" about. Basically, it looks like some process tried to grab 2^4 contiguous pages of memory and failed, but was able to get the memory another way. This seems to result from the way the memory is mapped in your system and shouldn't indicate a failure or problem. The "order" in the message is that exponent: order 4 means it tried to grab 2^4 pages but failed. Apparently, if it fails when trying to grab 2^3 pages, then the kernel initiates the OOM process. Now, if the order is 0, then you have a problem and are truly out of memory. The call trace is there to "help" you figure out why the memory allocation failed. I found all of this information on this page: https://utcc.utoronto.ca/~cks/space/blog/linux/DecodingPageAllocFailures There also seem to be ways to help mitigate this, but it depends on how much you want to play with options. I guess you could also add more memory too?? Oh, and the parts of the log that say the kernel is "tainted" have nothing to do with the memory allocation errors; they are due to the proprietary modules loaded for the nvidia build. Although the memory issues don't cause the kernel to become tainted, the tainted kernel *could* cause the memory issues. It might be best to boot with stock unraid (not the nvidia build) and see if this still happens. If it still does, then try safe mode and work backwards.
  15. Yes. Unassigning the drive and then starting and stopping the array causes unraid to "forget" the drive that was previously in that slot. Otherwise it would think the "old" drive should be there and complain that the "new" drive is the wrong one.
  16. I have two cards in my system: a P2000 used exclusively for transcoding by Plex, and a retired GTX 970 that I pass through to a Windows 10 VM for occasional media use and gaming. I usually keep the VM spun down when not needed, but sometimes I might forget and just leave it on. In either case, the passed-through GTX 970 *seems* to be in a powered-down state when the VM is shut down (i.e. not significantly drawing power). Even with the VM running, the card doesn't seem to affect the power draw much: maybe 10 W or so difference as measured by my UPS. Likewise, the P2000 card, which the system actively has control over, also seems to be in a lower power state. Anyway, to answer your question: no, my card does not run at full speed when the VM is powered down, so yours should be fine.
  17. By any chance, does this only come up after a reboot (frozen/hung system or otherwise)? If you log in to the IPMI and look at the logs, does it show anything with more detail? A lot of the references I find to HT link sync errors involve people overclocking their rig and pushing the limits. The memory ECC error doesn't look good, though.
  18. By any chance is any part of it overclocked? That includes CPU, RAM, chipset, etc. Is the power supply confirmed good? Do you have another power supply to test?
  19. This has nothing to do with unraid. It sounds like a hardware failure, incompatible hardware, or outdated firmware/BIOS. Make sure those CPUs are compatible and check the same for the memory. See if there is a BIOS update which addresses this. You said you changed the motherboard? It sounds to me like the board is damaged (like a bent pin on the CPU socket).
  20. A real quick search says you could add --mac-address XX:XX:XX:XX:XX:XX to the run command. That should give you a static MAC address for that container. I'll try this tonight if I remember. EDIT: Be sure the MAC address is unique on your network (i.e. use the one that docker already assigned to that container).
  21. You can *try* to assign the container a static IP with your router using a reserved list based on MAC address (via DHCP), assuming you are using a bridge interface. I've heard that when the container gets destroyed, the MAC address changes as well, but I never investigated this further. I do this with Plex so that I have a static IP for port forwarding without the risks/drama associated with those call traces.
  22. ACS override is sort of black magic, done by trial and error. You can try each combination until you find something that splits the groups how you want, if possible. Sometimes there are groups which just can't be split by any means. Just be sure not to try and pass through the USB controller that your unraid flash is attached to!
  23. Try reading this: Granted, that OP is trying to pass through a GPU rather than USB, but it may still be applicable. In that case, they had to downgrade the MB BIOS. Otherwise, back up the flash and maybe give 6.9-beta1 a try; it seems to include a lot of Ryzen fixes with the newer kernel.
  24. Thanks for the help @S80_UK, that link was a life saver and I wouldn't have been able to update the firmware without it. Pro tip: don't use the UPS outlets while updating the firmware, as the outlets shut off during the update. The update pushed me to the latest firmware (9.4) and I was able to enable MODBUS under the UPS configuration menu. From there it was just a matter of setting up unraid to use the UPS (cable->USB, type->MODBUS). Quick note: it will take a while for the UPS and unraid to talk and update the status. That was my problem, as I was getting impatient, thinking that it wasn't working. Just leave it and it should work. Thanks again.
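
For the taint letters discussed in post 12, here is a rough Python sketch of how the kernel's taint bitmask maps to letters. On a running Linux system the bitmask can be read from /proc/sys/kernel/tainted. The table covers only three flags and is not the full list, and the function name `decode_taint` is made up for this example.

```python
# Subset of the kernel taint flags: bit position -> (letter, meaning).
# Only the flags relevant to the post are listed; the kernel defines more.
TAINT_FLAGS = {
    0: ("P", "a proprietary module was loaded"),
    9: ("W", "the kernel issued a warning"),
    12: ("O", "an out-of-tree module was loaded"),
}


def decode_taint(value: int) -> list[str]:
    """Return the letters of the known flags that are set in a taint bitmask."""
    return [
        letter
        for bit, (letter, _meaning) in sorted(TAINT_FLAGS.items())
        if value & (1 << bit)
    ]
```

A value of 4096 has only bit 12 set, which matches an out-of-tree driver such as the igb module mentioned in the post.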
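
For the page allocation "order" described in post 14, a small Python sketch of the arithmetic may help: an order-n request asks for 2^n contiguous pages. The 4 KiB page size is an assumption (it is the usual default on x86-64), and `order_to_bytes` is a hypothetical helper name.

```python
def order_to_bytes(order: int, page_size: int = 4096) -> int:
    """Size in bytes of an order-`order` allocation (2**order contiguous pages).

    page_size defaults to 4096 bytes, which is an assumption about the system.
    """
    return (1 << order) * page_size
```

Under that assumption, the order-4 failure from the post corresponds to a single 64 KiB contiguous request, while order 0 is a single page.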
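
For the --mac-address suggestion in post 20, this is a minimal Python sketch that checks the address format and builds the argument list for `docker run`. The image name used in the test and the helper name `docker_run_args` are placeholders, and the sketch does not check that the address is unique on your network.

```python
import re

# Six colon-separated pairs of hex digits, e.g. 02:42:ac:11:00:02.
MAC_RE = re.compile(r"^[0-9A-Fa-f]{2}(:[0-9A-Fa-f]{2}){5}$")


def docker_run_args(image: str, mac: str) -> list[str]:
    """Build a `docker run` argument list that pins the container's MAC address."""
    if not MAC_RE.match(mac):
        raise ValueError(f"not a valid MAC address: {mac!r}")
    return ["docker", "run", "-d", "--mac-address", mac, image]
```

The returned list can be passed to `subprocess.run` as is, or joined with spaces to compare against the run command shown when the container is created.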