Doridian

Everything posted by Doridian

  1. From just skimming your logs: since you are using VMs, are you using PCIe passthrough? If not, a good candidate to try is always turning off VT-d (AMD-Vi, IOMMU, or whatever your BIOS calls it) in the BIOS and seeing if that improves your situation at all. Some hardware just really hates being part of the IOMMU. If you are using PCIe passthrough, you can instead try editing your boot options (Main tab, click the flash device) for the "Unraid OS" boot entry: after "append initrd=/bzroot", add "iommu=pt" (see the attached image, you can edit it right in that text field; a rough sketch is also below). Click apply and reboot. iommu=pt only puts devices you actually pass through into the IOMMU, so hopefully not the problematic device (your NIC, it seems, judging by that one fault). //EDIT: I did this on mine because I know my NIC shows instability with the IOMMU enabled for every device.
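     A rough sketch of what the edited syslinux entry ends up looking like (the kernel/initrd lines may differ slightly on your install; iommu=pt is the only part being added):

         label Unraid OS
           menu default
           kernel /bzimage
           append initrd=/bzroot iommu=pt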
  2. Keep in mind one thing about unRAID and cache: files exist either only on the cache or only on the array (the mover moves them between the two, it does not copy). Also, files that are actively in use do not get touched by the mover; if your VM is running, its disk image is held open and therefore deemed "untouchable". So, if your VM runs 24/7 and you have no external backups, you pretty much have the choice of:
     - keeping your VM disk on the cache only and either accepting the risk of data loss or making the cache a RAID1 pool
     - keeping your VM on the array only and dealing with the speed ramifications this may cause
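     A quick way to see why the mover skips a running VM's disk image (the path here is just an example, adjust it to your own share layout):

         # if this prints a PID, the image is held open (e.g. by QEMU) and the mover will leave it alone
         fuser /mnt/cache/domains/MyVM/vdisk1.img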
  3. I think I solved the issue: setting iommu=pt intel_iommu=pt (to only translate/IOMMU hardware that is actually used inside VMs) made it stop spewing these messages. It seems the ConnectX-3 just doesn't like being IOMMU'd when used with docker? Weird, but okay. It works! (Leaving this here as something to try for anyone else hitting the same issue.) //EDIT: IGNORE THIS. It just made it take longer...
  4. It does seem to be a known issue that these errors show up when you run docker containers directly on a bridge without a VLAN (on 10G cards). (Well, the nf_conntrack_confirm calltrace seems to be, and I suspect that is the cause of the eventual network stall/crash.) So far I've been running stable without any calltrace/issues since removing Docker from br0 (and keeping it on the br0.X interfaces instead); a sketch of such a VLAN-bound network is below. See links such as:
     - https://forums.unraid.net/topic/101342-solved-69-rc2-kernal-panic-and-trace/
     - https://forums.unraid.net/topic/97881-unraid-becomes-unresponsive-rendomlydocker-containers-crashing/
     (There is a lot more out there, and people seem to have found various solutions; for some, using only VLAN bridges didn't fix it and they instead needed to remove custom IPs, etc.) However, I would like to be able to run docker containers on br0, which is why I am keeping this thread open: to see if anyone has suggestions on how to do that, or whether someone from the unRAID team could give me more instructions on how to debug further / try new kernels / etc.
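     For reference, this is roughly what a VLAN-bound custom network looks like if created by hand instead of through the Unraid Docker settings (the subnet, gateway, VLAN ID, and network name here are made-up examples):

         # macvlan network attached to the VLAN sub-interface br0.20 instead of the raw br0
         docker network create -d macvlan \
           --subnet=192.168.20.0/24 --gateway=192.168.20.1 \
           -o parent=br0.20 br0.20-net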
  5. My unRAID install is fairly new (on 6.9.2). Today, I suddenly could not reach my server at all over the network anymore. I managed to pull some logs (diagnostics would just hang), like dmesg and syslog. The only way to reboot it was a hard reset; the reboot command would just hang as well. dmesg excerpt which seems most relevant pre-reboot: After a reboot, I also see a weird notification in dmesg (that I never got before when I ran Proxmox VE): Is this a known issue with Mellanox ConnectX-3 cards, since their driver seems to be within the trace? Is there anything I can do to fix this? I do have a couple of docker containers on bridges (most on VLAN bridges, but one on the main non-VLAN bridge). //EDIT: In fact, researching past occurrences of this issue (the second one, with just the trace), if I don't run any containers on the "root bridge" (the one without a VLAN), then I no longer get such traces. Is this known? Is there a fix? Because I would really like to run some containers on that bridge without a VLAN attached. //EDIT2: I should also add that I am operating the Mellanox NIC in 802.3ad bonded mode with an MTU of 9000 (jumbo frames).
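     In case it helps with reproducing this, the bonding and MTU details can be checked with standard tools (bond0/br0 are just the default interface names on my setup):

         # confirm the bond is in 802.3ad (LACP) mode and all slaves are up
         cat /proc/net/bonding/bond0
         # confirm the jumbo MTU made it onto the bridge
         ip link show br0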