HumanTechDesign

Members • 45 posts

HumanTechDesign's Achievements

Rookie (2/14) • 10 reputation

  1. Updated to 6.12.10 from last night and the problem remains. However, I would have been surprised otherwise, because 6.12.10 specifically rolled back the kernel to 6.1.79 (which is lower than the kernel from 6.12.9 but still higher than the kernels of the releases after Unraid 6.12.4). I will wait for the next major kernel jump (AFAIK this should happen with the upcoming major Unraid release). If that does not change the situation, I will file a dedicated bug report. Hopefully, Limetech can figure it out.
  2. If this is related to the mATX version of the board, we are discussing this problem in this thread:
  3. Thanks for the hint! Sorry for the confusing sequence - I should have made an edit to the previous post: that one was the culprit. ADS-B tracking has now moved back to a Raspberry Pi. It did not stay at C6 in the end, because, as mentioned, I populated the M.2 slots again. My observations now look roughly like this (how I read the package C-state residency is sketched after this list):
      - With the fr24feed-piaware docker (and other containers): only C2 and a minimum of 33W in spin-down
      - Without the fr24feed-piaware docker (and other containers) and with only the PCH M.2 slot populated: C6 and a minimum of 22W in spin-down
      - Without the fr24feed-piaware docker (and other containers) and with both M.2 slots populated: C3 and a minimum of 25W in spin-down
      The main real-world win is consistently moving "hot" data (Nextcloud etc.) from the big pool on the Toshibas and Exos to the M.2 drives. I always stayed away from spin-down in the past because 1. with earlier Synology boxes I had the experience that they woke up (or never went to sleep at all) for every tiny packet on the network, and 2. I am still not entirely sure whether frequent wake-ups are not bad for the drives after all. My Unraid box has basically always been on ZFS, so I never had the "only one disk spins up" experience - if anything spun up, it was always the whole zpool. After moving to the M.2s, the big pool now sleeps 80% of the time even during the day. I will see whether I put more work into this in the future and keep looking for further potential, but at this point I would say I am actually okay with it. Why I cannot get beyond C6 even under the best conditions (even though to this day nothing sits in the PCIe slots and all connected devices support ASPM, DIPM etc.), I cannot say exactly, but for me it is fine the way it is for now. Thank you all for your help and the good exchange! And a huge thank you to @mgutt - without you and your guidance I would not even have known where to start looking!
  4. What are the other devices in your build? ASPM support etc.? I am on the same board (long time no see): without an M.2 in the CPU slot (top right) I can reach C6; with an M.2 installed in the top-right (CPU) slot, I can reach C3. I have documented some of my findings in the German forum (maybe Google Translate can help):
  5. I have to jump in here briefly as well - does the WoL plugin work for you under 6.12.8? I had not used WoL before and wanted to start using it now. I can install the plugin without any problems, but it does not work: the Libvirt wake on lan service in the VM settings is permanently shown as stopped (even after reinstalling the plugin, toggling it on/off, etc.). The switch is also visible in the syslog:
      Mar 17 19:51:39 Tower ool www[9805]: /usr/local/emhttp/plugins/libvirtwol/scripts/wol_stop
      Mar 17 19:51:44 Tower ool www[7324]: /usr/local/emhttp/plugins/libvirtwol/scripts/wol_start
      The Libvirt Virtual BMC right below it in the settings is permanently shown as Running. It also looks to me like the plugin is no longer really maintained, but from the sound of this thread it should still work? I thought it might be down to the Unraid version.
  6. Interesting. Maybe it has something to do with the fact that on the ATX version of the board, the IPMI is handled via the external card instead of a BMC that is directly soldered to the board. Nevertheless, your setup contradicts my assumption in the first post. It would be interesting to hear about other boards with soldered BMCs (such as Supermicro etc.).
  7. But based on that comment, it seems like your iGPU is not used at all in this configuration. Do you see the iGPU in the system devices in that constellation? I believe the problem arises if only the BMC VGA device and the iGPU are available. Then (based on the above commit) the iGPU (or the BMC) gets blocked out. I would imagine that the dGPU is not even part of this chain and therefore gets treated differently. It would therefore be interesting to know whether you still have the BMC (KVM via web) AND the iGPU available if you take out your dGPU (with Unraid after 6.12.4). Sorry if I sound so unconvinced. I am just really trying to understand whether the Linux kernel, Unraid, or the BIOS/board is the issue.
  8. Meaning the dummy plug or the dGPU? It's interesting to hear this report. Do you still have everything available (on 6.12.8) when the dGPU is removed?
  9. Thank you for your report. If I remember correctly, you had to dummy-plug the dGPU to get everything working correctly together - right? Did you (or could you) try without the dGPU installed in the server? I could imagine that a dGPU changes the behavior in this instance.
  10. That would have been too easy: I spontaneously picked up two Verbatim Vi3000 drives, which now sit in the two M.2 slots. Unfortunately, what I feared happened: only C3 again and back up to ~64W with the disks spun up (spin-down not tested yet). On paper they seem to support everything you could wish for (ASPM L1, APST etc.). As far as I know, however, one of the M.2 slots is attached directly to the CPU and could therefore prevent a lower package state. I will have to see whether I can test this slot assignment with other SSDs. Question to the group: how should I interpret the following output regarding the L1 substates (a small helper for matching these lines to their devices is sketched after this list)?
      root@tower:~# lspci -vv | grep 'L1SubCap'
      L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
      L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
      L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
      L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
      L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
      L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
      root@tower:~# lspci -vv | grep 'L1SubCtl1'
      L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
      L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
      L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
      L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
      L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
      L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
      If I understand this correctly, in principle ALL PCI links and devices would support L1.1 and L1.2, but none of them actually has it enabled - is that right? They all (still) seem to support/have L1 enabled:
      00:06.0 PCI bridge: Intel Corporation 12th Gen Core Processor PCI Express x4 Controller #0 (rev 02) (prog-if 00 [Normal decode])
        LnkCap: Port #5, Speed 16GT/s, Width x4, ASPM L1, Exit Latency L1 <16us
        LnkCtl: ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+
      00:1a.0 PCI bridge: Intel Corporation Alder Lake-S PCH PCI Express Root Port #25 (rev 11) (prog-if 00 [Normal decode])
        LnkCap: Port #25, Speed 16GT/s, Width x4, ASPM L0s L1, Exit Latency L0s <1us, L1 <4us
        LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes, Disabled- CommClk-
      00:1b.0 PCI bridge: Intel Corporation Device 7ac0 (rev 11) (prog-if 00 [Normal decode])
        LnkCap: Port #17, Speed 8GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <1us, L1 <4us
        LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes, Disabled- CommClk-
      00:1b.4 PCI bridge: Intel Corporation Alder Lake-S PCH PCI Express Root Port #21 (rev 11) (prog-if 00 [Normal decode])
        LnkCap: Port #21, Speed 16GT/s, Width x4, ASPM L1, Exit Latency L1 <64us
        LnkCtl: ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+
      00:1c.0 PCI bridge: Intel Corporation Alder Lake-S PCH PCI Express Root Port #1 (rev 11) (prog-if 00 [Normal decode])
        LnkCap: Port #1, Speed 8GT/s, Width x1, ASPM L1, Exit Latency L1 <64us
        LnkCtl: ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+
      00:1c.3 PCI bridge: Intel Corporation Device 7abb (rev 11) (prog-if 00 [Normal decode])
        LnkCap: Port #4, Speed 8GT/s, Width x1, ASPM L1, Exit Latency L1 <64us
        LnkCtl: ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+
      00:1d.0 PCI bridge: Intel Corporation Alder Lake-S PCH PCI Express Root Port #9 (rev 11) (prog-if 00 [Normal decode])
        LnkCap: Port #9, Speed 8GT/s, Width x4, ASPM L0s L1, Exit Latency L0s <1us, L1 <4us
        LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes, Disabled- CommClk-
      01:00.0 Non-Volatile memory controller: MAXIO Technology (Hangzhou) Ltd. NVMe SSD Controller MAP1202 (DRAM-less) (rev 01) (prog-if 02 [NVM Express])
        LnkCap: Port #0, Speed 8GT/s, Width x4, ASPM L1, Exit Latency L1 <64us
        LnkCtl: ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+
      04:00.0 Non-Volatile memory controller: MAXIO Technology (Hangzhou) Ltd. NVMe SSD Controller MAP1202 (DRAM-less) (rev 01) (prog-if 02 [NVM Express])
        LnkCap: Port #0, Speed 8GT/s, Width x4, ASPM L1, Exit Latency L1 <64us
        LnkCtl: ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+
      05:00.0 Ethernet controller: Intel Corporation Ethernet Controller I226-LM (rev 06)
        LnkCap: Port #0, Speed 5GT/s, Width x1, ASPM L1, Exit Latency L1 <4us
        LnkCtl: ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+
      06:00.0 PCI bridge: ASPEED Technology, Inc. AST1150 PCI-to-PCI Bridge (rev 06) (prog-if 00 [Normal decode])
        LnkCap: Port #0, Speed 5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <512ns, L1 <32us
        LnkCtl: ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+
  11. I might have tracked down the issue in the Linux kernel. I am trying to gather more input on this issue in this thread: This should also be relevant to @mrhanderson.
  12. After some discussion in the ASUS W680 board-specific thread, I want to find out whether other boards have the same problem with the current Unraid version (or rather its specific Linux kernel). What is the matter? If you have an IPMI/BMC that uses the ast driver in Unraid (the problem turned up with the ASPEED AST2600, but this could apply to other BMCs as well) and want to use the iGPU from the CPU (e.g. for a VM or docker), you run into the following situation:
      - If no device (or dummy plug) is inserted, you can follow the complete boot process of Unraid via the hardware VGA or via the KVM screen in your IPMI GUI, but you lose the iGPU in Unraid (e.g. when you want to use the Intel SR-IOV plugin). For me, it even crashes the boot process.
      - If a device (or dummy plug) is inserted, you keep the iGPU in Unraid, but the VGA/KVM stops updating after the blue boot loader screen of Unraid.
      This seems to be an issue that has popped up (at least with the board from the thread) only AFTER Unraid 6.12.4 (so 6.12.5 and beyond). I have found this kernel commit which seems to describe exactly this behavior and could correlate with the kernel updates in the relevant Unraid releases. To find out whether this is a general issue or specific to this board, I would now like to find people with the following constellation:
      - Updated Unraid to a release >6.12.4
      - Use of an ASPEED BMC (or another BMC using the ast driver)
      - Use of BMC VGA/KVM (with full output until the CLI login) AND a detected iGPU in Unraid (with or without dummy plug)
      For reference: my setup is an ASUS Pro WS W680M-ACE SE with a 12600K. Multi-Monitor (and the iGPU) is activated in the BIOS - BIOS settings don't seem to matter. I am currently on Unraid 6.12.8 (Linux kernel 6.1.74) and can use SR-IOV with a dummy plug, but the KVM drops out after the boot loader. Other users with the exact same board report that they have KVM + iGPU/SR-IOV. However, they are still on 6.12.4 (Linux kernel 6.1.49). If you want to find out whether your BMC in Unraid is addressed with ast, you can run lspci -v and then look for your BMC; the last lines should tell you the required part (a shorter check is also sketched after this list). For me, the (relevant) output looks like this:
      06:00.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev 52) (prog-if 00 [VGA controller])
        Subsystem: ASPEED Technology, Inc. ASPEED Graphics Family
        Flags: medium devsel, IRQ 19, IOMMU group 18
        Memory at 84000000 (32-bit, non-prefetchable) [size=64M]
        Memory at 88000000 (32-bit, non-prefetchable) [size=256K]
        I/O ports at 4000 [size=128]
        Capabilities: [40] Power Management version 3
        Capabilities: [50] MSI: Enable- Count=1/4 Maskable- 64bit+
        Kernel driver in use: ast
        Kernel modules: ast
  13. I did some digging in the kernel commits (never done that before, and I also don't have experience with the internals of Linux), but I found this: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=8d6ef26501b97243ee6c16b8187c5b38cb69b77d If I read this correctly, our issue is actually a feature instead of a bug (if this actually IS the cause). As far as I can tell, it correlates with the kernel timeline in the Unraid releases. I have seen that there has been further development on the module/driver afterward, but it would be interesting to see whether this has been fixed or whether it will stay that way from now on. That would suck, because it takes away a lot of functionality. I can see this comment, which gives me at least some hope: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/gpu/drm/ast/ast_mode.c?id=8d6ef26501b97243ee6c16b8187c5b38cb69b77d#n1784
      * FIXME: Remove this logic once user-space compositors can handle more
      * than one connector per CRTC. The BMC should always be connected.
      If I have the time, I will also boot up a live distro with a later kernel (a quick way to inspect which DRM devices and connectors the kernel exposes is sketched after this list). However, as far as I can tell, ALL boards with a BMC that is addressed with the ast driver (probably all ASPEED BMCs?) should run into this problem, not only this board. Shouldn't this pop up across the whole server world? I have started a dedicated thread for that topic here:
  14. I was searching through some threads after my post and I believe this could be true. I am on 6.12.8, so this change in 6.12.6 could very well affect me, yes. What I don't understand: I was already on 6.12.8 when I installed the board, and I could swear that I had it working with all three. I tried to retrace the steps, and as far as I remember, I may have installed the SR-IOV (and gputop etc.) plugin only after I had it working with all three (so iGPU recognised in Unraid but not yet using SR-IOV). What now breaks the setup seems to be the modified modprobe from the SR-IOV plugin (this throws an error message in the KVM if the iGPU is not recognised). Interestingly, I don't have to blacklist the ASPEED GPU like others in this thread have to. I can still get to the point where the plugins are loaded (after the mcelog line). I hope that a kernel (or Unraid or BIOS) update fixes it in the future. Could you still tell me what your GPU settings in the BIOS are?
  15. For those of you who have the mATX version of the board (ASUS Pro WS W680M-Ace SE): does anyone have IPMI + iGPU working together? Either I have IPMI all the way into Unraid but get an error for the SR-IOV modprobe, or the IPMI screen stops updating after the initial Unraid boot selection screen (with SR-IOV working as intended). I don't have a dGPU, and on the mATX board the IPMI is integrated directly into the board, so there is no external IPMI card. I have tried basically all of the combinations (+ dummy plug) in the BIOS but can't seem to set it up correctly. The frustrating part: I had it working before but lost the configuration after I needed to do a CMOS reset. Now I can't seem to get it back. Could you report whether you have that working together with SR-IOV and, if so, what your BIOS settings are?
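
Sketches referenced in the posts above. First, for the C-state and wattage observations in post 3: a minimal way to read package C-state residency, assuming powertop and/or turbostat are available on the box (on Unraid they typically have to be added via a plugin; neither tool is mentioned in the posts themselves).

    # Record 60 seconds and write an HTML report; the "Idle Stats" section
    # shows how much time the package spends in C2/C3/C6 etc.
    powertop --time=60 --html=/tmp/powertop.html

    # Alternative: print package C-state residency and package power every 10 s.
    turbostat --quiet --show Pkg%pc2,Pkg%pc3,Pkg%pc6,PkgWatt --interval 10

Measuring once with the containers/drives in question active and once without reproduces the comparison from the post.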
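Second, for matching the L1SubCap/L1SubCtl1 lines in post 10 to their devices: a small bash helper around lspci that prints, per device, what is advertised (L1SubCap) versus what is actually enabled (L1SubCtl1). This is generic lspci/bash, not tied to the board.

    for dev in $(lspci | awk '{print $1}'); do
        out=$(lspci -s "$dev" -vv 2>/dev/null)
        if echo "$out" | grep -q 'L1SubCap'; then
            lspci -s "$dev"                              # device name
            echo "$out" | grep -E 'L1SubCap|L1SubCtl1'   # advertised vs. enabled
        fi
    done

If everything advertises L1.1/L1.2 but nothing has them enabled, one (hedged) thing to check is the kernel ASPM policy, e.g. echo powersupersave > /sys/module/pcie_aspm/parameters/policy - whether that actually takes effect depends on the BIOS/ACPI handing ASPM control to the OS.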
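Third, a shorter version of the ast check described in post 12: list only the display controllers together with the kernel driver bound to them (standard lspci/lsmod, nothing Unraid-specific).

    # VGA/display devices plus their "Kernel driver in use:" lines
    lspci -k | grep -E -A 3 'VGA|Display'

    # Or simply check whether the ast module is loaded at all
    lsmod | grep '^ast'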
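Finally, for the live-distro test mentioned in post 13: a quick look at which DRM devices and connectors the kernel exposes, i.e. whether both the iGPU (i915) and the BMC (ast) show up and what state their connectors report. This only reads standard sysfs paths and should behave the same on Unraid and on a live distro.

    # Which driver owns each DRM card (expect i915 for the iGPU, ast for the BMC)
    for card in /sys/class/drm/card[0-9]; do
        [ -e "$card/device/driver" ] || continue
        echo "$card -> $(basename "$(readlink -f "$card/device/driver")")"
    done

    # Connector status per output (connected / disconnected / unknown)
    for conn in /sys/class/drm/card*-*; do
        echo "$(basename "$conn"): $(cat "$conn/status")"
    done

If the ast card is missing here, or its connectors never leave "disconnected" while the iGPU is present, that would fit the behavior described around the commit linked in post 13.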