[Support] ich777 - AMD Vendor Reset, CoralTPU, hpsahba,...



Hello @ich777

 

First of all, thanks for providing so many great plugins and apps to the community, I really appreciate it!

 

I am running Unraid 6.10.3 on a QNAP TS-453Be, latest BIOS, and tried the QNAP-EC plugin to control the fan. It works (with fancontrol via terminal, or with the dynamix fan control). 

 

But I noticed that the plugin reports errors to the syslog every time the fan speed is changed:

 

Aug 19 08:49:44 QUBE qnap-ec[27142]: calling ec_sys_set_fan_speed function with 0 and 64 arguments
Aug 19 08:49:44 QUBE qnap-ec[27142]: unexpected call to simulated Ini_Conf_Get_Field_Int function
Aug 19 08:49:44 QUBE qnap-ec[27142]: function ec_sys_set_fan_speed returned 0
Aug 19 08:49:49 QUBE autofan: Highest disk temp is 36C, adjusting fan speed from: OFF (0% @ 0rpm) to: 64 (25% @ 638rpm)

 

At this rate the syslog will fill up within a few days. I also tried changing the fan control behavior in the BIOS, but both "automatic" and "manual" result in the same error.
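
To get a feel for how quickly these messages pile up, the lines tagged qnap-ec can be counted. This is only a rough sketch based on the log format in the excerpt above: `count_tag` is a hypothetical helper name, and the sample file stands in for the real syslog (pass the actual log path on a live system).

```shell
# Count syslog lines emitted by a given program tag (e.g. "qnap-ec").
# count_tag is a hypothetical helper, not part of the plugin.
count_tag() {
    tag=$1
    logfile=$2
    # Match "<tag>[<pid>]: " as it appears in the excerpt above.
    grep -c -E " ${tag}\[[0-9]+\]: " "$logfile"
}

# Sample data copied from the excerpt in this post.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Aug 19 08:49:44 QUBE qnap-ec[27142]: calling ec_sys_set_fan_speed function with 0 and 64 arguments
Aug 19 08:49:44 QUBE qnap-ec[27142]: unexpected call to simulated Ini_Conf_Get_Field_Int function
Aug 19 08:49:44 QUBE qnap-ec[27142]: function ec_sys_set_fan_speed returned 0
Aug 19 08:49:49 QUBE autofan: Highest disk temp is 36C, adjusting fan speed from: OFF (0% @ 0rpm) to: 64 (25% @ 638rpm)
EOF

qnap_lines=$(count_tag qnap-ec "$sample")
echo "qnap-ec lines: $qnap_lines"
rm -f "$sample"
```

In the excerpt, each fan speed change produces three qnap-ec lines next to the single autofan line.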

Link to comment
1 hour ago, flobit said:

But I noticed that the plugin reports errors to the syslog every time the fan speed is changed:

I would recommend that you report this to the developers of the module itself; they may know the cause and may be able to fix it.

 

There is also a way to hide this message from the syslog, but I would rather recommend reporting it on their GitHub first, since that is the real solution.

 

Anyway, here is the GitHub issue tracker. They usually respond really quickly (within two days). You can of course mention me there by simply typing '@ich777' (without quotes) so that I can see what the result is.

Link to comment

Having trouble with the coral plugin here.

The pci coral card shows up in system devices / lspci, but the coral driver plugin says no devices detected. And the apex driver is loaded.

This was previously working and has stopped working after installing a new GPU (without touching the coral card at all). I've tried uninstalling the plugin, rebooting, then reinstalling and rebooting again. Diagnostics are attached from after this process.

Any ideas?

 

lsmod | grep apex
apex                   16384  0
gasket                 98304  1 apex


 

lspci -vv
04:00.0 System peripheral: Global Unichip Corp. Coral Edge TPU (prog-if ff)
        Subsystem: Global Unichip Corp. Coral Edge TPU
        Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Interrupt: pin A routed to IRQ 37
        IOMMU group: 22
        Region 0: Memory at <ignored> (64-bit, prefetchable)
        Region 2: Memory at <ignored> (64-bit, prefetchable)
        Capabilities: [80] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 0.000W
                DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
                        RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
                LnkCap: Port #1, Speed 5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <64ns, L1 <1us
                        ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
                LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk-
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 5GT/s (ok), Width x1 (ok)
                        TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range ABCD, TimeoutDis+ NROPrPrP- LTR+
                         10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt+ EETLPPrefix-
                         EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
                         FRS- TPHComp- ExtTPHComp-
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR+ OBFF Disabled,
                         AtomicOpsCtl: ReqEn-
                LnkCap2: Supported Link Speeds: 2.5-5GT/s, Crosslink- Retimer- 2Retimers- DRS-
                LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete- EqualizationPhase1-
                         EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
                         Retimer- 2Retimers- CrosslinkRes: unsupported
        Capabilities: [d0] MSI-X: Enable- Count=128 Masked-
                Vector table: BAR=2 offset=00046800
                PBA: BAR=2 offset=00046068
        Capabilities: [e0] MSI: Enable- Count=1/32 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
        Capabilities: [f8] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [100 v1] Vendor Specific Information: ID=1556 Rev=1 Len=008 <?>
        Capabilities: [108 v1] Latency Tolerance Reporting
                Max snoop latency: 1048576ns
                Max no snoop latency: 1048576ns
        Capabilities: [110 v1] L1 PM Substates
                L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
                          PortCommonModeRestoreTime=10us PortTPowerOnTime=10us
                L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
                           T_CommonMode=0us LTR1.2_Threshold=32768ns
                L1SubCtl2: T_PwrOn=10us
        Capabilities: [200 v2] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
                AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
                        MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
                HeaderLog: 00000000 00000000 00000000 00000000
        Kernel modules: apex

 

krieger-diagnostics-20220827-0053.zip

Link to comment
2 hours ago, MystX said:

The pci coral card shows up in system devices / lspci, but the coral driver plugin says no devices detected. And the apex driver is loaded.

I see that the driver is loaded here:

04:00.0 System peripheral [0880]: Global Unichip Corp. Coral Edge TPU [1ac1:089a]
    Subsystem: Global Unichip Corp. Coral Edge TPU [1ac1:089a]
    Kernel modules: apex

 

What is the output from:

ls /dev/apex_0
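
The same check can be wrapped in a small script that gives a clear yes/no answer. This is a minimal sketch: `check_apex_node` is a hypothetical helper name, and the demonstration uses a temporary file so it also runs on a machine without a TPU.

```shell
# Report whether a Coral device node exists.
# The default path /dev/apex_0 is the node asked about above;
# check_apex_node is a hypothetical helper name.
check_apex_node() {
    node=${1:-/dev/apex_0}
    if [ -e "$node" ]; then
        echo "present: $node"
        return 0
    fi
    echo "missing: $node"
    return 1
}

# Demonstration against a path that exists and then one that does not,
# so the sketch can be tried without the hardware.
tmpnode=$(mktemp)
found=$(check_apex_node "$tmpnode")
rm -f "$tmpnode"
absent=$(check_apex_node "$tmpnode" || true)
echo "$found"
echo "$absent"
```

On the server itself, calling `check_apex_node` without arguments checks /dev/apex_0.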

 

Link to comment
22 hours ago, ich777 said:

Do you maybe have another PCIe slot so that you can swap slots for testing purposes?

Does the TPU work again if you pull out the new GPU?

There aren't any more that aren't covered by GPUs. And I haven't moved the slot the coral card is in.

I'll pull out the 2nd GPU and see what happens. It did cross my mind that I might have run out of PCIe lanes, but as I understand it the bottom slot that the coral card is in is run through the chipset?
(If you're interested, I went from a 2070 Super + GTX 750 + Coral to a 3080 + 2070 Super + Coral, all on an X570 motherboard and a Ryzen 5900X.)

 

EDIT: Removing the 2070 worked, and /dev/apex_0 shows up. Not sure why that is..

Edited by MystX
Link to comment
On 8/28/2022 at 5:07 PM, MystX said:

EDIT: Removing the 2070 worked, and /dev/apex_0 shows up. Not sure why that is..

For future googlers: I was able to get both GPUs + Coral working by disabling CSM and Secure Boot in the BIOS and switching Unraid over to boot via UEFI (rename the "EFI-" folder in /boot to "EFI").
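
For reference, the folder rename could be scripted roughly like this. It is a sketch only: `enable_uefi_boot` is a hypothetical helper name, and the demonstration runs against a throwaway directory rather than the real flash drive (on the server the directory would be /boot, as mentioned above).

```shell
# Enable UEFI boot by renaming the "EFI-" folder to "EFI".
# enable_uefi_boot is a hypothetical helper name.
enable_uefi_boot() {
    bootdir=$1
    if [ -d "$bootdir/EFI" ]; then
        echo "already enabled"
    elif [ -d "$bootdir/EFI-" ]; then
        mv "$bootdir/EFI-" "$bootdir/EFI" && echo "enabled"
    else
        echo "no EFI folder found"
        return 1
    fi
}

# Dry run against a temporary directory instead of the real flash drive.
demo=$(mktemp -d)
mkdir "$demo/EFI-"
first=$(enable_uefi_boot "$demo")
second=$(enable_uefi_boot "$demo")
echo "$first, then $second"
rm -rf "$demo"
```

Running it a second time changes nothing, so it is safe to repeat.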

Link to comment
  • 2 weeks later...

Hi ich777, I have been battling fan speed control issues for months.

I have an it8628-isa-0a40, I detailed my issues here:

https://github.com/lm-sensors/lm-sensors/issues/373

Today I updated Unraid to 6.10, saw your driver in the app store and tried it, but unfortunately it didn't work :( Can you assist? BTW, do you have another method instead of paypal?

 

Edit: I checked https://post.smzdm.com/p/a270z6g2/ again and found that my IOMMU settings which Unraid added messed up the order of my acpi_enforce_resources=lax setting. I fixed it, rebooted and ran sudo modprobe it87 force_id=0x8628, and the devices reappeared, so I will follow the guide again. I created a user script "fix_it8686E" which contains the sudo command and I run it "At First Array Start Only", but I'm concerned this will still wipe out the GUI settings on each reboot. Is there a better way?

Edited by byb
Added information
Link to comment
2 hours ago, byb said:

Unraid added messed up the order of my acpi_enforce_resources=lax setting

Can you please post what exactly Unraid messed up?

 

2 hours ago, byb said:

sudo modprobe it87 force_id=0x8628

This is ultimately not necessary and won't work properly... Just add this to your syslinux.cfg:

it87.force_id=0x8628

and remove the script from your autostart, then reboot.
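
A rough sketch of how that change could be scripted. The sample file layout is an assumption modeled on a typical Unraid syslinux.cfg, and `add_param_to_default` is a hypothetical helper name, not part of any plugin. It only touches the append line of the entry marked `menu default` and works on a temporary copy here.

```shell
# Add a kernel parameter to the append line of the default boot entry,
# unless that line already contains it.
# add_param_to_default is a hypothetical helper name.
add_param_to_default() {
    param=$1
    cfg=$2
    awk -v p="$param" '
        /^label /                { in_default = 0 }
        /^[ \t]*menu default/    { in_default = 1 }
        in_default && /^[ \t]*append / && index($0, p) == 0 {
            sub(/append /, "append " p " ")
        }
        { print }
    ' "$cfg" > "$cfg.new" && mv "$cfg.new" "$cfg"
}

# Assumed sample config; the real file lives on the flash drive.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
default menu.c32
label Unraid OS
  menu default
  kernel /bzimage
  append initrd=/bzroot
label Unraid OS GUI Mode
  kernel /bzimage
  append initrd=/bzroot,/bzroot-gui
EOF

add_param_to_default it87.force_id=0x8628 "$cfg"
add_param_to_default it87.force_id=0x8628 "$cfg"   # second run is a no-op
hits=$(grep -c -F "it87.force_id=0x8628" "$cfg")
default_line=$(grep -F "it87.force_id" "$cfg")
echo "$default_line"
rm -f "$cfg"
```

Editing the file by hand gives the same result; a reboot is needed afterwards either way.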

 

2 hours ago, byb said:

Can you assist?

Please also share your Diagnostics if possible, since I can't tell much without them...

 

2 hours ago, byb said:

BTW, do you have another method instead of paypal?

GitHub Sponsors is also an option if you want to donate: Click

Link to comment
45 minutes ago, J05u said:

Do i need to re-install system temp?

I am not sure whether the new driver is loaded instead of the old one, as they look the same by name.

The driver name is the same. I would recommend that you uninstall the System Temp plugin, reboot, install the System Temp plugin again and retry. It's not guaranteed that this modified driver will work.

Link to comment

The new IT87 driver is working like a charm, many thanks ich777.

 

Gigabyte B560M DS3H v2 motherboard, IT8689 chip. I just installed your driver, set acpi_enforce_resources=lax and it87.force_id=0x8689 in the append line of the default entry in syslinux.cfg, and Dynamix System Temps now shows IT87 temps and fan speeds.

 

Many thanks, ich777.

Link to comment
17 hours ago, PsychoRS said:

The new IT87 driver is working like a charm, many thanks ich777.

I'm running a Gigabyte Aorus B560I and have set the same parameters as you in syslinux.cfg, but my Dynamix System Temps still only shows coretemp.

 

image.thumb.png.d81992cd41ee6d4340153dd66d92e09d.png

 

 

When I run sensors-detect I get this:

 

image.thumb.png.c9bddc33e5fb70153b8938cf7df492ea.png

 

I had a previous workaround where I added "modprobe it87 force_id=0x8628" to the go file, which gave me the fan speeds, but no fan control. Instead of using force_id=0x8689, should I be using 8628? I can't actually remember where I found 8628, it was some time ago. I figured that since sensors-detect shows 8689, that is what I should use...

Link to comment
44 minutes ago, eatoff said:

I had a previous workaround where I added "modprobe it87 force_id=0x8628" to the go file

Please add it87.force_id=0x8628 to syslinux.cfg, remove the line from your go file and reboot. After that, run sensors detect.

 

In your case I would recommend that you use:

it87.force_id=0x8689

in your syslinux.cfg because you have a different board/chip than @PsychoRS.

Link to comment
30 minutes ago, ich777 said:

Please add it87.force_id=0x8628 to syslinux.cfg, remove the line from your go file and reboot. After that, run sensors detect.

 

Done. I had already removed everything from the go file, and changed syslinux.cfg to 8628.

 

Unfortunately, still the same result:

 

image.thumb.png.b81d0c7436b48dde65b6fa152396f6e7.png

 

Did I have it in the correct place in my first screenshot? straight after "resources=lax"?

 

EDIT FOR CLARITY: I have now tried it87.force_id=0x8628 and it87.force_id=0x8689 with the same result.

Edited by eatoff
Link to comment
4 minutes ago, eatoff said:

Did I have it in the correct place in my first screenshot? straight after "resources=lax"?

It doesn't matter where you put it: the parameter is simply passed to the kernel/modules, which know what to do with it regardless of the order.
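
To illustrate the point about ordering, here is a small sketch that looks for a parameter anywhere on a kernel command line. `has_kernel_param` is a hypothetical helper name, and the two strings are made-up samples rather than real output.

```shell
# Succeed if a kernel command line contains the given parameter,
# wherever it appears. has_kernel_param is a hypothetical helper name.
has_kernel_param() {
    param=$1
    cmdline=$2
    # Intentionally unquoted: split the command line into words.
    for word in $cmdline; do
        [ "$word" = "$param" ] && return 0
    done
    return 1
}

# Two orderings of the same parameters (sample strings).
a="acpi_enforce_resources=lax it87.force_id=0x8689 initrd=/bzroot"
b="it87.force_id=0x8689 initrd=/bzroot acpi_enforce_resources=lax"

for line in "$a" "$b"; do
    if has_kernel_param it87.force_id=0x8689 "$line" &&
       has_kernel_param acpi_enforce_resources=lax "$line"; then
        echo "both parameters present"
    fi
done
```

On a running system the second argument would be the content of /proc/cmdline, which shows what the kernel actually received after the reboot.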

 

4 minutes ago, eatoff said:

Done. I had already removed everything from the go file, and changed syslinux.cfg to 8628.

It would be better to change it to your hardware ID which is 0x8689

 

Please post your Diagnostics.

Link to comment
On 9/13/2022 at 3:37 PM, ich777 said:

Have you also rebooted?

Please try to load the module from the command line with:

modprobe it87

and try to run Detect again.

Yeah, I have rebooted after every change to the syslinux config file.

 

This is my result:
image.thumb.png.f5940b01782b0bbe16a1443c19c6b54a.png

 

This now does have the fans available in the system temp:

 

image.png.ea1887ae4e9931385f5fa9553ad069e4.png

 

When I run Detect for available drivers, nothing comes up apart from coretemp. Am I going to have to modprobe at startup each time to get this working? Maybe this was working previously, but I was expecting it to come up with the Detect button. I'll admit I didn't check the fan speed options since I saw it wasn't picking up the driver module in the available drivers section.

 

EDIT: After a reboot, I DO need to run modprobe it87, otherwise all the sensors disappear.

 

Follow up EDIT: REMOVED, modprobe is required after a reboot
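
One possible way to avoid typing the command after every reboot is to put it in the go file once. This has not been confirmed in this thread, so treat it as a sketch under assumptions: `ensure_line` is a hypothetical helper name, a temporary file stands in for the go file, and it uses the plain `modprobe it87` from above, leaving force_id on the kernel command line.

```shell
# Append a command to a startup script once, skipping it if it is
# already there. ensure_line is a hypothetical helper name.
ensure_line() {
    line=$1
    file=$2
    grep -q -x -F -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# A temporary file stands in for the go file on the flash drive.
gofile=$(mktemp)
printf '#!/bin/bash\n' > "$gofile"
ensure_line "modprobe it87" "$gofile"
ensure_line "modprobe it87" "$gofile"   # second call adds nothing
count=$(grep -c -x -F "modprobe it87" "$gofile")
echo "modprobe lines: $count"
rm -f "$gofile"
```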

Edited by eatoff
Link to comment
