[Support] ich777 - AMD Vendor Reset, CoralTPU, hpsahba,...


On 9/27/2022 at 3:31 PM, ich777 said:

Nice, glad it's now sorted out.

Final update for all those following along at home; I can just use the it87.ignore_resource_conflict=1 and it all works now (screenshot below). No need to force id as per the V2 in that screenshot. Most elegant solution (IMO).

 

image.thumb.png.fac1d68ecf7f6188b6d2b9c0673324eb.png

  • Like 2
Link to comment
  • 2 weeks later...

Folks,

I don't read this list too often, so it may take me a while to respond. However, I will try to respond to DMs more quickly.

 

However, just to bring you up to date on a few changes and other notes,

  • firstly, with my driver it is better to use it87.ignore_resource_conflict=1 rather than acpi_enforce_resource=lax
  • however, the latest version doesn't even need that for boards I know it is safe for (unfortunately mainly ones I have direct access to currently).  More will be added as I get more information from people.
  • we are working on getting all this into the mainline, but it will be a very slow process, probably taking a year or more to see it get out.
  • and finally, I've looked at sensors-detect and realised it only works properly for chips in the mainline module, not mine, as it just lists all the chipsets I have added as "to-be-written".

So, how you can help is if you do find a sensor that is not currently supported, let me know and I'll see what can be done.

Secondly, if you find a board that still needs it87.ignore_resource_conflict=1, let me know the board type and the sensors that you are seeing and I'll look at adding it.
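Before reporting a board, it can help to confirm that the option really reached the kernel. A minimal sketch, assuming the parameter was added to the append line of the Syslinux configuration; the helper name has_kernel_param is made up for illustration and is not part of the driver or the plugin:

```shell
# Hypothetical helper: succeeds if the exact parameter appears as its own
# token in a kernel command line file (defaults to /proc/cmdline).
has_kernel_param() {
    param="$1"
    cmdline_file="${2:-/proc/cmdline}"
    [ -r "$cmdline_file" ] || return 1
    # One token per line, then match whole lines only (no partial matches)
    tr ' ' '\n' < "$cmdline_file" | grep -qxF -- "$param"
}

if has_kernel_param "it87.ignore_resource_conflict=1"; then
    echo "it87.ignore_resource_conflict=1 is on the kernel command line"
else
    echo "it87.ignore_resource_conflict=1 is NOT on the kernel command line"
fi
```

If it reports that the parameter is missing, the boot entry that was actually used probably does not carry the option.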

 

Regards

Frank

  • Like 1
  • Thanks 1
Link to comment
  • 2 weeks later...
11 hours ago, ich777 said:

Install it. Nothing more to do.

 

I would strongly suggest that you upgrade to 6.11.1

I installed this plugin, but nothing changed after installing and rebooting the system, I still have to use the command to reset the GPU.
I tried to upgrade to 6.11.1 but I'm having problems with my VMs after the upgrade.

Link to comment
21 minutes ago, DarphBobo said:

Really???  Huh...

 

It looks to me that you're running a root kit instead with the objective of pirating the OS, and an old version at that....

piracy is a purely individual thing, and what I use and for what purposes is already my concern.
You don't know what I have at home, if I said that I'm having problems with the new OS, it does not mean that the problem is with this OS

Link to comment

EDIT: Just downgraded to the previous version (6.10.3) and the problem is gone. No other changes were made other than downgrading and rebooting. So it's definitely a 6.11.1 bug.

 

Log showing no reset bug after vm shutdown on 6.10.3:

VM.log

 

Original post:

 

Hello, I'm on 6.11.1 and I think I have the GPU reset problem, even though IIRC it's not supposed to happen on a 6600 XT (XFX 6600xt 210).

 

I think i tried most (if not all) solutions i found on this forum but nothing solved the problem. 

 

The AMD Vendor Reset plugin didn't solve the problem.


Motherboard BIOS is up to date, Resizable BAR is off and the related settings are all correct, and I'm using the correct vBIOS from TechPowerUp.

 

Should i be looking into downgrading unraid? Or is a solution to this bug being actively worked on? 

 

I really need to get this to work ASAP, any help is appreciated.

 

Heres the vm config:

VM.xml

 

Heres the error log:

Error.log

 

tower-diagnostics-20221026-2048.zip

Edited by ich777
created files to not bloat the thread
Link to comment
6 hours ago, marceloliv3 said:

Or is a solution to this bug being actively worked on?

I don't see the AMD Vendor Reset plugin installed in your Diagnostics...

 

6 hours ago, marceloliv3 said:

any help is appreciated.

Please give me the output from:

lspci -nn | grep -E "VGA compatible controller|Display controller" | grep -E "AMD|ATI|Advanced Micro Devices" | awk '{print $1}'

If there is only one line of output (which there should be), please also give me the output from:

cat $(find /sys/bus/pci/devices/* -name "*$(lspci -nn | grep -E "VGA compatible controller|Display controller" | grep -E "AMD|ATI|Advanced Micro Devices" | awk '{print $1}')")/reset_method
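The one-liner above can also be split into two steps, which is easier to read when copying by hand. A minimal sketch, assuming you pass the full PCI address including the domain (for example the form that `lspci -D` prints); the helper name reset_method_for is hypothetical:

```shell
# Hypothetical helper: print the reset_method of one PCI device.
# $1 = full PCI address with domain, e.g. 0000:xx:yy.z
# $2 = sysfs device directory (optional, defaults to the real one)
reset_method_for() {
    addr="$1"
    sysfs_root="${2:-/sys/bus/pci/devices}"
    file="$sysfs_root/$addr/reset_method"
    if [ -r "$file" ]; then
        cat "$file"
    else
        echo "no readable reset_method for $addr" >&2
        return 1
    fi
}

# Example call (replace the address with the one from the first command):
#   reset_method_for 0000:xx:yy.z
```

The output should be the same line that the longer command prints.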

 

What you should definitely do on your system is either blacklist the amdgpu module or bind your GPU to VFIO (but don't do both).

If you want to blacklist it please do this:

mkdir -p /boot/config/modprobe.d
echo "blacklist amdgpu" > /boot/config/modprobe.d/amdgpu.conf
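After a reboot it is worth checking that the blacklist entry is actually in place. A minimal sketch, assuming the file was created in /boot/config/modprobe.d as shown above; the helper name is_blacklisted is made up for illustration:

```shell
# Hypothetical helper: succeeds if some *.conf file in the given directory
# contains a "blacklist <module>" line for the named module.
is_blacklisted() {
    module="$1"
    conf_dir="${2:-/boot/config/modprobe.d}"
    [ -d "$conf_dir" ] || return 1
    grep -qsE "^[[:space:]]*blacklist[[:space:]]+$module[[:space:]]*$" "$conf_dir"/*.conf
}

if is_blacklisted amdgpu; then
    echo "amdgpu is blacklisted"
else
    echo "no blacklist entry for amdgpu found"
fi
```

With the blacklist active, `lsmod | grep amdgpu` should print nothing after the reboot.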

 

 

Even if it worked on 6.10.3, I don't recommend doing it the way you do, because you are basically stealing the GPU from Unraid: you only have one GPU in your system, and this could/will always lead to issues, as you can clearly see in the VM subforums.

  • Like 1
Link to comment

Thank you @ich777 for the quick response!

 

Quote

I don't see the AMD Vendor Reset plugin installed in your Diagnostics...

I'm sorry, I did a lot of testing with the plugin installed/removed, but I had it removed during my last attempt, right before I collected the diagnostics. Please let me know if it may provide useful information; in that case I can reinstall it and grab the diagnostics/logs again, but I think it's just giving me the exact same errors with the plugin on or off.

 

Quote

Please give me the output from:

 

root@Tower:~# lspci -nn | grep -E "VGA compatible controller|Display controller" | grep -E "AMD|ATI|Advanced Micro Devices" | awk '{print $1}'
28:00.0
root@Tower:~# cat $(find /sys/bus/pci/devices/* -name "*$(lspci -nn | grep -E "VGA compatible controller|Display controller" | grep -E "AMD|ATI|Advanced Micro Devices" | awk '{print $1}')")/reset_method
bus

 

Quote

If you want to blacklist it please do this:

mkdir -p /boot/config/modprobe.d
echo "blacklist amdgpu" > /boot/config/modprobe.d/amdgpu.conf

 

I see! I have the VM running right now doing some work, but as soon as I get a chance I'll give it a try and report back.

 

Thank you!

Edited by marceloliv3
typo
Link to comment
2 minutes ago, marceloliv3 said:
root@Tower:~# cat $(find /sys/bus/pci/devices/* -name "*$(lspci -nn | grep -E "VGA compatible controller|Display controller" | grep -E "AMD|ATI|Advanced Micro Devices" | awk '{print $1}')")/reset_method
bus

Please install the AMD Vendor Reset plugin, reboot and send me the output from this command again.

Link to comment
17 hours ago, ich777 said:

Please install the AMD Vendor Reset plugin, reboot and send me the output from this command again.

 

Just did, and ran the command again:

root@Tower:~# cat $(find /sys/bus/pci/devices/* -name "*$(lspci -nn | grep -E "VGA compatible controller|Display controller" | grep -E "AMD|ATI|Advanced Micro Devices" | awk '{print $1}')")/reset_method
bus

 

Seems like the same output as before. I wonder if I'm doing something wrong with the plugin?

 

image.thumb.png.8b50636e7d7e85f4af1363ee1b7f3188.png

 

I'm still on 6.10.3, and even though the logs still show errors related to the GPU, it seems to be working fine (it's not locking up Unraid when I shut down the VM, and I can start the VM again just fine without needing an unclean shutdown).

 

I attached the new diagnostics.

 

Thank you for the help so far!

 

Log showing the vm shutdown:

 

Error.log

 

tower-diagnostics-20221027-1936.zip

Edited by ich777
put error output into file to not bloat the thread
Link to comment
7 hours ago, marceloliv3 said:

I attached the new diagnostics.

Yes, but where is the modprobe.d folder mentioned above? Have you done that yet?

 

Also, why do you boot into GUI mode? This is even worse than the normal mode for what you are trying to do. Don't forget you are stealing the primary GPU from Unraid and, as said above, this will always lead to issues.

Please change your boot mode to non-GUI mode, create the modprobe.d folder and the amdgpu.conf file from above now, and reboot!

 

7 hours ago, marceloliv3 said:

Seems like the same output as before, I wonder if im doing something wrong with the plugin?

Have you read the comments above? More than one thing is wrong with your setup... It is always recommended to have one GPU for the host (Unraid) and one for the VM; what you are doing will most likely always lead to issues.

 

BTW, now that I see it: your graphics card, the 6600/6600XT, is not affected by the reset bug, and the AMD Vendor Reset plugin will not do much for you.

 

Your issue would be better suited to the VM subforums, and I think it is more likely that you are having this issue because of the things mentioned above.

  • Like 1
Link to comment

Hello! @Frank Crawford @ich777 I'm trying to get fan speed control working and have gone down the path of thinking I need the it87 driver. However, after installing the it87 plugin and rebooting with the default Syslinux configuration changed to include it87.ignore_resource_conflict=1, I'm not having any luck.

 

My motherboard is 

ASUS TUF GAMING B560M-PLUS (Intel)

https://www.asus.com/motherboards-components/motherboards/tuf-gaming/tuf-gaming-b560m-plus/

 

The output from sensors-detect made me think I need the it87 driver, because it reports an unknown ITE chip with ID 0x0101 (see below).

 

Linux 5.19.9-Unraid.
root@UNRAID1:~# sensors-detect
# sensors-detect version 3.6.0
# Board: ASUSTeK COMPUTER INC. TUF GAMING B560M-PLUS
# Kernel: 5.19.9-Unraid x86_64
# Processor: 11th Gen Intel(R) Core(TM) i5-11600K @ 3.90GHz (6/167/1)
This program will help you determine which kernel modules you need
to load to use lm_sensors most effectively. It is generally safe
and recommended to accept the default answers to all questions,
unless you know what you're doing.
Some south bridges, CPUs or memory controllers contain embedded sensors.
Do you want to scan for them? This is totally safe. (YES/no): y
Silicon Integrated Systems SIS5595...                       No
VIA VT82C686 Integrated Sensors...                          No
VIA VT8231 Integrated Sensors...                            No
AMD K8 thermal sensors...                                   No
AMD Family 10h thermal sensors...                           No
AMD Family 11h thermal sensors...                           No
AMD Family 12h and 14h thermal sensors...                   No
AMD Family 15h thermal sensors...                           No
AMD Family 16h thermal sensors...                           No
AMD Family 17h thermal sensors...                           No
AMD Family 15h power sensors...                             No
AMD Family 16h power sensors...                             No
Hygon Family 18h thermal sensors...                         No
Intel digital thermal sensor...                             Success!
    (driver `coretemp')
Intel AMB FB-DIMM thermal sensor...                         No
Intel 5500/5520/X58 thermal sensor...                       No
VIA C7 thermal sensor...                                    No
VIA Nano thermal sensor...                                  No
Some Super I/O chips contain embedded sensors. We have to write to
standard I/O ports to probe them. This is usually safe.
Do you want to scan for Super I/O sensors? (YES/no): y
Probing for Super-I/O at 0x2e/0x2f
Trying family `National Semiconductor/ITE'...               No
Trying family `SMSC'...                                     No
Trying family `VIA/Winbond/Nuvoton/Fintek'...               Yes
Found `Nuvoton NCT6798D Super IO Sensors'                   Success!
    (address 0x290, driver `nct6775')
Probing for Super-I/O at 0x4e/0x4f
Trying family `National Semiconductor/ITE'...               Yes
Found unknown chip with ID 0x0101
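
Note that the output above already names two drivers (coretemp, and nct6775 for the Nuvoton NCT6798D). A minimal sketch for pulling the driver names out of saved sensors-detect output, assuming the output was saved to a text file; the helper name list_detected_drivers and the file name are hypothetical:

```shell
# Hypothetical helper: read sensors-detect output on stdin and print the
# unique driver names from lines such as:  (driver `coretemp')
list_detected_drivers() {
    grep -o 'driver `[^ ]*' | cut -d'`' -f2 | tr -d "')" | sort -u
}

# Example call, assuming the output was saved to sensors-detect.txt:
#   list_detected_drivers < sensors-detect.txt
```

Whether loading one of those drivers exposes fan control on this board is not something the output alone confirms.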

 

Edited by Presjar
Link to comment

My system temperatures worked perfectly on version 6.10.3, thanks to ich777!

 

But since I upgraded from 6.10.3 to 6.11.1 (using Nerd Tools, which includes Perl), nothing is shown any more.

My boot options are still

"it87.ignore_resource_conflict=1 it87.force_id=0x8689"

 

but no motherboard temperatures are shown from my ITE IT8689 chip.

I also tried booting with only "it87.ignore_resource_conflict=1", but it still doesn't work.

 

Is anybody else seeing the same thing?

Link to comment
3 hours ago, Presjar said:

It could have to do with 6.11 no longer supporting the nerdtool plugins.

 

Perl is installed by default now though.

 

Maybe do a sanity check on your config.

 

I know NerdPack is no longer supported, but Nerd Tools is the new replacement that supports the 6.11 series. Is that wrong?

 

Is there a plugin that can check my config? Any suggestions?

Link to comment
