AMD GPU Reset Bug?



5 minutes ago, giganode said:

Up until now NVIDIA never had a reset bug, afaik. And unlike, for example, the 2000 series or the GTX series before it, the RTX 3000 series does not have the Code 43 problem.

As reported by gnif and Level1Techs the new RX 6000 series does not have a reset bug anymore. This is very nice!!

 

Let's all sell our old stuff to bare-metal users as soon as there is enough availability of the new generation, and let's never have GPU passthrough problems again...

If this is indeed the case, I'm selling the 5700XT once I can get my hands on a 6800XT at MSRP, no stress.

1 minute ago, giganode said:

What happens? We will try it ✌️😏

If everything works there could be a possibility to implement it, I think. But only if it does not create new problems for other users.

LOL, that's kinda what I was expecting, thanks for the info. If it is released I'm definitely willing to help test it out.

On 12/1/2020 at 8:56 AM, ich777 said:

If someone is interested, I have an Unraid 6.9.0 beta35 build with a working reset patch integrated for Navi10 only (Radeon Pro 5700XT, Radeon RX5700XT, Radeon Pro W5700X, Radeon Pro W5700, RX5700, Radeon Pro RX5700, RX5600XT, RX5600).

The build includes this patch from here: Click (slightly modified to work with Kernel 5.8.18)

 

On 12/1/2020 at 9:03 AM, giganode said:

Works good so far. Thank you very much!

 

I can also confirm ich777's build is working with the reset patch with an RX 5600XT. Thanks so much!

Edited by ndetar

I may have good news for some folks here.

 

I actually tested passing through my 5700XT to my macOS VM and I can tell you it works.

I use macOS Catalina 10.15.7 with the Clover bootloader.

 

Bildschirmfoto 2020-12-06 um 18.47.24.png

Bildschirmfoto 2020-12-06 um 18.51.57.png

 

EDIT: It only half works at the moment. A restart, or a start after a shutdown, does not work correctly. I need to start and shut down a different VM with another OS before the GPU works correctly again in macOS.

 

I need to investigate that further.

I will keep you updated.

Edited by giganode

Hi there,

 

I'm also in that boat.

I'm curious whether there is a solution for the RX5500. I know that ich777 provides a separate kernel-build procedure, but I don't see that it includes the RX5500.

I also tried the reset procedure that "Spaceinvader One" showed in his video, but this left the machine in an undefined state (maybe you have to shut down ALL machines before pressing the power button). In any case, this wouldn't be a good general solution.

In addition, I tried the "Thor2002ro" kernel on beta35, but it didn't fit (something related to the md version, if I remember correctly).

Long story short: does anybody know if there is an existing solution available for the current "RC1"?

On 12/13/2020 at 10:50 AM, batesman73 said:

Hi there,

 

I'm also in that boat.

I'm curious whether there is a solution for the RX5500. I know that ich777 provides a separate kernel-build procedure, but I don't see that it includes the RX5500.

I also tried the reset procedure that "Spaceinvader One" showed in his video, but this left the machine in an undefined state (maybe you have to shut down ALL machines before pressing the power button). In any case, this wouldn't be a good general solution.

In addition, I tried the "Thor2002ro" kernel on beta35, but it didn't fit (something related to the md version, if I remember correctly).

Long story short: does anybody know if there is an existing solution available for the current "RC1"?

Not an official one. @ich777's custom build may help. I cannot guarantee this, but if you are up for testing it, I think we can find out. It seems that we do not have a lot of Navi users around here.


Hi,

 

I already tried that, but I'm not sure I did everything right because the description in the corresponding thread isn't clear to me. At least, I can't tell from his Kernel Helper plugin whether I'm running a self-compiled kernel.

What I did was set the parameters, start the Docker container, and at the end copy the files from the beta output directory to "/boot".

In any case, the card is still unusable after shutting down a VM and trying to start another.
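
One way to check whether the copy step actually took effect is to compare checksums between the build output directory and "/boot". The following is only a minimal sketch in Python: both directory paths are passed in by hand and are assumptions, not locations documented by the Kernel Helper container, and `sha256_of` and `compare_dirs` are hypothetical helper names.

```python
import hashlib
import sys
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_dirs(output_dir, boot_dir):
    """Map each regular file in output_dir to 'match', 'differs' or 'missing',
    depending on the same-named file in boot_dir."""
    result = {}
    for src in sorted(Path(output_dir).iterdir()):
        if not src.is_file():
            continue
        dst = Path(boot_dir) / src.name
        if not dst.is_file():
            result[src.name] = "missing"
        elif sha256_of(src) == sha256_of(dst):
            result[src.name] = "match"
        else:
            result[src.name] = "differs"
    return result


if __name__ == "__main__":
    # Both arguments are supplied by the user; no default paths are assumed.
    if len(sys.argv) == 3:
        for name, status in compare_dirs(sys.argv[1], sys.argv[2]).items():
            print(f"{status:8} {name}")
    else:
        print("usage: compare_boot.py OUTPUT_DIR BOOT_DIR")
```

Any file reported as `differs` or `missing` would suggest that "/boot" does not hold the freshly built images. A full set of `match` lines only shows that the files were copied, not that the reset patch itself works.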

 
 
Happy Xmas @jonp & @limetech. Is it soon yet? (Sorry, could not resist.)
No, not yet. Apparently a lot of things broke with that custom patch, and it requires a lot of kernel flags that would significantly increase the size of the OS. We might look into this again after the new year, but don't get your hopes up just yet. If the work required was simple, didn't have a huge impact on the OS, and was fully functional, it'd have a better chance.


Not sure if it is news anymore, but I happened to hear from wendell@level1techs that AMD has fixed the PCI reset bugs in the Big Navi cards (Radeon 6800 & 6800XT), and they seem to work perfectly out of the box in Ubuntu 20.04.

 

Some enthusiastic quotes: 😂

"PCI Express reset, 100%"

"My day is perfect! & my excitement is immeasurable!"

"For anybody that is super into VFIO the 6800 or the 6800xt, that's your card"

"AMD has absolutely completely nailed it! Just for the PCIe reset issue & other quality of life improvements"

"Best GPU on linux that I have used since the Matrox Millennium 2"

"If having a relatively open GPU is important to you for the Linux platform, it doesn't get any better than this!"

"I have done intentional crashes, I've injected errors into the platform basically, and the  card has been able to recover every single time"

"It is rock solid!"

For those who are stuck with the earlier versions of the cards, gnif@level1techs, has something decent cooking. Like Jesus, gnif, cried to AMD for all of us VFIO-kind & someone from AMD has shown him the light :)

 

11 hours ago, Shinobi said:

Not sure if it is news anymore, but I happened to hear from wendell@level1techs that AMD has fixed the PCI reset bugs in the Big Navi cards (Radeon 6800 & 6800XT), and they seem to work perfectly out of the box in Ubuntu 20.04.

 

Some enthusiastic quotes: 😂

"PCI Express reset, 100%"

"My day is perfect! & my excitement is immeasurable!"

"For anybody that is super into VFIO the 6800 or the 6800xt, that's your card"

"AMD has absolutely completely nailed it! Just for the PCIe reset issue & other quality of life improvements"

"Best GPU on linux that I have used since the Matrox Millennium 2"

"If having a relatively open GPU is important to you for the Linux platform, it doesn't get any better than this!"

"I have done intentional crashes, I've injected errors into the platform basically, and the  card has been able to recover every single time"

"It is rock solid!"

For those who are stuck with the earlier versions of the cards, gnif@level1techs, has something decent cooking. Like Jesus, gnif, cried to AMD for all of us VFIO-kind & someone from AMD has shown him the light :)

 

Hmm, what about the 5700XT?

On 12/1/2020 at 5:56 PM, ich777 said:

If someone is interested, I have an Unraid 6.9.0 beta35 build with a working reset patch integrated for Navi10 only (Radeon Pro 5700XT, Radeon RX5700XT, Radeon Pro W5700X, Radeon Pro W5700, RX5700, Radeon Pro RX5700, RX5600XT, RX5600).

The build includes this patch from here: Click (slightly modified to work with Kernel 5.8.18)

@ich777 I would love to try out this patch with my Red Devil 5700XT. 

On 12/26/2020 at 11:44 AM, Shinobi said:

Check out gnif's out of tree kernel module:

https://github.com/gnif/vendor-reset

Wondering if Unraid would absorb it ...

This patch isn't finished yet and has problems on Navi cards: the HDMI audio isn't reset properly, so it won't be integrated into Unraid for now.

 

2 minutes ago, Pearsondk said:

@ich777 I would love to try out this patch with my Red Devil 5700XT. 

Yes, write me a PM.


Thanks to ich777 and his Kernel Helper container I was able to build an Unraid kernel with the navi reset patch super easily! I have an RX 5600XT and everything seems to be working. The container also has an option for the gnif/vendor-reset module if you want to try that.
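
To see whether the gnif/vendor-reset module is actually loaded after booting such a build, one can look at /proc/modules. This is a minimal sketch in Python, assuming the module shows up under the name `vendor_reset` (the kernel lists module names with underscores). The function names are made up for illustration.

```python
from pathlib import Path


def loaded_modules(proc_modules_text):
    """Return the set of module names found in text in /proc/modules format.

    Each line of /proc/modules starts with the module name, followed by
    whitespace-separated fields such as size and use count.
    """
    names = set()
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields:
            names.add(fields[0])
    return names


def vendor_reset_loaded(proc_modules_text, module_name="vendor_reset"):
    """True if module_name (an assumed name, see above) is listed as loaded."""
    return module_name in loaded_modules(proc_modules_text)


if __name__ == "__main__":
    path = Path("/proc/modules")
    if path.exists():
        print("vendor_reset loaded:", vendor_reset_loaded(path.read_text()))
    else:
        print("/proc/modules not found (not a Linux host?)")
```

A `False` result only says the module is not loaded right now. It does not tell whether the kernel was built with the older in-kernel Navi reset patch instead.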

 

  • 2 weeks later...
5 minutes ago, methanoid said:

So, pretty much "sell your AMD card if you want to do GPU passthru without a 6800xx card" ? 

 

I like my RX5700, it's macOS compatible, but no good for passthru... 🥺

This should work now; please see this post from @giganode in the Unraid-Kernel-Helper thread. (By the way, you don't have to add the variable manually anymore; it's now in the template itself. Just set gnif/vendor-reset to 'true', click on 'Show more...', and set the gnif/vendor-reset Branch to 'feature/audio_reset'.)

 

7 hours ago, ich777 said:

This should work now; please see this post from @giganode in the Unraid-Kernel-Helper thread. (By the way, you don't have to add the variable manually anymore; it's now in the template itself. Just set gnif/vendor-reset to 'true', click on 'Show more...', and set the gnif/vendor-reset Branch to 'feature/audio_reset'.)

 

Will Unraid EVER make the 5000 series work seamlessly, like NVIDIA does? Not adding any flags or weird stuff, but just plug and play for our Navi cards? I need at least Catalina, as Big Sur is BIG $h!t with older versions of Java, so I cannot develop there, and I already have my 5700XT, which I like...

Maybe RC3? I'm running RC2 and still have the bug, and I cannot go all bleeding edge on Unraid (those rare branches and stuff) as I do my job with my VMs.

Edited by mSedek
27 minutes ago, mSedek said:

Will Unraid EVER make the 5000 series work seamlessly, like NVIDIA does? Not adding any flags or weird stuff, but just plug and play for our Navi cards?

I hope you know that this is not only an Unraid issue...

The problem with the patch (gnif/vendor-reset) is that it is not mature enough and has some issues; also, there are different versions of the patch available on GitHub.

 

27 minutes ago, mSedek said:

Maybe RC3? I'm running RC2 and still have the bug, and I cannot go all bleeding edge on Unraid (those rare branches and stuff) as I do my job with my VMs.

Have you tried building a custom version for RC2? It only takes about 15 minutes and a reboot of the server.

Just be sure to turn off all VMs and Docker containers when you are building the images.

 

EDIT: It's really simple, just read the description and the post that I've linked.

8 minutes ago, ich777 said:

I hope you know that this is not only an Unraid issue...

The problem with the patch (gnif/vendor-reset) is that it is not mature enough and has some issues; also, there are different versions of the patch available on GitHub.

Have you tried building a custom version for RC2? It only takes about 15 minutes and a reboot of the server.

Just be sure to turn off all VMs and Docker containers when you are building the images.

 

EDIT: It's really simple, just read the description and the post that I've linked.

Yes, I know the fault is not on the Unraid side. I will take the time tomorrow and test.

Edited by mSedek
  • 2 weeks later...

Hello all,

 

Owner of a 5700 XT here. I need to clarify a few things.

 

1. I have read that if a VM is shut down gracefully (clean shutdown), the GPU gets reset as it should. That is not the case for me. Even with a clean shutdown I get the AMD reset error if I try to start the VM again.

2. I tried the AMD reset script following @SpaceInvaderOne's video. After the reset I can start the VM, but it gets stuck at boot after one or two spinning-wheel rotations. The only way to start the VM and have it boot properly is to do a full server restart.

3. I understand that there is a custom kernel patch from @ich777 but for some page 22 reason it is not available for people on version 6.8.3 (stable).

 

So the only way to start a VM after the GPU has been used is to restart the whole server. Do I understand this correctly? Am I missing something? Thank you for your time.
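
For anyone comparing notes, it can help to list which AMD display devices the host sees and whether the kernel exposes a `reset` entry for them in sysfs. This is a rough sketch in Python, assuming the standard Linux sysfs layout under /sys/bus/pci/devices; `read_attr` and `amd_display_devices` are hypothetical names. A present `reset` entry only means the kernel offers some reset method for the device, not that the reset leaves the card in a usable state, which is exactly the problem discussed here.

```python
from pathlib import Path

AMD_VENDOR_ID = "0x1002"       # PCI vendor ID assigned to AMD/ATI
DISPLAY_CLASS_PREFIX = "0x03"  # PCI base class 0x03 = display controller


def read_attr(device_dir, name):
    """Read a one-line sysfs attribute, or return '' if it is absent."""
    path = Path(device_dir) / name
    return path.read_text().strip() if path.is_file() else ""


def amd_display_devices(sysfs_root="/sys/bus/pci/devices"):
    """Return (pci_address, has_reset_entry) for each AMD display function."""
    root = Path(sysfs_root)
    found = []
    if not root.is_dir():
        return found
    for dev in sorted(root.iterdir()):
        if read_attr(dev, "vendor").lower() != AMD_VENDOR_ID:
            continue
        if not read_attr(dev, "class").lower().startswith(DISPLAY_CLASS_PREFIX):
            continue
        # Only check for presence; the attribute itself is not read or written.
        found.append((dev.name, (dev / "reset").exists()))
    return found


if __name__ == "__main__":
    for address, has_reset in amd_display_devices():
        print(address, "reset entry:", has_reset)
```

The script only reads sysfs and never triggers a reset, so it should be safe to run while VMs are active.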

 

Regards.

2 minutes ago, snolly said:

I understand that there is a custom kernel patch from @ich777 but for some page 22 reason it is not available for people on version 6.8.3 (stable).

Exactly. I had to remove support for versions before the beta or RC since there were some pretty nasty posts, so I decided to support only the current version, especially since RC2 is really stable and has only visual bugs for some users (temps not displayed after a reboot).

I can recommend RC2 since, as I said above, @giganode & @derpuma run RC2 with the patch and everything works just fine. ;)
