AMD GPU Reset Bug?


On 1/25/2021 at 6:58 PM, snolly said:

Hello all,

 

owner of a 5700 XT here. I need to clarify a few things.

 

1. I have read that if a VM is shut down gracefully (clean shutdown), the GPU gets reset as it should. That is not the case for me. Even with a clean shutdown I get the AMD reset error if I try to start the VM again.

 

This has nothing to do with clean or forced shutdowns. Older AMD cards (i.e. Polaris all the way up to Navi, except Big Navi) don't support FLR (Function Level Reset) as specified in the PCIe specification. Without some form of patch you will not get it to work properly.
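For anyone who wants to see what their own card reports: `lspci -vv` prints an `FLReset+` or `FLReset-` flag in the device capabilities (DevCap) section. Below is a minimal Python sketch that reads that flag from captured output. The function name `flr_advertised` is made up for illustration, and the flag only tells you what the device advertises, not whether a reset actually works in practice.

```python
import re
from typing import Optional


def flr_advertised(lspci_text: str) -> Optional[bool]:
    """Report whether `lspci -vv` output advertises Function Level Reset.

    Returns True for "FLReset+", False for "FLReset-", and None when the
    flag does not appear at all (for example when lspci was run without
    enough privileges to read the capability list).
    """
    match = re.search(r"FLReset([+-])", lspci_text)
    if match is None:
        return None
    return match.group(1) == "+"
```

To use it, capture the output for a single device (for example `lspci -vv -s <address>`, run as root) and pass the text to the function.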

 

On 1/25/2021 at 7:03 PM, ich777 said:

Exactly, I had to remove the support pre beta or RC since there were some pretty nasty posts, so I decided to support only the current version. RC2 is really stable and has only visual bugs in it for some users (temps not displayed after a reboot).

I can recommend RC2 since, as I said above, @giganode & @derpuma run RC2 with the patch and everything works just fine. ;)

 

Totally agree with that :) 

Edited by giganode
Link to post
On 1/25/2021 at 8:03 PM, ich777 said:

Exactly, I had to remove the support pre beta or RC since there were some pretty nasty posts, so I decided to support only the current version. RC2 is really stable and has only visual bugs in it for some users (temps not displayed after a reboot).

I can recommend RC2 since, as I said above, @giganode & @derpuma run RC2 with the patch and everything works just fine. ;)

 

14 hours ago, giganode said:

 

This has nothing to do with clean or forced shutdowns. Older AMD cards (i.e. Polaris all the way up to Navi, except Big Navi) don't support FLR (Function Level Reset) as specified in the PCIe specification. Without some form of patch you will not get it to work properly.

 

 

Totally agree with that :) 

 

OK then RC2 here I come this weekend!

 

Question about patching. What happens in future Unraid updates? Do I install the official update and then repatch?

Link to post
29 minutes ago, snolly said:

Question about patching. What happens in future Unraid updates? Do I install the official update and then repatch?

Exactly, this is the plan. It should also work if you are on the old RC, but I don't recommend it.

EDIT:

  1. Upgrade to the new stock version of Unraid
  2. Redownload the Unraid-Kernel-Helper from the CA App
  3. Move the generated bz* files to your USB Boot device
  4. Reboot
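Step 3 can also be scripted. Below is a minimal Python sketch that copies (rather than moves) the generated bz* files onto the boot device and first saves whatever it is about to overwrite. The function name `install_images` and all three directory arguments are assumptions for illustration; use whatever output folder the Unraid-Kernel-Helper actually wrote to and wherever your USB boot device is mounted.

```python
import shutil
from pathlib import Path


def install_images(build_dir, boot_dir, backup_dir):
    """Copy every bz* file from build_dir into boot_dir.

    Any file that would be overwritten is copied into backup_dir first,
    so the previous images can be put back later. Returns the list of
    file names that were installed.
    """
    build_dir, boot_dir, backup_dir = Path(build_dir), Path(boot_dir), Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    installed = []
    for src in sorted(build_dir.glob("bz*")):
        if not src.is_file():
            continue
        dest = boot_dir / src.name
        if dest.exists():
            # Keep the image that is currently on the boot device.
            shutil.copyfile(dest, backup_dir / src.name)
        # copyfile copies content only, which avoids permission/metadata
        # errors on FAT-formatted USB sticks.
        shutil.copyfile(src, dest)
        installed.append(src.name)
    return installed
```

A reboot is still needed afterwards, as in step 4.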

 

The patch is not mature enough to integrate it into Unraid itself...

I do my best to update the Unraid-Kernel-Helper as quickly as possible so that it works with any future versions, but it should also work OOB with newer Unraid versions as long as nothing fundamental changes.

Link to post
1 minute ago, snolly said:

Is there hope that it will ever be?

You have to keep in mind that this patch has to be completely finished first before it can even make it into Unraid.

Also keep in mind that this will increase the kernel size significantly (even if it doesn't look like much to the naked eye).

Another thing is that this patch could lead to other problems. For example, if you build the patch and install the Nvidia Plugin from the CA App afterwards, the server would crash on boot. But if you build the patch and the Nvidia drivers together with the Unraid-Kernel-Helper, so that both are integrated in the images, it will boot up just fine.

This is a little bit more complicated than it seems at first sight...

Link to post
19 minutes ago, ich777 said:

You have to keep in mind that this patch has to be completely finished first before it can even make it into Unraid.

Also keep in mind that this will increase the kernel size significantly (even if it doesn't look like much to the naked eye).

Another thing is that this patch could lead to other problems. For example, if you build the patch and install the Nvidia Plugin from the CA App afterwards, the server would crash on boot. But if you build the patch and the Nvidia drivers together with the Unraid-Kernel-Helper, so that both are integrated in the images, it will boot up just fine.

This is a little bit more complicated than it seems at first sight...

 

Gotcha, thanks for the detailed response. If one runs a rig with an AMD GPU only, I reckon it's safe to use?

Also, is going back and forth between the patched and the official kernel possible? I guess it's a matter of copying the patched/unpatched bz* files back and forth onto the USB stick, right?

Edited by snolly
Link to post
4 minutes ago, snolly said:

If one runs a rig with an AMD GPU only, I reckon it's safe to use?

You can also run an Nvidia and an AMD GPU with the patch, but be sure to also select the Nvidia driver in the Unraid-Kernel-Helper. But yes, if you are only running an AMD GPU with this patch it's also safe. :)

 

5 minutes ago, snolly said:

Also, is going back and forth between the patched and the official kernel possible? I guess it's a matter of copying the patched/unpatched bz* files back and forth onto the USB stick, right?

Exactly, plus a reboot.

But why would you do that? That's really not how you should use this patch or even a prebuilt image that is built with the Unraid-Kernel-Helper.

 

Btw I also use a prebuilt image from my Unraid-Kernel-Helper but I've integrated other things like Intel iGPU, iSCSI & Mellanox Firmware Tools. I like to have all in one image and this also speeds up the boot process...

Link to post
14 minutes ago, ich777 said:

But why would you do that? That's really not how you should use this patch or even a prebuilt image that is built with the Unraid-Kernel-Helper.

I would not do that. I would just like to know whether booting up with a patched kernel modifies Unraid permanently somehow, so that I wouldn't be able to go back if needed. It's my excessive cautiousness that kicked in :) Again, thanks for the replies.

Link to post
1 minute ago, snolly said:

I would not do that. I would just like to know whether booting up with a patched kernel modifies Unraid permanently somehow, so that I wouldn't be able to go back if needed.

No, you can always download the default bz* images and replace the custom built ones from the Unraid-Kernel-Helper on your USB boot device, and you will be back to stock Unraid, since the images are the main things that make Unraid, well, Unraid... ;)
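If you want to double-check which set of images is currently on the stick, comparing checksums against a freshly downloaded stock set is one way to do it. A minimal Python sketch follows; the function names and both directory arguments are hypothetical, and `stock_dir` is assumed to hold the default bz* files extracted from the official download.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def differing_images(boot_dir, stock_dir):
    """List the bz* files in stock_dir that are missing from boot_dir
    or whose content differs from the copy in boot_dir."""
    boot_dir, stock_dir = Path(boot_dir), Path(stock_dir)
    changed = []
    for stock in sorted(stock_dir.glob("bz*")):
        if not stock.is_file():
            continue
        current = boot_dir / stock.name
        if not current.is_file() or sha256_of(current) != sha256_of(stock):
            changed.append(stock.name)
    return changed
```

An empty list means every stock image matches what is on the boot device, i.e. you are back on stock.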

 

3 minutes ago, snolly said:

It's my excessive cautiousness that kicked in :)

No problem, just asking. :)

Link to post

Does the AMD 6000 series work only on Big Sur? If not, I'd sell my 5700XT; this AMD reset bug has almost driven me mad. I need to reboot or restart my Mac VM without having to restart the whole server. I have 10 other VMs running at the same time with services that I cannot stop, but at the same time I need to work with Java 8, and Big Sur is not working with it.

I don't get what the problem is with AMD and why they refuse to help solve this issue.

Edited by mSedek
Link to post
4 minutes ago, mSedek said:

Does the AMD 6000 series work only on Big Sur? If not, I'd sell my 5700XT; this AMD reset bug has almost driven me mad. I need to reboot or restart my Mac VM without having to restart the whole server. I have 10 other VMs running at the same time with services that I cannot stop, but at the same time I need to work with Java 8, and Big Sur is not working with it.

I don't get what the problem is with AMD and why they refuse to help solve this issue.

First of all, they did not refuse. Instead, gnif and AMD had a few conversations, and AMD talked with others on Reddit etc. as well.

Secondly, they fixed the issue on Big Navi cards... So they definitely heard what the customers said.

I can't tell you if or when Big Sur will run with 6000 cards. @derpuma told me that there is already something in Big Sur that belongs to Big Navi.

Nevertheless, my 5700xt runs just fine in Big Sur...

Link to post
21 hours ago, giganode said:

First of all, they did not refuse. Instead, gnif and AMD had a few conversations, and AMD talked with others on Reddit etc. as well.

Secondly, they fixed the issue on Big Navi cards... So they definitely heard what the customers said.

I can't tell you if or when Big Sur will run with 6000 cards. @derpuma told me that there is already something in Big Sur that belongs to Big Navi.

Nevertheless, my 5700xt runs just fine in Big Sur...

My 5700xt runs fine in Big Sur, but Big Sur DOES NOT work with Java 8. I am a Java developer, and no Java 8 = no Big Sur for me... So I need to be able to reboot my Mac VM seamlessly. I solve nothing by selling my 5700XT to get a 6700/6800 and then not being able to run Java 8 in Big Sur... My only option for now is to pray for the reset bug to go away.

And no, AMD did not help by "fixing" the bug in the 6000 series. They should provide a real solution for ALL the affected card series (400/500/5000).

Edited by mSedek
Link to post
6 minutes ago, mSedek said:

So I need to be able to reboot my Mac VM seamlessly.

Then follow the guide for the reset patch for your 5700XT, install it on your server, and you should be able to reboot your VMs seamlessly...

7 minutes ago, mSedek said:

And no, AMD did not help by "fixing" the bug in the 6000 series.

The bug is fixed in the 6000 series cards; with these cards you can reboot VMs seamlessly.

8 minutes ago, mSedek said:

They should provide a real solution for ALL the affected card series

Yes, they collaborated with gnif to make his patch possible, and this is the current solution to the reset bug.

Link to post
2 hours ago, snolly said:

 

what are the big navi cards and what's the fix?

 

Big Navi is the RX 6000 series. The fix is simply that Big Navi follows the PCIe specification for FLR (Function Level Reset).

 

1 hour ago, mSedek said:

My 5700xt runs fine in Big Sur, but Big Sur DOES NOT work with Java 8. I am a Java developer, and no Java 8 = no Big Sur for me... So I need to be able to reboot my Mac VM seamlessly. I solve nothing by selling my 5700XT to get a 6700/6800 and then not being able to run Java 8 in Big Sur... My only option for now is to pray for the reset bug to go away.

And no, AMD did not help by "fixing" the bug in the 6000 series. They should provide a real solution for ALL the affected card series (400/500/5000).

Language barrier??? In order to boot VMs with a 5700xt seamlessly, include vendor-reset in the Unraid build.

If you need Java 8 and Big Sur does not work with that, choose another OS for your work.

AND YES... they helped! They talked a lot with customers and developers.

Polaris cards work with vendor-reset, as do the Navi cards.

Link to post
4 minutes ago, giganode said:

 

Big Navi is the RX 6000 series. The fix is simply that Big Navi follows the PCIe specification for FLR (Function Level Reset).

Language barrier??? In order to boot VMs with a 5700xt seamlessly, include vendor-reset in the Unraid build.

If you need Java 8 and Big Sur does not work with that, choose another OS for your work.

AND YES... they helped! They talked a lot with customers and developers.

Polaris cards work with vendor-reset, as do the Navi cards.

The "other OS" I use is Catalina, but the 6000 series does not work in Catalina.

Link to post
2 minutes ago, giganode said:

 

That has nothing to do with AMD!!!!!!!

Man, it has everything to do with AMD. Why would the 6000 series work out of the box with no reset bug while the 5000 series is plagued by it??? Who's responsible for a patch??? Is it a firmware-level or a driver-level update?? None of us is an AMD hardware engineer or AMD driver developer.

Edited by mSedek
Link to post
4 minutes ago, mSedek said:

Man, it has everything to do with AMD. Why would the 6000 series work out of the box with no reset bug while the 5000 series is plagued by it??? Who's responsible for a patch??? Is it a firmware-level or a driver-level update?? None of us is an AMD hardware engineer or AMD driver developer.

I don't want to argue with you anymore. You simply don't want to understand it.

I'm out...

Link to post
7 minutes ago, mSedek said:

Man, it has everything to do with AMD.

Basically yes and no.

May I ask if you have already sold your 5700XT? If you haven't sold it yet, simply follow the instructions and install the patch, and you can shut down/restart/force stop your VMs and the card will reset just fine. ;)

8 minutes ago, mSedek said:

Why would the 6000 series work out of the box with no reset bug

That's because the 6000 series is not affected by the reset bug. If a 6000 card is not working on OSX, that is a whole different story... I think the card that you are using (5700XT) is made for a PC and not for OSX, or does it have a sticker on it saying that it is compatible with OSX? Just because it works on OSX doesn't mean that it works completely flawlessly, and it is a completely different story when you are virtualizing OSX (first of all because the Apple EULA says that you are allowed to virtualize OSX, but only if the host is also on OSX...).

12 minutes ago, mSedek said:

Who's responsible for a patch???

That depends on what you are using the card for. Consumer cards are, strictly speaking, not designed for use in a VM; they are designed for desktop use (you can use one in a VM, but who says that you won't run into trouble? That's basically the same as what I wrote above about OSX and the 5700XT).

If the manufacturer says "we won't make a patch for that because the card is not intended for that use case", you are out of luck. Then awesome people like gnif come around, and even AMD collaborates with him to try to solve the issue.

There are many other things to consider...

17 minutes ago, mSedek said:

None of us is an AMD hardware engineer or AMD driver developer.

Exactly, and that's why I think you should voice your frustration somewhere on the AMD forums. ;)

Link to post
1 hour ago, ich777 said:

Basically yes and no.

May I ask if you have already sold your 5700XT? If you haven't sold it yet, simply follow the instructions and install the patch, and you can shut down/restart/force stop your VMs and the card will reset just fine. ;)

That's because the 6000 series is not affected by the reset bug. If a 6000 card is not working on OSX, that is a whole different story... I think the card that you are using (5700XT) is made for a PC and not for OSX, or does it have a sticker on it saying that it is compatible with OSX? Just because it works on OSX doesn't mean that it works completely flawlessly, and it is a completely different story when you are virtualizing OSX (first of all because the Apple EULA says that you are allowed to virtualize OSX, but only if the host is also on OSX...).

That depends on what you are using the card for. Consumer cards are, strictly speaking, not designed for use in a VM; they are designed for desktop use (you can use one in a VM, but who says that you won't run into trouble? That's basically the same as what I wrote above about OSX and the 5700XT).

If the manufacturer says "we won't make a patch for that because the card is not intended for that use case", you are out of luck. Then awesome people like gnif come around, and even AMD collaborates with him to try to solve the issue.

There are many other things to consider...

Exactly, and that's why I think you should voice your frustration somewhere on the AMD forums. ;)

No, I haven't sold my 5700XT yet. My issues are not related to the card or macOS; my 5700XT works perfectly in Catalina and Big Sur. The problem is that at the moment Big Sur has no support for Java 8, and if I sell my 5700XT to get a 6800 (which is readily available here in my country) just to have a reset-bug-free experience, then I lose my ability to develop in macOS, as the 6000 series ONLY works in Big Sur. Either way I'm screwed lol.

Gonna try the patch and see how it goes. I run dual monitors, one with the 5700XT and macOS and one with Ubuntu and an NVIDIA 2070 SUPER (which works FLAWLESSLY). I hope the patch does not cause any issues with my setup.

Link to post
5 minutes ago, mSedek said:

I hope the patch does not cause any issues with my setup.

If it does you can always revert, but it should not interfere with your Nvidia card, why should it... :D

Link to post
43 minutes ago, mSedek said:

No, I haven't sold my 5700XT yet. My issues are not related to the card or macOS; my 5700XT works perfectly in Catalina and Big Sur. The problem is that at the moment Big Sur has no support for Java 8, and if I sell my 5700XT to get a 6800 (which is readily available here in my country) just to have a reset-bug-free experience, then I lose my ability to develop in macOS, as the 6000 series ONLY works in Big Sur. Either way I'm screwed lol.

Gonna try the patch and see how it goes. I run dual monitors, one with the 5700XT and macOS and one with Ubuntu and an NVIDIA 2070 SUPER (which works FLAWLESSLY). I hope the patch does not cause any issues with my setup.

 

In case of Java8... did you try this?:

 

https://code2care.org/howto/install-java-on-macos

Edited by giganode
Link to post
2 minutes ago, giganode said:

 

Before you start complaining later that your system does not boot or anything like that: if you have the Nvidia plugin installed, remove it. Compile with vendor-reset and Nvidia support combined.

 

In case of Java8... did you try this?:

 

https://code2care.org/howto/install-java-on-macos

Yes, Java 8 does install, but there's something about the environment and no framework recognizes it.
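One way to narrow down an environment problem like this is to check which Java the shell actually resolves. `java -version` prints a line such as `java version "1.8.0_281"` (on stderr), and the small Python sketch below pulls the major version out of that text. The function names are made up for illustration, and this only reports what is first on the PATH; it says nothing about why a particular framework ignores it.

```python
import re
import subprocess
from typing import Optional


def java_major_version(version_output: str) -> Optional[int]:
    """Extract the major Java version from `java -version` output.

    Legacy version strings such as "1.8.0_281" map to 8; newer strings
    such as "11.0.10" or "17" map to their first component. Returns
    None if no version string is found.
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first = int(match.group(1))
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))
    return first


def detect_java_major() -> Optional[int]:
    """Run `java -version` from the current PATH and parse its output."""
    try:
        result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    except FileNotFoundError:
        return None  # no java executable on the PATH
    # The version banner normally goes to stderr, so check both streams.
    return java_major_version(result.stderr + result.stdout)
```

If `detect_java_major()` returns something other than 8 inside the VM, the Java 8 install is probably not the one the environment picks up.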

Link to post
