[KERNEL]Unraid kernel update 5.10rc4 - zenpower|it87|corefreq|amdgpu|jmb575|dvb|r8125|openrgb|reset AMD GPU|zfs|dax|exfat|ntfs3|nvidia driver



Okay, it's working :)

 

Both the system and Plex transcoding :)

The error was down to me doing something very stupid, as I said. During testing I left another pen drive in the PC with Unraid, and it contained a previous version of your kernel. The funny thing is that the pen drive wasn't usable anyway (non-unique GUID), but the licence was fine, so I didn't notice (it was a very small pen drive, in my defense :D). It seems the system was using both of them :P

Anyway, thank you very much :)
P.S. If you would consider providing a PayPal or Bitcoin address, I would be happy to buy you a beer :)

Link to comment

@thor2002ro I don't know if you have seen my thread, but if you write a script for my Unraid-Kernel-Helper you can easily integrate everything you want and create custom images with all the tools installed, without the need to edit the 'go' file. ;)

 

You can even integrate newer or older drivers, ZFS, iSCSI, a different kernel, ...

 

 

Link to comment

Has anyone noticed any problems with GPU passthrough?
I do things in this order (roughly the sketch below):
1. The server starts with Plex (configured for HW transcoding).
2. If I want to game, I start the script to switch over.
3. The script turns the Plex container off (nvidia-smi then shows no processes).
4. The script waits 60 s and turns the Windows VM on.
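
For context, a rough sketch of what the switch script does (the container name, VM name, and wait time here are placeholders, not my exact config):

#!/bin/bash
# switch the GPU from Plex transcoding to the gaming VM
docker stop plex              # free the GPU from the transcoding container
sleep 60                      # give the driver time to release the device
nvidia-smi                    # sanity check: no processes should be listed
virsh start "Windows 10"      # hand the GPU to the passthrough VM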

 

The thing is, it only works sometimes. In other cases the VM doesn't start and the VMs page of the web UI locks up. From the SpaceInvader One videos I learned that this behavior is caused by the GPU being busy, but all containers using the GPU are off and nvidia-smi doesn't show any processes. If it fails, then after the VM start attempt nvidia-smi shows:

 

Unable to determine the device handle for GPU 00000000:2F:00.0: Unknown Error

 

The weirdest thing is that after such a lock I have to restart the server to get it working again, and then something like this happens:

1. All other VMs turn off.

2. The web UI is no longer available.

3. The Windows VM turns on for a moment.

4. The server shuts down.

 

I did try to reset the GPU between shutting down Plex and starting the VM, though:

1. fuser -v /dev/nvidia*    <- didn't help
2. nvidia-smi -i 00000000:2F:00.0 -r    <- didn't work, as this is the primary GPU
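
One more approach I've seen suggested is to drop the card from the PCI bus and rescan (bus address taken from the error above, lowercased for sysfs; whether this actually resets the card depends on its reset support, so treat it as a sketch):

echo 1 > /sys/bus/pci/devices/0000:2f:00.0/remove   # detach the GPU from the PCI bus
echo 1 > /sys/bus/pci/rescan                        # let the kernel re-enumerate it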

 

 

 

Thanks for any help :) 


Edit: It seems that the VM randomly fails to start even if Plex (or any other NVIDIA container) wasn't running at all. The only time it reliably works is when the VM is set to autostart. A weird thing I noticed: when the VM has autostarted and then been stopped, the power state of the card is P0. During transcoding it shows P2.

After a server start, or after transcoding, the state is P8 or P0. If it's P0 the VM can be started; if it's P8, it doesn't work.

It's just a guess, though; I'm not sure if it's related (maybe it's your power-saving mechanism?) :(

 

Also, I noticed that at boot time persistence mode is enabled, but only for one card (not the one I have problems with).
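
For reference, this is how I check both the power state and persistence mode in one go (standard nvidia-smi query fields):

nvidia-smi --query-gpu=index,name,pstate,persistence_mode --format=csv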

Edited by dunioo
Link to comment

Sounds weird.... I don't know about Plex... I use Jellyfin and Emby....

 

I have a GTX 970 (primary) and a Vega 56

 

I use the NVIDIA card for transcoding and can start a VM with it, no issue... I don't need to shut down any Docker container.... or prepare it in any way.... and after the VM is stopped, the driver resumes....

When a VM is started, the Docker container transcodes with the CPU.... when the GPU is available again, it switches back to the GPU on the next transcode.

When transcoding, the power state should usually be P2 .... and P0 when idle in power saving....

Also, starting and stopping a VM with the NVIDIA card usually resets the NVIDIA persistence mode.

 

This is tested extensively, since I use it every day: NVIDIA for a Linux VM and the Vega for the gaming VM.

I have never once needed to reset the NVIDIA card....

 

14 hours ago, dunioo said:

After a server start, or after transcoding, the state is P8 or P0. If it's P0 the VM can be started; if it's P8, it doesn't work.

It's just a guess, though; I'm not sure if it's related (maybe it's your power-saving mechanism?) :(

 

Also, I noticed that at boot time persistence mode is enabled, but only for one card (not the one I have problems with).

Never seen the P8 state.... when transcoding....

Quote

The GPU performance state APIs are used to get and set various performance levels on a per-GPU basis. P-States are GPU active/executing performance capability and power consumption states.

P-States range from P0 to P15, with P0 being the highest performance/power state, and P15 being the lowest performance/power state. Each P-State maps to a performance level. Not all P-States are available on a given system. The definition of each P-States are currently as follows:

 

P0/P1 - Maximum 3D performance

P2/P3 - Balanced 3D performance-power

P8 - Basic HD video playback

P10 - DVD playback

P12 - Minimum idle power consumption

Seems like it's playing video .....

the "nvidia-smi -pm 1" command the nvidia install is using  should enable  persistence mode for all cards.... as mentioned until a VM uses it then it resets ....

I will investigate the nvidia-persistenced daemon, maybe it will do a better job of keeping the cards' persistence mode enabled....
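
(A minimal sketch of running it, assuming the stock daemon is available on the image:)

nvidia-persistenced --persistence-mode   # daemonizes and enables persistence mode on all GPUs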

 

You could try disabling autostart on any Docker container that uses the card and see if stopping and starting the passthrough VM is fine .... depending on the platform you are using, there might be IOMMU issues....

 

Don't forget to look at the system log and dmesg on the command line.
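
(Something along these lines; the grep pattern is just a suggestion:)

dmesg | grep -iE 'nvidia|vfio|iommu'   # kernel messages around the failed VM start
tail -n 100 /var/log/syslog            # Unraid system log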

 

Link to comment

I have played around with this, and for now I think the power state is not actually related to the problem. It does behave weirdly, though :)
Turning off persistence mode sets it back to P0, but it doesn't make the VM start.

 

After looking at dmesg, I now think it's related to OpenRGB (which I got working thanks to your modifications). I disabled the modprobe line and the OpenRGB Docker container. So far it's working, but as this was a pretty random problem, I'm not 100% sure.
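
(The module names below are hypothetical, just the sort of i2c lines an OpenRGB setup typically loads from the go file; what I actually commented out may differ:)

# modprobe i2c-dev      # hypothetical: userspace i2c access for OpenRGB
# modprobe i2c-piix4    # hypothetical: SMBus driver on AMD boards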

 

Anyway, I will let you know if anything changes - but for now, sorry for bothering you, and thanks for the help :)
 

Link to comment

Hi, first of all thank you for this kernel.

 

I was excited to use it because of the new AMD reset support it includes. However, I seem to be having issues with it. I've included the line in /config/go, but when booting into my previously working Windows 10 VM, it kept rebooting. I've now done a fresh install of the VM, and all is well until I try to install the Radeon drivers, at which point the VM reboots and is back in a boot loop. I am using this on 6.8.3 with an AMD 5700 XT and nothing else included regarding AMD reset fixes.
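
(For context, the go-file addition in question is just the module load; this is a guess at the exact wording, matching gnif's vendor-reset project:)

# appended to /config/go:
modprobe vendor-reset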

 

Any ideas?

 

Edit: Looks like this is a known issue with non-reference 5700 XTs, according to this GitHub issue: https://github.com/gnif/vendor-reset/issues/3

Edited by cobhc
Link to comment
On 11/22/2020 at 8:46 AM, thor2002ro said:

No thanks, you are modifying rootfs and complicating things.... my install leaves rootfs intact....

You don't have to modify rootfs, it's just a starting point. :)

I don't think it complicates things more: a user contacted me about vendor-reset and I implemented it in a test image, and you don't even have to do the modprobe, everything works flawlessly (except that vendor-reset doesn't work quite right with a 5700 XT and kernel v5.9+ for now). ;)

 

No problem if you don't use it, it was just a suggestion. ;)

 

Quote

I will investigate the nvidia-persistence daemon maybe it will better keep the cards persistence mode enabled....

Note that the persistence mode/daemon will be deprecated by NVIDIA in the future.

Link to comment
On 11/30/2020 at 8:32 AM, ich777 said:

You don't have to modify rootfs, it's just a starting point. :)

I don't think it complicates things more: a user contacted me about vendor-reset and I implemented it in a test image, and you don't even have to do the modprobe, everything works flawlessly (except that vendor-reset doesn't work quite right with a 5700 XT and kernel v5.9+ for now). ;)

 

No problem if you don't use it, it was just a suggestion. ;)

 

Note that the persistence mode/daemon will be deprecated by NVIDIA in the future.

So the 5700 XT problem is only on 5.9+? If so, can vendor-reset be implemented on an older kernel version? At the moment I only need that fix, and nothing else, over the base kernel in Unraid 6.8.3.

Link to comment
  • 2 weeks later...
On 8/28/2020 at 2:19 PM, thor2002ro said:

I'll add the latest git ZFS in the next version.... probably today's release... 5.8.5

 

as for a build script.... why do you need one, it's just a few easy commands to build the kernel:

copy the current config to .config

then make -j$(nproc)

then make INSTALL_MOD_PATH=../modules modules_install to install the modules to a separate directory

then mksquashfs modules/lib/modules/*/ bzmodules -keep-as-directory -noappend -root-owned -no-xattrs to make bzmodules

and bzimage is in arch/x86/boot/

 

this is all run on a Linux system, not on Unraid itself....

 

but my advice, if you are not familiar with kernel building and configuration, is to not do it and to do some research first; you can get some funky results... your choice :)

 

5.10.0rc4-20201118

I build it this way (my consolidated version is sketched below), but the file sizes of bzimage and bzmodules come out different from yours: bzimage 6.17 MB, bzmodules 15 MB.

Sorry about my English.
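
(A sketch of the quoted steps as one script; the source directory and config path are hypothetical:)

#!/bin/bash
cd linux-5.10-rc4                                    # hypothetical kernel source directory
cp /path/to/current/config .config                   # "copy current config to .config"
make -j$(nproc)                                      # build the kernel
make INSTALL_MOD_PATH=../modules modules_install     # modules land one level up
cd ..
mksquashfs modules/lib/modules/*/ bzmodules -keep-as-directory -noappend -root-owned -no-xattrs
cp linux-5.10-rc4/arch/x86/boot/bzImage bzimage      # the kernel image Unraid boots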

Link to comment