
feraay

Members
  • Posts

    51
  • Joined

  • Last visited

Report Comments posted by feraay

  1. In my opinion, this thread is not really for Alder Lake users.

     

    We have at least three different types of problems here:

    some older CPUs where the system hang occurs every few days,

    a problem with Plex itself and Intel hardware,

    and Alder Lake, which is definitely not officially supported by the 5.15 kernel currently in use.

    I think it would be a lot easier if we stopped mixing these problems together just because the outcome is the same.

    • Upvote 1
  2. 10 hours ago, omygoodness said:

    @feraay Yep. If you need any help just let me know. 


    Just wondering: so you are the only person on the planet for whom the Alder Lake iGPU works for Plex transcoding with a 5.15 kernel?

    Maybe you can explain your steps here:

     or here:

     

    https://forums.unraid.net/bug-reports/prereleases/69x-610x-intel-i915-module-causing-system-hangs-with-no-report-in-syslog-r1674/page/4/?tab=comments#comment-18072

     

     

    How did you manage to get Alder Lake iGPU driver support with kernel 5.15?
     

    thanks. 

     

  3. 2 hours ago, limetech said:

    Does require 5.16+ Linux kernel?


    Yes, it seems so.

     

    2 hours ago, flyize said:

    While that may help, I'm using Plex and it seems this may be the actual issue.

     

    https://forums.plex.tv/t/plex-media-server-on-ubuntu-21-10-with-intel-12th-gen-alder-lake/768123/12


    As I understand it, the problem is only related to HDR tone mapping and not to transcoding in general.

    So we need to wait for a 5.16+ kernel. Hopefully that will work. I read somewhere that 5.17 also includes fixes for Gen9 and newer iGPUs. But maybe 5.16 will already fix it.

     

    https://www.phoronix.com/scan.php?page=news_item&px=Linux-5.17-More-Intel-TTM-Prep

     

    Highlights from this week's pull include: 

    - A fix for GPU hangs caused by certain media and OpenGL workloads that were able to hang Skylake/Gen9 hardware and newer. 

     

    Idk if it’s relevant for transcoding.

     

    @limetech Does 5.16+ need to be LTS before we can give it a try in an RC version?

    • Like 1
  4. 18 hours ago, flyize said:

    Damn. Oh well, I guess I could try to remove GPU Top and Statistics. But I don't have much hope right now for Alder Lake. :(

    It doesn't matter.

    Coffee Lake is 8th Gen, from 2017. Just because the result is the same for some people (random crashes) doesn't mean the reason is the same.

    You can see in some of the logs posted here that the Alder Lake iGPU hangs, and that's clearly the cause of the crashes with 12th Gen CPUs.

    Of course there are 1000 other reasons for getting the same result (a system crash), but there is no connection.

    Hope you get my point.

    I have tried everything that was suggested here. Nothing works for Alder Lake; we need to wait for the 5.17 kernel. Maybe we will have luck with 5.16, but I don't think so. See my last post in the Alder Lake thread.

     

     

    You can see the difference in the uptime. With my 12900K and HW transcoding in Plex I can crash my server within minutes. The guys here with other hardware are talking about days until it happens. So it's clearly not the same reason, just the same result.

  5. Are we sure that the 5.16 kernel will solve our problems?

    https://tomthegreat.com/blog/setting-up-ubuntu-20-04-lts-for-plex-with-intel-gen-12-cpu/amp/

    He is using kernel 5.15 on a bare-metal Ubuntu installation with Plex, and it sounds like he is not running into this transcoder problem.

    So how solid is the assumption that 5.16 will solve the problem with GPU hangs on newer Intel iGPUs?

  6. Just a small update:

    I uninstalled Intel GPU TOP,

    created just an empty i915.conf in /boot/config/modprobe.d,

    and set i915.force_probe=4680 i915.enable_guc=2 in the syslinux config.

    Two transcodes have been running for 15 minutes and there is no GPU HANG in the logs so far.

    With Intel GPU TOP installed the error appeared right away.

    Maybe it's just luck, I am not sure.
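To confirm that options such as i915.force_probe and i915.enable_guc actually reached the running kernel, the boot command line can be inspected. The following is only a minimal sketch: the helper name `i915_params` is hypothetical, and it assumes the string comes from `/proc/cmdline`, which Linux exposes but which is not mentioned in the post above. The sample string is made up for illustration.

```python
def i915_params(cmdline: str) -> dict[str, str]:
    """Return the i915.* options found in a kernel command line string."""
    params = {}
    for token in cmdline.split():
        # Only tokens of the form i915.<name>=<value> are of interest here.
        if token.startswith("i915.") and "=" in token:
            key, value = token.split("=", 1)
            params[key[len("i915."):]] = value
    return params


# Illustrative command line; on a real system, read /proc/cmdline instead.
sample_cmdline = "initrd=/bzroot i915.force_probe=4680 i915.enable_guc=2"
print(i915_params(sample_cmdline))
```

If the returned dictionary is empty, the options were not passed at boot, whatever the config file says.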

     

    ok it happened 

    Mar 15 16:11:42 Mycroft kernel: i915 0000:00:02.0: [drm] Resetting vcs1 for preemption time out
    Mar 15 16:11:42 Mycroft kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 12:4:28fffffd, in Plex Transcoder [17657]
    Mar 15 16:11:53 Mycroft kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 12:4:28fffffd, in Plex Transcoder [17657]
    Mar 15 16:11:53 Mycroft kernel: i915 0000:00:02.0: [drm] Resetting vcs1 for stopped heartbeat on vcs1
    Mar 15 16:11:53 Mycroft kernel: i915 0000:00:02.0: [drm] Resetting chip for stopped heartbeat on vcs1
    Mar 15 16:11:53 Mycroft kernel: [drm:__uc_sanitize [i915]] *ERROR* Failed to reset GuC, ret = -110
    Mar 15 16:11:53 Mycroft kernel: i915 0000:00:02.0: [drm] *ERROR* Failed to reset chip
    Mar 15 16:11:53 Mycroft kernel: i915 0000:00:02.0: [drm:add_taint_for_CI [i915]] CI tainted:0x9 by intel_gt_reset+0x276/0x29b [i915]
    Mar 15 16:11:53 Mycroft kernel: [drm:__uc_sanitize [i915]] *ERROR* Failed to reset GuC, ret = -110
    Mar 15 16:11:53 Mycroft kernel: i915 0000:00:02.0: [drm] Plex Transcoder[17657] context reset due to GPU hang
    Mar 15 16:11:53 Mycroft kernel: i915 0000:00:02.0: [drm] *ERROR* rcs0 TLB invalidation did not complete in 4ms!
    Mar 15 16:11:53 Mycroft kernel: i915 0000:00:02.0: [drm] *ERROR* bcs0 TLB invalidation did not complete in 4ms!
    Mar 15 16:11:53 Mycroft kernel: i915 0000:00:02.0: [drm] *ERROR* rcs0 TLB invalidation did not complete in 4ms!
    Mar 15 16:11:53 Mycroft kernel: i915 0000:00:02.0: [drm] *ERROR* bcs0 TLB invalidation did not complete in 4ms!
    Mar 15 16:11:58 Mycroft kernel: Fence expiration time out i915-0000:00:02.0:Plex Transcoder[17657]:7cfe!
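For comparing crashes across systems, it can help to pull only the GPU HANG entries out of a syslog like the one above. This is a rough sketch, not an official tool: the function name `find_gpu_hangs` and the regular expression are assumptions based purely on the line format shown in this post.

```python
import re

# Pattern modelled on the "GPU HANG" lines quoted above (an assumption,
# other kernel versions may format these messages differently).
HANG_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s[\d:]{8})\s+(?P<host>\S+)\s+kernel:.*"
    r"GPU HANG: ecode (?P<ecode>[0-9a-f:]+), in (?P<proc>.+?) \[(?P<pid>\d+)\]"
)


def find_gpu_hangs(lines):
    """Return (timestamp, ecode, process, pid) for every GPU HANG line."""
    hangs = []
    for line in lines:
        match = HANG_RE.search(line.strip())
        if match:
            hangs.append(
                (match["stamp"], match["ecode"], match["proc"], int(match["pid"]))
            )
    return hangs


sample = [
    "Mar 15 16:11:42 Mycroft kernel: i915 0000:00:02.0: [drm] Resetting vcs1 for preemption time out",
    "Mar 15 16:11:42 Mycroft kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 12:4:28fffffd, in Plex Transcoder [17657]",
]
print(find_gpu_hangs(sample))
```

The time between boot and the first match gives a simple number for the "minutes versus days" comparison between different hardware.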

     

    One transcode died, but the server is still responsive and didn't crash, and the second transcode is still running.

    The second transcode also crashed and the Plex docker crashed, but the server is still responsive.

    The WebGUI was still accessible, but the server did not respond anymore.

    I will go with CPU transcoding and test again with kernel 5.16.

     

    • Like 1
  7. Had my first crash with 6.10 RC3.

    So on the advice of @Ich777 I did the following:

    Plugged an HDMI dummy into the onboard HDMI port.

    Removed the chmod -R 777 /dev/dri from the go file.

    Installed Intel GPU TOP.

    Created the /boot/config/modprobe.d/i915.conf file.

     

    OK, so after 3 crashes in a row and a damaged Plex config I can also see this in the logs:

    Mar 11 15:16:13 Mycroft kernel: [drm:__uc_sanitize [i915]] *ERROR* Failed to reset GuC, ret = -110 
    Mar 11 15:16:13 Mycroft kernel: i915 0000:00:02.0: [drm] *ERROR* Failed to reset chip 
    Mar 11 15:16:13 Mycroft kernel: i915 0000:00:02.0: [drm:add_taint_for_CI [i915]] CI tainted:0x9 by intel_gt_reset+0x276/0x29b [i915] 
    Mar 11 15:16:13 Mycroft kernel: [drm:__uc_sanitize [i915]] *ERROR* Failed to reset GuC, ret = -110

     

    So I added i915.enable_guc=0, we will see.

    OK, i915.enable_guc=0 also resulted in a crash. I will change it to 2 and test again.

    I was not able to get a log with guc=0, the crash was faster ^^

     

     

    with i915.enable_guc=2:

     

    Mar 11 16:08:33 Mycroft kernel: i915 0000:00:02.0: [drm] Resetting vcs0 for preemption time out
    Mar 11 16:08:33 Mycroft kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 12:4:28fffffd, in Plex Transcoder [18057]
    Mar 11 16:08:44 Mycroft kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 12:4:28fffffd, in Plex Transcoder [18057]
    Mar 11 16:08:44 Mycroft kernel: i915 0000:00:02.0: [drm] Resetting vcs0 for stopped heartbeat on vcs0
    Mar 11 16:08:44 Mycroft kernel: i915 0000:00:02.0: [drm] Resetting chip for stopped heartbeat on vcs0
    Mar 11 16:08:45 Mycroft kernel: [drm:__uc_sanitize [i915]] *ERROR* Failed to reset GuC, ret = -110
    Mar 11 16:08:45 Mycroft kernel: i915 0000:00:02.0: [drm] *ERROR* Failed to reset chip
    Mar 11 16:08:45 Mycroft kernel: i915 0000:00:02.0: [drm:add_taint_for_CI [i915]] CI tainted:0x9 by intel_gt_reset+0x276/0x29b [i915]
    Mar 11 16:08:45 Mycroft kernel: [drm:__uc_sanitize [i915]] *ERROR* Failed to reset GuC, ret = -110
    Mar 11 16:08:45 Mycroft kernel: i915 0000:00:02.0: [drm] Plex Transcoder[18057] context reset due to GPU hang
    Mar 11 16:08:45 Mycroft kernel: i915 0000:00:02.0: [drm] *ERROR* rcs0 TLB invalidation did not complete in 4ms!
    Mar 11 16:08:45 Mycroft kernel: i915 0000:00:02.0: [drm] *ERROR* bcs0 TLB invalidation did not complete in 4ms!
    Mar 11 16:08:45 Mycroft kernel: i915 0000:00:02.0: [drm] *ERROR* rcs0 TLB invalidation did not complete in 4ms!
    Mar 11 16:08:45 Mycroft kernel: i915 0000:00:02.0: [drm] *ERROR* bcs0 TLB invalidation did not complete in 4ms!
    Mar 11 16:08:45 Mycroft kernel: i915 0000:00:02.0: [drm] *ERROR* rcs0 TLB invalidation did not complete in 4ms!
    Mar 11 16:08:45 Mycroft kernel: i915 0000:00:02.0: [drm] *ERROR* bcs0 TLB invalidation did not complete in 4ms!
    Mar 11 16:08:45 Mycroft kernel: i915 0000:00:02.0: [drm] *ERROR* rcs0 TLB invalidation did not complete in 4ms!
    Mar 11 16:08:45 Mycroft kernel: i915 0000:00:02.0: [drm] *ERROR* bcs0 TLB invalidation did not complete in 4ms!
    Mar 11 16:08:45 Mycroft kernel: i915 0000:00:02.0: [drm] *ERROR* rcs0 TLB invalidation did not complete in 4ms!
    Mar 11 16:08:45 Mycroft kernel: i915 0000:00:02.0: [drm] *ERROR* bcs0 TLB invalidation did not complete in 4ms!

     

    what about 3?

    GuC submission and power management is enabled by setting the kernel module parameter: i915.enable_guc=1

    HuC authentication only is enabled by setting the kernel module parameter: i915.enable_guc=2

    Combine for both features together: i915.enable_guc=3
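The three values listed above behave like a two-bit mask, where 1 and 2 are separate features and 3 is both together. A small sketch of that reading follows; the name `describe_enable_guc` is hypothetical, and it only covers the values 0 to 3 that appear in this post.

```python
GUC_SUBMISSION = 0b01  # value 1 in the list above
HUC_AUTH = 0b10        # value 2 in the list above


def describe_enable_guc(value: int) -> list[str]:
    """Translate an i915.enable_guc value (0-3) into the features it enables."""
    if not 0 <= value <= 3:
        # Other values are outside what this post covers.
        raise ValueError(f"unsupported enable_guc value: {value}")
    features = []
    if value & GUC_SUBMISSION:
        features.append("GuC submission and power management")
    if value & HUC_AUTH:
        features.append("HuC authentication")
    return features or ["disabled"]


for value in range(4):
    print(value, describe_enable_guc(value))
```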

     

     

    • Like 1
  8. I am on Alder Lake, so I just use i915.force_probe=4680. I deleted the i915 file under modprobe.d.

    Works 👍

    My VM with Nvidia passthrough threw an error on start: internal error: PCI host devices must use 'pci' or 'unassigned' address type.

    I just recreated the VM and it boots without an error.

    So for now it looks good.

  9. Do you use a vBIOS file?

    Two weeks ago I was also struggling with Nvidia passthrough. I changed the Unraid boot from UEFI to legacy, set up the VMs completely from scratch, etc.

    I always got stuck at the Tiano BIOS logo. It seems the GPU output froze during the boot of the VM.

    So I started from scratch with Unraid booting in legacy mode and the onboard GPU enabled (on MSI boards it's called IGPU Multi-Monitor, I guess). I installed everything with VNC as the first graphics device, the GPU as the second, and a vBIOS file. I plugged the monitor into HDMI because DP only shows a picture once Windows has booted.

  10. Same for me.

    6.10 RC2

    Intel Core i9 12900K.

    Intel GPU Top installed and iGPU passed through to the Plex container.

    Sometimes the whole system is unresponsive via HTTP and SSH. And sometimes the transcoding handler just doesn't stop and uses 100 percent CPU on 12 threads; the WebGUI still works then, but that's all.

    When I do a fresh boot and test transcoding, nothing happens, it just runs as expected.

    After a day or two the server becomes completely unresponsive.

    I have /boot/config/modprobe.d/i915.conf with the content blacklist i915, and Intel GPU Top installed.

    In the go file I have just

    chmod -R 777 /dev/dri

    It seems I am not using i915.force_probe=4680, or I'm just too stupid to find it.

    So I am not really sure if my settings are correct, to be honest.
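One way to double-check what a file like /boot/config/modprobe.d/i915.conf really contains is to list its blacklist and options directives. This is only a sketch under assumptions: the function name `parse_modprobe_conf` is hypothetical, and it treats everything after a `#` as a comment, which is a simplification.

```python
def parse_modprobe_conf(text: str) -> dict[str, list[str]]:
    """Collect 'blacklist' and 'options' directives from modprobe.d-style text."""
    result = {"blacklist": [], "options": []}
    for raw in text.splitlines():
        # Simplification: drop anything after '#' and ignore empty lines.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        parts = line.split()
        if parts[0] == "blacklist" and len(parts) >= 2:
            result["blacklist"].append(parts[1])
        elif parts[0] == "options" and len(parts) >= 3:
            result["options"].append(" ".join(parts[1:]))
    return result


# Content as described in this post; an empty file yields two empty lists.
print(parse_modprobe_conf("blacklist i915\n"))
print(parse_modprobe_conf(""))
```

A file that still says blacklist i915 and an empty file are different setups, so it is worth knowing which one is on the flash drive before comparing results with other users.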

     
