• [6.9.x - 6.11.x] intel i915 module causing system hangs with no report in syslog (not alder lake)


    Tristankin
    • Minor

    Since the 5.x kernel based releases many users have been reporting system hangs every few days once the i915 module is loaded.

    With reports from a few users detailed in the thread below, we have worked out that the issue is caused by the i915 module and persists in both the 6.9.x release and the 6.10 release candidates.


    The system does not need to be actively transcoding for the hang to occur. 6.8.3 does not have this issue on the same systems, which suggests the problem is not hardware related. Unloading the i915 module stops the hangs. Hangs are still present in 6.10.0RC2. I can provide a list of similar reports if required.
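
For anyone who wants to check this on their own system, here is a minimal sketch (not an official Unraid procedure) for seeing whether i915 is currently loaded. The helper name `module_loaded` is made up for illustration. It reads `/proc/modules`, where each line starts with the module name.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a module appears in a /proc/modules-style listing.
# The second argument defaults to the live /proc/modules; a file path can be passed instead.
module_loaded() {
    mod="$1"
    list="${2:-/proc/modules}"
    # Each line of /proc/modules begins with the module name followed by a space.
    grep -q "^${mod} " "$list" 2>/dev/null
}

if module_loaded i915; then
    echo "i915 is loaded; to unload it, run as root: modprobe -r i915"
else
    echo "i915 is not loaded"
fi
```

`modprobe -r i915` will refuse to unload the module while something is still using the GPU, so containers doing hardware transcoding would likely need to be stopped first.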




    User Feedback

    Recommended Comments



    1 hour ago, feraay said:

    are we sure that 5.16 Kernel will solve our problems? 

     

    https://tomthegreat.com/blog/setting-up-ubuntu-20-04-lts-for-plex-with-intel-gen-12-cpu/amp/

     

    He is using kernel 5.15 on a bare-metal Ubuntu installation with Plex, and it sounds like he is not running into this transcoder problem.

     

    So how solid is it that 5.16 will solve the problem with GPU hangs on newer Intel iGPUs?

    Interesting.

     

    I guess can someone confirm that we're installing these?

    sudo apt install ocl-icd-libopencl1 beignet-opencl-icd
    
    sudo apt install intel-media-va-driver-non-free libmfx1

     

    And the Intel Compute Runtime v21.49.21786 so that tone mapping would work?

    Link to comment
    1 hour ago, RogerWilco486 said:

    Just curious, why would we be installing Ubuntu packages on Unraid?

    Well of course not. But there are almost certainly Slack packages for these.

    Link to comment

    @feraay highly doubt it to be honest, given my experience with what I believe is this same issue.

     

    I don't use unraid, but I follow this thread because I have been having all the same symptoms: random hangs with no syslog output - couldn't really find any other relevant info on the web. Disabling the i915 kernel module is the only thing that's made the system stable again.

     

    I've been running 5.16 kernels the entire time, and this has been happening all throughout. I'm gonna be trying again soon with the 5.17 series, though not super hopeful.

    Link to comment

    Gosh... I had been safe on 6.8.3 for quite some time, until it kept asking me to upgrade to 6.9.2. I thought the i915 issue had been solved, and now I'm BACK to this random freeze topic. Any solution? It keeps freezing randomly during transcoding.

    I'm using an ASRock J3455-ITX with 16GB RAM.

    So far, neither

    i915.enable_guc=2

    nor

    i915.enable_guc=0

    works.

    Has anyone got it running with other parameters? Thanks.

    Link to comment
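
When comparing results for parameters like `i915.enable_guc`, it can help to confirm that the value actually reached the kernel. The sketch below assumes the parameter was passed on the kernel command line (the `module.param=value` form used in the post suggests this, but the post does not say so). The function name `cmdline_param` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the value of one kernel command-line parameter.
# Reads /proc/cmdline by default; a file path can be passed as the second argument.
cmdline_param() {
    name="$1"
    file="${2:-/proc/cmdline}"
    [ -r "$file" ] || return 0
    # Put each parameter on its own line, then print the value of the last exact match.
    tr ' ' '\n' < "$file" |
        awk -F= -v n="$name" '$1 == n { sub(/^[^=]*=/, ""); v = $0; found = 1 }
                              END { if (found) print v }'
}

echo "enable_guc on the command line: $(cmdline_param i915.enable_guc)"

# Value the loaded driver reports; reading this file may require root.
if [ -r /sys/module/i915/parameters/enable_guc ]; then
    echo "enable_guc in use: $(cat /sys/module/i915/parameters/enable_guc)"
fi
```

If the first line prints nothing, the parameter was not on the command line for the current boot, so a test run with it would not have been a valid data point.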

    I was having the hangs/crashes like others and assumed it was the use of the i915 driver. After reading another comment about the CoreFreq plugin causing issues, I removed that and since then, using the i915 driver and having HW decoding working in Plex, have not had a single crash or hiccup in over a month. Currently my server is up for over 25 days. This may help others suffering with this issue.

    Link to comment
    5 hours ago, sleepinglion251 said:

    I was having the hangs/crashes like others and assumed it was the use of the i915 driver. After reading another comment about the CoreFreq plugin causing issues, I removed that and since then, using the i915 driver and having HW decoding working in Plex, have not had a single crash or hiccup in over a month. Currently my server is up for over 25 days. This may help others suffering with this issue.

    Are you using Alder Lake?

     

    FWIW, I don't have that plugin installed.

    Link to comment
    1 hour ago, Lee Kim Tatt said:

    CoreFreq? I didn't install that, but using Intel GPU Top? Any relation between them?

    I uninstalled Intel GPU Top and GPU Statistics and my server hangs stopped.  I also uninstalled CoreFreq as a precaution, but, that was later.

    Link to comment

    Won't uninstalling Intel GPU Top plugin roll back the blacklisting of the i915 driver and keep HW transcoding from working?

    Link to comment
    1 hour ago, NightOps said:

    Won't uninstalling Intel GPU Top plugin roll back the blacklisting of the i915 driver and keep HW transcoding from working?

    I am still running 6.9.2 on that server and the iGPU is enabled via the touch method, which creates an empty i915.conf file in modprobe.d.

     

    If that file did not exist, i915 would remain blacklisted (the default in 6.9.2) and no HW transcoding would take place.

    Link to comment
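
The "touch method" mentioned above can be sketched as follows. The directory `/boot/config/modprobe.d` is an assumption here (the post only says "modprobe.d"), and the function name `enable_i915_override` is made up for illustration, so check the path against the release notes for your Unraid version before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: create an empty i915.conf so the default blacklist no longer applies.
# ASSUMPTION: the override directory is /boot/config/modprobe.d on the flash drive.
enable_i915_override() {
    dir="${1:-/boot/config/modprobe.d}"
    # touch creates the file if it is missing and leaves any existing content untouched.
    mkdir -p "$dir" && touch "$dir/i915.conf"
}

# Usage (as root), followed by a reboot:
#   enable_i915_override
```

Deleting that file again would be the way to go back to the default described in the post, where i915 stays blacklisted.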
    11 hours ago, Hoopster said:

    I uninstalled Intel GPU Top and GPU Statistics and my server hangs stopped.  I also uninstalled CoreFreq as a precaution, but, that was later.

    Are you using Alder Lake?

    Link to comment

    I am using Coffee Lake. I was able to reinstall GPU TOP and GPU Statistics plugins after removing the CoreFreq plugin and no issues with those either. 

    Link to comment
    47 minutes ago, Hoopster said:

    Coffee Lake (Xeon E-2288G)

    Damn. Oh well, I guess I could try to remove GPU Top and Statistics. But I don't have much hope right now for Alder Lake. :(

    Link to comment
    18 hours ago, flyize said:

    Damn. Oh well, I guess I could try to remove GPU Top and Statistics. But I don't have much hope right now for Alder Lake. :(

    It doesn't matter.

    Coffee Lake is 8th gen, from 2017. Just because the result is the same for some people (random crashes) doesn't mean the reason is the same.

    You can see in some logs posted here that the Alder Lake iGPU hangs, and that's clearly the cause of the crashes with 12th gen CPUs.

    Of course there are 1000 other reasons for getting the same result (a system crash), but there is no connection.

    Hope you get my point.

    I have tried everything that was suggested here. Nothing is working for Alder Lake; we need to wait for the 5.17 kernel. Maybe we'll have luck with 5.16, but I don't think so. See my last post in the Alder Lake thread.

    You can see the difference in the uptime. With my 12900K and HW transcoding in Plex, I can bring my server to a crash in minutes. The guys here with other hardware are talking about days until it happens. So clearly not the same reason, just the same result.

    Edited by feraay
    Link to comment
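
One way to tell the two cases apart is to look for i915 hang reports in the log. This is a rough sketch that assumes the driver's usual "GPU HANG" wording and a log at `/var/log/syslog`; both the default path and the function name `gpu_hang_lines` are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print log lines that mention an i915 GPU hang.
# ASSUMPTION: the log lives at /var/log/syslog unless another path is given.
gpu_hang_lines() {
    log="${1:-/var/log/syslog}"
    [ -r "$log" ] || return 0
    # Case-insensitive match on the phrase the driver normally logs for a hang.
    grep -i 'GPU HANG' "$log"
}

# Show the most recent matches, if there are any.
gpu_hang_lines | tail -n 5
```

No output is consistent with the hangs this report is about, which leave nothing in the syslog. Matches would point toward the Alder Lake style of failure described above.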

    So I have a much older CPU than most of what you are all posting. Mine is an Intel Core i5-3570 (Ivy Bridge) and I need hardware transcoding for my Plex Docker. Unraid 6.9.2 and above will not run for me. 6.8.3 is rock solid stable.

     

    If I touch the i915 file, my understanding is that I lose hardware transcoding. If I upgraded my CPU and motherboard to an 11th or 12th gen Intel, I may still have crashing issues with 6.9.2. correct? 

     

    So is my only option to wait until the i915 issue gets resolved?

     

    Thx

     

    Dale

    Link to comment
    3 hours ago, dchamb said:

    So I have a much older CPU than most of what you are all posting. Mine is an Intel Core i5-3570 (Ivy Bridge) and I need hardware transcoding for my Plex Docker. Unraid 6.9.2 and above will not run for me. 6.8.3 is rock solid stable.

     

    If I touch the i915 file, my understanding is that I lose hardware transcoding. If I upgraded my CPU and motherboard to an 11th or 12th gen Intel, I may still have crashing issues with 6.9.2. correct? 

     

    So is my only option to wait until the i915 issue gets resolved?

     

    Thx

     

    Dale


    I believe that if you follow the guide to enable QuickSync, you probably should be fine. Although your very old version of QS might not be the best...

    Link to comment

    Do you guys in here have a display connected to the iGPU or do you run it headless without any display connected?

    Link to comment

    I ran it with a small monitor attached.  I've since switched to headless, but I haven't tried re-enabling QS in Plex/Handbrake.

    Link to comment
    Just now, NightOps said:

    I ran it with a small monitor attached.  I've since switched to headless, but I haven't tried re-enabling QS in Plex/Handbrake.

    What CPU do you have?

     

    I know a few people who had issues when no monitor or HDMI dummy plug was connected to the iGPU and they tried to transcode: for example, the kernel module for the iGPU generated a kernel panic, the server crashed, …

    Link to comment
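
To answer the headless question with certainty, the kernel's view of each display connector can be read from sysfs. The sketch below assumes the standard `/sys/class/drm/card*-*/status` layout; the function name `connector_status` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list each DRM connector and whether the kernel sees a display on it.
# Reads /sys/class/drm by default; another base directory can be passed as the first argument.
connector_status() {
    base="${1:-/sys/class/drm}"
    for f in "$base"/card*-*/status; do
        # Skip the unexpanded glob when no connectors exist.
        [ -r "$f" ] || continue
        conn=$(basename "$(dirname "$f")")
        printf '%s: %s\n' "$conn" "$(cat "$f")"
    done
}

connector_status
```

A dummy plug should show up as `connected` on its connector, which is a quick way to confirm it is being detected before testing whether it changes the hang behaviour.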
    2 hours ago, ich777 said:

    What CPU do you have?

     

    I know a few people who had issues when no monitor or HDMI dummy plug was connected to the iGPU and they tried to transcode: for example, the kernel module for the iGPU generated a kernel panic, the server crashed, …

    Dummy plug did not help with my Alder Lake.

    Link to comment
    49 minutes ago, flyize said:

    Dummy plug did not help with my Alder Lake.

    Correct, Alder Lake is a different thing because it's not "properly" supported by the kernel on rc4 and older.

    Link to comment




