• [6.9.x - 6.11.x] Intel i915 module causing system hangs with no report in syslog (not Alder Lake)


    Tristankin
    • Minor

    Since the releases based on the 5.x kernel, many users have reported system hangs every few days once the i915 module is loaded.

    From the reports of a few users, detailed in the thread below, we have worked out that the hangs are caused by the i915 module and persist in both the 6.9.x release and the 6.10 release candidates.

    The system does not need to be actively transcoding for the hang to occur. 6.8.3 does not have this issue, and the issue is not hardware related. Unloading the i915 module stops the hangs. Hangs are still present in 6.10.0-rc2. I can provide a list of similar reports if required.
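
    For anyone who wants to confirm whether the module is loaded before and after unloading it, here is a minimal sketch. The helper name is_module_loaded is made up for illustration; it only assumes the usual /proc/modules layout, where each line starts with the module name followed by a space.

```shell
# Sketch only: is_module_loaded is a hypothetical helper, not an Unraid tool.
# It looks for a module name at the start of a line in /proc/modules
# (or in another file given as the second argument).
is_module_loaded() {
    grep -q "^$1 " "${2:-/proc/modules}"
}

# Example use on the server:
#   is_module_loaded i915 && echo "i915 is loaded"
#   modprobe -r i915    # unload; fails if the module is still in use
```

    The second argument is there only so the check can be tried against a saved copy of /proc/modules.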




    User Feedback

    Recommended Comments



    35 minutes ago, dlandon said:

    Remove any i915 files from the /flash/modprobe.d folder.

    Hi, this file needs to stay in place if you are using Alder Lake: either with the blacklist entry if you use Intel GPU TOP, or with the force_probe option. If you put the option in syslinux instead, you can remove the file.

    I think this will be required until Unraid is using kernel 5.16, as the UHD770 should then be supported by the native driver.

    Link to comment
    10 minutes ago, dlandon said:

    I'm not sure I understand.  The i915 entry in modprobe.d will now blacklist the driver.

    For Alder Lake you need the following option. The same was true for Rocket Lake, but the option value is different.

     

    i915.force_probe=4680 

     

    So one of the following needs to be done for the iGPU.

     

    1. Add the option to syslinux, e.g. append initrd=/bzroot,/bzroot-gui isolcpus=2-11 acpi_enforce_resources=lax i915.force_probe=4680, and Unraid will load the driver correctly.

    2. From RC2 onwards, blacklist the driver in modprobe.d and install Intel GPU TOP, which loads the driver with the correct probe.

    3. Add the option to the modprobe file: options i915 force_probe=4680

     

    Otherwise the driver will not load for UHD770 iGPU.
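
    Options 2 and 3 each come down to a single line in the i915.conf file. A minimal sketch, assuming the file lives in /boot/config/modprobe.d (the path mentioned elsewhere in this thread) and that 4680 is the right device ID for your iGPU. The function names are made up for illustration and are not part of Unraid.

```shell
# Sketch only: helper names and the default directory are assumptions.
# Pass a different directory as the first argument to try this safely.

# Option 2: blacklist i915 so that Intel GPU TOP loads the driver instead.
write_i915_blacklist() {
    i915_dir="${1:-/boot/config/modprobe.d}"
    mkdir -p "$i915_dir"
    printf 'blacklist i915\n' > "$i915_dir/i915.conf"
}

# Option 3: have the driver loaded with the force_probe option
# (device ID defaults to 4680, the value quoted above).
write_i915_force_probe() {
    i915_dir="${1:-/boot/config/modprobe.d}"
    i915_id="${2:-4680}"
    mkdir -p "$i915_dir"
    printf 'options i915 force_probe=%s\n' "$i915_id" > "$i915_dir/i915.conf"
}
```

    Each function overwrites i915.conf, so only one of the two options is active at a time. A reboot is presumably needed before the change takes effect.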

    Link to comment

    OK.  So that is a special case.  In my case I have all those entries removed and use the /dev/dri for Plex and some other docker containers and it works fine.

     

    For this issue, we should stay with the generic situation like I use.  Once we have that, we can move to the special cases.

    Link to comment
    1 hour ago, SimonF said:

    For Alder Lake you need the following option. The same was true for Rocket Lake, but the option value is different.

     

    i915.force_probe=4680 

     

    So one of the following needs to be done for the iGPU.

     

    1. Add the option to syslinux, e.g. append initrd=/bzroot,/bzroot-gui isolcpus=2-11 acpi_enforce_resources=lax i915.force_probe=4680, and Unraid will load the driver correctly.

    2. From RC2 onwards, blacklist the driver in modprobe.d and install Intel GPU TOP, which loads the driver with the correct probe.

    3. Add the option to the modprobe file: options i915 force_probe=4680

     

    Otherwise the driver will not load for UHD770 iGPU.

    I'm probably misunderstanding something... I'm running an i5-12600k with unRAID 6.10.0 rc2.  In my /etc/modprobe.d/i915.conf file I have "blacklist i915", and I have the intel gpu top installed.  If I'm understanding what you're saying, this should resolve hangups...correct?

    Link to comment

    With RC2 on an i7 9700, is it suggested to remove the Intel GPU TOP plugin? Does the statistics plugin still work then?

     

    BR

    Link to comment
    34 minutes ago, NightOps said:

    this should resolve hangups...correct?

    No. My point was only that the suggested solution, removing the file from modprobe.d, would stop the iGPU working on Alder Lake when RC3 arrives, because Unraid will then load the driver without the force probe. Native support without the force probe will likely be in the i915 driver once kernel 5.16 is added to Unraid, and this workaround can then be removed. Force probe enables support for a device whose driver code is not yet full release quality: more than beta, but possibly not 100% stable or still needing tweaks for the new architecture.

    Link to comment
    8 minutes ago, SimonF said:

    No. My point was only that the suggested solution, removing the file from modprobe.d, would stop the iGPU working on Alder Lake when RC3 arrives, because Unraid will then load the driver without the force probe. Native support without the force probe will likely be in the i915 driver once kernel 5.16 is added to Unraid, and this workaround can then be removed. Force probe enables support for a device whose driver code is not yet full release quality: more than beta, but possibly not 100% stable or still needing tweaks for the new architecture.

    Gotcha, thanks!  So for rc2 we need to leave it in for things to work - but they may still encounter hangups periodically *possibly* until kernel 5.16 finds its way into unRAID.  At that time we will most likely be safe to drop blacklisting the i915 driver...correct?

    Link to comment
    On 1/22/2022 at 9:24 AM, SimonF said:

    Yes, assuming native support is added in 5.16.

    It looks like 5.16 came out earlier this month. How would we find out if full support was added?

    Link to comment
    5 hours ago, drkCrix said:

    So is there anything that needs to be done for a new Intel install (11 series) with RC2?

    RC2 loads the i915 driver as standard. Look in the logs to see whether they state that force probe is needed. If force probe is still required, you need to blacklist i915 in the i915.conf file in /boot/config/modprobe.d; intel_gpu_top will then do the force probe.
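
    One way to look in the logs is to filter the kernel log for the force_probe string. A minimal sketch: the function name is made up for illustration, and it assumes the driver's message contains the text force_probe, which may differ between kernel versions.

```shell
# Sketch only: show_force_probe_lines is a hypothetical helper.
# It reads log text on stdin and prints every line that mentions force_probe.
# The exit status is non-zero when no line matches.
show_force_probe_lines() {
    grep -i 'force_probe'
}

# Example use on the server:
#   dmesg | show_force_probe_lines
```

    If the option is already set in syslinux, the kernel command line echoed in the log will match as well, so read the matching lines rather than relying on the exit status alone.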

    Link to comment
    On 1/22/2022 at 2:26 PM, NightOps said:

    I'm probably misunderstanding something... I'm running an i5-12600k with unRAID 6.10.0 rc2.  In my /etc/modprobe.d/i915.conf file I have "blacklist i915", and I have the intel gpu top installed.  If I'm understanding what you're saying, this should resolve hangups...correct?

    Any feedback here?

    I am using the GPU Statistics plugin and therefore installed the gpu_top plugin. Is the gpu_top plugin still needed, or is it suggested to remove it?

    Link to comment

    And I just had another crash. The funny thing is that I was opening the share without any issues, and 5 minutes later I wanted to open Unraid's page but it no longer responded.

    Link to comment

    Can someone please simplify what needs to be done? I'm running a 12600K with Z690 and get system hangs every couple of hours.

    I made an empty /boot/config/modprobe.d/i915.conf file. Is there anything else that needs to be done? I'm still getting seemingly random system hangs.

     

    Quote

    3. Add options to modprobe file. options i915 force_probe=4680 

    I'm not sure I understand. How do I do this?

    Edited by YourNightmar3
    Link to comment
    18 minutes ago, YourNightmar3 said:

    Can someone please simplify what needs to be done? I'm running a 12600K with Z690 and get system hangs every couple of hours.

    I made an empty /boot/config/modprobe.d/i915.conf file. Is there anything else that needs to be done? I'm still getting seemingly random system hangs.

    I'm not sure I understand. How do I do this?

    “blacklist i915” should be the content of that file, I believe.  After 9 days and hours' worth of Plex usage, mine finally locked up.  I’m taking /dev/dri out of the container to see if that smooths things out.  Hopefully this is resolved when better driver support comes out.  Also, QSV for the 12600K seems to be non-existent at this point, or at least craps out when encoding using Handbrake in unRAID…

    Link to comment

    My server locks up randomly after a while even with Plex hardware transcoding off (doing software transcoding for now). My iGPU basically does nothing right now but still causes the occasional lockup. i5-11600K

    Link to comment
    On 2/3/2022 at 8:44 AM, snailtrails said:

    My server locks up randomly after a while even with Plex hardware transcoding off (doing software transcoding for now). My iGPU basically does nothing right now but still causes the occasional lockup. i5-11600K

     

    This was my solution, which allows full Plex hardware transcoding on QSV. Linking here just in case it helps:
     

     

    Edited by akawoz
    Link to comment
    16 hours ago, akawoz said:

     

    this was my solution - linking here just in case it helps:
     

     

     

    Is this a permanent or a temporary fix? Should we wait for Linux updates to the Intel drivers, or for an Unraid server update, as the official fix? Can you hardware transcode in Plex?

    Edited by snailtrails
    Link to comment
    3 hours ago, snailtrails said:

     

    Is this a permanent or a temporary fix? Should we wait for Linux updates to the Intel drivers, or for an Unraid server update, as the official fix? Can you hardware transcode in Plex?


    Yes, full Plex hardware transcoding on QSV works, and I will keep it like this until the next stable version of Unraid ships. It sounds like Linux kernel 5.16.x has big updates for the i915 iGPU driver, so fingers crossed that the next stable version, Unraid 6.10, ships with this. 6.9.2 is the current stable release of Unraid.

    Edited by akawoz
    Link to comment

    After 35 days of stable uptime, my server finally crashed this morning -- again without anything relevant in the remote syslog.

    @Hoopster has your machine crashed at any point? 

    Otherwise, does anyone have an ETA for version 6.10 with the kernel patches that would stop this from occurring? 

    Link to comment
    40 minutes ago, bearcat2004 said:

    @Hoopster has your machine crashed at any point? 

    No, I am at 50+ days of uptime since removing the Intel-GPU-top and GPU Statistics plugins. 

     

    I also recently removed the CoreFreq plugin as there have been several reports of it locking up servers.  This was not in response to a crash, just an extra precaution.

    Link to comment
    1 minute ago, Hoopster said:

    No, I am at 50+ days of uptime since removing the Intel-GPU-top and GPU Statistics plugins. 

     

    I also recently removed the CoreFreq plugin as there have been several reports of it locking up servers.  This was not in response to a crash, just an extra precaution.

    If you're doing it, I'm doing it. Thanks for the recommendation, I don't use that plugin much anyway ¯\_(ツ)_/¯

    Link to comment
    1 hour ago, bearcat2004 said:

    After 35 days of stable uptime, my server finally crashed this morning -- again without anything relevant in the remote syslog.

    @Hoopster has your machine crashed at any point? 

    Otherwise, does anyone have an ETA for version 6.10 with the kernel patches that would stop this from occurring? 

    I had a friend who's much more knowledgeable about Linux write me up a walkthrough to patch 6.10. I haven't tried it yet, but will report back if I do.

    Link to comment




