• [6.9.x - 6.11.x] intel i915 module causing system hangs with no report in syslog (not alder lake)


    Tristankin
    • Minor

    Since the 5.x kernel based releases many users have been reporting system hangs every few days once the i915 module is loaded.

    With reports from a few users detailed in the thread below we have worked out that the issue is caused by the i915 module and is a persistent issue with both the 6.9.x release and 6.10 release candidates.


    The system does not need to be actively transcoding for the hang to occur. 6.8.3 does not have this issue on the same hardware, so the problem does not appear to be hardware related. Unloading the i915 module stops the hangs. Hangs are still present in 6.10.0RC2. I can provide a list of similar reports if required.
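Since the report hinges on whether the i915 module is loaded, a quick way to confirm that on a given boot is to look at /proc/modules. The following is a minimal sketch, not part of the original report; the function names are made up for illustration, and it assumes the usual /proc/modules layout where the module name is the first field on each line.

```python
def loaded_modules(proc_modules_text):
    """Return the set of module names found in /proc/modules-style text.

    Each line starts with the module name, followed by size, reference
    count, dependent modules, state and address.
    """
    names = set()
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields:
            names.add(fields[0])
    return names


def is_i915_loaded(proc_modules_text):
    """True only if i915 itself is loaded, not merely named as a dependent."""
    return "i915" in loaded_modules(proc_modules_text)
```

On a running server the text would come from reading /proc/modules, for example `is_i915_loaded(open("/proc/modules").read())`. Only the first field is checked so that a line such as the one for `drm`, which names i915 in its dependents column, does not count as a match.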




    User Feedback

    Recommended Comments



    2 hours ago, Titan84 said:

    Since the new Kernel 5.18.x will likely solve this issue

     

    it will solve the hangs but it won't solve the hardware transcoding problem for 12th gen

    Edited by Alz7777
    Link to comment
    2 hours ago, Titan84 said:

    Since the new kernel 5.18.x will likely solve this issue, what are the chances that @limetech could release, say, a version 6.11-RC1 for us that has the latest kernel? In this RC1 version, literally the only change from 6.10.2 would be the new kernel and nothing else.

    I'm aware that the team has a lot going on right now with the NIC issue, and I'm not sure what's involved in combining Unraid with the new kernel, so this might be totally out of the question, but I thought I'd ask ;-)

    I'm in the same boat as a lot of other people with a 12900K but don't want to use a custom Kernel mod if I can help it as I don't want to mess things up 🙂

    I believe that @limetech has already stated that 6.11 will be based on 5.17. Hopefully by the time we get it, many of the 5.18 changes (especially for Alder Lake) will have been backported.

     

    Otherwise, you can try running 5.18 by using @thor2002ro's custom kernel. However, I can tell you that 5.18 doesn't currently fix it.

    Link to comment
    3 hours ago, ich777 said:

    Don't forget that currently Kernel version 5.18.1 is stable... ;)


    You probably know more than me, but I'm pretty sure that @limetech said recently that it would be based on 5.17. Since you just corrected me, I'm likely totally wrong though. lol

     

    edit: I went back and looked at the post I was thinking of, and as I'm sure you know - I was wrong.

    Edited by flyize
    Link to comment
    5 hours ago, itimpi said:

    I would expect any 6.11 release to have significant new functionality.    If it was only a kernel upgrade I would expect it to be a point release within the 6.10 series.

    Fair enough, a point release under the 6.10.x label would be fine.

     

    3 hours ago, flyize said:

    I believe that @limetech has already stated that 6.11 will be based on 5.17.

    They said that it will be based on the latest kernel version at the time 6.11 is released, so I'd say either 5.18.x or 5.19.

     

    3 hours ago, flyize said:

    Otherwise, you can try running 5.18 by using @thor2002ro's custom kernel. However, I can tell you that 5.18 doesn't currently fix it.

    Yeah, I saw your instructions on how to do it, so it's an option. I've seen mixed results though, with some people saying that it's working, so I don't really know tbh.

    Link to comment

    In my opinion this thread is not really for Alder Lake users.

    We have at least three different types of problems here:

    Some older CPUs where the system hang occurs every few days.

    A problem with Plex itself and Intel stuff.

    And Alder Lake, which is definitely not officially supported by the 5.15 kernel currently in use.

    I guess it would be a lot easier if we stopped mixing problems with each other just because the outcome is the same.

    Link to comment
    12 hours ago, feraay said:

     

    I guess it would be a lot easier if we stopped mixing problems with each other just because the outcome is the same.

    This sums up probably 95% of the me too posts in this whole forum, not just this thread.

    Link to comment
    On 6/4/2022 at 5:12 PM, feraay said:

    some older cpus where the system hang occurs every few days

     

    I guess it would be a lot easier if we stopped mixing problems with each other just because the outcome is the same.

     

    This is the problem that I initially opened this bug report for. I have a 9th gen CPU with a system stuck on 6.8.3 because it's the last release with a 4.x kernel and therefore the last time the i915 driver was stable. Looking forward to the next release candidate so I can finally upgrade my system.

    Link to comment
    30 minutes ago, Tristankin said:

     

    This is the problem that I initially opened this bug report for. I have a 9th gen CPU with a system stuck on 6.8.3 because it's the last release with a 4.x kernel and therefore the last time the i915 driver was stable. Looking forward to the next release candidate so I can finally upgrade my system.

     

    Same here with a J3355 CPU.. Been following this thread for a while, but it has become very Alder Lakey :)

    Link to comment
    5 hours ago, Tristankin said:

    I have a 9th gen CPU with a system stuck on 6.8.3 because it's the last release with a 4.x kernel and therefore the last time the i915 driver was stable.

    That it's not stable is simply not true. Which CPU exactly are you on? @alturismo had an i9-9900, if I'm not mistaken, and he had no issues transcoding on Emby/Plex.

    What container are you using that gives you issues? Do you have a Display connected to the iGPU, what are your BIOS settings for the iGPU?

     

    5 hours ago, muzo178 said:

    Same here with a J3355 CPU.. Been following this thread for a while

    Isn't this Apollo Lake?

    I can only speak for an ASRock J3710 (Braswell) and an ASRock J4125 (Gemini Lake Refresh), which are both working perfectly fine in terms of transcoding on Unraid 6.10.2 with Emby/Jellyfin.

    Link to comment
    55 minutes ago, ich777 said:

    That it's not stable is simply not true. Which CPU exactly are you on? @alturismo had an i9-9900, if I'm not mistaken, and he had no issues transcoding on Emby/Plex.

    What container are you using that gives you issues? Do you have a Display connected to the iGPU, what are your BIOS settings for the iGPU?

    I can still confirm this, as my "old" machine is still in use at my friend's place (I'm maintaining it) and it never crashes ...

    Transcoding is done in Plex and sometimes tvheadend, and a GVT-g VM is even running permanently.

     

    [attached image]

    Link to comment
    On 6/6/2022 at 11:55 PM, ich777 said:

    That it's not stable is simply not true. Which CPU exactly are you on? @alturismo had an i9-9900, if I'm not mistaken, and he had no issues transcoding on Emby/Plex.

    What container are you using that gives you issues? Do you have a Display connected to the iGPU, what are your BIOS settings for the iGPU?

     

     

    Sure, I have been reporting this issue for 6 months but now you want to know about my system setup?

    Using the HDMI dummy plug, what do you want to know about my BIOS? Binhex-Plex. It doesn't need to be transcoding though, as stated in the initial report.

    Why do I have 0 resets on 6.8.3 but multiple freezes a week on 6.9.x+?

    Edited by Tristankin
    Link to comment
    11 minutes ago, Tristankin said:

    Sure, I have been reporting this issue for 6 months but now you want to know about my system setup?

    TBH I really don't know anymore which threads are for Alder Lake and which are not.

    I think I was talking with someone about a generation other than Alder Lake but couldn't remember if it was in this thread, because the forums are now flooded with Alder Lake transcoding issues where, in my opinion, the main reason it doesn't work is Plex...

     

    11 minutes ago, Tristankin said:

    Using the HDMI dummy plug, what do you want to know about my BIOS?

    BIOS settings for the iGPU, aperture size and so on... There should be a dedicated page for this in the BIOS.

     

    Have you made any modifications to the go file or something like that? What CPU do you own? Have you done any undervolting or tuning of the system in the BIOS too?

    EDIT: I have now gone through the Diagnostics. Can you try removing the i915.conf file from your modprobe.d folder on 6.10.2 and installing the Intel-GPU-TOP plugin?

    Is the server crashing randomly or only while transcoding? If it's not related to transcoding, I would recommend not doing any of the above.
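Before removing anything, it can help to see which modprobe.d entries actually refer to i915. The following is a minimal sketch under stated assumptions: the function name is hypothetical, and the directory is passed in by the caller (on Unraid it is commonly /boot/config/modprobe.d, but that path is an assumption here, not something confirmed in this thread).

```python
from pathlib import Path


def find_i915_lines(conf_dir):
    """Return (file name, line) pairs for every active line that names i915.

    Comment lines (starting with '#') and blank lines are skipped. The
    directory is supplied by the caller; nothing is read implicitly.
    """
    hits = []
    for conf in sorted(Path(conf_dir).glob("*.conf")):
        for raw in conf.read_text(errors="replace").splitlines():
            line = raw.strip()
            if not line or line.startswith("#"):
                continue
            # Match i915 as a whole word, e.g. "blacklist i915" or
            # "options i915 <parameter>=<value>".
            if "i915" in line.split():
                hits.append((conf.name, line))
    return hits
```

An empty i915.conf produces no entries, so an empty result does not by itself mean the file is absent; listing the directory is still worthwhile.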

    Link to comment

    I wish the alderlakers would make their own damn threads too.

    No this is instability with the i915 driver as reported by various members in this particular thread. All currently "supported" by the used kernel. (just look back through the thread)

    I am using an i3-9100. I took the modifications out of the go file during the upgrade. No undervolting, no tuning.

    Honestly all of this is covered in the thread attached to the initial bug report so I feel I am having to once again repeat myself. I was encouraged to make a bug report by the mods after complaining enough in the general forum for months and getting no help.

    Consensus was eventually that the intel iGPU drivers in the kernel are flakey. 

    I have been trying to find a solution since 12 March 2021

    Link to comment
    10 minutes ago, Tristankin said:

    I have been trying to find a solution since 12 March 2021

    May I ask if you are on the latest BIOS version? Is it possible in your BIOS to increase the Core/iGPU voltage by about +0.05 V or maybe +0.1 V? This should not harm your system in any way for testing.

    I've seen iGPU system freezes where the motherboard is a little too conservative with the power management of the core voltage itself.

    Also, what RAM do you have installed? I know it's a slightly odd question, but have you tried any other RAM, from Corsair or something like that?

    EDIT: Have you tried switching from MACVLAN to IPVLAN in your Docker settings yet?

    Link to comment

    Corsair Vengeance LPX 16GB (2x8GB) 2400MHz CL16 DDR4 (running at 2100MT/s from memory)

    I was running an undervolt on 6.8.3, then reset the BIOS after I was getting restarts on 6.9.x, so I doubt that is the issue. (Back to standard voltages.)

    I had 6 months without freezes before the upgrade to 6.9.x and have had 6 months without freezes since the downgrade to 6.8.3.

    I will check the details about BAR and make sure the bios is up to date. It is currently 8°C and midnight here in Australia and I would need to crawl under the house.

    Link to comment
    6 minutes ago, Tristankin said:

    I will check the details about BAR and make sure the bios is up to date. It is currently 8°C and midnight here in Australia and I would need to crawl under the house.

    No problem for me, whenever you have time... Over here in Austria it's not that late and a little warmer... :)

     

    7 minutes ago, Tristankin said:

    I was running an undervolt on 6.8.3, then reset the BIOS after I was getting restarts on 6.9.x, so I doubt that is the issue. (Back to standard voltages.)

    Maybe try to increase the voltage a little bit and see if this helps.

     

    11 minutes ago, Tristankin said:

    Corsair Vengeance LPX 16GB (2x8GB) 2400MHz CL16 DDR4 (running at 2100MT/s from memory)

    This is strange; the Diagnostics say that it's running at 2400 MT/s:

    Configured Memory Speed: 2400 MT/s
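For anyone who wants to pull that value out of their own Diagnostics, here is a minimal sketch that extracts every "Configured Memory Speed" entry from dmidecode-style text. The function name is made up for illustration, and the only line format it relies on is the one quoted above; anything else in the input is ignored.

```python
import re

# Matches lines such as "Configured Memory Speed: 2400 MT/s".
# Plain "Speed: ..." lines and non-numeric values like "Unknown" do not match.
_CONFIGURED_SPEED = re.compile(
    r"^\s*Configured Memory Speed:\s*(\d+)\s*MT/s", re.MULTILINE
)


def configured_speeds(dmi_text):
    """Return the configured speed in MT/s for each populated memory device."""
    return [int(value) for value in _CONFIGURED_SPEED.findall(dmi_text)]
```

If the returned values differ from what was set in the BIOS, the diagnostics may have been taken before the change, or the board may have fallen back to another profile.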

     

    Link to comment

    Hard keeping up with all the questions coming from edits.
    The reboots are random and would often happen overnight with no transcoding.

    2100 has been tried since then for testing.

    I have VLANS turned off so that should not be the issue?
    [attached image]

     

    I will also have to do the upgrade again as I am still on 6.8.3. But I couldn't put up with a year of testing till now with constant freezes.

     

    Edited by Tristankin
    Link to comment
    13 minutes ago, Tristankin said:

    I will also have to do the upgrade again as I am still on 6.8.3. But I couldn't put up with a year of testing till now with constant freezes.

    You only have that option to switch from MACVLAN to IPVLAN on 6.9.0+

     

    14 minutes ago, Tristankin said:

    The reboots are random and would often happen overnight with no transcoding.

    Then I think it's not strictly related to the i915 module.

    Link to comment

    I also have to report that on my Skylake the freezes only happen while the iGPU is not blacklisted. Everything is stable when no transcoding is happening (meaning /dev/dri is not forwarded to Plex, or the iGPU is blacklisted as a whole). Everything was fine with 200+ days of uptime on 6.8.3, as I wrote earlier in this thread. I thought that using a dummy plug had fixed my issues on 6.10-rc4, but after that the freezes were back.

     

    Greetings from Vienna btw ;).

    Link to comment

    @Tristankin & @Akilae keep in mind that the iGPU may also be used to generate previews when you import new media; I think this applies at least in Plex.

    It also seems suspicious to me that on 6.10.0rc4 everything worked with an HDMI dummy plug.

    I suspect something other than this module; it must be some weird bug...

    @Tristankin Have you tried blacklisting it and installing the Intel-GPU-TOP plugin yet?

     

    @Akilae do you have your Diagnostics somewhere?

     

    7 minutes ago, Akilae said:

    Greetings from Vienna btw ;).

    Greetings from Lunz am See... :D

    Link to comment

    Ah yeah OK, the previews. Did not think of that. 

    5.x kernel intel drivers have been known to be a bit shit, and also I have heard reports that the dummy plug has been more important since the 5.x releases (somewhere in this thread from memory)

    The bit that also is making it all very hard to identify is that there is never anything logged in syslog. Which makes the failure appear as a hardware one, but I am pretty darn sure it is an unlogged gpu fault.
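When nothing is logged at the moment of the hang, the log can still narrow down when it happened: the last line before a long silence is the latest point at which the system was known to be alive. The following is a minimal sketch, assuming classic syslog lines that begin with a "Mon DD HH:MM:SS" stamp and an English month abbreviation; the function names and the one-hour threshold are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta


def parse_stamp(line, year):
    """Parse the leading 'Mon DD HH:MM:SS' stamp of a classic syslog line.

    The stamp carries no year, so the caller supplies one (an assumption).
    Returns None for lines that do not start with such a stamp.
    """
    try:
        return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
    except ValueError:
        return None


def find_gaps(lines, year, min_gap=timedelta(hours=1)):
    """Return (last_seen, next_seen) pairs where logging paused for min_gap or more."""
    gaps = []
    previous = None
    for line in lines:
        stamp = parse_stamp(line, year)
        if stamp is None:
            continue  # skip lines without a recognizable timestamp
        if previous is not None and stamp - previous >= min_gap:
            gaps.append((previous, stamp))
        previous = stamp
    return gaps
```

A gap is only a hint, not proof: an idle server can legitimately log nothing for a while, so the threshold needs to be tuned to how chatty the system normally is.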

    Link to comment
    11 minutes ago, Tristankin said:

    5.x kernel intel drivers have been known to be a bit shit, and also I have heard reports that the dummy plug has been more important since the 5.x releases (somewhere in this thread from memory)

    This was probably me saying that... :)

     

    You also have to keep in mind that iGPUs were never intended to run headless and be used only for transcoding... ;)

     

    11 minutes ago, Tristankin said:

    The bit that also is making it all very hard to identify is that there is never anything logged in syslog. Which makes the failure appear as a hardware one, but I am pretty darn sure it is an unlogged gpu fault.

    It would be really interesting to see what pops up on the screen when the crash happens, but no one who has this issue actually keeps a screen attached to their system the whole time, so nobody can take a picture of the error on screen.

    I know it's frustrating for you when it isn't working, but it is frustrating for me too, because troubleshooting is hard and it can be nearly anything: faulty hardware, hardware incompatibility, a really messed up BIOS (from the manufacturer),...

    It is strange, since some people on the German subforums who own an i3-9100 and also use this CPU for transcoding have no issues whatsoever...

    Link to comment
    49 minutes ago, Tristankin said:

    The bit that also is making it all very hard to identify is that there is never anything logged in syslog. Which makes the failure appear as a hardware one, but I am pretty darn sure it is an unlogged gpu fault.

    Just to confirm, you're using remote logging for syslog, right?

    Link to comment




