• [6.9.x - 6.11.x] intel i915 module causing system hangs with no report in syslog (not alder lake)


    Tristankin
    • Minor

    Since the 5.x kernel based releases, many users have been reporting system hangs every few days once the i915 module is loaded.

    With reports from a few users detailed in the thread below, we have worked out that the issue is caused by the i915 module and persists in both the 6.9.x releases and the 6.10 release candidates.


    The system does not need to be actively transcoding for the hang to occur. 6.8.3 does not have this issue, so the problem is not hardware related. Unloading the i915 module stops the hangs. Hangs are still present in 6.10.0RC2. I can provide a list of similar reports if required.
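
For anyone who wants to confirm whether i915 is loaded before and after unloading it, here is a minimal sketch. The function name `module_loaded` and its optional second argument (a listing file, handy for testing) are my own assumptions for illustration; it only reads a `/proc/modules`-style listing and changes nothing on the system.

```shell
# Report whether a kernel module appears in a /proc/modules-style listing.
# The second argument is a hypothetical convenience for testing;
# it defaults to the real /proc/modules.
module_loaded() {
    mod="$1"
    list="${2:-/proc/modules}"
    # Each line of /proc/modules starts with the module name followed by a space.
    grep -q "^${mod} " "$list"
}

# Example (run on the server):
#   module_loaded i915 && echo "i915 is loaded"
#   modprobe -r i915    # unload the module; this fails if it is still in use
```

The unload itself is left as a comment so that nothing is removed by accident when the snippet is pasted into a shell.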

    • Like 8
    • Thanks 1
    • Haha 1



    User Feedback

    Recommended Comments



    9 minutes ago, ich777 said:

    Why? Please update to 6.11.5

     

    Did you also change the path to /mnt/cache/... instead of /mnt/user/... in the template?

    I honestly somehow missed that we were all the way up to 6.11.5. Updating now. 

     

    Just changed the path. 

     

    I'll report back with my experience in a couple of days. Thanks for the help - my apologies for not getting totally caught up on the current best-practices. 

    Link to comment
    19 minutes ago, mechmess said:

    I'll report back with my experience in a couple of days. Thanks for the help - my apologies for not getting totally caught up on the current best-practices. 

    No worries or need to apologize, but it was mentioned a few posts above yours to change the path to the "real" file path instead of the FUSE file path.
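
The path change can be sketched as a small helper. This assumes the share lives entirely on a pool named "cache"; the function name `to_cache_path` is made up for illustration and is not part of Unraid.

```shell
# Map a FUSE user-share path (/mnt/user/...) to the pool path (/mnt/cache/...).
# Assumption: the share is stored only on a pool named "cache".
# Any other path is passed through unchanged.
to_cache_path() {
    case "$1" in
        /mnt/user/*) printf '%s\n' "/mnt/cache/${1#/mnt/user/}" ;;
        *)           printf '%s\n' "$1" ;;
    esac
}
```

For example, `to_cache_path /mnt/user/appdata/plex` prints `/mnt/cache/appdata/plex`, which is the value that would go into the container template.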

    • Like 1
    Link to comment
    On 2/8/2023 at 11:28 AM, muzo178 said:

    i can't believe this came down to a simple directory change. will report back in a couple days.

     

    unfortunately my server crashed on four separate occasions...all during plex transcodes.

     

    same story. nothing in the logs. just fell off a cliff.

     

    back to 6.8.3. i now have to accept that this little box is stuck here.

     

    it was a pita to downgrade as well since most plugins require >6.9 these days, but i had a good 6.8.3 backup and managed to downgrade properly. normally you can take out the usb and just copy the backup files over, but it is a pita as the usb is inside on the mobo of the little terramaster f5-221 unit.

     

    a simple directory fix was too good to be true anyways.

     

    oh well..

    Edited by muzo178
    spelling
    Link to comment
    15 hours ago, muzo178 said:

    unfortunately my server crashed on four separate occasions...all during plex transcodes.


    I also turned off vt-d in the bios. Do you have the option to turn off vt-d?

    Edited by Tristankin
    Link to comment
    On 2/13/2023 at 4:07 AM, Tristankin said:


    I also turned off vt-d in the bios. Do you have the option to turn off vt-d?

     

    i don't think i have that option :(

     

    • Thanks 1
    Link to comment
    2 minutes ago, muzo178 said:

    i'm using the linuxserver plex container

    Please try the official one!

     

    Have you tried switching over to IPVLAN in your Docker settings yet? As far as I can see from your Diagnostics, you are still using MACVLAN, which can cause issues. Maybe try that first before you switch containers (btw the containers should be inter-compatible, so you only have to change the repository in the Docker template).

     

    Just a few other things in your go file:

    #Modprobe it87 drivers for getting the fan speed
    modprobe it87 force_id=0x8620

    If you are on 6.11.5 you can remove that and install my plugin, and it should all work OOB (you may have to run sensors detect again):

    grafik.png.8482caf42d51e9834c89a3e3a6b8390e.png

     

    The next thing that I saw is:

    # Fix Docker for 6.8.3- Case Insensitive as per https://forums.unraid.net/topic/108643-all-docker-containers-lists-version-%E2%80%9Cnot-available%E2%80%9D-under-update/?do=findComment&comment=994056
    sed -i 's#@Docker-Content-Digest:\\s*\(.*\)@#\@Docker-Content-Digest:\\s*\(.*\)@i#g' /usr/local/emhttp/plugins/dynamix.docker.manager/include/DockerClient.php

     

    Please remove that. There is actually a similar issue in 6.11.5 because Docker changed a few things for the manifest files, but a fix for 6.11.5 is also available in the CA App:

    grafik.png.7d26b5bc9c14eefd22338a4165f19d78.png

    • Like 2
    Link to comment
    32 minutes ago, ich777 said:

    Please try the official one!

     

    Have you tried switching over to IPVLAN in your Docker settings yet? As far as I can see from your Diagnostics, you are still using MACVLAN, which can cause issues. Maybe try that first before you switch containers (btw the containers should be inter-compatible, so you only have to change the repository in the Docker template).

     

    Just a few other things in your go file:

    #Modprobe it87 drivers for getting the fan speed
    modprobe it87 force_id=0x8620

    If you are on 6.11.5 you can remove that and install my plugin, and it should all work OOB (you may have to run sensors detect again):

    grafik.png.8482caf42d51e9834c89a3e3a6b8390e.png

     

    The next thing that I saw is:

    # Fix Docker for 6.8.3- Case Insensitive as per https://forums.unraid.net/topic/108643-all-docker-containers-lists-version-%E2%80%9Cnot-available%E2%80%9D-under-update/?do=findComment&comment=994056
    sed -i 's#@Docker-Content-Digest:\\s*\(.*\)@#\@Docker-Content-Digest:\\s*\(.*\)@i#g' /usr/local/emhttp/plugins/dynamix.docker.manager/include/DockerClient.php

     

    Please remove that. There is actually a similar issue in 6.11.5 because Docker changed a few things for the manifest files, but a fix for 6.11.5 is also available in the CA App:

    grafik.png.7d26b5bc9c14eefd22338a4165f19d78.png

     

    these diagnostics are from 6.8.3 so MACVLAN it is.. I tried switching to IPVLAN when I upgraded to no avail. 

     

    The docker fix in the go file was of course removed when I upgraded to 6.11.

     

    The modprobe thing in go vs the plugin is something i can try, but I sincerely doubt it will make a difference..

     

    I have been trying to upgrade since 6.9, then 6.10 and finally 6.11, trying all of these permutations that you recommended and more..

     

    i'm stumped.

     

    one thing i still haven't tried is the official plex docker. what i will do for one last time:

    • i will take the box out, connect it to a monitor, see if i can turn off vt-d in the bios @Tristankin
    • upgrade to 6.11.5
    • switch to plex official container @ich777
    • switch to IPVLAN @ich777
    • comment out the modprobe and the docker fix from the go file and switch to using the plugins. @ich777

     

    and pray while i915 hw transcoding :)

     

    anything else i should add to that list you think?
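
For the go file step, a rough sketch of commenting out a line rather than deleting it. The helper name `comment_out` is hypothetical, the prefix is treated as a plain command prefix (no regex special characters), and I am assuming the go file sits at /boot/config/go; adjust if yours is elsewhere.

```shell
# Comment out every line that starts with the given command prefix.
# Lines that are already commented are left alone.
# A copy of the original is kept next to the file as FILE.bak.
comment_out() {
    prefix="$1"
    file="$2"
    cp "$file" "$file.bak" || return 1
    # Keep leading whitespace, then insert "#" in front of the command.
    sed "s|^\([[:space:]]*\)\(${prefix}\)|\1#\2|" "$file.bak" > "$file"
}

# Example (path is an assumption, see above):
#   comment_out "modprobe it87" /boot/config/go
```

Keeping the `.bak` copy makes it easy to revert one change at a time if the crashes continue.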

    Edited by muzo178
    • Like 1
    Link to comment

    I am soooooo tired of these crashes. I feel they are even more frequent since I upgraded from 6.10.x to 6.11.x.

    I also

    • disabled VT-d
    • changed plex docker path to /mnt/cache instead of /mnt/user
    • switched to IPVLAN

    and still get crashes (several during the same week).

     

    I don't know what to do. My diagnostics file is attached in any case (downloaded after a power cycle, since the GUI is unresponsive when a crash occurs).

     

    The time where I was under 6.8.3 was so stable in comparison, I could keep my server running for months.

     

    Please @limetech, do something, please 🙂

     

    Edited by Opawesome
    fixed typos
    Link to comment
    46 minutes ago, Opawesome said:

    I am soooooo tired of these crashes. I feel they are even more frequent since I upgraded from 6.10.x to 6.11.x.

    I also

    • disabled VT-d
    • changed plex docker path to /mnt/cache instead of /mnt/user
    • switched to IPVLAN

    and still get crashes (several during the same week).

     

    I don't know what to do. My diagnostics file is attached in any case (downloaded after a power cycle, since the GUI is unresponsive when a crash occurs).

     

    The time where I was under 6.8.3 was so stable in comparison, I could keep my server running for months.

     

    Please @limetech, do something, please 🙂

    mozart-diagnostics-20230216-1633.zip

    They may need your help to fix it. Try attaching a monitor. Check out this post:

     

     

    Link to comment
    5 hours ago, flyize said:

    They may need your help to fix it. Try attaching a monitor. Check out this post:

     

     

    I did that. Will the error message just pop up on the screen, or should I enter some command to display a log in real time?

    Edited by Opawesome
    Link to comment

    Alright, just to make things 100% clear, here is what I have done to my machine, now with 21 days uptime.

     

    • All C states turned off
    • RAM @ 2100 MHz
    • VT-d turned off
    • All unnecessary peripherals disabled including serial and parallel port
    • iGPU as first gpu
    • binhex-plex with transcode directory set to /dev/shm on the host, which translates to /transcode in the container
    • config directory pointed directly to cache and appdata share set to cache only
    • MacVLAN -> IPVLAN
    • Everything commented out in the go file
    • Intel GPU Top installed

     

    I hope this helps anyone else having problems
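
The container-related items in the list above (transcode directory in /dev/shm, config directly on cache, iGPU passed through) could be expressed as `docker run` arguments roughly like this. It is only a sketch: the function name `plex_docker_args` and the default appdata path are assumptions, and the `/config` mount point should be checked against the container you actually use.

```shell
# Print docker run arguments, one per line, for the mappings described above.
# Assumptions: the container expects its settings under /config, and the
# appdata folder lives directly on the cache pool.
plex_docker_args() {
    config_dir="${1:-/mnt/cache/appdata/plex}"   # hypothetical default location
    printf '%s\n' \
        "--device=/dev/dri:/dev/dri" \
        "-v" "/dev/shm:/transcode" \
        "-v" "${config_dir}:/config"
}

# Example (image name left out on purpose):
#   docker run -d --name plex $(plex_docker_args) <image>
```

On Unraid the same mappings would normally be set in the Docker template instead of on the command line; the sketch just makes the host-to-container paths explicit.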

     

    image.thumb.png.aacc83f4b0a819bd2d0926b80b5c7917.png

    Link to comment
    15 minutes ago, Tristankin said:

    Alright, just to make things 100% clear, here is what I have done to my machine, now with 21 days uptime.

     

    • All C states turned off
    • RAM @ 2100 MHz
    • VT-d turned off
    • All unnecessary peripherals disabled including serial and parallel port
    • iGPU as first gpu
    • binhex-plex with transcode directory set to /dev/shm on the host, which translates to /transcode in the container
    • config directory pointed directly to cache and appdata share set to cache only
    • MacVLAN -> IPVLAN
    • Everything commented out in the go file
    • Intel GPU Top installed

     

    I hope this helps anyone else having problems

     

    image.thumb.png.aacc83f4b0a819bd2d0926b80b5c7917.png


    my box seems to be stable now after turning off vt-d. 

     

    what is your cpu again? Does the hdr tone mapping option in plex work for you? mine is a j3355 (intel hd graphics 500) and it doesn’t.

    • Like 1
    Link to comment
    21 minutes ago, Tristankin said:

    All C states turned off

    This should not be necessary.

     

    21 minutes ago, Tristankin said:

    VT-d turned off

    This too.

     

    Maybe try to revert all your changes one by one and see what causes the issue in your case.

    Link to comment
    4 minutes ago, muzo178 said:

    Does the hdr tone mapping option in plex work for you?

    Are you on the official container yet? Usually the official container will give you the best results/experience.

    Link to comment
    2 hours ago, ich777 said:

    Are you on the official container yet? Usually the official container will give you the best results/experience.

    Yes, switched to official.

    Link to comment
    2 hours ago, muzo178 said:

    j3355

    It should at least work on Apollo Lake platforms according to the Quick Sync Video Matrix.

     

    Have you tried the official Jellyfin container yet, to see if it is working there? (If you try it, I would recommend that you use VA-API instead of QuickSync to be on the safe side.)

    Link to comment
    10 hours ago, ich777 said:

    It should at least work on Apollo Lake platforms according to the Quick Sync Video Matrix.

     

    Have you tried the official Jellyfin container yet, to see if it is working there? (If you try it, I would recommend that you use VA-API instead of QuickSync to be on the safe side.)


    6.8.3 was fine with hdr tone mapping. I haven’t tried jellyfin yet, I’ll give it a shot in the name of troubleshooting. 

    • Like 1
    Link to comment
    3 hours ago, squiddles88 said:

    I have the exact same issue. It is beyond frustrating that I am unable to use Plex transcoding without hard crashes.

    Have you read the last 30 comments in this thread yet? This is now solved for most people.

    Link to comment
    45 minutes ago, ich777 said:

    Have you read the last 30 comments in this thread yet? This is now solved for most people.

     

    The only thing I haven't tried is disable VT-d, as I don't have access to the BIOS at the moment.

     

    Even if that did allow hardware transcoding to function, it isn't really solved as it should function with it enabled, and does in other OS's.

     

    Edit: I found muzo178 is also using a Terramaster box. I am using a slightly different box that has a similar CPU, J3355 vs J3455 (mine). I'll see if I can get a monitor to the NAS and disable VT-d.

    Edited by squiddles88
    Link to comment
    19 minutes ago, squiddles88 said:

    Edit: I found muzo178 is also using a Terramaster box. I am using a slightly different box that has a similar CPU, J3355 vs J3455 (mine). I'll see if I can get a monitor to the NAS and disable VT-d.

    Do you have no monitor or dummy plug connected by default? This could also be the issue.

     

    I know many people using such CPUs and having no issue whatsoever using HW transcoding.

    Link to comment
    43 minutes ago, ich777 said:

    Do you have no monitor or dummy plug connected by default? This could also be the issue.

     

    I know many people using such CPUs and having no issue whatsoever using HW transcoding.

     

    Okay, I've done a bit more digging and I may be having a different problem. I've also disabled VT-d and hooked up a monitor.

     

    I think it's crashing on HEVC decoding rather than on any form of hardware encoding. The GPU only hangs when trying to transcode HEVC videos. It happens immediately after attempting to transcode and then falls back to software transcoding (which obviously just stalls as the CPU is far too weak).

     

    Edit: I am also able to transcode 2 Mbit SDR 1080p HEVC to H264 fine. It might just be HDR HEVC crashing.
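
To sort test files into SDR and HDR before trying them, a small classifier like the one below may help. It is a sketch under assumptions: the function name `classify_stream` is made up, it treats only the PQ (`smpte2084`) and HLG (`arib-std-b67`) transfer characteristics as HDR, and it expects the codec and transfer values to be obtained separately, for example with ffprobe if that is available on the machine.

```shell
# Classify a video stream from its codec name and color transfer value,
# e.g. as reported by:
#   ffprobe -v error -select_streams v:0 \
#     -show_entries stream=codec_name,color_transfer -of csv=p=0 FILE
classify_stream() {
    codec="$1"
    transfer="$2"
    case "$transfer" in
        smpte2084|arib-std-b67) range="HDR" ;;   # PQ (HDR10) or HLG
        *)                      range="SDR" ;;   # anything else, including unknown
    esac
    printf '%s %s\n' "$codec" "$range"
}
```

If only the files that classify as `hevc HDR` trigger the hang, that would point at tone mapping rather than at HEVC decoding in general.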

    Edited by squiddles88
    Link to comment
    20 minutes ago, squiddles88 said:

     

    Okay, I've done a bit more digging and I may be having a different problem. I've also disabled VT-d and hooked up a monitor.

     

    I think it's crashing on HEVC decoding rather than on any form of hardware encoding. The GPU only hangs when trying to transcode HEVC videos. It happens immediately after attempting to transcode and then falls back to software transcoding (which obviously just stalls as the CPU is far too weak).

     

    Edit: I am also able to transcode 2 Mbit SDR 1080p HEVC to H264 fine. It might just be HDR HEVC crashing.

     

    did you try turning off hdr tone mapping? if you are using plex that is… with that off, it works for me.

     

    Link to comment




