• [6.9.x - 6.11.x] intel i915 module causing system hangs with no report in syslog (not alder lake)


    Tristankin
    • Minor

    Since the releases based on the 5.x kernel, many users have reported system hangs every few days once the i915 module is loaded.

    From reports by several users, detailed in the thread below, we have worked out that the hangs are caused by the i915 module and persist in both the 6.9.x releases and the 6.10 release candidates.


    The system does not need to be actively transcoding for the hang to occur. 6.8.3 does not have this issue, so the problem does not appear to be hardware related. Unloading the i915 module stops the hangs. Hangs are still present in 6.10.0-rc2. I can provide a list of similar reports if required.
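
For anyone comparing runs with and without the driver, a minimal sketch for confirming whether the module is currently loaded. The helper name `module_loaded` is made up for illustration; it only reads the kernel's module list and assumes the usual /proc/modules layout, where the first field of each line is the module name.

```shell
# Hypothetical helper: succeed if the named kernel module is listed.
# $1 = module name, $2 = module list file (defaults to /proc/modules)
module_loaded() {
  awk -v mod="$1" '$1 == mod { found = 1 } END { exit !found }' "${2:-/proc/modules}"
}

# Example on a live system:
#   module_loaded i915 && echo "i915 is loaded"
# Unloading is done with `modprobe -r i915`, which fails while the device is in use.
```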




    User Feedback

    Recommended Comments



    On 6/22/2022 at 11:54 PM, airlychee said:

    I tested on 6.10.3: when I disable VT-d in the BIOS, transcoding does not crash Unraid.

    Today I enabled VT-d and added intel_iommu=igfx_off; no crashes occurred during transcoding.
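
To verify after a reboot that the parameter actually took effect, a minimal sketch follows. `has_boot_param` is a hypothetical helper, not an Unraid tool. It assumes the parameter was added to the `append` line of the Syslinux configuration on the flash drive, which is where Unraid normally takes its kernel command line from.

```shell
# Hypothetical helper: succeed if a kernel command line contains the exact parameter.
# $1 = command line string, $2 = parameter to look for
has_boot_param() {
  case " $1 " in
    *" $2 "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example on a live system (the active command line is in /proc/cmdline):
#   has_boot_param "$(cat /proc/cmdline)" intel_iommu=igfx_off && echo "igfx_off is active"
```

The match is on whole space-separated words, so `intel_iommu=on` would not be mistaken for `intel_iommu=igfx_off`.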

    Link to comment

    One thing that fixed this for me was disabling C-states in the BIOS. I'm using a dummy plug for the iGPU. Not great, but at least no crashes.

    Link to comment
    On 6/14/2022 at 10:15 AM, lostinspace said:

    I've been following this thread for some time (and made my own thread a year ago when 6.9.0 came out), and just wanted to add a data point. I run an i7-8700 and use the iGPU for transcoding (other system details and diagnostics are in the linked thread for anyone wanting to look). Rock solid on 6.8.3. Upgrading to 6.9.x caused crashes every day or two, with nothing in the remote syslog, just like the OP of this thread.

     

    I will look into BIOS settings when I get a chance.  Truth be told, I didn't try the last recommendation in my own thread to disable C States as I had already reverted back to 6.8.3.

    Just wanted to update my post. Shortly after I commented above, I upgraded my BIOS to the latest version, upgraded the firmware on my SSD cache drive (which is also used for appdata and all Plex metadata) to the latest version, removed all remnants of the 6.8.x way of enabling iGPU passthrough for Intel CPUs, and upgraded to 6.10.2.


    It's been rock solid ever since.

    Link to comment

    Hi, new to the forum.

    Can anybody tell me if this is still a problem? My Unraid server crashes or becomes unresponsive every day or two.

    I’m running an Intel i7-8700. I’ve updated my motherboard BIOS and Unraid to the latest versions.

    Up until a week ago I had a Plex container transcoding on an Nvidia P400 and a Windows VM running Blue Iris. It was rock solid and only rebooted when planned.

    The iGPU was passed through to the Windows VM so Blue Iris could use Quick Sync.

    The only thing I have changed in the last week was to remove my Blue Iris VM, which had the i7 iGPU passed through to it. I removed Blue Iris as I am trying out Frigate in a container. I also removed the P400 and activated Quick Sync for both Plex and Frigate.

    I have syslog written to a cache drive, and it shows no errors. I have left a screen connected, and after a crash it is either black or frozen.

    The crashes/unresponsive events can happen at any time, with or without Plex transcoding. Frigate is constantly using Quick Sync to decode video.

     

    My next step is to disable the igpu in the bios and put the p400 back in.

     

    Any other ideas or directions would be very helpful

    Link to comment
    46 minutes ago, Simon82 said:

    The crashes/unresponsive events can happen at any time, with or without Plex transcoding. Frigate is constantly using Quick Sync to decode video.

    Please post your Diagnostics.

    Link to comment

    @Simon82 I see no indication that your iGPU is even enabled in the Diagnostics that you've attached. I only see the Nvidia GPU.

     

    Do you have a mixed network configuration for your Docker containers, for example some using bridge, some using br0, and so on? If yes, make sure that you set your "Docker custom network type" to "ipvlan".

    Link to comment

    Sorry, those were the diagnostics from after putting the P400 back in. I will upload an older set from a crash.

     

    Yes, I have a mixture of Docker networks, but it has never been a problem. What sort of problem does it cause in my current config?

    Link to comment
    16 minutes ago, Simon82 said:

    yes I have a mixture of docker networks but it has never been a problem. What sort of problem does it cause in my current config?

    Please change the Docker network to ipvlan.

    macvlan is known to crash servers.

     

    You have started using Frigate or am I wrong? Maybe that is causing crashes in combination with macvlan.

     

    Please change it as described above and see if it crashes again.
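
To double-check from the command line which Docker networks still use the macvlan driver after changing the setting, something like the following sketch could be used. `macvlan_networks` is a hypothetical filter written for this post; it only reads "name driver" pairs, and the `docker network ls --format` call that feeds it is shown as a comment so the function itself has no dependency on a running Docker daemon.

```shell
# Hypothetical filter: read "name driver" pairs on stdin and print the
# names of the networks whose driver is macvlan.
macvlan_networks() {
  awk '$2 == "macvlan" { print $1 }'
}

# Example with Docker running:
#   docker network ls --format '{{.Name}} {{.Driver}}' | macvlan_networks
# No output means no network is using macvlan any more.
```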

    Link to comment
    10 hours ago, ich777 said:

    Please change the Docker network to ipvlan.

    macvlan is known to crash servers.

     

    You have started using Frigate or am I wrong? Maybe that is causing crashes in combination with macvlan.

     

    Please change it as described above and see if it crashes again.

     

    Thanks for the info, I have changed the custom network over to ipvlan. I'm not completely sure, but I think the custom network br0 was already present; I just used it for Pi-hole.

     

    I had Frigate working for a couple of days with the P400 before switching to Quick Sync, so I'm unsure how big a part Frigate is playing in this.

     

    Edited by Simon82
    Link to comment

    Just an update. Server has been up for 5 days now. So it was either macvlan or the quick sync. I guess I’ll keep it running for another week and then try switching back to quick sync.

    Link to comment
    4 hours ago, AnimusAstralis said:

    I'd appreciate if someone can help me to pinpoint a cause of these hangs. Diagnostics attached.

    Please try to use ipvlan instead of macvlan in your Docker settings.

     

    I would recommend creating a dedicated bug report, since this does not seem to be related to this issue.

    Link to comment

    So I gave 6.11.5 a go. Still getting hangs.

    I removed everything from the go file, installed intel-gpu-top, and changed over from macvlan to ipvlan.

    I upgraded the BIOS to the latest version, turned off all C-states, and unplugged the HDMI KVM switch and replaced it with a dummy HDMI plug.

    I have been on 6.8.3 for months without a single freeze, but as soon as I change over to a 5.x kernel version the freezes return.

     

    Jan 26 01:10:19 Firefly  rsyslogd: [origin software="rsyslogd" swVersion="8.2102.0" x-pid="14592" x-info="https://www.rsyslog.com"] start
    Jan 26 08:44:23 Firefly  crond[1056]: exit status 1 from user root /usr/local/sbin/mover &> /dev/null
    Jan 26 18:32:38 Firefly kernel: md: sync done. time=63099sec
    Jan 26 18:32:38 Firefly kernel: md: recovery thread: exit status: 0
    Jan 26 19:38:33 Firefly kernel: microcode: microcode updated early to revision 0xf0, date = 2021-11-12
    Jan 26 19:38:33 Firefly kernel: Linux version 5.19.17-Unraid (root@Develop) (gcc (GCC) 12.2.0, GNU ld version 2.39-slack151) #2 SMP PREEMPT_DYNAMIC Wed Nov 2 11:54:15 PDT 2022


    Still happening every 2-5 days. No consistency to when it is happening.

    I did notice these but I assume they are just dodgy files. Did not cause a freeze. Everything else looks fine.

     

    Jan 13 21:14:09 Firefly kernel: Plex Transcoder[21796]: segfault at 18 ip 0000154d3caea3ca sp 00007ffe25bf21d0 error 4 in libavcodec.so.59[154d3c782000+3d8000]
    Jan 13 21:14:09 Firefly kernel: Code: 4c 89 eb 0f 84 dc 00 00 00 c7 83 1c 07 00 00 01 00 00 00 48 8b 83 d0 01 00 00 48 3b 83 e8 01 00 00 7d 19 48 8b 83 c0 01 00 00 <48> 8b 40 18 48 2b 83 f8 01 00 00 48 89 83 00 02 00 00 49 83 bd b8
    Jan 13 21:14:59 Firefly kernel: Plex Transcoder[23296]: segfault at 18 ip 000014fa1a6ea3ca sp 00007ffd66073280 error 4 in libavcodec.so.59[14fa1a382000+3d8000]
    Jan 13 21:14:59 Firefly kernel: Code: 4c 89 eb 0f 84 dc 00 00 00 c7 83 1c 07 00 00 01 00 00 00 48 8b 83 d0 01 00 00 48 3b 83 e8 01 00 00 7d 19 48 8b 83 c0 01 00 00 <48> 8b 40 18 48 2b 83 f8 01 00 00 48 89 83 00 02 00 00 49 83 bd b8
      


    Diagnostics attached. But I assume that rolling back to 6.8.3 is going to be the solution for now.

    firefly-diagnostics-20230127-1210.zip

    Link to comment
    6 hours ago, Tristankin said:

    So I gave 6.11.5 a go. Still getting hangs.

    i did the exact same thing last week. went from 6.8.3 to 6.11.5. started crashing during plex transcodes. reverted back last night. guess i'm stuck on 6.8.3 forever on that box.

    Link to comment

    Another one today.

    I have turned off VT-d to see if that makes any difference.

     

    Jan 27 13:04:31 Firefly kernel: md: sync done. time=62729sec
    Jan 27 13:04:31 Firefly kernel: md: recovery thread: exit status: 0
    Jan 27 15:04:33 Firefly  emhttpd: spinning down /dev/sdj
    Jan 27 15:04:33 Firefly  emhttpd: spinning down /dev/sdg
    Jan 27 15:04:33 Firefly  emhttpd: spinning down /dev/sde
    Jan 27 15:11:10 Firefly  emhttpd: spinning down /dev/sdh
    Jan 27 15:11:10 Firefly  emhttpd: spinning down /dev/sdc
    Jan 27 15:36:15 Firefly  emhttpd: read SMART /dev/sdg
    Jan 27 15:56:21 Firefly  emhttpd: spinning down /dev/sdb
    Jan 27 16:06:22 Firefly  emhttpd: read SMART /dev/sdh
    Jan 27 16:06:22 Firefly  emhttpd: read SMART /dev/sdc
    Jan 27 16:56:01 Firefly  emhttpd: read SMART /dev/sde
    Jan 27 18:02:34 Firefly  emhttpd: read SMART /dev/sdb
    Jan 27 18:08:04 Firefly  emhttpd: spinning down /dev/sdh
    Jan 27 18:08:04 Firefly  emhttpd: spinning down /dev/sdc
    Jan 27 21:06:15 Firefly kernel: microcode: microcode updated early to revision 0xf0, date = 2021-11-12
    Jan 27 21:06:15 Firefly kernel: Linux version 5.19.17-Unraid (root@Develop) (gcc (GCC) 12.2.0, GNU ld version 2.39-slack151) #2 SMP PREEMPT_DYNAMIC Wed Nov 2 11:54:15 PDT 2022
    Jan 27 21:06:15 Firefly kernel: Command line: BOOT_IMAGE=/bzimage initrd=/bzroot
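
The gap between the last entry and the next boot line in an excerpt like the one above is the window in which the hang happened. A minimal sketch for pulling those pairs out of a longer syslog file follows. `last_entry_before_boot` is a hypothetical helper, and it assumes each boot is marked by a "kernel: Linux version" line, optionally preceded by a "kernel: microcode:" line, as in the log above.

```shell
# Hypothetical helper: for every boot found in a syslog file, print the last
# entry logged before that boot, followed by the boot line itself.
# $1 = path to a syslog file (reads stdin if omitted)
last_entry_before_boot() {
  awk '
    /kernel: Linux version/ {
      if (last != "") print "last entry before boot: " last
      print "boot: " $0
      next
    }
    # The early microcode line belongs to the new boot, so do not record it.
    /kernel: microcode: / { next }
    { last = $0 }
  ' "$@"
}

# Example (the path is only an illustration):
#   last_entry_before_boot /path/to/saved/syslog
```

This only narrows down when the system stopped logging; it cannot show why, since nothing is written at the moment of the hang.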

     

    Link to comment
    2 hours ago, Tristankin said:

    Another one today.

    I have now the following Intel CPUs tested with Unraid 6.11.5:

    i3-6100T, i7-7700, i5-8400, i5-10600, J4105, i5-6300U and G4400T

     

    Motherboards are Asrock, Asrock, Fujitsu Esprimo, ASUS, Fujitsu Futro, Fujitsu Laptop, Fujitsu Esprimo.

     

    None of them crashed so far after about a month of uptime and continuous transcoding with Unmanic on Unraid.

     

    Most of the systems have nothing installed other than Intel-GPU-TOP and Unmanic.

    Link to comment
    29 minutes ago, ich777 said:

    None of them crashed so far after about a month of uptime transcoding with Unmanic on Unraid.

     

    you can add the following with no issues

     

    i9 10850k on asrock z590

    i9 9900 on msi z370

    i5 2405S on asus P8Z77-V LX

    and 1 more asus with a celeron which is currently offline ;) 

     

    all have no issues with the intel igpu on the latest unraid releases in plex; the 10850k also has no issues with ffmpeg encoding like unmanic (the others are not used for that, only plex).

    Link to comment
    1 hour ago, ich777 said:

    I have now the following Intel CPUs tested with Unraid 6.11.5:

    i3-6100T, i7-7700, i5-8400, i5-10600, J4105, i5-6300U and G4400T

     

    Motherboards are Asrock, Asrock, Fujitsu Esprimo, ASUS, Fujitsu Futro, Fujitsu Laptop, Fujitsu Esprimo.

     

    None of them crashed so far after about a month of uptime and continuous transcoding with Unmanic on Unraid.

     

    Most of the systems don't have anything else installed than Intel-GPU-TOP and Unmanic.


    Well that is great for you but you understand it really doesn't help me out. Anything different you can see from your config to mine?

    I have had years of stable operation with 6.8.3. Anything with the 5.x kernel freezes. Do you have VT-d enabled on your system? I am not the only one experiencing this problem so there is something causing an issue and after trying to find a solution for over a year I am still no closer. You can understand that is pretty frustrating right?

    Link to comment
    2 minutes ago, Tristankin said:

    Do you have VT-d enabled on your system?

    the 10850k and 9900 ones, and the small currently offline one, yes

    the other one, nope

     

    2 minutes ago, Tristankin said:

    You can understand that is pretty frustrating right?

    of course ...

     

    i just had an issue with a beta that had been working: while changing some BIOS settings it broke my VMs completely, and reverting the settings didn't help either ... only returning to the last stable release did. after a week of experimenting, it came down to wiping the VMs (without the disks), wiping the libvirt image, updating, and adding the VMs again to get it running on the beta (which worked flawlessly before ...). so yes, i know it's frustrating, and trying for a year is way worse ...

     

    but as you see, some may also have the issue, but most don't ... so it's hard to debug and say why it's happening in your case. maybe worth a try: make a clean install while resetting the BIOS and unraid (of course keep a backup), test it "bare metal" in legacy mode and/or UEFI mode (BIOS & unraid) with a basic setup (array, plex), and run it ...

     

    if the error reappears, restore the backup and you are back in the same state. if it's running, then build it up 1 by 1 (plugins, dockers, changes, ...) until it breaks, to narrow it down. it's definitely no general issue as you see: either hardware or setup. hardware would be a shame if it's incompatible; setup could be something to solve.

     

    that would be my final approach.

    Link to comment

    I changed to UEFI this time around too; previous attempts on 6.9 and 6.10 were in BIOS mode.

    I have narrowed the Docker containers down to just the Plex one, as it is the only one using the iGPU. If hardware transcoding is turned off, the system is fine. If everything else is turned off except for Plex with hardware transcodes, it fails.

    There are no VMs on the machine, only Docker containers.

    I have reset the BIOS multiple times and upgraded it along with the upgrade to 6.11.

    It is specifically the i915 in the 5.x kernel causing the freezes from what I can tell, but there is nothing out of the ordinary with my setup. As I mentioned, 100% flawless on 6.8.3

    Link to comment
    35 minutes ago, Tristankin said:

    I have had years of stable operation with 6.8.3. Anything with the 5.x kernel freezes. Do you have VT-d enabled on your system? I am not the only one experiencing this problem so there is something causing an issue and after trying to find a solution for over a year I am still no closer. You can understand that is pretty frustrating right?

    I just want to report my experience with it.

    I have VT-d disabled on one system and enabled on the other.

     

    Please post your Diagnostics again from a 6.11.5 installation.

    I can't remember: did you try Jellyfin back then, and did it behave the same?

     

    Are you also able to hook up a monitor to your system and issue this command from an Unraid console (this will prevent the monitor from going to sleep):

    setterm --blank 0

    With this you will be able to take a picture when the crash happens, since most of the time the syslog server will not be able to record these issues.

    Link to comment

    I did post my diagnostics but here they are again.

    Yeah, syslog doesn't catch anything. So if I issue that command it will prevent sleep and show the last error on the screen, so when I swap out the dummy plug for the monitor again I can grab a screenshot?

    I have about 50 users on my plex so I can't see a good way of switching over to jellyfin, that would be a last resort option.

    firefly-diagnostics-20230127-1210.zip

    Link to comment




