• [6.9.x - 6.11.x] Intel i915 module causing system hangs with no report in syslog (not Alder Lake)


    Tristankin
    • Minor

    Since the releases based on the 5.x kernel, many users have reported system hangs every few days once the i915 module is loaded.

    Based on reports from several users in the thread below, we have worked out that the issue is caused by the i915 module and persists in both the 6.9.x release and the 6.10 release candidates.


    The system does not need to be actively transcoding for the hang to occur. 6.8.3 does not have this issue, so it is not hardware related. Unloading the i915 module stops the hangs. Hangs are still present in 6.10.0-rc2. I can provide a list of similar reports if required.




    User Feedback

    Recommended Comments



    6 minutes ago, Tristankin said:

    So if I issue that command, it will prevent sleep and show the last error on the screen, so when I swap the dummy plug back out for the monitor I can grab a screenshot?

    No, you have to keep a monitor connected. Swapping the dummy plug for a real monitor won't work in most cases, because the dummy plug's EDID is not the same as the monitor's, and since your system has crashed, the driver can't initialize the new EDID. Hope that makes some sense to you.

     

    7 minutes ago, Tristankin said:

    I have about 50 users on my Plex server, so I can't see a good way of switching over to Jellyfin; that would be a last-resort option.

    But that is not the question; the question is whether the same crashes happen on Jellyfin too.

    2 minutes ago, ich777 said:

    But that is not the question; the question is whether the same crashes happen on Jellyfin too.

    But I would need to turn off Plex to isolate it, correct?

    It happens randomly, so I would need to turn off my Plex service for a few days to test, correct?

    OK, so I need to keep the monitor plugged in and turned on until it crashes.

    2 hours ago, ich777 said:

    I have now tested the following Intel CPUs with Unraid 6.11.5:

    i3-6100T, i7-7700, i5-8400, i5-10600, J4105, i5-6300U and G4400T

     

    The motherboards are, respectively: ASRock, ASRock, Fujitsu Esprimo, ASUS, Fujitsu Futro, Fujitsu laptop, Fujitsu Esprimo.

     

    None of them has crashed so far, after about a month of uptime and continuous transcoding with Unmanic on Unraid.

     

    Most of the systems have nothing installed other than Intel-GPU-TOP and Unmanic.


    I'd like to chime in here to add to the list and confirm that I have no hangs or crashes on 6.11.1 with an i5-12600K running on an MSI PRO Z690-A board.

    I have Windows VMs, about 20 Docker containers, and both Plex and Jellyfin transcoding, all confirmed working with no crashes or hangs. I have a monitor hooked up directly to the server at all times; I've never tried running the transcoder with it unplugged.

    At the start of all this, I was experiencing the hangs described in the original report. It just took time for Unraid to release an update and for Plex to fix its transcoder.

    Before I enabled it, I was getting random crashes that were logged to syslog, and ich777 was kind enough to help and suggested I switch from macvlan to ipvlan, which fixed those crashes. Then I re-enabled Plex and Jellyfin transcoding, and it all works now.

    I've been crash- and hang-free for over 3 months.

    5 minutes ago, Earendur said:

    Before I enabled it, I was getting random crashes that were logged to syslog, and ich777 was kind enough to help and suggested I switch from macvlan to ipvlan, which fixed those crashes. Then I re-enabled Plex and Jellyfin transcoding, and it all works now.


    I already changed over to ipvlan as part of the upgrade.

    [Screenshot]

    26 minutes ago, flyize said:

    Wait, you were using the iGPU on 6.8.3 for Plex transcoding?


    Yep, 100% rock solid. The 4.x kernel has zero issues.

    13 minutes ago, Tristankin said:


    Yep, 100% rock solid. The 4.x kernel has zero issues.

     

    Reading what you have been contributing today, I'd like to say that I am in exactly the same boat as you, with all the same baggage: no VMs, ipvlan, Plex transcoding via i915, stable on 6.8.3, etc. :) @Tristankin

    15 minutes ago, Tristankin said:


    Yep, 100% rock solid. The 4.x kernel has zero issues.

    Apologies, ignore me; I just realized that this is the non-Alder Lake thread.

    4 hours ago, Tristankin said:

    But I would need to turn off Plex to isolate it, correct?

    You can disable transcoding in Plex, or more precisely, disable HW transcoding.

     

    4 hours ago, Tristankin said:

    It happens randomly, so I would need to turn off my Plex service for a few days to test, correct?

    No. If your system is fast enough to keep up with software encoding for as long as you test Jellyfin, it would be fine.

    Maybe test it intensively by playing a file over and over.

     

    4 hours ago, Tristankin said:

    OK, so I need to keep the monitor plugged in and turned on until it crashes.

    Exactly. Then you can take a picture of the screen, and maybe we can see more...

     

    4 hours ago, Earendur said:

    i5-12600k

    Please remember this is the non-Alder Lake thread...

     

    4 hours ago, muzo178 said:

    plex transcoding i915

    Are you both on the official Plex version?

    Can you post your Diagnostics too @muzo178?

     

    4 hours ago, Tristankin said:

    I'm using binhex/plex; I doubt it is very different, though.

     

    Yes, they are. plexinc, binhex, ... all use a different base OS for their Docker containers, which is also a reason why there are differences in encoding behavior after updates (not crashes, but HDR tone mapping, HW capabilities, ...). Just as a note.


    @Tristankin & @muzo178 you can at least try switching to the official container and see if that makes a difference; they should all be interchangeable, and you only have to replace the Repository in the template.

     

    But just to be safe, create a backup of your Plex directory in your appdata folder so that you can revert if something goes wrong (though nothing should go wrong).


    OK, so I didn't end up changing the container, but I did check out the discussion about that container.

    I found a discussion about changing the appdata directory from /mnt/user to /mnt/cache. 

    @muzo178 Could you try this too to see if it makes a difference with your container?

    So far I have had almost 4 days of uptime. It's either the above or turning off VT-d.

     

    57 minutes ago, Tristankin said:

    I found a discussion about changing the appdata directory from /mnt/user to /mnt/cache. 

    This will bypass the FUSE filesystem, just so you're aware. Do you also have your appdata share set to cache Only? Otherwise Plex's data will be gone when the mover kicks in with cache set to Yes or Prefer...


    I have just changed it.

    I don't think anything was moved, as I was in Prefer mode. All good so far, but thank you for the heads-up. I guess I will have to be more proactive about backing up appdata too. But if it makes the system stable, I am happy.

    39 minutes ago, Tristankin said:

    But if it makes the system stable, I am happy.

    But if that is the case, then this has nothing to do with iGPU transcoding...

     

    Where is your transcoding temp directory located? Have you changed it from the default? Maybe this is the cause of the issue, but it would be the first time I've heard of that.

     

    40 minutes ago, Tristankin said:

    I have just changed it.

    It should also be fine on Prefer, as long as the cache doesn't fill up completely.


    No matter which container you're using, you should always point Plex at the /mnt/cache path (assuming appdata is only on the cache). There is a sizable performance difference. The same goes for Nextcloud.

    17 hours ago, ich777 said:

    Where is your transcoding temp directory located? Have you changed it from the default? Maybe this is the cause of the issue, but it would be the first time I've heard of that.

     

    /dev/shm

     

    [Screenshot]

    On 1/28/2023 at 8:49 AM, ich777 said:

    @Tristankin & @muzo178 you can at least try switching to the official container and see if that makes a difference; they should all be interchangeable, and you only have to replace the Repository in the template.

     

    But just to be safe, create a backup of your Plex directory in your appdata folder so that you can revert if something goes wrong (though nothing should go wrong).

     

    I just physically came back to the location of my 6.8.3 server. I'm going to try upgrading again tomorrow. Are you still stable, @Tristankin?

    On 2/8/2023 at 6:53 AM, muzo178 said:

    I just physically came back to the location of my 6.8.3 server. I'm going to try upgrading again tomorrow. Are you still stable, @Tristankin?


    Yep, the change in directory fixed it. I'm not sure what is causing it, but changing appdata to cache Only and moving the config directory to the cache fixed it.

    12 days of uptime so far.

    9 hours ago, Tristankin said:

    but changing appdata to cache yes and moving the config directory to cache fixed it

    This is a really bad configuration; please change your appdata share to use cache Only or Prefer.

     

    I hope you know that the Yes setting tries to move all the data from the appdata share to the array, while on the other hand you are forcing Plex to keep using the appdata directory on the cache...

    On 2/8/2023 at 6:35 PM, ich777 said:

    This is a really bad configuration; please change your appdata share to use cache Only or Prefer.

     

    I hope you know that the Yes setting tries to move all the data from the appdata share to the array, while on the other hand you are forcing Plex to keep using the appdata directory on the cache...

     

    Sorry, you are right, I meant Only. Set cache to Only. My bad; I will fix my other post too.


    EDIT: I'll try switching to ipvlan to see if that helps.


    This has been an ongoing issue for me for a long time now (I first posted in this thread back in April/May), and my only solution has been to move my Plex container and transcoding to a second system running Unraid 6.8.3. The 6.8.3 system has been solid.

     

    With hardware transcoding enabled, I get at most about a day and a half before a crash. Without it, I'm rock solid. There are never any related log entries at the time of the crash (I mirror the syslog to flash and to a remote syslog server), and my system restarts automatically after the crash.

     

    • i9-9900k
    • 6.11.1
    • Dummy Plug
    • Intel GPU Top with nothing in the go file
    • Official PlexInc container
    • Ran Memtest for 24 hours with no errors
    • Appdata on Cache:Prefer, with plenty of available space.
    • Transcoding on the cache drive (not in RAM); I thought RAM might have been the issue, so I switched away from it

     

     

    tower-diagnostics-20230209-0719.zip

    31 minutes ago, mechmess said:

    6.11.1

    Why? Please update to 6.11.5.

     

    31 minutes ago, mechmess said:

    Appdata on Cache:Prefer, with plenty of available space.

    Did you also change the path to /mnt/cache/... instead of /mnt/user/... in the template?



