Intel iGPU i915 driver crashing issues on various Linux kernels



Hi all,

 

I have an Intel Pentium G4560 CPU (LGA 1151 socket) on a Supermicro X11SSH-LN4-F motherboard. This CPU has an iGPU (Intel® HD Graphics 610 / QuickSync), which works under Linux via the i915 driver.

 

When I modprobe the driver and pass this iGPU through to my LinuxServer/Plex docker, hardware transcoding works for a while before the entire system hangs (I can then only connect via IPMI and force a shutdown of the server).

 

When you Google for "i915 drivers linux crash", you find dozens of articles/posts reporting crashes with the Intel i915 driver on Linux kernels from 4.18 up to the latest 5.x (currently). With a bit of digging, you can even read that this driver is blacklisted by Unraid: https://unraid.net/blog/unraid-6-9-beta35

 

I believe Unraid 6.8.3 uses kernel 4.19 and that the new Unraid 6.9 will use kernel 5.9 (or 5.8). I am therefore a bit concerned.

 

Will I ever be able to use my Intel iGPU to do Plex transcodes? What are your thoughts? Are you aware of any workaround?

 

Many thanks.

 

Best,

OP

Edited by Opawesome
On 12/10/2020 at 11:17 AM, Opawesome said:

Will I ever be able to use my Intel iGPU to do Plex transcodes?

I and many others have been using the iGPU/Intel i915 drivers for a long time without any problems.  In my case I am still on 6.8.3 with the 4.19 Linux kernel and am having no issues at all.

 

The blacklisting referred to in the 6.9.0 beta 35/RC1 notes is due to the fact that, by default, these drivers are not loaded unless you make the specified changes.  The drivers themselves are not problematic and I have been using them with various Intel CPUs for years.  The 6.9 release is just giving you a way to load these drivers without a 'modprobe i915' entry in the 'go' file.
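
Whichever loading method is used, one quick sanity check is to look for a DRM render node under /dev/dri once the driver is loaded. A minimal sketch, assuming the usual renderD128-style node names; has_render_node is a made-up helper name, not anything Unraid ships:

```shell
# Sketch only: has_render_node is a hypothetical helper.
# It succeeds if the given directory (default /dev/dri) contains a renderD* node.
has_render_node() {
    dir="${1:-/dev/dri}"
    for node in "$dir"/renderD*; do
        # If the glob matched nothing, $node is the literal pattern and -e fails.
        [ -e "$node" ] && return 0
    done
    return 1
}

if has_render_node; then
    echo "render node found under /dev/dri"
else
    echo "no render node under /dev/dri (is i915 loaded?)"
fi
```

If no node shows up, the container has nothing to be passed, so this only tells you the driver loaded; it says nothing about stability.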

 

Multiple Plex transcodes (in the rare case transcoding is needed) have not been an issue for me at all.

 

You could be experiencing a memory issue.  I see you have 32GB RAM in your server; are you doing anything to limit the amount of RAM the transcodes can use? If many things are consuming RAM in your server, and you are not limiting the RAM for transcodes, it is likely that multiple or long transcoding sessions just keep using RAM until it crashes the server.

 

I have 64GB RAM in my server and I limit the transcode location in RAM to 16GB max and it probably does not even need that much.  I only have it that high because I also have HDHomeRun tuners which I have set to record and transcode on the fly via Plex and that can use quite a bit of RAM if multiple simultaneous recordings are happening.

 

EDIT:  Here are my go file entries for setting up a 16GB max. transcoding scratch area:

mkdir /tmp/PlexRamScratch
chmod -R 777 /tmp/PlexRamScratch
mount -t tmpfs -o size=16g tmpfs /tmp/PlexRamScratch

 

And, of course, /transcode in the Plex docker container is mapped to /tmp/PlexRamScratch, and the transcoder temporary directory in Plex is set to /transcode.
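
For anyone copying the go file entries above, here is a slightly more defensive variant, purely as a sketch. It skips the mount if something is already mounted on the directory. The helper names are made up for illustration, and it assumes the `mountpoint` utility is available on the system; the path and size are the ones from the entries above.

```shell
# Sketch only: is_mounted and ensure_scratch are hypothetical helper names.
SCRATCH=/tmp/PlexRamScratch
SIZE=16g

# True when the directory is already a mount point (relies on the `mountpoint` utility).
is_mounted() {
    mountpoint -q "$1"
}

# Create the directory and mount a size-capped tmpfs on it, unless one is already there.
# $1 = directory, $2 = tmpfs size (for example 16g). Needs root for the mount.
ensure_scratch() {
    mkdir -p "$1" || return 1
    is_mounted "$1" && return 0
    mount -t tmpfs -o "size=$2" tmpfs "$1" && chmod 777 "$1"
}

# In the go file this would be called as:
#   ensure_scratch "$SCRATCH" "$SIZE"
```

The chmod runs after the mount here so that it applies to the tmpfs itself rather than to the directory underneath it.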


Edited by Hoopster
9 hours ago, Zonediver said:

Have you tried this?

Hi @Zonediver! Yes, and Plex is capable of HW transcoding on my system. The issue is that my system hangs from time to time when I do it ;). Unless you were pointing to a specific post in that thread which covers my problem, in which case I missed it.

 

9 hours ago, Hoopster said:

I and many others have been using the iGPU/Intel i915 drivers for a long time without any problems.  In my case I am still on 6.8.3 with the 4.19 Linux kernel and am having no issues at all.

Hi @Hoopster! Thank you very much for taking the time to answer in such great detail. I really think those drivers are not stable. See for yourself the following Google search results: https://www.google.com/search?q=intel+i915+kernel+crash. But I am glad that they are stable on your (and others') systems ;-) !

 

9 hours ago, Hoopster said:

The blacklisting referred to in the 6.9.0 beta 35/RC1 notes is due to the fact that, by default, these drivers are not loaded unless you make the specified changes.  The drivers themselves are not problematic and I have been using them with various Intel CPUs for years.  The 6.9 release is just giving you a way to load these drivers without a 'modprobe i915' entry in the 'go' file.

Hmmm. OK. My understanding was that it is because the drivers are blacklisted in 6.9 (and 6.8.3 too I think) that you need to manually load the drivers with modprobe, and that it is because of instability that they were blacklisted.

 

9 hours ago, Hoopster said:

You could be experiencing a memory issue.  I see you have 32GB RAM in your server; are you doing anything to limit the amount of RAM the transcodes can use? If many things are consuming RAM in your server, and you are not limiting the RAM for transcodes, it is likely that multiple or long transcoding sessions just keep using RAM until it crashes the server.

 

I have 64GB RAM in my server and I limit the transcode location in RAM to 16GB max and it probably does not even need that much.  I only have it that high because I also have HDHomeRun tuners which I have set to record and transcode on the fly via Plex and that can use quite a bit of RAM if multiple simultaneous recordings are happening.

 

EDIT:  Here are my go file entries for setting up a 16GB max. transcoding scratch area:


mkdir /tmp/PlexRamScratch
chmod -R 777 /tmp/PlexRamScratch
mount -t tmpfs -o size=16g tmpfs /tmp/PlexRamScratch

 

And, of course, /transcode in the Plex docker container points to PlexRamScratch and in Plex to /transcode.

This is indeed something I have not investigated. However, in my config, the /transcode folder in the Plex docker container already points to the /mnt/cache/user/Plex/Transcodes folder on my 1TB SSD (and I also really don't use that much RAM anyway), so I'd rather not get too excited.

 

Thank you again guys.

 

I will keep you posted ;-)

 

Best,

OP

 

9 hours ago, Opawesome said:

However in my config, the /transcode folder in the Plex docker container already points to the /mnt/cache/user/Plex/Transcodes folder on my 1TB SSD

I made an obviously erroneous assumption that you were transcoding in RAM.  Before I switched to the limited RAM method outlined above, I was also transcoding to a 1TB Unassigned Devices SSD.  Worked fine there as well, but I switched to RAM to avoid the wear and tear on the SSD since I have plenty of RAM.  Prior to using the SSD, I was just sending transcodes to /tmp (RAM but not limited) and I did have some problems there with occasional instability.  This was on my prior system that had 'only' 32GB RAM.

 

I happened to look at how much RAM transcoding was using last night as my son was streaming a movie remotely and a TV recording was going on at the same time.  4GB of the 16GB allocated RAM was being used.  Running 'df -h -t tmpfs' from the terminal shows me exactly how much RAM is being used by PlexRamScratch.
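
If you would rather have a script watch the scratch area than run df by hand, something along these lines could work. This is only a sketch: usage_pct and warn_if_full are made-up helper names, and the 80% default threshold is an arbitrary choice.

```shell
# Sketch only: hypothetical helpers built on plain POSIX `df -P`.

# Print how full the filesystem holding the given path is, as a whole number (no % sign).
usage_pct() {
    df -P "$1" 2>/dev/null | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

# Print a warning when usage reaches the limit. $1 = path, $2 = limit in percent (default 80).
warn_if_full() {
    pct=$(usage_pct "$1")
    [ -n "$pct" ] || return 1          # df failed, e.g. the path does not exist
    limit="${2:-80}"
    if [ "$pct" -ge "$limit" ]; then
        echo "WARNING: $1 is ${pct}% full"
    else
        echo "OK: $1 is ${pct}% full"
    fi
}

# Example call for the scratch area discussed in this thread:
#   warn_if_full /tmp/PlexRamScratch 80
```

The -P flag keeps each filesystem on a single line, which is what makes the awk column lookup safe for long device names.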

 

I hope you get something figured out on your system because it certainly can work.  I am not denying that there could be some issues, just pointing out that those issues are not universal.

 

I am pretty sure the blacklisting of the four included video drivers in 6.9 beta/RC is because they have not been previously included and each takes a bit of RAM to load since all of unRAID runs in RAM.  Limetech is letting each user decide which, if any, of the drivers are needed and which they wish to load in RAM on boot.

 

There are several reports in the forums of the i915 drivers being loaded and successfully used with the new method outlined in the release notes rather than the modprobe from the go file method.

 

Again, I hope you can find something that works for you.

Edited by Hoopster
7 minutes ago, Hoopster said:

There are several reports in the forums of the i915 drivers being loaded and successfully used with the new method outlined in the release notes rather than the modprobe from the go file method.

 

Again, I hope you can find something that works for you.

This is good to know. I will try this new method as soon as the 6.9 version is released as stable then.

 

Best,

OP


I'm having a similar issue with QuickSync locking the system up hard.  This is on a Supermicro X11SSH-TF with E3-1275v6, 16GB of memory.  I've had it happen with Plex, Handbrake, and, lately, I've been using Tdarr, all of which can trigger the issue seemingly at random.  I thought it might be a thermal issue, but after putting a giant CPU cooler on, it hasn't seemed to resolve the issue.  Doesn't look like anything gets written to syslog, so I've had to do a hard boot via IPMI.

 

Would love to resolve this so I can use transcoding consistently.  It can work for relatively long periods, or fail a couple of files in.  Is 16GB not enough memory maybe?


Hi @DavejaVu,

 

That seems very similar to what I have on my system indeed. A common denominator between us is the Supermicro X11SSH-series board. I am just pointing it out, I am not drawing any conclusions. Also, like you, I have not found anything in the syslog. However, I did not have Plex logging set to "debug" when the crashes occurred, so that will be my next troubleshooting step when I have the time/courage to give HW transcoding another shot (unclean shutdowns are a pain to deal with, mainly because you have to rebuild parity). I will be glad to share the results of my future tests with you in this thread.

 

FYI, I was considering upgrading my CPU to exactly your model (Xeon E3-1275v6), hoping that could resolve my issue. Now I think maybe I should not :).

 

About the possibility that your crashes are caused by a lack of RAM, I think one can rule out this cause if the transcoding is done e.g. on the SSD cache drive (see my post above) rather than in RAM. How is your Plex /transcode directory set up? Where is it mounted in your system? Also, if crashes were caused by a lack of RAM, I believe they should appear consistently when RAM fills up, not randomly. Am I wrong?

 

Best,

OP

 

(PS: Nice nickname ;) )

Edited by Opawesome
  • 2 months later...

Hi there, my system is locking up randomly with the i915 driver (HW transcoding in Jellyfin) on 6.9.0. Has anyone found a solution?

With Jellyfin + i915, to make the whole system freeze: play a video, make sure it is being HW transcoded, then jump around in the video to keep the GPU busy. After a while the system locks up completely.

I tried with

touch /boot/config/modprobe.d/i915.conf

It freezes randomly during HW transcoding.

Then I added the following to i915.conf to load the extra firmware

options i915 enable_guc=2

It still freezes randomly during HW transcoding. Any solution?

 

*Rolled back to 6.8.3 and stability is back. Phew...
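
For reference, the two i915.conf variants tried above (empty file, then the enable_guc option) can be written by a small helper so it is easy to flip between them while testing. This is a sketch under assumptions: write_i915_conf is a made-up name, and the sysfs path in the closing comment is assumed to be readable only while the i915 module is loaded.

```shell
# Sketch only: write_i915_conf is a hypothetical helper.
# $1 = config file, e.g. /boot/config/modprobe.d/i915.conf
# $2 = enable_guc value; leave empty to create an empty file (the `touch` variant)
write_i915_conf() {
    mkdir -p "$(dirname "$1")" || return 1
    if [ -n "$2" ]; then
        printf 'options i915 enable_guc=%s\n' "$2" > "$1"
    else
        : > "$1"
    fi
}

# Example calls, matching the two attempts in this post:
#   write_i915_conf /boot/config/modprobe.d/i915.conf       # empty file
#   write_i915_conf /boot/config/modprobe.d/i915.conf 2     # options i915 enable_guc=2
#
# After a reboot, the value the driver is actually using can usually be read back with:
#   cat /sys/module/i915/parameters/enable_guc
```

Reading the value back is worth doing, since a typo in the options line is otherwise easy to miss.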

Edited by Lee Kim Tatt
add more information
On 3/5/2021 at 11:56 AM, Lee Kim Tatt said:

Hi there, my system is locking up randomly with the i915 driver (HW transcoding in Jellyfin) on 6.9.0. Has anyone found a solution?

With Jellyfin + i915, to make the whole system freeze: play a video, make sure it is being HW transcoded, then jump around in the video to keep the GPU busy. After a while the system locks up completely.

I tried with

touch /boot/config/modprobe.d/i915.conf

It freezes randomly during HW transcoding.

Then I added the following to i915.conf to load the extra firmware

options i915 enable_guc=2

It still freezes randomly during HW transcoding. Any solution?

 

*Rolled back to 6.8.3 and stability is back. Phew...

Hi @Lee Kim Tatt,

Your problem is exactly the same as mine (freezing when jumping around the video), except that the problem occurred on 6.8.3 for me. Would you mind sharing your hardware configuration (MB, CPU, etc.)?

Best,

OP


With 6.9 on my Asrock j5005 (UHD 605) I'd get occasional lockups not with Emby, as far as I could tell, but with Handbrake GPU encoding and the intel-gpu-telegraf docker. Installing an HDMI dummy plug seems to have fixed it - it hasn't happened since. I don't know if that's specific to my board or more general.

 

NOTE: I'm using the new modprobe method to load the driver.

  • 2 weeks later...
On 3/7/2021 at 6:10 PM, Opawesome said:

Hi @Lee Kim Tatt,

Your problem is exactly the same as mine (freezing when jumping around the video), except that the problem occurred on 6.8.3 for me. Would you mind sharing your hardware configuration (MB, CPU, etc.)?

Best,

OP

Hi there, I'm using an ASRock J3455-ITX with 16GB RAM.

Weird, my system is rock solid with 6.8.3, the Jellyfin-AMD-Intel-Nvidia docker and Intel GPU Top. It can transcode HEVC to H.264 at no less than 100fps (mostly around 120fps).

intel gpu top.jpg

Edited by Lee Kim Tatt
  • 2 weeks later...
  • 2 months later...
2 minutes ago, mrow said:

Were you ever able to figure something out with your Intel GPU causing lock ups? I’ve been having the same issue. 

No, not yet. I am planning on doing another test when I upgrade to Unraid v6.9.x (I am currently running v6.8.3), or when I change my CPU (I may upgrade to a Xeon E3 1275v6 or the like one of these days).

5 hours ago, Opawesome said:

No, not yet. I am planning on doing another test when I upgrade to Unraid v6.9.x (I am currently running v6.8.3), or when I change my CPU (I may upgrade to a Xeon E3 1275v6 or the like one of these days).

I’m currently on 6.9.2 and I have an E3 1275v6 so that shouldn’t make a difference unfortunately. 
 

edit: current system specs: 

[Attached screenshot of system specs]

 

Edited by mrow
Remove motherboard serial number from screenshot.
  • 2 months later...

My system also freezes within 5 minutes when transcoding with Jellyfin. I'm running Unraid 6.9.2 on an ASRock J3455. I tried booting via CSM and UEFI, disabled C-states, and tried with and without a dummy HDMI plug and with and without a monitor, but it still freezes and I have to pull the power plug to restart. I tried the ich777/jellyfin, jellyfin/jellyfin and linuxserver/jellyfin dockers. I also tried installing FFMPEG 4.3.2-1 in the linuxserver/jellyfin docker to switch from VAAPI to QuickSync, but nothing works. Is there any solution?

1 hour ago, Zonediver said:

Use Plex (for testing)? On my server, nothing is crashing - but i dont use jellywhatever...

The issue occurs regardless of the application you’re using to transcode. It happens to me with Plex. It also happens to me with Handbrake.

19 hours ago, random672315 said:

I don’t have a Plex pass… is there any other way to test hw transcoding in plex without buying a month, just for testing? 

Plex for one month is only 4,99 - maybe worth it to find the problem...

But it seems that (in your case) there may be a hardware issue.

Edited by Zonediver
