Upgrade to 6.9.2, Getting freezes requiring hard reboot every week or so



  • 1 month later...

I'm having the exact same issue with a J3355/HD 500.

 

The 6.9 tree crashes within minutes of the Plex docker starting a hardware transcode using the iGPU.

 

Stable as a rock on 6.8.3.

 

I wanna try 6.10.0-rc2, but I think the same thing will happen again and I'm gonna have to roll back and hack the docker case sensitivity, etc...

 

I'm in the "if it ain't broke, don't fix it" boat for now...

Link to comment
  • 2 weeks later...

110 Days uptime on 6.8.3. I am pretty confident now it is not a hardware issue.

So it appears that the Intel iGPU implementation could be to blame.

 

 

From this thread it appears that not loading the Intel iGPU driver prevents the lockup issue, on 6.9.2 at least. Can the mods please chime in?

Edited by Tristankin
Link to comment

Whether or not you have the same configuration and have experienced any issues. Obviously you don't, but the snarkiness sure helps.

Look, I have been reporting issues for weeks, as have many others. I am prevented from running the latest version of Unraid, and potentially even the next one. A few reports come through every week from people having this exact issue: hangs on Intel hardware with nothing showing in the kernel log. The mods have always blamed the issue on hardware. I have provided a link to someone who has done the testing.

What I am looking for is for someone to take the issue a little more seriously...

Link to comment

Also, if you read what I posted in this thread, I suggested it could be a hardware issue, not that it must be, as you have claimed many times since. I also suggested that "one thing you can try is to boot the server in safe mode with all docker/VMs disabled, let it run as a basic NAS for a few days, if it still crashes it's likely a hardware problem, if it doesn't start turning on the other services one by one." By doing that you could have already confirmed whether it was iGPU related or not.

Link to comment

Well, it is certainly not a hardware issue, as the 6.8.3 uptime demonstrates.

I am worried that, with all the continuous reports from Intel users, there is an issue being glossed over, and now I have finally found a thread that demonstrates a tangible solution. If I am going to attempt the upgrade again, is there a syslog-type service I can install that looks specifically for errors related to the Intel iGPU? This server is used heavily and I don't mind trying to catch the issue, but weeks without transcoding abilities is not feasible.
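
As a rough illustration of the kind of watcher asked about here, the sketch below filters a syslog for lines that look i915-related. The patterns, the function names, and the default log path are assumptions made for illustration, not messages known to appear on Unraid. A hard lockup may also leave nothing in the log at all, as others in this thread report.

```python
import re
from typing import Iterable, List

# Illustrative patterns only (assumptions): adjust them to whatever your
# kernel actually prints when the iGPU driver has a problem.
I915_PATTERNS = [
    re.compile(r"GPU HANG", re.IGNORECASE),
    re.compile(r"\bi915\b.*\b(reset|hang|error|fail)", re.IGNORECASE),
    re.compile(r"\[drm[^\]]*\].*\b(error|hang)", re.IGNORECASE),
]


def find_igpu_lines(lines: Iterable[str]) -> List[str]:
    """Return the log lines that match any of the i915-related patterns."""
    hits = []
    for line in lines:
        if any(p.search(line) for p in I915_PATTERNS):
            hits.append(line.rstrip("\n"))
    return hits


def scan_file(path: str = "/var/log/syslog") -> List[str]:
    """Scan a syslog file; the default path is an assumption, change as needed."""
    with open(path, errors="replace") as fh:
        return find_igpu_lines(fh)
```

Pointing `scan_file` at a syslog copy that survives a reboot (for example one mirrored to the flash drive) would be the more useful target, since the in-memory log is lost on a hard reset.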

Link to comment
On 9/29/2021 at 3:04 AM, Tristankin said:

60 Days on 6.8.3 with no issues. Please recommend to other users on intel hardware with no obvious faults in the log to downgrade as the kernel used in the 6.9.x is not stable on consumer intel hardware. This should be the recommendation before replacing hardware as software is free and quite a simple test.

Hi,

 

Thanks @Tristankin for posting this. I have Intel® Core™ i5-4460 CPU with ASUS Z97M-PLUS mainboard and 16 GB RAM and no additional GPU.

After upgrading from 6.8.3 to 6.9.2, the system kept shutting off every 1 or 2 weeks or so.

I enabled the syslog server and mirrored it to the flash drive. Nothing special in the logs, so is it a hardware issue? I don't think so: I had memtest86 running for 2 days with no errors.

 

Before, on 6.8.3, it was rock solid with 12 months of uptime.

I just rolled back to 6.8.3 and we will see.

 

regards,

kocurek7

Link to comment

I assume it has to be loaded as per this to allow for the low power bitrate encoding control.

https://wiki.gentoo.org/wiki/Intel#GuC.2FHuC_firmware

So disabling it would most likely kill the transcoding anyway. Something else I noticed, though, is the kernel option:
Enable capturing GPU state following a hang

Is this enabled in the 6.9.x kernels on Unraid? I do not know enough about kernel inspection to comment any further, but it could explain why no one sees reports in the syslog.
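
For anyone who wants to check, that option text corresponds to the kernel config symbol `CONFIG_DRM_I915_CAPTURE_ERROR`. A minimal sketch for looking it up in a kernel config follows. It assumes the running kernel exposes its config at `/proc/config.gz`, which I have not verified on Unraid; the function names are made up for this example.

```python
import gzip
from typing import Optional

OPTION = "CONFIG_DRM_I915_CAPTURE_ERROR"


def option_state(config_text: str, option: str = OPTION) -> Optional[str]:
    """Return the option's value ('y', 'm', ...), 'n' if explicitly unset,
    or None if the option does not appear in the config text at all."""
    for line in config_text.splitlines():
        line = line.strip()
        if line == f"# {option} is not set":
            return "n"
        if line.startswith(option + "="):
            return line.split("=", 1)[1]
    return None


def read_running_config(path: str = "/proc/config.gz") -> str:
    """Read the running kernel's config. Assumption: the kernel was built
    with the in-kernel config exposed at this path; otherwise this raises."""
    with gzip.open(path, "rt") as fh:
        return fh.read()
```

Calling `option_state(read_running_config())` would then answer the question for the running kernel, provided the config file exists.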

Link to comment
1 hour ago, Tristankin said:

Perhaps it might have something to do with this? I am not sure if there is a specific date that the issue has been fixed, but this might be a workaround?

https://wiki.archlinux.org/title/intel_graphics#Enable_GuC_/_HuC_firmware_loading

For me, the lockups were caused by running a cron backup job for single, selected appdata from userscripts.  This interfered with the backup/restore appdata plugin in settings as they were trying to run at the same time.  Once I changed the time to something else, it never locked up after that.

 

I would turn off all cron jobs if you use any and see if that helps your issue.
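
If turning everything off is not practical, a quick way to spot jobs that start at the same time is to compare their schedules. The sketch below is only an illustration: the job names and schedules in the usage note are hypothetical, it only compares the minute and hour fields, and it cannot tell whether a long-running job is still busy when the next one starts.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def expand_field(field: str, low: int, high: int) -> Set[int]:
    """Expand one cron field. Supports *, single values, a-b ranges,
    comma lists, and /step; 'a/step' is treated as 'a-high/step'."""
    values: Set[int] = set()
    for part in field.split(","):
        rng, _, step = part.partition("/")
        if rng == "*":
            start, end = low, high
        elif "-" in rng:
            a, b = rng.split("-", 1)
            start, end = int(a), int(b)
        else:
            start = int(rng)
            end = high if step else start
        values.update(range(start, end + 1, int(step) if step else 1))
    return values


def daily_minutes(expr: str) -> Set[int]:
    """Minutes of the day (0..1439) at which a schedule starts.
    The day-of-month, month, and day-of-week fields are ignored."""
    minute, hour = expr.split()[:2]
    hours = expand_field(hour, 0, 23)
    minutes = expand_field(minute, 0, 59)
    return {h * 60 + m for h in hours for m in minutes}


def find_collisions(jobs: Dict[str, str]) -> List[Tuple[str, str, int]]:
    """Return (job_a, job_b, shared_start_minutes) for each colliding pair."""
    out = []
    for (a, expr_a), (b, expr_b) in combinations(jobs.items(), 2):
        shared = daily_minutes(expr_a) & daily_minutes(expr_b)
        if shared:
            out.append((a, b, len(shared)))
    return out
```

For example, `find_collisions({"appdata_script": "0 3 * * *", "appdata_backup": "0 3 * * 0"})` reports that pair as colliding, because both can start at 03:00.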

Link to comment
1 hour ago, danktankk said:

For me, the lockups were caused by running a cron backup job for single, selected appdata from userscripts.  This interfered with the backup/restore appdata plugin in settings as they were trying to run at the same time.  Once I changed the time to something else, it never locked up after that.

 

I would turn off all cron jobs if you use any and see if that helps your issue.

 

Only crons I have going are mover and trim. They run at very different times so they shouldn't be causing an issue.

Link to comment

Everything works flawlessly for me until I enable the i915 drivers. Tested on both 6.9.2 and 6.10-rc.2

 

Doesn't even seem to be related to a transcode. It happened mostly overnight when there was nobody using the server.

 

Though one thing I haven't tested is, hardware transcoding with plex didn't work (even with 6.10) unless I installed ich777 intel-gpu-top plugin. I haven't tested without installing that yet.
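
When comparing runs with and without the driver, it helps to confirm which state the server is actually in. A minimal sketch, assuming the standard Linux `/proc/modules` listing is available; the function names are made up for this example.

```python
def module_loaded(name: str, proc_modules_text: str) -> bool:
    """True if `name` is listed as a module (first column) in the text
    of /proc/modules. Mentions in the 'used by' column do not count."""
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields and fields[0] == name:
            return True
    return False


def i915_loaded(path: str = "/proc/modules") -> bool:
    """Check the running system; the path is the usual Linux location."""
    with open(path) as fh:
        return module_loaded("i915", fh.read())
```

Recording the result of `i915_loaded()` next to each uptime figure would make reports like the ones in this thread easier to compare.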

Link to comment
25 minutes ago, jkirkcaldy said:

Everything works flawlessly for me until I enable the i915 drivers. Tested on both 6.9.2 and 6.10-rc.2

 

Doesn't even seem to be related to a transcode. It happened mostly overnight when there was nobody using the server.

 

Though one thing I haven't tested is, hardware transcoding with plex didn't work (even with 6.10) unless I installed ich777 intel-gpu-top plugin. I haven't tested without installing that yet.

 

Have you considered submitting a bug report? It might get through to limetech and you have done more testing than I have. Perhaps reference this and your thread?

Yeah, it does not require active load, just having the module loaded seems to be causing the issue. 

I achieved gpu transcoding on 6.9.2 without the intel-gpu-top and still had the issue.

Edited by Tristankin
Link to comment
18 hours ago, Tristankin said:

 

Have you considered submitting a bug report? It might get through to limetech and you have done more testing than I have. Perhaps reference this and your thread?

Yeah, it does not require active load, just having the module loaded seems to be causing the issue. 

I achieved gpu transcoding on 6.9.2 without the intel-gpu-top and still had the issue.

I had to upgrade back to 6.10-rc2. (I tried the TPM BIOS, Windows won't boot without it, and I couldn't be bothered to do a fresh install of everything.)

 

I have left the install as stock so it has loaded the i915 drivers and it has been up for nearly a day so far. It still needs way more time to be labeled as stable. But considering last time it was a matter of hours before it locked up I am cautiously hopeful. Perhaps it was installing GPU-TOP that caused the crashing?

 

Incidentally, hardware transcoding is working; it just takes a second for Tautulli to show that the transcode is hardware accelerated.

 

I will give it the weekend and if it locks up again I will submit the bug report.

 

The only thing that isn't working for me is HDR tonemapping in Plex, but I believe that is a Plex driver issue rather than something Unraid specific.

Link to comment

Monday morning and it's still up. Uptime is nearly 4 days now.

 

I have had multiple people streaming from my Plex server over the weekend, some with hardware transcodes, some with direct play.

 

My working theories so far:

Something in Intel-GPU-TOP or GPU-Statistics wasn't playing nice with 6.10-rc.2 and my hardware. 

 

My other theory is that the server kept kicking off a parity check on each reboot so it could be something to do with that. I'm running a parity check now to see if anything happens. My thinking being that if one of my HBAs got too hot then it may have locked up the system. But I've been using these HBAs for a while now and they were in my old system so I don't think it should be that.

Link to comment

Whatever failed, you would hope to see it in the syslog. It's a weird one for sure. Perhaps 6.10 might be a good option for Intel iGPU users. I'm going to hold out until the official release is out and all the kinks are worked out. For the time being, 6.8.3 is working well, with some hacks to Community Applications to keep the docker updates coming.

Link to comment
