6.9.0 Random Crashes/Restarts Since Upgrading


9 hours ago, Tristankin said:

I'm just a little worried about formatting my cache drives if I need to roll back to 6.8.3 as they will need to have the alignment changed back for them to work again.

New alignment works with v.6.8 as long as it's a pool, you can even have a single device "pool".

Link to comment

There is one thing common to all the diagnostics I see posted here.  Every one of them has the Intel hardware transcoding setup in the go file:

#Setup drivers for hardware transcoding in Plex
modprobe i915
chmod -R 777 /dev/dri

Try removing or commenting those lines out in the go file and see if the crashes stop.  Of course any docker using the hardware transcoding device /dev/dri will have to have it removed.
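For anyone who wants to try this without hand-editing, here is a minimal sketch of commenting those two lines out. It assumes the go file is at its usual Unraid location, /boot/config/go, and that GNU sed is available; `disable_i915_lines` is a made-up helper name, not an Unraid tool.

```shell
#!/bin/bash
# Sketch: comment out the i915 hardware-transcoding lines in a go file.
# Assumption: on Unraid the go file is /boot/config/go.
# disable_i915_lines is a hypothetical helper name.

disable_i915_lines() {
    local go_file="$1"
    # Keep a backup next to the file so the change is easy to undo.
    cp -- "$go_file" "$go_file.bak" || return 1
    # Prefix the two uncommented lines with '#'; every other line is left alone.
    sed -i -E 's@^([[:space:]]*)(modprobe i915|chmod -R 777 /dev/dri)@\1#\2@' "$go_file"
}

# Usage (the change only takes effect on the next boot):
#   disable_i915_lines /boot/config/go
```

Lines that are already commented out are not touched a second time, because the pattern only matches lines that begin with the command itself.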

Edited by dlandon
Link to comment
1 minute ago, Tristankin said:

Progress! I like it when someone sees a pattern! Unfortunately plex is the main reason for this box so I will have to roll back to 6.8.3 unless there is a new kernel in the upcoming release that will fix this issue.

I tried to use transcoding in dockers in the past.  One docker crashed consistently.  The interesting thing is I am using this currently in the official Plex docker with no issues.

 

If this is really the problem, I'm not sure you'll find it's an Unraid issue.  I would be suspicious of the docker that is trying to use it.

Link to comment

Interesting catch!

Current update is that I brought docker engine online Wednesday and then brought a small handful of containers on yesterday (tdarr, tdarr-node, telegraf, bazarr & hddtemp)

I'm going to get all my containers back up & running minus plex, as that's the only one using the iGPU. If there are still no crashes after a day or two, I'll start plex too.

 

Should mention I'm using the hotio image for plex, although have been for the longest time with no issue.

Link to comment
1 minute ago, dlandon said:

If someone wants to do some experimenting, change your go file to:


modprobe i915
chown -R nobody:users /dev/dri
chmod -R 777 /dev/dri

This is the way I do it.  The owner permission may have something to do with it.
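A possible variation for experimenters: the same three commands, but skipped when the driver did not actually create a render node. This is only a sketch; `dri_ready` and `setup_igpu` are hypothetical names, and the assumption is that a working i915 load creates a `renderD*` node under /dev/dri.

```shell
#!/bin/bash
# dri_ready and setup_igpu are hypothetical helper names.

# Succeeds only if the directory holds at least one render node (renderD*).
dri_ready() {
    local dir="$1" node
    for node in "$dir"/renderD*; do
        [ -e "$node" ] && return 0
    done
    return 1
}

# The three commands from the post, guarded so that chown/chmod
# only run when the driver loaded and a render node exists.
setup_igpu() {
    modprobe i915 || return 1
    dri_ready /dev/dri || return 1
    chown -R nobody:users /dev/dri
    chmod -R 777 /dev/dri
}

# In the go file, call:
#   setup_igpu
```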

That's exactly what my entry in my go file looks like

Link to comment

I am using binhex plex on my server. I have rolled back to 6.8.3 as I have many friends and family relying on this machine and get harassed when it goes down, let alone when the docker is stopped for longer than an hour.

I copied the 6.8.3 files onto the USB and still used the new disk config (did not have to revert to the .bak file), and everything mounted OK once I reassigned the cache disks to the array (nothing lost off the cache).

Not going to be an early adopter again. Going to wait a few months next time.

Edited by Tristankin
Link to comment

Rolled back to 6.8.3, been stable for 5d2h now.

 

I use linuxserver/plex along with Intel hardware transcoding.

 

My go file has the following:

modprobe i915
chmod -R 777 /dev/dri

 

File permissions are set to root/root currently.
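For comparing setups between posters, the owner and mode of the device nodes can be printed in one go. A minimal sketch, assuming GNU `stat` and `find`; `show_perms` is a hypothetical helper name.

```shell
#!/bin/bash
# show_perms is a hypothetical helper name.

# Print "owner:group mode path" for a directory and its direct entries.
show_perms() {
    local dir="$1"
    [ -d "$dir" ] || { echo "no such directory: $dir" >&2; return 1; }
    find "$dir" -maxdepth 1 -exec stat -c '%U:%G %a %n' {} +
}

# Usage:
#   show_perms /dev/dri
```

With the two-line go file the owner would be expected to stay root:root, while the variant with the extra chown line should show nobody:users.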

 

My server was restarting once every day, always between 5-7 PM. In multiple instances it restarted an hour after the first one. My server is accessed quite a lot by people who direct stream and transcode. There are records of numerous hardware transcodes between the server restarts.

Link to comment

I found this after doing some research:  https://linuxreviews.org/Linux_Kernel_5.5_Will_Not_Fix_The_Frequent_Intel_GPU_Hangs_In_Recent_Kernels

 

The reason the i915 driver is working in 6.8 is because of the older Linux kernel.  Doesn't look good for using Intel i915 driver in 6.9.  This isn't something LT has any control over.
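To see which kernel a given box is on and whether the i915 module is currently loaded, something like the following sketch should do. It relies on the kernel exposing loaded modules under /sys/module; `driver_loaded` and `kernel_report` are hypothetical names.

```shell
#!/bin/bash
# driver_loaded and kernel_report are hypothetical helper names.

# Succeeds if the named module has an entry under <sys_root>/module.
driver_loaded() {
    local name="$1" sys_root="${2:-/sys}"
    [ -d "$sys_root/module/$name" ]
}

# One-line summary: running kernel version and i915 state.
kernel_report() {
    local state="not loaded"
    driver_loaded i915 && state="loaded"
    printf 'kernel %s, i915 %s\n' "$(uname -r)" "$state"
}

# Usage:
#   kernel_report
```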

 

I'll be removing the driver from my system and go back to the CPU based transcoding.

Link to comment

Just throwing my hat in with the same problem. Syslog server hasn't been working properly for me so I can't attach logs.

 

I rolled back to 6.8.3 for the time being.

 

Before the update to 6.9.1 I had over 200 days of uptime but after updating I was getting daily crashes and my family uses the server too much for that to happen. I don't have any VMs, no GPU, just about 20 docker containers.

Link to comment

Latest update, moved all my containers except plex back to the main box early/mid last week and still no crashes.

Moved plex yesterday and this evening my box has just crashed again, definitely something related to plex & the latest unraid version.

 

Could be iGPU related. I've pulled out my GTX GPU for now in case it's conflicting with the iGPU (which I doubt). If it crashes again, which it probably will, the next step will be removing any reference/usage of the iGPU from plex and unraid and using software transcoding for a bit.
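Before removing the iGPU it helps to know which containers still have /dev/dri passed through. A rough sketch using `docker inspect` with a Go template over `.HostConfig.Devices`; `uses_dri` and `list_dri_containers` are hypothetical names.

```shell
#!/bin/bash
# uses_dri and list_dri_containers are hypothetical helper names.

# Reads lines of the form "/name device device ..." on stdin and prints
# the names (without the leading slash) of containers that map /dev/dri.
uses_dri() {
    awk '{
        for (i = 2; i <= NF; i++) {
            if ($i ~ /^\/dev\/dri/) {
                name = $1
                sub(/^\//, "", name)
                print name
                break
            }
        }
    }'
}

# Feeds every container (running or stopped) through the filter above.
list_dri_containers() {
    local ids
    ids=$(docker ps -aq) || return 1
    [ -n "$ids" ] || return 0
    # $ids is intentionally unquoted so each ID becomes its own argument.
    docker inspect \
        --format '{{.Name}}{{range .HostConfig.Devices}} {{.PathOnHost}}{{end}}' \
        $ids | uses_dri
}

# Usage:
#   list_dri_containers
```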

 

Weird thing is that I had plex running absolutely fine on my 2nd box which also runs 6.9.1 and uses an iGPU for hardware transcoding.

Link to comment
  • 2 weeks later...

Had another crash shortly after my last post. Put my GPU back in and then removed all reference/usage of iGPU and I've now been up for 9 days with no crash.

 

So for me the issue is related to iGPU with a 9700k on this box.

Still very strange that my 2nd box which runs an i5 8400 (which seems to use the exact same UHD Graphics 630 iGPU) didn't crash at all on 6.9.x

 

There's not much else I can do now as I don't know exactly what about the iGPU is causing the crashes, unless anyone has any ideas for something to try.

Thankfully that CPU is powerful enough to run without iGPU so I'll just be doing that for the foreseeable future, any new releases of Unraid I'll try swapping back to the iGPU and see if it behaves.

Link to comment
  • 6 months later...

Were there any findings in the end here? I only recently updated to 6.9 from 6.8 and the machine has been crashing to POST ever since.

I can't for the life of me figure it out, however if I disable the PLEX docker the system runs just fine... I've reverted to running PLEX inside a VM rather than a Docker and it's been solid ever since but obviously not preferable.

I've got a Supermicro X11SPi-TF + Xeon GOLD 6242 so no iGPU for me.

 

Thanks!

Edited by Thanassos
Mentioning Hardware.
Link to comment

Unfortunately no real findings from this, the only findings were that iGPU caused unraid to crash but as you said not applicable to you unfortunately.

I just ran without using the iGPU, then a couple of weeks after upgrading to 6.10-rc1 I put the iGPU back on and had the same crashes again, so I have removed it and am currently using a spare GPU instead, which seems fine.

 

Did you get any logs that could've shown what was happening? I had a crash a week or so ago and the logs pointed to the fact that I was using a separate IP for a couple of extra Plex containers. I stopped those containers as they were only for testing and it's been fine, but it was weird as they hadn't caused an issue for the weeks that I was using them.
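If a syslog copy did survive the crash, a quick search for the usual suspects can save some scrolling. This is only a sketch: the pattern list is a guess at what is worth looking for, the path /boot/logs/syslog in the usage line is an assumption about where the flash mirror ends up, and `find_crash_hints` is a hypothetical name.

```shell
#!/bin/bash
# find_crash_hints is a hypothetical helper name.

# Print numbered lines mentioning a GPU hang, a kernel call trace or a panic.
# Returns non-zero when the file is unreadable or nothing matches.
find_crash_hints() {
    local log="$1"
    [ -r "$log" ] || { echo "cannot read $log" >&2; return 1; }
    grep -n -i -E 'GPU HANG|Call Trace|Kernel panic' -- "$log"
}

# Usage (path is an assumption, adjust to wherever the syslog copy lives):
#   find_crash_hints /boot/logs/syslog
```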

Link to comment

Hey mate, no useful logs, but I went scorched earth after this post as it was just annoying me.

I've been running the same unRaid for years so just got a new USB / Fresh Install / New Docker setup (using existing cache data).

Everything has been running perfectly from Plex since then, sure only 30 hours now but that's much better than it was under Plex Docker.

I'm assuming that over the course of multiple years / upgrades something gets a little skewed along the way.

Link to comment

I went back to 6.8.3 for 3 months then tried the upgrade to 6.9.2 again. Here is the thread covering my experience.
 

In short, 6.9.2 still hates my hardware. I got the iGPU stuff worked out the second time by turning off the iGPU load in the go file, but the system would still hang every 2 days or so. Nothing appears in the syslog mirrored to flash, and everyone blamed the hardware over and over again. I offered to help dig further to try and find the root source of the issue, was ignored, went back to 6.8.3, and I'm up 80 days now with no issues.

I still see so many threads on the forums where the admins/mods constantly blame hardware, but there is a software issue, most likely the kernel, with Intel hardware on 6.9.2, and from what I have seen, 6.10 might even have the same problem. I'm just biding my time until we aren't seeing people report the same issue a few times a week, week after week.

Link to comment
  • 1 month later...
