Zombie/defunct processes, unable to stop them


I'm running 5.0rc16c with the latest Plex Media Server (0.9.8.4.125), and every day or so Plex completely stops responding.  When I try to stop the service with

/etc/rc.d/rc.plexmediaserver stop

the command hangs indefinitely.  When I check the process list, the Plex processes show as defunct.  From then on, every process I try to stop becomes a zombie (whether it's sab, sickbeard, transmission, or java/crashplan).  Does anyone know why these processes are all becoming zombies?  How do I stop this?

 

Another thing I noticed: sometimes, when I'm able to kill 'shfs', the defunct processes disappear and things start to behave normally again.

 

Moreover, I'd also like to figure out what is happening with plex, but none of the plex logs show anything useful before the server stops responding.

Link to comment
  • 2 weeks later...

Does no one else have this issue?  I could really use some help here, these zombie processes are causing me to have to hard-reset my server.

Zombie processes are already dead and other than taking a tiny bit of memory, do no harm.

 

Unix/Linux lets the parent process read the exit status of a child process.  When a child process ends and the parent still exists but has not yet read that exit status, the child shows up as a zombie process.

 

The issue is with the parent process not doing its "read" of the child process exit status.  (That tiny bit of memory I was talking about is the process table entry that holds the exit status of the child process.)

 

In other words... blame plex media server, not unRAID and don't be afraid of zombies... It is not as if they are hanging, they are already dead, the parent process has just not cared.
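
To make the mechanism above concrete, here is a minimal Python sketch, assuming a Linux system with /proc mounted. The helper names process_state and zombie_demo are made up for this illustration and have nothing to do with Plex or unRAID. The parent forks a child that exits immediately, deliberately delays its wait, and checks the child's state before and after reading the exit status.

```python
import os
import time


def process_state(pid):
    """Return the one-letter state from /proc/<pid>/status, or None if the pid is gone."""
    try:
        with open(f"/proc/{pid}/status") as status:
            for line in status:
                if line.startswith("State:"):
                    return line.split()[1]
    except (FileNotFoundError, ProcessLookupError):
        return None
    return None


def zombie_demo():
    """Fork a child that exits at once; report its state before and after reaping."""
    pid = os.fork()
    if pid == 0:
        os._exit(7)  # child: exit immediately with status 7

    # Parent: wait (up to 5 s) for the child to exit, but do not reap it yet.
    deadline = time.monotonic() + 5
    while process_state(pid) != "Z" and time.monotonic() < deadline:
        time.sleep(0.01)
    before = process_state(pid)

    # Reading the exit status ("reaping") releases the process table entry.
    _, status = os.waitpid(pid, 0)
    after = process_state(pid)
    return before, os.WEXITSTATUS(status), after


if __name__ == "__main__":
    print(zombie_demo())
```

On Linux the child should show state Z until os.waitpid runs, and it should be gone from /proc afterwards. Nothing was "hung" in between; the parent simply had not collected the status yet.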

Link to comment

I've already brought it up in the plex forums.  This problem doesn't exist on any other platforms running plex as best as I can tell.  Regarding the parent processes, the parent to all the processes I try to kill that eventually turn into zombies is 'init'.  The only time I've been able to successfully get rid of the zombie processes is when I manually kill 'shfs'.  Doesn't it seem like shfs might not be allowing the parent process time to read the exit status?
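
One way to see what is actually stuck, and who the parent is, is to list process states directly. The following is a rough Python sketch, assuming Linux with /proc mounted; parse_stat and stuck_processes are hypothetical names. It reports every process in state Z (zombie) or D (uninterruptible sleep) together with its parent PID. Including D is my own assumption: a process blocked inside the kernel on I/O also ignores kill until the I/O returns, and nothing in this thread confirms that this is what is happening here.

```python
import os


def parse_stat(text):
    """Parse one /proc/<pid>/stat line into (pid, name, state, ppid)."""
    # The name sits in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    head, _, rest = text.partition("(")
    name, _, tail = rest.rpartition(")")
    fields = tail.split()
    return int(head), name, fields[0], int(fields[1])


def stuck_processes(states=("Z", "D")):
    """List (pid, ppid, state, name) for processes whose state is in `states`."""
    found = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/stat") as stat_file:
                pid, name, state, ppid = parse_stat(stat_file.read())
        except OSError:
            continue  # process exited (or became unreadable) during the scan
        if state in states:
            found.append((pid, ppid, state, name))
    return sorted(found)


if __name__ == "__main__":
    for pid, ppid, state, name in stuck_processes():
        print(f"{pid:>7} {ppid:>7} {state} {name}")
```

If the entries turn out to be in state D rather than Z, the problem is probably something they are waiting on rather than a parent that forgot to read an exit status.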

Link to comment
  • 1 month later...

I had similar behavior for a little while a few months ago. I kept focusing on Plex because it wasn't responding.

 

I finally went back to 100% stock, and *only* added the Plex plugin back. Everything worked OK. Slowly added plugins back until I found it was SickBeard that was choking on a .AppleDouble in one of its directories. Removed it and things have been solid since then.

 

Not sure if it directly applies, but certainly removing all plugins/non-stock addons, and just running Plex is a good place to start troubleshooting. Reduce variables wherever you can.

Link to comment

Wow...interesting...I'll try removing all instances of those and see if my problem goes away.  How did you figure it was sickbeard choking?

Link to comment

Just a standard troubleshooting technique - half splitting.

 

Once I had established that some plugin was causing the issue by disabling all plugins, I added half of them back in and checked whether the issue returned. If it did, then I knew my issue was within the group that was turned on - so I disabled half of that group and kept going until I isolated it to one specific plugin. From there, I looked through sickbeard's logs carefully.

 

Half-splitting is fast and works well when a single variable is causing your issue, but it's less reliable if you have multiple variables interacting.
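
The half-splitting procedure described above can be sketched in a few lines of Python. This is only an illustration: find_culprit and the is_broken callback are hypothetical names, and in practice "is_broken" is you enabling that set of plugins and waiting to see whether the server locks up. The sketch assumes exactly one plugin is at fault.

```python
def find_culprit(plugins, is_broken):
    """Half-split a list of plugins down to the one that triggers the fault.

    is_broken(enabled) must return True when the fault appears with exactly
    the plugins in `enabled` turned on. Assumes a single culprit.
    """
    candidates = list(plugins)
    if not is_broken(candidates):
        return None  # fault does not reproduce even with everything enabled
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        # If the fault shows with only this half enabled, the culprit is in it;
        # otherwise it must be in the other half.
        candidates = half if is_broken(half) else candidates[len(half):]
    return candidates[0]


if __name__ == "__main__":
    installed = ["plex", "sabnzbd", "sickbeard", "transmission", "crashplan"]
    # Stand-in for the manual check: pretend sickbeard is the faulty plugin.
    print(find_culprit(installed, lambda enabled: "sickbeard" in enabled))
```

Each round halves the candidate list, so five plugins need at most three rounds of testing instead of five.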

Link to comment

Right, I guess I was wondering how you determined it was due to the apple metadata and sickbeard's consumption of it.  Seems like a bug in sickbeard that needs to be handled if this is the root cause.

Link to comment

It was a while ago, so I'm doing this from memory. Once I figured out it was SickBeard, I launched it manually and, instead of redirecting its output to /dev/null or some other hidden place, I just read what it printed to stdout. It was pretty clear that it was attempting to index some kind of Apple metadata folder and was hanging on it.
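
For anyone who wants to try the same approach, here is a small Python sketch of running a program in the foreground and echoing every line it prints, with a timestamp, so the last line before a hang stays visible. run_and_watch is a hypothetical helper, and the command you pass in is whatever actually starts the program on your system; I am not assuming any particular SickBeard path or options.

```python
import subprocess
import sys
import time


def run_and_watch(command, out=sys.stdout):
    """Run `command` in the foreground, echoing each output line with a timestamp."""
    proc = subprocess.Popen(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so nothing is hidden
        text=True,
        bufsize=1,  # line-buffered on our side of the pipe
    )
    lines = []
    for line in proc.stdout:
        text = line.rstrip()
        print(f"{time.strftime('%H:%M:%S')} {text}", file=out, flush=True)
        lines.append(text)
    return proc.wait(), lines


if __name__ == "__main__":
    # Stand-in command; replace it with the real start command for the program.
    run_and_watch([sys.executable, "-c", "print('hello from the child')"])
```

Keep in mind that many programs buffer their output when it goes to a pipe rather than a terminal, so the last lines before a hang may arrive late or not at all unless the program is told to run unbuffered.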

Link to comment
  • 1 month later...

I feel that this has something to do with a bug in shfs.  The moment I kill shfs, all the other processes (i.e. Plex, Crashplan, Sickbeard) become responsive again.  shfs appears to be holding something up.  I'd be happy to help debug, but I don't know enough about unraid to know where to start.  Does anyone have any ideas?

Link to comment
  • 2 weeks later...

Is it possible to get support for this?  I emailed the support address on the lime-tech site but to no avail.  I am a paying customer and I would be happy to help debug this issue, but it seems there is little to no support.

 

It's at the point where I basically have to stop the array and 'killall shfs' once a day, then start everything again.  The moment I 'killall shfs', everything (crashplan, plex, sabnzbd) becomes responsive again.

 

If I can't resolve this issue with Unraid, I'm going to put in the effort to migrate my NAS to Ubuntu.

Link to comment

You emailed me at 12:00 on a Sunday and I took an hour and a half to reply - I don't think this is "to no avail".

 

Get it into the failing state on Monday and then send an email and we can debug it.

Link to comment

My apologies if that is the case, but it seems I never got your email (even checked the spam folders).  Today's email was not the first time I used the online form.  Prior to that I sent an email in September 2013 and one in November of 2013 to [email protected], neither of which got a reply.  I also have not received an email today.  The only responses I've gotten have been through this forum posting alone.

 

This has continued to be a problem for me across many of the recent unraid releases, and I lack the familiarity with unraid and shfs to be able to debug it alone.  It has been extremely frustrating for me because of the nearly daily need to take down and bring up my array.

Link to comment

First order of business is to put your setup back to stock, without any plugins or modifications.  If it works without issue, then unRAID itself isn't causing the problem.  Also, I'm not sure how you are set up (unRAID Basic or not), but I never had an issue with Plex running on a dedicated cache drive.

 

Kryspy

Link to comment
  • 5 weeks later...

Hi Tom,

 

Once again I seem to have completely lost contact with you over email.  I sent you logs on 1/18 and haven't even heard so much as a confirmation that you received them, despite regularly checking in.  Is there some other route I need to take to get support?

Link to comment
