Zombie/defunct processes, unable to stop them

Kaveh · August 28, 2013

I'm running 5.0rc16c with the latest Plex Media Server (0.9.8.4.125), and every day or so Plex completely stops responding. When I try to stop the service with

/etc/rc.d/rc.plexmediaserver stop

the command hangs indefinitely. When I check the process list, the Plex processes are now defunct. From then on, every process I try to stop becomes a zombie process (whether it's sab, sickbeard, transmission, or java/crashplan). Does anyone know why these processes are all becoming zombies? How do I stop this?

Another thing I noticed: sometimes, when I'm able to kill 'shfs', the defunct processes disappear and things start to behave normally again.

Moreover, I'd also like to figure out what is happening with plex, but none of the plex logs show anything useful before the server stops responding.

Kaveh · September 5, 2013

Does no one else have this issue? I could really use some help here, these zombie processes are causing me to have to hard-reset my server.

Kaveh · September 13, 2013

Bump

xamindar · September 13, 2013

Bad flash card maybe?


Joe L. · September 13, 2013

Kaveh wrote: "Does no one else have this issue? I could really use some help here, these zombie processes are causing me to have to hard-reset my server."

Zombie processes are already dead and other than taking a tiny bit of memory, do no harm.

Unix/Linux lets the parent process read the exit status of a child process. When a child process ends and the parent process still exists but has not yet read the exit status, the child will show as a zombie process.

The issue is with the parent process not doing its "read" of the child process exit status. (That tiny bit of memory I was talking about is the exit status of the child process).

In other words... blame plex media server, not unRAID and don't be afraid of zombies... It is not as if they are hanging, they are already dead, the parent process has just not cared.
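For readers who want to see this mechanism for themselves, here is a minimal sketch in Python. It assumes a Linux system with /proc mounted; the helper names `proc_state` and `demo_zombie` are made up for illustration and have nothing to do with Plex or unRAID. The child exits straight away, the parent deliberately delays its "read" of the exit status, and during that window the child is listed in state Z (zombie).

```python
import os
import time


def proc_state(pid):
    """Return the one-letter state of a process from /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        text = f.read()
    # The state is the first field after the command name, which is in parentheses.
    return text[text.rindex(")") + 1:].split()[0]


def demo_zombie(exit_code=7):
    """Fork a child that exits at once, observe it as a zombie, then reap it."""
    pid = os.fork()
    if pid == 0:
        # Child: exit immediately. The parent has not asked for the status yet.
        os._exit(exit_code)

    # Parent: wait (up to 5 seconds) until the kernel reports the child as "Z".
    deadline = time.monotonic() + 5.0
    state = proc_state(pid)
    while state != "Z" and time.monotonic() < deadline:
        time.sleep(0.01)
        state = proc_state(pid)

    # Reading the exit status is what removes the zombie entry.
    _, status = os.waitpid(pid, 0)
    reaped = not os.path.exists(f"/proc/{pid}")
    return state, os.WEXITSTATUS(status), reaped
```

Calling `demo_zombie()` returns the state seen before the wait, the child's exit code, and whether the process entry disappeared once the parent collected the status.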

Kaveh · September 15, 2013

I've already brought it up in the plex forums. This problem doesn't exist on any other platforms running plex as best as I can tell. Regarding the parent processes, the parent to all the processes I try to kill that eventually turn into zombies is 'init'. The only time I've been able to successfully get rid of the zombie processes is when I manually kill 'shfs'. Doesn't it seem like shfs might not be allowing the parent process time to read the exit status?
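One way to check which processes are affected and who their parent is would be to read /proc directly. The sketch below is only an illustration under some assumptions: a Linux /proc layout, and the hypothetical helper names `parse_stat` and `find_stuck`. It lists processes in state Z (zombie) as well as state D (uninterruptible sleep, usually waiting on I/O). Whether that distinction matters for the hang described in this thread is a guess, not something the thread confirms.

```python
import os


def parse_stat(text):
    """Parse one /proc/<pid>/stat line into (pid, name, state, ppid)."""
    # The command name is wrapped in parentheses and may itself contain spaces
    # or parentheses, so split on the first "(" and the last ")".
    open_paren = text.index("(")
    close_paren = text.rindex(")")
    pid = int(text[:open_paren])
    name = text[open_paren + 1:close_paren]
    fields = text[close_paren + 1:].split()
    return pid, name, fields[0], int(fields[1])


def find_stuck(proc_root="/proc"):
    """Return (pid, name, state, ppid) for every process in state Z or D."""
    if not os.path.isdir(proc_root):
        return []
    stuck = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                info = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # the process went away while we were scanning
        if info[2] in ("Z", "D"):
            stuck.append(info)
    return stuck


if __name__ == "__main__":
    for pid, name, state, ppid in find_stuck():
        print(f"{state} pid={pid} ppid={ppid} {name}")
```

A parent PID of 1 in the output corresponds to 'init'.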

Kaveh · November 8, 2013

To the top...

I really need some help. If I can't figure this out, I might just invest a weekend and move my NAS over to Ubuntu (though I'd really prefer to stick with unRAID).

Why is it when I kill 'shfs' everything starts responding again?

WeeboTech · November 8, 2013

Have you upgraded to the actual 5.0 release (not the RC)?

Kaveh · November 9, 2013

Yes, I'm on the 5.0 release now. I still see this issue and it's been extremely frustrating. After 'killall shfs', I effectively have to shut down my array and reboot the server almost daily to get it to come back up.

dgaschk · November 13, 2013

Why is this a problem?

Kaveh · November 15, 2013

dgaschk wrote: "Why is this a problem?"

I don't understand. What do you mean by this question? Rebooting a server nearly daily is not problematic?

ClunkClunk · November 15, 2013

I had similar behavior for a little while a few months ago. I kept focusing on Plex because it wasn't responding.

I finally went back to 100% stock, and *only* added the Plex plugin back. Everything worked OK. Slowly added plugins back until I found it was SickBeard that was choking on a .AppleDouble in one of its directories. Removed it and things have been solid since then.

Not sure if it directly applies, but certainly removing all plugins/non-stock addons, and just running Plex is a good place to start troubleshooting. Reduce variables wherever you can.

Kaveh · November 16, 2013

Wow...interesting...I'll try removing all instances of those and see if my problem goes away. How did you figure it was sickbeard choking?

ClunkClunk · November 16, 2013

Kaveh wrote: "How did you figure it was sickbeard choking?"

Just a standard troubleshooting technique - half splitting.

Once I had established that some plugin was causing the issue by disabling all plugins, I added half of them back in and checked whether the issue returned. If it did, I knew the culprit was within the group that was turned on; if it didn't, it was in the other half. Either way I split the suspect group in half again and kept going until I had isolated it to one specific plugin. From there, I looked through sickbeard's logs carefully.

Half split is useful for single variable troubleshooting - and it's fast, but it's a bit less accurate if you have multiple variables causing your issue.
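The half-split procedure can be sketched in a few lines of Python. This is only an illustration: `half_split` and the `fails_with` callback are hypothetical names, the callback stands in for "enable only these plugins and see whether the hang comes back", and the sketch assumes exactly one plugin is at fault (the single-variable case mentioned above).

```python
def half_split(plugins, fails_with):
    """Find the one plugin that triggers a failure by repeated halving.

    fails_with(enabled) must return True when the problem reproduces with
    only the plugins in `enabled` turned on. Assumes a single culprit.
    """
    candidates = list(plugins)
    if not candidates:
        raise ValueError("need at least one plugin to test")
    while len(candidates) > 1:
        mid = len(candidates) // 2
        first_half = candidates[:mid]
        if fails_with(first_half):
            # The culprit is among the plugins that were just enabled.
            candidates = first_half
        else:
            # Otherwise it must be in the half that was left disabled.
            candidates = candidates[mid:]
    return candidates[0]


if __name__ == "__main__":
    installed = ["plex", "sabnzbd", "sickbeard", "transmission", "crashplan"]
    # Stand-in for a real test run: pretend sickbeard is the bad plugin.
    print(half_split(installed, lambda enabled: "sickbeard" in enabled))
```

With n plugins this needs roughly log2(n) test rounds instead of n, which is why it is fast. If two plugins only fail in combination, the halving can discard one of them and point at the wrong result.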

Kaveh · November 16, 2013

Right, I guess I was wondering how you determined it was due to the apple metadata and sickbeard's consumption of it. Seems like a bug in sickbeard that needs to be handled if this is the root cause.

ClunkClunk · November 16, 2013

It was a while ago, so I'm doing this from memory. Once I figured out it was SickBeard, I just launched it manually and instead of redirecting it to /dev/null or some kind of hidden output, I launched it and just read it in stdout. It was pretty clear that it was attempting to index some kind of Apple metadata folder and it was hanging up on it.
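As a rough sketch of the same idea, a program can be run with its output captured rather than thrown away, and with a timeout so that a hang is reported instead of blocking forever. The function name `run_in_foreground` is hypothetical, and the command shown in the usage note is a stand-in, not the actual SickBeard launch command (which the thread does not give).

```python
import subprocess
import sys


def run_in_foreground(cmd, timeout_s):
    """Run cmd and return (returncode, output, timed_out).

    stdout and stderr are merged so nothing is lost to /dev/null. If the
    program does not finish within timeout_s seconds, it is killed and
    whatever it printed so far is returned with timed_out=True.
    """
    try:
        done = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
            timeout=timeout_s,
        )
        return done.returncode, done.stdout, False
    except subprocess.TimeoutExpired as exc:
        partial = exc.stdout or ""
        if isinstance(partial, bytes):
            # Partial output may arrive as bytes even in text mode.
            partial = partial.decode(errors="replace")
        return None, partial, True
```

For example, `run_in_foreground([sys.executable, "-c", "print('hello')"], 30)` runs a trivial Python command; the last lines of captured output before a timeout are the place to look for whatever the program was doing when it stalled.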

Kaveh · December 30, 2013

I feel that this has something to do with a bug in shfs. The moment I kill shfs, all the other processes (i.e. Plex, Crashplan, Sickbeard) become responsive again. shfs appears to be holding something up. I'd be happy to help debug, but I don't know enough about unraid to know where to start. Does anyone have any ideas?

Kaveh · January 12, 2014

Is it possible to get support for this? I emailed the support address on the lime-tech site but to no avail. I am a paying customer and I would be happy to help debug this issue, but it seems there is little to no support.

It's at the point where I basically have to stop the array and 'killall shfs' once a day, then start everything again. The moment I 'killall shfs', everything (crashplan, plex, sabnzbd) becomes responsive again.

If I can't resolve this issue with Unraid, I'm going to put in the effort to migrate my NAS to Ubuntu.

limetech · January 12, 2014

Kaveh wrote: "Is it possible to get support for this? I emailed the support address on the lime-tech site but to no avail. I am a paying customer and I would be happy to help debug this issue, but it seems there is little to no support."

You emailed me at 12:00 on a Sunday and I took an hour and a half to reply - I don't think this is "to no avail".

Get it into the failing state on Monday and then send an email and we can debug it.

Kaveh · January 13, 2014

My apologies if that is the case, but it seems I never got your email (even checked the spam folders). Today's email was not the first time I used the online form. Prior to that I sent an email in September 2013 and one in November of 2013 to [email protected], neither of which got a reply. I also have not received an email today. The only responses I've gotten have been through this forum posting alone.

This has continued to be a problem for me across many of the recent unraid releases, and I lack the familiarity with unraid and shfs to be able to debug it alone. It has been extremely frustrating for me because of the nearly daily need to take down and bring up my array.

Kryspy · January 13, 2014

First order of business is to put your setup back to stock without any plugins or modifications. If it works without issue, then it isn't a problem caused by unRAID. Also, I'm not sure how you are set up (unRAID Basic or not), but I never had an issue with Plex running on a dedicated cache drive.

Kryspy

Kaveh · January 13, 2014

I have unraid running plex on a cache drive. I'd ideally like to get it into a bad state again and with Tom's help see if the root cause can be found.

Kaveh · February 12, 2014

Hi Tom,

Once again I seem to have completely lost contact with you over email. I sent you logs on 1/18 and haven't even heard so much as a confirmation that you received them, despite regularly checking in. Is there some other route I need to take to get support?

Kaveh · February 19, 2014

Yup....no support or response whatsoever. What am I supposed to do?
