owan Posted June 24, 2019

I've been running Unraid for a few years now, but within the last 6 months or so it has gone from incredibly reliable to an unmitigated mess. It took a real turn when I attempted to diagnose a Plex issue (excessive CPU usage) and realized that my OS version was really old (around 6.3). Recently, almost every time I try to go into Plex on my devices the server is down, which took me an embarrassingly long time to realize was caused by my user shares disappearing (taking Docker down with them). Reboots get everything going again for a while, but it's only a matter of time until there's an issue. A brief look at the logs seems to indicate some kind of memory problem causing processes to terminate and take everything down. I suspect the root cause may still be the original Plex issue, since I've had periodic instances of persistent high CPU usage that may be related to the Plex media scanner, but I'm not sure. I've attached the logs from today's incident. The PC is entirely home built with a quad-core Haswell i5 and 16 GB of RAM, and all hardware appears to be functioning normally. I have very, very little in the way of plugins or anything else installed that would complicate things.

server-diagnostics-20190624-1834.zip
testdasi Posted June 24, 2019

You have a lot of these:

Jun 24 14:01:13 Server shfs: error: shfs_readdir, 1301: Cannot allocate memory (12): filler: F464A7E26B25ED562EAF3576C2FB0B702F2D1255 (2019_05_26 18_07_02 UTC)

Looks like a memory problem. You might want to reseat your RAM sticks and then run memtest.
Frank1940 Posted June 24, 2019

Another possibility is that one of your Dockers has a configuration mistake in mapping the location of its files, and the misconfiguration has the Docker writing its files to the RAM disk that Unraid uses for storing the OS. If you have any configuration settings that point anywhere that doesn't begin with /mnt/, those files are not being written to a physical drive but to the RAM disk!
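One way to sanity-check the kind of mapping mistake described above is to list every host path a container mounts and flag anything outside /mnt/. This is only a sketch: the `flag_non_mnt` helper is hypothetical, and the `docker inspect` usage in the comment assumes the standard Docker CLI is available on the server.

```shell
# Hypothetical helper: read host paths on stdin and flag any that do not
# start with /mnt/ (i.e. paths that would land on Unraid's RAM disk).
flag_non_mnt() {
  while IFS= read -r path; do
    case "$path" in
      /mnt/*) ;;                          # backed by a real array/cache drive
      *) printf 'suspect: %s\n' "$path" ;;
    esac
  done
}

# On a live server you might feed it Docker's bind-mount sources, e.g.:
#   docker ps -q | xargs docker inspect \
#     --format '{{range .Mounts}}{{.Source}}{{"\n"}}{{end}}' | flag_non_mnt
# Illustrative input with one good and one bad mapping:
out=$(printf '/mnt/user/appdata/plex\n/config\n' | flag_non_mnt)
echo "$out"
```

With the sample input above, only the path not under /mnt/ is reported.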
owan Posted June 24, 2019

1 hour ago, Frank1940 said: Another possibility is one of your Dockers has configuration mistake in mapping the location of its files and the mis-configuration has the Docker writing its files to the RAM disk that Unraid uses for storing the OS. If you have any configuration settings that point to anywhere that don't begin /mnt/ those files are not being written to a physical drive but to the RAM disk!

All docker configurations point to /mnt/ locations (docker img/appdata plus the docker configs). Plex is the only one at the moment, so it's fairly easy to check, unless I'm not looking at the right thing.

1 hour ago, testdasi said: You have a lot of these:

Yup, hence why I said it looks like a memory issue. I'm not sure whether I should be interpreting this as a hardware failure or a software issue like a memory leak, but the highly predictable and regular nature of it hints at the latter rather than the former.
testdasi Posted June 24, 2019

2 minutes ago, owan said: Yup, hence why I said it looks like a memory issue. Not sure if I should be interpreting this as a hardware failure or a software issue like a memory leak, but the highly predictable and regular nature of it hints at the latter rather than the former.

A memory leak due to a docker, at least in the cases I have seen, causes Linux to panic and kill some processes to free up more memory. Your shfs errors suggest shfs itself can't allocate memory, which is a different problem, I think.
owan Posted June 24, 2019

11 minutes ago, testdasi said: A memory leak due to a docker, at least in the cases I have seen, causes Linux to panic and kill some processes to free up more memory. Your shfs errors suggest shfs can't allocate memory which is a different problem I think.

I've come back to the server many times where the logs ended with a process being killed. Unfortunately, I don't have any of those saved. I also had some plugins installed (now removed in the hope of narrowing things down) where I'd get an e-mail reporting a segmentation fault that I assume was happening when the shares disappeared. I'll be running memtest soon to see if there's any indication of a hardware problem.
owan Posted June 25, 2019

Memtest completed 4 passes with 0 errors. With something occurring this frequently, it probably would have shown up within a couple of passes, if not sooner. I'm going to boot back into Unraid and leave it running with Plex off to see if it screws up. It's been an almost daily occurrence for weeks now, so I expect it'll manifest within the next 48 hours if it isn't related to Plex. In the meantime, any thoughts on potential sources of this issue are much appreciated!
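While waiting for the failure to recur, logging memory over time would show whether something is leaking steadily before the shares drop. A minimal watch-script sketch, run periodically (e.g. via cron or the User Scripts plugin) — the log path is illustrative, and on a real Unraid box you would likely point it at the flash drive instead of /tmp:

```shell
# Append one timestamped line per run: available memory from /proc/meminfo
# plus the resident size of shfs, if it is running.
LOG=/tmp/memwatch.log    # illustrative; e.g. /boot/logs/memwatch.log on Unraid
{
  printf '%s ' "$(date '+%Y-%m-%d %H:%M:%S')"
  awk '/^MemAvailable:/ {printf "avail_kb=%s ", $2}' /proc/meminfo
  # shfs RSS in kB; prints nothing if shfs is not running (as on a test box)
  ps -o rss= -C shfs 2>/dev/null | awk '{s += $1} END {if (s) printf "shfs_rss_kb=%d", s}'
  printf '\n'
} >> "$LOG"
tail -n 1 "$LOG"
```

A steadily climbing shfs_rss_kb between reboots would point at the leak long before the OOM killer fires.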
owan Posted June 26, 2019

Well, still having the same issue. I forgot to turn off Plex the other day, so I still haven't quite isolated that part yet; this is just a new log to look at, I guess. Here's an excerpt from syslog.last200.txt, which doesn't seem to be captured at the bottom of the normal syslog.txt, showing the process being killed:

Jun 26 09:29:53 Server kernel: [ 21274] 99 21274 11385 2605 122880 0 0 smbd
Jun 26 09:29:53 Server kernel: [ 21388] 0 21388 28442 3366 188416 0 0 php-fpm
Jun 26 09:29:53 Server kernel: [ 21389] 0 21389 28442 3366 188416 0 0 php-fpm
Jun 26 09:29:53 Server kernel: [ 21434] 99 21434 11299 2568 122880 0 0 smbd
Jun 26 09:29:53 Server kernel: [ 21593] 99 21593 11299 2561 122880 0 0 smbd
Jun 26 09:29:53 Server kernel: [ 21775] 99 21775 11385 2606 122880 0 0 smbd
Jun 26 09:29:53 Server kernel: [ 21928] 99 21928 11299 2545 122880 0 0 smbd
Jun 26 09:29:53 Server kernel: [ 22029] 99 22029 11385 2623 122880 0 0 smbd
Jun 26 09:29:53 Server kernel: [ 22124] 99 22124 11299 2582 122880 0 0 smbd
Jun 26 09:29:53 Server kernel: [ 22126] 0 22126 28426 3211 188416 0 0 php-fpm
Jun 26 09:29:53 Server kernel: Out of memory: Kill process 3523 (shfs) score 901 or sacrifice child
Jun 26 09:29:53 Server kernel: Killed process 3523 (shfs) total-vm:24646404kB, anon-rss:14745884kB, file-rss:4kB, shmem-rss:1188kB
Jun 26 09:29:54 Server kernel: oom_reaper: reaped process 3523 (shfs), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
Jun 26 17:00:50 Server emhttpd: req (2): csrf_token=****************&title=System+Log&cmd=%2FwebGui%2Fscripts%2Ftail_log&arg1=syslog
Jun 26 17:00:50 Server emhttpd: cmd: /usr/local/emhttp/plugins/dynamix/scripts/tail_log syslog

server-diagnostics-20190626-2101.zip
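The "Killed process" line in that excerpt already tells the story: shfs was the OOM victim, and its memory figures are enormous for a 16 GB box. A quick sketch to pull those kB fields out of a saved syslog line and print them in GiB (the sample line is copied from the excerpt; the awk is generic, not an Unraid tool):

```shell
# Extract total-vm and anon-rss from an OOM "Killed process" line and
# convert the kB values to GiB (1 GiB = 1048576 kB).
line='Jun 26 09:29:53 Server kernel: Killed process 3523 (shfs) total-vm:24646404kB, anon-rss:14745884kB, file-rss:4kB, shmem-rss:1188kB'
result=$(echo "$line" | awk '{
  for (i = 1; i <= NF; i++)
    if ($i ~ /^(total-vm|anon-rss):/) {
      split($i, kv, ":")               # kv[1]=name, kv[2]="24646404kB,"
      printf "%s = %.1f GiB\n", kv[1], kv[2] / 1048576
    }
}')
echo "$result"
```

That works out to roughly 23.5 GiB of virtual memory and 14.1 GiB resident for shfs alone, which lines up with the earlier shfs_readdir "Cannot allocate memory" errors: something is driving shfs to consume nearly all of the 16 GB of RAM.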
owan Posted July 2, 2019

Turned off the Plex docker and no issues for the last 5 days, so it's definitely a docker/Plex issue. Guess I need to look deeper at what's going on there.
owan Posted October 19, 2019 (edited)

Even with Plex off, I'm still having an issue where one core ends up pegged constantly after several days. With Plex on, the server dies in 24-36 hours; with it off, I still eventually have issues that cause problems when accessing remote files over SMB. I'm frustrated as hell by nigh on a year of this stuff. I still have no idea why my otherwise rock-solid Unraid server decided it just didn't want to live anymore. Nobody on the Plex forums had any idea what was happening, but it seems like ultimately it may not be isolated strictly to Plex.

server-diagnostics-20191019-0657.zip

Edited October 19, 2019 by owan