owan Posted June 24, 2019

I've been running Unraid for a few years now, but within the last 6 months or so it has gone from incredibly reliable to an unmitigated mess. It took a real turn when I attempted to diagnose a Plex issue (excessive CPU usage) and realized that my OS version was really old (around 6.3). Recently, almost every time I try to go into Plex on my devices the server is down, which took me an embarrassingly long time to realize was caused by my user shares disappearing (taking Docker down with them). Reboots get everything going again for a while, but it's only a matter of time until there's an issue. A brief look at the logs seems to indicate some kind of memory problem causing processes to terminate and take everything down. I suspect the root cause may still be the original Plex issue, since I've had periodic instances of persistent high CPU usage that may be related to the Plex media scanner, but I'm not sure. I've attached the logs from today's incident. The PC is entirely home built with a quad-core Haswell i5 and 16 GB of RAM, and all hardware appears to be functioning normally. I have very, very little in the way of plugins or anything else installed that would complicate things.

server-diagnostics-20190624-1834.zip
testdasi Posted June 24, 2019

You have a lot of these:

Jun 24 14:01:13 Server shfs: error: shfs_readdir, 1301: Cannot allocate memory (12): filler: F464A7E26B25ED562EAF3576C2FB0B702F2D1255 (2019_05_26 18_07_02 UTC)

Looks like a memory problem. You might want to reseat your RAM sticks and then run memtest.
Frank1940 Posted June 24, 2019

Another possibility is that one of your Dockers has a configuration mistake in mapping the location of its files, and the misconfiguration has the Docker writing its files to the RAM disk that Unraid uses for storing the OS. If you have any configuration settings that point anywhere that doesn't begin with /mnt/, those files are not being written to a physical drive but to the RAM disk!
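One way to sanity-check the kind of mapping mistake described above is to list every host path a container mounts and flag anything outside /mnt/. This is only a sketch: the `flag_non_mnt` helper is hypothetical, and the `docker inspect` usage in the comment assumes the standard Docker CLI is available on the server.

```shell
# Hypothetical helper: read host paths on stdin and flag any that do not
# start with /mnt/ (i.e. paths that would land on Unraid's RAM disk).
flag_non_mnt() {
  while IFS= read -r path; do
    case "$path" in
      /mnt/*) ;;                          # backed by a real array/cache drive
      *) printf 'suspect: %s\n' "$path" ;;
    esac
  done
}

# On a live server you might feed it Docker's bind-mount sources, e.g.:
#   docker ps -q | xargs docker inspect \
#     --format '{{range .Mounts}}{{.Source}}{{"\n"}}{{end}}' | flag_non_mnt
# Illustrative input with one good and one bad mapping:
out=$(printf '/mnt/user/appdata/plex\n/config\n' | flag_non_mnt)
echo "$out"
```

With the sample input above, only the path not under /mnt/ is reported.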
owan Posted June 24, 2019

1 hour ago, Frank1940 said: Another possibility is one of your Dockers has configuration mistake in mapping the location of its files and the mis-configuration has the Docker writing its files to the RAM disk that Unraid uses for storing the OS. If you have any configuration settings that point to anywhere that don't begin /mnt/ those files are not being written to a physical drive but to the RAM disk!

All docker configurations point to /mnt/ locations (docker img/appdata plus the docker configs). Plex is the only one at the moment, so it's fairly easy to check, unless I'm not looking at the right thing.

1 hour ago, testdasi said: You have a lot of these:

Yup, hence why I said it looks like a memory issue. I'm not sure whether I should be interpreting this as a hardware failure or a software issue like a memory leak, but the highly predictable and regular nature of it hints at the latter rather than the former.
testdasi Posted June 24, 2019

2 minutes ago, owan said: Yup, hence why I said it looks like a memory issue. Not sure if I should be interpreting this as a hardware failure or a software issue like a memory leak, but the highly predictable and regular nature of it hints at the latter rather than the former.

A memory leak due to a docker, at least in the cases I have seen, causes Linux to panic and kill some processes to free up more memory. Your shfs errors suggest shfs itself can't allocate memory, which is a different problem, I think.
owan Posted June 24, 2019

11 minutes ago, testdasi said: A memory leak due to a docker, at least in the cases I have seen, causes Linux to panic and kill some processes to free up more memory. Your shfs errors suggest shfs can't allocate memory which is a different problem I think.

I've come back to the server many times where the logs ended with a process being killed. Unfortunately, I don't have any of those saved. I also had some plugins installed (now removed in the hope of narrowing things down) where I'd get an e-mail reporting a segmentation fault that I assume was happening when the shares disappeared. I'll be running memtest soon to see if there's any indication of a hardware problem.
owan Posted June 25, 2019

Memtest completed 4 passes with 0 errors. With something occurring this frequently, it probably would have shown up within a couple of passes, if not sooner. I'm going to boot back into Unraid and leave it running with Plex off to see if it screws up. It's been an almost daily occurrence for weeks now, so I expect it'll manifest within the next 48 hours if it isn't related to Plex. In the meantime, any thoughts on potential sources of this issue are much appreciated!
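While waiting for the failure to recur, logging memory over time would show whether something is leaking steadily before the shares drop. A minimal watch-script sketch, run periodically (e.g. via cron or the User Scripts plugin) — the log path is illustrative, and on a real Unraid box you would likely point it at the flash drive instead of /tmp:

```shell
# Append one timestamped line per run: available memory from /proc/meminfo
# plus the resident size of shfs, if it is running.
LOG=/tmp/memwatch.log    # illustrative; e.g. /boot/logs/memwatch.log on Unraid
{
  printf '%s ' "$(date '+%Y-%m-%d %H:%M:%S')"
  awk '/^MemAvailable:/ {printf "avail_kb=%s ", $2}' /proc/meminfo
  # shfs RSS in kB; prints nothing if shfs is not running (as on a test box)
  ps -o rss= -C shfs 2>/dev/null | awk '{s += $1} END {if (s) printf "shfs_rss_kb=%d", s}'
  printf '\n'
} >> "$LOG"
tail -n 1 "$LOG"
```

A steadily climbing shfs_rss_kb between reboots would point at the leak long before the OOM killer fires.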
owan Posted June 26, 2019

Well, still having the same issue. I forgot to turn off Plex the other day, so I still haven't quite isolated that part yet; this is just a new log to look at, I guess. Here's an excerpt from syslog.last200.txt, which doesn't seem to be captured at the bottom of the normal syslog.txt, showing the process being killed:

Jun 26 09:29:53 Server kernel: [ 21274] 99 21274 11385 2605 122880 0 0 smbd
Jun 26 09:29:53 Server kernel: [ 21388] 0 21388 28442 3366 188416 0 0 php-fpm
Jun 26 09:29:53 Server kernel: [ 21389] 0 21389 28442 3366 188416 0 0 php-fpm
Jun 26 09:29:53 Server kernel: [ 21434] 99 21434 11299 2568 122880 0 0 smbd
Jun 26 09:29:53 Server kernel: [ 21593] 99 21593 11299 2561 122880 0 0 smbd
Jun 26 09:29:53 Server kernel: [ 21775] 99 21775 11385 2606 122880 0 0 smbd
Jun 26 09:29:53 Server kernel: [ 21928] 99 21928 11299 2545 122880 0 0 smbd
Jun 26 09:29:53 Server kernel: [ 22029] 99 22029 11385 2623 122880 0 0 smbd
Jun 26 09:29:53 Server kernel: [ 22124] 99 22124 11299 2582 122880 0 0 smbd
Jun 26 09:29:53 Server kernel: [ 22126] 0 22126 28426 3211 188416 0 0 php-fpm
Jun 26 09:29:53 Server kernel: Out of memory: Kill process 3523 (shfs) score 901 or sacrifice child
Jun 26 09:29:53 Server kernel: Killed process 3523 (shfs) total-vm:24646404kB, anon-rss:14745884kB, file-rss:4kB, shmem-rss:1188kB
Jun 26 09:29:54 Server kernel: oom_reaper: reaped process 3523 (shfs), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
Jun 26 17:00:50 Server emhttpd: req (2): csrf_token=****************&title=System+Log&cmd=%2FwebGui%2Fscripts%2Ftail_log&arg1=syslog
Jun 26 17:00:50 Server emhttpd: cmd: /usr/local/emhttp/plugins/dynamix/scripts/tail_log syslog

server-diagnostics-20190626-2101.zip
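The "Killed process" line in that excerpt already tells the story: shfs was the OOM victim, and its memory figures are enormous for a 16 GB box. A quick sketch to pull those kB fields out of a saved syslog line and print them in GiB (the sample line is copied from the excerpt; the awk is generic, not an Unraid tool):

```shell
# Extract total-vm and anon-rss from an OOM "Killed process" line and
# convert the kB values to GiB (1 GiB = 1048576 kB).
line='Jun 26 09:29:53 Server kernel: Killed process 3523 (shfs) total-vm:24646404kB, anon-rss:14745884kB, file-rss:4kB, shmem-rss:1188kB'
result=$(echo "$line" | awk '{
  for (i = 1; i <= NF; i++)
    if ($i ~ /^(total-vm|anon-rss):/) {
      split($i, kv, ":")               # kv[1]=name, kv[2]="24646404kB,"
      printf "%s = %.1f GiB\n", kv[1], kv[2] / 1048576
    }
}')
echo "$result"
```

That works out to roughly 23.5 GiB of virtual memory and 14.1 GiB resident for shfs alone, which lines up with the earlier shfs_readdir "Cannot allocate memory" errors: something is driving shfs to consume nearly all of the 16 GB of RAM.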
owan Posted July 2, 2019

Turned off the Plex docker and no issues for the last 5 days, so it's definitely a docker/Plex issue. Guess I need to look deeper at what's going on there.
owan Posted October 19, 2019 (edited)

Even with Plex off, I'm still having an issue where one core ends up pegged constantly after several days. With Plex on, the server dies in 24-36 hours; with it off, I still eventually have issues that cause problems when accessing remote files over SMB. I'm frustrated as hell by nigh on a year of this stuff. I still have no idea why my otherwise rock-solid Unraid server decided it just didn't want to live anymore. Nobody on the Plex forums had any idea what was happening, but it seems like ultimately it may not be isolated strictly to Plex.

server-diagnostics-20191019-0657.zip

Edited October 19, 2019 by owan