goinsnoopin Posted September 30, 2015
I have been plagued with WebGUI crashes since upgrading to the 6.x version of unRAID. I have multiple threads on this issue, and I had marked my most recent thread as "Solved"; however, after 13 days of uptime I just had another crash.
As mentioned in my old thread, I made two changes at the same time: 1. upgraded the CPU, and 2. changed the Plex Server library to update daily (it was previously set to hourly). Once I had 8 days of uptime, I marked the thread as solved. Since I had made two changes at once, I decided to go back and set the Plex Server library update to hourly again. Needless to say, my server did not make it to 14 days of uptime. The failure was the same: CPU usage in htop went to 100%. With a Q9300 CPU now installed, and stock unRAID with no dockers or plugins, it doesn't make any sense that a Q9300 should go to 100%.
I started this thread to see if it would spark any discussion on why this Plex scanning would cause hundreds of smbd processes to spawn. Please note that Plex Server is running on a Windows 7 Professional x64 PC and connecting via Samba. To me it seems like the crash occurs just after the mover script runs.
I am going to attach my diagnostics, which have historically shown nothing, and a zip file containing the output of smbstatus with all the locked files, which must be getting locked as part of the Plex scan. Here is a link to my old thread: http://lime-technology.com/forum/index.php?topic=42900.msg409193#msg409193
In the meantime, I have changed the Plex scanning back to daily. I have also upgraded to unRAID 6.1.3 as part of this manual reboot.
Any suggestions would be greatly appreciated!
Dan
Attachments: tower-diagnostics-20150930-0808.zip, Unraid_putty_session_output.zip
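One way to catch the smbd build-up described above before the server locks up is to log the process count periodically. The following is only a rough sketch: it assumes Python 3 is available on the server (stock unRAID may not ship it), and the helper name `count_processes` is made up for illustration. It relies on the standard Linux `/proc/<pid>/comm` files.

```python
import os

def count_processes(name, proc_root="/proc"):
    """Count processes whose command name equals `name` by reading <proc_root>/<pid>/comm."""
    count = 0
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directory names are process IDs
        try:
            with open(os.path.join(proc_root, entry, "comm")) as fh:
                if fh.read().strip() == name:
                    count += 1
        except OSError:
            continue  # the process exited (or is unreadable) while we were scanning
    return count

if __name__ == "__main__":
    if os.path.isdir("/proc"):
        print("smbd processes:", count_processes("smbd"))
```

Run from cron or a loop every few minutes, a sudden jump in the count would give a timestamp to compare against the mover schedule and the Plex scan times.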
trurl Posted October 1, 2015
You could run Plex Media Server on unRAID instead of a separate PC.
goinsnoopin (Author) Posted October 1, 2015
Trurl,
You are correct. The reason I didn't was that I only had a Celeron E1200 CPU, which could not handle the transcoding. I threw the Q9300 in my unRAID box to see if that would address the 100% CPU/WebGUI crashing issue. My Windows 7 PC has an i7 4790K CPU, so it has more than enough power to handle the Plex transcoding. I guess I could try Plex on unRAID now that I have a Q9300, but I am now running a CPU that uses much more power than my E1200. It just seems strange to me that this setup worked just fine with unRAID 5.
Dan
MyKroFt Posted October 1, 2015
I would run memtest overnight. After upgrading to a faster CPU, your current RAM might not be able to handle it...
Myk
Squid Posted October 1, 2015
I wouldn't say that the log shows nothing... It's literally half filled with this:
Sep 21 21:32:16 Tower kernel: swapper/1: page allocation failure: order:0, mode:0x20
Sep 21 21:32:16 Tower kernel: CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.1.5-unRAID #5
Sep 21 21:32:16 Tower kernel: Hardware name: Supermicro C2SEA/C2SEA, BIOS 1.1a 05/16/2012
Sep 21 21:32:16 Tower kernel: 0000000000000020 ffff880079883bd8 ffffffff816105e3 0000000000008e5c
Sep 21 21:32:16 Tower kernel: 0000000000000000 ffff880079883c68 ffffffff810b54f4 ffffffff00000000
Sep 21 21:32:16 Tower kernel: ffffffff815771c0 ffff88004eada200 ffff88004eada200 0000000000000000
Sep 21 21:32:16 Tower kernel: Call Trace:
Sep 21 21:32:16 Tower kernel: <IRQ> [<ffffffff816105e3>] dump_stack+0x4c/0x6e
Sep 21 21:32:16 Tower kernel: [<ffffffff810b54f4>] warn_alloc_failed+0x102/0x116
Sep 21 21:32:16 Tower kernel: [<ffffffff815771c0>] ? inet_del_offload+0x3e/0x3e
Sep 21 21:32:16 Tower kernel: [<ffffffff810b8183>] __alloc_pages_nodemask+0x66c/0x7b6
Sep 21 21:32:16 Tower kernel: [<ffffffff815217d2>] __alloc_page_frag+0xa4/0x119
Sep 21 21:32:16 Tower kernel: [<ffffffff8152677d>] __alloc_rx_skb+0x4c/0xe1
Sep 21 21:32:16 Tower kernel: [<ffffffff81526861>] __napi_alloc_skb+0x1b/0x3c
Sep 21 21:32:16 Tower kernel: [<ffffffffa006474b>] rtl8169_poll+0x23d/0x4ba [r8169]
Sep 21 21:32:16 Tower kernel: [<ffffffff815309c6>] net_rx_action+0xe0/0x230
Sep 21 21:32:16 Tower kernel: [<ffffffff8104a549>] __do_softirq+0xc9/0x1be
Sep 21 21:32:16 Tower kernel: [<ffffffff8104a7cf>] irq_exit+0x3d/0x82
Sep 21 21:32:16 Tower kernel: [<ffffffff8100ca36>] do_IRQ+0xb3/0xcd
Sep 21 21:32:16 Tower kernel: [<ffffffff8161676e>] common_interrupt+0x6e/0x6e
Sep 21 21:32:16 Tower kernel: <EOI> [<ffffffff8101352c>] ? mwait_idle+0x83/0x9f
Sep 21 21:32:16 Tower kernel: [<ffffffff8101352c>] ? mwait_idle+0x83/0x9f
Sep 21 21:32:16 Tower kernel: [<ffffffff81013cd2>] arch_cpu_idle+0xa/0xc
Sep 21 21:32:16 Tower kernel: [<ffffffff810728b4>] cpu_startup_entry+0x27e/0x2b2
Sep 21 21:32:16 Tower kernel: [<ffffffff810324f4>] start_secondary+0x10e/0x12c
Sep 21 21:32:16 Tower kernel: Mem-Info:
Sep 21 21:32:16 Tower kernel: active_anon:95715 inactive_anon:6273 isolated_anon:0
Sep 21 21:32:16 Tower kernel: active_file:170261 inactive_file:176939 isolated_file:0
Sep 21 21:32:16 Tower kernel: unevictable:2243 dirty:27236 writeback:19654 unstable:0
Sep 21 21:32:16 Tower kernel: slab_reclaimable:18967 slab_unreclaimable:10220
Sep 21 21:32:16 Tower kernel: mapped:11431 shmem:90502 pagetables:1167 bounce:0
Sep 21 21:32:16 Tower kernel: free:2465 free_pcp:194 free_cma:0
Sep 21 21:32:16 Tower kernel: Node 0 DMA free:7828kB min:40kB low:48kB high:60kB active_anon:2524kB inactive_anon:208kB active_file:264kB inactive_file:1472kB unevictable:248kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15908kB mlocked:248kB dirty:444kB writeback:524kB mapped:548kB shmem:2580kB slab_reclaimable:1396kB slab_unreclaimable:508kB kernel_stack:32kB pagetables:140kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:128 all_unreclaimable? no
Sep 21 21:32:16 Tower kernel: lowmem_reserve[]: 0 1954 1954 1954
Sep 21 21:32:16 Tower kernel: Node 0 DMA32 free:2032kB min:5440kB low:6800kB high:8160kB active_anon:380336kB inactive_anon:24884kB active_file:680780kB inactive_file:706284kB unevictable:8724kB isolated(anon):0kB isolated(file):0kB present:2045504kB managed:2002412kB mlocked:8724kB dirty:108500kB writeback:78092kB mapped:45176kB shmem:359428kB slab_reclaimable:74472kB slab_unreclaimable:40372kB kernel_stack:2944kB pagetables:4528kB unstable:0kB bounce:0kB free_pcp:776kB local_pcp:240kB free_cma:0kB writeback_tmp:0kB pages_scanned:40 all_unreclaimable? no
Sep 21 21:32:16 Tower kernel: lowmem_reserve[]: 0 0 0 0
Sep 21 21:32:16 Tower kernel: Node 0 DMA: 10*4kB (UE) 5*8kB (UEM) 16*16kB (UEM) 26*32kB (UEM) 22*64kB (UEM) 9*128kB (M) 0*256kB 0*512kB 0*1024kB 0*2048kB 1*4096kB (R) = 7824kB
Sep 21 21:32:16 Tower kernel: Node 0 DMA32: 33*4kB (UM) 156*8kB (UEM) 9*16kB (UEMR) 1*32kB (R) 0*64kB 1*128kB (R) 1*256kB (R) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1940kB
Sep 21 21:32:16 Tower kernel: 437738 total pagecache pages
Sep 21 21:32:16 Tower kernel: 0 pages in swap cache
Sep 21 21:32:16 Tower kernel: Swap cache stats: add 0, delete 0, find 0/0
Sep 21 21:32:16 Tower kernel: Free swap = 0kB
Sep 21 21:32:16 Tower kernel: Total swap = 0kB
Sep 21 21:32:16 Tower kernel: 515374 pages RAM
Sep 21 21:32:16 Tower kernel: 0 pages HighMem/MovableOnly
Sep 21 21:32:16 Tower kernel: 10794 pages reserved
I did find a reference through a Google search that talks about enabled jumbo frames and fragmented memory causing this (which, since you only have 2GB, is a distinct possibility): http://marc.info/?l=linux-kernel&m=123138990008095&w=3
Another cause was heavy network I/O (which Plex scanning probably qualifies as).
My best guess (if you're not using jumbo frames) would be to upgrade the memory from 2GB to 4GB.
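The Mem-Info dump in the post above already carries the numbers behind the failure: the DMA32 zone reports less free memory than its own `min` watermark, and the page count confirms roughly 2GB of RAM. As a hedged illustration only, the sketch below pulls those figures out of the quoted log text. The 4 kB page size is an assumption (the usual x86-64 default), and `parse_zone` is a made-up helper name, not part of any tool mentioned in the thread.

```python
import re

# Trimmed copy of the "Node 0 DMA32" line from the syslog quoted above.
ZONE_LINE = "Node 0 DMA32 free:2032kB min:5440kB low:6800kB high:8160kB"
TOTAL_PAGES = 515374  # from "515374 pages RAM" in the same dump
PAGE_KB = 4           # assumption: 4 kB pages

def parse_zone(line):
    """Return every 'key:<number>kB' counter found on a zone line as a dict of ints."""
    return {key: int(value) for key, value in re.findall(r"(\w+):(\d+)kB", line)}

zone = parse_zone(ZONE_LINE)
below_min = zone["free"] < zone["min"]       # free memory under the min watermark?
total_mib = TOTAL_PAGES * PAGE_KB / 1024     # total RAM the kernel reports, in MiB

print(f"DMA32 free={zone['free']} kB, min={zone['min']} kB, below min: {below_min}")
print(f"RAM reported by the kernel: {total_mib:.0f} MiB")
```

With free memory under the watermark, an allocation made from the network receive path (the `rtl8169_poll` frames in the call trace) has little to draw on, which is consistent with the suggestion that heavy network I/O on a 2GB machine is the trigger.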
unevent Posted October 1, 2015
I suggest grabbing the swapfile plugin from the 6.1 verified list, creating a 2 GB (2048) swapfile, and trying again. Place the swapfile on your cache drive.
Another idea is to link your Win7 PC and unRAID with NFS instead of Samba. Google Nekodrive and try that out on your Win7 PC. Export the shares you need using NFS on the server.
goinsnoopin (Author) Posted October 8, 2015
Thanks everyone for your suggestions. I have run memtest and I get no errors. If you read through any of my old threads, I mentioned that this hardware has been working fine and stable for over 5 years. I have tried the swapfile plugin and configured it for 2GB; however, I still get the crash. I am not using jumbo frames. I have tried upgrading the RAM from 2GB to 4GB. I have tried two different brands of RAM and neither of them will POST; I just get a blank black screen with no beeps or anything. I called Supermicro and they confirmed that the RAM should work and said they would talk to their BIOS engineer, but to date I have not heard back from them. NFS is an interesting option, but some people indicate that the NFS client is slow.
Since my last post, I have had two additional crashes, one with the swap file active and one while booted into unRAID safe mode. The safe mode crash was interesting, as I was simply opening a folder on unRAID from my Windows 7 Pro x64 computer, so I know that Plex was not the culprit. I confirmed this by opening a telnet session and typing smbstatus. When I ran this I had several locked files (15 or so), which were all the root directory of the unRAID share I was trying to open. Previously, when I thought it was just Plex, it was the individual files being scanned that were locked. So something is just not right with the Samba connection. When this happened I ran htop and it showed the CPU at 400% (I assume this means that all 4 cores were at 100%). Memory usage was not at 100%; however, if I let the computer sit, then 6-12 hours later, with the browser window open showing the tail of the syslog, entries get written about memory. Once this occurs, I am no longer able to telnet into the console.
unRAID 5 was flawless. I just don't know what I should do moving forward. The constant crashing is an annoyance, to say the least. As I see it, my options are:
1. Revert to unRAID 5 (I am having a hard time with this, as I love the look and the concept of dockers, etc.).
2. Invest money in new hardware to upgrade my unRAID motherboard, memory and CPU.
3. Combine the hardware of my Windows 7 machine with my unRAID box, fully leverage unRAID 6, and have Windows 7 running as a VM.
My Windows 7 PC has an ASRock Z97 Extreme6 motherboard, 16 GB of RAM, an i7 4790K CPU and a 250 GB SSD. Since I have more than 10 drives, I believe I will still need my AOC SASLP MV8 card. I understand the concept of a VM, but I have never implemented one, so I would have a lot of reading to do to get up to speed. For example, I am unclear whether I need a GPU beyond the integrated one built into the motherboard to run a Windows 7 VM. We don't play any games on the Windows 7 PC; it is mainly used for Lightroom and basic internet surfing. I believe I have an old HD5450 video card floating around; maybe that would work? Also, regarding the VM image files: am I making an image of my existing PC, or am I starting with a stock Microsoft image and doing a fresh install?
I am leaning towards option 3, but clearly I have a lot of learning to do. Any suggestions are greatly appreciated!
Dan
Poprin Posted October 8, 2015
I can't offer any advice as to the source of your crashing issue. However, I would suggest (purely as a workaround rather than a solution) that you disable automatic library updates in Plex. I totally understand why you would use them; however, in my personal situation I am the only one who updates my Plex server with new content. Ahead of this process I also confirm all the folder structures and file names before updating the libraries. As this is a manual process, it only takes me an extra second to trigger the update. For me this also has the massive advantage of allowing my drives to spin down correctly and only spin up when they are needed. If you are scanning your library hourly, what are your drive spin-downs set to? You would realistically need to disable this feature.
Also, my hardware (FX4100, 8GB ECC RAM) is capable of handling three transcode streams simultaneously using Plex via the unRAID docker. With 4GB of RAM and a Q9300, I would be surprised if your server could not handle a similar workload with Plex running as a docker. Have you tried this? I have migrated from a Windows Server running 24x7 with Plex, and unRAID is infinitely more stable for me with the same hardware.
goinsnoopin (Author) Posted October 8, 2015
Poprin,
I agree with you. In my troubleshooting, I stumbled upon Plex being set to update hourly. Ultimately my discs were not spinning down, as my spin-down time was set to two hours. My reason for the automatic updates is that I get TV recordings from SageTV, so shows are added automatically rather than by me manually adding files to my Plex library. At this point I am running the Plex updates manually, which is somewhat of an inconvenience if I am in bed looking to watch a TV show via a Roku and then realize the show hasn't been added, requiring me to run downstairs to my PC to kick off a manual library update.
I wish I could get my server to 4GB of RAM, but for some strange reason it won't POST. Supermicro confirmed that the RAM should work based on its specs. I even tried a single stick and it won't POST: just a black screen with no beeps and no other indicators.
Dan
goinsnoopin (Author) Posted October 8, 2015
I did a manual reboot this morning. unRAID came up and I started the array, which kicked off a parity check due to the unclean shutdown. I came home at lunch, tried to browse the unRAID shares via Samba on the Win 7 PC, and got a "Tower is not accessible" error. The WebGUI is still functioning. I opened a telnet session, typed diagnostics, and got the following:
root@Tower:~# diagnostics
Starting diagnostics collection...
Warning: file_put_contents(): Only 0 of 15 bytes written, possibly out of free disk space in /usr/local/emhttp/plugins/dynamix/scripts/diagnostics on line 36
Warning: file_put_contents(): Only 0 of 12 bytes written, possibly out of free disk space in /usr/local/emhttp/plugins/dynamix/scripts/diagnostics on line 36
Warning: file_put_contents(): Only 0 of 10024 bytes written, possibly out of free disk space in /usr/local/emhttp/plugins/dynamix/scripts/diagnostics on line 36
Warning: file_put_contents(): Only 0 of 89 bytes written, possibly out of free disk space in /usr/local/emhttp/plugins/dynamix/scripts/diagnostics on line 36
Warning: file_put_contents(): Only 0 of 7698 bytes written, possibly out of free disk space in /usr/local/emhttp/plugins/dynamix/scripts/diagnostics on line 36
Warning: file_put_contents(): Only 0 of 5784 bytes written, possibly out of free disk space in /usr/local/emhttp/plugins/dynamix/scripts/diagnostics on line 36
Warning: file_put_contents(): Only 0 of 4638 bytes written, possibly out of free disk space in /usr/local/emhttp/plugins/dynamix/scripts/diagnostics on line 36
Warning: file_put_contents(): Only 0 of 4581 bytes written, possibly out of free disk space in /usr/local/emhttp/plugins/dynamix/scripts/diagnostics on line 36
Warning: file_put_contents(): Only 0 of 3738 bytes written, possibly out of free disk space in /usr/local/emhttp/plugins/dynamix/scripts/diagnostics on line 36
Warning: file_put_contents(): Only 0 of 2 bytes written, possibly out of free disk space in /usr/local/emhttp/plugins/dynamix/scripts/diagnostics on line 50
Warning: file_put_contents(): Only 0 of 34 bytes written, possibly out of free disk space in /usr/local/emhttp/plugins/dynamix/scripts/diagnostics on line 52
Warning: file_put_contents(): Only 0 of 36 bytes written, possibly out of free disk space in /usr/local/emhttp/plugins/dynamix/scripts/diagnostics on line 66
Warning: file_put_contents(): Only 0 of 36 bytes written, possibly out of free disk space in /usr/local/emhttp/plugins/dynamix/scripts/diagnostics on line 66
Warning: file_put_contents(): Only 0 of 36 bytes written, possibly out of free disk space in /usr/local/emhttp/plugins/dynamix/scripts/diagnostics on line 66
done.
ZIP file '/boot/logs/tower-diagnostics-20151008-1249.zip' created.
There is plenty of space on the flash drive. I just didn't know if this strange behavior meant anything. I am wondering if I should back up the unRAID USB and prepare it from scratch.
trurl Posted October 8, 2015
I suspect a problem with your flash drive. Did it actually write anything to the diagnostics zip?
goinsnoopin (Author) Posted October 8, 2015
I am not sure. There is a parity check in progress from the manual reboot performed this morning, and it won't finish until tonight. My plan was to wait for that to finish and then pull the flash drive to check the diagnostics zip, back it up, and prepare it from scratch.
Dan
JonathanM Posted October 8, 2015
Keep in mind that if the USB is not writeable, the array will want a parity check again on next boot. It's your call whether to let the current check finish; if it's showing zero errors so far, I'd be inclined to cut the process short and check for the diagnostics file.
I think the location noted in the errors you posted is in RAM, not on the USB, which leads me to believe you have other issues that are not necessarily related to the USB stick. I can't tell from recent posts whether or not memtest has been run for an extended period of time with no errors noted, but given your issues with new sticks of RAM, I'm wondering if something in the memory subsystem on your motherboard is having problems.
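Whether the "possibly out of free disk space" warnings point at the flash drive or at a RAM-backed filesystem can be checked by comparing free space on the candidate mount points. This is only a sketch under assumptions: it needs Python 3 on the server, `/boot` is taken from the ZIP path shown in the diagnostics output, and `/` and `/tmp` are guesses at where temporary files might be staged. `free_space_report` is an illustrative name.

```python
import shutil

def free_space_report(paths):
    """Map each path to its free space in MiB; paths that cannot be queried map to None."""
    report = {}
    for path in paths:
        try:
            usage = shutil.disk_usage(path)
        except OSError:
            report[path] = None  # path missing or not accessible
            continue
        report[path] = usage.free // (1024 * 1024)
    return report

# "/boot" comes from the thread; "/" and "/tmp" are assumptions for illustration.
for path, free_mib in free_space_report(["/", "/tmp", "/boot"]).items():
    print(path, "not available" if free_mib is None else f"{free_mib} MiB free")
```

If the flash mount shows plenty of room while the RAM-backed root shows almost none, that would support the reading that the warnings are a memory symptom rather than a failing USB stick.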
goinsnoopin (Author) Posted October 8, 2015
I just checked the diagnostics zip file. All the files are 0 bytes, so there is no information. I clean booted this morning, so this run had the swapfile active. After the parity check, I typed powerdown from the console and that grabbed a syslog. I have attached it for reference.
Dan
Attachment: syslog-20151008-191817.txt
goinsnoopin (Author) Posted October 9, 2015
I ran memtest overnight: 17 passes, no errors found. I reformatted the flash drive and prepped a clean unRAID install. Let's see what happens.
goinsnoopin (Author) Posted October 17, 2015
Still crashing. I think I am going to move forward with combining my Windows 7 PC with my unRAID server and using VMs. I backed up my Windows 7 PC and invoked the mover to push the 105 GB image file from my cache to my backup share, and it locked up my unRAID server and Samba. Here is a link to a screenshot of htop showing 100% CPU usage on two cores: https://www.dropbox.com/s/0vi9npgmvplb896/unraid%20htop%2010162015.jpg?dl=0
Drider Posted October 19, 2015
I thought I would add to this discussion. I too am experiencing unRAID hangups, and I've been suspecting it's something to do with PLEX. I moved to 6.0.1 after being a long-time user of unRAID v5 and never experiencing issues with the same hardware that is now on v6. The only difference now is the addition of PLEX.
Your symptoms sound eerily similar to what I find. After about a week of the server running, I'll go to browse my shares and physical drives only to find them unresponsive. Funny thing is, PLEX still works. I can stream and browse like nothing's wrong. I only notice when I'm trying to access the drives and shares. I currently run PLEX on the unRAID server itself. I've been testing whether these lockups are due to not fully exiting PLEX on the device I'm streaming from, but I'm really just at the beginning of troubleshooting. I run my own company, and it's hard to find time to spend on my own personal projects.
I'm going to update to the newest release tonight in the hope that my issues may resolve themselves, but I'm also going to watch this thread closely. I'm sorry, I wish I had more information to help with, but as I discover more, hopefully I can contribute. I just wanted to let you know you're not alone, and hopefully we can figure this out.
On a side note: when my server locks up, I can't even shut down. I have to hard power down, which is not a comfortable feeling...
-D
trurl Posted October 19, 2015
If you aren't using a Docker for this, then you should try that. Next time it happens, go to the console or telnet, type diagnostics, find the diagnostics zip on your flash drive, and post it. Also, please read the stickies in all forums.
Drider Posted October 20, 2015
I am using Docker. I will try the command next time I see this happen, thanks. Not exactly sure why you're telling me to read the stickies; that's a lot of stickies.
trurl Posted October 20, 2015
"...Not exactly sure why you're telling me to read the stickies, that's a lot of stickies."
Sorry. I just wish more people would read at least the stickies in the forum they are posting to; then I wouldn't have to keep asking them to post their diagnostics.
Drider Posted October 20, 2015
As a past moderator for several forums, I can understand your frustration. Per my post, however, I was merely indicating that I've seen similar issues with my server while running PLEX in Docker under unRAID 6. As I also mentioned, running a business leaves me with very little time to fully troubleshoot what may truly be the cause; I have literally only had time to get pissed off and press the power button. I had originally suspected my hardware, which never had issues before, but after reading the thread I now turn my suspicions to PLEX. I merely wanted to add another user with a similar issue, and as I found time I would provide what information I could to help. If the issue persists, I will be forced into diagnostics and will contribute actual data to the thread.
Drider Posted November 4, 2015
Just a quick update. I've now gone a couple of weeks without my issue arising and forcing me into troubleshooting. The only changes I made:
- Updated unRAID to the latest build.
- Updated PLEX to the latest build.
- I make sure that whenever I'm done transferring media between my leech box and my unRAID server, I <umount> the network drive.
- I make sure that I, and all users, fully close out of PLEX every time we're done streaming, meaning I close the Chromecast connection and fully exit the app on my Android device(s).
Everything seems to be functioning normally, with no lockups.
goinsnoopin (Author) Posted November 28, 2015
Two weeks ago, I converted all of my ReiserFS drives to XFS and my system has been running perfectly! At this point I do not believe that any of my issues were related to Plex; rather, they were caused by some issue with the Reiser file system. I just wanted to take a minute to thank those of you who took the time to respond to my numerous posts about unRAID 6 being unresponsive.
Dan