FraxTech Posted January 12, 2018

Over the last few weeks I've been having a problem with the Web GUI crashing and not allowing me to do anything, but my dockers are still running (I can access my VPN, Minecraft server, all the shares, can SSH into the machine, etc.). I haven't been able to find a way to restart the GUI or to shut the system down gracefully through SSH, so I've been having to hard reboot the server every few days, which has led to a crazy number of parity checks, and I'm afraid drives are going to start failing. These crashes always seem to happen when I'm doing something mundane on the server. Today I tried to do a soft reboot from the UI (as the last 5 or 6 times the server was booted up were after these lockups and me having to hard reboot it); I got the reboot message in my browser, but the server never came back online. I'm only a novice with Linux, so while I've tried to understand the log file, I'm not seeing anything that tells me what the issue may be.

So with this, I have three questions. First, is there a way to restart the Web GUI through SSH if it crashes? If not, is there a way to gracefully shut down the system via SSH? And lastly, can someone with more knowledge than myself please take a look at the attached diagnostics file and tell me if there is something there I can change to make the server more stable?

Thanks for any help you all can give. Cheers.

epcot-diagnostics-20180112-0037.zip
JorgeB Posted January 12, 2018

Kernel oops, try running memtest:

Jan 11 06:28:19 Epcot kernel: BUG: unable to handle kernel paging request at 000000000a6d7902
Jan 11 06:28:19 Epcot kernel: IP: [<ffffffff81137582>] iput+0x83/0x170
Jan 11 06:28:19 Epcot kernel: PGD 12cf88067
Jan 11 06:28:19 Epcot kernel: PUD 15eea6067
Jan 11 06:28:19 Epcot kernel: PMD 0
Jan 11 06:28:19 Epcot kernel:
Jan 11 06:28:19 Epcot kernel: Oops: 0000 [#1] PREEMPT SMP
Jan 11 06:28:19 Epcot kernel: Modules linked in: xt_CHECKSUM ipt_REJECT nf_reject_ipv4 ebtable_filter ebtables vhost_net vhost macvtap macvlan tun iptable_mangle veth xt_nat ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_nat_ipv4 iptable_filter ip_tables nf_nat md_mod it87 hwmon_vid bonding r8169 mii x86_pkg_temp_thermal coretemp kvm_intel mpt3sas kvm raid_class scsi_transport_sas i2c_i801 i2c_smbus i2c_core ahci libahci [last unloaded: md_mod]
Jan 11 06:28:19 Epcot kernel: CPU: 0 PID: 23038 Comm: php Not tainted 4.9.30-unRAID #1
Jan 11 06:28:19 Epcot kernel: Hardware name: Gigabyte Technology Co., Ltd. P67A-UD3-B3/P67A-UD3-B3, BIOS F4 04/13/2011
Jan 11 06:28:19 Epcot kernel: task: ffff8803cac46600 task.stack: ffffc90014108000
Jan 11 06:28:19 Epcot kernel: RIP: 0010:[<ffffffff81137582>] [<ffffffff81137582>] iput+0x83/0x170
Jan 11 06:28:19 Epcot kernel: RSP: 0018:ffffc9001410b8e8 EFLAGS: 00010246
Jan 11 06:28:19 Epcot kernel: RAX: 0000000000000000 RBX: ffff88000bb0c000 RCX: 0000000000000000
Jan 11 06:28:19 Epcot kernel: RDX: 0000000000000001 RSI: ffff88000bb0c080 RDI: ffff88000bb0c080
Jan 11 06:28:19 Epcot kernel: RBP: ffffc9001410b908 R08: ffffffff81c05ef8 R09: ffff8803151acef0
Jan 11 06:28:19 Epcot kernel: R10: ffff88000bb76e38 R11: 0000000000000983 R12: ffff88000bb0c080
Jan 11 06:28:19 Epcot kernel: R13: 000000000a6d78d2 R14: ffffc9001410b9b0 R15: ffff88029a4fd840
Jan 11 06:28:19 Epcot kernel: FS: 00002b97d932f880(0000) GS:ffff88041f400000(0000) knlGS:0000000000000000
Jan 11 06:28:19 Epcot kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 11 06:28:19 Epcot kernel: CR2: 000000000a6d7902 CR3: 0000000130c57000 CR4: 00000000000426f0
Jan 11 06:28:19 Epcot kernel: Stack:
Jan 11 06:28:19 Epcot kernel: ffff88029a4fd840 ffff88000bb0c000 ffff88029a4fd898 ffffc9001410b9b0
Jan 11 06:28:19 Epcot kernel: ffffc9001410b928 ffffffff8113289a ffff88029a4fd840 ffff8801167e69c0
Jan 11 06:28:19 Epcot kernel: ffffc9001410b950 ffffffff8113386d ffff8801167e69c0 ffff88029a4fd8c0
Jan 11 06:28:19 Epcot kernel: Call Trace:
Jan 11 06:28:19 Epcot kernel: [<ffffffff8113289a>] dentry_unlink_inode+0xcd/0xd2
Jan 11 06:28:19 Epcot kernel: [<ffffffff8113386d>] __dentry_kill+0xcc/0x12d
Jan 11 06:28:19 Epcot kernel: [<ffffffff81133d4f>] shrink_dentry_list+0x18e/0x296
Jan 11 06:28:19 Epcot kernel: [<ffffffff811345bd>] prune_dcache_sb+0x45/0x50
Jan 11 06:28:19 Epcot kernel: [<ffffffff811245e9>] super_cache_scan+0xfa/0x174
Jan 11 06:28:19 Epcot kernel: [<ffffffff810d41e0>] shrink_slab+0x1c8/0x26e
Jan 11 06:28:19 Epcot kernel: [<ffffffff810d7495>] shrink_node+0xe6/0x27a
Jan 11 06:28:19 Epcot kernel: [<ffffffff810d77cc>] do_try_to_free_pages+0x1a3/0x2b9
Jan 11 06:28:19 Epcot kernel: [<ffffffff810d7980>] try_to_free_pages+0x9e/0xa5
Jan 11 06:28:19 Epcot kernel: [<ffffffff810cbab9>] __alloc_pages_nodemask+0x493/0xc71
Jan 11 06:28:19 Epcot kernel: [<ffffffff810ea0ce>] ? __pte_alloc+0x94/0xc5
Jan 11 06:28:19 Epcot kernel: [<ffffffff81117aef>] ? get_mem_cgroup_from_mm+0x9c/0xa4
Jan 11 06:28:19 Epcot kernel: [<ffffffff813a487e>] ? cpumask_any_but+0x21/0x34
Jan 11 06:28:19 Epcot kernel: [<ffffffff810479e4>] ? flush_tlb_page+0x65/0x96
Jan 11 06:28:19 Epcot kernel: [<ffffffff81103969>] alloc_pages_vma+0x155/0x1f5
Jan 11 06:28:19 Epcot kernel: [<ffffffff8111343b>] do_huge_pmd_wp_page+0x21e/0xbea
Jan 11 06:28:19 Epcot kernel: [<ffffffff810ea3e3>] ? do_wp_page+0x17a/0x5c8
Jan 11 06:28:19 Epcot kernel: [<ffffffff81068361>] ? resched_curr+0x3c/0x4a
Jan 11 06:28:19 Epcot kernel: [<ffffffff810edbbd>] handle_mm_fault+0x319/0xf96
Jan 11 06:28:19 Epcot kernel: [<ffffffff81042252>] __do_page_fault+0x24a/0x3ed
Jan 11 06:28:19 Epcot kernel: [<ffffffff81042438>] do_page_fault+0x22/0x27
Jan 11 06:28:19 Epcot kernel: [<ffffffff81680f18>] page_fault+0x28/0x30
Jan 11 06:28:19 Epcot kernel: Code: 01 00 00 48 81 a3 98 00 00 00 ff f7 ff ff 4c 89 e7 e8 57 7b 54 00 be 01 00 00 00 48 89 df e8 1e bd 00 00 eb ae 4c 8b 6b 28 a8 08 <4d> 8b 75 30 74 11 be cf 05 00 00 48 c7 c7 f2 13 95 81 e8 f0 5b
Jan 11 06:28:19 Epcot kernel: RIP [<ffffffff81137582>] iput+0x83/0x170
Jan 11 06:28:19 Epcot kernel: RSP <ffffc9001410b8e8>
Jan 11 06:28:19 Epcot kernel: CR2: 000000000a6d7902
Jan 11 06:28:19 Epcot kernel: ---[ end trace 2779e23755704d0b ]---
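For anyone troubleshooting a similar lockup: traces like the one above can be pulled out of a saved syslog with a simple grep. Below is a minimal sketch; the printf lines just create a two-line stand-in syslog for the demo, and the assumption is that on a real server you would point grep at /var/log/syslog or at the syslog file inside the diagnostics zip instead.

```shell
#!/bin/sh
# Sketch: locate kernel oops lines in a syslog with grep.
# The printf below writes a two-line stand-in file for this demo; on a live
# box you would grep the real syslog (path is an assumption about your setup).
printf '%s\n' \
  'Jan 11 06:28:19 Epcot kernel: BUG: unable to handle kernel paging request at 000000000a6d7902' \
  'Jan 11 06:28:19 Epcot kernel: Oops: 0000 [#1] PREEMPT SMP' > syslog.txt

# Count the oops markers; -E enables alternation, -c prints the match count.
grep -cE 'BUG:|Oops:' syslog.txt
```

With the stand-in file this prints 2; on a healthy server the same grep against the real syslog should print 0.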
FraxTech Posted January 12, 2018

50 minutes ago, johnnie.black said:
Kernel oops, try running memtest:

Thanks, Johnnie! I'll try that and see what I get.
FraxTech Posted January 17, 2018

So I let Memtest run for 30 passes (basically 3 days) and it found zero errors. I restarted my server (running yet ANOTHER parity check) and it had been up for 2 days, but it has crashed again. I was trying to start a Windows 7 VM when the GUI stopped working. Just like always, everything in the background is still running, but the web GUI is non-operational. I have run the diagnostics again and attached them. Thanks for any help you can give.

On another note, as I'm running 1 to 2 parity checks a week because unRAID is being so unstable, I'm curious whether I need to start worrying about burning up drives. I have 15 drives in the system, most of which are WD enterprise drives, so they should be pretty tough, but they are all a bit old (I moved them from a previous Windows machine where they were set up in a large RAID 5). Thanks again.

epcot-diagnostics-20180116-2212.zip
pwm Posted January 17, 2018

3 hours ago, FraxTech said:
I'm just curious if I need to start worrying about burning up drives?

There isn't much more load on the drives during a parity sync than on drives that are just spinning. It's only if you concurrently stream data from the system that one or more drives have to work hard, constantly seeking between the sync read position and the position of your streamed media.
JorgeB Posted January 17, 2018

Try upgrading to v6.4. It might be worth starting with a clean install and just copying over super.dat (disk assignments) and your key file from the current install, both in the config folder; if all works, it's usually not complicated to reconfigure the server.
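JorgeB's suggestion boils down to two file copies into the new flash's config folder. A minimal sketch, with the caveat that "old_flash" and "new_flash" here are demo stand-ins created on the spot (on a live unRAID server the flash drive is mounted at /boot, and the key file's name depends on your license):

```shell
#!/bin/sh
# Sketch of carrying disk assignments and the registration key to a clean
# install. The mkdir/touch lines only simulate the two USB sticks for this
# demo; on real hardware the files already exist under /boot/config.
set -e
mkdir -p old_flash/config new_flash/config   # demo setup only
: > old_flash/config/super.dat               # stands in for the real binary file
: > old_flash/config/Pro.key                 # key file name varies by license

# The actual carry-over is just two copies into the new config folder:
cp old_flash/config/super.dat new_flash/config/
cp old_flash/config/*.key     new_flash/config/
```

With those two files in place, the array comes up with the same disk assignments and license, and everything else (shares, dockers, plugins) can be reconfigured afterwards.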
FraxTech Posted January 17, 2018

14 hours ago, pwm said:
There isn't much more load on the drives during a parity sync than on drives that are just spinning. It's only if you concurrently stream data from the system that one or more drives have to work hard, constantly seeking between the sync read position and the position of your streamed media.

Thanks, pwm, this is good to know; I was afraid I was burning up the drives with the constant full-drive reads. The server is mainly for backing up data, a backup Plex server for when my main one goes down, and for encoding video, so it doesn't usually get hit too hard by other things during the parity checks.

12 hours ago, johnnie.black said:
Try upgrading to v6.4. It might be worth starting with a clean install and just copying over super.dat (disk assignments) and your key file from the current install, both in the config folder; if all works, it's usually not complicated to reconfigure the server.

Thanks, johnnie, that's kind of what I'm thinking, too. I'm going to need to start looking into 6.4, as I haven't looked at it at all.
FraxTech Posted January 18, 2018

18 hours ago, johnnie.black said:
Try upgrading to v6.4. It might be worth starting with a clean install and just copying over super.dat (disk assignments) and your key file from the current install, both in the config folder; if all works, it's usually not complicated to reconfigure the server.

Johnnie, do you know of a guide on how to upgrade from one version of unRAID to another? I'm new to unRAID (I've only been using it for a few months) and I'm not sure how to go about an upgrade. Any help you could give would be awesome. Thanks!
FraxTech Posted January 19, 2018

On 1/18/2018 at 2:31 AM, johnnie.black said:
See this post:

Awesome, thanks, Johnnie, I'll take a look at that!
This topic is now archived and is closed to further replies.