FraxTech Posted January 12, 2018

Over the last few weeks I've been having a problem with the Web GUI crashing and not allowing me to do anything, but my dockers are still running (I can access my VPN, Minecraft server, all the shares, can SSH into the machine, etc.). I haven't been able to find a way to restart the GUI or to shut the system down gracefully through SSH, so I've been having to hard reboot the server every few days, which has led to a crazy number of parity checks, and I'm afraid drives are going to start failing. These crashes always seem to happen when I'm doing something mundane on the server. Today I tried to do a soft reboot from the UI (as the last 5 or 6 times the server was booted up were after these lockups and me having to hard reboot it); I got the reboot message in my browser, but the server never came back online. I'm only a novice with Linux, so while I've tried to understand the log file, I'm not seeing anything that tells me what the issue may be.

So with this, I have three questions. First, is there a way to restart the Web GUI through SSH if it crashes? If not, is there a way to gracefully shut down the system via SSH? And lastly, can someone with more knowledge than myself please take a look at the attached diagnostics file and tell me if there is something there I can change to make the server more stable?

Thanks for any help you all can give. Cheers.

epcot-diagnostics-20180112-0037.zip
JorgeB Posted January 12, 2018

Kernel oops, try running memtest:

Jan 11 06:28:19 Epcot kernel: BUG: unable to handle kernel paging request at 000000000a6d7902
Jan 11 06:28:19 Epcot kernel: IP: [<ffffffff81137582>] iput+0x83/0x170
Jan 11 06:28:19 Epcot kernel: PGD 12cf88067
Jan 11 06:28:19 Epcot kernel: PUD 15eea6067
Jan 11 06:28:19 Epcot kernel: PMD 0
Jan 11 06:28:19 Epcot kernel:
Jan 11 06:28:19 Epcot kernel: Oops: 0000 [#1] PREEMPT SMP
Jan 11 06:28:19 Epcot kernel: Modules linked in: xt_CHECKSUM ipt_REJECT nf_reject_ipv4 ebtable_filter ebtables vhost_net vhost macvtap macvlan tun iptable_mangle veth xt_nat ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_nat_ipv4 iptable_filter ip_tables nf_nat md_mod it87 hwmon_vid bonding r8169 mii x86_pkg_temp_thermal coretemp kvm_intel mpt3sas kvm raid_class scsi_transport_sas i2c_i801 i2c_smbus i2c_core ahci libahci [last unloaded: md_mod]
Jan 11 06:28:19 Epcot kernel: CPU: 0 PID: 23038 Comm: php Not tainted 4.9.30-unRAID #1
Jan 11 06:28:19 Epcot kernel: Hardware name: Gigabyte Technology Co., Ltd. P67A-UD3-B3/P67A-UD3-B3, BIOS F4 04/13/2011
Jan 11 06:28:19 Epcot kernel: task: ffff8803cac46600 task.stack: ffffc90014108000
Jan 11 06:28:19 Epcot kernel: RIP: 0010:[<ffffffff81137582>] [<ffffffff81137582>] iput+0x83/0x170
Jan 11 06:28:19 Epcot kernel: RSP: 0018:ffffc9001410b8e8 EFLAGS: 00010246
Jan 11 06:28:19 Epcot kernel: RAX: 0000000000000000 RBX: ffff88000bb0c000 RCX: 0000000000000000
Jan 11 06:28:19 Epcot kernel: RDX: 0000000000000001 RSI: ffff88000bb0c080 RDI: ffff88000bb0c080
Jan 11 06:28:19 Epcot kernel: RBP: ffffc9001410b908 R08: ffffffff81c05ef8 R09: ffff8803151acef0
Jan 11 06:28:19 Epcot kernel: R10: ffff88000bb76e38 R11: 0000000000000983 R12: ffff88000bb0c080
Jan 11 06:28:19 Epcot kernel: R13: 000000000a6d78d2 R14: ffffc9001410b9b0 R15: ffff88029a4fd840
Jan 11 06:28:19 Epcot kernel: FS: 00002b97d932f880(0000) GS:ffff88041f400000(0000) knlGS:0000000000000000
Jan 11 06:28:19 Epcot kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 11 06:28:19 Epcot kernel: CR2: 000000000a6d7902 CR3: 0000000130c57000 CR4: 00000000000426f0
Jan 11 06:28:19 Epcot kernel: Stack:
Jan 11 06:28:19 Epcot kernel: ffff88029a4fd840 ffff88000bb0c000 ffff88029a4fd898 ffffc9001410b9b0
Jan 11 06:28:19 Epcot kernel: ffffc9001410b928 ffffffff8113289a ffff88029a4fd840 ffff8801167e69c0
Jan 11 06:28:19 Epcot kernel: ffffc9001410b950 ffffffff8113386d ffff8801167e69c0 ffff88029a4fd8c0
Jan 11 06:28:19 Epcot kernel: Call Trace:
Jan 11 06:28:19 Epcot kernel: [<ffffffff8113289a>] dentry_unlink_inode+0xcd/0xd2
Jan 11 06:28:19 Epcot kernel: [<ffffffff8113386d>] __dentry_kill+0xcc/0x12d
Jan 11 06:28:19 Epcot kernel: [<ffffffff81133d4f>] shrink_dentry_list+0x18e/0x296
Jan 11 06:28:19 Epcot kernel: [<ffffffff811345bd>] prune_dcache_sb+0x45/0x50
Jan 11 06:28:19 Epcot kernel: [<ffffffff811245e9>] super_cache_scan+0xfa/0x174
Jan 11 06:28:19 Epcot kernel: [<ffffffff810d41e0>] shrink_slab+0x1c8/0x26e
Jan 11 06:28:19 Epcot kernel: [<ffffffff810d7495>] shrink_node+0xe6/0x27a
Jan 11 06:28:19 Epcot kernel: [<ffffffff810d77cc>] do_try_to_free_pages+0x1a3/0x2b9
Jan 11 06:28:19 Epcot kernel: [<ffffffff810d7980>] try_to_free_pages+0x9e/0xa5
Jan 11 06:28:19 Epcot kernel: [<ffffffff810cbab9>] __alloc_pages_nodemask+0x493/0xc71
Jan 11 06:28:19 Epcot kernel: [<ffffffff810ea0ce>] ? __pte_alloc+0x94/0xc5
Jan 11 06:28:19 Epcot kernel: [<ffffffff81117aef>] ? get_mem_cgroup_from_mm+0x9c/0xa4
Jan 11 06:28:19 Epcot kernel: [<ffffffff813a487e>] ? cpumask_any_but+0x21/0x34
Jan 11 06:28:19 Epcot kernel: [<ffffffff810479e4>] ? flush_tlb_page+0x65/0x96
Jan 11 06:28:19 Epcot kernel: [<ffffffff81103969>] alloc_pages_vma+0x155/0x1f5
Jan 11 06:28:19 Epcot kernel: [<ffffffff8111343b>] do_huge_pmd_wp_page+0x21e/0xbea
Jan 11 06:28:19 Epcot kernel: [<ffffffff810ea3e3>] ? do_wp_page+0x17a/0x5c8
Jan 11 06:28:19 Epcot kernel: [<ffffffff81068361>] ? resched_curr+0x3c/0x4a
Jan 11 06:28:19 Epcot kernel: [<ffffffff810edbbd>] handle_mm_fault+0x319/0xf96
Jan 11 06:28:19 Epcot kernel: [<ffffffff81042252>] __do_page_fault+0x24a/0x3ed
Jan 11 06:28:19 Epcot kernel: [<ffffffff81042438>] do_page_fault+0x22/0x27
Jan 11 06:28:19 Epcot kernel: [<ffffffff81680f18>] page_fault+0x28/0x30
Jan 11 06:28:19 Epcot kernel: Code: 01 00 00 48 81 a3 98 00 00 00 ff f7 ff ff 4c 89 e7 e8 57 7b 54 00 be 01 00 00 00 48 89 df e8 1e bd 00 00 eb ae 4c 8b 6b 28 a8 08 <4d> 8b 75 30 74 11 be cf 05 00 00 48 c7 c7 f2 13 95 81 e8 f0 5b
Jan 11 06:28:19 Epcot kernel: RIP [<ffffffff81137582>] iput+0x83/0x170
Jan 11 06:28:19 Epcot kernel: RSP <ffffc9001410b8e8>
Jan 11 06:28:19 Epcot kernel: CR2: 000000000a6d7902
Jan 11 06:28:19 Epcot kernel: ---[ end trace 2779e23755704d0b ]---
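For anyone troubleshooting a similar lockup: traces like the one above can be pulled out of a saved syslog with a simple grep. Below is a minimal sketch; the printf lines just create a two-line stand-in syslog for the demo, and the assumption is that on a real server you would point grep at /var/log/syslog or at the syslog file inside the diagnostics zip instead.

```shell
#!/bin/sh
# Sketch: locate kernel oops lines in a syslog with grep.
# The printf below writes a two-line stand-in file for this demo; on a live
# box you would grep the real syslog (path is an assumption about your setup).
printf '%s\n' \
  'Jan 11 06:28:19 Epcot kernel: BUG: unable to handle kernel paging request at 000000000a6d7902' \
  'Jan 11 06:28:19 Epcot kernel: Oops: 0000 [#1] PREEMPT SMP' > syslog.txt

# Count the oops markers; -E enables alternation, -c prints the match count.
grep -cE 'BUG:|Oops:' syslog.txt
```

With the stand-in file this prints 2; on a healthy server the same grep against the real syslog should print 0.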
FraxTech Posted January 12, 2018

50 minutes ago, johnnie.black said:
Kernel oops, try running memtest:

Thanks, Johnnie! I'll try that and see what I get.
FraxTech Posted January 17, 2018

So I let Memtest run for 30 passes (basically 3 days) and it found zero errors. I restarted my server (running yet ANOTHER parity check) and it had been up for 2 days, but it has crashed again. I was trying to start a Windows 7 VM when the GUI stopped working. Just like always, everything in the background is still running, but the web GUI is non-operational. I have run the diagnostics again and attached them. Thanks for any help you can give.

On another note, as I'm running 1 to 2 parity checks a week because unRAID is being so unstable, I'm curious whether I need to start worrying about burning up drives. I have 15 drives in the system, most of which are WD enterprise drives, so they should be pretty tough, but they are all a bit old (I moved them from a previous Windows machine where they were set up in a large RAID 5). Thanks again.

epcot-diagnostics-20180116-2212.zip
pwm Posted January 17, 2018

3 hours ago, FraxTech said:
I'm just curious if I need to start worrying about burning up drives?

There isn't much more load on the drives during a parity sync than on drives that are just spinning. It's only if you concurrently stream data from the system that one or more drives have to work hard, constantly seeking between the sync read position and the position of your streamed media.
JorgeB Posted January 17, 2018

Try upgrading to v6.4. It might be worth starting with a clean install and just copying over super.dat (disk assignments) and your key file from the current install, both in the config folder; if all works, it's usually not complicated to reconfigure the server.
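JorgeB's suggestion boils down to two file copies into the new flash's config folder. A minimal sketch, with the caveat that "old_flash" and "new_flash" here are demo stand-ins created on the spot (on a live unRAID server the flash drive is mounted at /boot, and the key file's name depends on your license):

```shell
#!/bin/sh
# Sketch of carrying disk assignments and the registration key to a clean
# install. The mkdir/touch lines only simulate the two USB sticks for this
# demo; on real hardware the files already exist under /boot/config.
set -e
mkdir -p old_flash/config new_flash/config   # demo setup only
: > old_flash/config/super.dat               # stands in for the real binary file
: > old_flash/config/Pro.key                 # key file name varies by license

# The actual carry-over is just two copies into the new config folder:
cp old_flash/config/super.dat new_flash/config/
cp old_flash/config/*.key     new_flash/config/
```

With those two files in place, the array comes up with the same disk assignments and license, and everything else (shares, dockers, plugins) can be reconfigured afterwards.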
FraxTech Posted January 17, 2018

14 hours ago, pwm said:
There isn't much more load on the drives during a parity sync than on drives that are just spinning. It's only if you concurrently stream data from the system that one or more drives have to work hard, constantly seeking between the sync read position and the position of your streamed media.

Thanks, pwm, this is good to know; I was afraid I was burning up the drives with the constant full-drive reads. The server is mainly for backing up data, a backup Plex server for when my main one goes down, and for encoding video, so it doesn't usually get hit too hard by other things during the parity checks.

12 hours ago, johnnie.black said:
Try upgrading to v6.4. It might be worth starting with a clean install and just copying over super.dat (disk assignments) and your key file from the current install, both in the config folder; if all works, it's usually not complicated to reconfigure the server.

Thanks, johnnie, that's kind of what I'm thinking, too. I'm going to need to start looking into 6.4, as I haven't looked at it at all.
FraxTech Posted January 18, 2018

18 hours ago, johnnie.black said:
Try upgrading to v6.4. It might be worth starting with a clean install and just copying over super.dat (disk assignments) and your key file from the current install, both in the config folder; if all works, it's usually not complicated to reconfigure the server.

Johnnie, do you know of a guide on how to upgrade from one version of unRAID to another? I'm new to unRAID (I've only been using it for a few months) and I'm not sure how to go about an upgrade. Any help you could give would be awesome. Thanks!
FraxTech Posted January 19, 2018

On 1/18/2018 at 2:31 AM, johnnie.black said:
See this post:

Awesome, thanks, Johnnie, I'll take a look at that!
This topic is now archived and is closed to further replies.