Overnight Freezes


RobV


Hi all, I've been using Unraid for a few months and so far it has been brilliant. However, over the past two weeks my server has been dying almost every night: I can't access the web GUI, the console shows the attached errors, and all the VMs and Docker containers are frozen. Only a hard restart brings it back up. I've also noticed that when the server enters this hung state it constantly pings the network and seems to block the other users on the network from accessing the internet.

 

System Details:

i7-8700k

AsRock Z370 Extreme4

GTX-1050

Additional PCIe USB3 controller

 

The hardware itself seems stable: I've run 4K video transcodes for 8+ hours in HandBrake without any issues, and before installing Unraid I had Windows on it and ran a whole bunch of stability tests. Due to the nature of the problem, the attached screenshot is the best I can do for logging the fault. I've included a syslog as well, but I don't know whether it will include the error. If there is another way to log the issue, please let me know and I will get it!

 

Thanks.

18-10-31 16-28-28 0608.jpg

gargantubrain-diagnostics-20181105-1538.zip

Link to comment

I would install mcelog using the Tips & Tweaks plugin (you've got a machine check error). Then, in Fix Common Problems, set the scan to run every hour and put it into Troubleshooting Mode.

 

Maybe it will catch the error.

 

You might also want to run a memtest for ~24 hours to see if it's the memory causing the issue.

Link to comment

My server managed to go a few days without the error, but it came up again tonight, and I think Fix Common Problems caught it in the attached logs. Could this be the issue?

Quote

Nov 13 09:43:42 Gargantubrain kernel: general protection fault: 0000 [#1] SMP PTI
Nov 13 09:43:42 Gargantubrain kernel: CPU: 6 PID: 13001 Comm: Thread Pool Wor Not tainted 4.18.17-unRAID #1
Nov 13 09:43:42 Gargantubrain kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z370 Extreme4, BIOS P3.10 07/04/2018
Nov 13 09:43:42 Gargantubrain kernel: RIP: 0010:_raw_spin_unlock_irqrestore+0x11/0x12
Nov 13 09:43:42 Gargantubrain kernel: Code: c0 74 03 31 c0 c3 ba 01 00 00 00 f0 0f b1 17 85 c0 75 f0 b8 01 00 00 00 c3 c6 07 00 0f 1f 40 00 48 89 f7 57 9d 0f 1f 44 00 00 <c3> 8b 07 a9 ff 01 00 00 75 1d ba 00 02 00 00 f0 0f c1 17 81 e2 ff 
Nov 13 09:43:42 Gargantubrain kernel: RSP: 0018:ffffc900210f7eca EFLAGS: 00010207
Nov 13 09:43:42 Gargantubrain kernel: RAX: 0000000000000000 RBX: 0000000000000040 RCX: 00000000e2c8b213
Nov 13 09:43:42 Gargantubrain kernel: RDX: 0000000020878654 RSI: 0000000000000207 RDI: 0000000000000207
Nov 13 09:43:42 Gargantubrain kernel: RBP: 0000000000000000 R08: 0000000067b66157 R09: 000000008cbeec77
Nov 13 09:43:42 Gargantubrain kernel: R10: 00000000a76d25d1 R11: 00000000cfaea6fe R12: 34ab000000000000
Nov 13 09:43:42 Gargantubrain kernel: R13: 4d46ffffffff813d R14: 000014bf5a2fdc7c R15: 0000000000000000
Nov 13 09:43:42 Gargantubrain kernel: FS:  000014bf5a2fe700(0000) GS:ffff88085e780000(0000) knlGS:0000000000000000
Nov 13 09:43:42 Gargantubrain kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 13 09:43:42 Gargantubrain kernel: CR2: 000000e363b94000 CR3: 00000003d92f8006 CR4: 00000000003626e0
Nov 13 09:43:42 Gargantubrain kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Nov 13 09:43:42 Gargantubrain kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Nov 13 09:43:42 Gargantubrain kernel: Call Trace:
Nov 13 09:43:42 Gargantubrain kernel: BUG: stack guard page was hit at 00000000bcf33f61 (stack is 000000000b3c078d..00000000c9dce5db)
Nov 13 09:43:42 Gargantubrain kernel: kernel stack overflow (page fault): 0000 [#2] SMP PTI
Nov 13 09:43:42 Gargantubrain kernel: CPU: 6 PID: 13001 Comm: Thread Pool Wor Not tainted 4.18.17-unRAID #1
Nov 13 09:43:42 Gargantubrain kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z370 Extreme4, BIOS P3.10 07/04/2018
Nov 13 09:43:42 Gargantubrain kernel: RIP: 0010:show_trace_log_lvl+0x1bf/0x2c0
Nov 13 09:43:42 Gargantubrain kernel: Code: 48 8d 8d 50 ff ff ff 48 81 e3 00 f0 ff ff 48 8d 95 58 ff ff ff 48 89 df e8 36 f6 ff ff 85 c0 0f 84 54 ff ff ff e9 db 00 00 00 <48> 8b 03 48 8d bd 78 ff ff ff 48 89 85 40 ff ff ff e8 6a 3e 02 00 
Nov 13 09:43:42 Gargantubrain kernel: RSP: 0018:ffffc900210f7ce0 EFLAGS: 00010016
Nov 13 09:43:42 Gargantubrain kernel: RAX: 0000000000000000 RBX: ffffc900210f7ffa RCX: ffffc900210f8000
Nov 13 09:43:42 Gargantubrain kernel: RDX: ffffc900210f7eca RSI: 0000000000000001 RDI: 002b000014bf5a2f
Nov 13 09:43:42 Gargantubrain kernel: RBP: ffffc900210f7da8 R08: ffffffff81f6b57c R09: ffffffff81f6b580
Nov 13 09:43:42 Gargantubrain kernel: R10: ffffffff81f6b57c R11: 002b000014bf5a2f R12: 0000000000000000
Nov 13 09:43:42 Gargantubrain kernel: R13: ffff88081aad9c00 R14: ffffffff81d28ef8 R15: 0000000000000000
Nov 13 09:43:42 Gargantubrain kernel: FS:  000014bf5a2fe700(0000) GS:ffff88085e780000(0000) knlGS:0000000000000000
Nov 13 09:43:42 Gargantubrain kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 13 09:43:42 Gargantubrain kernel: CR2: ffffc900210f8000 CR3: 00000003d92f8006 CR4: 00000000003626e0
Nov 13 09:43:42 Gargantubrain kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Nov 13 09:43:42 Gargantubrain kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Nov 13 09:43:42 Gargantubrain kernel: Call Trace:
Nov 13 09:43:42 Gargantubrain kernel: __die+0x7c/0xbe
Nov 13 09:43:42 Gargantubrain kernel: die+0x2b/0x44
Nov 13 09:43:42 Gargantubrain kernel: general_protection+0x1e/0x30
Nov 13 09:43:42 Gargantubrain kernel: RIP: 0010:_raw_spin_unlock_irqrestore+0x11/0x12
Nov 13 09:43:42 Gargantubrain kernel: Code: c0 74 03 31 c0 c3 ba 01 00 00 00 f0 0f b1 17 85 c0 75 f0 b8 01 00 00 00 c3 c6 07 00 0f 1f 40 00 48 89 f7 57 9d 0f 1f 44 00 00 <c3> 8b 07 a9 ff 01 00 00 75 1d ba 00 02 00 00 f0 0f c1 17 81 e2 ff 
Nov 13 09:43:42 Gargantubrain kernel: RSP: 0018:ffffc900210f7eca EFLAGS: 00010207
Nov 13 09:43:42 Gargantubrain kernel: RAX: 0000000000000000 RBX: 0000000000000040 RCX: 00000000e2c8b213
Nov 13 09:43:42 Gargantubrain kernel: RDX: 0000000020878654 RSI: 0000000000000207 RDI: 0000000000000207
Nov 13 09:43:42 Gargantubrain kernel: RBP: 0000000000000000 R08: 0000000067b66157 R09: 000000008cbeec77
Nov 13 09:43:42 Gargantubrain kernel: R10: 00000000a76d25d1 R11: 00000000cfaea6fe R12: 34ab000000000000
Nov 13 09:43:42 Gargantubrain kernel: R13: 4d46ffffffff813d R14: 000014bf5a2fdc7c R15: 0000000000000000
Nov 13 09:43:42 Gargantubrain kernel: ? do_syscall_64+0x57/0xe6
Nov 13 09:43:42 Gargantubrain kernel: ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
Nov 13 09:43:42 Gargantubrain kernel: Modules linked in: xt_CHECKSUM iptable_mangle ipt_REJECT xt_nat ebtable_filter ebtables ip6table_filter ip6_tables veth vhost_net tun vhost tap ipt_MASQUERADE iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 iptable_filter ip_tables nf_nat xfs nfsd lockd grace sunrpc md_mod nct6775 hwmon_vid bonding x86_pkg_temp_thermal intel_powerclamp coretemp hid_logitech_hidpp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd cryptd glue_helper hid_logitech_dj e1000e intel_cstate intel_uncore ahci i2c_i801 mxm_wmi intel_rapl_perf i2c_core pcc_cpufreq nvme libahci video nvme_core wmi backlight acpi_pad button
Nov 13 09:43:42 Gargantubrain kernel: ---[ end trace 6564042525a5e839 ]---
Nov 13 09:43:42 Gargantubrain kernel: RIP: 0010:_raw_spin_unlock_irqrestore+0x11/0x12
Nov 13 09:43:42 Gargantubrain kernel: Code: c0 74 03 31 c0 c3 ba 01 00 00 00 f0 0f b1 17 85 c0 75 f0 b8 01 00 00 00 c3 c6 07 00 0f 1f 40 00 48 89 f7 57 9d 0f 1f 44 00 00 <c3> 8b 07 a9 ff 01 00 00 75 1d ba 00 02 00 00 f0 0f c1 17 81 e2 ff 
Nov 13 09:43:42 Gargantubrain kernel: RSP: 0018:ffffc900210f7eca EFLAGS: 00010207
Nov 13 09:43:42 Gargantubrain kernel: RAX: 0000000000000000 RBX: 0000000000000040 RCX: 00000000e2c8b213
Nov 13 09:43:42 Gargantubrain kernel: RDX: 0000000020878654 RSI: 0000000000000207 RDI: 0000000000000207
Nov 13 09:43:42 Gargantubrain kernel: RBP: 0000000000000000 R08: 0000000067b66157 R09: 000000008cbeec77
Nov 13 09:43:42 Gargantubrain kernel: R10: 00000000a76d25d1 R11: 00000000cfaea6fe R12: 34ab000000000000
Nov 13 09:43:42 Gargantubrain kernel: R13: 4d46ffffffff813d R14: 000014bf5a2fdc7c R15: 0000000000000000
Nov 13 09:43:42 Gargantubrain kernel: FS:  000014bf5a2fe700(0000) GS:ffff88085e780000(0000) knlGS:0000000000000000
Nov 13 09:43:42 Gargantubrain kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 13 09:43:42 Gargantubrain kernel: CR2: ffffc900210f8000 CR3: 00000003d92f8006 CR4: 00000000003626e0
Nov 13 09:43:42 Gargantubrain kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Nov 13 09:43:42 Gargantubrain kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

I've attached the mcelog output. I also noticed that when the server froze it somehow killed my entire network! Once I restarted the server, other devices could get back onto the network, but I had to restart the router to get the internet back up. Curious.

 

Any help appreciated! I might start a burn test and a memtest.

gargantubrain-diagnostics-20181113-1901.zip

FCPsyslog_tail.txt
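
For anyone wading through a long syslog tail like the one attached, a small filter can pull out just the fault headers (the "general protection fault", "BUG:" and "kernel stack overflow" lines quoted above). This is only a rough sketch: the function name `find_faults` and the marker list are made up for illustration, the "machine check" marker is a guess at how such events are worded, and it assumes the log is plain text in the same `Mon DD HH:MM:SS host kernel: message` layout as the excerpt.

```python
import re

# Matches lines shaped like the excerpt above:
#   "Nov 13 09:43:42 Gargantubrain kernel: <message>"
LINE_RE = re.compile(r"^(\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+(\S+)\s+kernel:\s+(.*)$")

# Lower-case markers. The first three come from the quoted log; "machine check"
# is an assumption about how MCE events are worded and may need adjusting.
FAULT_MARKERS = (
    "general protection fault",
    "kernel stack overflow",
    "bug:",
    "machine check",
)


def find_faults(lines):
    """Return (timestamp, message) pairs for kernel lines containing a fault marker."""
    hits = []
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match is None:
            continue  # not a kernel line in the expected layout
        timestamp, _host, message = match.groups()
        lowered = message.lower()
        if any(marker in lowered for marker in FAULT_MARKERS):
            hits.append((timestamp, message))
    return hits
```

To try it, open the saved file (for example `FCPsyslog_tail.txt`) and pass the file object to `find_faults`, then print each returned pair. It only narrows down where to look; it does not say anything about the cause of the fault.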

Link to comment
