Parity Check/Sync Makes Server Unusable



6 minutes ago, edrohler said:

Is this expected?

Definitely not

6 minutes ago, edrohler said:

how to troubleshoot this

If you can, post diagnostics while it's in that state. If you can't, enable the syslog server (Settings tab) with mirroring to the flash drive, then post that file.

10 minutes ago, edrohler said:

Hello, 

 

I cannot run or schedule a parity check without the server becoming unusable. Is this expected? It's a pretty powerful server and I am not sure how to troubleshoot this. Thank you. 

 

 

Not sure why you are getting this problem in the first place, but you might want to look into using the Parity Check Tuning plugin to offload the tasks to periods where the server should be idle.


@Squid, thank you. I have enabled the syslog server and started the parity check again. I will report back if the issue happens again.

 

@itimpi, thank you for the plugin recommendation. I will check it out. 

 

I think the issue was coming from the CPU governor settings in Tips and Tweaks. I originally had the governor set to Performance with Intel Turbo Boost enabled, and I have now reset both to their defaults. It seems to be working OK for now.
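For anyone who wants to confirm which governor is actually active after changing it, here is a minimal sketch that reads the standard Linux cpufreq sysfs files. The function name `read_governors` and its `sysfs_root` parameter are my own, for illustration only. It assumes the kernel exposes `cpufreq/scaling_governor` for each CPU, which may not be the case on every system.

```python
from pathlib import Path


def read_governors(sysfs_root="/sys/devices/system/cpu"):
    """Return a dict mapping each CPU directory name to its active governor.

    sysfs_root is a parameter only so the function can be pointed at a
    test directory. CPUs without a cpufreq directory are simply skipped.
    """
    governors = {}
    pattern = "cpu[0-9]*/cpufreq/scaling_governor"
    for path in sorted(Path(sysfs_root).glob(pattern)):
        cpu_name = path.parent.parent.name  # e.g. "cpu0"
        governors[cpu_name] = path.read_text().strip()
    return governors


if __name__ == "__main__":
    for cpu, governor in read_governors().items():
        print(cpu, governor)
```

If every CPU reports the same governor, the setting was applied consistently. An empty result means the cpufreq files were not found.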

 

[Attached image]

 


Welp, that didn't work. I have had to do several unclean shutdowns. The interesting thing is that many of the devices on my network also lose internet connectivity.

 

The only Docker container I have is Krusader. I'm not even sure what is happening.

 

Here is my syslog file.

syslog

Edited by edrohler
2 minutes ago, edrohler said:

I imagine the crashes are from force shutdowns of the server

No, Unraid itself was crashing, possibly during the parity check, which is very unusual, e.g.:

 

Jan 12 13:06:26 Tower kernel: ------------[ cut here ]------------
Jan 12 13:06:26 Tower kernel: kernel BUG at drivers/md/unraid.c:356!
Jan 12 13:06:26 Tower kernel: invalid opcode: 0000 [#1] SMP PTI
Jan 12 13:06:26 Tower kernel: CPU: 0 PID: 5378 Comm: mdrecoveryd Tainted: P           O      4.19.88-Unraid #1
Jan 12 13:06:26 Tower kernel: Hardware name: System manufacturer System Product Name/Z170-PRO, BIOS 1902 06/27/2016
Jan 12 13:06:26 Tower kernel: RIP: 0010:_get_active_stripe+0x337/0x339 [md_mod]
Jan 12 13:06:26 Tower kernel: Code: 00 00 4c 01 f0 48 8b 90 38 01 00 00 81 e2 00 02 00 00 52 4c 8b 88 50 01 00 00 44 89 d2 4c 8b 80 48 01 00 00 e8 15 f5 fa e0 58 <0f> 0b 41 57 41 56 41 89 ce 41 55 49 89 d5 41 54 41 89 f4 55 53 48
Jan 12 13:06:26 Tower kernel: RSP: 0018:ffffc90003383cd0 EFLAGS: 00010093
Jan 12 13:06:26 Tower kernel: RAX: ffff8888041ef568 RBX: 0000000000000000 RCX: 0000000000000000
Jan 12 13:06:26 Tower kernel: RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffff8888041ef568
Jan 12 13:06:26 Tower kernel: RBP: ffff8888514666e0 R08: 0000000000000007 R09: 00000003a3812a48
Jan 12 13:06:26 Tower kernel: R10: 0000000000000000 R11: ffff88885621fb40 R12: 0000000000000001
Jan 12 13:06:26 Tower kernel: R13: ffff8888510044fc R14: ffff8888041ef430 R15: 000000005bc906e0
Jan 12 13:06:26 Tower kernel: FS:  0000000000000000(0000) GS:ffff888856200000(0000) knlGS:0000000000000000
Jan 12 13:06:26 Tower kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 12 13:06:26 Tower kernel: CR2: 00001509252b1718 CR3: 0000000001e0a003 CR4: 00000000003606f0
Jan 12 13:06:26 Tower kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jan 12 13:06:26 Tower kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Jan 12 13:06:26 Tower kernel: Call Trace:
Jan 12 13:06:26 Tower kernel: get_active_stripe+0x9f/0x109 [md_mod]
Jan 12 13:06:26 Tower kernel: ? wait_woken+0x6a/0x6a
Jan 12 13:06:26 Tower kernel: unraid_sync+0x17/0x76 [md_mod]
Jan 12 13:06:26 Tower kernel: md_do_sync+0xab/0x1e6 [md_mod]
Jan 12 13:06:26 Tower kernel: ? mod_timer+0x228/0x24f
Jan 12 13:06:26 Tower kernel: md_do_recovery+0x94/0x14d [md_mod]
Jan 12 13:06:26 Tower kernel: md_thread+0xee/0x115 [md_mod]
Jan 12 13:06:26 Tower kernel: ? wait_woken+0x6a/0x6a
Jan 12 13:06:26 Tower kernel: ? md_open+0x2c/0x2c [md_mod]
Jan 12 13:06:26 Tower kernel: kthread+0x10c/0x114
Jan 12 13:06:26 Tower kernel: ? kthread_park+0x89/0x89
Jan 12 13:06:26 Tower kernel: ret_from_fork+0x35/0x40
Jan 12 13:06:26 Tower kernel: Modules linked in: ccp xt_CHECKSUM ipt_REJECT ip6table_mangle ip6table_nat nf_nat_ipv6 iptable_mangle ip6table_filter ip6_tables vhost_net tun vhost tap ipt_MASQUERADE iptable_filter iptable_nat nf_nat_ipv4 nf_nat ip_tables xfs nfsd lockd grace sunrpc md_mod bonding nvidia_drm(PO) nvidia_modeset(PO) nvidia(PO) x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm drm_kms_helper crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd cryptd drm e1000e glue_helper wmi_bmof mxm_wmi intel_cstate intel_uncore agpgart i2c_i801 ahci i2c_core libahci intel_rapl_perf wmi video thermal acpi_pad button fan syscopyarea sysfillrect sysimgblt fb_sys_fops backlight pcc_cpufreq
Jan 12 13:06:26 Tower kernel: ---[ end trace c50b5f7c7c9555f1 ]---
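For longer syslog files, a small script can pull out reports like the one above. This is a rough sketch, assuming the line layout seen in this excerpt ("Mon DD HH:MM:SS host kernel: message"). The name `find_kernel_bugs` and the regular expressions are my own for illustration, not part of any Unraid tool. Frames marked with `?` in the call trace are skipped because the kernel flags them as unreliable guesses.

```python
import re

# "Jan 12 13:06:26 Tower kernel: <message>"
PREFIX = re.compile(r"^(?P<ts>\w{3}\s+\d+\s+[\d:]+)\s+\S+\s+kernel:\s?(?P<msg>.*)$")
# "kernel BUG at drivers/md/unraid.c:356!"
BUG = re.compile(r"kernel BUG at (?P<where>.+?)!?\s*$")
# "get_active_stripe+0x9f/0x109 [md_mod]" or "? wait_woken+0x6a/0x6a"
FRAME = re.compile(r"^\s*(?P<guess>\?\s+)?(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def find_kernel_bugs(lines):
    """Return one dict per 'kernel BUG' report: time, location, call trace."""
    reports = []
    current = None
    in_trace = False
    for line in lines:
        match = PREFIX.match(line.rstrip("\n"))
        if not match:
            continue  # not a kernel message
        msg = match.group("msg")
        bug = BUG.match(msg)
        if bug:
            current = {"time": match.group("ts"),
                       "where": bug.group("where"),
                       "frames": []}
            reports.append(current)
            in_trace = False
        elif current is not None and msg.strip() == "Call Trace:":
            in_trace = True
        elif current is not None and in_trace:
            frame = FRAME.match(msg)
            if frame is None:
                in_trace = False  # first non-frame line ends the trace
            elif not frame.group("guess"):
                current["frames"].append(frame.group("func"))
    return reports


if __name__ == "__main__":
    import sys
    with open(sys.argv[1], errors="replace") as handle:
        for report in find_kernel_bugs(handle):
            print(report["time"], report["where"], " <- ".join(report["frames"]))
```

Run it with the path to the mirrored syslog file as the only argument. Each crash is printed on one line with the source location and the reliable frames of the call trace.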

 

3 minutes ago, edrohler said:

This is very strange behavior.

I've seen Unraid crash during a parity check multiple times before in v6.7.2 (IIRC only with dual parity). That could usually be fixed by lowering a tunable, but v6.8 uses very different code for check/sync, without that tunable, and I have never seen those crashes with v6.8. That doesn't mean it isn't indeed an Unraid problem that happens with specific hardware.


Awesome. That is good to know. Well, I have recreated the boot image, but this time I configured it to add UEFI and to use the default BIOS settings. I also restarted with the same array configuration. Hopefully, this solves the issue. Thanks again for the information! I'll post back in about 11 hours with the results.

Edited by edrohler
