Parity Check fills log, crashes server

BinaryPatrick · April 27, 2020

I am running a parity sync and have lots of problems. 212786 problems to be exact and I'm only 50% of the way through. Likely drive issues aside, I can't seem to access the system log (Tools > System Log) to do any troubleshooting. All I get is this error:

Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 433421704 bytes) in /usr/local/emhttp/plugins/dynamix/include/Syslog.php on line 20

I have increased my log size in my go file to 512MB with this command: mount -o remount,size=512m /var/log

Does anyone know why my log file is filling up (81% or 414MB), and why I can't access it? The array is online in Maintenance mode during the Parity-Check.

Downloading the log file, it's lots of these types of errors:

Apr 27 16:16:34 tower kernel: WARNING: CPU: 2 PID: 12696 at drivers/iommu/intel-iommu.c:2300 __domain_mapping+0x205/0x2dd
Apr 27 16:16:34 tower kernel: Modules linked in: xt_CHECKSUM ipt_REJECT ip6table_mangle ip6table_nat nf_nat_ipv6 iptable_mangle ip6table_filter ip6_tables ipt_MASQUERADE iptable_filter iptable_nat nf_nat_ipv4 nf_nat ip_tables xfs md_mod i915 i2c_algo_bit iosf_mbi drm_kms_helper drm intel_gtt agpgart syscopyarea sysfillrect sysimgblt fb_sys_fops it87 hwmon_vid bonding x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul crc32c_intel btusb ghash_clmulni_intel pcbc btrtl btbcm btintel bluetooth aesni_intel aes_x86_64 crypto_simd cryptd glue_helper i2c_i801 i2c_core mxm_wmi e1000e intel_cstate intel_uncore intel_rapl_perf ecdh_generic ahci libahci video wmi pcc_cpufreq backlight thermal button fan [last unloaded: tun]
Apr 27 16:16:34 tower kernel: CPU: 2 PID: 12696 Comm: unraidd0 Tainted: G        W         4.19.107-Unraid #1
Apr 27 16:16:34 tower kernel: Hardware name: Gigabyte Technology Co., Ltd. Z87X-UD3H/Z87X-UD3H-CF, BIOS F9 03/18/2014
Apr 27 16:16:34 tower kernel: RIP: 0010:__domain_mapping+0x205/0x2dd
Apr 27 16:16:34 tower kernel: Code: 48 c7 c7 b7 b6 d7 81 e8 1f a2 c7 ff 8b 05 8b 5c a5 00 85 c0 74 08 ff c8 89 05 7f 5c a5 00 48 c7 c7 79 fc d2 81 e8 01 a2 c7 ff <0f> 0b 8b 54 24 24 b8 34 00 00 00 8d 0c d2 83 e9 09 83 f9 34 0f 4f
Apr 27 16:16:34 tower kernel: RSP: 0018:ffffc90003e379c8 EFLAGS: 00010046
Apr 27 16:16:34 tower kernel: RAX: 0000000000000024 RBX: 0000000818b51002 RCX: 0000000000000007
Apr 27 16:16:34 tower kernel: RDX: 0000000000000000 RSI: 0000000000000002 RDI: ffff88881f1164f0
Apr 27 16:16:34 tower kernel: RBP: 0000000000000001 R08: 0000000000000003 R09: 000000000001ae00
Apr 27 16:16:34 tower kernel: R10: 0000000000000000 R11: 0000000000000044 R12: 00000000000dd51f
Apr 27 16:16:34 tower kernel: R13: ffff8888183aeab0 R14: ffff888817cc0000 R15: ffff8887b9a548f8
Apr 27 16:16:34 tower kernel: FS:  0000000000000000(0000) GS:ffff88881f100000(0000) knlGS:0000000000000000
Apr 27 16:16:34 tower kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 27 16:16:34 tower kernel: CR2: 000014fc1e37c0a0 CR3: 0000000001e0a006 CR4: 00000000001606e0
Apr 27 16:16:34 tower kernel: Call Trace:
Apr 27 16:16:34 tower kernel: domain_mapping+0x16/0xa7
Apr 27 16:16:34 tower kernel: intel_map_sg+0x144/0x189
Apr 27 16:16:34 tower kernel: ata_qc_issue+0x10a/0x195
Apr 27 16:16:34 tower kernel: ? ata_scsi_write_same_xlat+0x2f7/0x2f7
Apr 27 16:16:34 tower kernel: ata_scsi_translate+0xdd/0x14d
Apr 27 16:16:34 tower kernel: ata_scsi_queuecmd+0x254/0x2a8
Apr 27 16:16:34 tower kernel: scsi_dispatch_cmd+0xa2/0xca
Apr 27 16:16:34 tower kernel: scsi_queue_rq+0x395/0x447
Apr 27 16:16:34 tower kernel: blk_mq_dispatch_rq_list+0x2b9/0x491
Apr 27 16:16:34 tower kernel: blk_mq_do_dispatch_sched+0xd0/0xf6
Apr 27 16:16:34 tower kernel: blk_mq_sched_dispatch_requests+0xf7/0x14b
Apr 27 16:16:34 tower kernel: __blk_mq_run_hw_queue+0xaf/0xd6
Apr 27 16:16:34 tower kernel: __blk_mq_delay_run_hw_queue+0x41/0x11f
Apr 27 16:16:34 tower kernel: blk_mq_run_hw_queue+0xb4/0xd4
Apr 27 16:16:34 tower kernel: blk_mq_flush_plug_list+0xc0/0x111
Apr 27 16:16:34 tower kernel: blk_flush_plug_list+0xd7/0x1e8
Apr 27 16:16:34 tower kernel: blk_finish_plug+0x1a/0x27
Apr 27 16:16:34 tower kernel: unraidd+0x130c/0x136e [md_mod]
Apr 27 16:16:34 tower kernel: ? __switch_to_asm+0x35/0x70
Apr 27 16:16:34 tower kernel: ? __schedule+0x4f7/0x548
Apr 27 16:16:34 tower kernel: ? md_thread+0xee/0x115 [md_mod]
Apr 27 16:16:34 tower kernel: ? rmw5_write_data+0x172/0x172 [md_mod]
Apr 27 16:16:34 tower kernel: md_thread+0xee/0x115 [md_mod]
Apr 27 16:16:34 tower kernel: ? wait_woken+0x6a/0x6a
Apr 27 16:16:34 tower kernel: ? md_open+0x2c/0x2c [md_mod]
Apr 27 16:16:34 tower kernel: kthread+0x10c/0x114
Apr 27 16:16:34 tower kernel: ? kthread_park+0x89/0x89
Apr 27 16:16:34 tower kernel: ret_from_fork+0x35/0x40
Apr 27 16:16:34 tower kernel: ---[ end trace 2bce6eca155e7542 ]---
Apr 27 16:16:34 tower kernel: DMAR: ERROR: DMA PTE for vPFN 0xdd51f already set (to 100000 not 7e598d002)

Edited April 27, 2020 by BinaryPatrick

trurl · April 27, 2020

No point in continuing with that. Zero errors are the only acceptable result so you need to start over with it anyway.

Stop, shutdown, check all connections, power and SATA, both ends, including any power splitters.

Then start again and if you get any errors at all, post Diagnostics instead of syslog.

Go to Tools-diagnostics and attach the complete Diagnostics zip file to your NEXT post.

BinaryPatrick · April 27, 2020

Here's the diagnostics. I checked the cables and then restarted the parity check. It got to about 36% before errors began to appear. It was at 1,200 errors when I pulled the diagnostics.

homeserve-diagnostics-20200427-1902.zip

trurl · April 27, 2020

2 hours ago, BinaryPatrick said:

running a parity sync

Parity sync would mean you were building parity. Your syslog indicates you are doing a correcting parity check, not a sync. The fact that it is correcting so many parity errors suggests you didn't have valid parity for some reason.

Can you tell us more about how you got to this place? Did you New Config for some reason and tell it parity was already valid?

BinaryPatrick · April 27, 2020

Sorry, yes. This all started with a scheduled parity check. I'm not sure what the cause might be. I didn't lose power or have any other issues since the parity check. I haven't added more than maybe 3GB in new files. I did set up a new Plex app/container, and add a new share.

BinaryPatrick · April 28, 2020

Also, my real question is why is the logs filling up, and can I do anything about it. When the server hits 100% it becomes unresponsive.

trurl · April 28, 2020

2 hours ago, BinaryPatrick said:

This all started with a scheduled parity check.

Did your last parity check have zero errors? Did you make any disk changes since then? Go to Main - Array Operation, click History, and post a screenshot.

BinaryPatrick · April 28, 2020

trurl · April 28, 2020

Have you done memtest?

BinaryPatrick · April 28, 2020

Yes. I saw some posts in the reddit group that mentioned bad RAM can cause mass parity issues, so I ran MEM86 yesterday. It didn't find any errors. I also ran a short SMART test on each drive, and those returned no errors as well.

The parity check just finished. I rebooted and am going to re-run the check again. I feel like I have a failing disk, but I'll see if there are still lots of errors. Not sure what else to check at this point.

Edited April 28, 2020 by BinaryPatrick

BinaryPatrick · April 28, 2020

The re-run ran successfully. 0 errors. Not sure what's going on but thanks for the support.

Parity Check fills log, crashes server

Recommended Posts

BinaryPatrick

Link to comment

trurl

Link to comment

BinaryPatrick

Link to comment

trurl

Link to comment

BinaryPatrick

Link to comment

BinaryPatrick

Link to comment

trurl

Link to comment

BinaryPatrick

Link to comment

trurl

Link to comment

BinaryPatrick

Link to comment

BinaryPatrick

Link to comment

Join the conversation