BinaryPatrick, posted April 27, 2020 (edited)

I am running a parity sync and have lots of problems: 212786 problems, to be exact, and I'm only 50% of the way through. Likely drive issues aside, I can't seem to access the system log (Tools > System Log) to do any troubleshooting. All I get is this error:

Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 433421704 bytes) in /usr/local/emhttp/plugins/dynamix/include/Syslog.php on line 20

I have increased my log size in my go file to 512MB with this command:

mount -o remount,size=512m /var/log

Does anyone know why my log file is filling up (81% or 414MB), and why I can't access it? The array is online in Maintenance mode during the Parity-Check.

Downloading the log file, it's lots of these types of errors:

Apr 27 16:16:34 tower kernel: WARNING: CPU: 2 PID: 12696 at drivers/iommu/intel-iommu.c:2300 __domain_mapping+0x205/0x2dd
Apr 27 16:16:34 tower kernel: Modules linked in: xt_CHECKSUM ipt_REJECT ip6table_mangle ip6table_nat nf_nat_ipv6 iptable_mangle ip6table_filter ip6_tables ipt_MASQUERADE iptable_filter iptable_nat nf_nat_ipv4 nf_nat ip_tables xfs md_mod i915 i2c_algo_bit iosf_mbi drm_kms_helper drm intel_gtt agpgart syscopyarea sysfillrect sysimgblt fb_sys_fops it87 hwmon_vid bonding x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul crc32c_intel btusb ghash_clmulni_intel pcbc btrtl btbcm btintel bluetooth aesni_intel aes_x86_64 crypto_simd cryptd glue_helper i2c_i801 i2c_core mxm_wmi e1000e intel_cstate intel_uncore intel_rapl_perf ecdh_generic ahci libahci video wmi pcc_cpufreq backlight thermal button fan [last unloaded: tun]
Apr 27 16:16:34 tower kernel: CPU: 2 PID: 12696 Comm: unraidd0 Tainted: G W 4.19.107-Unraid #1
Apr 27 16:16:34 tower kernel: Hardware name: Gigabyte Technology Co., Ltd. Z87X-UD3H/Z87X-UD3H-CF, BIOS F9 03/18/2014
Apr 27 16:16:34 tower kernel: RIP: 0010:__domain_mapping+0x205/0x2dd
Apr 27 16:16:34 tower kernel: Code: 48 c7 c7 b7 b6 d7 81 e8 1f a2 c7 ff 8b 05 8b 5c a5 00 85 c0 74 08 ff c8 89 05 7f 5c a5 00 48 c7 c7 79 fc d2 81 e8 01 a2 c7 ff <0f> 0b 8b 54 24 24 b8 34 00 00 00 8d 0c d2 83 e9 09 83 f9 34 0f 4f
Apr 27 16:16:34 tower kernel: RSP: 0018:ffffc90003e379c8 EFLAGS: 00010046
Apr 27 16:16:34 tower kernel: RAX: 0000000000000024 RBX: 0000000818b51002 RCX: 0000000000000007
Apr 27 16:16:34 tower kernel: RDX: 0000000000000000 RSI: 0000000000000002 RDI: ffff88881f1164f0
Apr 27 16:16:34 tower kernel: RBP: 0000000000000001 R08: 0000000000000003 R09: 000000000001ae00
Apr 27 16:16:34 tower kernel: R10: 0000000000000000 R11: 0000000000000044 R12: 00000000000dd51f
Apr 27 16:16:34 tower kernel: R13: ffff8888183aeab0 R14: ffff888817cc0000 R15: ffff8887b9a548f8
Apr 27 16:16:34 tower kernel: FS: 0000000000000000(0000) GS:ffff88881f100000(0000) knlGS:0000000000000000
Apr 27 16:16:34 tower kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 27 16:16:34 tower kernel: CR2: 000014fc1e37c0a0 CR3: 0000000001e0a006 CR4: 00000000001606e0
Apr 27 16:16:34 tower kernel: Call Trace:
Apr 27 16:16:34 tower kernel: domain_mapping+0x16/0xa7
Apr 27 16:16:34 tower kernel: intel_map_sg+0x144/0x189
Apr 27 16:16:34 tower kernel: ata_qc_issue+0x10a/0x195
Apr 27 16:16:34 tower kernel: ? ata_scsi_write_same_xlat+0x2f7/0x2f7
Apr 27 16:16:34 tower kernel: ata_scsi_translate+0xdd/0x14d
Apr 27 16:16:34 tower kernel: ata_scsi_queuecmd+0x254/0x2a8
Apr 27 16:16:34 tower kernel: scsi_dispatch_cmd+0xa2/0xca
Apr 27 16:16:34 tower kernel: scsi_queue_rq+0x395/0x447
Apr 27 16:16:34 tower kernel: blk_mq_dispatch_rq_list+0x2b9/0x491
Apr 27 16:16:34 tower kernel: blk_mq_do_dispatch_sched+0xd0/0xf6
Apr 27 16:16:34 tower kernel: blk_mq_sched_dispatch_requests+0xf7/0x14b
Apr 27 16:16:34 tower kernel: __blk_mq_run_hw_queue+0xaf/0xd6
Apr 27 16:16:34 tower kernel: __blk_mq_delay_run_hw_queue+0x41/0x11f
Apr 27 16:16:34 tower kernel: blk_mq_run_hw_queue+0xb4/0xd4
Apr 27 16:16:34 tower kernel: blk_mq_flush_plug_list+0xc0/0x111
Apr 27 16:16:34 tower kernel: blk_flush_plug_list+0xd7/0x1e8
Apr 27 16:16:34 tower kernel: blk_finish_plug+0x1a/0x27
Apr 27 16:16:34 tower kernel: unraidd+0x130c/0x136e [md_mod]
Apr 27 16:16:34 tower kernel: ? __switch_to_asm+0x35/0x70
Apr 27 16:16:34 tower kernel: ? __schedule+0x4f7/0x548
Apr 27 16:16:34 tower kernel: ? md_thread+0xee/0x115 [md_mod]
Apr 27 16:16:34 tower kernel: ? rmw5_write_data+0x172/0x172 [md_mod]
Apr 27 16:16:34 tower kernel: md_thread+0xee/0x115 [md_mod]
Apr 27 16:16:34 tower kernel: ? wait_woken+0x6a/0x6a
Apr 27 16:16:34 tower kernel: ? md_open+0x2c/0x2c [md_mod]
Apr 27 16:16:34 tower kernel: kthread+0x10c/0x114
Apr 27 16:16:34 tower kernel: ? kthread_park+0x89/0x89
Apr 27 16:16:34 tower kernel: ret_from_fork+0x35/0x40
Apr 27 16:16:34 tower kernel: ---[ end trace 2bce6eca155e7542 ]---
Apr 27 16:16:34 tower kernel: DMAR: ERROR: DMA PTE for vPFN 0xdd51f already set (to 100000 not 7e598d002)

Edited April 27, 2020 by BinaryPatrick
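
A note on the error above: 134217728 bytes is 128 MiB, the PHP memory limit that the page ran into, and the 433421704 bytes it tried to allocate is about 413 MiB, which lines up with the reported 414MB log size. A log that large can still be read from a terminal, because tail only reads the end of the file. A minimal sketch, assuming the log lives at /var/log/syslog (the function names bytes_to_mib and peek_log are made up for illustration):

```shell
# Convert a byte count to whole MiB (integer division, rounds down).
bytes_to_mib() {
    echo $(( $1 / 1048576 ))
}

# Print the last N lines of a log file without loading the whole file.
# Both defaults are assumptions: /var/log/syslog and 50 lines.
peek_log() {
    tail -n "${2:-50}" "${1:-/var/log/syslog}"
}
```

For example, `peek_log /var/log/syslog 100` shows the last 100 lines, and `df -h /var/log` shows how full the log filesystem is.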
trurl, posted April 27, 2020

No point in continuing with that. Zero errors is the only acceptable result, so you need to start over with it anyway. Stop, shut down, and check all connections, power and SATA, at both ends, including any power splitters. Then start again, and if you get any errors at all, post Diagnostics instead of the syslog. Go to Tools > Diagnostics and attach the complete Diagnostics zip file to your NEXT post.
BinaryPatrick (Author), posted April 27, 2020

Here are the diagnostics. I checked the cables and then restarted the parity check. It got to about 36% before errors began to appear. It was at 1,200 errors when I pulled the diagnostics.

Attachment: homeserve-diagnostics-20200427-1902.zip
trurl, posted April 27, 2020

2 hours ago, BinaryPatrick said: "running a parity sync"

Parity sync would mean you were building parity. Your syslog indicates you are doing a correcting parity check, not a sync. The fact that it is correcting so many parity errors suggests you didn't have valid parity for some reason. Can you tell us more about how you got to this place? Did you do a New Config for some reason and tell it parity was already valid?
BinaryPatrick (Author), posted April 27, 2020

Sorry, yes. This all started with a scheduled parity check. I'm not sure what the cause might be. I haven't lost power or had any other issues since the previous parity check. I haven't added more than maybe 3GB in new files. I did set up a new Plex app/container and add a new share.
BinaryPatrick (Author), posted April 28, 2020

Also, my real question is: why are the logs filling up, and can I do anything about it? When the log hits 100%, the server becomes unresponsive.
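
To see what is filling the log, it can help to count how often each message repeats. In the excerpt from the first post, every event writes a kernel WARNING, a module list, a register dump, a call trace, and a DMAR: ERROR line, so one recurring event adds dozens of lines each time it fires. A rough sketch, assuming standard syslog lines that begin with month, day, time, and hostname (count_matches and top_messages are hypothetical helper names):

```shell
# Count the lines in a log file that contain a fixed string.
# Usage: count_matches 'DMAR: ERROR' /var/log/syslog
count_matches() {
    grep -cF -- "$1" "$2"
}

# List the N most frequent messages in a log file (default 10).
# The first four fields (month, day, time, host) are blanked so that
# identical messages logged at different times are counted together.
top_messages() {
    awk '{ $1 = $2 = $3 = $4 = ""; sub(/^ +/, ""); print }' "$1" |
        sort | uniq -c | sort -rn | head -n "${2:-10}"
}
```

If one message dominates the output, that is the thing to chase. Raising the /var/log size, as with the mount command in the first post, only buys time while the same message keeps repeating.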
trurl, posted April 28, 2020

2 hours ago, BinaryPatrick said: "This all started with a scheduled parity check."

Did your last parity check have zero errors? Did you make any disk changes since then? Go to Main > Array Operation, click History, and post a screenshot.
BinaryPatrick (Author), posted April 28, 2020 (edited)

Yes. I saw some posts in the reddit group that mentioned bad RAM can cause mass parity errors, so I ran Memtest86 yesterday. It didn't find any errors. I also ran a short SMART test on each drive, and those returned no errors either. The parity check just finished. I rebooted and am going to re-run the check. I feel like I have a failing disk, but I'll see if there are still lots of errors. Not sure what else to check at this point.

Edited April 28, 2020 by BinaryPatrick
BinaryPatrick (Author), posted April 28, 2020

The re-run completed successfully with 0 errors. Not sure what's going on, but thanks for the support.