Steinhose Posted January 2, 2022

I hope you all had a good start to the new year. Mine started with both of my Unraid machines having problems. The machine I'm talking about here was unreachable when I woke up, so I disconnected the power, rebooted, and ran a parity check. I also enabled the syslog server along with "Mirror syslog to flash". A short time later (about 1 hour) my RAM was almost full, the log was 100% full, and the syslog under /boot/logs/syslog was 4GB in size. Everything was very slow because the CPU was at 100%.

So I rebooted into safe mode, and everything is running fine for now. I enabled syslog again and restarted the parity check. CPU usage was mostly between 50% and 80%, and the log on the flash is filling up again: 500MB after about 2 minutes. Here is an example; this repeats endlessly.

Jan 1 17:46:47 Pontos kernel: Call Trace:
Jan 1 17:46:47 Pontos kernel: dump_stack+0x6b/0x83
Jan 1 17:46:47 Pontos kernel: dequeue_task_idle+0x21/0x2a
Jan 1 17:46:47 Pontos kernel: __schedule+0x135/0x4a4
Jan 1 17:46:47 Pontos kernel: schedule_idle+0x25/0x2e
Jan 1 17:46:47 Pontos kernel: do_idle+0x1f2/0x214
Jan 1 17:46:47 Pontos kernel: cpu_startup_entry+0x18/0x1a
Jan 1 17:46:47 Pontos kernel: secondary_startup_64_no_verify+0xb0/0xbb
Jan 1 17:46:47 Pontos kernel: bad: scheduling from the idle thread!
Jan 1 17:46:47 Pontos kernel: CPU: 3 PID: 0 Comm: swapper/3 Tainted: G W 5.10.28-Unraid #1
Jan 1 17:46:47 Pontos kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./J5040-ITX, BIOS P1.60 01/17/2020

The box has been in operation for just under a year. Several memtest runs were done at the beginning, always without errors. Since then it has been running continuously, once for over 160 days. It has been unstable since mid-December, after a power outage.
Hardware:
CPU: Intel Pentium Silver J5040
Mainboard: ASRock J5040-ITX
RAM: Kingston HyperX Impact SO-DIMM Kit 16GB, DDR4-2400, CL14-14-14-35 (HX424S14IB2K2/16) (2x 8GB)
Additional storage controller: Syba SI-PEX40064

Greetings

Note: this is a repost because I originally posted it in Bug Reports by mistake.

pontos-diagnostics-20220101-1811.zip
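When a syslog grows this fast, a quick tally of the most repeated messages can show what is flooding it. The following is a minimal Python sketch and not something from the original post; the helper name `top_messages` and the regular expression are illustrative assumptions about the classic syslog line layout seen in the excerpt (timestamp, host, tag, message).

```python
import re
from collections import Counter

# Assumed layout: "Jan  1 17:46:47 host tag: message"
SYSLOG_LINE = re.compile(
    r"^\w{3}\s+\d+\s+[\d:]{8}\s+\S+\s+(?P<tag>[^:]+):\s*(?P<msg>.*)$"
)

def top_messages(lines, n=5):
    """Return the n most frequent (tag, message) pairs from syslog lines."""
    counts = Counter()
    for line in lines:
        match = SYSLOG_LINE.match(line.rstrip("\n"))
        if match:  # lines that do not fit the layout are skipped
            counts[(match.group("tag"), match.group("msg").strip())] += 1
    return counts.most_common(n)

# Hypothetical usage against the mirrored log on the flash:
# with open("/boot/logs/syslog", errors="replace") as f:
#     for (tag, msg), count in top_messages(f):
#         print(count, tag, msg)
```

Reading the file line by line keeps memory use flat even when the log is several gigabytes.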
bonienl Posted January 5, 2022

First, there is some problem with your flash device when writing to it. Take it out and do a file system check and repair on a Windows machine.

Second, have a look at this topic and see the 2nd post for a possible solution (change conntrack max entries).
trurl Posted January 5, 2022

13 minutes ago, bonienl said: Take it out and do a file system check and repair on a Windows machine,

Then make a backup of flash while you have it in your Windows machine.
Steinhose Posted January 5, 2022

1 hour ago, bonienl said: First, there is some problem with your flash device when writing to it. Take it out and do a file system check and repair on a Windows machine,

I'm guessing you concluded that from these lines, right?

Jan 1 17:21:59 Pontos rsyslogd: action 'action-1-builtin:omfile' (module 'builtin:omfile') message lost, could not be processed. Check for additional error messages before this one. [v8.2002.0 try https://www.rsyslog.com/e/2027 ]
Jan 1 17:21:59 Pontos rsyslogd: file '/boot/logs/syslog'[8] write error - see https://www.rsyslog.com/solving-rsyslog-write-errors/ for help OS error: File too large [v8.2002.0 try https://www.rsyslog.com/e/2027 ]

If so, I deleted the syslog file from the flash to see if it would be rewritten. My bad. I have now rebooted the system again without deleting the syslog file on the flash. The lines have not reappeared. The diagnostics are attached.

------------------------

1 hour ago, bonienl said: Second, have a look at this topic and see 2nd post for a possible solution (change conntrack max entries).

Following the linked thread, I executed this command: "sysctl net/netfilter/nf_conntrack_max=131072", but the terminal says: "sysctl: cannot stat /proc/sys/net/ipv4/netfilter/ip_conntrack_max: No such file or directory". So I loaded conntrack with "modprobe ip_conntrack" and set the values accordingly, and got the same call traces again. I restarted Unraid again and executed the command directly after startup. Again the same call traces. The diagnostics file is from this startup.

pontos-diagnostics-20220105-2100.zip

Edited January 5, 2022 by Steinhose
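The "File too large" error is consistent with a file system limit rather than a failing device: a single file on FAT32 cannot exceed 2^32 - 1 bytes, just under 4 GiB, which lines up with the 4GB syslog reported in the first post. The sketch below assumes the flash is FAT32-formatted, as is usual for Unraid boot devices; the function names are hypothetical, and the rate in the test comes from the "500MB after about 2 minutes" figure in the thread.

```python
FAT32_MAX_FILE_SIZE = 2**32 - 1  # largest single file FAT32 can hold, in bytes

def fat32_headroom(size_bytes):
    """Return how many more bytes a file of the given size can grow on FAT32."""
    return max(FAT32_MAX_FILE_SIZE - size_bytes, 0)

def seconds_until_limit(size_bytes, bytes_per_second):
    """Estimate the time until the file reaches the FAT32 limit at a steady rate."""
    if bytes_per_second <= 0:
        raise ValueError("bytes_per_second must be positive")
    return fat32_headroom(size_bytes) / bytes_per_second

# Hypothetical usage with the mirrored log from the thread:
# import os
# size = os.path.getsize("/boot/logs/syslog")
# print(seconds_until_limit(size, 500e6 / 120))
```

At the growth rate quoted in the thread, a fresh mirror file would reach the limit in well under half an hour, so once rsyslog hits it, every further write fails with this error until the file is removed or rotated.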
Steinhose Posted January 6, 2022

I took a closer look at the log; this is where it first occurs. It is a bit more detailed here, but unfortunately googling doesn't get me anywhere.

Jan 5 20:59:11 Pontos kernel: ------------[ cut here ]------------
Jan 5 20:59:11 Pontos kernel: WARNING: CPU: 3 PID: 0 at kernel/sched/core.c:4629 schedule_idle+0x13/0x2e
Jan 5 20:59:11 Pontos kernel: Modules linked in: dm_crypt dm_mod dax nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 md_mod wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libblake2s blake2s_x86_64 libblake2s_generic libchacha ip6table_filter ip6_tables iptable_filter ip_tables x_tables bonding x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper rapl i2c_i801 ahci i2c_smbus i2c_core processor_thermal_device intel_soc_dts_iosf int3406_thermal libahci r8169 intel_cstate iosf_mbi video button thermal fan realtek backlight int3400_thermal int3403_thermal acpi_thermal_rel int340x_thermal_zone
Jan 5 20:59:11 Pontos kernel: CPU: 3 PID: 0 Comm: swapper/3 Not tainted 5.10.28-Unraid #1
Jan 5 20:59:11 Pontos kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./J5040-ITX, BIOS P1.60 01/17/2020
Jan 5 20:59:11 Pontos kernel: RIP: 0010:schedule_idle+0x13/0x2e
Jan 5 20:59:11 Pontos kernel: Code: 85 c0 75 0b e8 d3 ff ff ff b8 01 00 00 00 c3 e8 b0 24 9e ff 31 c0 c3 65 48 8b 04 25 c0 7b 01 00 53 48 8b 40 10 48 85 c0 74 02 <0f> 0b 65 48 8b 1c 25 c0 7b 01 00 31 ff e8 04 f9 ff ff 48 8b 03 a8
Jan 5 20:59:11 Pontos kernel: RSP: 0018:ffffc900000bfef8 EFLAGS: 00010206
Jan 5 20:59:11 Pontos kernel: RAX: 0000000000010000 RBX: ffff8881008a3600 RCX: 0000000000000040
Jan 5 20:59:11 Pontos kernel: RDX: 0000000000000009 RSI: 0000000000000001 RDI: ffffe8ffffd98f00
Jan 5 20:59:11 Pontos kernel: RBP: ffffe8ffffd98f00 R08: 0000000000000001 R09: 0000000000000004
Jan 5 20:59:11 Pontos kernel: R10: 000000000000afc7 R11: 071c71c71c71c71c R12: ffffffff820c5dc0
Jan 5 20:59:11 Pontos kernel: R13: 0000000000000003 R14: 0000000000000001 R15: 0000000000000000
Jan 5 20:59:11 Pontos kernel: FS: 0000000000000000(0000) GS:ffff88846fd80000(0000) knlGS:0000000000000000
Jan 5 20:59:11 Pontos kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 5 20:59:11 Pontos kernel: CR2: 0000151c2a9ac000 CR3: 000000000400a000 CR4: 0000000000350ee0
Jan 5 20:59:11 Pontos kernel: Call Trace:
Jan 5 20:59:11 Pontos kernel: do_idle+0x1f2/0x214
Jan 5 20:59:11 Pontos kernel: cpu_startup_entry+0x18/0x1a
Jan 5 20:59:11 Pontos kernel: secondary_startup_64_no_verify+0xb0/0xbb
Jan 5 20:59:11 Pontos kernel: ---[ end trace 519a0bf8beb91d14 ]---
Jan 5 20:59:11 Pontos kernel: bad: scheduling from the idle thread!

I wonder why wireguard is loaded in safe mode when it is actually a plugin. It is not only loaded, it also works. 🤔
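Finding the first occurrence by hand is tedious in a log that repeats the same trace thousands of times. A small Python sketch of one way to pull out only the first warning block follows. It assumes the kernel's usual "[ cut here ]" and "[ end trace ... ]" markers, as seen in the excerpt above; the function name `first_trace` is made up for illustration.

```python
def first_trace(lines):
    """Return the first kernel warning block, from the 'cut here' marker
    through the matching 'end trace' line (or to the end of input if the
    closing marker never appears)."""
    block = []
    for line in lines:
        line = line.rstrip("\n")
        if not block and "[ cut here ]" not in line:
            continue  # still before the first trace
        block.append(line)
        if "[ end trace " in line:
            break  # stop reading as soon as the first block is complete
    return block

# Hypothetical usage against the mirrored log on the flash:
# with open("/boot/logs/syslog", errors="replace") as f:
#     print("\n".join(first_trace(f)))
```

Because the loop stops at the first closing marker, it only reads as far into the file as it needs to.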
trurl Posted January 6, 2022

49 minutes ago, Steinhose said: why wireguard is loaded in safe mode when it is actually a plugin

Wireguard is builtin, but the webUI to configure it is a plugin on Unraid versions below 6.10.

Don't know why (or if) it is causing those dumps, but they have filled your log space.
Steinhose Posted January 6, 2022

25 minutes ago, trurl said: Wireguard is builtin, but the webUI to configure it is a plugin on Unraid versions below 6.10 Don't know why (or if) it is causing those dumps, but they have filled your log space

Ahh, good to know. At first I was concerned that safe mode would load some plugins in the background.

Would it be worth trying to switch to 6.10-rc2, since the new kernel could perhaps solve the problem? Or, to put it another way, is there a chance that it would solve the problem?