
Unraid is crashing again and again, full log after parity check - Call Traces



I hope you all had a good start to the new year. Mine started with both of my Unraid machines having problems.
 

The Unraid machine this post is about is the one I couldn't reach after I woke up.
 

So I disconnected the power, rebooted, and started a parity check, with the syslog server and "Mirror syslog to flash" enabled. About an hour later the RAM was almost full, the log was 100% full, and the syslog under /boot/logs/syslog had grown to 4 GB. Everything was very slow because the CPU was at 100%.


So I rebooted into safe mode, and everything is running fine for now. I enabled the syslog again and restarted the parity check. CPU load was between roughly 50% and 80%, and the log on the flash filled up again: 500 MB after about 2 minutes. Here is an example; it repeats endlessly.

Jan  1 17:46:47 Pontos kernel: Call Trace:
Jan  1 17:46:47 Pontos kernel: dump_stack+0x6b/0x83
Jan  1 17:46:47 Pontos kernel: dequeue_task_idle+0x21/0x2a
Jan  1 17:46:47 Pontos kernel: __schedule+0x135/0x4a4
Jan  1 17:46:47 Pontos kernel: schedule_idle+0x25/0x2e
Jan  1 17:46:47 Pontos kernel: do_idle+0x1f2/0x214
Jan  1 17:46:47 Pontos kernel: cpu_startup_entry+0x18/0x1a
Jan  1 17:46:47 Pontos kernel: secondary_startup_64_no_verify+0xb0/0xbb
Jan  1 17:46:47 Pontos kernel: bad: scheduling from the idle thread!
Jan  1 17:46:47 Pontos kernel: CPU: 3 PID: 0 Comm: swapper/3 Tainted: G        W         5.10.28-Unraid #1
Jan  1 17:46:47 Pontos kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./J5040-ITX, BIOS P1.60 01/17/2020
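
Since the same block repeats endlessly, it may help to collapse the log into "message, how often" pairs before posting it. Below is a minimal Python sketch of that idea. It assumes the classic "Mon DD HH:MM:SS host source: message" syslog layout shown above; the function names and the default path are my own choices and not anything Unraid ships.

```python
import re
import sys
from collections import Counter

# Assumed layout: "Jan  1 17:46:47 Pontos kernel: Call Trace:"
PREFIX = re.compile(r"^\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2}\s+\S+\s+(\S+?):\s+(.*)$")


def count_messages(lines):
    """Count identical (source, message) pairs, ignoring timestamp and host."""
    counts = Counter()
    for line in lines:
        match = PREFIX.match(line.rstrip("\n"))
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


def top_messages(path, n=10):
    """Return the n most frequent messages in the syslog file at `path`."""
    with open(path, errors="replace") as handle:
        return count_messages(handle).most_common(n)


if __name__ == "__main__":
    # The default path is the mirror location mentioned in this thread.
    log_path = sys.argv[1] if len(sys.argv) > 1 else "/boot/logs/syslog"
    for (source, message), hits in top_messages(log_path):
        print(f"{hits:>10}  {source}: {message}")
```

Run against a copy of the syslog, this should make it obvious which few lines account for most of the volume.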


The box has been in operation for just under a year. I ran memtest several times at the beginning, always without errors. Since then it has been running continuously, once for over 160 days. It has been unstable since mid-December, after a power outage.

 

Hardware:
CPU: Intel Pentium Silver J5040
Mainboard: ASRock J5040-ITX
RAM: Kingston HyperX Impact SO-DIMM Kit 16GB, DDR4-2400, CL14-14-14-35 (HX424S14IB2K2/16)  (2x 8GB)
Extra storage controller: Syba SI-PEX40064

Greetings

Note: this is a repost because I originally posted it in Bug Reports by mistake.

pontos-diagnostics-20220101-1811.zip

1 hour ago, bonienl said:

First, there is some problem with your flash device when writing to it. Take it out and do a file system check and repair on a Windows machine,


I'm guessing from these lines, right?
 

Jan  1 17:21:59 Pontos rsyslogd: action 'action-1-builtin:omfile' (module 'builtin:omfile') message lost, could not be processed. Check for additional error messages before this one. [v8.2002.0 try https://www.rsyslog.com/e/2027 ]
Jan  1 17:21:59 Pontos rsyslogd: file '/boot/logs/syslog'[8] write error - see https://www.rsyslog.com/solving-rsyslog-write-errors/ for help OS error: File too large [v8.2002.0 try https://www.rsyslog.com/e/2027 ]


If so, that was my fault: I had deleted the syslog file from the flash to see whether it would be rewritten. I have now rebooted the system again without deleting the syslog file on the flash, and the lines have not reappeared. The diagnostics are attached.
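
One possible reading of the "File too large" error, and only an assumption on my part: if the flash is formatted as FAT32, a single file cannot grow beyond 2^32 - 1 bytes, which would match a syslog that stopped at about 4 GB. A small Python sketch to check a file against that limit (the function names and the slack value are made up for illustration):

```python
import os

# FAT32 stores the file size in a 32-bit field, so one file holds at most
# 2**32 - 1 bytes. That the flash uses FAT32 is an assumption here.
FAT32_MAX_FILE_SIZE = 2**32 - 1


def at_fat32_limit(size_bytes, slack=1024 * 1024):
    """True when a size is within `slack` bytes of the FAT32 per-file limit."""
    return size_bytes >= FAT32_MAX_FILE_SIZE - slack


def check_file(path):
    """Return (size in bytes, whether the file sits at the FAT32 limit)."""
    size = os.path.getsize(path)
    return size, at_fat32_limit(size)
```

Calling check_file("/boot/logs/syslog") on the mirrored log would show whether the file really stopped at that boundary or whether the write error has another cause.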

------------------------

 

1 hour ago, bonienl said:

Second, have a look at this topic and see 2nd post for a possible solution (change conntrack max entries).


In the linked thread I ran this command: "sysctl net/netfilter/nf_conntrack_max=131072", but the terminal said: "sysctl: cannot stat /proc/sys/net/ipv4/netfilter/ip_conntrack_max: No such file or directory". So I loaded conntrack with "modprobe ip_conntrack" and set the values accordingly, but the same call traces appeared again. I restarted Unraid once more and ran the command directly after startup, with the same call traces again. The diagnostics file is from this startup.
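
To double-check whether the conntrack setting actually took effect, the value can be read back from /proc/sys. Here is a minimal Python sketch. As far as I know the nf_conntrack files only exist while the nf_conntrack module is loaded, which would explain a "cannot stat" error. The helper names and the `root` parameter are hypothetical; `root` exists only so the function can be tried against a test directory.

```python
from pathlib import Path


def sysctl_path(key, root="/proc/sys"):
    """Map a key like 'net.netfilter.nf_conntrack_max' to its /proc/sys file.

    Both the dotted and the slash-separated spelling are accepted.
    """
    return Path(root).joinpath(*key.replace("/", ".").split("."))


def read_sysctl(key, root="/proc/sys"):
    """Return the integer value of a sysctl key, or None if the file is missing."""
    try:
        return int(sysctl_path(key, root).read_text().strip())
    except FileNotFoundError:
        return None


if __name__ == "__main__":
    for key in ("net.netfilter.nf_conntrack_max", "net.netfilter.nf_conntrack_count"):
        print(key, "=", read_sysctl(key))
```

If both values print as None, the module is probably not loaded. If the count stays far below the maximum, the conntrack table is unlikely to be what triggers the traces.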

 

pontos-diagnostics-20220105-2100.zip

Edited by Steinhose

I took a closer look at the log. This is the first time it occurs, and the entry is a bit more detailed here. Unfortunately, googling doesn't get me anywhere.
 

Jan  5 20:59:11 Pontos kernel: ------------[ cut here ]------------
Jan  5 20:59:11 Pontos kernel: WARNING: CPU: 3 PID: 0 at kernel/sched/core.c:4629 schedule_idle+0x13/0x2e
Jan  5 20:59:11 Pontos kernel: Modules linked in: dm_crypt dm_mod dax nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 md_mod wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libblake2s blake2s_x86_64 libblake2s_generic libchacha ip6table_filter ip6_tables iptable_filter ip_tables x_tables bonding x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper rapl i2c_i801 ahci i2c_smbus i2c_core processor_thermal_device intel_soc_dts_iosf int3406_thermal libahci r8169 intel_cstate iosf_mbi video button thermal fan realtek backlight int3400_thermal int3403_thermal acpi_thermal_rel int340x_thermal_zone
Jan  5 20:59:11 Pontos kernel: CPU: 3 PID: 0 Comm: swapper/3 Not tainted 5.10.28-Unraid #1
Jan  5 20:59:11 Pontos kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./J5040-ITX, BIOS P1.60 01/17/2020
Jan  5 20:59:11 Pontos kernel: RIP: 0010:schedule_idle+0x13/0x2e
Jan  5 20:59:11 Pontos kernel: Code: 85 c0 75 0b e8 d3 ff ff ff b8 01 00 00 00 c3 e8 b0 24 9e ff 31 c0 c3 65 48 8b 04 25 c0 7b 01 00 53 48 8b 40 10 48 85 c0 74 02 <0f> 0b 65 48 8b 1c 25 c0 7b 01 00 31 ff e8 04 f9 ff ff 48 8b 03 a8
Jan  5 20:59:11 Pontos kernel: RSP: 0018:ffffc900000bfef8 EFLAGS: 00010206
Jan  5 20:59:11 Pontos kernel: RAX: 0000000000010000 RBX: ffff8881008a3600 RCX: 0000000000000040
Jan  5 20:59:11 Pontos kernel: RDX: 0000000000000009 RSI: 0000000000000001 RDI: ffffe8ffffd98f00
Jan  5 20:59:11 Pontos kernel: RBP: ffffe8ffffd98f00 R08: 0000000000000001 R09: 0000000000000004
Jan  5 20:59:11 Pontos kernel: R10: 000000000000afc7 R11: 071c71c71c71c71c R12: ffffffff820c5dc0
Jan  5 20:59:11 Pontos kernel: R13: 0000000000000003 R14: 0000000000000001 R15: 0000000000000000
Jan  5 20:59:11 Pontos kernel: FS:  0000000000000000(0000) GS:ffff88846fd80000(0000) knlGS:0000000000000000
Jan  5 20:59:11 Pontos kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan  5 20:59:11 Pontos kernel: CR2: 0000151c2a9ac000 CR3: 000000000400a000 CR4: 0000000000350ee0
Jan  5 20:59:11 Pontos kernel: Call Trace:
Jan  5 20:59:11 Pontos kernel: do_idle+0x1f2/0x214
Jan  5 20:59:11 Pontos kernel: cpu_startup_entry+0x18/0x1a
Jan  5 20:59:11 Pontos kernel: secondary_startup_64_no_verify+0xb0/0xbb
Jan  5 20:59:11 Pontos kernel: ---[ end trace 519a0bf8beb91d14 ]---
Jan  5 20:59:11 Pontos kernel: bad: scheduling from the idle thread!
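
For anyone who wants to find the first occurrence in their own log without scrolling, here is a rough Python sketch that looks for the first kernel WARNING line and pulls out the source location. The regular expression is only modelled on the line format shown above, and the names are my own.

```python
import re

# Modelled on: "WARNING: CPU: 3 PID: 0 at kernel/sched/core.c:4629 schedule_idle+0x13/0x2e"
WARNING_RE = re.compile(
    r"WARNING: CPU: (?P<cpu>\d+) PID: (?P<pid>\d+) at "
    r"(?P<file>\S+):(?P<line>\d+) (?P<func>[\w.]+)\+"
)


def first_warning(lines):
    """Return (line number, parsed fields) of the first WARNING, or None."""
    for number, line in enumerate(lines, start=1):
        match = WARNING_RE.search(line)
        if match:
            return number, match.groupdict()
    return None


if __name__ == "__main__":
    with open("/boot/logs/syslog", errors="replace") as handle:
        print(first_warning(handle))
```

The source file and function it reports (here kernel/sched/core.c and schedule_idle) are usually better search terms than the full trace.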


I wonder why wireguard is loaded in safe mode when it is actually a plugin. It is not only loaded, it also works. 🤔

  • Steinhose changed the title to Unraid is crashing again and again, full log after parity check - Call Traces
49 minutes ago, Steinhose said:

why wireguard is loaded in safe mode when it is actually a plugin

Wireguard is builtin, but the webUI to configure it is a plugin on Unraid versions below 6.10

 

Don't know why (or if) it is causing those dumps, but they have filled your log space

25 minutes ago, trurl said:

Wireguard is builtin, but the webUI to configure it is a plugin on Unraid versions below 6.10

 

Don't know why (or if) it is causing those dumps, but they have filled your log space

Ah, good to know. At first I was concerned that safe mode would load some plugins in the background.

Would it be worth trying to switch to 6.10-rc2, as the new kernel could perhaps solve the problem? Or, put another way, is there a chance that it could solve the problem?

