February 1, 2025
Hello friends. My Docker crashes almost every day. I replaced the cache disks, but nothing changed. When I check Docker in the webUI, it says "Docker failed to start.", and restoring it through the webUI doesn't work. Only two things help. The first is to recreate Docker from scratch and reinstall the containers, which means restarting the server and is very tedious. The second is to stop and start it from the command line with "/etc/rc.d/rc.docker stop" and then "/etc/rc.d/rc.docker start". It's really annoying. I've attached the diagnostics. Thank you for any suggestions.
aelothtower-diagnostics-20250201-1209.zip
Edited March 3, 2025 by Aeloth
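While the root cause is still unknown, the manual stop/start described above could be automated as a stopgap. Below is a minimal sketch. It assumes that `docker info` exits non-zero when the daemon is unreachable and that the script runs as root on the server. The function names are hypothetical. It only restarts the service and does nothing about whatever makes Docker fail.

```python
import subprocess
from typing import Callable, Sequence

# Path taken from the commands in the post above.
RC_DOCKER = "/etc/rc.d/rc.docker"

# A runner takes an argv list and returns its exit code (injectable for testing).
Runner = Callable[[Sequence[str]], int]


def run_quietly(cmd: Sequence[str]) -> int:
    """Run a command, discard its output, and return its exit code."""
    result = subprocess.run(
        list(cmd), stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL, check=False
    )
    return result.returncode


def docker_is_up(runner: Runner = run_quietly) -> bool:
    """Assumption: `docker info` exits non-zero when the daemon does not respond."""
    return runner(["docker", "info"]) == 0


def restart_docker_if_down(runner: Runner = run_quietly) -> bool:
    """Stop and start the Docker service only if the daemon does not respond.

    Returns True if a restart was attempted.
    """
    if docker_is_up(runner):
        return False
    runner([RC_DOCKER, "stop"])
    runner([RC_DOCKER, "start"])
    return True
```

You could call `restart_docker_if_down()` periodically from a scheduled job. If you pass a fake runner instead of the default, you can check the logic without touching Docker.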
February 1, 2025 Community Expert
Possibly not the only issue, but you need to check the filesystem on this device:
Jan 31 15:59:40 AelothTower kernel: XFS (sde1): Metadata corruption detected at xfs_dinode_verify+0xa2/0x68d, inode 0xcd0b493b dinode
Jan 31 15:59:40 AelothTower kernel: XFS (sde1): Unmount and run xfs_repair
Then reboot to clear the logs, and post new diags if it happens again.
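You can check for this kind of warning mechanically by scanning the syslog for XFS corruption reports and listing the affected devices. Below is a minimal sketch. It assumes the log is readable as plain text (on Unraid, typically /var/log/syslog). The function name is hypothetical. As the kernel message itself says, the device must be unmounted before running xfs_repair. `xfs_repair -n /dev/sde1` checks the filesystem without modifying anything.

```python
import re
from typing import Iterable, Set

# Matches kernel lines such as:
#   XFS (sde1): Metadata corruption detected at xfs_dinode_verify+0xa2/0x68d, ...
#   XFS (sde1): Unmount and run xfs_repair
XFS_PROBLEM = re.compile(
    r"XFS \(([^)]+)\): (?:Metadata corruption detected|Unmount and run xfs_repair)"
)


def xfs_devices_needing_repair(lines: Iterable[str]) -> Set[str]:
    """Return the device names the kernel reported XFS metadata problems on."""
    devices = set()
    for line in lines:
        match = XFS_PROBLEM.search(line)
        if match:
            devices.add(match.group(1))
    return devices
```

Usage: `with open("/var/log/syslog", errors="replace") as f: print(xfs_devices_needing_repair(f))`.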
February 1, 2025 Author
I didn't think this would be a problem, since it's an unassigned drive. But I've now rebooted the system and reformatted the disk from XFS to NTFS. I'll see if that helps; otherwise, I'll have to throw the disk away.
aelothtower-diagnostics-20250201-1415.zip
February 1, 2025 Author
No crash so far, but the log worries me a little and I don't understand it, so I'd better attach a new diagnostics file.
aelothtower-diagnostics-20250201-2152.zip
February 2, 2025 Community Expert
There's something strange going on with the logs: they keep repeating lines. Try booting in safe mode.
February 6, 2025 Author
Hi, I'm sending two diagnostics files. The first is from before I rebooted into safe mode, when the log kept filling up with the entries below. I pasted them into ChatGPT for an explanation (its answer is quoted after the log). I ran the live memory test and it found nothing; this weekend I want to try Memtest86+. The second diagnostics file is from safe mode after the reboot. My Docker still crashes quite often, and sometimes the whole server crashes and stays inaccessible until a hard reboot.
Feb 6 08:07:26 AelothTower kernel: Modules linked in: vhost_net vhost tap kvm_amd ccp kvm md_mod xt_CHECKSUM ipt_REJECT nf_reject_ipv4 ip6table_mangle udp_diag iptable_mangle vhost_iotlb veth xt_nat xt_conntrack nf_conntrack_netlink xfrm_user xfrm_algo xt_addrtype br_netfilter nfsd auth_rpcgss oid_registry lockd grace sunrpc zfs(PO) spl(O) ntfs3 tcp_diag inet_diag xt_tcpudp xt_mark tun nf_tables nfnetlink ip6table_nat af_packet it87(O) hwmon_vid iptable_nat xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libchacha ip6table_filter ip6_tables iptable_filter ip_tables x_tables efivarfs bridge stp llc amdgpu drm_exec amdxcp drm_buddy gpu_sched edac_mce_amd edac_core intel_rapl_common iosf_mbi radeon drm_ttm_helper ttm video i2c_algo_bit drm_suballoc_helper drm_display_helper crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel sha512_ssse3 sha256_ssse3 sha1_ssse3 aesni_intel drm_kms_helper
Feb 6 08:07:26 AelothTower kernel: crypto_simd cryptd wmi_bmof drm rapl backlight mpt3sas agpgart i2c_piix4 acpi_cpufreq k10temp i2c_core raid_class input_leds r8169 scsi_transport_sas led_class joydev ahci realtek libahci wmi tpm_crb tpm_tis tpm_tis_core tpm button [last unloaded: md_mod]
Feb 6 08:07:26 AelothTower kernel: ---[ end trace 0000000000000000 ]---
Feb 6 08:07:26 AelothTower kernel: RIP: 0033:0x14e0e61b7ca2
Feb 6 08:07:26 AelothTower kernel: RSP: 002b:000014e0de6bb4f8 EFLAGS: 00010206
Feb 6 08:07:26 AelothTower kernel: RAX: 000014e0de916580 RBX: 00005640fd1c0680 RCX: 000014e0d1095260
Feb 6 08:07:26 AelothTower kernel: RDX: 0000000000109a00 RSI: 000014e0d0f8b880 RDI: 000014e0de916580
Feb 6 08:07:26 AelothTower kernel: RIP: 0033:0x14e0e61b7ca2
Feb 6 08:07:26 AelothTower kernel: RSP: 002b:000014e0de6bb4f8 EFLAGS: 00010206
Feb 6 08:07:26 AelothTower kernel: RAX: 000014e0de916580 RBX: 00005640fd1c0680 RCX: 000014e0d1095260
Feb 6 08:07:26 AelothTower kernel: RDX: 0000000000109a00 RSI: 000014e0d0f8b880 RDI: 000014e0de916580
Feb 6 08:07:26 AelothTower kernel: RIP: 0033:0x14e0e61b7ca2
Feb 6 08:07:26 AelothTower kernel: RSP: 002b:000014e0de6bb4f8 EFLAGS: 00010206
Feb 6 08:07:26 AelothTower kernel: RAX: 000014e0de916580 RBX: 00005640fd1c0680 RCX: 000014e0d1095260
Feb 6 08:07:26 AelothTower kernel: RDX: 0000000000109a00 RSI: 000014e0d0f8b880 RDI: 000014e0de916580
Feb 6 08:07:26 AelothTower kernel: RBP: 000014e0de6bb580 R08: 0000000000000000 R09: 000014e0dea1ffe0
Feb 6 08:07:26 AelothTower kernel: R10: 000014e0d11e3860 R11: 000014e0dea42560 R12: 00005640fd2eb400
Feb 6 08:07:26 AelothTower kernel: RBP: 000014e0de6bb580 R08: 0000000000000000 R09: 000014e0dea1ffe0
Feb 6 08:07:26 AelothTower kernel: R10: 000014e0d11e3860 R11: 000014e0dea42560 R12: 00005640fd2eb400
Feb 6 08:07:26 AelothTower kernel: RBP: 000014e0de6bb580 R08: 0000000000000000 R09: 000014e0dea1ffe0
Feb 6 08:07:26 AelothTower kernel: R10: 000014e0d11e3860 R11: 000014e0dea42560 R12: 00005640fd2eb400
Feb 6 08:07:26 AelothTower kernel: R13: 00005640fd2eb410 R14: 00005640fd1c0b18 R15: 0000000000000002
Feb 6 08:07:26 AelothTower kernel: FS: 000014dc99591f00(0000) GS:ffff888bee8c0000(0000) knlGS:0000000000000000
Feb 6 08:07:26 AelothTower kernel: R13: 00005640fd2eb410 R14: 00005640fd1c0b18 R15: 0000000000000002
Feb 6 08:07:26 AelothTower kernel: FS: 000014dc99591f00(0000) GS:ffff888bee8c0000(0000) knlGS:0000000000000000
Feb 6 08:07:26 AelothTower kernel: R13: 00005640fd2eb410 R14: 00005640fd1c0b18 R15: 0000000000000002
Feb 6 08:07:26 AelothTower kernel: FS: 000014dc99591f00(0000) GS:ffff888bee8c0000(0000) knlGS:0000000000000000
Feb 6 08:07:26 AelothTower kernel: CS: 0033 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 6 08:07:26 AelothTower kernel: CR2: 0000000000496000 CR3: 0000000a1045c000 CR4: 00000000003506e0
Feb 6 08:07:26 AelothTower kernel: CS: 0033 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 6 08:07:26 AelothTower kernel: CR2: 0000000000496000 CR3: 0000000a1045c000 CR4: 00000000003506e0
Feb 6 08:07:26 AelothTower kernel: CS: 0033 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 6 08:07:26 AelothTower kernel: CR2: 0000000000496000 CR3: 0000000a1045c000 CR4: 00000000003506e0
ChatGPT:
It looks like your system encountered a general protection fault (GPF), likely due to an invalid memory access by a process (lsof in this case). The stack trace suggests issues related to the proc filesystem (/proc), which could be triggered by a corrupted process entry, kernel bug, or a hardware-related issue.
Possible causes:
1. Corrupt /proc filesystem entries: The function proc_pid_readdir is responsible for listing process information in /proc, and a crash there may indicate a problem with the way process entries are being read.
2. Kernel bug or unstable kernel modules: You are running Unraid 6.6.68, which may have a bug in its handling of the process table. Kernel modules like zfs, ntfs3, and wireguard are loaded, which may contribute to instability.
3. Faulty memory or hardware issues: The non-canonical address 0xffc7888b66ca8fe8 in the crash log suggests that an invalid memory reference occurred. If you're experiencing frequent crashes, run a memory test (e.g., memtest86) to check for RAM issues.
4. Bug in lsof or another user-space program: The command lsof (used to list open files) might have encountered a problem when accessing the /proc filesystem, triggering the kernel crash. If this crash happens frequently with lsof, try updating or reinstalling lsof.
aelothtower-diagnostics-20250206-0815.zip
aelothtower-diagnostics-20250206-0939.zip
Edited February 6, 2025 by Aeloth
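The "non-canonical address" in ChatGPT's answer has a precise meaning on x86-64. With 4-level paging, a virtual address uses only 48 bits, and bits 63 through 47 must all be copies of bit 47. Dereferencing any other address raises a general protection fault. The sketch below assumes 4-level paging (`va_bits=48`); on a kernel using 5-level paging you would pass `va_bits=57`. The function name is hypothetical.

```python
def is_canonical(addr: int, va_bits: int = 48) -> bool:
    """Check whether an unsigned 64-bit x86-64 virtual address is canonical.

    With va_bits=48, bits 63..47 (the top 17 bits) must be all zeros
    (user-space half) or all ones (kernel half).
    """
    if not 0 <= addr < 1 << 64:
        raise ValueError("expected an unsigned 64-bit address")
    upper = addr >> (va_bits - 1)          # bits 63..(va_bits - 1)
    all_ones = (1 << (65 - va_bits)) - 1   # 17 one-bits when va_bits == 48
    return upper in (0, all_ones)
```

Kernel addresses elsewhere in this log, such as GS:ffff888bee8c0000, have all 17 top bits set, while 0xffc7888b66ca8fe8 breaks that pattern. A pointer damaged in its upper bits is consistent with memory corruption. However, a check like this cannot tell whether the cause is RAM, a kernel bug, or something else.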
February 6, 2025 Community Expert
No errors in safe mode so far, but the logs still have repeating lines.
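A quick way to confirm the repeating-lines problem is to count identical lines in the syslog. Below is a minimal sketch, assuming the log is read as plain text. The function name and threshold are hypothetical. Because the timestamp is part of each line, a legitimate message logged twice in the same second is counted too, so treat the result as a hint rather than proof.

```python
from collections import Counter
from typing import Dict, Iterable


def duplicated_lines(lines: Iterable[str], min_count: int = 2) -> Dict[str, int]:
    """Count identical log lines (timestamp included).

    Returns only the lines seen at least min_count times, mapped to their counts.
    """
    counts = Counter(line.rstrip("\n") for line in lines)
    return {line: n for line, n in counts.items() if n >= min_count}
```

Suppose nearly every duplicated line shows the same count, as in the Feb 6 excerpt above where each register line appears three times. Then each message is probably being written several times by the logging setup rather than by the kernel. In this thread, the cause later turned out to be in rsyslog.conf.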
February 6, 2025 Community Expert
The syslog issue is not normal, but it may be unrelated to the problem. For now, see if it works in safe mode.
February 6, 2025 Author
Docker is running and my parity is being calculated, so I'll let it finish and maybe send the diagnostics again tomorrow.
February 10, 2025 Author
I'm sending the diagnostics after a longer run in safe mode. Even so, the whole server crashed. I ran a RAM test and it passed three times with no errors. The server is now running in normal mode again, with plugins. I tried to fix the repeated log lines with ChatGPT's help, but that didn't work. I'm also sending a second diagnostics file, taken in normal mode after 11 hours of running. Could it be an unstable kernel, as ChatGPT suggested based on the log? That seems almost impossible to me. Any other ideas where the problem could be? And how do I keep my log after a server crash? Thank you very much for your help.
aelothtower-diagnostics-20250207-2046.zip
aelothtower-diagnostics-20250210-0953.zip
February 10, 2025 Community Expert
There are two diags. Safe mode only ran for a couple of days; was that when it crashed?
February 10, 2025 Author
Yes, it crashed on 09.02 at around 4 in the morning. I saved the diagnostics on Friday, and I don't have a later set because the server crashed, so there is a gap of about two days before it went down.
February 10, 2025 Author
I'm seeing this problem in the log again.
ChatGPT: "This kernel listing indicates that a kernel bug has occurred on your system, possibly related to inotify, a system service that tracks changes to files and directories."
https://chatgpt.com/share/67a9cf8e-abf0-8002-8c53-650ec826d4cd
Log:
Feb 10 10:38:28 AelothTower kernel: RDX: 0000000082c1dcb0 RSI: ffffffff822451fd RDI: 00000000ffffffff
Feb 10 10:38:28 AelothTower kernel: RBP: ffff888103924438 R08: 0000000000000000 R09: ffffffff82c1dcb0
Feb 10 10:38:28 AelothTower kernel: R10: 00003fffffffffff R11: 000000002d2d2d2d R12: ffff888103924438
Feb 10 10:38:28 AelothTower kernel: R13: ffff888103924438 R14: ffffffff812ed212 R15: ffff888103b7d278
Feb 10 10:38:28 AelothTower kernel: FS: 0000148097464f00(0000) GS:ffff888beeb00000(0000) knlGS:0000000000000000
Feb 10 10:38:28 AelothTower kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 10 10:38:28 AelothTower kernel: CR2: 00000000004c6a98 CR3: 00000004a94e8000 CR4: 00000000003506e0
Feb 10 10:38:28 AelothTower kernel: Call Trace:
Feb 10 10:38:28 AelothTower kernel: <TASK>
Feb 10 10:38:28 AelothTower kernel: ? __warn+0x99/0x11a
Feb 10 10:38:28 AelothTower kernel: ? report_bug+0xd9/0x153
Feb 10 10:38:28 AelothTower kernel: ? show_mark_fhandle+0x77/0xe6
Feb 10 10:38:28 AelothTower kernel: ? handle_bug+0x53/0x7c
Feb 10 10:38:28 AelothTower kernel: ? exc_invalid_op+0x13/0x60
Feb 10 10:38:28 AelothTower kernel: ? asm_exc_invalid_op+0x16/0x20
Feb 10 10:38:28 AelothTower kernel: ? __pfx_inotify_fdinfo+0x10/0x10
Feb 10 10:38:28 AelothTower kernel: ? show_mark_fhandle+0x77/0xe6
Feb 10 10:38:28 AelothTower kernel: ? __pfx_inotify_fdinfo+0x10/0x10
Feb 10 10:38:28 AelothTower kernel: ? srso_return_thunk+0x5/0x5f
Feb 10 10:38:28 AelothTower kernel: ? seq_vprintf+0x2d/0x49
Feb 10 10:38:28 AelothTower kernel: ? srso_return_thunk+0x5/0x5f
Feb 10 10:38:28 AelothTower kernel: ? seq_printf+0x53/0x6e
Feb 10 10:38:28 AelothTower kernel: ? preempt_latency_start+0x2b/0x46
Feb 10 10:38:28 AelothTower kernel: ? srso_return_thunk+0x5/0x5f
Feb 10 10:38:28 AelothTower kernel: inotify_fdinfo+0x83/0xaa
Feb 10 10:38:28 AelothTower kernel: show_fdinfo.isra.0+0x66/0xab
Feb 10 10:38:28 AelothTower kernel: seq_show+0x155/0x173
Feb 10 10:38:28 AelothTower kernel: seq_read_iter+0x171/0x353
Feb 10 10:38:28 AelothTower kernel: seq_read+0x91/0xbb
Feb 10 10:38:28 AelothTower kernel: vfs_read+0xa7/0x1d1
Feb 10 10:38:28 AelothTower kernel: ? srso_return_thunk+0x5/0x5f
Feb 10 10:38:28 AelothTower kernel: ? __do_sys_newfstat+0x34/0x5c
Feb 10 10:38:28 AelothTower kernel: ksys_read+0x74/0xc0
Feb 10 10:38:28 AelothTower kernel: do_syscall_64+0x57/0x7b
Feb 10 10:38:28 AelothTower kernel: entry_SYSCALL_64_after_hwframe+0x78/0xe2
Feb 10 10:38:28 AelothTower kernel: RIP: 0033:0x1480976fc6ed
Feb 10 10:38:28 AelothTower kernel: Code: 21 87 0e 00 f7 d8 64 89 02 b8 ff ff ff ff eb bb 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 80 3d 59 0b 0f 00 00 74 17 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 5b c3 66 2e 0f 1f 84 00 00 00 00 00 48 83 ec
Feb 10 10:38:28 AelothTower kernel: RSP: 002b:00007ffd6a78f7c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
Feb 10 10:38:28 AelothTower kernel: RAX: ffffffffffffffda RBX: 000000000043f600 RCX: 00001480976fc6ed
Feb 10 10:38:28 AelothTower kernel: RDX: 0000000000000400 RSI: 00000000004488b0 RDI: 0000000000000007
Feb 10 10:38:28 AelothTower kernel: RBP: 00001480977e41f0 R08: 0000000000000001 R09: 0000000000000000
Feb 10 10:38:28 AelothTower kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 00001480977e40a0
Feb 10 10:38:28 AelothTower kernel: R13: 0000000000000000 R14: 0000000000001000 R15: 000000000043f600
Feb 10 10:38:28 AelothTower kernel: </TASK>
Feb 10 10:38:28 AelothTower kernel: ---[ end trace 0000000000000000 ]---
Feb 10 10:38:28 AelothTower kernel: RDX: 0000000082c1dcb0 RSI: ffffffff822451fd RDI: 00000000ffffffff
Feb 10 10:38:28 AelothTower kernel: RBP: ffff888103924438 R08: 0000000000000000 R09: ffffffff82c1dcb0
Feb 10 10:38:28 AelothTower kernel: R10: 00003fffffffffff R11: 000000002d2d2d2d R12: ffff888103924438
Feb 10 10:38:28 AelothTower kernel: R13: ffff888103924438 R14: ffffffff812ed212 R15: ffff888103b7d278
Feb 10 10:38:28 AelothTower kernel: FS: 0000148097464f00(0000) GS:ffff888beeb00000(0000) knlGS:0000000000000000
Feb 10 10:38:28 AelothTower kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 10 10:38:28 AelothTower kernel: CR2: 00000000004c6a98 CR3: 00000004a94e8000 CR4: 00000000003506e0
Feb 10 10:38:28 AelothTower kernel: Call Trace:
Feb 10 10:38:28 AelothTower kernel: <TASK>
Feb 10 10:38:28 AelothTower kernel: ? __warn+0x99/0x11a
Feb 10 10:38:28 AelothTower kernel: ? report_bug+0xd9/0x153
Feb 10 10:38:28 AelothTower kernel: ? show_mark_fhandle+0x77/0xe6
Feb 10 10:38:28 AelothTower kernel: ? handle_bug+0x53/0x7c
Feb 10 10:38:28 AelothTower kernel: ? exc_invalid_op+0x13/0x60
Feb 10 10:38:28 AelothTower kernel: ? asm_exc_invalid_op+0x16/0x20
Feb 10 10:38:28 AelothTower kernel: ? __pfx_inotify_fdinfo+0x10/0x10
Feb 10 10:38:28 AelothTower kernel: ? show_mark_fhandle+0x77/0xe6
Feb 10 10:38:28 AelothTower kernel: ? __pfx_inotify_fdinfo+0x10/0x10
Feb 10 10:38:28 AelothTower kernel: ? srso_return_thunk+0x5/0x5f
Feb 10 10:38:28 AelothTower kernel: ? seq_vprintf+0x2d/0x49
Feb 10 10:38:28 AelothTower kernel: ? srso_return_thunk+0x5/0x5f
Feb 10 10:38:28 AelothTower kernel: ? seq_printf+0x53/0x6e
Feb 10 10:38:28 AelothTower kernel: ? preempt_latency_start+0x2b/0x46
Feb 10 10:38:28 AelothTower kernel: ? srso_return_thunk+0x5/0x5f
Feb 10 10:38:28 AelothTower kernel: inotify_fdinfo+0x83/0xaa
Feb 10 10:38:28 AelothTower kernel: show_fdinfo.isra.0+0x66/0xab
Feb 10 10:38:28 AelothTower kernel: seq_show+0x155/0x173
Feb 10 10:38:28 AelothTower kernel: seq_read_iter+0x171/0x353
Feb 10 10:38:28 AelothTower kernel: seq_read+0x91/0xbb
Feb 10 10:38:28 AelothTower kernel: vfs_read+0xa7/0x1d1
Feb 10 10:38:28 AelothTower kernel: ? srso_return_thunk+0x5/0x5f
Feb 10 10:38:28 AelothTower kernel: ? __do_sys_newfstat+0x34/0x5c
Feb 10 10:38:28 AelothTower kernel: ksys_read+0x74/0xc0
Feb 10 10:38:28 AelothTower kernel: do_syscall_64+0x57/0x7b
Feb 10 10:38:28 AelothTower kernel: entry_SYSCALL_64_after_hwframe+0x78/0xe2
Feb 10 10:38:28 AelothTower kernel: RIP: 0033:0x1480976fc6ed
Feb 10 10:38:28 AelothTower kernel: Code: 21 87 0e 00 f7 d8 64 89 02 b8 ff ff ff ff eb bb 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 80 3d 59 0b 0f 00 00 74 17 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 5b c3 66 2e 0f 1f 84 00 00 00 00 00 48 83 ec
Feb 10 10:38:28 AelothTower kernel: RSP: 002b:00007ffd6a78f7c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
Feb 10 10:38:28 AelothTower kernel: RAX: ffffffffffffffda RBX: 000000000043f600 RCX: 00001480976fc6ed
Feb 10 10:38:28 AelothTower kernel: RDX: 0000000000000400 RSI: 00000000004488b0 RDI: 0000000000000007
Feb 10 10:38:28 AelothTower kernel: RBP: 00001480977e41f0 R08: 0000000000000001 R09: 0000000000000000
Feb 10 10:38:28 AelothTower kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 00001480977e40a0
Feb 10 10:38:28 AelothTower kernel: R13: 0000000000000000 R14: 0000000000001000 R15: 000000000043f600
Feb 10 10:38:28 AelothTower kernel: </TASK>
Feb 10 10:38:28 AelothTower kernel: ---[ end trace 0000000000000000 ]---
Edited February 10, 2025 by Aeloth
February 10, 2025 Community Expert
Aeloth said: "Yes, it crashed on 09.02. around 4 in the morning."
The log from the safe mode diags ends on 07.02. Where was that call trace logged?
February 10, 2025 Author
On 07.02 I quickly downloaded the diagnostics, and on 09.02 I found that my server was down again, so nothing could be recorded anywhere: the logs and everything else are deleted after a restart.
February 10, 2025 Author
I originally had this enabled, but it didn't work for me. However, I've now managed to fix the duplicated log lines; the cause was in rsyslog.conf. So I'll set up the syslog server now and wait for the next crash.
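Because the log is lost on restart, one way to keep it through a hard crash is to have the server forward it to another machine. Below is a minimal sketch of such a receiver. It assumes the server's syslog settings forward messages over UDP to this host on a hypothetical port 5514. `make_udp_listener` and `receive_syslog` are illustrative names, not part of Unraid.

```python
import os
import socket
from pathlib import Path
from typing import Optional


def make_udp_listener(host: str = "0.0.0.0", port: int = 5514) -> socket.socket:
    """Bind a UDP socket for incoming syslog messages.

    5514 is a hypothetical unprivileged port; the standard syslog port 514
    usually requires root to bind.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def receive_syslog(sock: socket.socket, out_path: Path,
                   max_messages: Optional[int] = None) -> int:
    """Append every datagram received on `sock` to `out_path`, one line each.

    Each line is prefixed with the sender's IP address and forced to disk
    (flush + fsync) right away, so messages already received survive even if
    the sending server hangs. Runs forever unless `max_messages` is given;
    returns the number of messages written.
    """
    written = 0
    with out_path.open("a", encoding="utf-8") as out:
        while max_messages is None or written < max_messages:
            data, (sender_ip, _port) = sock.recvfrom(65535)
            message = data.decode("utf-8", errors="replace").rstrip("\n")
            out.write(f"{sender_ip} {message}\n")
            out.flush()
            os.fsync(out.fileno())
            written += 1
    return written
```

On the other machine, you would run something like `receive_syslog(make_udp_listener(), Path("aelothtower-remote.log"))` until stopped. A dedicated syslog daemon such as rsyslog is the more robust choice; this sketch only illustrates the idea.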
February 17, 2025 Author Solution
Hi, Docker and the whole server have now been running for over 5 days with no crash. I fixed the errors in rsyslog.conf that caused the duplicated log lines. I tested the RAM: no errors. When I put the RAM back in, I deliberately swapped the slots, and suddenly it works. The problem was probably one RAM module, a slot, or some combination of the two. Thank you for your help; I'm marking this as solved.