skler Posted July 2 Hi all, my server crashes every 2-3 days without any visible message. Can someone help me understand how to solve this? S. littleboy-diagnostics-20240702-1550.zip
JorgeB Posted July 2 Enable the syslog server and post the resulting log after a crash.
JorgeB Posted July 2 It's not; either set the remote IP (same as the local one) or enable the mirror to flash drive option.
skler Posted July 2 25 minutes ago, JorgeB said: "It's not, either set the remote IP (same as the local) or enable mirror to flash drive option." OK thanks, now I can see the file in appdata.
skler Posted July 2 I know this is not a rigorous analysis, but every time I had a crash there was an increase in tmpfs usage. How can I check what is using this space? Edited July 2 by skler
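One way to approach the tmpfs question, as a minimal sketch rather than an Unraid-specific tool: list the tmpfs mount points from /proc/mounts, report how full each one is, and then rank the top-level entries under a mount by size. The function names `tmpfs_mount_points` and `largest_entries` are made up for this example, and the directory walk assumes no other filesystem is mounted below the path being inspected.

```python
import os
import shutil


def tmpfs_mount_points(mounts_text):
    """Return mount points whose filesystem type is tmpfs.

    Expects text in /proc/mounts format: device mountpoint fstype options ...
    """
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2] == "tmpfs":
            points.append(fields[1])
    return points


def largest_entries(root, limit=10):
    """Return (size_bytes, path) for the largest top-level entries under root."""
    sizes = []
    for entry in os.scandir(root):
        total = 0
        if entry.is_dir(follow_symlinks=False):
            # Sum the apparent size of every file below this directory.
            for dirpath, _dirs, files in os.walk(entry.path):
                for name in files:
                    try:
                        total += os.lstat(os.path.join(dirpath, name)).st_size
                    except OSError:
                        pass  # file vanished or is unreadable
        else:
            try:
                total = entry.stat(follow_symlinks=False).st_size
            except OSError:
                pass
        sizes.append((total, entry.path))
    return sorted(sizes, reverse=True)[:limit]


def main():
    if not os.path.exists("/proc/mounts"):
        return  # not a Linux system, nothing to report
    with open("/proc/mounts") as f:
        mounts = tmpfs_mount_points(f.read())
    for point in mounts:
        try:
            usage = shutil.disk_usage(point)
        except OSError:
            continue
        print(f"{point}: {usage.used / 2**20:.1f} MiB used of {usage.total / 2**20:.1f} MiB")


if __name__ == "__main__":
    main()
```

Once the fullest mount is known, calling `largest_entries` on that path shows which directory is growing; comparing two runs a few hours apart is usually more telling than a single snapshot.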
skler Posted July 2 Well, I have a Docker container with the same footprint. Is it possible to limit the RAM usage of a container?
JorgeB Posted July 2 Yes, add it to "Extra Parameters", for example: --memory=4G
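To check whether a container is actually approaching the limit set with --memory, the output of `docker stats` can be parsed. The following is a rough sketch, assuming the tab-separated format string shown in `read_docker_stats` and the binary units (KiB, MiB, GiB) that `docker stats` prints; the helper names and the 90% threshold are choices made for this example, not anything from the thread.

```python
import re
import subprocess

# Binary units as printed in the MemUsage column of `docker stats`.
_UNITS = {"B": 1, "KiB": 2**10, "MiB": 2**20, "GiB": 2**30, "TiB": 2**40}


def parse_size(text):
    """Convert a string such as '3.8GiB' to a number of bytes."""
    m = re.fullmatch(r"([\d.]+)\s*([A-Za-z]+)", text.strip())
    if not m or m.group(2) not in _UNITS:
        raise ValueError(f"unrecognised size: {text!r}")
    return float(m.group(1)) * _UNITS[m.group(2)]


def parse_mem_usage(stats_text):
    """Map container name -> (used_bytes, limit_bytes).

    Expects one 'name<TAB>used / limit' line per container.
    """
    result = {}
    for line in stats_text.splitlines():
        if not line.strip():
            continue
        name, usage = line.split("\t", 1)
        used, limit = usage.split("/")
        result[name] = (parse_size(used), parse_size(limit))
    return result


def near_limit(usage, threshold=0.9):
    """Return names of containers using at least `threshold` of their limit."""
    return sorted(
        name
        for name, (used, limit) in usage.items()
        if limit and used / limit >= threshold
    )


def read_docker_stats():
    """Take one snapshot from the Docker CLI (requires a running daemon)."""
    return subprocess.run(
        ["docker", "stats", "--no-stream", "--format", "{{.Name}}\t{{.MemUsage}}"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
```

On the server itself, `near_limit(parse_mem_usage(read_docker_stats()))` would list the containers worth watching. A container without --memory reports the host's total RAM as its limit, so it will rarely show up in that list even when it is the one leaking.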
skler Posted July 24 I have the crash report now. There was a problem with that Docker container, but sometimes my NVIDIA GPU also seems to crash. Could this be due to the driver update done without a reboot?
Jul 24 06:30:38 littleboy kernel: NVRM: GPU 0000:af:00.0: RmInitAdapter failed! (0x25:0x51:1589)
Jul 24 06:30:38 littleboy kernel: NVRM: GPU 0000:af:00.0: rm_init_adapter failed, device minor number 0
Jul 24 06:31:23 littleboy kernel: BUG: unable to handle page fault for address: ffffc90287112a00
Jul 24 06:31:23 littleboy kernel: #PF: supervisor write access in kernel mode
Jul 24 06:31:23 littleboy kernel: #PF: error_code(0x0002) - not-present page
Jul 24 06:31:23 littleboy kernel: PGD 100000067 P4D 100000067 PUD 3d339b9067 PMD 0
Jul 24 06:31:23 littleboy kernel: Oops: 0002 [#1] PREEMPT SMP PTI
Jul 24 06:31:23 littleboy kernel: CPU: 50 PID: 2329 Comm: nv_open_q Tainted: P O 6.1.79-Unraid #1
Jul 24 06:31:23 littleboy kernel: Hardware name: VxRack AS PowerEdge R740xd/0RR8YK, BIOS 2.21.2 02/19/2024
Jul 24 06:31:23 littleboy kernel: RIP: 0010:os_mem_copy_custom+0x2c/0x60 [nvidia]
Jul 24 06:31:23 littleboy kernel: Code: 44 00 00 83 fa 7f 49 89 f8 48 89 f9 76 0b 48 89 f8 48 09 f0 83 e0 03 74 06 89 d2 31 c0 eb 25 89 d1 83 e2 03 c1 e9 02 8b 3c 86 <41> 89 3c 80 48 ff c0 39 c1 75 f2 89 c8 48 c1 e0 02 49 8d 0c 00 48
Jul 24 06:31:23 littleboy kernel: RSP: 0018:ffffc9000f477930 EFLAGS: 00010206
Jul 24 06:31:23 littleboy kernel: RAX: 0000000000000000 RBX: ffff8882faf61cd8 RCX: 0000000000000200
Jul 24 06:31:23 littleboy kernel: RDX: 0000000000000000 RSI: ffff888eb7b24008 RDI: 000000000000000d
Jul 24 06:31:23 littleboy kernel: RBP: ffff88af099aab80 R08: ffffc90287112a00 R09: 0000000000000000
Jul 24 06:31:23 littleboy kernel: R10: 000000000010a804 R11: 0000000000000000 R12: 000000000000000d
Jul 24 06:31:23 littleboy kernel: R13: ffff889924c88008 R14: ffff8885b15a2010 R15: ffff8885b15a2f13
Jul 24 06:31:23 littleboy kernel: FS: 0000000000000000(0000) GS:ffff889fffc40000(0000) knlGS:0000000000000000
Jul 24 06:31:23 littleboy kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 24 06:31:23 littleboy kernel: CR2: ffffc90287112a00 CR3: 00000023a29d6001 CR4: 00000000007706e0
Jul 24 06:31:23 littleboy kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jul 24 06:31:23 littleboy kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Jul 24 06:31:23 littleboy kernel: PKRU: 55555554
Jul 24 06:31:23 littleboy kernel: Call Trace:
Jul 24 06:31:23 littleboy kernel: <TASK>
Jul 24 06:31:23 littleboy kernel: ? __die_body+0x1a/0x5c
Jul 24 06:31:23 littleboy kernel: ? page_fault_oops+0x329/0x376
Jul 24 06:31:23 littleboy kernel: ? fixup_exception+0x22/0x24b
Jul 24 06:31:23 littleboy kernel: ? exc_page_fault+0xf4/0x11d
Jul 24 06:31:23 littleboy kernel: ? asm_exc_page_fault+0x22/0x30
Jul 24 06:31:23 littleboy kernel: ? os_mem_copy_custom+0x2c/0x60 [nvidia]
Jul 24 06:31:23 littleboy kernel: _nv011987rm+0x5c/0x100 [nvidia]
Jul 24 06:31:23 littleboy kernel: ? _nv011968rm+0x6e/0x1d0 [nvidia]
Jul 24 06:31:23 littleboy kernel: ? _nv024663rm+0x1a3/0x290 [nvidia]
Jul 24 06:31:23 littleboy kernel: ? _nv043897rm+0x428/0xad2 [nvidia]
Jul 24 06:31:23 littleboy kernel: ? _nv043916rm+0x148/0x3f0 [nvidia]
Jul 24 06:31:23 littleboy kernel: ? _nv043916rm+0x10d/0x3f0 [nvidia]
Jul 24 06:31:23 littleboy kernel: ? _nv049576rm+0x6d/0xb0 [nvidia]
Jul 24 06:31:23 littleboy kernel: ? _nv049576rm+0x35/0xb0 [nvidia]
Jul 24 06:31:23 littleboy kernel: ? _nv014517rm+0x51/0xc0 [nvidia]
Jul 24 06:31:23 littleboy kernel: ? _nv044401rm+0x1fd/0x260 [nvidia]
Jul 24 06:31:23 littleboy kernel: ? _nv013557rm+0xa8/0x210 [nvidia]
Jul 24 06:31:23 littleboy kernel: ? _nv044401rm+0x1fd/0x260 [nvidia]
Jul 24 06:31:23 littleboy kernel: ? _nv042458rm+0xd1/0x1d0 [nvidia]
Jul 24 06:31:23 littleboy kernel: ? _nv013346rm+0x5a/0xd0 [nvidia]
Jul 24 06:31:23 littleboy kernel: ? _nv044401rm+0x1fd/0x260 [nvidia]
Jul 24 06:31:23 littleboy kernel: ? _nv011239rm+0xe1/0x160 [nvidia]
Jul 24 06:31:23 littleboy kernel: ? _nv044401rm+0x1fd/0x260 [nvidia]
Jul 24 06:31:23 littleboy kernel: ? _nv050979rm+0x20/0x2e0 [nvidia]
Jul 24 06:31:23 littleboy kernel: ? _nv014771rm+0x50/0x100 [nvidia]
Jul 24 06:31:23 littleboy kernel: ? _nv044401rm+0x1fd/0x260 [nvidia]
Jul 24 06:31:23 littleboy kernel: ? _nv014815rm+0xf1/0x2f0 [nvidia]
Jul 24 06:31:23 littleboy kernel: ? _nv044401rm+0x1fd/0x260 [nvidia]
Jul 24 06:31:23 littleboy kernel: ? _nv017529rm+0x35/0x110 [nvidia]
Jul 24 06:31:23 littleboy kernel: ? _nv018649rm+0x13b/0x3d0 [nvidia]
Jul 24 06:31:23 littleboy kernel: ? _nv026712rm+0x97/0x1a0 [nvidia]
Jul 24 06:31:23 littleboy kernel: ? _nv000773rm+0x1b3/0x313 [nvidia]
Jul 24 06:31:23 littleboy kernel: ? _nv000720rm+0x482/0x20e0 [nvidia]
Jul 24 06:31:23 littleboy kernel: ? rm_init_adapter+0xcd/0xf0 [nvidia]
Jul 24 06:31:23 littleboy kernel: ? ttwu_queue_wakelist+0x9a/0xcf
Jul 24 06:31:23 littleboy kernel: ? nv_open_device+0x57a/0x869 [nvidia]
Jul 24 06:31:23 littleboy kernel: ? nvidia_open_deferred+0x33/0x7f [nvidia]
Jul 24 06:31:23 littleboy kernel: ? _raw_q_schedule+0x69/0x69 [nvidia]
Jul 24 06:31:23 littleboy kernel: ? _main_loop+0xf1/0x115 [nvidia]
Jul 24 06:31:23 littleboy kernel: ? kthread+0xe4/0xef
Jul 24 06:31:23 littleboy kernel: ? kthread_complete_and_exit+0x1b/0x1b
Jul 24 06:31:23 littleboy kernel: ? ret_from_fork+0x1f/0x30
Jul 24 06:31:23 littleboy kernel: </TASK>
Jul 24 06:31:23 littleboy kernel: Modules linked in: joydev uinput af_packet bluetooth ecdh_generic ecc xt_connmark xt_mark iptable_mangle xt_comment iptable_raw wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libchacha nvidia_uvm(PO) veth xt_nat xt_tcpudp xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype br_netfilter bridge xfs nfsd auth_rpcgss oid_registry lockd grace sunrpc md_mod tcp_diag inet_diag ipmi_devintf ip6table_filter ip6_tables iptable_filter ip_tables x_tables efivarfs 8021q garp mrp stp llc macvtap macvlan tap bonding tls ixgbe xfrm_algo mdio igb intel_rapl_msr intel_rapl_common iosf_mbi x86_pkg_temp_thermal nvidia_drm(PO) coretemp nvidia_modeset(PO) kvm_intel zfs(PO) kvm nvidia(PO) zunicode(PO) zzstd(O) crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel sha512_ssse3 sha256_ssse3 mgag200 zlua(O) sha1_ssse3
Jul 24 06:31:23 littleboy kernel: ipmi_ssif video drm_shmem_helper zavl(PO) aesni_intel crypto_simd drm_kms_helper icp(PO) cryptd zcommon(PO) znvpair(PO) drm rapl spl(O) wmi_bmof backlight intel_cstate i2c_i801 acpi_ipmi mei_me syscopyarea nvme i2c_algo_bit i2c_smbus intel_uncore sysfillrect ahci sysimgblt mei i2c_core nvme_core megaraid_sas fb_sys_fops libahci intel_pch_thermal wmi ipmi_si acpi_power_meter button unix [last unloaded: xfrm_algo]
Jul 24 06:31:23 littleboy kernel: CR2: ffffc90287112a00
Jul 24 06:31:23 littleboy kernel: ---[ end trace 0000000000000000 ]---
Jul 24 06:31:23 littleboy kernel: RIP: 0010:os_mem_copy_custom+0x2c/0x60 [nvidia]
Jul 24 06:31:23 littleboy kernel: Code: 44 00 00 83 fa 7f 49 89 f8 48 89 f9 76 0b 48 89 f8 48 09 f0 83 e0 03 74 06 89 d2 31 c0 eb 25 89 d1 83 e2 03 c1 e9 02 8b 3c 86 <41> 89 3c 80 48 ff c0 39 c1 75 f2 89 c8 48 c1 e0 02 49 8d 0c 00 48
Jul 24 06:31:23 littleboy kernel: RSP: 0018:ffffc9000f477930 EFLAGS: 00010206
Jul 24 06:31:23 littleboy kernel: RAX: 0000000000000000 RBX: ffff8882faf61cd8 RCX: 0000000000000200
Jul 24 06:31:23 littleboy kernel: RDX: 0000000000000000 RSI: ffff888eb7b24008 RDI: 000000000000000d
Jul 24 06:31:23 littleboy kernel: RBP: ffff88af099aab80 R08: ffffc90287112a00 R09: 0000000000000000
Jul 24 06:31:23 littleboy kernel: R10: 000000000010a804 R11: 0000000000000000 R12: 000000000000000d
Jul 24 06:31:23 littleboy kernel: R13: ffff889924c88008 R14: ffff8885b15a2010 R15: ffff8885b15a2f13
Jul 24 06:31:23 littleboy kernel: FS: 0000000000000000(0000) GS:ffff889fffc40000(0000) knlGS:0000000000000000
Jul 24 06:31:23 littleboy kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 24 06:31:23 littleboy kernel: CR2: ffffc90287112a00 CR3: 00000023a29d6001 CR4: 00000000007706e0
Jul 24 06:31:23 littleboy kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jul 24 06:31:23 littleboy kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Jul 24 06:31:23 littleboy kernel: PKRU: 55555554
Jul 24 06:31:23 littleboy kernel: note: nv_open_q[2329] exited with irqs disabled
Jul 24 06:31:26 littleboy kernel: BUG: unable to handle page fault for address: ffffc9000f477df8
Jul 24 06:31:26 littleboy kernel: #PF: supervisor read access in kernel mode
Jul 24 06:31:26 littleboy kernel: #PF: error_code(0x0000) - not-present page
Jul 24 06:31:26 littleboy kernel: PGD 100000067 P4D 100000067 PUD 1001be067 PMD 208b1de067 PTE 0
Jul 24 06:31:26 littleboy kernel: Oops: 0000 [#2] PREEMPT SMP PTI
Jul 24 06:31:26 littleboy kernel: CPU: 78 PID: 50540 Comm: nvidia-smi Tainted: P D O 6.1.79-Unraid #1
Jul 24 06:31:26 littleboy kernel: Hardware name: VxRack AS PowerEdge R740xd/0RR8YK, BIOS 2.21.2 02/19/2024
Jul 24 06:31:26 littleboy kernel: RIP: 0010:_nv012504rm+0x3c/0x310 [nvidia]
Jul 24 06:31:26 littleboy kernel: Code: 48 63 47 08 48 01 c2 48 8b 07 48 85 c0 75 1b e9 2b 02 00 00 66 2e 0f 1f 84 00 00 00 00 00 48 8b 48 10 48 85 c9 74 17 48 89 c8 <48> 39 30 77 ef 0f 83 f9 01 00 00 48 8b 48 18 48 85 c9 75 e9 48 89
Jul 24 06:31:26 littleboy kernel: RSP: 0018:ffffc9002a203d98 EFLAGS: 00010082
Jul 24 06:31:26 littleboy kernel: RAX: ffffc9000f477df8 RBX: ffffffffa2910a67 RCX: fffffffdda3e82fd
Jul 24 06:31:26 littleboy kernel: RDX: ffffc9002a203e10 RSI: 000000000000c56c RDI: ffffffffa4f1e5d8
Jul 24 06:31:26 littleboy kernel: RBP: ffff888321f86000 R08: 0000000000000000 R09: ffffc9002a203e38
Jul 24 06:31:26 littleboy kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffffc9002a203dc0
Jul 24 06:31:26 littleboy kernel: R13: ffff888ddf6d1780 R14: 0000000000000048 R15: 000000000000001f
Jul 24 06:31:26 littleboy kernel: FS: 000014e2068e31c0(0000) GS:ffff889ffffc0000(0000) knlGS:0000000000000000
Jul 24 06:31:26 littleboy kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 24 06:31:26 littleboy kernel: CR2: ffffc9000f477df8 CR3: 00000006cfee8004 CR4: 00000000007706e0
Jul 24 06:31:26 littleboy kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jul 24 06:31:26 littleboy kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Jul 24 06:31:26 littleboy kernel: PKRU: 55555554
Jul 24 06:31:26 littleboy kernel: Call Trace:
Jul 24 06:31:26 littleboy kernel: <TASK>
Jul 24 06:31:26 littleboy kernel: ? __die_body+0x1a/0x5c
Jul 24 06:31:26 littleboy kernel: ? page_fault_oops+0x329/0x376
Jul 24 06:31:26 littleboy kernel: ? fixup_exception+0x22/0x24b
Jul 24 06:31:26 littleboy kernel: ? exc_page_fault+0xf4/0x11d
Jul 24 06:31:26 littleboy kernel: ? asm_exc_page_fault+0x22/0x30
Jul 24 06:31:26 littleboy kernel: ? rm_perform_version_check+0x37/0x150 [nvidia]
Jul 24 06:31:26 littleboy kernel: ? _nv012504rm+0x3c/0x310 [nvidia]
Jul 24 06:31:26 littleboy kernel: ? rm_perform_version_check+0x37/0x150 [nvidia]
Jul 24 06:31:26 littleboy kernel: ? _nv049845rm+0xd6/0x1d0 [nvidia]
Jul 24 06:31:26 littleboy kernel: ? rm_perform_version_check+0x37/0x150 [nvidia]
Jul 24 06:31:26 littleboy kernel: ? nvidia_unlocked_ioctl+0x4b1/0x6c2 [nvidia]
Jul 24 06:31:26 littleboy kernel: ? _raw_spin_unlock+0x14/0x29
Jul 24 06:31:26 littleboy kernel: ? do_fcntl+0x19a/0x569
Jul 24 06:31:26 littleboy kernel: ? vfs_ioctl+0x1b/0x2f
Jul 24 06:31:26 littleboy kernel: ? __do_sys_ioctl+0x52/0x78
Jul 24 06:31:26 littleboy kernel: ? do_syscall_64+0x68/0x81
Jul 24 06:31:26 littleboy kernel: ? entry_SYSCALL_64_after_hwframe+0x64/0xce
Jul 24 06:31:26 littleboy kernel: </TASK>
Jul 24 06:31:26 littleboy kernel: Modules linked in: joydev uinput af_packet bluetooth ecdh_generic ecc xt_connmark xt_mark iptable_mangle xt_comment iptable_raw wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libchacha nvidia_uvm(PO) veth xt_nat xt_tcpudp xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype br_netfilter bridge xfs nfsd auth_rpcgss oid_registry lockd grace sunrpc md_mod tcp_diag inet_diag ipmi_devintf ip6table_filter ip6_tables iptable_filter ip_tables x_tables efivarfs 8021q garp mrp stp llc macvtap macvlan tap bonding tls ixgbe xfrm_algo mdio igb intel_rapl_msr intel_rapl_common iosf_mbi x86_pkg_temp_thermal nvidia_drm(PO) coretemp nvidia_modeset(PO) kvm_intel zfs(PO) kvm nvidia(PO) zunicode(PO) zzstd(O) crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel sha512_ssse3 sha256_ssse3 mgag200 zlua(O) sha1_ssse3
Jul 24 06:31:26 littleboy kernel: ipmi_ssif video drm_shmem_helper zavl(PO) aesni_intel crypto_simd drm_kms_helper icp(PO) cryptd zcommon(PO) znvpair(PO) drm rapl spl(O) wmi_bmof backlight intel_cstate i2c_i801 acpi_ipmi mei_me syscopyarea nvme i2c_algo_bit i2c_smbus intel_uncore sysfillrect ahci sysimgblt mei i2c_core nvme_core megaraid_sas fb_sys_fops libahci intel_pch_thermal wmi ipmi_si acpi_power_meter button unix [last unloaded: xfrm_algo]
Jul 24 06:31:26 littleboy kernel: CR2: ffffc9000f477df8
Jul 24 06:31:26 littleboy kernel: ---[ end trace 0000000000000000 ]---
Jul 24 06:31:26 littleboy kernel: RIP: 0010:os_mem_copy_custom+0x2c/0x60 [nvidia]
Jul 24 06:31:26 littleboy kernel: Code: 44 00 00 83 fa 7f 49 89 f8 48 89 f9 76 0b 48 89 f8 48 09 f0 83 e0 03 74 06 89 d2 31 c0 eb 25 89 d1 83 e2 03 c1 e9 02 8b 3c 86 <41> 89 3c 80 48 ff c0 39 c1 75 f2 89 c8 48 c1 e0 02 49 8d 0c 00 48
Jul 24 06:31:26 littleboy kernel: RSP: 0018:ffffc9000f477930 EFLAGS: 00010206
Jul 24 06:31:26 littleboy kernel: RAX: 0000000000000000 RBX: ffff8882faf61cd8 RCX: 0000000000000200
Jul 24 06:31:26 littleboy kernel: RDX: 0000000000000000 RSI: ffff888eb7b24008 RDI: 000000000000000d
Jul 24 06:31:26 littleboy kernel: RBP: ffff88af099aab80 R08: ffffc90287112a00 R09: 0000000000000000
Jul 24 06:31:26 littleboy kernel: R10: 000000000010a804 R11: 0000000000000000 R12: 000000000000000d
Jul 24 06:31:26 littleboy kernel: R13: ffff889924c88008 R14: ffff8885b15a2010 R15: ffff8885b15a2f13
Jul 24 06:31:26 littleboy kernel: FS: 000014e2068e31c0(0000) GS:ffff889ffffc0000(0000) knlGS:0000000000000000
Jul 24 06:31:26 littleboy kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 24 06:31:26 littleboy kernel: CR2: ffffc9000f477df8 CR3: 00000006cfee8004 CR4: 00000000007706e0
Jul 24 06:31:26 littleboy kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jul 24 06:31:26 littleboy kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Jul 24 06:31:26 littleboy kernel: PKRU: 55555554
Jul 24 06:31:26 littleboy kernel: note: nvidia-smi[50540] exited with irqs disabled
Jul 24 06:31:26 littleboy kernel: note: nvidia-smi[50540] exited with preempt_count 1
Jul 24 11:04:12 littleboy kernel: mdcmd (36): set md_write_method 1
Jul 24 11:04:12 littleboy kernel:
JorgeB Posted July 24 3 hours ago, skler said: "Could be due to the driver update done without a reboot?" I don't know, but the crash is definitely Nvidia related. You can try asking in the plugin support thread.
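When the mirrored syslog gets long, it can help to pull out only the lines that mark a kernel crash and see which module the faulting instruction belonged to. This is an illustrative sketch: the marker list is an assumption based on the lines in the trace posted in this thread, not a complete catalogue of kernel error messages, and `crash_lines` and `faulting_modules` are names invented here.

```python
import re

# Prefixes seen at the start of the interesting kernel messages in the trace.
CRASH_MARKERS = re.compile(r"kernel: (BUG:|Oops:|NVRM:|RIP:)")

# A RIP line looks like: "RIP: 0010:symbol+0x2c/0x60 [module]".
RIP_MODULE = re.compile(r"RIP: \S+ \[(\w+)\]")


def crash_lines(syslog_text):
    """Return the syslog lines that contain one of the crash markers."""
    return [line for line in syslog_text.splitlines() if CRASH_MARKERS.search(line)]


def faulting_modules(syslog_text):
    """Return module names from RIP lines, in order of first appearance."""
    seen = []
    for match in RIP_MODULE.finditer(syslog_text):
        if match.group(1) not in seen:
            seen.append(match.group(1))
    return seen
```

Reading the saved syslog file into a string and passing it to `faulting_modules` gives a quick answer to "which driver was executing when the kernel faulted". A RIP line without a bracketed module name means the fault was in the core kernel, and it is skipped by this sketch.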