Invincible Posted February 26
I recently changed my server hardware (CPU, motherboard, RAM), and since then I've been dealing with random shutdowns overnight. The computer is still powered on, since the motherboard has power and the Q-Code LED still shows the CPU temperature, but I get nothing on a monitor if I connect one, and I can't ping the server either. All I can do is reboot to get it back online.
I'm seeing XFS corruption on my disks in the log. I've tried running xfs_repair through the webUI, changing only the flag from -n to -v, but that doesn't seem to resolve it. Is there something else I'm missing?
Feb 26 09:39:23 Anton kernel: XFS (dm-3): Metadata corruption detected at xfs_dinode_verify+0xa0/0x732 [xfs], inode 0x1876666ce dinode
While reading up on this error I saw it could be either disk3 or disk4, so I ran the repair on both, but I still see the error afterwards.
anton-diagnostics-20240226-0945.zip
JorgeB Posted February 26
Enable the syslog server and post the resulting syslog after a crash.
JorgeB Posted February 26
9 minutes ago, Invincible said: While I was reading into this error I saw it could be either disk3 or disk4, so I ran the repair on both, but I still see this error after
dm-3 is your cache device; check its filesystem without -n.
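For anyone else trying to work out which device an XFS error refers to, here is a minimal sketch that pulls the device name and inode out of a log line in the format quoted above. The function name `parse_xfs_corruption` and the regular expression are illustrative assumptions, not part of Unraid or xfsprogs, and the pattern is only known to match the single message shown in this thread.

```python
import re
from typing import Optional

# Matches the message format quoted in this thread, e.g.
# "XFS (dm-3): Metadata corruption detected at xfs_dinode_verify+0xa0/0x732 [xfs], inode 0x1876666ce dinode"
# Assumption: other XFS messages may be worded differently and will not match.
XFS_CORRUPTION = re.compile(
    r"XFS \((?P<device>[^)]+)\): Metadata corruption detected at "
    r"(?P<function>\S+).*?inode (?P<inode>0x[0-9a-fA-F]+)"
)


def parse_xfs_corruption(line: str) -> Optional[dict]:
    """Return device, reporting function and inode number, or None if no match."""
    match = XFS_CORRUPTION.search(line)
    if match is None:
        return None
    return {
        "device": match.group("device"),
        "function": match.group("function"),
        "inode": int(match.group("inode"), 16),  # the inode is logged in hex
    }
```

The value in parentheses (dm-3 here) is the device-mapper name to look up, which is how the error was traced to the cache device rather than disk3 or disk4.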
Invincible Posted February 26
I enabled it on a cache-only share on my server. When does the file populate? I thought it would start posting the logs there instantly.
I'll keep it running until I see a crash again and then post here.
JorgeB Posted February 26
10 minutes ago, Invincible said: I thought it would start posting the logs there instantly
It does. Check the configuration; there's a common mistake users make with the remote server IP. Alternatively, just enable mirroring to the flash drive.
Invincible Posted February 27
Here is the syslog after the shutdown:
syslog-previous
Invincible Posted February 27
7 hours ago, JorgeB said: dm-3 is your cache device, check filesystem without -n
Thanks, this worked for the other error I was getting! I'm not seeing that now.
JorgeB Posted February 27
Feb 26 14:44:28 Anton kernel: macvlan_broadcast+0x10a/0x150 [macvlan]
Feb 26 14:44:28 Anton kernel: ? _raw_spin_unlock+0x14/0x29
Feb 26 14:44:28 Anton kernel: macvlan_process_broadcast+0xbc/0x12f [macvlan]
Macvlan call traces will usually end up crashing the server. Switching to ipvlan should fix it: Settings -> Docker Settings -> Docker custom network type -> ipvlan (advanced view must be enabled, top right), then reboot.
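A rough way to check a saved syslog for the same symptom is to look for call-trace frames attributed to the macvlan module. The sketch below assumes the frame format shown in the lines above (`symbol+0xoffset/0xsize [macvlan]`); the function name `macvlan_trace_symbols` is made up for illustration.

```python
import re
from typing import List

# A call-trace frame attributed to the macvlan module, as in the lines above.
# The optional "? " prefix marks frames the kernel considers unreliable.
MACVLAN_FRAME = re.compile(
    r"kernel:\s+(?:\? )?(?P<symbol>\w+)\+0x[0-9a-f]+/0x[0-9a-f]+ \[macvlan\]"
)


def macvlan_trace_symbols(log_text: str) -> List[str]:
    """Return the macvlan function names that appear in call-trace frames."""
    symbols = []
    for line in log_text.splitlines():
        match = MACVLAN_FRAME.search(line)
        if match:
            symbols.append(match.group("symbol"))
    return symbols
```

An empty list only means no macvlan frames were logged in that file; it does not prove the network type is unrelated to a crash.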
Invincible Posted February 27
11 hours ago, JorgeB said:
Feb 26 14:44:28 Anton kernel: macvlan_broadcast+0x10a/0x150 [macvlan]
Feb 26 14:44:28 Anton kernel: ? _raw_spin_unlock+0x14/0x29
Feb 26 14:44:28 Anton kernel: macvlan_process_broadcast+0xbc/0x12f [macvlan]
Macvlan call traces will usually end up crashing the server, switching to ipvlan should fix it (Settings -> Docker Settings -> Docker custom network type -> ipvlan (advanced view must be enabled, top right)), then reboot.
The server hasn't crashed since this change, but I'm wondering whether this is something I should be concerned about:
Feb 27 12:04:08 Anton kernel: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
Feb 27 12:04:08 Anton kernel: CPU: 8 PID: 2951 Comm: find Tainted: P O 6.1.74-Unraid #1
Feb 27 12:04:08 Anton kernel: Hardware name: ASUS System Product Name/ROG STRIX Z790-E GAMING WIFI, BIOS 2001 02/15/2024
Feb 27 12:04:08 Anton kernel: RIP: 0010:task_work_run+0x70/0x80
Feb 27 12:04:08 Anton kernel: Code: 8d a5 74 07 00 00 4c 89 e7 e8 50 68 7f 00 4c 89 e7 e8 2e 69 7f 00 48 89 df 48 8b 1b 48 8b 47 08 ff d0 0f 1f 00 e8 1e 29 7f 00 <48> 85 db 75 e7 eb 9b 5b 5d 41 5c c3 cc cc cc cc 0f 1f 44 00 00 48
Feb 27 12:04:08 Anton kernel: RSP: 0018:ffffc9003b187ef0 EFLAGS: 00010286
Feb 27 12:04:08 Anton kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00000000000017ab
Feb 27 12:04:08 Anton kernel: RDX: 0000000080000000 RSI: ffffffff820d8766 RDI: 0000000000000008
Feb 27 12:04:08 Anton kernel: RBP: ffff8889bdf0a000 R08: 0000000000000000 R09: ffff888966d10538
Feb 27 12:04:08 Anton kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff8889bdf0a774
Feb 27 12:04:08 Anton kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
Feb 27 12:04:08 Anton kernel: FS: 0000150524948740(0000) GS:ffff88903f200000(0000) knlGS:0000000000000000
Feb 27 12:04:08 Anton kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 27 12:04:08 Anton kernel: CR2: 000014c0da18a000 CR3: 00000009bb6cc000 CR4: 0000000000752ee0
Feb 27 12:04:08 Anton kernel: PKRU: 55555554
Feb 27 12:04:08 Anton kernel: Call Trace:
Feb 27 12:04:08 Anton kernel: <TASK>
Feb 27 12:04:08 Anton kernel: ? __die_body+0x1a/0x5c
Feb 27 12:04:08 Anton kernel: ? die+0x30/0x49
Feb 27 12:04:08 Anton kernel: ? do_trap+0x7b/0xfe
Feb 27 12:04:08 Anton kernel: ? task_work_run+0x70/0x80
Feb 27 12:04:08 Anton kernel: ? task_work_run+0x70/0x80
Feb 27 12:04:08 Anton kernel: ? do_error_trap+0x6e/0x98
Feb 27 12:04:08 Anton kernel: ? task_work_run+0x70/0x80
Feb 27 12:04:08 Anton kernel: ? exc_invalid_op+0x4c/0x60
Feb 27 12:04:08 Anton kernel: ? task_work_run+0x70/0x80
Feb 27 12:04:08 Anton kernel: ? asm_exc_invalid_op+0x16/0x20
Feb 27 12:04:08 Anton kernel: ? task_work_run+0x70/0x80
Feb 27 12:04:08 Anton kernel: exit_to_user_mode_prepare+0x75/0x112
Feb 27 12:04:08 Anton kernel: syscall_exit_to_user_mode+0x18/0x2c
Feb 27 12:04:08 Anton kernel: do_syscall_64+0x77/0x81
Feb 27 12:04:08 Anton kernel: entry_SYSCALL_64_after_hwframe+0x64/0xce
Feb 27 12:04:08 Anton kernel: RIP: 0033:0x150524a4f190
Feb 27 12:04:08 Anton kernel: Code: 8b 05 8c 3c 0e 00 64 c7 00 0d 00 00 00 eb a9 66 2e 0f 1f 84 00 00 00 00 00 90 80 3d 51 c4 0e 00 00 74 17 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 48 c3 0f 1f 80 00 00 00 00 48 83 ec 18 89 7c
Feb 27 12:04:08 Anton kernel: RSP: 002b:00007ffec98eb758 EFLAGS: 00000202 ORIG_RAX: 0000000000000003
Feb 27 12:04:08 Anton kernel: RAX: 0000000000000000 RBX: 0000000000444c90 RCX: 0000150524a4f190
Feb 27 12:04:08 Anton kernel: RDX: 0000000000000000 RSI: 000000000000000c RDI: 000000000000000c
Feb 27 12:04:08 Anton kernel: RBP: 000000000000000b R08: 000000000000000b R09: 000000000044ed30
Feb 27 12:04:08 Anton kernel: R10: 0000000000000100 R11: 0000000000000202 R12: 0000000000000000
Feb 27 12:04:08 Anton kernel: R13: 000000000000000b R14: 000000000000000b R15: 0000000000444c90
Feb 27 12:04:08 Anton kernel: </TASK>
Feb 27 12:04:08 Anton kernel: Modules linked in: xt_mark xt_CHECKSUM ipt_REJECT nf_reject_ipv4 ip6table_mangle ip6table_nat iptable_mangle xt_nat xt_tcpudp vhost_net tun vhost vhost_iotlb tap veth xt_conntrack nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype br_netfilter xfs dm_crypt dm_mod md_mod zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) tcp_diag inet_diag hwmon_vid iptable_nat xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libchacha ip6table_filter ip6_tables iptable_filter ip_tables x_tables efivarfs bridge stp llc bonding tls i915 intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm iosf_mbi drm_buddy i2c_algo_bit ttm drm_display_helper btusb crct10dif_pclmul btrtl crc32_pclmul crc32c_intel btbcm drm_kms_helper ghash_clmulni_intel btintel sha512_ssse3
Feb 27 12:04:08 Anton kernel: sha256_ssse3 sha1_ssse3 bluetooth aesni_intel drm crypto_simd cryptd mei_hdcp mei_pxp intel_gtt rapl ecdh_generic ecc intel_cstate wmi_bmof mpt3sas agpgart i2c_i801 mei_me intel_uncore nvme i2c_smbus mei igc ahci i2c_core syscopyarea nvme_core raid_class libahci sysfillrect scsi_transport_sas vmd sysimgblt fb_sys_fops thermal fan video tpm_crb tpm_tis tpm_tis_core wmi tpm backlight intel_pmc_core acpi_tad acpi_pad button unix
Feb 27 12:04:08 Anton kernel: ---[ end trace 0000000000000000 ]---
Feb 27 12:04:08 Anton kernel: RIP: 0010:task_work_run+0x70/0x80
Feb 27 12:04:08 Anton kernel: Code: 8d a5 74 07 00 00 4c 89 e7 e8 50 68 7f 00 4c 89 e7 e8 2e 69 7f 00 48 89 df 48 8b 1b 48 8b 47 08 ff d0 0f 1f 00 e8 1e 29 7f 00 <48> 85 db 75 e7 eb 9b 5b 5d 41 5c c3 cc cc cc cc 0f 1f 44 00 00 48
Feb 27 12:04:08 Anton kernel: RSP: 0018:ffffc9003b187ef0 EFLAGS: 00010286
Feb 27 12:04:08 Anton kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00000000000017ab
Feb 27 12:04:08 Anton kernel: RDX: 0000000080000000 RSI: ffffffff820d8766 RDI: 0000000000000008
Feb 27 12:04:08 Anton kernel: RBP: ffff8889bdf0a000 R08: 0000000000000000 R09: ffff888966d10538
Feb 27 12:04:08 Anton kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff8889bdf0a774
Feb 27 12:04:08 Anton kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
Feb 27 12:04:08 Anton kernel: FS: 0000150524948740(0000) GS:ffff88903f200000(0000) knlGS:0000000000000000
Feb 27 12:04:08 Anton kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 27 12:04:08 Anton kernel: CR2: 000014c0da18a000 CR3: 00000009bb6cc000 CR4: 0000000000752ee0
Feb 27 12:04:08 Anton kernel: PKRU: 55555554
syslog
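When posting a trace like this, the few fields that usually matter first are the error type, the process that triggered it, and the kernel function where it faulted. The sketch below extracts those from the header lines of the dump above. The name `summarize_oops` is hypothetical, and the patterns cover only the "invalid opcode" format seen in this thread; the `0010:` prefix is assumed to mark the kernel-side RIP, as opposed to the `0033:` user-space RIP later in the same dump.

```python
import re
from typing import Dict

# One pattern per field, based only on the dump quoted in this thread.
_OOPS_FIELDS = {
    "error": re.compile(r"kernel: (invalid opcode): \d+ \[#\d+\]"),
    "pid": re.compile(r"\bPID: (\d+)"),
    "comm": re.compile(r"\bComm: (\S+)"),
    # 0010 is the kernel code segment, so this is the faulting kernel function.
    "rip": re.compile(r"\bRIP: 0010:(\S+)"),
}


def summarize_oops(log_text: str) -> Dict[str, str]:
    """Return the first occurrence of each known field found in the log text."""
    summary = {}
    for name, pattern in _OOPS_FIELDS.items():
        match = pattern.search(log_text)
        if match:
            summary[name] = match.group(1)
    return summary
```

For the dump above this would report the `find` process faulting in `task_work_run`, which is a summary for the reader and not a diagnosis of the cause.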
trurl Posted February 27
1 hour ago, Invincible said: wondering if this is something I should be concerned about?
Have you done memtest recently?
Invincible Posted February 28
4 hours ago, trurl said: Have you done memtest recently?
This is a fresh build, so no. Should I use the one included in my BIOS, or is there another recommended one?
itimpi Posted February 28
3 hours ago, Invincible said: This is a fresh build so no, should I do the one included in my bios or is there another recommended one?
There is one included on the Unraid boot menu. That version only works if you boot in Legacy mode, but you can get a newer one that will boot in UEFI mode from memtest86.com.
Invincible Posted February 29
19 hours ago, itimpi said: There is one included on the Unraid boot menu. That version only works if you boot in Legacy mode, but you can get a newer one that will boot in UEFI mode from memtest86.com
It ran 6 passes in the last 7.5 hours with 0 errors. Is there anything else it could be? I've already tried swapping the RAM for a new pair, and that didn't resolve it.
Edited February 29 by Invincible
Invincible Posted February 29
Here are the logs from the latest crash:
syslog-previous
JorgeB Posted February 29
The Unraid driver is crashing in the latest logs, which is almost always a hardware issue. Since memtest is only definitive if it finds an error, disable XMP and try with just one stick of RAM; if the result is the same, try the other one. That will basically rule out a RAM issue.
Invincible Posted April 1
On 2/29/2024 at 1:17 AM, JorgeB said: Unraid driver is crashing in the latest logs, that's almost always a hardware issue, since memtest is only definitive if it finds an error, disable XMP and try with just one stick of RAM, if the same try the other one, that will basically rule out a RAM issue.
I ran the Unraid built-in memtest for 24 hours with both RAM sticks and it found no errors. I never had XMP enabled in the first place. Is there anything else I can test? I've attached the latest syslog with a crash as well.
syslog-previous
Edited April 1 by Invincible
JorgeB Posted April 1
Unfortunately, memtest is only definitive if it finds an error. Try with just one stick of RAM; if the result is the same, try the other one. That will basically rule out a RAM issue, and the next suspects would be the board or CPU.