Random shutdowns overnight


I recently changed my server hardware (CPU, MB, RAM), and since then I've been dealing with random shutdowns overnight. The computer is still powered on, since the motherboard has power and the Q-Code LED is still showing the CPU temp, but I get nothing on a monitor if I connect one, and I can't ping the server either. All I can do is reboot to get it back online.

I'm seeing XFS corruption on my disks in the log. I've tried running xfs_repair through the webUI, changing only the flag from -n to -v, but that doesn't seem to resolve it either. Is there something else I'm missing?

 

Feb 26 09:39:23 Anton kernel: XFS (dm-3): Metadata corruption detected at xfs_dinode_verify+0xa0/0x732 [xfs], inode 0x1876666ce dinode


While I was reading up on this error, I saw it could be either disk3 or disk4, so I ran the repair on both, but I still see the error afterwards.

anton-diagnostics-20240226-0945.zip

Feb 26 14:44:28 Anton kernel: macvlan_broadcast+0x10a/0x150 [macvlan]
Feb 26 14:44:28 Anton kernel: ? _raw_spin_unlock+0x14/0x29
Feb 26 14:44:28 Anton kernel: macvlan_process_broadcast+0xbc/0x12f [macvlan]

 

Macvlan call traces will usually end up crashing the server. Switching to ipvlan should fix it: Settings -> Docker Settings -> Docker custom network type -> ipvlan (advanced view must be enabled, top right), then reboot.
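
If you want to check whether a saved syslog contains these traces, something like the following rough sketch can count them. The signature table, the labels, and the `scan_syslog` helper are made up for illustration (they are not part of Unraid), and the patterns are only modelled on the log lines quoted in this thread, so adjust them to whatever your log actually shows.

```python
import re
from collections import Counter

# Hypothetical signature table: labels and patterns are illustrative only,
# modelled on the kernel lines quoted earlier in this thread.
SIGNATURES = {
    "macvlan_trace": re.compile(r"kernel: .*\[macvlan\]"),
    "xfs_metadata_corruption": re.compile(
        r"kernel: XFS \([^)]+\): Metadata corruption detected"
    ),
}


def scan_syslog(lines):
    """Count how many lines match each known signature."""
    counts = Counter()
    for line in lines:
        for label, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[label] += 1
    return counts


# Sample input: the lines already posted above.
sample = [
    "Feb 26 09:39:23 Anton kernel: XFS (dm-3): Metadata corruption detected "
    "at xfs_dinode_verify+0xa0/0x732 [xfs], inode 0x1876666ce dinode",
    "Feb 26 14:44:28 Anton kernel: macvlan_broadcast+0x10a/0x150 [macvlan]",
    "Feb 26 14:44:28 Anton kernel: ? _raw_spin_unlock+0x14/0x29",
    "Feb 26 14:44:28 Anton kernel: macvlan_process_broadcast+0xbc/0x12f [macvlan]",
]

counts = scan_syslog(sample)
for label, n in sorted(counts.items()):
    print(f"{label}: {n}")
```

To run it against a real log, pass an open file instead of the sample list, e.g. `with open(path) as f: scan_syslog(f)`, where `path` is wherever your syslog copy lives. If the macvlan count stays at zero after switching to ipvlan and rebooting, that is at least consistent with the change having taken effect.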

11 hours ago, JorgeB said:
Feb 26 14:44:28 Anton kernel: macvlan_broadcast+0x10a/0x150 [macvlan]
Feb 26 14:44:28 Anton kernel: ? _raw_spin_unlock+0x14/0x29
Feb 26 14:44:28 Anton kernel: macvlan_process_broadcast+0xbc/0x12f [macvlan]

 

Macvlan call traces will usually end up crashing the server. Switching to ipvlan should fix it: Settings -> Docker Settings -> Docker custom network type -> ipvlan (advanced view must be enabled, top right), then reboot.

 

The server hasn't crashed since this change, but I'm just wondering if this is something I should be concerned about?

 

Feb 27 12:04:08 Anton kernel: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
Feb 27 12:04:08 Anton kernel: CPU: 8 PID: 2951 Comm: find Tainted: P           O       6.1.74-Unraid #1
Feb 27 12:04:08 Anton kernel: Hardware name: ASUS System Product Name/ROG STRIX Z790-E GAMING WIFI, BIOS 2001 02/15/2024
Feb 27 12:04:08 Anton kernel: RIP: 0010:task_work_run+0x70/0x80
Feb 27 12:04:08 Anton kernel: Code: 8d a5 74 07 00 00 4c 89 e7 e8 50 68 7f 00 4c 89 e7 e8 2e 69 7f 00 48 89 df 48 8b 1b 48 8b 47 08 ff d0 0f 1f 00 e8 1e 29 7f 00 <48> 85 db 75 e7 eb 9b 5b 5d 41 5c c3 cc cc cc cc 0f 1f 44 00 00 48
Feb 27 12:04:08 Anton kernel: RSP: 0018:ffffc9003b187ef0 EFLAGS: 00010286
Feb 27 12:04:08 Anton kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00000000000017ab
Feb 27 12:04:08 Anton kernel: RDX: 0000000080000000 RSI: ffffffff820d8766 RDI: 0000000000000008
Feb 27 12:04:08 Anton kernel: RBP: ffff8889bdf0a000 R08: 0000000000000000 R09: ffff888966d10538
Feb 27 12:04:08 Anton kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff8889bdf0a774
Feb 27 12:04:08 Anton kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
Feb 27 12:04:08 Anton kernel: FS:  0000150524948740(0000) GS:ffff88903f200000(0000) knlGS:0000000000000000
Feb 27 12:04:08 Anton kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 27 12:04:08 Anton kernel: CR2: 000014c0da18a000 CR3: 00000009bb6cc000 CR4: 0000000000752ee0
Feb 27 12:04:08 Anton kernel: PKRU: 55555554
Feb 27 12:04:08 Anton kernel: Call Trace:
Feb 27 12:04:08 Anton kernel: <TASK>
Feb 27 12:04:08 Anton kernel: ? __die_body+0x1a/0x5c
Feb 27 12:04:08 Anton kernel: ? die+0x30/0x49
Feb 27 12:04:08 Anton kernel: ? do_trap+0x7b/0xfe
Feb 27 12:04:08 Anton kernel: ? task_work_run+0x70/0x80
Feb 27 12:04:08 Anton kernel: ? task_work_run+0x70/0x80
Feb 27 12:04:08 Anton kernel: ? do_error_trap+0x6e/0x98
Feb 27 12:04:08 Anton kernel: ? task_work_run+0x70/0x80
Feb 27 12:04:08 Anton kernel: ? exc_invalid_op+0x4c/0x60
Feb 27 12:04:08 Anton kernel: ? task_work_run+0x70/0x80
Feb 27 12:04:08 Anton kernel: ? asm_exc_invalid_op+0x16/0x20
Feb 27 12:04:08 Anton kernel: ? task_work_run+0x70/0x80
Feb 27 12:04:08 Anton kernel: exit_to_user_mode_prepare+0x75/0x112
Feb 27 12:04:08 Anton kernel: syscall_exit_to_user_mode+0x18/0x2c
Feb 27 12:04:08 Anton kernel: do_syscall_64+0x77/0x81
Feb 27 12:04:08 Anton kernel: entry_SYSCALL_64_after_hwframe+0x64/0xce
Feb 27 12:04:08 Anton kernel: RIP: 0033:0x150524a4f190
Feb 27 12:04:08 Anton kernel: Code: 8b 05 8c 3c 0e 00 64 c7 00 0d 00 00 00 eb a9 66 2e 0f 1f 84 00 00 00 00 00 90 80 3d 51 c4 0e 00 00 74 17 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 48 c3 0f 1f 80 00 00 00 00 48 83 ec 18 89 7c
Feb 27 12:04:08 Anton kernel: RSP: 002b:00007ffec98eb758 EFLAGS: 00000202 ORIG_RAX: 0000000000000003
Feb 27 12:04:08 Anton kernel: RAX: 0000000000000000 RBX: 0000000000444c90 RCX: 0000150524a4f190
Feb 27 12:04:08 Anton kernel: RDX: 0000000000000000 RSI: 000000000000000c RDI: 000000000000000c
Feb 27 12:04:08 Anton kernel: RBP: 000000000000000b R08: 000000000000000b R09: 000000000044ed30
Feb 27 12:04:08 Anton kernel: R10: 0000000000000100 R11: 0000000000000202 R12: 0000000000000000
Feb 27 12:04:08 Anton kernel: R13: 000000000000000b R14: 000000000000000b R15: 0000000000444c90
Feb 27 12:04:08 Anton kernel: </TASK>
Feb 27 12:04:08 Anton kernel: Modules linked in: xt_mark xt_CHECKSUM ipt_REJECT nf_reject_ipv4 ip6table_mangle ip6table_nat iptable_mangle xt_nat xt_tcpudp vhost_net tun vhost vhost_iotlb tap veth xt_conntrack nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype br_netfilter xfs dm_crypt dm_mod md_mod zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) tcp_diag inet_diag hwmon_vid iptable_nat xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libchacha ip6table_filter ip6_tables iptable_filter ip_tables x_tables efivarfs bridge stp llc bonding tls i915 intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm iosf_mbi drm_buddy i2c_algo_bit ttm drm_display_helper btusb crct10dif_pclmul btrtl crc32_pclmul crc32c_intel btbcm drm_kms_helper ghash_clmulni_intel btintel sha512_ssse3
Feb 27 12:04:08 Anton kernel: sha256_ssse3 sha1_ssse3 bluetooth aesni_intel drm crypto_simd cryptd mei_hdcp mei_pxp intel_gtt rapl ecdh_generic ecc intel_cstate wmi_bmof mpt3sas agpgart i2c_i801 mei_me intel_uncore nvme i2c_smbus mei igc ahci i2c_core syscopyarea nvme_core raid_class libahci sysfillrect scsi_transport_sas vmd sysimgblt fb_sys_fops thermal fan video tpm_crb tpm_tis tpm_tis_core wmi tpm backlight intel_pmc_core acpi_tad acpi_pad button unix
Feb 27 12:04:08 Anton kernel: ---[ end trace 0000000000000000 ]---
Feb 27 12:04:08 Anton kernel: RIP: 0010:task_work_run+0x70/0x80
Feb 27 12:04:08 Anton kernel: Code: 8d a5 74 07 00 00 4c 89 e7 e8 50 68 7f 00 4c 89 e7 e8 2e 69 7f 00 48 89 df 48 8b 1b 48 8b 47 08 ff d0 0f 1f 00 e8 1e 29 7f 00 <48> 85 db 75 e7 eb 9b 5b 5d 41 5c c3 cc cc cc cc 0f 1f 44 00 00 48
Feb 27 12:04:08 Anton kernel: RSP: 0018:ffffc9003b187ef0 EFLAGS: 00010286
Feb 27 12:04:08 Anton kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00000000000017ab
Feb 27 12:04:08 Anton kernel: RDX: 0000000080000000 RSI: ffffffff820d8766 RDI: 0000000000000008
Feb 27 12:04:08 Anton kernel: RBP: ffff8889bdf0a000 R08: 0000000000000000 R09: ffff888966d10538
Feb 27 12:04:08 Anton kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff8889bdf0a774
Feb 27 12:04:08 Anton kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
Feb 27 12:04:08 Anton kernel: FS:  0000150524948740(0000) GS:ffff88903f200000(0000) knlGS:0000000000000000
Feb 27 12:04:08 Anton kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 27 12:04:08 Anton kernel: CR2: 000014c0da18a000 CR3: 00000009bb6cc000 CR4: 0000000000752ee0
Feb 27 12:04:08 Anton kernel: PKRU: 55555554

 

syslog

3 hours ago, Invincible said:

This is a fresh build, so no. Should I use the one included in my BIOS, or is there another recommended one?

There is one included on the Unraid boot menu. That version only works if you boot in Legacy mode, but you can get a newer one that will boot in UEFI mode from memtest86.com

19 hours ago, itimpi said:

There is one included on the Unraid boot menu. That version only works if you boot in Legacy mode, but you can get a newer one that will boot in UEFI mode from memtest86.com

It ran 6 passes in the last 7.5 hours with 0 errors. Is there anything else it could be? I've already tried swapping the RAM for a new pair, and that didn't resolve it.

  • 1 month later...
On 2/29/2024 at 1:17 AM, JorgeB said:

Unraid driver is crashing in the latest logs; that's almost always a hardware issue. Since memtest is only definitive if it finds an error, disable XMP and try with just one stick of RAM. If the result is the same, try the other one; that will basically rule out a RAM issue.

I ran the Unraid built-in memtest for 24 hours with both RAM sticks and it found no errors. I never had XMP enabled in the first place. Is there anything else I can test?
I've attached the latest syslog with a crash as well.

syslog-previous

IMG_1001.jpeg

