Kernel issues, Bugs can't run parity

Any insight?

[  104.608783] IPv6: ADDRCONF(NETDEV_CHANGE): veth17ee8b2: link becomes ready
[ 2310.522979] BUG: unable to handle page fault for address: 0000000000001358
[ 2310.523471] #PF: supervisor read access in kernel mode
[ 2310.523958] #PF: error_code(0x0000) - not-present page
[ 2310.524445] PGD 0 P4D 0
[ 2310.524920] Oops: 0000 [#1] PREEMPT SMP NOPTI
[ 2310.525396] CPU: 8 PID: 15516 Comm: unraidd0 Tainted: P           O       6.1.118-Unraid #1
[ 2310.525884] Hardware name: Supermicro Super Server/X13SAE-F, BIOS 3.3b 08/26/2024
[ 2310.526375] RIP: 0010:bio_associate_blkg_from_css+0x166/0x18b
[ 2310.526875] Code: 7f 30 eb e9 e8 bb d2 cc ff eb 2d 48 8b 45 08 48 8b 80 58 03 00 00 48 8b b8 88 01 00 00 48 83 c7 38 e8 f3 f4 ff ff 48 8b 45 08 <48> 8b 80 58 03 00 00 4c 8b b8 88 01 00 00 4c 89 7d 48 48 83 c4 20
[ 2310.527921] RSP: 0018:ffffc90005bf7d68 EFLAGS: 00010202
[ 2310.528434] RAX: 0000000000001000 RBX: ffffffff829f1720 RCX: 0000000000000000
[ 2310.528952] RDX: ffff88813c68e000 RSI: ffffffff829f1720 RDI: ffff8881096eb038
[ 2310.529463] RBP: ffff888151f8e810 R08: 0000000000000000 R09: 0000000000000000
[ 2310.529971] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
[ 2310.530470] R13: ffff888151f8e810 R14: ffff888151f8e888 R15: ffff88813e821a58
[ 2310.530967] FS:  0000000000000000(0000) GS:ffff889fff800000(0000) knlGS:0000000000000000
[ 2310.531472] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2310.531973] CR2: 0000000000001358 CR3: 000000000420a000 CR4: 0000000000750ee0
[ 2310.532476] PKRU: 55555554
[ 2310.532969] Call Trace:
[ 2310.533455]  <TASK>
[ 2310.533928]  ? __die_body+0x1a/0x5c
[ 2310.534402]  ? page_fault_oops+0x329/0x376
[ 2310.534872]  ? do_user_addr_fault+0x12e/0x465
[ 2310.535335]  ? exc_page_fault+0xfb/0x11d
[ 2310.535802]  ? asm_exc_page_fault+0x22/0x30
[ 2310.536258]  ? bio_associate_blkg_from_css+0x166/0x18b
[ 2310.536715]  ? bio_associate_blkg_from_css+0x162/0x18b
[ 2310.537153]  ? submit_bio_noacct_nocheck+0x134/0x269
[ 2310.537592]  bio_associate_blkg+0x2f/0x35
[ 2310.538018]  bio_init+0x59/0x92
[ 2310.538435]  unraidd+0xfe0/0x1140 [md_mod]
[ 2310.538854]  md_thread+0xf4/0x122 [md_mod]
[ 2310.539269]  ? _raw_spin_rq_lock_irqsave+0x20/0x20
[ 2310.539684]  ? signal_pending+0x1d/0x1d [md_mod]
[ 2310.540095]  kthread+0xe4/0xef
[ 2310.540499]  ? kthread_complete_and_exit+0x1b/0x1b
[ 2310.540906]  ret_from_fork+0x1f/0x30
[ 2310.541314]  </TASK>
[ 2310.541710] Modules linked in: ipvlan veth xt_nat xt_tcpudp xt_conntrack nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype br_netfilter ipmi_devintf md_mod xfs tcp_diag inet_diag i915 drm_buddy drm_display_helper intel_gtt iptable_nat xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libchacha ip6table_filter ip6_tables iptable_filter ip_tables x_tables efivarfs macvtap macvlan tap af_packet 8021q garp mrp bridge stp llc bonding tls igc e1000e intel_rapl_msr intel_rapl_common iosf_mbi x86_pkg_temp_thermal intel_powerclamp zfs(PO) coretemp kvm_intel zunicode(PO) kvm zzstd(O) zlua(O) ast crct10dif_pclmul drm_vram_helper crc32_pclmul crc32c_intel i2c_algo_bit zavl(PO) ghash_clmulni_intel drm_ttm_helper sha512_ssse3 ttm sha256_ssse3 sha1_ssse3 icp(PO) aesni_intel drm_kms_helper crypto_simd cryptd drm zcommon(PO) rapl mei_hdcp mei_pxp
[ 2310.541741]  znvpair(PO) agpgart i2c_i801 ipmi_ssif intel_cstate syscopyarea spl(O) mpt3sas nvme sysfillrect mei_me i2c_smbus input_leds ahci wmi_bmof sysimgblt intel_uncore mpi3mr acpi_ipmi raid_class video joydev fb_sys_fops i2c_core led_class libahci nvme_core mei thermal scsi_transport_sas fan wmi backlight ipmi_si intel_pmc_core acpi_tad acpi_pad button unix [last unloaded: igc]
[ 2310.547653] CR2: 0000000000001358
[ 2310.549533] ---[ end trace 0000000000000000 ]---
[ 2311.182237] RIP: 0010:bio_associate_blkg_from_css+0x166/0x18b
[ 2311.182893] Code: 7f 30 eb e9 e8 bb d2 cc ff eb 2d 48 8b 45 08 48 8b 80 58 03 00 00 48 8b b8 88 01 00 00 48 83 c7 38 e8 f3 f4 ff ff 48 8b 45 08 <48> 8b 80 58 03 00 00 4c 8b b8 88 01 00 00 4c 89 7d 48 48 83 c4 20
[ 2311.184209] RSP: 0018:ffffc90005bf7d68 EFLAGS: 00010202
[ 2311.184847] RAX: 0000000000001000 RBX: ffffffff829f1720 RCX: 0000000000000000
[ 2311.185487] RDX: ffff88813c68e000 RSI: ffffffff829f1720 RDI: ffff8881096eb038
[ 2311.186122] RBP: ffff888151f8e810 R08: 0000000000000000 R09: 0000000000000000
[ 2311.186759] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
[ 2311.187392] R13: ffff888151f8e810 R14: ffff888151f8e888 R15: ffff88813e821a58
[ 2311.188026] FS:  0000000000000000(0000) GS:ffff889fff800000(0000) knlGS:0000000000000000
[ 2311.188675] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2311.189313] CR2: 0000000000001358 CR3: 000000000420a000 CR4: 0000000000750ee0
[ 2311.189946] PKRU: 55555554
[ 2311.190572] note: unraidd0[15516] exited with irqs disabled
[ 2311.191244] ------------[ cut here ]------------
[ 2311.191892] WARNING: CPU: 8 PID: 15516 at kernel/exit.c:816 do_exit+0x87/0x923
[ 2311.192546] Modules linked in: ipvlan veth xt_nat xt_tcpudp xt_conntrack nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype br_netfilter ipmi_devintf md_mod xfs tcp_diag inet_diag i915 drm_buddy drm_display_helper intel_gtt iptable_nat xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libchacha ip6table_filter ip6_tables iptable_filter ip_tables x_tables efivarfs macvtap macvlan tap af_packet 8021q garp mrp bridge stp llc bonding tls igc e1000e intel_rapl_msr intel_rapl_common iosf_mbi x86_pkg_temp_thermal intel_powerclamp zfs(PO) coretemp kvm_intel zunicode(PO) kvm zzstd(O) zlua(O) ast crct10dif_pclmul drm_vram_helper crc32_pclmul crc32c_intel i2c_algo_bit zavl(PO) ghash_clmulni_intel drm_ttm_helper sha512_ssse3 ttm sha256_ssse3 sha1_ssse3 icp(PO) aesni_intel drm_kms_helper crypto_simd cryptd drm zcommon(PO) rapl mei_hdcp mei_pxp
[ 2311.192575]  znvpair(PO) agpgart i2c_i801 ipmi_ssif intel_cstate syscopyarea spl(O) mpt3sas nvme sysfillrect mei_me i2c_smbus input_leds ahci wmi_bmof sysimgblt intel_uncore mpi3mr acpi_ipmi raid_class video joydev fb_sys_fops i2c_core led_class libahci nvme_core mei thermal scsi_transport_sas fan wmi backlight ipmi_si intel_pmc_core acpi_tad acpi_pad button unix [last unloaded: igc]
[ 2311.201228] CPU: 8 PID: 15516 Comm: unraidd0 Tainted: P      D    O       6.1.118-Unraid #1
[ 2311.202132] Hardware name: Supermicro Super Server/X13SAE-F, BIOS 3.3b 08/26/2024
[ 2311.203041] RIP: 0010:do_exit+0x87/0x923
[ 2311.203948] Code: 24 74 04 75 13 b8 01 00 00 00 41 89 6c 24 60 48 c1 e0 22 49 89 44 24 70 4c 89 ef e8 1f 47 81 00 48 83 bb b0 07 00 00 00 74 02 <0f> 0b 48 8b bb d8 06 00 00 e8 21 46 81 00 48 8b 83 d0 06 00 00 83
[ 2311.205819] RSP: 0018:ffffc90005bf7ee0 EFLAGS: 00010286
[ 2311.206758] RAX: 0000000000000000 RBX: ffff88813c68e000 RCX: 0000000000000000
[ 2311.207709] RDX: 0000000000000001 RSI: 0000000000002710 RDI: 00000000ffffffff
[ 2311.208640] RBP: 0000000000000009 R08: 0000000000000000 R09: ffffc90002c8c020
[ 2311.209547] R10: 0000000000aaaaaa R11: 0000000000000001 R12: ffff88810206b000
[ 2311.210438] R13: ffff888102067380 R14: 0000000000000000 R15: 0000000000000000
[ 2311.211308] FS:  0000000000000000(0000) GS:ffff889fff800000(0000) knlGS:0000000000000000
[ 2311.212167] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2311.213006] CR2: 0000000000001358 CR3: 000000000420a000 CR4: 0000000000750ee0
[ 2311.213833] PKRU: 55555554
[ 2311.214635] Call Trace:
[ 2311.215412]  <TASK>
[ 2311.216168]  ? __warn+0xab/0x122
[ 2311.216906]  ? report_bug+0x109/0x17e
[ 2311.217622]  ? do_exit+0x87/0x923
[ 2311.218335]  ? handle_bug+0x41/0x6f
[ 2311.219042]  ? exc_invalid_op+0x13/0x60
[ 2311.219736]  ? asm_exc_invalid_op+0x16/0x20
[ 2311.220415]  ? do_exit+0x87/0x923
[ 2311.221086]  make_task_dead+0x11c/0x11c
[ 2311.221752]  rewind_stack_and_make_dead+0x17/0x17
[ 2311.222419] RIP: 0000:0x0
[ 2311.223082] Code: Unable to access opcode bytes at 0xffffffffffffffd6.
[ 2311.223759] RSP: 0000:0000000000000000 EFLAGS: 00000000 ORIG_RAX: 0000000000000000
[ 2311.224441] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 2311.225115] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 2311.225778] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[ 2311.226428] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[ 2311.227059] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 2311.227693]  </TASK>
[ 2311.228313] ---[ end trace 0000000000000000 ]---
[11200.506017] .NET ThreadPool[18089]: segfault at 3b6303668 ip 000014e8b7bc3abb sp 000014e8b62febf0 error 6 in libclrjit.so[14e8b7aa6000+1e1000] likely on CPU 8 (core 16, socket 0)
[11200.507245] Code: ff ff ff ff 41 89 46 14 e9 df fd ff ff 48 8d 7d b0 4c 89 fe 41 89 d8 48 8b 5d 80 48 89 da 4c 89 e1 e8 f9 f1 ff ff 48 8b 45 c0 <49> 89 46 10 0f 10 45 b0 41 0f 11 06 4c 89 ff 48 89 de 4c 89 e2 4c
[14172.070513] elogind-daemon[2090]: New session c1 of user root.

 

Plex, downloads, etc. seem to work fine, but the parity check hangs at a low percentage and I can't kill it. I had to reboot.

beyonder-nas-diagnostics-20250206-2049.zip

Solved by scs3jb

  • Author

Here's parity dying

 

Screenshot 2025-02-06 210939.png

  • Author

If I'm reading this right, I've either got a hardware failure (I really hope not) or there's something wrong with unraidd0 on kernel 6.1.118-Unraid:

[ 2310.522979] BUG: unable to handle page fault for address: 0000000000001358
[ 2310.523471] #PF: supervisor read access in kernel mode
[ 2310.523958] #PF: error_code(0x0000) - not-present page
[ 2310.524445] PGD 0 P4D 0
[ 2310.524920] Oops: 0000 [#1] PREEMPT SMP NOPTI
[ 2310.525396] CPU: 8 PID: 15516 Comm: unraidd0 Tainted: P           O       6.1.118-Unraid #1
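
For comparing oopses across kernel versions, a small sketch like the following can pull the headline fields out of saved dmesg output. The function name `summarize_oops` and the regular expressions are illustrative assumptions written against the lines quoted above; this is not an Unraid or kernel tool, and other kernels may format these lines differently.

```python
import re

# Patterns written against the dmesg lines quoted in this thread
# (assumption: other kernels may format these lines differently).
FAULT_RE = re.compile(r"\] (BUG: .+|kernel BUG at .+)$")
CPU_RE = re.compile(r"\] CPU: (\d+) PID: (\d+) Comm: (\S+) .*?(\d+\.\d+\.\d+-\S+)")
RIP_RE = re.compile(r"\] RIP: 0010:(\S+)")


def summarize_oops(lines):
    """Return the first fault line, CPU, PID, task, kernel and RIP found."""
    summary = {}
    for line in lines:
        line = line.rstrip()
        if (m := FAULT_RE.search(line)) and "fault" not in summary:
            summary["fault"] = m.group(1)
        elif (m := CPU_RE.search(line)) and "cpu" not in summary:
            summary["cpu"] = int(m.group(1))
            summary["pid"] = int(m.group(2))
            summary["task"] = m.group(3)
            summary["kernel"] = m.group(4)
        elif (m := RIP_RE.search(line)) and "rip" not in summary:
            summary["rip"] = m.group(1)
    return summary


if __name__ == "__main__":
    import sys

    # Usage: dmesg > oops.txt; python summarize_oops.py oops.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        print(summarize_oops(fh))
```

Running it on the output saved after each rollback makes it easier to see which fields change between kernels (the fault type and RIP) and which stay the same (the task and the CPU).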

 

I'm going to try rolling back to 6.12.13 (kernel 6.1.106-Unraid) and see if I can get a valid parity check.

  • Author
[  546.817281] kernel BUG at drivers/md/unraid.c:1617!
[  546.817796] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[  546.818279] CPU: 8 PID: 15431 Comm: unraidd0 Tainted: P           O       6.1.106-Unraid #1
[  546.818772] Hardware name: Supermicro Super Server/X13SAE-F, BIOS 3.3b 08/26/2024
[  546.819262] RIP: 0010:unraidd+0x1051/0x1140 [md_mod]
[  546.819759] Code: 00 83 3d 83 50 00 00 03 7e 16 41 8b 56 98 89 e9 48 c7 c7 21 83 65 a0 48 8b 73 20 e8 82 1e 21 e1 41 f6 86 69 ff ff ff 02 75 02 <0f> 0b 48 8b 43 20 49 03 47 10 41 c7 46 b0 00 10 00 00 49 8b 56 10
[  546.820792] RSP: 0018:ffffc900018a7df0 EFLAGS: 00010246
[  546.821305] RAX: 0000000000000000 RBX: ffff888171e428d8 RCX: 0000000000000000
[  546.821826] RDX: 0000000000000000 RSI: ffffffff829f0720 RDI: ffff8881312a9a38
[  546.822343] RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000000
[  546.822858] R10: 0000000000000000 R11: 0000000000000000 R12: ffff888141cda120
[  546.823367] R13: ffff888171e42c30 R14: ffff888171e42ca8 R15: ffff88813e25d458
[  546.823868] FS:  0000000000000000(0000) GS:ffff889fff800000(0000) knlGS:0000000000000000
[  546.824375] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  546.824901] CR2: 0000146db4f7a840 CR3: 000000000420a005 CR4: 0000000000770ee0
[  546.825422] PKRU: 55555554
[  546.825936] Call Trace:
[  546.826423]  <TASK>
[  546.826900]  ? __die_body+0x1a/0x5c
[  546.827386]  ? die+0x30/0x49
[  546.827847]  ? do_trap+0x7b/0xfe
[  546.828302]  ? unraidd+0x1051/0x1140 [md_mod]
[  546.828752]  ? unraidd+0x1051/0x1140 [md_mod]
[  546.829194]  ? do_error_trap+0x6e/0x98
[  546.829647]  ? unraidd+0x1051/0x1140 [md_mod]
[  546.830114]  ? exc_invalid_op+0x4c/0x60
[  546.830568]  ? unraidd+0x1051/0x1140 [md_mod]
[  546.831019]  ? asm_exc_invalid_op+0x16/0x20
[  546.831451]  ? unraidd+0x1051/0x1140 [md_mod]
[  546.831861]  md_thread+0xf4/0x122 [md_mod]
[  546.832263]  ? _raw_spin_rq_lock_irqsave+0x20/0x20
[  546.832664]  ? signal_pending+0x1d/0x1d [md_mod]
[  546.833063]  kthread+0xe4/0xef
[  546.833453]  ? kthread_complete_and_exit+0x1b/0x1b
[  546.833849]  ret_from_fork+0x1f/0x30
[  546.834240]  </TASK>
[  546.834620] Modules linked in: veth ipvlan xt_nat xt_tcpudp xt_conntrack nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype br_netfilter ipmi_devintf md_mod xfs tcp_diag inet_diag i915 drm_buddy drm_display_helper intel_gtt iptable_nat xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libchacha ip6table_filter ip6_tables iptable_filter ip_tables x_tables efivarfs macvtap macvlan tap af_packet 8021q garp mrp bridge stp llc bonding tls igc e1000e zfs(PO) intel_rapl_msr intel_rapl_common iosf_mbi x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm zunicode(PO) ast zzstd(O) drm_vram_helper i2c_algo_bit drm_ttm_helper ttm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel drm_kms_helper sha512_ssse3 zlua(O) sha256_ssse3 sha1_ssse3 zavl(PO) aesni_intel mei_hdcp crypto_simd cryptd icp(PO) mei_pxp drm zcommon(PO) rapl
[  546.834652]  znvpair(PO) spl(O) ipmi_ssif intel_cstate mei_me agpgart i2c_i801 wmi_bmof mpt3sas syscopyarea mpi3mr sysfillrect i2c_smbus input_leds sysimgblt nvme ahci raid_class intel_uncore joydev led_class acpi_ipmi fb_sys_fops i2c_core mei nvme_core scsi_transport_sas libahci thermal fan video wmi ipmi_si backlight intel_pmc_core acpi_tad acpi_pad button unix [last unloaded: igc]
[  546.840354] ---[ end trace 0000000000000000 ]---
[  547.483269] RIP: 0010:unraidd+0x1051/0x1140 [md_mod]
[  547.483904] Code: 00 83 3d 83 50 00 00 03 7e 16 41 8b 56 98 89 e9 48 c7 c7 21 83 65 a0 48 8b 73 20 e8 82 1e 21 e1 41 f6 86 69 ff ff ff 02 75 02 <0f> 0b 48 8b 43 20 49 03 47 10 41 c7 46 b0 00 10 00 00 49 8b 56 10
[  547.485209] RSP: 0018:ffffc900018a7df0 EFLAGS: 00010246
[  547.485837] RAX: 0000000000000000 RBX: ffff888171e428d8 RCX: 0000000000000000
[  547.486459] RDX: 0000000000000000 RSI: ffffffff829f0720 RDI: ffff8881312a9a38
[  547.487075] RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000000
[  547.487694] R10: 0000000000000000 R11: 0000000000000000 R12: ffff888141cda120
[  547.488311] R13: ffff888171e42c30 R14: ffff888171e42ca8 R15: ffff88813e25d458
[  547.488924] FS:  0000000000000000(0000) GS:ffff889fff800000(0000) knlGS:0000000000000000
[  547.489553] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  547.490174] CR2: 0000146db4f7a840 CR3: 000000000420a005 CR4: 0000000000770ee0
[  547.490804] PKRU: 55555554
[  547.491418] ------------[ cut here ]------------
[  547.492026] WARNING: CPU: 8 PID: 15431 at kernel/exit.c:816 do_exit+0x87/0x923
[  547.492639] Modules linked in: veth ipvlan xt_nat xt_tcpudp xt_conntrack nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype br_netfilter ipmi_devintf md_mod xfs tcp_diag inet_diag i915 drm_buddy drm_display_helper intel_gtt iptable_nat xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libchacha ip6table_filter ip6_tables iptable_filter ip_tables x_tables efivarfs macvtap macvlan tap af_packet 8021q garp mrp bridge stp llc bonding tls igc e1000e zfs(PO) intel_rapl_msr intel_rapl_common iosf_mbi x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm zunicode(PO) ast zzstd(O) drm_vram_helper i2c_algo_bit drm_ttm_helper ttm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel drm_kms_helper sha512_ssse3 zlua(O) sha256_ssse3 sha1_ssse3 zavl(PO) aesni_intel mei_hdcp crypto_simd cryptd icp(PO) mei_pxp drm zcommon(PO) rapl
[  547.492670]  znvpair(PO) spl(O) ipmi_ssif intel_cstate mei_me agpgart i2c_i801 wmi_bmof mpt3sas syscopyarea mpi3mr sysfillrect i2c_smbus input_leds sysimgblt nvme ahci raid_class intel_uncore joydev led_class acpi_ipmi fb_sys_fops i2c_core mei nvme_core scsi_transport_sas libahci thermal fan video wmi ipmi_si backlight intel_pmc_core acpi_tad acpi_pad button unix [last unloaded: igc]
[  547.501188] CPU: 8 PID: 15431 Comm: unraidd0 Tainted: P      D    O       6.1.106-Unraid #1
[  547.502085] Hardware name: Supermicro Super Server/X13SAE-F, BIOS 3.3b 08/26/2024
[  547.502981] RIP: 0010:do_exit+0x87/0x923
[  547.503882] Code: 24 74 04 75 13 b8 01 00 00 00 41 89 6c 24 60 48 c1 e0 22 49 89 44 24 70 4c 89 ef e8 41 30 81 00 48 83 bb b0 07 00 00 00 74 02 <0f> 0b 48 8b bb d8 06 00 00 e8 43 2f 81 00 48 8b 83 d0 06 00 00 83
[  547.505729] RSP: 0018:ffffc900018a7ee0 EFLAGS: 00010286
[  547.506650] RAX: 0000000000000000 RBX: ffff88813cf7b000 RCX: 0000000000000000
[  547.507578] RDX: 0000000000000001 RSI: 0000000000002710 RDI: 00000000ffffffff
[  547.508505] RBP: 000000000000000b R08: 0000000000000000 R09: ffffc900016bc020
[  547.509421] R10: 0000000000aaaaaa R11: 0000000000000001 R12: ffff888130b7f400
[  547.510325] R13: ffff888104d90840 R14: 0000000000000002 R15: ffffffff820b3185
[  547.511223] FS:  0000000000000000(0000) GS:ffff889fff800000(0000) knlGS:0000000000000000
[  547.512110] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  547.512975] CR2: 0000146db4f7a840 CR3: 000000000420a005 CR4: 0000000000770ee0
[  547.513836] PKRU: 55555554
[  547.514663] Call Trace:
[  547.515466]  <TASK>
[  547.516250]  ? __warn+0xab/0x122
[  547.517009]  ? report_bug+0x109/0x17e
[  547.517751]  ? do_exit+0x87/0x923
[  547.518473]  ? handle_bug+0x41/0x6f
[  547.519166]  ? exc_invalid_op+0x13/0x60
[  547.519868]  ? asm_exc_invalid_op+0x16/0x20
[  547.520559]  ? do_exit+0x87/0x923
[  547.521222]  make_task_dead+0x11c/0x11c
[  547.521879]  rewind_stack_and_make_dead+0x17/0x17
[  547.522539] RIP: 0000:0x0
[  547.523190] Code: Unable to access opcode bytes at 0xffffffffffffffd6.
[  547.523861] RSP: 0000:0000000000000000 EFLAGS: 00000000 ORIG_RAX: 0000000000000000
[  547.524549] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[  547.525219] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[  547.525876] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[  547.526526] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[  547.527159] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[  547.527791]  </TASK>

 

No luck.

  • Community Expert

Disable Docker and VM Manager in Settings, reboot in SAFE mode, and see if you can check parity.

  • Author
44 minutes ago, trurl said:

Disable Docker and VM Manager in Settings, reboot in SAFE mode, and see if you can check parity.

No luck, same issue:

[  272.118236] ------------[ cut here ]------------
[  272.118238] kernel BUG at drivers/md/unraid.c:1617!
[  272.118643] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[  272.119013] CPU: 8 PID: 11802 Comm: unraidd0 Tainted: P           O       6.6.68-Unraid #1
[  272.119392] Hardware name: Supermicro Super Server/X13SAE-F, BIOS 3.3b 08/26/2024
[  272.119775] RIP: 0010:unraidd+0x1189/0x1278 [md_mod]
[  272.120153] Code: 00 83 3d 01 80 fb ff 03 7e 16 41 8b 56 98 89 e9 48 c7 c7 21 b3 ac a0 48 8b 73 20 e8 7b c4 5c e0 41 f6 86 69 ff ff ff 02 75 02 <0f> 0b 48 8b 43 20 49 03 47 18 41 c7 46 b0 00 10 00 00 49 8b 56 10
[  272.120925] RSP: 0018:ffffc900014cbda8 EFLAGS: 00010246
[  272.121300] RAX: 0000000000000000 RBX: ffff88816c655f38 RCX: 0000000000000000
[  272.121680] RDX: 0000000000000000 RSI: ffffffff82cb9420 RDI: ffff888108682438
[  272.122060] RBP: 0000000000000004 R08: 0000000000000000 R09: 0000000000000000
[  272.122439] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88815653c928
[  272.122813] R13: ffff88816c656340 R14: ffff88816c6563b8 R15: ffff888144431540
[  272.123184] FS:  0000000000000000(0000) GS:ffff889fff800000(0000) knlGS:0000000000000000
[  272.123560] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  272.123923] CR2: 000015112a864fd8 CR3: 0000000005416000 CR4: 0000000000750ee0
[  272.124289] PKRU: 55555554
[  272.124650] Call Trace:
[  272.124997]  <TASK>
[  272.125344]  ? __die_body+0x1a/0x5c
[  272.125693]  ? die+0x30/0x49
[  272.126027]  ? do_trap+0x7b/0xfe
[  272.126357]  ? unraidd+0x1189/0x1278 [md_mod]
[  272.126684]  ? unraidd+0x1189/0x1278 [md_mod]
[  272.126995]  ? do_error_trap+0x6e/0x98
[  272.127308]  ? unraidd+0x1189/0x1278 [md_mod]
[  272.127624]  ? exc_invalid_op+0x4c/0x60
[  272.127937]  ? unraidd+0x1189/0x1278 [md_mod]
[  272.128247]  ? asm_exc_invalid_op+0x16/0x20
[  272.128548]  ? unraidd+0x1189/0x1278 [md_mod]
[  272.128847]  ? unraidd+0x1159/0x1278 [md_mod]
[  272.129137]  ? preempt_latency_start+0x2b/0x46
[  272.129420]  ? preempt_latency_start+0x2b/0x46
[  272.129693]  md_thread+0xf7/0x127 [md_mod]
[  272.129968]  ? __pfx_autoremove_wake_function+0x10/0x10
[  272.130246]  ? __pfx_md_thread+0x10/0x10 [md_mod]
[  272.130526]  kthread+0xf1/0xfc
[  272.130801]  ? __pfx_kthread+0x10/0x10
[  272.131063]  ret_from_fork+0x21/0x36
[  272.131329]  ? __pfx_kthread+0x10/0x10
[  272.131594]  ret_from_fork_asm+0x1b/0x30
[  272.131860]  </TASK>
[  272.132117] Modules linked in: ipmi_devintf md_mod i915 drm_buddy ttm drm_display_helper intel_gtt agpgart iptable_nat xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libchacha ip6table_filter ip6_tables iptable_filter ip_tables x_tables efivarfs macvtap macvlan tap af_packet 8021q garp mrp bridge stp llc bonding tls igc e1000e intel_rapl_common iosf_mbi x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel sha512_ssse3 sha256_ssse3 sha1_ssse3 aesni_intel ast crypto_simd drm_shmem_helper cryptd i2c_algo_bit drm_kms_helper mei_pxp mei_hdcp ipmi_ssif zfs(PO) i2c_i801 rapl intel_cstate acpi_ipmi mei_me i2c_smbus drm spl(O) video ahci input_leds wmi_bmof intel_uncore led_class i2c_core mei ipmi_si libahci joydev backlight wmi thermal acpi_pad fan acpi_tad nvme mpt3sas mpi3mr nvme_core raid_class button
[  272.132154]  scsi_transport_sas [last unloaded: igc]
[  272.135364] ---[ end trace 0000000000000000 ]---
[  272.778589] pstore: backend (erst) writing error (-28)
[  272.778977] RIP: 0010:unraidd+0x1189/0x1278 [md_mod]
[  272.779356] Code: 00 83 3d 01 80 fb ff 03 7e 16 41 8b 56 98 89 e9 48 c7 c7 21 b3 ac a0 48 8b 73 20 e8 7b c4 5c e0 41 f6 86 69 ff ff ff 02 75 02 <0f> 0b 48 8b 43 20 49 03 47 18 41 c7 46 b0 00 10 00 00 49 8b 56 10
[  272.780149] RSP: 0018:ffffc900014cbda8 EFLAGS: 00010246
[  272.780524] RAX: 0000000000000000 RBX: ffff88816c655f38 RCX: 0000000000000000
[  272.780913] RDX: 0000000000000000 RSI: ffffffff82cb9420 RDI: ffff888108682438
[  272.781321] RBP: 0000000000000004 R08: 0000000000000000 R09: 0000000000000000
[  272.781702] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88815653c928
[  272.782080] R13: ffff88816c656340 R14: ffff88816c6563b8 R15: ffff888144431540
[  272.782450] FS:  0000000000000000(0000) GS:ffff889fff800000(0000) knlGS:0000000000000000
[  272.782826] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  272.783208] CR2: 000015112a864fd8 CR3: 0000000005416000 CR4: 0000000000750ee0
[  272.783589] PKRU: 55555554
[  272.783969] ------------[ cut here ]------------
[  272.784350] WARNING: CPU: 8 PID: 11802 at kernel/exit.c:820 do_exit+0x81/0x90b
[  272.784747] Modules linked in: ipmi_devintf md_mod i915 drm_buddy ttm drm_display_helper intel_gtt agpgart iptable_nat xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libchacha ip6table_filter ip6_tables iptable_filter ip_tables x_tables efivarfs macvtap macvlan tap af_packet 8021q garp mrp bridge stp llc bonding tls igc e1000e intel_rapl_common iosf_mbi x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel sha512_ssse3 sha256_ssse3 sha1_ssse3 aesni_intel ast crypto_simd drm_shmem_helper cryptd i2c_algo_bit drm_kms_helper mei_pxp mei_hdcp ipmi_ssif zfs(PO) i2c_i801 rapl intel_cstate acpi_ipmi mei_me i2c_smbus drm spl(O) video ahci input_leds wmi_bmof intel_uncore led_class i2c_core mei ipmi_si libahci joydev backlight wmi thermal acpi_pad fan acpi_tad nvme mpt3sas mpi3mr nvme_core raid_class button
[  272.784778]  scsi_transport_sas [last unloaded: igc]
[  272.789193] CPU: 8 PID: 11802 Comm: unraidd0 Tainted: P      D    O       6.6.68-Unraid #1
[  272.789729] Hardware name: Supermicro Super Server/X13SAE-F, BIOS 3.3b 08/26/2024
[  272.790271] RIP: 0010:do_exit+0x81/0x90b
[  272.790814] Code: 24 74 04 75 13 b8 01 00 00 00 41 89 6c 24 60 48 c1 e0 22 49 89 44 24 70 4c 89 ef e8 a2 e9 9b 00 48 83 bb c0 07 00 00 00 74 02 <0f> 0b 48 8b bb d8 06 00 00 e8 6c e8 9b 00 48 8b 83 d0 06 00 00 83
[  272.791955] RSP: 0018:ffffc900014cbee0 EFLAGS: 00010286
[  272.792499] RAX: 0000000000000000 RBX: ffff8881090f7000 RCX: 0000000000000000
[  272.793060] RDX: 0000000000000001 RSI: 0000000000002710 RDI: 00000000ffffffff
[  272.793614] RBP: 000000000000000b R08: 0000000000000000 R09: ffff888146ed7800
[  272.794171] R10: 0000000000000001 R11: ffffc90002a00020 R12: ffff888106024400
[  272.794721] R13: ffff888108e70000 R14: 0000000000000002 R15: ffffffff8222702e
[  272.795292] FS:  0000000000000000(0000) GS:ffff889fff800000(0000) knlGS:0000000000000000
[  272.795862] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  272.796436] CR2: 000015112a864fd8 CR3: 0000000005416000 CR4: 0000000000750ee0
[  272.796990] PKRU: 55555554
[  272.797518] Call Trace:
[  272.798020]  <TASK>
[  272.798511]  ? __warn+0x99/0x11a
[  272.798996]  ? report_bug+0xd9/0x153
[  272.799459]  ? do_exit+0x81/0x90b
[  272.799910]  ? handle_bug+0x53/0x7c
[  272.800358]  ? exc_invalid_op+0x13/0x60
[  272.800796]  ? asm_exc_invalid_op+0x16/0x20
[  272.801241]  ? do_exit+0x81/0x90b
[  272.801673]  ? __pfx_md_thread+0x10/0x10 [md_mod]
[  272.802107]  ? kthread+0xf1/0xfc
[  272.802521]  make_task_dead+0x113/0x113
[  272.802935]  rewind_stack_and_make_dead+0x17/0x17
[  272.803347] RIP: 0000:0x0
[  272.803757] Code: Unable to access opcode bytes at 0xffffffffffffffd6.
[  272.804181] RSP: 0000:0000000000000000 EFLAGS: 00000000 ORIG_RAX: 0000000000000000
[  272.804621] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[  272.805061] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[  272.805480] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[  272.805890] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[  272.806306] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[  272.806701]  </TASK>
[  272.807098] ---[ end trace 0000000000000000 ]---

I seem to be able to read and write fine, but the parity check throws this kernel error.

 

I can't shut down once I hit this.

  • Community Expert
3 hours ago, scs3jb said:

roll back to 6.12.13

Was it working OK with that version before?

  • Community Expert

The Unraid driver is crashing. This is almost always a hardware issue; RAM or, in this case, the CPU (since it's a 13900K) would be my main suspects.

  • Author

Okay, so for the CPU, I'm currently running:

docker run -t --rm polinux/stress-ng --cpu 20 --cpu-method fft --timeout 30m

From what I've read on Reddit, that seems like the way to force a crash. Do you think that's enough to prove it out?

 

As for memory, I'm using Kingston ECC memory, and my understanding is that if it were a problem I would be getting errors in the syslog about memory corrections. I didn't see anything. Is the fact that I'm using ECC memory that is not overclocked and comes from a kit enough to rule that out? I've got 128GB, so a memtest86 run will take a long time.
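
On Linux, corrected and uncorrected ECC events are normally surfaced through the EDAC subsystem, both as kernel log messages and as counters under /sys/devices/system/edac/mc. The following is a minimal sketch for reading those counters. It assumes an EDAC driver for the platform's memory controller is loaded; if no mc* directories exist the result is empty, which means "not reported" rather than "no errors". The function name `read_edac_counts` is illustrative.

```python
from pathlib import Path


def read_edac_counts(root="/sys/devices/system/edac/mc"):
    """Return {controller: {"ce": corrected, "ue": uncorrected}} from EDAC sysfs."""
    counts = {}
    base = Path(root)
    if not base.is_dir():
        return counts  # no EDAC driver loaded, so nothing is being reported
    for mc in sorted(base.glob("mc[0-9]*")):
        entry = {}
        for key, name in (("ce", "ce_count"), ("ue", "ue_count")):
            path = mc / name
            if path.is_file():
                entry[key] = int(path.read_text().strip())
        counts[mc.name] = entry
    return counts


if __name__ == "__main__":
    result = read_edac_counts()
    if not result:
        print("No EDAC memory controllers found; ECC events are not being reported here.")
    for mc, entry in result.items():
        print(mc, entry)
```

A non-zero "ce" value would point at memory that is being corrected on the fly; an empty result only shows that the counters are unavailable, so it cannot rule memory out by itself.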

 

Could this be caused by an LSI controller? I have not seen any day-to-day issues with reads or writes (no SMART errors, etc.), other than the parity check just stopping, and that seems to be related to this error. The parity check dies really early on, then drops to 256kb/s in the UI with zero activity on the array, which is odd, and only a hard reboot gets the machine back.

 

I do occasionally see this:

[20606.488498] traps: .NET ThreadPool[2356421] general protection fault ip:149489986585 sp:1493edb36b08 error:0 in ld-musl-x86_64.so.1[149489959000+57000]
[23215.145055] .NET ThreadPool[2683605]: segfault at 1522ad06bc60 ip 00001522ad06bc60 sp 00001522abfa8078 error 15 likely on CPU 8 (core 16, socket 0)
[23215.145061] Code: 00 00 48 df 06 ad 22 15 00 00 98 e8 08 ad 22 15 00 00 20 bc 06 ad 22 15 00 00 00 00 00 00 00 00 00 00 e8 e8 08 ad 22 15 00 00 <c8> bb 06 ad 22 15 00 00 88 77 02 ad 22 15 00 00 90 b9 06 ad 22 15

but I believe this is Radarr.

 

Screenshot 2025-02-07 at 10.33.14 am.png

  • Author
8 hours ago, trurl said:

Was it working OK with that version before?

I haven't rolled back that far yet. I actually upgraded to 7.0.0 as part of the testing and am running on that at the moment. When stress-ng finishes, assuming it doesn't show a CPU issue, I'll try going further back and run a parity check.

 

To hedge my bets, I've opened an RMA case with Intel in parallel.

Edit: the first round of stress-ng showed no issues. I will run it 4-5 more times, and if those pass, I'm not sure this is the CPU, unless there's something else I can do to stress it more?
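
One further option, offered as a sketch rather than an established diagnostic: pin a deterministic computation to each logical CPU in turn and check that every CPU produces the same result. The oopses quoted in this thread all report CPU: 8, so a per-CPU check might be more telling than an aggregate load, although that could equally be a coincidence of where unraidd0 gets scheduled. The code relies on Python's `os.sched_setaffinity` (Linux only); the function name, buffer size and round count are arbitrary assumptions, and a clean pass does not prove the CPU is healthy.

```python
import hashlib
import os


def check_cpus(rounds=200, size=1 << 20):
    """Hash the same buffer on each allowed CPU; return CPUs whose digest differs."""
    data = bytes(range(256)) * (size // 256)
    # Reference digest, computed on whichever CPU the process starts on.
    # If that CPU is the faulty one, the healthy CPUs will be flagged instead.
    expected = hashlib.sha256(data).hexdigest()
    original = os.sched_getaffinity(0)
    bad = set()
    try:
        for cpu in sorted(original):
            os.sched_setaffinity(0, {cpu})  # pin this process to one logical CPU
            for _ in range(rounds):
                if hashlib.sha256(data).hexdigest() != expected:
                    bad.add(cpu)
                    break
    finally:
        os.sched_setaffinity(0, original)  # restore the original CPU mask
    return sorted(bad)


if __name__ == "__main__":
    mismatches = check_cpus()
    print("CPUs with mismatching digests:", mismatches or "none")
```

Inside a container the process only sees the CPUs in its cpuset, so this would need to run on the host (or in a container without CPU pinning) to cover every core.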

Edited by scs3jb

  • Author

I ran 4 transcodes while saturating the CPU with stress-ng and using 100GB of RAM. No issues.

 

I have now rolled back to 6.12.11 and am trying a parity check.

 

edit:

[  494.598353] ------------[ cut here ]------------
[  494.598363] kernel BUG at drivers/md/unraid.c:1617!
[  494.598915] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[  494.599427] CPU: 8 PID: 14798 Comm: unraidd0 Tainted: P           O       6.1.99-Unraid #1
[  494.599917] Hardware name: Supermicro Super Server/X13SAE-F, BIOS 3.3b 08/26/2024
[  494.600408] RIP: 0010:unraidd+0x1051/0x1140 [md_mod]
[  494.600905] Code: 00 83 3d 83 50 00 00 03 7e 16 41 8b 56 98 89 e9 48 c7 c7 21 93 71 a0 48 8b 73 20 e8 b2 07 15 e1 41 f6 86 69 ff ff ff 02 75 02 <0f> 0b 48 8b 43 20 49 03 47 10 41 c7 46 b0 00 10 00 00 49 8b 56 10
[  494.601952] RSP: 0018:ffffc900018efdf0 EFLAGS: 00010246
[  494.602469] RAX: 0000000000000000 RBX: ffff88814a25a8d8 RCX: 0000000000000000
[  494.602993] RDX: 0000000000000000 RSI: ffffffff829ee720 RDI: ffff8881099a6238
[  494.603511] RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
[  494.604026] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8881531cc110
[  494.604532] R13: ffff88814a25aad0 R14: ffff88814a25ab48 R15: ffff88814ae352d8
[  494.605038] FS:  0000000000000000(0000) GS:ffff889fff800000(0000) knlGS:0000000000000000
[  494.605545] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  494.606049] CR2: 000015235ec7269c CR3: 000000000420a004 CR4: 0000000000770ee0
[  494.606556] PKRU: 55555554
[  494.607052] Call Trace:
[  494.607539]  <TASK>
[  494.608021]  ? __die_body+0x1a/0x5c
[  494.608502]  ? die+0x30/0x49
[  494.608964]  ? do_trap+0x7b/0xfe
[  494.609416]  ? unraidd+0x1051/0x1140 [md_mod]
[  494.609868]  ? unraidd+0x1051/0x1140 [md_mod]
[  494.610310]  ? do_error_trap+0x6e/0x98
[  494.610752]  ? unraidd+0x1051/0x1140 [md_mod]
[  494.611195]  ? exc_invalid_op+0x4c/0x60
[  494.611630]  ? unraidd+0x1051/0x1140 [md_mod]
[  494.612062]  ? asm_exc_invalid_op+0x16/0x20
[  494.612478]  ? unraidd+0x1051/0x1140 [md_mod]
[  494.612888]  md_thread+0xf4/0x122 [md_mod]
[  494.613292]  ? _raw_spin_rq_lock_irqsave+0x20/0x20
[  494.613694]  ? signal_pending+0x1d/0x1d [md_mod]
[  494.614090]  kthread+0xe4/0xef
[  494.614485]  ? kthread_complete_and_exit+0x1b/0x1b
[  494.614883]  ret_from_fork+0x1f/0x30
[  494.615276]  </TASK>
[  494.615655] Modules linked in: xt_nat xt_tcpudp veth xt_conntrack nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype br_netfilter ipmi_devintf md_mod xfs tcp_diag inet_diag i915 drm_buddy drm_display_helper intel_gtt iptable_nat xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libchacha ip6table_filter ip6_tables iptable_filter ip_tables x_tables efivarfs macvtap macvlan tap af_packet 8021q garp mrp bridge stp llc bonding tls e1000e igc intel_rapl_msr intel_rapl_common iosf_mbi x86_pkg_temp_thermal intel_powerclamp zfs(PO) coretemp kvm_intel zunicode(PO) zzstd(O) kvm ast drm_vram_helper i2c_algo_bit drm_ttm_helper zlua(O) ttm zavl(PO) drm_kms_helper crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel icp(PO) sha512_ssse3 sha256_ssse3 sha1_ssse3 drm aesni_intel crypto_simd zcommon(PO) znvpair(PO) agpgart i2c_i801 mei_hdcp
[  494.615685]  mei_pxp cryptd rapl spl(O) wmi_bmof ipmi_ssif intel_cstate mpt3sas intel_uncore syscopyarea i2c_smbus mei_me raid_class sysfillrect mpi3mr video ahci nvme input_leds sysimgblt acpi_ipmi i2c_core joydev mei fb_sys_fops led_class thermal nvme_core libahci scsi_transport_sas fan wmi ipmi_si backlight acpi_pad acpi_tad intel_pmc_core button unix [last unloaded: e1000e]
[  494.621395] ---[ end trace 0000000000000000 ]---
[  495.257484] RIP: 0010:unraidd+0x1051/0x1140 [md_mod]
[  495.258185] Code: 00 83 3d 83 50 00 00 03 7e 16 41 8b 56 98 89 e9 48 c7 c7 21 93 71 a0 48 8b 73 20 e8 b2 07 15 e1 41 f6 86 69 ff ff ff 02 75 02 <0f> 0b 48 8b 43 20 49 03 47 10 41 c7 46 b0 00 10 00 00 49 8b 56 10
[  495.259585] RSP: 0018:ffffc900018efdf0 EFLAGS: 00010246
[  495.260268] RAX: 0000000000000000 RBX: ffff88814a25a8d8 RCX: 0000000000000000
[  495.260909] RDX: 0000000000000000 RSI: ffffffff829ee720 RDI: ffff8881099a6238
[  495.261562] RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
[  495.262201] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8881531cc110
[  495.262842] R13: ffff88814a25aad0 R14: ffff88814a25ab48 R15: ffff88814ae352d8
[  495.263483] FS:  0000000000000000(0000) GS:ffff889fff800000(0000) knlGS:0000000000000000
[  495.264126] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  495.264782] CR2: 000015235ec7269c CR3: 000000000420a004 CR4: 0000000000770ee0
[  495.265440] PKRU: 55555554
[  495.266076] ------------[ cut here ]------------
[  495.266695] WARNING: CPU: 8 PID: 14798 at kernel/exit.c:816 do_exit+0x87/0x923
[  495.267349] Modules linked in: xt_nat xt_tcpudp veth xt_conntrack nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype br_netfilter ipmi_devintf md_mod xfs tcp_diag inet_diag i915 drm_buddy drm_display_helper intel_gtt iptable_nat xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libchacha ip6table_filter ip6_tables iptable_filter ip_tables x_tables efivarfs macvtap macvlan tap af_packet 8021q garp mrp bridge stp llc bonding tls e1000e igc intel_rapl_msr intel_rapl_common iosf_mbi x86_pkg_temp_thermal intel_powerclamp zfs(PO) coretemp kvm_intel zunicode(PO) zzstd(O) kvm ast drm_vram_helper i2c_algo_bit drm_ttm_helper zlua(O) ttm zavl(PO) drm_kms_helper crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel icp(PO) sha512_ssse3 sha256_ssse3 sha1_ssse3 drm aesni_intel crypto_simd zcommon(PO) znvpair(PO) agpgart i2c_i801 mei_hdcp
[  495.267379]  mei_pxp cryptd rapl spl(O) wmi_bmof ipmi_ssif intel_cstate mpt3sas intel_uncore syscopyarea i2c_smbus mei_me raid_class sysfillrect mpi3mr video ahci nvme input_leds sysimgblt acpi_ipmi i2c_core joydev mei fb_sys_fops led_class thermal nvme_core libahci scsi_transport_sas fan wmi ipmi_si backlight acpi_pad acpi_tad intel_pmc_core button unix [last unloaded: e1000e]
[  495.276096] CPU: 8 PID: 14798 Comm: unraidd0 Tainted: P      D    O       6.1.99-Unraid #1
[  495.277044] Hardware name: Supermicro Super Server/X13SAE-F, BIOS 3.3b 08/26/2024
[  495.277992] RIP: 0010:do_exit+0x87/0x923
[  495.278948] Code: 24 74 04 75 13 b8 01 00 00 00 41 89 6c 24 60 48 c1 e0 22 49 89 44 24 70 4c 89 ef e8 d1 2b 81 00 48 83 bb b0 07 00 00 00 74 02 <0f> 0b 48 8b bb d8 06 00 00 e8 d3 2a 81 00 48 8b 83 d0 06 00 00 83
[  495.280879] RSP: 0018:ffffc900018efee0 EFLAGS: 00010286
[  495.281871] RAX: 0000000000000000 RBX: ffff88812cfe7000 RCX: 0000000000000000
[  495.282805] RDX: 0000000000000001 RSI: 0000000000002710 RDI: 00000000ffffffff
[  495.283794] RBP: 000000000000000b R08: 0000000000000000 R09: ffffc90002ab2020
[  495.284745] R10: 0000000000aaaaaa R11: 0000000000000001 R12: ffff8881343b0800
[  495.285706] R13: ffff88812f006b40 R14: 0000000000000002 R15: ffffffff820b2ea5
[  495.286659] FS:  0000000000000000(0000) GS:ffff889fff800000(0000) knlGS:0000000000000000
[  495.287592] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  495.288518] CR2: 000015235ec7269c CR3: 000000000420a004 CR4: 0000000000770ee0
[  495.289413] PKRU: 55555554
[  495.290287] Call Trace:
[  495.291115]  <TASK>
[  495.291904]  ? __warn+0xab/0x122
[  495.292687]  ? report_bug+0x109/0x17e
[  495.293472]  ? do_exit+0x87/0x923
[  495.294241]  ? handle_bug+0x41/0x6f
[  495.294986]  ? exc_invalid_op+0x13/0x60
[  495.295742]  ? asm_exc_invalid_op+0x16/0x20
[  495.296446]  ? do_exit+0x87/0x923
[  495.297145]  make_task_dead+0x11c/0x11c
[  495.297851]  rewind_stack_and_make_dead+0x17/0x17
[  495.298554] RIP: 0000:0x0
[  495.299261] Code: Unable to access opcode bytes at 0xffffffffffffffd6.
[  495.299950] RSP: 0000:0000000000000000 EFLAGS: 00000000 ORIG_RAX: 0000000000000000
[  495.300670] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[  495.301385] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[  495.302093] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[  495.302792] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[  495.303462] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[  495.304135]  </TASK>
[  495.304791] ---[ end trace 0000000000000000 ]---

 

Same on that version too.

Edited by scs3jb

  • Community Expert

The driver is still crashing, and if it does so with two different kernels, it is more likely to be a hardware problem.

  • Author
4 hours ago, JorgeB said:

The driver is still crashing, and if it does so with two different kernels, it is more likely to be a hardware problem.

Any way to narrow it down?  The CPU was the obvious suspect given the press coverage, but I'm unable to reproduce the crash without a parity check.  I will try an x265 encode today.

 

Could this be caused by a faulty backplane or SAS controller?

  • Community Expert

Unlikely, typically it's RAM, CPU or board.

 

Also, if you have multiple RAM sticks, try running the server with just one; if the result is the same, try a different one. That will basically rule out bad RAM.

  • Author

A 4K x265 encode on the placebo preset ran solid, and the memory test passed.  I've noticed there's a new BIOS (3.3b > 4.1), so I'll try that next.  There are no actual release notes from Supermicro, so other than the Intel microcode, I'm not sure what they've changed.

 

Going by the stress tests so far, only the parity check seems to be a problem.

  • Community Expert
2 minutes ago, scs3jb said:

Going by the stress tests so far, only the parity check seems to be a problem.

That suggests to me that it might be power-related, as a parity check is the one time all drives are simultaneously active.

  • Author
32 minutes ago, itimpi said:

That suggests to me that it might be power-related, as a parity check is the one time all drives are simultaneously active.

I think the power draw is the same or higher with an x265 encode (~450W), but I guess a parity check would put load on more lines (LSI card, all the drives, CPU, etc.) and could be susceptible to that.  It's a Corsair 1000W Platinum, so I'd be pretty shocked, given my view of Corsair and their PSUs, if it's this; they've been really solid for me, even in server builds.  I do have a spare replacement PSU but don't have access to the server until March to swap it.

 

I kicked off a new parity check with the updated BIOS, 4.1, and it's got to 2.5% (not out of the woods by a long shot, but further than it got the last two times).  UPS load is currently ~360W and the CPU isn't running as hot.  Let's hope it's a motherboard/BIOS issue with 3.3b.  Really wish Supermicro would provide release notes :(

  • Community Expert

Do you use power splitters?    If so, splitting too many ways can cause issues due to voltage sag, particularly if using SATA->SATA splitters.

  • Author
34 minutes ago, itimpi said:

Do you use power splitters?    If so, splitting too many ways can cause issues due to voltage sag, particularly if using SATA->SATA splitters.

No, it's all the original cables, going into a SAS backplane.  The SAS backplane isn't great, though.  Reads/writes to the drives seem to be fine.

 

 

Unfortunately, at 3.8%, it's happened again:

[ 4079.681264] ------------[ cut here ]------------
[ 4079.681274] kernel BUG at drivers/md/unraid.c:1617!
[ 4079.681812] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[ 4079.682290] CPU: 8 PID: 15720 Comm: unraidd0 Tainted: P           O       6.1.99-Unraid #1
[ 4079.682796] Hardware name: Supermicro Super Server/X13SAE-F, BIOS 4.1 10/01/2024
[ 4079.683290] RIP: 0010:unraidd+0x1051/0x1140 [md_mod]
[ 4079.683791] Code: 00 83 3d 83 50 00 00 03 7e 16 41 8b 56 98 89 e9 48 c7 c7 21 e3 7c a0 48 8b 73 20 e8 b2 b7 09 e1 41 f6 86 69 ff ff ff 02 75 02 <0f> 0b 48 8b 43 20 49 03 47 10 41 c7 46 b0 00 10 00 00 49 8b 56 10
[ 4079.684820] RSP: 0018:ffffc900205efdf0 EFLAGS: 00010246
[ 4079.685333] RAX: 0000000000000000 RBX: ffff88814a680da8 RCX: 0000000000000000
[ 4079.685854] RDX: 0000000000000000 RSI: ffffffff829ee720 RDI: ffff888108624038
[ 4079.686377] RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
[ 4079.686893] R10: 0000000000000000 R11: 0000000000000000 R12: ffff888148f62110
[ 4079.687398] R13: ffff88814a680fa0 R14: ffff88814a681018 R15: ffff8881049152d8
[ 4079.687899] FS:  0000000000000000(0000) GS:ffff889fff800000(0000) knlGS:0000000000000000
[ 4079.688404] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4079.688903] CR2: 000015173aee8000 CR3: 000000000420a005 CR4: 0000000000770ee0
[ 4079.689404] PKRU: 55555554
[ 4079.689896] Call Trace:
[ 4079.690382]  <TASK>
[ 4079.690860]  ? __die_body+0x1a/0x5c
[ 4079.691340]  ? die+0x30/0x49
[ 4079.691806]  ? do_trap+0x7b/0xfe
[ 4079.692274]  ? unraidd+0x1051/0x1140 [md_mod]
[ 4079.692729]  ? unraidd+0x1051/0x1140 [md_mod]
[ 4079.693171]  ? do_error_trap+0x6e/0x98
[ 4079.693607]  ? unraidd+0x1051/0x1140 [md_mod]
[ 4079.694044]  ? exc_invalid_op+0x4c/0x60
[ 4079.694482]  ? unraidd+0x1051/0x1140 [md_mod]
[ 4079.694904]  ? asm_exc_invalid_op+0x16/0x20
[ 4079.695323]  ? unraidd+0x1051/0x1140 [md_mod]
[ 4079.695731]  md_thread+0xf4/0x122 [md_mod]
[ 4079.696132]  ? _raw_spin_rq_lock_irqsave+0x20/0x20
[ 4079.696531]  ? signal_pending+0x1d/0x1d [md_mod]
[ 4079.696922]  kthread+0xe4/0xef
[ 4079.697317]  ? kthread_complete_and_exit+0x1b/0x1b
[ 4079.697719]  ret_from_fork+0x1f/0x30
[ 4079.698109]  </TASK>
[ 4079.698487] Modules linked in: xt_nat xt_tcpudp veth xt_conntrack nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype br_netfilter ipmi_devintf md_mod xfs tcp_diag inet_diag i915 drm_buddy drm_display_helper intel_gtt iptable_nat xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libchacha ip6table_filter ip6_tables iptable_filter ip_tables x_tables efivarfs macvtap macvlan tap af_packet 8021q garp mrp bridge stp llc bonding tls igc e1000e intel_rapl_msr intel_rapl_common iosf_mbi x86_pkg_temp_thermal intel_powerclamp coretemp zfs(PO) kvm_intel kvm ast crct10dif_pclmul drm_vram_helper crc32_pclmul crc32c_intel i2c_algo_bit ghash_clmulni_intel drm_ttm_helper zunicode(PO) sha512_ssse3 sha256_ssse3 ttm zzstd(O) sha1_ssse3 drm_kms_helper zlua(O) aesni_intel zavl(PO) drm crypto_simd icp(PO) agpgart mei_pxp mei_hdcp syscopyarea i2c_i801 cryptd
[ 4079.698517]  zcommon(PO) rapl znvpair(PO) intel_cstate spl(O) ipmi_ssif wmi_bmof mpt3sas intel_uncore sysfillrect i2c_smbus mei_me nvme input_leds mpi3mr sysimgblt video ahci raid_class acpi_ipmi i2c_core joydev led_class fb_sys_fops mei nvme_core libahci scsi_transport_sas thermal fan wmi ipmi_si backlight intel_pmc_core acpi_tad acpi_pad button unix [last unloaded: igc]
[ 4079.705481] ---[ end trace 0000000000000000 ]---
[ 4079.745331] RIP: 0010:unraidd+0x1051/0x1140 [md_mod]
[ 4079.745970] Code: 00 83 3d 83 50 00 00 03 7e 16 41 8b 56 98 89 e9 48 c7 c7 21 e3 7c a0 48 8b 73 20 e8 b2 b7 09 e1 41 f6 86 69 ff ff ff 02 75 02 <0f> 0b 48 8b 43 20 49 03 47 10 41 c7 46 b0 00 10 00 00 49 8b 56 10
[ 4079.747220] RSP: 0018:ffffc900205efdf0 EFLAGS: 00010246
[ 4079.747846] RAX: 0000000000000000 RBX: ffff88814a680da8 RCX: 0000000000000000
[ 4079.748451] RDX: 0000000000000000 RSI: ffffffff829ee720 RDI: ffff888108624038
[ 4079.749071] RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
[ 4079.749694] R10: 0000000000000000 R11: 0000000000000000 R12: ffff888148f62110
[ 4079.750313] R13: ffff88814a680fa0 R14: ffff88814a681018 R15: ffff8881049152d8
[ 4079.750899] FS:  0000000000000000(0000) GS:ffff889fff800000(0000) knlGS:0000000000000000
[ 4079.751505] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4079.752120] CR2: 000015173aee8000 CR3: 000000000420a005 CR4: 0000000000770ee0
[ 4079.752723] PKRU: 55555554
[ 4079.753339] ------------[ cut here ]------------
[ 4079.753955] WARNING: CPU: 8 PID: 15720 at kernel/exit.c:816 do_exit+0x87/0x923
[ 4079.754582] Modules linked in: xt_nat xt_tcpudp veth xt_conntrack nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype br_netfilter ipmi_devintf md_mod xfs tcp_diag inet_diag i915 drm_buddy drm_display_helper intel_gtt iptable_nat xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libchacha ip6table_filter ip6_tables iptable_filter ip_tables x_tables efivarfs macvtap macvlan tap af_packet 8021q garp mrp bridge stp llc bonding tls igc e1000e intel_rapl_msr intel_rapl_common iosf_mbi x86_pkg_temp_thermal intel_powerclamp coretemp zfs(PO) kvm_intel kvm ast crct10dif_pclmul drm_vram_helper crc32_pclmul crc32c_intel i2c_algo_bit ghash_clmulni_intel drm_ttm_helper zunicode(PO) sha512_ssse3 sha256_ssse3 ttm zzstd(O) sha1_ssse3 drm_kms_helper zlua(O) aesni_intel zavl(PO) drm crypto_simd icp(PO) agpgart mei_pxp mei_hdcp syscopyarea i2c_i801 cryptd
[ 4079.754606]  zcommon(PO) rapl znvpair(PO) intel_cstate spl(O) ipmi_ssif wmi_bmof mpt3sas intel_uncore sysfillrect i2c_smbus mei_me nvme input_leds mpi3mr sysimgblt video ahci raid_class acpi_ipmi i2c_core joydev led_class fb_sys_fops mei nvme_core libahci scsi_transport_sas thermal fan wmi ipmi_si backlight intel_pmc_core acpi_tad acpi_pad button unix [last unloaded: igc]
[ 4079.763171] CPU: 8 PID: 15720 Comm: unraidd0 Tainted: P      D    O       6.1.99-Unraid #1
[ 4079.764063] Hardware name: Supermicro Super Server/X13SAE-F, BIOS 4.1 10/01/2024
[ 4079.764958] RIP: 0010:do_exit+0x87/0x923
[ 4079.765855] Code: 24 74 04 75 13 b8 01 00 00 00 41 89 6c 24 60 48 c1 e0 22 49 89 44 24 70 4c 89 ef e8 d1 2b 81 00 48 83 bb b0 07 00 00 00 74 02 <0f> 0b 48 8b bb d8 06 00 00 e8 d3 2a 81 00 48 8b 83 d0 06 00 00 83
[ 4079.767710] RSP: 0018:ffffc900205efee0 EFLAGS: 00010286
[ 4079.768635] RAX: 0000000000000000 RBX: ffff888109c4f000 RCX: 0000000000000000
[ 4079.769571] RDX: 0000000000000001 RSI: 0000000000002710 RDI: 00000000ffffffff
[ 4079.770509] RBP: 000000000000000b R08: 0000000000000000 R09: ffffc900037e5020
[ 4079.771442] R10: 0000000000aaaaaa R11: 0000000000000001 R12: ffff888104636c00
[ 4079.772362] R13: ffff88814822dac0 R14: 0000000000000002 R15: ffffffff820b2ea5
[ 4079.773257] FS:  0000000000000000(0000) GS:ffff889fff800000(0000) knlGS:0000000000000000
[ 4079.774142] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4079.775011] CR2: 000015173aee8000 CR3: 000000000420a005 CR4: 0000000000770ee0
[ 4079.775867] PKRU: 55555554
[ 4079.776700] Call Trace:
[ 4079.777510]  <TASK>
[ 4079.778290]  ? __warn+0xab/0x122
[ 4079.779051]  ? report_bug+0x109/0x17e
[ 4079.779793]  ? do_exit+0x87/0x923
[ 4079.780513]  ? handle_bug+0x41/0x6f
[ 4079.781215]  ? exc_invalid_op+0x13/0x60
[ 4079.781914]  ? asm_exc_invalid_op+0x16/0x20
[ 4079.782601]  ? do_exit+0x87/0x923
[ 4079.783272]  make_task_dead+0x11c/0x11c
[ 4079.783934]  rewind_stack_and_make_dead+0x17/0x17
[ 4079.784598] RIP: 0000:0x0
[ 4079.785257] Code: Unable to access opcode bytes at 0xffffffffffffffd6.
[ 4079.785925] RSP: 0000:0000000000000000 EFLAGS: 00000000 ORIG_RAX: 0000000000000000
[ 4079.786611] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 4079.787290] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 4079.787953] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[ 4079.788609] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[ 4079.789252] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 4079.789882]  </TASK>

 

The 4.1 BIOS didn't help.  There are no other errors, but the parity check is stuck/crashed and I can only reboot.  The system hasn't hung, and the Docker containers are working fine.
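One thing worth checking in traces like these is whether the crashes keep landing on the same logical CPU. Below is a minimal Python sketch that pulls the `CPU: … PID: … Comm: …` header out of each oops and tallies crashes per CPU. The function name and regex are my own, and the sample lines are the headers copied from the traces posted in this thread. A repeated CPU number is only a hint: it may simply reflect where the `unraidd0` thread tends to be scheduled.

```python
import re
from collections import Counter

# Matches the oops header the kernel prints, for example:
# "CPU: 8 PID: 14798 Comm: unraidd0 Tainted: P  O  6.1.99-Unraid #1"
# The "WARNING: CPU: 8 PID: ... at kernel/exit.c" line has no "Comm:" and is skipped.
HEADER = re.compile(r"CPU: (\d+) PID: (\d+) Comm: (\S+) .*?(\d+\.\d+\.\d+\S*) #")

def crashes_per_cpu(log_text):
    """Count distinct crashes per logical CPU in a dmesg/syslog excerpt."""
    crashes = set()
    for match in HEADER.finditer(log_text):
        cpu, pid, comm, kernel = match.groups()
        # The follow-up do_exit warning repeats the same CPU/PID/kernel,
        # so a set collapses it into the original crash.
        crashes.add((int(cpu), int(pid), comm, kernel))
    return Counter(cpu for cpu, _pid, _comm, _kernel in crashes)

# Header lines copied from the traces in this thread.
SAMPLE = """\
[ 2310.525396] CPU: 8 PID: 15516 Comm: unraidd0 Tainted: P           O       6.1.118-Unraid #1
[  494.599427] CPU: 8 PID: 14798 Comm: unraidd0 Tainted: P           O       6.1.99-Unraid #1
[  495.276096] CPU: 8 PID: 14798 Comm: unraidd0 Tainted: P      D    O       6.1.99-Unraid #1
[ 4079.682290] CPU: 8 PID: 15720 Comm: unraidd0 Tainted: P           O       6.1.99-Unraid #1
"""

if __name__ == "__main__":
    for cpu, count in sorted(crashes_per_cpu(SAMPLE).items()):
        print(f"CPU {cpu}: {count} crash(es)")
```

On the sample above this reports three distinct crashes, all on CPU 8, across two kernels and two BIOS versions.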

Edited by scs3jb

  • Author

So I'm at a loss:

  • CPU is stable under duress (x265, stress-ng)
  • Memory passes the memory test, and it's ECC, so any bit flips should be logged
  • BIOS/motherboard: same crash on multiple BIOS versions
  • Disk reads/writes seem to be fine... I'm able to transcode and direct play
  • Docker is fine; only the main Unraid thread crashes
  • The parity check puts the CPU under less load than the stress tests
  • No power events in the IPMI or kernel logs

I was sure it would be the processor, but only the parity check seems affected, and it reproduces 100% of the time, somewhere between 1% and 5% completion.

 

Could it be the LSI SAS controller, the USB drive or something else, and can they be isolated for testing without physical access to the server?  Is there a parity-like test I can run to simulate a failure?
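As a rough idea of what a userspace "parity-like" test could look like, here is a minimal Python sketch that pins itself to one logical CPU at a time, XORs random blocks together the way single-parity schemes do, then rebuilds one block from the parity and compares. Everything here (function names, block sizes, round counts) is an assumption for illustration; it is not the Unraid md driver's code, and it relies on the Linux-only `os.sched_setaffinity`. Since the real crash is inside the kernel thread, a clean pass here does not clear a core.

```python
import os

def xor_parity(blocks):
    """XOR equal-length byte blocks together (single-parity style)."""
    size = len(blocks[0])
    acc = 0
    for block in blocks:
        acc ^= int.from_bytes(block, "little")
    return acc.to_bytes(size, "little")

def check_core(cpu, rounds=200, disks=8, block_size=4096):
    """Pin this process to one logical CPU and count parity mismatches."""
    os.sched_setaffinity(0, {cpu})  # Linux-only call
    mismatches = 0
    for _ in range(rounds):
        blocks = [os.urandom(block_size) for _ in range(disks)]
        parity = xor_parity(blocks)
        # Rebuilding "disk 0" from parity plus the other disks must return disk 0.
        rebuilt = xor_parity([parity] + blocks[1:])
        if rebuilt != blocks[0]:
            mismatches += 1
    return mismatches

def check_all_cores(rounds=200):
    """Run check_core on every CPU this process is allowed to use."""
    allowed = sorted(os.sched_getaffinity(0))
    try:
        return {cpu: check_core(cpu, rounds=rounds) for cpu in allowed}
    finally:
        os.sched_setaffinity(0, allowed)  # restore the original affinity

if __name__ == "__main__":
    for cpu, bad in check_all_cores().items():
        print(f"cpu {cpu}: {bad} mismatches")
```

Any non-zero mismatch count on a single core would be a strong hint; all zeros only says this particular workload didn't trip anything.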

Edited by scs3jb

  • Author

I disabled all but 2 P-cores and 4 E-cores and the parity check is currently at 6.9%, which is further than it's got with any other test.

 

Let's see how it goes, but do you think it's a broken Intel CPU core, or still power?

  • Community Expert

As mentioned,

On 2/8/2025 at 12:30 PM, JorgeB said:

typically it's RAM, CPU or board.

 

  • Author

The parity check is 28% done with 2 P-cores and 4 E-cores, so it's looking like a faulty core.

I will see if this parity check completes, then try increasing the number of cores and repeat.  RMA with Intel, I guess.
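The "change the enabled cores and repeat" approach can be sketched as a bisection. This is a minimal Python illustration, assuming exactly one faulty logical CPU and a crash that reproduces reliably. `passes` is a hypothetical callback standing in for "run a parity check with only these cores enabled and report whether it finished". In practice each call is a BIOS change plus hours of parity checking, and some platforms won't let you disable every core in a given half.

```python
def find_bad_core(cores, passes):
    """Binary-search for a single faulty core.

    `passes(enabled)` is a hypothetical callback: it should run the workload
    with only the `enabled` cores active and return True if it completed
    without crashing.
    """
    suspects = list(cores)
    while len(suspects) > 1:
        mid = len(suspects) // 2
        first_half = suspects[:mid]
        if passes(first_half):
            # First half ran clean, so the fault must be in the second half.
            suspects = suspects[mid:]
        else:
            suspects = first_half
    return suspects[0]

if __name__ == "__main__":
    # Simulation only: a 13900K exposes 32 logical CPUs, and we pretend
    # logical CPU 8 (the number seen in the traces) is the bad one.
    bad = 8
    runs = []

    def simulated_parity_check(enabled):
        runs.append(list(enabled))
        return bad not in enabled

    print(find_bad_core(range(32), simulated_parity_check), len(runs))
```

With 32 logical CPUs the simulation needs five runs to single out the bad one, versus up to 31 if cores were re-enabled one at a time.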

  • 4 weeks later...
  • Author
  • Solution

Switched out the 13900K for a 14900K: same power supply, same motherboard, and all cores enabled.  It worked fine.

 

The 13900K was trash and had a broken core; I'm returning it to Amazon, as Intel is horrible to deal with.
