TheSkaz Posted December 12, 2021
I've had thousands of these over the past year, and I can't read them. I have a quasi-software-engineering background, so I should be able to pick it up quickly. Can someone break this down and tell me what the issue is? Seems it's zfs, but what in zfs is the issue?

Dec 12 06:26:23 Tower kernel: general protection fault, probably for non-canonical address 0x454848a8154e84c5: 0000 [#4] SMP NOPTI
Dec 12 06:26:23 Tower kernel: CPU: 105 PID: 55009 Comm: grafana-server Tainted: P S D O 5.14.15-Unraid #1
Dec 12 06:26:23 Tower kernel: Hardware name: ASUS System Product Name/ROG ZENITH II EXTREME ALPHA, BIOS 1402 01/15/2021
Dec 12 06:26:23 Tower kernel: RIP: 0010:kmem_cache_alloc+0x9c/0x176
Dec 12 06:26:23 Tower kernel: Code: 48 89 04 24 74 05 48 85 c0 75 16 4c 89 f1 83 ca ff 89 ee 4c 89 e7 e8 17 ff ff ff 48 89 04 24 eb 26 41 8b 4c 24 28 49 8b 3c 24 <48> 8b 1c 08 48 8d 4a 01 65 48 0f c7 0f 0f 94 c0 84 c0 74 a9 41 8b
Dec 12 06:26:23 Tower kernel: RSP: 0018:ffffc9001271f7b0 EFLAGS: 00010202
Dec 12 06:26:23 Tower kernel: RAX: 454848a8154e84ad RBX: ffff8881242c1e00 RCX: 0000000000000018
Dec 12 06:26:23 Tower kernel: RDX: 0000000000069ad6 RSI: 0000000000042c00 RDI: 00006040c103eae0
Dec 12 06:26:23 Tower kernel: RBP: 0000000000042c00 R08: ffffe8ffff87eae0 R09: 0000000000000200
Dec 12 06:26:23 Tower kernel: R10: ffffc9001271fa98 R11: ffff888121c40000 R12: ffff888186a3c400
Dec 12 06:26:23 Tower kernel: R13: ffff888186a3c400 R14: ffffffffa0060cbc R15: ffff88b749691e00
Dec 12 06:26:23 Tower kernel: FS: 0000151080106f20(0000) GS:ffff88bf3e840000(0000) knlGS:0000000000000000
Dec 12 06:26:23 Tower kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec 12 06:26:23 Tower kernel: CR2: 00000078000fdf50 CR3: 00000035541d8000 CR4: 0000000000350ee0
Dec 12 06:26:23 Tower kernel: Call Trace:
Dec 12 06:26:23 Tower kernel: spl_kmem_cache_alloc+0x4a/0x609 [spl]
Dec 12 06:26:23 Tower kernel: ? spl_kmem_cache_alloc+0x5e0/0x609 [spl]
Dec 12 06:26:23 Tower kernel: zio_add_child+0x3a/0x14f [zfs]
Dec 12 06:26:23 Tower kernel: zio_create+0x2e5/0x303 [zfs]
Dec 12 06:26:23 Tower kernel: zio_read+0x62/0x67 [zfs]
Dec 12 06:26:23 Tower kernel: ? arc_buf_alloc_impl.isra.0+0x28c/0x28c [zfs]
Dec 12 06:26:23 Tower kernel: arc_read+0xe51/0xef3 [zfs]
Dec 12 06:26:23 Tower kernel: dbuf_read_impl.constprop.0+0x4ce/0x54a [zfs]
Dec 12 06:26:23 Tower kernel: dbuf_read+0x2be/0x4ce [zfs]
Dec 12 06:26:23 Tower kernel: ? dmu_buf_hold_noread+0xa4/0xfd [zfs]
Dec 12 06:26:23 Tower kernel: dmu_buf_hold+0x50/0x76 [zfs]
Dec 12 06:26:23 Tower kernel: zap_lockdir+0x4e/0xab [zfs]
Dec 12 06:26:23 Tower kernel: zap_cursor_retrieve+0x82/0x24d [zfs]
Dec 12 06:26:23 Tower kernel: ? verify_dirent_name+0x22/0x2b
Dec 12 06:26:23 Tower kernel: ? filldir64+0x8b/0x1a2
Dec 12 06:26:23 Tower kernel: zfs_readdir+0x274/0x3a0 [zfs]
Dec 12 06:26:23 Tower kernel: ? __raw_callee_save___native_queued_spin_unlock+0x11/0x1e
Dec 12 06:26:23 Tower kernel: ? do_filp_open+0x8a/0xb0
Dec 12 06:26:23 Tower kernel: ? __raw_spin_unlock+0x5/0x8 [zfs]
Dec 12 06:26:23 Tower kernel: ? __down_read_common+0x84/0x2c2
Dec 12 06:26:23 Tower kernel: ? __fget_files+0x57/0x63
Dec 12 06:26:23 Tower kernel: zpl_iterate+0x46/0x64 [zfs]
Dec 12 06:26:23 Tower kernel: iterate_dir+0x98/0x136
Dec 12 06:26:23 Tower kernel: __do_sys_getdents64+0x6b/0xd4
Dec 12 06:26:23 Tower kernel: ? filldir+0x1a3/0x1a3
Dec 12 06:26:23 Tower kernel: do_syscall_64+0x83/0xa5
Dec 12 06:26:23 Tower kernel: entry_SYSCALL_64_after_hwframe+0x44/0xae
Dec 12 06:26:23 Tower kernel: RIP: 0033:0x48015b
Dec 12 06:26:23 Tower kernel: Code: e8 6a 79 fe ff eb 88 cc cc cc cc cc cc cc cc e8 1b bf fe ff 48 8b 7c 24 10 48 8b 74 24 18 48 8b 54 24 20 48 8b 44 24 08 0f 05 <48> 3d 01 f0 ff ff 76 20 48 c7 44 24 28 ff ff ff ff 48 c7 44 24 30
Dec 12 06:26:23 Tower kernel: RSP: 002b:000000c0006b79c8 EFLAGS: 00000216 ORIG_RAX: 00000000000000d9
Dec 12 06:26:23 Tower kernel: RAX: ffffffffffffffda RBX: 000000c00005e000 RCX: 000000000048015b
Dec 12 06:26:23 Tower kernel: RDX: 0000000000002000 RSI: 000000c0005fe000 RDI: 0000000000000008
Dec 12 06:26:23 Tower kernel: RBP: 000000c0006b7a18 R08: 000000c0009cf801 R09: 0000000000000000
Dec 12 06:26:23 Tower kernel: R10: 00007ffd2eb80080 R11: 0000000000000216 R12: 000000c0006b7908
Dec 12 06:26:23 Tower kernel: R13: 0000000000000000 R14: 000000c001580820 R15: 0000000000000000
Dec 12 06:26:23 Tower kernel: Modules linked in: xt_mark nvidia_modeset(PO) nvidia_uvm(PO) rpcsec_gss_krb5 xt_CHECKSUM ipt_REJECT nf_reject_ipv4 nfsv4 nfs ip6table_mangle ip6table_nat iptable_mangle vhost_net tun vhost vhost_iotlb tap macvlan veth xt_nat xt_tcpudp xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter xfs nfsd auth_rpcgss oid_registry lockd grace sunrpc md_mod zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) nvidia(PO) drm backlight efivarfs ip6table_filter ip6_tables iptable_filter ip_tables x_tables bonding edac_mce_amd wmi_bmof mxm_wmi kvm_amd kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel crypto_simd cryptd ahci rapl atlantic libahci ccp i2c_piix4 nvme corsair_cpro nvme_core i2c_core k10temp tpm_crb tpm_tis tpm_tis_core tpm wmi button acpi_cpufreq
Dec 12 06:26:23 Tower kernel: ---[ end trace 175c948d9f3e665b ]---
Dec 12 06:26:24 Tower kernel: RIP: 0010:kmem_cache_alloc+0x9c/0x176
Dec 12 06:26:24 Tower kernel: Code: 48 89 04 24 74 05 48 85 c0 75 16 4c 89 f1 83 ca ff 89 ee 4c 89 e7 e8 17 ff ff ff 48 89 04 24 eb 26 41 8b 4c 24 28 49 8b 3c 24 <48> 8b 1c 08 48 8d 4a 01 65 48 0f c7 0f 0f 94 c0 84 c0 74 a9 41 8b
Dec 12 06:26:24 Tower kernel: RSP: 0018:ffffc90045a83b90 EFLAGS: 00010202
Dec 12 06:26:24 Tower kernel: RAX: 454848a8154e84ad RBX: ffff8881242c1e00 RCX: 0000000000000018
Dec 12 06:26:24 Tower kernel: RDX: 0000000000069ad6 RSI: 0000000000042c00 RDI: 00006040c103eae0
Dec 12 06:26:24 Tower kernel: RBP: 0000000000042c00 R08: ffffe8ffff87eae0 R09: 0000000000000600
Dec 12 06:26:24 Tower kernel: R10: ffff88b498377e50 R11: ffff888121c40000 R12: ffff888186a3c400
Dec 12 06:26:24 Tower kernel: R13: ffff888186a3c400 R14: ffffffffa0060cbc R15: ffff88abae75cb00
Dec 12 06:26:24 Tower kernel: FS: 0000151080106f20(0000) GS:ffff88bf3e840000(0000) knlGS:0000000000000000
Dec 12 06:26:24 Tower kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec 12 06:26:24 Tower kernel: CR2: 00000078000fdf50 CR3: 00000035541d8000 CR4: 0000000000350ee0
trurl Posted December 12, 2021 Could be RAM. Have you done memtest recently?
TheSkaz Posted December 12, 2021 (Author) I have done 10+ full MemTest86 runs over the course of the past year. Every response to a fault has been about memory trying to write to an invalid address, or other memory issues. I have replaced the RAM with 2 different kits, and did, in fact, RMA one kit. I ran a memtest recently (it takes 3-4 days) and it found 0 issues.
Squid Posted December 12, 2021 Could simply be a bug with whatever app is triggering them. eg: Plex is infamous for causing General Protection Faults. Whether it's an actual issue or not varies.
TheSkaz Posted December 12, 2021 (Author) 3 hours ago, Squid said: Could simply be a bug with whatever app is triggering them. eg: Plex is infamous for causing General Protection Faults. Whether it's an actual issue or not varies. That's the thing. I would like to be able to read these GPFs and get to the bottom of what it is. If it's Plex, or Grafana, or whatever, I can address it. My ask is not "what caused the GPF", it's more "how can I read it", so that I can become more self-sufficient, I guess.
trurl Posted December 12, 2021 4 hours ago, TheSkaz said: seems its zfs
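One rough way to see why the trace reads as ZFS is to count which module tag (the name in square brackets) each call-trace frame carries. The Python sketch below does that for a handful of frames copied from the log above, with the syslog prefix stripped. The `FRAME` pattern and the `modules_in_trace` helper are illustrative names, not part of any existing tool, and the pattern assumes the `symbol+0xoff/0xlen [module]` layout seen in this particular log.

```python
import re
from collections import Counter

# A few call-trace frames copied from the log (timestamp prefix removed).
TRACE = """\
spl_kmem_cache_alloc+0x4a/0x609 [spl]
zio_add_child+0x3a/0x14f [zfs]
zio_create+0x2e5/0x303 [zfs]
zio_read+0x62/0x67 [zfs]
zfs_readdir+0x274/0x3a0 [zfs]
zpl_iterate+0x46/0x64 [zfs]
iterate_dir+0x98/0x136
__do_sys_getdents64+0x6b/0xd4
"""

# Assumed frame layout: optional "? ", symbol+offset/length, optional [module].
FRAME = re.compile(
    r"^(?:\? )?(?P<sym>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(?P<mod>\w+)\])?$"
)

def modules_in_trace(text):
    """Count frames per module; frames without a tag are counted as 'kernel'."""
    counts = Counter()
    for line in text.splitlines():
        match = FRAME.match(line.strip())
        if match:
            counts[match.group("mod") or "kernel"] += 1
    return counts

counts = modules_in_trace(TRACE)
print(counts.most_common())
```

Frames without a bracketed tag belong to the core kernel rather than a loadable module. A trace dominated by `[zfs]` frames only says the fault happened while ZFS code was on the stack; it does not by itself say ZFS is at fault.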
Squid Posted December 12, 2021 Some module within Grafana (zfs related?) would be my best guess. You'd have to correlate the PID (55009) against the output of ps -aux. The other hex numbers etc. are only meaningful to the devs themselves after a ton of diagnosis involving decompilers etc.
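A few of the fields can still be read without a disassembler. The Python sketch below pulls the PID and command name out of the `CPU:` line and checks the faulting address against the registers. Two things in it are assumptions rather than facts from the thread: that this machine uses 48-bit virtual addresses (4-level paging), and that the instruction marked `<48> 8b 1c 08` in the `Code:` line is a load from RAX+RCX. The helper name `is_canonical` is made up for illustration.

```python
import re

# Lines copied from the log above (timestamp prefix removed).
HEADER = ("general protection fault, probably for non-canonical address "
          "0x454848a8154e84c5: 0000 [#4] SMP NOPTI")
CPU_LINE = "CPU: 105 PID: 55009 Comm: grafana-server Tainted: P S D O 5.14.15-Unraid #1"

# Register values from the same oops.
RAX = 0x454848A8154E84AD
RCX = 0x0000000000000018

def is_canonical(addr):
    """Assuming 48-bit virtual addresses: bits 63..47 must be all 0 or all 1."""
    top = addr >> 47
    return top == 0 or top == 0x1FFFF

fault_addr = int(re.search(r"address (0x[0-9a-f]+)", HEADER).group(1), 16)
pid = int(re.search(r"PID: (\d+)", CPU_LINE).group(1))
comm = re.search(r"Comm: (\S+)", CPU_LINE).group(1)

print(f"process: {comm} (pid {pid})")
print(f"fault address canonical? {is_canonical(fault_addr)}")
print(f"RAX + RCX == fault address? {RAX + RCX == fault_addr}")
```

Under those assumptions, the faulting address is exactly RAX + RCX, and RAX itself does not look like a kernel pointer (compare RBX or R12, which start with `ffff88...`). That would be consistent with the allocator following a pointer that had already been overwritten with garbage, which may point to earlier memory corruption rather than to a bug in the code that happened to be running when the fault fired.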