How do you read Linux faults?


TheSkaz

I have had thousands of these over the past year, and I can't read them. I have a quasi-software-engineering background, so I should be able to pick it up quickly.

Can someone break this down and tell me what the issue is? It seems to be ZFS, but what in ZFS is the issue?

Dec 12 06:26:23 Tower kernel: general protection fault, probably for non-canonical address 0x454848a8154e84c5: 0000 [#4] SMP NOPTI
Dec 12 06:26:23 Tower kernel: CPU: 105 PID: 55009 Comm: grafana-server Tainted: P S    D    O      5.14.15-Unraid #1
Dec 12 06:26:23 Tower kernel: Hardware name: ASUS System Product Name/ROG ZENITH II EXTREME ALPHA, BIOS 1402 01/15/2021
Dec 12 06:26:23 Tower kernel: RIP: 0010:kmem_cache_alloc+0x9c/0x176
Dec 12 06:26:23 Tower kernel: Code: 48 89 04 24 74 05 48 85 c0 75 16 4c 89 f1 83 ca ff 89 ee 4c 89 e7 e8 17 ff ff ff 48 89 04 24 eb 26 41 8b 4c 24 28 49 8b 3c 24 <48> 8b 1c 08 48 8d 4a 01 65 48 0f c7 0f 0f 94 c0 84 c0 74 a9 41 8b
Dec 12 06:26:23 Tower kernel: RSP: 0018:ffffc9001271f7b0 EFLAGS: 00010202
Dec 12 06:26:23 Tower kernel: RAX: 454848a8154e84ad RBX: ffff8881242c1e00 RCX: 0000000000000018
Dec 12 06:26:23 Tower kernel: RDX: 0000000000069ad6 RSI: 0000000000042c00 RDI: 00006040c103eae0
Dec 12 06:26:23 Tower kernel: RBP: 0000000000042c00 R08: ffffe8ffff87eae0 R09: 0000000000000200
Dec 12 06:26:23 Tower kernel: R10: ffffc9001271fa98 R11: ffff888121c40000 R12: ffff888186a3c400
Dec 12 06:26:23 Tower kernel: R13: ffff888186a3c400 R14: ffffffffa0060cbc R15: ffff88b749691e00
Dec 12 06:26:23 Tower kernel: FS:  0000151080106f20(0000) GS:ffff88bf3e840000(0000) knlGS:0000000000000000
Dec 12 06:26:23 Tower kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec 12 06:26:23 Tower kernel: CR2: 00000078000fdf50 CR3: 00000035541d8000 CR4: 0000000000350ee0
Dec 12 06:26:23 Tower kernel: Call Trace:
Dec 12 06:26:23 Tower kernel: spl_kmem_cache_alloc+0x4a/0x609 [spl]
Dec 12 06:26:23 Tower kernel: ? spl_kmem_cache_alloc+0x5e0/0x609 [spl]
Dec 12 06:26:23 Tower kernel: zio_add_child+0x3a/0x14f [zfs]
Dec 12 06:26:23 Tower kernel: zio_create+0x2e5/0x303 [zfs]
Dec 12 06:26:23 Tower kernel: zio_read+0x62/0x67 [zfs]
Dec 12 06:26:23 Tower kernel: ? arc_buf_alloc_impl.isra.0+0x28c/0x28c [zfs]
Dec 12 06:26:23 Tower kernel: arc_read+0xe51/0xef3 [zfs]
Dec 12 06:26:23 Tower kernel: dbuf_read_impl.constprop.0+0x4ce/0x54a [zfs]
Dec 12 06:26:23 Tower kernel: dbuf_read+0x2be/0x4ce [zfs]
Dec 12 06:26:23 Tower kernel: ? dmu_buf_hold_noread+0xa4/0xfd [zfs]
Dec 12 06:26:23 Tower kernel: dmu_buf_hold+0x50/0x76 [zfs]
Dec 12 06:26:23 Tower kernel: zap_lockdir+0x4e/0xab [zfs]
Dec 12 06:26:23 Tower kernel: zap_cursor_retrieve+0x82/0x24d [zfs]
Dec 12 06:26:23 Tower kernel: ? verify_dirent_name+0x22/0x2b
Dec 12 06:26:23 Tower kernel: ? filldir64+0x8b/0x1a2
Dec 12 06:26:23 Tower kernel: zfs_readdir+0x274/0x3a0 [zfs]
Dec 12 06:26:23 Tower kernel: ? __raw_callee_save___native_queued_spin_unlock+0x11/0x1e
Dec 12 06:26:23 Tower kernel: ? do_filp_open+0x8a/0xb0
Dec 12 06:26:23 Tower kernel: ? __raw_spin_unlock+0x5/0x8 [zfs]
Dec 12 06:26:23 Tower kernel: ? __down_read_common+0x84/0x2c2
Dec 12 06:26:23 Tower kernel: ? __fget_files+0x57/0x63
Dec 12 06:26:23 Tower kernel: zpl_iterate+0x46/0x64 [zfs]
Dec 12 06:26:23 Tower kernel: iterate_dir+0x98/0x136
Dec 12 06:26:23 Tower kernel: __do_sys_getdents64+0x6b/0xd4
Dec 12 06:26:23 Tower kernel: ? filldir+0x1a3/0x1a3
Dec 12 06:26:23 Tower kernel: do_syscall_64+0x83/0xa5
Dec 12 06:26:23 Tower kernel: entry_SYSCALL_64_after_hwframe+0x44/0xae
Dec 12 06:26:23 Tower kernel: RIP: 0033:0x48015b
Dec 12 06:26:23 Tower kernel: Code: e8 6a 79 fe ff eb 88 cc cc cc cc cc cc cc cc e8 1b bf fe ff 48 8b 7c 24 10 48 8b 74 24 18 48 8b 54 24 20 48 8b 44 24 08 0f 05 <48> 3d 01 f0 ff ff 76 20 48 c7 44 24 28 ff ff ff ff 48 c7 44 24 30
Dec 12 06:26:23 Tower kernel: RSP: 002b:000000c0006b79c8 EFLAGS: 00000216 ORIG_RAX: 00000000000000d9
Dec 12 06:26:23 Tower kernel: RAX: ffffffffffffffda RBX: 000000c00005e000 RCX: 000000000048015b
Dec 12 06:26:23 Tower kernel: RDX: 0000000000002000 RSI: 000000c0005fe000 RDI: 0000000000000008
Dec 12 06:26:23 Tower kernel: RBP: 000000c0006b7a18 R08: 000000c0009cf801 R09: 0000000000000000
Dec 12 06:26:23 Tower kernel: R10: 00007ffd2eb80080 R11: 0000000000000216 R12: 000000c0006b7908
Dec 12 06:26:23 Tower kernel: R13: 0000000000000000 R14: 000000c001580820 R15: 0000000000000000
Dec 12 06:26:23 Tower kernel: Modules linked in: xt_mark nvidia_modeset(PO) nvidia_uvm(PO) rpcsec_gss_krb5 xt_CHECKSUM ipt_REJECT nf_reject_ipv4 nfsv4 nfs ip6table_mangle ip6table_nat iptable_mangle vhost_net tun vhost vhost_iotlb tap macvlan veth xt_nat xt_tcpudp xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter xfs nfsd auth_rpcgss oid_registry lockd grace sunrpc md_mod zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) nvidia(PO) drm backlight efivarfs ip6table_filter ip6_tables iptable_filter ip_tables x_tables bonding edac_mce_amd wmi_bmof mxm_wmi kvm_amd kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel crypto_simd cryptd ahci rapl atlantic libahci ccp i2c_piix4 nvme corsair_cpro nvme_core i2c_core k10temp tpm_crb tpm_tis tpm_tis_core tpm wmi button acpi_cpufreq
Dec 12 06:26:23 Tower kernel: ---[ end trace 175c948d9f3e665b ]---
Dec 12 06:26:24 Tower kernel: RIP: 0010:kmem_cache_alloc+0x9c/0x176
Dec 12 06:26:24 Tower kernel: Code: 48 89 04 24 74 05 48 85 c0 75 16 4c 89 f1 83 ca ff 89 ee 4c 89 e7 e8 17 ff ff ff 48 89 04 24 eb 26 41 8b 4c 24 28 49 8b 3c 24 <48> 8b 1c 08 48 8d 4a 01 65 48 0f c7 0f 0f 94 c0 84 c0 74 a9 41 8b
Dec 12 06:26:24 Tower kernel: RSP: 0018:ffffc90045a83b90 EFLAGS: 00010202
Dec 12 06:26:24 Tower kernel: RAX: 454848a8154e84ad RBX: ffff8881242c1e00 RCX: 0000000000000018
Dec 12 06:26:24 Tower kernel: RDX: 0000000000069ad6 RSI: 0000000000042c00 RDI: 00006040c103eae0
Dec 12 06:26:24 Tower kernel: RBP: 0000000000042c00 R08: ffffe8ffff87eae0 R09: 0000000000000600
Dec 12 06:26:24 Tower kernel: R10: ffff88b498377e50 R11: ffff888121c40000 R12: ffff888186a3c400
Dec 12 06:26:24 Tower kernel: R13: ffff888186a3c400 R14: ffffffffa0060cbc R15: ffff88abae75cb00
Dec 12 06:26:24 Tower kernel: FS:  0000151080106f20(0000) GS:ffff88bf3e840000(0000) knlGS:0000000000000000
Dec 12 06:26:24 Tower kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec 12 06:26:24 Tower kernel: CR2: 00000078000fdf50 CR3: 00000035541d8000 CR4: 0000000000350ee0
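
One thing that can be checked directly from the register dump above is where the reported address comes from. The bytes at the `<48>` marker in the `Code:` line (`48 8b 1c 08`) decode to `mov (%rax,%rcx,1),%rbx`, a load from `RAX + RCX`. The sketch below is only a sanity check, assuming 48-bit (4-level paging) virtual addresses; the function name `is_canonical` is made up for illustration.

```python
# Values copied from the log: RAX, RCX, and the address in the first line.
RAX = 0x454848A8154E84AD
RCX = 0x0000000000000018
REPORTED = 0x454848A8154E84C5


def is_canonical(addr: int) -> bool:
    """True if bits 63..47 are all equal (assumes 48-bit virtual addresses)."""
    top = addr >> 47
    return top == 0 or top == 0x1FFFF


# The faulting instruction reads from RAX + RCX (wrapped to 64 bits).
fault_addr = (RAX + RCX) & 0xFFFFFFFFFFFFFFFF

print(hex(fault_addr), fault_addr == REPORTED)
print("fault address canonical:", is_canonical(fault_addr))
# RBX from the same dump, for comparison with a normal-looking kernel pointer.
print("RBX canonical:", is_canonical(0xFFFF8881242C1E00))
```

The sum matches the address in the "general protection fault" line, so the value in RAX was used as a pointer even though it is not a valid kernel address. That tells you where the bad pointer was used (inside `kmem_cache_alloc`), not what wrote the bad value in the first place.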

I have done 10+ full MemTest86 runs over the course of the past year. Every response to a fault has been about memory trying to write to an invalid address, or other memory issues. I have replaced the RAM with 2 different kits and did, in fact, RMA one kit. I ran a memtest recently (it takes 3-4 days) and it found 0 issues.

3 hours ago, Squid said:

Could simply be a bug with whatever app is triggering them. E.g., Plex is infamous for causing General Protection Faults. Whether it's an actual issue or not varies.

That's the thing. I would like to be able to read these GPFs and get to the bottom of what it is. If it's Plex, or Grafana, or whatever, I can address it. My ask is not so much what caused the GPF as "how can I read it", so that I can become more self-sufficient, I guess.
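
As a starting point for reading these, here is a rough sketch that pulls the main fields out of a report like the one in the first post. The sample lines are copied from that log (trimmed); the regexes and the `parse_oops` helper are my own and assume this exact syslog layout, so treat them as illustrative. The taint letter meanings are paraphrased from the kernel's tainted-kernels documentation.

```python
import re

SAMPLE = """\
Dec 12 06:26:23 Tower kernel: general protection fault, probably for non-canonical address 0x454848a8154e84c5: 0000 [#4] SMP NOPTI
Dec 12 06:26:23 Tower kernel: CPU: 105 PID: 55009 Comm: grafana-server Tainted: P S    D    O      5.14.15-Unraid #1
Dec 12 06:26:23 Tower kernel: RIP: 0010:kmem_cache_alloc+0x9c/0x176
Dec 12 06:26:23 Tower kernel: Call Trace:
Dec 12 06:26:23 Tower kernel: spl_kmem_cache_alloc+0x4a/0x609 [spl]
Dec 12 06:26:23 Tower kernel: ? spl_kmem_cache_alloc+0x5e0/0x609 [spl]
Dec 12 06:26:23 Tower kernel: zio_add_child+0x3a/0x14f [zfs]
Dec 12 06:26:23 Tower kernel: zfs_readdir+0x274/0x3a0 [zfs]
Dec 12 06:26:23 Tower kernel: iterate_dir+0x98/0x136
Dec 12 06:26:23 Tower kernel: __do_sys_getdents64+0x6b/0xd4
Dec 12 06:26:23 Tower kernel: RIP: 0033:0x48015b
"""

# Only the flags that appear in this log.
TAINT = {
    "P": "proprietary module loaded",
    "S": "CPU/system out of specification",
    "D": "kernel already oopsed earlier this boot",
    "O": "out-of-tree module loaded",
}

PREFIX = re.compile(r"^.*? kernel: ")
# symbol+offset/size, optional leading "? " (unreliable), optional [module]
FRAME = re.compile(r"^(\? )?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(\w+)\])?$")


def parse_oops(text):
    info = {"trace": []}
    in_trace = False
    for raw in text.splitlines():
        line = PREFIX.sub("", raw, count=1).strip()
        if in_trace:
            m = FRAME.match(line)
            if m:
                if not m.group(1):  # skip "?" entries: stale stack leftovers
                    info["trace"].append((m.group(2), m.group(3) or "kernel"))
                continue
            in_trace = False  # first non-frame line ends the trace
        if line == "Call Trace:":
            in_trace = True
        elif (m := re.search(r"^(.*?): \d{4} \[#(\d+)\]", line)):
            info["fault"], info["oops_count"] = m.group(1), int(m.group(2))
        elif (m := re.match(r"CPU: (\d+) PID: (\d+) Comm: (\S+) Tainted: (.*?)\s+(\S+) #\d+$", line)):
            info["pid"], info["comm"] = int(m.group(2)), m.group(3)
            info["taint"] = [c for c in m.group(4) if c != " "]
            info["kernel"] = m.group(5)
        elif "rip" not in info and (m := re.match(r"RIP: 0010:(\S+)", line)):
            info["rip"] = m.group(1)  # 0010 = kernel code segment
    return info


info = parse_oops(SAMPLE)
print(f"{info['fault']} (oops #{info['oops_count']} since boot)")
print(f"process: {info['comm']} (pid {info['pid']}), kernel {info['kernel']}")
for flag in info["taint"]:
    print(f"  taint {flag}: {TAINT.get(flag, 'see kernel docs')}")
print("faulted in:", info["rip"])
for func, module in info["trace"]:
    print(f"  called from {func} [{module}]")
```

Read this way, the trace goes from innermost to outermost: the process called `getdents64` (listing a directory), which went through the generic VFS code into ZFS, which asked SPL for memory, and the fault happened inside the kernel's own `kmem_cache_alloc`. The `Comm:` field only names the process that happened to be running when the bad pointer was hit, and the `[#4]` together with the `D` flag means the kernel had already oopsed before this one, so the earliest report after a boot is usually the most informative.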
